Sounds very interesting
I can almost fit my entire image (1.5 terapixel remote sensing stuff) into main memory
;-)
seriously, really interesting machine given the performance and power draw
The drumbeat of system upgrades continues apace at microserver startup SeaMicro, with the company launching its third server node for its SM10000 "Atom smasher" in the past nine months. The new machine is based on the same Atom N570 chip and related NM10 Express chipset from Intel that was used in the prior generation of …
http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
"In this paper, we analyze measurements of memory errors in a large fleet of commodity servers over a period of 2.5 years. ... We find that DRAM error behavior in the field differs in many key aspects from commonly held assumptions. For example, we observe DRAM error rates that are orders of magnitude higher than previously reported, with 25,000 to 70,000 errors per billion device hours per Mbit and more than 8% of DIMMs affected by errors per year. We provide strong evidence that memory errors are dominated by hard errors, rather than soft errors, which previous work suspects to be the dominant error mode."
To me, it sounds like they just got a bunch of improperly designed/installed DIMMs. From the paper:
Conclusion 1: We found the incidence of memory errors and the range of error rates across different DIMMs to be much higher than previously reported.
Conclusion 4: There is no evidence that newer generation DIMMs have worse error behavior.
The question is whether you want to pay Intel for ECC to cover these kinds of errors, or just make sure the servers' DIMMs are of reasonable quality. I bet that reasonable-quality DIMMs are cheaper than paying the Intel ECC (i.e. the so-called server part) tax.
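For a sense of scale, the rate quoted from the paper (25,000 to 70,000 errors per billion device hours per Mbit) can be turned into a rough per-DIMM yearly figure. This is only a back-of-the-envelope sketch: the 1 GB DIMM size and the helper name `expected_errors_per_year` are my own assumptions, not something taken from the paper.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def expected_errors_per_year(rate_per_billion_hours_per_mbit, dimm_megabytes):
    """Average errors per year for one DIMM at the given fleet-wide rate.

    Hypothetical helper: scales the per-Mbit rate by the DIMM capacity
    and by the number of hours in a year.
    """
    mbits = dimm_megabytes * 8  # megabytes -> megabits
    return rate_per_billion_hours_per_mbit * mbits * HOURS_PER_YEAR / 1e9


# Assumed DIMM size: 1 GB (1024 MB). The two rates are the bounds quoted above.
for rate in (25_000, 70_000):
    print(rate, round(expected_errors_per_year(rate, 1024)))
```

Keep in mind this is a fleet-wide average. Since the abstract says only a bit more than 8% of DIMMs see errors in a given year, the errors are presumably concentrated in a minority of modules rather than spread evenly.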
768 nodes is very impressive indeed. But how many useful applications are there for such a large set of CPUs when each one is an N570, which isn't very powerful? Web servers maybe, but anything else?
Having to think twice about what applications to run on an Atom-based server almost makes me think I'd be better off with regular Xeon-based servers: at least I can throw anything at them and they are guaranteed not to slow to a crawl because single-threaded performance isn't up to par (?). I'd be curious to see any benchmark or performance data on the SM10000.
Web servers and Hadoop sure seem like the sweet spot, which would mean the Googles and Facebooks would be the target customers. Both of these, however, have standardized on Xeons/Opterons across the board. It remains to be seen whether they would take the trouble of deploying SM in a few targeted portions of their datacenters. It would make sense if the performance and cost advantage was substantial. Notice they mention 1/4 the power and 1/6 the space, both of which relate to cost, but I am curious to see what the performance reduction is. In the end it is all about TCO.
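To make the trade-off concrete, here is a rough sketch of how the 1/4 power and 1/6 space figures interact with the unknown performance ratio. The relative performance values in the loop are made-up placeholders (no benchmark data has been posted), and `efficiency_gain` is just an illustrative helper name.

```python
from fractions import Fraction

# Figures mentioned above, relative to a conventional setup.
RELATIVE_POWER = Fraction(1, 4)
RELATIVE_SPACE = Fraction(1, 6)


def efficiency_gain(relative_performance, relative_resource):
    """Performance per unit of resource, relative to the baseline.

    A result above 1 means more work per watt (or per rack unit)
    than the baseline; below 1 means less.
    """
    return relative_performance / relative_resource


# Hypothetical relative performance levels, purely for illustration.
for perf in (Fraction(1, 10), Fraction(1, 4), Fraction(1, 2), Fraction(1, 1)):
    per_watt = efficiency_gain(perf, RELATIVE_POWER)
    per_space = efficiency_gain(perf, RELATIVE_SPACE)
    print(f"perf={float(perf):.2f}  per-watt={float(per_watt):.2f}x  "
          f"per-space={float(per_space):.2f}x")
```

Under these assumptions the break-even on power sits at 1/4 of the baseline performance and on space at 1/6. A real TCO comparison would also need purchase price, cooling and admin costs, none of which are given here.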