They need a decent mascot.
I choose you, Gary Brolsma!
Numascale's non-uniform memory architecture has been used to build a 324-CPU system with 108 Supermicro servers sharing a single system image and 20.7TB of memory – posting a winning McCalpin STREAM benchmark result. The system, with its cache-coherent shared memory, ran at 10.096TB/sec for the McCalpin Scale function. It was 53 …
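For context, the McCalpin STREAM Scale test measures sustainable memory bandwidth by multiplying one large array by a constant into another. Below is a minimal sketch of that kernel; the array length, scalar and OpenMP pragma are my own illustrative assumptions, not Numascale's actual benchmark configuration.

/* Minimal sketch of the STREAM "Scale" kernel: b[j] = scalar * a[j].
 * Array length, scalar and the OpenMP pragma are illustrative
 * assumptions, not the configuration Numascale actually ran. */
#include <stdio.h>
#include <stdlib.h>

#define N 20000000L   /* illustrative: 20M doubles per array (~160 MB each) */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    if (!a || !b) return 1;

    for (long j = 0; j < N; j++)   /* initialise the source array */
        a[j] = 1.0;

#pragma omp parallel for           /* each thread scales its own chunk */
    for (long j = 0; j < N; j++)
        b[j] = 3.0 * a[j];

    /* Scale moves 16 bytes per element: one 8-byte read, one 8-byte write.
     * Reported bandwidth = bytes moved / elapsed time (timing omitted here). */
    printf("bytes moved: %.0f MB\n", 2.0 * N * sizeof(double) / 1e6);

    free(a);
    free(b);
    return 0;
}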
Probably to do with HyperTransport, which has a history of being friendly to folks plugging other stuff into CPU sockets. For example, there are some FPGA products (e.g. Altera) that will go into an AMD socket, and AMD has been friendly to this sort of application since the mid-2000s, when they brought out Socket 940.
Although the article doesn't say so explicitly, it does talk about 3 CPUs, which suggests the fourth socket is being used for something else, probably the connectivity. I'd guess Opteron was chosen because HT is more friendly to this sort of application than QuickPath.
>> Does anybody have an example where this Single Image model is a better approach than the Hadoop / Spark style distributed model?
It says in the article. Anything that can be implemented using message passing or map-reduce is almost certainly easier to implement using shared memory.
Not so much big data as applications that need a large shared memory space and don't lend themselves to being split up into the sort of isolated steps with discrete inputs and outputs that Hadoop and its ilk support. This encompasses a large set of traditional supercomputing applications that revolve around big dense matrix operations.
Examples of this include finite element models used in engineering, computational fluid dynamics, and certain types of signal processing applications (e.g. geophysics applications for oil exploration). Any number of scientific applications use matrix operations.
This type of application computes relationships between n entities by representing the relationships in an n x n matrix. If the relationships are dense enough (i.e. there is a non-zero connection between enough pairs of elements) then the most efficient way of doing this is with a two-dimensional array held in memory. As this is O(n^2) in memory, the data sets can get very large very quickly; the rough sizing sketch below gives an idea.
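As a back-of-the-envelope illustration of that O(n^2) growth, here is a small sizing sketch for a dense matrix of doubles; the entity counts are made up for illustration, not taken from the article.

/* Back-of-the-envelope sketch: memory for a dense n x n matrix of doubles.
 * The entity counts are made-up illustrations, not figures from the article. */
#include <stdio.h>

int main(void)
{
    long sizes[] = {1000, 10000, 100000, 1000000};

    for (int i = 0; i < 4; i++) {
        long n = sizes[i];
        double bytes = (double)n * (double)n * sizeof(double);  /* O(n^2) */
        printf("n = %7ld entities -> %12.3f GB\n", n, bytes / 1e9);
    }
    return 0;
}

By roughly n = 1.6 million entities a dense double-precision matrix already needs around 20TB, i.e. about the entire shared memory of the machine in the article, which is why a big single-image memory space is attractive for this class of problem.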