errr... interbox latency??
spinlocks? Cache-line ping-pong with microsecond overhead?
Well, at least it's not over TCP <snurk>
Think of this as a Sesame Street episode for Symmetric Multi-Processor servers. "ScaleMP knows big. Now it wants to show you small." For the last couple of years, ScaleMP has sold software that turns smaller x86 servers into a single, hulking SMP similar to systems more common in the Unix realm. It can lash together boxes …
The best Mellanox ConnectX MPI latency I have seen is about 1.2 microseconds with a switch in between. The switch adds about 200 nanoseconds to the mix, so back to back (the configuration of the ScaleMP system) would be about 1 microsecond, assuming similar performance to MPI.
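The back-to-back estimate is simple subtraction. A sketch of it, using only the two figures quoted above (both are the poster's observations, not vendor specifications, and the function name is illustrative):

```python
# Back-of-envelope arithmetic from the figures quoted in the comment above.
# Both inputs are observed numbers reported by the poster, not vendor specs.
SWITCHED_MPI_LATENCY_US = 1.2  # best ConnectX MPI latency seen with one switch in the path
SWITCH_HOP_US = 0.2            # approximate latency the switch adds


def back_to_back_latency_us(switched_us: float, switch_hop_us: float) -> float:
    """Estimate latency with the switch removed (two boxes cabled directly)."""
    return switched_us - switch_hop_us


estimate = back_to_back_latency_us(SWITCHED_MPI_LATENCY_US, SWITCH_HOP_US)
print(f"estimated back-to-back latency: {estimate:.1f} us")
```

This assumes the ScaleMP traffic sees roughly the same per-message latency as MPI, which is the same assumption the comment makes.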
Certainly NUMA memory management in the OS helps, and I assume the COMA (cache-only memory architecture) aspects are provided by the hypervisor. This could lead to some unhelpful double buffering, unless the OS kernel is aware of the underlying COMA caching.
This system sounds similar to what Virtual Iron was pitching in the past, but abandoned. Is there a relationship between ScaleMP and Virtual Iron??
Also, is this InfiniBand based approach what is under the covers of the larger ScaleMP systems?
Nothing to do with Virtual Iron. The key is that because it isn't a standard NUMA architecture such as SGI Altix, the latency question is not relevant. The COMA memory chunk caches gigabytes of data, and predictive algorithms prefetch data blocks so that they are already resident in local memory when the computation needs them, hence no IB latency issue. The net result is MPI codes running at the same speed as on an IB cluster, plus support for large shared-memory jobs, without the complexity of cluster management.
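To illustrate why prefetching into a large local cache can hide interconnect latency, here is a toy model. It is a minimal sketch under stated assumptions, not ScaleMP's actual algorithm: the "predictive algorithm" is swapped for a plain sequential prefetcher, and the class name, capacity and prefetch depth are all hypothetical.

```python
from collections import OrderedDict


class PrefetchingBlockCache:
    """Toy model of a node-local block cache sitting in front of remote memory.

    Hypothetical: a fixed-capacity LRU cache plus a simple sequential
    prefetcher that fetches the next `prefetch_depth` blocks after each access.
    """

    def __init__(self, capacity_blocks: int, prefetch_depth: int):
        self.capacity = capacity_blocks
        self.depth = prefetch_depth
        self.blocks = OrderedDict()  # block id -> True, least recently used first
        self.demand_misses = 0       # accesses that stall on the interconnect
        self.remote_fetches = 0      # total blocks pulled over the interconnect

    def _insert(self, block: int) -> None:
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def access(self, block: int) -> None:
        if block in self.blocks:
            self.blocks.move_to_end(block)   # hit: served locally, no IB round trip
        else:
            self.demand_misses += 1          # miss: computation waits for the fetch
            self.remote_fetches += 1
            self._insert(block)
        # Guess that access is sequential and fetch ahead before it is needed.
        for ahead in range(block + 1, block + 1 + self.depth):
            if ahead not in self.blocks:
                self.remote_fetches += 1
                self._insert(ahead)


def sequential_scan(cache: PrefetchingBlockCache, n_blocks: int) -> None:
    """Touch blocks 0..n_blocks-1 in order."""
    for b in range(n_blocks):
        cache.access(b)


with_prefetch = PrefetchingBlockCache(capacity_blocks=64, prefetch_depth=4)
sequential_scan(with_prefetch, 1000)

without_prefetch = PrefetchingBlockCache(capacity_blocks=64, prefetch_depth=0)
sequential_scan(without_prefetch, 1000)

print("demand misses with prefetch:   ", with_prefetch.demand_misses)
print("demand misses without prefetch:", without_prefetch.demand_misses)
```

On a sequential scan, only the first access stalls when prefetching is on, while every access stalls without it; the total number of blocks moved over the interconnect is about the same. The catch is that this only works when the access pattern is predictable: a random pattern defeats a sequential prefetcher, and that is where the interbox latency question from the first comment comes back.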
As far as the IB is concerned, the system uses standard Mellanox IB, including ConnectX if that is shipping yet.
The larger 8- to 32-socket systems use an internal IB switch to connect the servers together.