ScaleMP: Use RAM plus vSMP, not flash, to boost server performance

There are hypervisors that chop a single server into virtual bits, and other hypervisors that take multiple servers and make them look like one big virtual one. ScaleMP's vSMP hypervisor is the latter kind, and can be used to create a shared memory x86-based system that runs Linux that would normally require special processors …


This topic is closed for new posts.
  1. Anonymous Coward

    Still plugging something

    that nobody really wants. Nice concept and all, and two pages from El Reg, bully, but it is way too expensive and injects problems where I didn't have any with native machines. And, if I'm not mistaken, there is no GPU to plug in anywhere, it comes on a USB stick. Yes, your pseudo-NUMA world runs from a USB stick. Now THAT's enterprise grade, people.

    1. Destroy All Monsters Silver badge

      Re: Still plugging something

      > that nobody really wants

      Your marketing credentials are impeccable.

      > it is way too expensive

      Compared to what?

      > injects problems where I didn't have any with native machines

      Not sure whether you are talking about your home PC or your iSeries 795. Probably the former.

      > there is no GPU to plug in anywhere, it comes on a USB stick

      What are you saying?

    2. bazza Silver badge

      Re: Still plugging something

      "that nobody really wants."

      Hmmm, I don't think you've read the article properly, nor do I think you understand programmers either.

      If you look at what's been going on in CPU design over the last 15 years you can clearly see that the CPU manufacturers have concluded that the vast majority of programmers are not prepared to confront the un-scalability of the software they write.

      What Happened When Someone Built a Pure NUMA CPU

      For example, the Cell processor in the PS3 is perhaps the ultimate physical expression of the benefits of properly embracing NUMA in your software. The Cell doesn't give you the option - its maths cores (the SPEs) are unable to directly address each other's memory - pure NUMA. This obliges the programmer to write software that is wholly NUMA aware. If you do that and know what you're doing, you can get performance that even today Intel's biggest chips are only just challenging.

      Hiding NUMA from the programmer

      Intel, by contrast, hid NUMA from the programmer when they finally adopted it, by making QPI synthesise an SMP environment. It took a lot of silicon, and their design panders to the 'average' use case of one machine running several different programs and needing good performance for each of them.

      Which Design Strategy Sold Best?

      Now, any kind of market analysis will show that Intel got it right, and that IBM, Sony and Toshiba got it wrong. Sure, Sony put the Cell in the PS3 and have sold a bundle of those, but the number of programmers who can fully exploit the CPU is really very low. IBM realised that too, which is why they dropped it a few years back. That was much to my great annoyance, as I work in the world of high capacity mass-parallelism signal processing, where such architectures are very familiar and exciting.


      So what does that analysis tell you? It tells us that programmers, mostly, cannot / do not / aren't allowed to spend time and effort properly architecting their code for true scalability.

      So let's carefully analyse what ScaleMP have actually done with their hypervisor. In effect they've done an Intel. Intel, for multi-socket boxes, have a bunch of cores connected on a network (the QPI) that allows any of them to access any memory anywhere else as if it were a true SMP system. All that ScaleMP have done is write a hypervisor that, if you squint only a little bit, provides a bunch of virtual cores connected on a network (the Infiniband) that allows any of them to access any memory anywhere else, as if it were a true SMP system.

      Given that, and the clearly continued success of SMP (synthesised or not) in the modern NUMA world, how can you say "that nobody really wants" it? I think ScaleMP will do quite well once system developers realise what it is.

    3. ToddR

      Re: Still plugging something

      USB = flash?

      That will be the same flash that EMC, NetApp, HP, IBM, Fusion-io, Violin and Uncle Tom Cobley use in their enterprise arrays, too.

  2. Michael H.F. Wilkinson

    Latency, it's all about latency

    I have some rather memory-intensive code that I once ran (or rather walked) on a ScaleMP machine (an older incarnation: 8 boards with 8 cores each). Performance was dismal. Why? Each thread may need access to any part of memory, because the outcome for each pixel in these huge images may depend on any pixel in the image, and you do not know beforehand which ones matter. Everything works hunky-dory as long as each processor only accesses the memory on its own board, but the moment it needs large amounts of data from another board, latency kills performance. Getting a speed-up of 0.5 at 2 threads (if they weren't explicitly pinned to cores on the same board) is rather discouraging.

    What they are doing is putting a software layer over a distributed memory or NUMA machine, so as to hide the complexities and allow shared memory algorithms to run (or rather walk) without the need to rewrite the code. ScaleMP does hide the actual NUMA architecture very well. Curiously, this leads to problems when optimizing the parallel code for that particular hardware, because the details are too well hidden. You really need to understand the memory architecture and the latencies of the machine to design the appropriate algorithm. Parallel programming on shared-memory machines and on distributed-memory/NUMA machines are two very different ball-games, often requiring a careful rethink of the algorithms in order to get the processors to spend their time working, not talking (just like an old-fashioned classroom) or waiting for data.

    A ScaleMP-like approach could work if the latencies are kept very low (like the QPI approach). On run-of-the-mill network connections (or even Infiniband), you need to rethink shared memory code, not so much because of bandwidth, but because of latency. For Cell/GPU type systems similar rethinks are needed, for much the same reason.
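The latency argument above can be illustrated with a toy model. This is only a sketch, not anything from ScaleMP or from the poster's code: the function name `modeled_speedup` and the latency figures (100 ns local, 3000 ns remote) are illustrative assumptions, and the model assumes purely memory-bound work in which a single thread touches only local memory.

```python
def modeled_speedup(threads, remote_fraction, local_ns=100.0, remote_ns=3000.0):
    """Toy speed-up over one thread for memory-bound work split evenly.

    remote_fraction is the share of memory accesses that land on another
    board once the work is spread over several threads. The latencies are
    assumed values for illustration, not measured figures for any system.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    # Assumption: a single thread only ever touches its local memory.
    if threads == 1:
        return 1.0
    # Average cost of one access when some accesses cross the interconnect.
    avg_ns = (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns
    # Each thread performs 1/threads of the accesses at avg_ns apiece.
    return threads * local_ns / avg_ns


if __name__ == "__main__":
    for fraction in (0.0, 0.05, 0.1, 0.5):
        print(fraction, round(modeled_speedup(2, fraction), 3))
```

Under these assumed latencies, roughly one remote access in ten is already enough to push the modeled two-thread speed-up below 1, which is the kind of slowdown described above.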

    1. bazza Silver badge

      Re: Latency, it's all about latency

      @Michael Wilkinson,

      Yes, it certainly is all about the latency.

      The computing industry hasn't really done much to solve the problem of slooooow memory for many years; well, forever really. Everything we've got (caches, DDR, QPI, HyperTransport, this idea from ScaleMP, Flash disk caches, etc.) is all about working round the problem of memory being too small and too slow for the CPUs we have. It is a massively difficult problem to solve, and it doesn't look like it will be solved any time soon.

      I program along the lines of Communicating Sequential Processes - it forces me to get the scalability built in straight away, but takes a lot of thinking up front. Worth it in the end.

      1. Michael H.F. Wilkinson

        Re: Latency, it's all about latency

        CSP is powerful. The main problem I find is in keeping communication down, especially in terms of how often processes need to communicate. It is much cheaper to have a few large chunks shuttled from one process to another than it is to have a whole lot of little messages. It is not just the latency in that case (though that also plays a role); the synchronization costs time too (barriers are particularly costly).
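The point about a few large chunks versus many little messages can be sketched with the common latency-plus-bandwidth ("alpha-beta") cost model. This is a minimal illustration, not the poster's code: the function name `transfer_time_us` and the figures used (2 µs per message, 1000 bytes per µs) are assumptions, and the model deliberately ignores the synchronization cost mentioned above.

```python
def transfer_time_us(total_bytes, n_messages, latency_us=2.0,
                     bandwidth_bytes_per_us=1000.0):
    """Modeled time to move total_bytes split across n_messages messages.

    Every message pays a fixed latency; the payload pays bytes / bandwidth.
    Latency and bandwidth are assumed values chosen only for illustration.
    """
    if n_messages < 1:
        raise ValueError("n_messages must be >= 1")
    return n_messages * latency_us + total_bytes / bandwidth_bytes_per_us


payload = 1_000_000  # bytes to move in total (illustrative)
few_large = transfer_time_us(payload, n_messages=10)
many_small = transfer_time_us(payload, n_messages=100_000)

if __name__ == "__main__":
    print("few large chunks:", few_large, "us")
    print("many small messages:", many_small, "us")
```

The payload term is identical in both cases; only the per-message latency term grows with the message count, which is why batching helps in this model.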

  3. zooooooom


    page fault

  4. zavvyone

    I call Baloney - ScaleMP disappoints on performance, deployment time, support, ease of use

    ScaleMP does not perform as well in a multi-user, mixed-workload environment.

    The benchmarks shown are only for a single application running on the system with ScaleMP.

    If you are considering shared memory computing, get your own benchmarks and make them part of your purchase process. Engage your sales team and get them to work for your business. Most HPC companies have staff and benchmark labs to run benchmarks to win your business. Don't make a decision based on marketing hype.

    Look for modern/current/real case studies and references from credible/recent end-users.

    Double-check the pricing. E5 processors are not nearly as expensive as E7 processors, so make sure you are comparing as close to apples to apples as possible. Include everything, even freight if you need to (isn't ScaleMP headquartered on the other side of the moon?): add up the virtualization software costs, the time to get your application running on it, the hardware, and installation/training, and compare that with the same items for the comparable solution.

    Ask about support availability - make sure you'll have global support that will back up the system/solutions you choose.

    I have heard first-hand from people who tried to use ScaleMP and gave up, removing the dongles and going back to straight parallel computing on the machines or, worse, returning the hardware for a full refund. If results and ease of use are what you need, try the shared memory systems that are hardware-based.

