Wikibon takes Fusion-io founder's FaME to logical conclusion

The Wikibon consultancy has taken an idea put forward by Fusion-io founder David Flynn and formulated a flash-as-memory-extension concept (FaME). The concept involves getting rid of external IO for IO-bound applications by running them and their data entirely in logical memory. Instead of having data stored on disk, which is …

  1. This post has been deleted by its author

    1. lset

      If my memory serves me correctly (it sometimes doesn't), HP has been working on a version of Linux to address this, in line with its memristor tech.

      Like you said, the idea of having only one pool of low-latency non-volatile storage isn't a new one; this just seems like flash vendors trying to prolong their product and get past the transition we will eventually have to make to something like memristors (or whatever becomes the industry standard).

  2. Sproing

    What goes around ...

    This smacks of the old extended/expanded memory cards (shudder). And it's great, non-volatile and all, but useful only for single-user workloads, unless we're talking about stuffing servers full of these things, in which case the bottleneck shifts elsewhere, as always. Most of what we deal with is either insanely large seismic survey datasets (the cost of holding multiple copies of those is unlikely to make the bean counters happy) or stuff which needs to be worked on collaboratively, in which case you are usually better off operating in the fully connected model, with access and tracking mediated by a central resource.

    If you've got the right workload, then fine, but otherwise ...

  3. Magellan

    Not Trivial, but Possible

    This creates a new non-uniform memory access (NUMA) problem. Before, NUMA's non-uniformity was an issue of memory being close to some processors and far from others--that is, a local CPU/memory group vs. a remote cluster of CPU and memory. Not all memory was remote; every region of memory was local to at least one processor.

    The way OS designers addressed this was to note the locality of data to the CPU addressing it, and either migrate the data closer to the processor, or migrate the process (or threads) closer to the physical memory containing the data. The problem was, it took a long time for operating systems and applications to catch up to this architectural change. The early NUMA systems from each vendor were plagued with performance and scalability problems until the operating systems and compilers were made truly NUMA-aware, and memory-intensive apps such as databases were updated to take advantage of NUMA architectures. The first SGI Origins, HP Superdomes, and Sun Fire 15Ks had NUMA-related performance problems. Also, earlier versions of VMware had issues on AMD Opteron and the IBM multi-chassis NUMA x-Series, which required careful consideration of vCPU alignment.

    With the FaME concept, there are varying levels of memory access for a given processor, even after memory/processor locality is applied. It becomes purely a data migration problem, similar to prefetching into a CPU cache. The migration of data between slower and faster memory will be very much a caching problem to be solved, with scanning, evicting/demoting of cold data and promoting of hot data, but done at the OS virtual memory/page management level.
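    To make that concrete, here is a toy sketch of the promote-hot/demote-cold idea: a small fast tier managed in LRU order in front of a large slow tier. The class and tier names are invented for illustration; a real OS would do this with page tables and access bits, not a dictionary.

    ```python
    from collections import OrderedDict

    class TwoTierPager:
        """Toy model of hot/cold page migration between a small fast tier
        (think DRAM) and a large slow tier (think flash-as-memory).
        Purely illustrative -- not any real OS interface."""

        def __init__(self, fast_capacity):
            self.fast_capacity = fast_capacity
            self.fast = OrderedDict()   # page -> data, LRU order (coldest first)
            self.slow = {}              # page -> data

        def write(self, page, data):
            if page in self.fast:       # already hot: update in place
                self.fast[page] = data
                self.fast.move_to_end(page)
            else:                       # new/cold pages land in the slow tier
                self.slow[page] = data

        def read(self, page):
            if page in self.fast:       # fast-tier hit: refresh LRU position
                self.fast.move_to_end(page)
                return self.fast[page]
            data = self.slow.pop(page)  # miss: promote the hot page upward
            self.fast[page] = data
            if len(self.fast) > self.fast_capacity:
                cold, cold_data = self.fast.popitem(last=False)
                self.slow[cold] = cold_data   # demote the coldest page
            return data
    ```

    The same scan/promote/demote loop is what a NUMA-aware kernel already does for remote pages; FaME just adds another rung to the ladder.
    
    
    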

    The closest thing to this concept which has already been released is IBM's MAX5 memory-only expansion blades and chassis, which introduced two latency levels of main memory access for an individual processor.

    One can expect similar performance issues early on with such an architecture, but eventually operating systems, hypervisors, and database software will be adapted to take advantage of it.

    Given the considerable work already done in NUMA, this should happen faster than the first phase, even though it is a tougher problem to solve.

  4. Anonymous Coward

    For this to truly work, a new kind of flash has to come along that is byte-addressable. Current flash is all block-addressable, usually 64k or 128k at a time. If you want to map that into an address space and make it look like regular DRAM to the CPU, you still need to do some magic behind the curtains. Which brings in the overhead again.
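    A toy sketch of that "magic behind the curtains": emulating a byte store over a block device forces a read-modify-write, so a one-byte write still costs a full block read plus a full block write. The class names and block size are invented for illustration, not real flash-translation-layer code.

    ```python
    class BlockDevice:
        """Toy block-addressable store: reads and writes whole blocks only,
        like flash. Sizes here are arbitrary for the example."""
        def __init__(self, block_size, num_blocks):
            self.block_size = block_size
            self.blocks = [bytearray(block_size) for _ in range(num_blocks)]

        def read_block(self, n):
            return bytes(self.blocks[n])

        def write_block(self, n, data):
            assert len(data) == self.block_size
            self.blocks[n] = bytearray(data)

    class ByteShim:
        """Presents a byte-addressable interface over a block device."""
        def __init__(self, dev):
            self.dev = dev

        def write_byte(self, addr, value):
            n, off = divmod(addr, self.dev.block_size)
            block = bytearray(self.dev.read_block(n))  # read the whole block
            block[off] = value                          # modify a single byte
            self.dev.write_block(n, bytes(block))       # write it all back

        def read_byte(self, addr):
            n, off = divmod(addr, self.dev.block_size)
            return self.dev.read_block(n)[off]
    ```

    That read-modify-write cycle on every sub-block store is exactly the overhead the comment is pointing at.
    
    
    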
