US Department of Energy solicits AMD's help with nuke sims

AMD will join Intel in supporting Sandia National Lab's efforts to develop novel memory tech for use in Department of Energy (DoE) nuclear weapons simulations. The contract, awarded under the Advanced Memory Technology (AMT) program, is funded by the DoE's National Nuclear Security Administration (NNSA) as part of its post- …

  1. bazza Silver badge

    Harrumph harrumph, er, will someone please pick up memristor again and actually launch something, this time?

    I recall that at the time HP had it running quicker than DDR RAM, huge capacity, etc. Surely if anyone is serious about actually doing something about the inadequacies of today's memory technology, and memristor genuinely does the job, now is the time? Just limping on with DDR derivatives is energy-wasting lameness.

    1. Binraider Silver badge

      RAMbus had some nice ideas for boosting bandwidth, albeit attached to a power-thirsty, hot-running CPU. And royalty issues.

      SRAM might well be a way forward. CPU cache in some cases is now exceeding 256MB. Would SRAM physically located away from the CPU provide benefit over DDR?

      Having spent some time dealing with considerably smaller, though still monstrous, finite element analysis problems: a giant local machine is a whole lot better than setting up a job on a cluster when it comes to performance. Bandwidth for the volume of data involved was our bottleneck. Shedloads of RAM, of the fastest variety available, was the biggest performance boost.

      If you can't fit what you're doing into a single node (likely - I could chew 4TB with a still relatively simple model), most of your restrictions will inevitably come from networking. At that point, RAM faster than the network offers diminishing returns.

      1. bazza Silver badge

        Ah, well memory bandwidth / latency still matters, even if networking is a bottleneck.

        There are some multi-machine architectures (e.g. OpenVPX) where the links between nodes are very fast and do not involve CPU time at all. Such interconnects bypass the CPU cores entirely - they just DMA data straight into / out of RAM buffers. For CPUs like modern Xeons, this inevitably means passing through the CPU's memory subsystem (even if the cores aren't aware it's happening), so there's an art to ensuring that the memory pressure from the processing leaves room for the network bandwidth too. That requires careful decomposition of the processing problem.

        Older implementations actually involved a "bridge" chip between CPU, RAM and interconnect (a bit bespoke, not done these days), the RAM / bridge being able to sustain full-rate network and CPU memory demands.

        The "biggest" example of this kind of approach is Tofu, used on Fugaku and the K machine (Japanese supercomputers); high speed low latency interconnect wired directly into the CPU's brain, not some peripheral out on a PCIe bus that requires CPU time to manage, run a stack for, etc. Both of these machines are notable for how their peak benchmark performance is fairly representative of application performance.

    2. emfiliane

      Your first thought on seeing some of the specs for what will be one of the highest-performance computing clusters in the world is: hey, let's invest more in some vaporware from the 00s that never panned out and may not even be theoretically possible. Even ReRAM, which was tagged as a memristor but never proven to be one, had as its greatest promise the possibility of stacking, if anyone could make the process work; 3D NAND and HBM eventually filled that role.

      There's still a paper or three a year from labs looking into it just in case, but anything compelling about the idea has long since been done.

      Sometimes you have to let dead ends die.

      1. bazza Silver badge

        No one who matters cared if the devices did or did not implement the perfect, ideal memristor. The devices that were built did operate as memory. Quite good memory, too.

        The concept of refresh-less, non-volatile, high-speed, high-capacity, zero-wear, low-power memory is a compelling one; the manufacturers are just too busy enjoying making money out of old tech. It was quite apparent from HP that what they'd got was highly effective and manufacturable, but like the rest of the tech industry they knew that one plays one's cards only when there is no other market option. If you can make $billions selling the same old derivative tripe for a few years more, you do not bring out what you can really do. Everyone does this.

        We have a lunatic situation today where there are at least 5 different memory technologies involved in just booting, well, anything (on-chip SRAM and DRAM cache, DDR, BIOS EPROM (often NOR?) and SSD (often NAND)); 6 if you include a TPM too, 7 if you want to include the mountain of spinning rust that we all depend on one way or another, and 8 / 9 if you're going to throw in tape / optical media too. Having just one sort of memory that could do all of those jobs would be a whole lot better and simpler, but there's a very large number of companies very happy with the current diversity of means of storage.

        Apart from the floppy disk makers. We've very nearly weaned ourselves off those, now. And it's been a while since anyone has done anything serious with punched paper tape / cards, or magnetic drums, mercury delay lines or cathode ray stores. But those are the only sorts of storage tech we've actually managed to retire.

        Memristor and AI

        A trick that has been missed is the application of memristor-type memories to AI. AI is mostly about adding numbers up very quickly. Doing this in analogue electronics is a whole lot more power-efficient and quicker; there are even companies making devices that "add up" by adding voltages. It would be possible to sum values in a memristor cell too - the memristance tracks the integral of the current, so time-slice the current for different values and the cell accumulates the sum. Now, if one's memory could also be the computational core for AI calcs, that'd be quite convenient. Very "edge" processing too, I expect.
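        The analogue-summing idea above is usually described as a crossbar multiply-accumulate: weights live in the array as conductances, input voltages drive the rows, and the column wires sum the resulting currents for free. A minimal numerical sketch of that physics (all names here are illustrative, not any vendor's API):

        ```python
        # Sketch of analogue in-memory multiply-accumulate, as done in
        # memristor / ReRAM crossbar arrays. Each stored weight is a
        # conductance G[i][j]; applying voltage V[i] to row i makes each
        # cell pass current V * G (Ohm's law), and the currents on each
        # column wire add up automatically (Kirchhoff's current law).
        # One "read" therefore performs a whole vector-matrix multiply.

        def crossbar_vmm(voltages, conductances):
            """Column currents of a crossbar: I[j] = sum_i V[i] * G[i][j]."""
            cols = len(conductances[0])
            return [sum(v * row[j] for v, row in zip(voltages, conductances))
                    for j in range(cols)]

        # 2 input rows, 3 output columns
        G = [[1.0, 0.5, 0.0],
             [2.0, 0.0, 1.0]]   # conductances in siemens (made-up values)
        V = [0.1, 0.2]          # row voltages in volts

        print(crossbar_vmm(V, G))  # column currents, approx [0.5, 0.05, 0.2] A
        ```

        The power saving in real devices comes from the fact that the sum happens in the wires rather than in logic gates; the sketch only models the arithmetic, not the energy.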

        1. Binraider Silver badge

          Regarding RAM as simple processors... a ternary signalling system, as opposed to binary, is a (dead) idea that probably should have more legs than it does.

          Down at the logic-gate level, instead of 0 and 1 as the options you have -1, 0 and 1. Addition and the handling of negative numbers become native, rather than requiring interpretation.
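          The scheme being described is balanced ternary (as used on the Soviet Setun machine): with digits -1, 0 and +1, negating a number is just flipping the sign of every digit, so there's no sign bit or two's-complement convention to interpret. A small sketch of the conversion:

          ```python
          # Balanced ternary: digits ("trits") are -1, 0, +1.
          # Negation is digit-wise sign flip - no separate sign representation.

          def to_balanced_ternary(n):
              """Balanced-ternary trits of integer n, least significant first."""
              if n == 0:
                  return [0]
              trits = []
              while n != 0:
                  r = n % 3          # remainder 0, 1 or 2
                  if r == 2:         # write 2 as -1 and carry 1 into the next trit
                      r = -1
                      n += 1
                  trits.append(r)
                  n //= 3
              return trits

          def from_balanced_ternary(trits):
              return sum(t * 3**i for i, t in enumerate(trits))

          print(to_balanced_ternary(5))    # [-1, -1, 1]: -1 - 3 + 9 = 5
          print(to_balanced_ternary(-5))   # [1, 1, -1]: just the signs flipped
          ```

          Note the round trip relies on Python's floored `//` and non-negative `%` for the carry trick to work; languages with truncating division need an explicit adjustment.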
