Meet the ‘DPU’ – accelerated network cards designed to go where CPUs and GPUs are too valuable to waste

Crack open a firewall or storage array, and on the motherboard you may well find a chip named “Octeon” from component-maker Marvell. Octeon's job is giving appliance-builders a chip that handles networking and security chores so they can focus on building brilliant firewalls or storage arrays. The chips can scale to 16 more …

  1. John Smith 19 Gold badge
    Unhappy

    Oh great. Yet another architecture to learn.

    With a whole new batch of implementation flaws to code around.

    Yeay for that.

    Although no doubt the software will be cut and pasted from SE, developed with all the care and attention users should demand of a security application produced by a team of semi-literate code monkeys, sorry, experienced devs.

    1. Roland6 Silver badge

      Re: Oh great. Yet another architecture to learn.

      Smart NICs were normal before the PC - so with them back in fashion, it looks like the wheel has come full circle.

      >With a whole new batch of implementation flaws to code around.

      Most probably as code design and implementation lessons are going to have to be relearnt.

      1. NetBlackOps

        Re: Oh great. Yet another architecture to learn.

        Everything mainframe is new again!

        1. John Smith 19 Gold badge
          Unhappy

          "Everything mainframe is new again!"

          So true.

          Except, perhaps the reliability.

    2. Anonymous Coward
      Boffin

      @John Smith... Re: Oh great. Yet another architecture to learn.

      Ok...

      You are correct that this is an area where there's a learning curve.

      However, I've got to flame the author for not doing his research.

      DPUs have been around for several years. Probably closer to a decade.

      Solarflare had been doing this... (they got bought out). I'm sure you're going to see their next-gen cards and a framework for developers to shrink the learning curve.

  2. HCV

    “… East/West (between racks) than North/South (up and down a rack).”

    "East/West" refers to traffic between servers in a data center, "North/South" to traffic going in and out of the data center. That's a more useful distinction than whether traffic is staying within a rack.

    1. Paul Kinsler

      That's a more useful distinction

      I can easily imagine that the usage is not consistent, with different places having their own convention, depending on what they judge most "useful". And since the labels N/S/E/W bear no relation at all to what they are trying to communicate, I look forward to an ON CALL, or WHO ME, relating some disaster and/or near miss involving just such a confusion. Not that the protagonist would look forward to it, but then they don't know yet, do they?

    2. Ben Tasker

      That tends to depend on the setup you've got, and what level you're focusing at.

      Either way, traffic within the rack is East/West, but if you're looking at things at the rack level, traffic between racks may not necessarily be so:

      - If you've got a cross-connect between racks, then that would be East/West.

      - If your traffic has to transit your rack's main uplink (i.e. contends with "true" North/South) then that would be North/South.

      Having a cross connect is wise, but by no means a given.

      On the other hand, if you're the DC operator, then you're going to view it all as East/West because it's not leaving your DC.

    3. katrinab Silver badge
      Paris Hilton

      Am I the only person who thought "East/West" referred to data going between the coasts of the US, eg from New York to California?

  3. John Sturdy
    Boffin

    Nothing new really

    It sounds a bit like the "I/O channel controllers" that have been around for a long time on mainframes (although probably less structured).

    1. Jan 0 Silver badge

      Re: Nothing new really

      Didn't they evolve into minicomputers that then of course needed their own "I/O channel controllers" that evolved into today's microprocessors ...?

      1. Roland6 Silver badge

        Re: Nothing new really

        Not really, the big driver was cost reduction. With the PC, there was surplus processing time on the machine's CPU, thus the "I/O Channel Controller" could be redesigned to use the main CPU rather than its own dedicated CPU - making NICs affordable to the many. This thinking also led to dumb modem cards and cheap printers that relied on intelligent drivers; all tied to the Windows OS...

        1. Anonymous Coward
          Anonymous Coward

          Re: Nothing new really

          "Not really, the big driver was cost reduction."

          Back in the 1980s the development team were designing a new Ethernet bridge. Unfortunately the required computing power meant upgrading to the latest, expensive Intel microprocessor. That would also require a larger custom case - which would mean jumping through extensive certification hoops.

          The answer was to use the existing design and box - with the modification of an added FPGA handling the LAN filtering.

          A quick development cycle. As a bonus, the competition had all gone down the raw CPU power route. They established the market pricing - and we enjoyed an unusually large margin with our lower development and production costs.

    2. Anonymous Coward
      Anonymous Coward

      Re: Nothing new really

      Channel controllers are very simple, little more than DMA engines executing a list of I/O transfers with very limited branching and looping ability. Smart NICs are more like communications controllers like the IBM 3725, which was a separate front-end processor, a full-function computer (though managed from the mainframe), running arbitrarily complex offloaded communications tasks. The communication controller was channel-attached to the mainframe, so underneath the mainframe network stack there would be a channel program communicating with the communications controller.
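      The contrast can be sketched in code. As a toy illustration (the command format here is invented for the sketch, not real IBM channel semantics, though "TIC" is a nod to the real Transfer In Channel command), a channel program is little more than a descriptor list that a dumb DMA engine walks, with almost no control flow:

```python
# Toy model of a channel controller: a DMA engine walking a list of
# transfer descriptors. The only "branch" is an unconditional jump.
# (Invented command format - an illustration, not real channel semantics.)

def run_channel_program(program, memory, device):
    """Execute a list of (op, *args) descriptors against flat byte 'memory'."""
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "READ":          # device -> memory
            addr, count = args
            memory[addr:addr + count] = device.read(count)
        elif op == "WRITE":       # memory -> device
            addr, count = args
            device.write(bytes(memory[addr:addr + count]))
        elif op == "TIC":         # Transfer In Channel: jump to descriptor
            pc = args[0]
            continue
        pc += 1

class LoopbackDevice:
    """Toy device that echoes back whatever was last written to it."""
    def __init__(self):
        self.buf = b""
    def write(self, data):
        self.buf = data
    def read(self, count):
        return self.buf[:count]

# Demo: write 5 bytes out, read 3 of them back into a different offset.
mem = bytearray(b"hello---")
dev = LoopbackDevice()
run_channel_program([("WRITE", 0, 5), ("READ", 5, 3)], mem, dev)
# mem is now b"hellohel"
```

      A smart NIC, by contrast, would be the thing on the other end of those transfers, running its own full software stack.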

  4. Anonymous Coward
    Anonymous Coward

    There will be so many processors in the average computer that the Intel CPU will be just reserved for running its secret 'management' engine and revealing all your data.

  5. Anonymous Coward
    Anonymous Coward

    It's more than a SmartNIC

    Hmm. DPUs. Massive innovation going on in this space. The world is going to look a lot different in 5 years, thanks to the DPU.

    This is a good article, but I wish it talked more about ALL of the ways the DPU might matter. DPUs are relevant to networking, but the DPU is much bigger than the network. The quote from Kevin Deierling hits the nail on the head.

    “The DPU is really good at looking inside data and running storage and compression and security,”

    Since the DPU concept is newish, there’s no industry consensus about everything a DPU should do, but everyone does agree that the DPU exists to accelerate I/O by running data-intensive processes both faster and more efficiently than a CPU.

    Some of these data-intensive processes are relevant to networking, and so DPUs on a SmartNIC make sense. Others, like erasure coding, fit more comfortably within a storage acceleration mindset, so DPUs could end up on storage controllers. Still others, like data-in-motion encryption, relate to security. It's even possible that DPUs could be advantageous for analytics or HPC.

    The real point of a DPU is being able to do a wide variety of data-intensive services efficiently, with high performance, at scale, without I/O bottlenecks, and without as much customization. Instead of having a storage controller, a network controller, a security accelerator, etc etc, you have storage nodes powered by the DPU, networking devices powered by the DPU, analytics servers powered by the DPU, all with a single API and programming interface.

    Then you end up with cross-functional advantages. Your load balancers get storage optimizations. Your storage servers have improved networking. All your devices get data-in-motion DPU powered encryption. And so on.

    What’s really interesting is just how disruptive this could be. Today, AWS Nitro needs a family of cards to perform some of these functions. The promise of the DPU is to do everything with a single card.

    Imagine that NVIDIA has the best DPU. Consequently, it could end up with the best NIC, the best storage controller, the best analytics accelerator, the best…

    Or a quiet startup might have an architecture that’s twice as fast as the NVIDIA offering. Who knows?

    And if you think this is all vaporware, believe it or not, DPUs for functions outside networking, including storage acceleration, are already quietly chugging away in enterprise production environments. I know of quite a few enterprises who have purchased platforms with DPU (or DPUesque) capabilities because they needed I/O acceleration conventional approaches couldn’t offer.

    1. The Original Steve

      Re: It's more than a SmartNIC

      It does sound truly ground-breaking.

      I mean, a processing unit that does compression for storage, encryption for networking, running virtualisation for compute.

      Who knows, in 10 years they may even combine them into a single, super-chip. Maybe call it something more generic like a central processing unit?

      /stop sarcasm && stop snark

      1. Charles 9

        Re: It's more than a SmartNIC

        Perhaps it's safer to say that what we're seeing are more and more chokepoints developing as data loads increase. As graphics demands grew and grew more diverse, specialized graphical chipsets gave way to slightly-more-generalized GPUs. When GPUs became more useful, they put a strain on bus demand, necessitating the still-evolving PCI Express bus to keep it fed.

        Now in the network stack, we're kind of seeing the reverse. As throughputs continue to increase, latency becomes an issue because electrons can only move so fast, thus cutting down on trip times becomes a factor. Hearing about DPUs sounds natural to me: a way to take more and more of the I/O local in an effort to cut latency.

        I'll be interested in seeing where the next chokepoint emerges. RAM and storage tech are still evolving at a decent clip, so it's tough to predict which one chokes first.

    2. Anonymous Coward
      Anonymous Coward

      @AC Re: It's more than a SmartNIC

      DPUs have been around for almost a decade.

      You are correct that there is no consensus on what they can be used for... which is why you start to see frameworks that make them open to new ideas.

      Posted Anon because I've been talking to some friends over the years on how useful they can be ..

  6. Anonymous Coward
    Anonymous Coward

    I guess the good-news in this, for those of us who are still running human-scale networks, is that these can offload some processing that the CPU is currently doing. So (slightly) more work is available from the CPU. Meaning we can (slightly) postpone having to add CPUs and their (not-at-all-slightly) associated VMWare/Windows/etc license costs.

    1. NetBlackOps

      Especially as I've been seeing quite noticeable declines in system performance here on my Intel machines with each new security patch for those processors. Really murder on a dual-core i7, which is just sad. Used to be a real brute despite just those two cores.

  7. dinsdale54

    Having worked for a number of companies that have implemented DPUs - offload cards, as we used to call them - they nearly always ended up being more trouble than they were worth. Aside from the very real "new batch of implementation flaws", the problem usually boiled down to having some connection state information on the card. This is a nightmare when you want to load balance/fail over etc. as you have to find a way to move state information between cards - which you end up doing via the main CPU anyway.

    If you are Amazon/Microsoft/Google you can work round these problems with tightly controlled configurations and if you are Joe Schmo running a single server in an office you probably aren't pushing any limits. For everybody else, you are opening the door to a world of difficult to diagnose network issues for the sake of a few % more free CPU cycles.
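    The failover headache is easy to make concrete. A hypothetical sketch (invented structures, no real SmartNIC API): if per-connection state lives on the card, the host CPU ends up reading the whole flow table off one card and replaying it into the standby - exactly the round trip the offload was meant to avoid.

```python
# Hypothetical sketch of the failover problem: connection state held on
# an offload card has to be exported through the host CPU and replayed
# into the standby card. (Invented classes, not a real SmartNIC API.)

class OffloadCard:
    def __init__(self, name):
        self.name = name
        self.flows = {}          # (src, dst, port) -> per-connection state

    def track(self, flow, state):
        self.flows[flow] = state

    def export_state(self):
        # Host CPU reads the whole flow table off the card...
        return dict(self.flows)

    def import_state(self, flows):
        # ...and writes it into the replacement card on failover.
        self.flows.update(flows)

def fail_over(active, standby):
    """Move all tracked connection state from one card to the other."""
    standby.import_state(active.export_state())
    active.flows.clear()
```

    Every byte of that state transits the main CPU, so the more the card tracks, the worse the failover story gets.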

    Unusually I find myself agreeing with Gartner. "Applicable to less than 1%"

  8. elip

    interesting

    All of the real life DPUs I've seen used in the marginal-workload/start-up storage world, require a proprietary interconnect or other out of band 'network' (usually 100% blackbox with no admin/user visibility). With this, the value proposition quickly diminishes. Interesting that there's no mention of this in many of the cheerleading articles on the topic.

  9. Binraider Silver badge

    Remember when WinModems became a thing - handing the comms duties back to the CPU apart from the cable plug? I find it somewhat amusing that devices moving away from that software/CPU approach are being laughed at, because everyone in their right mind hated WinModems!

    The problems with drivers never go away regardless of architecture. In some of the more recent incarnations of desktop motherboard network adapters, Intel haven't bothered making drivers for Windows Server for "desktop" chipsets. As a result; I have a separate PCIE card in the case doing the job consuming more power and taking up a slot.

    I wouldn't care but I only ever boot to Windows as a last resort for a handful of titles; Server being the only palatable release these days where the user still has a modicum of control and choice over the installation. Not being bombarded by "please create a Microsoft account" every other mouse click for instance.

    1. Robert Grant

      They're being laughed at because people don't know that different solutions are economic at different times.

  10. Nano nano

    PPUs ?

    Slightly reminiscent of Seymour Cray's "peripheral processing units" in his Control Data systems.

  11. John Smith 19 Gold badge
    Gimp

    Deep Packet Inspection made easy.

    What every data fetishist wants.

  12. Fred Goldstein

    Calling this a DPU is ridiculous. A main CPU is a data processing unit. This is an I/O processor, not a data processor. It takes one or more communications channels and handles their I/O processing. They handle blocks of data, multiplexed, typically, via the IP header (which is just a mux header anyway), so let's call this a block multiplexor channel. And since it's a processor, its API consists of calling programs that run on it, so let's call the API Execute Channel Program, EXCP. Heck, you could offload disks onto these too; it's not as if they were lame little byte multiplexors, though one might make the argument that a packet is just serial bytes and thus that's a better name.

    Maybe this is such a great idea that IBM should acquire Marvell too. A great fit for those Power10 processors that support persistent memory, meaning mass storage devices that are directly addressed by byte, not needing a file system.

    //JOB
