Could RISC-V become a force in high performance computing?

The RISC-V architecture looks set to become more prevalent in the high performance computing (HPC) sector, and could even become the dominant architecture, at least according to some technical experts in the field. Meanwhile, the European High Performance Computing Joint Undertaking (EuroHPC JU) has just announced a project …

  1. Will Godfrey Silver badge

    A mixed blessing?

    This very expansion capability, if not managed carefully, could be an Achilles heel: massively incompatible fragmentation.

    1. werdsmith Silver badge

      Re: A mixed blessing?

      Indeed ARM performance is achieved through a lash up of extensions, but at least these have a single guiding oversight.

    2. thames

      Re: A mixed blessing?

      Massive incompatible fragmentation outside of the core instruction set pretty much describes x86 vector support, and that doesn't seem to have hurt it any.

      The big question is whether CPUs will be available on low cost hardware equivalent to a Raspberry Pi so that people can test and benchmark their code, or if you have to book time on the HPC system just to compile and test your code. That is what will make the difference.

      If you are doing serious vector work you need to use the compiler built-ins / extensions, which are more or less half way between assembly language and C. Good vector algorithms are not necessarily the same as non-vector algorithms, which means you need actual hardware on your desktop for development. This is the real advantage that x86 and (more recently) ARM have, and which RISC-V will need to duplicate.
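
The point above that a good vector algorithm is not simply the scalar one can be sketched in a few lines. This is plain Python standing in for compiler built-ins, not real vector code: `scalar_sum`, `lanewise_sum` and the default of four lanes are assumptions made up for illustration.

```python
def scalar_sum(values):
    """Classic scalar reduction: one running total, one add per element."""
    total = 0
    for v in values:
        total += v
    return total


def lanewise_sum(values, lanes=4):
    """Reduction shaped the way a vector unit would want it.

    `lanes` is a hypothetical vector width. Each pass of the outer loop
    models one vector add into `lanes` independent accumulators.
    """
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    acc = [0] * lanes
    full = len(values) - len(values) % lanes  # elements covered by whole vectors
    for i in range(0, full, lanes):
        for lane in range(lanes):  # stands in for a single vector instruction
            acc[lane] += values[i + lane]
    total = 0
    for partial in acc:  # horizontal reduction across the lanes
        total += partial
    for v in values[full:]:  # scalar tail for the leftover elements
        total += v
    return total
```

With integers both functions agree. With floating-point data they may round differently, because the lane-wise version adds the elements in a different order, which is one reason vector code tends to need testing on the real target.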

      1. Richard 12 Silver badge

        Re: A mixed blessing?

        Technically, you don't need the actual hardware.

        You do need a simulator (not emulator) that you can trust to give genuinely comparable results.

        At present such simulators are aimed at hardware engineers designing the actual chips, and possibly compiler writers.

        amd64 vector acceleration has pretty much stabilised now though, to the point where some compilers can automatically vectorise certain simple loops. If the moon is in the seventh house and Jupiter aligns with Mars, anyway.

        1. milliemoo83

          Re: A mixed blessing?

          "If the moon is in the seventh house "

          And you don't accidentally use tomato ketchup as part of the recipe.

      2. that one in the corner Silver badge

        Re: A mixed blessing?

        > The big question is whether CPUs will be available on low cost hardware equivalent to a Raspberry Pi so that people can test and benchmark their code

        For CPUs that people have managed to get their hands on, small dev boards have been available, although not always with all the resources of the R'Pi (e.g. the early SiFive board in Arduino format, 'cos that CPU only had the memory etc to compare against Arduino-like MCUs, so it was packaged to use the same add-on boards). Designs pop up on IndieGoGo, Kickstarter, CrowdSupply (in increasing order of...).

        One list of such boards, including the Doctor Who one:

        If you meant to ask about boards carrying future CPUs with varied and interesting HPC extensions already in place, from what has gone before, I'd guess that, so long as the devices can be purchased then dev boards will be made by the usual suspects.

        However, one has to ask: how easy is it to get hold of Arm devices that have weird HPC-specific extensions on a dev board? Off to infamous web search engine, I guess...

      3. Bruce Hoult

        Re: A mixed blessing?

        RISC-V SBCs with draft 0.7.1 of the Vector ISA are currently available starting from $17 (Lichee RV, 1 GHz, 512 MB RAM).

        That's the in-development spec as at the middle of 2019.

        The final 1.0 Vector spec is incompatible in detail but about 90% or 95% the same binary opcodes and the same semantics. Close enough that 0.7.1 hardware gives a good head start. Many useful algorithms such as memcpy(), memcmp(), strlen(), strcpy(), strcmp() and so forth are binary compatible between 0.7.1 and 1.0.

        There is no reason that in time RVV 1.0 implementations won't be available at similar price-points. It's not any harder to implement, just a few minor details were changed due to feedback and experience using 0.7 and 0.8 and 0.9 and 0.10. (No one has commercially produced anything except 0.7.1, but intermediate versions were implemented in the Spike emulator and in GNU binutils etc and programming experience gained with them.)

        If RVV eventually makes its way into supercomputers (and it seems that it will), then you're going to be able to test your supercomputer code on a $10 or $100 SBC or $1000 laptop.
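
The kind of loop that routines such as memcpy() are built on in a vector-length-agnostic ISA can be sketched as follows. This is a plain Python model, not RVV code: it assumes a simplified rule in which each pass is granted min(remaining, vlmax) elements, and `vector_copy` and `vlmax` are names made up for illustration.

```python
def vector_copy(src, vlmax):
    """Strip-mined copy: ask for as many elements as remain, take what is granted.

    `vlmax` is a hypothetical stand-in for the number of elements the
    hardware can handle per pass. Returns the copy and the chunk sizes used.
    """
    if vlmax < 1:
        raise ValueError("vlmax must be at least 1")
    dst = [None] * len(src)
    chunk_sizes = []
    i = 0
    remaining = len(src)
    while remaining > 0:
        vl = min(remaining, vlmax)       # simplified model of the granted length
        dst[i:i + vl] = src[i:i + vl]    # stands in for one vector load + store
        chunk_sizes.append(vl)
        i += vl
        remaining -= vl
    return dst, chunk_sizes
```

The same loop copies ten elements as 4 + 4 + 2 on a narrow machine or as a single pass of 10 on a wide one, without the source changing. That is the sense in which code tested on a cheap board can be expected to behave the same way on a much larger vector unit.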

      4. rcxb Silver badge

        Re: A mixed blessing?

        > Massive incompatible fragmentation outside of the core instruction set pretty much describes x86 vector support, and that doesn't seem to have hurt it any.

        There are only two manufacturers of x86-64 processors. There is hardly any fragmentation. You can easily follow those feature changes from the two manufacturers.

        You can't even imagine how things would look if there were hundreds of manufacturers, all doing things their own way, over the course of years. You would have no idea what you were getting. Just tracking those differences would be a massive effort.

        1. 3arn0wl Bronze badge

          Re: A mixed blessing?

          I wonder if it's the other way around?

          In our world the consumer is king.

          The customer will have certain expectations with regard to running software.

          The OEM will be keen to meet those needs in order to be / remain competitive.

          Therefore the OEM will make a considered determination regarding the processor used.

          Ergo chip designers will be mindful of the Standards set out by RISC-V International.

  2. Andy The Hat Silver badge

    It seems that it's an Intel v AMD processor argument rebranded as ARM v RISC-V IP with a potential bean-counter saving by going RISC-V (if you can get the processing power out of it).

    Perhaps it's simply a geo-political argument ... pesky Chinese getting control of ARM IP ...

    1. potato_chips

      Those apples are oranges

      Intel and AMD both share the same instruction set (although support for extensions may differ between processors) - RISC-V and Arm are two different instruction sets.

      RISC-V defines an ISA + allows you to add your own instructions. Some open-source HW designs are available.

      Arm defines an ISA + allows you to add your own instructions on M-class processors. You can design your own core if you have an architecture license or license one of Arm's own designs (which is what most companies do).

  3. A Non e-mouse Silver badge

    How much work is done by the host CPU and how much is done by the massive GPGPUs?

    1. bazza Silver badge

      Well, on Fugaku the answer is 100% in the CPU, and it hasn't got GPUs at all.

      Though that is a stretch, because the actual work is done in the honking great vector unit Fujitsu bolted on to the side of the ARM core. So, saying that it's ARM / Linux in HPC is actually a bit misleading: the ARM core itself isn't much used, and the Linux part is also kinda irrelevant to the performance too; the same vector object code executed within any other OS would run the same way.

      The things that make a really good supercomputer these days are compute nodes where vector processing, node-interlink and RAM interface are all on one piece of silicon. This is what the Japanese have done; they bolt in the Tofu interconnect on the chips too. That gives an immense fixed interconnect architecture that has minimal latency at a lot of throughput.

      The Tofu interconnect is the secret sauce that makes Fugaku's performance accessible in real applications. Nodes are not stymied waiting for data to arrive.

      Compute by the Interconnect

      I'm not sure about Tofu, but I've seen similar interconnects / fabrics that actually perform some algorithmic steps, saving having to perform them on the CPU. For example, say the algorithm to be executed involves a lot of matrix maths, and some of the operations are matrix transpositions. For a program to do that on a CPU, it's just a non-linear memory copy that's not particularly fast because data is written not in sequence. However, some interconnects can perform the matrix transpose whilst moving the data between nodes (after all, it's a data rearrangement). What this means is that, with the algorithm divided between the CPU and the interconnect, you can get quite a speed increase.
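
The "non-linear memory copy" described above can be made concrete with a small Python sketch. It assumes a row-major matrix stored as a flat list; `transpose_flat` is a hypothetical helper, and the returned `write_order` exists only to show where each write lands.

```python
def transpose_flat(src, rows, cols):
    """Transpose a row-major rows x cols matrix held in a flat list.

    Reads walk `src` in sequence; writes land `rows` elements apart in
    `dst`, which is the out-of-sequence access pattern described above.
    """
    if len(src) != rows * cols:
        raise ValueError("src length does not match rows * cols")
    dst = [None] * (rows * cols)
    write_order = []
    for r in range(rows):
        for c in range(cols):
            dst_index = c * rows + r          # position in the cols x rows result
            dst[dst_index] = src[r * cols + c]
            write_order.append(dst_index)
    return dst, write_order
```

For a 2 x 3 matrix the writes go to indices 0, 2, 4, 1, 3, 5 rather than 0 to 5 in order. On a large matrix those strided writes touch a different cache line almost every time, which is why doing the rearrangement while the data is already in flight between nodes can pay off.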

  4. Fazal Majid

    Only if China pushes hard

    The main driving force behind RISC-V is China's need to wean itself off dependence on Intel and ARM architectures subject to US sanctions, which is why all the major Chinese tech companies like Huawei, Baidu and Alibaba have RISC-V chip design teams, although how far they can get with the US also sanctioning cutting edge fab technology is anyone's guess. The Chinese government also obviously has HPC needs and will support this.

    That said, RISC-V CPU performance is still far behind x64 and arm64.

    1. Scene it all

      Re: Only if China pushes hard

      Isn't a lot of the high tech fab stuff from Taiwan and the Netherlands?

    2. Bruce Hoult

      Re: Only if China pushes hard

      “That said, RISC-V CPU performance is still far behind x64 and arm64.”

      Obviously, because RISC-V started much later.

      Currently off-the-shelf RISC-V SBCs are 3-4 years behind ARM. When the Intel/SiFive Horse Creek product ships in around six months, it will be a little over 1 year behind the RK3588 ARM boards. Ventana and MIPS have announced chips near current x86 and Apple chips, so I guess they’ll be shipping in around 2 years.

      RISC-V is several years (not decades) behind, but catching up quickly.

    3. Jason Bloomberg Silver badge

      Re: Only if China pushes hard

      > The main driving force behind RISC-V is China's need to wean itself off dependence ...

      It's not just China doing that but India too, and others who wish to have 'sovereignty', not be beholden to Uncle Sam's diktats and whims as self-proclaimed global policeman.

      It was utter stupidity of Trump to try and isolate and disadvantage China. It is utter stupidity for Biden to continue on the same path. All it's done is give China and others their wake-up call and provoked a response which will ultimately disadvantage the US long-term.

      1. TheInstigator Bronze badge

        Re: Only if China pushes hard

        There are other - more permanent solutions to the China problem

  5. 3arn0wl Bronze badge

    It's inevitable

    With the essential extensions ratified at the end of 2021, there's now nothing stopping RISC-V... It's just a matter of time.

    We're going to see it in datacentres, and we're going to see it as the primary processor in consumer electronics in the not too distant future too.

    I was looking at Alpine Linux packages recently, and I was surprised at just how many apps were already available for RISC-V.

    1. Anonymous Coward

      Re: It's inevitable

      "...there's now nothing stopping RISC-V"

      Nothing besides Intel, AMD, ARM and even TI... you don't think they'll quietly let their castles fall do you? Of course, it's only a matter of time until that Microsoft PR... "RISC-V is the future"... so who knows since the major software players aren't exactly married to any particular HW company (I'm pretty sure Wintel is dead).

      1. 3arn0wl Bronze badge

        Re: It's inevitable

        Let their castles fall? What do you think is happening at casa Intel at the moment? King Pat standing defiant as the profit towers come crashing down, and predicting more of the same for the next year...

        It seems he's trying to shore things up by looking to fabbing, but at the end of the day, he's already figured out that the reality is going to be "If you can't beat 'em : join 'em"

  6. martinusher Silver badge

    RISC-V is inherently high performance

    RISC-V comes from the same family as MIPS processors. Anyone who's worked with MIPS will know that they're rather fast. The fundamental architecture isn't that much different from ARM's so expect RISC-V to be able to do anything that an ARM can but just a bit better. (The original ARM was conceived as a low power / low performance / minimal resources service processor; it's been extended since then, obviously, but like the x86 it's a lower performance architecture that's been extended and optimized.)

    1. DS999 Silver badge

      Re: RISC-V is inherently high performance

      Modern ARM is clearly superior to MIPS, it doesn't suffer from its weaknesses like branch delay slots - but neither does RISC-V as it learned from MIPS' mistakes. It is a very silly argument that RISC-V is somehow better than ARM because it came from MIPS (which makes it worse) or because ARM was originally conceived for low power applications (which is irrelevant because AArch64 is its own thing that's not backwards compatible with any previous ARM ISA)

      AArch64 is probably the best ISA going since it was conceived so recently - more recently than RISC-V. It also includes a lot more stuff as standard, so there is much less room for fragmentation than with RISC-V, which was originally designed for research, not for commercial applications. RISC-V's base ISA doesn't even include multiplication, though that is part of the "standard extensions" so no one is going to be using such a stripped down implementation for anything real.
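
For a sense of what a base ISA without a multiply instruction implies, here is a minimal Python sketch of multiplication built from shifts and adds only, which is roughly the job a software helper routine has to do on such a core. `mul32_shift_add` and `MASK32` are hypothetical names, and the 32-bit wraparound is an assumption chosen to mimic 32-bit registers.

```python
MASK32 = 0xFFFFFFFF  # assumed register width: 32 bits


def mul32_shift_add(a, b):
    """Multiply two 32-bit values using only shift, add and bitwise AND.

    Returns the low 32 bits of the product, as a 32-bit register would hold.
    """
    a &= MASK32
    b &= MASK32
    result = 0
    while b:
        if b & 1:                       # lowest bit of b set: add the shifted a
            result = (result + a) & MASK32
        a = (a << 1) & MASK32           # a doubles each round
        b >>= 1                         # move on to the next bit of b
    return result
```

The loop runs once per significant bit of `b`, so a single multiply can cost dozens of instructions instead of one, which is why a core intended for real workloads would be expected to include the multiply extension.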

      1. Bruce Hoult

        Re: RISC-V is inherently high performance

        “AArch64 is probably the best ISA going since it was conceived so recently - more recently than RISC-V.”

        The ARMv8.0-A ISA was published in finished form in October 2012. The RISC-V ISA was ratified and published in frozen form in July 2019, with important additions to bring it to parity with ARMv8 in November 2021.

        People usually give RISC-V’s newness as a disadvantage, so this is a weird argument from you, especially given the facts are the opposite.

        Design of RISC-V started in 2010, a couple of years before ARMv8 was published. The *design* of ARMv8 clearly started much earlier. I don’t think anyone has really said how much earlier, but I suspect around 2003-2004 when amd64 made it clear it wasn’t going to be an Itanic-only future.

      2. SCP

        Re: RISC-V is inherently high performance

        "... superior to MIPS, it doesn't suffer from its weaknesses like branch delay slots ..."

        Using the branch delay slot (or load delay slot) was a deliberate architectural decision (and not just for MIPS) to allow compilers to optimally sequence instructions to achieve a goal of RISC architectures (an instruction completing each clock cycle). It also meant that by putting responsibility on the compiler/author a good deal of simplification of the pipeline silicon was possible; win-win.

        Whilst this does mean that you could write "nonsense" code (e.g. putting another branch instruction in the branch delay slot) this does not seem greatly worse than many other forms of nonsense you could write (along the lines of i++ = i++ in C). Being nonsense the architecture specification declared that the results are unpredictable - but not unbounded; it could do either of the branches but not some random thing.

        In some ways this architecture seems preferable to the speculative execution optimization techniques of recent times in which a great deal of silicon and design complexity is expended on trying to execute other instructions out of order but holding off on any potentially adverse effects until it is certain that the instruction is due to be executed. This led to a whole raft of security vulnerabilities being discovered with such architectures and people hurriedly rolling things back.

        By making the execution of the instructions in the branch delay slot an active part of the execution thread, that thread takes the consequences of that instruction (e.g. any exceptions that instruction causes) - simplifying the processor execution model. Where I would have a concern is if the architecture made unbounded behaviour practical.
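
The delay-slot behaviour discussed above can be shown with a toy interpreter. This is a Python sketch of a made-up two-instruction machine, not MIPS: `run`, the `addi` and `jump` operations and the `delay_slot` switch are all assumptions for illustration, and a branch placed inside a delay slot is not modelled faithfully.

```python
def run(program, delay_slot=True, max_steps=100):
    """Execute a list of ("addi", reg, imm) and ("jump", target) tuples.

    With delay_slot=True the instruction after a jump still executes
    before control transfers, as on a classic delay-slot pipeline.
    """
    regs = {}
    pc = 0
    pending = None  # jump target waiting for the delay slot to finish
    steps = 0
    while 0 <= pc < len(program) and steps < max_steps:
        op, *args = program[pc]
        steps += 1
        next_pc = pc + 1
        if pending is not None:         # this instruction is the delay slot
            next_pc = pending
            pending = None
        if op == "addi":
            reg, imm = args
            regs[reg] = regs.get(reg, 0) + imm
        elif op == "jump":
            (target,) = args
            if delay_slot:
                pending = target        # take effect after the next instruction
            else:
                next_pc = target        # take effect immediately
        else:
            raise ValueError(f"unknown op: {op}")
        pc = next_pc
    return regs


EXAMPLE = [
    ("addi", "a", 1),      # 0
    ("jump", 4),           # 1
    ("addi", "b", 10),     # 2: sits in the delay slot
    ("addi", "c", 100),    # 3: skipped either way
    ("addi", "d", 1000),   # 4: jump target
]
```

Running `EXAMPLE` with the delay slot enabled sets `a`, `b` and `d`; with it disabled only `a` and `d` are set. The instruction in the slot is an ordinary part of the thread, so a compiler can put useful work there, but the program's meaning then depends on that pipeline detail.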

        1. Bruce Hoult

          Re: RISC-V is inherently high performance

          The problem is that branch delay slots and load delay slots only made sense for a single micro-architecture -- the first single-dispatch in-order one, generally with 5 pipeline stages.

          As soon as you did anything else ... dual or triple issue in-order, or OoO, the delay slots not only didn't help any more but also actively made implementations harder because of the screwed-up semantics.

          1. Richard 12 Silver badge

            Re: RISC-V is inherently high performance

            Exactly. One instruction per clock is slow in today's world of pipelined microarchitectures.

  7. david 12 Silver badge


    RISC-V with "vector and floating point operations".
