Faulty instructions in Alibaba's T-Head C910 RISC-V CPUs blow away all security

Computer security researchers at the CISPA Helmholtz Center for Information Security in Germany have found serious security flaws in some of Alibaba subsidiary T-Head Semiconductor's RISC-V processors. The most serious of these, which affects the four T-Head C910 CPU cores in the TH1520 SoC, has been dubbed GhostWrite because …

  1. This post has been deleted by its author

  2. gnasher729 Silver badge

    First thought: WTF? Is there any other processor that allows physical addresses except at the very very very lowest OS level?

    Second thought: That’s a really serious bug, isn’t it? If I use this instruction correctly and without any malicious intent on any other RISC-V processor, then in these machines my code is completely broken?

    1. diodesign (Written by Reg staff) Silver badge

      Only on T-Head's CPUs

      The instruction is only broken on Alibaba's CPU cores, not the RISC-V ISA. The instruction isn't even defined by the ISA, it's a non-standard variant of a standard one, as I understand it.

      The RISC-V oversight body carefully and clearly defines how security should work. T-Head didn't follow the specs and incorrectly designed their implementation of the RISC-V vector extension in their CPU core so that memory addresses were treated as physical ones, not virtual, bypassing security checks regardless of privilege level.

      It's a T-Head problem, not an RV one.

      C.
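To make the virtual-versus-physical distinction above concrete, here is a minimal sketch in Python of a toy MMU. Everything in it (`ToyMMU`, `store_checked`, `store_buggy`, the single-level page table, the 4 KiB page size) is a hypothetical illustration, not T-Head's hardware or any real API. It only shows why a store that skips translation also skips the permission check.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class PageFault(Exception):
    """Raised when a virtual address is unmapped or not writable."""


class ToyMMU:
    """Hypothetical single-level page table: virtual page -> (physical page, writable)."""

    def __init__(self, table):
        self.table = table

    def translate(self, vaddr, write):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        entry = self.table.get(vpn)
        if entry is None:
            raise PageFault(f"unmapped virtual address {vaddr:#x}")
        ppn, writable = entry
        if write and not writable:
            raise PageFault(f"write to read-only address {vaddr:#x}")
        return ppn * PAGE_SIZE + offset


memory = bytearray(16 * PAGE_SIZE)   # toy physical memory
mmu = ToyMMU({0: (5, True)})         # this process owns only physical page 5


def store_checked(addr, value):
    # Normal path: the address is virtual, so it is translated and checked.
    memory[mmu.translate(addr, write=True)] = value


def store_buggy(addr, value):
    # Faulty path: the address is treated as physical, with no lookup or check.
    memory[addr] = value


store_checked(0x10, 0xAA)   # lands inside the process's own physical page 5
store_buggy(0x10, 0xBB)     # lands in physical page 0, which the process does not own
```

In this model `store_checked` on an unmapped address raises `PageFault`, while `store_buggy` silently writes wherever it is pointed.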

      1. Anonymous Coward

        Re: Only on T-Head's CPUs

        It begs the question: Is T-Head stupid or evil? Or both?

        1. Dostoevsky Bronze badge

          Re: Only on T-Head's CPUs

          "Sufficiently advanced stupidity is indistinguishable from malice."

          HT to A.C. Clarke

        2. Blazde Silver badge

          Re: Only on T-Head's CPUs

          If it's any evil at all then it's stupid evil. They could have hidden an intentional backdoor way better than this.

        3. Jason Bloomberg Silver badge
          Pint

          Re: Only on T-Head's CPUs

          Stupid or evil? Or both?

          Or neither. Just like the rest of us - though I obviously exclude everyone who has never made a single mistake nor had any misunderstanding in their careers.

          Unlucky for them it got out, wasn't caught before it did, wasn't recognised as the problem it is.

          I'll raise a glass in sympathy for the unlucky sods who are having to face being demonised on top of fucking up royally. Me; I've never made a single mistake, never. Well, there was that one time... maybe a couple...

      2. Justthefacts Silver badge

        Re: Only on T-Head's CPUs

        But that’s *exactly* the point, why RISC-V principle is broken by design, not just in this.

        Any of the other CPUs, you know who you’re getting it from. If it’s Intel x86 CPU, and it’s broken, Intel are both your supplier, implementer and responsible. If it’s ARM CPU, and it’s broken, it’s ARM responsibility to fix the core.

        But these are *RISC-V* chips. They are generic. It’s just not possible to say “RISC-V chips are safe”. It’s a meaningless statement. There are a thousand implementations, some of which are broken, some not. You almost certainly do not know what the device you have on your desk is, let alone whether it is vulnerable, and no way to check it. If, for example, you are running with these Scaleway cloud people, how would you know? How could you know if it were newly fixed silicon or not - because even *they* don’t know.

        I’m glad the penny is finally starting to drop. But I’ll just add another point: “The RISC-V oversight body”. There is none. Doesn’t exist. RISCV org writes the paper ISA description, sure. But there’s no conformance spec, no conformance testing, no certification. It’s totally unlike professional engineering specs like Ethernet or Bluetooth.

        1. rcxb Silver badge

          Re: Only on T-Head's CPUs

          It’s just not possible to say “RISC-V chips are safe”.

          It's not possible to say x86-64 chips are safe, either...

          If, for example, you are running with these Scaleway cloud people, how would you know? How could you know if it were newly fixed silicon or not - because even *they* don’t know.

          A problem with cloud or other hosted multi-tenant services in general. The hypervisor makes it possible to lie to the VMs (or containers) about what CPU they are running on, so you can't be sure. If you don't trust your cloud/hosting provider to keep things secure for you, what the bloody hell good are they? That's pretty much their job description #1. You might as well let John Doe host your servers out of a rack in his cellar...

        2. Richard 12 Silver badge

          Re: Only on T-Head's CPUs

          Strike that, reverse it.

          The issue with RISC-V is that they're all different.

          Only the core instruction set is standard. You get an absolute minimal set of instructions. Everything else is a vendor-specific extension.

          So you either compile for that minimal standard set, or you can only run your code on RISC-V processors from one particular vendor - possibly only one chip.

          In the microcontroller space that's ok, but for application processors it's absolutely horrific.

          And not just because you're stuck with one vendor. It also means you're stuck with one compiler, because none of the others understand the vagaries of your particular chip.

          1. CGBS

            Re: Only on T-Head's CPUs

            They don't like it when you point out the Emperor's clothing selection, or why they have such a rabid, devoted group of fans desperate to push and do the hard work for the thing, despite the fact that all of the work seems to be accruing to the benefit of a certain singular locale in particular.

    2. Richard Tobin

      "If I use this instruction correctly and without any malicious intent on any other RISC-V processor, then in these machines my code is completely broken?" This makes it rather surprising that the bug was not noticed immediately - or indeed in testing before release. Are there just very few programs that use the affected instructions?

    3. really_adf

      Is there any other processor that allows physical addresses except at the very very very lowest OS level?

      Clearly, an OS can't provide memory protection if any user process is able to write to arbitrary physical addresses, so the answer must be "no" or very close to that. But there doesn't have to be an OS...

      That’s a really serious bug, isn’t it?

      In other words, how did this get missed? The only obvious answers I can think of are that it wasn't tested, was only tested with the CPU in a mode that doesn't use address translation (by which I roughly mean "without an OS") or was only tested with the translation being physical == virtual.

      A related question is whether this is a "simple" bug or not. It seems odd to me that only a particular instruction is mentioned as affected.

      1. fleddy

        As a CPU micro architect, using a physical address in the vector unit is easier... it's cheaper in hardware implementation terms.

        Using the VA would mean doing an MMU translation for every one of the vector elements, when accessed. This might slow the vector unit down or make it significantly bigger, to run at speed.

        But it's the CORRECT thing to do. Using physical addresses like this has been done before, for the same reasons. No sensible micro architect would allow it. The hardware implementation might push for it, or say translating the first address... then running with that, but the only sensible solution is to translate every address, not in the same page boundary as previous addresses.

        There is a problem, however, that using physical addresses avoids: precise exceptions. During a vector operation, say accessing the 5th element of the vector causes a page exception; this has to be dealt with by the hardware/software at a privileged level. Using physical addresses sidesteps this problem, but at a high cost in terms of safety.

        Such a choice would be acceptable for an embedded processor with no user access to these instructions.

        It's a bad choice of the implementers, that's come back to bite them.
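The precise-exception problem described above can be sketched as a toy model in Python. The function `vector_store`, the dictionary page table and the 4 KiB page size are all hypothetical, chosen only for illustration; real hardware does this in the load/store pipeline. The point is that translating per element means the unit must be able to stop at an exact element index so a privileged handler can fix the mapping and resume.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


def vector_store(page_table, base, element_size, values):
    """Translate each element's virtual address before storing it.

    Returns (completed, fault_index): the (physical address, value) pairs
    written so far, and the index of the first faulting element, or None
    if every element was stored.
    """
    completed = []
    for index, value in enumerate(values):
        vaddr = base + index * element_size
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in page_table:
            # Precise exception: report exactly which element faulted so
            # a privileged handler can map the page and restart from here.
            return completed, index
        completed.append((page_table[vpn] * PAGE_SIZE + offset, value))
    return completed, None


# Only virtual page 0 is mapped (to physical page 7). The store starts
# 16 bytes before the end of that page, so the fifth 4-byte element
# (index 4) falls into the unmapped page 1.
completed, fault_index = vector_store({0: 7}, PAGE_SIZE - 16, 4, list(range(8)))
```

With physical addressing there is no lookup and therefore nothing to fault on, which is the shortcut, and the safety cost, discussed in the comment.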

        1. gnasher729 Silver badge

          Look, _everything_ goes through the MMU and the cache hierarchy. Without any exception. Every virtual address you stuff into the same hardware, and then you pick up things from a cache line.

          The amount of hardware involved is larger, but it’s the same hardware that’s involved in every single memory access, so any different route will end up costing you more.

        2. really_adf

          As a CPU micro architect, using a physical address in the vector unit is easier... it's cheaper in hardware implementation terms.

          True for everything, not just the vector unit, no?

          The hardware implementation might push for it, or say translating the first address... then running with that, but the only sensible solution is to translate every address, not in the same page boundary as previous addresses.

          Would it be a reasonable compromise (between hardware complexity/performance and software constraints) to require each read/written address range to be within a page, so that translating the first address of each source/destination is sufficient? Conceptually similar to requiring "load/store word" instructions to use a word-aligned address, but at a larger scale.

          Such a choice would be acceptable for an embedded processor with no user access to these instructions.

          No doubt there are use cases for vector instructions in privileged code, but I'd guess at least as many in user-mode code and on that basis, designing them to be available in user mode but always use physical addresses makes no sense (except as a deliberate back door).

        3. BobD77

          You don't lookup every element. You lookup the page. The only complication comes when a vector spans a page boundary, but that's the same for any size load and store.
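A rough way to see the "look up the page, not the element" point is to count how many distinct pages a contiguous access touches. This is a sketch under stated assumptions: `pages_touched` is a hypothetical helper, the page size is assumed to be 4 KiB, and only unit-stride (contiguous) accesses are modelled.

```python
PAGE_SIZE = 4096  # assumed page size


def pages_touched(base, nbytes):
    """Return the virtual page numbers covered by a contiguous access."""
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    first = base // PAGE_SIZE
    last = (base + nbytes - 1) // PAGE_SIZE
    return list(range(first, last + 1))


# A 64-byte vector store that sits inside one page needs one translation.
inside = pages_touched(0x1000, 64)

# The same store starting 16 bytes before a page boundary needs two.
crossing = pages_touched(0x1FF0, 64)
```

Here `inside` holds a single page number and `crossing` holds two, so even the awkward case costs one extra lookup rather than one per element.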

    4. the spectacularly refined chap Silver badge

      First thought: WTF? Is there any other processor that allows physical addresses except at the very very very lowest OS level?

      Sure. The classic example would be an X server, the traditional model has the server writing directly to the frame buffer. Requires permission (typically the server runs with root privileges) but it's otherwise a regular usermode program.

      1. gnasher729 Silver badge

        What are the bets this goes through the MMU?

  3. Bruce Hoult

    The instruction they ran is not even a valid RVV 1.0 instruction, let alone RVV 0.7.1

    `vse128.v` is a potential RVV 2.0 instruction. It is not listed in the RVV 1.0 spec, which stops at `vse64.v`.

    My GCC 15.0 (WIP) does not allow `vse128.v`.

    RVV 0.7.1 has only `vse.v` which stores whatever SEW currently is, and `vsb.v`, `vsh.v`, and `vsw.v` for explicitly sized stores.

    There is not even an explicit 64 bit store in RVV 0.7.1, let alone 128 bits.

    1. Ace2 Silver badge

      Re: The instruction they ran is not even a valid RVV 1.0 instruction, let alone RVV 0.7.1

      It’s a custom extension, which is made pretty clear in the article.

      1. Bruce Hoult

        Re: The instruction they ran is not even a valid RVV 1.0 instruction, let alone RVV 0.7.1

        The article is wrong.

        The "custom extension" in question is XTHeadVector, which is exactly the 0.7.1 draft version of the official RISC-V Vector extension, and is being called a "custom extension" for political reasons. It was a candidate for being the official version, but we on the working group had some new ideas and changed a few things before calling a slightly later version 1.0.

        The so-called instruction the researchers used is not a valid instruction in either THead's documentation or the original RVV 0.7.1 documentation. The field used to specify load & store sizes is simply not large enough to ever have a 128 bit version of that instruction. It isn't even big enough to have a 64 bit version.
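The size-field argument can be illustrated with a small decoder sketch in Python. The two tables below are my recollection of the 3-bit `width` field (instruction bits 14:12) for vector stores in the RVV 0.7.1 draft and in RVV 1.0; treat them as assumptions and check them against the specifications before relying on them. The helper names are hypothetical.

```python
# ASSUMED encodings of the 3-bit width field (bits 14:12) for vector stores,
# written from memory of the RVV 0.7.1 draft and RVV 1.0 documents.
WIDTH_RVV_0_7_1 = {0b000: "vsb.v", 0b101: "vsh.v", 0b110: "vsw.v", 0b111: "vse.v"}
WIDTH_RVV_1_0 = {0b000: "vse8.v", 0b101: "vse16.v", 0b110: "vse32.v", 0b111: "vse64.v"}

STORE_FP_OPCODE = 0b0100111  # major opcode shared with scalar FP stores


def width_field(insn):
    """Extract bits 14:12 of a 32-bit instruction word."""
    return (insn >> 12) & 0b111


def decode_store(insn, table):
    """Name the vector store selected by the width field, if any."""
    return table.get(width_field(insn), "no vector store for this width")


# Build an instruction word with only the opcode and width = 0b111 set.
insn = STORE_FP_OPCODE | (0b111 << 12)
name_draft = decode_store(insn, WIDTH_RVV_0_7_1)
name_v1 = decode_store(insn, WIDTH_RVV_1_0)
```

Under these assumed tables a 3-bit field has eight values and each version assigns only four of them to vector stores, so neither table has a slot for a 128-bit store, and the 0.7.1 table has no explicit 64-bit one either.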

  4. bananape4l

    i want one

    these are sub-$5 SBCs ... you must disallow native code from untrusted sources. but who would target so few devices in the wild to make malware?

    1. An_Old_Dog Silver badge

      Re: i want one

      @banannape4l:

      If these devices are incorporated into many device controllers, especially Internet-connected ones, their target value rises.

      Our place was attacked via our MANY HP JetDirects. A firmware upgrade program fixed that, but the principle remains.

    2. doublelayer Silver badge

      Re: i want one

      The SBCs it's in right now are not as cheap as you describe, but whether your number is correct or not is not the major issue. These are easy to find because they describe the SoC used right on their product pages. What we don't know is what else has one of these chips in it. For instance, manufacturers rarely tell you what CPU powers a printer, a television, a security camera, or lots of similar embedded things. Some of those can get code, either as a firmware update or as a third-party addition (for example, a networked television installing an app to stream something) which could exploit this. Now, there is a beachhead on your network and they have root access on it so can do things that a sandbox might have prevented.

      That's just considering consumer or office stuff you're likely to have, and given the youth of this chip, that's probably the most likely kind of device to find this chip in. However, such a thing is also a favorite of infrastructure attackers. I'm sure several RISC-V SoCs have been built into industrial equipment that can have effects on safety if disabled or messed with. Probably not this one, but I don't know that for sure. The kind of people who like turning off water and electricity would quite like being able to elevate or to trigger something that requires a restart. In fact, they can combine their efforts. Elevate to root, install something that will execute the instruction that crashes the machine as early in the boot process as possible, then execute it once. Kind of like the CrowdStrike bug did, but to embedded infrastructure instead of desktops. Ideally, it should be harder to get this to execute on those machines due to other security measures, but that isn't an ideally I want to rely on too much.

  5. Gene Cash Silver badge

    "They've also published a website, ghostwriteattack.com"

    Why? WTAF does that add to the vulnerability reporting, besides grandstanding?

    I'm not going to a site with a name like that...

    1. IanRS

      Re: "They've also published a website, ghostwriteattack.com"

      But a site called trustmeiamharmless.com would be fine?

      How about www.questionablecontent.net? (It is a long-running comic strip.)

      If your approach to on-line security is based on the website name then you might not be as safe as you think.

      1. Snake Silver badge

        Re: IanRS

        Annnd, you missed the OP's point entirely.

    2. doublelayer Silver badge

      Re: "They've also published a website, ghostwriteattack.com"

      You seem to have two separate points. Why have a website for a vulnerability? I don't know, so you can find the information faster than finding the website for their institution, which I certainly couldn't have told you was https://www.helmholtz.de/en/about-us/helmholtz-centers/centers-a-z/centre/cispa-helmholtz-center-for-information-security/, then find the information wherever it's posted. Educational websites are often a maze and sometimes break links because there are multiple teams working on the site and none of them speak to each other, let alone the researchers. While this looks to be a research-focused organization rather than a university, I doubt their site management is very different.

      Then you complain about the name. What name is it supposed to be? The vulnerability is called GhostWrite because you can write when you're not supposed to without detection, and it is a possible attack. The only other one that seems likely is ghostwritevulnerability.com. What name would you not object to?

  6. Conor Stewart

    This is likely just the beginning

    RISC-V is praised for being an open source ISA so anyone can implement a processor based on it. If professional processor designers (Intel, AMD, T-Head, etc.) can get it so wrong at times, then what hope do the rest of us have?

    People often talk about how a company can create their own RISC-V processor for their product that is exactly as they need it, especially startups so that they don't need to pay licensing fees and how we will now have competitors to ARM, but is this really what we want? Do we want our devices to contain processors designed by less experienced teams with likely less rigorous testing?

    My trust may be misplaced but I would rather a company use an ARM core (or a reputable RISC-V alternative) in a product than try to roll their own RISC-V core, exactly because of issues like this which will likely only get more prevalent as more people start designing their own cores, I really don't see open source cores as a good alternative either unless they have been properly tested.

    1. CowHorseFrog Silver badge

      Re: This is likely just the beginning

      The licensing fee story is bullshit. ARM charges cents per CPU. There's something seriously wrong if you worry about paying 10c for a CPU licensing fee when you are selling a product worth hundreds or thousands of dollars.

      google: arm licensing fee

      > ARM earns fixed upfront license fees when they deliver IP to partners and variable royalties from partners for each chip they ship that contains ARM IP. The licensing fees vary between an estimated $1 million to 10 million. The royalty is usually 1 to 2% of the selling price of the chip.

      1. Aitor 1

        Re: This is likely just the beginning

        They also limit you in many ways, just ask Qualcomm, or look at changes in licensing.

        They essentially own you, and can make "the Vader change in agreements" for newer isas, etc.

        Still a good option, but not ideal.

        1. CowHorseFrog Silver badge

          Re: This is likely just the beginning

          Newer ISAs are irrelevant when you are licensing a particular CPU design today. When tomorrow comes and you need a NEW CPU, that's a new license; today's license is irrelevant.

    2. abend0c4 Silver badge

      Re: This is likely just the beginning

      It's not just about being able to design your own processor, but also about being able to change your CPU supplier without having to make significant changes to your software. Though if you look at the myriad different features that may be present in different models of ARM, Intel and AMD processors - and their various physical form factors - that might not be as easy as it sounds.

      There's nothing, as far as I'm aware, that would prevent the big players from introducing RISC-V chips (and having microcoded processors would be a head start), but designing working silicon is quite hard - especially if you also want bleeding-edge performance - and the profit comes from the proprietary knowledge behind it. So, I'd be very surprised to see performant RISC-V processors free of any kind of IP encumbrances, at least in so far as their implementation on the wafer is concerned. There may be a niche for cheap "open" designs to be added as control planes to other devices (there's already a version of the ESP-32 WiFi module with a RISC-V core, for example) where there's added value.

      Open source software may be able (for now) to depend on contributors moonlighting from their day jobs, but fabs have high fixed costs and in the end they'll only make what's profitable.

      1. Peter Gathercole Silver badge

        Re: This is likely just the beginning

        The problem with microcoding a processor like RISC-V is that while it may make problems like this easier to fix, it adds a whole load of complexity to the chip design, requiring bigger dies and more power, and this is the antithesis of a pure RISC design.

        The issue I see with both RISC-V and ARM is that while they start out simple, people rapidly want to put more and more instructions into 'extensions' that make each design more complex, and are often added as 'optional' standardised extensions which can be added to a particular core design. This is what has made ARM implementations so diverse. If you want a particular set of these optional instruction extensions, the choice of vendors is seriously reduced. And if you keep to the core set, you will miss out on many of the performance benefits.

        For the most part, ARM implementations differ to RISC-V ones, mainly because most vendors take actual processor designs from ARM, and build these onto their own SoCs. So unless the translation of the design or its fabrication is done incorrectly, the cores of a specific design should act the same. But there are of course the ARM processor makers who have architectural licenses. These people could make similar design mistakes as here, but because of the cost of the license, they are likely to put in significant testing to ensure compliance.

        ARM appear to have a robust test harness/suite to prove their designs. I don't know whether they make these available to architectural license holders to prove compatibility.

        1. Justthefacts Silver badge

          Re: This is likely just the beginning

          “The problem with microcoding a processor like RISC-V is that while it may make problems like this easier to fix, it adds a whole load of complexity to the chip design, requiring bigger dies and more power, and this is the antithesis of a pure RISC design.”

          Which is exactly why “pure RISC designs” don’t exist, and everybody has been telling the RISCV evangelist morons that for over a decade. The additional complexity you describe exists, but there are payback advantages in practice, which you don’t appreciate until you’ve been doing this as a day-job for 20 years. And instruction decoding *sounds* complex to noobs, but it’s actually tiny relative to everything else. Really tiny. Honestly, so small it’s barely measurable in practice. The complexity of the entire instruction decoder circuitry is maybe 2% of the cost of the branch predictors. Which is only a fraction of the whole CPU core. And the whole chip area is dominated by *cache area*, with the entire CPU cores relegated to finding bits of spare silicon to fit nicely around the caches to minimise track lengths. Making a CPU without microcode is just *dumb* from a modern perspective.

  7. JLV

    true boffinry

    Lovely testing approach with diffing between implementations of same spec.

    Diffing is a hugely underrated tool for software testing. Have a huge batch process that transforms large amounts of data? Well, you could write complicated unit tests. Or you could throw lots of data into it and verify that before/after outputs match, except for whatever you were trying to fix. Which, btw, does not exclude writing targeted unit tests for complex edge cases.

    Complex html pages that get generated from data? Again, you could check things element by element. Do we have the expected title tag contents? What about the third h2.info? Or you could compare it to the last run and verify changes are as expected. (No, that doesn't cover CSS and JS behavior, but it gets you surprisingly far).
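The before/after comparison described above can be sketched with Python's standard `difflib` module. The helper name `changed_lines` and the sample CSV data are made up for illustration; the idea is to diff the output of the previous run against the new run and assert that only the lines you meant to change have changed.

```python
import difflib


def changed_lines(before, after):
    """Return only the added/removed lines between two text outputs."""
    diff = list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="before", tofile="after", n=0, lineterm=""))
    # Skip the two file-header lines, then keep only '+' and '-' lines
    # (hunk markers start with '@@').
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Hypothetical batch output from the last known-good run and from the new run.
previous_run = "id,total\n1,10\n2,20\n3,30"
current_run = "id,total\n1,10\n2,25\n3,30"

# The fix was supposed to change row 2 and nothing else.
expected_change = ["-2,20", "+2,25"]
actual_change = changed_lines(previous_run, current_run)
assert actual_change == expected_change, actual_change
```

Any unexpected extra entry in `actual_change` is a regression candidate, which is what makes this cheap check useful alongside targeted unit tests.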

  8. CowHorseFrog Silver badge

    Typical Chinese quality. Hardly a shock with tofu bridges, tunnels and buildings, they bring their fine tradition to CPUs.

    Basically this was never tested, goes to show how valuable those QC golden stickers are.

    1. Anonymous Coward

      Or ... it was specifically commissioned by the PRC's DoD, for its supercomputers that are isolated from internet and run only "trusted" code ... and Alibaba thought nobody would notice the abysmal gaping security hole that this creates when used in its internet-connected multiuser environments running arbitrary code (cloud) ...

      1. Snake Silver badge

        RE: isolated from the Internet

        I seriously doubt the 2 laptops and one gaming console mentioned in the vulnerability list qualify as "isolated".

    2. Irongut Silver badge

      Because you wouldn't get huge flaws like this in good old American designed chips from a good old American company like say... Intel?

      HAHAHAHAHAHAHAHAHAHAHAHA

      Racist moron.

      1. gnasher729 Silver badge

        No, you wouldn’t get a bug like this from Intel.

        The first question with every chip is: Does it work in normal operation? The latest Intel disaster was “yes, but not for as long as you expected”. With this chip, there is a bug where it just doesn’t work. Now this bug can be exploited by hardware due to its nature, but primarily the chip just doesn’t work properly.

        Now it’s daft to say this bug was caused by the skin colour of its designers. But it was caused by the attitude of a company that cared even less about its customers than Intel.

    3. gnasher729 Silver badge

      I am told that if you order products in quantity from a Chinese supplier, you pick the price, and the quality, and you get what you paid for.

      Order 10,000 2TB SSDs. Depending on the price that you picked you will get anything from expensive 2TB SSDs that last for years, to a very cheap SSD case with a 512 MB chip inside.

  9. 3arn0wl

    Obit

    I guess the T-Head chips have played their role : providing a powerful enough option for developers to do their work. There are more powerful processors, with ratified extensions, coming along now.

    The C910 is 5 years old now, I think.

    1. Justthefacts Silver badge

      Re: Obit

      No there aren’t.

      T-Head SOC is by far the * highest quality* RISCV on the market with the *most* testing. Few RISCV implementations do anything but emit smoke out of the box. TBH, few RISCV boxes even have anything in them at all, other than a note saying “To Follow when Parts are Available in 5 months time”.

      1. CowHorseFrog Silver badge

        Re: Obit

        Yeh, there's absolutely nothing wrong with an instruction that ignores the MMU and lets programs from any process read any absolute address from the lowest privilege level.

        With that logic we should also remove passwords from all banking websites, after all they only get in the way of people trying to do business.

        1. doublelayer Silver badge

          Re: Obit

          Would it kill you to actually understand what people say? In this case, the comment you replied to did not say that this instruction was fine. From that comment alone, you can determine that they think RISC-V in general is bad and that this vulnerability, while bad, is better than everything else. If you need more clarification, look at the other comments posted by the same person. I don't agree that it's quite that bad, but you've taken up a valiant effort to attack a point that nobody said and the specific person you're responding to would probably disagree with more than anyone else.

          1. CowHorseFrog Silver badge

            Re: Obit

            I'm sure someone who can read can appreciate whom my comment is directed towards. Do I really need to hold your hand?
