I always assumed that NASA would be ...
Who would have predicted that they would be involved in a
RISCy business deal?
Chip designer SiFive said Tuesday its RISC-V-compatible CPU cores will power NASA’s just-announced High-Performance Spaceflight Computer (HPSC). The computer system will form the backbone for future manned and unmanned missions, including those to the Moon and Mars. Its microprocessor will be developed under a three-year $50 …
Maybe you haven't kept up with the news, but countries like China and Russia are going all-in with RISC-V precisely because it's open source and not subject to trade bans.
I predict it will not be long before the U.S. government labels RISC-V as an antagonistic platform, because it allows adversaries like China and Russia to keep up with Western CPU technology.
Well it really will be the day that the US government loses the plot when it shuns the best efforts of one of its top universities, which builds on 40 years of excellence with RISC.
It's an instruction set!
(imo America isn't going to be successful in stemming China's 50-year effort to become an economic powerhouse by attempting to prevent them having access to technologies.)
Huawei has its issues (but at least not hardcoded backdoors like Cisco)
Hikvision's stuff IS a security issue (The monolithic SOFIA binary) - and that bleeds over into most of the other brands and no-name too (_all_ the SOC DVRs and network cameras are essentially identical under the surface)
Killer app, really?
They're just replacing an ancient CPU technology with a more modern one and they chose the RISC-V architecture in the hope that it will still be in use 20 years from now.
Otherwise, there's nothing about RISC-V that makes it an enabling technology for computing in space. But even if there were, how is that relevant for earthbound computing, where the vast majority of computing happens? Furthermore, RISC-V doesn't have a "credibility" problem that needs fixing; it just doesn't have as developed an ecosystem as Arm/x86 yet, and NASA won't fix that.
One of the issues with radiation hardening is the need to avoid very small or delicate features which can be destroyed by radiation impact. A few months on a visit to the ISS is one thing, a few years as a mission-critical component on an interplanetary mission is quite another. It will be interesting to see how compact these multi-core processors can be made.
The flip side of smaller silicon is the lower profile for exposure to radiation and that mass reduction can be used for additional graded-z shielding. And any mission critical parts of the chip (e.g. watchdog timer) can still be fabricated with a 250 nm to 150 nm process on a higher layer above the smallest process (which should provide additional shielding to the layers below). 3D flash chips come to mind, with 176 layers on the die.
Microchip swallowed Microsemi which supplies chips to the MIL-STD-750 ("Test Methods for Semiconductor Devices", published by the United States Department of Defense) standard. So I'm sure they have a Cyclotron or two for testing and know exactly what is needed for compliance involving Total Ionizing Dose (TID), Neutron Irradiation, Single Event Effects (SEE), Single-Event Latch-up (SEL), Single Event Gate Rupture (SEGR) and Single Event Burnout (SEB).
There is absolutely no doubt in my mind that the processors eventually produced will be high reliability radiation hardened semiconductor chips, that are totally fit for purpose.
Radiation hardening is all about designing to an expected exposure profile with a minimum functional lifetime. The truth is that nothing lasts forever, it is impossible to build anything that will not eventually fail. For example, once you dope a semiconductor that process never actually stops (it slows down by many many orders of magnitude), but given a long enough time even at room temperature the dopants will all eventually be homogeneous throughout the entire chip.
It certainly is about exposure profile over an operational lifetime.
Shielding can bring its own problems. If it's thick enough, great, the shielded payload is untouched. If it's not, then it's worse than useless; all it does is convert the incoming single high-energy particle into a blunderbuss of collision debris, which is practically guaranteed to ensure that the chip gets clobbered.
A deep space payload is going to encounter something hard and fast enough, eventually, that'll overcome any amount of shielding. Sometimes you just have to resort to making the chip able to take the hit for itself.
I'm pretty sure that the feature size used by the BAE RAD750-based card of yore is going to remain a necessity, as will the sapphire substrate and all the additional fault-detection logic. Which will pretty much guarantee that, per core, the performance will not be any better than what is currently in service. We're not going to see supercomputing done in space any time soon!
I found this blog about how VxWorks, JPL's favourite OS, supports multicores: here
"A deep space payload is going to encounter something hard and fast enough, eventually, that'll overcome any amount of shielding."
Once you're building IN space, using water (ice) bags will make life a lot easier. All those lovely hydrogen atoms bunched up nice and tight work pretty well at slowing things down.
Using voting arrays of systems is a pretty good way of ensuring that any given particle hit won't be a major issue, and once we're out at Mars or further we're going to have to be looking at something better than solar power, unless you want to see silly-size arrays (with attendant inertia and twisting-moment problems).
At modern linewidths, achieving TID tolerance is rather trivial. The scaling of gate thresholds actually reduces radiation sensitivity the smaller you go. Almost any technology at 14nm or below will survive 500krad - anybody who tells you otherwise has something to sell you.
SEU/SEE is relatively well-understood, and can be achieved using triple-majority voting, plus error-detecting codes on memories. Also, even terrestrial CPUs often have SEU tolerant designs, with SECDED in their caches, because it’s needed in near-threshold voltage design. I think all Intel server chips do, and I know many quite boring standard ARM chips do. You don’t really need to make any special CPU for that, the secret sauce is at the system-level.
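The triple-majority-voting idea is easy to sketch. The following is purely illustrative (real implementations vote in hardware, per clock cycle, on flip-flop outputs, not in software on final results):

```python
from collections import Counter

def tmr_vote(a, b, c):
    """Triple modular redundancy: return the majority value of three
    independent copies of a computation, plus a flag saying whether a
    disagreement (a possible single-event upset) was detected."""
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count == 3:
        return value, False           # all three lanes agree
    if count == 2:
        return value, True            # one lane upset; majority masks it
    raise RuntimeError("no majority: double fault, cannot mask")

# One lane takes a bit flip; the majority still returns the right answer.
result, upset = tmr_vote(0x2A, 0x2A, 0x2B)
```

The point is exactly the one made above: the voter is system-level secret sauce, and it only masks *single* faults, which is why it gets paired with scrubbing and ECC on memories.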
The kicker nowadays is latch-up. That’s hard, because you can get micro-latch up that is really hard to detect, but can cause damage. And I don’t think shielding really *works* functionally. You just can’t attenuate the radiation events sufficiently to avoid needing full measures, and once you have taken the design measures correctly, why shield?
I actually have IP in this area, in the board architecture and OS. From analysis and actual tests on a beam line, we demonstrated that you can take “any off-the-shelf 7nm CPU” and make a space-grade radiation-tolerant system. However, TBH, despite my best efforts to get traction and sales, I got nowhere in the market.
Not bitter, but I think the reality is: whatever the spin given to the funding bodies, the large space companies get *enough* performance out of the old ways of doing things that the mileage of importing a new philosophy to save a few hundred watts and couple kilos is just not worth the perceived risk.
In the end, just take any old CPU and spend oodles of cash porting it to the most modern rad-hard-by-design silicon technology, and all the old-space companies are happy. Partly because they are on cost-plus. They state a “need” for a rad-hard chip, and they receive the funding to develop it. It is what it is.
Than rad hardness and power consumption.
You have to ask: what's the mission of these processors?
Normally the mission of a probe is to send data back home to Earth.
If you want to collect a great deal of data, crunch it down and then send the synopsis to Earth that's something else.
I can see use cases for it, but possibly not as many as people might think.
The processors that ran Shuttle were versions of the IBM S/360 architecture developed for military applications, not specifically the Shuttle programme (although the IO was).
We'll see how "successful" the design is by how many organisations outside NASA use it for their missions as well.
"It's designed to be used on a spaceship"... Wouldn't that argument apply to any CPU they built for a spaceship?
I don't think that the details of the design are particularly relevant beyond the fact that NASA are using RISC-V. What would be more interesting is why they chose it over ARM or x86.
My guess as to why they chose Risc V is that they didn't. I'm guessing that it came along with the choice of contractor, someone willing to do the work to make a CPU that will only ever be bought in very small numbers, someone for whom the kudos of having done it is potentially valuable all by itself, meaning they're willing to undercut proposals based on anything else.
Arm cores do not need the publicity, nor does x64, nor PowerPC really. Risc V maybe benefits from the publicity.
A concern I have is that for NASA, this may be a bit of a risk. RiscV is undoubtedly capable from a technical point of view, but it's far from guaranteed to survive and thrive commercially, or (worse) may become China's CPU (in the sense they end up being the only ones driving it commercially). If it does fizzle commercially, then software support will drop away, or perhaps NASA becomes dependent on Chinese authored software tools. NASA, if they end up with only RiscV rad hardened CPUs, very much needs RiscV to be commercially healthy in the West.
This is true. From quite a chunk of time of my career spent on this exact problem:
Way back in the 00’s to very early 10’s, ARM were very keen to get involved in the space industry. They did quite a lot of internal R&D, paid for entirely on their own dime, and had a dedicated Sales Manager, Product Manager and dev team for the space industry. Despite their best efforts, they never won a *single* contract with European Space Agency (ESA).
The contracts always went to Gaisler for their LEON-FT, owned by Jiri Gaisler, who was previously an employee of ESA as their Head of Department, and then “struck out on his own”. Apart from the fact that 100% of the multi-ten-million revenues of Gaisler came from ESA contracts for over a decade. You can throw rocks at me for implying…..but these are simply *facts*.
Anyway, round about 2012-2013 IIRC, ARM effectively gave up on the space industry, decided there was no actual business to be had, and re-assigned the dev team and associated Sales.
It's the way the EU bureaucracy works. It would be very normal for Gaisler to be paid in briefcases full of €500 notes - or, for a few notorious years, even in €1,000 notes. Many EU institutional accounts never pass audit, because unexplained and structurally unexplainable anomalies (member nations not passing back audited accounts being a favourite).
But don't knock the EU because of that. Most US Gov spending runs through analogous pig troughs, as of course do Russian and Chinese.
Does fairness consign a business or nation to being small fry? Just ask Charles Darwin.
RISC v... less instructions by 300% from ARM. NO license fees... can be customized, Can gang math co-processor externally.
This is big news for RISC-V and the concept of open source. Intel and ARM have stifled computer development for decades.
This doesn’t make any sense:
“Less instructions by 300%”. I assume you mean “requires 3x less code-space for the same functionality”? Then this is just wrong. People have measured it on real codebases, it’s almost identical on average. There’s the odd small win for RISCV here and there, because it’s sort of THUMBlike….but equally there’s a few small wins for ARM in other cases too. If you want to claim 10% better on some hyper-specific benchmark you particularly want it for, and have measured it, I’m not going to argue.
“No license fees”….On a project like this, ARM would give you the licensing to do what you need, around $50k. Targeting the core to a new rad-hard technology, including all the bits and bobs, is not going to cost below $1M even for a bare-bones project like this. The mask set at 7nm is going to cost you $5-10M. Honestly, please give me a reason to care about a line item less than 1% of project cost.
“Can gang math co-processor externally”….errr, so can ARM.
“Big news for RISCV”….Not really. The big news is that in Europe they had their own open-source pet project for nearly two decades now called LEON-FT. Mostly Gaisler made it. Now probably RISCV will be the new pet project replacing LEON-FT. Tell me why I should care in the wider world?
Mission data munging on board presupposes that everything you can ever do to the data has been thought of beforehand. But we already know that's not a good idea. There's lots of stuff that was done with Voyager data that hadn't been possible to do on the ground or in space at the time it was launched.
Look at the Voyager interplanetary data. NASA were diligently recording that boring meaningless data for decades until some bright spark thought of a use for it. It was famous at the time because the tape archive had become nearly unreadable, and it was a major piece of data recovery work to make that scientific investigation possible.
Far better to repatriate raw data, as much as possible, and sort it out on the ground afterwards. That opens up all manner of value boosting opportunities. Plus that reduces processing demands on board.
Agreed, nothing missed at all, but it's difficult to see how they're going to exploit the extra grunt. I think they don't need it for flying spacecraft. Faster data compression? Faster bandwidth radio links? That makes sense.
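The compression angle is easy to illustrate with the stdlib. A minimal sketch, using zlib on a made-up, highly repetitive telemetry frame (real missions use specialised codecs like the CCSDS standards, not zlib):

```python
import json
import zlib

# Hypothetical telemetry frame: repetitive sensor readings compress well.
frame = json.dumps({"temps_k": [291] * 256, "volts": [28.1] * 64}).encode()

compressed = zlib.compress(frame, level=9)
ratio = len(compressed) / len(frame)
# On repetitive telemetry the compressed frame is a small fraction of the
# raw size, and the saving translates directly into downlink time.
```

That's also why extra grunt helps even without on-board science: better compression per joule means more raw data repatriated per pass.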
Missions beyond LEO need a lot of on-board processing simply to navigate and aim the instruments. It cannot be done remotely.
A flyby of a moon can last a few seconds, during which the spacecraft needs to do a huge amount of work to ensure all the instruments can collect as much data as possible.
The more accurately the instruments can be aimed, the more sensitive they can be.
It'll then spend the next few weeks uploading, then go to sleep again for months.
There's also the need for rapid "find the most interesting spots" work during approach. There might not be enough time for such targeting data to be sent back from the spacecraft, for a human to rapidly pick which places look the most interesting, and for the list to be sent back.
If the spacecraft can be told "look for things that look like this - or just surprising" perhaps it can do the fine targeting itself.
(Have been wracking my memory - and wild web searches - but I just can't bring the details to mind, grr:)
One of the old space probes took photos of some previously-unknown moonlet (?) because one of the programmers had found a bit of room to insert a pattern-recognition routine as a freebie extra feature (and was allowed to put it in).
(IIRC one of the Voyager or Pioneer craft, but probably not)
That's only soft real time, VxWorks is hard real time.
Preempt RT is in a bit of a mess at the moment, having fallen behind quite badly... Just looked, seems to be better off now than it had been.
Also, there's a key feature of VxWorks that Linux can't match: VxWorks' loader. If you don't know what it is or what you can do with it, well, it's complicated. But it has allowed NASA to save launched missions that, had they been running other OSes, would have been lost.
People who're knowledgeable say most users don't need hard real-time, even when they think they do. They can make do with soft real-time, or with reduced (though still unbounded) latency.
If you really need hard real-time you can always alter the kernel and handle the interrupts yourself. That's as hard real-time as you can get, better even than VxWorks.
"People who're knowledgeable say most users don't need hard real-time."
The ECU in your car does need hard real-time. (Imagine a misfire at motorway speeds.)
The CU in your robot does need hard real-time. (The robot arm hits something as it wasn't commanded to stop in time.)
The FCU/ECU in your plane (commercial or military) does need hard real-time.
The CU in your modern train (levitation or suspension) does need hard real-time. (The maglev could smash into the track, ripping up miles of it and killing passengers.)
The FCU in your spacecraft (at least on take-off and landing) does need hard real-time. (Only ever one missed instruction from a RUD.)
One advantage of hard real-time in VxWorks is that you can run all of your critical tasks individually separated in time and space from all of the other tasks running at lesser criticality. Say for a car, the ECU could become part of a single multicore system where the rest of the car systems (probably not the entertainment) run on the other cores. So one control unit for the car rather than 10 or 11. In avionics where weight is a real issue then this federated systems approach is even more critical.
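That separation in time and space is essentially what ARINC 653-style partitioning standardises in avionics. A minimal sketch of the time half of it, with made-up partition names and durations (real systems enforce this in the kernel/hypervisor, plus memory protection for the space half):

```python
# A fixed major frame is divided into windows; each partition may only
# run inside its own window, so a runaway low-criticality task can never
# steal time from the engine control loop.
MAJOR_FRAME_MS = 100
SCHEDULE = [                       # (partition, start_ms, duration_ms)
    ("engine_control", 0, 40),     # highest criticality runs first
    ("suspension",     40, 30),
    ("body_systems",   70, 30),    # infotainment would live elsewhere
]

def partition_at(t_ms):
    """Return which partition owns the CPU at time t_ms."""
    t = t_ms % MAJOR_FRAME_MS      # position within the repeating frame
    for name, start, duration in SCHEDULE:
        if start <= t < start + duration:
            return name
    raise RuntimeError("schedule does not cover the major frame")
```

Because the table is static and covers the whole frame, worst-case response times can be read straight off it, which is the whole point of the exercise.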
For medical e.g. MRI or nuclear, for instance, you can claim that soft real-time is applicable, but I for one would rather have guaranteed responses, rather than a best effort of software. (We have all experienced the temporary hang of Windows or Linux at one time or another.)
"If you really need hard real-time you can always alter the kernel and handle the interrupts yourself. That's as hard real-time as you can get, better even than VxWorks."
But in doing so you are cutting yourself off from the rest of the generic OS infrastructure, modularity and other system wide aspects. Yes, that is fine in a tiny closed control loop, but not as part of a larger federated system.
I have used VxWorks, as you have, but there are other real-time OSes out there that I have used and written drivers for. None, though, have a better pedigree, or are easier to work with, than VxWorks.
I could bore you for hours on the subject of multicore redundancy ...
[BTW - the first non-Linux OS to support RISC-V was VxWorks, and that was [secret|many] years ago.]
With the RISC-V vector extensions only recently stabilised, and many other feature sets still in flux, this seems like a really odd choice for a project that should be able to remain unchanged for the next 20+ years or so, without OTA updates and fixes to patch up whatever ISA and architectural issues that may have remained undetected until it's running for a decade in a deep space probe or so.
As uncool PPC is, you'd be hard-pressed to find any stability or other 'exciting' issues with it. Same for older ARM and x86 cores.
The Libre-SoC project website may look like it is dead, but judging from their git repos the team is still beavering away. And IBM OpenPOWER did announce a new Baseboard Management Controller (BMC) called LibreBMC back on 2021-05-10, with the next meeting on 2022-09-15. Reading through the online minutes of previous meetings, it 100% is the Libre-SoC. So it sounds like the 22 nm "PowerPi" chip is still being worked on, with the unpatented Seymour Cray CDC 6600 scoreboard for extra performance, using less logic, less die area and less power!
That project was dead before it started. You can’t build a viable GPU using a Power CPU chip and a software OpenGL rasteriser.
It’s number of transistors per square inch that counts in a GPU and general purpose CPUs have too many.
I’d forgive them if they had shipped even the most basic GPU by now…. …but not even a frame buffer card?
> I’d forgive them if they had shipped even the most basic GPU by now…. …but not even a frame buffer card?
So you would like the penthouse in a skyscraper before the foundation is in place. The roadmap has the GPU near the end because it is not easy; it is damn hard to get right. They have been in contact with lots of experts, and are still trying to maximise what can be done while minimising what is needed and maximising reuse. The thing is, the design has a lot of innovation in it, and it takes a lot of time to absorb and analyse an insane amount of information. If their goal was to sell a blob-riddled, unauditable GPU, that could be done simply by buying off-the-shelf IP under NDAs, with blobs that sit there for years, never updated. But that is not the goal for the project: the goal is 100% open source, 100% auditable everything.
Their target is a commercial-grade video decoder (minimum 720p), which helps restore trust in the software *and* hardware, unlike the Intel and AMD chips. And you can lump Nvidia's closed-source, blob-riddled GPUs in there as well.
> It’s number of transistors per square inch that counts in a GPU and general purpose CPUs have too many.
That is a bit of an odd statement: the CPU, GPU and VPU in this case are integrated and share resources, so the critical GPU parts will have minimal gates as usual, and the CPU and VPU fewer due to the scoreboard. Instead of the CPU needing to send requests along a relatively slow bus (PCIe) for the GPU to do things, that delay is severely reduced, because you are dealing directly with on-chip registers.
> You can’t build a viable GPU using a Power CPU chip and a software OpenGL rasteriser.
Look at the slides from a presentation 2 years ago: page 6 shows an 8/12/16-core SMP processor and page 7 shows the roadmap. I'd say that they are at the "Develop Vector ISA with 3D and Video extensions, under watchful eye of OpenPOWER Foundation" stage. From the latest videos I've seen on YouTube ('limitations of mis-named "Scalable" Vector ISAs'), mostly highlighting the problems and (good, mostly bad) solutions in other chips (Power, ARM, Intel, RISC-V, Cray), they are still working on the details of what is required for vectors, and thinking about 3D and video extensions, but they are not quite there yet.
I've no idea if the plan is that all 8 to 16 cores will have their own VPU, but the ones that do have the potential to process 128 vector registers using a scoreboard (Cray-style) for out-of-order execution. Surely the potential ability to process 1024 (8-core) to 2048 (16-core) vector registers makes it at least possible to do most things required of a 3D GPU, with their own custom (Vulkan-compliant) Kazan 3D driver. This comes to mind.
Another great option for SiFive's silicon tech, looking to the future, is photonics: immune to radiation, very fast, and with extremely low power requirements. That last one would be a winner. So what's the problem? Does America suffer from a lack of self-confidence?
Seems like a solid project. $50 million over 3 years isn't negligible, but it's not a high-volume deal, and I have no idea how many people are actually capable of delivering a compliant CPU to NASA, considering most of this industry, engineering-wise, is under 30. By the looks of it they are also desperately trying to hire for safety/automotive too, despite their marketing material suggesting they already have solutions.
Either way a good endorsement for RISC-V claims. Companies may come and go but ISAs should ideally not be tied to them for such deeply embedded applications.
What is it with this world and 'Speed and Power'? Internet access is sold by it, server and desktop CPUs are sold by it.....because that's what matters in those terrestrial use-cases.
Out in space, or places with a piss-poor magnetosphere etc., I would argue that reliability and/or longevity is more important than huge processing speed... so IMO these should be built using a process size way larger than the current 10, 7, 5 and now 3 nm processes, for survivability. As others have pointed out, there is only so much shielding can do: sooner or later, something very energetic will cause direct or indirect gate damage.
Same goes for the storage. If it's flash, go for SLC with plenty of error correction and redundancy.
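The "plenty of error correction" part is standard SECDED territory. As a toy illustration, here's a Hamming(7,4) codec, which corrects any single flipped bit in a codeword; real flash controllers use far stronger BCH/LDPC codes, but the principle is the same:

```python
def hamming74_encode(nibble):
    """Encode a 4-bit value into a 7-bit Hamming codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]        # data bits d1..d4
    p1 = d[0] ^ d[1] ^ d[3]                          # covers positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]                          # covers positions 2,3,6,7
    p3 = d[1] ^ d[2] ^ d[3]                          # covers positions 4,5,6,7
    # Codeword layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def hamming74_decode(code):
    """Correct up to one flipped bit and return the 4-bit data value."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # recheck positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # recheck positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # recheck positions 4,5,6,7
    syndrome = s1 + 2 * s2 + 4 * s3  # = 1-based position of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1         # flip it back
    return c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)
```

Note the limitation that matters in orbit: a single code can only correct so many flips per word, which is why you also scrub memory periodically, before a second hit lands in the same word.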
That said, NASA/JPL must have thought of the risks involved in such a processing grunt step change...including the power supply and consumption.
If I am not mistaken, a lot of space-bound CPUs use comparatively large manufacturing processes for precisely this reason. The raw processing requirements for essential functions like navigation and communication are not particularly heavy. Store and dump data to Earth, and let us do the computation here for everything non-flight-critical, is the default strategy for good reason.
The RAM is one of the most awkward components. Apollo-era magnetic core RAM was by definition immune to a radiation-induced bitflip, whereas a silicon transistor, especially one on an extremely compact production process, is easy to torch. ECC (sometimes) helps; better to prevent radiation getting in at all, of course.
The defected MiG-25 being found to use ancient valve/magnetic-core electronics might be an apocryphal story, but it would technically provide radiation hardening to the system.
Carrying around sufficiently lead-lined caskets to provide radiation protection for critical components is obviously expensive in the context of rocketry. Alpha and beta radiation are pretty easy to stop. Gamma, not so much.
As with everything, design tradeoffs to be had!
The European Space Agency is doing a similar thing -- with their own tech firms.
Interestingly, it is a CPU with a "bootstrap option to select SPARC or RISC-V instruction sets".
I would like to see the boot sequence: is there an instruction-encoding overlap that allows the ISA to be distinguished? Do they dynamically change ISA depending on the phase of the mission?
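For what it's worth, the quoted wording ("bootstrap option") suggests a strap pin rather than dynamic instruction sniffing. But there is one real encoding property you could play with: every standard 32-bit RISC-V instruction has its two low bits set to 0b11. A toy check, purely illustrative and nothing to do with Gaisler's actual boot logic:

```python
def looks_like_rv32_insn(word):
    """True if a 32-bit word is encoded as a standard-length RV32
    instruction: the two low bits are 0b11, and bits [4:2] are not
    0b111 (which would signal a longer-than-32-bit encoding)."""
    return (word & 0b11) == 0b11 and ((word >> 2) & 0b111) != 0b111

# 0x00000013 is RV32I `addi x0, x0, 0`, the canonical NOP.
```

SPARC V8 has no such universal marker bits in the same place, so a heuristic like this could only ever flag "plausibly RISC-V", not prove it; hence, presumably, the strap.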
As an interesting aside, increasing the MIPS of a space processor, has a very non-intuitive effect on software reliability. Even most people in the space industry don’t understand this.
Most of the evidence they have that their development processes actually give them the software correctness they value, is based on “we’ve had it in orbit for 15 years, and it’s never gone wrong”. The majority of the development process is focused on code-review, coding standards, and code autogeneration. *Test* is much, much less. They think they do a lot of test, but not by comparison with out-of-industry norms, even compared to non-safety-critical. They don’t fuzz-test for example.
But the industry standard for an Onboard Computer is 1 MIPS, single core. Almost all the control computers, that's what they are. And if you think about it, that means it's only executing 1/10,000 the number of instructions of a quad-core executing at 2.5 GHz. So, 15 years divided by 10,000... how excited would you be to hear about a Linux server with an uptime of half a day? That really is the conversion factor.
The un-appreciated reality is that spacecraft software is believed correct by a combination of being very, very few LOC and *not actually executing for long enough to experience enough edge cases*. Speeding an OBC up to "normal speed" will expose far more software bugs in orbit, through coverage of edge cases.
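The conversion factor above is just arithmetic, and it's worth running the numbers (assuming, roughly, one instruction per cycle per core on the modern part):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Classic OBC: ~1 MIPS, single core, 15 years in orbit.
obc_insns = 1e6 * 15 * SECONDS_PER_YEAR

# Hypothetical modern box: 4 cores at 2.5 GHz, ~1 instruction/cycle/core.
modern_rate = 4 * 2.5e9             # instructions per second

# How long does the modern box take to execute the same instruction count?
equivalent_hours = obc_insns / modern_rate / 3600
# Roughly 13 hours - i.e. "15 years of flight heritage" is about half a
# day of execution at modern speeds.
```

Which is exactly the "Linux server with an uptime of half a day" comparison: the heritage argument measures instructions executed, not calendar years.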