
The predecessor to HAL 9000?
I'm sorry Dave but no HAL 9000 has ever made a mistake. We have a perfect record...bzzt...feh....zap...
SpaceX and HPE will put a modest little supercomputer into space next week to test how computer systems operate in extreme conditions. On Monday, August 14, HPE’s Spaceborne Computer will blast off to the International Space Station aboard a SpaceX CRS-12 rocket. It’s part of an experiment to examine if commercial off-the- …
Not quite how I'd have gone about things, but I assume the experts know what they are doing.
I'd have sent a box packed with many different sorts of chips running software to detect and quantify errors, with the aim of figuring out exactly how much ECC and self-validating software it takes to make sure a computer can operate reliably even with the radiation. Perhaps eventually involving having two less-than-reliable conventional processors operating in lock-step, with a third rad-hardened, very minimal chip constantly comparing them and initiating a reset every time they do something different. It'd probably weigh less than packing everything behind a big sandwich of plastic, lead and boron.
This isn't rocket science, even if it's being used in a rocket.
The lockstep+comparator idea doesn't really work for modern processors where cache plays a significant part in performance. E.g. there will be correctable cache errors ('soft' errors) from time to time, but they can't be guaranteed to be the same place and same time on each member of the set.
If the comparators are simple, e.g. relying on instruction-level and/or memory-access level comparison, a soft error that occurs on only one member of the set will cause different behaviour from the ones without the soft error, and therefore (in your proposal) cause a reset.
In the real world, fault tolerant systems such as Tandem NonStop (now of course HPE NonStop) nowadays use cache-based commodity processors, same as everyone else. For error detection they use rather specialised comparators that do the comparisons on the eventual IO accesses caused by the instructions. If the IO accesses differ, something went wrong.
There's more to it than that, but more than I have time to describe right now.
Lockstep on conventional processors doesn't work on Earth anymore, which is why it was dropped by the fault tolerant computing HPE NonStop team back during the transition between MIPS and Itanium. Modern processors just do too much internal error correction and have too much pulled into the SoC to give a boundary that can be checkpointed and subject to a majority vote. Itanium featured radiation hardened latches in its pipeline and a lockstep mode, but at the cost of running at a fraction of the speed of normal mode and you're paying for at least two if not three processors.
The NonStop team started with an Itanium design that moved the checkpoint voting to the memory subsystem with a hardware checker and memory replication between boxes over optical links, but eventually figured out how to do the whole thing in software, which in turn has allowed them to transition to standard x86 blades.
"Perhaps eventually involving having two less-than-reliable conventional processors operating in lock-step"
Can't vote with two systems, you'd never know which had an error because you wouldn't know the right answer. Use three and the two matching answers can be used. It's different to a normal cluster where you're only detecting failure since here you're also detecting subtle errors.
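To make the two-versus-three point concrete, here is a minimal sketch of a majority voter in Python. The function name and the idea of comparing plain output values are my own assumptions for illustration; a real system would compare something like the IO accesses mentioned above, not Python integers.

```python
from collections import Counter

def majority_vote(outputs):
    """Return (agreed_value, dissenting_indices), or raise if no strict majority."""
    counts = Counter(outputs)
    value, votes = counts.most_common(1)[0]
    if votes * 2 <= len(outputs):
        # A tie tells us something went wrong, but not which replica is at fault.
        raise RuntimeError("no majority: cannot tell which replica is wrong")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters

# Three replicas, one upset: the odd one out is identified and outvoted.
print(majority_vote([42, 42, 43]))

# Two replicas that disagree: the error is detected but cannot be located.
try:
    majority_vote([42, 43])
except RuntimeError as exc:
    print("two replicas:", exc)
```

With two replicas all you can do on a mismatch is reset both, which is exactly the behaviour the original proposal described.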
Solar activity is pretty low right now so things might be just fine. Not rad-hardening electronics is a road SpaceX has been down a couple of times. They ruined some returned experiments when the fridge on board a Dragon capsule got zapped and there was at least one other incident that I can't recall the details of off hand.
Been there, flown that, won the NASA prize.
Simple thermostat without a microprocessor: Temperature sensor, power MOSFET, comparator circuit, trim pot to set temperature, two resistors (inc. hysteresis)
Simple thermostat with a microprocessor: Temperature sensor, power MOSFET, PIC microcontroller. Plus it can do PWM with PID feedback, and soft start.
It's common in electronics to use microcontrollers for absolutely everything now because they are of near-negligible cost and can usually do the task of several more basic components.
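For anyone wondering what the hysteresis is for, here is a rough sketch of the on/off logic that the comparator-plus-resistors circuit implements in hardware. It is written in Python for readability only (firmware on a PIC would not be Python), and the setpoint, dead band and readings are made-up values.

```python
def thermostat_step(temp_c, heater_on, setpoint_c=20.0, hysteresis_c=0.5):
    """Return the new heater state for one temperature reading."""
    if temp_c < setpoint_c - hysteresis_c:
        return True            # clearly too cold: switch on
    if temp_c > setpoint_c + hysteresis_c:
        return False           # clearly too warm: switch off
    return heater_on           # inside the dead band: keep the current state

readings = [19.0, 19.8, 20.3, 20.6, 20.2, 19.7, 19.4]   # hypothetical samples
state = False
states = []
for reading in readings:
    state = thermostat_step(reading, state)
    states.append(state)
print(states)
```

Without the dead band the heater would chatter on and off every time the reading crossed the setpoint.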
Now, yes, you're going to need some well-built stuff to use while in transit between Earth and Mars, but is this also true for computers at rest, powered down, and packed up? Can ionizing radiation have deleterious effects for data or even hardware that isn't operating yet but will be? I would think this to be an interesting question as well as most of the computing power one would take to Mars wouldn't be in use during the trip, only once one arrives.
It takes a lot more energy to damage electronics that are powered off. If a powered off computer is permanently damaged on the trip to Mars, your astronauts are probably dead. I'd worry more about NAND, since it needs to preserve state, but error correction would presumably handle it. Probably you're going to mirror everything anyway, so that should account for the (perhaps unlikely?) case where a single energetic particle is traveling at just the right angle to upset more bits in the same word than ECC can correct.
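A toy illustration of "more bits in the same word than ECC can correct", using a Hamming(7,4) code in Python. This is only a sketch of the principle: it is not what any particular NAND or DRAM controller uses (real ECC memory typically uses wider codes that can at least detect a double error), and the function names are mine.

```python
def encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(codeword):
    """Flip the single bit the syndrome points at (if any), return the data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the suspect bit, 0 = clean
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
word = encode(data)

one_flip = list(word)
one_flip[4] ^= 1                      # a single upset
print(decode(one_flip) == data)       # corrected

two_flips = list(word)
two_flips[4] ^= 1
two_flips[5] ^= 1                     # two upsets in the same word
print(decode(two_flips) == data)      # silently "corrected" to the wrong data
```

The second case is why mirroring on top of ECC is not paranoid: the decoder has no way to know it has been handed two errors.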
Radiation inside the Van Allen belts is very low; except for solar flares it's a non-issue. About 99% of the total solar radiation is deflected by Earth's magnetic field. For Mars, it's another story -- it weighs in at nearly .7 sievert per week. For comparison, the ISS receives about 150 mSv **per year**. It's not a valid test because the environment isn't anything like it would be out there. Regular PCs are already on the ISS, with no real ill effect other than a few extra reboots here and there.
Yet another reason I don't think a manned mars mission is going to happen.
I'd like it to happen. I think it should happen. I wish it would happen. But in the end, it won't - because some group of national leaders is eventually going to have to look at the bill and realise that's a hell of a lot of money even by government standards. Especially as the public is going to insist upon bringing the astronauts back again afterwards.
Yet another reason I don't think a manned mars mission is going to happen.
I think it'll happen, but not for several decades yet, and not at anything like the scale those suicidal would-be colonist idiots imagine.
Since any interplanetary vessel is going to have to have a storm shelter for its crew (which most spaceship designers envisage being inside the ship's water tank), it makes sense for that to also be the location for the core computing systems (the water will be a handy coolant too).
And who among us hasn't ever dived into the server room to escape unwelcome visitors?
I'm not sure where you are getting your figures from. On the surface of Mars you are protected by the atmosphere, and Mars itself, so the radiation is similar to that on the ISS. You would also build a shelter from regolith, or situate it in a lava tube, which would give excellent shielding when you didn't need to be working on the surface or during solar events.
The journey there and back are another matter, but hopefully the trip will be under 3 months and the ship itself will provide some shielding. Overall we're talking risk of death by cancer increased by a few percent. If that bothers you, don't go.
Umm. Mars has no magnetic field and an atmospheric pressure about 1/160 that of Earth. The ISS "storm shelter" is about 0.5% of the Earth's atmosphere equivalent.
To get the equivalent protection of the Earth's atmosphere at ground level on Mars takes a layer of regolith about 3m thick.
As for where I got my information: this guy, who should be quite well informed on the subject.
I'm not going to watch a video. Curiosity has a device to measure radiation. It gives 0.67 mSv per day, which is about double the ISS exposure rate. Your figure of .7 Sv per week, or 100 mSv per day, is out by several orders of magnitude. Maybe you have your units wrong, or are confusing exposure during the journey with exposure once arrived. (If you missed a "milli" and confused weekly with daily, that would do it.)
http://www.sci-news.com/space/science-mars-radiation-measurements-surface-01629.html
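For anyone who wants to check the unit conversions being argued over, here is the arithmetic in Python. It only uses the three figures quoted in this thread (.7 Sv per week, 150 mSv per year, 0.67 mSv per day); the variable names are mine, and nothing here says which figure is right.

```python
MSV_PER_SV = 1000.0

claimed_mars_msv_per_day = 0.7 * MSV_PER_SV / 7   # ".7 sievert per week"
iss_msv_per_day = 150.0 / 365                     # "150 mSv per year"
curiosity_msv_per_day = 0.67                      # surface figure quoted above

print(f"claimed Mars rate:   {claimed_mars_msv_per_day:.0f} mSv/day")
print(f"ISS rate:            {iss_msv_per_day:.2f} mSv/day")
print(f"Curiosity / ISS:     {curiosity_msv_per_day / iss_msv_per_day:.1f}x")
print(f"claimed / Curiosity: {claimed_mars_msv_per_day / curiosity_msv_per_day:.0f}x")
```

On these numbers the Curiosity figure comes out somewhat under double the ISS rate, and the .7 Sv per week claim is roughly 150 times the Curiosity figure.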
On Earth we routinely simulate much of the space environment with one massively significant exception: Cosmic Rays, relativistic particles with extreme theory-breaking energies and unknown origin. We have some reasonable approximations that are a PITA to use at all, and impossible to use on whole systems, as they require de-lidding chips and exposing the naked silicon to heavy ion beams.
Cosmic Rays don't care about the Van Allen belts or Earth's magnetic field. But, thankfully, they are filtered quite nicely by Earth's atmosphere, converting into cascades of other relativistic particles that include muons and pions. These particles themselves have vanishingly short lifetimes when observed in the Lab, yet when coming from a Cosmic Ray cascade, they manage to live long enough to reach the Earth's surface, all due to their startlingly high relativistic speeds.
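A back-of-envelope version of that last point, for muons only, sketched in Python. The production altitude of 15 km and the speed of 0.999c are assumptions picked for illustration, and the calculation ignores energy loss on the way down, so treat the output as an order-of-magnitude picture and nothing more.

```python
import math

C = 299_792_458.0          # speed of light, m/s
MUON_LIFETIME = 2.197e-6   # mean muon lifetime at rest, s
ALTITUDE = 15_000.0        # assumed production altitude, m

def survival_fraction(beta, distance=ALTITUDE):
    """Fraction of muons surviving `distance` metres at speed beta*c."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    decay_length = gamma * beta * C * MUON_LIFETIME   # mean path before decay, m
    return math.exp(-distance / decay_length)

def survival_without_relativity(beta, distance=ALTITUDE):
    """Same calculation with time dilation switched off (gamma = 1)."""
    return math.exp(-distance / (beta * C * MUON_LIFETIME))

beta = 0.999   # assumed speed as a fraction of c
print(f"with time dilation:    {survival_fraction(beta):.2f}")
print(f"without time dilation: {survival_without_relativity(beta):.1e}")
```

Without time dilation a muon covers only about 660 m in one mean lifetime, so next to nothing would make it down from that height; with it, a decent fraction does.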
Cosmic Rays are the Hulk of radiation, and since we have no clue how to make them on Earth, if you want to expose your equipment to Cosmic Rays, you need to send that equipment above the Earth's atmosphere.
And not far above it either! LEO does just fine.
And for anyone who is a fan of big numbers: https://en.wikipedia.org/wiki/Oh-My-God_particle.
That's probably more computing power than the entire processing power of all the GNC systems of all LVs to date. The processors on Apollo were pocket calculator power, i.e. 32 KIPS; Shuttle GPCs started at 0.4 MIPS and were upgraded to 1 MIPS each. The ISS runs (IIRC) 40MHz 386s. The bigger Mars rovers run PowerPCs at around 200 MIPS (and $100K a board, hence the interest in OTS processing).
As for radiation, RAM started using on-chip ECC because of radioisotopes in the packaging material decades ago. They don't report statistics because a) it would tie up valuable pins and b) who cares, as long as the state read out is the same as the state read in.
Servers should have ECC for ram as standard, and logging processes as standard for SNMP (obviously the packet delay will be a bit of an issue).
Likewise "spinning rust" is AFAIK a lot more rad hard, but it induces motion in the structure unless you have pairs of contra-rotating disks to cancel those forces out. Sounds crazy, but despite its size the ISS is not actually attached to anything.
Obviously HPE are hoping a good result out of this will make them the go-to supplier for HPC systems, but getting hardware NASA certified is another matter, and you can bet it will have to be NASA certified if any kind of software is running that's mission critical and the mission is NASA funded.
IOW upgrading to new processors is usually a massive PITA, which is why space runs with hardware generations behind the SoA in processing power. SX accepts the systems will reset and is OK with that, but getting that accepted by NASA for ISS docking must have been a nightmare.
https://www.quora.com/What-are-computers-used-for-on-the-ISS
So there's three main US computers - of which one is considered Primary, one Backup and one Standby at any one time - and three main Russian computers which work simultaneously. These are accessed using laptops, seven US, seven Russian, running Linux. These systems govern the stuff you really don't want to go wrong.
Less critical stuff - inventory control, note taking, on board experiments, email etc - is handled by some Windows laptops, mainly Thinkpads as can be seen from photographs from onboard the ISS.
You're right, the ISS is full of laptops, Lenovo Thinkpads mainly. However these are FAR from considered "off the shelf". In one way or another, specialised Thinkpads have been flying to space since 1993 aboard STS-61. They're quite significantly modified however to meet stringent NASA requirements.
Here's an interesting story from a few years ago, posted to nasaspaceflight.com by one of the IBM project managers responsible for initially putting the Thinkpads on the shuttle.
https://forum.nasaspaceflight.com/index.php?topic=27043.0
TL;DR - The laptops on the ISS aren't "off the shelf" at all.
they have been going to that space station for flippin decades and must have a wealth of information from various ibm thinkpads and advent computers from dixons unless all they have been doing is taking golf clubs and tennis rackets for some sort of human interest angle to get on the news.
HPE has now owned SGI for long enough that all their best engineers will have left and the remaining ones will have been eliminated as redundancies. Therefore, all that's left is HPE engineers... which got rid of all their useful people throughout the dictatorship of the past 3 suits in charge.
I have some serious questions though.
1) If an HPE computer produces the wrong results due to random behavior... Is this considered a success or a failure?
2) If an HPE computer fails in space and support is needed, is the call routed through mission control first or does it go directly to India?
3) How will the cooling system impact the ISS? HPE last I checked only uses one model of fan and it's REALLY REALLY loud... on purpose because they think that if Ferraris sound faster because of how loud they are, then computers should too.
4) 56Gb/s interconnect? Wasn't this supposed to be a supercomputer? I buy old used 56Gb/s InfiniBand equipment for pennies on the dollar these days. Supercomputers should be running 10 times that by now. Or is this the HPE version where we sell yesterday's technology today?