
A candidate...
...for a future edition of El Reg's 'This Old Box', perhaps?
Commander Chris Ferguson and pilot Doug Hurley have fixed space shuttle Atlantis' General Purpose Computer (GPC) 4, which clapped out last night, requiring transfer of its duties to another of the shuttle's quintet of GPCs. The pair have reloaded software into the unit and it has "been added to the common set of GPCs and is …
Just upvoted your post, with enthusiasm. "That's affirm", as they say in Mission Control.
Man, that'd be the mother of all "This Old Box" columns.
While you're at it, Vultures: if you're going to do a "This Old Box" column on the Space Shuttle GPC, then you absolutely _have_ to do one on the Apollo computers: the Guidance & Navigation computer aboard the CM, as well as the LM onboard computer. There are some interesting stories about how the fussy LM computer aboard Apollo 14 nearly prevented them landing until they figured out how to hack it -- yeah, that's right, hack it -- in a procedure involving geeks on the ground hacking a backup LM computer in Houston and radioing up the hacks to Shepard and Mitchell in lunar orbit. Talk about "BOFH And Proud".
http://www.universetoday.com/42045/apollo-14/ :
"After separating from the command module in lunar orbit, the LM Antares also had two serious problems. First, the LM computer began getting an ABORT signal from a faulty switch. NASA believed that the computer might be getting erroneous readings like this if a tiny ball of soldering material had shaken loose and was floating between the switch and the contact, closing the circuit. The immediate solution was to tap on the panel next to the switch. This did work briefly, but the circuit soon closed again. If the problem occurred after the descent engine fired, the computer would think the signal was real and would initiate an auto-abort, causing the Ascent Stage to separate from the Descent Stage and climb back into orbit. NASA scrambled to find a solution, and determined the fix would involve reprogramming the flight software to ignore the false signal..."
http://en.wikipedia.org/wiki/Apollo_14 :
"After separating from the command module in lunar orbit, the LM Antares also had two serious problems. First, the LM computer began getting an ABORT signal from a faulty switch. NASA believed that the computer might be getting erroneous readings like this if a tiny ball of solder had shaken loose and was floating between the switch and the contact, closing the circuit. The immediate solution—tapping on the panel next to the switch—did work briefly, but the circuit soon closed again. If the problem recurred after the descent engine fired, the computer would think the signal was real and would initiate an auto-abort, causing the Ascent Stage to separate from the Descent Stage and climb back into orbit. NASA and the software teams at MIT scrambled to find a solution, and determined the fix would involve reprogramming the flight software to ignore the false signal. The software modifications were transmitted to the crew via voice communication, and Mitchell manually entered the changes (amounting to over 80 keystrokes on the LM computer pad) just in time."
http://history.nasa.gov/afj/compessay.htm :
From the Apollo Flight Journal, an overview of Apollo on-board computer systems. No wussy-assed GUI OSes for the old Right Stuff boys.
http://www.universetoday.com/42045/apollo-14/ ...again, from their overview of Apollo 14 computer and landing radar issues. As usual, an emergency software patch introduces new bugs. Sound familiar...?
"Another problem occurred during the powered descent. The LM radar altimeter failed to lock automatically onto the moon’s surface, depriving the navigation computer of vital information on the vehicle altitude and groundspeed. This was an unintended consequence of the software patch. After the astronauts cycled the landing radar breaker, the unit successfully acquired a signal near 15,000 m. Just in the nick of time. Shepard then manually landed the LM closer to its intended target than any of the other six moon landing missions."
By comparison, it sounds like the Atlantis crew had it pretty easy.
Why would anyone even think that slapping a random chip up there would be a good idea? Yeah, for everyday entertainment, or casual use, or non-essential admin work, but for life-critical functions?
Hell, they've *already* got four of the things doing nothing but double-checking each other's calculations, and it's an incredibly hostile environment (think what happens if there's a tiny bit of condensation - you can't just open a window, not to mention flying through varying EM fields, cosmic rays, and potentially unstable power - and tech support is several months away). By comparison, the liftoff was nothing.
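To make the "double-checking each other's calculations" idea concrete, here's a toy C sketch -- nothing to do with the actual flight code -- of four redundant channels computing the same command and flagging any channel that disagrees with the majority:

#include <stdio.h>

#define CHANNELS 4

/* Return the value most channels agree on. */
static int majority_value(const int cmd[CHANNELS]) {
    int best = cmd[0], best_count = 0;
    for (int i = 0; i < CHANNELS; i++) {
        int count = 0;
        for (int j = 0; j < CHANNELS; j++)
            if (cmd[j] == cmd[i]) count++;
        if (count > best_count) { best_count = count; best = cmd[i]; }
    }
    return best;
}

int main(void) {
    int cmd[CHANNELS] = { 42, 42, 41, 42 };   /* channel 2 has wandered off */
    int agreed = majority_value(cmd);
    for (int i = 0; i < CHANNELS; i++)
        if (cmd[i] != agreed)
            printf("channel %d disagrees (%d vs %d) -- flag it as failed\n", i, cmd[i], agreed);
    return 0;
}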
I'd much rather they were using these old clunkers, presumably with chips that have 30+ years of errata, billions of hours of real-world testing in commercial and industrial environments and production lines established for decades, than trying to slap an Intel with an undiscovered FDIV bug up there. Plus the old chips were predictable to an extreme degree (down to the individual cycle). Modern chips are multi-core, have vast caches, unpredictable bus interactions and all sorts of problems. That's not what you want. And 1.4MHz is more than fast enough for doing anything critical; the compute time is going to be vastly outweighed by the time it takes for that action to actually have a physical effect.
"But will it play Crysis?"
They're in space! Why would they want to play First Person Shooter #378? SPACE!
(Although I'm sure coming generations, should they get the chance to orbit the planet, will be more interested in checking their "Facebook 3" or "Google++" feeds than actually taking a moment to appreciate the view or the achievement or anything like that.)
They are still making RCA 1802s for certain military applications, and they were originally designed in 1976. Of course, they were the first real microprocessors that were radiation- and ESD-hardened.
Of course it's not fast, but it's much more likely to continue working in conditions that would cause most other processors to just die instantly.
"and production lines established for decades,"
Not really. But the instruction set is more or less a stock S/360 mainframe, and AFAIK they use *standard* mil-spec TTL chips, so you can repair it if you have such parts (eBay is NASA's friend as well).
The *real* cost is in all the certification tests to prove that it works in this environment. In terms of gate count you could probably put the whole thing on 1 ASIC (*including* the RAM even at mil spec gate densities).
Embedded designers usually work on the principle of fast enough to get the job done.
BTW these are the *upgraded* GPCs from the mid '90s, with DRAM (replacing core store) and the whole thing (CPU + *monster* I/O subsystem) in one box instead of two.
"If it ain't broke, don't fix it" is essentially the safest thinking here - fine, I understand the compulsion in that.
However, that processing speed and memory requirement are easily achievable in a single chip, which is surely going to be the most stable possible design, both in terms of mechanical vibration and in allowing anti-radiation casings without a huge weight penalty.
In fact, the main radiation defense described in the NASA article is a "memory scrubber"; in other words, it is implemented in software and/or the memory controller, meaning they deliberately accept exposure to radiation and then correct the effects.
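For anyone wondering what a memory scrubber actually does, here's a bare-bones C sketch of the idea: walk memory in the background, read each word through the ECC logic, and write back corrected values so single-bit flips never accumulate into uncorrectable errors. The ecc_read_and_correct() hook is an invented stand-in for whatever the real memory controller exposes, not anything NASA-specific.

#include <stddef.h>
#include <stdint.h>

/* Hypothetical ECC hook: returns the corrected word and reports whether a fix was needed. */
extern uint32_t ecc_read_and_correct(volatile uint32_t *addr, int *was_corrected);

void scrub_pass(volatile uint32_t *base, size_t words, unsigned *corrections) {
    for (size_t i = 0; i < words; i++) {
        int fixed = 0;
        uint32_t good = ecc_read_and_correct(&base[i], &fixed);
        if (fixed) {
            base[i] = good;        /* write back the corrected value */
            (*corrections)++;      /* tally for health monitoring */
        }
    }
}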
It should be possible to recreate the system in its 'built for purpose' function but utilising some modern advantages in form. I'll send them an email so they can get on with it in time for the next one.
Oh. Hang on...
"no one straps the latest-and-greatest desktop computer..."
Surely there's a little daylight between "the latest-and-greatest desktop computer" and a computer with the processing power and memory of a circa-1980s IBM 286? Not that it matters now I suppose *sob* and it is simply representative of the shuttle's late-70s construction. But it is promising in a way (for the likes of Elon Musk), that one could replace it with one solid-state smartphone containing fifty thousand times more processing power (and an ability to play mp3s so as not to require Elton John to pipe his music up to the astronauts live) and so much smaller and lighter that one could insulate it enough to enable it to survive conditions that would turn the astronauts into reddish mush.
"But it is promising in a way (for the likes of Elon Musk), that one could replace it with one solid-state smartphone containing fifty thousand times more processing power (and an ability to play mp3s so as not to require Elton John to pipe his music up to the astronauts live) and so much smaller and lighter that one could insulate it enough to enable it to survive conditions that would turn the astronauts into reddish mush."
Do you really think they use this gear to play music with? Besides, astronauts have taken more modern kit into space routinely, and if you look into the early Shuttle missions you'll see kit that blew away contemporary microcomputers more widely used on Earth. For example:
http://en.wikipedia.org/wiki/GRiD_Compass
The gear in question, however, absolutely has to function with complete reliability. It can't crap out with a blue screen halfway to orbit: such situations are hazardous enough already.
Why do critical safety and avionics systems need to play MP3s? Why do they need 50k times more processing power? The system has proven quite adequate for surface to orbit (and back) travel many, many times. Upgrading would be frivolous, pointless and potentially hazardous; there's nothing to gain and everything to lose.
Astronauts can and do use significantly more modern and powerful computers. I doubt they're using flight control systems to send tweets.
Why spend millions environmentally testing and installing something when what they have does the job just fine?
Also: Denser components and more power = more chance of getting zapped by a cosmic ray. It's an occasional problem here on Earth, and I imagine that it's a more likely event up there.
If it ain't broke, don't fix it.
You're comparing apples with oranges there - talking about size and then comparing it with the criticality of being hit.
Everything on the shuttle's GPCs is critical as it is designed for purpose. NASA know that and designed in a system to bypass the inevitable damage. My thought about re-packaging the GPCs into a smaller form improves the situation by being harder to hit. As a re-package, it should include the "memory scrubber" already present to deal with those unlucky whacks. The risk is reduced and the countermeasure is just as effective. Then you can go on to all the advantages (increased supplies/cargo payload, etc), perhaps even ease of replacement, which reminds me...
I note you have switched to a material argument, rather than continuing with power or size. If you have any specific data about how easily the state of any given material is changed by contact with forms of radiation commonly found during shuttle orbits, please do point them out as it's the only thing I'd like tidied up.
I think it's absolutely bloody fantastic.
Lesson in reliability and code efficiency. Computer absolutely fit-for-purpose. Doesn't play games, surf the web, let you watch porn. It flies the shuttle, exactly what it was designed for.
Remember Apollo 11? How much "intelligence" did that machine have? Nowadays we slide a gig of RAM into our machines and think nothing of it. They had "core rope".
Times change, but achievements do not.
I'm pretty sure this was as robust a piece of 80s tech as they could get, but considering that computing has progressed by a number of orders of magnitude since then, one could surely develop a similar design today in a much smaller package, complete with the necessary redundancies to account for EMI.
One of the developers kindly sent me his personal commentary on one of the few books written on it.
36KB of "Woven rope" ROM with 1KB of RAM implemented by ultrasonic delay lines triplicated and majority voted for reliability in a package weighing roughly 90lb. I also thought the electrochromic displays on the instruction panel were pretty cool, but they never caught on.
Increasing the instruction set size by deliberately causing a math overflow on a variable then *executing* it was my second actual encounter with the idea of self-modifying code (the first was the Bell Labs Blit terminal). I'm not sure what other sacred cows of modern software development MIT slaughtered to get the job done.
Its processing speed was *literally* (at 32 KIPS) the same as a modern pocket calculator.
Slow by today's standards if you're talking about pure MIPS, but with a DECENT timeshare OS like MTS (Michigan Terminal System), which was the basis of computing at my alma mater (no, it wasn't Univ. of Michigan, it was Wayne State U., a few miles from Ann Arbor), it could support hundreds, if not thousands, of simultaneous users all beavering away in FORTRAN G or H, Assembler/360 or even (shudder) COBOL, APL and a few other weird languages. I did physics simulations on one of these beasts, developing in FORTRAN G because it was cheaper to run, then switching to FORTRAN H for the actual simulation runs. No doubt the simulation would run much faster on my laptop, but my laptop can only support one user at a time (even under Linux, due to lack of hardware support for multiple terminals), not hundreds.
Reliability and uptime in "5-nines" territory and that was just the "ordinary" commercial model.
I think that pretty much sums up the good and bad in the article.
Good: 40 years ago a dream team of the world's best engineers did something really amazing.
Bad: 40 years later they are all retired or dead, and the nation is now so lame that it may never do anything that great again.
It's good enough to get the job done.
Which for hard core embedded system types (it fails, people die) is pretty much the key issue.
The 5 GPCs also communicate with 25 Multiplexer/Demultiplexers (MDMs). These each have 16 slots for I/O boards, but it's not clear if they can pre-filter the data so the GPCs only have to deal with "interesting" data. I've never found a reference which says how many of them are fully loaded either (these boxes are pretty heavy). Honeywell (who make them) do a range of ISS hardware with boxes holding 4, 8, 12 and 16 boards, but I'll bet they don't fit.
The Shuttle was one of the first to use the 1553B LAN standard. The 25 links clock at just about 1Mb/s each. The later 1773B uses fibre-optic cable and could operate as a drop-in replacement or use a high-speed mode of 20Mb/s (it should be more lightning-resistant; lightning has been one of the reasons for launch scrubs).
The whole thing could be done in a single ASIC, even given that MilSpec hardware is typically 1-2 generations behind commercial density, *but* they live in fear of *unreported* changes in the manufacturing process (gate-array cell design or fab sequence changing in small but *critical* ways, rendering the new stuff more sensitive to latch-up, single-event upset, etc.) taking out the whole unit at a critical moment.
GPCs are programmed in a high-level language called HAL/S, a language used nowhere else (a bit like the Shuttle's aircon fluid, but rather less toxic), which is a pity, as it allows entry of subscripted and superscripted variables on a standard terminal. More usefully, it also had matrix and vector data types and functions (handy if you do a lot of navigation, or computer graphics), along with real-time scheduling functions. Quite nice for the early 1970s.
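HAL/S itself is long dead outside the Shuttle, but the appeal of built-in MATRIX and VECTOR types is easy to show: as I understand it, a guidance programmer could write a matrix-vector product directly as an expression, whereas a rough C equivalent (types and names below are purely illustrative) has to spell it out:

typedef struct { double x[3]; } vec3;
typedef struct { double m[3][3]; } mat3;

/* Transform a vector by a matrix: the kind of one-liner HAL/S gave you for free. */
vec3 mat_vec_mul(const mat3 *m, const vec3 *v) {
    vec3 out = { { 0.0, 0.0, 0.0 } };
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            out.x[i] += m->m[i][j] * v->x[j];
    return out;
}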
The software used to be loaded in phases (depending on flight stage) from a reel-to-reel tape drive, but this went solid-state about 10-20 years ago.
Shuttle software development (done by what was IBM Federal Systems in Houston) was *the* model for the CMU Software Engineering Institute's CMM standard for rating how good software teams are. They also built the software for Skylab. They did not seem to use CASE, and their key technique seemed to be the "structured walkthrough". It is estimated their code cost 10x the average cost of writing a line of code in the US. A lesson for all programmers, perhaps?
I think I recall reading in Siddiqi's "Sputnik and the Soviet Space Challenge" that in the seventies, the Soviets managed to save something over a tonne of mass on their Soyuz launchers by scrapping all the old analog electronics and control systems and replacing them with digital, so there are two sides to holding on to proven tech.