Perhaps space eggheads should formally verify their programs.
NASA's CAPSTONE silence down to a software flaw
NASA has explained what caused communication issues with its CAPSTONE spacecraft: a bug in the code. CAPSTONE (Cislunar Autonomous Positioning System Technology Operations and Navigation Experiment) was launched atop a Rocket Lab Electron in June, and on July 4 the company's Photon spacecraft deployed CAPSTONE for a several …
COMMENTS
-
Friday 8th July 2022 16:25 GMT A Non e-mouse
Assuming you can formally prove the software*, all that does is show that the software matches the specifications. Not that the specifications are right in the first place.
[*] Formally proving software isn't easy. It's not as if you go Tools -> Prove in your IDE and get back a simple box saying "Proved Correct".
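To make the point concrete: over a small, finite input space you can check an implementation against a reference spec exhaustively. That is a brute-force stand-in for a proof (real formal tools use model checkers or theorem provers instead). The spec and both implementations below are hypothetical. An empty result means "matches the spec", which says nothing about whether the spec itself is right.

```python
from itertools import product

# Hypothetical spec, written as a reference function: the radio stays on
# unless the battery is critical. Whether that is the *right* rule is
# exactly what this check cannot tell you.
def spec_radio_on(battery_critical: bool, bad_command: bool) -> bool:
    return not battery_critical

# Implementation written independently of the spec.
def impl_radio_on(battery_critical: bool, bad_command: bool) -> bool:
    if battery_critical:
        return False
    return True

# A buggy implementation that also switches the radio off on a bad command.
def buggy_radio_on(battery_critical: bool, bad_command: bool) -> bool:
    return not (battery_critical or bad_command)

def check_exhaustively(spec, impl, domain):
    """Return every input where impl disagrees with spec (empty list = conforms)."""
    return [args for args in domain if spec(*args) != impl(*args)]

inputs = list(product([False, True], repeat=2))  # all 4 input combinations
print(check_exhaustively(spec_radio_on, impl_radio_on, inputs))   # []
print(check_exhaustively(spec_radio_on, buggy_radio_on, inputs))  # [(False, True)]
```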
-
Monday 11th July 2022 11:57 GMT Xalran
Sure, but it costs two arms, a leg, an eye, a kidney and a large bit of the liver...
In telecom 30 years ago, unit tests of each and every path in the code were the norm (and were mostly automated), run on testbeds (read: PSTN or MSC exchanges used as testbeds). Then a lighter 'Integration' campaign, which did include traffic overload tests, was done on the first exchange to get the new software (and possibly new hardware tied to the new software). And even then, funky, weird bugs still showed up months after the software was deployed everywhere.
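A toy version of "test every path": a hypothetical call-routing function with three branches, and one test case per branch. Path coverage exercises each branch at least once, but not every combination of inputs, which is one way bugs can still surface months later.

```python
def route_call(number: str, trunk_busy: bool) -> str:
    """Hypothetical exchange routing rule, used only to illustrate path coverage."""
    if not number.isdigit():
        return "reject"        # path 1: malformed dial string
    if trunk_busy:
        return "busy_tone"     # path 2: no free trunk
    return "connect"           # path 3: normal case

# One test case per path through route_call.
PATH_CASES = [
    (("12ab", False), "reject"),
    (("5551234", True), "busy_tone"),
    (("5551234", False), "connect"),
]

def run_path_tests():
    """Return a list of (args, expected, got) for every failing case."""
    failures = []
    for args, expected in PATH_CASES:
        got = route_call(*args)
        if got != expected:
            failures.append((args, expected, got))
    return failures

print(run_path_tests())  # []
```

Every path passes, yet a 200-digit string of zeros still takes the "connect" path; whether that is acceptable is a question the path tests never ask.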
The whole process took around two years: one year of development and testbed tests, 6 months of integration/commissioning on a new exchange, 3 months of local, on-call babysitting by $TELCO equipment builder techs, and 3 months to roll the new software out to the rest of the exchanges in a country.
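Adding up the phases confirms the "around two years" figure:

```python
# Phase durations in months, as listed in the comment above.
phases = {
    "development and testbed tests": 12,
    "integration/commissioning on a new exchange": 6,
    "on-call babysitting by vendor techs": 3,
    "rollout to the remaining exchanges": 3,
}
total_months = sum(phases.values())
print(total_months, total_months / 12)  # 24 2.0
```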
-
Friday 8th July 2022 15:46 GMT steamnut
Testing times..
It beggars belief that seemingly basic errors still happen today.
Surely the writers of the test harnesses should cover all possible states and use cases without exception? And then another team, one that only sees the requirements specification, should write a second test suite.
The cost of a lost mission makes a few software engineers' salaries chicken feed.
-
Friday 8th July 2022 16:06 GMT Andy Non
Re: Testing times..
I agree, but I wonder if it is sometimes the case that there are so many trillions of possible permutations and combinations of circumstances that it isn't possible to test for all of them. It's rather like the Swiss cheese model, where many different factors need to line up for a fatal error to occur:
https://en.wikipedia.org/wiki/Swiss_cheese_model
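A rough sketch of why exhaustive testing runs out of road: with n independent on/off conditions there are 2^n combinations. The 40 flags and 1,000 tests per second below are made-up numbers for illustration. The last lines read the Swiss cheese model numerically, assuming (as a simplification; the model itself is qualitative) that the layers fail independently, so the chance that all the holes line up is the product of each layer's miss probability. Those probabilities are invented too.

```python
import math

def state_count(boolean_flags: int) -> int:
    """Number of distinct combinations of independent on/off conditions."""
    return 2 ** boolean_flags

def years_to_test(cases: int, cases_per_second: float) -> float:
    """Wall-clock years to run every case at a fixed test rate."""
    return cases / cases_per_second / (365 * 24 * 3600)

# Illustrative numbers only: 40 independent flags, a rig running 1,000 cases/s.
n = state_count(40)
print(n)                                   # 1099511627776 (about 1.1 trillion)
print(round(years_to_test(n, 1_000), 1))   # 34.9

# Swiss cheese, assuming independent layers: each value is the (invented)
# probability that one safeguard misses a given fault.
layers = [0.01, 0.05, 0.1]
print(math.prod(layers))                   # roughly 5e-05: rare, but not zero
```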
-
Saturday 9th July 2022 21:07 GMT Anonymous Coward
Re: Testing times..
Possible errors in any system are subject to chaos theory: the slightest difference in a starting condition can produce a totally different result. I am always amazed by how many digital electronics people are ignorant of the potential for metastability conditions, particularly when an asynchronous signal is fed into a clocked gate.
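The usual defence against metastability is a synchronizer: pass the asynchronous signal through one or more extra flip-flops so it has time to settle before logic uses it. A commonly quoted first-order estimate of the mean time between failures is MTBF = exp(t_r / τ) / (T_w · f_clk · f_data), where t_r is the time allowed for resolution, τ and T_w are device constants, and f_clk and f_data are the clock and data rates. The sketch below plugs in invented device parameters (not from any datasheet) to show what one extra clock period of settling time does.

```python
import math

def synchronizer_mtbf(t_resolve_s, tau_s, t_window_s, f_clk_hz, f_data_hz):
    """First-order metastability estimate, in seconds:
    MTBF = exp(t_r / tau) / (T_w * f_clk * f_data)."""
    return math.exp(t_resolve_s / tau_s) / (t_window_s * f_clk_hz * f_data_hz)

# Invented, illustrative parameters.
tau = 100e-12      # device time constant: 100 ps
t_w = 100e-12      # metastability window: 100 ps
f_clk = 100e6      # 100 MHz clock (10 ns period)
f_data = 1e6       # asynchronous input toggling at 1 MHz
period = 1 / f_clk

# Signal used almost directly: only 1 ns of slack left to resolve.
direct = synchronizer_mtbf(1e-9, tau, t_w, f_clk, f_data)
# One extra flip-flop stage adds a full clock period of resolution time.
synced = synchronizer_mtbf(1e-9 + period, tau, t_w, f_clk, f_data)

print(f"direct: {direct:.3g} s, with synchronizer: {synced:.3g} s")
```

Because the MTBF is exponential in t_r, each extra stage multiplies it by exp(T_clk / τ), here exp(100): a failure every couple of seconds turns into one that, for these made-up numbers, would practically never happen.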
-
Friday 8th July 2022 19:38 GMT A Non e-mouse
Re: How to write space software
To me, the biggest takeaway is that they agree what the software is supposed to do from the outset, before they even think of writing code. How many of us have been involved with software projects where the specs keep on changing? If you can get the specs agreed, everything becomes so much easier.
(The "don't blame the person, blame the process" culture is important too: it allows everyone to learn from mistakes.)
-
Saturday 9th July 2022 02:35 GMT Bitsminer
Re: How to write space software
There is usually a "V&V" process invoked as part of system development.
V for verification: did they build it right?
V for validation: did they build the right system?
Failsafes such as reacting when the radio sees no activity for a few hours, or when attitude control is inactive, are probably lessons learned long ago.
Space is hard.
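The "no radio activity for a few hours" kind of failsafe is typically a command-loss timer. A minimal sketch, assuming the current time is passed in as seconds from some monotonic clock; the class name and the four-hour timeout are hypothetical, not CAPSTONE's actual design or values.

```python
from dataclasses import dataclass

@dataclass
class CommandLossTimer:
    """Hypothetical failsafe: if no valid uplink arrives within timeout_s,
    request a reset of the comms chain."""
    timeout_s: float
    last_uplink_s: float = 0.0

    def on_valid_uplink(self, now_s: float) -> None:
        # Any valid command from the ground restarts the countdown.
        self.last_uplink_s = now_s

    def should_reset(self, now_s: float) -> bool:
        # True once the silence has lasted at least timeout_s.
        return now_s - self.last_uplink_s >= self.timeout_s

timer = CommandLossTimer(timeout_s=4 * 3600)        # four hours, chosen arbitrarily
timer.on_valid_uplink(now_s=1_000)
print(timer.should_reset(now_s=1_000 + 3 * 3600))   # False
print(timer.should_reset(now_s=1_000 + 5 * 3600))   # True
```

Passing the time in, rather than reading a clock inside the class, keeps the logic easy to exercise on the ground.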
-
Saturday 9th July 2022 21:14 GMT Anonymous Coward
Re: How to write space software
My friends often laugh at me for having at least a Plan B whenever I do anything. My career in IT taught me that you cater for the specific things you know can go wrong, and also try to handle the contingency of a generic failure caused by an unexpected condition: what Donald Rumsfeld correctly called "the unknown unknowns".
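In code, the same habit looks like targeted handlers for the failures you expect, plus a catch-all that drops into a safe state for everything else. A sketch with hypothetical names; catching `Exception` rather than `BaseException` deliberately lets things like `KeyboardInterrupt` and `SystemExit` through.

```python
import logging

def enter_safe_mode(reason: str) -> str:
    """Hypothetical generic fallback: log the surprise and do the minimum safe thing."""
    logging.warning("entering safe mode: %s", reason)
    return "safe_mode"

def run_with_fallback(task):
    """Plan A is task(); known failures get a targeted Plan B;
    anything unanticipated falls through to a generic safe state."""
    try:
        return task()
    except TimeoutError:
        return "retry_later"            # a failure we planned for
    except Exception as exc:            # the "unknown unknowns"
        return enter_safe_mode(repr(exc))

def times_out():
    raise TimeoutError("link timed out")

def surprise():
    raise KeyError("never seen before")

print(run_with_fallback(lambda: "done"))  # done
print(run_with_fallback(times_out))       # retry_later
print(run_with_fallback(surprise))        # safe_mode
```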
-
Tuesday 12th July 2022 10:17 GMT Peter Gathercole
Re: How to write space software
I always worked on three plans. Plan A, contingency plan B where there was a chance that the work could still be completed with some additional steps, and the back-out plan.
Of course, where I had a problem, there was always a conflict between plan B and the back-out plan. Where you have a time-critical service, the service managers don't really like using the contingency plan if it eats into the time necessary to restore the system to its previous condition before the work.
This is not quite so easy when your asset is in a remote (in this case, a really remote) location.
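The conflict between plan B and the back-out plan boils down to a simple rule: only try the contingency if, should it fail too, there is still enough of the window left to restore the previous state. A sketch with hypothetical durations in minutes; the "escalate" outcome is an assumption for the case where even a back-out no longer fits.

```python
def choose_next_step(minutes_left: int, plan_b_minutes: int, backout_minutes: int) -> str:
    """Hypothetical change-window rule for when plan A has failed."""
    if plan_b_minutes + backout_minutes <= minutes_left:
        return "plan_b"      # room to try the contingency AND still back out
    if backout_minutes <= minutes_left:
        return "back_out"    # only enough time to restore the old state
    return "escalate"        # not even enough time to restore: window is blown

print(choose_next_step(minutes_left=120, plan_b_minutes=45, backout_minutes=60))  # plan_b
print(choose_next_step(minutes_left=90, plan_b_minutes=45, backout_minutes=60))   # back_out
print(choose_next_step(minutes_left=30, plan_b_minutes=45, backout_minutes=60))   # escalate
```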
-
Sunday 10th July 2022 04:59 GMT MachDiamond
Re: Whats Happens If
"Whats Happens If
The Fault Detection System develops a fault?"
You jest, but when I was working at a job fixing audio equipment, too many times the protection circuitry in a power amp was the cause of the problem. While blowing up speakers isn't good, they were often cheaper to fix than an amplifier with some weird problem.
-
Friday 8th July 2022 23:10 GMT Anonymous Coward
The software recovered by itself
What I can glean from the primary sources (Advanced Space and NASA Ames):
The problem started with a ground controller sending a misformatted query to CAPSTONE.
The radio software detected this and shut down which was probably intended.
The fault detection software didn't perceive the radio shutdown as a fault, although it should have. It is unclear whether the radio software told it about the fault, so the misprogramming could be in either the radio software or the fault software.
The flight software eventually cleared the fault, probably because it couldn't contact Earth, probably by performing the reboot that the fault software was expected to trigger, and probably via the fault software itself. While doing that, it kept CAPSTONE on course.
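Following this reading of events, a fault monitor that treats a self-initiated radio shutdown as a fault might look like the sketch below, with a "can't contact Earth" timeout as a second, independent net. Everything here (state names, actions, the four-hour default) is hypothetical and not taken from the CAPSTONE flight software.

```python
from enum import Enum

class RadioState(Enum):
    ON = "on"
    OFF_COMMANDED = "off_commanded"  # ground deliberately switched it off
    OFF_SELF = "off_self"            # radio shut itself down (e.g. on bad input)

def fault_check(radio: RadioState, hours_since_uplink: float, timeout_h: float = 4.0):
    """Hypothetical fault-detection rule. Returns a recovery action, or None."""
    if radio is RadioState.OFF_SELF:
        return "power_cycle_radio"   # first net: flag the shutdown immediately
    if hours_since_uplink >= timeout_h:
        return "reset_comms"         # second net: too long without hearing from Earth
    return None

print(fault_check(RadioState.OFF_SELF, hours_since_uplink=0.5))      # power_cycle_radio
print(fault_check(RadioState.ON, hours_since_uplink=6))              # reset_comms
print(fault_check(RadioState.OFF_COMMANDED, hours_since_uplink=1))   # None
```

If the first check were missing, as the comment suggests may have happened, the timeout branch would still eventually recover the link, just hours later.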
Note that the boffins at mission control didn't do anything to make any of this happen (other than fat fingering their query).
All in all this is an example of good software with built in redundancies and recovery plans baked in.
What they know is in the middle of the article:
https://advancedspace.com/capstone-tcm1-success/