NASA's CAPSTONE silence down to a software flaw

NASA has explained what caused communication issues with its CAPSTONE spacecraft: a bug in the code. CAPSTONE (Cislunar Autonomous Positioning System Technology Operations and Navigation Experiment) was launched atop a Rocket Lab Electron in June and on July 4 the company's Photon spacecraft deployed CAPSTONE for a several …

  1. Anonymous Coward

    Perhaps space eggheads should formally verify their programs.

    1. Greybearded old scrote Silver badge
      1. Mike 125

        Knuth proved his code correct.

        He knew that trying it doesn't prove the absence of bugs, merely their presence.

    2. A Non e-mouse Silver badge

      Assuming you can formally prove the software*, all that does is show that the software matches the specifications. Not that the specifications are right in the first place.

      [*] Formally proving software isn't easy. It's not as if, in your IDE, you go Tools -> Prove and get a simple box back saying "Proved Correct"

      1. stiine Silver badge

        It doesn't matter if the software is perfect; hardware can have issues, too. I remember a router whose ASIC added the length of the padding bytes to the length of the data field of the packet if it was of a particular type.

    3. StrangerHereMyself Silver badge

      Unit test, unit test and unit test.

      And integration tests and end-to-end tests to see the results of every possible scenario you can come up with, no matter how far-fetched.

      Only then will you be able to sleep soundly as a software engineer.

      1. Anonymous Coward

        There are a finite number of things that you can think of that could go wrong. There are an infinite number of things that could go wrong.

        As Murphy's Law states "Anything that can go wrong - will go wrong" - and Sod's Law adds the rider "...at the worst possible moment"

        1. swm

          > As Murphy's Law states "Anything that can go wrong - will go wrong" - and Sod's Law adds the rider "...at the worst possible moment"

          And Roget's corollary says, "Murphy was an optimist."

        2. StrangerHereMyself Silver badge

          You'd be surprised how many software engineers fail to even think through basic happy-flow scenarios. If you have unit tests you can at least point these out to them.

      2. Xalran

        Sure, but it costs two arms, a leg, an eye, a kidney and a large bit of the liver...

        In telecom 30 years ago, unit tests of each and every path in the code were the norm (mostly automated) on testbeds (read: PSTN or MSC exchanges used as testbeds). Then a lighter 'Integration' campaign, including traffic overload tests, was done on the first exchange with the new software (and eventually the new hardware tied to it). Even then, funky, weird bugs still showed up months after the software was deployed everywhere.

        The whole process took around two years: one year of development and testbed tests, 6 months of integration/commissioning on a new exchange, 3 months of local, on-call babysitting by $TELCO equipment builder techs, and 3 months to roll the new software out to the rest of the exchanges in a country.

    4. Anonymous Coward

      There's a proverb in my language, which translates like this.

      When an accident happens at sea, the people on land are very wise.

    5. MachDiamond Silver badge

      It is tested and tested and tested, which is why it's so confounding that so much actually does go wrong. I've got all sorts of consumer electronics that have never had an issue as long as I've owned them. They're much less expensive and higher-performing, too.

  2. elregidente

    > After all, in space, no one can hear you blue screen.

    Ooooooooooooooh that deserves a comment :-)

    +1 thumbs up

    New to me :-)

    1. Anonymous Coward

      "In space, no one can hear a guru meditation."

  3. steamnut

    Testing times..

    It beggars belief that seemingly basic errors still happen today.

    Surely the writers of the test harnesses should cover all possible states and use cases without exception? And then another team, that only has sight of the requirements specification, writes another testing suite.

    The cost of a lost mission makes a few software engineers' salaries chicken feed.

    1. Andy Non Silver badge

      Re: Testing times..

      I agree, but I wonder if it is sometimes the case that there are so many trillions of possible permutations and combinations of circumstances that it isn't possible to test for all of them. It's rather like the Swiss cheese model, where many different factors need to line up for a fatal error to occur:

      https://en.wikipedia.org/wiki/Swiss_cheese_model
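To put a rough number on the permutations point, here is a minimal sketch. Every figure in it is a made-up assumption (independent on/off conditions, a hypothetical test rig running a million cases per second); none of it comes from CAPSTONE. The function name `exhaustive_test_years` is likewise just for illustration.

```python
# Hypothetical figures for illustration only; none come from CAPSTONE.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def exhaustive_test_years(binary_conditions: int, tests_per_second: int) -> float:
    """Years needed to try every combination of independent on/off conditions."""
    combinations = 2 ** binary_conditions
    return combinations / tests_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (20, 40, 60):
        years = exhaustive_test_years(n, tests_per_second=1_000_000)
        print(f"{n} conditions: {2 ** n:,} combinations, {years:.3g} years")
```

Forty on/off conditions already give about 1.1 trillion combinations, and each added condition doubles the count, which is why exhaustive testing stops being an option quite early.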

    2. A Non e-mouse Silver badge
      Mushroom

      Re: Testing times..

      It's a technology demonstrator. They're trying new stuff. Sometimes new stuff doesn't work as planned.

    3. fidodogbreath

      Re: Testing times..

      > The cost of a lost mission makes a few software engineers' salaries chicken feed.

      No cost is 'chicken feed' to a corporate bean counter.

      1. Andy Non Silver badge

        Re: Testing times..

        Nowadays, the cost of chicken feed isn't chicken feed either!

    4. Flocke Kroes Silver badge

      Re: Cost of a lost mission

      This is a really cheap mission, intended to test software that will be used with multiple really, beyond-ridiculously overpriced missions ... some with astronauts.

    5. StrangerHereMyself Silver badge

      Re: Testing times..

      Uhmm, no, as this was a very low-cost mission. The software development was probably outsourced to India for $6 an hour so as to be on par with the Calamity Capsule.

    6. Anonymous Coward

      Re: Testing times..

      Possible errors in any system are subject to chaos theory. The slightest difference in a starting condition can produce a totally different result. I am always amazed by how many digital electronics people are ignorant of the potential for metastability conditions, particularly when an asynchronous signal is fed into a clocked gate.

    7. Ace2 Silver badge

      Re: Testing times..

      The Apollo 11 lander never once took off from the surface of the moon - until the actual mission. The Webb telescope mirror never once unfurled in a zero G vacuum - until the actual mission. Some things can’t be tested.

  4. Zenubi

    How to write space software

    Specifically the shuttle code. Very interesting article.

    https://www.fastcompany.com/28121/they-write-right-stuff

    1. A Non e-mouse Silver badge

      Re: How to write space software

      To me, the biggest take away is that they agree what the software is supposed to do from the outset, before they even think of writing code. How many of us have been involved with software projects where the specs keep on changing? If you can have agreed specs, it makes it so much easier.

      (The "don't blame the person, blame the process" culture is important too: It allows everyone to learn from mistakes)

      1. Bitsminer Silver badge

        Re: How to write space software

        There is usually a "V&V" process invoked as part of system development.

        V for verification: did they build it right?

        V for validation: did they build the right system?

        Failsafes that trigger when, say, the radio sees no action for a few hours or attitude control goes inactive are probably lessons learned long ago.

        Space is hard.

        1. Anonymous Coward

          Re: How to write space software

          My friends often laugh at me for having at least a Plan B when I do anything. My career in IT taught me that you cater for the specific things you know can go wrong, and also try to handle the contingency of a generic failure caused by an unexpected condition. What Donald Rumsfeld correctly called "The unknown unknowns".

          1. Peter Gathercole Silver badge

            Re: How to write space software

            I always worked on three plans. Plan A, contingency plan B where there was a chance that the work could still be completed with some additional steps, and the back-out plan.

            Of course, where I had a problem, there was always a conflict between plan B and the back-out plan. Where you have a time-critical service, the service managers don't really like using the contingency plan if it eats into the time necessary to restore the system to its previous condition before the work.

            This is not quite so easy when your asset is in a remote (in this case, a really remote) location.

      2. Alan Brown Silver badge

        Re: How to write space software

        "To me, the biggest take away is that they agree what the software is supposed to do from the outset"

        THIS is what software architecture is about and always has been

        Approaching it any other way is a recipe for failure

  5. Jan K.

    Inload? Outload??

    Since up and down really don't make any sense out there, shouldn't the datastreams to and from spacecraft more correctly be called "inloads" and "outloads"?

    1. Spherical Cow Silver badge

      Re: Inload? Outload??

      Up and down don't make any sense on Earth either when it comes to data transfer. If I have a PC on a hill and I transfer a file to a server at sea level, did I upload it or download it?

      1. This post has been deleted by its author

        1. Benegesserict Cumbersomberbatch Silver badge

          Re: Inload? Outload??

          That is correct... from a certain point of view.

        2. John Brown (no body) Silver badge

          Re: Inload? Outload??

          You have a Klein bottle. Are you ingressing or egressing?

        3. Anonymous Coward

          Re: Inload? Outload??

          Apparently showman Barnum came up with a way to get people to pass smoothly through a series of exhibits. The final sign said "To the Egress".

  6. The Oncoming Scorn Silver badge
    Coat

    Whats Happens If

    The Fault Detection System develops a fault?

    1. Spherical Cow Silver badge
      Joke

      Re: Whats Happens If

      There's a system checking for those faults which is in turn checked by another system looking for faults... it's FDSs all the way down ;-)

      1. TJ1
        Joke

        Re: Whats Happens If

        That's due to those Turtles - there is no evidence that any Turtle ever wrote a software bug!

    2. stiine Silver badge

      Re: Whats Happens If

      Read Douglas Adams' "Mostly Harmless". I think the computer blacked out.

      1. TRT Silver badge

        Re: Whats Happens If

        The Grebulons?

    3. Anonymous Coward

      Re: Whats Happens If

      That's the case where a software error contingency handler triggers the same error. Usually it recurses into a loop, possibly ending in total resource depletion.
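One common way to stop a handler recursing into itself is a re-entrancy guard. The sketch below is a generic illustration only: the class `FaultHandler` and the callables `recover` and `safe_mode` are hypothetical names, not anything from CAPSTONE's flight software.

```python
class FaultHandler:
    """Hypothetical fault handler that refuses to recurse into itself."""

    def __init__(self, recover, safe_mode):
        self._recover = recover      # callable that attempts the normal recovery
        self._safe_mode = safe_mode  # last-resort callable, kept as simple as possible
        self._handling = False       # True while a recovery attempt is in progress

    def handle(self, fault):
        if self._handling:
            # The recovery path itself faulted: stop recursing and fall back.
            self._safe_mode(fault)
            return
        self._handling = True
        try:
            self._recover(fault)
        except Exception as nested:
            # Recovery raised instead of re-entering: same fallback.
            self._safe_mode(nested)
        finally:
            self._handling = False
```

The guard only breaks the loop. The fallback still has to be simple enough that it cannot plausibly fail in the same way.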

    4. MachDiamond Silver badge

      Re: Whats Happens If

      "Whats Happens If

      The Fault Detection System develops a fault?"

      You jest, but when I was working at a job fixing audio equipment, too many times the protection circuitry in a power amp was the cause of the problem. While blowing up speakers isn't good, they were often cheaper to fix than an amplifier with some weird problem.

    5. Peter Gathercole Silver badge

      Re: Whats Happens If

      You end up putting in mutually checking fault systems. Preferably an odd number of them, and more than one.
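The reason for the odd number shows up in a simple majority voter. This is a minimal sketch with a hypothetical function name `majority_vote`, assuming each checker reports one hashable verdict; real redundant systems are considerably more involved.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(readings: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the verdict reported by a strict majority of checkers, else None."""
    if not readings:
        return None
    value, count = Counter(readings).most_common(1)[0]
    # A strict majority is required, so a tie gives no answer.
    return value if count > len(readings) / 2 else None
```

With two checkers that disagree there is no majority and the voter returns `None`. With three, one faulty checker is simply outvoted.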

  7. Anonymous Coward
    Boffin

    The software recovered by itself

    Here is what I can glean from the primary sources, Advanced Space and NASA Ames.

    The problem started with a ground controller sending a misformatted query to CAPSTONE.

    The radio software detected this and shut down which was probably intended.

    The fault detection software didn't perceive the radio shutdown as a fault, which it should have. It is unclear whether the radio software told it about the fault, so the misprogramming could have been in either the radio software or the fault software.

    The flight software eventually cleared the fault, probably because it couldn't contact Earth, probably by instituting the reboot that the fault software was expected to do, and probably via the fault software. While it was doing that, it kept CAPSTONE on course.

    Note that the boffins at mission control didn't do anything to make any of this happen (other than fat fingering their query).

    All in all, this is an example of good software with redundancies and recovery plans baked in.
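The "couldn't contact Earth, so reboot" step above is guesswork on my part, but the general technique is often called a command-loss timer. A minimal sketch follows; the class name `CommandLossTimer`, its methods and the timeout value are all hypothetical, and nothing here is taken from the actual flight software.

```python
class CommandLossTimer:
    """Hypothetical timer that asks for a radio reset after prolonged silence."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_contact_s = 0.0

    def contact(self, now_s: float) -> None:
        """Record the time a valid command arrived from the ground."""
        self.last_contact_s = now_s

    def should_reset(self, now_s: float) -> bool:
        """True once the silence has lasted at least timeout_s seconds."""
        return now_s - self.last_contact_s >= self.timeout_s
```

Time is passed in rather than read from a clock so the logic can be checked on the ground without waiting hours. The point of the design is that it needs no report from the radio software at all, only the absence of contact.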

    What they know is in the middle of the article:

    https://advancedspace.com/capstone-tcm1-success/

    1. Paul Hovnanian Silver badge
      Coat

      Re: The software recovered by itself

      "The problem started with a ground controller sending a misformatted query to CAPSTONE."

      At least now we know where Bobby Tables is working.

      1. John Brown (no body) Silver badge
        Coffee/keyboard

        Re: The software recovered by itself

        Git!

      2. Anonymous Coward

        Re: The software recovered by itself

        The explanatory XKCD

      3. herman

        Re: The software recovered by itself

        I think Bobby Tables got a summer job at Shaw Communications in Canada.

        1. John Brown (no body) Silver badge
          Pirate

          Re: The software recovered by itself

          "I think Bobby Tables got a summer job at Shaw Communications in Canada."

          So he logged into Rogers to screw over the competition?
