Known software issue grounds Ingenuity Mars copter as it attempted fourth flight

Software issues prevented the fourth scheduled flight of NASA's Mars Ingenuity helicopter. The delay was neither unexpected nor a barrier to future flights. In fact, NASA reassured a waiting world that the helicopter is "safe and in good health." The reason for the failure is also known: a software bug that results in a watchdog …

  1. Anonymous Coward
    Anonymous Coward

    Ace PFY skills

    So, am I getting this right, turn it off and back on and it should be fine?

    1. Stumpy
      1. nintendoeats

        Re: Ace PFY skills

        Press the button...no not like on your shirt...are you from the past?

    2. Paul Crawford Silver badge

      Re: Ace PFY skills

      Typically the watchdog will do that for you. With extreme prejudice...

      1. bombastic bob Silver badge
        Trollface

        Re: Ace PFY skills

        *BAD* watchdog - no biscuit!

    3. bombastic bob Silver badge
      Happy

      Re: Ace PFY skills

      turn it off and back on

      One good healthy cycle of the big red flat power switch, and that oughta do it, yeah

      (I miss those switches, they look like they actually do something and make a sound when they do it)

    4. Anonymous Coward
      Anonymous Coward

      Re: Ace PFY skills

      Call out charges to do that might be a tad on the high side, though..

  2. steviebuk Silver badge

    Did....

    ....it get a new, forced Windows update*

    I'll get my coat.

    *I know it's not running Windows.

    1. oiseau
      Facepalm

      Re: Did....

      ... a new, forced Windows update

      No.

      It's probably yet another incarnation for the infamously crappy Intel e1000e driver module. 8^/

      O.

    2. Jimmy2Cows Silver badge
      Joke

      Meanwhile...

      Somewhere, buried deep within the rumbling bowels of Redmond, an MS black-ops world-domination skunkworks team, featuring the world's greatest moustache-twirling super-villain developers, is tirelessly beavering away on plans to update Windows on any machine not running Windows.

    3. Anonymous Coward
      Linux

      Re: Did....

      And not so long ago we were celebrating the fact that Ingenuity ran Linux.

      Fortunately, turn it off and then on again works with penguins too. Just don't tell the animal rights advocates.

    4. TRT Silver badge

      Re: Did....

      Probably an OTA firmware update to implement GPS geo-fencing for drones or something...

      1. Unoriginal Handle

        Re: Did....

        OTA ? Through a vacuum?

        </pedant>

  3. werdsmith Silver badge

    In my fanciful imagination, I picture these guys working on this and not feeling that they are working at all. Just playing with some of the best kit in the world. And when I say "playing" I don't mean like toying.

  4. Mike 137 Silver badge

    A quite common problem

    To ensure a system stays alive, a watchdog timer periodically resets it unless interrupted by valid code. It should be timed to prevent it interfering with functionality, but as systems get more complicated this becomes quite a hard nut to crack. In such a case as this, the watchdog appears to time out (and reset the system) before a valid process has completed. Reducing the time the valid process takes is indeed the quick fix, but it appears in this case that they're having to wait a variable time for some real time event. A difficult enough problem on Terra Firma, but at that distance I take my hat off to NASA.
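
A minimal sketch of the behaviour described above, not Ingenuity's actual flight software: a toy watchdog driven by a simulated millisecond clock. The `Watchdog` class, `run_task` helper and the 100 ms window are all made up for illustration. The task only reports in when it finishes, so any run longer than the window gets "reset" first.

```python
class Watchdog:
    """Toy watchdog on a simulated clock (no real timers, no real resets)."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms  # hypothetical window, in milliseconds
        self.last_kick = 0
        self.resets = 0

    def kick(self, now_ms):
        """Valid code reports in, restarting the countdown."""
        self.last_kick = now_ms

    def tick(self, now_ms):
        """Periodic check; counts a reset if the countdown has expired."""
        if now_ms - self.last_kick > self.timeout_ms:
            self.resets += 1
            self.last_kick = now_ms  # system restarts, countdown starts over
            return True
        return False


def run_task(dog, start_ms, duration_ms, step_ms=1):
    """Run one task that kicks the watchdog only at its start and end.

    Returns True if the task completed, False if the watchdog fired first.
    """
    now = start_ms
    end = start_ms + duration_ms
    dog.kick(now)
    while now < end:
        now = min(now + step_ms, end)
        if dog.tick(now):
            return False
    dog.kick(now)
    return True


dog = Watchdog(timeout_ms=100)
print(run_task(dog, 0, 80))   # True: finishes inside the window
print(run_task(dog, 0, 120))  # False: overruns, watchdog fires first
```

If the task's duration depends on an external real-time event, no single fixed window is right: too short and healthy runs get reset, too long and a genuine hang goes unnoticed for longer.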

    1. Anonymous Coward
      Anonymous Coward

      Re: A quite common problem

      > Reducing the time the valid process takes is indeed the quick fix, but it appears in this case that they're having to wait a variable time for some real time event

      Just speculation on my part but maybe they tried to put too much into one step - for good reasons - and are now splitting it into two steps so that the watchdog can be safely reset in between?

      That split probably has knock-on effects so a bit of a pain when you want to do it in a hurry.
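
To illustrate the speculation above (and it is only that), here is a small sketch assuming the watchdog is kicked once between steps and never inside one. The durations and the 100 ms window are hypothetical numbers, not anything NASA has published.

```python
def watchdog_fires(step_durations_ms, timeout_ms):
    """Return True if any single step outlasts the watchdog window.

    Assumes one kick between consecutive steps and none inside a step.
    """
    return any(d > timeout_ms for d in step_durations_ms)


TIMEOUT_MS = 100        # hypothetical watchdog window
one_big_step = [130]    # hypothetical duration of the combined step
split_steps = [70, 60]  # same total work, with a kick in between

print(watchdog_fires(one_big_step, TIMEOUT_MS))  # True
print(watchdog_fires(split_steps, TIMEOUT_MS))   # False
```

The split only helps if each half has a bounded duration of its own. A half that still waits a variable time on some event can overrun just as the combined step did.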

    2. Anonymous Coward
      Anonymous Coward

      @Mike 137 - Re: A quite common problem

      Did they try asking that grey-bearded guy in the back how they were doing it for Voyager probes ? It seems their watchdog is still working fine after more than 40 years of service.

  5. Pete 2 Silver badge

    Have you tried switching it off and on again?

    > The rover’s flight control software has been stable and healthy for almost two years, and why mess with a good thing there?

    A good question.

    If the only (small) issue is that 15% of the time the helicopter won't initialise into flight mode, it doesn't sound too risky to simply wait a while and try it again.

    Isn't that the answer that software support gives, 90% of the time?
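
A quick back-of-envelope for the "just try again" argument, assuming each attempt fails independently with the 15% rate quoted above. Independence is an assumption; if failures are correlated (same temperature, same timing), the real odds are worse.

```python
P_FAIL = 0.15  # per-attempt failure rate quoted in the thread


def p_all_fail(attempts, p_fail=P_FAIL):
    """Probability that every one of `attempts` independent tries fails."""
    return p_fail ** attempts


def expected_attempts(p_fail=P_FAIL):
    """Mean number of tries until the first success (geometric distribution)."""
    return 1.0 / (1.0 - p_fail)


for n in range(1, 5):
    print(f"{n} attempt(s): P(all fail) = {p_all_fail(n):.4%}")
print(f"expected attempts until success: {expected_attempts():.3f}")
```

Under that assumption, two failures in a row happen about 2.25% of the time, and on average it takes a little under 1.2 attempts to get into flight mode.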

    1. John Robson Silver badge

      Re: Have you tried switching it off and on again?

      I only hope that the transition to "landing" can't get similarly borked.

      1. Steve K

        Re: Have you tried switching it off and on again?

        Hopefully also they don't hit the end of the 30-day flight window before they fix/test it, otherwise Perseverance may be off on the rest of its mission.

        I know it's not part of the plan, but it would be cool if they could put Ingenuity into a "follow me" mode where it keeps (say) 20m away in a series of hops from Perseverance until it finally breaks.

        I suppose that the risk is that something breaks and it then hits the rover and jams/breaks something.

        What a cool job to have!!

        1. zuckzuckgo Silver badge

          Re: Have you tried switching it off and on again?

          > it would be cool if they could put Ingenuity into a "follow me" mode...

          Range anxiety is the problem. Short flight times and very long recharge times.

          Maybe NASA could tap into the 15 billion dollars Biden is proposing for 500,000 new charging stations and install some on Mars.

        2. Anonymous Coward
          Anonymous Coward

          Re: Have you tried switching it off and on again?

          Looks like they are going to:

          https://www.nasa.gov/press-release/nasa-s-ingenuity-mars-helicopter-to-begin-new-demonstration-phase

      2. Natalie Gritpants Jr

        Re: Have you tried switching it off and on again?

        Landing always succeeds, the following operations may fail.

        1. Anonymous Coward
          Anonymous Coward

          @Natalie Gritpants Jr - Re: Have you tried switching it off and on again?

          If you can walk away from the plane, it was a good landing. If you can still use that plane after it landed, it was an excellent landing.

    2. Andy The Hat Silver badge

      Re: Have you tried switching it off and on again?

      Depends on how much power is used before initialisation fails - it may have to go into a recharge state for a while before retry.

      I find it surprising that a 15% chance of invalid faults being flagged under no-fault conditions wasn't discovered and fixed during pre-flight testing. After all the basic requirement is that it flies and this fault report causes it not to fly. When something is sitting on Mars you don't really want to be sorting out whether a fault flag is real or not before you try again as the result could terminate the mission ...

      1. Anonymous Coward
        Boffin

        Re: Have you tried switching it off and on again?

        It turns out that testing such a thing is hard. You don't actually completely know the environmental conditions where it's going to end up, and to the extent you do know them they're very hard to reproduce on Earth: you can easily get the temperature and pressure correct (which they did) in a hypobaric chamber, but getting the gravity completely correct is very hard indeed. You can get it half-correct by attaching, essentially, a string to the top of the vehicle and offloading a lot of its weight, which they did, but it's still sitting in a 1g gravitational field so things like how much the blades droop won't be correct. The only way to get that correct would be to lift your hypobaric chamber high above the ground and then lower it down with an acceleration of roughly 6m/s^2 for long enough to do the test inside it. That's ... hard.
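
A rough check of the "roughly 6m/s^2" figure and of why it is hard, using rounded surface gravities and hypothetical test durations. The chamber starts from rest and air resistance on the chamber is ignored.

```python
G_EARTH = 9.81  # m/s^2, Earth surface gravity (rounded)
G_MARS = 3.71   # m/s^2, Mars surface gravity (rounded)


def chamber_acceleration(g_target=G_MARS, g_local=G_EARTH):
    """Downward acceleration the chamber needs so its contents feel g_target."""
    return g_local - g_target


def drop_distance(duration_s, accel):
    """Distance covered from rest at constant acceleration: h = a * t^2 / 2."""
    return 0.5 * accel * duration_s ** 2


a = chamber_acceleration()
print(f"required acceleration: {a:.2f} m/s^2")
for t in (5, 30):  # hypothetical test durations in seconds
    print(f"{t:>2} s test -> {drop_distance(t, a):.0f} m of descent")
```

Because the distance grows with the square of the duration, a test of around half a minute already needs a descent measured in kilometres.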

        1. aks

          Re: Have you tried switching it off and on again?

          AFAIR Ingenuity was always intended as an advanced prototype. It's there on this mission to learn what works and what doesn't. If it achieves any serious scientific results, that's a very welcome bonus.

          Fingers firmly crossed that it does achieve those results, but it's already a tremendous success in this initial phase.

    3. Empty1

      Re: Have you tried switching it off and on again?

      only to discover some pillock left a floppy in the drive.

  6. Elledan

    Remote server, extreme edition

    Managing remote computers is a pretty common thing in today's IT world, whether it's some industrial kit or a stack of servers running in a data center half-way across the country or globe.

    Those happily have the option of a call-of-shame to a colleague or support person whenever the gear in question does not come back online in a healthy state after a reboot. Here it sounds like the nearby rover is the closest thing that JPL's engineers have to a 'call support to push the power button on the server' fallback :)

    1. Lon24

      Re: Remote server, extreme edition

      Yea, but I still have that moment of panic.

      Even though I've done it hundreds of times, I know it will work and how long it takes to boot. But, I guess, I'm not the only one who suffers 'time dilation' during the wait so it never comes up quite when expected. The alternative of looking at a clock seems to make it even worse.

      Boy, I wonder how many of Ingenuity's engineers have a pacemaker ...

      1. Anonymous Coward
        Anonymous Coward

        Re: Remote server, extreme edition

        Oh yes.

        Decades in the business and I still suffer from that "oh, shit, it's not working" moment (usually about 10 seconds from when everything comes back to life as expected).

      2. katrinab Silver badge
        Unhappy

        Re: Remote server, extreme edition

        Especially when ping times to Mars are measured in minutes.

    2. Anonymous Coward
      Anonymous Coward

      Re: Remote server, extreme edition

      Even on Mars, their response, resolution times and success rates are better than BT.

  7. Boris the Cockroach Silver badge
    Coat

    We just need

    amanfrommars to drive over and fix it for us..... he's already there after all

    Coat... because it's chilly outside.. especially on Mars

    1. Astrohead

      Re: We just need

      A Man *FROM* Mars.

    2. KarMann Silver badge
      Alien

      Re: We just need

      Well, I hope you're happy now.

  8. Anonymous Coward
    Anonymous Coward

    Nothing there...

    Not even a picture of nothing.

  9. amanfromMars 1 Silver badge

    The Trials and Tribulations as Experienced in Quantum Leaping are of course, Other Worldly ....

    .... but well within Human Reach*

    If the only (small) issue is that 15% of the time the helicopter won't initialise into flight mode, it doesn't sound too risky to simply wait a while and try it again.

    Isn't that the answer that software support gives, 90% of the time? .... Pete 2

    Works well every time in the past, Pete 2, and there is no reason to expect IT to change in the future in order to forestall and prevent OS failure and programs crashing with massive cracks exposing great hacks.

    All could very easily be totally lost to NASA Mars Command and Control if drifted and gifted to Commanding Control from elsewhere foreign and alien for Future Flight Operations with the Benefit of Hindsights Providing Advanced Future Planning with Immersive Live Presentations ...... with Media BroadBandCasting Exemplary Demonstrations with ACTive Virtually Augmented Realisations.

    Have they considered that Overtaking Plan a Possibility if a Current Actuality is difficult to accept and enjoy is definitely different and attractively engaging of both the diffident and indifferent and revolutionary evolutionary.

    * Easy Peasy with an OE Instruction Manual to Follow with Indicative Pictures Aiding Greater Understanding of Core Systems Working.

  10. karlkarl Silver badge

    It is really interesting (but a little annoying) to see that one device that NASA has sent into space written using existing "modern" software engineering practices and semi off-the-shelf has been the most problematic. And I don't think it is to do with the radiation shielding either.

    So from the looks of it, they are using C++ so I can't quite blame Python and its stupid dependency hoarding.

    Is it some janky reverse engineered driver that they have had to use for Linux because the hardware vendor doesn't give a crap? Perhaps I can blame the fact that SoC are hobbyist toys rather than anything reliable?

    Likely they will look into it and avoid this kind of development methodology entirely. It would be interesting if they could put out an experience paper so I can point future clients towards it as evidence that some ideas are simply no good for decent engineering.

    1. Anonymous Coward
      Anonymous Coward

      Re: Software engineering

      Although I've no idea what software engineering the copter runs, I agree that there is something 'modern' about it that doesn't sit quite right with me. From the article on the systems it runs there seems to be a GHz-class CPU running a normal OS, plus multiple MCUs acting - essentially - as sensor nodes, which seems to me (granted as an occasional embedded dev but not in anything so critical or quite as complex as this) rather overkill to run a helicopter. Obviously one would set the WDT to attempt to encompass the potential range of spinup times, but having such a complex system would be expected to increase the probability of something being missed in testing.

      Although it would be inappropriate to compare the HW in this machine to that required to land something on the Moon, one does still think that a lesser applications processor, running something more realtime/bare metal with direct control over its sensors, would have been more suited to the idiom of 'plan for the expected, but expect the unexpected' that space is.

    2. martinusher Silver badge

      >Perhaps I can blame the fact that SoC are hobbyist toys rather than anything reliable?

      "SoC", or as I'd call them, "microcontrollers", aren't hobbyist toys. They're actually serious bits of kit that are used in practically anything that moves these days. For a motor controller, for example, the processor is almost an afterthought; it's used to initialize and integrate the specialist peripherals that perform tasks like determining shaft speed and torque, running the control loops and generating the waveforms needed to actually drive the motor. The resulting device is a functional block, one of many in a big system, and it really helps the layering/structuring to be able to treat it as a black box with well known properties. You can integrate a lot of this into an applications processor but you increase task interdependencies which are difficult to predict and so manage. (Get it wrong and you get a watchdog timeout.......you can see where this is going.)

      I should also give a nod to these 'hobbyists'. One of the side effects of making high performance microcontrollers and related software environments like the Arduino available to anyone is that it widens the net of people able to work at this level. This benefits everyone -- the hobbyists, the professional (aka "hobbyist who gets paid (a lot) to indulge their hobby"), manufacturers (you tend to use parts families that you're familiar with). It's win-win all the way. These 'amateurs' can be very inventive and are quite able to show a seasoned pro a thing or two. Worth having on a team.

      1. karlkarl Silver badge

        No, I don't believe they are using micro-controllers for the helicopter. It is an off-the-shelf SoC. The Qualcomm Snapdragon 801 SoC.

        https://www.bgr.in/news/nasas-ingenuity-mars-helicopter-uses-same-qualcomm-snapdragon-801-chip-as-samsung-galaxy-s5-oneplus-one-955319/

        https://thenewstack.io/how-the-first-helicopter-on-mars-uses-off-the-shelf-hardware-and-linux/

        Whilst what the engineers have done is clearly fantastic, it seems to me that the more disciplined traditional embedded style of development is still better compared to the new stuff. This alone was a really interesting experiment and got useful results.

    3. Anonymous Coward
      Pirate

      So wait. You mean 'these people flew a helicopter on Mars but they had some small problems which they overcame' and 'their programme was so successful it got extended' are examples of a development methodology to avoid? O...K.

      1. karlkarl Silver badge

        Yes. The people did a really good job. The development style was not good.

        If this happened with a lesser team (i.e. you and me), it would have resulted in much poorer results.

  11. Conundrum1885

    Its on MARS

    So you'd expect problems with the software. Better to play it safe as in space no one can hear you scream.

    1. redpawn

      Re: Its on MARS

      Mars has an atmosphere and Perseverance has microphones so...

  12. steviebuk Silver badge

    Am I the only one...

    ....that would also be interested in seeing them doing their testing on the copy on Earth? Surely they have a copy of the copter in another remote office that they also control remotely to simulate being on Mars. So I assume they test out their update ideas on that first?

    1. Stoneshop
      Boffin

      Re: Am I the only one...

      Why remote? It's easy enough to insert a gateway into the link that simply holds any packet for the required time (dependent on the distance between Earth and Mars at that moment, so 4 to 24 minutes approximately) before passing it on.

      For a lab test involving a five-node cluster with one member that would be about 40km away from the nearest two after its real-world deployment they were simply installed in a single rack including all the comms gear, and a 40km spool of fiber on top of the cabinet. Saved a lot of walking when simulating error conditions.
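
A minimal sketch of such a hold-and-forward gateway, using a simulated clock instead of real sockets. `DelayLine`, `one_way_delay_s` and the 300 million km figure are illustrative assumptions, not anything JPL is known to use.

```python
from collections import deque

C_KM_S = 299_792.458  # speed of light in km/s


def one_way_delay_s(distance_km):
    """Light travel time over distance_km, in seconds."""
    return distance_km / C_KM_S


class DelayLine:
    """Holds each packet for a fixed delay, then releases it in FIFO order."""

    def __init__(self, delay_s):
        self.delay_s = delay_s
        self._queue = deque()  # entries are (release_time_s, packet)

    def send(self, now_s, packet):
        """Accept a packet at time now_s; it becomes due delay_s later."""
        self._queue.append((now_s + self.delay_s, packet))

    def receive(self, now_s):
        """Return every packet whose hold time has elapsed by now_s."""
        ready = []
        while self._queue and self._queue[0][0] <= now_s:
            ready.append(self._queue.popleft()[1])
        return ready


# Hypothetical Earth-Mars distance of 300 million km (about 1000 s one way).
line = DelayLine(one_way_delay_s(300e6))
line.send(0, "spin up rotors")
print(line.receive(600))   # [] (still in transit)
print(line.receive(1001))  # ['spin up rotors']
```

Since the distance changes as the planets move, a real test rig would presumably update `delay_s` over time. The fixed value here keeps the sketch short.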

    2. Anonymous Coward
      Boffin

      Re: Am I the only one...

      They certainly have copies on Earth, and they certainly test things on them. Whether they book time on the JPL space simulator and set everything up to look as much like Mars as possible in it depends on whether it's available and how long that takes, I would think (copies of Ingenuity certainly were tested in it). I would imagine the reason they can talk about the known 15% chance that they'll hit the watchdog timer is because they tested to work that out, for instance.

  13. Anonymous Coward
    Anonymous Coward

    Another boring Mars mission

    I would not be shocked to find out the data transmission is a bit slow and erratic and eventually someone finds out it's downloading updates from Earth all the time.

    Also why is the footage so crap? A nice 360 of the landscape including the lander would seem possible. First it's uploaded full fat from the drone to the lander, then crunched down into some sort of 50MB file and sent to Earth.

    I have the impression nobody works at NASA but rocket engineers and all they do is stand at blackboards doing sums.
