Ace PFY skills
So, am I getting this right, turn it off and back on and it should be fine?
Software issues prevented the fourth scheduled flight of NASA's Mars Ingenuity helicopter. The delay was neither unexpected nor a barrier to future flights. In fact, NASA reassured a waiting world that the helicopter is "safe and in good health." The reason for the failure is also known: a software bug that results in a watchdog …
Somewhere, buried deep within the rumbling bowels of Redmond, an MS black-ops world-domination skunkworks team, featuring the world's greatest moustache-twirling super-villain developers, is tirelessly beavering away on plans to update Windows on any machine not running Windows.
To ensure a system stays alive, a watchdog timer resets it unless valid code periodically restarts the timer first. It should be timed so that it doesn't interfere with functionality, but as systems get more complicated this becomes quite a hard nut to crack. In such a case as this, the watchdog appears to time out (and reset the system) before a valid process has completed. Reducing the time the valid process takes is indeed the quick fix, but it appears in this case that they're having to wait a variable time for some real time event. A difficult enough problem on terra firma, but at that distance I take my hat off to NASA.
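For anyone who hasn't met a watchdog before, here is a minimal sketch of the idea in Python. It is a toy model on a simulated millisecond clock, not Ingenuity's flight software; the `Watchdog` class, `run_task` helper and the 1000 ms timeout are all made up for illustration.

```python
class Watchdog:
    """Toy model of a watchdog timer driven by an explicit clock (times in ms)."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms  # longest allowed gap between kicks
        self.last_kick = 0            # time of the most recent kick
        self.resets = 0               # how many times the dog has bitten

    def kick(self, now):
        """Called by healthy code to restart the countdown."""
        self.last_kick = now

    def poll(self, now):
        """Called on every clock tick; counts a reset if the timer expired."""
        if now - self.last_kick > self.timeout_ms:
            self.resets += 1
            self.last_kick = now      # the system restarts from scratch
            return True
        return False


def run_task(dog, duration_ms, tick_ms=100):
    """Run one task, kicking the dog only when the task completes.

    Returns True if the task finished, False if the watchdog reset first.
    """
    start = dog.last_kick
    for now in range(start + tick_ms, start + duration_ms + 1, tick_ms):
        if dog.poll(now):
            return False
    dog.kick(start + duration_ms)
    return True


dog = Watchdog(timeout_ms=1000)       # hypothetical timeout
print(run_task(dog, 800))             # finishes before the timeout
print(run_task(dog, 1500))            # watchdog resets the system first
```

The second call is the situation described above: a perfectly valid task that simply takes longer than the watchdog allows.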
> Reducing the time the valid process takes is indeed the quick fix, but it appears in this case that they're having to wait a variable time for some real time event
Just speculation on my part but maybe they tried to put too much into one step - for good reasons - and are now splitting it into two steps so that the watchdog can be safely reset in between?
That split probably has knock-on effects so a bit of a pain when you want to do it in a hurry.
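To illustrate the speculation above, a small sketch under the assumption that the watchdog gets kicked once after each step completes. The durations and the 1000 ms timeout are invented numbers, and `first_timeout` is a hypothetical helper, not anything from the real flight code.

```python
def first_timeout(steps_ms, timeout_ms):
    """Return the index of the first step that outlasts the watchdog, or None.

    Assumes the watchdog is kicked once after every step that completes,
    so only the duration of each individual step matters.
    """
    for index, duration in enumerate(steps_ms):
        if duration > timeout_ms:
            return index
    return None


TIMEOUT_MS = 1000                      # hypothetical watchdog timeout
one_big_step = [1400]                  # too much work between kicks
two_small_steps = [700, 700]           # same work, kicked in between

print(first_timeout(one_big_step, TIMEOUT_MS))     # step 0 trips the dog
print(first_timeout(two_small_steps, TIMEOUT_MS))  # None: both steps fit
```

The knock-on effect is that whatever state exists between the two halves now has to be safe to sit in, which this sketch does not model.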
> The rover’s flight control software has been stable and healthy for almost two years, and why mess with a good thing there?
A good question.
If the only (small) issue is that 15% of the time the helicopter won't initialise into flight mode, it doesn't sound too risky to simply wait a while and try it again.
Isn't that the answer that software support gives, 90% of the time?
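A rough back-of-envelope for "just try it again", assuming each attempt fails independently with the quoted 15% chance (an assumption; a real fault may well be correlated between attempts). The function names are just for illustration.

```python
def chance_all_fail(p_fail, attempts):
    """Probability that every one of `attempts` independent tries fails."""
    return p_fail ** attempts


def attempts_needed(p_fail, target_success):
    """Smallest number of tries whose combined success chance reaches the target."""
    attempts = 1
    while 1 - chance_all_fail(p_fail, attempts) < target_success:
        attempts += 1
    return attempts


P_FAIL = 0.15                           # figure quoted in the thread
for n in (1, 2, 3):
    print(n, 1 - chance_all_fail(P_FAIL, n))
print(attempts_needed(P_FAIL, 0.99))    # tries needed for 99% overall success
```

On those assumptions, two tries get you to about 97.75% and three to better than 99%, so waiting and retrying looks cheap as long as each retry doesn't cost too much power or time.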
Hopefully also they don't hit the end of the 30-day flight window before they fix/test it, otherwise Perseverance may be off on the rest of its mission.
I know it's not part of the plan, but it would be cool if they could put Ingenuity into a "follow me" mode where it keeps (say) 20m away in a series of hops from Perseverance until it finally breaks.
I suppose that the risk is that something breaks and it then hits the rover and jams/breaks something.
What a cool job to have!!
> it would be cool if they could put Ingenuity into a "follow me" mode...
Range anxiety is the problem. Short flight times and very long recharge times.
Maybe NASA could tap into the 15 billion dollars Biden is proposing for 500,000 new charging stations and install some on Mars.
Depends on how much power is used before initialisation fails - it may have to go into a recharge state for a while before retry.
I find it surprising that a 15% chance of invalid faults being flagged under no-fault conditions wasn't discovered and fixed during pre-flight testing. After all the basic requirement is that it flies and this fault report causes it not to fly. When something is sitting on Mars you don't really want to be sorting out whether a fault flag is real or not before you try again as the result could terminate the mission ...
It turns out that testing such a thing is hard. You don't actually completely know the environmental conditions where it's going to end up, and to the extent you do know them they're very hard to reproduce on Earth: you can easily get the temperature and pressure correct (which they did) in a hypobaric chamber, but getting the gravity completely correct is very hard indeed. You can get it half-correct by attaching, essentially, a string to the top of the vehicle and offloading a lot of its weight, which they did, but it's still sitting in a 1g gravitational field so things like how much the blades droop won't be correct. The only way to get that correct would be to lift your hypobaric chamber high above the ground and then lower it down with an acceleration of roughly 6m/s^2 for long enough to do the test inside it. That's ... hard.
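Where the "roughly 6m/s^2" comes from, as a quick sketch: the chamber has to accelerate downward by the difference between Earth and Mars surface gravity. The values 9.81 and 3.71 m/s^2 are the usual textbook figures; the 30-second test length is an arbitrary example, and slowing the chamber down again at the bottom is ignored.

```python
G_EARTH = 9.81   # m/s^2, surface gravity on Earth
G_MARS = 3.71    # m/s^2, surface gravity on Mars


def chamber_acceleration(g_here=G_EARTH, g_target=G_MARS):
    """Downward acceleration that leaves the contents feeling g_target."""
    return g_here - g_target


def drop_height(test_seconds, acceleration):
    """Distance covered from rest at constant acceleration: h = a * t^2 / 2."""
    return 0.5 * acceleration * test_seconds ** 2


a = chamber_acceleration()
print(round(a, 2))                      # about 6.1 m/s^2
print(round(drop_height(30, a)))        # metres of drop for a 30 s test
```

A 30-second test works out to a drop of about 2.7 km, which is why "hard" is putting it mildly.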
AFAIR Ingenuity was always intended as an advanced prototype. It's there on this mission to learn what works and what doesn't. If it achieves any serious scientific results, that's a very welcome bonus.
Fingers firmly crossed that it does achieve those results, but it's already a tremendous success in this initial phase.
Managing remote computers is a pretty common thing in today's IT world, whether it's some industrial kit or a stack of servers running in a data center half-way across the country or globe.
Those happily have the option of a call-of-shame to a colleague or support person whenever the gear in question does not come back online in a healthy state after a reboot. Here it sounds like the nearby rover is the closest thing that JPL's engineers have to a 'call support to push the power button on the server' fallback :)
Yea, but I still have that moment of panic.
Even though I've done it hundreds of times, I know it will work and how long it takes to boot. But, I guess, I'm not the only one who suffers 'time dilation' during the wait so it never comes up quite when expected. The alternative of looking at a clock seems to make it even worse.
Boy, I wonder how many of Ingenuity's engineers have a pacemaker ...
.... but well within Human Reach*
> If the only (small) issue is that 15% of the time the helicopter won't initialise into flight mode, it doesn't sound too risky to simply wait a while and try it again. Isn't that the answer that software support gives, 90% of the time? .... Pete 2
Works well every time in the past, Pete 2, and there is no reason to expect IT to change in the future in order to forestall and prevent OS failure and programs crashing with massive cracks exposing great hacks.
All could very easily be totally lost to NASA Mars Command and Control if drifted and gifted to Commanding Control from elsewhere foreign and alien for Future Flight Operations with the Benefit of Hindsights Providing Advanced Future Planning with Immersive Live Presentations ...... with Media BroadBandCasting Exemplary Demonstrations with ACTive Virtually Augmented Realisations.
Have they considered that Overtaking Plan a Possibility if a Current Actuality is difficult to accept and enjoy is definitely different and attractively engaging of both the diffident and indifferent and revolutionary evolutionary.
* Easy Peasy with an OE Instruction Manual to Follow with Indicative Pictures Aiding Greater Understanding of Core Systems Working.
It is really interesting (but a little annoying) to see that the one device NASA has sent into space that was written using existing "modern" software engineering practices, and built semi off-the-shelf, has been the most problematic. And I don't think it is to do with the radiation shielding either.
So from the looks of it, they are using C++ so I can't quite blame Python and its stupid dependency hoarding.
Is it some janky reverse engineered driver that they have had to use for Linux because the hardware vendor doesn't give a crap? Perhaps I can blame the fact that SoC are hobbyist toys rather than anything reliable?
Likely they will look into it and avoid this kind of development methodology entirely. It would be interesting if they could put out an experience paper so I can point future clients towards it as evidence that some ideas are simply no good for decent engineering.
Although I've no idea what software engineering the copter runs, I agree that there is something 'modern' about it that doesn't sit quite right with me. From the article on the systems it runs there seems to be a GHz-class CPU running a normal OS, plus multiple MCUs acting - essentially - as sensor nodes, which seems to me (granted as an occasional embedded dev but not in anything so critical or quite as complex as this) rather overkill to run a helicopter. Obviously one would set the WDT to attempt to encompass the potential range of spinup times, but having such a complex system would be expected to increase the probability of something being missed in testing.
Although it would be inappropriate to compare the HW in this machine to that required to land something on the Moon, one does still think that a lesser applications processor, running something more realtime/bare metal with direct control over its sensors, would have been more suited to the idiom of 'plan for the expected, but expect the unexpected' that space is.
>Perhaps I can blame the fact that SoC are hobbyist toys rather than anything reliable?
"SoC", or as I'd call them, "microcontrollers", aren't hobbyist toys. They're actually serious bits of kit that are used in practically anything that moves these days. For a motor controller, for example, the processor is almost an afterthought; it's used to initialize and integrate the specialist peripherals that perform tasks like determining shaft speed and torque, running the control loops and generating the waveforms needed to actually drive the motor. The resulting device is a functional block, one of many in a big system, and it really helps the layering/structuring to be able to treat it as a black box with well known properties. You can integrate a lot of this into an applications processor but you increase task interdependencies which are difficult to predict and so manage. (Get it wrong and you get a watchdog timeout.......you can see where this is going.)
I should also give a nod to these 'hobbyists'. One of the side effects of making high performance microcontrollers and related software environments like the Arduino available to anyone is that it widens the net of people able to work at this level. This benefits everyone -- the hobbyists, the professionals (aka "hobbyists who get paid (a lot) to indulge their hobby"), manufacturers (you tend to use parts families that you're familiar with). It's win-win all the way. These 'amateurs' can be very inventive and are quite able to show a seasoned pro a thing or two. Worth having on a team.
No, I don't believe they are using micro-controllers for the helicopter. It is an off-the-shelf SoC: the Qualcomm Snapdragon 801.
https://www.bgr.in/news/nasas-ingenuity-mars-helicopter-uses-same-qualcomm-snapdragon-801-chip-as-samsung-galaxy-s5-oneplus-one-955319/
https://thenewstack.io/how-the-first-helicopter-on-mars-uses-off-the-shelf-hardware-and-linux/
Whilst what the engineers have done is clearly fantastic, it seems to me that the more disciplined traditional embedded style of development is still better than the new stuff. This alone was a really interesting experiment and got useful results.
Why remote? It's easy enough to insert a gateway into the link that simply holds any packet for the required time (dependent on the distance between Earth and Mars at that moment, so 4 to 24 minutes approximately) before passing it on.
For a lab test involving a five-node cluster with one member that would be about 40km away from the nearest two after its real-world deployment, the nodes were simply installed in a single rack, including all the comms gear, with a 40km spool of fiber on top of the cabinet. Saved a lot of walking when simulating error conditions.
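The packet-holding gateway described above could be sketched roughly like this. It is a toy model on a simulated clock in seconds, not a real network stack; `DelayGateway`, the packet names and the 240-second (4-minute) delay are all illustrative assumptions.

```python
import heapq


class DelayGateway:
    """Holds each packet for a fixed one-way delay before passing it on."""

    def __init__(self, delay_s):
        self.delay_s = delay_s
        self._held = []   # min-heap of (release_time, sequence, packet)
        self._seq = 0     # tie-breaker so packets never get compared directly

    def receive(self, packet, now):
        """Accept a packet at time `now` and schedule its release."""
        heapq.heappush(self._held, (now + self.delay_s, self._seq, packet))
        self._seq += 1

    def release_due(self, now):
        """Return, in release order, every packet whose hold time has elapsed."""
        out = []
        while self._held and self._held[0][0] <= now:
            out.append(heapq.heappop(self._held)[2])
        return out


gateway = DelayGateway(delay_s=240)     # hypothetical 4-minute light delay
gateway.receive("cmd-1", now=0)
gateway.receive("cmd-2", now=10)
print(gateway.release_due(now=239))     # nothing is due yet
print(gateway.release_due(now=240))     # cmd-1 comes out
print(gateway.release_due(now=500))     # cmd-2 comes out
```

Changing `delay_s` between test runs would stand in for the Earth-Mars distance changing over time.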
They certainly have copies on Earth, and they certainly test things on them. Whether they book time on the JPL space simulator and set everything up to look as much like Mars as possible in it depends on whether it's available and how long that takes, I would think (copies of Ingenuity certainly were tested in it). I would imagine the reason they can talk about the known 15% chance that they'll hit the watchdog timer is because they tested to work that out, for instance.
I would not be shocked to find out the data transmission is a bit slow and erratic and eventually someone finds out it's downloading updates from Earth all the time.
Also why is the footage so crap? A nice 360 of the landscape including the lander would seem possible. First it's uploaded full fat from the drone to the lander, then crunched down into some sort of 50MB file and sent to Earth.
I have the impression nobody works at NASA but rocket engineers and all they do is stand at blackboards doing sums.