So...
How long until the entire 737 fleet (original/classic/next gen/max) is grounded because it was only luck and some chewing gum that kept them in the air after failing all of the modern testing due to a lack of redundancy?
Further details have emerged on the 737 Max flight control software bug discovered at the end of June, with reports suggesting that belated tests by a US regulator found the hitherto unknown bug. The Seattle Times, Boeing's hometown newspaper for many years, explains in detail how timid Federal Aviation Administration regulators …
From what I've read, the actual underlying problem is that the new 737 Max isn't really the same as the previous 737s and in many ways ought to have been given a new name. However, some of Boeing's large customers have a line of "we keep our costs down by only using 737s, so we only have to maintain and train for one plane", so to keep those customers happy Boeing called it a 737 and treated it as a new variant of the 737 line that didn't need any special treatment or training.
Right, not only did they want to be able to claim that it didn't need any training (since Airbus was legitimately making such a claim for its upgraded plane), they didn't bother to tell anyone about the new feature that caused the crashes. One would expect that some executive deliberately decided that, and I wonder if any memos to that effect will eventually be leaked.
The Max would be better described as an all-new aircraft, disguised to look like a 737 in the hope that the FAA would fall for the ruse and allow Boeing and the airlines to train the flight crews with nothing more than a differences course delivered on an iPad.
The Max is no more a 737 variant than a Mack truck is a variant of a Ford Transit.
Basically, when Boeing came to refresh the 737, they wanted to provide significantly improved fuel economy, to counter the much newer and better performing aircraft from Airbus. The problem? They could only do that by fitting new engines. Engines that are so much larger than the previous generation that the Max had to be fitted with extended landing gear to stop the bottom of the new engine cowlings scraping along the runway.
But the problems ran so much deeper than that. The larger diameter engines moved the centre-of-mass of the engines lower. Which moved the centre-of-mass of the aircraft. Which meant not only that the flight surfaces were the wrong size and in the wrong place, but that the entire flight geometry of the aircraft was screwed up.
In a conventional airliner, the flight characteristics [in flight trim] should be level flight: let go of everything and the aircraft should just fly along, straight and level, until the fuel runs out. With the Max, that was impossible, because the mass of the new engines screwed up the handling. So Boeing invented "MCAS" to generate a completely fake flight characteristic for the aircraft. Just as the F-16 is fly-by-wire and inherently unstable [it cannot fly without the computer], the Max used MCAS to keep the thing stable when it would otherwise not be.
That's why the partial failures of the MCAS system caused such catastrophic failures, even when the pilots followed the correct procedures.
There were other issues: for example, the MCAS design changed significantly during development, but when the plane was tested for certification, the old design limits were tested, not the new, more aggressive ones. And of course the inspectors were Boeing employees, recruited and paid by Boeing, not independent FAA inspectors. In other words, a serious regulatory capture issue, right there.
Ultimately, though, the crashes were caused by greed. Airlines didn't want to pay the additional cost of an all-new replacement for the 737. Threatened with loss of sales to rival Airbus, Boeing chose to cut all sorts of corners and basically ended up pushing an old design too far.
In the heart of Boeing are engineers and designers who are brilliant at what they do. These issues aren't "engineering-led", but "management-led". Congress should order the FAA to step aside and have the relevant House Committees conduct a thorough hearing. The House should demand access to internal emails, summon witnesses from engineering and management and fully understand how the series of questionable decisions that led to the Max being certified came to pass.
But there's an interesting fly in this ointment. Unlike aviation regulation agencies elsewhere in the world [say the CAA in the UK as an example] the FAA has two roles. One is to inspect aviation and ensure that it remains safe for the public. The other is to promote the use of aviation to the nation and the world. Those two missions are in uneasy conflict.
You only need to look at recent history, in which the FAA was one of the last agencies to order the aircraft grounded, to see the impact of poor governance.
Bottom line is that Boeing are too important for the balance of trade to take a serious Congressional hit for this, especially given the serial failures the current administration has had with trade deals. Expect this one to be swept under the rug in due course.
I'm not sure that's entirely accurate.
I imagine the Max has no parts in common with the original 1960s design, so in that sense it is a new aircraft, but the whole design concept dates back to then, which is why all the 737 variants are rather distinctive with their low-to-the-ground stance, quite different from everything else. The original 737s had their engines under the wing in a very 1950s configuration.
Whilst AIUI the Max does have some tricks to extend the landing gear a certain amount, fundamentally the problem is that there is no room for longer legs without a complete structural redesign, which would indeed classify it as the all new aircraft they were trying to avoid.
That in turn, given the larger engines, leads to a very unconventional engine placement further forward and with the fans somewhat overlapping the wing - so much *higher* than other craft, instead of engines completely below the wing as is conventional. You can clearly see in the headline photo the unconventional engine placement. Compare to say this: https://simpleflying.com/wp-content/uploads/2019/04/Slide1-2-700x394.jpeg
That was where the problem came in. AIUI (and of course there are as many versions of this as there are self-appointed pundits, like me!), at high angles of attack near the stall they ran into very unpleasant, prohibited handling characteristics, where control forces reduced when they should have increased. Not that anyone should fly the aircraft into that region, but nevertheless a fix was needed. The original concept of MCAS dealt with this, and was probably OK. Then, my understanding is, our 'good' friend feature creep came calling. Now they had a piece of software that was sorting out this major handling issue, and, neat, it looked ideal for sorting out some other issues too. So it got tweaked and its authority extended further into the flight envelope, and the rest is well documented.
I heard a similar explanation from an aeronautical journalist on the CBC (Canada), only the way he put it was that with the new wings and the new engines, Boeing should have designed a new airframe, but to do it on the cheap they chose to bolt the new hardware onto the old 737 airframe, thus screwing up the flight geometry and necessitating a software "fix". They also got to keep the old 737 designation.
Haven't seen anything alluding to this in the mainstream press.
Definite upvote for mentioning this and the more than likely outcome
Congrats for wrapping up the whole MAX fiasco. I read pprune.org from time to time, and the above does indeed seem to be the consensus amongst pro pilots.
Upvote incoming.
This isn't the first time the FAA's conflicted role has contributed to a crash. The 1974 Paris DC-10 crash was caused by a design fault that was known about before the crash, but the FAA allowed a soft-pedalled approach to correcting the defect to take the pressure off McDonnell Douglas.
I doubt that this one is going safely under the rug...
The negative publicity that the aircraft has attracted and warranted, allied to that old favourite "instead of fixing it, let's change the name!" (known as the Windscale Waltz for those old enough to remember pre-Sellafield nuclear power) should be enough to hole Boeing's sales plans fatally beneath the waterline...
Expect Ryanair to make a killing acquiring some very reasonably priced ones in the near future!
How long until the entire 737 fleet (original/classic/next gen/max) is grounded because it was only luck and some chewing gum that kept them in the air after failing all of the modern testing due to a lack of redundancy?
It's not just redundancy. They now have two of each for the computers and sensors... there probably should be three to "vote" as if one burps now, which one of the two is the problem?
The AoA sensors are not critical in that the plane can fly just fine with them as long as the pilots (and the flight control computers) know to ignore that input modality altogether with a failure (and two sensors are fine for that). Now airspeed, attitude and altitude information is actually critical and those sensors/systems are typically (at least) triple redundant (and there are always even those steam gauges for them).
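The point that two vanes are enough to *detect* a failure, though not to say which vane failed, can be sketched as a simple disagree monitor. This is a minimal Python illustration, not how any real 737 computer works; the class name `AoaMonitor`, the 10-degree limit and the three-sample persistence count are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values for illustration only; real limits come from the certification basis.
DISAGREE_LIMIT_DEG = 10.0   # max allowed difference between left and right vanes
PERSISTENCE_SAMPLES = 3     # consecutive disagreeing samples before the pair is rejected


@dataclass
class AoaMonitor:
    """Hypothetical two-vane AoA monitor: detects disagreement, cannot isolate it."""
    limit_deg: float = DISAGREE_LIMIT_DEG
    persistence: int = PERSISTENCE_SAMPLES
    disagree_count: int = 0
    latched_invalid: bool = False

    def update(self, left_deg: float, right_deg: float) -> Optional[float]:
        """Return an averaged AoA, or None when the data must not be used."""
        if self.latched_invalid:
            return None  # stays invalid until a (hypothetical) maintenance reset
        if abs(left_deg - right_deg) > self.limit_deg:
            # With only two sensors we know *that* they disagree, not *which*
            # one is wrong, so neither value is published.
            self.disagree_count += 1
            if self.disagree_count >= self.persistence:
                self.latched_invalid = True
            return None
        self.disagree_count = 0
        return (left_deg + right_deg) / 2.0
```

Anything downstream that depends on AoA would then be inhibited whenever `update` returns `None`, which is the "ignore that input altogether" behaviour described above.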
>pilots ... know to ignore that input modality
The computers make the plane feel and fly like a regular 737 so there is no need for any re-training
(Unless the computer fails, in which case it flies and feels like no other aircraft you have ever seen)
- just put that in the small print and we are ok.
"The AoA sensors are not critical in that the plane can fly just fine with them as long as the pilots (and the flight control computers) know to ignore that input modality altogether with a failure"
... Yet AoA seems to be a recurring theme in airliner pilot-error accidents; it was a key parameter in the AF447 accident as well. From this armchair pilot's point of view, it looks like you really need working AoA instrumentation to fly the aircraft "instruments only" in bad conditions.
MeToo
I remember when fly-by-wire was first mooted, the principle was to use three systems, with a common high-level system design but independently developed, then have 2-out-of-3 voting on the outputs.
Whatever happened to that principle for safety-critical systems?? Presumably too expensive?
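The 2-out-of-3 idea described above can be sketched for both kinds of channel output: a median (mid-value) select for continuous values and a majority vote for discrete ones. This is a minimal Python illustration assuming the three channels fail independently; the function names and the tolerance are hypothetical.

```python
from typing import Optional, Sequence


def mid_value_select(a: float, b: float, c: float) -> float:
    """Median of three channel outputs: one channel failing high or low is outvoted."""
    return sorted((a, b, c))[1]


def two_out_of_three(votes: Sequence[bool]) -> bool:
    """Majority vote over three discrete channel outputs."""
    if len(votes) != 3:
        raise ValueError("expected exactly three channels")
    return sum(votes) >= 2


def identify_outlier(values: Sequence[float], tolerance: float) -> Optional[int]:
    """Index of the single channel that disagrees with the median, else None.

    Returns None both when all channels agree and when more than one channel
    disagrees (in which case the faulty lane cannot be isolated).
    """
    if len(values) != 3:
        raise ValueError("expected exactly three channels")
    mid = mid_value_select(*values)
    outliers = [i for i, v in enumerate(values) if abs(v - mid) > tolerance]
    return outliers[0] if len(outliers) == 1 else None
```

With independently developed channels, a design error has to show up in at least two lanes at once before the voter lets it through.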
I work with rail safety SW. We only need 2 channels (SIL4) since the emergency brake is always an option. It's possible to use 2 flight computers if they have internal safety redundancy such as lockstep; if so, 2 computers is just fine. Reading this article, it seems they are not. OMG!
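The two-channel (2-out-of-2) scheme described above works because the train has a safe state to fall back on. A minimal sketch of that comparison follows; the channel functions, the 80 km/h threshold and the `EMERGENCY_BRAKE` output are hypothetical, chosen only to show the pattern.

```python
from typing import Callable

SAFE_STATE = "EMERGENCY_BRAKE"  # hypothetical safe output; an aircraft has no equivalent


def compare_2oo2(channel_a: Callable[[float], str],
                 channel_b: Callable[[float], str],
                 speed_kmh: float) -> str:
    """Run both channels on the same input; any mismatch forces the safe state."""
    out_a = channel_a(speed_kmh)
    out_b = channel_b(speed_kmh)
    if out_a != out_b:
        return SAFE_STATE
    return out_a


def healthy_channel(speed_kmh: float) -> str:
    # Hypothetical rule: cut traction above 80 km/h.
    return "COAST" if speed_kmh > 80.0 else "MOTOR"


def corrupted_channel(speed_kmh: float) -> str:
    # Same rule with a corrupted constant, e.g. after a memory upset.
    return "COAST" if speed_kmh > 800.0 else "MOTOR"
```

At 100 km/h the healthy and corrupted channels disagree and the comparator commands the brake; at 50 km/h they happen to agree, so the latent fault goes unnoticed until it matters. Stopping is acceptable for a train, but an airliner cannot simply stop, which is the usual argument for fail-operational architectures in flight controls.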
For relevant and interesting reading I would recommend looking up QF72. It was an Airbus back in 2008 that did a brutal nose dive due to a single faulty angle of attack sensor value. Sound familiar?
It might have been a bit flip due to cosmic (or other) radiation but the root cause was never determined. It's a long story.
It's not just redundancy. They now have two of each for the computers and sensors... there probably should be three to "vote" as if one burps now, which one of the two is the problem?
... and the sensors should ideally be of different types, to guard against any systematic error that might lead identical sensors to mis-read in the same way under unusual circumstances.
This isn't rocket science. Other jets -- even other Boeings -- have multiply-redundant systems of different types for this very reason.
Who knows. The problem that the US government has is that both the Boeing and FAA brands are, internationally speaking, toast. That is at least partly its fault (i.e. multiple Administrations of all flavours going back decades have not ensured that the FAA is doing its job properly). The stables need a thorough cleaning, and part of that will be showing the world that this is being done thoroughly. Letting those responsible off the hook would look like a job not done properly. Given the alacrity with which federal prosecutors have leapt into action, one suspects that the USG has recognised the danger of doing nothing.
(to wife): Yes dear, the decorating will be done by Christmas
(Wife to me): "In which case, I'd better get on with it because I don't trust you with a paint brush.."
(I do stuff that goes "beep"[1]. She does the rest).
[1] Also stuff that goes "meow" or "woof"[2]. And stuff that's fried or wok'ed or not standard British cuisine.
[2] Unless it involves getting up at unsociable hours of the morning (like 6:30am), in which case we (sort of) share the duty. One day YoungestDog *will* learn how to use the cat door so that he can go out to relieve his bladder in the garden.
So it turns out the FAA is to blame as well as the manufacturer. Were all the other aviation authorities also sleeping at the wheel?
None of them picked these faults up until it was way too late. While it's easy to throw blame at the FAA and Boeing, and they deserve it, why didn't anyone else pick it up?
On another note if this happened to SpaceX with maybe 4 people on board they would be lucky to ever get clearance again. 300 people have died due to these oversights. It is not acceptable!
Definitely not all agencies. I knew someone from the NTSB in the pre-computer days who claimed to have investigated more FAA-caused crashes than pilot-error crashes. Keep in mind that the FAA is still the equivalent of Ajit Pai's Verizon-owned FCC in that they "work" for Boeing and the airlines and not for the flying public.
"So it turns out the FAA is to blame as well as the manufacturer."
That's what happens when you get regulatory capture.
"Were all the other aviation authorities also sleeping at the wheel?"
The FAA was supposed to be independent. Making a big fuss and calling the FAA "captured" would have moved the trade wars the USA was already waging under the table into wide open mode.
This way, the FAA was publicly shown up and the other agencies simply have to fold their arms until it's sorted out, and ALL FAA decisions get to be second-guessed for the foreseeable future. It's a pity 300+ people had to die, but their relatives can sue Boeing and the US government.
Don't worry, Boeing's software and hardware are (finally) going to be gone over with a fine tooth comb, just like the USA has been doing with everyone else's kit for years.
> and ALL FAA decisions get to be second-guessed for the foreseeable future
Going to make international flights tricky for a while.
Everyone else bans Boeing aircraft until their own agencies have checked them out - in response the US bans airbus from its airspace.
Looks like the only people left flying are Aeroflot
Tupolevs, and they are the greatest planes ever, Trump's BFF says so.
As a matter of fact, the majority of the Aeroflot fleet consists of various Airbus models (about 135). Aeroflot also has 66 Boeings and 49 Sukhoi Superjets. I can't find a single Aeroflot Tupolev in the current fleet.
"Everyone else bans Boeing aircraft until their own agencies have checked them out - in response the US bans airbus from its airspace."
The reason that would be absolute proof of protectionism is that the FAA already certify Airbus airframes themselves, and don't rely on European regulators for that. If they banned Airbus because Boeing murdered a bunch of people and now nobody trusts them, it would be so blatant we might as well send the trade war nuclear.
So it turns out the FAA is to blame as well as the manufacturer.
Blame for the FAA's diminished role (and credibility) lies heavily with Congress and the Obama and Trump administrations. Boeing lobbied very heavily for the so-called "self-certification" regime.
In terms of getting regulatory practices changed, they got what they paid for. Tragically, their bribery exercise of monetary free speech killed almost 400 innocent people.
> Tragically, their bribery exercise of monetary free speech killed almost 400 innocent people.
Thanks to Trump, the same is happening in many other industries, with many more people dying, such as coal, oil, gas fracking, ... It makes one wonder who is 'listening' to all the monetary free speech.
I don't remember which politician got the deregulation and industry self-regulation ball rolling, but pointing at the Dems (and it may have been them) is disingenuous, because, as you know, the GOP also fully supports whatever its financial contributors point it at.
This is another example of why there needs to be no private money, strict spending limits on campaigns, and no professional lobbying (highly regulated and recorded lobbying) in politics.
As a side note and whether there is an FAA or not, Boeing have been building aircraft for a very long time and there is no excuse for them to build poorly designed aircraft including any of the systems required for safe flight. Having said that maybe those in charge of the largest shareholdings and the top corporate officers should be jailed for murder. There is no accountability.
I assume the post-crash Boeing software versions are release candidates for the FAA and so have undergone internal testing.
If so the quality is appalling, considering the situation.
I'm beginning to wonder if Boeing and the FAA have some underhand deal going on here, where Boeing lay honey traps and let the FAA know, to make the FAA look good for the media.
The FAA shills "earn" their stripes (and future private sector positions) and Boeing gets their craft certified by October.
The October date is the only thing Boeing cares about.
I mean how else can you explain this shitty software that pre-crash would have been the actual flight software used?? Is this really the standard of Boeing software in their other craft?
"The FAA shills "earn" their stripes (and future private sector positions) and Boeing gets their craft certified by October."
Certified by the FAA perhaps.
EASA will look at the FAA's inspections and run their own sets if they're not satisfied, as will CASA (who don't have a dog in the fight as Australia/NZ don't build aircraft)
I'd say Transport Canada and Dirección General de Aeronáutica Civil will take a very close look at the very least, just like Agência Nacional de Aviação Civil. And given the current trade war Trump unleashed, CAAC will most likely declare the FAA certification insufficient, even if only for political reasons.
"EASA will look at the FAA's inspections and run their own sets if they're not satisfied, as will CASA (who don't have a dog in the fight as Australia/NZ don't build aircraft)"
I'm not sure about Australia nowadays, but there is certainly an aircraft building industry in New Zealand; in fact a P-750 XSTOL was spotted at a NORK airshow, where it wasn't supposed to be. Both these and the Cresco topdressing aircraft from which it was derived are built in commercial quantities, and there was the NZ-designed and built Bennett PL-11 Airtruck (Transavia also built a PL-12 Airtruk variant in Australia).
There is also a specialist industry building replica WW1 aircraft - plus two of the world's three flying Mosquitos were built here.
Admittedly only general aviation level, but still aircraft manufacture nonetheless.
Or a real hardware engineer. That's the class I learned about ECC codes in.
But my degree is in Computer Engineering, so I had hardware and software classes. And it's been a long, long time, so maybe it was in a software class too.
A classic case of the brainy and arrogant nerds designing something outside of real practical knowledge of the real world, i.e. ignoring the pilots who fly the bloody thing. In fact they'd be thinking that they were so smart they can replace the humans altogether....yeah, right, f'k off AI.
Simpler than that really. Boeing getting spanked by Airbus in the market, sales executives demand a better cheaper plane with limited development time, some technical work outsourced offshore, deadlines need to be met, shortcuts and confusion kick in, blam.
I've seen much the same at other companies, although frankly usually when designing a Big Mouth Billy Bass.
"A classic case of the brainy and arrogant nerds designing something outside of real practical knowledge of the real world..."
I wonder if Boeing employed the same software engineers that programmed the chassis ECU on my BMW. When one wheel speed sensor failed it indicated all three GOOD sensors were faulty !!!
This is such a basic thing to miss in such a system :-(
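One way to avoid blaming the good sensors is to compare each sensor with the median of the others, so that only the odd one out is flagged. This Python sketch is hypothetical: it is not BMW's logic, real ABS code must also allow for cornering and wheel slip, and the 10% tolerance and wheel names are assumptions. Comparing every wheel against a single reference wheel is offered only as one plausible, speculative way to get the "all good sensors faulty" result.

```python
from statistics import median
from typing import Dict, List


def faulty_wheel_sensors(speeds: Dict[str, float], rel_tolerance: float = 0.1) -> List[str]:
    """Flag sensors that disagree with the median of the *other* sensors.

    A naive check that compared every wheel against one reference wheel
    would flag all the healthy wheels if that reference were the one that failed.
    """
    flagged = []
    for name, value in speeds.items():
        others = [v for other, v in speeds.items() if other != name]
        reference = median(others)
        # max(..., 1.0) stops the tolerance collapsing to zero when stationary
        if abs(value - reference) > rel_tolerance * max(reference, 1.0):
            flagged.append(name)
    return flagged
```

With readings of 100, 101, 99.5 and 0 km/h, only the sensor reading zero is flagged, because the median of the other three stays near 100 whichever wheel is being checked.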
It can actually be quite hard to guard against in code, as the abstract machine used for its execution (at least for C and C++) does not model memory corruption, which means the compiler can optimize away any checks for it! Which is why standards such as DO-178 require code coverage to be demonstrated in the code binary and not just at the source code level.
Much easier to catch if you're using hardware with some sort of memory ECC or parity hardware.
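Where ECC hardware isn't available, a common software defence is to store critical values twice, once inverted, and check the pair on every read. The Python sketch below shows only the logic; in C the two copies would typically be accessed through `volatile` so the compiler must actually perform both reads rather than assuming memory cannot change. The class name and the 16-bit width are hypothetical.

```python
WORD_MASK = 0xFFFF  # hypothetical 16-bit word


class ComplementedWord:
    """Keep a value alongside its bitwise complement and verify on every read."""

    def __init__(self, value: int) -> None:
        self.write(value)

    def write(self, value: int) -> None:
        self._value = value & WORD_MASK
        self._shadow = ~value & WORD_MASK  # inverted copy

    def read(self) -> int:
        # If either copy has had a bit flipped, the pair no longer complements.
        if self._value != (~self._shadow & WORD_MASK):
            raise RuntimeError("memory corruption detected")
        return self._value
```

Storing the complement rather than an identical copy also catches faults that corrupt both locations to the same value (both cleared to zero, say), which two identical copies would happily agree on.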
https://en.wikipedia.org/wiki/ECC_memory#Problem_background
"error rates increase rapidly with rising altitude; for example, compared to sea level, the rate of neutron flux is 3.5 times higher at 1.5 km and 300 times higher at 10–12 km (the cruising altitude of commercial airplanes).[4] As a result, systems operating at high altitudes require special provision for reliability."
Which is why avionics invariably use EDAC memory.
Coincidentally, there are some comments relevant to this discussion in a recent article and the commentards have touched on bit-flipping errors...
https://www.theregister.co.uk/2019/08/01/esa_encrypted_commsi/
In particular this comment by @Norman Nescio...
https://forums.theregister.co.uk/forum/all/2019/08/01/esa_encrypted_commsi/#c_3839602
Unfortunately even EDAC may not work well in some scenarios.
IIRC, in the early days the problem was actually radiation emitted by the semiconductor packages: the ceramic itself could contain alpha and beta emitters. Thus, in memory systems where the devices were 1 bit wide (so, 22 discrete ICs for a 16-bit CPU), this would work well; the chance of two packages flipping a bit in the same word simultaneously is almost zero.
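The 22-IC figure matches what a single-error-correcting, double-error-detecting (SECDED) Hamming code needs for a 16-bit word: 5 Hamming check bits plus 1 overall parity bit. A minimal Python sketch follows; it is a textbook extended Hamming code, not the EDAC scheme of any particular avionics part, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

CODE_LEN = 21  # Hamming positions 1..21; index 0 holds the overall parity bit


def check_bits_needed(m: int) -> int:
    """Smallest r with 2**r >= m + r + 1 (needed for single-error correction)."""
    r = 0
    while 2 ** r < m + r + 1:
        r += 1
    return r


def _is_data_position(pos: int) -> bool:
    return (pos & (pos - 1)) != 0  # powers of two are check-bit positions


def encode(data: int) -> List[int]:
    """Encode 16 data bits into a 22-bit SECDED codeword (list of 0/1)."""
    bits = [0] * (CODE_LEN + 1)
    d = 0
    for pos in range(1, CODE_LEN + 1):
        if _is_data_position(pos):
            bits[pos] = (data >> d) & 1
            d += 1
    for p in (1, 2, 4, 8, 16):
        # each check bit makes the parity over the positions it covers even
        bits[p] = sum(bits[i] for i in range(1, CODE_LEN + 1) if i & p) % 2
    bits[0] = sum(bits) % 2  # overall parity over all other bits
    return bits


def decode(bits: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    bits = list(bits)
    syndrome = 0
    for pos in range(1, CODE_LEN + 1):
        if bits[pos]:
            syndrome ^= pos
    overall = sum(bits) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= CODE_LEN:
        bits[syndrome] ^= 1  # syndrome 0 means the overall parity bit itself flipped
        status = "corrected"
    else:
        # even number of flips, or a syndrome pointing outside the word
        return None, "uncorrectable"
    data, d = 0, 0
    for pos in range(1, CODE_LEN + 1):
        if _is_data_position(pos):
            data |= bits[pos] << d
            d += 1
    return data, status
```

`check_bits_needed(16)` gives 5, so 16 + 5 + 1 = 22 bits; the same rule gives the familiar 72 bits for a 64-bit word. Note that three or more flips in one word can produce a syndrome pointing at a valid position, in which case this code "corrects" the wrong bit, which is the multi-bit-upset concern raised next.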
But when it comes to high altitude neutrons or high energy particles with modern memory, where an event can cause a shower of particles and the memory is multiple bits wide, what happens then? Flipping a single bit, or even enough bits in one word to prevent error correction for that word, may not be enough.
EDAC will protect against certain event classes but not all, which is why processor redundancy is needed.
"which means the compiler can optimize away any checks for it!"
One would hope that any avionics systems would be using specialist compilers, preferably at least partly formally proven, that are specifically designed not to do this. I'd be rather nervous getting on a plane whose control system had been compiled with "gcc -O3"!
"Astonishingly, until the 737 Max crashes, the aircraft was flying with no redundancy at all for the flight control computers." This statement is completely incorrect; there are two backup systems for the FCC, and they are called pilots. It is one of the reasons that the pilots are there: to take over WHEN automation fails.
Unlike some commercial aircraft, the automatics can all be switched off in the 737 and it can then be flown just like any manual aircraft. Unfortunately, the airline beancounters do not like the expense of training completely manual flight and will often reprimand crews that try to practise manual flying. The result is that, unlike in earlier decades when pilots were proud of their manual flying skills, modern pilots tend to avoid actually flying the aircraft manually apart from the very constrained periods of take-off and landing.
Automation is geared to fail over to manual operation in both Airbus and Boeing aircraft (Air France 447 is a case where the failover to pilot control resulted in a crash). The problem is that the automatics hand over to the flight crew when there is a problem, so not only might this be the first manual flying the pilot has done at height in that aircraft _ever_, it is also recovery from the problem that caused the automatics to fail.
Imagine a world with self driving cars where drivers had operated for years with only needing to assist in parking and leaving a parking slot. Now imagine if there is a failure the car reverts to a manual (stick shift) with no synchromesh so needing double declutching and heel and toe braking and gear change, no power steering or power brakes AND it has a problem and is in heavy traffic. This gives a very simplistic idea of what can happen in the cockpit when the automation drops out and hands control to the pilot.
Boeing designed the 737 decades ago for pilots who like to fly 'manually' but many of the new generation of pilots are not capable of flying manually to the same level as they are not allowed the training or experience.
If the aviation industry makes the automation even more efficient and reliable, with no need to fail over, then the experience levels of the flight crews will become even lower and they will be of even less use in an emergency. The best way forward would then be never to fail over to the flight crews, and indeed to operate without them as fully automated autonomous aircraft; there are many flying now for the military. Expect to see them at an airport near you sooner than you would think.
Would you fly in a fully automated aircraft? You are actually already doing so for most of the flight and the exceptions the flight crew are there to handle they often (demonstrably) cannot. So the decision is not in your hands.
> automatics can all be switched off in the 737 and it can then be flown just like any manual aircraft.
No. The 737 MAX is unstable in that it is divergent in pitch due to the engines, and in particular the cowlings, being too far forward aerodynamically, and the engines producing more power (required because of extra weight compared to old 737). Manual control in many situations can not be done fast enough to keep the aircraft from pitching up and stalling. Recovering from a stall leads to another pitch up. This is not a matter of training.
It's only divergent in pitch at AoA near the stall, hence MCAS only kicking in during manual flight, flaps up, at high angles of attack. Boeing allow the trim cut-out to disable MCAS, if the aircraft couldn't be flown without it that wouldn't be an option. The problem being if you're massively out of trim it's next to impossible to manually trim due to the forces through the system.
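The activation conditions described above (manual flight, flaps up, high AoA, with the trim cut-out as an off switch) can be written as a single guard function. This is a deliberately simplified, hypothetical sketch, not Boeing's logic: the 15-degree activation and 5-degree disagree thresholds are placeholders, and the two-vane cross-check and once-per-event limit reflect the kind of changes publicly reported for the revised software rather than the original single-sensor design.

```python
from dataclasses import dataclass

# Placeholder thresholds for illustration only.
AOA_ACTIVATION_DEG = 15.0
AOA_DISAGREE_LIMIT_DEG = 5.0


@dataclass
class FlightState:
    autopilot_engaged: bool
    flaps_up: bool
    aoa_left_deg: float
    aoa_right_deg: float
    stab_trim_cutout: bool  # cut-out switches set: no electric trim at all


def nose_down_trim_permitted(s: FlightState, activated_this_event: bool) -> bool:
    """Hypothetical guard for an MCAS-like automatic nose-down trim command."""
    if s.stab_trim_cutout or s.autopilot_engaged or not s.flaps_up:
        return False  # only in manual flight with flaps up, never after cut-out
    if abs(s.aoa_left_deg - s.aoa_right_deg) > AOA_DISAGREE_LIMIT_DEG:
        return False  # refuse to act on a single (possibly failed) vane
    if activated_this_event:
        return False  # at most one activation per high-AoA event
    return min(s.aoa_left_deg, s.aoa_right_deg) > AOA_ACTIVATION_DEG
```

With one failed vane reading 74.5 degrees and the other 5 degrees, this guard refuses to trim; a guard that trusted only one vane would instead see a high AoA and command nose-down trim.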
Technically it's not truly divergent, however the stick force required to raise the nose a given amount reduces at high AoA. So unless the pilots are aware of the issue the effect would be similar as the pitch rate would increase while they held the stick in a fixed position.
"...so that it can stick to its announced schedule"
Aye, that's right, lets not have any trivial things like reliability or safety get in the way. The top brass is losing out on their bonuses so we must stick to the schedule come hell or high water.
Speaking personally I know where they can stick their flying coffin.
What gasted my flabber is the full quote:
"According to a third person familiar with the details, Boeing expects to have this new software architecture ready for testing toward the end of September. Meanwhile, it will continue certification activities in parallel so that it can stick to its announced schedule and hope for clearance from the FAA and other regulators in October."
Surely certification should be dependent on the results from testing?
Even if something is found in testing which requires a rewrite or even a redesign, the corporate pressure to get the plane out means it's simply not going to happen.
Unless there's a whistleblower, that plane is going to fly in autumn, at least in the US. The post-Brexit CAA rubber-stamping approval because the FAA did is also a horrible possibility.
Worked on critical infrastructure and fully redundant systems for years. The issue is that memory failures, CPU failures, I/O failures, storage failures, etc. are common.
These idiots didn’t have all this worked out and just slapped two systems to go into master/slave mode.
There will be more issues going forward. Doesn’t make me feel confident about their ability to design redundancy into their computing systems.
$9-an-hour engineers can’t do it.
> what the fuck has boeing degenerated into?
McDonnell Douglas.
I wouldn't be the least bit surprised if all of the new crop of software team members are well under 35 years of age and have next to no avionics development experience.
Personally I have a strong suspicion that some of the sensors have unshielded leads and/or traces that are too close, so they are getting cross-talk and data corruption. Or, just as bad, rampant failure to do bounds checking, failure to zero reused buffers, or logic written in such a way that data structures are unpacked utterly wrong (transposing altitude for AoA) when certain fields are set "wrong"; perhaps a really shitty CAN bus protocol handler with race conditions?
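To make the bounds-checking and buffer-zeroing point concrete, here is a minimal Python sketch of defensive decoding for a fixed-layout sensor frame. The frame layout, message id, checksum scheme and plausibility window are all hypothetical, invented for illustration; nothing here reflects the actual 737 data formats or any real CAN protocol. Fields are only decoded after the length, checksum and message id have been validated, so an altitude message can't be mistaken for AoA, and the reused receive buffer is zeroed after every read.

```python
import struct

# Hypothetical 8-byte frame layout, made up for illustration (not a real 737 or CAN message):
#   bytes 0-1  message id, uint16 big-endian
#   bytes 2-3  angle of attack in hundredths of a degree, int16
#   bytes 4-5  altitude in tens of feet, uint16
#   byte  6    sequence counter, uint8
#   byte  7    checksum = sum of bytes 0-6 modulo 256
FRAME_LEN = 8
AOA_MSG_ID = 0x0101
AOA_RANGE_DEG = (-20.0, 40.0)  # assumed plausibility window


class FrameError(ValueError):
    """Raised when a frame fails any validation step."""


def build_frame(msg_id: int, aoa_centideg: int, alt_tens_ft: int, seq: int) -> bytes:
    """Pack the fields and append the checksum byte."""
    body = struct.pack(">HhHB", msg_id, aoa_centideg, alt_tens_ft, seq)
    return body + bytes([sum(body) % 256])


def decode_aoa_frame(frame: bytes) -> tuple:
    # Bounds check before touching any field.
    if len(frame) != FRAME_LEN:
        raise FrameError(f"bad length {len(frame)}")
    if sum(frame[:7]) % 256 != frame[7]:
        raise FrameError("checksum mismatch")
    msg_id, aoa_raw, alt_raw, seq = struct.unpack(">HhHB", frame[:7])
    # Check the id, so an altitude message is never decoded as AoA.
    if msg_id != AOA_MSG_ID:
        raise FrameError(f"unexpected message id {msg_id:#06x}")
    aoa_deg = aoa_raw / 100.0
    lo, hi = AOA_RANGE_DEG
    if not lo <= aoa_deg <= hi:
        raise FrameError(f"AoA {aoa_deg} deg outside plausible range")
    return aoa_deg, alt_raw * 10, seq


def consume(rx_buf: bytearray) -> tuple:
    """Decode a reused receive buffer, then zero it whether or not decoding succeeded."""
    try:
        return decode_aoa_frame(bytes(rx_buf))
    finally:
        rx_buf[:] = bytes(len(rx_buf))
```

Rejecting a frame is only half the job; a real system would also need to decide what to do when valid frames stop arriving.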
In the old days, Airbus IIRC used 3 systems (2 slaves and an arbitrator), with the software for the 3 systems being written by different teams. A few years ago I read somewhere that they wanted to move to all 3 running the same software/single team design, obviously for reasons of cost, and there was also some mention of offshore/outsourced development. No idea if those changes were actually implemented.
The Boeing 777 (not the new version which I have no knowledge of) uses a triplex flight control computer architecture and as you note, the processors in each lane (as they are known) are different architectures (known as processor dissimilarity).
The reason is quite simple really: it is possible that there is an unknown microcode flaw in a processor, but by using different architectures, the probability of a microcode flaw showing up at the same time in all 3 architectures becomes vanishingly small.
The safety requirement for such systems is that the probability of catastrophic failure is < 10^-9 per flight hour.
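A rough way to see why triplex lanes (and dissimilar ones) matter for a 10^-9 per hour target: if lanes fail independently with some per-hour probability p, all three failing together goes like p^3, but any common-mode cause, such as one flaw shared by every lane, adds directly to the total. The sketch below assumes independence, ignores exposure time, latent faults and repair, and uses a made-up per-lane figure; it is an illustration, not a certification calculation.

```python
TARGET_PER_HOUR = 1e-9  # catastrophic failure probability target quoted above


def p_all_lanes_fail(p_lane: float, lanes: int) -> float:
    """P(every lane fails in the same flight hour), assuming the lanes fail independently."""
    return p_lane ** lanes


def p_with_common_mode(p_lane: float, lanes: int, p_common: float) -> float:
    """Add a common-mode cause (e.g. one design flaw shared by every lane) that fails all lanes at once.

    Treats the two causes as independent: P(A or B) = P(A) + P(B) - P(A) * P(B).
    """
    p_ind = p_all_lanes_fail(p_lane, lanes)
    return p_ind + p_common - p_ind * p_common


if __name__ == "__main__":
    p_lane = 1e-4  # hypothetical per-lane failure probability per hour
    for lanes in (1, 2, 3):
        p = p_all_lanes_fail(p_lane, lanes)
        print(f"{lanes} lane(s): {p:.1e} per hour, meets target: {p < TARGET_PER_HOUR}")
    # A shared flaw swamps the benefit of triplication.
    print(f"3 lanes + 1e-6 common mode: {p_with_common_mode(p_lane, 3, 1e-6):.1e}")
```

With these placeholder numbers, one or two independent lanes miss the target and three clear it comfortably, yet a common-mode term of 1e-6 puts the system straight back above it. That is the argument for processor dissimilarity.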
Clearly that aircraft was under a completely different team from the 737 MAX.
A bit of Pepsi promotional history from 1996...
They didn't even have them in master/slave mode.
Only ONE computer was active. The active one toggled between flights. Lose that and you have NO flight computer operating.
Two computers in master/slave mode isn't sufficient anyway. You always need an odd number in order to avoid voting deadlocks.
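The odd-number point can be shown with a minimal majority voter. This is a sketch assuming each channel produces a discrete command (the command names are hypothetical); real flight control voting usually works on continuous values with tolerances, which this ignores. With two channels that disagree there is no majority; with three, a single faulty channel is outvoted.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(outputs: Sequence[str]) -> Optional[str]:
    """Return the value a strict majority of channels agree on, or None if no majority exists."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


if __name__ == "__main__":
    # Two channels that disagree: no way to tell which one is right.
    print(majority_vote(["NOSE_DOWN", "HOLD"]))           # None
    # Three channels with one dissenter: the faulty channel is outvoted.
    print(majority_vote(["NOSE_DOWN", "HOLD", "HOLD"]))   # HOLD
```

An odd count alone doesn't guarantee an answer, though: if all three channels disagree, the voter still returns None, which a real system would have to treat as a failure.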
This stuff was solved 40+ years ago in avionics. If Boeing's screwed it up that badly then there are serious safety ramifications across their entire range.
I've had a little bit of experience with this type of work as well. You can't really do the traditional 'write the code and just run tests on it' methodology, because all you're doing is proving that under the circumstances of the test it didn't fail. (It's a variation on the well-known yet totally useless "it ran overnight / over the weekend" test methodology.) Reliability has to be baked in at the design stage; it can be a very tedious process and it's definitely not bean counter friendly. It's also not a programming task: the implementation will probably involve writing code, but it's not 'software' as many applications programmers understand it.
I should feel a bit sorry for Boeing because if they're finding bugs in a system that's supposed to be already deployed then the entire system design is suspect. The sympathy passes, though, when I realize that these planes were characterized by packing in the punters so tightly that they set new standards in passenger discomfort (including the notorious restroom which is so small that nobody can actually use it).
it can be a very tedious process
Especially as a lot of it will be testing for obscure edge cases that people unfamiliar with safety-critical systems just won't think about.
"Oh - we didn't think about that" is not a good thing to hear from a designer of safety-critical systems.
They need the big red button in the center of the console that says "in case of any in-flight instability, press here to disconnect computer". Those jets are designed so that a trained pilot would find it difficult to do anything but keep them in the air under manual control; they just need the basic fly-by-wire systems functional. Yes, the software was bad and the lack of redundancy was bad, but the failure to train pilots to simply shut the system in question down at the first sign of trouble is another big factor. Lots of pilots were pissed off when they discovered this issue was "considered not to be important enough to run through their simulator training sessions".
My thoughts exactly, although I am neither an engineer nor a nerd so I may be shot down in flames .... uh ... yeah ... but - it was George Bernard Shaw who said with one watch you always know the time, but with two (which are always slightly different) you never know. With a third, and fourth etc. how does that get any better?
Surely the third system should be the pilot. Could the two planes which crashed not have survived, since by all accounts the pilots were trying to do the right thing instinctively? OK, I am really sticking my neck out, but is it not basic flying skills at least to maintain or restore a level flight path, no 21st-century hocus pocus required?
And if not, why not?
No, because these new airframes aren't manually flyable (according to the standards)
The original 737 back in the 60s was manually flown, and the flight computers were indeed "pilot aids".
Over the years, Boeing have increased the scope of those "aids" and reduced pilot training requirements until the 737-8200/Max has become impossible to fly if they're not functioning perfectly. They aren't "aids" anymore, they're essential.
And yet Boeing still designed them with the complete lack of redundancy that's fairly reasonable for an optional aid, but utterly, recklessly stupid for how essential they actually are.
One wonders which other Boeing aircraft are similarly designed, and merely have a wider error margin due to accident of operating conditions or pilot history?
I think you were downvoted because you completely missed Shaw's point. (I used to know a horologist who was also a Shaw nut).
If you have only one watch and everybody is using it, there are no arguments about the time. It may be going fast or slow, it doesn't matter from the point of view of synchronisation. But from the point of utility, with only one watch you have no way of knowing if it is going wrong.
If you have two watches, you can spot the deviation but do not know which one is wrong.
If you have more than two, it becomes progressively easier to identify which one(s) are going out of tune.
In the early years of marine chronometers, half-pay officers and those on shore duties would pay captains to take their personal chronometers on voyages and monitor their performance. Some ships would have more than 20 on board. Thus over time information was built up on different chronometer types, and by averaging and ignoring outliers, longitude navigation became more reliable.
Imagine you start with two watches that are synchronized: if one drifts, you never know which is wrong. If you start with three, you can tell which is defective because the other two will match. If all three are different, you've probably got bigger problems.
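Shaw's watches translate directly into a median-based check. Here is a minimal sketch, assuming the readings are clock offsets in seconds and a hypothetical agreement tolerance: with two readings it refuses to pick one, with three or more it flags whichever disagrees with the median, and the estimate averages the survivors, much as the chronometer keepers did by ignoring outliers. It gives up when too many readings disagree, the "bigger problems" case.

```python
from statistics import median
from typing import List, Optional


def flag_outliers(readings: List[float], tolerance: float) -> List[int]:
    """Indices of readings that differ from the median by more than `tolerance`."""
    if len(readings) < 3:
        # With two readings the median is their midpoint, so both look equally wrong.
        raise ValueError("need at least three readings to tell which one is wrong")
    mid = median(readings)
    return [i for i, r in enumerate(readings) if abs(r - mid) > tolerance]


def robust_estimate(readings: List[float], tolerance: float) -> Optional[float]:
    """Average the readings that agree with the median; give up if too many disagree."""
    bad = set(flag_outliers(readings, tolerance))
    # n readings can only out-vote up to (n - 1) // 2 faulty ones.
    if len(bad) > (len(readings) - 1) // 2:
        return None
    kept = [r for i, r in enumerate(readings) if i not in bad]
    return sum(kept) / len(kept)
```

For example, `robust_estimate([0.0, 0.4, 12.0], 1.0)` flags the third watch and averages the other two, while `[0.0, 10.0, 20.0]` returns None. This is similar in spirit to the mid-value selection used with redundant sensors, though much simplified.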
... this issue was "considered not to be important enough to run through their simulator training sessions".
Or, perhaps more to the point, this was not an issue in earlier models of 737, and to draw particular attention to it would have highlighted the lie that the 737 MAX was just like the old 737s.
No incentive to get it right.
If left long enough, I've seen regulators in some industries just do the paperwork and not bother checking physical realities, as "we are sure they always do as said" because, well, *previously* they always did. Then at some point the business starts cutting corners for deadlines or cashflow, and it slips past the regulators' eyes until something goes wrong.
This is becoming ludicrous. As more time goes by and more flaws are found the whole situation just stinks to high heaven. Boeing seem to have a major problem with trying to design and implement on the cheap without any care for safety. History has taught us that planes need redundancy on critical systems such as this and it looks as if Boeing are just pissing all this past learning away by designing the 737 Max on the cheap. Meanwhile the inherent flaws in the FAA certification processes have allowed them to get away with this.
Long story short, Boeing only care about selling as many planes as possible for the cheapest possible price. The longer this goes on, the less I'll want to get on a modern Boeing plane in future (and as a frequent flyer, I never thought I'd say this).
Flight control systems operating without any redundancy at all? Just... wow.
Trump just dialed up his idiotic trade war to 11, and this is as close to tailor made for retaliation as anything I've seen.
What would be the odds of the Chinese (who led the grounding of the 737) declaring that they didn't trust the FAA's conclusion (which, let's face it, is something they can quite legitimately claim) and insisting on doing their own tests before re-certifying it? What if they get all the other regulators like Europe to do the same?
They can easily add a few months to the 737 grounding, and all of it totally legit.
"No Sir, we just want to be thorough because your capitalist FAA was clearly compromised, and - wait, it's actually suspicious you try to push this, we should look at all the other FAA certified planes too. We'll disallow each model one by one until we have re-tested them all. Retaliation, seriously? We merely do the job your subverted FAA clearly wasn't doing, before more planes drop out of the sky. With the stress you put on our economy we can't afford the risk"
If the testing is thorough then it could add years (not months) and require full redesign with proper redundancy (at least 3 active computers with voting) and full manual override capability. (The trim wheels should have a manual power assist so that the pilots do not need to be world class strongmen to operate them.)
Requiring pilots to have full type certification on the 737 Max 8 as it differs from previous 737s would also be nasty for Boeing.
Icon for what should happen to Boeing senior management ===============>
> What would be the odds of the Chinese (who led the grounding of the 737) declaring that they didn't trust the FAA's conclusion (which, let's face it, is something they can quite legitimately claim) and insisting on doing their own tests before re-certifying it? What if they get all the other regulators like Europe to do the same?
It is a pretty safe bet that CAAC and EASA will at the very least scrutinise the FAA certification very thoroughly and may well redo it themselves. And I expect more or less the same from other regulatory agencies from countries with a competing aviation industry, like Brazil and Canada. Japan is having enough problems getting its own aviation industry off the ground, and Russia and Ukraine aren't really competing, so those countries and agencies may very well rely on CAAC and EASA.
The sad truth is that people will soon forget all about this, just like they've forgotten all the rudder related issues of the 737.
And just like the big aircraft suppliers have forgotten the past and are firmly back into the 'do it cheap, do everything in house, everyone else are idiots' ways of thinking. Over the last 10 or 15 years everything has become extremely parochial again.
The 737 MAX is one great steaming pile of... It is the only passenger jet aircraft currently out there, to the best of my knowledge, that has a dynamically unstable airframe. Basically, at extreme angles of attack the engine nacelles start to generate lift, pushing the aircraft FURTHER INTO the stall. This should never ever happen for a non-military aircraft. How on earth any regulator can allow this to pass muster is beyond me.
The only planes designed this way are flown by the military and, to quote from the seminal IEEE article by Gregory Travis on the subject, they "are also fitted with ejection seats." So, combined with their no-redundancy approach to engineering and past manufacturing issues (debris in planes, especially near wires; non-conforming parts on the NG; brittle slat rails on the NG and MAX; etc.), my motto firmly remains: "If it's Boeing, I ain't going!"
The only way to teach that lot a real lesson is to permanently ground the MAX and make them start from scratch (and maybe lock up the CEO).
The 737-8200 branding seen on the freshly painted Ryanair plane is almost certainly for the specific variant they have ordered rather than every MAX plane. Their variant has 200 seats and an extra emergency exit to "safely" accommodate a quick exit. Most MAX 8 planes will have ~11 fewer seats, and 737-8189 doesn't sound quite as great....
What's more likely is that the 737 MAX 8 will now just be called 737-8. Same goes for the 7/9/10 variants.
The 737-8200 (aka 737 MAX 200) is indeed a custom high-capacity version of the 737 MAX 8 for Ryanair.
The boring non-marketing designations used in official documentation for the 737 MAX series have always been 737-7, 737-8, 737-9 and 737-10 (not to be confused with the previous-generation 737-700, 737-800 and 737-900 of course...).
The issue is well known in avionics circles which is why I am astounded (yet again!) that it was clearly not considered.
The bit flip is actually caused by atmospheric free neutrons, whose density increases significantly with altitude, with a corresponding increase in the probability of bit flips (which can be both 1-to-0 and 0-to-1 events). These are high-energy particles that can cause Single Event Upsets (SEUs).
In addition, the lead in the tin-lead solder balls on BGA devices contains a small amount of Pb210 (part of the uranium decay chain), which ends up as Pb206 (stable) via Po210 (polonium), an alpha particle emitter that can cause bit flips independent of altitude (tin-lead solder is still used in safety-critical avionics).
SEUs affect all RAM (volatile) devices, and mitigating them is not particularly difficult in most circumstances (although it can be time-consuming; clearly something else that was not done to meet the very racy schedule).
For memory devices (and controllers), ECC is required for such situations and processor cores must have internal data path protection (usually ECC for L2, parity for L1).
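For anyone wondering what ECC actually buys you, here is a minimal sketch of the textbook single-error-correcting, double-error-detecting (SEC-DED) scheme: an extended Hamming code protecting 4 data bits with 4 check bits. ECC memory typically applies the same idea to wider words (commonly 64 data bits plus 8 check bits); the 4-bit width here is just to keep it readable. Any single flipped bit is corrected, any two flipped bits are detected but not fixed, and three or more may be miscorrected.

```python
def secded_encode(nibble: int) -> int:
    """Encode 4 data bits into an 8-bit extended Hamming (SEC-DED) code word.

    Bit 0 is the overall parity bit; bits 1-7 are a Hamming(7,4) code with
    parity bits at positions 1, 2 and 4 and data bits at 3, 5, 6 and 7.
    """
    if not 0 <= nibble <= 0xF:
        raise ValueError("nibble must be 0..15")
    b = [0] * 8
    b[3], b[5], b[6], b[7] = (nibble >> 0) & 1, (nibble >> 1) & 1, (nibble >> 2) & 1, (nibble >> 3) & 1
    b[1] = b[3] ^ b[5] ^ b[7]  # covers positions with bit 0 set
    b[2] = b[3] ^ b[6] ^ b[7]  # covers positions with bit 1 set
    b[4] = b[5] ^ b[6] ^ b[7]  # covers positions with bit 2 set
    b[0] = b[1] ^ b[2] ^ b[3] ^ b[4] ^ b[5] ^ b[6] ^ b[7]  # makes total parity even
    return sum(bit << i for i, bit in enumerate(b))


def secded_decode(word: int):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    b = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if b[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    overall = sum(b) % 2  # 0 for a valid code word
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume one, which the syndrome locates (0 = the parity bit itself).
        b[syndrome] ^= 1
        status = "corrected"
    else:
        # Syndrome set but overall parity even: two bits flipped, detectable but not fixable.
        return None, "uncorrectable"
    nibble = b[3] | (b[5] << 1) | (b[6] << 2) | (b[7] << 3)
    return nibble, status
```

This is the "2 bit error detecting, 1 bit error correcting" behaviour mentioned later in the thread; parity alone (as used for L1 caches) only detects a single flip and leaves recovery, such as a refetch, to something else.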
FPGAs with SRAM-based configuration were a no-no until the advent of newer tools and parts that permit internal triple redundancy and partial reconfiguration, although the vast majority of such functions still use flash-based devices (the ProASIC 3 line from Microsemi is very popular for these functions).
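The "internal triple redundancy" mentioned above boils down to a bitwise 2-of-3 majority voter. Here is a software model of it as a sketch, assuming three stored copies of a 32-bit word; in an FPGA the voter would be built in logic, typically inserted by the vendor's tools, rather than written like this. Writing the voted value back (scrubbing) matters: without it, independent upsets in two copies can eventually hit the same bit and outvote the good one.

```python
from typing import List

WORD_MASK = 0xFFFFFFFF  # model 32-bit registers


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each result bit is the value held by at least two copies."""
    return ((a & b) | (a & c) | (b & c)) & WORD_MASK


def read_and_scrub(copies: List[int]) -> int:
    """Vote over three stored copies, then write the voted value back to all of them.

    Rewriting (scrubbing) stops single upsets in different copies from
    piling up until two copies are wrong in the same bit.
    """
    if len(copies) != 3:
        raise ValueError("TMR needs exactly three copies")
    voted = tmr_vote(*copies)
    copies[:] = [voted] * 3
    return voted
```

Upsets in different bits of different copies are all masked; only two copies wrong in the same bit defeats the voter.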
The probability of SEUs is based on a number of things (including the effective neutron cross section of the memory bit) and is never zero.
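The dependence on flux and cross-section can be written as a simple expected-rate calculation: mean upsets = flux × per-bit cross-section × number of bits × hours, with upsets treated as a Poisson process, so the chance of at least one is 1 - exp(-mean), which is never zero for any nonzero exposure. Every input value below is a placeholder, not measured data for any real device, altitude or route; numbers from a vendor report such as UG116 would be needed for a meaningful figure.

```python
import math


def expected_upsets(flux_per_cm2_h: float, xsec_cm2_per_bit: float,
                    bits: int, hours: float) -> float:
    """Mean number of upsets: particle flux x per-bit cross-section x bit count x exposure time."""
    return flux_per_cm2_h * xsec_cm2_per_bit * bits * hours


def p_at_least_one_upset(mean_upsets: float) -> float:
    """Treat upsets as a Poisson process: P(N >= 1) = 1 - exp(-mean)."""
    return -math.expm1(-mean_upsets)  # expm1 keeps precision for tiny means


if __name__ == "__main__":
    # Every number below is a placeholder, NOT measured data for any real device or route.
    flux = 1.0e3          # neutrons / cm^2 / hour at cruise altitude (assumed)
    xsec = 1.0e-14        # cm^2 per bit (assumed)
    bits = 64 * 2**20     # 64 Mibit of unprotected RAM (assumed)
    for hours in (1, 10, 1000):
        m = expected_upsets(flux, xsec, bits, hours)
        print(f"{hours:>5} h: mean {m:.3g} upsets, P(>=1) = {p_at_least_one_upset(m):.3g}")
```

The point is the shape rather than the numbers: the expected count scales linearly with flux, memory size and flight hours across a fleet, so "rare per device" becomes "routine per fleet".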
Bit flips have been recorded in data centres in Denver (5,000 feet altitude).
Quite apart from all the other clear design flaws in MCAS, this is criminally negligent in my view; the kit can control a flying control surface and SEUs were not considered and mitigated against? Utter madness.
There is even a facility that is often used by semiconductor manufacturers to profile the susceptibility of their devices.
With all the knowledge we have of this phenomenon, it is simply not reasonable to have not considered the effects.
"the kit can control a flying control surface and SEUs were not considered and mitigated against? Utter madness."
The MCAS kit as originally specified was allegedly intended to have a limited-authority (maybe 25% of jackscrew travel, or something like that??) one-shot effect on a flight control surface. Perhaps in those circumstances it *might* just about have been acceptable to not have much resilience designed in (but the system might also have not had the authority to achieve the intended effect either).
As time went by, the fundamental MCAS design got transmuted into "keep retrying till the aircraft/system is back in control. No limits." So from 25% authority on a one-off basis to full authority, whatever it takes, and nobody considered it might call for improvements in system resilience and recovery mechanisms?
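As a toy illustration of how much difference that change makes, here is a sketch comparing a one-shot, capped trim command with one that repeats without limit. The units are arbitrary and the step and cap values are made up; this is not MCAS logic or real stabilizer data, just the shape of the argument: a one-shot command stays bounded no matter how often it is triggered, while an unlimited repeating one grows with every activation.

```python
from typing import Optional


def cumulative_nose_down_trim(activations: int, step: float,
                              one_shot: bool, cap: Optional[float]) -> float:
    """Total trim commanded after `activations` triggers of a hypothetical trim-augmentation function.

    one_shot: only the first trigger moves the trim (the original limited concept).
    cap: ceiling on total authority, or None for no limit.
    """
    total = 0.0
    for n in range(activations):
        if one_shot and n > 0:
            break
        total += step
        if cap is not None:
            total = min(total, cap)
    return total
```

For example, ten triggers with a step of 1.0 give 1.0 in one-shot mode, 10.0 when repeating without a cap, and 2.5 when repeating with a cap of 2.5.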
Presumably MCAS variations got a "delta" design review rather than a "start from a blank sheet of paper" review, just like the 737 in general hasn't had a proper design review for decades.
(This from one of the earlier well-informed blogs on the subject - aircurrent maybe?)
Is Ralph Nader still on the case?
> The MCAS kit as originally specified was allegedly intended to have a limited-authority (maybe 25% of jackscrew travel, or something like that??) one-shot effect on a flight control surface. Perhaps in those circumstances it *might* just about have been acceptable to not have much resilience designed in (but the system might also have not had the authority to achieve the intended effect either).
I may be mistaken, but from what I picked up this is a wider issue than just MCAS - this concerns the overall flight control computers which makes the issue considerably more ugly as it concerns overall flight redundancy.
Thanks for the detailed explanations, by the way, very educational.
Further reading on the design changes from "limited authority, not safety critical" to "ooops, it's killed people, more than once, and there were engineers who knew it would", plus the related engineering and regulatory implications, in this handy article from June 22 (not especially technical):
https://www.seattletimes.com/seattle-news/times-watchdog/the-inside-story-of-mcas-how-boeings-737-max-system-gained-power-and-lost-safeguards/
Atmospheric free neutrons are cosmic rays. The cosmic rays that reach ground level are the result of high energy particles interacting with the atmosphere, a particle shower in fact; neutrons have a lot more penetration than charged particles and a much bigger cross section than X-rays, so they have more effect on computer systems.
That's why they are called "cosmic rays" and not "cosmic particles", just as X-rays refer to a large number of photons.
"the lead in tin lead balls on BGA devices contain a small amount of Pb210 (part of the uranium decay chain) that ends up as Pb206 (stable) via Po210 (Polonium) which is an alpha particle emitter"
Those are on the outside of the CPU/memory case. Alpha particles don't penetrate. Neutrons do.
What surprises me is that water-jacket shielding for high altitude avionics isn't standard.
> What surprises me is that water-jacket shielding for high altitude avionics isn't standard.
I suggest you start calculating how much water would be required and what impact that would have on lift-off weight. More importantly, what are you going to do when that jacket springs a leak in the inside, where the electronics are?
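Taking up that suggestion, here is a back-of-the-envelope sketch of the water mass for a uniform jacket around a rectangular avionics box. The box dimensions, jacket thicknesses and number of boxes are hypothetical, water is taken as roughly 1000 kg/m³, and the jacket walls and plumbing are ignored. It says nothing about how much water would actually attenuate high-energy neutrons meaningfully; it only shows how quickly the mass grows with thickness.

```python
from typing import Tuple

WATER_DENSITY_KG_M3 = 1000.0  # approximately, at room temperature


def jacket_volume_m3(box_m: Tuple[float, float, float], thickness_m: float) -> float:
    """Volume of a uniform-thickness shell wrapped around a rectangular box."""
    x, y, z = box_m
    t = thickness_m
    return (x + 2 * t) * (y + 2 * t) * (z + 2 * t) - x * y * z


def jacket_mass_kg(box_m: Tuple[float, float, float], thickness_m: float,
                   units: int = 1) -> float:
    """Water mass for `units` identical jacketed boxes (jacket walls, fittings and plumbing ignored)."""
    return units * jacket_volume_m3(box_m, thickness_m) * WATER_DENSITY_KG_M3


if __name__ == "__main__":
    box = (0.3, 0.2, 0.4)  # hypothetical avionics box, metres
    for t in (0.05, 0.10, 0.20):
        print(f"{t * 100:.0f} cm of water: {jacket_mass_kg(box, t, units=2):.0f} kg for two boxes")
```

Because the shell volume grows roughly with the cube of the outer dimensions, doubling the thickness more than doubles the mass, before anyone has thought about the leak problem.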
Had you taken the trouble to read the user guide I linked, you would have seen a table for 'Alpha Particle'
I suggest doing a bit of research because the alpha particles are indeed part of the solder balls and can and do cause soft errors. The test is specified by Xilinx:
Spartan-6 and UltraScale+ FPGA alpha data is based on alpha foil testing and package alpha emissivity of 0.001 counts/cm2/hr. Virtex-6, 7 series, and UltraScale FPGA alpha data estimated using real-time underground cave testing.
From UG116
I agree entirely. Personally, I think the MCAS was classed as a DAL-D device (device failing is a minor inconvenience), so safety-critical requirements were not needed.
Now the FAA have, I think, upgraded the DAL to DAL-A/B (device failing is hazardous or catastrophic), which adds many checks to the process. One of those checks is to simulate bit flips, stuck-at faults, etc. (That's why I reckon the MCAS is an SRAM FPGA: the most common bit-flip failures are in the state machines.)
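Fault injection of the kind described can be illustrated with a small software model: inject every possible single bit flip into every legal value of a state register and count how many land on another legal state, where they would go unnoticed. This is a sketch with hypothetical state names that have nothing to do with the real MCAS; actual DO-254 fault injection is normally done on the HDL or netlist in simulation. In this 4-state example, binary encoding turns every single flip into another valid state, while one-hot encoding turns every single flip into an illegal pattern that can be detected and routed to a safe state.

```python
from typing import Dict

STATES = ["IDLE", "ARMED", "ACTIVE", "INHIBIT"]  # hypothetical controller states
ONE_HOT: Dict[str, int] = {name: 1 << i for i, name in enumerate(STATES)}  # 4 flip-flops
BINARY: Dict[str, int] = {name: i for i, name in enumerate(STATES)}         # 2 flip-flops


def decode_state(reg: int, encoding: Dict[str, int], safe: str = "INHIBIT") -> str:
    """Map a register value to a state; any illegal pattern falls back to an assumed safe state."""
    for name, code in encoding.items():
        if code == reg:
            return name
    return safe


def undetected_single_flips(encoding: Dict[str, int], width: int) -> int:
    """Inject every single bit flip into every legal state; count flips that land on another legal state."""
    legal = set(encoding.values())
    return sum(1 for code in legal for bit in range(width) if (code ^ (1 << bit)) in legal)


if __name__ == "__main__":
    print("one-hot:", undetected_single_flips(ONE_HOT, 4), "undetected of", 4 * 4)  # 0 of 16
    print("binary: ", undetected_single_flips(BINARY, 2), "undetected of", 4 * 2)    # 8 of 8
```

The trade-off is more flip-flops for the one-hot version, which is exactly the kind of cost a DAL-D classification lets a design skip.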
YMMV
(I have done more than 7 years of DO-254 work, going up to DAL-B in severity.)
And has been falling for decades.
IIRC the first time this was raised was the packaging material of 64Kb parts.
Yes, that is a 64-kilobit part.
I guess they didn't like to spring for the extra cost of the 2 bit error detecting, 1 bit error correcting memory cards.
Cheap skates.
Oddly enough, this is not always true.
Xilinx (who have aspirations for space applications and have provided new tools for the job) have, for many years, run the Rosetta Experiment (aimed at understanding in more detail SEU susceptibility).
With that knowledge, they have changed the way they actually implement memory cells in their FPGAs, and the error rate (for both configuration RAM and block RAM) has been decreasing for a given neutron beam strength.
UG116 which is updated every 6 months shows that the neutron induced error rate has decreased from the 28nm node through the 16nm node, and the errors are a lot lower than earlier generations.
The soft error rate for block RAMs at the 16nm node is less than a tenth of the soft error rate at the 180nm node, for instance.
Counter-intuitive, I know, but these are real measured data. (I spent a lot of time doing SEU analysis for various devices for avionics).
Not really.
Sensitivity rises if only the minimum line width shrinks.
In reality the results of a previous generation will guide improvements. The first of these was the realization that the physical layout of the transistors on the chip can make a difference. Then, that how static RAMs are implemented can also have an effect.
So depending on the scale of the effect on the sensitivity smaller devices can be better, as some have turned out to be, once the problem is viewed as important.
After creating ways to measure this kind of issue in an automotive processor, as well as novel design tools to allow us to design in such a way that resultant systems can be demonstrated to be at the required safety level for ISO 26262, I must say that I didn't expect to read about what I thought was a specialist area in The Register.
Yes. Because after getting the AD, Ethiopian Air did nothing to ensure their pilots knew about the AD or had any training at all to handle what was correctable with the thumb trim switch. Instead, those pilots let an easily surmountable situation devolve into a deadly crash. But if they can wash their hands of it by saying that the AD did not require that training and therefore there's nothing they needed to do but bin it, then it looks good for Ethiopian pride.
Those pilots knew. They followed the AD.
MCAS cycles so fast and over such a large range that it is literally impossible to reverse unless you sit there cycling its power on and off. Not feasible when you've also got to fly a plane that is already demanding absolutely maximum effort from both pilots to hold the sticks back.
In flight sim training since, it has been repeatedly demonstrated that most pilots would not be able to recover the aircraft despite knowing which failure was going to be simulated.
Now go sit under a piledriver and let it mash that knowledge through your brain.
There is a narrative that it is 100% Boeing's fault and the poor pilots who fought valiantly and heroically could have done nothing at all to save the plane. Reality is more complicated, but people don't seem to like that and the votes here reflect that.
The pilots of Ethiopian 302 did not follow the checklist fully. Read the preliminary report. Here is the checklist (from Air Canada, but Ethiopian should have had a comparable one): https://cimg0.ibsrv.net/gimg/pprune.org-vbulletin/1080x1177/thumbnail_03dc14b910e0b79951314db9c04969b942057c2b.jpg
They retracted the flaps. Any pilot who had actually read the information available about MCAS at the time should have known not to do this. MCAS is disarmed with flaps deployed.
They allowed their speed to get dangerously high. There are human factors issues for why they did, but it was still a mistake.
The trim system is designed in such a way that pilots can always override MCAS motion with the thumb switch. No, they do not need to wait until MCAS fully screws them over for 10 seconds; I say again they can interrupt it. And MCAS will not interrupt them as they wind the trim back - although it will take them longer as (perversely) they can only command a slower rate of change than the computer. Unless there is a yet-unrevealed further trim control problem (which would be the biggest bombshell yet) the pilots still had enough control to counter MCAS.
Of course Boeing is at fault. But that doesn't leave the pilots saintly and blameless. What they were faced with, is a high-stakes game of Bop It. They should not have been made to play it, and Boeing must accept the blame for that, but it was not an unwinnable game.
The designers and engineers in comfortable offices had years to get it right. The pilots had a few minutes amid a cacophony of alarms and the threat of death. The pilots did not deserve to be in that situation. But they still had a chance and did not grasp it. That is the unfortunate story here. It should not be taboo to say this.
It will never fly again. The inspectors will keep finding new flaws. Who is going to sign off on it and risk having it crash for a 3rd time? Not to mention who in his right mind will ever want to fly in it?
They should give up and convert the planes into freighters, as was done with the Super Connie and DC10.
Never mind all the technical stuff. I've flown in all sorts of semi-airworthy heaps, but their lack of sophistication and pilots that could fly instead of being flight computer operators meant the risk was negligible. You'll never get me in a 737 Max and I expect that goes for the rest of the world.
"Astonishingly, until the 737 Max crashes, the aircraft was flying with no redundancy at all for the flight control computers. [...] The Seattle Times reported that this has now been redesigned so the two onboard computers run in an active:standby configuration. Previously the units merely swapped over in between flights."
This isn't even cost cutting any longer (they already had two computers on board) or incompetence, this is sheer and utter idiocy.
And if I am ever offered a flight on a Boeing, I will make sure to ask whether they actually know, because they tested it, that the computer system is redundant and resilient to cosmic rays. Even toddlers these days know that cosmic rays (or any other sufficiently energetic ionizing radiation) can mess with memory, and if that memory keeps you alive, you need redundancy.
> big discount from Boeing, and placed a large order.
It was a 'Letter of Intent' which may, or may not, result in an actual order or sale. This can be used as leverage against Airbus or may simply be a marketing ploy by Boeing that they would not fulfil.
https://simpleflying.com/iag-airbus-captivity/
<quote> ...and despite Boeing's well-reported close relationship with the American regulator, the odds of all the world's civil aviation authorities taking it at its word is now lower than it was before the crashes. </quote>
Surely the rest of the world is more likely to doubt anything the FAA says *because* of their close relationship with Boing, not *despite* it?
Black helicopter, because I'm wondering how Boing will fsck up the Apache - which apparently they designed...
other than the military and chip manufacturers?
Carriers were also guilty of ignoring the effects of "Cosmic Rays", to my knowledge Cathay Pacific (Hong Kong) was the first carrier to subject crew route assignments to limitations imposed by "Cosmic Rays" in flights headed over the pole.
And America is the greatest? I think not.
It's truly amazing they haven't killed more people. Flight critical systems on a single computer, no redundancy and vulnerable to a single bit flip?!
That does it, you won't get me on a 737-MAX for at least a decade. What other vulnerabilities have these amateurs built into this plane?