Two cores? How do you know which one is wrong?
Shouldn't it be a minimum of three cores, voted 2oo3?
British chip designer Arm really doesn't want to be overtaken in the world of autonomous cars by the likes of Intel, Nvidia, and other rivals. The SoftBank-owned semiconductor architects have thus injected a safety feature normally reserved for real-time CPUs into their highest-end application processor core, in a bid to lure …
"You don't know which one's wrong, you just flush and reload both."
They probably figured a brief backtrack is simpler and easier to manage than, say, a fluke event simultaneously hitting two out of three cores in different ways, resulting in a three-way lock: no consensus is possible because NONE of the three agree.
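The flush-and-reload recovery being described can be sketched roughly like this in C. This is a hypothetical illustration, not Arm's actual mechanism: `run_checked`, the retry limit, and the checkpoint scheme are all invented for the example.

```c
#include <stdint.h>

/* Hypothetical sketch of dual-lockstep "flush and reload": run the same
 * step on both copies, compare, and on a mismatch discard both results
 * and retry from the last checkpoint -- no need to know which copy was
 * wrong. */

typedef uint32_t (*step_fn)(uint32_t state);

static uint32_t run_checked(step_fn step, uint32_t checkpoint, int max_retries)
{
    for (int attempt = 0; attempt <= max_retries; attempt++) {
        uint32_t a = step(checkpoint);   /* main core */
        uint32_t b = step(checkpoint);   /* shadow core, same inputs */
        if (a == b)
            return a;                    /* agreement: commit the result */
        /* mismatch: flush both results, reload from checkpoint, retry */
    }
    return checkpoint;  /* persistent disagreement: hold the safe state */
}

/* example step: a trivial deterministic computation */
static uint32_t inc_step(uint32_t s) { return s + 1u; }

/* example transient fault: disagrees once, then behaves normally */
static int fault_armed = 1;
static uint32_t glitchy_step(uint32_t s)
{
    if (fault_armed) { fault_armed = 0; return s + 999u; }
    return s + 1u;
}
```

With a transient glitch on one copy, the first comparison fails, both results are thrown away, and the retry from the checkpoint succeeds.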
There have to be at least three copies to allow recovery. It doesn't matter whether that's three copies of the processor or three copies of a control variable in RAM.
I've programmed systems where there are 4 copies, just to be extra sure. It's highly unlikely that two copies will fail in exactly the same way, but if two were to fail in differing ways (with modern, ever-smaller-geometry memory devices or processors, a cosmic ray can run through many gates), then you still get a consensus.
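The N-copy consensus idea can be sketched as a simple majority vote over redundant copies of a variable. A hypothetical example: `majority_vote` and the four-copy layout are assumptions for illustration, not code from any real system.

```c
#include <stdint.h>

/* Hypothetical sketch: majority vote over n redundant copies of a
 * control variable kept in RAM. Writes the winning value to *out and
 * returns 1 if a strict majority agrees; returns 0 (no consensus) on
 * a tie, which a real system would treat as a trigger for a safe state. */
static int majority_vote(const uint32_t *copies, int n, uint32_t *out)
{
    for (int i = 0; i < n; i++) {
        int votes = 0;
        for (int j = 0; j < n; j++)
            if (copies[j] == copies[i])
                votes++;
        if (votes * 2 > n) {       /* strict majority of the n copies */
            *out = copies[i];
            return 1;
        }
    }
    return 0;                      /* tie or total disagreement */
}
```

With four copies, one corrupted copy (or even two corrupted in differing ways) still leaves a clear winner; two corrupted in the *same* way produces a 2-2 tie and no consensus.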
These days Size, Weight and Power (SWaP) is king in most applications, so multicore is the way even mission critical systems are going.
8 cores on your device, then have 4 in lockstep performing one algorithm, and the other 4, also in lockstep, performing a different implementation of the same high-level requirements. (Duplicate, non-identical versions to mitigate against coding errors.)
Run a high-integrity RTOS, e.g. DO-178C DAL-A certified multicore VxWorks 653, and Robert is your uncle.
Which is approximately the arrangement described in the next to last paragraph.
Essentially, you could have four cores in a cluster running in split mode with a hypervisor, operating systems, and general applications and ASIL B-grade code in operation – then four cores in lockstep mode, running a realtime operating system and ASIL D-grade safety-critical vehicle control software on top.
I like it. Now, what about the cost?
"I've programmed systems where there are 4 copies, just to be extra sure. It's highly unlikely that two copies will fail in exactly the same way, but if two were to fail in differing ways (with modern, ever-smaller-geometry memory devices or processors, a cosmic ray can run through many gates), then you still get a consensus."
So what do you do if two of them DO fail in exactly the same way...or THREE of them simultaneously fail in different ways? Either way, you end up with a tie (two-way in the former, four-way in the latter) and thus no consensus.
I wonder what is really more likely to be wrong: the CPU executing the software, or the actual software itself?
While having a trap for a hardware error in the CPU registers is a good thing, it is only a start: you need ECC memory as well, and even both together are no substitute for an overall hardware watchdog to deal with, say, an OS-level lock-up.
Then we are still left with the rather uneasy aspect of how reliable and safe the masses of AI-based image recognition and driving control code can really be.
ECC Support: Yes
The Cortex-A76AE comes with memory protection as standard. It supports Single Error Correction, Double Error Detection (SECDED) ECC and parity protection in the L1 cache, and SECDED ECC protection with the ability to correct in-line on the L2 and L3 caches.
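The SECDED principle mentioned in that spec can be shown on a toy extended Hamming(8,4) code: four data bits, three Hamming parity bits, plus one overall parity bit. Real cache ECC works over much wider words (e.g. 64-bit), but the mechanics are the same; `secded_encode`/`secded_decode` are invented names for this illustration.

```c
#include <stdint.h>

/* Toy SECDED: bits 1..7 of the code word are Hamming(7,4)
 * (1-indexed, parity at positions 1, 2, 4; data at 3, 5, 6, 7),
 * bit 0 is overall parity over bits 1..7. */
static uint8_t secded_encode(uint8_t d)   /* d: 4 data bits */
{
    uint8_t c[8] = {0};
    c[3] = (d >> 0) & 1;
    c[5] = (d >> 1) & 1;
    c[6] = (d >> 2) & 1;
    c[7] = (d >> 3) & 1;
    c[1] = c[3] ^ c[5] ^ c[7];
    c[2] = c[3] ^ c[6] ^ c[7];
    c[4] = c[5] ^ c[6] ^ c[7];
    uint8_t word = 0, overall = 0;
    for (int i = 1; i <= 7; i++) {
        word |= (uint8_t)(c[i] << i);
        overall ^= c[i];
    }
    return word | overall;                /* bit 0: overall parity */
}

/* Returns 0 = clean, 1 = single-bit error corrected (in *word),
 * 2 = double-bit error detected but uncorrectable. */
static int secded_decode(uint8_t *word)
{
    uint8_t w = *word, syndrome = 0, parity = 0;
    for (int i = 0; i <= 7; i++) {
        uint8_t bit = (w >> i) & 1;
        parity ^= bit;
        if (i >= 1 && bit)
            syndrome ^= (uint8_t)i;       /* syndrome = error position */
    }
    if (syndrome == 0 && parity == 0) return 0;
    if (parity == 1) {                    /* odd flips: single error */
        *word = syndrome ? (uint8_t)(w ^ (1u << syndrome))
                         : (uint8_t)(w ^ 1u);   /* parity bit itself */
        return 1;
    }
    return 2;                             /* even flips: detect only */
}
```

A single flipped bit is corrected back to the original word; two flipped bits are flagged as uncorrectable, which is exactly the SEC/DED split the cache spec describes.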
Lock-step execution made sense back in the old days, when instruction cycles were predictable and there were many chips making up a system.
It's far easier to implement on the same die, but it makes far less technical sense: it's not protecting against likely faults. The primary value seems to be in marketing.
As for three processors and voting... failures are going to be vanishingly rare. If they weren't, far more computations would be silently corrupted.