Re: @A.C. – RobHib's thoughts.
Glad you enjoyed the CHC stuff. I await your further thoughts with interest.
Re: http://www.trilliumsoftware.com/success/_pdf/DataIQ_Fall12-Nigel-Article.pdf
"the A380 Airbus is viewed as the world’s smartest plane as it is said to run on over one billion lines of code. It contains a large number of sensors and microprocessors, each monitoring and reporting on the health of its systems. For instance, each of its four engines generates 20 terabytes of data on its performance in flight every hour."
It's far from a definitive source. Other sources welcome. I'm sensing a severe case of marketing/Chinese whispers here.
Based on what I know about engines, sensors, controls, and connectivity, it is wildly improbable that 20TB an hour is being *recorded* in this picture. It's barely plausible that 20TB an hour is *processed* (without being recorded), even allowing for massive compression. Compression would be easy in theory but might be a challenge in practice, given the harsh environment alongside an aircraft engine (though a DSP might do the job).
NB what follows is not intended as personal criticism. You have been told something, and taken it at face value (as have others). But it's garbage (at worst) or widely misunderstood (being generous). That's OK, there's a lot of that about. I happen to see the implausibility in this case because it's an area I used to know a bit about.
We can take a look at this in a couple of ways: e.g. how much data can we get out of the engine control+monitoring system, as limited by its connectivity. And how much data is plausibly going into that system, based on sensor count and sample rate, stuff like that. Back of the envelope, scientific wild assed guess, not intended to be strictly accurate.
*) Data out of the engine control+monitoring
Assume there's no recording system which can survive engine mounting so we need to get the data out over a network. It'll be ARINC664 or AFDX or whatever. Think full duplex ethernet, with some design guidelines to make it better suited to time-critical and safety-critical applications.
Assume 100Mbit/s (couldn't find a definitive statement). Call that 10Mbyte/s in round numbers. Use ALL of it for this recording stuff. So, 10Mbyte/s, for the 3600 seconds in an hour.
Thus: 36 GB an hour that we can get from engine to elsewhere.
Use a handful of them if you wish, use Gbit if you wish, you still can't plausibly get that to 20 terabytes an hour, whatever the "big data" people say. Apart from anything else, a computer on the engine connected to those sensors has to *send* the data. That capability just isn't there right now. It might be there in three or five years' time. Maybe.
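For anyone who wants to play with the numbers, here is a minimal Python sketch of the same envelope sum. The function names are made up for this post, and the 10 bits per byte figure is just the round-number conversion used above (treating the extra two bits as a crude overhead allowance is an assumption of the sketch, not a property of ARINC664/AFDX).

```python
SECONDS_PER_HOUR = 3600


def link_bytes_per_hour(link_mbit_s, links=1, bits_per_byte_on_wire=10):
    """Rough ceiling on bytes/hour through `links` links of `link_mbit_s` Mbit/s.

    bits_per_byte_on_wire=10 mirrors the round-number conversion of
    100 Mbit/s to 10 Mbyte/s (an assumption, not a protocol figure).
    """
    bytes_per_second = link_mbit_s * 1e6 / bits_per_byte_on_wire * links
    return bytes_per_second * SECONDS_PER_HOUR


def required_mbit_s(bytes_per_hour, bits_per_byte_on_wire=10):
    """Sustained link rate (Mbit/s) needed to move `bytes_per_hour`."""
    return bytes_per_hour / SECONDS_PER_HOUR * bits_per_byte_on_wire / 1e6


# One 100 Mbit/s link, used entirely for recording traffic.
print(f"100 Mbit/s link: {link_bytes_per_hour(100) / 1e9:.0f} GB/hour")
# One Gbit link, same assumptions.
print(f"1 Gbit/s link:   {link_bytes_per_hour(1000) / 1e9:.0f} GB/hour")
# What the claimed 20 TB/hour per engine would need, sustained.
print(f"20 TB/hour needs about {required_mbit_s(20e12) / 1e3:.1f} Gbit/s")
```

On these assumptions the claimed figure needs tens of Gbit/s sustained, per engine, for the whole flight.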
*) Data into the engine control+monitoring
The main output of the engine control is a fuel flow control value. It is based largely on the pilot's lever angle. A raft of other factors figure into the calculations or must be measured for other reasons: various rotational speeds, various pressures and temperatures, and so on. Various sensors are duplicated or triplicated. Various outputs are monitored for safety/integrity reasons. There's lots more too, to do with things like thrust reverser interlocks. This'll do for now.
Let's for the sake of argument assume 500 sensors on the engine. That may be a bit of an overestimate; the nice people in RR, GE, etc will confirm or otherwise, but it doesn't actually matter much here. Also, it doesn't vary hugely from engine to engine. Let's assume that on average we're sampling every 20ms (some will be more frequent, others less). Let's assume 4 bytes per sample (again, generous). 500 sensors at 4 bytes = 2kbytes per complete set of samples. At 50 sets/second that is 100kbytes/second. 3600 seconds per hour:
Thus: 360 MB per hour of raw unprocessed sensor data.
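The same sum as a minimal Python sketch, using only the guessed figures above (500 sensors, 4 bytes, 50 samples a second). The function names are invented for this post. It also turns the question round: how fast would every one of those 500 sensors have to be sampled to reach 20 TB an hour?

```python
SECONDS_PER_HOUR = 3600


def raw_sensor_bytes_per_hour(sensors=500, bytes_per_sample=4, samples_per_second=50):
    """Raw, uncompressed sensor data per hour, in bytes."""
    return sensors * bytes_per_sample * samples_per_second * SECONDS_PER_HOUR


def sample_rate_needed_hz(target_bytes_per_hour, sensors=500, bytes_per_sample=4):
    """Per-sensor sample rate needed to reach a target hourly volume."""
    return target_bytes_per_hour / (sensors * bytes_per_sample * SECONDS_PER_HOUR)


print(f"Raw sensor data: {raw_sensor_bytes_per_hour() / 1e6:.0f} MB/hour")
print(f"Rate needed for 20 TB/hour: {sample_rate_needed_hz(20e12) / 1e6:.1f} MHz per sensor")
```

That is every sensor sampled at a few MHz rather than 50 Hz, which is not what temperature, pressure and shaft speed measurements need.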
Maybe there's advanced engine health monitoring to consider as well. One day. Today, that's for test purposes only (you can do lots of engine health monitoring with just the existing signals in the control unit).
So two entirely independent approaches both put the raw data, **very very roughly**, somewhere between hundreds of megabytes and tens of gigabytes an hour. Certainly nowhere near terabytes an hour per engine.
Corrections and clarifications welcome.
Some wag on Slashdot already questioned the terabytes/hour data rate and suggested that it'd easily get to terabytes an hour if you put it into XML (text dataname, text timestamp, text datavalue, text units, etc).
I still struggle to see how it gets to terabytes an hour.
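Taking the XML joke at face value, here is a rough Python sketch of the inflation. The record layout, tag names, sensor name and timestamp format are all invented for illustration, not taken from any real system; the sensor count and sample rate are the same guesses as above.

```python
# Hypothetical XML record for ONE sensor reading. Every tag name and value
# here is made up for illustration only.
RECORD = (
    "<sample>"
    "<name>N1_shaft_speed</name>"
    "<timestamp>2012-11-05T12:34:56.020Z</timestamp>"
    "<value>3051.25</value>"
    "<units>rpm</units>"
    "</sample>"
)

RAW_BYTES_PER_SAMPLE = 4     # same generous guess as the raw estimate
SENSORS = 500                # guessed sensor count
SAMPLES_PER_SECOND = 50      # one sample every 20 ms
SECONDS_PER_HOUR = 3600


def xml_bytes_per_hour(record=RECORD):
    """Hourly volume if every reading is written out as one XML record."""
    record_bytes = len(record.encode("utf-8"))
    return SENSORS * SAMPLES_PER_SECOND * SECONDS_PER_HOUR * record_bytes


inflation = len(RECORD.encode("utf-8")) / RAW_BYTES_PER_SAMPLE
print(f"XML inflation factor: about {inflation:.0f}x")
print(f"XML volume: about {xml_bytes_per_hour() / 1e9:.0f} GB/hour")
```

Even with a verbose record like this one, the total stays in the gigabytes an hour, still a long way short of terabytes.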