Dominic Connor, Quant Headhunter

It depends on the type of data.

As I understand it, CERN data is lots of numbers with relatively few fields per record.

It will also usually be accessed sequentially. The vast majority of numerical algorithms are implemented that way, the loading is sequential, and even when parallel algorithms are used you usually still aren't doing random access.
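
To make the sequential pattern concrete, here is a minimal sketch of a single front-to-back pass over a stream of numbers. The layout (raw little-endian float64 values) and the name `streaming_mean` are my own assumptions for illustration, not anything CERN actually uses.

```python
import io
import struct


def streaming_mean(f, chunk_values=4096):
    """Mean of a stream of little-endian float64 values, read front to back."""
    total, count = 0.0, 0
    while True:
        chunk = f.read(8 * chunk_values)  # next block, strictly in order
        if not chunk:
            break
        usable = len(chunk) - len(chunk) % 8  # ignore a trailing partial value
        for (value,) in struct.iter_unpack("<d", chunk[:usable]):
            total += value
            count += 1
    return total / count if count else float("nan")


# Hypothetical data: five readings held in memory instead of a real file.
data = io.BytesIO(struct.pack("<5d", 1.0, 2.0, 3.0, 4.0, 5.0))
mean = streaming_mean(data)
```

The function never seeks: it only ever asks for "the next block", which is the access pattern storage is best at.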

A business DB, by contrast, will be queried along the lines of "list the expiry dates of our cat food inventory, grouped by date and flavour".

This may be a multi-table query, something experimental data does a lot less of.
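
As a rough illustration of that kind of business query, here is a sketch using Python's built-in `sqlite3`. The `product` and `inventory` tables, their columns and the sample rows are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: flavour lives in one table, stock in another.
    CREATE TABLE product (id INTEGER PRIMARY KEY, flavour TEXT);
    CREATE TABLE inventory (
        product_id INTEGER REFERENCES product(id),
        expiry TEXT,
        quantity INTEGER
    );
    INSERT INTO product VALUES (1, 'tuna'), (2, 'chicken');
    INSERT INTO inventory VALUES
        (1, '2025-01-31', 40),
        (1, '2025-01-31', 10),
        (2, '2025-01-31', 25),
        (2, '2025-03-15', 5);
""")

# Multi-table query: join stock to product, then group by date and flavour.
rows = conn.execute("""
    SELECT i.expiry, p.flavour, SUM(i.quantity)
    FROM inventory AS i
    JOIN product AS p ON p.id = i.product_id
    GROUP BY i.expiry, p.flavour
    ORDER BY i.expiry, p.flavour
""").fetchall()

for expiry, flavour, quantity in rows:
    print(expiry, flavour, quantity)
```

The join and the grouping are where a general-purpose database earns its keep, and where indexes on the join and grouping columns pay off.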

With high-volume numerical data you may not even bother with an index, since the index may be as large as the data itself. Instead you locate records by sequence number or by datestamp.
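
Here is a sketch of why the sequence number can stand in for an index, assuming fixed-size records. The record layout (one integer timestamp plus two float readings) and the helper names are hypothetical.

```python
import io
import struct

# Assumed layout: int64 timestamp + two float64 readings, 24 bytes, no padding.
RECORD = struct.Struct("<qdd")


def write_records(f, records):
    """Append fixed-size records one after another."""
    for rec in records:
        f.write(RECORD.pack(*rec))


def read_record(f, seq):
    """Fetch record number `seq` (0-based) by computing its byte offset."""
    f.seek(seq * RECORD.size)  # no index lookup: position = seq * record size
    return RECORD.unpack(f.read(RECORD.size))


# Hypothetical data held in memory instead of a real file.
buf = io.BytesIO()
write_records(buf, [(1000 + i, i * 0.5, i * 2.0) for i in range(5)])
third = read_record(buf, 3)
```

If the records are also written in time order, a datestamp lookup can be a binary search over the same file, again without storing a separate index.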

