A new hairdo
After this wall of words, I am thoroughly confused and left breathless.
Am I getting old?
The Apache Software Foundation has today announced Apache Arrow, its new project which aims to provide a cross-system data layer for columnar in-memory analytics. While Apache projects normally go through incubation periods, Arrow has been immediately announced as a Top-Level Project, and its code – seeded from the Apache …
Columnar doesn't imply NoSQL. Redshift, Vertica and Hive-on-ORC all do SQL on columnar stores.
What it lets you do is run aggregations on individual columns with much less I/O overhead, and I/O is harder to scale than CPU.
So SELECT DISTINCT or SUM() on one column is much faster, but SELECT * is slower, because each row has to be reassembled from every column. And BI problems tend to be all about aggregates (how much have we sold this month) rather than transactions.
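A toy sketch of that trade-off, in plain Python rather than any real engine. The table, the column names and the "values touched" counter are all made up for illustration; the counter is only a crude stand-in for I/O.

```python
from array import array

# Hypothetical sales table: (order_id, customer_id, amount_cents).
COLUMN_NAMES = ("order_id", "customer_id", "amount_cents")
rows = [(i, i % 7, 100 * (i % 5)) for i in range(1000)]

# Column layout: one contiguous array per column.
columns = {
    name: array("q", (record[idx] for record in rows))
    for idx, name in enumerate(COLUMN_NAMES)
}


def sum_amount_row_store(rows):
    """SUM(amount_cents) on a row layout: every field of every record is read."""
    total = 0
    touched = 0
    for record in rows:
        touched += len(record)  # whole record is fetched to reach one field
        total += record[2]
    return total, touched


def sum_amount_column_store(columns):
    """SUM(amount_cents) on a column layout: only one array is read."""
    amounts = columns["amount_cents"]
    return sum(amounts), len(amounts)


def select_star_column_store(columns, i):
    """SELECT * for one row: has to visit every column to rebuild the record."""
    return tuple(columns[name][i] for name in COLUMN_NAMES)


row_total, row_touched = sum_amount_row_store(rows)
col_total, col_touched = sum_amount_column_store(columns)
print(row_total, row_touched)  # same total, 3000 values touched
print(col_total, col_touched)  # same total, 1000 values touched
```

With three columns the aggregate touches a third of the data; on a wide BI table with dozens of columns the gap grows accordingly, while rebuilding a full row costs one lookup per column.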
Conditional indexes, pre-computed stats in R-Trees, vast RAM buffering. I'm still not feeling the earth move here. It can't be map/reduce parallelism either, because that's not news.
You've always been able to optimise this stuff if you're prepared to trade off UPDATE/INSERT speed and/or ACID integrity to do it. Where is the fairy dust here?