Apache Foundation rushes out Arrow as 'Top-Level Project'

The Apache Software Foundation has today announced Apache Arrow, its new project which aims to provide a cross-system data layer for columnar in-memory analytics. While Apache projects normally go through incubation periods, Arrow has been immediately announced as a Top-Level Project, and its code – seeded from the Apache …

  1. Destroy All Monsters Silver badge
    Windows

    A new hairdo

    After this wall of words, I am thoroughly confused and left breathless.

    Am I getting old?

    1. GrumpenKraut
      Pint

      Re: A new hairdo

      Aha, I am not the only one!

      1. Lysenko

        Re: A new hairdo

        As far as I can tell it is a gizmo to optimize things like:

        SELECT AVG(Age) FROM JBieberFans

        ...or am I missing the point about "columnar data processing"?

        1. Brewster's Angle Grinder Silver badge

          Re: A new hairdo

          ...or am I missing the point about "columnar data processing"?

          I read it as being for situations outside the database. Heresy, I know.

          1. Francis Boyle Silver badge

            I think that

            In the future I'll stick to the astrophysics articles.

        2. Dan 55 Silver badge

          Re: A new hairdo

          Apparently columnar data processing means holding a no-SQL database in shared memory for access by more than one process and organised in a way which allows quick processing by the CPU.

          I had to go to another site to find that out...

          1. Adam 52 Silver badge

            Re: A new hairdo

            Columnar doesn't imply no-SQL. Redshift, Vertica and Hive-on-ORC all do SQL on columnar stores.

            What it lets you do is aggregations on individual columns with much reduced IO overhead, and IO is harder to scale than CPU.

            So SELECT DISTINCT or SUM() is much faster but SELECT * slower. And BI problems tend to be all about aggregates (how much have we sold this month) rather than transactions.
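A rough sketch of the trade-off described in this comment, in plain Python. The table, the field names and the helper functions are all made up for illustration. This is not Arrow's API or any database's implementation, only the row-versus-column layout idea: an aggregate over one column touches a single contiguous sequence, while rebuilding whole rows means stitching every column back together.

```python
from array import array

# Hypothetical table, row-oriented: each record keeps all its fields together.
rows = [
    {"name": "fan_a", "age": 14, "city": "Leeds"},
    {"name": "fan_b", "age": 16, "city": "Hull"},
    {"name": "fan_c", "age": 21, "city": "York"},
]

# The same data, column-oriented: one sequence per field.
# The numeric column is stored as a packed array of 64-bit integers.
columns = {
    "name": [r["name"] for r in rows],
    "age": array("q", (r["age"] for r in rows)),
    "city": [r["city"] for r in rows],
}


def avg_age_row_store(table):
    # Roughly SELECT AVG(age): every whole record is visited,
    # even though only one field of each is needed.
    return sum(r["age"] for r in table) / len(table)


def avg_age_column_store(cols):
    # The same aggregate reads only the "age" column.
    ages = cols["age"]
    return sum(ages) / len(ages)


def select_star_column_store(cols):
    # Roughly SELECT *: each row has to be reassembled from all columns,
    # which is the case where a column layout does extra work.
    names = list(cols)
    return [dict(zip(names, values)) for values in zip(*(cols[n] for n in names))]
```

Both averaging functions return the same value; the difference is only in how much unrelated data each one has to walk past to get there. On a real system that difference shows up as IO, which is the point being made above.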

            1. Lysenko

              Re: A new hairdo

              Conditional indexes, pre-computed stats in R-Trees, vast RAM buffering. I'm still not feeling the earth move here. It can't be map/reduce parallelism either, because that's not news.

              You've always been able to optimise this stuff if you're prepared to trade off UPDATE/INSERT speed and/or ACID integrity to do it. Where is the fairy dust here?

  2. Anonymous Coward

    Google hasn't heard of it yet...

    ... a search for Apache Arrow gives

    http://www.newarchery.com/products/arrowrest/apache-19/

    and similar for the top 10 hits.

  3. mathnode

    This is probably one of the most exciting projects to appear for a long time. Just as out-of-core processing in libraries like pandas, blaze, and ibis is maturing, we now have a method to interchange the generated result sets with much less overhead.

    1. Brewster's Angle Grinder Silver badge

      You Keep Using That Word, I Do Not Think It Means What You Think It Means

      You and I have different definitions of the word "exciting".

  4. Paul J Turner

    – seeded from the Apache Drill project –

    I see what you did there...

  5. Scaffa

    I've always had this running joke about management consultants where I just make up tech-sounding words and call it a revolution.

    I didn't think it would ever make it into an article, but thanks El Reg!
