Great Graph Database Debate: Abandoning the relational model is 'reinventing the wheel'

Welcome back to the latest Register Debate in which writers discuss technology topics, and you the reader choose the winning argument. The format is simple: we propose a motion, the arguments for the motion ran on Monday and today, and the arguments against on Tuesday and tomorrow. During the week you can cast your vote on …

  1. Filippo Silver badge

    I'm not sufficiently involved with DB architecture to provide a reasoned answer. However, I can't help noticing that this is a poll about a negative clause, and I wonder how many of those who have voted were confused as a result. It's not even the first time a Reg Debate does that, and I seem to recall it being discussed last time as well. Wouldn't these article series be improved, if the poll was not formulated as a negative?

    1. Tom 38
      Joke

      Wouldn't these article series be improved, if the poll was not formulated as a negative?

      I think you mean "Would these article series be improved if the poll was formulated as a positive?"

      1. katrinab Silver badge
        Megaphone

        No, because there is a middle ground between "positive" and "negative". "not negative" includes that middle ground.

        1. Lucy in the Sky (with Diamonds)

          Don't say ever no never again...

          My Ol' Pappi always said to me, "Don't no say no nothing about double negatives to nobody, never again, or I won't tell what I wouldn't do to you."

    2. Steve Button Silver badge

      I got to this part... "Vectorized Query Execution – Using a vectorized processing model [PDF] (as opposed to a tuple-at-a-time model)" and at that point my brain just started acting like Charlie Brown when the adults are talking in the room.

      So, I guess you could say I'm also not sufficiently involved in DB architecture.

      Although I know a bad one when I see it, and I've seen plenty of those.

      1. pavlo

        This is called a "query processing model". I cover this topic in my DB class. The lecture is on YouTube:

        https://youtu.be/ck3PkXTOueU?t=1824

        -- Andy
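
        For anyone else who glazed over at that phrase, here is a rough Python sketch of the difference. It is only an illustration: the operator names, the sample data and the batch size are all made up for this example, and real engines work on typed column arrays in compiled code rather than Python lists. The tuple-at-a-time pipeline hands one row at a time from operator to operator, while the vectorized one hands over a batch of column values per call.

```python
from typing import Iterator, List, Tuple

Row = Tuple[str, int]                  # (name, age)
Batch = Tuple[List[str], List[int]]    # column-wise chunk: (names, ages)

# --- Tuple-at-a-time: every operator call produces a single row ---
def scan_tuples(rows: List[Row]) -> Iterator[Row]:
    for row in rows:
        yield row

def filter_tuples(child: Iterator[Row], min_age: int) -> Iterator[Row]:
    for row in child:
        if row[1] >= min_age:          # predicate evaluated once per call
            yield row

# --- Vectorized: every operator call produces a batch of column values ---
def scan_batches(names: List[str], ages: List[int], batch_size: int) -> Iterator[Batch]:
    for i in range(0, len(ages), batch_size):
        yield names[i:i + batch_size], ages[i:i + batch_size]

def filter_batches(child: Iterator[Batch], min_age: int) -> Iterator[Batch]:
    for names, ages in child:
        keep = [age >= min_age for age in ages]   # predicate over a whole column chunk
        yield ([n for n, k in zip(names, keep) if k],
               [a for a, k in zip(ages, keep) if k])

# Hypothetical sample data
people: List[Row] = [("ann", 31), ("bob", 17), ("cat", 45), ("dan", 12), ("eve", 29)]
names = [name for name, _ in people]
ages = [age for _, age in people]

tuple_result = list(filter_tuples(scan_tuples(people), 18))
batch_result = [
    (n, a)
    for batch_names, batch_ages in filter_batches(scan_batches(names, ages, 2), 18)
    for n, a in zip(batch_names, batch_ages)
]
```

        Both pipelines return the same rows. The point of the batch version is that the per-call overhead is paid once per chunk instead of once per row, and the inner loop over a column is the kind of code that can use vector instructions.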

  2. Greybearded old scrote Silver badge
    WTF?

    Riiight

    It's all very well showing great performance with experimental extensions, but when will it be implemented in production DBMSs? Real Soon Now huh?

    1. EarthDog

      Re: Riiight

      Considering that the paper was written in 2023, maybe a year. Seriously, this isn't like coming up with a new power-up pack in a video game.

      1. ifadams

        Re: Riiight

        A year is optimistic for anything beyond early POC/Beta. While I'm not arguing that an RDBMS can't get such functionality, I think there is a serious issue in trivializing how hard it is to add functionality to these systems given the enormous complexity of maintaining consistency at scale, nor am I convinced the complexity of shoving it into an RDBMS is worth it.

        1. yetanotheraoc Silver badge

          Re: Riiight

          Still have Oracle 11 here (2007). Trying to get the remnants moved to Oracle 19 (2019), but it's not easy due to regulations.

    2. yetanotheraoc Silver badge

      Re: Riiight

      Good point.

      "Their results show that DuckDB, with the above extensions, outperforms the native graph DBMS by up to 10x. These are state-of-the-art results from January 2023 and not from five years ago."

      Top researchers using bleeding edge tech describes precisely zero places I have ever worked. One question I have is what the results might have been if the researchers had put similar effort into extending the native graph DB and compared that to off-the-shelf DuckDB. On the other hand, if all we care about is results today and throughout our business cycle, it may not matter which approach is theoretically superior. Just count the clever people working on each approach.

      1. Michael Wojcik Silver badge

        Re: Riiight

        All the technologies you use now were "bleeding edge" at some point.

        And the force of the argument here is precisely against needing "bleeding edge" technology. It's that an evolution in RDBMS capabilities, which is already underway thanks to SQL/PGQ standardization, removes the need to switch from an established technology to a significantly different one.

        Your question about "the researchers had put similar effort into extending the native graph DB" doesn't make sense. The capabilities they added to DuckDB are already in graph DBMSes that support GQL, because they're a subset of GQL. What the paper shows is that it's feasible to add the SQL/PGQ enhancements to an existing analytic RDBMS, and that when they did so the performance was superior to the existing GDBMS.

        I'm not qualified to have a strong opinion on the debate here, but this particular line of argument is irrelevant to it. The question at hand is whether GDBMSes fundamentally handle a significant subset of use cases better than RDBMSes. "GDBMSes have X and RDBMSes don't have X yet, but it's been shown they can have it" supports the RDBMS side, not the GDBMS one.

        1. yetanotheraoc Silver badge

          Re: Riiight

          `Your question about "the researchers had put similar effort into extending the native graph DB" doesn't make sense.`

          You purposely showed it in the worst light.

          Suppose you have a competition between a race car and a trolley car. The trolley car is far more efficient travelling on the tracks. So the race car engineers get busy bolting some extra steel wheels on the race car, and now it beats the pants off the trolley car when travelling around the tracks.

          Me: What would have happened if they had spent the same effort improving the trolley car?

          You: The trolley car already has steel wheels.

          1. CRConrad

            Re: Riiight

            Yeah, because a trolley with added rubber tires will be much better at racing around the streets than a car... You (inadvertently, I bet) came on the exact right metaphor: Trying to make a trolley beat a race car would mean turning it into a race car.

  3. SUDO-SU

    With RDBMS I've seen a lot of garbage. Most people don't have the skills or a dedicated dba that will solve these problems for them or architect a good schema.

    So if a graph DB can provide the benefits of a well-architected relational DB, without needing to be an advanced user, then graph DBs may become something.

    1. F. Frederick Skitty Silver badge

      In my experience, Graph DBs just allow incompetent programmers to make an even worse mess than they would with a Relational one. The complexities of graph query languages don't help. I was forced to use MarkLogic at one company, simply to justify a license that we'd acquired thanks to a former colleague's lack of impulse control. Technically impressive but an absolute nightmare to work with (MarkLogic that is, not my former colleague - he was just a nightmare).

      1. yetanotheraoc Silver badge

        The research paper we would all like to see

        Graph vs relational - Which one produces the worse shit-show in the hands of incompetent programmers?

        1. spireite Silver badge

          Re: The research paper we would all like to see

          Graph I wager....

          When I see NoSQL stuff implemented, it's usually because they can't be arsed to organise and 'schemaless' fits the bill.

          NoSql is a perfect 'product' for the lazy dev.

      2. Michael Wojcik Silver badge

        Honestly, I'm having trouble thinking of a technology that turns bad programmers into good ones. The only one I can think of that works in a significant number of cases is education.

        COBOL was the first widely-deployed attempt at that, as far as I can recall, and it didn't succeed. COBOL may have let non-programmers write parts of programs, but it didn't make them good programmers. Functional programming has its advantages, but it didn't turn bad programmers into good ones. Same for structured programming, object orientation, 4GLs, StackOverflow, GitHub Copilot, and so on.

        1. sgp

          It's a bit of a moot argument. If you stick to the basics, relational data modelling really isn't that hard. If you can't deliver, it's probably because what you are designing is way too complex and you need an expert anyway.

  4. Anonymous Coward

    I've spent a lot of time over the last couple of decades working with systems developed on top of classic relational databases...always, and I mean ALWAYS the developers end up creating something that requires endless money thrown at more resources to handle the garbage that they've built...it's never their logic, it's always that the database doesn't have enough RAM or CPU...yes, you can make incredibly efficient solutions with relational databases...but you can't do it on the budgets available because as people have said above, there usually isn't any money available for a DBA...especially early on in the development cycle.

    Stop me if you've never had to deal with that one ugly query that has tons of JOINS and UNIONS in it that brings one specific part of the solution to a crawl because someone, at some point decided..."well, we probably need to see everything on this page in one table".

    Furthermore, database hygiene with relational databases is a massive problem...again, stop me if you haven't heard "we'd better leave the old data there, just in case we need it!" despite your begging and pleading, highlighting that keeping that old useless data means you're dragging back 9 million rows in a single query and that it's costing them a fortune to keep the old data around.

    Yes, you can build efficient solutions, no your average non-technical user doesn't understand the implications of the things they ask for.

    Anything that prevents non-technical people from shredding the fucking gusset out of their own project is a win, it's not re-inventing the wheel. It's taking the corners off and making the wheel round instead of square.

    1. Michael Wojcik Silver badge

      Anything that prevents non-technical people from shredding the fucking gusset out of their own project is a win

      <Fred Brooks>There is no silver bullet.</Fred Brooks>

    2. Code For Broke

      Database hygiene is a fine topic. But you piss me off by implying that tons of crappy data is a problem native to RDBS. It's a problem native to human nature and is also well evidenced in essentially every computer storage solution ever conceived, including graph dbs.

      Seriously, not cool.

      1. Anonymous Coward

        I didn't imply that it is a problem native to RDBS...I implied that it is a problem made worse by RDBS.

        1. sgp

          Except that it isn't. In your example, why would you not be able to tweak your query to not include the 9 million rows that are not needed? What kind of cost do you think is associated with that?

    3. spireite Silver badge
      Pirate

      Entity Framework and ORMs in general have a lot of blame in this ugly query thing you mention

  5. Pascal Monett Silver badge
    Mushroom

    "there usually isn't any money available for a DBA"

    So if they can't do it right the first time around, then they can pay forever to keep it hobbling along until they understand that they need to do it right.

    Excuse me if I have little patience with Borkzilla-era managers who know nothing about how things work but are happy as soon as they see their pet project on their screen, and costs be damned.

    So, let's forget relational because it's too complicated. Well bugger, but IT is complicated.

    Personally, if you're a manager and you haven't yet understood that, you should be condemned to working with paper and punchcards until you get it.

    That not being possible, paying ever more for ever more resources is an acceptable substitute.

    1. EarthDog

      Re: "there usually isn't any money available for a DBA"

      DBs are complex because the things they model are complex. E.g. human relationships.

      1. _olli

        Re: "there usually isn't any money available for a DBA"

        More often, the things stored in DBs aren't even inherently that complex. There's just this school of system architects who think that "complexity is good" and oh boy can they coat everything with additional layers of complexity. Presto, mess is ready.

  6. EarthDog

    I've said it once and I will say it again

    First off +1 to @sudo-su and @Pascal Monett

    A big chunk of the problem is that most programmers are functionally illiterate when it comes to database models. There is a trend to get rid of DBAs. In one case I commented on, they got rid of the DBA and replaced them with a programmer on the team. Basically you end up with a programmer, possibly with no background and/or interest in data, who is actually an embedded DBA and bound to make all the same old mistakes and re-invent the wheel time after time.

  7. yetanotheraoc Silver badge

    Now we're getting somewhere

    Good stuff to chew on in this article. Let's see what Thursday brings.

  8. recharged95

    The keywords in this debate are "well-architected" and "most" typical cases.

    Now we've reached the scrum argument tactic, if the RDBMS can't do what you need: "it's not the RDBMS", it's, "you're doing it wrong".

    Sure you can incorporate a lot of the advantages of a graph db into a rdbms, but having it retain purity/usability of a normal row/col is hard. OracleText comes to mind back in the day. Most architects know a good schema is hard, complex. Graph dbs just make it easier. But yes if done right, schema/rdbms can be blazingly fast.

  9. Ian 3
    Childcatcher

    More to database architecture than performance? Won't someone think of the data modellers?

    The motion is about providing a 'significant advantage' but only raw read/write query performance advantages seem to be used in support. For use cases with a lot of many-to-many relationships and objects that can be linked to the same other objects for different reasons at the same time, the significant advantage isn't performance but the ability to model with reasonable clarity, and I'm sometimes very happy to sacrifice performance for that (especially when performance is still perfectly reasonable). The way real world things relate to other real world things is rarely a neat, hierarchical relational model with nice foreign keys, and sometimes modelling as a graph has a 'significant advantage'. (And before someone mentions it, if your RDBMS model is full of FK to FK mapping tables with added relationship meta-data, then you've just built a graph, and your SQL will be 'interesting' and hard to manage)
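
    To make the "FK to FK mapping table" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are invented for illustration and don't come from any real schema. The `link` table is nothing but two foreign keys plus a relationship label, which is an edge list by another name, and the same pair of objects is linked twice for different reasons.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Nodes: any real-world thing we want to relate to another (hypothetical schema)
    CREATE TABLE thing (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );

    -- Edges: an FK-to-FK mapping table carrying relationship metadata
    CREATE TABLE link (
        from_id  INTEGER NOT NULL REFERENCES thing(id),
        to_id    INTEGER NOT NULL REFERENCES thing(id),
        relation TEXT    NOT NULL,
        PRIMARY KEY (from_id, to_id, relation)
    );

    INSERT INTO thing VALUES (1, 'Alice'), (2, 'Acme Ltd'), (3, 'Bob');

    -- Alice is linked to Acme Ltd twice, for two different reasons
    INSERT INTO link VALUES
        (1, 2, 'employee_of'),
        (1, 2, 'shareholder_of'),
        (3, 2, 'customer_of');
""")

# "Who is connected to Acme Ltd, and why?" already needs two joins for a single hop.
rows = conn.execute(
    """
    SELECT f.name, l.relation, t.name
    FROM link AS l
    JOIN thing AS f ON f.id = l.from_id
    JOIN thing AS t ON t.id = l.to_id
    WHERE t.name = ?
    ORDER BY f.name, l.relation
    """,
    ("Acme Ltd",),
).fetchall()
```

    One hop is still readable. Every extra hop adds another pair of joins (or a recursive query), which is where the SQL starts to get 'interesting'.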

  10. Steve Channell
    Boffin

    Nobody loves SQL, but it works

    Lost to time is the fact you could use SQL to query IDMS (CODASYL network database) and IMS (mainframe hierarchical database), but the performance couldn't match relational (unless it was organized for your specific query).

    While SQL queries are wordy compared with a call API, a parameterized prepared query "SELECT * FROM CUSTOMERS WHERE CUSTOMER_ID = @id;" can be cached and reused (especially for DB2, where it is compiled to an application plan, or Sybase/MS SQL Server where prepared statements are compiled to TSQL procedures)
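
    A minimal sketch of that pattern, using Python's built-in sqlite3 module rather than DB2 or SQL Server, so the caching details differ from the products named above. The table contents and the `find_customer` helper are made up for this example, and the placeholder is written `:id` because that is the named style the sqlite3 module documents. The SQL text stays constant and only the bound value changes, which is what lets a driver or server reuse the prepared statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Edgar")],   # hypothetical sample rows
)

# One constant SQL string; the value is bound at execution time, never spliced in.
QUERY = "SELECT customer_id, name FROM customers WHERE customer_id = :id"

def find_customer(connection, customer_id):
    """Run the same parameterized statement with a different bound value."""
    return connection.execute(QUERY, {"id": customer_id}).fetchone()

first = find_customer(conn, 2)
second = find_customer(conn, 3)    # identical SQL text, so the statement can be reused
missing = find_customer(conn, 99)  # no such row, so fetchone() returns None
```

    Binding the value instead of building the string also keeps user input out of the SQL text.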

    SQL relational has the significant advantage that tabular/column-store tables can be scanned in parallel by highly tuned (often automatically) DBMS and sharded over many nodes when partitioned. It can also take advantage of vector instructions and GPGPU - the performance advantages RDBMS enjoy will only get bigger over time.

    Where relational suffers is recursive queries, but even this can be addressed with in-memory databases (either on the server or local copy of immutable history) and LSM (log structured merge).

    The outlier is not fashionable graph-databases, but time-series and document databases - even these can be optimized by splicing databases that break them into an array of key/value pairs - nobody in their right mind would store an FpML contract as a document, when the only thing that changes is the price and cashflow schedules.

    1. smarrteboje

      Re: Nobody loves SQL, but it works

      I think you're being quite flippant with your assertion that recursive queries only need be addressed with an in-memory database. Indexing and query complexity are both pretty big factors when you need to start doing this.

      I'm also totally unsold on this argument that relational databases are the only format that allows for optimized memory reads, or sharded reads across threads/servers, as if somehow a graph database storage engine is just throwing blobs of data across time and space in a random fashion and the only way to traverse them is in linear time. The reality is totally different.

      1. Steve Channell

        Re: Nobody loves SQL, but it works

        I wasn't suggesting that recursive queries only need to be addressed with in-memory databases, but that in-memory databases address many of the issues associated with them.

        There are two types of recursive queries (three if you include graph extensions): [1] programmatic recursion (e.g. Entity-Framework, Hibernate, etc) where each node visit uses lazy-loading to fetch the next tree, [2] Common-Table-Expressions/connect-by where the RDBMS recursively calls the execution engine to build the result set. In-memory databases help these recursive calls because the data is already in memory.

        You're right to highlight that recursive calls that do not have sympathetic indexes are a particular problem because it is not clear which session needs to be killed to stop performance problems. Cyclic Graphs need max-recursion limits to prevent never-ending queries.
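
        As a sketch of type [2] with a recursion limit, here is a recursive Common Table Expression run through Python's built-in sqlite3 module. The `edges` table, its contents and the `MAX_DEPTH` value are invented for the example. The graph deliberately contains the cycle a -> b -> c -> a. Because each recursive row carries a depth counter, revisiting `a` always yields a new (node, depth) pair, so without the `depth < :max_depth` guard this particular query would not terminate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")],   # note the a -> b -> c -> a cycle
)

MAX_DEPTH = 3   # hypothetical cap on recursion, chosen for this toy graph

rows = conn.execute(
    """
    WITH RECURSIVE reach(node, depth) AS (
        SELECT :start, 0                  -- anchor row: the starting node
        UNION
        SELECT e.dst, r.depth + 1         -- recursive step: follow one more edge
        FROM reach AS r
        JOIN edges AS e ON e.src = r.node
        WHERE r.depth < :max_depth        -- the guard that stops the cycle
    )
    SELECT node, MIN(depth)               -- shortest hop count found per node
    FROM reach
    GROUP BY node
    ORDER BY MIN(depth), node
    """,
    {"start": "a", "max_depth": MAX_DEPTH},
).fetchall()
```

        An index on `edges(src)` would be the "sympathetic index" for the join in the recursive step; on a table this small it makes no visible difference.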

        Relational databases have no special advantage with parallel scans, but they have been doing them for decades and optimizer support is mature.

  11. Robert Grant

    It depends on what "the same use cases" is.

    Graph databases are an optimisation: they're amazing at queries like "count all my friends of friends of friends" but they are bad at queries like "give me the mean of all the ages of people in this database". Unsurprisingly, they're good at graph-like traversal but bad at relational operations.

    Relational databases are pretty good at everything, but not the best at many things. Choosing them is not the premature optimisation choice, which is good.
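
    A toy Python sketch of the two query shapes, with invented names and ages and plain dictionaries standing in for any real database. The first is a traversal that follows edges outward hop by hop; the second is an aggregate that ignores the edges entirely and scans one attribute of every record.

```python
from statistics import mean

# Hypothetical friendship graph as an adjacency map (undirected, listed both ways)
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann"},
    "dan": {"bob", "eve"},
    "eve": {"dan", "fay"},
    "fay": {"eve"},
}
ages = {"ann": 30, "bob": 40, "cat": 20, "dan": 50, "eve": 25, "fay": 45}

def within_hops(graph, start, hops):
    """Breadth-first expansion: everyone reachable in at most `hops` steps."""
    seen = {start}
    frontier = {start}
    for _ in range(hops):
        # Follow every edge out of the current frontier, dropping people already visited
        frontier = {n for person in frontier for n in graph[person]} - seen
        seen |= frontier
    return seen - {start}

# Graph-shaped question: friends of friends of friends of ann
reachable = within_hops(friends, "ann", 3)

# Relational-shaped question: mean age over every person, no edges involved
average_age = mean(ages.values())
```

    The traversal only touches the neighbourhood of the start node, while the mean has to read every row, which is roughly why the two storage models favour different questions.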

    1. Roland6 Silver badge

      Re. "the same use cases"

      I noted that the implications of this caveat are being (deliberately?) overlooked.

      To me the debate only really has any relevance if graph databases are better than relational for the business mainstay use cases/applications: CRM, HR, Finance, ERP etc.

      So far it seems graph DB’s are good for some specific use cases/problem domains which are outside of mainstream business applications, hence don’t really compete with relational.

      1. Robert Grant

        Re: Re. "the same use cases"

        Yep. Anything graph-related, e.g. a social network, or a large-scale representation of things that relate somehow (e.g. if you're Amazon and want to query across all your networked devices, or an IoT provider wanting to query all your devices in the field and how they relate in a mesh) it works.

        I will say at least Neo4J is also quite good for exploratory work, but that's more because it doesn't enforce schemas and has a nice visualisation technology, than anything to do with it using graphs.

    2. smarrteboje

      Yes, it's almost like the debate motion itself is poorly formulated. I don't think there is a single database storage model that could be said to be the best for most use cases. Like in every technology decision, there are better choices and worse choices, trade offs and time pressures. At best, I think if FOR wins this debate then we will have learned that the El Reg readership wouldn't choose a graph database to solve most problems, which is hardly learning much.

  12. Ashto5

    Small is beautiful

    Small is beautiful and quick.

    If the data is not needed TODAY then move it to an archive DB and let the good times roll.
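
    A minimal sketch of that housekeeping step with Python's built-in sqlite3 module. The `orders` and `orders_archive` tables, the sample rows and the cutoff date are all hypothetical, and here the archive is just a second table in the same database rather than a separate archive DB. Old rows are copied and then deleted inside one transaction, so a failure part-way through leaves the live table untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders         (id INTEGER PRIMARY KEY, placed_on TEXT NOT NULL);
    CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, placed_on TEXT NOT NULL);
    INSERT INTO orders VALUES (1, '2015-03-01'), (2, '2019-07-15'), (3, '2023-01-10');
""")

def archive_before(connection, cutoff):
    """Move rows older than `cutoff` (ISO date string) into the archive table."""
    with connection:   # commits on success, rolls back if either statement raises
        connection.execute(
            "INSERT INTO orders_archive SELECT id, placed_on FROM orders WHERE placed_on < ?",
            (cutoff,),
        )
        cursor = connection.execute(
            "DELETE FROM orders WHERE placed_on < ?",
            (cutoff,),
        )
    return cursor.rowcount   # number of rows removed from the live table

moved = archive_before(conn, "2020-01-01")
live = conn.execute("SELECT id FROM orders ORDER BY id").fetchall()
archived = conn.execute("SELECT id FROM orders_archive ORDER BY id").fetchall()
```

    ISO-formatted dates compare correctly as plain strings, which is why the text comparison against the cutoff works here.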

  13. xyz Silver badge

    Oar in....

    There are uses for graph dbs but I've never had to go there, even though I've been prodded in that direction by management wanting to be seen to be cooool and hip.

    On the other hand let loose the code first nazis and their ORM bombs on a relational db and you get a world of hurt.

    I've been extracting ndf files (datawarehouse type science data files) recently and God does that lot feel so old fashioned.

    All in all, you choose your tool for what you need, not what is the new shiny. Data, in whatever format, should be readily accessible and to be honest there is always a way to get it and a bit more work to get it fast. The key things for me are understand, test, learn how to be smarter and produce stuff that works.

    BTW, with Reg specials like this, I always smell a salesman in the background.

  14. James Anderson

    He lost me.

    It was over 30 years ago that I encountered my first relational DB and I have been using them on and off ever since, so I would expect to be on the RDBMS side of the vote.

    However it’s also about 30 years since I first got my hands on a unix box ( Sun Microsystems Shoebox ) and I wholeheartedly endorse the “do one thing and do it really well” philosophy of early unix systems.

    So rather than admit there are a couple of things that are better done using a different tool he proposes adding yet more functionality to an already bloated and complex system. Commercial database products are already full of features and extensions that very few customers ever use but add to the cost and complexity of the system.

    1. sgp

      Re: He lost me.

      The same is true of unix (like) systems. The idea is that each component of that system does one thing (well). If you think of the relational database as a system of components, it's entirely the same thing. In 20 years time, graph db systems will be full of technical debt and features that almost nobody uses (anymore). But why should you care about this? This is the same kind of reasoning that leads to so many new programming languages while the same kind of features can be added to existing ones.

  15. Binraider Silver badge

    Surely the key problem for a non-relational database logical model, above anything else, is that humans mostly think in either hierarchical or relational terms?

    Regardless of what clever tech is used, hierarchical or relational terms have serious advantages at the user-end.

    And to be blunt, given corporate IT policies concerning rolling out tools, you're lucky half the time if you can get a copy of 64-bit Excel & Access, rather than a server to experiment on...

  16. Alan Bourke

    Is this like how NoSQL was the death of the RDBMS

    despite the fact you can't get a report out of it.

  17. Tom 7

    Most of my experience with DBs

    suggests this may be a new way to obfuscate the customer's data in ways that SQL didn't allow.

  18. Anonymous Coward

    You just can't get good staff these days

    It's axiomatic that the language (and database) every developer uses is the best, for them.

    I care more about getting someone in to make changes 5 years later. All those Perl and PHP scripts from 2002 are of no interest to the Python and R kids graduating 2022.

    Relational may not be interesting but it generally works, the big 3 generally bolt on a few features at 2 year intervals that do 80% of what the new and cool DBs can, while the performance of the servers will increase to run them a bit faster and the efficiency of the DBMS will be a little better than before.

    If you want to use a GraphDB, go for it, but don't pretend it's for technical reasons any more than running Apple vs Android.
