The Great DB debate: SQL extensions won't solve the graph problem

Welcome back to the latest Register Debate in which writers discuss technology topics, and you the reader choose the winning argument. The format is simple: we propose a motion, the arguments for the motion ran on Monday and Wednesday, and the arguments against on Tuesday and today. Read over the arguments: you have until …

  1. Groo The Wanderer Silver badge

    I have to agree with the point on query languages; the edge cases that a graph database addresses cannot simply be done as "SQL Extensions" to the existing standards. They're not quite within the same domain of data processing.

  2. Anonymous Coward

    Not really an argument...

    "If SQL extensions were enough to solve the graph problem, I would trust this committee to halt its work."

    Really? Why would you do that?

    Just because SQL could provide extensions to solve the graph problem, that doesn't mean that there's no need for standards in the graph database, or that graph databases should simply disappear.

  3. xkcd9891

    The paper that Jim Webber has cited is based on Kuzu: https://kuzudb.com/

    From the website: "Kùzu is an in-process property graph database management system built for query speed and scalability."

    Please feel free to try it out, github repo: https://github.com/kuzudb/kuzu

  4. Chubango

    I see El Reg's polling is still atrocious. For goodness' sake learn to formulate questions in a manner that is unambiguous. Keep it simple, stupid! This is easily fixed by two easy alterations:

    1) Use only positive language that affirms a statement, i.e. "Graph DBs provide a significant advantage over well-architected relational databases for most of the same use cases"

    2) Do not confuse the issue with "for" and "against" the poorly-stated motion and instead go with "agree" and "disagree"

  5. Doctor Syntax Silver badge

    AFAICS there are three aspects to this: data model, query language and processing engine.

    I'm not familiar with graph databases but it did seem to me that edges simply model those relationships in ER representations which would be modelled as link tables in a relational database. Alternatively are there situations where a graph data model does not have an equivalent relational model? If there are none, then it would seem that the data models are equivalent*.

    If they are equivalent then could a pre-processor convert a graphical query language statement to SQL and vice versa? If so then does that mean the languages are equivalent?

    If so that just leaves one question: Different workloads work best with different processing engines - who knew?

    * Link tables are only needed for many-to-many relationships. I can't see why graph databases would be unable to cope with relational models that have simple joins, or that there would be situations where a relational model has no equivalent graph model.
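
As a rough illustration of the link-table point above (not something from the original comment), here is a minimal Python sketch using the standard-library sqlite3 module. The person and knows tables, the sample names, and the neighbours helper are all hypothetical; the idea is only that a many-to-many edge maps onto a link table, and a one-hop graph pattern maps onto a pair of joins.

```python
import sqlite3

# Hypothetical schema: people plus a many-to-many "knows" relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The link table plays the role of the graph's edges.
    CREATE TABLE knows (
        src INTEGER NOT NULL REFERENCES person(id),
        dst INTEGER NOT NULL REFERENCES person(id),
        PRIMARY KEY (src, dst)
    );
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO knows VALUES (1, 2), (2, 3);
""")


def neighbours(conn, name):
    """One-hop traversal: the pattern (a)-[knows]->(b) written as two joins."""
    rows = conn.execute(
        """
        SELECT b.name
        FROM person AS a
        JOIN knows AS k ON k.src = a.id
        JOIN person AS b ON b.id = k.dst
        WHERE a.name = ?
        ORDER BY b.name
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling neighbours(conn, "Ann") walks one edge out of Ann's node. Whether this is as pleasant to write as a graph pattern is exactly the query-language question being debated.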

    1. yetanotheraoc Silver badge

      data access

      "AFAICS there are three aspects to this: data model, query language and processing engine."

      You left off the storage controller. When I read Wiederhold (1987) _Database Design, 2nd ed_, I was *so* happy in chapter two (2-3 Blocks and Buffers, 2-4 Storage Architecture), thinking "I'm glad I will never have to implement this". The one thing I did learn is you can't rely on the OS for read/write, you need direct control of the storage device.

      With a graph, (A) the node can hold a pointer to the data, which is not much different from the relational storage, or (B) the node can hold the actual data, which is completely different. I guess there's also (C) where the leaf holds the data, but that doesn't make any sense for the graph I am familiar with. Keep in mind we are talking about how the data is laid out on disk. B only works well for small sizes of data, and means any aggregate query will be painfully slow, but for a simple select once you satisfy the where clause the query is done because the data is right there.

      So unless I read this wrong, a graph db really depends on how much data there is per node as well as what operations you need to do on the result set. For "most" use cases relational is better. So what? Please tell us about the "other" use cases where your data really is a graph, but you might prefer to use a relational db ... or not.

      1. Doctor Syntax Silver badge

        Re: data access

        "you need direct control of the storage device"

        My database engine was Informix which, at least on Unix boxes, allowed access to the character devices in /dev which is about the closest you can get without having the database engine include a H/W driver. Linux doesn't offer that although it does provide something close in terms of tweaking the block device (it's so long since I looked at this that I can't remember the details).

        Although it's possible to use the OS's file system (reads and writes to files in /srv have the same semantics as devices in /dev) this was usually avoided for production systems except for earlier versions and what they called the Standard Engine.

  6. Anonymous Coward

    this is quite a good article - https://www.prisma.io/dataguide/intro/comparing-database-types

    and reading through it ... it feels like neither of these will get my vote; as always, it very much depends on the use case one wants to achieve

  7. John H Woods

    "Oh no, not again."

    Starting to look appallingly close to 52:48 ...

    1. nautica Silver badge

      Re: "Oh no, not again."

      "Now, if we would only limit our sample size to the first ten or fifteen respondents..."

  8. Alastair Green

    Graph data management is a data model, not an implementation strategy

    I used to work for Neo4j and was responsible for their efforts to bring vendors together around the twin-track approach of SQL/PGQ (a read-only extension to SQL) and GQL (a full CRUD declarative language), both using the property graph model. I am currently the Vice-chair of the Linked Data Benchmark Council, and lead a working group of LDBC members who are looking at proposals for property graph schemas as a future facet of the GQL graph query language standard. What follows is purely my personal opinion.

    I think this debate is based on a false counterposition, or equivalently, a layer violation.

    Graph databases have become a practical category because they do quite a few things better than SQL databases. If that wasn't true then this debate wouldn't be happening. It does not follow that they do everything better, nor does it follow that SQL databases can't catch up. And it certainly doesn't follow that you can't build a graph database on an SQL engine. What is true is that it took graph databases to get SQL to wake up to graph data. It's great to see someone like Andy so revved up about beating "native" graph databases from a performance perspective: the result of this focus and this kind of competition can only be ... better graph databases.

    A graph database is characterized by the data model and the surfaces (APIs) used to maintain and query the data. There are many implementation strategies, and I suspect that Jim is right -- for different kinds of "graph natural" workloads, different implementation strategies may yield better or worse results. The equation of graph database with graph analytics engine stacks things in one way. Equally, it's plausible that SQL TP engines are likely to contain many features that can be leveraged by graph OLTP databases.

    For a lot of use-cases, many people find the property graph data model is a more intuitive (more abstract) way of conceiving of information represented as data than the relational (or SQL tabular) model. It's closer to the conceptual data models that people start with when they want to model an information space (like ERM); it's closer to the pictures we draw of entities and relationships even if we have no formal training in data modelling. And, graph queries based on matching pattern graphs against data graphs to yield collections of subgraphs are a very compact and obvious way of expressing complex joins, including recursive joins. [Note that the CWI work does not map variable-length Kleene-* path matches to recursive SQL queries, by the way]. Insert/merge operations can be thought of as establishing or modifying sub-graphs, rather than as decomposed elementary operations. These features lead to more compact declarative query statements: the use of patterns to suggest pictures of data is a powerful advance on "structured English".
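
For readers who want to see what a "recursive join" looks like on the SQL side, here is a small sketch, assuming a bare edge(src, dst) table in SQLite. The table name, the reachable helper and the sample data are made up for illustration, and this is just one way to express a one-or-more-hop reachability match; it is not a claim about how the CWI work or any SQL/PGQ implementation does it.

```python
import sqlite3


def reachable(edges, start):
    """Return every node reachable from `start` in one or more hops."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            -- Base case: direct successors of the start node.
            SELECT dst FROM edge WHERE src = ?
            -- UNION (not UNION ALL) discards duplicates, so cycles terminate.
            UNION
            SELECT e.dst FROM edge AS e JOIN reach AS r ON e.src = r.node
        )
        SELECT node FROM reach ORDER BY node
        """,
        (start,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


# Hypothetical data: a three-node cycle plus one unrelated edge.
sample_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("x", "y")]
```

With sample_edges, reachable(sample_edges, "a") follows the cycle all the way round. A graph pattern would say the same thing in a single path expression, which is the compactness argument made above.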

    SQL databases, as they have evolved in the last forty-plus years, have tended to spurn "graph APIs". The reason that a graph API has appeared in SQL is that languages like SPARQL, and especially Cypher, have caught the imagination of users (and of not a few database engineers). So here, at least, we ought to give credit to graph databases for moving things forward.

    SQL/PGQ is, as Andy points out, a language that "quotes" GQL. It is a foreign function mechanism. A graph view is defined over the SQL schema (and that view is not a mechanical transposition of the referential integrity graph formed by FK relationships; it is a true view allowing a quite different data model to be offered). The guts of the PGQ query is a graph pattern match, and that is written in a sub-language of GQL. But an update to the graph view must be carried out by modifying the base tables which underlie the graph view, using regular SQL DML verbs. This means that there is a sharp mismatch between querying and maintaining the data. Now, if SQL/PGQ were enhanced to support graph pattern-based updates of the graph view, then the result would be that SQL quoted even _more_ of GQL. This would tend to push the "relational engine" underneath into being a GQL engine, not a SQL engine.
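
SQLite has no SQL/PGQ support, so the following is only an analogy: an ordinary SQL view stands in for the graph view, with hypothetical employee and colleague_edge names. It sketches the asymmetry described above, where edges are read through the view but can only be changed by ordinary DML on the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
    INSERT INTO employee VALUES (1, 'Ann', 'db'), (2, 'Bob', 'db'), (3, 'Cy', 'ops');
    -- Stand-in for a graph view: "colleague" edges are derived from shared
    -- departments and are not stored anywhere as rows of their own.
    CREATE VIEW colleague_edge AS
        SELECT a.id AS src, b.id AS dst
        FROM employee AS a
        JOIN employee AS b ON a.dept = b.dept AND a.id < b.id;
""")


def edges(conn):
    """Read side: query the edge view."""
    return conn.execute(
        "SELECT src, dst FROM colleague_edge ORDER BY src, dst"
    ).fetchall()


before = edges(conn)

# Write side: SQLite views are read-only unless INSTEAD OF triggers are
# defined, so the new edges come from regular DML on the base table.
conn.execute("UPDATE employee SET dept = 'db' WHERE name = 'Cy'")

after = edges(conn)
```

Here a single UPDATE on employee makes two new edges appear in the view, and nothing in the statement mentions edges at all. That gap between how the data is queried and how it is maintained is the mismatch in question.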

    If Andy is right then in the long run PostgreSQL or DuckDB or Relational AI will be the best, fastest, most operational graph database, and Neo4j and TigerGraph (and many other "native" contenders) will fade into history. I would not preclude (or bet on) that outcome. But I would be very, very surprised to see graph databases disappear as a result. Back to the false counterposition.

    The confusion of model/surface/paradigm and implementation strategy shows up in other ways. The fact that graph databases often allow schema-free experimentation doesn't mean that they shouldn't support a mandatory-schema model like SQL as well. I think they have to, in order to match up to enterprise usage requirements, and to support maximum optimization, as Andy rightly points out. Lack of schema is a weakness of some graph databases: the answer to that is not to spurn graphs, but to get graph schema sorted out as a standard and product feature. The fact that graph schema or a composable graph query language are harder than table schema or table views is not a reason to give up on pushing the graph model forward, any more than the complexity of query optimization was a reason to retreat from the declarative models of QUEL or SQL to record-oriented interfaces.

    My vote is that graph databases do something useful that goes beyond what SQL databases have historically achieved -- and I hope that results in a convergence of the best of both.

    1. minwu

      Re: Graph data management is a data model, not an implementation strategy

      "Graph data management is a data model, not an implementation strategy": this title says everything I want to say.

    2. Staniforth

      Re: Graph data management is a data model, not an implementation strategy

      Very good - thanks for this in particular:

      "The fact that graph databases often allow schema-free experimentation doesn't mean that they shouldn't support a mandatory-schema model like SQL as well. I think they have to, in order to match up to enterprise usage requirements, and to support maximum optimization, as Andy rightly points out. Lack of schema is a weakness of some graph databases: the answer to that is not to spurn graphs, but to get graph schema sorted out as a standard and product feature. The fact that graph schema or a composable graph query language are harder than table schema or table views is not a reason to give up on pushing the graph model forward, any more than the complexity of query optimization was a reason to retreat from the declarative models of QUEL or SQL to record-oriented interfaces."

      This touches on my biggest gripe against several graphdb implementations.
