back to article Father of SQL says yes to NoSQL

The co-author of SQL, the standardized query language for relational databases, has come out in support of the NoSQL database movement that seeks to escape the tabular confines of the RDBMS. Speaking to The Register as SQL marks its 50th birthday, Donald Chamberlin, who first proposed the language with IBM colleague Raymond …

  1. Bebu
    Windows

    The long view

    Its probably worth remembering that SQL isn't the relational model (or vice versa.) In Michael Stonebreaker's Ingres Papers the original Ingres query language QUEL, I think, was claimed to be superior to the SQL if that time.

    Chamberlain's points about non SQL databases in a distributed and massively scaled environment are probably valid. I don't have the knowledge or experience to say.

    I have Codd's original paper in an ACM anthology which is still worth reading for its clarity of vision.

    The Ingres Papers are worth reading just to realize what they were doing, when and with what. Not much since then is really new.

    1. Charlie Clark Silver badge
      Thumb Up

      Re: The long view

      I envy you. I have copy of one of Chris Date's books with me and was lucky enough to meet Fabian Pascal to discuss how the industry constantly seeks to undermine the advantages of a rigorous understanding by pushing their substandard but shiny implementations. I recently revisited a fairly standard task (essentially filling missing rows in a calculated data set) that I personally can't easily solve in a single query (I find subqueries difficult to think with) but which became crystal clear through the use of views and CTEs.

      It's been suggested that SQL was deliberately hobbled by vendors to push their products, in ways we've seen with other standards such as from C++ to HTML. This gets even worse when you read the literature that is supposed to teach you about relational databases, but which almost invariably promote particular implementations and their assumptions.

      There are also several misconceptions about both relational databases and NoSQL. Relational databases are often not just more reliable because of integrity constraints, but also more powerful, because of the power of the underlying relational calculus. The "o", meaning "(not) only" is the key letter in NoSQL and is the get-out clause when you want to work with ephemeral data, and can live without integrity. But this is not really much different to a data processing pipeline where you mine data as it enters the system. There's nothing special with the databases that can't be done with a relational store if you disable some of the constraints. But in nearly all situations losing integrity will impose the highest cost on your work.

      Scalability for relational databases using clusters has been possible for years. It takes some work to get right and will cost something but far less than surrendering data integrity will when it inevitably comes to bite.

      1. JoeCool Silver badge

        Re: The long view

        The *intent* of SQL was to simplify the query language aka lower the entry bar.

        1. Bruce Ordway

          Re: The long view... lower the entry bar

          Yes, I remember SQL as so easy.... after we switched from UniData,

      2. TaabuTheCat

        Re: The long view

        Fabian Pascal is a name I haven't heard mentioned in a really, really long time. I remember my first encounter with a database. FoxPro. I had no clue what I was doing and started hanging out in the dial-up forums. Fabian was there, railing for people to understand the importance of the relational model, normalization. etc. I remember him being incredibly patient, answering all my questions, and basically giving me a running start to doing things the SQL way. Probably never thanked him enough for setting me on the right path, but he was instrumental in my career. Funny how being lucky enough to cross paths with a few of the right people can really change your trajectory.

        1. Charlie Clark Silver badge
          Thumb Up

          Re: The long view

          When I wanted to borrow a book from my neighbour on databases and he gave me "Practical Issues in Database Management", tellingly no longer in print. I'm not going to pretend to be a guru in the underlying maths, but working through this book, I learned most of what you need for modelling: if you normalise your data properly then you should be able to do anything you want with it. Indices, views, etc. can all be added later to speed up particular queries. And, yes, I am very much aware that some modern systems need to cope with incredible throughputs, but denormalisation, ahem, normally doesn't help there either.

          I haven't been in touch for a while but he's still writing and I know he's always grateful for a headsup or a donation: life in San Francisco, for those not living off a VC's teat, can be very expensive.

          For anyone working with Postgres I can also heartily recommend Dmitri Fontaine's book on development which contains some good examples of common tasks you might need to deal with.

        2. Anonymous Coward
          Anonymous Coward

          Re: The long view

          Ah normalisation. A blessing and a curse. Great if you want to simplify your data...nightmarish if you go too far and your solution ends up with more dodgy joins and unions than a Las Vegas wedding chapel causing your solution to move slower than a pisshead at closing time.

          1. Charlie Clark Silver badge

            Re: The long view

            It seems a truism that when someone mentions the importance of normalisation, someone else suggests that joins are responsible for slow performance. There's no reason why this should logically be the case, but if it is it's another reminder of the importance of a implementation, both in database design and in the engine. In general, joins on foreign keys will use indices and should, therefore, be as fast as denormalised queries would be.

            It is possible to discuss how many degrees of normalisation you need to go through: is the fifth normal form really necessary? It is often not; but there's usually no need to duel over a theoretical discussion: normalisation is a process and not an end in itself. I think the main point with normalisation is that, especially with views, it does not in any way inhibit access to denormalised representations, as long as these are based on projections from the underlying data. There are, of course, situations when the data to be anaylsed requires some processing (graph traversal, aggregates, calculations, etc.) that you don't need to run each time for each query. Materialised views and/or temporary tables let you deal handle this without risking the integrity of your data. If the situation demands it, this might involve more than one copy of the database and, therefore, more hardware.

            It's similar when it comes to importing massive amounts of data in a pipeline. It will quickly become clear that it's faster to suspend constraints, import data and then reactivate constraints rather than checking row for for. You can still wrap everything up in a transaction, and you can add steps for pre-processing, sanitising, etc. And, yes, you might need some beefy hardware for the pipeline. I think this is especially the case for time series data which is now produced in staggering quantities by companies monitoring everything we do.

            1. Anonymous Coward
              Anonymous Coward

              Re: The long view

              "In general, joins on foreign keys will use indices and should, therefore, be as fast as denormalised queries would be"

              Should be, usually aren't. Entire query caching software solutions exist to work around problematic massive queries with tons of JOIN commands in them...an entire extra layer of software to account for crappy database design.

      3. G40

        Re: The long view

        “deliberately hobbled by vendors to push their products, in ways we've seen with other standards such as from C++ to HTML.”

        And your sources to demonstrate and support this claim can be found where?

        1. nintendoeats

          Re: The long view

          Perhaps they are talking about a certain Redmond based companies refusal to support either language in an entirely standards-compliant way for at least a decade?

          For example:

          https://devblogs.microsoft.com/cppblog/announcing-msvc-conforms-to-the-c-standard/

          I don't think I need to find a source for HTML, we all know what IE was like.

          1. Charlie Clark Silver badge
            Thumb Up

            Re: The long view

            Yes, Microsoft was terrible when it came to updating HTML. However, it should be remembered that IE's implementaiton of table spacing was correct at the time. But, it should equally be noted that there attempt to prevent updates eventually backfired once Google saw the commercial opportunities for making browsers more powerful. I don't know how much money MS spent on trying to keep IE competitive, well preserve the ActiveX walled garden they'd created, but it failed badly. Credit to Google for continuing to enage in the process, unlike Apple which more or less decided it was job done once the app store could run in a browser.

      4. An_Old_Dog Silver badge

        Re: The long view

        Any technology can be misused by fools, the corrupt, and the ignorant.

        I'm sure there are appropriate applications for NoSQL and its offshoots, but I hope to hell whomever is designing a medical database understands why "eventual consistancy" is a bad idea.

        1. I like fruits
          Facepalm

          Re: The long view

          Ok, An_Old_Dog, I'll bite :-)

          Could you please explain what's wrong about using eventually consistent DB with most of medical data? I'd like to know what you mean exactly, before I start complaining :-)

          Now, if you said "financial transactions" I'd listen.

          BTW NoSQL does not have to mean "eventual consistency". At least DynamoDB supports transactions.

          1. Jou (Mxyzptlk) Silver badge

            Re: The long view

            > NoSQL does not have to mean "eventual consistency".

            Nobody ever said that. Unless you have to deal with Exchange DB :D.

      5. W.S.Gosset Silver badge

        Re: The long view

        >substandard but shiny implementations

        ...Oracle

        STILL can't do Isolation/SerialisedTransactions in any real-world sense.

        To get it, you have to drop the server, change config variable ("Serializability=True" or somesuch), then restart.

        Locking Granularity is then TABLE.

        !!!

        Massively obfuscated in the doco, spread across multiple docs, but admitted.

        Last checked (a nontrivial effort) a couple of years ago. They might have fixed it since then but I highly doubt it.

        Oracle marketing always used to trumpet that they ran on more platforms than anyone else. We always said the best platform to run Oracle on was a slide projector.

    2. Anonymous Coward
      Anonymous Coward

      Re: The long view

      > Its probably worth remembering that SQL isn't the relational model (or vice versa.)

      Indeed. And there was a time when such a comment would not have been necessary in a Register article, back when, behind its tabloid tone, there was a serious IT publication.

    3. Sanguma

      Re: The long view

      Neither Codd nor Date were fond of SQL. SQL's dominance is the result of IBM's then market dominance.

  2. Helcat Silver badge

    Got to pick the right tool for the job

    NoSQL has it's strengths but it also has weaknesses, too. Same for Relational DB's, transactional DB's etc.

    Issues with NoSQL is they don't scale to the same extent as the more traditional DB's. But they're a hell of a lot better at handling unstructured data (it's what they're designed for). If you want transactional data or data security, it's not as good (these have to be handled before hitting the DB) where as a well designed relational DB has it baked in.

    So, depending on what you need the DB for - that dictates which approach to take.

    Just not MySQL. Please, not MySQL... just... no.

    1. klh

      Re: Got to pick the right tool for the job

      Thing is...

      They were supposed to scale better - they do only under specific circumstances (and you can replicate those circumstances in an RDBMS to get similar "scalability").

      They were supposed to handle unstructured data better, then Postgres implemented JSON and it's faster and easier to use than in Mongo.

      At this point, choosing a general purpose RDBMS instead of something NoSQL gives you the best of both worlds (and more if you are into the LLM vector-y stuff) at almost no additional cost if you know what you are doing. And if you don't know what you are doing, NoSQL won't save you (but the vendor will happily shove a few grands worth of workshops on management to "help you").

    2. Jou (Mxyzptlk) Silver badge
      Devil

      Re: Got to pick the right tool for the job

      So it is MariaDB?

    3. Charlie Clark Silver badge

      Re: Got to pick the right tool for the job

      Unstructured data is an oxymoron. NoSQL typically handles "schemaless" data, or data that does not conform rigidly to a schema. This is, usually, some kind of key-value and/or timeseries stuff and very occasionally toplogies where graphs can be useful, though these can often be provided by extensions to the database. Codd, Date, et al. did the fundamental work back in the 1960s to prove, mathematically, the superiority of the relational model over hierarcharchical, network or graph databases.

      MySQL is less shit than it was but I currently see no reason to recommend either it or its red-haired clone, MariaDB, over Postgres. I can understand companies going with it because Oracle is behind it: that won't solve any fundamental problems but will give them a number to call. We've recently seen MariaDB's commercial plans go up in smoke.

      1. Anonymous Coward
        Anonymous Coward

        Re: Got to pick the right tool for the job

        >" I can understand companies going with it because Oracle is behind it"

        I can understand companies avoiding MySQL for the same reason.

        I have a soft spot for MySQL. That's what I used a couple of decades ago for a few different projects. Nothing I worked on had the budget for any of the big players MySQL's shortcomings were far overshadowed by my inexperience in database design.

    4. Mostly Irrelevant

      Re: Got to pick the right tool for the job

      MySQL is passable as a data store for applications, but PostgreSQL is better for mostly everything

    5. Sanguma

      Re: Got to pick the right tool for the job

      So basically NoSQL is a high-performance front-end to relational databases, If you're serious.

      1. Anonymous Coward
        Anonymous Coward

        Re: Got to pick the right tool for the job

        That can be the case...but there are NoSQL solutions like MongoDB that have absolutely nothing to with relational databases.

        In all cases, the performance of the final product depends entirely on the skill of the developer...it doesn't matter which solution looks more performant on paper because the benchmarks etc don't reflect how you might use the product.

        In the vast majority of real world cases a NoSQL solution like MongoDB will always outperform a relational database just simply because of software developer overhead...that and NoSQL solutions tend to do the data processing on ingress (usually asynchronously) rather than egress (UNION, JOIN, SELECT *, [insert typical lazy dev garbage here])...you never have to JOIN tons of tables together and bring back more data than you need from a NoSQL database because the document you want to retrieve later was already prepared on ingress, in the background, and the user didn't have to wait for anything.

    6. Anonymous Coward
      Anonymous Coward

      Re: Got to pick the right tool for the job

      "they don't scale to the same extent as the more traditional DB"

      On paper maybe...but in practice...it depends on how your organise your data and then access it...most benchmarks out there don't take into account software developer bullshit.

      This is where a lot of people get confused...NoSQL does scale to the same extent and often better...they just don't scale in the same way. With a NoSQL database, its often the case that RAM and CPU aren't as important as disk I/O...but disk I/O is hard to comprehend because you end up having to explain why you have a massive 20TB storage array for a 12GB database...it's tricky for devs to comprehend, because they like their traditional "just throw more CPU and RAM at it" fix, for a NoSQL database you need more throughput to improve performance, which means you need more disks to operate in parallel...it's even less intuitive when you suggest moving to nvme storage instead of hard drives...because you end up creating what appears to be a downgrade...because you end up with less storage than you started with but way more throughput...putting you in a position where you have to explain PCIe lanes, bandwidth etc etc to your customer in the face of what appears to be cutting off their server space at the knees.

      A traditional SQL database can benefit massively from improved disk I/O as well, but not to the same extent a NoSQL database can, and you can hide the disk I/O problems for a long time under a mask of loads of CPU cores and tons of RAM.

      In my experience, SQL vs NoSQL, SQL always hits the skids first. I've worked on a few solutions that were badly optimised and were being run on the max tier of Amazon RDS...basically the customer had reached a point where they can't "scale" anymore because there are no more CPUs and RAM options to throw at it...CPU usage at 100%, codebase a prime example of "JOINmageddon"as mentioned above.

      It's much harder for a developer to fuck up a NoSQL database...a SQL database is really easy to fuck up, because optimisation of traditional SQL can result in some serious expense further down the line...for the sake of "tidy" data, you may have lumbered your client with much higher bills down the road to account for it...not only that, feature updates in the future become even harder as time goes on...need to add a new column? Well shit, you have a bunch of JOINS and UNIONS in several queries that rely on that table, but don't necessarily need that new column...those queries just got unintentionally heavier...so your new feature has an impact on performance all over the place.

      TL;DR

      SQL can scale better than NoSQL...but in practice and with software developer shenanigans...it is often not the case in reality.

  3. STOP_FORTH Silver badge

    Unasked question

    Did nobody think to ask the guy "Is it ess cue ell, sequel or squeal?"

    1. Doctor Syntax Silver badge

      Re: Unasked question

      Maybe it's not even an answerable question.

    2. Anon

      Re: Unasked question

      I was wondering about that the other day, and then I... SQUIRREL!

      1. steelpillow Silver badge
        Coffee/keyboard

        Re: Unasked question

        Heh!

    3. JamesTGrant Bronze badge
      Joke

      Re: Unasked question

      It’s pronounced SQL. Hope that helps…

      1. This post has been deleted by its author

      2. STOP_FORTH Silver badge

        Re: Unasked question

        That does not help at all, James.

        You still at Jurassic Park, or did you escape?

  4. Anonymous Coward
    Anonymous Coward

    Sequels are never as good

    The father of SQL, who now works for a NoSQL company, says NoSQL is ok. No vested interests there then...

    1. CowHorseFrog Silver badge

      Re: Sequels are never as good

      This is the problem with america, celebrities have value rather than real engineers. Everyone somewhere is trying to bullshit something is great not because of technical reasons but because some celebrity who is whoring their name and nothing more.

  5. steelpillow Silver badge
    Boffin

    Um

    Seems to me there is a double mismatch:

    Query language structures >> Data structures >> Media formatting structures.

    Oracle do try to make their database all-singing-all-dancing. 20 years ago I was writing training materials for Oracle 9, which they were calling an ORDBMS - Object Relational Database Management System - and adding java queries and a java global area alongside the pl/sql global area (or whatever it's called, 20 years is a long time). Moreover it would, as always, grab a block of disc space and format it in its own way, fuck the OS. At the same time you could drop blobs into the main filesystem and integrate them into the schema. So they were addressing both mismatches independently. I do not imagine they have spent the whole time sitting on their butts since then.

    But as a time-served RF engineer, the term "impedance mismatch" is inappropriate. The data mismatch is one of structure or kind, not of degree, while an impedance mismatch is like jpeg compression - you always lose something. Each of those >> above represents a structural transformation where you do not want to lose anything.

    So we can think of that Oracle 9 ORDBMS as heading towards a paradigm not unlike the GCC: many possible input languages, each with its transformation to the inherent data structure, followed by a bunch of subsequent transformations to choose from according to each platform-native format.

    There, I've put myself to sleep. I don't suppose you've fared any better. If you ever code up an open-source Swiss-knife database like this, do let me know.

  6. Anonymous Coward
    Anonymous Coward

    Discipline, Structure, Normal Forms.....And So On.......

    Quote: "...To get the scalability and performance that you need for modern apps, many systems are relaxing some of the constraints of the relational data model...."

    Once upon a time I knew absolutely nothing about database theory. You know....ugly fiddling about with (subsequently) un-maintainable tables in dBase-III.

    I read some of Chris Date's assessments around SQL. Then in 1990 I found Joe Celko's excellent book "Instant SQL Programming" (WROX Press). This book led to some education about "normal forms" and the discipline of eliminating duplication. I'm still not very good at data structure, or at data discipline, but at least (now) I know something about these topics.

    So when I hear "an expert" telling the world that we should be "relaxing some of the constraints"................I start to wonder just how mad the world can get!!!

    1. steelpillow Silver badge
      Windows

      Re: Discipline, Structure, Normal Forms.....And So On.......

      Yeah, well, another of those courses I worked on was on database design, including normalization and - wait for it - de-normalization. A fully normalized database runs like a snail. It really pays in terms of speed and processing overhead to fuck up the data structures in carefully thought-through ways. That was the biggest part of that chapter!

      Indexes offer a simple example. By duplicating all the key content in a different data structure - an Index - you can make queries fly. They had a chapter all their own.

      Database tuning used to be a highly-paid black art until Microsoft bundled a tuning Wizard with the suck-it-and-see freebie SQL Server: Fire up percolator. Shut down Oracle. Import into SQL Server. Run tuning wizard. Import into Oracle. Coffee ready yet?

  7. dwrolfe
    Meh

    SQL can and does scale to the hundreds of thousands of TPS...

    First full disclosure: I am an employee of www.voltactivedata.com, who build a high performance OLTP SQL platform.

    I'm going to have to respectfully disagree with parts of this article.

    The concerns about scale and performance apply to [b]older[/b] RDBMS products, all of which were designed in the late 80's on the assumption there was one CPU and not a lot of RAM. There is nothing inherently slow or unscalable about SQL. It's just you need a modern architecture. We have customers running north of 500K TPS, with SQL, ACID transactions and zero eventual consistency. While NoSQL is one way of getting lots of transactions done, it's not the only way.

    As for JSON? It's a blessing and a curse, and is needed. Traditional SQL struggles.- like everything! - with application customization. You'd end up with thousands of tables and comically complicated schemas. JSON makes it much easier to add 'extra' items to a record. But the ability to add arbitrary stuff using JSON is a double edged sword. You could end up with 10 million records, with 19 different JSON schemas, one of which you nobody told you about. So JSON is useful, but not a replacement for everything.

    David Rolfe

    1. Jou (Mxyzptlk) Silver badge

      Re: SQL can and does scale to the hundreds of thousands of TPS...

      You can always go one step back and use XML notation for your objects. In some cases you have to use both (not at the same time of course) due to the way it is implemented by the application or language you use.

    2. Charlie Clark Silver badge
      Thumb Up

      Re: SQL can and does scale to the hundreds of thousands of TPS...

      The industry loves to come up with new problems that only its product, which is magical and requires little or no training, can solve. Companies love to be able to employ low-skilled people because they're cheap. Together a marriage made in heaven!

  8. martinusher Silver badge

    An old workhorse that's still useful

    I tend to lump SQL along with COBOL as a practical solution to business problems for the hardware of that era that has persisted because its widely used and still useful. That said it does run the risk of the solution driving the model (i.e. problem) rather than the model driving the solution. Its a problem with anything that gets widely used -- most of its practitioners are so busy using it (and coping with its limitations) that they just don't have the time of brain space to spend on anything new so they rationalize their reluctance in myriad ways.

  9. AlgernonFlowers4

    Bah Humbug!

    Bring back the ICL DAP, CAFS, Indepol and Clacks based computing so things get done at the speed of light!

  10. K.S. Bhaskar

    What is this NoSQL database movement of which he speaks?

    The very first database (IMS on a mainframe) was a NoSQL key-value database. The second one, circa 1967, was MUMPS on a PDP-7. Both are alive and well today. SQL is a Johnny-come-lately to the database world!

  11. K.S. Bhaskar
    Alert

    What is this NoSQL movement of which he speaks?

    The very first database was IMS on a mainframe, a NoSQL key-value database written to manage the bill of materials for the Saturn V rocket in the mid 1960s. The second, circa 1967, was MUMPS on the PDP-7 for biological laboratory information systems. Both remain alive and well today. SQL was a Johnny-come-lately to the database world.

    1. Jou (Mxyzptlk) Silver badge

      Re: What is this NoSQL movement of which he speaks?

      Yes, you mention examples from the Pre-SQL time, which was first mentioned and published as idea in 1974, as the article states. So what? Of course they are fine! Even a text file as "NoSQL database" works fine if you only have a few bits. Even a .CSV is fine and used as "No-SQL" for a lot more cases one might imagine.

    2. Mostly Irrelevant

      Re: What is this NoSQL movement of which he speaks?

      Currently people like to call object store databases "NoSQL" to be particularly non-specific. I'm not sure why.

      1. Sanguma

        Re: What is this NoSQL movement of which he speaks?

        Just to be obnoxious, I spent a few years trying to work out what an Object-Orientated Database was. I finally owkred that out, and the entire hoo-haa about Object-Orientated Database was an attempt to combine database services with the Object-Orientated languages such as C++. Some time later, after realizing that, I had a second look at the View construct of relational database, which is an effort to set up an independent Query, so you can use it in programming applications.

        I spent the least few years of the 90s wondering why others didn't see the (potential) interaction of those two constructs - database services for OOlangs like C++ and such, and programmable views that could be made Object Orientated. I still haven't worked that out. But it still seems to be the solution.

        1. martinusher Silver badge

          Re: What is this NoSQL movement of which he speaks?

          I might be a bit naive but I always thought that a database (view) was just a row of objects, at least as far as an application is concerned.

          1. Sanguma

            Re: What is this NoSQL movement of which he speaks?

            It's actually a query that's been archived, you might say.

            Here's some references to it:

            https://www.essentialsql.com/what-is-a-relational-database-view/

            https://learnsql.com/blog/sql-view/

            https://learn.microsoft.com/en-us/sql/relational-databases/views/views?view=sql-server-ver16

  12. Anonymous Coward
    Anonymous Coward

    SQL or not, if your source data isn’t glued in some consistent logic you have already lost.

  13. nremiz

    Father of SQL slash Employee of NoSQL DB Promotes NoSQL

    Vultures should've said up front that Donald is pulling a paycheck from Couchbase, when this fact hits gingerly later in the article it is way more sus.

POST COMMENT House rules

Not a member of The Register? Create a new account here.

  • Enter your comment

  • Add an icon

Anonymous cowards cannot choose their icon

Other stories you might like