back to article Using SQL techniques in NoSQL is OK, right? WRONG

Now that we have CQL for Cassandra and N1QL for Couchbase, it seems the power of SQL is being opened for NoSQL database developers. There is, however, a real danger that developers who see these SQL-like languages as an implementation of SQL are in for a world of pain as they try to leverage their SQL skills on a NoSQL …

  1. Destroy All Monsters Silver badge
    Trollface

    This gave them what they were looking for

    They were looking for sado-masochism, fast?

  2. Anonymous Coward
    Anonymous Coward

    Seems obvious.

    So, NoSQL is faster than SQL because there's no SQL:

    - No SQL transactions.

    - No SQL protection.

    - No SQL convenience

    - No SQL security.

    You get what you paid for...

    1. JLV

      Re: Seems obvious.

      No SQL standard

      Unless I am mistaken, each NoSQL db has its own little approaches to sql-like queries and indeed even for no-sql queries. OODB were briefly de rigueur in the early 90s but suffered when no uniform way to interact with them was made available. CQL is Cassandra's take, MongoDB and CouchDB will do something else.

      (Portable ANSI sql is tricky, but achievable, you don't have to be stuck in proprietary dialects if you pay attention to portability)

      Having said that, the NoSQL crowd is advancing the database paradigm, SQL is famously bad at certain types of data modelling (parts of parts) and web scale computing has different requirements. Postgresql for example is learning JSON tricks in order to compete.

      1. Anonymous Coward
        Anonymous Coward

        Re: Seems obvious.

        JSON and XML are both fine on a 1G LAN. Over any restricted connection (delay/bandwidth) they're, by definition, inefficient.

        1. Destroy All Monsters Silver badge
          Holmes

          Re: Seems obvious.

          JSON and XML are both fine on a 1G LAN.

          Why is this even being discussed? Either you have data in your database that fit your problem, then they have to transit, 1Gbps LAN or not. Or you don't, in which case better head for the showers. If you absolutely want, you can compress them to some binary but still inspectable format after all.

          JSON (for what purpose?) and XML are fine and in PostgreSQL and others, XML is just another column data type that you can operate on. Purposeless adherence to First Normal Form is just for people who want to suffer and who actually believe Strings are "atomic". We not only want set values, we want annotated tree values! (Why not graphs, actually?)

          1. Anonymous Coward
            Anonymous Coward

            Re: Seems obvious.

            The question is whether the entire XML/JSON chunk gets passed over the network to the client. In some NOSQL type databases, the entire chunk of XML/JSON is always passed over, leaving the client software to extract the atrribute(s) required. Thus if you have a 4k chunk of customer XML, but you only require the customer name (say 50 bytes) then there is a huge difference in network traffic - 80x in this example.

            SQL enabled databases will most likely have the 'projection' operation where only the required columns (attributes) are returned. Some of the more advanced databases will also allow you to apply a projection inside a JSON/XML structure.

    2. SimonSplat

      Re: Seems obvious.

      Well, it does depend on what you're trying to achieve, no? If you are trying to distribute data across distant DCs, or even across servers within the same DC, is not easy with "normal" SQL/RDBMS products, especially if you want the same answer everywhere at (near) the same time. It's certainly not convenient!

      If you don't strictly need the transactions (please define protection?), then why have the overhead of them? The right tool for the right job.

      1. Michael Wojcik Silver badge

        Re: Seems obvious.

        The right tool for the right job.

        Exactly. Relational DBMSes are the right tool for many jobs. Why would it be surprising that they aren't ideal for every job?

        And conversely key-value and other NoSQL1 DBMSes are useful for some jobs, where data needn't be always-consistent and in fact data integrity is not so terribly important.

        The problem is cheerleaders for either side2 who think their preferred paradigm is the correct answer to any question. That doesn't include the curmudgeonly types who are taking the occasion to do useful new work in database design, like Michael Stonebreaker.

        1This is usually where someone insists on pointing out that "NoSQL" has been backformed into "Not only SQL", and some NoSQL DBMSes also support relational databases with SQL. Utterly irrelevant to the discussion, but thanks anyway.

        2I'm sure someone here has already provided the link to "MongoDB is Web Scale", so I'm not even going to bother looking it up.

  3. Bucky 2

    This is why

    ...I'm highly suspicious of those software products that breathlessly exclaim, "Build your project with /no knowledge of the underlying database/."

    No. No thank you.

    1. SimonSplat

      Re: This is why

      Not knowing the data underneath is a recipe for disaster - no matter the approach or technology! But I hear your pain...

  4. Destroy All Monsters Silver badge
    Thumb Up

    Lesbos Transactions? No....

    lightweight transactions such as Paxos

    Kudos for mentioning PAXOS but AFAIK PAXOS is just the general case of the 2-phase commit protocol (n-phase commit protocol?), the latter hidden under the "Distributed Transactions" designator. So not particularly lightweight .

    In truth, I haven't understood PAXOS as far as I would like, Leslie Lamport's original paper was way confusing the last time I read it. I need to start again...

    Centerum Censo, people should read: A Critique of ANSI SQL Isolation Levels. Transactions - An underrated area of interesting stuff.

    1. Michael Wojcik Silver badge

      Re: Lesbos Transactions? No....

      Paxos is more like 4-phase commit with collision detection and retry. I suppose you could consider classic 2PC a reduced version of Paxos, though it discards most of the interesting features of Paxos. (With classic 2PC, the only allowable quorum is the entire set of participants and there's no provision for collision or retry.) And that's just "basic" Paxos - there's multi-Paxos (protocol is repeated on a regular basis to maintain agreement), Byzantine Paxos (some nodes may defect), etc.

      But yes, Paxos is definitely not lightweight. I don't think anyone uses Paxos as originally described by Lamport in practice; various optimizations are always employed. If memory serves there's a good Google paper on applied Paxos - might be the whitepaper on Chubby.

      (And there's no reason to write it in block caps - it's the name of an island (or of the fictional legislature of an island), not an acronym.)

  5. Charlie Clark Silver badge

    Pick any two: fast, reliable or quick

    There are good reasons NoSQL is so fast (and distributable), mainly that these engines have done away with table joins

    What have joins got to do with (implied) write speeds? Write speeds will be held back by data checking and transactional security. You can get speed by switching those off or by using a queue.

    Joins are only relevant in queries and are related to the projections that the relational model gives you, which has nothing to do with SQL. If data is properly normalised, and your DB has a good query optimiser then the flexibility imposes a minimal cost. There should be very few situations where a projection is slow. In exceptional cases you can denormalise for reporting purposes.

    NoSQL is there for inflexible data where projections will never be required. This is the exception.

    In summary: use Postgres and with BSON you can have your cake and eat it.

    1. Anonymous Coward
      Anonymous Coward

      Re: Pick any two: fast, reliable or quick

      Pity you don't seem to know what you're talking about.

      Some pointers :

      1. You're only referencing the trivial situation where an update is to a single row identified by a unique key. Any other update will need to be preceeded by a query to identify the row that needs to be updated. Also many situations will require multiple rows to be updated - again requiring retrieval via an implicit query. Typical sitations would include rows accessed via a foreign key (e.g. parent-child) or by date restriction. Even where an update is a write by unique primary key, most databases (whatever their type) will execute an implicit fetch of the data before attempting a write. In fact, even an insert will result in an implicit fetch at some deep level to ascertain whether that row already exists.

      2. Switch off transactional integrity? Of course, in Twitter or facebook, who cares if a few posts or tweets are missed? However, for most commercial situations involving money, orders or contracts etc, turning off transactional integrity and the resulting data loss is not an option.

      3. Using a queue makes no difference - you are simply switching the transactional integrity to the queue rather than to the database. In fact, you're making it worse by using a queue as the data has to be pushed through two data storage paradigms thus doubling the work.

      4. A 'projection' in relational terms is a subset of attributes of a relation. A relational join is something entirely different. The implementation of a projection in SQL is just the ability to list the columns to be returned (rather than all columns in the table).

      5. I think you mean "NoSQL is there for inflexible data where joins will never be required". The reality is that for any reasonablly mature application, then they will be required. In the simple example of customer orders for products, then the simple pair of requirements of wanting to list all orders for a customer and all orders for a product are hard to achieve in a NoSQL database (without sacrificing performance or duplicating data or resorting to secondary indexes).

      1. Michael Wojcik Silver badge

        Re: Pick any two: fast, reliable or quick

        I think you mean "NoSQL is there for inflexible data where joins will never be required". The reality is that for any reasonablly mature application, then they will be required.

        This simply is not true, unless you have a very special meaning of "reasonablly [sic] mature application". There are many, many applications which are not dealing with data that needs to be always consistent or reliable. Logging telemetry information from lots of sources (whether those are simple sensors or something much more complicated), for example, can often tolerate quite a lot of data loss and errant points.

        And there are many, many "mature" business applications which use non-relational databases that don't support joins. There are a huge number of hierarchical IMS databases still in use, for example. And we still see occasional queries from people running more-obscure old databases, such as Model 200 (basically a very early key-value database plus a bunch of other stuff).

        In short, you don't seem to know what you're talking about, either. But at least Charlie posted under his own name, rather than as AC.

    2. SimonSplat

      Re: Pick any two: fast, reliable or quick

      Ok I choose reliable and fast (isn't fast the same as quick?)

      1. matchbx
        Facepalm

        Re: Pick any two: fast, reliable or quick

        Sounds like what I've been told for years, there are three sides to any project.....

        Time, Money and Quality.

        Pick any two....

  6. smartypants

    All you need to know about databases

    http://howfuckedismydatabase.com/

    (A bit out-of-date but always makes me chuckle)

  7. Camilla Smythe

    Databases...

    Fucking Kludge More Like. Now kindly hadaway and shite with your extra layers of shite..

    Oh... FTFY

  8. a_yank_lurker

    Know when use 'em

    The problem is not the database technology but knowing when one type is appropriate and it is not. Document NoSQL databases are best when the data is somewhat similar between each document. Relational are best when the data can be specified very well beforehand and are particularly valuable when ACID compliance is not optional.

  9. david 12 Silver badge

    Back to the future...

    So, just like SQL in the early '90s ?

  10. Mark 65

    Training

    Relying on tutorials on the web is dangerous, as these often go out of date very quickly and often don't state the software version they refer to.

    I'd wager paid for courses have just as much trouble keeping up to date.

    1. Anonymous Coward
      Anonymous Coward

      Re: Training

      But with a paid course, there's a chance you could try "give me back my money you thieving..." if it's not up to scratch.

  11. mtnz

    Interesting, but

    NoSQL no doubt has it's uses - horses for courses. But I'm not sure this article uses correct examples to expound its use in appropriate scenarios.

    "using our user name example, both users pick the name they want. It appears to succeed, but actually only one user gets the name they want and the other is left with a puzzle – the system said they got their user name but now they can't log on."

    - why use a NoSQL database for this functionality? It doesn't sound appropriate.

    "Take for instance "Time To Live" in Cassandra, which makes your data disappear after a certain amount of time. This can give the developer the ability to add a feature to an application that would be painful in a traditional relational database if, say, a user uploaded photos that disappear after a certain amount of time."

    - easy in a relational DB. Use a scheduled job that looks at the creation date of the BLOB.

    "Then there are Maps, Lists, and Sets as fields in the table. These let the developer store many attributes under one field, for example users could have multiple email addresses in a field storing their user data. If you want, these can map on to your application structures, making persisting state easy."

    - it sounds like you are using the DB as a bit box over a generic data model which is not best practice.

    I'm not against NoSQL, but an article where it is being used to it's best advantage would be more interesting. Then tell us how it differs from doing the same in a relational DB / SQL.

    1. BigAndos

      Re: Interesting, but

      I second your article suggestion, this would be really interesting. In my previous job at a consulting firm I went on a two day MongoDB course as they were trying to build a "big data" capability without really knowing why. I found the course interesting but at the end I came away from it not that sure why I would want to use it.

      The main advantage I could see was the ease of scaling horizontally with sharding as doing this type of thing is very, very difficult with a conventional RDBMS. However, I've also never encountered a situation where "eventual consistency" was deemed acceptable.

      Also, with a well designed and well indexed SQL database you can handle "moderately big" data fairly well. In my current job I managed a data warehouse built using £15k of commodity wintel hardware, SQL Server 2008R2 and fast SAN storage, and we can run queries across hundreds of millions of rows of data and joining numerous tables in seconds.

      What would be fascinating is to see how NoSQL could handle "properly big" data and how people get around the consistency issues. For example, I would love to import the full event stream from our Adobe SiteCatalyst instance and query it - is this the kind of thing I can turn to NoSQL for?

      1. Michael Wojcik Silver badge

        Re: Interesting, but

        I've also never encountered a situation where "eventual consistency" was deemed acceptable

        Never?

        "Have you seen the Reg comment from BigAndos?"

        "No, and it's not showing up for me."

        "Check again."

        "OK, there it is."

        There are any number of real-world situations where eventually-consistent is acceptable.

        There are even some in business situations where, at first glance, always-consistent would be necessary. Consider an online retailer like Amazon, and their inventory system. You could make many inventory updates eventually-consistent. Sometimes someone goes to buy something and is told it's out of stock; well, you've lost or postponed a sale, but that's not the end of the world. Sometimes someone buys something that in fact you've already sold to someone else: you email them a "Sorry!" note and refund their payment (or don't deduct it in the first place until the item's ready for shipping). The vast majority of consumers will tolerate that sort of thing, as long as it doesn't happen too often and the rest of the shopping experience is convenient.

        Consider an example I mentioned in another post, where you're gathering telemetry data from many sources for later analysis. This can be very much eventually-consistent - you're not planning to look at the data at all until well after the consistency window closes.

        Or say your'e the NSA, scraping text from online conversations and emails and whatnot...

        That certainly doesn't mean that anything can be eventually-consistent; most of the things we have traditionally wanted to be always-consistent should stay always-consistent (and use a DBMS that provides that), or you just push nasty problems into the application. But there certainly are many situations where eventually-consistent or even inconsistent data works just fine.

        That includes a lot of "big data" applications, where the value is in the aggregate data, not so much individual pieces, and you expect it to have a significant noise rate anyway.

      2. Loud Speaker

        Re: Interesting, but

        (I am guessing here) Its ideal for situations where you want your data thoroughly "Mongo'd" - Of course its true you could use SQL Server, but then I would have to kill you!

  12. TeeCee Gold badge

    A couple of things.

    ... these engines have done away with table joins.

    Sounds like the answer here is to resurrect the toolchains and associated languages designed to work with large data sets in such structures[1]. RPGII/COBOL and SORT off the old System/36 spring immediately to mind and I'm sure there are others.

    Secondly, just about everything said about SQL structures and sequences and their importance apply equally as well to SQL databases. The query optimiser is not a worker of miracles and it's amazing how much performance improvement you can get by pointing someone who knows what they're doing at some abortion slung together by those who think that a query optimiser is both psychic and infinite in its wisdom.

    [1] There is nothing new here, just Old Skool 2.0.

    1. Michael Wojcik Silver badge

      Re: A couple of things.

      There is nothing new here, just Old Skool 2.0.

      Ever the cry of those who can't be bothered to learn about what they're commenting on.

      Yes, there have been non-relational databases for a long time - since before relational databases, of course. There are hierarchical databases and network databases and object databases. There are even key-value databases that long predate the "NoSQL" moniker, such as Model 204 (which I incorrectly called "Model 200" in another post; memory isn't what I remember it being).

      That doesn't mean that nothing has changed with non-relational databases since the 1980s. Much of what's in NoSQL DBMSes - particularly the high-level concepts - may be reinvention, but there has also been considerable innovation.

  13. Robert Grant

    Spark and Hadoop are not in any way equivalent

    Spark is not about table joins across NoSQL DBs, or anything remotely like that.

    1. Michael Wojcik Silver badge

      Re: Spark and Hadoop are not in any way equivalent

      Not the Spark core, but Spark SQL is in part about providing structured data and table joins on top of Spark. Of course, that's still not really "table joins across NoSQL DBs", except in a pretty vague sense.

  14. Captain Server Pants

    NoSQL is just a cache, it's not a substitute for an RDBMS

    Assigning user names is precisely a situation which calls for acid compliance. Why would anyone give that up to write their own fudge sandwich? Now you have 2 problems...

    No joins, no underlying normalized data tables, no acid compliance, no SQL languange, etc... Yes, it's possible to build something without these features but GOOD LUCK making changes. You're gong to need it. Remember, Paul Bunyan died the next day and the steam engine showed up just fine..

    1. Tom 7

      Re: NoSQL is just a cache, it's not a substitute for an RDBMS

      Horses for courses - there is some mileage to be had from NoSQL but in the end you have to suck the data out of it into a 'proper' database if you want to do SQL.

      My main complaint is that once you rely on the NoSQL its tends to get in the way of actually organising your data properly. And I mean really get in the way.

    2. Chris Leeson

      Nitpick...

      Paul Bunyan? Don't you mean John Henry?

      1. Captain Server Pants

        Re: Nitpick...

        Ha! Yea, you're correct. I had to consult Wikipedia. My child folklore is out-of-date.

  15. Henry Wertz 1 Gold badge

    Old School, and Ilya

    "There is nothing new here, just Old Skool 2.0.

    Ever the cry of those who can't be bothered to learn about what they're commenting on."

    I don't know if that was the point, I think the point is that it may be worth finding historical articles on non-relational database techniques in addition to reading brand-new tutorials. Some won't apply, but some will; those who don't know their history are doomed to repeat it. I've seen this before, where some new software or programming technique or language comes out that has attributes closely resembling something that "went out of style" 15 or 20 years ago. Then, all sorts of optimizations, tips and techniques are "discovered" that actually were in regular use 15-20 years ago (but not used more recently because they'd be incompatible or ineffective with what was used in the interim.) In this case, dig up the old knowledge on what techniques helped with pre-SQL database systems, to see if there's anything applicable to NoSQL.

    Ilya, with all due respect, I don't know if SQL is even in much trouble. I must agree, even though convential databases are used for it, SQL is pretty unsuitable for lexical analysis, categorization, natural language search, and so on. But, loads of databases are NOT used for this purpose; for example, I can't see someone replacing a database with account #, address, what services they're signed up for, and payment records, with some kind of lexically analyzed searchable system, because they simply are not performing those kind of searches.

    deadlockvictim, I know you're quoting lexiclone's old site. Don't know what to say about this, I'm not dubious about Ilya's lexical analysis software, but I am dubious that Ilya has come up with a "One Law of Nature" or done anything with the Bible (I'm sure he analyzed it... but it's such a large body of text, with people willing to accept so little sensible information as "proof" that their technique worked... like a few words, possibly misspelled... that almost any numerological or text analysis technique "works" by that criteria.)

    1. I.Geller Bronze badge

      Re: Old School, and Ilya

      The differences

      1. I use Grammatical rules and no n-grams

      2. I obtain internal statistics

      3. I filter queries through personal profiles

    2. I.Geller Bronze badge

      Re: Old School, and Ilya

      Oracle structures data

      1. Oracle obtains statistics on queries and data from the data itself, internally.

      3. Oracle gets 100% patterns from data.

      4. Oracle uses synonyms searching.

      5. Oracle indexes data by common dictionary.

      6. Oracle killed SQL.

  16. Adam Inistrator

    The Pick Operating System

    "No joins in NOSQL" ... Come back PICK OS ... all is forgiven! ... 40 years ahead of its time.

POST COMMENT House rules

Not a member of The Register? Create a new account here.

  • Enter your comment

  • Add an icon

Anonymous cowards cannot choose their icon

Other stories you might like