AI query optimization in IBM's Db2 shows you can teach a tech dinosaur new tricks

IBM likes to be secretive about Db2, the 41-year-old database still used by some of the world's largest banks and retailers, but occasionally something leaks out. Db2 was first launched in 1983 and is seen as IBM's breakthrough relational database, a relatively new concept at the time. Big Blue had earlier efforts, as did …

  1. Doctor Syntax Silver badge

    "infused with GenAI"

    Do they expect to get into hot water with it?

    1. Fruit and Nutcase Silver badge
      Coat

      Infusion

      No idea if the Db2 division has had any culls of "Dinobabies" - Boomers/GenX

      They no doubt will be infusing the workforce with Zoomers/Gen Z

      1. teknopaul

        Re: Infusion

        AI-based query optimisation will guarantee randomness in response times, more than the inability to choose the indexes for your query ever could.

        It's 2024 and DBs are still a dark art. IMHO, DBAs love this.

        DBAs "get paid" by explaining nothing and sharing no tricks.

        What's interesting is they get sacked for the same reason.

        Stack Overflow for DBAs has very little info, for a reason.

        1. lfinnie

          Re: Infusion

          The Db2 AI query optimizer will produce consistent results as long as a new model has not been trained since the last time the query was compiled - there is no concept of randomness baked into the models, unlike today's foundation models/LLMs that have a 'temperature' setting to intentionally introduce randomness.

          In the initial release, the AI query optimizer is used to estimate cardinalities by taking advantage of learned correlations in the table data. This is exactly what machine learning is very good at - discovering hidden correlations in data.

          If a new model is trained through RUNSTATS, then the first time the Db2 AI query optimizer uses the new model, it's possible that the generated plan may differ from the one chosen when the SQL statement was compiled/optimized with an older model. This is not unlike the traditional (non-AI) optimizer though - if you issue a new RUNSTATS, new table statistics are collected, and the new statistics could lead the traditional optimizer to select a different access plan as well.
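
          A rough illustration of that point, using SQLite's ANALYZE purely because it ships with Python's standard library; Db2's RUNSTATS plays the equivalent role, and whether the plan actually changes depends entirely on the data:

            # Generic sketch (not Db2): refreshing optimizer statistics can change
            # which access plan the planner picks. SQLite's ANALYZE stands in for
            # RUNSTATS here; table and column names are made up.
            import sqlite3

            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, status TEXT)")
            conn.execute("CREATE INDEX idx_region ON orders (region)")
            conn.execute("CREATE INDEX idx_status ON orders (status)")

            # Skewed data: 'region' barely filters anything, 'status' is very selective.
            rows = [(i, "EMEA", "OPEN" if i % 1000 == 0 else "CLOSED") for i in range(100_000)]
            conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

            query = "SELECT COUNT(*) FROM orders WHERE region = 'EMEA' AND status = 'OPEN'"

            def show_plan(label):
                print(label)
                for row in conn.execute("EXPLAIN QUERY PLAN " + query):
                    print("  ", row)

            show_plan("plan before statistics are collected:")
            conn.execute("ANALYZE")  # the stand-in for a fresh RUNSTATS
            show_plan("plan after refreshing statistics:")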

  2. that one in the corner Silver badge

    I query AI query optimisation

    Wild guess: Db2[1] already has an optimising query planner for when it turns your SQL (sorry, "Big SQL") into actual executable operations on the database. So if you are in the habit of just writing inefficient / inelegant SQL that can be logically / mathematically reduced and refactored for better performance (much the same way an optimising compiler performs sensible transforms on your code) then it is already doing that.

    > "allow Db2 to continuously learn from customer's queries

    So to optimise your queries the AI is going to - what? Decide that you didn't *really* mean to write that bit of SQL, surely you would be happier running this instead; after all, it runs so much faster and generates a much smaller set of results, so much easier for you to read!

    > "infused with GenAI"

    Even better, we've optimised away actually running any of your SQL, now we just *say* we ran it and use GenAI to create a table that looks like it could be right, sort of.

    [1] 1983? Really? Gosh, that means that old job was pretty much an Early Adopter! And I never did get around to finishing that wall of binders full of documentation.

    1. david 12 Silver badge

      Re: I query AI query optimisation

      allow Db2 to continuously learn from customer's queries

      I'm not a Db2 DBA, but in my experience query optimization is not automatically updated, and sometimes gets wildly out of whack.

      Also (again, not a Db2 DBA), in my experience query optimization is painfully brittle, and sometimes the query needs extensive annotation to try to coerce it into using the obvious query plan that runs in 1/10th the time. In those cases, post-hoc query re-optimization would be an obvious good thing.

      1. fg_swe Silver badge

        "Obvious"

        To your wet computer of 100E9 Neurons and 100E13 Synapses there exists an "obvious" query plan. But for the dumb CPU there might be Billions of alternatives and each of them must be "rated"/"estimated" and then the best one chosen. In reality, the dumb CPU will use some sort of heuristic to boil the billions down to 10000 different plans. Works often, but not always, well.

        The premise is that the AI is essentially a novel heuristic for finding a good query plan. Any experience from the real world?
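
        To put rough numbers on the "billions of alternatives" point, a toy sketch in plain Python (the table names, row counts and 10% join selectivity are all invented) of how fast the join-order space grows, and what a crude "smallest tables first" heuristic looks at instead:

          # Toy illustration of why exhaustive plan search explodes and why planners
          # fall back to heuristics. Nothing Db2-specific; all numbers invented.
          from math import factorial
          from itertools import permutations

          tables = {"orders": 1_000_000, "customers": 50_000, "regions": 50, "products": 10_000}

          print("join orders for", len(tables), "tables:", factorial(len(tables)))   # 24
          print("join orders for 12 tables:", factorial(12))                         # ~479 million

          # Crude cost model: cost of an order = sum of intermediate result sizes,
          # pretending every join keeps 10% of the cross product.
          def cost(order):
              total, inter = 0, tables[order[0]]
              for t in order[1:]:
                  inter = inter * tables[t] * 0.1
                  total += inter
              return total

          best = min(permutations(tables), key=cost)      # exhaustive; only feasible when tiny
          greedy = tuple(sorted(tables, key=tables.get))  # heuristic: join smallest tables first
          print("exhaustive best:", best, "cost %.3g" % cost(best))
          print("greedy order:  ", greedy, "cost %.3g" % cost(greedy))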

  3. rndSheeple

    AI optimizing SQL err well

    I haven't worked with Db2 for a while; however, on Power architecture I would say it's at least competitive for "power" and efficiency.

    Like an earlier comment kind of stated, if you optimize your queries, which one should for goodness' sake, then I do not see how an AI can make "organization specific" queries more efficient.

    Even if we look at large software, say anything off the shelf: first of all, changing queries in commercial software is probably not supported. Maybe, just maybe, you could get more efficient reporting and other "outside the box" queries against something like Dynamics, at least on-prem. And other bloatware. But this could probably be done by any optimization.

    So adding AI, well, I guess anything is possible, however just tuning queries is highly unlikely to bring some kind of magical power. Also, if the AI is on-prem (or on-system for hybrid and cloud), since almost all business data is not for public scrutiny by the DB provider, I do not believe that the quality of any AI in 2024 will be good enough to let it make changes to queries in my area of responsibility. If AIs with an online connection keep messing up all the time, how would a non-online one be magically better?

    Imagine this: you buy something like Remedy. Then some queries totally fail. Then you analyze them and report to the software vendor and they issue an emergency patch.

    Imagine your AI tuning the queries towards any DB. No comments, no documentation, just "AI knows it". Imagine trying to find what the actual problem is when something goes wrong.

    Versioning? Timelines? When did the AI decide to run the query like that? Why? Yes, it was faster, but it fucked up some data...

    Personally, as service owner for various database services and environments, I would not like to be the one trying to explain this to my stakeholders.

    So until it is "proven", AI needs to be an option, and NOT priced into the software on the grounds that "it will automagically make everything better". I will just recommend that the stakeholders stay on the old version, or migrate to another database without AI, or at a minimum without the cost of AI.

    1. fg_swe Silver badge

      Re: AI optimizing SQL err well

      You can still run EXPLAIN to see the generated plan, I guess.

      The AI will not be fully freewheeling...

      1. lfinnie

        Re: AI optimizing SQL err well

        Yes, EXPLAIN will still show the plans. You can also selectively turn on/off the AI/ML based part of query optimization, so you can easily see what impact it has on plan selection (e.g. try before-and-after experiments). Initially, a model is trained based on the RUNSTATS sample to learn correlations in the data. This model can then be used during query compilation to estimate the selectivity/cardinality of multiple local table predicates, taking into account any learned correlations in the data. This often leads to better cardinality estimates (that's really where ML techniques shine: finding hidden correlations in data, and correlations between column predicates can have a big impact on estimated plan costs), which helps the optimizer select better plans, which can improve query runtime.
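
        For anyone wondering why correlations matter so much here, a minimal sketch of the arithmetic (made-up data, nothing to do with Db2's actual models): the classic statistics-only approach multiplies individual selectivities, which silently assumes the columns are independent.

          # Why correlated columns break the "multiply the selectivities" rule.
          # Made-up sample standing in for the kind of data a stats sample would see.
          import random
          random.seed(0)

          # 'city' and 'country' are perfectly correlated in this toy data set.
          cities = [("Toronto", "CA"), ("Berlin", "DE"), ("Lyon", "FR"), ("Oslo", "NO")]
          rows = [random.choice(cities) for _ in range(100_000)]

          sel_city = sum(1 for c, _ in rows if c == "Toronto") / len(rows)
          sel_country = sum(1 for _, k in rows if k == "CA") / len(rows)
          actual = sum(1 for c, k in rows if c == "Toronto" and k == "CA") / len(rows)

          print("independence assumption:", round(sel_city * sel_country, 4))  # ~0.0625
          print("actual joint selectivity:", round(actual, 4))                 # ~0.25
          # A ~4x cardinality error before it even reaches a join. A model that has
          # learned the city -> country correlation estimates ~0.25 instead.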

  4. fg_swe Silver badge

    Any True DB/2 Experts Around ?

    So far I can only read generic nonsense in this forum.

    What I can say:

    1.) Query Plan Optimization is a Hard Problem. MySQL fails badly in this aspect for complex queries.

    2.) For many queries it would be too runtime-expensive to evaluate all possible solutions; heuristics are used.

    3.) Maybe AI can add "novel heuristics".

    4.) Any query plan, runtime-efficient or not, will produce the same result. AI is not messing with results.

    5.) Adding further indices to a DB schema will quickly be a double-edged thing, as index maintenance will also consume serious runtime.

    1. that one in the corner Silver badge

      Re: Any True DB/2 Experts Around ?

      > 1.) Query Plan Optimization is a Hard Problem.

      Very true. Optimisations are, on the whole, hard problems: just ask any Travelling Salesman.

      Which is a damn good reason for being careful about what methods you apply to solve them.

      > 2.) For many queries it would be too runtime-expensive to evaluate all possible solutions; heuristics are used.

      Yup, such as (in common with other large search space problems) setting up fascinating search structures and pruning them, much (most? all?) of which can be traced back to the AI labs. You know, there was a time when any discussion of heuristics meant you were in a discussion of the AI scene[1][2] - what a shame those two letters are being relegated to just the one thing (yeah, yeah, fighting a losing battle here). Heuristics seen as part of the day-to-day? As noted before "if it works, it isn't AI anymore" - anyway, as you note, heuristics are already used within plan optimisers (just as they are within other code optimisers).

      So from that p.o.v. "applying AI (techniques) to the plan optimisation problem" is - in computer terms - a very old strategy; hardly worth mentioning any more, in fact.

      > 3.) Maybe AI can add "novel heuristics".

      Looking for novel heuristics is a Good Thing - and doing that, plus novel ways of finding novel heuristics - is a good AI topic.

      Just - is it the topic that the LLMs[3] are really suited for? Two immediate concerns arise: first, the resource costs[4] to run these beasts.

      Second, the (by now) old issue of LLMs happily generating good looking nonsense: that is certainly novel, it also happens to be total twaddle - and applying it within the plan optimiser is rather worse (IMO) than simply (as I half-seriously described it above) applying it to generate junk SQL, from the p.o.v. of your poor beleaguered human DBAs and devs spotting what is going on. Would you really want to trust a heuristic so novel that it was only created in the last few seconds, hasn't been tested out on anything at all yet, may not even ever be run on another query (the "AI" deciding to do something different next time)? Remember, this isn't an AI being run in a lab as part of a concerted research effort to find new heuristics and prove their worth (and limitations), it is something running around doing random things within *your* expensive database.

      > 4.) Any query plan, runtime-efficient or not, will produce the same result. AI is not messing with results.

      Presumably you mean, any query plan generated from the same input SQL? In which case, *ONLY* if you can demonstrate that the transforms performed are valid in the way they are being performed - and if you are applying some random output from an LLM as one of your heuristics, that is not going to be the case (see above). It may be a sane thing to do, it may not. Prove it. Remember that heuristics include ideas such as "we can prune this from the search because we've seen X" - if the LLM has decided to generate the novel idea that "we can prune because we've seen Y" instead, and it turns out the X case came from a boring, long, maths-heavy proof back in the 1990s and the Y case just sorta looks like it is following the same pattern...

      > 5.) Adding further indices to a DB schema will quickly be a double-edged thing, as index maintenance will also consume serious runtime.

      True; picking your indices is almost an art form :-) Not quite sure how that fits into this discussion; are you suggesting that the AI is likely to just keep on adding new indices, because that is a common pattern it has picked up from its training data? Which would be a bit of a reversal, as your previous points seem to be more pro-AI. Dunno; clarification required.

      [1] when the idea of applying any "rule of thumb" instead of being rigorous & provable & correct in all your outputs was a bit avant garde - but then, as now, AI trials had a bit of an issue with generating huge (for the time) data and huge searches.

      [2] rats - tried G**gling for some references to bolster my memory but half-remembered terms got auto-corrected to hit sponsored links for - nothing I wanted :-(

      [3] yes, I know they weren't specifically stated to be LLMs - but (1) the mention of "GenAI" is a pretty big hint and (2) what else is being touted around at the moment? If it really was something else then we'd have been buried under a pile of IBMese marketing speak stressing how much it wasn't just the same as everyone else's

      [4] and I'd bet there is someone in IBM following the Sunk Costs argument of "we've built the damn thing, now we have to use it absolutely everywhere"

      1. fg_swe Silver badge

        Re: Any True DB/2 Experts Around ?

        Again, I strongly assume IBM only allows the ANN AI to perform "safe" transformations of the original Query Plan. Worst outcome is bad runtime, but still correct results.

        I am also quite sure all the transformations done by "conventional heuristics" are proven to be safe.

        After all DB/2 and Oracle run billions of bank accounts !

        1. lfinnie

          Re: Any True DB/2 Experts Around ?

          Initially, the AI Optimizer does not independently apply any transformations (e.g. it does not rewrite your SQL query). Internally, it is used for cardinality estimates. Bad cardinality estimates can mean bad plans get selected by the optimizer. ANNs are extremely good at finding hidden correlations in data, and so based on multiple local predicates on a table, an ANN can likely generate a much better cardinality estimate for the combined predicates (e.g. how many rows survive the local table predicates and flow into the next operator/join/etc.). Improving cardinality estimates using an ANN can thus help the Db2 Optimizer generate better plans, improving query runtime.

  5. fg_swe Silver badge

    Regarding "Dinosaur"

    I fully disagree with this notion. Advanced SQL databases are some of the most advanced systems in existence. As complex as a modern operating system. Critical for almost ANY business operation, from accounting to inventory to sales statistics to analytical processing, etc.

    Relational databases continue to lead the field of databases due to solid theory, good performance, good query plan optimizers and so on.

    The "new" key-value databases are merely useful for building massive collections of cat picture sharing systems for teenagers.

    1. Excellentsword (Written by Reg staff)

      Re: Regarding "Dinosaur"

      It is old, though

      1. fg_swe Silver badge

        Well

        SQL Database Servers are "as old" as Unix kernels such as Linux, Mac OS X and, later, iOS.

        I am quite sure there are serious teams at IBM, Oracle, MSFT developing incremental improvements of their SQL engines. I would not be surprised to learn that serious percentages of code are not older than five years.

    2. that one in the corner Silver badge

      Re: Regarding "Dinosaur"

      You do know that the dinosaurs were *incredibly* successful and totally dominated the Earth for, what, 160 plus *MILLION* years?

      And that during that time, individual species and offshoots came and went, whilst the (literally) time-tested core structure of the beasties continued to thunder across the landscape.

      Although the big and slow were killed off - and only after a truly devastating blow that destroyed all of the top-level creatures - dinosaurs are still all around us, flitting through the skies and waking us up far too early in the mornings.

      SQL ain't going anywhere - there is more SQL flying around now than there ever has been in the past, flitting through our 'phones and waking us up far too early in the mornings.

      Any tech can only *hope* to be so dinosaur (idea for a new tee-shirt: "Be More Dinosaur")

  6. fg_swe Silver badge

    Query Plan Optimization Details

    https://www.tridex.org/wp-content/uploads/Db2-Query-Optimization-101.pdf

    1. fg_swe Silver badge

      Re: Query Plan Optimization Details

      Now, if these flimsy web 2.0/key-value/graph "databases" contain 1/1000th of DB/2 query optimizer technology I would be surprised.

      Again, relational databases are modern day wonders. They continue to be improved by scientists and engineers, as they are still far from perfect. Most users, including software engineers, never really cared about this technology. They just assume "it works" and only complain when the optimizer fails on an especially convoluted SQL query.

      SQL databases are on par with the latest jet engines, 3nm semiconductors, metal 3D printing, gas chromatography!

      1. that one in the corner Silver badge

        Re: Query Plan Optimization Details

        > SQL databases are on par with the latest jet engines, 3nm semiconductors, metal 3D printing, gas chromatography!

        Absolutely.

        So should be treated with respect, not have random flavour-of-the-month "GenAI" anywhere near them.

        > They continue to be improved by scientists and engineers

        Who carefully examine and test and mathematically prove what is going on inside. Including the techniques they've acquired from the AI research going back to the 50's (as they have techniques they are using from any other field of study).

        When the current crop of LLM-based systems reach the same level of provability (INCLUDING, but not limited to, self-explanatory capabilities), *then* they should be applied to SQL.

        But by that point there won't be anything "exciting" for the IBM marketing team to use in their press releases, so The Register will just have to report "Db2 version 99 & 3/4 is now 7% faster, again, just like it was last year, snore".

        1. lfinnie

          Re: Query Plan Optimization Details

          To try to clear up some confusion - Db2's AI query optimizer is not an LLM. The models are used for a very specific task initially - cardinality estimation based on predicates across multiple columns. AI/ML is extremely good at finding hidden correlations in data, so it's a natural extension for an SQL optimizer to start using those techniques instead of standard statistics-based approaches to estimate cardinality. Accurate cardinality estimates mean the Db2 cost-based optimizer can choose better access plans, which can reduce query runtime.

  7. Anonymous Coward
    Anonymous Coward

    But is the optimiser still going to give you the answer you asked for, even if it uses different steps to get to it?

    And, if so, perhaps it's more of a sign that your tables are badly normalised to begin with...

    I realise changing out architecture is rarely simple by the time 200 processes attach to your core system (which is probably the sort of thing you have that still needs db2) though it should serve to highlight the performance benefits of a reorg.

    In practice most IT outfits I see now just lift and shift stuff repeatedly onto the "new platform of marketing Bull", while never actually going back and questioning whether the existing process is correct, or where improvements can be made. Lot of money for a lot of nothing.

    1. fg_swe Silver badge

      1.) My understanding is that the query plan mangling will never change end result, but only runtime/resource consumption. AI can only be used for heuristics, similar to existing heuristics in traditional optimizers.

      2.) I have seen large enterprises where they run convoluted queries and still wait for hours in queue to get results. The business questions are similar to "how many millions did we lose in winter, in the US, because a component was not that great ?". There exist corporations who really need large scale relational databases to better understand their core business. Much more than just a funny CMS.

  8. Steve Channell
    Facepalm

    DB2 has included "AI" since 1983!

    IBM released DB2 in 1983, four years after Oracle, because it was working on an advanced artificial intelligence called the "cost based optimizer". CBO might not be as sexy as AI, but arguably that is what it was.

    1. fg_swe Silver badge

      Elaborate

      According to your terminology, an optimizing compiler also includes "AI" ?

      1. that one in the corner Silver badge

        Re: Elaborate

        > According to your terminology, an optimizing compiler also includes "AI" ?

        Yes (see my other comments), but "If it works, it isn't AI" in the common man's eye.

      2. Steve Channell
        Pint

        Re: Elaborate

        Cost Based Optimization takes a parse tree in a language we like to call "structured" query language - an LLM does the same (the prompt is parsed into a language tree).

        It generates permutations of query plans, with a stop check that abandons a candidate once its estimated cost exceeds that of the cheapest plan generated so far.

        Cost is estimated using heuristics (generated from hard simulation of the cost of access paths), and statistics of the cardinality of data and where the predicates fall within them.

        Oracle (prior to v6) used rule-based optimization that favored unique indexes, then non-unique indexes, then the order of tables in the query - not particularly intelligent.

        Granted, CBO is more like the amygdala than higher cerebral functions, but so is current AI.

        Alas, we live in an age where "an algorithm" that looks at recent price history to make buy/sell decisions is now branded as "an AI" - if you can call a price algo AI, or a "people who bought toys also bought batteries" shopping agent AI, then you certainly can call CBO AI.

        IBM spent years training CBO to be reliable... but (as with Oracle RBO) early punters were happy to go with "it's like magic".
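
        A toy version of that stop check, for the curious (the plan representation, table sizes and 10% selectivity cost model are all invented, nothing like the real CBO internals):

          # Toy branch-and-bound over join orders: abandon a partial plan as soon as
          # its running cost exceeds the cheapest complete plan found so far.
          tables = {"orders": 1_000_000, "customers": 50_000, "regions": 50, "products": 10_000}

          best_cost, best_plan, pruned = float("inf"), None, 0

          def search(remaining, plan, inter, cost):
              global best_cost, best_plan, pruned
              if cost >= best_cost:        # the stop check: cannot beat the cheapest so far
                  pruned += 1
                  return
              if not remaining:
                  best_cost, best_plan = cost, plan
                  return
              for t in remaining:
                  new_inter = tables[t] if not plan else inter * tables[t] * 0.1
                  search(remaining - {t}, plan + (t,), new_inter, cost + new_inter)

          search(frozenset(tables), (), 0, 0)
          print("best plan:", best_plan, "cost %.3g" % best_cost, "| branches pruned:", pruned)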

        1. lfinnie

          Re: Elaborate

          To take this a step further... a CBO traditionally uses statistics to estimate things like cardinality (number of rows flowing into or out of an operator, such as a table scan after applying local predicates from the SQL query, or rows flowing out of a JOIN operation between two tables). Even with good statistics, estimating cardinality is very hard - e.g. if you have multiple predicates, how can you determine how many rows will be filtered out by those predicates? What if the affected columns are heavily correlated? Partially correlated? Not at all correlated?

          AI/ML techniques are very good at finding hidden correlations in data. Initially, the AI query optimizer is used by the CBO to improve cardinality estimation - using AI/ML for what it's really good at, and tying that into the decades of research and improvements in the Db2 CBO. When there are multiple predicates in your query, the model uses the discovered correlations in the data to provide better estimates for the joint predicate cardinality. Improved cardinality means the CBO can do a better job evaluating different potential access plans, and can more accurately choose a better access plan, which can result in faster query runtime execution.

          Hope this is helpful to de-mystify the 'magic' a little bit.
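
          And a back-of-envelope picture of why the estimate matters: with textbook-style join cost formulas (the constants below are invented, not Db2's cost model), an optimizer that believes two correlated predicates are independent can pick the wrong join method.

            # How a cardinality estimate drives the join-method choice.
            # Textbook-style cost formulas with invented constants.
            index_probe_cost = 3     # pages per index lookup into the inner table (assumed)
            scan_cost = 20_000       # pages to scan/hash the whole inner table (assumed)

            def nested_loop_cost(outer_rows):   # one index probe per outer row
                return outer_rows * index_probe_cost

            def hash_join_cost(outer_rows):     # build/scan dominates, roughly flat
                return scan_cost + outer_rows

            def pick(outer_rows):
                nl, hj = nested_loop_cost(outer_rows), hash_join_cost(outer_rows)
                return ("nested-loop" if nl < hj else "hash join", nl, hj)

            # Independence assumption on correlated predicates: estimate 500 outer rows.
            print("estimated    500 outer rows ->", pick(500))
            # Reality: 50,000 rows survive the filters.
            print("actual    50,000 outer rows ->", pick(50_000))
            # The nested-loop plan chosen for the estimate costs more than twice the hash
            # join at the true cardinality - the kind of miss better estimates avoid.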

    2. Michael Wojcik Silver badge

      Re: DB2 has included "AI" since 1983!

      CBOs are machine-learning algorithms, inasmuch as they rank candidate query plans based on statistics collected from the DBMS. I don't see much justification for referring to them as "AI", even as that term was used in the era when CBOs were first being developed and deployed.

      Of course, by the same token, while I can see there might be some advantage in using stacked convolutional and/or recurrent ANNs in evaluating candidate plans, and even possibly using DL stacks for plan generation (though such plans would have to be formally verified, which is not a trivial problem), none of that seems like "AI" even in any of the loose senses in which that term has been used. So, frankly, this seems like pure marketing guff — calling anything that uses ANNs "AI". Query optimization, in any sort of reliable fashion, is even less "AI" than text generation with transformers or image generation with diffusion.

      1. fg_swe Silver badge

        Re: DB2 has included "AI" since 1983!

        1.) If the AI is only allowed to perform "safe" transformations of the query plan, there is limited worry regarding AI "correctness". No sane software engineer would grant an AI "full authority query plan transformation". AIs are "mostly correct", not "perfect".

        2.) ANNs might indeed be a new type of heuristic for query plan optimization. Much more "rule of thumb based" than "statistically, logically based". It will prolly work nicely on standard scenarios and will prolly fail badly on special, extreme cases.

        1. that one in the corner Silver badge

          Re: DB2 has included "AI" since 1983!

          > No sane software engineer would grant an AI "full authority query plan transformation". AIs are "mostly correct", not "perfect".

          Depends upon:

          1) What his manager tells him to do

          2) What marketing & sales told his manager to do

          3) Whether the software engineer actually deeply understands what he is doing in terms of the maths[1]

          4) Whether he actually gives a shit or is just in it for the pay packet this month[2]

          > AIs are "mostly correct", not "perfect".

          Grr. LLMs are vaguely correct. LLMs are not all of AI (this hill also has a lovely freshwater stream, you really should try it)

          But you are correct insofar as no sane - properly qualified to redesign and change the deep behaviour of an SQL engine, not just trusting his system architect, respectful of his users, working for the long-term stability of the product, not doing a "feature" that could well be quietly deprecated (excised) under cover of the next round of flashy marketing ideas - software engineer would let an LLM have full authority over the transforms.

          [1] not - necessarily - a slight against the softie; many, many devs will call a library that they don't understand the maths of, and don't *need* to understand (I have seen people call zlib and libjpeg without knowing about Huffman or DCT; I know, it is a crazy world); and wild things have happened because, to their level of understanding, it was quite reasonable to join those two together, they'd been using them for ages without any problems...

          [2] yes, I do mean to slight that sort; live with it (having seen that expressed more than once over the last few days)

        2. that one in the corner Silver badge

          Re: DB2 has included "AI" since 1983!

          > Much more "rule of thumb based" ... . It will prolly work nicely on standard scenarios and will prolly fail badly on special, extreme cases.

          Yes, that is a description of a heuristic; any heuristic, old or new.

      2. that one in the corner Silver badge

        Re: DB2 has included "AI" since 1983!

        > I don't see much justification for referring to them as "AI", even as that term was used in the era when CBOs were first being developed and deployed.

        Sigh, it was ever thus: "If it works, it isn't AI"

        The learning techniques were all derived as AI research and still deserve to be referred to as such (this is a very nice hill, with meadow flowers - come, join me).

  9. fg_swe Silver badge

    Other Approaches

    Safely Transform Query Plan by

    +Monte Carlo

    +Genetic Mutation

    +Simulated Annealing (a toy sketch follows below)
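
    A minimal sketch of the last of those, simulated annealing over join orders (the cost model, cooling schedule and table sizes are all invented; real planners that search this way, e.g. PostgreSQL's genetic GEQO, are far more involved):

      # Minimal simulated annealing over join orders. Toy cost model (10% join
      # selectivity) and invented table sizes; nothing Db2-specific.
      import math, random
      random.seed(1)

      tables = {"orders": 1_000_000, "customers": 50_000, "regions": 50, "products": 10_000}

      def cost(order):
          total, inter = 0, tables[order[0]]
          for t in order[1:]:
              inter = inter * tables[t] * 0.1
              total += inter
          return total

      current = list(tables)
      best = current[:]
      temperature = 1e9

      while temperature > 1:
          candidate = current[:]
          i, j = random.sample(range(len(candidate)), 2)
          candidate[i], candidate[j] = candidate[j], candidate[i]  # neighbour: swap two tables
          delta = cost(candidate) - cost(current)
          if delta < 0 or random.random() < math.exp(-delta / temperature):
              current = candidate        # occasionally accept a worse plan to escape local minima
              if cost(current) < cost(best):
                  best = current[:]
          temperature *= 0.95            # cooling schedule

      print("annealed join order:", best, "cost %.3g" % cost(best))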

    1. that one in the corner Silver badge

      Re: Other Approaches

      > +Genetic Mutation

      A technique firmly rooted in the AI labs (note: AI, not LLM, labs).

  10. teknopaul

    Migrating Db2 to Postgres this month.

    It's politics, natch.

    But using Db2 in the first place was arranged on the golf course.

    Please

    1. fg_swe Silver badge

      Application ?

      Elaborate.

  11. Anonymous Coward
    Anonymous Coward

    A lemon tree

    > IBM promises an AI-powered query optimizer, which will "allow Db2 to continuously learn from customer's queries and achieve up to three times query performance improvement over prior version,"

    What a great idea. I'm quite surprised that IBM didn't develop an AI system earlier and incorporate it into DB2 or any of their other products for that matter. Or if they did they kept very quiet about it.

  12. Fruit and Nutcase Silver badge
    Alert

    Optimistic AI

    https://www.ibm.com/topics/ai-hallucinations

    "AI hallucination is a phenomenon wherein a large language model (LLM) ... perceives patterns or objects that are nonexistent or imperceptible to human observers, creating outputs that are nonsensical or altogether inaccurate."

    What happens when the AI optimisation is... suboptimal?

    Pessimistic Me

    1. fg_swe Silver badge

      Re: Optimistic AI

      Then you will have slow runtime. Still correct results.
