A quick guide to tool-calling in large language models

Let's say you're tasked with solving a math problem like 4,242 x 1,977. Some of you might be able to do this in your head, but most of us would probably be reaching for a calculator right about now, not only because it's faster, but also to minimize the potential for error. As it turns out, this same logic applies to large …

  1. Chris Warrick
    IT Angle

    OK, but why?

    Why is an LLM required here? I can just type whatever I want calculated into the calculator app on my phone or computer, or I can use a small electronic device called a "pocket calculator", or I can use a Google feature that has existed since time immemorial, and even Siri can handle elementary maths. Similarly, it's faster to glance at the phone or taskbar to see the date and time. The Proxmox thing might have some use, but if I wanted reports like this, I could just build a simple web app that does the querying and returns HTML directly. Again, faster and easier than a chatbot.

    1. Gene Cash Silver badge

      Re: OK, but why?

      It looks like using the LLM as a larger, more complex, error-prone command parser.

      1. cyberdemon Silver badge
        Terminator

        Re: OK, but why?

        The llama is going to smash the calculator to pieces with a hammer.

        An excellent metaphor for what can happen when you let an SBSM (statistical bullshit machine) loose on a Python shell.

        Something triggered it into "U r a l33t h4x0r. Here is a python shell. Pwn this computer and any others connected to it, and do as much damage as you can!" mode.

        1. Sceptic Tank Silver badge
          Holmes

          Re: OK, but why?

          That image looks to be AI-generated, so the hammer the AI came up with looks fit for nothing but smashing a calculator. Would not want to have to drive nails into a coffin with a tool like that.

          Hammer sucks !! =====>

      2. pip25

        Re: OK, but why?

        It can be useful precisely because LLM output is error-prone: it's just text. A "tool", on the other hand, is an API. Its input commands can be validated, not just syntactically but also for whether they make sense in the current context.

        Let's say you have a cocktail-making machine. Its hardware can follow a certain list of basic commands, but not all of them are always valid: there's no point in shaking the liquid container if it's empty, for example. You give the LLM the cocktail recipe in plain text, like something you copied from a website, and it needs to translate that into a list of "tool" commands the machine can use. If it does anything else - starts lecturing you about the dangers of alcohol, say, or just outputs an invalid command - the system immediately knows the LLM bugged out and can attempt to recover, for example by restarting the process with a different random seed.
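
        A minimal sketch of what that validation could look like, in Python. The command set, the CocktailBot class and the retry-with-a-new-seed driver are all invented for illustration:

          VALID_COMMANDS = {"add_ingredient", "shake", "pour"}

          class CocktailBot:
              """Tracks just enough machine state to sanity-check commands."""
              def __init__(self):
                  self.container_volume_ml = 0

              def validate(self, command, args):
                  # Reject commands that are syntactically or contextually invalid.
                  if command not in VALID_COMMANDS:
                      return False, f"unknown command: {command}"
                  if command in ("shake", "pour") and self.container_volume_ml == 0:
                      return False, f"no point in '{command}' on an empty container"
                  return True, "ok"

              def apply(self, command, args):
                  # Update the simulated state so later checks see the effect.
                  if command == "add_ingredient":
                      self.container_volume_ml += args.get("volume_ml", 0)
                  elif command == "pour":
                      self.container_volume_ml = 0

          def run_recipe(llm_translate, recipe_text, max_attempts=3):
              # llm_translate stands in for the LLM call that turns the
              # plain-text recipe into a [(command, args), ...] list.
              for attempt in range(max_attempts):
                  bot = CocktailBot()
                  commands = llm_translate(recipe_text, seed=attempt)
                  for command, args in commands:
                      ok, why = bot.validate(command, args)
                      if not ok:
                          break  # the LLM bugged out; retry with a new seed
                      bot.apply(command, args)
                  else:
                      return commands  # every command checked out
              raise RuntimeError("LLM never produced a valid command list")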

    2. Binraider Silver badge

      Re: OK, but why?

      Wolfram Alpha would appear to be a tad more useful for such queries than your average chat bot.

      Can deal with some rather esoteric stuff a lot better than most.

    3. Rich 2 Silver badge

      Re: OK, but why?

      Indeed. Let's take a relatively simple and extremely well-defined tool like a calculator. Wrap it in a shed-load of gunk and… err… make a calculator.

      Sounds just like modern software to me - e.g. Windows 11 does just what Windows 3.1 used to do, but with huge amounts of useless gunk added.

  2. karlkarl Silver badge

    > A few lines of Python

    The dependency list is longer than the actual Python code ;)

    1. Anonymous Coward
      Anonymous Coward

      Which is Python in a nutshell: just include any old badly written shit to avoid doing any work.

      Python is a fucking stupid language, used by lazy fuckers who can't even be bothered to type {};

  3. notafish
    Pint

    Why not?

    The examples are trivial and not entirely useful, but the idea of allowing natural language interfaces to systems that already have an API is appealing. Consider a natural language interface to an airline/hotel/car rental system, or an astronomical ephemeris, for example.

    Kudos to El-Reg for this series, very useful!

    1. katrinab Silver badge
      Flame

      Re: Why not?

      Wouldn't that be even worse than Expedia when it tries to push me hotels in Naples Florida instead of Naples Italy?

    2. John Smith 19 Gold badge
      Unhappy

      a natural language interface to an airline/hotel/car rental system

      Like AI researchers on DARPA contracts have been trying to do since the 1970s.

      I'm super pumped at finally being able to do this.

      What a lot of people don't seem to get about why humans are so good at these jobs is the vast amount of verbal cruft a member of the public can produce before, during and after asking their question. Humans know this is just so much noise and completely irrelevant. They can also cope with regional accents pretty well. Of course LLMs can cope with accents too, if enough of them are included in the training set.

      But that requires recognising how important these factors are to delivering a solution that's as good as a human (while being cheaper, of course).

      Something I like to call imagination, or foresight. Given how poor many humans are at that, do you think things will get better in that department?

  4. Ball boy Silver badge

    Maybe I'm missing something

    I admit I'm not an AI/LLM/ML fanatic, but I've assumed the most practical use of them thus far has been to guide a user towards generalisations rather than provide specific answers. We've had tools that provide the latter for many years, and they generally work rather well, so it seems a rather backward step to bend this new tech back on itself and make it beholden only to these existing libraries. If nothing else, won't we then limit the LLM's answers to only those for which we've provided hard-coded solutions? At that point, I have to question what value there was in including any form of AI in the loop at all.

    1. Alex Stuart

      Re: Maybe I'm missing something

      The calculator usage (at least, for a single calculation) isn't a good example of tool use for exactly the reasons you say.

      Replace it with something like 'search the web' and now things are getting more useful, e.g. 'tell me the important things to know about dihydrogen monoxide' - the LLM goes off, searches, and returns a curated summary much faster than you could research and summarise it yourself.

      Or 'keep trying to compile code to solve problem X and tell me when you have a working solution', 'keep using this Go app to play against yourself and tell me when you think you can beat Sedol', etc.

      (I'm not an AI fanatic either, at least not regarding LLMs. They have some utility now, but still too prone to hallucination for my liking and I doubt they can reach AGI without some sort of symbolic reasoning system built-in)
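
      To make the 'search the web' case concrete, here's a rough sketch of the loop using an OpenAI-style chat-completions tool API (model name illustrative); search_web() is a stub you'd implement against a real search service:

        import json
        from openai import OpenAI

        client = OpenAI()

        def search_web(query: str) -> str:
            ...  # call a real search API here and return a text summary

        tools = [{
            "type": "function",
            "function": {
                "name": "search_web",
                "description": "Search the web and summarise the results",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        }]

        messages = [{"role": "user", "content":
                     "Tell me the important things to know about dihydrogen monoxide"}]

        while True:
            resp = client.chat.completions.create(
                model="gpt-4o-mini", messages=messages, tools=tools)
            msg = resp.choices[0].message
            if not msg.tool_calls:       # the model answered directly; done
                print(msg.content)
                break
            messages.append(msg)         # keep the tool request in context
            for call in msg.tool_calls:  # run each tool, feed results back
                args = json.loads(call.function.arguments)
                messages.append({"role": "tool", "tool_call_id": call.id,
                                 "content": search_web(**args)})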

  5. Jan Ingvoldstad

    Executing arbitrary code from an LLM is such a great idea

    "The right tools can give LLMs the ability to execute arbitrary code, access APIs, ..."

    ... and no critical eye towards what the consequences of these accesses are.

    Bravo.

    1. cyberdemon Silver badge
      Devil

      Re: Executing arbitrary code from an LLM is such a great idea

      User: Add a record for Robert aka Bobby to the students table

      AI (trained on internet memes and webcomics): <execute `DROP DATABASE; --`>

      (Also, LLMs have a fair risk of inserting spaces where they shouldn't be, and wouldn't appreciate the difference between `rm -rf /home/bob/tmp` and `rm -rf / home/bob/tmp`.)

      Who needs fat fingers when you outsource your job to a bullshit generator?
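
      The Bobby Tables defence is the same whether the hostile text comes from a user or an LLM: bind it as data rather than splicing it into the statement. A sketch using sqlite3's parameter binding (table and payload borrowed from the xkcd joke):

        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE students (name TEXT)")

        def add_student(name: str) -> None:
            # The driver passes `name` as data, so the classic payload is
            # stored as a (weird) name rather than executed as SQL.
            conn.execute("INSERT INTO students (name) VALUES (?)", (name,))
            conn.commit()

        add_student("Robert'); DROP TABLE students;--")
        print(conn.execute("SELECT name FROM students").fetchall())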

    2. JLV
      Boffin

      Re: Executing arbitrary code from an LLM is such a great idea

      That sounds scary indeed and might have benefited from better wording.

      But consider that the "arbitrary code" here has been specifically written by the developers, as an API.

      Needless to say, "input validation" will be the order of the day!
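
      For the `rm -rf` class of mistake above, that validation might mean resolving whatever path the model asks for and refusing anything outside a sandbox directory. A sketch only - SANDBOX and delete_path are invented for illustration (Path.is_relative_to needs Python 3.9+):

        import shutil
        from pathlib import Path

        SANDBOX = Path("/home/bob/tmp").resolve()

        def delete_path(requested: str) -> None:
            target = Path(requested).resolve()
            # A stray space turning "/home/bob/tmp" into "/" resolves
            # outside the sandbox and is rejected, instead of wiping
            # the filesystem.
            if not target.is_relative_to(SANDBOX):
                raise ValueError(f"refusing to touch {target}: outside {SANDBOX}")
            shutil.rmtree(target)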

      1. John Smith 19 Gold badge
        Unhappy

        Needless to say,

        Actually it kind of does "need to be said"

        Because input validation can be very tricky.

        Remember the massive shutdown of air traffic control a while back?

        Essentially, an airport in one ATC region had a code that matched one in another ATC region. ATC1 transfers flight data to ATC2, which now thinks the plane is flying round in a circle and throws a wobbler, because "the starting airport will never be the same as the finish airport." And it wasn't.

        That was despite substantial input validation.

        So how do you teach an LLM to watch out for the "impossibles"?

  6. katrinab Silver badge
    Windows

    Didn't we solve this problem in about 1955 with FLOW-MATIC?

    If I want to multiply two numbers together, I reach for the calculator icon on my task-bar, or Excel. I don't need to follow 5 pages of densely-typed instructions on how to configure an AI system to maybe do it less accurately.

  7. Pete Sdev Bronze badge
    Mushroom

    What could possibly go wrong?

    "AI models can break down, plan, and solve complex problems with limited to no supervision"

    1. desht

      Re: What could possibly go wrong?

      Here's the important part of that sentence:

      > AI models can break down

  8. weirdbeardmt

    Counter irony is fun

    These sorts of errands are exactly the sort of thing we might naively expect an "AI" to handle automagically, but what we're really saying is that it still needs to reach for the calculator… just like a human. Is that progress?

    (Yes I know LLM != AI)

    The bigger question is… if it hallucinated the answer before having access to the calculator, how did it “know” (eugh) to use it when it did have access?
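
    For what it's worth, the mechanism is less mysterious than "knowing": each tool is advertised to the model as a name, a description and a JSON Schema for its arguments, and the model is trained/prompted to emit a structured call whenever a description matches the task. An illustrative schema for the article's calculator:

      calculator_tool = {
          "type": "function",
          "function": {
              "name": "calculator",
              "description": "Evaluate an arithmetic expression exactly. "
                             "Use this instead of doing arithmetic yourself.",
              "parameters": {
                  "type": "object",
                  "properties": {"expression": {"type": "string"}},
                  "required": ["expression"],
              },
          },
      }
      # With this in the request, the model emits something like
      # {"name": "calculator", "arguments": "{\"expression\": \"4242 * 1977\"}"}
      # instead of hallucinating a product.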

  9. Anonymous Coward
    Anonymous Coward

    Automated job application

    I think this will be useful for a lot of us in the coming years.

    The recruitment process has become even more shitified since every company under the sun now has a hard-on for anything cloud-related.

    Want to apply for a job at company A? Please agree to these T&Cs and this privacy policy for a completely different company you have never heard of.

    Isn’t outsourcing great? The shitification of everything

    1. Anonymous Coward
      Anonymous Coward

      Re: Automated job application

      I think I'll just program the SI (stupid intelligence) to click through the T&Cs and all the other crap; then it's not me that's responsible, I'll just blame the SI.

    2. katrinab Silver badge
      Flame

      Re: Automated job application

      Please don’t, my bin at work is already overflowing with AI-generated garbage CVs.

  10. Binraider Silver badge

    Our instance of ChatGPT, this morning. "How many R's are there in strawberry?"

    "Two".

    Clearly there is some distance to go...

    1. a pressbutton

      worse

      It means that somewhere in the training set, some humans (well, who knows for sure) said the answer is 2.

      1. katrinab Silver badge
        Megaphone

        Re: worse

        But not necessarily to that exact question

        1. sw guy

          Re: worse

          Disclaimer: English is not my native language

          I can easily think of a person having already written s-t-r-a-w-b-e and then asking "Strawberry, one R or two?"

      2. Filippo Silver badge

        Re: worse

        Not necessarily. An LLM trained on only correct answers can still produce wrong ones. For example, it might have been trained on a large number of statements of the form "the number of [letter] in [word] is [correct number]", including "the number of r in strawberry is 3" and "the number of e in bee is 2", and randomly mix the two to get "the number of r in strawberry is 2".

        With a large LLM and a large training set, that output may have a low probability, but it can never be zero. This is why current LLMs can never be used for tasks where exactness is critical.
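
        The "low but never zero" point in miniature: softmax turns any finite scores into strictly positive probabilities, so sampling can always emit the wrong continuation. The logits below are made up for illustration:

          import math, random

          # model scores for the blank in "the number of r in strawberry is _"
          logits = {"3": 9.0, "2": 2.0, "1": 0.5}
          z = sum(math.exp(v) for v in logits.values())
          probs = {tok: math.exp(v) / z for tok, v in logits.items()}
          print(probs)  # "3" dominates, but "2" keeps ~0.001 probability - never 0
          print(random.choices(list(probs), weights=list(probs.values()))[0])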

    2. JLV

      It is quite possible that the LLM is thinking, and correct, in non-computerese...

      How many n's in "consonant"? I thought it had a double n - French "consonne" does - but it has just the one.

      In conversational English, when many people ask this question, they are not interested in character-counting algos. They are asking about double vs single consonants.

      (Admittedly I might be a tad overstating folks' eagerness to avoid spelling errors: their/they're/there goofs and their ilk are a dime a dozen everywhere nowadays, and few bat an eye at what would have been mortifying in the past.)

      1. Binraider Silver badge

        Bang on. And therein lies one of many problems: no ML model can identify the context and ask for clarification. The question "how many R's are there in strawberry?" can be interpreted in at least two ways.

        If the AI's response were to ask which question was being asked, then it would be approaching the point where it could pass the Turing test.

        1. John Smith 19 Gold badge
          Unhappy

          "ask for clarification."

          This.

          I'm told "I saw a man with a telescope on a hill" has 46 parses, but human language evolved through humans speaking with other humans.

          The actual (human) answer is "WTF are you talking about? Did you have the telescope? Did he? Who was on the hill? Did you really see him or are you making this up as an example?*"

          *Yes, as a human I should have enough context to know whether the person I'm talking to is describing a recent event or just giving an example.

    3. This post has been deleted by its author

  11. l8gravely

    As a learning example, this is fantastic.

    I'm a total AI skeptic, but I like this article and others like it because it gives you a foundation to explore on your own what's happening in the hype(r) world of AI these days. It's interesting, and if I were more of a graphic artist maybe I'd be more interested in making silly pictures, but I'm not. I'm also very hesitant due to the hallucination problems in my co-workers, much less AI tools doing weird things.

    But as a curious person, this is some fun stuff to play with and try to figure out how it would help me make my life better. Or at least let me learn something new and keep my brain exercised. I don't see AI taking my job away any time soon, so I'm good with this stuff.

    1. JLV
      Thumb Up

      Re: As a learning example, this is fantastic.

      I agree. Not a fan of the LLM mania - which looks like many a hype of the past - but it behooves programmers to stay in touch with what the tech does.

      The same author penned the "roll your own AI code assistant" article and explained what RAG consists of - i.e. how a model trained on the wider world seems able to infer things about your own code base, something I'd noticed myself. Ditto on this article: people complaining about parsing totally miss the point - how do you anchor your system on facts, not hallucinations? Calling an external system that knows seems beneficial. Ask Air Canada, whose bot lost them a court case when it mis-cited procedures for bereavement discounts.

      And, from my CoPilot experience, hallucinations are very much du jour: if you write Python using Polars, a newer dataframe library than the more popular Pandas, CoPilot is as likely as not to offer suggestions in Polars syntax that slyly refer to Pandas methods that don't exist.

      Overall, I can't say I'm thrilled with CoPilot; it tends to clutter VS Code with useless suggestions 90% of the time when you're writing code you know. The Zed editor on macOS somehow manages to offer much better suggestions, and doesn't heat up the CPU nearly as much doing it. But occasionally CoPilot can be useful if you ask it to flesh out some code you're less familiar with, like a bash/zsh operation to check an arg's value. Early days, early days.

      Actual nitty-gritty tech details, rather than breathlessly jonesing about how many billion params which model has? Keep 'em coming.

  12. John Smith 19 Gold badge
    Unhappy

    It's not TEFL, it's ECL

    I.e. English as a Command Language.

    No seriously.

    The idea dates from the early 70s, but IIRC the head of the project died quite early on. One of those charismatic source-of-most-working-directions types. British IIRC, but went to the US.

    TBH the best one of these I ever saw was called "Automator MI". It was developed back in the day to link DOS PC apps to mainframes. It's the idea of a "software robot": it's told to watch a defined window on the screen and, if it "reads" something there, it "types" stuff at the keyboard.

    It could give a command language to a bunch of apps that otherwise had none (or a p**s-poor one).

    A simple but very powerful idea as it required no cooperation on the part of the program. No APIs.

    A macro recorder is a very basic version of this capability. IIRC they tried to build a version for Windows, but it was too hard. TBH I think it would have been possible for a lot of use cases, but they were too DOS-focussed.

    This does the same but with a lot more code and memory space.

    <sigh>

  13. Nifty

    Are you calling me a tool?

    1. Yankee Doodle Doofus Bronze badge

      I came here to say I don't need an LLM to help me call someone a tool, I've been doing it without assistance for decades.

  14. Henry Wertz 1 Gold badge

    Hello Skynet

    Hello Skynet. I mean I'm mostly tongue in cheek here but hooking AIs into various APIs does sound like it'd give a nefarious rogue AI a hand up.

    In all seriousness, the example is great for showing how this works, and I can see these being quite useful.

  15. Grunchy Silver badge

    Gotta be a joke

    “4,242 x 1,977”

    Yeah, because some people call "," a decimal point, whereas most everyone else recognizes "," as a separator between digit triplets.

    In my mind this is a straightforward input that a computer ought to handle. But string parsing and interpretation are so particular that there is hardly a language written that could interpret this request - most notably Fortran itself, which is guaranteed to barf unless it gets * instead of x and has every "," omitted.

    (Gates' BASIC always pissed me off too: every INPUT command receives a text string from the operator, and if you pass some arbitrary text through VAL() it will reliably interpret it as 0. Hence the dubious sequence "INPUT A$:A=VAL(A$)". Whereas if you foolishly tried "INPUT A", as a number directly, and the operator typed in the same arbitrary text, stupid Gates' BASIC would crash with "INPUT ERROR"!! The same baloney behaviour persists in almost every language to this day, and to make any of them work you have to do the same chicanery as Gates' BASIC, or some other "error trap" scheme, which proves nothing ever improved much even after all these years!)
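
    For what it's worth, the parsing is only a few lines in a modern language: strip the digit-group separators, accept x or *, and fail loudly on junk instead of returning 0 the way VAL() did. A sketch, assuming "," is a digit-group separator rather than a decimal point:

      import re

      def multiply(expr: str) -> int:
          m = re.fullmatch(r"\s*([\d,]+)\s*[x*×]\s*([\d,]+)\s*", expr)
          if m is None:
              raise ValueError(f"not a multiplication: {expr!r}")
          a, b = (int(s.replace(",", "")) for s in m.groups())
          return a * b

      print(multiply("4,242 x 1,977"))  # 8386434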
