Cursor AI's own support bot hallucinated its usage policy

In a fitting bit of irony, users of Cursor AI experienced the limitations of AI firsthand when the programming tool's own AI support bot hallucinated a policy limitation that doesn't actually exist. Users of the Cursor editor, designed to generate and fix source code in response to user prompts, have sometimes been booted from …

  1. DS999 Silver badge

    Two possibilities

    1) they changed their policy, customers hated it, and they pulled it back and blamed it on the AI

    2) they trained their AI using customer service chats (maybe their own, maybe some third party set) and it "learned" on its own that making stuff up and saying "that's how it is supposed to work" is how customer service reps close tickets

    Neither is particularly good, but I'm curious if we'll be able to find out which it is.

    1. Doctor Syntax Silver badge

      Re: Two possibilities

      My money's on 1. First, a change made it happen and then the bot said it was policy. William of Occam would suggest that these were two consequences of a single decision rather than two separate happenings.

      1. Richard 12 Silver badge

        Re: Two possibilities

        It's 2, almost certainly.

        This is a policy that some companies have.

        The LLM is simply regurgitating vaguely related parts of the training data. That's what LLMs do. It's what they are.

        Add to that the fact that AI models are fundamentally chaotic, meaning they will give radically different answers to equivalent but not identical questions, and we have a complete explanation.

        The proper action is of course for them to return the LLM as unfit for purpose, sue the supplier for misrepresentation and never do that again.

        1. nownow

          Re: Two possibilities

          Return the LLM and sue the supplier for misrepresentation...? Well, this is getting interesting. Headed back to the future with deterministic support bots built in-house.

      2. doublelayer Silver badge

        Re: Two possibilities

        I'll take the other side of that and bet on 2. Customer says they're receiving a message, is this correct? Most of the time, the answer to that is yes. Throw that question at a bot that's told never to say it doesn't know the answer and to pick the most likely result, and you'll often get the answer yes. Similar with consumption limits: tons of services have those. This service probably has them too, just on different aspects, so the bot has some data in its training set saying that there is indeed a limit. Parrot that back with incorrect context and you've got a plausible reply that's completely wrong. Which is why LLM support is pretty much unacceptable unless it's a voluntary opt-in with warnings in front of it, in which case, is it really worth building?

        1. John Brown (no body) Silver badge

          Re: Two possibilities

          It would make far more sense in this sort of situation to have a limited number of "known correct" responses which the AI is allowed to choose from, rather than creating its own idea of a "correct response". If it can't assign a probability of at least 90% to any of the allowed responses, pass it off to a human. Most customer-facing help/service desks have a fairly small number of request types which make up the majority of calls. It really should not be a hard thing to automate mostly reliably. And the pool of human-curated "known correct" responses can grow over time, as the human support team gets a clearer view of the next level of complexity once the "simple" ones are no longer reaching them.

          After all, most of the "easy ones" are already dealt with on the support website's FAQ, aren't they? So limit the LLM to identifying the correct response from that document, or giving up (sketch below).
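
          Something like this, for illustration only: plain Python with toy string-similarity scoring, where the FAQ entries and the 0.9 threshold are made up, and a real system would use a proper intent classifier rather than difflib.

          # "Pick from known-correct answers or give up" support bot, as described above.
          from difflib import SequenceMatcher

          # Human-curated, known-correct responses (hypothetical examples).
          FAQ = {
              "how do i reset my password": "Use the 'Forgot password' link on the sign-in page.",
              "why was i logged out of my session": "Sessions expire after 30 days of inactivity; sign in again.",
              "how do i cancel my subscription": "Go to Settings -> Billing -> Cancel plan.",
          }

          THRESHOLD = 0.9  # below this confidence, hand off to a human

          def best_match(query: str) -> tuple[str, float]:
              """Return the canned answer whose question best matches the query, plus its score."""
              scored = [
                  (SequenceMatcher(None, query.lower(), question).ratio(), answer)
                  for question, answer in FAQ.items()
              ]
              score, answer = max(scored)
              return answer, score

          def respond(query: str) -> str:
              answer, score = best_match(query)
              if score >= THRESHOLD:
                  return answer  # confident enough: use the curated response
              return "I'm not sure about that one; passing you to a human agent."

          if __name__ == "__main__":
              print(respond("How do I reset my password?"))  # close match: canned answer
              print(respond("Why am I being logged out when I switch machines?"))  # no close match: escalate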

          1. Anonymous Coward
            Anonymous Coward

            Re: Two possibilities

            That isn't how an LLM works though; it's not choosing between possible answers, it's just generating text. What you describe is possible to build, but it requires you to have the correct answer to every problem available up front, and the minute you reintroduce an LLM to generate text around those answers, you're back to it ignoring the answers you give it and making up its own.

            If you want to constrain behaviour to specific options, an LLM is the wrong technology.

          2. doublelayer Silver badge

            Re: Two possibilities

            It would, which is why I called the LLM version "unacceptable". It's far from new though. Lots of companies have had the help bot where you type your question, it runs some algorithm against the FAQs, and sends you to one that often has nothing to do with your problem. Maybe the LLM can more accurately connect queries to canned answers.

    2. CowHorseFrog Silver badge

      Re: Two possibilities

      You forgot the 3rd option: AI is bullshit and shouldn't be trusted, because random shit happens.

  2. doublelayer Silver badge

    A great advertisement for their service

    Meanwhile, by having an LLM hallucinate a support result, I hope they have demonstrated the randomness of these models to their customers. Were this a normal business, that would just show that the company is lazy in a way that can ruin your experience. But this company isn't normal, it's one that uses AI to help with programming. I hope the customers are now thinking the obvious question: if it messes up this badly with a simple support question, what is it doing to the code I give it?

  3. TheMaskedMan Silver badge

    "I hope the customers are now thinking the obvious question: if it messes up this badly with a simple support question, what is it doing to the code I give it?"

    Hmm, no. You could, and should, be able to read, understand and spot the errors in code it returns; if you can't, you really shouldn't be using it in the first place. These things are time savers, and very useful in that context, but relying on their output to be correct at all times is a recipe for disaster.

    What the customer should be thinking is, do I really want to build something mission critical that might sprout random bollocks at any time? As supervised time savers, these things are fine. As unsupervised front line representatives of your organisation, not so much.

    1. John Brown (no body) Silver badge

      "Hmm, no. You could, and should, be able to read, understand and spot the errors in code it returns; if you can't, you really shouldn't be using it in the first place. These things are time savers, and very useful in that context, but relying on their output to be correct at all times is a recipe for disaster."

      Does that save time in the real world? Are they that good now? I've always found it much harder to read someone else's code than my own, so validating that the LLM produced good code and/or fixing it could take longer than just writing it myself.

      1. Anonymous Coward
        Anonymous Coward

        Data seems to show that, when used well (by developers who know when it's worth looking at suggestions and when to ignore them), they can save about 10%. That's worth having, but no bigger than other innovations in software engineering such as IDEs, automated testing or a relevant library.

    2. doublelayer Silver badge

      You should be able to and you should do it, but in my experience, neither of those shoulds is done frequently enough that you can count on it. It is far too easy for bugs to slip through code review. It's still worth doing, because sometimes a bug that the writer of the code can't see for the life of them jumps out immediately to a reviewer, but it doesn't always happen. Meanwhile, a lot of people with experience are used to reviewing code written by someone who has already eliminated the obvious bugs, because those wouldn't have run right; that experience isn't helpful when they're reviewing code that could have any size of bug in it. There are also plenty of people who don't review code as thoroughly as they need to because they prefer speed. I admit that most of these negatives would probably be correlated with people who don't care that the LLM produces wrong results, but there may be some people who don't realize they're not going to do the testing necessary to confirm that the output wasn't riddled with bugs.

  4. b0llchit Silver badge
    FAIL

    AI asylum

    ...hallucinations cannot be stopped, though they can be managed.

    We used to deport the hallucinating to the asylum. Time to build an AI asylum to let them (bit) rot there.

  5. Pascal Monett Silver badge

    I say this is a good thing

    Let customers understand that AI is shit.

    Once enough people have understood that it is not reliable, they'll be clamoring for proper people at the front line again, and AI will be confined to things it can properly deal with, like detecting unusual variability in star brightness or, eventually, detecting early signs of cancer in patients' radiographs (which, contrary to chatbots, are always confirmed by an actual doctor).

  6. JimmyPage
    Terminator

    "HAL, Open the pod bay doors"

    "I'm sorry dave. I can't do that."

  7. MOH

    Can we please stop using the marketing term "hallucinate" and replace it with the correct technical term, "return garbage"?

    1. m4r35n357 Silver badge

      or "heat up your room & return garbage".

    2. Anonymous Coward
      Anonymous Coward

      Hallucinating Intelligence

      Just reclassify it to HI. Bye bye.

  8. Eclectic Man Silver badge
    Coat

    But was it as helpful as ...

    MicroSoft Help?

    The world needs to know.

    (Confession: I bought a new MacBook, and simply could not figure out why I could not 'drag and drop' like usual. The Apple 'help' was no help. I went into the settings menu etc. and found out. Seems you have to set up the trackpad to allow dragging and dropping. I am probably abysmal at finding things in the help function, so it might be there after all.)

    I'll get my coat, it has a Sinclair Spectrum coding manual in the pocket.

  9. 0laf Silver badge

    Interesting. The Air Canada decision in 2024 would imply that the company utilising the AI will be liable for the advice and statements it makes. So in theory the AI could make up policy on the fly via hallucination.

    It would certainly be prudent for anyone to keep a copy of any significant chat interaction if they can, but I would expect we'll soon see court cases of "I said, it said" when the business does not keep, or allow customers to have, chat transcripts.

    1. Eclectic Man Silver badge

      Regarding chat transcripts, you can always try screenshots, but in any case, if they keep a transcript then, at least in the UK and the EU, you can use a GDPR Subject Access Request to obtain the details. I suspect that a court would take a dim view of any company that used an AI chatbot to manage customer contact and failed to keep the full transcript. It's not like they take up much storage space.

      1. 0laf Silver badge

        I would hope that they would also put the onus on the supplier to maintain transcripts as appropriate, and not leave customers relying on clunky screenshots as a form of insurance just in case the AI is talking bollox on this occasion.
