LLMs can write and answer quizzes – but aren't quite ready to disrupt trivia night

A developer has put large language models (LLMs) to the test, literally, by creating AutoQuizzer – a tool that generates quizzes from the text of web pages. The application was made by Stefano Fiorucci – whose day job sees him toil as a software engineer for enterprise AI outfit Deepset – and the code is available on GitHub. …

  1. steviebuk Silver badge

    Real housewives

    No, not porn. The crappy reality show that my partner loves to watch. I got ChatGPT to write 10 quiz questions on it to see if she could answer them, as she knows the show so well.

    About 4 of them were questions about only one person in the show, so the answer was the same 4 times. Another 2 were about another person in the show, so again the same answer for those 2 questions. For about 3 of them the answer was wrong, and one of them wasn't really a proper question.

  2. Charlie Clark Silver badge

    Really needs RAG

    The problem with the public models is that they're circular. Create a quiz with one and it's easy to create a client that can answer all the questions.

    With RAG you can use the models to create questions from documents and data that clients don't necessarily have. You can see how copyright holders might use this approach to get on board.
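
    A minimal sketch of the idea, assuming Haystack 2.x's in-memory store and BM25 retriever (the component choices here are illustrative, not anything AutoQuizzer itself does):

        from haystack import Document, Pipeline
        from haystack.components.builders import PromptBuilder
        from haystack.components.generators import OpenAIGenerator
        from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
        from haystack.document_stores.in_memory import InMemoryDocumentStore

        # Index private documents the public models haven't been trained on.
        store = InMemoryDocumentStore()
        store.write_documents([Document(content="... your private docs here ...")])

        template = """Using only the documents below, write one multiple-choice
        quiz question with four options, marking the correct one.
        {% for doc in documents %}{{ doc.content }}{% endfor %}"""

        pipe = Pipeline()
        pipe.add_component("retriever", InMemoryBM25Retriever(document_store=store))
        pipe.add_component("prompt_builder", PromptBuilder(template=template))
        pipe.add_component("llm", OpenAIGenerator())  # needs OPENAI_API_KEY set
        pipe.connect("retriever.documents", "prompt_builder.documents")
        pipe.connect("prompt_builder.prompt", "llm.prompt")

        result = pipe.run({"retriever": {"query": "quiz topic"}})
        print(result["llm"]["replies"][0])

    A quiz-answering client without access to the same document store has nothing to pattern-match against, which is the point.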

  3. PB90210 Silver badge

    I came across a page on The Poke showing ChatGPT is crap at 'Rock, Paper, Scissors'...

    ChatGPT> I'll go first. I choose Rock

    Human> I choose Paper

    (repeat)

    1. Dan 55 Silver badge

      I got Bing Copilot stuck in a loop in three questions and I wasn't even trying. How the hell this nonsense is getting billions of dollars of investment I'll never know.

      1. Charlie Clark Silver badge

        And yet, can you remember Groupon or pets.com or Uber? Silicon Valley has become the most effective snake oil salesman the world has ever seen. Well, apart from religions, that is!

  4. Anonymous Coward

    I stuck in the URL of this article. https://www.theregister.com/2024/05/29/autoquizzer_llm_quiz_generation/

    The questions looked good, but a lot of the suggested answers were the same across questions. It scored 60% in the closed-book test and 60% from the web crawl.

    Could actually be fun trying to find really obscure sites and testing it.

  5. This post has been deleted by its author

  6. Anonymous Coward

    Not bad

    I pointed it at the online help documentation for one of our products - 750KB of HTML on a single page. The questions weren't bad, but were also all from the first few paragraphs, and remained the same when I refreshed. However, it did home in on some important aspects that were described in the text, paraphrasing quite successfully when it did. Not a bad effort.

    1. David 132 Silver badge

      Re: Not bad

      > The questions weren't bad, but were also all from the first few paragraphs

      Probably related to the “4000 character limit” the article mentions?
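
      If so, the effect is easy to reproduce. A guess at what the cap looks like (the constant and names here are mine, not lifted from the app's source):

          MAX_CHARS = 4000  # the limit the article mentions

          def prepare_context(page_text: str) -> str:
              # Everything past the cap is dropped, so a 750KB page
              # contributes only its opening paragraphs to the quiz prompt.
              return page_text[:MAX_CHARS]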

  7. Falmari Silver badge

    70% correctly answered!

    "When we let LLaMa-3-8B answer the quizzes it generated, it usually answered three or four of the five questions correctly when allowed access to Google results – which isn't half bad but is, well, cheating. "

    I suppose LLaMa-3-8B only being able to correctly answer 70%* of the questions it set isn't half bad; it's just 30% bad. Still, only 70% answered correctly is certainly bad; it may even be terrible.

    LLaMa-3-8B creates a correct choice and 3 incorrect choices as answers to each question it generates. If LLaMa-3-8B picks what it generated as the correct choice and is wrong only because that choice was itself wrong, that's just bad. But if LLaMa-3-8B is wrong because it picked one of the 3 incorrect choices, that's terrible, because a model that can't give the same answer to the same question is useless.

    LLaMa-3-8B creates a question, its answer, and 3 statements that are not the answer. Then, when LLaMa-3-8B is asked that same question, it fails to choose the same answer. That's terrible, not because 30% are wrong, but because it gives different answers to the same question. How can you trust a model that gives different answers to the same question when the data it was built on has not changed?

    * 3 or 4 correct answers out of 5 – I took the middle ground of 70%.
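
    Falmari's complaint is straightforward to turn into a test: generate a question with a known correct option, ask the model that same question several times, and count how often it sticks to its own answer. A rough sketch, with ask_llm standing in for whatever chat-completion call you have to hand (it's hypothetical, not part of AutoQuizzer):

        import random

        def ask_llm(prompt: str) -> str:
            # Hypothetical stand-in for a real chat-completion call.
            raise NotImplementedError

        def consistency_check(question: str, correct: str,
                              distractors: list[str], runs: int = 5) -> float:
            """Fraction of runs where the model picks its own 'correct' choice."""
            letters = "ABCD"
            hits = 0
            for _ in range(runs):
                options = [correct] + list(distractors)
                random.shuffle(options)  # reorder so position can't be memorised
                prompt = question + "\n" + "\n".join(
                    f"{letters[i]}. {opt}" for i, opt in enumerate(options)
                ) + "\nAnswer with a single letter."
                reply = ask_llm(prompt).strip().upper()[:1]
                if reply in letters[:len(options)] and options[letters.index(reply)] == correct:
                    hits += 1
            return hits / runs

    Anything much below 1.0 on questions the model itself wrote is Falmari's "terrible" case.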

  8. Anonymous Coward
    Anonymous Coward

    Correction

    "The app uses Deepset's open source framework Haystack to extract text from a specified page" ... Haystack doesn't have that option as far as I can tell, so I checked the Autoquizzer source to see what it really does. The part that extracts text from a specified page calls, not Haystack, but a library called Trafilatura. Which is also worth knowing about!
