Remember the OpenAI text spewer that was too dangerous to release? Fear not, boffins have built a BS detector for it

If you were terrified by the news that "Elon Musk-backed scientists created an AI text generator that was too dangerous to release" then here’s something that may soothe your fears. Last month, OpenAI published a paper describing a machine-learning-based language system that could crank out what, at first glance, appeared to …

  1. b0llchit Silver badge
    Childcatcher

    English, bad English and AInglish

    At what stage will the AI perform as badly as the majority of non-native English speakers? It seems to me that writing bad grammar or choosing the wrong words is not the sole domain of AI.

    In due time, all our poor little non-native English students will be marked as AI-bots. Well, they are, in fact, just like AI-bots: learning the language by making mistakes and word choices that are obvious to the native speaker. Good times ahead for auto-checkers and lazy teachers. Bad times ahead for our poor students.

    1. A.P. Veening Silver badge

      Re: English, bad English and AInglish

      The real difference between native and non-native speakers isn't in the vocabulary or spelling, but in specific parts of grammar, especially things like "their", "there" and "they're". And for some strange reason, non-native speakers are usually better at those while native speakers are usually better with "its"/"it's".

      1. Anonymous Coward
        Anonymous Coward

        Re: English, bad English and AInglish

        The real difference between native and non-native speakers is that non-native speakers are far more likely to have been taught English properly.

        1. Spoonsinger
          Paris Hilton

          Re: been taught English properly

          agreed, but not how to use it properly. Now if you'll excuse me I'm off to moisten some lettuce.

        2. Anonymous Coward
          Anonymous Coward

          Re: English, bad English and AInglish

          "non-native speakers are far more likely to have been taught English properly."

          I don't know how it is in your part of the world, but here in southern England, my kids are being taught grammar a lot more rigorously than I ever was. Speaking solely for myself, I'm not entirely convinced this is a good thing, because it's time that could have been dedicated to analytical or creative thinking instead.

          1. Anonymous Coward
            Anonymous Coward

            Re: English, bad English and AInglish

            Congratulations to your schools if children are actually being taught grammar at all. As for your other point, am I really going to have to quote Wittgenstein's Tractatus to counter it?

            1. Anonymous Coward
              Anonymous Coward

              Re: English, bad English and AInglish

              Yes.

        3. Anonymous Coward
          Anonymous Coward

          Re: English, bad English and AInglish

          ::to have been taught::

          ugh. mestupid bot. that's quite near to perfection

          wanted to put a fiver in the envelope, but had glued it already

    2. Primus Secundus Tertius

      Re: English, bad English and AInglish

      My experience with sandwich students and junior engineers is that their conversation matches that of their colleagues. Their written work, however, often shows limited vocabulary and a reliance on short, simple sentences.

      The ones who could write were usually the ones who had been to private schools rather than state schools.

    3. katrinab Silver badge

      Re: English, bad English and AInglish

      If you understand the non-native speaker's native language, then you will usually understand why they made the mistakes they made.

      For example, if I get garbled English from an Italian, I will do a literal translation of the word or phrase back into Italian and see what they really mean: Secondary Seat -> Sede Secondaria -> Branch Office

      An AI bot speaking in garbled English probably isn't doing a literal translation of something that makes perfect sense in another language, so that is how you would tell the difference.

    4. Michael Wojcik Silver badge

      Re: English, bad English and AInglish

      We've had adequate machine-generated English prose for years - in fact, for nearly two decades. Philip Parker's system has generated thousands of published works under the ICON Group imprint since his patent was granted in 2007; he'd been working on it since around the turn of the century. That's thousands of computer-generated books that people have paid money for. And who knows how many his private report-generating concern EdgeMaven Media has produced on top of that.

      It's hard to know exactly how many distinct works have been churned out by Parker's system, since most are print-on-demand and many of the listings may refer to books that have never been generated. According to media reports, ICON Group will list a title when they believe they have the data necessary to create the associated book.

      These books are, admittedly, rather dry. But it seems they're perfectly readable.

      Then we have Narrative Science, which for several years has produced computer-generated articles for major magazines. The AP wire service has used NLG (Natural Language Generation) software from Automated Insights to generate wire articles since 2014, and apparently Yahoo uses Automated's tech to produce recaps for fantasy football.¹ And so on.

      NLG is actually quite easy within many constrained domains. But these NLG systems work quite differently from complex NN architectures which are attempting to generate realistic English prose based on unsupervised learning. Comparing them is a bit like comparing a table saw to a humanoid robot trying to teach itself to cut lumber using a handsaw.

      But that said, we have quite effective technology for automatically generating realistic, useful natural-language prose, and have had for some time.

      As for the point of your post: You'd really want a methodologically-sound study conducted by linguists with expertise in the appropriate areas for this, but I think your assumptions aren't well-founded. I've known a number of ESL (English as a Second Language) writers, and a number of TESOL professionals; and I've read a bit of TESOL theory and research. Second-language writers don't tend to be algorithmic. They're as prone to unexpected word choice as they are to using the expected word, because their vocabulary is more limited (and thus less precise) and they have less familiarity with idiom. While they may get inflections and particularly irregular word forms wrong, their grammatical structures are often more limited but also more formal than those of native speakers - they have less experience to tell them when native speakers are likely to bend the rules.²

      Even a relatively simple classifier like an SVM consuming chunked input (so you get some phrase-level structure, not just unigram) could probably do a decent job of distinguishing ESL and OpenAI candidates.
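
      Something along those lines would be easy enough to prototype. A minimal sketch, assuming scikit-learn is available; word n-grams stand in for chunked, phrase-level input, and the two labelled sentences are made-up placeholders rather than real training data:

      # Rough sketch of the SVM idea above: word n-gram features feeding a linear SVM.
      # The corpus here is a toy placeholder; a serious attempt needs real ESL and
      # machine-generated samples.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      # Uni- to trigram features as a crude approximation of chunked input.
      clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 3)), LinearSVC())

      texts = [
          "In due time all our poor students will be marked as bots.",    # ESL writer (label 0)
          "The scientists discovered a herd of unicorns in the Andes.",   # generated (label 1)
      ]
      clf.fit(texts, [0, 1])
      print(clf.predict(["Learning the language by making obvious mistakes."]))

      With two toy sentences it obviously proves nothing, but swap in a few thousand labelled samples and it makes a reasonable baseline.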

      ¹ I look forward to the day when every aspect of sports, from participation to consumption, is fully automated, and humans are not involved at all.

      ² Of course there are no rules as such, but for the sake of the point we'll pretend there are.

  2. Dan 55 Silver badge
    Black Helicopters

    "an auto-ranter, perfect for 2000s-era Blogspot and LiveJournal posts"

    I don't see why you couldn't point it at newspaper comment sections either: answering a reasoned article with an avalanche of truthy bullshit, or supporting one of those terrible comment pieces in the Mail, feigning popular support for something that initially has very little, and allowing hostile powers to cheerlead Western countries off a cliff. Sort of like 55 Savushkina Street, but more automated.

    This I guess is why it was considered dangerous.

  3. herman
    Thumb Up

    Cicero

    Will it be able to handle Cicero?

    https://oll.libertyfund.org/titles/cicero-letters-of-marcus-tullius-cicero

    Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit...

  4. herman
    Thumb Down

    Cicero

    Will it be able to handle Cicero?

    https://oll.libertyfund.org/titles/cicero-letters-of-marcus-tullius-cicero

    Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit...

  5. Michael H.F. Wilkinson Silver badge
    Coat

    A simpler solution

    Auto-ranters are labour-saving devices that can spew conspiracy theories and other fake news without us humans having to do this tedious work. Sifting through and fact-checking all this stuff is of course tedious too, and is well beyond current AI. We should therefore just believe all of it, or rather develop Electric Monks, which will believe this kind of shit for us, so we don't have to do it ourselves.

    Doffs hat to the late, great Douglas Adams.

    Mine is the one with "Dirk Gently's Holistic Detective Agency" in the pocket

    1. BebopWeBop
      Happy

      Re: A simpler solution

      There is always (well almost) a good Douglas Adams or Terry Pratchett exemplar to use

    2. kernelpickle

      Re: A simpler solution

      It’ll be labor saving for everyone but the Alex Jones types of the world. With infinitely more rabbit holes being procedurally generated, it’ll be hilarious watching them chase their tails until their heads explode!

      ...but maybe knowing that it’s all just bullshit written by bots will finally make them give up their shenanigans entirely?

      Either way, I’m looking forward to watching the chaos!

  6. Francis Boyle Silver badge

    Sorry, amanfromMars

    for you ze war is over. You can take that wastepaper bin off your head now.

    1. Sgt_Oddball

      Re: Sorry, amanfromMars

      I must admit I'd be fascinated to see what this system would make of amanfrommars1.

      I, for one, apathetically shrug at our new AI-detecting overlord.

      1. FrogsAndChips Silver badge

        Re: Sorry, amanfromMars

        Quick analysis of his last 3 posts: about 50% in the top 10 (green), and the rest roughly evenly distributed across the other 3 bands. AI (Alien Intelligence) detected!

        1. Francis Boyle Silver badge

          I wasn't (just) joking

          amanfromMars scores remarkably low in the bot rating (mostly yellow).

          1. Anonymous Coward
            Anonymous Coward

            Re: I wasn't (just) joking

            Well, surely his programmers are aware of this recent development and have made some changes to the AI to avoid some of the things that trigger an evaluation as "bot".

          2. amanfromMars 1 Silver badge

            Re: I wasn't (just) joking

            amanfromMars scores remarkably low in the bot rating (mostly yellow). .... Francis Boyle

            You need to run those tests again, FB. Those results of yours are not real and blatant misinformation..... aka BS.

            1. Jamie Jones Silver badge

              Re: I wasn't (just) joking

              You're saying you are a bot?

              1. amanfromMars 1 Silver badge

                Re: I wasn't (just) joking

                You're saying you are a bot? .... Jamie Jones

                No ..... that's something you're asking of an alien phorm with an Earthly presence? Do bots do introspection?

  7. Crypto Monad Silver badge
    Boffin

    Dr Jorge Pérez

    Sad that even those working in the field of AI don't know how to handle Unicode encodings (let alone unicorns).

    >>> "Dr Jorge Pérez".encode("latin_1").decode("utf_8")

    1. Michael Wojcik Silver badge

      Re: Dr Jorge Pérez

      Dear sir,

      I must complain in the strongest possible terms about the output of your text-generation system. Many of my friends are Spanish, and only a few have the copyright symbol in their surnames.

      Yours sincerely,

      Brigadier Sir Charles Arthur Strong (Mrs.)

  8. FrogsAndChips Silver badge

    Thanks, but who would use it?

    Do we really expect Facebook, BuzzFeed et al. to put their content through this tool before publishing, or even afterwards to flag it as 'probably AI-generated BS'? They already do very little to filter illicit content, so why would they do anything about this? And once it's published, shared and re-tweeted, good luck trying to get that horse back in the stable: people believe what they want, no matter how dubious and easy to refute.

    1. Crypto Monad Silver badge

      Re: Thanks, but who would use it?

      > Do we really expect Facebook, BuzzFeed et al. to put their content through this tool before publishing,

      No, because it would instantly become as (in)effective as spam filters. Generators of this content could easily manipulate their output to include more red and yellow words and boost their score.

    2. nematoad
      Happy

      Re: Thanks, but who would use it?

      Looking at the reports of the twitterings of a certain Donald J Trump I would have thought that the answer was obvious.

      Just this quote from the article should explain why I say that:

      "...a lot of repetition, garbled grammar, and contradictions."

  9. GunstarCowboy

    All this spells is the end of credibility for online bullshit.

    1. BebopWeBop
      Thumb Down

      I don't think so. There are too many human bullshitters out there - your AI really does need a mechanism for approximating understanding of the posts.

  10. Pirate Dave Silver badge
    Pirate

    Missed opportunity

    "They’ve called their kit Giant Language model Test Room, or GLTR for short."

    Should have dubbed it TL;DR - Twitter Language; Die Robot

    1. Neoc

      Re: Missed opportunity

      Should have nicknamed it Gary.

      Yeah, yeah, I know, I'm dating myself. Not that anyone else would (badum tish)

  11. Where Did All The Usernames Go

    Gun, meet Foot

    "In effect, the MIT-IBM-Harvard duo have turned GPT-2 117M on itself."

    Nope! Just the opposite. They've helped perfect it. What better way to filter out poor candidates from the GPT output than to filter it through GLTR?

    Do people really never think through what they're doing?

  12. katrinab Silver badge

    For now

    Now the bot will run its rants through this script and make sure to add loads of purple words, even if it has no idea what they mean or whether they are appropriate.

    1. FrogsAndChips Silver badge

      Re: For now

      The bot can pass (some) human BS detectors because it uses likely words. Start throwing in random words and people will be able to detect artificial content much more easily.

      1. quxinot

        Re: For now

        I like to masturbate long words into my phrases even if I don't know what they mean.

    2. Michael Wojcik Silver badge

      Re: For now

      Now the bot will run its rants through this script and make sure to add loads of purple words, even if it has no idea what they mean or whether they are appropriate.

      Covered in the article. Using the output of GLTR as a goal is almost certain to produce worse evaluations from human judges, because it will create less-plausible prose at the diction level, even as it moves the output closer to an information-entropy signal that's more natural.
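
      For the curious, the core of that signal can be sketched in a few lines. This assumes the Hugging Face transformers library and the small GPT-2 model, and the thresholds merely mimic the green/yellow/red/purple bands described in the article; it illustrates the per-token ranking idea rather than reproducing the researchers' actual tool:

      # Sketch of GLTR-style per-token ranking: how far down the language model's
      # prediction list does each actual token fall, given the preceding text?
      import torch
      from transformers import GPT2LMHeadModel, GPT2TokenizerFast

      tok = GPT2TokenizerFast.from_pretrained("gpt2")
      lm = GPT2LMHeadModel.from_pretrained("gpt2")
      lm.eval()

      def token_ranks(text):
          ids = tok(text, return_tensors="pt").input_ids
          with torch.no_grad():
              logits = lm(ids).logits                        # shape [1, seq_len, vocab]
          out = []
          for pos in range(1, ids.shape[1]):
              order = torch.argsort(logits[0, pos - 1], descending=True)
              rank = int((order == ids[0, pos]).nonzero())   # position of the real token
              out.append((tok.decode(int(ids[0, pos])), rank))
          return out

      # Rough equivalents of the colour bands: top 10, top 100, top 1000, everything else.
      for word, rank in token_ranks("The quick brown fox jumps over the lazy dog."):
          band = "green" if rank < 10 else "yellow" if rank < 100 else "red" if rank < 1000 else "purple"
          print(f"{word!r}: rank {rank} ({band})")

      Machine-generated text skews heavily towards the green band - which is exactly the regularity GLTR exploits, and exactly the signal a generator would degrade by chasing purple words.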

  13. JohnFen

    I wasn't afraid

    It was connected to Musk, so I simply assumed that the hype far exceeded the reality.

    1. Michael Wojcik Silver badge

      Re: I wasn't afraid

      Even if their claims were accurate it wasn't worth worrying about. On one hand, we've had high-quality machine natural language generation for years in particular domains where there are strong economic incentives, such as technical reports, product reviews, and sports and financial reporting. On the other, so much human-generated text on a particular subject is already too voluminous for any one reader to adequately sample, while being of vanishingly small value.

      As I suggested in my other post, the OpenAI approach is different from what's used in most or all commercial NLG. Consistently producing NLG prose on arbitrary subjects from a general-purpose, unsupervised-learning ML architecture would be something of an evolutionary step. But it's not shocking and nothing is particularly at risk. Human-generated text does not have some magic attribute¹ which makes it worth consuming, and various markets have already decided that machine-generated text isn't automatically not worth it.

      ¹ I'm tempted to get into a discussion of Benjamin's theory of the "aura" here, but I realize I'd only be talking to myself.

  14. Jamie Jones Silver badge

    "There were tell-tale signs the generated words were crafted by a computer, such as a lot of repetition, garbled grammar, and contradictions"

    Haven't you ever read the writings of Trump or Brexit voters?

  15. cantankerous swineherd

    someone's invented a bullshitter!
