Remember the OpenAI text spewer that was too dangerous to release? Fear not, boffins have built a BS detector for it

If you were terrified by the news that "Elon Musk-backed scientists created an AI text generator that was too dangerous to release" then here’s something that may soothe your fears. Last month, OpenAI published a paper describing a machine-learning-based language system that could crank out what, at first glance, appeared to …

  1. b0llchit Silver badge
    Childcatcher

    English, bad English and AInglish

    At what stage will the AI perform as badly as the majority of non-native English speakers? It seems to me that writing bad grammar or choosing the wrong words is not the sole domain of AI.

    In due time, all our poor little non-native English students will be marked as AI-bots. Well, they are, in fact, just like AI-bots: learning the language by making mistakes and word choices that are obvious to the native speaker. Good times ahead for auto-checkers and lazy teachers. Bad times ahead for our poor students.

    1. A.P. Veening Silver badge

      Re: English, bad English and AInglish

      The real difference between native and non-native speakers isn't in the vocabulary or spelling, but in specific parts of grammar, especially things like "their", "there" and "they're". And for some strange reason, non-native speakers are usually better at those while native speakers are usually better with "its"/"it's".

      1. Anonymous Coward
        Anonymous Coward

        Re: English, bad English and AInglish

        The real difference between native and non-native speakers is that non-native speakers are far more likely to have been taught English properly.

        1. Spoonsinger
          Paris Hilton

          Re: been taught English properly

          agreed, but not how to use it properly. Now if you'll excuse me I'm off to moisten some lettuce.

        2. Anonymous Coward
          Anonymous Coward

          Re: English, bad English and AInglish

          "non-native speakers are far more likely to have been taught English properly."

          I don't know how it is in your part of the world, but here in southern England, my kids are being taught grammar a lot more rigorously than I ever was. Speaking solely for myself, I'm not entirely convinced this is a good thing, because it's time that could have been dedicated to analytical or creative thinking instead.

          1. Anonymous Coward
            Anonymous Coward

            Re: English, bad English and AInglish

            Congratulations to your schools if children are actually being taught grammar at all. As for your other point, am I really going to have to quote Wittgenstein's Tractatus to counter it?

            1. Anonymous Coward
              Anonymous Coward

              Re: English, bad English and AInglish

              Yes.

        3. Anonymous Coward
          Anonymous Coward

          Re: English, bad English and AInglish

          ::to have been taught::

          ugh. mestupid bot. that's quite near to perfection

          wanted to put a fiver in the envelope, but had glued it already

    2. Primus Secundus Tertius

      Re: English, bad English and AInglish

      My experience with sandwich students and junior engineers is that their conversation matches that of their colleagues. Their written work, however, often shows limited vocabulary and a reliance on short, simple sentences.

      The ones who could write were usually the ones who had been to private schools rather than state schools.

    3. katrinab Silver badge

      Re: English, bad English and AInglish

      If you understand the non-native speaker's native language, then you will usually understand why they made the mistakes they made.

      For example, if I get garbled English from an Italian, I will do a literal translation of the word or phrase back into Italian and see what they really mean: Secondary Seat -> Sede Secondaria -> Branch Office

      An AI bot speaking in garbled English probably isn't doing a literal translation of something that makes perfect sense in another language, so that is how you would tell the difference.

    4. Michael Wojcik Silver badge

      Re: English, bad English and AInglish

      We've had adequate machine-generated English prose for years - in fact, for nearly two decades. Philip Parker's system has generated thousands of published works under the ICON Group imprint since his patent was granted in 2007; he'd been working on it since around the turn of the century. That's thousands of computer-generated books that people have paid money for. And who knows how many his private report-generating concern EdgeMaven Media has produced on top of that.

      It's hard to know exactly how many distinct works have been churned out by Parker's system, since most are print-on-demand and many of the listings may refer to books that have never been generated. According to media reports, ICON Group will list a title when they believe they have the data necessary to create the associated book.

      These books are, admittedly, rather dry. But it seems they're perfectly readable.

      Then we have Narrative Science, which for several years has produced computer-generated articles for major magazines. The AP wire service has used NLG (Natural Language Generation) software from Automated Insights to generate wire articles since 2014, and apparently Yahoo uses Automated's tech to produce recaps for fantasy football.¹ And so on.

      NLG is actually quite easy within many constrained domains. But these NLG systems work quite differently from complex NN architectures which are attempting to generate realistic English prose based on unsupervised learning. Comparing them is a bit like comparing a table saw to a humanoid robot trying to teach itself to cut lumber using a handsaw.

      But that said, we have quite effective technology for automatically generating realistic, useful natural-language prose, and have had for some time.

      As for the point of your post: You'd really want a methodologically-sound study conducted by linguists with expertise in the appropriate areas for this, but I think your assumptions aren't well-founded. I've known a number of ESL (English as a Second Language) writers, and a number of TESOL professionals; and I've read a bit of TESOL theory and research. Second-language writers don't tend to be algorithmic. They're as prone to unexpected word choice as they are to using the expected word, because their vocabulary is more limited (and thus less precise) and they have less familiarity with idiom. While they may get inflections and particularly irregular word forms wrong, their grammatical structures are often more limited but also more formal than those of native speakers - they have less experience to tell them when native speakers are likely to bend the rules.²

      Even a relatively simple classifier like an SVM consuming chunked input (so you get some phrase-level structure, not just unigram) could probably do a decent job of distinguishing ESL and OpenAI candidates.
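
      Something along those lines would be easy enough to prototype. A minimal sketch, assuming scikit-learn is available; word n-grams stand in for chunked, phrase-level input, and the two labelled sentences are made-up placeholders rather than real training data:

      # Rough sketch of the SVM idea above: word n-gram features feeding a linear SVM.
      # The corpus here is a toy placeholder; a serious attempt needs real ESL and
      # machine-generated samples.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      # Uni- to trigram features as a crude approximation of chunked input.
      clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 3)), LinearSVC())

      texts = [
          "In due time all our poor students will be marked as bots.",    # ESL writer (label 0)
          "The scientists discovered a herd of unicorns in the Andes.",   # generated (label 1)
      ]
      clf.fit(texts, [0, 1])
      print(clf.predict(["Learning the language by making obvious mistakes."]))

      With two toy sentences it obviously proves nothing, but swap in a few thousand labelled samples and it makes a reasonable baseline.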

      ¹ I look forward to the day when every aspect of sports, from participation to consumption, is fully automated, and humans are not involved at all.

      ² Of course there are no rules as such, but for the sake of the point we'll pretend there are.

  2. Dan 55 Silver badge
    Black Helicopters

    "an auto-ranter, perfect for 2000s-era Blogspot and LiveJournal posts"

    I don't see why you couldn't point it at newspaper comment sections either: answering a reasoned article with an avalanche of truthy bullshit, or supporting one of those terrible comment pieces in the Mail, feigning popular support for something that initially has very little, and allowing hostile powers to cheerlead Western countries off a cliff. Sort of like 55 Savushkina Street, but more automated.

    This I guess is why it was considered dangerous.

  3. herman
    Thumb Up

    Cicero

    Will it be able to handle Cicero?

    https://oll.libertyfund.org/titles/cicero-letters-of-marcus-tullius-cicero

    Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit...

  4. herman
    Thumb Down

    Cicero

    Will it be able to handle Cicero?

    https://oll.libertyfund.org/titles/cicero-letters-of-marcus-tullius-cicero

    Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit...

  5. Michael H.F. Wilkinson Silver badge
    Coat

    A simpler solution

    Auto-ranters are labour-saving devices that can spew conspiracy theories and other fake news without us humans having to do this tedious work. Sifting through and fact-checking all this stuff is of course tedious too, and is well beyond current AI. We should therefore just believe all of it, or rather develop Electric Monks, which will believe this kind of shit for us, so we don't have to do it ourselves.

    Doffs hat to the late, great Douglas Adams.

    Mine is the one with "Dirk Gently's Holistic Detective Agency" in the pocket

    1. BebopWeBop
      Happy

      Re: A simpler solution

      There is always (well almost) a good Douglas Adams or Terry Pratchett exemplar to use

    2. kernelpickle

      Re: A simpler solution

      It’ll be labor saving for everyone but the Alex Jones types of the world. With infinitely more rabbit holes being procedurally generated, it’ll be hilarious watching them chase their tails until their heads explode!

      ...but maybe knowing that it’s all just bullshit written by bots will finally make them give up their shenanigans entirely?

      Either way, I’m looking forward to watching the chaos!

  6. Francis Boyle Silver badge

    Sorry, amanfromMars

    for you ze war is over. You can take that wastepaper bin off your head now.

    1. Sgt_Oddball

      Re: Sorry, amanfromMars

      I must admit I'd be fascinated to see what this system would make of amanfrommars1.

      I, for one, apathetically shrug at our new AI-detecting overlord.

      1. FrogsAndChips Silver badge

        Re: Sorry, amanfromMars

        Quick analysis of his last 3 posts: about 50% in the top 10 (green), and the rest roughly evenly distributed across the other 3 bands. AI (Alien Intelligence) detected!

        1. Francis Boyle Silver badge

          I wasn't (just) joking

          amanfromMars scores remarkably low in the bot rating (mostly yellow).

          1. Anonymous Coward
            Anonymous Coward

            Re: I wasn't (just) joking

            Well, surely his programmers are aware of this recent development and have made some changes to the AI to avoid some of the things that trigger an evaluation as "bot".

          2. amanfromMars 1 Silver badge

            Re: I wasn't (just) joking

            amanfromMars scores remarkably low in the bot rating (mostly yellow). .... Francis Boyle

            You need to run those tests again, FB. Those results of yours are not real and blatant misinformation..... aka BS.

            1. Jamie Jones Silver badge

              Re: I wasn't (just) joking

              You're saying you are a bot?

              1. amanfromMars 1 Silver badge

                Re: I wasn't (just) joking

                You're saying you are a bot? .... Jamie Jones

                No ..... that's something you're asking of an alien phorm with an Earthly presence? Do bots do introspection?

  7. Crypto Monad Silver badge
    Boffin

    Dr Jorge Pérez

    Sad that even those working in the field of AI don't know how to handle Unicode encodings (let alone unicorns).

    >>> "Dr Jorge Pérez".encode("latin_1").decode("utf_8")

    1. Michael Wojcik Silver badge

      Re: Dr Jorge Pérez

      Dear sir,

      I must complain in the strongest possible terms about the output of your text-generation system. Many of my friends are Spanish, and only a few have the copyright symbol in their surnames.

      Yours sincerely,

      Brigadier Sir Charles Arthur Strong (Mrs.)

  8. FrogsAndChips Silver badge

    Thanks, but who would use it?

    Do we really expect Facebook, BuzzFeed et al. to put their content through this tool before publishing, or even afterwards to flag it as 'probably AI-generated BS'? They already do very little to filter illicit content, so why would they do anything about this? And once it's published, shared and re-tweeted, good luck trying to get that horse back in the stable: people believe what they want, no matter how dubious and easy to refute.

    1. Crypto Monad Silver badge

      Re: Thanks, but who would use it?

      > Do we really expect Facebook, BuzzFeed et al. to put their content through this tool before publishing,

      No, because it would instantly become as (in)effective as spam filters. Generators of this content could easily manipulate their output to include more red and yellow words and boost their score.

    2. nematoad
      Happy

      Re: Thanks, but who would use it?

      Looking at the reports of the twitterings of a certain Donald J Trump I would have thought that the answer was obvious.

      Just this quote from the article should explain why I say that:

      "...a lot of repetition, garbled grammar, and contradictions."

  9. GunstarCowboy

    All this spells is the end of credibility for online bullshit.

    1. BebopWeBop
      Thumb Down

      I don't think so. There are too many human bullshitters out there - your AI really does need a mechanism for approximating understanding of the posts.

  10. Pirate Dave Silver badge
    Pirate

    Missed opportunity

    "They’ve called their kit Giant Language model Test Room, or GLTR for short."

    Should have dubbed it TL;DR - Twitter Language; Die Robot

    1. Neoc

      Re: Missed opportunity

      Should have nicknamed it Gary.

      Yeah, yeah, I know, I'm dating myself. Not that anyone else would (badum tish)

  11. Where Did All The Usernames Go

    Gun, meet Foot

    "In effect, the MIT-IBM-Harvard duo have turned GPT-2 117M on itself."

    Nope! Just the opposite. They've helped perfect it. What better way to filter out poor candidates from the GPT output than to filter it through GLTR?

    Do people really never think through what they're doing?

  12. katrinab Silver badge

    For now

    Now the bot will run its rants through this script and make sure to add loads of purple words, even if it has no idea what they mean or whether they are appropriate.

    1. FrogsAndChips Silver badge

      Re: For now

      The bot can pass (some) human BS detectors because it uses likely words. Start throwing in random words and people will be able to detect artificial content much more easily.

      1. quxinot

        Re: For now

        I like to masturbate long words into my phrases even if I don't know what they mean.

    2. Michael Wojcik Silver badge

      Re: For now

      Now the bot will run its rants through this script and make sure to add loads of purple words, even if it has no idea what they mean or whether they are appropriate.

      Covered in the article. Using the output of GLTR as a goal is almost certain to produce worse evaluations from human judges, because it will create less-plausible prose at the diction level, even as it moves the output closer to an information-entropy signal that's more natural.
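
      For the curious, the core of that signal can be sketched in a few lines. This assumes the Hugging Face transformers library and the small GPT-2 model, and the thresholds merely mimic the green/yellow/red/purple bands described in the article; it illustrates the per-token ranking idea rather than reproducing the researchers' actual tool:

      # Sketch of GLTR-style per-token ranking: how far down the language model's
      # prediction list does each actual token fall, given the preceding text?
      import torch
      from transformers import GPT2LMHeadModel, GPT2TokenizerFast

      tok = GPT2TokenizerFast.from_pretrained("gpt2")
      lm = GPT2LMHeadModel.from_pretrained("gpt2")
      lm.eval()

      def token_ranks(text):
          ids = tok(text, return_tensors="pt").input_ids
          with torch.no_grad():
              logits = lm(ids).logits                        # shape [1, seq_len, vocab]
          out = []
          for pos in range(1, ids.shape[1]):
              order = torch.argsort(logits[0, pos - 1], descending=True)
              rank = int((order == ids[0, pos]).nonzero())   # position of the real token
              out.append((tok.decode(int(ids[0, pos])), rank))
          return out

      # Rough equivalents of the colour bands: top 10, top 100, top 1000, everything else.
      for word, rank in token_ranks("The quick brown fox jumps over the lazy dog."):
          band = "green" if rank < 10 else "yellow" if rank < 100 else "red" if rank < 1000 else "purple"
          print(f"{word!r}: rank {rank} ({band})")

      Machine-generated text skews heavily towards the green band - which is exactly the regularity GLTR exploits, and exactly the signal a generator would degrade by chasing purple words.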

  13. JohnFen

    I wasn't afraid

    It was connected to Musk, so I simply assumed that the hype far exceeded the reality.

    1. Michael Wojcik Silver badge

      Re: I wasn't afraid

      Even if their claims were accurate it wasn't worth worrying about. On one hand, we've had high-quality machine natural language generation for years in particular domains where there are strong economic incentives, such as technical reports, product reviews, and sports and financial reporting. On the other, so much human-generated text on a particular subject is already too voluminous for any one reader to adequately sample, while being of vanishingly small value.

      As I suggested in my other post, the OpenAI approach is different from what's used in most or all commercial NLG. Consistently producing NLG prose on arbitrary subjects from a general-purpose, unsupervised-learning ML architecture would be something of an evolutionary step. But it's not shocking and nothing is particularly at risk. Human-generated text does not have some magic attribute¹ which makes it worth consuming, and various markets have already decided that machine-generated text isn't automatically not worth it.

      ¹ I'm tempted to get into a discussion of Benjamin's theory of the "aura" here, but I realize I'd only be talking to myself.

  14. Jamie Jones Silver badge

    "There were tell-tale signs the generated words were crafted by a computer, such as a lot of repetition, garbled grammar, and contradictions"

    Haven't you ever read the writings of Trump or Brexit voters?

  15. cantankerous swineherd

    someone's invented a bullshitter!
