"but we can spot lies in headings"
"We trained our AI on The Register!!!"
Two AI researchers are behind a daring open challenge to fight the spread of outrageous headlines that are completely detached from reality. (As if anyone would write such things, tut-tut.) The Fake News Challenge (FNC) is organized by Dean Pomerleau, an entrepreneur and adjunct professor at Carnegie Mellon University, and …
"Truth by consensus" will work for a while, until someone - probably the Russians - works out how to game it. Won't take long. Just ask Google: they spend beeellions keeping just one step ahead of the "optimisers" (read: liars).
I think the only sensible way to try to distinguish fake news is to follow up the references. Because, and it's important to remember this, all news stories take the form "A says X". If your story isn't syntactically equivalent to "Bob says 'It's raining'", then it's Not News.
References support the full story? Then, and only then, is it "real news". Of course the quality of news is only as good as the references, but that's always been the case and always will be. Repeated references acquire Reputation.
Story has references, and they're verifiable, but the references only confirm a small part of it? Flag as "analysis or speculation, not news".
Story has references, and they're verifiable, but they don't say what the story says they do? Flag as "bullshit".
Story has no references? Flag it as "creative fiction". This works for any story whose author doesn't explicitly say how they learned these things they're telling you.
The tricky case is: story has references, but they're unverifiable. Then you would want to cross-reference, and it's hard to do that without introducing guesswork. How plausible is it that these people exist, and would do or say these things? I can't think of any procedural way to make that assessment.
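The flagging rules above are procedural enough to sketch in code. The Python below is only an illustration, not anything the FNC organisers or this commenter wrote. It assumes each story has already been reduced to "A says X" claims and that someone has checked what each reference actually confirms. The names `Story`, `Reference`, `Verdict` and `classify` are hypothetical, as is the simplification that "confirms none of the claims" means "bullshit" while "confirms some" means "analysis or speculation".

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    NEWS = "real news"
    ANALYSIS = "analysis or speculation, not news"
    BULLSHIT = "bullshit"
    FICTION = "creative fiction"
    UNVERIFIED = "unverified"


@dataclass(frozen=True)
class Reference:
    source: str               # who the story says it came from
    verifiable: bool          # could the source actually be checked?
    confirms: frozenset[str]  # story claims the source was found to back up


@dataclass
class Story:
    claims: frozenset[str]    # each claim already reduced to "A says X" form
    references: list[Reference] = field(default_factory=list)


def classify(story: Story) -> Verdict:
    """Hypothetical encoding of the flagging rules; not a real FNC API."""
    if not story.references:
        # The author never says how they learned any of this.
        return Verdict.FICTION
    checkable = [r for r in story.references if r.verifiable]
    if not checkable:
        # The tricky case: references exist but can't be checked.
        return Verdict.UNVERIFIED
    confirmed = frozenset().union(*(r.confirms for r in checkable))
    backed = story.claims & confirmed
    if backed == story.claims:
        return Verdict.NEWS       # references support the full story
    if not backed:
        return Verdict.BULLSHIT   # references don't say what the story claims
    return Verdict.ANALYSIS       # references confirm only part of it


rain = Story(
    claims=frozenset({"Bob says it's raining"}),
    references=[Reference("Bob", True, frozenset({"Bob says it's raining"}))],
)
print(classify(rain).value)  # real news
```

Only verifiable references are counted. If none are, the story lands in the unverified bucket, which is exactly the case that still needs a human to judge plausibility.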
If you have an article "Trump facing imminent impeachment", what's the difference if it references "sources close to the House of Representatives", "senior Republican Congressmen", "Todd Hays (R-KS)" (note there is no such person), "Speaker of the House Paul Ryan's staff" or "Speaker of the House Paul Ryan"?
None, if the article is simply fake. If you check that the article has certain kinds of references, fake news will simply make up references that pass the check. You can write the exact same article, claiming a quote from Paul Ryan saying that Trump faces impeachment, and it will be either true or stupidly false depending on whether the quote was accurate. How do you determine the accuracy of that quote? Have the program email Paul Ryan asking if he actually said it?
Obviously for something like that it would be all over the news if it was true, but Paul Ryan probably would have given the original quote to only one source (unless it was a press conference with a lot of press attending). If the former, what's the source for the other articles aside from the original article? If I hack into foxnews.com and plant that fake story with the fake Paul Ryan quote, how many others will reference it? How many times will it be retweeted in the next 30 minutes? It would quickly be very widespread, so how is a bot supposed to figure out it is false? For that matter, how is a person supposed to? If you wait a bit, someone will succeed in calling Paul Ryan or other high-ranking House members, they'll deny it, and it will be revealed as a hoax within a few hours.
But what if it is a story like "House Republicans to investigate Trump's business deals in Turkey", which isn't going to attract the same feeding frenzy that will quickly self-correct the false impeachment story?
Well, if Paul Ryan was going about saying something like that, it would be mentioned in several places. And pretty soon Ryan himself would put out a statement through official channels, either confirming or denying or "clarifying" what he said. Ditto if it's attributed to "his staff".
If it refers to "Todd Hays (R-KS)", then you look that person up. Pretty sure it's not too hard to find a directory of congresscritters.
Yes, the story will be retweeted and spread quickly. But within 2 hours, tops - which is to say, long before most people have actually seen it - it will have been either debunked or "clarified". And the simple rule is: if Ryan himself denies the quotes that are attributed to him, then they're false. End of story.
If it's attributed to "sources close to the House of Representatives", that comes under "unverifiable sources". In that case a human fact checker would try to verify them, using their own "sources" in that position: how plausible is it that someone in a position to know something would have said these quotes? I don't know how to get a bot to make that assessment, so just flag it "unverified".
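The look-it-up-and-ask procedure can also be sketched, again purely as an illustration. The `directory` set stands in for a real directory of members of Congress and `denials` for official statements from the people quoted; both, and the `VAGUE` pattern used to spot anonymous attributions, are hypothetical.

```python
import re
from enum import Enum


class Attribution(Enum):
    UNVERIFIED = "unverified"
    NO_SUCH_PERSON = "no such person"
    DENIED = "false: the named person denies it"
    CHECK_QUOTE = "real, named person: go and check the quote"


# Hypothetical heuristic for anonymous attributions such as
# "sources close to the House of Representatives".
VAGUE = re.compile(r"^(sources?|senior|officials?|aides?|people)\b", re.IGNORECASE)


def check_attribution(attributed_to: str, directory: set[str], denials: set[str]) -> Attribution:
    if VAGUE.match(attributed_to):
        return Attribution.UNVERIFIED
    # Drop a trailing party/state tag like "(R-KS)" before the lookup.
    name = re.sub(r"\s*\([^)]*\)\s*$", "", attributed_to).strip()
    if name not in directory:
        return Attribution.NO_SUCH_PERSON
    if name in denials:
        return Attribution.DENIED  # the person quoted denies it: false
    return Attribution.CHECK_QUOTE


congress = {"Paul Ryan"}  # hypothetical stand-in for a full directory
print(check_attribution("Todd Hays (R-KS)", congress, set()).value)  # no such person
```

The denial branch encodes the simple rule given above: if the person quoted denies the quote, it is false.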
But note what the story actually says. Specifically, it does not say "Trump is about to be impeached", and anyone who reports it that way falls immediately into the "analysis or speculation, not news" bucket. What it says is "such and such a person says this is about to happen".
That's what I mean by all *news* being about what people say. Not about "what is really happening" - because, turns out journalists don't have any direct hotline to Ultimate Truth. All they can report is what they're told. Anytime we ask or expect them to do more than that, we're asking them to make stuff up.
Your prejudice is showing. The Mail has some very high calibre journalists.
It also has an unspeakably horrible editorial spin. But there's nothing wrong with the quality of the research or writing - just the choices of what to report and how to spin it.
But there's a difference between "misleading" and "false". Read the Mail's stories for what they actually say, not what they imply or hint or "try to lead you to conclude", and you can get a lot out of it. (In that regard it's actually better than the Independent nowadays. The Indy has some great writers, but its editorial spin is just as shameless as the Mail's, and it doesn't have anything like the resources the Mail has to check its facts.)
Written language (especially English!) is entirely too flexible for a mere computer to figure out. See such (t)witticisms as "Ode to a Spell Checker" for one way to completely balls up an AI bot that most readers of the press wouldn't even realize was an issue. There are many more.
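The spell-checker problem is easy to demonstrate: a checker that only asks "is this a dictionary word?" happily passes homophone substitutions. A minimal Python sketch, using a hypothetical toy word list and a line written in the style of the poem rather than quoted from it:

```python
import re

# Hypothetical toy dictionary; a real checker would load tens of thousands of words.
DICTIONARY = {
    "eye", "halve", "a", "spelling", "chequer",
    "it", "came", "with", "my", "pea", "sea",
}


def misspelled(text: str) -> list[str]:
    """Return the words not found in the dictionary (all a naive checker can do)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in DICTIONARY]


print(misspelled("Eye halve a spelling chequer, it came with my pea sea"))  # []
print(misspelled("Eye halve a speling chequer"))  # ['speling']
```

Every word is "correct", so the nonsense sails through; only the genuine typo is caught.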
But as long as we don't deviate from the training dataset, these jokers will be able to spend the grant money, having played the game as it was written ...
I'm curious as to amfM's take on the subject ;-)
@jake - many thanks, I hadn't encountered 'Ode to a Spell-checker' before - delightful!
I also wonder how the headline-checking project will cope with humour, given that even real, live humans, such as us commentards* can't always detect the sarcasm and irony in the occasional headline here at El Reg's Hallowed Halls of textual information for the technologically relatively literate.
*Well, most of us, anyway. And not forgetting Amanfrommars whom I personally put in a category all of their own.
.... into Overturning Crooked Tables.
RT.com and Ruptly.tv are currently ahead of the mainstream in media and IT in the Game, and also way ahead of moribund terrestrially tied intelligence agencies in the Greater IntelAIgents Game with their overarching meme, which eventually always outs the fool on an errant mission ..... Question more
The Wild Wacky West's daily touted angst and paranoia cited against ye olde cold war Russian threat vector is proof positive of the veracity of the statement.
What would cause the West to behave as a retard with idiotic introspection paraded for all with virgin intelligence to see and scoff at? What catastrophic systemic weakness have they discovered they have covered and cannot further hope to conceal with their failing command and control of made freely available information and future advanced intelligence streams?
Share that discovery and the whole world changes practically overnight, or, and this is another attractive possibility and therefore real probability for exercising, agree not to share the secret and earn yourself an instant billionaire fortune.
Hmmm? Decisions, decisions!
Some would tell you that Henry Ford had the system sussed and a chronic pervasive enemy for sustained destructive attack identified ages ago ....... "It is well enough that people of the nation do not understand our banking and monetary system, for if they did, I believe there would be a revolution before tomorrow morning."
Sell yourself cheap and you are a hooker in an area and era where courtesans rule and give pleasure with universal treasure.
With one able to spend billions is there nothing that cannot be done with a little help from one's friends and families. The problem that is evident is the world is full of impotent idiotic fools unable to activate change with remotely provided bounty.
I’m of the Advanced Autonomous Active HyperRadioProActive MindSet of Ye Olde School, AC. It is much more rewarding and entertaining …. "Surplus wealth is a sacred trust which its possessor is bound to administer in his lifetime for the good of the community. The man who dies rich, dies disgraced." ….. Andrew Carnegie
There is nothing new to see here. How about this, for instance? (From 1898).
For some background, see https://en.wikiquote.org/wiki/Yellow_journalism
As long as people have been able to write (or even speak) they have told lies about one another, often with their own advantage in mind.
I get the feeling that regardless of the scientists' motives here, any algorithm touted as a "fake news detector" will be used primarily by zealots to put down stories and ideas they oppose as "Proven to be false by computer analysis," just as the Global Warming proponents back in the nineties used computer simulations to "prove" catastrophic warming was just around the corner.
Of course the fake news detector might need a little tweaking to achieve the correct results, just like with the climate sims.
(subject: trump) AND (sentiment: positive) :- FAKE
(subject: facebook OR google OR artificial intelligence) AND (sentiment: negative) :- FAKE
(headline contains "top #") :- LISTICLE
(headline contains "# reasons") :- LISTICLE
(LISTICLE) AND (# == count(section-heading startswith "#.")) :- LEGIT
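Taken literally, the listicle rules can actually be made to run. The Python below covers only the last three rules; it assumes "#" means a number in the headline and that section headings arrive as plain strings such as "1. Speed". The function and pattern names are made up for illustration.

```python
import re
from typing import Optional

# The two LISTICLE rules: "top #" and "# reasons", where # is a number.
LISTICLE_PATTERNS = [
    re.compile(r"\btop\s+(\d+)\b", re.IGNORECASE),
    re.compile(r"\b(\d+)\s+reasons\b", re.IGNORECASE),
]
NUMBERED_HEADING = re.compile(r"\d+\.")  # section heading starting with "#."


def listicle_count(headline: str) -> Optional[int]:
    """Return the number the headline promises, or None if it isn't a listicle."""
    for pattern in LISTICLE_PATTERNS:
        match = pattern.search(headline)
        if match:
            return int(match.group(1))
    return None


def is_legit_listicle(headline: str, section_headings: list[str]) -> bool:
    """LEGIT only if the promised count matches the numbered section headings."""
    promised = listicle_count(headline)
    if promised is None:
        return False  # not a LISTICLE, so the LEGIT rule never fires
    delivered = sum(1 for h in section_headings if NUMBERED_HEADING.match(h.strip()))
    return promised == delivered


print(is_legit_listicle("Top 3 gadgets of 2017", ["1. Phone", "2. Watch", "3. Drone"]))  # True
print(is_legit_listicle("7 reasons to panic", ["1. Brexit", "2. Printers"]))  # False
```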
But let's start with Betteridge's Law and remove all headlines ending with a question mark. Follow up with all headlines that use the Clickbait algorithm:
"You won't believe how...", "50 people who...", "10 ways to..."
That kind of thing. Then we can at least give the fake news killer AI a healthy diet to ingest.
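The proposed pre-filter is simple enough to write down. A rough Python sketch, assuming the three clickbait openers above are the only patterns (a real list would be much longer) and treating any headline ending in "?" as a Betteridge candidate. The function names are hypothetical.

```python
import re

# The three example clickbait phrases from the comment; a real list would be longer.
CLICKBAIT = [
    re.compile(r"\byou won['’]t believe how\b", re.IGNORECASE),
    re.compile(r"\b\d+ people who\b", re.IGNORECASE),
    re.compile(r"\b\d+ ways to\b", re.IGNORECASE),
]


def keep_headline(headline: str) -> bool:
    h = headline.strip()
    if h.endswith("?"):
        return False  # Betteridge's Law: the answer is "no", so drop it
    return not any(p.search(h) for p in CLICKBAIT)


def healthy_diet(headlines: list[str]) -> list[str]:
    """Filter a headline list before it is fed to the fake-news model."""
    return [h for h in headlines if keep_headline(h)]


print(healthy_diet([
    "Can AI spot fake news?",
    "You won't believe how this AI works",
    "10 ways to fool a fake news detector",
    "Boffins launch Fake News Challenge",
]))  # ['Boffins launch Fake News Challenge']
```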
Biting the hand that feeds IT © 1998–2021