"... misplaced apostrophe's, ..."
I see what you did there and in other places. Are you also a grocer?
Pedants, imagine how much more relaxed your life would be if artificial intelligence automatically corrected grammar mistake's in online forum and social network posts. Never again would you explode with frustration and anger over misplaced apostrophe's, commas full stop's and exclamation! marks! The faults could be fixed up …
Any kind of playful use of language will be translated into AI-generated "Newspeak" and turgid prose will come out of the computer-controlled word sausage machine.
Why not simply develop an AI to write the text in the first place? It's probably easier than trying to fix language it cannot really appreciate.
I look forward to the AI art critic suggesting "It's just a pile of bricks..."
Why not simply develop an AI to write the text in the first place?
Done long ago. One well-known example is Phillip Parker's patented book-generation system, which has been used to create hundreds of thousands of books on specialty topics. Which, yes, he sells, and apparently makes quite a lot of money from.
Generating usable natural-language prose is actually quite easy, just like generating passable music (algorithmic generation of classical and jazz music good enough to fool expert judges has been demonstrated for decades). Creating writing that's stylistically interesting, and generating new ideas on a subject, are somewhat more difficult challenges.
In any case, the point of Shan's system, and others like it, isn't to fix broken prose. It's to attempt to add punctuation to text streams that lack it, such as ASR (speech-to-text) output, to make it easier to parse correctly. This was in the article.
"At the moment, it can only deal with commas and full stops, the most common and easiest of English's punctuation marks."
If they were that easy how come so many people do without them writing enormous walls of text without so much as a pause as if their taking one deep breath and just letting out a single massive belch of their stream of consciousness ooh look a cat video ?
Written language (especially kludge known as "English"!) is entirely too flexible for a mere computer to figure out. See such (t)witticisms as "Ode to a Spell Checker" for one way to completely balls-up an AI-bot that most readers wouldn't even realize was an issue. There are many more.
For the curious:
Ode to the Spell Checker
Eye halve a spelling chequer
It came with my pea sea
It plainly marques four my revue
Miss steaks eye kin knot sea.
Eye strike a key and type a word
And weight four it two say
Weather eye am wrong oar write
It shows me strait a weigh.
As soon as a mist ache is maid
It nose bee fore two long
And eye can put the error rite
Its rare lea ever wrong.
Eye have run this poem threw it
I am shore your pleased two no
Its letter perfect awl the weigh
My chequer tolled me sew.
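For anyone who wants to see why the poem sails through, here is a minimal sketch of a dictionary-lookup checker. The `WORD_LIST` vocabulary and the `misspelled` helper are made up for illustration and are not any real spell checker's API; real checkers use far larger word lists, but the blind spot is the same: every word is validated in isolation, so a correctly spelled wrong word is never flagged.

```python
# Toy dictionary-lookup spell checker (illustrative assumption, not a real product's logic).
# WORD_LIST is a tiny hypothetical vocabulary covering the poem's first two lines.
WORD_LIST = {
    "eye", "halve", "a", "spelling", "chequer",
    "it", "came", "with", "my", "pea", "sea",
}


def misspelled(text):
    """Return the words in `text` that are not found in WORD_LIST."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    return [w for w in words if w and w not in WORD_LIST]


# Every word is a real word, so nothing is flagged despite the nonsense.
print(misspelled("Eye halve a spelling chequer"))
print(misspelled("It came with my pea sea"))

# A genuine typo is caught, because "hav" is not in the word list.
print(misspelled("Eye hav a spelling chequer"))
```

Catching "pea sea" would need context (neighbouring words at the very least), which is a different problem from looking words up one at a time.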
"Eye halve a spelling chequer [...]"
Living in a small village near Stockholm was a good place to learn Swedish - which I did mostly by reading Asterix the Gaul. That gave me a fairly good grasp of everyday usage - but did little for my pronunciation.
One day I went into the bakery shop and used my new skills to ask for my favourite cake - a long pastry crusted with nuts. "En av den där nötter kakor, tack". (one of those nut cakes please).
I knew that "den" was locally pronounced as "dom" - but wasn't sure about "av" so tried my best guess.
She picked the cake up - good - and then started to cut it in half!
My mistake was to pronounce "av" sounding like "halv" (=half) - rather than the same sound as the English "of". Presumably there was a prior context for customers only wanting a half of that cake.
"Living in a small village near Stockholm was a good place to learn Swedish - which I did mostly by reading Asterix the Gaul. That gave me a fairly good grasp of everyday usage - but did little for my pronunciation."
I have Esperanto translations of Asterix (er Asteriks I mean) books for that same reason. Though apparently my pronunciation is perfect. That's why they moved me to the advanced Esperanto class, so they could all listen to my pronunciation in awe, despite the fact I had no idea what it was I was saying. Which is why I left those classes, it wasn't teaching me anything. I later found out I pronounce Esperanto with the same thick Aussie accent I pronounce English with, just all the other Aussie Esperanto students and teachers didn't notice.
"I later found out I pronounce Esperanto with the same thick Aussie accent I pronounce English with, [...]"
My Swedish colleagues in the Stockholm office said my accent was good - like a native of Gothenburg. They then explained that the Gothenburg Swedish accent is equivalent to the Scouse accent in English.
Written language (especially kludge known as "English"!) is entirely too flexible for a mere computer to figure out
So are you claiming human beings rely on something formally more powerful than a Turing machine to interpret language? What might that be?
Of course this is a long-standing debate. Searle, though he argued forcefully against one particular approach ("symbolic manipulation") to strong AI with his Chinese Room thought experiment, believed that the human mind was a mechanical effect, and therefore that someday, assuming continued progress, we would eventually have machines that were human-mind-equivalent. Penrose believes otherwise, and thinks human minds are formally more powerful. There are many others on both sides.
Oxford comma. Who knew what old Mr. Mandela's hobby was?
"By train, plane and sedan chair, Peter Ustinov retraces a journey made by Mark Twain a century ago. The highlights of his global tour include encounters with Nelson Mandela, an 800-year-old demigod and a dildo collector." -- The Times
Was "shoots, and leaves" an Oxford comma? It was merely a pause - not a comma-separated list of items.
I'm afraid you're wrong; it was indeed a serial comma. The series in question is three verbs in a compound predicate. (They could also be described as three clauses, the latter two abbreviated. It comes down to the same thing.)
Truss's eats-shoots-leaves example isn't actually much of an argument for or against the serial ("Oxford") comma, because the comma that's important for distinguishing the sense of the two constructions is the one between "eats" and "shoots". The second comma is largely irrelevant to interpretation.¹
The Ustinov-Mandela example someone else quoted above is a better one. In general, the serial comma really pulls its weight in cases like this, where it helps the reader distinguish between a series on the one hand, and an appositive or parenthetical phrase on the other.²
The interesting thing, to me, about the serial-comma war is that it cuts across the lines of the other Great Comma War, between the "naturalists" and the "scientifics".³ The former want English punctuation to reflect style, pacing, and often the rhythm of speech. The latter want it to conform to some sort of grammatical principles: this construction calls for a comma, and that one does not.
You might think that the scientifics would endorse the serial comma, say, because it can clarify an ambiguous phrase. But it seems plenty of them simply classify it as "unnecessary" and therefore undesirable. And similarly the naturalists are divided between those who abhor it as an ugly interruption, and those who feel its omission is lazy and jarring.
And then there's the ongoing fight over comma typography, specifically whether commas should be moved within closing quotation marks, in the style still preferred by many US copy-editors, or left unmolested when they aren't part of the quotation. It's a holdover from the days of lead type, and now pointless, but habits die hard.
¹ Which makes it no less contentious, of course, since proponents and opponents are perfectly happy to wage this war over questions of style, euphony, and consistency.
² Alas, a great many writers have trouble with appositives in general, or rather with treating adjectival phrases as appositives. I particularly note this when people put unnecessary commas after job titles: "Department chair, Bob Smith, said...". Those commas are not preferred and serve no purpose - "department chair" is an adjectival noun phrase preceding the compound noun it modifies. Now, if the phrase were "The department chair, Bob Smith, said..." then "Bob Smith" is an appositive phrase, and it is customary to set those off with commas. It's an appositive because "the department chair" is a complete noun phrase on its own. Really, it's not hard.
³ I'm ignoring the war between descriptivists and prescriptivists, because the latter are patently wrong and there's no point in discussing that further.
It should then be trained to assassinate anyone who follows "but" with a comma.
I was tempted to agree, but, on reflection, it occurred to me that there are many cases where the clause introduced by "but" will begin with some type of phrase that is traditionally set off with commas, such as an adverbial.
That said, there is a nasty tendency among some writers these days to move the comma that traditionally appeared before a coordinating conjunction (such as "but") to after it, and this should be greeted with scorn and derision.
My experience of reading junior engineers' English was of seeing clause after clause separated by commas, with only the occasional full stop. No other type of punctuation mark.
It was English written as it is spoken, but often with very limited vocabulary. No concept that writing is a more formal performance.
One of them told me once that I was the first person who had ever gone through their writing to point out the mistakes. This from a person in their early twenties.
The The. Yep!
"It was when Johnson and Christopherson flew to South America to film the videos for "Infected" and "Mercy Beat" that events started to spiral out of control. Filming in the Peruvian jungle in Iquitos, Johnson used the services of a local Indian tribe as guides. The Indians introduced Johnson, already an enthusiastic user of drugs, to the hallucinogenic concoctions used in their tribal rituals. The video for "Mercy Beat" captures a scene where during filming the crew were attacked by a rally of Communist rebel fighters, angry at the appearance of what they considered Western intruders. Johnson confirmed that the scene was genuine and unscripted, and admitted that at the time he was "so high", recalling the madness that had ensued: "Someone produced a snake which I was grappling with, and I hate snakes. A monkey bit me, and then me and this guy, who I'd only just met, cut each other and we became blood brothers, rubbing blood over each other's face, stuff like that." "
For this to really be useful, it needs to be built into those smart speakers they have these days. Then whip up a bit of code to have it scan the audio from my TV and shout 'That's *fewer* goals, you fucking moron! FEWER!' at appropriate times while I sit and relax on a Saturday afternoon. That would save me a lot of work.
The rejection of "less" for discrete quantities is a relatively recent trend - it started in the eighteenth century. Before then "fewer" and "less" were used more or less interchangeably. But then many of our contemporary shibboleths were introduced by the Augustans in the eighteenth century, typically based on rules they derived from classical languages (such as the prohibition on "split infinitives") or etymology (as is the case with fewer/less).
Many of these probably won't survive much longer. Usage restrictions that don't affect interpretation are typically preserved as class markers, and spoken and written language seems to be falling out of favor as a category of class distinction.
Muphry's Law, early 1990s. There are several variations on the theme (Skitt's law, Bell's first law of Usenet, etc.). It has almost always been considered an obligation to include an error of your own when correcting another's post on Usenet, Fido, email lists & etc.
Will this thing be trained to do US "English" or the English used by just about everyone else?
People have already mentioned the Oxford Comma, which my, Oxford educated, English teacher taught us not to do. I suspect that there are other variations from correct English that may end up in the system if this system is set to use LeftPondian language.
They do seem to be keen on the comma splice and an unhealthy quantity of exclamation marks. What else I wonder...
What Spanner said.
And how is it going to cope when it starts coming across things like quotations in foreign languages? Convert them to English even if the whole point of it being there is to inform the reader that the speaker *might not* be an honest, deity-fearing English Gentleman but could be one of them damn furriners in disguise...
It's bad enough I have to suffer spiel chuckers that try to change every spelling to Leftpondian without them starting on grammar and sentence structure too.
Here's a really radical idea - how about teaching children how to do this properly at school, rather than filling their heads with trendy nonsense like phonetics to teach spelling.
how about teaching children how to do this properly at school, rather than filling their heads with trendy nonsense like phonetics to teach spelling.
In the '50s I learned to read using phonics. Since then trendy methods have come and gone, and we are back to phonics, which was used to teach my grandchildren how to read. You do learn the differences in spelling words like "F"arming and "PH"onetics (how do you spell "ff"?).
I had a colleague a few years ago who was (mis)taught to read using the Initial Teaching Alphabet. He admitted that even in his forties he had difficulty reading.
There are some 300 million Americans and 60 million British. So for obvious commercial reasons the American version will be developed first.
At a seminar a questioner asked about American attitudes to British English. An answerer said there are two conflicting attitudes: first a feeling that British English is something special; and secondly that most British people use very poor English.
We already have built-in grammar checkers that totally stuff up things like Word by forcing everything into American English.
While we're at it, they're not "grammar checkers". They're systems that apply a bunch of mostly-inappropriate heuristics, nearly all concerned with usage, mechanics, diction, and other things which are not grammar, to prose which has been mechanically chunked but not actually parsed.
Those things may help marginal writers massage prose into something closer to someone's idea of preferred form, but they are by no means a substitute for learning to write well.
And they have nothing whatsoever to do with the system described in this article. Really, I can't imagine how the OP thought they're at all relevant.
So how would the "AI" be able to determine the original intent?
Thus, I call (partial) "AI BS" on this one.
But I'll grant that it could work most of the time, only occasionally causing a World War due to an unexpected 'AI correction' in a treaty.
Eats shoots and leaves. = Panda
Eats, shoots, and leaves. = Clint Eastwood
So how would the "AI" be able to determine the original intent?
Thus, I call (partial) "AI BS" on this one.
Argh. Did you read the article?
The point of punctuation-replacement systems, such as this one, is to take text streams that lack punctuation (such as from ASR) and attempt to inject appropriate punctuation to improve parsing.
"Determin[ing] the original intent" isn't the goal. Yes, there are ambiguous phrases in natural languages. Every single person who works on natural language processing is aware of that, and the many people who commented on this article to point it out may congratulate themselves on having made what might be the most obvious point conceivable.
A punctuation-replacement system is a model-based transformation. That model could, in principle, be extremely sophisticated. It could build competing parse trees and select among them based on sentiment, metadata, rhetorical structure analysis, or other secondary features. It could keep whole-document context. It could use a world model to determine probable meaning of text segments. Or it could just be something like an LSTM network (or even something simpler) trained on a large corpus.
But it won't "determine the original intent", any more than human readers do. That's the intentional fallacy. Writers (or speakers) don't transmit their intent through language to readers (or listeners). Readers construct interpretations, which will correspond to some degree with the writer's interpretation.
And there's no "BS" here. Shan claims the system injects punctuation with an F1 of around 0.7, if my memory of the article serves (I'm not going back to check because the details don't really matter). That's the claim: it's a specific one, about the measured output of the system run against a particular set of inputs, compared to the ideal output.
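For readers unfamiliar with the metric, here is a minimal sketch of how an F1 score for injected punctuation could be computed. It uses the standard definition (harmonic mean of precision and recall); how Shan's system was actually scored is not stated here, so the `(token_index, mark)` representation, the `punctuation_f1` name, and the sample data are all assumptions for illustration.

```python
def punctuation_f1(predicted, reference):
    """F1 over sets of (token_index, mark) pairs.

    A prediction counts as correct only if the same mark appears
    after the same token in the reference (ideal) output.
    """
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of injected marks that are right
    recall = true_positives / len(reference)     # share of ideal marks that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: the system places one comma after the wrong token.
reference = {(2, ","), (5, "."), (9, ","), (12, ".")}
predicted = {(2, ","), (5, "."), (8, ","), (12, ".")}

print(punctuation_f1(predicted, reference))  # 0.75
```

The number only says how closely the output matches one reference punctuation of one test set. It makes no claim about recovering anybody's intent.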
Biting the hand that feeds IT © 1998–2020