* Posts by Francis Tyers

1 publicly visible post • joined 15 May 2007

How Google translates without understanding

Francis Tyers

Some clarifications

>There are plenty of words for which there is not a 1:1 correlation between two languages

Statistical MT models already account for this: look up "fertility statistical machine translation" in Google.
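For anyone who doesn't want to search: in the word-based statistical models, the fertility of a source word is the number of target words it is aligned to, so it can be 0, 1, or more. Here is a rough sketch of the idea in Python. The sentence pair, the hand-written alignment, and the function name `fertilities` are my own illustration, not taken from any particular toolkit.

```python
from collections import Counter

def fertilities(source_tokens, alignment):
    """Return (token, fertility) for each source token.

    `alignment` is a set of (source_index, target_index) pairs,
    written by hand here rather than learned from a corpus.
    """
    counts = Counter(src_index for src_index, _ in alignment)
    return [(token, counts.get(i, 0)) for i, token in enumerate(source_tokens)]

# English -> French: "not" maps to both "ne" and "pas", "do" maps to nothing.
source = ["I", "do", "not", "understand"]
target = ["je", "ne", "comprends", "pas"]
alignment = {(0, 0), (2, 1), (2, 3), (3, 2)}

for token, fertility in fertilities(source, alignment):
    print(token, fertility)
```

In a real system the alignment and the fertility probabilities are estimated from a parallel corpus; the point here is only that the models do not assume a 1:1 mapping.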

> One way to get around this problem is to use multiple reference translations, on the assumption that different human translators may choose different synonyms.

Another way to get around it would be to use METEOR, which has a WordNet synonymy plugin and correlates with human judgement significantly better than BLEU at both the corpus and sentence level.

Of course, once a "standard" has been set, it is difficult to go back and change it.
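To make the synonym point concrete, here is a toy comparison of exact unigram matching against matching that also accepts synonyms. This is not METEOR itself (which also does stemming, word-order penalties and so on), and the `SYNONYMS` table is a hand-made stand-in for WordNet; all the names are hypothetical.

```python
# Hand-made stand-in for a WordNet lookup; an assumption for illustration only.
SYNONYMS = {
    "big": {"large"},
    "large": {"big"},
    "car": {"automobile"},
    "automobile": {"car"},
}

def count_matches(hypothesis, reference, use_synonyms=False):
    """Count hypothesis words that can be paired with a reference word."""
    unmatched_ref = list(reference)
    unmatched_hyp = []
    matches = 0
    # Stage 1: exact matches, each reference word used at most once.
    for word in hypothesis:
        if word in unmatched_ref:
            unmatched_ref.remove(word)
            matches += 1
        else:
            unmatched_hyp.append(word)
    # Stage 2: try synonyms for whatever is still unmatched.
    if use_synonyms:
        for word in unmatched_hyp:
            for ref_word in unmatched_ref:
                if ref_word in SYNONYMS.get(word, set()):
                    unmatched_ref.remove(ref_word)
                    matches += 1
                    break
    return matches

def unigram_precision(hypothesis, reference, use_synonyms=False):
    """Fraction of hypothesis words that found a partner in the reference."""
    if not hypothesis:
        return 0.0
    return count_matches(hypothesis, reference, use_synonyms) / len(hypothesis)

hypothesis = "the large automobile stopped".split()
reference = "the big car stopped".split()

print(unigram_precision(hypothesis, reference))                     # exact only
print(unigram_precision(hypothesis, reference, use_synonyms=True))  # with synonyms
```

With exact matching only "the" and "stopped" count, so a perfectly acceptable translation gets penalised against a single reference; with the synonym stage every word finds a partner.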

> Another interesting feature of the Google 'translator' is that take the title of this article, translate it from English to another language and then back and you'll end up in a completely different place in English.

Round-trip translation is a notoriously terrible way of evaluating machine translation output. Harold Somers (2005) discusses this in "Round-Trip Translation: What Is It Good For?"; strangely, he uses BLEU scores, but the introductory criticism is good.
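One reason round trips mislead, as I understand the argument, is that a good result on the way back says little about the translation in the middle. A deliberately silly sketch: a word-for-word "translator" built from a made-up three-word lexicon (nothing to do with how Google or Apertium actually work) gives a poor Spanish sentence but a perfect round trip.

```python
# Made-up word-for-word lexicon; an assumption for illustration only.
EN_ES = {"i": "yo", "am": "soy", "cold": "frío"}
ES_EN = {es: en for en, es in EN_ES.items()}

def translate(sentence, lexicon):
    """Replace each word using the lexicon, leaving unknown words untouched."""
    return " ".join(lexicon.get(word, word) for word in sentence.lower().split())

source = "i am cold"
forward = translate(source, EN_ES)       # word-for-word Spanish
round_trip = translate(forward, ES_EN)   # back into English

print(forward)
print(round_trip == source)
```

The forward output is "yo soy frío", where a human translator would more likely write "tengo frío", yet the round trip comes back identical to the input. The reverse also happens: a bad round trip does not tell you which of the two legs went wrong.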

And seeing as everyone seems to be plugging their stuff, I might as well plug Apertium (http://apertium.sourceforge.net), a free software / open source machine translation engine and toolbox for closely (and not so closely) related languages. Unlike Google, it is not cagey about how it works (or, in some cases, doesn't) ;)