New cunning linguist computer has got ancient tongues licked

Boffins have put together a new computer system that attempts to translate protolanguages, the ancient "parent" tongues from which modern languages evolved. The sophisticated Rosetta Stone-like system can quickly reconstruct the languages of yore from today's vocabularies with 85 per cent accuracy, we're told. The system's …


This topic is closed for new posts.
  1. Dr Who

    Cunning Linguist

    The old ones are the good ones!

    1. Anonymous Coward
      Paris Hilton

      Re: Cunning Linguist

      Yeah, but where's the comment about spoken language being composed of simple sounds like ooo, ahhh, mmmm and others....

      1. Anonymous Coward

        Re: Cunning Linguist

        That's pretty much the entire vocabulary of Dorset..

        Oooh Aaah, Tra'or Comboin 'Aaarster

    2. Frumious Bandersnatch

      Re: Cunning Linguist

      The old ones are the good ones!

      I'm so glad that the article wasn't about a really clever bunch of pygmies. Thank Heaven for small mercies, I say.

    3. M7S

      Re: Cunning Linguist - The old ones are the good ones

      Really? like C'thulhu?

    4. Mips

      Re: Cunning Linguist

      I thought the trick was with "cunning stunt".

  2. This post has been deleted by its author

    1. Anonymous Coward 101

      Re: Hmmm

      I'm sure they will have based their computer model on known protolanguages - is Latin really the only one? - and known daughter languages, and assumed that the process is the same for all languages. Languages do reliably change over time and linguists can categorise languages rather as biologists can categorise a particular species.

      1. Mage Silver badge

        With the exception of Latin?

        Try also Hebrew (up to 4500 years), a couple of types of Cuneiform/clay tablet stuff (Babylonian and pre-Sumerian, up to maybe 7,000 for earliest), two sorts of Egyptian (one of which in a different script turns out to be preserved by the Copts), Greek/Linear B (up to 3,200 years ago?), and maybe something Scandinavian? Loads of Linear A clay tablets (3,800 years ago) but no-one has any idea what they say.

        I think this tool is a good development. If it helps crack Linear A I'll be impressed.

        There are also Chinese, Sanskrit (and other Indus valley/Indian?) Korean and Thai related stuff, no idea how old but the Chinese I believe had paper, gunpowder, Pasta and a Monetary system based on Government guarantee rather than rare shiny gems and metal when Europe was in Wattle and Mud huts using physical cows as currency, if at all.

        South & Central America?

        There is more really old stuff preserved because when you sack a city or library using clay tablets it makes them more durable. Parchment, Vellum, Paper and Papyrus was rather lacking in archival qualities in comparison (but likely can beat DVD & Tape backups by a thousand years or two). Doing the Library of Alexandria without a "backup" plan was stunningly bad. They WOULD have known the need for Off Site backups, but if they existed/exist it's a well kept secret.

        Mine's the one with the "Story of Writing" in the pocket.

        1. wowfood

          Re: With the exception of Latin?

          (Babylonian and pre-Sumerian, up to maybe 7,000 for earliest),

          But the earth was only created 6000 years ago.

          1. Fatman

            Re: But the earth was only created 6000 years ago.

            I told you to stop reading fairy tales!!

          2. Anonymous Coward

            Re: the earth was only created 6000 years ago.

            I thought it was 4,500.

            I'm sure that's what I was taught 50 years ago.

          3. Euripides Pants

            Re: wowfood @14:12

            They came here on the Giant Space Ark.

        2. Fibbles

          Re: With the exception of Latin?

          Knowing what the markings on clay tablets mean is not the same as knowing how they're spoken. Languages tend to evolve by being spoken. Latin is used as the example because it is one of the few languages where we have vast collections of material detailing its evolution and we still know how to speak it.

        3. Anonymous Coward

          Re: With the exception of Latin?

          Don't forget the "Click languages" of the Bushmen.

  3. Robert Helpmann??

    Tongues untied

    The sophisticated Rosetta Stone-like system can quickly reconstruct the languages of yore from today's vocabularies with 85 per cent accuracy, beating human linguists' painstaking manual reconstruction from the words we all know and use.

    I would be interested to see the actual paper, when it is published if only to see the breakdown and analysis on which this statement is based. I cannot imagine that linguists do too much "manual" work in this area as there are software tools for almost every intellectual pursuit. This looks to be useful as a way to speed research along, but not to replace the human element in much the same way translation apps can get you pointed in the right direction, but still produce occasional howlers. I would guess that it will still be necessary to review 100% of the output to weed out and correct the 15% wrong, and to verify the 85% correct. It is unlikely to speed the process by 85%.

    1. vonBureck

      Re: Tongues untied

      > I cannot imagine that linguists do too much "manual" work in this area as there are software tools for almost every intellectual pursuit.

      Reconstructing ancient languages requires linguists to compare vocabulary items (including related words, homonyms etc.) for multiple languages and multiple time periods, so there's actually a huge amount of "manual" effort involved. Also, consider that people tend to specialise in selected languages or language families and might not have the resources to research multiple other languages, so it seems like a natural use for computers. As long as they manage to digitise and properly process all the required data, retrieving the entire historical evolution of more or less any word in any documented language (which you don't even need to be familiar with), presumably with cross-linking and related items, will be a huge aid to dictionary-wielding researchers.

      Also, bear in mind that all such reconstruction results are ultimately hypothetical, so the more languages can be referenced, the more plausible the end result. In any case, exciting stuff for the cunning ones.

    2. James Micallef Silver badge

      Re: Tongues untied

      " study changes in pronunciation, among other techniques"

      Genuine question - how can they know anything at all about how words were pronounced hundred+ years ago (any time before development of phonograph) simply from the written text?

      1. P. Lee

        Re: Tongues untied

        I'm not sure, but the article references "proto-languages," so I suspect they are making links between languages where we have no evidence. So, language X has some similarities to language Y so we'll see if we can interpolate a language between them (on the assumption they had a common ancestor).

        1. Rich.Alderson

          Protolanguage definition and example [was Re: Tongues untied]

          The word "protolanguage" is a term of art in historical linguistics, in use in English for more than a century. It has a precise definition:

          A protolanguage is the reconstruction of a prior stage of a group of related languages, providing prototype forms from which cognates in the different languages of the group can be derived by the application of regular rules of change. The protolanguage is constructed recursively, with more rules being added to the set of known changes as more data is added (from further investigations among the known languages, or from the addition of previously unrecognized or unknown members of the group.)

          To provide a concrete example: It was recognized in the 18th Century (codified in a 1786 statement by Sir William Jones) that Sanskrit was clearly related to Greek and Latin; Jones noted the probability that Celtic, Germanic and Persian were also related. Early in the 19th Century, Jacob Grimm noted the set of regular correspondences that confirm the Germanic relationship. (We call this set "Grimm's Law".)

          In the 19th Century, Baltic and Slavic were added to the list early, with the recognition that Albanian and Armenian were also part of the group coming later. In 1876, Danish linguist Karl Verner explained some apparent exceptions to Grimm's Law as correlating with the Sanskrit accent. In 1879, a Swiss linguist named Ferdinand de Saussure posited some consonants in the reconstructed protolanguage based only on indirect evidence in the attested daughters.

          In the very early 20th Century, 2 languages of Chinese Turkestan (as the area was then known) were discovered in Buddhist scriptures in caves. They were written in a script derived from an old Indian source, and were determined to be members of the Indo-European group we have been discussing.

          In 1917, a Czech linguist, Friedrich (Bedřich) Hrozný, published a monograph demonstrating that a language from central Turkey, written in cuneiform and intermixed with Sumerian and Akkadian signs (not unlike the use of Chinese characters to write Japanese), was another unknown Indo-European language which we call Hittite. This claim was confirmed in 1927 by a Polish linguist, Jerzy Kuryłowicz, who pointed out that certain h-like consonants in the Hittite data occupied the places that Saussure had hypothesized in 1879, nearly 50 years earlier.

          The last 80 years in Indo-European studies have, to a large extent, been a reassessment of the previous 150 years' research based on these discoveries, with deniers as well as accepters of the changes required by the data. The same techniques have been applied to dozens of language families, large (Afroasiatic, Austronesian) and small (Muskogean, Miwokan). One experiment was to take the data of the modern Romance languages and reconstruct a protolanguage for them, which turns out to be very similar to but more importantly not exactly the same as Latin, a proof that there are limits to as well as benefits from our reconstruction techniques.

      2. breakfast Silver badge

        Re: Tongues untied

        One way that is used sometimes is through poetry - if you look at pieces of Shakespeare where there is a strict rhyming scheme you'll notice that some words just don't rhyme. Given that Shakespeare knew what he was doing when he was writing poetry, there's a good chance those words did rhyme as they were pronounced at the time, though which way the pronunciation has changed may be ambiguous.

        Similarly a pune ( or play on words ) will tend to work with homonyms, so when writers of the past are playful about language they can hand useful titbits to linguists of the future.

      3. That Awful Puppy

        Re: Tongues untied

        There are certain astonishingly accurate and universal laws that govern the changes in pronunciation through time. My flabber was well and truly gasted when I first stumbled upon this concept, but they do seem to work well. Google Grimm's law, and I'm sure you'll be amazed.
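
        For anyone who wants to poke at Grimm's law, here is a minimal Python sketch. It is heavily simplified: it only looks at word-initial stops, it uses Latin as a stand-in for the older stage (on the assumption that Latin kept p, t, k, d and g where Germanic shifted them), and it checks modern English spelling rather than actual sounds. The names `GRIMM_INITIAL`, `expected_onset` and `fits_grimm` are made up for illustration and have nothing to do with the system in the article.

```python
from typing import Optional

# Simplified Grimm's law, word-initial stops only.
# Assumption: Latin spelling stands in for the older consonant
# (Latin "c" is /k/; English "th" stands for the older thorn sound).
GRIMM_INITIAL = {"p": "f", "t": "th", "c": "h", "d": "t", "g": "k"}

# Well-known Latin/English cognate pairs.
COGNATES = [
    ("pater", "father"),
    ("pes", "foot"),
    ("tres", "three"),
    ("cornu", "horn"),
    ("decem", "ten"),
    ("genu", "knee"),
]


def expected_onset(latin_word: str) -> Optional[str]:
    """Return the English onset the simplified rule predicts, or None."""
    return GRIMM_INITIAL.get(latin_word[0])


def fits_grimm(latin_word: str, english_word: str) -> bool:
    """True if the English word starts with the predicted onset."""
    onset = expected_onset(latin_word)
    return onset is not None and english_word.startswith(onset)


for latin, english in COGNATES:
    print(latin, "->", english, "| predicted onset:", expected_onset(latin),
          "| fits:", fits_grimm(latin, english))
```

        Real correspondences are stated over sounds, not letters, and have conditioned exceptions (Verner's law being the famous one), so treat this as a toy.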

        1. Robert Helpmann??

          Re: Tongues untied

          Another method is to compare isolated pockets of a given language. As the main population moves on, the remote group tends to remain somewhat locked in place, at least for a while. See, for example, the many discussions concerning "American" versus "English" languages. I can think of several apt comparisons to biological evolution, including the way this tool might work in reconstructing a language from modern remnants.

      4. David Cantrell

        Re: Tongues untied

        We can know how Latin was pronounced by reading descriptions in Latin of how it's pronounced, and can also get useful information from poetry - some pronunciations work better than others to fit the metre, for example.

    3. Rich.Alderson

      Re: Tongues untied

      The paper was published in the Proceedings of the National Academy of Sciences. The Register does not allow URLs, so replace the obvious words below with the appropriate punctuation marks:

      http colon slash slash slash content slash early slash 2013/02/05 slash 1204678110

      for the abstract and a pointer to the complete PDF.

  4. TeeCee Gold badge

    I just like to offer.... heartiest contrafribularites at the completion of this noble work.

    1. Chris007

      Re: I just like to offer....

      I am anaspeptic, frasmotic and compunctuous at your wonderful reply.

      1. Frumious Bandersnatch

        Re: I just like to offer....

        Damn! My pandigestory interlude just evacuated my nose. You owe me a new keyboard, sir!

    2. Helena Handcart

      Re: I just like to offer....


    3. This post has been deleted by its author

  5. Anonymous Coward

    And is...

    ...this usable on the Winter Queen?


    Oh good!

    1. This post has been deleted by its author

    2. Anonymous Coward

      Re: And is...

      I fear few will get your reference to the Oglaf Snow Queen :)

      (for those looking this up, be warned that it is very much Not Suitable For Work but worth it :) ).

      1. Anonymous Coward

        Re: And is...

        Deth'nitely wuth it.

  6. frank ly

    Can it work the other way round?

    If it 'knows' the processes and patterns by which spoken language changes, can that process be applied to modern English (compared with older and ancient forms) in order to predict how it will be spoken in, say, 300 years time?

    I realise that the modern world has a far greater level of cultural exchange and pressure for language change than any previous natural process. It would be interesting to try it though.

    1. breakfast Silver badge

      Re: Can it work the other way round?

      My expectation was that it doesn't "know" so much as apply statistical analysis to large amounts of data- the Big Data version of knowledge. In that case it may not be as much use for projecting into the future, especially as many changes to the way language is used come from the description or use of new social, technological or environmental conditions. When we can predict those, things will have reached a very curious place.

    2. Grandpa_Tarkin

      Re: Can it work the other way round?

      Yes, it can.

      With the same level of accuracy, which is to say "fairly low".

  7. Andy The Hat Silver badge

    and now look to the future ...

    perhaps you are looking at version 0.01A (alpha release) of the all singing universal translator ... Check the ingredients list for 'just add Google live translate' and you can speak to Mrs Miggins at Ye Olde Pie Shoppe ...

  8. Nigel 11

    Etruscan? Basque?

    Many scholars have tried and failed to crack Etruscan. There's no Rosetta stone. If this program can crack it, then it's a major advance. If not ....

    Have they tested it on Basque, absent any help from a speaker of that language? Again that would be an acid-test. Basque is one of the world's anomalous languages, not related to any other in any known way.

    1. That Awful Puppy

      Re: Etruscan? Basque?

      You're missing the point. This program is not meant to decode the meaning of another language, it's supposed (as far as I can tell, anyway) to simulate the evolution of languages in reverse. This won't work on Basque or Etruscan or other language isolates, because they have no common ancestor, or at least none that is attested.

      It can, on the other hand, simulate the common ancestor of, e.g., the English hound and the German Hund, but do so far more rapidly, if less accurately, than humans, basically helping us to develop a phylogenetic tree for all related languages and even tentatively reconstruct words from related languages that are now extinct.

      Mmmm, Tocharian.
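
      As a very rough illustration of why hound and Hund look like siblings, here is a Python sketch using plain Levenshtein edit distance on spellings. This is a swapped-in, much cruder technique than whatever the researchers actually used, and the function name `levenshtein` and the word list are just for illustration. A small distance is only a hint of relatedness, not proof.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


# Words for "dog" (lower-cased spellings, not phonetic transcriptions).
words = {
    "English": "hound",
    "German": "hund",
    "Dutch": "hond",
    "Latin": "canis",
}

for language, word in words.items():
    print("English hound vs", language, word, "->",
          levenshtein(words["English"], word))
```

      Spelling distance ignores regular sound correspondences entirely, which is exactly the information a proper reconstruction relies on.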

    2. Michael Wojcik Silver badge

      Re: Etruscan? Basque?

      [Basque] is one of the world's anomalous languages, not related to any other in any known way.

      Euskara, the Basque language, may be related to other extinct languages in that part of Europe, such as Iberian; and it may be related to some Northern Caucasian languages such as Chechen. (There's a theory that Euskara and the Northern Caucasian languages are both related to the Na-Dené languages, but that is controversial, to put it mildly. Under this theory, as I understand it, a lost protolanguage in Asia would be a common ancestor for Euskara, the Caucasian members, and the Na-Dené members of the superfamily, via waves of migration at least six thousand years ago.)

      It's not quite accurate to say it's "not related to any other in any known way". It could be a true isolate, but there are various possibilities for distant relatives.

      As others noted, though, Euskara isn't a candidate for the kind of work being discussed in this article. Presumably you could try to use this model to reconstruct a hypothetical protolanguage using Euskara and some of the candidate relatives, but the connections are very tenuous and it's hard to see how the results would be useful. Typically, you want to reconstruct a protolanguage either to understand the development of and relationships among its descendants, or to help in translating another descendant. Neither would apply with a hypothetical protolanguage created from such distant relations.

  9. Swarthy

    Just keep it away from The Crimson Scholars

    Especially Jennicandra.

  10. ukgnome

    But could it transcribe a conversation between a person from Norfolk, a person from Newcastle upon Tyne and a Glaswegian?

    1. Loyal Commenter Silver badge

      Translation into The Queen's English:

      Pardon me my good man, I can't understand you.

      Pardon me my good man, I can't understand you.

      Pardon me my good man, I can't understand you.

      1. Anonymous Coward

        Re: Translation into The Queen's English:

        Said at increased volume and reduced speed each time to assist Johnny Foreigner's understanding

        Mine's the one with a History of Empire in the pocket

  11. BlueGreen

    Not bad

    Thought the comments section would be the usual witless avalanche of rage at dumb boffins who obviously don't know what they're doing, but not at all. What a nice surprise.

    @James Micallef: I'm no linguist but I've read up on it a long while ago. There's been a lot of work done.

  12. thomas k.

    <cough> Prometheus <cough>

    Didn't we already see this in Prometheus? Not that it did poor David much good, as I recall.

  13. Anonymous Coward
    Thumb Down

    The only trouble is...

    whatever info you feed the program, the answer either comes out:-

    One ring to rule them all, one ring to bind them



  14. Anonymous Coward

    Does it do Klingon?

    ... just wondered... ok... leaving now :(

  15. Stevie

    Bah!

    Who cares what ancient languages sounded like? If they were as sonorous or all-round nifty as English we'd still be hearing them. Clearly irrelevant.

    1. Loyal Commenter Silver badge

      Re: Bah!

      Obvious troll is obvious.

  16. Herby

    Ancient??

    Latin ancient?? Not really! The pope's recent declaration of his stepping down was released in Latin, which is the official language of the Holy See.

    Now can they do that with Cobol?

    1. Michael Wojcik Silver badge

      Re: Ancient??

      identification division.
      program-id. papal-resignation.
      author. pontifex.
      procedure division.
          display "I'm stepping down".
          stop run.
      end program papal-resignation.

  17. David Pollard

    Not a single mention of Raquel Welch

    Link for younger readers:

  18. benster
    Thumb Up

    Am I the ONLY one who gets the headline??

    :) cheers to the headline writer.

    1. Anonymous Coward

      Re: Am I the ONLY one who gets the headline??

      No, you're not. Anyone in Middle School would "get" it too, and that's where it should be buried. Absolutely insipid; the headline writer should be fired for making everyone interested in an important anthropological work first endure such a sophomoric indulgence.

    2. Anacolouthon

      Re: Am I the ONLY one who gets the headline??

      The first 2 repliers did, as did some of those below. What also may be underappreciated is that there is a place for humor in the conduct of science, & that linguistic archeology can be a sexy subject. (The truth of the latter assertion was exploited in the Indiana Jones film series.)

  19. gfjmcginnis

    Cunning, Linguist, Licked in the title (Cunnilingus Licked). Really, they didn't catch that when proofing the headline. Looks like someone put one over on the editor. Just too funny.

    1. Fred Flintstone Gold badge

      You must be new here..

  20. snowyphile

    Knowingly Tawdry

    Reviewing the Rolling Stones has different standards than for reviewing articles in PNAS, USA.

  21. bholder

    Your headline writer...

    ...needs to be fired.

  22. clascoutx


    ...try Nauatl on your cunning compu. it's older than Sanskrit/Hebrew, has the oldest Calendar, the Tonalamatl=

    Tonatiuh(N/the sun Anthony)=tonalli(N)=soul,=tone/tune/detonate. used during the upper stone age and

    the Tlaloc Nomad Deer/Mazatl(N/7 Tonalamatl)=maz(OHG)=meat(E)., 45k BCE to 10k BCE.

  23. Jynseng


    All that effort, and it will still only be used to look up rude words...


    Someone give this author an award for Most Artful Use of Puns and Double Entendres!!!

    1. Anacolouthon

      It certainly got some tongues wagging;)

  25. AZRLS

    Cunning linguist + tongues + licked. Really? Did that one slide right past the editors?

    1. Loyal Commenter Silver badge

      Why should it? This is a red-top, not a peer-reviewed journal.

  26. Grandpa_Tarkin

    Never thought about it, but Bayesian analysis should be perfect for this kind of thing. They have a decent model of how languages evolve, and some a priori knowledge, so a Bayesian approach will work a treat.
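
    To make the Bayesian idea concrete, a minimal Python sketch of Bayes' rule over a handful of candidate ancestor forms. Everything here is an assumption for illustration: the candidate forms, the prior and the likelihood numbers are invented, and `posterior` is a hypothetical helper, not anything from the paper.

```python
def posterior(prior: dict, likelihood: dict) -> dict:
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalised.values())
    return {h: value / total for h, value in unnormalised.items()}


# Invented candidate proto-forms with an invented prior (a priori knowledge).
prior = {"*hund-": 0.5, "*kund-": 0.3, "*hond-": 0.2}

# Invented likelihoods: how probable the observed modern words would be
# under some model of sound change, given each candidate ancestor.
likelihood = {"*hund-": 0.6, "*kund-": 0.1, "*hond-": 0.3}

result = posterior(prior, likelihood)
best = max(result, key=result.get)
print("most probable candidate:", best)
```

    In a real system the likelihood would come from the model of how languages evolve, and the space of candidates would be far too large to enumerate like this.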

  27. Crosstown

    Assume code is C++... stunning array of c-c-c-comments...

  28. MezzoForte

    what's being licked here? (interesting title...)

  29. PNP

    Rosetta-stone like? Nonsense

    While the merits of the proposed methods are still unclear - for example, there are quite a few assumptions about cognate identification that may bias the model and that are really only discussed in the supplementary materials - one thing is for sure. This has ab-solute-ly NOTHING to do with Rosetta stone. I have no idea where the author gets that from ...

    PNP (Professor in Statistical Linguistics)

  30. Field Marshal Von Krakenfart

    Reconstructing ancient languages

    So...... It will convert Java into Cobol....

    Mine's an Admiral’s overcoat.

  31. Britt Johnston
    IT Angle

    does the code run on my raspi?

    see title
