Cunning Linguist
The old ones are the good ones!
Boffins have put together a new computer system that attempts to translate protolanguages, the ancient "parent" tongues from which modern languages evolved. The sophisticated Rosetta Stone-like system can quickly reconstruct the languages of yore from today's vocabularies with 85 per cent accuracy, we're told. The system's …
I'm sure they will have based their computer model on known protolanguages - is Latin really the only one? - and known daughter languages, and assumed that the process is the same for all languages. Languages do reliably change over time and linguists can categorise languages rather as biologists can categorise a particular species.
Try also Hebrew (up to 4500 years), a couple of types of Cuneiform/clay tablet stuff (Babylonian and pre-Sumerian, up to maybe 7,000 for earliest), two sorts of Egyptian (one of which in a different script turns out to be preserved by the Copts), Greek/Linear B (up to 3,200 years ago?), and maybe something Scandinavian? Loads of Linear A clay tablets (3,800 years ago) but no-one has any idea what they say.
I think this tool is a good development. If it helps crack Linear A I'll be impressed.
There are also Chinese, Sanskrit (and other Indus valley/Indian?) Korean and Thai related stuff, no idea how old but the Chinese I believe had paper, gunpowder, Pasta and a Monetary system based on Government guarantee rather than rare shiny gems and metal when Europe was in Wattle and Mud huts using physical cows as currency, if at all.
South & Central America?
There is more really old stuff preserved because when you sack a city or library using clay tablets it makes them more durable. Parchment, Vellum, Paper and Papyrus were rather lacking in archival qualities in comparison (but likely can beat DVD & Tape backups by a thousand years or two). Doing the Library of Alexandria without a "backup" plan was stunningly bad. They WOULD have known the need for Off Site backups, but if they existed/exist it's a well kept secret.
Mine's the one with the "Story of Writing" in the pocket.
Knowing what the markings on clay tablets mean is not the same as knowing how they're spoken. Languages tend to evolve by being spoken. Latin is used as the example because it is one of the few languages where we have vast collections of material detailing its evolution and we still know how to speak it.
The sophisticated Rosetta Stone-like system can quickly reconstruct the languages of yore from today's vocabularies with 85 per cent accuracy, beating human linguists' painstaking manual reconstruction from the words we all know and use.
I would be interested to see the actual paper when it is published, if only to see the breakdown and analysis on which this statement is based. I cannot imagine that linguists do too much "manual" work in this area as there are software tools for almost every intellectual pursuit. This looks to be useful as a way to speed research along, but not to replace the human element, in much the same way that translation apps can get you pointed in the right direction but still produce occasional howlers. I would guess that it will still be necessary to review 100% of the output to weed out and correct the 15% wrong, and to verify the 85% correct. It is unlikely to speed the process by 85%.
> I cannot imagine that linguists do too much "manual" work in this area as there are software tools for almost every intellectual pursuit.
Reconstructing ancient languages requires linguists to compare vocabulary items (including related words, homonyms etc.) for multiple languages and multiple time periods, so there's actually a huge amount of "manual" effort involved. Also, consider that people tend to specialise in selected languages or language families and might not have the resources to research multiple other languages, so it seems like a natural use for computers. As long as they manage to digitise and properly process all the required data, retrieving the entire historical evolution of more or less any word in any documented language (which you don't even need to be familiar with), presumably with cross-linking and related items, will be a huge aid to dictionary-wielding researchers.
Also, bear in mind that all such reconstruction results are ultimately hypothetical, so the more languages can be referenced, the more plausible the end result. In any case, exciting stuff for the cunning ones.
I'm not sure, but the article references "proto-languages," so I suspect they are making links between languages where we have no evidence. So, language X has some similarities to language Y so we'll see if we can interpolate a language between them (on the assumption they had a common ancestor).
The word "protolanguage" is a term of art in historical linguistics, in use in English for more than a century. It has a precise definition:
A protolanguage is the reconstruction of a prior stage of a group of related languages, providing prototype forms from which cognates in the different languages of the group can be derived by the application of regular rules of change. The protolanguage is constructed recursively, with more rules being added to the set of known changes as more data is added (from further investigations among the known languages, or from the addition of previously unrecognized or unknown members of the group.)
To provide a concrete example: It was recognized in the 18th Century (codified in a 1786 statement by Sir William Jones) that Sanskrit was clearly related to Greek and Latin; Jones noted the probability that Celtic, Germanic and Persian were also related. Early in the 19th Century, Jacob Grimm noted the set of regular correspondences that confirm the Germanic relationship. (We call this set "Grimm's Law".)
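As a rough illustration of what "regular correspondences" means here, the Python sketch below checks a few well-known Latin/English cognate pairs against a heavily simplified, word-initial-only slice of Grimm's Law. The table `INITIAL_SHIFT` and the helper `fits_grimm` are hypothetical names made up for this example. Spelling is used as a crude stand-in for sound, and Latin is assumed to preserve the older consonant. It is not how linguists (or the system in the article) actually work.

```python
# Simplified, word-initial slice of Grimm's Law.
# Key: consonant as spelled in Latin (assumed to preserve the older stop).
# Value: the spelling we expect at the start of the English cognate.
INITIAL_SHIFT = {
    "p": "f",   # pater : father
    "t": "th",  # tres  : three
    "c": "h",   # cornu : horn  (Latin c is pronounced /k/)
    "d": "t",   # decem : ten
    "g": "k",   # genu  : knee
}


def fits_grimm(latin, english):
    """Return True or False if the initials match or break the table,
    or None when the table has no rule for the Latin initial."""
    expected = INITIAL_SHIFT.get(latin[0].lower())
    if expected is None:
        return None
    return english.lower().startswith(expected)


PAIRS = [
    ("pater", "father"),
    ("tres", "three"),
    ("cornu", "horn"),
    ("decem", "ten"),
    ("genu", "knee"),
]

for latin, english in PAIRS:
    print(latin, english, fits_grimm(latin, english))
```

A word borrowed from Latin after the shift, such as "paternal", fails the check (`fits_grimm("pater", "paternal")` is `False`), which is one reason regular correspondences help separate inherited words from loans.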
In the 19th Century, Baltic and Slavic were added to the list early, with the recognition that Albanian and Armenian were also part of the group coming later. In 1876, Danish linguist Karl Verner explained some apparent exceptions to Grimm's Law as correlating with the Sanskrit accent. In 1879, a Swiss linguist named Ferdinand de Saussure posited some consonants in the reconstructed protolanguage based only on indirect evidence in the attested daughters.
In the very early 20th Century, 2 languages of Chinese Turkestan (as the area was then known) were discovered in Buddhist scriptures in caves. They were written in a script derived from an old Indian source, and were determined to be members of the Indo-European group we have been discussing.
In 1917, a Czech linguist, Friedrich (Bedřich) Hrozný, published a monograph demonstrating that a language from central Turkey, written in cuneiform and intermixed with Sumerian and Akkadian signs (not unlike the use of Chinese characters to write Japanese), was another unknown Indo-European language which we call Hittite. This claim was confirmed in 1927 by a Polish linguist, Jerzy Kuryłowicz, who pointed out that certain h-like consonants in the Hittite data occupied the places that Saussure had hypothesized in 1879, nearly 50 years earlier.
The last 80 years in Indo-European studies have, to a large extent, been a reassessment of the previous 150 years' research based on these discoveries, with deniers as well as accepters of the changes required by the data. The same techniques have been applied to dozens of language families, large (Afroasiatic, Austronesian) and small (Muskogean, Miwokan). One experiment was to take the data of the modern Romance languages and reconstruct a protolanguage for them, which turns out to be very similar to but more importantly not exactly the same as Latin, a proof that there are limits to as well as benefits from our reconstruction techniques.
One way that is used sometimes is through poetry - if you look at pieces of Shakespeare where there is a strict rhyming scheme you'll notice that some words just don't rhyme. Given that Shakespeare knew what he was doing when he was writing poetry, there's a good chance those words did rhyme as they were pronounced at the time, though which way the pronunciation has changed may be ambiguous.
Similarly a pune ( or play on words ) will tend to work with homonyms, so when writers of the past are playful about language they can hand useful titbits to linguists of the future.
Another method is to compare isolated pockets of a given language. As the main population moves on, the remote group tends to remain somewhat locked in place, at least for a while. See, for example, the many discussions concerning "American" versus "English" languages. I can think of several apt comparisons to biological evolution, including the way this tool might work in reconstructing a language from modern remnants.
The paper was published in the Proceedings of the National Academy of Sciences. The Register does not allow URLs, so replace the obvious words below with the appropriate punctuation marks:
http colon slash slash www.pnas.org slash content slash early slash 2013/02/05 slash 1204678110
for the abstract and a pointer to the complete PDF.
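For anyone who would rather not do the substitution by hand, here is a minimal Python sketch of it. The function name `unspell_url` is made up for this example, and it assumes the only spelled-out words are "colon" and "slash", separated from the rest by spaces.

```python
# Map each spelled-out word back to its punctuation mark.
SPELLED = {"colon": ":", "slash": "/"}


def unspell_url(text):
    """Swap 'colon'/'slash' tokens for punctuation and drop the spaces."""
    return "".join(SPELLED.get(token, token) for token in text.split())


spelled = ("http colon slash slash www.pnas.org slash content slash early "
           "slash 2013/02/05 slash 1204678110")
print(unspell_url(spelled))
```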
If it 'knows' the processes and patterns by which spoken language changes, can that process be applied to modern English (compared with older and ancient forms) in order to predict how it will be spoken in, say, 300 years' time?
I realise that the modern world has a far greater level of cultural exchange and pressure for language change than any previous natural process. It would be interesting to try it, though.
My expectation was that it doesn't "know" so much as apply statistical analysis to large amounts of data- the Big Data version of knowledge. In that case it may not be as much use for projecting into the future, especially as many changes to the way language is used come from the description or use of new social, technological or environmental conditions. When we can predict those, things will have reached a very curious place.
Many scholars have tried and failed to crack Etruscan. There's no Rosetta stone. If this program can crack it, then it's a major advance. If not ....
Have they tested it on Basque, absent any help from a speaker of that language? Again that would be an acid test. Basque is one of the world's anomalous languages, not related to any other in any known way.
You're missing the point. This program is not meant to decode the meaning of another language, it's supposed (as far as I can tell, anyway) to simulate the evolution of languages in reverse. This won't work on Basque or Etruscan or other language isolates, because they have no common ancestor, or at least none that is attested.
It can, on the other hand, simulate the common ancestor of, e.g., the English hound and the German Hund, but do so far more rapidly, if less accurately, than humans, basically helping us to develop a phylogenetic tree for all related languages and even tentatively reconstruct words from related languages that are now extinct.
Mmmm, Tocharian.
[Basque] is one of the world's anomalous languages, not related to any other in any known way.
Euskara, the Basque language, may be related to other extinct languages in that part of Europe, such as Iberian; and it may be related to some Northern Caucasian languages such as Chechen. (There's a theory that Euskara and the Northern Caucasian languages are both related to the Na-Dené languages, but that is controversial, to put it mildly. Under this theory, as I understand it, a lost protolanguage in Asia would be a common ancestor for Euskara, the Caucasian members, and the Na-Dené members of the superfamily, via waves of migration at least six thousand years ago.)
It's not quite accurate to say it's "not related to any other in any known way". It could be a true isolate, but there are various possibilities for distant relatives.
As others noted, though, Euskara isn't a candidate for the kind of work being discussed in this article. Presumably you could try to use this model to reconstruct a hypothetical protolanguage using Euskara and some of the candidate relatives, but the connections are very tenuous and it's hard to see how the results would be useful. Typically, you want to reconstruct a protolanguage either to understand the development of and relationships among its descendants, or to help in translating another descendant. Neither would apply with a hypothetical protolanguage created from such distant relations.
Thought the comments section would be the usual witless avalanche of rage at dumb boffins who obviously don't know what they're doing, but not at all. What a nice surprise.
@James Micallef: I'm no linguist but I've read up on it a long while ago. There's been a lot of work done <http://books.google.co.uk/books?id=ONY_EVd9zNYC>.
No, you're not. Anyone in Middle School would "get" it too, and that's where it should be buried. Absolutely insipid; the headline writer should be fired for making everyone interested in an important anthropological work first endure such a sophomoric indulgence.
The first 2 repliers did, as did some of those below. What also may be underappreciated is that there is a place for humor in the conduct of science, & that linguistic archeology can be a sexy subject. (The truth of the latter assertion was exploited in the Indiana Jones film series.)
...try Nauatl on your cunning compu. it's older than Sanskrit/Hebrew, has the oldest Calendar, the Tonalamatl=
Tonatiuh(N/the sun Anthony)=tonalli(N)=soul,=tone/tune/detonate. used during the upper stone age and
the Tlaloc Nomad Deer/Mazatl(N/7 Tonalamatl)=maz(OHG)=meat(E)., 45k BCE to 10k BCE.
While the merits of the proposed methods are still unclear - for example, there are quite a few assumptions about cognate identification that may bias the model and that are really only discussed in the supplementary materials - one thing is for sure. This has ab-solute-ly NOTHING to do with Rosetta stone. I have no idea where the author gets that from ...
PNP (Professor in Statistical Linguistics)