Reverse engineering language to find its origin

I could write an introduction that explains how fundamentally important language is but I have a feeling I’d be telling you nothing new. Our spoken language forms the basis of co-operation and is one of the most obvious differences between us and chimps (along with bipedality and a lack of fur).

The obvious importance of language naturally makes most people curious about it. Who first spoke? Why did they decide to? How did they figure it out? When did all this happen? The study of language is a vibrant field that attracts all sorts of people who want to learn about this crucial trait. Luckily for them language behaves in a manner akin to evolution. It is passed through generations and occasionally changes in the process. This means some evolutionary techniques could be applied to language. For example, if you could measure the rate of this evolution you might be able to work backwards and figure out how long it has been around.

However this potential for evolutionary study is often little more than a scientist-tease. Language is also influenced by a range of social factors that often mask or otherwise alter its evolution. As such there are many cases where researchers think they’re onto something, only for it to turn out to be worthless (or at least not as profound as they thought).

Scientific Language tree

All the while the people who managed to figure something out just laugh at you

However, two researchers think they’ve gotten around such problems and believe they may have figured out when languages first arose. Charles Perreault and Sarah Mathew looked into phonemic diversity, which seems to change at a set rate.

A phoneme is essentially a sound, so phonemic diversity is the number of sounds included in a language. English, for example, includes “th” as in “they” and “uh” as in “cup.” In case you still don’t get it (or are just curious) a complete list of English phonemes can be found here.

Unlike other linguistic elements, like words, phonemes aren’t very strongly influenced by culture. Whilst the invention of the computer has introduced many new words into the English language it hasn’t added any new sounds (except maybe this).


The effects of bottlenecking can also be seen in genes.

Phonemes have been used before, notably to study where language originated. When a small group moves to a new region they take with them a limited sample of the original population, leading to reduced diversity in that pioneering group. This is known as “bottlenecking.” As such the most diversity will be found in older populations whilst groups which split off from them will have reduced diversity. Those who split off from the migratory group will have even less diversity still. Since phonemic diversity is highest in Africa, this suggests that is where language started.

This study also concluded that phonemes accumulate at a faster rate in larger populations and the new research builds on that. If two groups migrate from an ancestral population and one moves to a large area whilst the other goes to a small, isolated island, then the latter’s phonemes will not change as much. As such the island population acts as an effective “control” population with phoneme diversity similar to the ancestral population. You can then compare this to the other, larger group to see how many new phonemes have arisen. Then it’s simply a case of dividing the number of new phonemes by the time since the two groups diverged and boom! You have calculated the rate at which phonemes change, allowing you to calculate how long it would’ve taken to accumulate all the phonemes in language and thus how long language has been around.

The paper included this diagram of what I just described, although it’s arguably the least useful diagram possible.

Therefore all you need to calculate how long language has been around is two related languages, one from a small isolated area and one from a larger area. You also need to know how long they have been separate. The researchers found a situation that provided this information in Southeast Asia. There, genetics suggests that the Andaman Islands and mainland Southeast Asia were colonised at roughly the same time by the same group of people ~70,000 years ago.

So they plugged this data into their equation and got the rate at which phonemes accrue. Then they looked at how many phonemes are found in the most phoneme-diverse languages (which are apparently the click languages from Africa) and worked out how long it would’ve taken them to get that number of phonemes. Their results varied depending on how many phonemes they assumed the first language had, with results ranging from 150,000 to 600,000 years ago.

The age of language assuming different numbers of phonemes in the first language and different time ranges since Southeastern Asian languages split.
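
To make the arithmetic concrete, here is a minimal Python sketch of the calculation described above. It assumes a simple linear clock, which is my simplification and not necessarily the exact model used in the paper. The function names and every number in it are made up for illustration; none of them come from the study.

```python
def phoneme_rate(mainland_phonemes, island_phonemes, years_since_split):
    """Phonemes gained per year, treating the island language as the 'control'.

    Assumption: the island language still has roughly the ancestral inventory,
    so the difference between the two counts is what the mainland group gained.
    """
    if years_since_split <= 0:
        raise ValueError("years_since_split must be positive")
    return (mainland_phonemes - island_phonemes) / years_since_split


def language_age(target_phonemes, initial_phonemes, rate):
    """Years needed to go from the first language's inventory to the target one."""
    if rate <= 0:
        raise ValueError("rate must be positive for the clock to run forwards")
    return (target_phonemes - initial_phonemes) / rate


# Hypothetical inputs, chosen only to show the mechanics.
rate = phoneme_rate(mainland_phonemes=35, island_phonemes=28,
                    years_since_split=70_000)

# The answer depends heavily on how many phonemes the first language had.
for initial in (10, 20, 30):
    age = language_age(target_phonemes=100, initial_phonemes=initial, rate=rate)
    print(f"initial={initial}: about {age:,.0f} years")
```

Changing the assumed starting inventory or the split date shifts the answer a lot, which is why the paper reports a range and not a single date.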

So can we finally close the case on when language appeared? Sadly not quite yet since this study does have a fair number of flaws. Importantly, the original phoneme research regarding where language appeared from has been roundly criticised. Without it most of the foundations of this new study are destroyed.

On top of that it would also seem that a range of additional factors influence phoneme diversity. For example, it would seem that languages have a tendency to simplify which would artificially decrease phoneme diversity over time. If phonemes can’t be relied upon as a steady “clock” by which to measure language age then this study is useless. Further, their phoneme clock was calibrated against a single location. As such there is always the danger that it is not representative of how phonemes change over time in general, so applying these findings generally is pointless.

However, the researchers do acknowledge most of these flaws so full points for their honesty and rigour. But being honest doesn’t make you right and so, as much as I want to reward them for being truthful, there is still much to be skeptical about in this study.

Perreault C, & Mathew S (2012). Dating the origin of language using phonemic diversity. PLoS ONE, 7 (4) PMID: 22558135

11 thoughts on “Reverse engineering language to find its origin”

  1. Good post. The approach makes a lot of assumptions. Why couldn’t all (or almost all) of the phonemes have arisen suddenly as part of a kind of “linguistic Big Bang” (perhaps due to a lucky random mutation that suddenly kicked Homo’s brain into overdrive) and then all the phonemic evolution would just be rearranging those existing pieces?

    I mean I don’t think anyone would use this technique to work out when, say, songbirds started singing, even if you worked out that the songs they sang were spread ‘culturally’ and gradually changed…

    • How phonemes originated is a crucial missing piece of information when it comes to studying this. I mean, it might have happened in a big bang like you suggested, or it could’ve arisen multiple times in different places (Neanderthals also seem to have had language, yet their evolutionary history means it probably wouldn’t be the African one hinted at by this study).

      Without all of that information this method can’t really show anything. Most of the ones like it I see just seem to be attempts to make language history fit the out of Africa hypothesis.

  2. Pingback: Dating the origin of language? « EvoAnth « Secularity

  3. Communication is something that had to happen through the course of evolution. To adapt and educate has given us many benefits, mainly being at the top of the food chain and self-preservation.

    • A lot of the basic building blocks for language are already present in other species (see my post “Vocal learning show in primate” for an example). Whilst obviously all of these evolved, a lot of them seem to have evolved before the human lineage appeared. We just took them to the extreme.

  4. Folks are also looking at ears (audio canal dimensions): tuned for 3kHz = language (humanesque), 5 – 6 kHz (dogs, chimps) evolved to listen to ‘nature sounds’. Rather a stretch, not sure how they are coming.

    • If I recall correctly, one of the skulls from Sima de los Huesos has a “human tuned” ear canal, which would indicate that language – or some similar vocal form of communication – was prevalent and important by that point in time. The only trouble is that dating of the Atapuerca specimens has been…controversial.

  5. Interesting concept. Instead of narrowing the languages down to one original language and getting hundreds of thousands of years, what about narrowing it down to the main 10 families of languages from their figure 1? That would take into account the sort of “linguistic big bang” theory Neuroskeptic mentioned. It would be interesting if the number of years they calculated from that came in close to the time people came “out of Africa” around 60,000-100,000 years ago.

    There is this one old story about the tower of Babel…

  6. As with the so-called Mitochondrial Eve, the very best this language exercise could achieve is to trace back to a most recent common ancestral tongue. Language could have developed millions of years earlier but all earlier divergences have been lost at some bottleneck. There is the well noted genetic/cultural bottleneck with ‘modern’ humans. The many extinct species could have taken with them several ‘older’ dialects.

  7. The fact remains that empirical evidence for written language is only to approx. 3200 BC, and linguistic analysis of all known languages can place an origin only to 8,000 BC. Assuming an evolutionary advancement of the ear to accompany an evolutionary “jump” in speech anatomy only makes sense if there was a project leader directing the whole array with a set of blueprints and a really great imagination! After working for the US government for nearly 35 years, I know it is nearly impossible to see a project to fruition even when it is planned, blueprinted and funded; how can such complex features as speech and ear anatomy know which direction to proceed in, not only to improve but to somehow perfectly complement each other … unless someone or some “thing” mapped it all out? Assume for a moment that I had a great pictorial plan of how presumed ancient man would need to evolve to produce an ear that was perfectly tuned to new vocal and lingual anatomy and I etched this plan permanently in stone and handed it to our non-lingual ancestors, who passed it on to successive generations. Can a rational scientist believe that all of their staring and wishing and grunting will result in successive babies that have developed ears and vocal capacity to match an equally evolved linguistically capable brain? Yes, it is a convenient answer, but simply illogical. Look around you. All of the progress you see was the result of purposeful and hard work, most of which was based on the elaborate, detailed plans of group effort with somebody in charge, not the result of spontaneous occurrence.
