Humans are only 70% chimp?

Around 80% of the human genome has an equivalent section in the chimp genome. When we compare these sections we find that they’re around 99% similar. That means, for example, that you only need to change the hair-related genes very slightly to produce either a naked ape or a hairy one. That’s fascinating, and that’s where phrases such as “99% chimp” come from: not from looking at the entire genome, but from the so-called “alignable” parts.

Except they aren’t 99% similar. Or at least, that’s what Tomkins claimed in a recent article published in Answers in Genesis’s scientific journal. His analysis revealed that we’re only 70% similar; and this new number has gotten a lot of publicity in creationist circles. If you’ve spent any time reading ICR or AiG, or even DI, you’ve probably heard this 70% figure being used as evidence against evolution. But how can he arrive at such a different conclusion from the rest of science? Did he make a mistake, or is he stumbling onto a truth suppressed by the hundreds of geneticists examining the human genome each day?

On the face of it his methodology seems sound. Tomkins basically cut the human genome up into loads of little sections (known as slices) and compared them to slices of the chimp genome. He then determined what percentage was “optimally aligned”, and that was the figure he used. Although this is a different method to what other researchers used, there’s nothing hugely wrong with it.

Tomkins’ results for how similar the genomes are when different length slices were used
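
To make the slicing idea concrete, here’s a toy sketch in Python. To be clear, this is not Tomkins’ actual pipeline; the function names, the ungapped matching and the `threshold` cut-off are all my own assumptions, chosen purely for illustration. It cuts one sequence into fixed-length slices, looks for the best ungapped match of each slice in the other sequence, and reports the fraction of slices that clear the cut-off.

```python
def slices(seq, size):
    """Cut seq into consecutive, non-overlapping pieces of length size."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def best_identity(piece, target):
    """Best ungapped identity (0.0 to 1.0) of piece against any window of target."""
    n = len(piece)
    best = 0.0
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        matches = sum(a == b for a, b in zip(piece, window))
        best = max(best, matches / n)
    return best


def fraction_aligned(query, target, size, threshold):
    """Fraction of query slices whose best match in target reaches threshold.

    Hypothetical stand-in for an "optimally aligned" criterion; a real
    analysis would use a proper alignment tool that also handles gaps.
    """
    pieces = slices(query, size)
    if not pieces:
        raise ValueError("slice size is longer than the query sequence")
    hits = sum(best_identity(p, target) >= threshold for p in pieces)
    return hits / len(pieces)


# Made-up sequences that differ by a single base (G -> A at one position).
target = "AAAACCCCGGGGTTTT"
query = "AAAACCCCGGGATTTT"

print(fraction_aligned(query, target, size=4, threshold=1.0))   # 0.75
print(fraction_aligned(query, target, size=8, threshold=1.0))   # 0.5
print(fraction_aligned(query, target, size=8, threshold=0.85))  # 1.0
```

On this made-up pair, which differs by a single base, the answer comes out as 75%, 50% or 100% depending on the slice length and cut-off chosen. Nothing about the sequences changed; only the analysis did.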

Similarly, he processed the genome before his analysis with a computer algorithm he wrote himself. Although obviously different to what other researchers do, there’s nothing inherently wrong with it. At almost every step in the research he does something unusual, but not necessarily unscientific. This leads to the first problem I have with his research: “degrees of freedom.”

Over the course of research a scientist has many options as to how to proceed. What statistical analysis should they use? How long should the experiment run for? What computer program should they use? And so forth. Individually these decisions are innocuous but taken together they introduce biases into the data.

For example, insertions and deletions (“indels”) are instances where bits of the genome have been duplicated or deleted. Since a large indel can arise from a single duplication or deletion event, these are typically counted separately with a different method that takes this into account. Tomkins lumps them into his analysis of the whole genome. There’s nothing inherently wrong with doing so, but it will bias the results and make chimps and humans seem more distant.
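
A quick toy example shows how much this one choice matters. The sketch below assumes two sequences that have already been aligned, with `-` marking a gap; the function name and the example sequences are made up for illustration and don’t come from any of the papers discussed here. It counts differences two ways: treating each run of gaps as a single event, or counting every gapped base separately.

```python
def count_differences(aligned_a, aligned_b, per_event=True):
    """Count differences between two pre-aligned sequences ('-' marks a gap).

    per_event=True  : a run of consecutive gap columns counts as one difference
    per_event=False : every gap column counts as its own difference
    Substitutions always count once each. Simplification: adjacent gap
    columns are treated as one run regardless of which sequence has the gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be the same length")
    diffs = 0
    in_gap = False
    for a, b in zip(aligned_a, aligned_b):
        gap = a == "-" or b == "-"
        if gap:
            # Only skip the count when we are continuing an existing gap run
            # and have been asked to count whole events.
            if not (per_event and in_gap):
                diffs += 1
        elif a != b:
            diffs += 1
        in_gap = gap
    return diffs


# Made-up alignment: one six-base deletion and one substitution at the end.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTAC------ACGTACGA"

print(count_differences(seq_a, seq_b, per_event=True))   # 2
print(count_differences(seq_a, seq_b, per_event=False))  # 7
```

Here one six-base deletion plus one substitution comes out as 2 differences when counted per event, but 7 when counted per base, from exactly the same alignment.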

This is one of the reasons replication is so important: getting a different set of researchers to do something likely means different decisions will be taken. If the results are still the same we know the decisions of the researcher (known as “degrees of freedom”) didn’t bias the results too much. The problem here is there’s just the one experiment to go on, with just the one program, one algorithm, one definition of “optimal alignment” used and so forth. There’s nothing inherently wrong with the program, algorithm etc. but there is the potential for degrees of freedom to result in biases. Until other researchers try to replicate the results with slightly different methods, these results should be taken with a grain of salt.

Of course, science has been comparing chimp and human DNA for quite a while now, with several experiments using slightly different methods all arriving at figures of 95-99% similar. For example, the original comparison of chimp and human DNA and the recent comparison of chimp, bonobo and human DNA used different computer programs and so forth, but still arrived at a figure of chimp/human similarity of ~99%. Even work by a young earth creationist arrived at this figure! Given all of these different studies arriving at the same conclusion, I’m inclined to think that Tomkins’ methodology – whilst not inherently wrong – introduced a few too many biases into his results.

But say lots of people replicate his method and get similar results, vindicating it. There’s still nothing there that’ll challenge evolution. He’s compared one animal to humans with a unique method and gotten a unique number. Is that number unusually low or high, thus challenging evolution? Without more animals analysed with this method we can’t say. If he examined a crocodile and found it to be 90% similar, then there’d be evidence against evolution. But one anomalous number produced with an anomalous method is hardly challenging.

So next time you see the 70% figure, I’d be skeptical. The methodology has probably introduced biases (particularly by counting indels into the total figure) and the results need context. Until that’s done Tomkins might as well be saying “humans and chimps are eleventy armchairs similar!”

21 thoughts on “Humans are only 70% chimp?”

  1. Will be interested to see what Eye on the ICR and Evoanth might make of this propaganda piece (this post is being made at both blogs):
    http://www.icr.org/article/7413/
    (This is the ‘silent’ link at footnote 3: http://www.cell.com/AJHG/abstract/S0002-9297(13)00073-6)
    Regarding this Abstract, Tomkins claims “a modern living human has been discovered who has Y-chromosome variation that increases the range of human DNA diversity beyond that of so-called archaic humans” (Neanderthals and Denisovans). However, it is not clear to me that the paper in the AJHG reports such a discovery. Of course, whether or not this is scientifically justified, YECs wish to tell the world that there has only ever been one ‘human’ race.
    Tomkins then starts talking about human and chimp Y-chromosomes and their differences, flagging a 2010 paper in Nature (Evoanth’s latest post dated 6 May already discusses another Tomkins article on this same general theme – he claims that humans and chimps are only ‘70%’ genetically similar.)

    Of course it’s all simple. there is the human ‘kind’. And there are various ape ‘kinds’ (some of which are now extinct or have extinct members).

    If you are a YEC.

    And if you look at Y-chromosomes only.

    I also recently noticed this (but I don’t think Batten discussed this ‘99%’ figure at the particular talk in question):
    http://kingsmeadbaptist.com/?page_id=1092

    • Their new paper doesn’t compare the Y chromosome they identified with Neanderthal or Denisovan chromosomes. So I thought that Tomkins may have looked at the archaic genomes himself and noticed they had derived alleles, compared to the new Y chromosome. However when I went looking at the genomes myself I couldn’t find much data on Neanderthal or Denisovan Y chromosomes. In fact, many of the genomes for those species we’ve sequenced appear to be from females; hence the lack of data.

      So either he’s analysed a chromosome I couldn’t find and noted that it had derived alleles where this human had archaic; showing that there is more variation between me and the African American than there is between me and a Neanderthal. Or he’s just assuming that such derived alleles exist in the archaic genomes. If it’s the former then he needs to show his working; if it’s the latter then he’s got no real argument.

      Given we could interbreed and produce fertile offspring, I think it’s not that surprising that Neanderthals and modern humans aren’t massively different genetically.

  2. Re my comment (which currently awaits moderation due to links but IS already visible at the Eye on the ICR blog under the post dated 1 May) I’ve just noticed that it is the SAME Nature paper which is flagged in both the Tomkins article here http://www.answersingenesis.org/articles/arj/v6/n1/human-chimp-chromosome as critiqued by Adam and also the new Tomkins ‘mishmash’ of an article here (which I expect Peter/Eye will wish to attend to): http://www.icr.org/article/7413/

  3. In fact, my FIRST post above IS now visible but the second isn’t yet (it can already be read under the Eye on the ICR blog post of 1 May where I reproduced it – flagging that the SAME Nature paper was being flagged in both the recent Tomkins articles).

  4. I find that the most telling part of Tomkins’ work here is that when he refers to a “conservative estimate” of similarity he really means the maximum possible value. As to the idea that there may be enough variation in modern humans to encompass Neanderthals and Denisovans, I think Tomkins’ conclusion is based entirely on confusion over what is meant by the term “archaic human” rather than actual data. I think I’m going to have to add a summary to the end of my own post now that this has resurfaced.

    • It’s hard to say for sure where his conclusions come from given that he doesn’t show his working. His methodology section in the 70% paper is also woefully short. I’m sensing a theme here.

      • My favourite part of the 70% paper is the way he spends more of the method describing the computers on which he performed the science, rather than the science itself. If it weren’t for the fact he seems like a competent geneticist, I’d call it properly cargo cult stuff.

  5. Pingback: A Strange New Paternal Lineage | Eye on the ICR

  6. I’m sure you realize your criticisms apply equally to researchers making 95-99% returns. Saying a YEC found that too doesn’t wave it away. There is immense paradigm pressure in the academic world to keep Human-Chimp similarity as high as possible.

    • If the only evidence for the high similarity was a single study conducted by a single individual using a single methodology with high degrees of freedom that ran counter to other results then yes, that claim would be just as suspect as the idea that we are 70% similar.

      Fortunately we have multiple labs running multiple studies using different methodologies and programs all converging on a similar result. As such the 95-99% claim is a lot more solid.

      Now, if the creationists can replicate their results using different methodologies with fewer degrees of freedom (and ideally with different teams) then they might be onto something. Until then there’s little reason to give them any credence.

      Regardless of paradigm pressure, science works. Repetition and strong methodologies trump preconceptions any day of the week.

      • Adam, maybe you should be asking yourself why a lone creationist geneticist was able to empirically arrive at the 70% figure with sound methodology. If the well-funded evolution industry has been studying Human-Chimp genetic similarity for years, then why do we not have a wide spectrum of data and varying genomic comparison methodologies? (Some that return 99%, some that return 90%, 80%, 70%, etc.) Seriously think about this for a minute.

        I think the reason is clear. For evolutionists, returning a human-chimp similarity figure of 70% means “You’re using an inaccurate method for determining common ancestry”, since evolutionists *know* evolution is a fact and they *know* chimps are most closely related to humans. This is a real preconception driving evolutionary research. You know it and I know it.

        There are countless examples of this type of circularity in evolutionary reasoning. Where empirical data is a priori assumed to be an anomaly of some kind because it contradicts the picture of common descent. One clear example that comes to mind is microRNA shuffling around conventional mammal phylogeny that was recently reported on by Nature. But one could go on and on and on with such examples.

        • Except we do have a variety of comparison methodologies. As I mentioned before, pretty much every major study into genetic comparisons has a slightly different methodology as researchers vary the dataset, software and so forth involved. Aside from these standard genetic comparisons, there are several fundamentally different methods of comparing species genetics as well. You could compare the structure of the various genomes, rather than the genes themselves, or the proteome of various species (the sum total of proteins produced by the genome).

          There are even radically different ways of doing the standard genetic comparison. For example, an insertion or deletion mutation can shift the position of surrounding nucleotides. Although many nucleotides can be affected, this is typically counted as one “difference” because it is the result of one mutation. Sometimes researchers want to investigate these shifts in more detail, and might count all of the changes individually to determine their extent.

          Depending on which of these fundamentally different methodologies is used the results for human/chimp similarity can vary quite significantly. The lowest value I’m aware of is for proteome comparisons, which show that human and chimp proteins are only ~30% similar. Despite this variation in absolute percentages, all of these methodologies show the same thing: humans are more closely related to chimpanzees than any other animal.

          The fact that tinkering with the methodology in this manner can result in these varying percentages further emphasises the importance of describing your methods in detail, and makes Tomkins’ vague methodology all the more egregious. Even assuming his methodology is sound (which is difficult given the lack of details, but not unlikely), unless we know which methodology he is using his conclusions are meaningless.

          Changes to the methodology do not fundamentally change the result of genetic comparisons, showing that their conclusions are not the result of biases or flaws in the methodology. Fundamentally different methodologies also converge on the same conclusion, confirming that humans are closely related to chimps.

  7. Sorry to bother you so long after the original post, but do you have a source for “Around 80% of the human genome has an equivalent section in the chimp genome”? This is something I would like to learn more about.

    • Looking back through my notes it seems like I was referring to the “Initial sequence of the chimpanzee genome and comparison with the human genome” which notes “Of ~7.2 million SNPs mapped to the human genome in the current public database, we could assign the alleles as ancestral or derived in 80% of the cases according to which allele agrees with the chimpanzee genome sequence”. That said, they did note that of the 20% they couldn’t align between chimps and humans, 18% were because (at the time) the chimp genome was incomplete.

        • Nowadays we have the complete genome, and a few papers have been done just comparing us again using these more accurate numbers. They provide figures of 96% overall similarity. See “Comparing the human and chimpanzee genomes: Searching for needles in a haystack” for example. Now most research has shifted from just comparing similarities towards trying to identify what the differences and similarities actually are. This is why you wind up with research that looks at, say, just the bits that do align and finds we have 99% similarity. They want to know what that 1% of similar, yet different, DNA is.

          • The “needles in a haystack” paper from 2005 looks familiar and I think I’ve already read it. Since it’s from only 3 months after the initial chimpanzee genome paper, I’m wondering if it had any more of the chimp genome to work with. I’m mostly trying to figure out if your statement, “Around 80% of the human genome has an equivalent section in the chimp genome.” is true, false, or that we just don’t know.

            Tomkins claimed in his paper: “only 70% of the chimpanzee DNA was similar to human under the most optimal sequence-slice conditions”. Does this indicate that Tomkins is calculating the reverse of your 80%? I’ve heard the chimp genome is about 10% larger than humans. If so I would think looking for human matches to chimp would detect a larger difference than chimpanzee matches to humans. Or were Tomkins’s numbers inflated because we’re still missing a large amount of the chimpanzee or human genomes?

            Sorry for all the questions and I hope I’m not coming off as hostile. Just trying to understand myself.

          • You don’t sound aggressive and I appreciate the questions. Genetics isn’t my strong suit and I appreciate being driven to investigate it more. I’ll probably have to rewrite this post at the rate I’m going.

            So, onto the actual questions. Yes, the haystack paper had more to work with than the original one. That was looking at the draft sequence from a single chimpanzee; the haystack paper was looking at three complete genomes. Once you have one chimp genome (even a draft), it becomes much easier to get others. Hence why with humans it took us over a decade to get one genome, but less than a decade to sequence 1,000 genomes.

            As for why Tomkins got his different number, it’s difficult to say given he doesn’t describe his methodology particularly well. This is important because depending on what you measure you get different results. Comparing only the bits of the genome that code for amino acids, us and chimps have >99% similarity. Comparing all the alignable bits gets figures of between 95% – 98%. Comparing how much is alignable in the first place can come up with figures from 80% (the original paper) to 96% (haystack paper).

            Now I’ve read a fair bit of Tomkins’ work. I don’t think he’s incompetent and his comparison is likely to be valid. He’s just using a new set of criteria to make that comparison. Maybe he’s only counting orthologs, maybe it is that he’s not looking at length, maybe he is counting in/dels differently…there are a lot of possibilities and they don’t invalidate the work. It just means we don’t have any context to place his 70% figure in. Is it really high or really low when compared to other comparisons made using his shiny new method?

            And this is the point I was trying to make in this post. One of the ways this is a great piece of evidence in favour of evolution is that regardless of what you compare, chimps always come out as more closely related to humans. So what we need to do is repeat Tomkins’ method, but use it on other animals. Find out if 70% is a low figure.

            If we do this and find that using this approach a tortoise is more closely related to us than a chimp then serious questions will be raised over evolution. Until then, this is the equivalent of someone saying “this house is two stories high”. Then Tomkins turns up and goes “actually it only has 3 bathrooms”.
