Many people refuse to accept that the same science used to convict criminals can also demonstrate conclusively that humans share common ancestors with chimpanzees, which in turn share common ancestors with the other great apes, and so on. This is very strange, since lay people are so trusting of the former. I’m going to tell the story of one gene, NANOG, out of the many thousands of similar stories in our genome. For those interested in molecular studies of evolution, particularly human evolution, I strongly recommend Daniel Fairbanks’s Relics of Eden, a great primer on how molecular biology is applied to the problem of evolution. The following information comes from Chapter 5 and Appendix 1 of Relics of Eden.
What is NANOG?
NANOG is a homeobox gene which encodes a transcription factor critical in development. In other words, it is a regulatory gene. The products of this gene allow cultured stem cells to divide indefinitely without differentiating (that is, without becoming specialized cell types). It becomes activated shortly after conception.
What are ’pseudogenes’ and how do they arise?
In the human genome, there are 11 pseudogenes of NANOG (a pseudogene is a gene copy that has been inactivated through one of the many types of mutation): ten are retropseudogenes and one is a duplication pseudogene. What do we mean by each? First, genes are typically not continuous stretches of coding DNA. They are divided into two types of segment: introns and exons. The whole gene, exons and introns together, is transcribed into RNA, but subsequent processing removes the introns. Thus, the exons hold the code for the gene product and the introns are junk bits (except for the sequences at the intron boundaries which tell the splicing machinery that this is indeed an intron). A duplication pseudogene is one where the gene is copied wholesale, introns and exons. A retropseudogene, by contrast, is an inactive retroelement: a segment of DNA that was copied by being transcribed into RNA and then reverse-transcribed (from RNA back to its DNA equivalent) into another region of the genome.
How do we tell the difference between duplication and retropseudogenes?
There are several ways to differentiate between a duplication pseudogene and a retropseudogene. Since introns are removed soon after the transcription process, retropseudogenes have no introns. Duplication pseudogenes are wholesale copies, introns and all.
There is a second telltale sign that a pseudogene is a retropseudogene. During RNA processing, a poly-(A) tail, which is a string of adenine bases, is added to the end of the RNA, and reverse transcription from RNA to DNA copies this tail along with the rest of the sequence. Thus, retropseudogenes often (not always) have a poly-(A) tail associated with them in the DNA. Reverse transcription is sometimes incomplete, and only part of the DNA corresponding to the RNA may be inserted. Duplication pseudogenes never have this poly-(A) tail.
There is one more way that retropseudogenes can be identified. When a retropseudogene is inserted, a short stretch of the original DNA at the insertion site ends up repeated on either side of the pseudogene. This happens because the two strands of the DNA are not cut at the same point, but in a staggered fashion. When the missing bases are filled in, the short sequence between the two cuts is present at both ends, so each end of the inserted pseudogene is flanked by a repeat of the sequence at the opposite end.
Which NANOG pseudogenes are from duplication? Which are inactive retroelements?
The following NANOG pseudogenes are retropseudogenes: NANOGP2, P4, P5, P7, P8, P9, P10 and P11. All have poly-(A) tails and all lack introns. NANOGP1, on the other hand, has no poly-(A) tail and contains introns, and is thus a duplication pseudogene. In addition, it lies very close to the active NANOG gene, which is typical of copies arising from tandem duplication. While not definitive, this is at least indicative of a gene resulting from duplication. The NANOG retropseudogenes, by contrast, are scattered among the various chromosomes. NANOGP3 and P6 are not full-length pseudogenes, owing to the lower fidelity of reverse transcription, and lack poly-(A) tails, but they can still be identified as retropseudogenes because they lack introns.
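The rules of thumb above can be written down as a small decision procedure. What follows is only a sketch in Python: the `PseudogeneFeatures` record and the `classify` function are names made up for illustration, and the feature values are simply those stated in the text for three of the pseudogenes.

```python
from dataclasses import dataclass


@dataclass
class PseudogeneFeatures:
    """Hypothetical record of the two features discussed in the text."""
    name: str
    has_introns: bool
    has_poly_a_tail: bool


def classify(p: PseudogeneFeatures) -> str:
    """Apply the text's rules of thumb and return a label."""
    if p.has_introns:
        # Introns are present only when the DNA itself was copied wholesale.
        return "duplication pseudogene"
    if p.has_poly_a_tail:
        # No introns plus a tail: copied from processed RNA.
        return "retropseudogene"
    # No introns and no tail: still RNA-derived, as with NANOGP3 and P6.
    return "retropseudogene (no poly-(A) tail)"


# Feature values as described in the text.
examples = [
    PseudogeneFeatures("NANOGP1", has_introns=True, has_poly_a_tail=False),
    PseudogeneFeatures("NANOGP8", has_introns=False, has_poly_a_tail=True),
    PseudogeneFeatures("NANOGP3", has_introns=False, has_poly_a_tail=False),
]

for p in examples:
    print(p.name, "->", classify(p))
```

Note that the absence of introns does the real work here; the tail only adds supporting evidence, which is why the tail-less copies still come out as retropseudogenes.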
How can we tell how old a retropseudogene is?
The relative age (and approximate real age) of each retropseudogene can be determined by studying the poly-(A) tail. Mutations in these tails are selectively neutral. That is, they are neither detrimental nor beneficial, and thus there is no selection to maintain or remove them. Thus, the number of mutations within the tail gives an idea of how old the pseudogene is. NANOGP8 is relatively new. It has a tail of 26 adenines without any changes:
Compare that to NANOGP5:
A slightly longer tail with quite a few mutations means that this pseudogene is much older.
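The tail comparison amounts to counting the bases that are no longer adenine. Here is a minimal Python sketch of that count. The function name `tail_substitutions` is my own, the NANOGP8 tail is built from the description in the text (26 unchanged adenines), and the second tail is a made-up example, not the real NANOGP5 sequence. The count also ignores insertions and deletions, so treat it as a rough yardstick only.

```python
def tail_substitutions(tail: str) -> int:
    """Count bases in a poly-(A) tail that are no longer adenine."""
    return sum(1 for base in tail.upper() if base != "A")


# From the text: 26 adenines with no changes.
nanogp8_tail = "A" * 26

# Hypothetical older tail for illustration only (NOT the real NANOGP5 tail).
older_tail = "AAAGAAAATAAAACAAAAAGAAAAATAAA"

for label, tail in [("NANOGP8", nanogp8_tail), ("hypothetical older tail", older_tail)]:
    print(label, "length:", len(tail), "substitutions:", tail_substitutions(tail))
```

The more substitutions a tail has picked up, the longer it has been sitting in the genome, which is all the relative dating relies on.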
A little less visually pleasing is to compare the sequences of the pseudogenes (which are inactive, so any mutations in them are selectively neutral) to the original gene. As with the poly-(A) tails, the number of differences between the original and the pseudogene copy of NANOG tells us how old the copy is. For duplication pseudogenes, which have no tail, this is obviously the way to go. But it is certainly much easier to see this visually for poly-(A) tails. Again, an analysis comparing the original and copied sequences shows that NANOGP8 is the youngest of the NANOG pseudogenes.
A final check on the relative ages of the pseudogenes uses mutations in the source gene itself. The source gene also changes over time, and each copy carries whatever source-gene mutations were present when it was made. The pseudogenes can therefore be placed in the order that requires the fewest source-gene mutations. This will become important in interpreting the data for NANOGP1 below.
NANOGP1 is unusual
The methods described above for determining the relative ages of pseudogenes produced a strange result: comparing the pseudogene sequences with the NANOG sequence suggests that NANOGP1 is younger than NANOGP4, while the source-mutation method suggests the reverse. This needed explaining.
It turns out that NANOGP1 is not a nonfunctioning gene. A search of the RNA databases shows that RNA has been derived from NANOGP1. The first introns of NANOG and NANOGP1, it turns out, are different. The segment of NANOGP1 which contains the greatest number of mutations is, you guessed it, in that first intron, which is unique to that pseudogene. It is in fact an exon in the RNAs produced by NANOG. NANOGP1 was not always inactive and indeed is still active, if only barely.
Natural selection acts only on active genes. Since NANOGP1 was (and is) an active gene, the sequence in its exons would be conserved, because harmful mutations there would be removed by selection. In this case, then, not all mutations were equally likely to persist. Thus, from a sequence-comparison standpoint, this pseudogene looks much younger than it is.
NANOG in chimpanzees and humans
Both the chimpanzee and human genomes contain NANOG pseudogenes, but chimps have 10 whereas humans have 11. The extra pseudogene (and the youngest), P8, is located on chromosome 15. Thus, the P8 pseudogene inserted itself into our genome after the chimp-human split. This happened about 5 million years ago, so this pseudogene arose soon after the divergence.
The site in the chimp genome corresponding to the P8 insertion site in ours has been identified, and no pseudogene has been found at that location.
Further confirming the young age of the P8 pseudogene, an Alu element (the most common retroelement in the genome) is positioned near one end of the functioning NANOG gene in both chimps and humans. This Alu element lies in the 3’ untranslated region (3’ UTR) and, since it is not located in an intron, it survives RNA processing and is copied along with the rest of the transcript. The element has no effect on activity or on the final gene product since it is not within the reading frame (it is not in a region which encodes the final product, hence it is in an untranslated region). The position of the Alu element in the 3’ UTR is the same in the copy as in the original, and it is in the same position in the active NANOG gene of both humans and chimps.
Among the pseudogenes, only P8 has this Alu element. What this means is that the Alu retroelement inserted itself in that position after the other ten pseudogenes were created but before the chimp-human divergence point. When P8 was created, the Alu element was already in place and was copied into the pseudogene. Interestingly, this Alu element is absent in rhesus macaques. Thus, the element was introduced after the rhesus and chimp/human lineages diverged.
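The reasoning here is a simple presence/absence argument, and it can be sketched in a few lines of Python. The function `insertion_window` and the `SPLIT_ORDER` list are invented for this illustration. The sketch assumes that each element was inserted exactly once and never lost afterwards, and it uses only the presence/absence facts stated in the text.

```python
from typing import Dict, Optional, Tuple

# Relatives of humans, ordered from the most distant split to the most recent.
SPLIT_ORDER = ["rhesus macaque", "chimpanzee"]


def insertion_window(present_in: Dict[str, bool]) -> Tuple[Optional[str], Optional[str]]:
    """Bracket when an element found in humans was inserted.

    Returns (after, before): the insertion happened after the split from
    `after` and before the split from `before`. None means that end of the
    window is not bounded by the species we have data for.
    """
    after: Optional[str] = None
    before: Optional[str] = None
    for species in SPLIT_ORDER:
        if species not in present_in:
            continue  # no information for this species
        if present_in[species]:
            # A shared element must predate the split from this species.
            before = species
            break
        # Absent here, so the insertion came after this split.
        after = species
    return after, before


# Alu element in the active NANOG gene: absent in rhesus, present in chimps.
print(insertion_window({"rhesus macaque": False, "chimpanzee": True}))

# NANOGP8: absent at the corresponding site in chimps (no rhesus data given).
print(insertion_window({"chimpanzee": False}))
```

The first call brackets the Alu insertion between the rhesus split and the chimp-human split; the second says only that P8 arose after the chimp-human split, which is exactly the argument made above.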
What I’ve tried to present here is the detective work that goes into working out how genomes change over time using contemporary species. Science is not ad hoc. Results which do not seem to fit need not only an explanation, but verification of that explanation. This is serious work, and those who try to paint it as guesswork are simply ignorant fools belittling things which are beyond their knowledge base. But this stuff is accessible to everyone, and those who choose to take a good look at it will find that it is amazing and wondrous!