Schmitz JF, Ullrich K, Bornberg-Bauer E
Incipient de novo genes can evolve from "frozen accidents" which escaped rapid transcript turnover
Nature Ecology and Evolution, 2018

A recent surge of studies has suggested that many novel genes arise <i>de novo</i> from previously non-coding
DNA and not by duplication. However, since most studies concentrated on longer evolutionary time
scales and rarely considered protein structural properties, it remains unclear how these properties
are shaped by evolution, depend on genetic mechanisms, and influence gene survival. Here we
compare open reading frames (ORFs) from high-coverage transcriptomes of mouse and four other
mammals, covering 160 million years of evolution. We find that novel ORFs pervasively emerge
from non-coding regions but are rapidly lost again, whereas comparatively fewer arise from divergence of
coding sequences, but those are retained over much longer times. We also find that a subset (14%) of the
mouse-specific ORFs bind ribosomes and are potentially translated, showing that such ORFs can be
the starting points of gene emergence. Surprisingly, disorder and other protein properties of young
ORFs hardly change with gene age over short time frames. Only length and nucleotide composition
change significantly. Thus, some transcribed <i>de novo</i> genes resemble frozen accidents of randomly
emerged ORFs that survived initial purging. This perspective is consistent with very recent studies
indicating that some neutrally evolving transcripts containing random protein sequences may be
translated and may serve as viable starting points of <i>de novo</i> gene emergence.