Dowling D, Schmitz JF, Bornberg-Bauer E
Close-up of primate specific de novo genes---transitions from nascent transcripts to early proteins
Molecular Biology & Evolution,

Widespread transcription outside of known genes creates a vast pool of novel transcripts. Translation of such transcripts can lead to new proteins with hitherto unknown functions and high adaptive potential. In the human genome, several proteins have arisen <i>de novo</i> from ancestrally non-coding DNA. The early stages of their emergence, including how young transcripts transition into mature protein-coding genes, are not well understood. Here, using transcriptomic data from human and six other primates, we present a fine-scale analysis of how thousands of novel transcripts containing open reading frames (ORFs) emerged over 90 million years of primate evolution. We find that these transcribed ORFs are rapidly gained from non-coding regions of the genome, including intergenic regions and repetitive regions such as <i>Alu</i> elements. While the high number of species-specific transcripts (e.g. 4429 human-specific transcripts compared to 1048 conserved between human and chimpanzee) suggests that they are lost shortly after they emerge, we find that a much higher fraction is conserved between primate species than previously expected. ORFs show signs of gradual change, including increasing length, GC content, and strength of purifying selection. However, other properties of the encoded proteins, such as aggregation propensity and intrinsic structural disorder, remain stable. Young ORFs are preferentially found on GC-rich chromosomes, which influences the amino acid composition of the translated protein products. Our results suggest that <i>de novo</i> genes which survive initial purging from the genome may rapidly acquire novel functions and gradually become more integrated into cellular processes and networks, but without notable changes to structural properties.