Schmitz, JF and Chain, F and Bornberg-Bauer, E
Evolution of novel genes in three-spined stickleback populations
2018



Abstract

The question of how novel protein coding genes emerge has rapidly gained interest over the past
decade. Most observed novel genes seem to originate from the duplication of existing genes. Additionally,
de novo gene emergence, in which a protein-coding gene emerges from a previously
non-coding region, has recently garnered support as an important contributor to novel protein emergence.
However, fixation and early evolutionary properties of novel genes in general and de novo
genes in particular remain unclear. We use comparative genomics and transcriptomics to analyse novel
genes found in populations of the three-spined stickleback, Gasterosteus aculeatus, an ecological
model species. Across nine geographically distinct populations, we detected 991 expressed novel genes,
many of which have likely emerged de novo. Novel genes tend to be expressed in fewer populations,
fewer tissues, and at lower levels compared to older genes. Many more novel than older genes are
expressed exclusively in gonads. Structural properties of novel proteins such as disorder and aggregation
propensity do not differ from those of older proteins, whereas nucleotide sequence properties do.
Interestingly, younger genes are more often associated with genomic regions recently duplicated or
deleted (i.e. copy number variations, CNVs), suggesting a role of CNVs in novel gene emergence. We
also find novel genes to be less strongly methylated than older genes, hinting at a role of methylation
in novel gene emergence, e.g. in regulation of expression. Taken together, our findings suggest that
novel genes, including de novo genes, typically start out as transcripts with low expression and high
tissue specificity. Although such young, narrowly expressed genes are likely to be lost rapidly,
those that survive can subsequently spread through populations, gaining broader and higher
expression.