In recent years, evidence has been accumulating for the de novo emergence of
protein coding genes, i.e. their origin from non-coding DNA. Here, we review
the current literature and summarise the state of the field. We focus specifically
on open questions and challenges in the study of de novo protein coding genes
such as the identification and verification of de novo emerged genes. The greatest
obstacle to date is the lack of high-quality genomic data with very short divergence
times, which could help pin down precisely where a de novo gene originated.
We conclude that, while there is plenty of evidence from a genetics perspective,
functional studies of bona fide de novo genes are lacking, and almost nothing is
known about protein structures and how they come about during the
emergence of de novo protein coding genes. We suggest that future studies should
concentrate on the functional and structural characterisation of de novo protein
coding genes as well as the detailed study of the emergence of functional de novo
protein coding genes.