Klasberg, S, Bitard-Feildel, T, Callebaut, I and Bornberg-Bauer, E
Origins and Structural Properties of Novel and De Novo Protein Domains During Insect Evolution.
Submitted, 2017

Over long evolutionary time scales, protein evolution is characterised
by modular rearrangements of protein domains. Such rearrangements are
mainly caused by gene duplication, fusion and terminal losses. Accordingly,
extant proteomes contain fewer domains than ancient ones but in longer
and more variable combinations per protein. Novel domains emerge very
rarely but rapidly multiply, probably because of their adaptive benefits. To
better understand domain emergence mechanisms we investigated 32 insect
genomes covering a speciation gradient ranging from ∼2 to ∼390 million years. We
use established domain models and complementary Hydrophobic-Cluster-Analysis
(HCA), which does not require homologous sequences, to also
identify domains which have likely arisen de novo, i.e., from previously
non-coding DNA. Our results indicate that most novel domains emerge
terminally, as they originate from ORF extensions, while far fewer arise in
middle arrangements, resulting from exonisation of intronic or intergenic
regions. Many novel domains rapidly migrate between terminal or middle
positions and single- and multi-domain arrangements. Younger domains,
such as most HCA-defined domains, are under stronger selection pressure
as they show signals of purifying selection. Young novel domains, either
linked to ancient domains or defined by HCA, and older novel domains have
higher degrees of intrinsic disorder and disorder-to-order transition upon
binding. However, the originating DNA sequence of de novo domains could
only rarely be found in sister genomes. We conclude that novel domains are
often recruited by other proteins and undergo important structural
modifications shortly after their emergence, but evolve too fast to be characterised
by cross-species comparisons alone.