Dohmen E, Klasberg S, Bornberg-Bauer E, Perrey S and Kemena C
The modular nature of protein evolution: Domain rearrangement rates across eukaryotic life
, 2018


Modularity is important for evolutionary innovation. The recombination of existing
units to form larger complexes with new functionalities spares the need to create novel
elements from scratch. In proteins, this principle can be observed at the level of protein
domains, functional subunits which are regularly rearranged to acquire new functions.
In this study we analyse the mechanisms leading to new domain arrangements in five
major eukaryotic clades (vertebrates, insects, fungi, monocots and eudicots) at
unprecedented depth and breadth. This allows us, for the first time, to directly compare
rates of rearrangements between different clades and to identify both lineage-specific and
general patterns of evolution in the context of domain rearrangements. We analyse
arrangement changes along phylogenetic trees by reconstructing ancestral domain
content in combination with feasible single-step events, such as fusion or fission. Using
this approach we explain up to 70% of all rearrangements by tracing them back to their
precursors. We find that the rates in general, and the ratios between these rates for a given
clade in particular, are highly consistent across all clades. In agreement with previous
studies, fusions are the most frequent event leading to new domain arrangements. A
lineage-specific pattern in fungi reveals exceptionally high loss rates compared to other
clades, supporting recent studies highlighting the importance of loss for evolutionary innovation.
Furthermore, our methodology allows us to link domain emergences at specific nodes
in the phylogenetic tree to important functional developments, such as the origin of hair
in mammals. Our results demonstrate that domain rearrangements are based on a
canonical set of genetic operations with rates which, if appropriately measured, lie
within a relatively narrow and consistent range. In addition, knowledge of
these rates provides a basis for advanced domain-based methodologies for phylogenetics
and homology analysis which complement current sequence-based methods.
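
To illustrate what tracing a new arrangement back to its precursors through a single-step event can look like, the following is a minimal sketch and not the authors' implementation. It assumes that a domain arrangement is represented as a tuple of domain names, that the arrangements at a parent node and a child node of the tree are already known as sets, and that fission is distinguished from terminal loss by whether the split-off remainder is also present at the child node. The function name `classify_event` and the event labels are hypothetical.

```python
def classify_event(new, parent_set, child_set):
    """Classify how arrangement `new` (a non-empty tuple of domain names)
    at a child node could arise from the arrangements at its parent node.

    Assumption: only single-step events are considered, and all names
    and labels here are illustrative.
    """
    if new in parent_set:
        return "unchanged"

    # Fusion: `new` is the concatenation of two parent arrangements.
    for i in range(1, len(new)):
        if new[:i] in parent_set and new[i:] in parent_set:
            return "fusion"

    # Fission or terminal loss: `new` is a proper prefix or suffix of a
    # parent arrangement. If the remainder also exists at the child node
    # we call it a fission, otherwise a terminal loss.
    n = len(new)
    loss_candidate = False
    for parent in sorted(parent_set):  # sorted for deterministic order
        if len(parent) <= n:
            continue
        remainders = []
        if parent[:n] == new:
            remainders.append(parent[n:])
        if parent[-n:] == new:
            remainders.append(parent[:-n])
        for rest in remainders:
            if rest in child_set:
                return "fission"
            loss_candidate = True

    return "terminal loss" if loss_candidate else "unexplained"


# Hypothetical example data: arrangements at a parent and a child node.
parent_node = {("A", "B"), ("C",), ("D", "E", "F")}
child_node = {("A", "B", "C"), ("D", "E"), ("F",), ("A",), ("X", "Y")}

events = {arr: classify_event(arr, parent_node, child_node)
          for arr in sorted(child_node)}
```

In this toy data, `("A", "B", "C")` is explained as a fusion of `("A", "B")` and `("C",)`, while `("X", "Y")` has no single-step precursor and remains unexplained. Counting such labels per branch is one simple way to arrive at per-clade event frequencies of the kind compared in the study.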