Dohmen E, Klasberg S, Bornberg-Bauer E, Perrey S and Kemena C
The modular nature of protein evolution: Domain rearrangement rates across eukaryotic life
BMC Evolutionary Biology, 2020


Modularity is important for evolutionary innovation. The recombination of existing units to form larger complexes with new functionalities spares the need to create novel elements from scratch.
In proteins, this principle can be observed at the level of protein domains, functional subunits which are regularly rearranged to acquire new functions.

In this study we analyse the mechanisms leading to new domain arrangements in five major eukaryotic clades (vertebrates, insects, fungi, monocots and eudicots) at unprecedented depth and breadth.
This allows us, for the first time, to directly compare rearrangement rates between clades and to identify both lineage-specific and general patterns of evolution in the context of domain rearrangements.
We analyse arrangement changes along phylogenetic trees by reconstructing ancestral domain content in combination with feasible single-step events, such as fusion or fission.
Using this approach, we explain up to 70% of all rearrangements by tracing them back to their precursors.
We find that the rates in general, and the ratios between these rates within a given clade in particular, are highly consistent across all clades.
In agreement with previous studies, fusions are the most frequent event leading to new domain arrangements.
A lineage-specific pattern in fungi reveals exceptionally high loss rates compared to other clades, supporting recent studies that highlight the importance of loss for evolutionary innovation. Furthermore, our methodology allows us to link domain emergences at specific nodes in the phylogenetic tree to important functional developments, such as the origin of hair in mammals.
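
The idea of tracing a new arrangement back to its precursors through a single-step event can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name `classify_event`, the representation of an arrangement as a tuple of domain names, and the simplified definitions of fusion (concatenation of two ancestral arrangements) and fission (a terminal piece of one ancestral arrangement) are all assumptions made for illustration.

```python
from typing import Iterable, Tuple

# Assumption: a domain arrangement is an ordered tuple of domain names.
Arrangement = Tuple[str, ...]


def classify_event(child: Arrangement, parent_arrangements: Iterable[Arrangement]) -> str:
    """Explain `child` by one single-step event from the ancestral arrangements.

    Hypothetical, simplified definitions (not taken from the paper):
      fusion  -- child is the concatenation of two ancestral arrangements
      fission -- child is a proper prefix or suffix of one ancestral arrangement
    """
    parents = set(parent_arrangements)
    if not child:
        return "unexplained"
    if child in parents:
        return "unchanged"

    # Fusion: try every split point and look for both halves in the ancestor.
    for i in range(1, len(child)):
        if child[:i] in parents and child[i:] in parents:
            return "fusion"

    # Fission: child is one end of a longer ancestral arrangement.
    for parent in parents:
        if len(parent) > len(child):
            if parent[:len(child)] == child or parent[-len(child):] == child:
                return "fission"

    return "unexplained"


ancestral = [("A", "B"), ("C",), ("D", "E", "F")]
print(classify_event(("A", "B", "C"), ancestral))  # fusion
print(classify_event(("D", "E"), ancestral))       # fission
print(classify_event(("X", "Y"), ancestral))       # unexplained
```

Arrangements that no single-step event accounts for fall into the "unexplained" category, which corresponds to the share of rearrangements that cannot be traced back to precursors.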

Our results demonstrate that domain rearrangements are based on a canonical set of genetic operations with rates which lie within a relatively narrow and consistent range.
In addition, knowledge of these rates provides a basis for advanced domain-based methodologies for phylogenetics and homology analysis, which complement current sequence-based methods.