The Molecular and Genomic Basis of Phenotypic Innovations:
We are primarily concerned with the question of how molecular changes cause new phenotypes to emerge. Our main goal is to understand, from as many angles as possible, how protein evolution works. Furthermore, we want to understand which of the zillions of possible evolutionary paths at the molecular level are facilitated or hindered by biophysical constraints which govern structure and function of proteins and RNAs. Accordingly, we use computational simulations, transcriptomics (aka RNAseq), genomic data analyses and wet-lab directed evolution experiments to investigate the phenotypic effects of genotypic changes.
In particular, we aim to understand evolvability (many mutations of a genotype create novel phenotypes), robustness (most mutations preserve the phenotype), lock-in (the evolutionary history constrains the evolvability of a molecule), canalisation (the evolutionary history constrains phenotypic plasticity), epistatic ratchets (two or more neutral or deleterious mutations become beneficial in concert) and the role of promiscuity (multi-functionality) in escaping adaptive conflicts (a molecule has two or more fitness-relevant functions but cannot optimise them simultaneously).
A long-standing riddle in molecular evolution concerns the question of how proteins, forming so many shapes and exercising so many fundamentally different functions, can evolve into new proteins with new structures and functions. Several models try to explain such innovations without depriving an organism of the existing, possibly essential function exerted by the "old" protein. The most popular models include neofunctionalization, subfunctionalization (SUBF) by degenerative mutations, and dosage models, all focusing on adaptation after gene duplication. “Escape-from-Adaptive-Conflict” partially resolves this riddle by including adaptive processes before and after gene duplication that lead to multifunctional proteins (promiscuous enzymes) and divergence (SUBF).
However, potentially beneficial changes are vastly outnumbered by those that are detrimental. Therefore, any protein also has to be robust to mutation. The success of this balancing act between adaptability and the maintenance of general functionality ultimately determines which mutations are fixed during adaptive evolution. In other words: how do evolving genes (or their encoded proteins) change their dominant and/or latent function(s) without ever getting stuck in fitness minima? Recent results illustrate the importance of sub-optimal or promiscuous functions for the adaptation of protein-coding genes toward new functions. Unfortunately, this renders any modelling of fitness landscapes (and therefore rational design) incredibly complicated.
We developed a theoretical framework that uses biophysical principles to infer the roles of functional promiscuity, gene dosage, gene duplication, point mutations, and selection pressures. We find that selection pressures and duplication rates alone can determine which scenario will prevail. Multi-functionality becomes a crucial advantage when gene duplications are rare, and an increase in mutational robustness, not necessarily functional optimization, can be the sole driving force behind SUBF. Overall, this is the first model in which all three processes are unified, and it demonstrates that, given a certain rate of gene duplications and point mutations, selection pressure determines which processes will prevail. Furthermore, by mapping both RNA and protein-like models on a unified landscape with tunable neighborhood properties, we demonstrate that the relationship between robustness and evolvability depends critically on the ratio of viable mutations which are neutral (coding for the same phenotype) to those which are innovative (coding for a new phenotype).
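The distinction between neutral and innovative mutations can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the framework described above: the four-letter alphabet, the function names and the toy genotype-phenotype map `toy_phenotype` are all hypothetical. It classifies every single-point mutant of a genotype as non-viable, neutral or innovative and reports the fraction of neutral neighbours (a simple proxy for robustness) and the number of distinct new phenotypes reached (a simple proxy for evolvability).

```python
from typing import Callable, Dict, Hashable, Iterator, Optional

ALPHABET = "ACGU"  # assumed RNA-like alphabet, for illustration only


def point_mutants(genotype: str, alphabet: str = ALPHABET) -> Iterator[str]:
    """Yield every genotype that differs from `genotype` at exactly one position."""
    for i, old in enumerate(genotype):
        for new in alphabet:
            if new != old:
                yield genotype[:i] + new + genotype[i + 1:]


def classify_neighbours(
    genotype: str,
    phenotype_of: Callable[[str], Optional[Hashable]],
    alphabet: str = ALPHABET,
) -> Dict[str, float]:
    """Count neutral, innovative and non-viable single-point mutants.

    `phenotype_of` returns None for a non-viable genotype.
    """
    reference = phenotype_of(genotype)
    neutral = innovative = nonviable = 0
    new_phenotypes = set()
    for mutant in point_mutants(genotype, alphabet):
        phenotype = phenotype_of(mutant)
        if phenotype is None:
            nonviable += 1
        elif phenotype == reference:
            neutral += 1
        else:
            innovative += 1
            new_phenotypes.add(phenotype)
    total = neutral + innovative + nonviable
    return {
        "neutral": neutral,
        "innovative": innovative,
        "nonviable": nonviable,
        "robustness": neutral / total,            # fraction of neutral neighbours
        "new_phenotypes": len(new_phenotypes),    # distinct phenotypes one step away
    }


def toy_phenotype(genotype: str) -> Optional[int]:
    """Hypothetical map: non-viable without an 'A'; otherwise the G count, capped at 2."""
    if "A" not in genotype:
        return None
    return min(genotype.count("G"), 2)


summary = classify_neighbours("AGCU", toy_phenotype)
print(summary)
```

In this toy map, the ratio `neutral / innovative` among viable neighbours is the quantity that would be tuned to explore how robustness and evolvability relate; a realistic study would replace `toy_phenotype` with a folding or stability model.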
People: Berndjan Eenink, Margaux Aubel
Funding: BBSRC (2002 - 2005), DAAD (2006 -- 2007), HFSP (2013 - 2017), EC Horizon 2020 ITN (2017-2020).
Techniques employed: computational: ancestor reconstruction, phylogenies, calculation of stability effects of mutations (e.g. FoldX), simulations of population dynamics, disorder prediction; experimental: high-throughput functional molecular screening using in-cell assays and "lab-on-a-chip" micro-droplets (Hollfelder group); experimental measurement of stability and structural dis-/order (CD); detection assays, cloning, (over-)expression, purification, SDS-PAGE, E. coli autodisplay (Jose), differential scanning fluorimetry to study unfolding / TD stability, ko-libraries such as ASKA (E. coli).
Modularity is a hallmark of molecular evolution: whether considering gene regulation, the components of metabolic pathways or signalling cascades, the ability to reuse autonomous modules in different molecular contexts can expedite evolutionary innovation. Over time scales of several hundred million years, the evolution of protein-coding genes is dominated by the modular rearrangement of protein domains, their evolutionary, structural and functional units. While a small core of arrangements is universal, a large fraction of multi-domain arrangements is species-specific and has been created recently via gene duplication, fusion and terminal loss of domains. Surprisingly, thousands of domains are lost completely (i.e. all copies within a genome) from genomes along every lineage in a stochastic manner, i.e. at a fairly constant rate of ca. four domains per million years.
Novel domains are rarely fixed and arise either as their own genes or terminally, by extension of existing reading frames. Novel domains are under strong selection pressure and confer a strong fitness value as they rapidly attain high copy numbers within the genomes and are involved in biotic defence, reproduction and development.
Using cross-species genomic comparisons and population genomics we investigate the genetic mechanisms and biophysical constraints of domain emergence and develop algorithms for rapid screening of many genomes, understanding their phylogenies and comparing, aligning and clustering sequences. This is possible because of complexity reduction (sequences can be characterised as linear arrangements of 5-6 domains drawn from an alphabet of several thousand characters as opposed to ca. 500 amino acids drawn from an alphabet of 20) and the maintenance of linear order and the high reliability of HMMs which characterise the domains.
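To illustrate why this complexity reduction helps, the sketch below aligns two proteins as short lists of domain identifiers rather than as amino-acid strings. It is a plain Needleman-Wunsch global alignment, offered as a minimal example and not as the group's own algorithm; the scoring values (`match`, `mismatch`, `gap`) and the example domain names are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

GAP = "-"  # symbol used for a missing domain in the alignment


def align_arrangements(
    a: Sequence[str],
    b: Sequence[str],
    match: int = 2,
    mismatch: int = -1,
    gap: int = -1,
) -> Tuple[int, List[str], List[str]]:
    """Globally align two domain arrangements (lists of domain identifiers)."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two domains
                score[i - 1][j] + gap,            # domain of a against a gap
                score[i][j - 1] + gap,            # domain of b against a gap
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a: List[str] = []
    aligned_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append(GAP)
            i -= 1
        else:
            aligned_a.append(GAP)
            aligned_b.append(b[j - 1])
            j -= 1
    aligned_a.reverse()
    aligned_b.reverse()
    return score[n][m], aligned_a, aligned_b


# Illustrative arrangements; the second lacks the N-terminal domain.
best, row_a, row_b = align_arrangements(["SH3", "SH2", "Pkinase"], ["SH2", "Pkinase"])
print(best, row_a, row_b)
```

Because each protein is reduced to a handful of symbols, the dynamic-programming table has only a few dozen cells, compared with hundreds of thousands for a residue-level alignment of two proteins of ca. 500 amino acids.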
We test the potential of increased evolvability by rearranged domains experimentally in biochemical pathways, for the defence against pathogens and during developmental processes.
The software we develop is available here: https://domainworld.uni-muenster.de/
People involved: Carsten Kemena, Steffen Klasberg, Elias Dohmen.
Funding: Volkswagen Foundation (2 x, 2009 -- 2013); DFG (2009 -- 2013).
Techniques employed: design of Hidden Markov Models, suffix arrays, recursive dynamic programming, Approximate Bayesian Computation, parsimony and maximum likelihood, design and implementation of string algorithms, databases and interactive graphical interfaces; dN/dS ratio tests (PAML); experimental (planned): see also P1.
With every new genome sequenced, a couple of hundred proposed genes remain "orphans" because computational methods cannot assign any orthologs, even in closely related and well-annotated species. Presumably many of these (lineage-specific) genes are transcribed and sometimes translated, and their proteins are functional and adaptive, at least under some (possibly unknown) conditions. De novo emergence not only contradicts the current belief that most novel genes emerge from old ones, it is also difficult to reconcile with a biophysical perspective, because novel reading frames emerging from previously non-coding matter must be considered extremely unlikely: they would most likely be disordered, aggregate and thus be deleterious or, at least, be purged for purely energetic reasons. So, where do new coding genes actually come from, how do they function and how is their potentially detrimental expression regulated?
We ask where novel protein-coding genes come from and how genomic novelties and rearrangements trigger adaptation and spur developmental transitions. Using comparative genomics and biophysical analyses (computational and experimental), we test their properties and functions. We found that most genetic novelty comes from novel domains, but many completely new reading frames also emerge, e.g. across the insect tree, with an estimated frequency of 500 new genes in the wake of each speciation event. The former process has been termed "grow slow and moult" because some novel domains later lose their initially stabilising parent protein and become independent and amenable to further rearrangements.
We concentrate on some major transitions which happened during the development of extant life forms: signalling across multicellular organisms, placentation in mammals, the emergence of holometabolism in insects and the onset and reversal of ageing.
Furthermore, to catch novel genes "in the act" of emergence, we investigate genomes not only between species but also within populations and, as an outgroup, from their closely related sister species. Using gene and domain prediction programmes, we determine novel ORFs and their expression (RNAseq) and, if necessary, confirm them, e.g. with (long-read and primer walking) PCR and qPCR. We are currently screening several systems (populations of fish, mice, flies, and humans) to achieve good genomic coverage for detecting possible recent emergence. We then reconstruct ancestral sequences, which can be tested for their genetic origin, and investigate their structural and biophysical properties with the help of TSA, CD, NMR, and phage display experiments. Additionally, we aim to compare the behaviour of the predicted ancestral de novo gene with that of the existing one using in vitro Drosophila experiments.
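As a rough illustration of what identifying a candidate ORF involves, the sketch below scans the three forward reading frames of a DNA string for stretches that begin with ATG and end at the first in-frame stop codon. It is a deliberately simplified toy and not a substitute for the gene and domain prediction programmes mentioned above: it ignores the reverse strand, introns and expression evidence, and the function name and the `min_codons` threshold are assumptions made for this example.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> List[Tuple[int, int, int]]:
    """Return (start, end, frame) for forward-strand ORFs.

    `start` is the index of the ATG, `end` is exclusive and includes the stop
    codon, and `min_codons` counts codons before the stop (ATG included).
    """
    dna = dna.upper()
    orfs: List[Tuple[int, int, int]] = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # close the ORF and look for the next ATG
    return orfs


sequence = "CCATGAAACCCGGGTAGTT"  # made-up sequence for illustration
orfs = find_orfs(sequence)
for start, end, frame in orfs:
    print(frame, start, end, sequence[start:end])
```

Candidates found this way in one population but absent as intact reading frames in the sister species would then be the ones worth checking for expression and confirming experimentally.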
People: Andreas Lange, Anna Grandchamp, Brennen Heames, Daniel Dowling, Hanna Kuß, Margaux Aubel
Funding: Leibniz Gemeinschaft (2013 -- 2016); Horizon 2020 Research and Innovation Framework Programme No. 722610 (2017 -- 2021); Volkswagen Stiftung (2021 -- 2026)
Techniques employed: Computational: comparative genomics, differential GO analysis, biophysical predictions (disorder, secondary structure, hydrophobic clusters), ancestral reconstruction and phylogenies, mutational effects on stability (FoldX, Rosetta); experimental: deep sequencing, qPCR; antibody staining; cloning, (over-)expression, purification; SDS-PAGE expression quantification; E. coli autodisplay (Jose); in-situ hybridisation; CD; stability measures; in-cell NMR (Selenko); pull-down assays (Ivarsson); in vitro expression of ancestral de novo genes (Findlay); MD simulations (Jackson); TSA; FRET-FACS (Hlouchova).
Sociality is considered to be one of the major transitions in evolution, but little is known about its underlying genomic basis and the associated selectable traits. Social insects are an excellent study object because their genomes are relatively simple to analyse and many speciation events gave rise to morphologically and ecologically diverse species within a relatively short time period. Furthermore, in insects, sociality has independently evolved at least twice, in Hymenoptera (comprising ants and bees) and in termites. Both groups also show a striking reversal of the otherwise widespread trade-off between longevity and fecundity. Moreover, loss and/or reversal of social behaviour has been observed, since several ants parasitise or enslave the colonies of evolutionarily very closely related species.
We have investigated genomes and transcriptomes of several insects, most of which are either social or closely related outgroups thereof. Using standard bioinformatic techniques and several of our in-house algorithms, we identified horizontally transferred genes from bacterial parasites, several genes under adaptation and novel genes which were instrumental for the ecological success of social insects. We also uncovered convergent gene losses which accompanied the transition to social parasitism in ants, characterised by the loss of certain social behaviours in the parasites. In comparing social with non-social insects, we found that both novel genes and the rewiring of regulatory networks play a major role in the regulation of sociality. We also consider epigenetic marks, the effects of methylation/acetylation and the effects of small regulatory RNAs on the differentiation of individuals during their development, in order to test the possible role of epigenetic marks in general and their effect on de novo genes and on rearranged genes.
People involved: Mark Harrison, Alice Séguret, Elias Dohmen, Bertrand Fouks, Alina Mikhailova
Funding: DFG (2015 - 2019, 2017 - 2022); HFSP (2018-2021); ANR/DFG (2020-2023); MSCA-IF (2021-2023)
Techniques employed: cuffsuite; 454, Illumina, PacBio, ONP; Canu, MARVEL, ALLPATHS, SOAPdenovo, MIRA, Platanus, SPAdes, MaSuRCA, Pilon, finisherSC; MAKER; BUSCO, DOGMA; DomRates; PorthoDOM, Proteinortho, OrthoFinder; RAxML, CodeML/PAML and many others.
It has long been assumed that the blueprint for an organism lies entirely in its genome. Advances over the last decade have demonstrated an ever-increasing role of variation which occurs at the level of populations between almost identical individuals, between cells in the same tissue and so forth. Deciphering the roles of these variations has become increasingly important for pushing the limits of genomics further and for improving the understanding of how biological novelty arises.
We are using the methods and insights from the other projects to understand the emergence of evolvability and robustness, and the creation and maintenance of genetic diversity, from the perspective of protein-coding genes, their duplications and rearrangements.