• The Molecular and Genomic Basis of Phenotypic Innovations:

    We are primarily concerned with the question of how molecular changes cause new phenotypes to emerge. Our main goal is to understand, from as many angles as possible, how protein evolution works. Furthermore, we want to understand which of the zillions of possible evolutionary paths at the molecular level are facilitated or hindered by biophysical constraints which govern structure and function of proteins and RNAs. Accordingly, we use computational simulations, transcriptomics (aka RNAseq), genomic data analyses and wet-lab directed evolution experiments to investigate the phenotypic effects of genotypic changes.

    In particular, we aim to understand evolvability (many mutations of a genotype create novel phenotypes), robustness (most mutations will preserve the phenotype), lock-in (the evolutionary history constrains the evolvability of a molecule), canalisation (the evolutionary history constrains phenotypic plasticity), epistatic ratchets (two or more neutral or deleterious mutations become beneficial in concert) and the role of promiscuity (multi-functionality) in escaping adaptive conflicts (a molecule has two or more fitness-relevant functions but cannot optimise both simultaneously).

    Research areas

  • P1: Evolvability, Epistasis and Molecular Robustness in Protein Evolution

    A long-standing riddle in molecular evolution concerns the question of how proteins, forming so many shapes and exercising so many fundamentally different functions, can evolve into new proteins with new structures and functions. Several models try to explain such innovations without depriving an organism of its existing, possibly essential function exerted by the "old" protein. The most popular models include neofunctionalization, subfunctionalization (SUBF) by degenerative mutations, and dosage models, all focusing on adaptation after gene duplication. "Escape-from-Adaptive-Conflict" partially resolves this riddle by including adaptive processes before and after gene duplication that lead to multifunctional proteins (promiscuous enzymes) and divergence (SUBF).

    However, potentially beneficial changes are vastly outnumbered by those that are detrimental. Therefore, any protein also has to be robust to mutation. The success of this balancing act between adaptability and the maintenance of general functionality ultimately determines which mutations are fixed during adaptive evolution. In other words: how do evolving genes (or their encoded proteins) change their dominant and/or latent function(s) without ever getting stuck in fitness minima? Recent results illustrate the importance of sub-optimal or promiscuous functions for the adaptation of protein-coding genes toward new functions. Unfortunately, this renders any modelling of fitness landscapes (and therefore rational design) incredibly complicated.

    We developed a theoretical framework that uses biophysical principles to infer the roles of functional promiscuity, gene dosage, gene duplication, point mutations, and selection pressures. We find that selection pressures and duplication rates alone can determine which scenario will prevail. Multi-functionality becomes a crucial advantage when gene duplications are rare, and an increase in mutational robustness, not necessarily functional optimization, can be the sole driving force behind SUBF. Overall, this is the first model in which all three processes are unified, and it demonstrates that, given a certain rate of gene duplications and point mutations, selection pressure determines which processes will prevail. Furthermore, by mapping both RNA and protein-like models on a unified landscape with tunable neighborhood properties, we demonstrate that the relationship between robustness and evolvability depends critically on the ratio of viable mutations which are neutral (coding for the same phenotype) to those which are innovative (coding for a new phenotype).
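
    The distinction between neutral and innovative mutations can be made concrete on a toy genotype-phenotype map. The sketch below is only an illustration and not the framework described above: the two-letter alphabet, the `toy_phenotype` rule and all function names are hypothetical. Robustness and evolvability are measured in one common, simplified way, as the fraction of neutral single-point mutants and the number of distinct new phenotypes reachable by one mutation.

```python
ALPHABET = "HP"  # hypothetical two-letter alphabet (assumption, for illustration)


def toy_phenotype(genotype):
    """Hypothetical genotype-phenotype rule, invented for this sketch.

    Returns None for a non-viable genotype, otherwise a phenotype label.
    """
    if genotype.count("H") < 2:
        return None  # treated as non-viable
    return "compact" if "HH" in genotype else "extended"


def point_mutants(genotype, alphabet=ALPHABET):
    """Yield every genotype that differs from `genotype` at exactly one site."""
    for i, old in enumerate(genotype):
        for new in alphabet:
            if new != old:
                yield genotype[:i] + new + genotype[i + 1:]


def classify_neighbourhood(genotype, phenotype_of=toy_phenotype):
    """Count neutral, innovative and non-viable single-point mutants."""
    own = phenotype_of(genotype)
    neutral = innovative = nonviable = 0
    new_phenotypes = set()
    for mutant in point_mutants(genotype):
        phenotype = phenotype_of(mutant)
        if phenotype is None:
            nonviable += 1
        elif phenotype == own:
            neutral += 1
        else:
            innovative += 1
            new_phenotypes.add(phenotype)
    total = neutral + innovative + nonviable
    return {
        "neutral": neutral,
        "innovative": innovative,
        "nonviable": nonviable,
        # robustness: fraction of all point mutants that keep the phenotype
        "robustness": neutral / total,
        # evolvability: number of distinct new phenotypes one mutation away
        "evolvability": len(new_phenotypes),
        # ratio of neutral to innovative mutants among the viable ones
        "neutral_to_innovative": neutral / innovative if innovative else None,
    }


if __name__ == "__main__":
    print(classify_neighbourhood("HHPH"))
```

    Under this toy rule, the genotype `HHPH` has two neutral and two innovative point mutants, whereas `HHPP` has two neutral and two non-viable ones. Both have the same robustness but differ in evolvability.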

    People:  Berndjan Eenink, Margaux Aubel


    Funding: BBSRC (2002 - 2005), DAAD (2006 -- 2007), HFSP (2013 - 2017), EC Horizon 2020 ITN (2017-2020).

    Related publications:

    • Bornberg-Bauer E. How are model protein structures distributed in sequence space? Biophys.J. 1997 [Pubmed]
    • E Bornberg-Bauer and HS Chan Modeling Evolutionary Landscapes: Mutational Stability, Topology and Superfunnels in Sequence Space. PNAS 1999 [Online access]
    • Y Cui et al. Recombinatoric exploration of novel folded structures: A heteropolymer-based model of protein evolutionary landscapes. PNAS 2002 [Online access]
    • R Wroe et al. A Structural Model of Latent Evolutionary Potentials Underlying Neutral Networks in Proteins. HFSP J 2007 [Online access]
    • D Whitehead et al. The look-ahead effect of phenotypic mutations. Biol. Dir. 2008 [Online access]
    • E Bornberg-Bauer et al. How do new proteins arise? Curr Opn Struct Biol. 2010 [Online access]
    • T Sikosek, HS Chan, E Bornberg-Bauer. Escape from Adaptive Conflict follows from weak functional trade-offs and mutational robustness. PNAS 2012 [Online access]
    • Bert van Loo, Magdalena Heberlein, Philip Mair, Anastasia Zinchenko, Jan Schüürmann, Bernard D G Eenink, Josephin M Holstein, Carina Dilkaute, Joachim Jose, Florian Hollfelder, Erich Bornberg-Bauer
      High-Throughput, Lysis-Free Screening for Sulfatase Activity Using Escherichia coli Autodisplay in Microdroplets, ACS Synthetic Biol, 8 (12), 2690-2700, [Online access]
    • Gloria Yang, Dave W Anderson, Florian Baier, Elias Dohmen, Nansook Hong, Paul D Carr, Shina Caroline Lynn Kamerlin, Colin J Jackson, Erich Bornberg-Bauer, Nobuhiko Tokuriki
      Higher-order Epistasis Shapes the Fitness Landscape of a Xenobiotic-Degrading Enzyme
      Nat Chem Biol, 15 (11), 1120-1128, [Online access]

    Techniques employed: computational: ancestral reconstruction, phylogenies, calculation of stability effects of mutations (e.g. FoldX), simulations of population dynamics, disorder prediction; experimental: high-throughput functional molecular screening using in-cell assays and "lab-on-a-chip" micro-droplets (Hollfelder group); experimental measurement of stability and structural dis-/order (CD); detection assays, cloning, (over-)expression, purification, SDS-PAGE, E. coli autodisplay (Jose), differential scanning fluorimetry to study unfolding and thermodynamic stability, KO libraries such as ASKA (E. coli).

  • P2: Algorithms and Evolutionary Dynamics of Modular Protein Evolution

    Modularity is a hallmark of molecular evolution. Whether considering gene regulation, the components of metabolic pathways or signalling cascades, the ability to reuse autonomous modules in different molecular contexts can expedite evolutionary innovation. Over time scales of several hundred million years, the evolution of protein-coding genes is dominated by the modular rearrangement of protein domains, their evolutionary, structural and functional units. While a small core of arrangements is universal, a large fraction of multi-domain arrangements is species-specific and has been created recently via gene duplication, fusion and terminal loss of domains. Surprisingly, thousands of domains are completely lost from genomes (i.e. all copies within a genome) along every lineage in a stochastic manner, at a fairly constant rate of ca. four domains per million years.

    Novel domains are rarely fixed and arise either as their own genes or terminally, by extension of existing reading frames. Novel domains are under strong selection pressure and confer a strong fitness value as they rapidly attain high copy numbers within the genomes and are involved in biotic defence, reproduction and development.
    Using cross-species genomic comparisons and population genomics, we investigate the genetic mechanisms and biophysical constraints of domain emergence and develop algorithms for rapid screening of many genomes, understanding their phylogenies and comparing, aligning and clustering sequences. This is possible for three reasons: complexity reduction (a sequence can be characterised as a linear arrangement of 5-6 domains drawn from an alphabet of several thousand characters, as opposed to ca. 500 amino acids drawn from an alphabet of 20), the maintenance of linear order, and the high reliability of the HMMs which characterise the domains.
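
    The complexity reduction can be illustrated by treating a protein as a short list of domain identifiers and comparing two proteins with a standard edit distance over that domain alphabet. This is a minimal sketch under stated assumptions, not the group's software: the function name `domain_edit_distance`, the unit costs and the example arrangements are hypothetical, and plain Levenshtein distance stands in for whatever alignment scheme the actual tools use.

```python
def domain_edit_distance(a, b):
    """Levenshtein distance between two domain arrangements.

    `a` and `b` are sequences of domain identifiers (one symbol per domain).
    Insertion, deletion and substitution of a whole domain each cost 1
    (an assumption made for this sketch).
    """
    # prev[j] holds the distance between the first i-1 domains of a
    # and the first j domains of b.
    prev = list(range(len(b) + 1))
    for i, dom_a in enumerate(a, start=1):
        curr = [i]
        for j, dom_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if dom_a == dom_b else 1)
            deletion = prev[j] + 1       # drop dom_a
            insertion = curr[j - 1] + 1  # gain dom_b
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Illustrative arrangements only; not taken from any real protein.
    ancestor = ["SH3", "SH2", "Pkinase"]
    terminal_loss = ["SH2", "Pkinase"]
    terminal_gain = ["SH3", "SH2", "Pkinase", "C1"]
    print(domain_edit_distance(ancestor, terminal_loss))
    print(domain_edit_distance(ancestor, terminal_gain))
```

    Because an arrangement holds only a handful of symbols, the quadratic dynamic programme is cheap even across many genomes, which is the practical point of working at the domain level rather than the residue level.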

    We test the potential of increased evolvability by rearranged domains experimentally in biochemical pathways, for the defence against pathogens and during developmental processes.

    The software we develop is available here:

    People involved: Carsten Kemena, Steffen Klasberg, Elias Dohmen.


    Funding: Volkswagen Foundation (2 x, 2009 -- 2013); DFG (2009 -- 2013).

    Related Publications:

    • E. Bornberg-Bauer Computational Approaches to Identify Leucine Zippers. Nucleic Acids Res. 1998 [Online access]
    • J. Weiner, G. Thomas and E. Bornberg-Bauer Rapid Motif-Based Prediction of Circular Permutations in Multi-Domain Proteins. Bioinformatics 2005 [Online access]
    • J. Weiner and E. Bornberg-Bauer Evolution of Circular Permutations in Multi-Domain Proteins. Molecular Biology and Evolution 2006 [Online access]
    • Andrew D. Moore and Erich Bornberg-Bauer The Dynamics and Evolutionary Potential of Domain Loss and Emergence. Molecular Biology and Evolution 2012 [Online access]
    • E. Bornberg-Bauer and M.Mar Alba; Dynamics and Adaptive Benefits of Modular Protein Evolution, Curr Opn Struct Biol 2013.
    • A. Schüler and E. Bornberg-Bauer, Evolution of Protein Domain Repeats in Metazoa, MolBiolEvol, 2016.
    • E. Dohmen, L.P.M. Kremer, E. Bornberg-Bauer and C. Kemena, DOGMA: Domain-based transcriptome and proteome quality assessment, Bioinformatics, 2016
    • S. Klasberg, T. Bitard-Feildel, I. Callebaut, and E. Bornberg-Bauer, Origins and Structural Properties of Novel and De Novo Protein Domains During Insect Evolution. FEBS Journal, 2018
    • GWC Thomas, E Dohmen et al. The Genomic Basis of Arthropod Diversity, Genome Biology, 2020
    • E. Dohmen, S. Klasberg, E. Bornberg-Bauer, S. Perrey and C. Kemena, The modular nature of protein evolution: Domain rearrangement rates across eukaryotic life, BMC Evolutionary Biology, 2020

    Techniques employed: design of Hidden Markov Models, suffix arrays, recursive dynamic programming, Approximate Bayesian Computation, parsimony and maximum likelihood, design and implementation of string algorithms, databases and interactive graphical interfaces; dN/dS ratio tests (PAML); experimental (planned): see also P1.

  • P3: Evolutionary Origin, Fixation and Functions of de Novo Protein Coding Genes

    With every new genome sequenced, a couple of hundred proposed genes remain ''orphans'' because computational methods cannot assign any orthologs, even in closely related and well-annotated species. Presumably many of these (lineage-specific) genes are transcribed and sometimes translated, and their proteins are functional and adaptive, at least under some (possibly unknown) conditions. De novo emergence not only contradicts the current belief that most novel genes emerge from old ones; it is also difficult to reconcile with a biophysical perspective, because novel reading frames emerging from previously non-coding matter must be considered extremely unlikely: their products would most likely be disordered, aggregate and thus be deleterious or, at least, be purged for purely energetic reasons. So, where do new coding genes actually come from, how do they function and how is their potentially detrimental expression regulated?

    We ask where novel protein-coding genes come from and how genomic novelties and rearrangements trigger adaptation and spur developmental transitions. Using comparative genomics and biophysical analyses (computational and experimental), we test their properties and functions. We found that most genetic novelty comes from novel domains, but many completely new reading frames also emerge, e.g. across the insect tree, with an estimated frequency of 500 new genes in the wake of each speciation event. The former process has been termed ''grow slow and moult'' because some novel domains later lose their initially stabilising parent protein and become independent and amenable to further rearrangements.
    We concentrate on some major transitions which happened during the development of extant life forms: signalling across multicellular organisms, placentation in mammals, the emergence of holometabolism in insects and the onset and reversal of ageing.
    Furthermore, to catch novel genes "in the act" of emergence, we investigate genomes not only between species but also from populations and, as an outgroup, their closely related sister species. Using gene and domain prediction programmes, we determine novel ORFs and their expression (RNAseq) and, if necessary, confirm them, e.g. with (long-read and primer-walking) PCR and qPCR. We are currently screening several systems (populations of fish, mice, flies and humans) to achieve good genomic coverage for detecting possible recent emergence. We also reconstruct ancestral sequences, which can then be tested for their genetic origin, and investigate their structural and biophysical properties with the help of TSA, CD, NMR and phage display experiments. Additionally, we aim to examine the behaviour of the predicted ancestral de novo gene compared to the extant one in in vitro Drosophila experiments.
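
    What "determining novel ORFs" means at its simplest can be shown with a toy scanner that reports every stretch running from an ATG to the next in-frame stop codon. This is a sketch under stated assumptions, not the pipeline used in the project, which relies on dedicated gene and domain prediction programmes. The function name `find_orfs`, the `min_codons` threshold and the example sequence are hypothetical, and only the forward strand is scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Find simple ORFs (ATG ... stop) in the three forward reading frames.

    Returns a list of (start, end, frame) tuples with 0-based `start`,
    exclusive `end` (the stop codon is included) and `frame` in {0, 1, 2}.
    `min_codons` counts the coding codons before the stop; it is a
    hypothetical threshold chosen for this sketch.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the frame codon by codon, ignoring a trailing partial codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    example = "CCATGAAATTTGGGTAACC"  # made-up sequence for illustration
    for start, end, frame in find_orfs(example):
        print(frame, start, end, example[start:end])
```

    Candidate ORFs found this way would still need the expression evidence and outgroup comparison described above before they could be called de novo genes.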

    People:  Andreas Lange, Anna Grandchamp, Brennen Heames, Daniel Dowling, Hanna Kuß, Margaux Aubel


    Funding: Leibniz Gemeinschaft (2013 -- 2016); Horizon 2020 Research and Innovation Framework Programme No. 722610 (2017 -- 2021); Volkswagen Stiftung (2021 -- 2026)

    Related Publications:

    • Lange, A, Patel, PH, Heames, B, Damry, AM, Saenger, T, Jackson, CJ, Findlay, GD, Bornberg-Bauer E; Structural and functional characterization of a putative de novo evolved gene essential for male fertility in Drosophila, Nat Comm, 2021:12(1667), [Online access]
    • Bornberg-Bauer E, Hlouchova, K, Lange A; Structure and Function of Naturally Evolved de novo Proteins, Curr Opn Struct Biol, 2021, [Online access]
    • Bornberg-Bauer, E. and Heames, B.; Becoming a de novo gene; Nature Ecology & Evolution, 2019
    • Schmitz JF, Ullrich K, Bornberg-Bauer E; Incipient de novo genes can evolve from "frozen accidents" which escaped rapid transcript turnover; Nature Ecology and Evolution, 2018, [Online access]
    • Klasberg, S, Bitard-Feildel, T, Callebaut, I and Bornberg-Bauer, E; Origins and Structural Properties of Novel and De Novo Protein Domains During Insect Evolution; FEBS Journal, 2018, [Online access]
    • Schmitz, JF and Bornberg-Bauer E; Fact or fiction: Updates on how protein coding genes might emerge de novo from previously non-coding DNA; F1000Research, 2017, [Online access]
    • Gubala et al. The goddard and saturn genes are essential for Drosophila male fertility and may have arisen de novo; Molecular Biology and Evolution, 2017, [Online access]
    • E Bornberg-Bauer et al. Emergence of de novo proteins from ''dark genomic matter'' by ''grow slow and moult''; Biochemical Society Transactions; 2015, [Online access]
    • T Bitard-Feildel et al. Detection of Orphan Domains in Drosophila using Hydrophobic clustering analysis; Biochimie, 2015, [Online access]
    • L. Wissler et al. Mechanisms and dynamics of orphan gene emergence in insect genomes. Genome Biol Evol. 2013 [Online access]
    • P. Feulner et al. Genome-wide patterns of standing genetic variation in a natural marine population of three-spined sticklebacks. Molecular Ecology 2012 [Online access]
    • F. Chain et al., Extensive copy-number variation of young genes across stickleback populations. PLoS Genet, 2014, [Online access]
    • E Bornberg-Bauer et al. How do new proteins arise?, Curr Opn Struct Biol, 2010, [Online access]


    Techniques employed: computational: comparative genomics, differential GO analysis, biophysical predictions (disorder, secondary structure, hydrophobic clusters), ancestral reconstruction and phylogenies, mutational effects on stability (FoldX, Rosetta); experimental: deep sequencing, qPCR; antibody staining; cloning, (over-)expression, purification; SDS-PAGE expression quantification; E. coli autodisplay (Jose); in-situ hybridisation; CD; stability measures; in-cell NMR (Selenko); pull-down assays (Ivarsson); in vitro expression of ancestral de novo genes (Findlay); MD simulations (Jackson); TSA; FRET-FACS (Hlouchova).

  • G1: Comparative Genomics and Molecular Mechanisms Underlying Sociality, Ageing and Epigenetics in Insects

    Sociality is considered to be one of the major transitions in evolution, but little is known about the underlying genomic basis and the associated selectable traits. Social insects are an excellent study object because their genomes are relatively simple to analyse and many speciation events gave rise to morphologically and ecologically diverse species within a relatively short time period. Furthermore, in insects, sociality has independently evolved at least twice, in Hymenoptera (including ants and bees) and in termites. Both groups also show a striking reversal of the otherwise widespread trade-off between longevity and fecundity. In addition, loss and/or reversal of social behaviour has been observed, since several ants parasitise or enslave the colonies of evolutionarily very closely related species.

    We have investigated genomes and transcriptomes of several insects, most of which are either social or closely related outgroups thereof. Using standard bioinformatic techniques and several of our in-house algorithms, we identified horizontally transferred genes from bacterial parasites, several genes under adaptation and novel genes which were instrumental for the ecological success of social insects. We also uncovered convergent gene losses which accompanied the transition to social parasitism in ants, characterised by the loss of certain social behaviours in the parasites. In comparing social with non-social insects, we found that both novel genes and the rewiring of regulatory networks play a major role in the regulation of sociality. We also consider epigenetic marks, namely the effects of methylation/acetylation and of small regulatory RNAs on the differentiation of individuals during their development, to test the possible role of epigenetic marks in general and their effect on de novo genes and on rearranged genes.

    People involved: Mark Harrison, Alice Séguret, Elias Dohmen, Bertrand Fouks, Alina Mikhailova

    Funding: DFG (2015 - 2019, 2017 - 2022); HFSP (2018-2021); ANR/DFG (2020-2023); MSCA-IF (2021-2023)



    Related Publications:

    • He S, et al. Evidence for reduced immune gene diversity and activity during the evolution of termites. Proceedings of the Royal Society B, 2021
    • Harrison MC, Chernyshova AM, Thompson GJ. No obvious transcriptome-wide signature of indirect selection in termites. Journal of Evolutionary Biology, 2020
    • GWC Thomas, et al. Gene content evolution in the arthropods. Genome Biology, 2020
    • C Gstöttl, et al. Comparative analyses of caste, sex and developmental stage-specific transcriptomes in two Temnothorax ants. Ecology & Evolution, 2020
    • R Kaur, et al. Ant behaviour and brain gene expression of defending hosts depend on the ecological success of the intruding social parasite. Philosophical Transactions of the Royal Society B, 2019
    • MC Harrison, et al. Hemimetabolous genomes reveal molecular basis of termite eusociality. Nature Ecology & Evolution, 2018
    • E Jongepier, et al. Remodeling of the juvenile hormone pathway through caste‐biased gene expression and positive selection along a gradient of termite eusociality. Journal of Experimental Zoology Part B, 2018
    • LPM Kremer et al. Reconstructed evolution of insulin receptors in insects reveals duplications in early insects and cockroaches. Journal of Experimental Zoology Part B, 2018
    • J Olsen et al. The genome of the seagrass Zostera marina reveals angiosperm adaptation to the sea. Nature, 2016
    • CR Smith et al. How do genomes create novel phenotypes? Insights from the loss of the worker caste in ant social parasites, Molecular Biology & Evolution, 2015
    • B Sadd et al. The genomes of two key bumblebee species with primitive eusocial organization. Genome Biology, 2015
    • N Terrapon et al. Molecular traces of alternative social organization in a termite genome. Nat Communications, 2014
    • Z Myburg et al. The genome of Eucalyptus grandis, Nature 2014.
    • DF Simola et al. Social insect genomes exhibit dramatic evolution in gene composition and regulation while preserving regulatory features linked to sociality. Genome Research, 2013
    • O Niehuis et al. Genomic and morphological evidence converge to resolve the enigma of Strepsiptera. Current Biol., 2012
    • G Suen et al. The genome sequence of the leaf-cutter ant Atta cephalotes reveals insights into its obligate symbiotic lifestyle. PLoS Genet 2012.
    • JH Werren et al. Functional and evolutionary insights from the genomes of three parasitoid Nasonia species. Science 2010.

    Techniques employed: cuffsuite; 454, Illumina, PacBio, ONP; Canu, MARVEL, ALLPATHS, SOAPdenovo, MIRA, Platanus, SPAdes, MaSuRCA, Pilon, FinisherSC; MAKER; BUSCO, DOGMA; DomRates; PorthoDOM, Proteinortho, OrthoFinder; RAxML, CodeML/PAML and many others.

  • G2: Population genomics, transcriptomics, meta-genomics and epi-genomics (2006 - 2016, now discontinued)

    It has long been assumed that the blueprint for an organism lies entirely in its genome. Advances over the last decade have demonstrated an ever-increasing role of variation which occurs at the level of populations, between almost identical individuals, between cells in the same tissue and so forth. Deciphering the roles of these variations has become increasingly important for pushing the limits of genomics further and for improving the understanding of how biological novelty arises.

    We use the methods and insights from the other projects to understand the emergence of evolvability and robustness, and the creation and maintenance of genetic diversity, from the perspective of protein-coding genes, their duplications and rearrangements.

    Related Publications:

    • L el Masri et al. Host-Pathogen Coevolution: The Selective Advantage of Bacillus thuringiensis Virulence and Its Cry Toxin Genes. PLoS Biol 2015
    • PG Feulner et al. Genomics of divergence along a continuum of parapatric population differentiation. PLoS Genet 2015
    • FFJ Chain et al. Extensive copy-number variation of young genes across stickleback populations. PLoS Genet 2014
    • SU Franssen et al. Genome-wide transcriptomic responses of the seagrasses Zostera marina and Nanozostera noltii under a simulated heatwave confirm functional types. Marine Genomics 2014.
    • B Guo et al. Genomic divergence between nine- and three-spined sticklebacks. BMC Evol Biol., 2013.
    • PG Feulner et al. Genome-wide patterns of standing genetic variation in a marine population of three-spined sticklebacks. Mol Ecol 2013.
    • SU Franssen et al. Transcriptomic resilience to global warming in the seagrass Zostera marina, a marine foundation species. PNAS 2011.