• The Molecular and Genomic Basis of Phenotypic Innovations:

    We are primarily concerned with the question of how molecular changes cause new phenotypes to emerge. Our main goal is to understand, from as many angles as possible, how protein evolution works. Furthermore, we want to understand which of the zillions of possible evolutionary paths at the molecular level are facilitated or hindered by biophysical constraints which govern structure and function of proteins and RNAs. Accordingly, we use computational simulations, transcriptomics (aka RNAseq), genomic data analyses and wet-lab directed evolution experiments to investigate the phenotypic effects of genotypic changes.

    In particular, we aim to understand evolvability (many mutations of a genotype create novel phenotypes), robustness (most mutations preserve the phenotype), lock-in (the evolutionary history constrains the evolvability of a molecule), canalisation (the evolutionary history constrains phenotypic plasticity), epistatic ratchets (two or more neutral or deleterious mutations become beneficial in concert) and the role of promiscuity (multi-functionality) in escaping adaptive conflicts (a molecule has two or more fitness-relevant functions but cannot optimise them all simultaneously).

    Research areas

  • P1: Evolvability, Epistasis and Molecular Robustness in Protein Evolution

    In order to confer adaptation to continuously changing environments, new protein-encoding genes are continuously exploring the available sequence space through mutation. However, potentially beneficial changes are vastly outnumbered by those that are detrimental. Therefore, the protein also has to be robust to mutation. The success of this balancing act between adaptability and the maintenance of general functionality ultimately determines which mutations are fixed during adaptive evolution. In other words: how do evolving genes (or their encoded proteins) change their dominant and/or latent function(s) without ever getting stuck in fitness minima?

    Recent results illustrate the importance of sub-optimal or promiscuous functions for the adaptation of protein-coding genes toward new functions. Unfortunately, this renders any modelling of fitness landscapes (and therefore rational design) incredibly complicated. We use simple model systems to characterise the fitness landscape of molecules and predict their evolvability. In particular, we investigate the effects of the degree of neutrality in sequence space (the fraction of neutral mutations for a given genotype) and of multi-functionality on evolvability, i.e. the number of new phenotypes that can be reached within a few mutations.
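
    As a minimal sketch of the two quantities defined above, the following Python snippet computes the neutrality of a genotype and its evolvability on a toy genotype-phenotype map. The two-letter alphabet, the `phenotype` rule and all function names are hypothetical and chosen only for illustration; they are not the models used in the publications listed below.

```python
ALPHABET = "HP"  # hypothetical two-letter alphabet (hydrophobic / polar)


def phenotype(genotype: str) -> str:
    """Toy genotype-phenotype map (an assumption, for illustration only)."""
    if genotype.count("H") <= 1:
        return "unfolded"
    return "fold_A" if genotype[0] == "H" else "fold_B"


def single_mutants(genotype: str):
    """Yield every genotype that differs by exactly one substitution."""
    for i, old in enumerate(genotype):
        for new in ALPHABET:
            if new != old:
                yield genotype[:i] + new + genotype[i + 1:]


def neutrality(genotype: str) -> float:
    """Fraction of single mutants that keep the original phenotype."""
    mutants = list(single_mutants(genotype))
    same = sum(phenotype(m) == phenotype(genotype) for m in mutants)
    return same / len(mutants)


def evolvability(genotype: str, max_mutations: int = 2) -> int:
    """Number of new phenotypes reachable within `max_mutations` substitutions."""
    seen = {genotype}
    frontier = {genotype}
    for _ in range(max_mutations):
        # Expand the mutational neighbourhood by one more substitution.
        frontier = {m for g in frontier for m in single_mutants(g)} - seen
        seen |= frontier
    return len({phenotype(g) for g in seen} - {phenotype(genotype)})


if __name__ == "__main__":
    g = "HHPP"
    print(neutrality(g), evolvability(g, 1), evolvability(g, 2))
```

    In this toy map, enlarging the mutational radius can expose phenotypes that are not reachable by a single substitution, which is the sense in which "within a few mutations" matters for evolvability.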

    Furthermore, we use predicted ancestral states of enzymes, in particular those that occur at the point of functional divergence, to unravel their evolutionary potential and understand their functional switches during evolution. The latter should ultimately lead to the development of a so-called evolvability assessment, based on experimentally determined biochemical/biophysical characteristics, which can be used to determine the best enzyme, out of several possible candidates, to use as a starting point for engineering a better protein for a particular (commercially valuable) function.

    We assume that genes simultaneously optimised for two functions will only evolve if the fitness benefits of keeping both functions outweigh the costs of reducing the fitness associated with one of the traits. If this is the case, subsequent gene duplication with sub-functionalisation is likely to provide an additional advantage. We will use computational and experimental studies to test these premises and the influence of network connectivity on the shape and structure of the fitness landscape.

    People: Bert van Loo, April Kleppe, Jasmin Kurafeiski


    Funding: BBSRC (2002 - 2005), DAAD (2006 - 2007), DFG SPP1399 (2009 - 2016), HFSP (2013 - 2017), EC Horizon 2020 ITN (2017 - 2020).

    Related publications:

    • E Bornberg-Bauer How are model protein structures distributed in sequence space? Biophys. J. 1997 [Pubmed]
    • E Bornberg-Bauer and HS Chan Modeling Evolutionary Landscapes: Mutational Stability, Topology and Superfunnels in Sequence Space. PNAS 1999 [Online access]
    • Y Cui et al. Recombinatoric exploration of novel folded structures: A heteropolymer-based model of protein evolutionary landscapes. PNAS 2002 [Online access]
    • R Wroe et al. A Structural Model of Latent Evolutionary Potentials Underlying Neutral Networks in Proteins. HFSP J 2007 [Online access]
    • D Whitehead et al. The look-ahead effect of phenotypic mutations. Biol. Dir. 2008 [Online access]
    • E Bornberg-Bauer et al. How do new proteins arise? Curr Opin Struct Biol. 2010 [Online access]
    • T Sikosek et al. Escape from Adaptive Conflict follows from weak functional trade-offs and mutational robustness. PNAS 2012 [Online access]
    • T Sikosek Evolutionary dynamics on protein bi-stability landscapes have the potential to resolve adaptive conflicts. PLoS Comp Biol. 2012 [Online access]

    Techniques employed: computational: ancestor reconstruction, phylogenies, calculation of stability effects of mutations (e.g. FoldX), simulations of population dynamics, disorder prediction; experimental: high-throughput functional molecular screening using in-cell assays and "lab-on-a-chip" micro-droplets (Hollfelder group); experimental measurement of stability and structural dis-/order (CD); detection assays, cloning, (over-)expression, purification, SDS-PAGE, E. coli autodisplay (Jose), differential scanning fluorimetry to study unfolding / TD stability, ko-libraries such as ASKA (E. coli).

  • P2: Algorithms and Evolutionary Dynamics of Modular Protein Evolution

    Modularity is a hallmark of molecular evolution: whether in gene regulation, the components of metabolic pathways or signalling cascades, the ability to reuse autonomous modules in different molecular contexts can expedite evolutionary innovation. Over time scales of several hundred million years, the evolution of protein-coding genes is dominated by the modular rearrangement of protein domains, their evolutionary, structural and functional units. While a small core of arrangements is universal, a large fraction of multi-domain arrangements is species-specific and has been created recently via gene duplication, fusion and terminal loss of domains. Surprisingly, thousands of domains are completely lost from genomes (i.e. all copies within a genome) along every lineage in a stochastic manner, at a fairly constant rate of ca. four domains per million years.

    Novel domains are rarely fixed and arise either as their own genes or terminally, by extension of existing reading frames. Novel domains are under strong selection pressure and confer a strong fitness value as they rapidly attain high copy numbers within the genomes and are involved in biotic defence, reproduction and development.
    Using cross-species genomic comparisons and population genomics, we investigate the genetic mechanisms and biophysical constraints of domain emergence and develop algorithms for rapidly screening many genomes, understanding their phylogenies, and comparing, aligning and clustering sequences. This is possible because of complexity reduction (sequences can be characterised as linear arrangements of 5-6 domains drawn from an alphabet of several thousand characters, as opposed to ca. 500 amino acids drawn from an alphabet of 20), the maintenance of linear order, and the high reliability of the HMMs which characterise the domains.
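
    To illustrate the complexity reduction described above, the following Python sketch treats each protein as a short list of domain identifiers and compares two arrangements with a standard unit-cost edit distance computed by dynamic programming. This is a generic textbook technique, not the group's own alignment algorithm; the function name and the example domain labels are assumptions made for illustration.

```python
from typing import Sequence


def arrangement_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two domain arrangements.

    Each domain is one symbol; gain, loss and substitution of a
    domain each cost 1 (a simplifying assumption).
    """
    # prev[j] holds the distance between the first i-1 domains of `a`
    # and the first j domains of `b`.
    prev = list(range(len(b) + 1))
    for i, dom_a in enumerate(a, start=1):
        curr = [i]
        for j, dom_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                    # domain lost from `a`
                curr[j - 1] + 1,                # domain gained in `b`
                prev[j - 1] + (dom_a != dom_b)  # match or substitution
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Illustrative domain labels only.
    ancestor = ["SH3", "SH2", "Pkinase"]
    terminal_loss = ["SH2", "Pkinase"]
    print(arrangement_distance(ancestor, terminal_loss))
```

    Because an arrangement has only a handful of symbols, the dynamic programming table is tiny compared with one for a residue-level alignment of the same proteins.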

    We test the potential of increased evolvability by rearranged domains experimentally in biochemical pathways, for the defence against pathogens and during developmental processes.

    People involved: Carsten Kemena, Steffen Klasberg, Andreas Schüler, Elias Dohmen.


    Funding: Volkswagen Foundation (2 x, 2009 - 2013); DFG (2009 - 2013).

    Related Publications:

    • E. Bornberg-Bauer Computational Approaches to Identify Leucine Zippers. Nucleic Acids Res. 1998 [Online access]
    • G. Amoutzias et al., Convergent evolution of gene networks by single-gene duplications in higher eukaryotes. EMBO Rep. 2004.
    • J. Weiner, F. Beaussart, E. Bornberg-Bauer Domain Deletions and Substitutions in the Modular Protein Evolution. FEBS Journal 2006 [Online access]
    • J. Weiner, G. Thomas and E. Bornberg-Bauer Rapid Motif-Based Prediction of Circular Permutations in Multi-Domain Proteins. Bioinformatics 2005 [Online access]
    • J. Weiner and E. Bornberg-Bauer Evolution of Circular Permutations in Multi-Domain Proteins. Molecular Biology and Evolution 2006 [Online access]
    • Andrew D. Moore and Erich Bornberg-Bauer The Dynamics and Evolutionary Potential of Domain Loss and Emergence. Molecular Biology and Evolution 2012 [Online access]
    • A. Kersting et al. Dynamics and adaptive benefits of protein domain emergence and arrangements during plant genome evolution. Genome Biology and Evolution 2012 [Online access]
    • E. Bornberg-Bauer and M. Mar Alba, Dynamics and Adaptive Benefits of Modular Protein Evolution, Curr Opin Struct Biol 2013.
    • C. Kemena et al. MDAT - Aligning multiple domain arrangements, BMC Bioinformatics 2015.
    • T. Bitard-Feildel et al. Domain similarity based orthology detection. BMC Bioinformatics 2015.
    • L. El Masri et al. Host-Pathogen Coevolution: The Selective Advantage of Bacillus thuringiensis Virulence and Its Cry Toxin Genes. PLoS Biology, 2015.
    • A. Schüler and Erich Bornberg-Bauer, Evolution of Protein Domain Repeats in Metazoa, MolBiolEvol, 2016.
    • J. Schmitz et al., Mechanisms of transcription factor evolution in Metazoa, Nucleic Acids Res., 2016.

    Techniques employed: design of Hidden Markov Models, suffix arrays, recursive dynamic programming, Approximate Bayesian Computation, parsimony and maximum likelihood, design and implementation of string algorithms, databases and interactive graphical interfaces; dN/dS ratio tests (PAML); experimental (planned): see also P1, P2.

  • P3: Evolutionary Origin, Fixation and Functions of de Novo Protein Coding Genes

    With every new genome sequenced, a couple of hundred proposed genes remain "orphans" because computational methods cannot assign any orthologs, even in closely related and well-annotated species. Presumably many of these (lineage-specific) genes are transcribed and sometimes translated, and their proteins are functional and adaptive, at least under some (possibly unknown) conditions. De novo emergence not only runs against the current belief that most novel genes emerge from old ones; it is also difficult to reconcile with a biophysical perspective, because novel reading frames emerging from previously non-coding matter must be considered extremely unlikely: the encoded proteins would most likely be disordered, aggregate and thus be deleterious or, at least, be purged for purely energetic reasons. So, where do new coding genes actually come from, how do they function and how is their potentially detrimental expression regulated?

    We ask where novel protein-coding genes come from and how genomic novelties and rearrangements trigger adaptation and spur developmental transitions. Using comparative genomics and biophysical analyses (computational and experimental), we test their properties and functions. We found that most genetic novelty comes from novel domains, but many completely new reading frames also emerge, e.g. across the insect tree, with an estimated frequency of 500 new genes in the wake of each speciation event. The former process has been termed "grow slow and moult" because some novel domains later lose their initially stabilising parent protein and become independent and amenable to further rearrangements.
    We concentrate on some major transitions which happened during the development of extant life forms: signalling across multicellular organisms, placentation in mammals, the emergence of holometabolism in insects and the onset and reversal of ageing.
    Furthermore, to catch novel genes "in the act" of emergence, we investigate genomes not only between species but also within populations, using closely related sister species as outgroups. Using gene and domain prediction programmes, we determine novel ORFs and their expression (RNAseq) and, if necessary, confirm them, e.g. with (long-read and primer walking) PCR and qPCR. We are currently screening several systems (populations of fish, mice and flies) to achieve good genomic coverage for detecting possible recent emergence events and to reconstruct ancestral sequences, which can then be tested for their genetic origin and investigated for their structural and biophysical properties.
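
    The ORF detection step mentioned above can be sketched, under strong simplifying assumptions, as a six-frame scan for ATG-to-stop stretches. The Python snippet below is a hypothetical stand-in for the gene prediction programmes actually used: the function names and the `min_codons` threshold are illustrative, and real pipelines additionally handle introns, alternative start codons and expression evidence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 3):
    """List ORFs (ATG ... stop) in all six reading frames.

    Returns tuples (strand, start, end, orf_sequence); coordinates are
    0-based, end-exclusive and refer to the scanned strand.
    `min_codons` counts codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTGGGTAACC"):
        print(orf)
```

    Candidate ORFs found this way in one population but absent at the orthologous position in the outgroup would then be the ones worth checking for expression.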

    People: Steffen Klasberg, Jonathan Schmitz, Alberto Lopez, Andreas Lange, Brennen Heames.


    Funding: Leibniz Gemeinschaft (2013 - 2016).

    Related Publications:

    • E Bornberg-Bauer et al. Emergence of de novo proteins from "dark genomic matter" by "grow slow and moult"; Biochemical Society Transactions; 2015.
    • T Bitard-Feildel et al. Detection of Orphan Domains in Drosophila using Hydrophobic clustering analysis; Biochimie, 2015
    • L. Wissler et al. Mechanisms and dynamics of orphan gene emergence in insect genomes. Genome Biol Evol. 2013 [Online access]
    • P. Feulner et al. Genome-wide patterns of standing genetic variation in a natural marine population of three-spined sticklebacks. Molecular Ecology 2012 [Online access]
    • F. Chain et al., Extensive copy-number variation of young genes across stickleback populations. PLoS Genet, 2014.
    • E Bornberg-Bauer et al. How do new proteins arise?, Curr Opin Struct Biol, 2010.

    Techniques employed: computational: comparative genomics, differential GO analysis, biophysical predictions (disorder, secondary structure, hydrophobic clusters), ancestral reconstruction and phylogenies, mutational effects on stability (FoldX, Rosetta); experimental: deep sequencing, qPCR; antibody staining; cloning, (over-)expression, purification; SDS-PAGE expression quantification; E. coli autodisplay (Jose); in-situ hybridisation; CD; stability measures; in-cell NMR (Selenko); pull-down assays (Ivarsson).

  • G1: Comparative Genomics and Molecular Mechanisms Underlying Sociality, Ageing and Epigenetics in Insects

    Sociality is considered one of the major transitions in evolution, but little is known about its underlying genomic basis and the associated selectable traits. Social insects are an excellent study object because their genomes are relatively simple to analyse and many speciation events gave rise to morphologically and ecologically diverse species within a relatively short time period. Furthermore, in insects, sociality has evolved independently at least twice, in Hymenoptera (comprising ants and bees) and in termites. Both groups also show a striking reversal of the otherwise widespread trade-off between longevity and fecundity. In addition, loss and/or reversal of social behaviour has been observed, since several ants parasitise or enslave the colonies of evolutionarily very closely related species.

    We have investigated genomes and transcriptomes of several insects, most of which are either social or closely related outgroups thereof. Using standard bioinformatic techniques and several of our in-house algorithms, we identified horizontally transferred genes from bacterial parasites, several genes under adaptation and novel genes which were instrumental for the ecological success of social insects. In comparing social with non-social insects, we found that both novel genes and the rewiring of regulatory networks play a big role in the regulation of sociality. We also consider epigenetic marks, the effects of methylation/acetylation and of small regulatory RNAs on the differentiation of individuals during their development, to test the possible role of epigenetic marks in general and their effect on de novo genes and on rearranged genes.

    People involved: Carsten Kemena, Mark Harrison, Nico Arning, Lukas Kremer, Alberto Lopez, Evelien Jongepier

    Funding: DFG (2015 - 2019, 2017 - 2021)

    Related Publications:



    • CR Smith et al. How do genomes create novel phenotypes? Insights from the loss of the worker caste in ant social parasites, Mol Biol Evol, 2015
    • B Sadd et al. The genomes of two key bumblebee species with primitive eusocial organization. Genome Biol. 2015
    • N Terrapon et al. Molecular traces of alternative social organization in a termite genome. Nat Comm., 2014
    • DF Simola et al. Social insect genomes exhibit dramatic evolution in gene composition and regulation while preserving regulatory features linked to sociality. Genome Res. 2013
    • O Niehuis et al. Genomic and morphological evidence converge to resolve the enigma of Strepsiptera. Current Biol., 2012.
    • G Suen et al. The genome sequence of the leaf-cutter ant Atta cephalotes reveals insights into its obligate symbiotic lifestyle. PLoS Genet 2012.
    • JH Werren et al. Functional and evolutionary insights from the genomes of three parasitoid Nasonia species. Science 2010.
    • J Olsen et al. The genome of the seagrass Zostera marina reveals angiosperm adaptation to the sea. Nature 2016.
    • Z Myburg et al. The genome of Eucalyptus grandis, Nature 2014.

    Techniques employed: cuffsuite; 454, Illumina, PacBio, ONT; ALLPATHS, SOAPdenovo, MIRA, Platanus, SPAdes, MaSuRCA; MAKER; CEGMA, DOGMA; PorthoDOM, Proteinortho; RAxML, CodeML/PAML and many others.

  • G2: Population genomics, transcriptomics, meta-genomics and epi-genomics (2006 - 2016, now discontinued)

    It has long been assumed that the blueprint for an organism lies entirely in its genome. Advances over the last decade have demonstrated an ever-increasing role of variation which occurs at the level of populations, between almost identical individuals, between cells in the same tissue and so forth. Deciphering the roles of these variations has become increasingly important for pushing the limits of genomics further and for improving the understanding of how biological novelty arises.

    We are using the methods and insights from the other projects to understand the emergence of evolvability and robustness and the creation and maintenance of genetic diversity from the perspective of protein-coding genes, their duplications and rearrangements.

    Related Publications:

    • L el Masri et al. Host-Pathogen Coevolution: The Selective Advantage of Bacillus thuringiensis Virulence and Its Cry Toxin Genes. PLoS Biol 2015
    • PG Feulner et al. Genomics of divergence along a continuum of parapatric population differentiation. PLoS Genet 2015
    • FFJ Chain et al. Extensive copy-number variation of young genes across stickleback populations. PLoS Genet 2014
    • SU Franssen et al. Genome-wide transcriptomic responses of the seagrasses Zostera marina and Nanozostera noltii under a simulated heatwave confirm functional types. Marine Genomics 2014.
    • B Guo et al. Genomic divergence between nine- and three-spined sticklebacks. BMC Evol Biol., 2013.
    • PG Feulner et al. Genome-wide patterns of standing genetic variation in a marine population of three-spined sticklebacks. Mol Ecol 2013.
    • SU Franssen et al. Transcriptomic resilience to global warming in the seagrass Zostera marina, a marine foundation species. PNAS 2011.