Abstract
The origin and evolution of modern biochemistry and cellular life are complex problems that have puzzled scientists for almost a century. In principle, the entire history of life, including the appearance of the first cells and the emergence of the diversified domains of life, can be studied using the myriad sequences generated by genomic research. However, the use of molecular sequence information for deep phylogenetic exploration is limited by technical considerations (e.g., the problem of Markov convergence and phylogenetic character independence) and biological ones (e.g., mutation saturation, intramolecular interactions, horizontal transfer). In contrast, macromolecular structures are evolutionary modules that are highly conserved yet diverse enough to enable deep historical inquiry. Ideographic (historical, retrodictive) approaches have been used to dissect the emergence of the very early macromolecules that populated primordial cells, in the hope that this knowledge will reveal the mechanistic basis of genetics and its links to molecular functions. Such knowledge is needed for future endeavors in synthetic biology and biotechnology.
Phylogenomic data-driven exploration: Deep evolutionary signals were retrieved from a census of molecular structures and functions in thousands of nucleic acids and millions of proteins using powerful phylogenomic methods. Phylogenies describing the evolution of proteins, proteomes and molecular functions were built from a genomic census, in thousands of organisms, of protein structural domains defined at different levels of structural abstraction and of their associated Gene Ontology (GO) terms. Phylogenomic trees of domains yielded molecular clocks and timelines of domain appearance spanning from the origin of proteins to the present. Trees of domains and trees of molecular functions revealed: (1) an archaic protein world, (2) the rise, by reductive evolution, of viruses and Archaea from a primordial stem line of descent, (3) the appearance of structural innovations unique to the diversified domains of life, first in Bacteria and then in Archaea and Eukarya, and (4) an explosion of functions and structures in Eukarya. Domain gain and loss were pervasive, with gains exceeding losses along the entire timeline. Reductive forces resulted in compact and more flexible protein structures with short domain linkers. Primordial metabolic domains evolved earlier than informational domains involved in translation and transcription, supporting the metabolism-first hypothesis and an ancient protein-RNA world rather than the canonical RNA world scenario. Universal trees of proteomes consistently supported a very early cellular origin of viruses and a late appearance of capsids. Analyses of protein domain organization and RNA structure confirm these evolutionary patterns, and a graph-theoretical view of domain combination in proteins uncovers complex accretion pathways culminating in a ‘big bang’ of domain rearrangement.
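The abstract does not spell out how a timeline is read off a rooted tree of domains. One simple way, assumed here for illustration only, is a node-distance (nd) measure: count the nodes on the path from the root to each leaf and rescale to 0 to 1, so that domains branching near the root get values near 0 (older) and the most derived domains get 1 (younger). The tree layout, the leaf names and the `node_distances` function below are hypothetical. This is a sketch, not the speaker's pipeline.

```python
from typing import Dict, List

# Hypothetical rooted tree: each internal node maps to its children.
# Leaves (e.g. protein domain labels) never appear as keys.
Tree = Dict[str, List[str]]


def node_distances(tree: Tree, root: str) -> Dict[str, float]:
    """Return a relative age (nd) for every leaf, scaled from 0 to 1.

    The raw score is the number of nodes on the path from the root to the
    leaf, root included and leaf excluded (equal to the number of edges
    traversed). Each score is divided by the largest one, so leaves that
    branch closest to the root receive the smallest values (oldest).
    """
    raw: Dict[str, int] = {}
    stack = [(root, 0)]  # iterative depth-first traversal: (node, depth)
    while stack:
        node, depth = stack.pop()
        children = tree.get(node, [])
        if not children:  # a node with no children is a leaf
            raw[node] = depth
        for child in children:
            stack.append((child, depth + 1))
    deepest = max(raw.values())
    if deepest == 0:  # degenerate tree: the root is the only leaf
        return {leaf: 0.0 for leaf in raw}
    return {leaf: d / deepest for leaf, d in raw.items()}


if __name__ == "__main__":
    # Hypothetical four-domain tree; "n1" and "n2" are internal nodes.
    tree = {"root": ["domA", "n1"], "n1": ["domB", "n2"], "n2": ["domC", "domD"]}
    nd = node_distances(tree, "root")
    # Sorting by nd gives a relative timeline of domain appearance.
    for leaf, value in sorted(nd.items(), key=lambda kv: kv[1]):
        print(f"{leaf}\t{value:.2f}")
```

Because nd only counts nodes, it ignores branch lengths. That keeps the sketch independent of any particular tree-building method, at the cost of treating every branching event as one equal time step.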
Conclusions: Clock-like signals revealed that modern biochemistry resulted from the gradual accretion and coevolution of molecular parts and molecules. This was evident in studies of individual molecules (e.g., tRNA, RNase P RNA or rRNA) and of macromolecular complexes such as ATP synthase. While the first biochemical functions were metabolic, translation and the genetic code appeared quite late as ‘exacting’ mechanisms that enhanced protein folding speed and flexibility, shaping the structural makeup of proteins and benefiting the search for new molecular functions. The timelines reveal that genetic memory arose only after the rise of viruses but before the appearance of diversified archaeal microbes. Remarkably, its debut coincided with the rise of nucleotide and amino acid biosynthetic pathways.
About the speaker
Prof Gustavo Caetano-Anolles received his MSc in Biochemistry and PhD in Biochemical Sciences from the National University of La Plata, Argentina, in 1979 and 1986, respectively. He joined the University of Tennessee in 1988 and worked there for 10 years before moving to the University of Oslo as an Associate Professor in 1998. In 2003, Prof Caetano-Anolles joined the University of Illinois at Urbana-Champaign, where he is currently Professor of Bioinformatics.
Prof Caetano-Anolles’ research interests focus on the interaction of plants and microbes; the spontaneous mutation process; the evolution of biological networks; function and structure in proteins and non-coding RNA; and the evolution of macromolecular structure.
Prof Caetano-Anolles received the Education Publication Award in 1998 and the L.M. Ware Distinguished Research Award in 1999, both from the American Society for Horticultural Science. He was also named International Scientist of the Year by the International Biographical Centre in 2006 and 2008. In 2010, he was named a University Scholar by the University of Illinois.