Journal Club Jenny Gu October 24, 2006
Introduction. Define the subset of superfamilies present in LUCA (the last universal common ancestor). Examine the adaptability and expansion of particular LUCA superfamilies in relation to function and genome size. Challenges Woese’s annealing hypothesis.
3-D Structural Comparison. Domain similarity defined by: SSAP, a dynamic-programming-based structure comparison algorithm; CORA, comparison to 3-D templates for each superfamily; manual inspection; profile-based approaches that detect sequence patterns between relatives. Functional information: public resources (COGs, GO, KEGG) and the literature, reviewed by expert curators. Methods
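To make the phrase "dynamic-programming-based structure comparison" concrete, here is a minimal sketch of a global alignment recurrence (Needleman-Wunsch style). It is not SSAP: SSAP compares residue structural environments with a double dynamic programming scheme, while this sketch aligns two hypothetical secondary-structure strings (H = helix, E = strand, L = loop) with a made-up scoring function `ss_score` and a fixed gap penalty.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def dp_align_score(a: Sequence[T], b: Sequence[T],
                   score: Callable[[T, T], float],
                   gap: float = -1.0) -> float:
    """Best global alignment score of a and b by dynamic programming."""
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                dp[i - 1][j] + gap,  # gap in b
                dp[i][j - 1] + gap,  # gap in a
            )
    return dp[n][m]


def ss_score(x: str, y: str) -> float:
    # Hypothetical score: +2 for the same secondary-structure state, -1 otherwise.
    return 2.0 if x == y else -1.0


print(dp_align_score("HHHEEL", "HHEEL", ss_score))  # 9.0
```

The table is filled row by row, so each cell depends only on cells already computed; this is the property that lets structure comparison methods explore all alignments in quadratic time.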
Genome Structural Annotation and Occurrence Profiles. Dataset: 114 complete genomes: 100 prokaryotic genomes (85 bacterial and 15 archaeal species) and 14 eukaryotic genomes. Structural annotation: CATH HMMs, via the Gene3D database. Superfamily domain occurrence profiles (prokaryotes): 940 of 1278 CATH superfamilies present in at least one genome. Annotation coverage: 50% of genes. Methods
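An occurrence profile can be pictured as one vector per superfamily holding its domain count in every genome. The sketch below builds such profiles from a hypothetical list of (genome, superfamily) domain assignments; the genome names and the labels `SF1` and `SF2` are made up for illustration and do not come from Gene3D.

```python
from collections import Counter, defaultdict

# Hypothetical domain assignments: one (genome, superfamily) pair per predicted domain.
assignments = [
    ("genome_A", "SF1"),
    ("genome_A", "SF1"),
    ("genome_A", "SF2"),
    ("genome_B", "SF1"),
]


def occurrence_profiles(assignments, genomes):
    """Map each superfamily to its domain count in every genome (0 if absent)."""
    counts = defaultdict(Counter)
    for genome, superfamily in assignments:
        counts[superfamily][genome] += 1
    # Counter returns 0 for genomes where the superfamily was never assigned.
    return {sf: [c[g] for g in genomes] for sf, c in counts.items()}


genomes = ["genome_A", "genome_B", "genome_C"]
profiles = occurrence_profiles(assignments, genomes)
print(profiles["SF1"])  # [2, 1, 0]
```

Only superfamilies with at least one assignment get a profile, which mirrors the slide's count of superfamilies "present in at least one genome".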
Ancestral Superfamily Set Selection. Defined by: presence in at least 90% of species from all kingdoms, and presence in at least 70% of archaeal and of eukaryotic species. The definition avoids selecting superfamilies overrepresented in Bacteria but poorly represented in the smaller groups, and allows for false-negative predictions by the sequence-based approach. Aims to ensure that the selected superfamilies were present in LUCA and to minimize error introduced by horizontal gene transfer. Methods
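The two selection thresholds can be written as a simple filter. This is a minimal sketch assuming that "90% of species from all kingdoms" means the pooled fraction over all sampled species and that the 70% threshold applies separately to Archaea and to Eukaryotes; the `Presence` type and the example fractions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Presence:
    """Fraction (0 to 1) of species in each group whose genome has the superfamily."""
    overall: float    # all sampled species pooled (assumed reading of the 90% rule)
    archaea: float    # archaeal species only
    eukaryota: float  # eukaryotic species only


def is_ancestral(p: Presence, overall_min: float = 0.90, group_min: float = 0.70) -> bool:
    # A pooled threshold plus per-group thresholds: a superfamily abundant in the
    # many bacterial genomes but patchy in the smaller archaeal or eukaryotic
    # samples is rejected even if its pooled fraction is high.
    return p.overall >= overall_min and p.archaea >= group_min and p.eukaryota >= group_min


print(is_ancestral(Presence(overall=0.96, archaea=0.93, eukaryota=0.86)))  # True
print(is_ancestral(Presence(overall=0.95, archaea=0.60, eukaryota=1.00)))  # False
```

The second example shows why the per-group rule matters: with 85 of the 114 genomes bacterial, a pooled fraction near 95% can still hide poor archaeal coverage.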
Functional Annotation. Automatic functional annotation, using COG, of the 940 structural superfamilies found in the 100 prokaryotic species. Each superfamily is classified by its statistically most represented COG functional subcategory. 726 of 940 superfamilies annotated in COG (present in 5% or more of species, with at least 5 genes). Ancestral superfamilies further annotated with Pfam and the literature. Methods
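A minimal sketch of the classification step follows. It swaps "statistically most represented" for plain "most frequent", since the slide does not name the statistical test, and it assumes the 5% species and 5 gene thresholds act as filters before classification. The function name and input format are hypothetical.

```python
from collections import Counter
from typing import Optional


def classify_superfamily(genes: list[tuple[str, str]], n_species: int,
                         min_genes: int = 5, min_species_frac: float = 0.05) -> Optional[str]:
    """Assign a superfamily the COG subcategory most frequent among its genes.

    genes: (species, COG subcategory) pairs for the superfamily's COG-annotated genes.
    Returns None when the superfamily fails the coverage thresholds.
    """
    if len(genes) < min_genes:
        return None
    species = {sp for sp, _ in genes}
    if len(species) / n_species < min_species_frac:
        return None
    counts = Counter(cat for _, cat in genes)
    return counts.most_common(1)[0][0]


genes = [("sp1", "J"), ("sp2", "J"), ("sp3", "K"), ("sp4", "J"), ("sp5", "J"), ("sp6", "L")]
print(classify_superfamily(genes, n_species=100))  # J
```

Six genes in six of 100 species pass both filters, and J (translation) is the most frequent subcategory.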
Definition of the Superfamily Functional Groups. Six functional groups based on COG categories: Translation, Replication, Metabolism, Cellular Process, Transcription, and Poorly Characterized. Not considered: RNA processing and modification; chromatin structure and dynamics. Methods
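One way to picture the grouping is as a lookup from one-letter COG categories to the six groups. The letter-to-group assignment below is an assumption based on the standard COG category definitions (for example J = translation, K = transcription, L = replication, A = RNA processing, B = chromatin); the slide does not list which letters the authors placed in each group.

```python
from typing import Optional

# Assumed mapping of one-letter COG categories onto the six groups on the slide.
SIX_GROUPS = {
    "Translation": "J",
    "Transcription": "K",
    "Replication": "L",
    "Metabolism": "CEFGHIPQ",
    "Cellular Process": "DMNOTUVWYZ",
    "Poorly Characterized": "RS",
}
# A: RNA processing and modification; B: chromatin structure and dynamics.
EXCLUDED = {"A", "B"}


def functional_group(cog_letter: str) -> Optional[str]:
    """Return the functional group for a COG letter, or None if it is not considered."""
    if len(cog_letter) != 1:
        raise ValueError("expected a single COG category letter")
    if cog_letter in EXCLUDED:
        return None
    for group, letters in SIX_GROUPS.items():
        if cog_letter in letters:
            return group
    raise ValueError(f"unknown COG category: {cog_letter}")


print(functional_group("F"))  # Metabolism
```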
Superfamily Functional Distribution in the Ancestral Domain Set. 140 superfamilies meet the ancestry criteria across the three main kingdoms (Bacteria, Archaea, and Eukaryotes). They make up 15% of superfamilies but 55% of all domains in bacterial genes and 18% of all domains in eukaryotic genes. Results and Discussion
Superfamily Functional Distribution in the Ancestral Domain Set (cont.). Representatives in all six COG functional groups. Translation (48 superfamilies) and metabolism (46 superfamilies) together account for most (94 of 140) of the ancestral superfamilies. Metabolism (385 superfamilies) has undergone much greater expansion than translation (90 superfamilies). Results and Discussion
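The expansion contrast can be checked with simple arithmetic on the counts quoted on the slide. The ratio of total to ancestral superfamilies used here is a crude, hypothetical expansion index, and it assumes the totals (385 and 90) include the ancestral superfamilies of each group; the paper may quantify expansion differently.

```python
# Superfamily counts quoted on the slide for the two largest groups.
ancestral = {"Translation": 48, "Metabolism": 46}
total = {"Translation": 90, "Metabolism": 385}

# Crude expansion index: total superfamilies per ancestral superfamily.
ratios = {group: total[group] / ancestral[group] for group in ancestral}
share_of_ancestral_set = (ancestral["Translation"] + ancestral["Metabolism"]) / 140

for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}x")
print(f"Share of the 140 ancestral superfamilies: {share_of_ancestral_set:.0%}")
```

Starting from a similar number of ancestral superfamilies, metabolism grows to roughly four times the relative size that translation reaches.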
Analysis of the Cellular Functions of Ancestral CATH Superfamilies in the LUCA. Two issues in defining ancestry: domain ubiquity across all species, and the probable functions such domains could have performed in LUCA. Results and Discussion
Analysis of the Cellular Functions of Ancestral CATH Superfamilies in the LUCA Results and Discussion
Analysis of the Cellular Functions of Ancestral CATH Superfamilies in the LUCA. Results and Discussion. Interconversion of sugars and synthesis of polysaccharides. Synthesis of ATP and partial equilibrium of NAD/NADH. Part of the Calvin cycle. Pentose phosphate pathway. Acetyl-CoA for cholesterol and/or steroid synthesis, and synthesis and degradation of fatty acids. Part of the Krebs cycle.
Analysis of the Cellular Functions of Ancestral CATH Superfamilies in the LUCA. Results and Discussion. Nucleotide metabolism is incomplete. Two alternatives for LUCA: it synthesized nucleotides by de novo pathways, or it incorporated them from the surrounding soup. Enzymes for the interconversion of nucleoside monophosphates are present.
Analysis of the Cellular Functions of Ancestral CATH Superfamilies in the LUCA. Results and Discussion. DNA synthesis, repair, ligation, and modification are represented. RNA synthesis (DNA transcription) is represented. Domains related to ribosomal particles and protein synthesis are abundant. Methyl-transfer proteins.
Analysis of the Cellular Functions of Ancestral CATH Superfamilies in the LUCA. Results and Discussion. Membrane and cell wall biogenesis. Transduction of protein-protein signals and gene regulation. Protein signal recognition for protein transport. Cell division. Electron transport and ATP synthase.
Universal Distribution Percentage of Superfamilies. Derived from superfamily occurrence profiles in the prokaryotic sample (Archaea and Bacteria). 100% = superfamily present in all species. 0% = superfamily with a highly specific distribution, found in just a few species. Methods
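A minimal sketch of the universal distribution percentage, assuming it is simply the percentage of sampled prokaryotic species whose genome contains the superfamily. The slide does not give the exact formula; under this simple reading a superfamily found in one species scores about 1% rather than exactly 0%.

```python
def universal_distribution_pct(profile: list[int]) -> float:
    """Percentage of sampled species whose genome has at least one domain of the superfamily."""
    if not profile:
        raise ValueError("empty occurrence profile")
    present = sum(1 for count in profile if count > 0)
    return 100.0 * present / len(profile)


# Hypothetical occurrence profile over four species.
print(universal_distribution_pct([2, 1, 0, 3]))  # 75.0
```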
Ancestry and Evolutionary Temperature Results and Discussion
Superfamily Duplication Rates and Functional Diversification. Duplication rate, measured as the number of homologues within a superfamily, is another gauge of evolutionary temperature. A high correlation was observed between duplication and functional diversification. Results and Discussion
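The slide defines duplication as the number of homologues within a superfamily. The sketch below assumes one reasonable per-genome version of that measure: the mean domain count over the genomes that contain the superfamily, so that absent genomes do not dilute the estimate. The function name and example profile are hypothetical.

```python
from statistics import mean


def duplication_rate(profile: list[int]) -> float:
    """Average number of homologous domains per genome, over genomes that contain the superfamily."""
    present = [count for count in profile if count > 0]
    return float(mean(present)) if present else 0.0


# Hypothetical occurrence profile over four genomes.
print(duplication_rate([4, 2, 0, 6]))  # 4.0
```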
Superfamily Duplication Rates and Functional Diversification. Superfamilies with high universality span more functional subcategories. Metabolism has a higher duplication rate and greater functional diversification than translation. Results and Discussion
Genome Size Correlation and the Coefficient of Interspecies Gene Variation (CIGV) of Superfamilies. Domain occurrence profiles from the 100-species prokaryotic sample. Correlation coefficients between occurrence and genome size, compared with a randomly generated null model. CIGV: the standard deviation of a superfamily's occurrence profile divided by its mean. Methods
Statistical Analysis of Superfamily Distributions. Kolmogorov-Smirnov two-sample test, two-tailed version for large samples. Distributions compared pairwise between functional groups. Methods
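For readers unfamiliar with the test, here is a self-contained sketch of the two-sample Kolmogorov-Smirnov statistic with the large-sample (asymptotic) two-tailed p-value, Q(λ) = 2 Σ (-1)^(k-1) exp(-2k²λ²) with λ = D·sqrt(nm/(n+m)). It is a textbook version written for illustration; in practice a library routine such as SciPy's `ks_2samp` would normally be used, and the authors' exact implementation is not given on the slide.

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic D = max |F_a(x) - F_b(x)| over the pooled values."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def kolmogorov_q(lam, terms=100):
    """Asymptotic two-tailed tail probability of the Kolmogorov distribution."""
    if lam < 0.2:
        return 1.0  # the series converges poorly here; Q is 1 to within about 1e-12
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * s))


def ks_two_sample(a, b):
    """Return (D, large-sample two-tailed p-value)."""
    d = ks_statistic(a, b)
    n, m = len(a), len(b)
    lam = d * math.sqrt(n * m / (n + m))
    return d, kolmogorov_q(lam)


d, p = ks_two_sample(list(range(50)), list(range(50, 100)))
print(d, p)
```

The asymptotic p-value is only trustworthy for reasonably large samples, which matches the slide's note that the large-sample version was used.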
Superfamily Occurrence Profiles and Genome Size Correlation. Results and Discussion
Superfamily Coefficient of Interspecies Gene Variation. Results and Discussion. High CIGV values: more adaptable superfamilies, hotter evolutionary temperature. Low CIGV values: less adaptable superfamilies.
Superfamily Coefficient of Interspecies Gene Variation. Results and Discussion
Rates of Superfamily Innovation in the Functional Groups. Results and Discussion. Figure labels: poor innovation, high innovation.
Conclusions. A more realistic picture of superfamily distribution across distant species. Life achieved modern cellular status long before the separation of the three kingdoms. Woese’s annealing hypothesis is called into question. Superfamily expansion depends on specific features and adaptabilities rather than on time alone.