Annotation
Traditional genome annotation
BLAST Similarities
Protein Families
Gene Ontology
Ontology: a “hierarchy” of functions. It does not need to be linear: it is a Directed Acyclic Graph.
Controlled vocabulary: decides which words or phrases to use.
GO (Gene Ontology) has a eukaryotic focus: Drosophila, Mus, Saccharomyces, Homo.
GO has three parts:
Cellular component: the parts of a cell
Molecular function: e.g. ligand binding
Biological process: what things do
GO Terms
[GO ID, function], e.g.:
GO:
Ontology: molecular function
Name: pyruvate kinase activity
Mainly assigned by BLAST, HMMER, etc.
Directed Acyclic Graph
Terms from most general to most specific:
Molecular function
Catalytic activity
Transferase activity
Transferase activity, transferring phosphorus-containing groups
Kinase activity
Phosphotransferase activity, alcohol group as acceptor
Pyruvate kinase activity
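To make the "directed acyclic graph, not a linear hierarchy" point concrete, here is a minimal sketch that stores the terms from this slide as child-to-parent links and collects every ancestor of a term. The specific edges are an assumption for illustration (in particular, giving "pyruvate kinase activity" two parents); the real GO graph has more terms and relations, and the long transferase term is written out as GO names it.

```python
from collections import deque

P_TRANSFER = "transferase activity, transferring phosphorus-containing groups"

# Child -> parents ("is_a") links. ASSUMPTION: edges chosen for illustration
# from the term names on the slide; this is not a dump of the real ontology.
PARENTS = {
    "pyruvate kinase activity": [
        "kinase activity",
        "phosphotransferase activity, alcohol group as acceptor",
    ],
    "kinase activity": [P_TRANSFER],
    "phosphotransferase activity, alcohol group as acceptor": [P_TRANSFER],
    P_TRANSFER: ["transferase activity"],
    "transferase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "molecular function": [],
}


def ancestors(term, parents=PARENTS):
    """Return the set of all terms reachable by following parent links."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a term reached via two paths is only counted once
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen


if __name__ == "__main__":
    for name in sorted(ancestors("pyruvate kinase activity")):
        print(name)
```

Because a term may have more than one parent, a gene annotated with the most specific term is implicitly annotated with every ancestor, whichever path leads to it.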
Problems
Annotation by committee
Eukaryotic focus (some efforts to counter that: Owen White, Ariane Toussaint)
Not very deep
Strict controlled vocabulary
Alternatives
Basic biology: the lac genes lacI, lacZ, lacY, lacA (Jacob & Monod, 1961)
Different types of clustering (< 80 % identity)
Purine metabolism
Heme / chlorophyll metabolism is conserved: they are both porphyrins.
Occurrence of clustering in different genomes
[Figure. Groups: Actinobacteria, Aquificae, Bacteroidetes, Chlamydiae, Chloroflexi, Cyanobacteria, Deinococcus-Thermus, Firmicutes, Spirochaetes, Thermotogae, Proteobacteria, Average. Series: clusters of genes with maximum 80% identity; genes in subsystems in clusters; total number of genomes in group. Axes: fraction of genes in clusters; number of genomes.]
The Subsystems Approach to Annotation
Subsystem: a generalization of “pathway”; a collection of functional roles jointly involved in a biological process or complex.
Functional role: the abstract biological function of a gene product. It is atomic or user-defined. Examples:
6-phosphofructokinase (EC )
LSU ribosomal protein L31p
Streptococcal virulence factors
A functional role should not contain “putative”, “thermostable”, etc.
Populated subsystem: the complete spreadsheet of functions and roles.
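The rule that a functional role should not carry qualifiers can be checked mechanically. The sketch below is an illustration only: the function names are hypothetical, and the qualifier list contains just the two words given on the slide, since the slide's "etc" is not spelled out.

```python
import re

# Only the two qualifiers named on the slide; the slide's "etc" is left open.
BANNED_QUALIFIERS = ("putative", "thermostable")


def qualifiers_in(role):
    """Return the banned qualifiers that appear as whole words in a role name."""
    words = re.findall(r"[a-z]+", role.lower())
    return [q for q in BANNED_QUALIFIERS if q in words]


def is_clean_role(role):
    """True if the role name contains none of the banned qualifiers."""
    return not qualifiers_in(role)


if __name__ == "__main__":
    for role in ("LSU ribosomal protein L31p", "putative 6-phosphofructokinase"):
        print(role, "->", "ok" if is_clean_role(role) else qualifiers_in(role))
```

Matching on whole words rather than substrings avoids flagging a role whose name merely contains one of the qualifiers inside a longer word.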
Histidine Degradation
Conversion of histidine to glutamate
Functional roles defined in table
Inclusion in subsystem is only by functional role
Controlled vocabulary
…
Subsystem Spreadsheet (“The Populated Subsystem”)
Column headers taken from table of functional roles
Rows are selected genomes or organisms
Cells are populated with specific, annotated genes
Functional variants defined by the annotated roles
Variant code -1 indicates subsystem is not functional
Clustering shown by color

Role columns: HutH, HutU, HutI, GluF, HutG, NfoD, ForI
(Empty cells were lost in extraction, so genes are listed in row order without column assignment.)

Organism                       Variant   Annotated genes
Bacteroides thetaiotaomicron   1         Q8A4B3, Q8A4A9, Q8A4B1, Q8A4B0
Desulfotalea psychrophila      1         gi, gi, gi, gi
Halobacterium sp.              2         Q9HQD5, Q9HQD8, Q9HQD6, Q9HQD7
Deinococcus radiodurans        2         Q9RZ06, Q9RZ02, Q9RZ05, Q9RZ04
Bacillus subtilis              2         P10944, P25503, P42084, P42068
Caulobacter crescentus         3         P58082, Q9A9MI, P58079, Q9A9M0, Q9A9L9
Pseudomonas putida             3         Q88CZ7, Q88CZ6, Q88CZ9, Q88D00, Q88CZ3
Xanthomonas campestris         3         Q8PAA7, P58988, Q8PAA6, Q8PAA8, Q8PAA5
Listeria monocytogenes
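A populated subsystem can be pictured as a small data structure: one row per genome, a variant code, and a mapping from functional role to gene. The following is a minimal sketch under stated assumptions: the `Row` class, the helper functions, and the organism and gene placeholders are all hypothetical, and only the role names and the meaning of variant code -1 come from the slide.

```python
from dataclasses import dataclass, field

# Column headers from the slide's table of functional roles.
ROLES = ["HutH", "HutU", "HutI", "GluF", "HutG", "NfoD", "ForI"]


@dataclass
class Row:
    organism: str
    variant: int                                # -1: subsystem is not functional
    cells: dict = field(default_factory=dict)   # functional role -> gene ID

    def __post_init__(self):
        # Inclusion in a subsystem is only by functional role.
        unknown = [role for role in self.cells if role not in ROLES]
        if unknown:
            raise ValueError(f"roles not in subsystem: {unknown}")


# HYPOTHETICAL rows: placeholder organisms and gene IDs, not real annotations.
rows = [
    Row("Organism A", 1, {"HutH": "gene_a1", "HutU": "gene_a2"}),
    Row("Organism B", 2, {"HutH": "gene_b1", "HutG": "gene_b2"}),
    Row("Organism C", -1),
]


def functional(rows):
    """Keep only genomes in which the subsystem is functional."""
    return [row for row in rows if row.variant != -1]


def roles_by_variant(rows):
    """Collect, per variant code, the roles that have an annotated gene."""
    result = {}
    for row in functional(rows):
        result.setdefault(row.variant, set()).update(row.cells)
    return result


if __name__ == "__main__":
    print(roles_by_variant(rows))
```

Grouping roles by variant code mirrors the slide's statement that functional variants are defined by which roles are annotated.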
Subsystems developed based on
Wet lab
Chromosomal context
Metabolic context
Phylogenetic context
Microarray data
Proteomics data
…
About 2,500 Subsystems
Three-level “hierarchy”:
Amino Acids and Derivatives > Alanine, serine, and glycine > Serine Biosynthesis
Amino Acids and Derivatives > Lysine, threonine, methionine, and cysteine > Methionine Biosynthesis
Make your own subsystems!
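The three levels can be sketched as a nested mapping from category to subcategory to subsystem names. This is only an illustration using the two examples from the slide; the `classify` helper is a hypothetical name, not part of any SEED or RAST tool.

```python
# Category -> subcategory -> subsystems, using only the slide's two examples.
HIERARCHY = {
    "Amino Acids and Derivatives": {
        "Alanine, serine, and glycine": ["Serine Biosynthesis"],
        "Lysine, threonine, methionine, and cysteine": ["Methionine Biosynthesis"],
    },
}


def classify(subsystem, hierarchy=HIERARCHY):
    """Return (category, subcategory, subsystem), or None if it is not listed."""
    for category, subcategories in hierarchy.items():
        for subcategory, subsystems in subcategories.items():
            if subsystem in subsystems:
                return (category, subcategory, subsystem)
    return None


if __name__ == "__main__":
    print(classify("Serine Biosynthesis"))
```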
Growth in Subsystems Over Time
Classification                                      # SS
Experimental Subsystems                             498
Clustering-based subsystems                         352
Carbohydrates                                       160
Cofactors, Vitamins, Prosthetic Groups, Pigments    123
Amino Acids and Derivatives                         96
Protein Metabolism                                  95
Virulence, Disease, Defense                         70
Miscellaneous                                       70
RNA Metabolism                                      65
Membrane Transport                                  65
Respiration                                         62
Cell Wall and Capsule                               62
Fatty Acids, Lipids, and Isoprenoids                60
Regulation and Cell signaling                       51
Virulence                                           49
Stress Response                                     43
DNA Metabolism                                      41
Aromatic Compounds                                  38
Phages                                              36
Secondary Metabolism                                34
Iron acquisition and metabolism                     31
Nucleosides and Nucleotides                         24
Sulfur Metabolism                                   20
Dormancy and Sporulation                            17
Plant-prokaryote                                    12
Nitrogen Metabolism                                 12
Motility and Chemotaxis                             11
Plant cell walls and outer surfaces                 10
Phages                                              10
Cell Division and Cell Cycle                        10
Photosynthesis                                      9
Metabolite damage                                   8
Phosphorus Metabolism                               7
Potassium metabolism                                4
Transcriptional regulation                          2
Plasmids                                            2
Central metabolism                                  2
Autotrophy                                          2
Arabinose Transport                                 1
RAST usage grows...
RAST coverage....
RASTtk
RAST 2.0
Customizable choice of pipelines to run
Same behind-the-scenes infrastructure
RASTtk