Annotation. Traditional genome annotation BLAST Similarities.


Annotation

Traditional genome annotation

BLAST Similarities
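A minimal sketch of similarity-based annotation, assuming the BLAST hits are available in BLAST+ tabular form (`-outfmt 6`, default twelve columns). The gene names, reference IDs, function strings, and the E-value cutoff are made up for illustration; the helper names `best_hits` and `transfer_annotations` are hypothetical and not part of any annotation tool.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query: (subject, percent_identity, bitscore)} for the top hit.

    Hits with an E-value above max_evalue (an assumed cutoff) are ignored.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        bitscore = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or bitscore > current[2]:
            best[hit["qseqid"]] = (hit["sseqid"], float(hit["pident"]), bitscore)
    return best


def transfer_annotations(hits, subject_functions):
    """Copy the function of each query's best hit onto the query gene."""
    return {query: subject_functions.get(subject, "hypothetical protein")
            for query, (subject, _pident, _bits) in hits.items()}


# Made-up example rows: geneA has two hits, geneB only a weak one.
EXAMPLE = "\n".join([
    "geneA\tref1\t92.5\t200\t15\t0\t1\t200\t1\t200\t1e-100\t380",
    "geneA\tref2\t45.0\t180\t99\t2\t5\t184\t3\t182\t1e-20\t95.0",
    "geneB\tref3\t30.0\t60\t42\t1\t1\t60\t1\t60\t0.5\t25.0",
])

hits = best_hits(EXAMPLE)
annotations = transfer_annotations(hits, {"ref1": "pyruvate kinase"})
```

Genes without a hit under the cutoff (geneB here) simply stay unannotated, which is one reason similarity alone leaves gaps.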


Protein Families

Gene Ontology
- Ontology
  - A “hierarchy” of functions
  - Does not need to be linear: Directed Acyclic Graph
- Controlled Vocabulary
  - Decides which words or phrases to use

GO: Gene ontology
- A eukaryotic focus: Drosophila, Mus, Saccharomyces, Homo

GO
- Cellular component: the parts of a cell
- Molecular function: e.g. ligand binding
- Biological process: what things do

GO Terms [GO ID, function]
e.g.:
- GO:
- Ontology: molecular function
- Name: pyruvate kinase activity
Mainly assigned by BLAST, HMMER, etc.

Directed Acyclic Graph
- Molecular function
  - Catalytic activity
    - Transferase activity
      - Transferase activity, transferring phosphorus-containing groups
        - Kinase activity
        - Phosphotransferase activity, alcohol group as acceptor
          - Pyruvate kinase activity (child of both terms above)
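To illustrate why a directed acyclic graph differs from a simple tree, here is a small sketch that stores the terms from this slide as child-to-parent links and collects every ancestor of a term. The assumption that pyruvate kinase activity has two parents (kinase activity and the alcohol-group phosphotransferase term) is read off the slide layout; the `PARENTS` table and the `ancestors` helper are illustrative names, not a GO library API.

```python
# Child -> list of parents, using the term names shown on the slide.
# Assumption: "pyruvate kinase activity" has two parents, which is what
# makes this a DAG rather than a tree.
P_GROUPS = "transferase activity, transferring phosphorus-containing groups"

PARENTS = {
    "pyruvate kinase activity": [
        "kinase activity",
        "phosphotransferase activity, alcohol group as acceptor",
    ],
    "kinase activity": [P_GROUPS],
    "phosphotransferase activity, alcohol group as acceptor": [P_GROUPS],
    P_GROUPS: ["transferase activity"],
    "transferase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "molecular function": [],
}


def ancestors(term, parents=PARENTS):
    """Return the set of all terms reachable by following parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # a DAG can reach the same term twice
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


pk_ancestors = ancestors("pyruvate kinase activity")
```

A gene annotated with the most specific term is implicitly annotated with every ancestor, so the `seen` set guards against counting shared ancestors twice.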

Problems
- Annotation by committee
- Eukaryotic focus
  - Some efforts to counter that: Owen White, Ariane Toussaint
- Not very deep
- Strict controlled vocabulary

Alternatives

Basic biology: lacZ, lacI, lacY, lacA (Jacob & Monod, 1961)

Different types of clustering (< 80% identity)

Purine metabolism

Heme / chlorophyll metabolism is conserved. They are both porphyrins.

[Figure: Occurrence of clustering in different genomes. Groups shown: Actinobacteria, Aquificae, Bacteroidetes, Chlamydiae, Chloroflexi, Cyanobacteria, Deinococcus-Thermus, Firmicutes, Spirochaetes, Thermotogae, Proteobacteria. Labels: clusters of genes with maximum 80% identity; genes in subsystems in clusters; total number of genomes in group; fraction of genes in clusters; number of genomes; average.]

The Subsystems Approach to Annotation
- Subsystem is a generalization of “pathway”
  - collection of functional roles jointly involved in a biological process or complex
- Functional Role is the abstract biological function of a gene product
  - atomic, or user-defined, examples:
    - 6-phosphofructokinase (EC )
    - LSU ribosomal protein L31p
    - Streptococcal virulence factors
  - Should not contain “putative”, “thermostable”, etc.
- Populated subsystem is the complete spreadsheet of functions and roles
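The naming rule for functional roles can be sketched as a simple check. This is an illustration only: the banned-word list holds just the two qualifiers named on the slide (the "etc." is left open), and `is_clean_role` and `Subsystem` are hypothetical names, not part of the SEED code base.

```python
import re
from dataclasses import dataclass, field

# Only the two qualifiers named on the slide; the real list is longer ("etc.").
BANNED_QUALIFIERS = ("putative", "thermostable")


def is_clean_role(name):
    """True if the role name contains none of the banned qualifier words."""
    words = re.findall(r"[a-z]+", name.lower())
    return not any(word in BANNED_QUALIFIERS for word in words)


@dataclass
class Subsystem:
    """A named collection of functional roles (hypothetical structure)."""
    name: str
    roles: list = field(default_factory=list)

    def add_role(self, role):
        # Reject role names that carry qualifiers instead of abstract function.
        if not is_clean_role(role):
            raise ValueError(f"role name contains a banned qualifier: {role!r}")
        self.roles.append(role)


example = Subsystem("Example subsystem")
example.add_role("6-phosphofructokinase")
example.add_role("LSU ribosomal protein L31p")
```

Keeping qualifiers out of role names means two genes with the same abstract function always carry the same string, which is what lets a role act as a spreadsheet column.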

Histidine Degradation
- Conversion of histidine to glutamate
- Functional roles defined in table
- Inclusion in subsystem is only by functional role
- Controlled vocabulary …

Subsystem Spreadsheet
- Column headers are taken from the table of functional roles
- Rows are selected genomes or organisms
- Cells are populated with specific, annotated genes
- Functional variants are defined by the annotated roles
- Variant code -1 indicates the subsystem is not functional
- Clustering is shown by color

Functional role columns: HutH, HutU, HutI, GluF, HutG, NfoD, ForI

Organism | Variant | Annotated genes
Bacteroides thetaiotaomicron | 1 | Q8A4B3, Q8A4A9, Q8A4B1, Q8A4B0
Desulfotalea psychrophila | 1 | gi gi gi gi
Halobacterium sp. | 2 | Q9HQD5, Q9HQD8, Q9HQD6, Q9HQD7
Deinococcus radiodurans | 2 | Q9RZ06, Q9RZ02, Q9RZ05, Q9RZ04
Bacillus subtilis | 2 | P10944, P25503, P42084, P42068
Caulobacter crescentus | 3 | P58082, Q9A9MI, P58079, Q9A9M0, Q9A9L9
Pseudomonas putida | 3 | Q88CZ7, Q88CZ6, Q88CZ9, Q88D00, Q88CZ3
Xanthomonas campestris | 3 | Q8PAA7, P58988, Q8PAA6, Q8PAA8, Q8PAA5
Listeria monocytogenes | |

Subsystem Spreadsheet: “The Populated Subsystem”
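One way to picture a populated subsystem is as a mapping from genome to variant code and role-to-gene cells. The sketch below uses placeholder genome names and gene IDs, and the assignment of roles to variants is assumed for illustration; only the role names and the meaning of variant code -1 come from the slides.

```python
from collections import defaultdict

# Functional role columns, as named on the slide.
ROLES = ["HutH", "HutU", "HutI", "GluF", "HutG", "NfoD", "ForI"]

# Placeholder rows: genome names, gene IDs, and which roles each variant
# fills are all made up for this sketch.
SPREADSHEET = {
    "genome_A": {"variant": "1",
                 "cells": {"HutH": ["a1"], "HutU": ["a2"],
                           "HutI": ["a3"], "GluF": ["a4"]}},
    "genome_B": {"variant": "2",
                 "cells": {"HutH": ["b1"], "HutU": ["b2"],
                           "HutI": ["b3"], "HutG": ["b4"]}},
    "genome_C": {"variant": "-1", "cells": {}},
}


def missing_roles(row):
    """Roles (columns) with no annotated gene in this genome's row."""
    return [role for role in ROLES if role not in row["cells"]]


def genomes_by_variant(sheet):
    """Group genome names by their variant code."""
    groups = defaultdict(list)
    for genome, row in sorted(sheet.items()):
        groups[row["variant"]].append(genome)
    return dict(groups)


def functional_genomes(sheet):
    """Genomes whose variant code is not -1 (subsystem is functional)."""
    return [g for g, row in sorted(sheet.items()) if row["variant"] != "-1"]
```

Empty cells are informative here: a genome with a functional variant but a missing role points at a gene that may still be unannotated.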

Subsystems developed based on
- Wet lab
- Chromosomal context
- Metabolic context
- Phylogenetic context
- Microarray data
- Proteomics data
- …

About 2,500 Subsystems
Three level “hierarchy”
- Amino Acids and Derivatives
  - Alanine, serine, and glycine
    - Serine Biosynthesis
- Amino Acids and Derivatives
  - Lysine, threonine, methionine, and cysteine
    - Methionine Biosynthesis
Make your own subsystems!
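The three levels (category, subcategory, subsystem) map naturally onto a nested dictionary. This is a small sketch using only the two examples from the slide; `HIERARCHY` and `classify` are illustrative names.

```python
# category -> subcategory -> list of subsystems (entries from the slide only)
HIERARCHY = {
    "Amino Acids and Derivatives": {
        "Alanine, serine, and glycine": ["Serine Biosynthesis"],
        "Lysine, threonine, methionine, and cysteine": ["Methionine Biosynthesis"],
    },
}


def classify(subsystem, hierarchy=HIERARCHY):
    """Return (category, subcategory) for a subsystem, or None if absent."""
    for category, subcategories in hierarchy.items():
        for subcategory, subsystems in subcategories.items():
            if subsystem in subsystems:
                return category, subcategory
    return None
```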

Growth in Subsystems Over Time

Classification | # SS
Experimental Subsystems | 498
Clustering-based subsystems | 352
Carbohydrates | 160
Cofactors, Vitamins, Prosthetic Groups, Pigments | 123
Amino Acids and Derivatives | 96
Protein Metabolism | 95
Virulence, Disease, Defense | 70
Miscellaneous | 70
RNA Metabolism | 65
Membrane Transport | 65
Respiration | 62
Cell Wall and Capsule | 62
Fatty Acids, Lipids, and Isoprenoids | 60
Regulation and Cell signaling | 51
Virulence | 49
Stress Response | 43
DNA Metabolism | 41
Aromatic Compounds | 38
Phages | 36
Secondary Metabolism | 34
Iron acquisition and metabolism | 31
Nucleosides and Nucleotides | 24
Sulfur Metabolism | 20
Dormancy and Sporulation | 17
Plant-prokaryote | 12
Nitrogen Metabolism | 12
Motility and Chemotaxis | 11
Plant cell walls and outer surfaces | 10
Phages | 10
Cell Division and Cell Cycle | 10
Photosynthesis | 9
Metabolite damage | 8
Phosphorus Metabolism | 7
Potassium metabolism | 4
Transcriptional regulation | 2
Plasmids | 2
Central metabolism | 2
Autotrophy | 2
Arabinose Transport | 1

RAST usage grows...

RAST coverage....

RASTtk
- RAST 2.0
- Customizable choice of pipelines to run
- Same behind-the-scenes infrastructure

RASTtk