Computer Science Ph.D. Seminar: Gene Ontology (GO) Based Search for Protein Structure Similarity Clustering Metrics (Ph.D. Candidate: Steve Johnson)

Computer Science Ph.D. Seminar
Gene Ontology (GO) Based Search for Protein Structure Similarity Clustering Metrics
Ph.D. Candidate: Steve Johnson
Committee Members: Dr. Debasis Mitra, Dr. Philip Bernhard, Dr. Walter Bond, Dr. Julia Grimwade
Date: September 12, 2011

Gene Ontology (GO) Based Search for Protein Structure Similarity Clustering Metrics: Outline
- GO Background
- GO Subontologies
- GO Annotations
- GO Relationships
- GO Tools
- GO Research
- Research Direction

Gene Ontology Background
The Gene Ontology (GO) provides a consistent vocabulary for describing the attributes of proteins: their molecular function, the biological processes they take part in, and the cellular component where the protein is found.

Gene Ontology Background: GO Consortium
- Berkeley Bioinformatics Open Source Project (BBOP)
- British Heart Foundation
- EcoliWiki
- FlyBase
- GeneDB
- UniProtKB-GOA
- Univ. of Maryland, IGS
- Mouse Genome Informatics (MGI)
- Rat Genome Database (RGD)
- Saccharomyces Genome Database (SGD)
- The Arabidopsis Information Resource (TAIR)
- WormBase

Gene Ontology Background: GO Consortium
- GO terms: a set of integer IDs (i.e., GO term identifiers) is assigned to members of the GO Consortium.
- GO Consortium members:
  - provide annotations
  - attend all meetings
  - receive funding for supported databases
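The slide describes GO terms as integer IDs. In released GO files these integers are written as the prefix "GO:" followed by seven zero-padded digits (for example GO:0008150, the root biological_process term). A minimal sketch of converting between the two forms; the function names are hypothetical, not part of any GO tool.

```python
import re

# A GO identifier is the prefix "GO:" followed by exactly seven digits.
_GO_ID = re.compile(r"^GO:(\d{7})$")


def format_go_id(number: int) -> str:
    """Render an integer term number in the zero-padded GO:NNNNNNN form."""
    if not 0 <= number <= 9_999_999:
        raise ValueError(f"GO term numbers must fit in seven digits: {number}")
    return f"GO:{number:07d}"


def parse_go_id(go_id: str) -> int:
    """Return the integer part of a GO ID, rejecting malformed strings."""
    match = _GO_ID.match(go_id)
    if match is None:
        raise ValueError(f"not a GO identifier: {go_id!r}")
    return int(match.group(1))


print(format_go_id(8150))          # GO:0008150 (biological_process root)
print(parse_go_id("GO:0003674"))   # 3674 (molecular_function root)
```

Keeping the integer and the display string separate makes it easy to store IDs compactly while still validating anything read from a file.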

Gene Ontology Project Facts
- Started in 1998
- Primary goals:
  - a structured vocabulary
  - use of that vocabulary to annotate genes and gene products
- Three founding model organism databases:
  - FlyBase (Drosophila)
  - Saccharomyces Genome Database (SGD)
  - Mouse Genome Informatics (MGI) project

GO Subontologies
Three-ontology structure:
- Biological Process
- Molecular Function
- Cellular Component

GO Subontologies: Biological Process
Biological process refers to a series of steps, that is, an ordered sequence of molecular functions. Examples of biological processes include:
- Metabolic Process
- Photosynthetic Process
- Biosynthetic Process

GO Subontologies: Molecular Function
Molecular function describes the purpose (activity) of the gene product and, unlike biological process, refers to a single function. Examples of molecular functions include:
- Binding Activity
- Transport Activity
- Receptor Activity

GO Subontologies: Cellular Component
Cellular component identifies the location of the gene product within the structure of the cell. Examples of cellular components include:
- Organelle Part
- Cell Body
- Membrane
- Apical Complex

GO Annotations: GO Annotation Terms
Example term: Glucose Biosynthetic Process
ID: GO:
Definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.
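GO term records such as the one above are distributed in the OBO flat-file format, where each term is a `[Term]` stanza of `tag: value` lines. The sketch below is a minimal reader for such stanzas, assuming simple files without escaped quotes; the stanza uses placeholder IDs (`GO:EXAMPLE`, `GO:PARENT`) because the slide does not show the term's actual ID, and `parse_obo_terms` is a hypothetical helper, not a library function.

```python
SAMPLE_OBO = """format-version: 1.2

[Term]
id: GO:EXAMPLE
name: glucose biosynthetic process
namespace: biological_process
def: "The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol." []
is_a: GO:PARENT ! placeholder parent term

[Typedef]
id: part_of
name: part of
"""


def parse_obo_terms(text: str) -> list[dict[str, list[str]]]:
    """Collect the tag/value pairs of every [Term] stanza (other stanzas are skipped)."""
    terms: list[dict[str, list[str]]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue
        if line.startswith("[") and line.endswith("]"):
            # A new stanza starts; only [Term] stanzas are kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or ":" not in line:
            continue  # header lines or lines inside non-Term stanzas
        tag, value = line.split(":", 1)
        value = value.strip()
        # Drop the trailing "! label" comment that follows IDs.
        if " ! " in value:
            value = value.split(" ! ", 1)[0].rstrip()
        current.setdefault(tag.strip(), []).append(value)
    return terms


term = parse_obo_terms(SAMPLE_OBO)[0]
definition = term["def"][0].split('"')[1]  # text between the first pair of quotes
print(term["name"][0], term["namespace"][0])
print(definition)
```

Values are kept as lists because tags such as `is_a` can legitimately repeat within one stanza.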

GO Annotations: GO Annotation Term Statistics (as of September 2009)
- Molecular Function: 8,637 terms
- Biological Process: 17,069 terms
- Cellular Component: 2,432 terms
- Total: 28,138 terms
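As a quick arithmetic check on the table, the three subontology counts sum to the reported total, and biological process accounts for roughly three fifths of all terms. A small sketch using only the slide's numbers:

```python
# Term counts per subontology, as reported on the slide (September 2009).
term_counts = {
    "molecular_function": 8637,
    "biological_process": 17069,
    "cellular_component": 2432,
}

total = sum(term_counts.values())
print(f"Total: {total:,} terms")

# Share of the ontology contributed by each subontology.
for namespace, count in term_counts.items():
    print(f"{namespace}: {count:,} ({count / total:.1%})")
```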

GO Annotations: GO Annotation Methods
- Electronic annotation
- Manual annotation
- All annotations record:
  - the source
  - the supporting evidence

GO Annotations: GO Annotation Methods (Manual Annotation)
- The primary source is published literature.
- Curators perform sequence similarity analyses (BLAST, protein domain analysis) to transfer annotations between highly similar gene products.

GO Annotations: GO Annotation Methods (Electronic Annotation)
- Database entries:
  - GO terms are manually mapped to concepts external to GO ("translation tables").
  - Proteins are then electronically annotated with the relevant GO term(s).
- Automatic sequence similarity analyses transfer annotations between highly similar gene products.
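The translation-table step above can be sketched as a dictionary lookup: each external concept attached to a protein entry is mapped to zero or more GO terms, and every resulting annotation is tagged with the IEA (Inferred from Electronic Annotation) evidence code. Real tables of this kind include InterPro2GO and EC2GO; the concepts, GO IDs, and helper name below are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical translation table: external concept -> GO term(s).
# Placeholder identifiers only; real tables are curated mapping files.
TRANSLATION_TABLE = {
    "EXT:domain_1": ["GO:term_A"],
    "EXT:keyword_2": ["GO:term_B", "GO:term_C"],
}


def electronic_annotations(protein_concepts, table):
    """Map each protein's external concepts to GO terms.

    Returns a dict: protein -> set of (GO term, evidence code) pairs.
    Proteins with no mapped concept do not appear in the result.
    """
    annotations = defaultdict(set)
    for protein, concepts in protein_concepts.items():
        for concept in concepts:
            for go_id in table.get(concept, []):
                # Annotations produced without curator review carry IEA.
                annotations[protein].add((go_id, "IEA"))
    return dict(annotations)


proteins = {
    "P1": ["EXT:domain_1"],
    "P2": ["EXT:keyword_2", "EXT:unmapped"],
    "P3": [],
}
print(electronic_annotations(proteins, TRANSLATION_TABLE))
```

Because the mapping itself is curated once, this step scales to whole databases, which is why electronic annotations usually far outnumber manual ones.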

GO Annotations: GO Annotation Example
1A71: Liver Alcohol Dehydrogenase
- Cellular Component: Mitochondria (GO:)
- Biological Process: Ethanol Catabolic Process (GO:)
- Molecular Function: Oxidoreductase Activity

GO Annotations: Sample Annotations
GO Consortium members provide gene annotation data based on information obtained from research-quality articles. The information extracted from the articles is described as "annotation sets".
(Figure: Sample Annotation Sets)

GO Annotations: File Format
The Gene Ontology website presents the annotation data in graphical form. GO is part of the Open Biomedical Ontologies (OBO). Current species/database annotations are provided in the Annotation File Format (GAF 2.0).
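A GAF 2.0 file is tab-separated, with header lines starting with "!" and 17 columns per annotation line. The first nine columns carry the core facts (database, object ID, symbol, qualifier, GO ID, reference, evidence code, with/from, aspect), where the aspect letter P, F, or C names the subontology. The sketch below reads only those nine columns; the record itself is made up and `parse_gaf` is a hypothetical helper.

```python
from typing import NamedTuple

GAF2_COLUMNS = 17
ASPECTS = {"P": "biological_process", "F": "molecular_function", "C": "cellular_component"}


class GafRecord(NamedTuple):
    db: str
    db_object_id: str
    db_object_symbol: str
    qualifier: str
    go_id: str
    db_reference: str
    evidence_code: str
    with_from: str
    aspect: str  # P, F or C


def parse_gaf(lines):
    """Parse GAF 2.0 annotation lines, skipping '!' header/comment lines."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != GAF2_COLUMNS:
            raise ValueError(f"expected {GAF2_COLUMNS} columns, got {len(columns)}")
        records.append(GafRecord(*columns[:9]))
    return records


# A made-up annotation line; every identifier is a placeholder.
sample = "\t".join([
    "UniProtKB", "EXAMPLE1", "ADH_EX", "", "GO:EXAMPLE", "PMID:EXAMPLE",
    "IDA", "", "F", "example alcohol dehydrogenase", "", "protein",
    "taxon:EXAMPLE", "20110912", "EXAMPLE_DB", "", "",
])
records = parse_gaf(["!gaf-version: 2.0", sample])
print(records[0].go_id, records[0].evidence_code, ASPECTS[records[0].aspect])
```

Rejecting lines with the wrong column count catches truncated downloads early instead of silently shifting fields.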

GO Annotations: Evidence Code Categories
Each entry in the annotation file includes evidence information, which serves as a source for validating the annotation. Evidence codes fall into these categories:
- Experimental Evidence Codes
- Computational Analysis Evidence Codes
- Author Statement Evidence Codes
- Curator Statement Evidence Codes
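A common use of these categories is filtering, for example keeping only experimentally supported annotations. The sketch below maps evidence codes to the categories on the slide; the code lists are a partial selection and should be checked against the current GO evidence code documentation. IEA is not one of the four slide categories, so it is kept separate here as an assumption.

```python
# Partial code lists; verify against the current GO evidence code documentation.
EVIDENCE_CATEGORIES = {
    "Experimental": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "Computational Analysis": {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA"},
    "Author Statement": {"TAS", "NAS"},
    "Curator Statement": {"IC", "ND"},
    # Not one of the four categories on the slide; kept separate here.
    "Electronic Annotation": {"IEA"},
}
CODE_TO_CATEGORY = {
    code: category
    for category, codes in EVIDENCE_CATEGORIES.items()
    for code in codes
}


def evidence_category(code: str) -> str:
    """Return the category for an evidence code (case-insensitive)."""
    try:
        return CODE_TO_CATEGORY[code.strip().upper()]
    except KeyError:
        raise ValueError(f"unknown or unlisted evidence code: {code!r}") from None


def keep_categories(annotations, allowed):
    """Filter (gene_product, evidence_code) pairs to the allowed categories."""
    return [(gp, code) for gp, code in annotations if evidence_category(code) in allowed]


annotations = [("P1", "IDA"), ("P2", "IEA"), ("P3", "TAS")]
print(keep_categories(annotations, {"Experimental"}))
```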

GO Annotations: GO Slims
GO slims are subsets of GO terms that provide a broader, higher-level classification of annotations.
(Figure: GO Slim Application Example)
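Applying a GO slim usually means replacing each fine-grained annotated term with the slim terms that are the term itself or one of its ancestors, then counting annotations per slim term. A minimal sketch under that assumption, over a hypothetical is_a fragment with placeholder names; dedicated tools offer more options (for example, keeping only the most specific slim ancestors).

```python
from collections import Counter


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def map_to_slim(term, parents, slim_terms):
    """Return the slim terms that are `term` itself or one of its ancestors."""
    return ({term} | ancestors(term, parents)) & set(slim_terms)


def slim_counts(annotated_terms, parents, slim_terms):
    """Count annotations per slim term (one annotation can hit several)."""
    counts = Counter()
    for term in annotated_terms:
        counts.update(map_to_slim(term, parents, slim_terms))
    return counts


# Hypothetical is_a fragment (child -> parents); names are placeholders.
PARENTS = {
    "A1": ["A"], "A2": ["A"], "A": ["root"],
    "B1": ["B"], "B": ["root"],
    "AB1": ["A1", "B1"],  # multiple parents are allowed in a DAG
}
SLIM = {"A", "B"}

print(slim_counts(["A1", "A2", "AB1", "B1"], PARENTS, SLIM))
```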

GO Relationships
A graph structure (a directed acyclic graph) establishes relationships among the terms for molecular function, biological process, and cellular component.
Primary ontology relations:
- is a
- part of
- regulates
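Because the terms form a graph, queries typically walk upward from a term to its ancestors, but not every relation is treated the same way. The sketch below follows is_a and part_of edges by default and skips regulates, on the assumption (common practice for annotation propagation) that a term regulating a process should not inherit that process's annotations. The edges and term names are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical edges (child, relation, parent); term names are placeholders.
EDGES = [
    ("T_child", "is_a", "T_parent"),
    ("T_parent", "part_of", "T_whole"),
    ("T_reg", "regulates", "T_child"),
    ("T_reg", "is_a", "T_regulation"),
]


def build_parent_index(edges):
    """Index outgoing (relation, parent) pairs by child term."""
    index = defaultdict(list)
    for child, relation, parent in edges:
        index[child].append((relation, parent))
    return index


def ancestors(term, index, follow=frozenset({"is_a", "part_of"})):
    """Collect ancestors reachable only through relations listed in `follow`."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for relation, parent in index.get(current, ()):
            if relation in follow and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


index = build_parent_index(EDGES)
print(sorted(ancestors("T_child", index)))
print(sorted(ancestors("T_reg", index)))
print(sorted(ancestors("T_reg", index, follow={"is_a", "part_of", "regulates"})))
```

Making the followed relations a parameter keeps the traversal reusable for queries that do want to cross regulates edges.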

Gene Ontology Background: GO Mappings to EC Numbers
Enzyme Commission (EC) numbers specify categories of enzymes based on the chemical reactions they catalyze. The UniProtKB-GOA EC2GO mapping provides GO molecular function IDs for each classification:
- EC 1: Oxidoreductases
- EC 2: Transferases
- EC 3: Hydrolases
- EC 4: Lyases
- EC 5: Isomerases
- EC 6: Ligases
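An EC number has four dot-separated fields, and the first field gives the top-level class listed above (alcohol dehydrogenase, EC 1.1.1.1, is an oxidoreductase). The sketch below normalizes an EC string, reports its class, and looks it up in a table standing in for EC2GO; the table contents and helper names are hypothetical placeholders, not the real mapping file.

```python
# Top-level EC classes as listed on the slide.
EC_CLASSES = {
    1: "Oxidoreductases", 2: "Transferases", 3: "Hydrolases",
    4: "Lyases", 5: "Isomerases", 6: "Ligases",
}

# Hypothetical stand-in for an EC -> GO molecular function table.
EC2GO_EXAMPLE = {"1.1.1.1": ["GO:placeholder_MF"]}


def normalize_ec(ec_number: str) -> str:
    """Strip an optional 'EC:' or 'EC' prefix and validate the four fields."""
    text = ec_number.strip()
    for prefix in ("EC:", "EC"):
        if text.upper().startswith(prefix):
            text = text[len(prefix):].strip()
            break
    parts = text.split(".")
    if len(parts) != 4 or not parts[0].isdigit():
        raise ValueError(f"not an EC number: {ec_number!r}")
    return text


def ec_top_class(ec_number: str) -> str:
    """Return the top-level class name from the first EC field."""
    first = int(normalize_ec(ec_number).split(".")[0])
    if first not in EC_CLASSES:
        raise ValueError(f"EC class {first} is not among those listed")
    return EC_CLASSES[first]


def go_for_ec(ec_number: str, table) -> list:
    """Look up GO molecular function IDs for an EC number (empty if unmapped)."""
    return table.get(normalize_ec(ec_number), [])


print(ec_top_class("EC 1.1.1.1"), go_for_ec("EC:1.1.1.1", EC2GO_EXAMPLE))
```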

GO Tools
- AmiGO
- OBO-Edit
- QuickGO
- Goanna
- agriGO

Gene Ontology Database: MySQL
Querying the GO MySQL database:
- SQL
- Perl
- GHOUL
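The kind of SQL query used against the GO database can be illustrated with Python's built-in sqlite3 module. The two tables below are a simplified stand-in loosely modeled on the GO database's `term` and `term2term` tables; the real schema has more tables and columns (for example, relationship types are stored as term references rather than text), so treat the schema and the `children_of` helper as assumptions. Here `term1_id` is the parent and `term2_id` the child.

```python
import sqlite3


def children_of(conn, parent_acc, relation="is_a"):
    """Return (accession, name) of direct children of `parent_acc` via `relation`."""
    rows = conn.execute(
        """
        SELECT child.acc, child.name
        FROM term AS parent
        JOIN term2term AS t2t ON t2t.term1_id = parent.id
        JOIN term AS child ON child.id = t2t.term2_id
        WHERE parent.acc = ? AND t2t.relationship_type = ?
        ORDER BY child.acc
        """,
        (parent_acc, relation),
    )
    return rows.fetchall()


# Simplified schema (assumption, not the full GO database schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE term (id INTEGER PRIMARY KEY, acc TEXT UNIQUE, name TEXT, term_type TEXT);
    CREATE TABLE term2term (term1_id INTEGER, term2_id INTEGER, relationship_type TEXT);
    """
)
conn.executemany(
    "INSERT INTO term VALUES (?, ?, ?, ?)",
    [
        (1, "GO:0003674", "molecular_function", "molecular_function"),
        (2, "GO:0005488", "binding", "molecular_function"),
        (3, "GO:0003824", "catalytic activity", "molecular_function"),
    ],
)
conn.executemany(
    "INSERT INTO term2term VALUES (?, ?, ?)",
    [(1, 2, "is_a"), (1, 3, "is_a")],
)
print(children_of(conn, "GO:0003674"))
```

Binding the accession as a query parameter (`?`) rather than formatting it into the SQL string avoids quoting problems and SQL injection.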

Gene Ontology: Interesting Research
- GO annotation consistency
- Automated annotation
- BioCreative
- CLUGO
- Similarity prediction method
- Automated protein function predictions
- Search for genes with similar function
- Semantic similarity
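One widely used semantic similarity measure is Resnik's: the information content, IC(t) = -log p(t), of the most informative ancestor two terms share, where p(t) is the fraction of annotations made to t or any of its descendants. The sketch below implements that measure; it is offered as a representative example, not necessarily the measure used in the works listed above. It assumes a single-rooted graph, and the graph and annotation counts are hypothetical.

```python
import math


def ancestors_inclusive(term, parents):
    """Return `term` plus every ancestor reachable through `parents`."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(annotation_counts, parents):
    """IC(t) = -log p(t); counts propagate to every ancestor (single root assumed)."""
    propagated = {}
    for term, count in annotation_counts.items():
        for ancestor in ancestors_inclusive(term, parents):
            propagated[ancestor] = propagated.get(ancestor, 0) + count
    total = max(propagated.values())  # the root's count under the single-root assumption
    return {term: -math.log(count / total) for term, count in propagated.items()}


def resnik_similarity(term1, term2, parents, ic):
    """IC of the most informative ancestor shared by both terms."""
    shared = ancestors_inclusive(term1, parents) & ancestors_inclusive(term2, parents)
    return max((ic.get(term, 0.0) for term in shared), default=0.0)


# Hypothetical graph (child -> parents) and annotation counts.
PARENTS = {"A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"]}
ic = information_content({"A1": 2, "A2": 2, "B": 4}, PARENTS)
print(resnik_similarity("A1", "A2", PARENTS, ic))  # shares the informative term A
print(resnik_similarity("A1", "B", PARENTS, ic))   # shares only the root
```

Terms that share only the root score zero, since every annotation reaches the root and its IC is -log 1.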

Dissertation Research Hypothesis
There exist protein alignment metrics/algorithms that can be used as clustering indexes for proteins with matching GO molecular function IDs.
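One way to test this hypothesis for a given metric is to treat each protein's GO molecular function ID as a cluster label and ask whether the metric's distances separate those clusters. The silhouette coefficient is a standard index for this: per protein, s = (b - a) / max(a, b), where a is the mean distance to proteins with the same label and b is the smallest mean distance to any other label, so values near 1 mean well-separated groups. The sketch assumes a single MF label per protein and a precomputed symmetric distance matrix (for example, derived from alignment scores); the matrix and labels are hypothetical, and this is not presented as the dissertation's method.

```python
from statistics import mean


def silhouette(dist, labels):
    """Mean silhouette coefficient for a symmetric distance matrix and labels."""
    n = len(labels)
    scores = []
    for i in range(n):
        same = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        others = {}
        for j in range(n):
            if labels[j] != labels[i]:
                others.setdefault(labels[j], []).append(dist[i][j])
        if not same or not others:
            scores.append(0.0)  # singleton cluster or only one label: score 0
            continue
        a = mean(same)                                   # cohesion
        b = min(mean(values) for values in others.values())  # nearest other cluster
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)


# Hypothetical distances (e.g. 1 - normalized alignment score) for four proteins.
dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]
labels = ["GO:MF_A", "GO:MF_A", "GO:MF_B", "GO:MF_B"]  # placeholder MF IDs
print(round(silhouette(dist, labels), 3))
```

A metric that supports the hypothesis should score clearly above zero on real GO-labelled proteins, while shuffled labels should drive the score toward or below zero.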

Gene Ontology References
- Evelyn B. Camon, Daniel G. Barrell, Emily C. Dimmer, Vivian Lee, Michele Magrane, John Maslen, David Binns and Rolf Apweiler; An evaluation of GO annotation retrieval for BioCreAtIvE and GOA. BMC Bioinformatics (Supplement 1): S17.
- Mary E. Dolan, Li Ni, Evelyn Camon and Judith A. Blake; A procedure for assessing GO annotation consistency. Bioinformatics (Supplement 1): i136–i143.
- In-Yee Lee, Jan-Ming Ho, Ming-Syan Chen; CLUGO: A Clustering Algorithm for Automated Functional Annotations Based on Gene Ontology. Proceedings of the 5th IEEE International Conference on Data Mining (ICDM '05): i136–i143.
- Gene Ontology Consortium; The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Research.
- Evelyn Camon, Michele Magrane, Daniel Barrell, Vivian Lee, Emily Dimmer, John Maslen, David Binns, Nicola Harte, Rodrigo Lopez and Rolf Apweiler; The Gene Ontology Annotation (GOA) Database: sharing knowledge in UniProt with Gene Ontology. Nucleic Acids Research, 2004 (32).

- Gene Ontology Consortium; The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research, 2004 (32).
- Seth Carbon, Amelia Ireland, Christopher J. Mungall, ShengQiang Shu, Brad Marshall, Suzanna Lewis; AmiGO: online access to ontology and annotation data. Bioinformatics (Application Note), 25 (2), 2009: 288–289.