“Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still be done to collect data and create the tools to analyse it. Bioinformatics, which provides the tools to extract and combine knowledge from isolated data, gives us ways to think about the vast amounts of information now available. It is changing the way biologists do science.” A report to Harold Varmus, June

10^3 bytes = kilobyte
10^6 bytes = megabyte
10^9 bytes = gigabyte
10^12 bytes = terabyte
10^15 bytes = petabyte
10^18 bytes = exabyte
10^21 bytes = zettabyte
10^24 bytes = yottabyte

GAATTCCCGGTTCAATCTCGTAGAACTTGCCCTTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAGCTG GGCATACTTCTTTTCCTTCTCCCTTCCCATGTACCCACTGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATA TTGACCTGATCCTGTTTGCCATCCTCGAAGACGGCCAACAGACGGAATACCTGCCCGCCCCTTGCCGTCGTTTTCACGTACT GTGGTCGTCCCTTGTTTATGGGCAGGCATCCCTCGTGCGTTGGACTGCTCGTACTGTTGGGCGAGGATTCCGTAAACGCCGG CATGTTGTCCACTGAGACAAACTTGTAAACCCGTTCCCGAACCAGCTGTATCAGAGATCCGTATTGTGTGGCCGTGGGGAGA CCCTTCTCGCTTAGCATCGAAAAGTAACCTGCGGGAATTCCACGGAAATGTCAGGAGATAGGAGAAGAAAACAGAACAACAG CAAATACTGAGCCCAAATGAGCGATAGATAGATAGATCGTGCGGCGATCTCGTACTGGTAACTGGTAATTTGATCGATTCAA ACGATTCTGGGTCTCCCCGGTTTTCTGGTTCTGGCTTACGATCGGGTTTTGGGCTTTGGTTGTGGCCTCCAGTTCTCTGGCT CGTTGCCTGTGCCAATTCAAGTGCGCATCCGGCCGTGTGTGTGGGCGCAATTATGTTTATTTACTGGTAACTGGTAATTTGA TCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGTCCCGGTTCAATCTCGTAGAACTTGCCCTTGGTGGACAGTGGGACG TACAACACCTGCCGGTTTTCATTAAGCAGCTGGGCATACTTCTTTTCCTTCTCCCTTCCCATGTACCCACTGCCATGGGACC TGGTCGCATTGCCGTTGCCATGTTGCGACATATTGACCTGATCCTGTTTGCCATCCTCGAAGACGGCCAACAGACGGAATAC CTGCCCGCCCCTTGCCGTCGTTTTCACGTACTGTGGTCGTCCCTTGTTAAAGTAACCTGCGGGAATTCCACGGAAATGTCAG GAGATAGGAGAAGAAAACAGAACAACAGCAAATACTGAGCCCAAATGAGCGATAGATAGATAGATCGTGCGGCGATCTCGTA CTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGGTTCTGGCTTACGATCGGGTTTTGGGC TTTGGTTGTGGCCTCCAGTTCTCTGGCTCGTTGCCTGTGCCAATTCAAGTGCGCATCCGGCCGTGTGTGTGGGCGCAATTAT GTTTATTTACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGTCCCGGTTCAATCTCGTA GAACTTGCCCTTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAGCTGGGCATACTTCTTTTCCTTCTCC CTTCCCATGTACCCACTGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATATTGACCTGATCCTGTTTGCCAT CCTCGAAGACGGCCAACAGACGGAATACCTGCCCGCCCCTTGCCGTCGTTTTCACGTACTGTGGTCGTCCCTTGTTTATGGG CAGGCATCCCTCGTGCGTTGGACTGCTCGTACTGTTGGGCGAGGATTCCGTAAACGCCGGCATGTTGTCCACTGAGACAAAC TTGTAAACCCGTTCCCGAACCAGCTGTATCAGAGATCCGTATTGTGTGGCCGTGGGGAGACCCTTCTCGCTTAGCATCGAAA AGCTTACGATCGGGTTTTGGGCTTTGGTTGTGGCCTCCAGTTCTCTGGCTCGTTGCCTGTGCCAATTCAAGTGCGCATCCGG CCGTGTGTGTGGGCGCAATTATGTTTATTTACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTT 
TCTGTCCCGGTTCAATCTCGTAGAACTTGCCCTTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAGCTG GGCATACTTCTTTTCCTTCTCCCTTCCCATGTACCCACTGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATA TTGACCTGATCCTGTTTGACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGTCCCGGTT CAATCTCGTAGAACTTGCCCTTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAGCTGGGCATACTTCTT TTCCTTCTCCCTTCCCATGTACCCACTGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATATTGACCTGATCC TGTTTGCCATCCTCGAAGACGGCCAACAGACGGAATACCTGCCCGCCCCTTGCCGTCGTTTTCACGTACTGTGGTCGTCCCT TGTTTATGGGCAGGCATCCCTCGTGCGTTGGACTGCTCGTACTGTTGGGCGAGGATTCCGTAAACGCCGGCATGTTGTCCAC TGAGACAAACTTGTAAACCCGTTCCCGAACCAGCTGTATCAGAGATCCGTATTGTGTGGCCGTGGGGAGACCCTTCTCGCTT AGCATCGAAAAGTAACCTGCGGGAATTCCACGGAAATGTCAGGAGATAGGAGAAGAAAACAGAACAACAGCAAATACTGTGC GGCGATCTCGTACTGGACGGAAATGTCAGGAGATAGGAGAAGAAAA

Nucleotide sequence database

The Human Proteome
~30,000 protein-coding genes
Expansion of the number of different protein molecules due to:
– (a) alternative splicing (30 to 50% increase);
– (b) post-translational modifications (5- to 10-fold increase)
There could well be about 1 million different protein molecules in the human body
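
The multipliers on this slide can be combined with a little arithmetic. The following is a minimal sketch that takes the slide's figures at face value and assumes the two effects simply multiply; that assumption, and the function name `proteome_range`, are illustrative and not part of the slides.

```python
from fractions import Fraction

GENES = 30_000                                   # ~ protein-coding genes (slide figure)
SPLICING = (Fraction(13, 10), Fraction(3, 2))    # 30% to 50% increase
PTM = (5, 10)                                    # 5- to 10-fold increase


def proteome_range(genes, splicing, ptm):
    """Return (low, high) counts, assuming the two factors multiply."""
    low = genes * splicing[0] * ptm[0]
    high = genes * splicing[1] * ptm[1]
    return int(low), int(high)


low, high = proteome_range(GENES, SPLICING, PTM)
print(f"{low:,} to {high:,} distinct protein molecules")
```

Under these assumptions the product comes to 195,000 to 450,000, so the slide's "about 1 million" is best read as a round order-of-magnitude figure rather than an exact product.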

Annotated genome: annotation
– Depth of knowledge: detailed analysis (typically biological) of single genes
– Breadth of knowledge: large-scale analysis (typically computational) of entire genome

The two major methods of gene prediction:
– sequence comparison
– ab initio

Approaches to gene finding: Generalized hidden Markov models
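
To give a feel for the idea, here is a minimal sketch of Viterbi decoding on a plain two-state hidden Markov model that labels each base as "coding" or "noncoding". It is not a generalized HMM: a generalized HMM also models the length of each segment explicitly, which this sketch leaves out. All probabilities are invented for illustration and do not come from the slides or from any real gene finder.

```python
import math

STATES = ("noncoding", "coding")

# All probabilities below are invented for illustration only.
START = {"noncoding": 0.5, "coding": 0.5}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},  # AT-rich (assumed)
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},     # GC-rich (assumed)
}


def viterbi(seq):
    """Return the most probable state path for seq (log-space Viterbi)."""
    if not seq:
        return []
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # one dict of back-pointers per position after the first
    for base in seq[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][base])
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi("AAAAAAAAGGGGGGGGGGGG"))
```

Real gene finders use many more states (exons by phase, introns, splice sites, intergenic sequence) and parameters trained on annotated genes.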

Limitations of Gene Prediction Programs
– Good at predicting ORF-containing sequence
– Prediction of exact exon-intron boundaries difficult
– Fuse & split genes
– Cannot predict UTRs
– Cannot predict nested genes
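
The first point can be illustrated with a naive open reading frame (ORF) scan, sketched below. It looks only at the forward strand, takes the first ATG in each frame and stops at the next in-frame stop codon. It knows nothing about introns, UTRs or nested genes, which is exactly where the limitations listed above come from. The function name `find_orfs` and the `min_codons` parameter are illustrative choices, not part of any tool named in the slides.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) pairs of forward-strand ORFs.

    An ORF here runs from an ATG to the next in-frame stop codon,
    end-exclusive, and must span at least min_codons codons
    (start and stop codons included).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATGACC"))
```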

Computational Analysis
– Fly alignments: known genes/cDNAs, ESTs, transposons
– Cross-species sequence similarities (proteins & ESTs): fly, primate, rodent, worm, yeast, plant, other insects, other vertebrates, other invertebrates
– Gene predictions: Genie, Genscan, tRNAscan-SE

Drosophila Gene Collection 1 Pavel Tomancak

Embryonic expression of wild-type eve (rust) and a transgene containing the stripe tertiary element (blue).
Alignment of eve 5’ regulatory region: D. melanogaster vs (A) D. erecta, (B) D. pseudoobscura, (C) D. willistoni and (D) D. littoralis.
[Figure labels: stripe, eve]

Gene Ontology
– FlyBase (Drosophila): Cambridge & EBI, Harvard, Berkeley & Bloomington
– Saccharomyces Genome Database: Stanford
– Mouse Genome Informatics: Jackson Labs
– The Arabidopsis Information Resource: Stanford
– WormBase: Caltech & CSHL
– DictyBase: Chicago
– SwissProt: Hinxton & Geneva
– The Institute for Genomic Research: MD
With support from NIH (NHGRI) & AstraZeneca.

The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.

What is an Ontology?
“An ontology is a specification of a conceptualization that is designed for reuse across multiple applications and implementations. … a specification of a conceptualization is a written, formal description of a set of concepts and relationships in a domain of interest.”
Peter Karp (2000) Bioinformatics 16:269

The Gene Ontology Consortium subscribes to the Manifesto of Liberation Bioinformatics:
– Open source
– Open standards
– Open annotation
– Open data
merci tim hubbard - liberationise extraordinaire de ‘inxton

Introduction to GO
GO: A Gene Ontology
GO Objectives:
– Provide a controlled vocabulary for the description of the molecular function and cellular location of gene products, as well as the role of the gene products in basic biological processes
– Use these terms as attributes of gene products in the collaborating databases
– Allow queries across databases using GO terms, providing the linking of biological information across species
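
The third objective, querying across databases by GO term, can be sketched as a simple index from term to annotated gene products. The database names are those of the collaborating groups; the gene product identifiers are invented placeholders, and the functions `index_by_term` and `query` are hypothetical helpers, not part of any GO software.

```python
from collections import defaultdict

# (database, gene product, GO term) triples.
# Gene product identifiers are invented placeholders.
ANNOTATIONS = [
    ("FlyBase", "fly_gene_A", "GO:0005634"),   # nucleus
    ("SGD", "yeast_gene_B", "GO:0005634"),
    ("MGI", "mouse_gene_C", "GO:0005634"),
    ("FlyBase", "fly_gene_A", "GO:0003677"),   # DNA binding
    ("MGI", "mouse_gene_D", "GO:0003677"),
]


def index_by_term(annotations):
    """Group (database, gene product) pairs under each GO term."""
    index = defaultdict(list)
    for database, product, term in annotations:
        index[term].append((database, product))
    return index


def query(index, term):
    """Return every (database, gene product) annotated with term."""
    return sorted(index.get(term, []))


index = index_by_term(ANNOTATIONS)
for database, product in query(index, "GO:0005634"):
    print(database, product)
```

Because every database uses the same term identifiers, one lookup returns gene products from fly, yeast and mouse together.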

GO = Three Ontologies
– Biological Process = goal or objective within cell
– Molecular Function = elemental activity or task
– Cellular Component = location or complex

Parent-Child Relationships
– Hierarchy: one-to-many parental relationship; each child has only one parent
– Directed acyclic graph (DAG): many-to-many parental relationship; each child may have one or more parents
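
The difference between the two structures can be sketched by storing, for each term, a list of its parents. The term names in the DAG example come from the cellular component slide; the exact edges are a simplified assumption, and the helper functions are illustrative.

```python
def is_hierarchy(parents):
    """True if every term has at most one parent (a strict hierarchy)."""
    return all(len(p) <= 1 for p in parents.values())


def ancestors(term, parents):
    """All terms reachable by following parent links upward from term."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


# A strict hierarchy: each child has only one parent.
TREE = {"b": ["a"], "c": ["a"]}

# A DAG: "vacuolar membrane" has two parents (edges simplified, assumed).
DAG = {
    "membrane": ["cellular_component"],
    "vacuole": ["cellular_component"],
    "vacuolar membrane": ["membrane", "vacuole"],
}

print(is_hierarchy(TREE), is_hierarchy(DAG))
print(sorted(ancestors("vacuolar membrane", DAG)))
```

In a DAG a gene product annotated to a child term is reachable from every one of its parents, which a single-parent hierarchy cannot express.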

Classes of parent-child relationship:
– ISA (hyponymy), as in: an elephant is a mammal.
– PARTOF (meronymy), as in: a trunk is part of an elephant.

Structure of the Ontologies
Flat-file view (instance of (%), part of (<)):
cellular_component
%membrane
%vacuolar membrane
%nuclear membrane
%intracellular
%cell
<cytoplasm
<vacuole
<vacuolar membrane
<vacuolar lumen
<nucleus
<nuclear membrane
Graph view (node labels): cellular_component, vacuolar membrane, intracellular, vacuole, vacuolar lumen, cytoplasm, nucleus, nuclear membrane, cell
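
A listing like this can be read into typed parent-child edges. The sketch below assumes that nesting is shown by one leading space per level and that each term's parent is the nearest term one level up. The indentation in `FLAT` is an assumed reconstruction, since the slide text here has lost its layout, and the function names are illustrative.

```python
RELATIONS = {"%": "is_a", "<": "part_of"}

# Indentation is an assumed reconstruction of the slide's layout.
FLAT = """\
cellular_component
 %membrane
  %vacuolar membrane
  %nuclear membrane
 %intracellular
 %cell
  <cytoplasm
   <vacuole
    <vacuolar membrane
    <vacuolar lumen
  <nucleus
   <nuclear membrane
"""


def parse(text):
    """Return (child, relation, parent) triples from an indented listing."""
    edges = []
    path = []  # path[d] is the most recent term seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        body = line.lstrip(" ")
        depth = len(line) - len(body)
        relation = RELATIONS.get(body[0])
        term = body[1:] if relation else body
        del path[depth:]          # drop terms at this depth or deeper
        if relation and path:
            edges.append((term, relation, path[-1]))
        path.append(term)
    return edges


def parents_of(term, edges):
    """Return sorted (relation, parent) pairs for term."""
    return sorted((rel, parent) for child, rel, parent in edges if child == term)


print(parents_of("vacuolar membrane", parse(FLAT)))
```

Under this reconstruction "vacuolar membrane" appears twice in the listing and so gets two parents of different types: it is a membrane and it is part of the vacuole.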

Content of GO (September)
– molecular function: 5,232 terms
– biological process: 6,416 terms
– cellular component: 1,111 terms
– all: 12,759 terms
– definitions: 7,735 (61%)
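
The totals on this slide are easy to check. A minimal sketch using only the slide's own figures:

```python
# Term counts per ontology, as given on the slide.
COUNTS = {
    "molecular function": 5232,
    "biological process": 6416,
    "cellular component": 1111,
}
DEFINED = 7735  # terms that have a definition

total = sum(COUNTS.values())
percent_defined = round(100 * DEFINED / total)

print(f"{total:,} terms, {percent_defined}% with definitions")
```

The three ontologies sum to the stated 12,759 terms, and 7,735 definitions is 61% of that total once rounded.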

Thank yous
– Genome annotation: Colleagues in the European and Berkeley Drosophila Genome Projects.
– FlyBase: Colleagues in Harvard, Berkeley, Bloomington & Cambridge.
– Gene Ontology: Colleagues in Berkeley, Jackson Labs, Stanford and EBI.