Data Integration, Gene Ontology, and the Mouse*, Joel Richardson, Ph.D., Mouse Genome Informatics Group, The Jackson Laboratory, Bar Harbor, Maine 04609

Presentation transcript:

Data Integration, Gene Ontology, and the Mouse*
Joel Richardson, Ph.D.
Mouse Genome Informatics Group
The Jackson Laboratory
Bar Harbor, Maine
* Not necessarily in that order.

We have the human sequence: OK, now what?
- One species is not enough:
  - model organisms (one strain is not enough)
  - comparative studies
- The sequence is just the beginning:
  - sequence variants
  - gene regulation and interaction networks
  - non-coding functional elements
  - environmental effects
- Genotype to phenotype

The Mouse
- the premier animal model for studying human disease
- > 95% same genes
- same diseases, similar reasons (e.g., cancer, hypertension, diabetes, osteoporosis, …)
- 1000s of lab strains with different characteristics
- precise genetic control

The Jackson Laboratory
- Private nonprofit research institution (est. 1929)
- Studying mouse as a model of human biology and disease
- National Cancer Research Center
- Supplier of laboratory strains to researchers worldwide
- Areas: metabolism, development, cancer, immune response

Bar Harbor, ME 04609

Mouse Genome Informatics (MGI)
- Consortium of NIH-funded projects
- Housed at TJL
- Integrates and disseminates public data resources covering selected aspects of mouse biology
- First program project funding 1989
- > $10M/y total, > 60 people
- Online since 1994

MGI Concept Map
[Diagram. Nodes: Genes and other loci; Expression Data; Mapping Data; Molecular Fragments; DNA and Protein Sequences; Strains; Phenotypes; Anatomy; Genotypes; Alleles; References; Accession IDs; Variants]

Integration in MGI
- Identifying objects.
- Resolving or noting discrepancies.
- Integration is key to knowledge discovery in the age of genomics.
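The two integration steps named on this slide, identifying objects and resolving or noting discrepancies, can be sketched in a few lines. This is only an illustration under stated assumptions: the function `integrate`, the record layout, the source names, and the accession IDs are all hypothetical and do not describe MGI's actual load process. Records from different sources are matched on a shared accession ID, the first value seen for each field is kept, and any conflicting later value is noted rather than silently overwritten.

```python
def integrate(records_by_source):
    """Merge records from several sources, keyed by accession ID.

    records_by_source: {source_name: [ {"accession": ..., field: value, ...}, ... ]}
    Returns (merged, discrepancies). All names here are hypothetical.
    """
    merged = {}          # accession -> {field: (value, source that supplied it)}
    discrepancies = []   # (accession, field, kept (value, source), conflicting (value, source))

    for source, records in records_by_source.items():
        for rec in records:
            acc = rec["accession"]              # "identifying objects": same ID, same object
            target = merged.setdefault(acc, {})
            for field, value in rec.items():
                if field == "accession":
                    continue
                if field not in target:
                    target[field] = (value, source)
                elif target[field][0] != value:
                    # "noting discrepancies": keep the first value, record the conflict
                    discrepancies.append((acc, field, target[field], (value, source)))
    return merged, discrepancies


# Made-up example data: two sources disagree on the chromosome of one object.
demo_sources = {
    "sourceA": [{"accession": "ACC:0001", "symbol": "GeneA", "chromosome": "1"}],
    "sourceB": [
        {"accession": "ACC:0001", "symbol": "GeneA", "chromosome": "2"},
        {"accession": "ACC:0002", "symbol": "GeneB", "chromosome": "X"},
    ],
}
merged, discrepancies = integrate(demo_sources)
```

Keeping the supplying source next to each value is a design choice in this sketch: it lets a curator see where a kept value came from when reviewing the discrepancy list.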

The Power of Integration: Queries
- What transcription factors are expressed in a 2-cell embryo and not in a blastocyst?
  - integration of multiple expression assay data sets and data types
  - standardization of anatomical references and developmental stages
- What development QTLs contain these TFs?
  - integration of expression data and mapping data
  - the genetic map is itself the result of integrating lots of mapping data
- What strains are distinguished by SNPs in this region?
- And so on…
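Once expression results share standardized anatomical terms, the first query on this slide reduces to a set difference. The sketch below shows that idea with Python's built-in `sqlite3` module. The schema (`gene`, `expression`), the `is_tf` flag, the structure labels, and the gene symbols are all assumptions invented for illustration; they are not MGI's schema or data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical, made-up schema and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT, is_tf INTEGER);
        CREATE TABLE expression (gene_id INTEGER, structure TEXT, detected INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?)",
        [(1, "TfA", 1), (2, "TfB", 1), (3, "EnzC", 0)],
    )
    conn.executemany(
        "INSERT INTO expression VALUES (?, ?, ?)",
        [
            (1, "2-cell embryo", 1), (1, "blastocyst", 0),
            (2, "2-cell embryo", 1), (2, "blastocyst", 1),
            (3, "2-cell embryo", 1),
        ],
    )
    return conn


def tfs_in_first_not_second(conn, first, second):
    """Symbols of transcription factors detected in `first` but not detected in `second`."""
    rows = conn.execute(
        """
        SELECT g.symbol FROM gene g
        JOIN expression e ON e.gene_id = g.id
        WHERE g.is_tf = 1 AND e.structure = ? AND e.detected = 1
        EXCEPT
        SELECT g.symbol FROM gene g
        JOIN expression e ON e.gene_id = g.id
        WHERE e.structure = ? AND e.detected = 1
        """,
        (first, second),
    ).fetchall()
    return sorted(symbol for (symbol,) in rows)


conn = build_demo_db()
result = tfs_in_first_not_second(conn, "2-cell embryo", "blastocyst")
```

The query only works because both data sets name the structure the same way, which is the point of the "standardization of anatomical references" bullet. Note that this sketch treats "no positive result recorded" as "not expressed", a simplification a real system would need to handle more carefully.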

The MGI System (from 40,000 feet)
[Architecture diagram. Labels as extracted: MGI RDBMS; Web; Files; Data Downloads; Literature Curation; SQL; Load scripts; Editing Interface; Servlets; CGI Scripts; Files; Report Scripts]

MGI in Context
[Diagram. MGI db surrounded by its data sources: Scientific Literature; Mutagenesis Centers; GenBank; LocusLink; Unigene; TIGR; DoTS; OMIM; Ensembl; GO; Interpro; SwissProt; ATCC; RIKEN; Anatomy; RPCI; RatMap; NIA; MGC; I.M.A.G.E.; NCBI RefSeq]

Integration relies on Standard Vocabularies
- Structured vocabularies
  - The common semantic frameworks
  - Structured into is-a/part-of hierarchies
- Evidence-based annotation
  - Associations of vocabulary terms with objects
  - Evidence (codes), citations, etc., decorate the associations
- Structured annotations and queries
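A small sketch may help make the three ideas on this slide concrete: terms arranged in an is-a/part-of hierarchy, annotations that attach a term to an object along with an evidence code and a citation, and a structured query that uses the hierarchy. Everything named here (`Vocabulary`, `Annotation`, `objects_annotated_to`, the term IDs, gene symbols, and references) is hypothetical. IDA and IEA are Gene Ontology evidence codes, used only as example values. The query treats is-a and part-of edges alike, which is a simplification.

```python
from collections import defaultdict
from dataclasses import dataclass


class Vocabulary:
    """A structured vocabulary: terms linked to parents by is-a / part-of edges."""

    def __init__(self):
        self.parents = defaultdict(set)  # term -> {(relation, parent_term)}

    def add_edge(self, child, relation, parent):
        self.parents[child].add((relation, parent))

    def ancestors(self, term):
        """All terms reachable from `term` by following parent edges."""
        seen, stack = set(), [term]
        while stack:
            current = stack.pop()
            for _relation, parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


@dataclass(frozen=True)
class Annotation:
    """Association of a term with an object, decorated with evidence and a citation."""
    obj: str
    term: str
    evidence: str
    reference: str


def objects_annotated_to(vocab, annotations, term, evidence_codes=None):
    """Objects annotated to `term` or to any of its descendants.

    If `evidence_codes` is given, only annotations with those codes count.
    """
    hits = set()
    for a in annotations:
        if evidence_codes is not None and a.evidence not in evidence_codes:
            continue
        if a.term == term or term in vocab.ancestors(a.term):
            hits.add(a.obj)
    return sorted(hits)


def demo():
    """Build a made-up three-term vocabulary and three made-up annotations."""
    vocab = Vocabulary()
    vocab.add_edge("T:leaf", "is-a", "T:mid")
    vocab.add_edge("T:mid", "part-of", "T:root")
    annotations = [
        Annotation("GeneA", "T:leaf", "IDA", "REF:1"),
        Annotation("GeneB", "T:mid", "IEA", "REF:2"),
        Annotation("GeneC", "T:root", "IDA", "REF:3"),
    ]
    return vocab, annotations


vocab, annotations = demo()
all_under_mid = objects_annotated_to(vocab, annotations, "T:mid")
experimental_under_root = objects_annotated_to(vocab, annotations, "T:root", {"IDA"})
```

Because annotations are made to the most specific term that applies, a query on a general term has to walk the hierarchy to find them. Filtering on evidence codes is what lets a user separate, for example, experimentally supported annotations from electronically inferred ones.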

Structured Vocabularies in MGI
- Gene Ontology (GO)
  - Functional gene annotations
- Mammalian Phenotype (MP)
  - Annotations to genotypes (e.g. knockouts)
- Mouse Anatomical Dictionary
  - Annotations of expression
- Other standardized, non-structured vocabularies
  - mouse strains
  - cell lines
  - clone libraries
  - tissues
  - lots of smaller ones

Challenges
- Domain very difficult to frame
  - Huge variability and variety of data, formats, providers, update schedules, semantics, etc.
- Biologists and computer scientists think differently.
  - communication is paramount, but difficult
- Rapid changes, e.g., in the last 10 years:
  - genetic crosses -> YAC/BAC mapping -> RH mapping -> genome sequence
  - northern blots -> microarrays -> MPSS

System Evolution
- The system is a software ecosystem
- Maintenance is the cost of success
- Changes and cost/benefit
  - If it ain’t broke, don’t fix it
- Commitments/agenda/priorities

Credits Richard Baldarelli Matt Baya Jon Beal Dale Begley Judy Blake John Boddy Dirck Bradt Carol Bult Nancy Butler Donna Burkart Jeff Campbell Lori Corbani Rebecca Corey Sharon Cousins Diane Dahmen Harold Drabkin Janan Eppig Jackie Finger David Garippa Lucette Glass Carroll Goldsmith Pat Grant Terry Hayamizu David Hill Jim Kadin Ben King Debbie Krupke Moyha Lennon-Pierce Jill Lewis Ira Lu Cathy Lutz Lois Maltais Prita Mani Mike McCrossin Louise McKenzie David Miers Daniel Modrusan Dieter Naf Li Ni Janice Ormsby Sridhar Ramachandran Deborah Reed Joel Richardson Martin Ringwald David Shaw Bob Sinclair Cynthia Smith Connie Smith Paul Szauter Leslie Trombley Pierre Vanden Borre Michael Walker Linda Washburn Josh Winslow Iry Witham Sophia Zhu