PIONEER HI-BRED INTERNATIONAL, INC. Plant Ontologies – Industrial Science meets Renaissance Concepts Dave Selinger Computational Biologist Pioneer Hi-Bred,

Presentation transcript:

PIONEER HI-BRED INTERNATIONAL, INC.
Plant Ontologies – Industrial Science meets Renaissance Concepts
Dave Selinger
Computational Biologist
Pioneer Hi-Bred, DuPont Agriculture and Nutrition

Outline
- What is the nature of the problem that a Plant Anatomy Ontology can solve?
- What is an Ontology?
- How do you make a Plant Anatomy Ontology?
- Does it really solve the problem?

Industrial Science
- Not science in industry, but the industrialization of data creation, i.e. the 'omics revolutions.
- High-throughput data
  - Sequencing
  - Expression
- Medium-throughput data
  - Proteomics
  - Metabolomics
- Low-throughput data
  - Gene/protein function
  - Phenotype

The double-edged sword of Industrial Science
- Industrial science means lots of cheap data
  - Sequencing << $0.01/base
    - $10,000 prokaryotic genomes are reality
    - $10,000 eukaryotic genomes will be reality in the next five years
  - Expression < $0.50/gene
  - And much of this data is available for free after it is produced!
- Lots of data means that you can't sit down with your lab notebook and analyze the data by hand.
  - Databases, software for searching and comparing
  - Whole new areas of research devoted to finding meaningful patterns in lots of data.

Organizing information
- Information is not knowledge.
  - But knowledge can be acquired from information.
  - But only with a lot of effort, see second law of thermodynamics
- Central challenge with Industrial science is organizing the information.
  - The organization of the information determines what you can discover.
  - Experimental design
    - Good design will produce a contrast that will support or refute a hypothesis.
    - Statistical rigor
      - Is the signal higher than the noise?
      - How conclusive will the discoveries be?

Context
- How do we compare across experiments?
  - Not too hard if one person did all the experiments and kept careful notes.
  - If multiple people, then we need to define what was done, what the analysis was, and what the sample was.
    - What was done: e.g. MIAME standard for describing the technical details of an expression experiment.
    - Analysis: e.g. ANOVA, SAM, etc.
    - Sample: ?

Renaissance concepts (historically Enlightenment)
- Things can be systematically described and classified
  - Organisms: Linnaeus, Species Plantarum, 1753
- Linnaeus' problem is much the same as the sample description problem
  - Variable specificity
    - California Laurel or Oregon Myrtlewood?
    - Kernel or seed?
- In addition, a term like kernel assumes all parts are present, but this assumption could be wrong

Ontologies to the rescue?
- Ontology = the study of being (Philosophy)
  - The specification of a conceptualization of a domain of interest (Computer Science)
  - Original and continuing computer science interest was Artificial Intelligence.
    - How can a computer make inferences?
    - Need to define meanings: the word "can", for example, has several.
    - Structure and relationships in an ontology allow a computer to make inferences.
      - Mary is the mother of Bill. Is Mary a parent of Bill?
      - IsA(Mother, Parent)
- Parts of an ontology
  - Concepts -> objects (real and abstract), processes, functions
  - Partitions -> rules that can classify concepts
  - Attributes -> properties of a concept; can be individual or class attributes
  - Relationships -> is a, part of
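
The Mary/Bill inference above can be sketched in a few lines of Python. This is only an illustration, not anything from the talk: the `IS_A` and `FACTS` tables and the `holds` helper are hypothetical names, and the only rule implemented is "a fact stated with a specific relation also holds for every more general relation".

```python
# Hypothetical mini-ontology; all names are illustrative assumptions.
IS_A = {"Mother": "Parent", "Father": "Parent"}  # concept -> more general concept

# Asserted facts as (subject, relation, object) triples.
FACTS = {("Mary", "Mother", "Bill")}


def generalizations(concept):
    """Yield the concept itself and every ancestor reached through is_a links."""
    while concept is not None:
        yield concept
        concept = IS_A.get(concept)


def holds(subject, relation, obj):
    """True if the fact is asserted directly or implied by an is_a link."""
    return any(
        s == subject and o == obj and relation in generalizations(r)
        for s, r, o in FACTS
    )


print(holds("Mary", "Parent", "Bill"))  # inferred from "Mother is_a Parent"
print(holds("Mary", "Father", "Bill"))  # not implied by anything asserted
```

Without the `IS_A` table the program could only answer questions that repeat an asserted fact word for word; the relationship between concepts is what makes the inference possible.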

Does an ontology make sense?
- The value of ontologies is a current debate among information scientists.
  - One group advocates that ontologies are necessary for computers to understand content.
    - Semantic web -> an extension of the current HTML/XML-based web to something with ontological inference
  - Others argue that ontologies are not needed and are not practical
    - Complexity is OK; just use a Google-like search to connect concepts.
  - However, some problems, like organismal classification and the periodic table, are very amenable to an ontological approach.
    - Formal categories and stable entities
    - Expert users and catalogers

Forms of ontologies
- Ontologies can take several forms (data structures)
  - Controlled vocabulary (List)
    - Terms but no relationships
    - Enforces systematic naming
  - Hierarchy (tree structure) => Taxonomy
    - Terms and "is a" relationship
    - Children are unique and have a single parent
  - Directed acyclic graph => Gene Ontology
    - Multiple relationship types
    - Children with multiple parents

Features of Trees
- Because each child node has only one parent
  - There is an unambiguous path to the root from each leaf
  - Child nodes can be easily grouped at any level of the structure
- Trees can express only one organizing principle
- Work well for taxonomy (at least eukaryotic taxonomy)
  - Organizing principle is classification by similarity
  - All terms have an "is a" relationship to the next level term
  - Organisms were classified before evolution was hypothesized, but the classification matches the evolutionary relationships
    - Similar example would be the periodic table of the elements
    - Classification can facilitate discovery of underlying principles
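
Both tree properties (one path to the root, easy grouping at any level) follow directly from storing a single parent per term. The sketch below assumes a deliberately simplified toy taxonomy that skips most ranks; `PARENT`, `path_to_root` and `group_at_depth` are hypothetical names, not part of any tool mentioned in the talk.

```python
# Hypothetical toy taxonomy: each term maps to its single parent (root -> None).
PARENT = {
    "Eukaryota": None,
    "Plantae": "Eukaryota",
    "Poaceae": "Plantae",
    "Fabaceae": "Plantae",
    "Zea mays": "Poaceae",
    "Oryza sativa": "Poaceae",
    "Glycine max": "Fabaceae",
}


def path_to_root(term):
    """Return the one and only path from a term up to the root."""
    path = [term]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def group_at_depth(terms, depth):
    """Group terms by their ancestor `depth` steps below the root (root = 0)."""
    groups = {}
    for term in terms:
        lineage = path_to_root(term)[::-1]  # root first
        groups.setdefault(lineage[depth], []).append(term)
    return groups


print(path_to_root("Zea mays"))
print(group_at_depth(["Zea mays", "Oryza sativa", "Glycine max"], 2))
```

Because every term has exactly one parent, `path_to_root` never has to choose between branches; that choice is what a DAG introduces.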

A tree-based Anatomy Ontology
- Developed by Winston Hide's group at SANBI and Electric Genetics
- Single-concept, orthogonal trees
  - Cells
  - Tissues
  - Organs
  - Disease state
- Each tree is independent, but together the trees act as related dimensions describing a sample
- Set operations (intersection or union) between trees allow specific queries.
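
A minimal sketch of such a query, assuming each tree simply records which sample ids carry each of its terms. The sample ids, the term names and the `samples` helper are invented for illustration and do not describe the actual SANBI implementation.

```python
# Hypothetical annotations: tree -> term -> set of sample ids labeled with it.
SAMPLES_BY_TERM = {
    "organ": {"liver": {"s1", "s2", "s3"}, "kidney": {"s4", "s5"}},
    "disease state": {"normal": {"s1", "s4"}, "tumor": {"s2", "s3", "s5"}},
}


def samples(tree, term):
    """Return the set of samples annotated with `term` in the given tree."""
    return SAMPLES_BY_TERM[tree][term]


# Intersection: samples that are liver AND tumor.
liver_tumor = samples("organ", "liver") & samples("disease state", "tumor")

# Union: samples that are liver OR tumor.
liver_or_tumor = samples("organ", "liver") | samples("disease state", "tumor")

print(sorted(liver_tumor))
print(sorted(liver_or_tumor))
```

Keeping the trees orthogonal means no term has to encode a combination such as "liver tumor"; combinations are computed at query time.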

Features of DAGs
- A tree is a special case of the DAG class
- Children can have multiple parents.
  - Allows multiple classifications of the same child
    - E.g. a guard cell is both part of a leaf and is an epidermal cell.
    - Allows for more than a binary classification of a concept
  - If this results from poor definition of the concept, then it is not good.
- Multiple parentage fits a "normalized" data model
  - Like a normalized relational database, a DAG can minimize duplication of objects (concepts).

Sample DAG
- Root
  - Cooking
    - Spices
      - Bay leaf
        - Laurus nobilis
        - Umbellularia californica (California laurel)
  - Trees
    - Lauraceae
      - Laurel
        - Laurus nobilis
      - Umbellularia
        - Umbellularia californica
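
The sample DAG can be written down as a child-to-parents table, which makes the "normalized" point concrete: each species is stored once but is reachable from both the Cooking and the Trees branches. This is a sketch only; `PARENTS`, `ancestors` and `descendants` are hypothetical names, and the bay laurel is written with its accepted binomial, Laurus nobilis.

```python
# The slide's sample DAG as child -> list of parents (names are illustrative).
PARENTS = {
    "Cooking": ["Root"],
    "Trees": ["Root"],
    "Spices": ["Cooking"],
    "Bay leaf": ["Spices"],
    "Lauraceae": ["Trees"],
    "Laurel": ["Lauraceae"],
    "Umbellularia": ["Lauraceae"],
    "Laurus nobilis": ["Bay leaf", "Laurel"],  # two parents
    "Umbellularia californica": ["Bay leaf", "Umbellularia"],  # two parents
}


def ancestors(term):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS.get(node, []))
    return seen


def descendants(term):
    """Collect every term that has `term` among its ancestors."""
    return {t for t in PARENTS if term in ancestors(t)}


print(sorted(ancestors("Laurus nobilis")))
print(sorted(descendants("Spices")))
```

A query for everything under Spices and a query for everything under Lauraceae both return the two species, without either species being duplicated in the data.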

Constructing the Pioneer Plant Ontology
- Decided to produce a DAG
  - Used DAGeditor (editor developed for GO)
  - Developed our own web-based viewing tool
    - AmiGO was too complicated to re-use. Other public browsers did not have the functionality we wanted.
- Decided to focus on Corn and Soybeans
  - Used Kiesselbach's 1949 monograph on corn structure and reproduction as the primary source.
  - Used Iowa State University Ag Extension publications for the development stages of corn and soybeans
  - Added information from a botany textbook to cover missing terms from soybean.

To collaborate or not to collaborate?
- Advantage of just using the Pioneer Ontology was that it served our needs and was focused on corn and soybeans, our major crops.
- Disadvantage was that it was not synchronized to the public ontology
  - We would not be able to easily integrate public tissue classifications to ours
  - We would not be able to easily take advantage of improvements to the public ontology
  - Presumably the public ontology would be more "botanically correct" than ours.

Plant Ontology Consortium
- Focused on model organisms
  - Arabidopsis
  - Rice, with other grasses (corn) covered by the rice terms.
- Used a DAG approach
  - Multiple concepts
    - Structure (cells, tissues, sporophyte and gametophyte)
    - Development
  - Used DAGeditor and other GO approaches
    - Most terms have multiple parents
    - Same software and data structures as GO

Plant Ontology
- Domain = Plant anatomy and development
- Concepts
  - Plant parts (leaf, root, flower, meristem, etc.)
  - Life cycle stages (sporophyte, gametophyte)
  - Developmental stages (V1, flowering, R1, etc.)
- Relationships between concepts
  - "A kind of" (is a)
    - A prop root is a root
  - "A part of" (part of)
    - A root cap is part of a root
  - In addition, for plant anatomy a "develops from" relation is needed
    - For example the relationship between stomatal guard cells and the guard mother cell
    - Guard cells develop from guard mother cells
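
With more than one relationship type, each edge has to carry its type, and a query has to say which types it is willing to follow. The sketch below uses the examples from the slides (prop root, root cap, guard cell); the `EDGES` list and the `reachable` function are hypothetical and are not the Plant Ontology's actual data format.

```python
# Typed edges as (child, relation, parent); examples taken from the slides.
EDGES = [
    ("prop root", "is_a", "root"),
    ("root cap", "part_of", "root"),
    ("guard cell", "is_a", "epidermal cell"),
    ("guard cell", "part_of", "leaf"),
    ("guard cell", "develops_from", "guard mother cell"),
]


def reachable(term, relations):
    """Follow only the given relation types upward and return the terms reached."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, relation, parent in EDGES:
            if child == current and relation in relations and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


print(reachable("guard cell", {"is_a"}))
print(reachable("guard cell", {"is_a", "part_of"}))
print(reachable("guard cell", {"develops_from"}))
```

Restricting the relation set changes the answer: a guard cell is an epidermal cell, is located in a leaf, and originates from a guard mother cell, and these are three different statements.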

Adapting the POC ontology for Pioneer's needs
- Problem is that it has many more terms than required for our experiments
  - Some terms describe tissues or cells that are not practical to collect (e.g. antipodal cells)
  - Some terms describe parts not found in corn (e.g. nectary)
- Another problem is that we collect samples that are convenient subdivisions of structures
  - Tip and base of an immature ear. Each differs from a whole immature ear in terms of what it contains.
  - Basal endosperm: morphologically distinct from starchy endosperm, but not found in the ontology

Our current solution
- Add additional terms to the POC ontology
  - Use a different id system
    - Easily distinguished from POC terms
    - Will not be overwritten by ongoing public curation efforts.
- Label experiments with the terms from the ontology.
- Create a custom ontology
  - Query the whole ontology with the terms used in the labeling and keep only
    - Terms that are used to label an experimental sample
    - Parent terms of used terms.
  - Can be readily rebuilt if new experiments or terms are added.
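
The custom-ontology step amounts to keeping the used terms plus everything above them. A minimal sketch follows, assuming the ontology is held as a term-to-parents table; the fragment in `FULL_ONTOLOGY`, the `PHI:` id prefix for in-house terms and the `build_custom_ontology` function are all hypothetical and do not reproduce Pioneer's actual ids or code.

```python
# Hypothetical ontology fragment: term -> list of parent terms.
FULL_ONTOLOGY = {
    "plant structure": [],
    "root": ["plant structure"],
    "prop root": ["root"],
    "root cap": ["root"],
    "flower": ["plant structure"],
    "nectary": ["flower"],
    "ear": ["plant structure"],
    "PHI:immature ear tip": ["ear"],  # in-house term with its own id prefix
}


def build_custom_ontology(full, used_terms):
    """Keep only the used terms and all of their ancestors."""
    keep = set()
    stack = list(used_terms)
    while stack:
        term = stack.pop()
        if term not in keep:
            keep.add(term)
            stack.extend(full[term])  # walk upward through every parent
    return {term: list(parents) for term, parents in full.items() if term in keep}


# Terms actually used to label experimental samples (illustrative).
custom = build_custom_ontology(FULL_ONTOLOGY, {"prop root", "PHI:immature ear tip"})
print(sorted(custom))
```

Because the custom ontology is derived rather than hand-edited, rerunning the function after new experiments are labeled is all that is needed to rebuild it.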

What can you do with the ontology?
- Provides a grouping mechanism
  - Summarize expression for a tissue
  - Compare expression between tissues
  - Make complex queries that involve multiple tissues
- Provides a systematic label for annotating genes
  - Where is the gene expressed?
  - Query annotation of genes based on terms
- Provides a description of the complexity of tissue samples
  - Leaf sample is composed of multiple cell types with different roles
  - Cell types can be shared between tissues or structures

Comparing by tissue
- The ontology provides the groupings, but how to summarize?
  - Mean?
  - Median?
  - Maximum value?
- Significance of differences?
  - Each group will be much more variable than a set of samples from a controlled experiment.
  - But you may be able to eliminate the inevitable false discoveries that appear when looking at large numbers of genes.
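
The choice of summary matters because tissue groups pooled across experiments are variable. The sketch below computes all three candidates for one gene using Python's standard `statistics` module; the expression values, sample ids and tissue labels are invented, and `summarize_by_tissue` is a hypothetical helper.

```python
from statistics import mean, median

# Hypothetical expression values for one gene, keyed by sample id.
EXPRESSION = {"s1": 10.0, "s2": 14.0, "s3": 90.0, "s4": 2.0, "s5": 4.0}

# Hypothetical ontology labels for each sample.
TISSUE_OF = {"s1": "leaf", "s2": "leaf", "s3": "leaf", "s4": "root", "s5": "root"}


def summarize_by_tissue(expression, tissue_of):
    """Group values by tissue label and report mean, median and maximum."""
    groups = {}
    for sample, value in expression.items():
        groups.setdefault(tissue_of[sample], []).append(value)
    return {
        tissue: {"mean": mean(values), "median": median(values), "max": max(values)}
        for tissue, values in groups.items()
    }


for tissue, stats in summarize_by_tissue(EXPRESSION, TISSUE_OF).items():
    print(tissue, stats)
```

In this made-up data a single high leaf sample pulls the leaf mean well above the leaf median, which is the kind of disagreement the question on the slide is pointing at.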

Annotating genes
- This is the primary use for TAIR and Gramene
  - Potentially label most genes with tissues of expression
  - However, need to differentiate presence from preferential expression.
    - A gene may be present in many tissues, but highly expressed in a few
    - Another gene may be present in the same tissues, but similarly expressed in all of them.
      - Might need to precompute and indicate which tissues the gene is significantly preferentially expressed in.
      - Might be able to use the RMS differences between expression in each tissue as a measure of consistency.
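
One possible reading of the RMS suggestion is the root mean square of all pairwise differences between per-tissue expression values: near zero for a uniformly expressed gene, large for a tissue-preferential one. The slide does not define the measure, so this interpretation, the function name `rms_pairwise_difference` and the example values are all assumptions.

```python
from itertools import combinations
from math import sqrt


def rms_pairwise_difference(tissue_values):
    """Root mean square of differences between every pair of tissue values.

    Expects a mapping of tissue -> summarized expression with at least two tissues.
    """
    diffs = [a - b for a, b in combinations(tissue_values.values(), 2)]
    return sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical per-tissue summaries for two genes.
uniform_gene = {"leaf": 5.0, "root": 5.0, "stem": 5.0}
preferential_gene = {"leaf": 1.0, "root": 1.0, "stem": 10.0}

print(rms_pairwise_difference(uniform_gene))
print(rms_pairwise_difference(preferential_gene))
```

Both genes are "present" in all three tissues, but only the second would be annotated as preferentially expressed; a threshold on this score would still have to be chosen and justified against the data.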

Complexity
- Genes may appear to differ between tissues for trivial reasons
  - Example: Gene appears to be preferentially expressed in stem versus leaf tissue.
    - If gene is really specific to vascular tissue and stem has more…
    - Gene is expressed late in development; adjacent leaves and stems may differ in development.
  - Ontology can guide further experiments
    - Compare vascular and non-vascular tissue from both leaf and stem.
    - Compare multiple leaf and stem samples from different positions (developmental stages).

Conclusions
- The Plant Ontology classifies experiments and genes based on anatomical and developmental concepts.
- Now that we have significant data, can we, like Darwin, discern the underlying mechanisms for how anatomical and developmental differences occur?
- The Plant Ontology will be successful and used long term if it facilitates these kinds of investigations.

Acknowledgements
- Pioneer
  - Henry Mirsky
  - Lane Arthur
  - Bob Merrill
- POC
  - Doreen Ware (Gramene)
  - Katica Ilic (TAIR)