Habitat-Lite & EnvO Jin Mao Postdoc, School of Information, University of Arizona Nov. 20, 2015.



Outline: Habitat-Lite, ENVO, Cases

Habitat-Lite: Motivations
• The association of organisms with their environments is a key issue in exploring biodiversity patterns (Pafilis et al., 2015).
• Habitat-Lite is meant to facilitate the capture of metadata describing the growing number of genomic and metagenomic projects, including information about isolation source and habitat (Field et al., 2008a; Morrison et al., 2006).

Habitat-Lite  Habitat: the place or environment where an organism naturally or normally lives and grows.  Sample source (Isolated from): the environmental context in which a sample is collected (Morrison et al., 2006). Definition

Habitat-Lite  The literature is scattered and the metadata is difficult to find, even by expert manual extraction.  Related fields in databases: sparse, free text.  Lacking standardization in vocabulary and definitions Challenge

Habitat-Lite  Short-term:  high-level habitat descriptions  develop a lightweight controlled vocabulary (Habitat-Lite) within the EvnO framework to capture high-level habitat and environmental metadata.  Long-term  to develop a repeatable process for other types of metadata by identifying key terms based on usage in databases and the open literature. Goal

Habitat-Lite  Do a survey for terms used in a number of relevant sources.  Selected a set of high-level terms as a strawman for the first iteration of the Habitat-Lite term list.  Discuss with annotators at NCBI. Construction Method

Habitat-Lite: Construction Method (diagram labels)
• Seed terms
• Experiments
• “Bin” existing entries
• Usable for human and semiautomated annotation
• “Minimal set” of habitat terms that provided good coverage of entries in key resources
• NCBI microbial genomes; 16S sequences
• Patterns and biases in the complete genome collection

Habitat-Lite Term List


Environment Ontology (ENVO)  Biological: data from environmental samples  Biomedical: physical environment of organisms Environment-aware analyses Background

Environment Ontology (ENVO)  Need for consistent description of the environmental origins of tissue, pathogen, and metagenomics samples  Need for the labeling of samples and artifacts in museum collections Needs

Environment Ontology (ENVO)  ENVO should be comprised of classes (terms) referring to key environment-types that may be used to facilitate the retrieval and integration of a broad range of biological data.  Interoperability with the numerous biological and biomedical ontologies compliant with Open Biomedical and Biological Ontologies (OBO) Foundry principles.  A standardized and semantically controlled representation as GO  Both for specialists and for non-experts Goals

Environment Ontology (ENVO): Download
• 24/envo.owl
• OBO: OBO-Edit ontology development tool
• OWL
• CSV
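Once a copy of the ontology in OBO format is at hand, its terms and synonyms can be read with a few lines of code. The following is only a minimal sketch, not the ENVO project's tooling: it handles just the `id`, `name`, and `synonym` tags of `[Term]` stanzas, and the sample text, including the `EXAMPLE:` identifiers, is hypothetical rather than taken from ENVO.

```python
# Hypothetical OBO-style snippet; the EXAMPLE: identifiers are placeholders,
# not real ENVO identifiers.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EXAMPLE:0000001
name: waste water
synonym: "wastewater" EXACT []
synonym: "sewage" RELATED []

[Term]
id: EXAMPLE:0000002
name: soil

[Typedef]
id: part_of
name: part of
"""


def parse_obo_terms(text):
    """Collect id, name, and synonyms from every [Term] stanza in OBO text."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            if line == "[Term]":
                current = {"id": None, "name": None, "synonyms": []}
                terms.append(current)
            else:
                current = None
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag in ("id", "name"):
                current[tag] = value
            elif tag == "synonym":
                # The synonym text is the first double-quoted string.
                parts = value.split('"')
                if len(parts) >= 3:
                    current["synonyms"].append(parts[1])
    return terms


terms = parse_obo_terms(SAMPLE_OBO)
for term in terms:
    print(term["id"], term["name"], term["synonyms"])
```

A dedicated ontology library would be the safer choice for the full file, which uses many more tags than this sketch reads.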

Cases  The ability to “bin” data into interesting categories for purposes of comparison  To test the coverage, utility, and usability  A small experiment was carried out in late 2006 for the Ribosomal Database Project (RDP; Cole et al.,2007). Bin data

Cases  Manually classify into habitats the 168,911 rRNA sequences marked as environmental in RDP release 9.44 (November 2006).  Splitting host-associated into separate categories for plant and animal (including human) associated.  isolation_source  the reference titles  Not existed Bin data

Cases  The biggest category was animal associated, and a large fraction of these were human associated. Bin data

Cases  The metadata about habitat or isolation source occurs in many diverse forms, including PDF tables, densely written materials and methods sections, supplementary material, and even in referenced work.  Free text metadata already available  The “isolation_source” field from GenBank gene records GenBank Case

Cases  To identify probable classes based on the presence of specific key words in each entry.  Habitat-Lite terms + synonyms for “waste water” the terms used for matching were “waste water,” “waste-water,” “wastewater,” “sewage,” “sewerage,” etc.  Specializations For “food,” the terms used for matching included specific kinds of foods, for example, “milk,” “cheese,” “beer,” etc.  This pattern-matching approaches GenBank Case

Cases: GenBank Case
Of the almost 35,000 distinct entries in the isolation_source field, some 22,000 (63%) contained specific words or phrases that could be mapped to the 17 Habitat-Lite categories.
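As a quick check of the reported percentage, the coverage can be recomputed from the rounded counts on the slide. The function name is hypothetical, and the exact counts are not given here, so the inputs are the approximations “almost 35,000” and “some 22,000.”

```python
def coverage(mapped, total):
    """Fraction of distinct entries that matched at least one category."""
    return mapped / total


# Rounded counts from the slide; the exact figures are not stated there.
genbank_coverage = coverage(22_000, 35_000)
print(f"{genbank_coverage:.0%}")  # prints 63%
```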

Cases  Habitat field plus Isolation field  E xact matches for 84% of GOLD Habitat terms with an additional term “aquatic.”  The three most frequent terms (“host,” “aquatic,” and “soil”) covered 75% of GOLD habitat data.  Six Habitat-Lite terms were not seen at all in this smaller data set (“air,” “freshwater,” “extreme,” “microbial mat,” “fossil,” “terrestrial”). GOLD

Cases: GOLD
• Comparison of automated mapping and expert mapping.
• The need for annotation guidelines, to handle situations where a term might be placed in several categories.
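One way to compare an automated mapping with an expert mapping, while allowing an entry to sit in several categories, is to treat each mapping as a set of categories per entry and count exact, partial, and missing agreement. This is a minimal sketch under that assumption; the entries and their assignments below are hypothetical, not data from the study, and the three-way scoring is not the scoring used there.

```python
def compare_mappings(automated, expert):
    """Count exact, partial, and missing agreement between two mappings.

    Both arguments map an entry (free text) to a set of category labels.
    Entries absent from the automated mapping count as an empty set.
    """
    counts = {"exact": 0, "partial": 0, "disagree": 0}
    for entry, expert_cats in expert.items():
        auto_cats = automated.get(entry, set())
        if auto_cats == expert_cats:
            counts["exact"] += 1
        elif auto_cats & expert_cats:
            # At least one shared category, but the sets differ.
            counts["partial"] += 1
        else:
            counts["disagree"] += 1
    return counts


# Hypothetical entries and category assignments, for illustration only.
automated = {
    "river water": {"aquatic", "freshwater"},
    "human gut": {"host"},
    "hot spring": {"aquatic"},
}
expert = {
    "river water": {"freshwater"},
    "human gut": {"host"},
    "hot spring": {"extreme"},
}

result = compare_mappings(automated, expert)
print(result)
```

The “partial” bucket is where annotation guidelines matter most, since it collects the entries that could reasonably be placed in more than one category.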

References
Hirschman, L., Clark, C., Cohen, K. B., Mardis, S., Luciano, J., Kottmann, R., ... & Field, D. (2008). Habitat-Lite: a GSC case study based on free text terms for environmental metadata. OMICS: A Journal of Integrative Biology, 12(2).
Buttigieg, P. L., Morrison, N., Smith, B., Mungall, C. J., Lewis, S. E., & ENVO Consortium. (2013). The environment ontology: contextualising biological and biomedical entities. Journal of Biomedical Semantics, 4, 43.
Pafilis, E., Frankild, S. P., Schnetzer, J., Fanini, L., Faulwetter, S., Pavloudi, C., ... & Jensen, L. J. (2015). ENVIRONMENTS and EOL: identification of Environment Ontology terms in text and the annotation of the Encyclopedia of Life. Bioinformatics, 31(11).

Thank you!