Habitat-Lite & EnvO Jin Mao Postdoc, School of Information, University of Arizona Nov. 20, 2015.


1 Habitat-Lite & EnvO Jin Mao Postdoc, School of Information, University of Arizona Nov. 20, 2015

2 Outline: Habitat-Lite, ENVO, Cases

3 Habitat-Lite: Motivations
• The association of organisms with their environments is a key issue in exploring biodiversity patterns (Pafilis et al., 2015).
• To facilitate the capture of metadata describing the growing number of genomic and metagenomic projects, including information about isolation source and habitat (Field et al., 2008a; Morrison et al., 2006).

4 Habitat-Lite: Definition
• Habitat: the place or environment where an organism naturally or normally lives and grows.
• Sample source ("isolated from"): the environmental context in which a sample is collected (Morrison et al., 2006).

5 Habitat-Lite: Challenge
• The literature is scattered, and the metadata are difficult to find, even by expert manual extraction.
• Related fields in databases are sparse and hold free text.
• Vocabulary and definitions lack standardization.

6 Habitat-Lite: Goal
• Short-term:
  • High-level habitat descriptions
  • Develop a lightweight controlled vocabulary (Habitat-Lite) within the EnvO framework to capture high-level habitat and environmental metadata.
• Long-term:
  • Develop a repeatable process for other types of metadata by identifying key terms based on usage in databases and the open literature.

7 Habitat-Lite: Construction Method
• Survey the terms used in a number of relevant sources.
• Select a set of high-level terms as a strawman for the first iteration of the Habitat-Lite term list.
• Discuss the list with annotators at NCBI.

8 Habitat-Lite: Construction Method (diagram labels)
• Seed terms: a “minimal set” of habitat terms that provided good coverage of entries in key resources; usable for human and semiautomated annotation
• Experiments: “bin” existing entries
• NCBI microbial genomes; 16S sequences
• Patterns and biases in the complete genome collection

9 Habitat-Lite Term List

10 Habitat-Lite Term List

11 Environment Ontology (ENVO): Background
• Biological: data from environmental samples
• Biomedical: physical environment of organisms
• Environment-aware analyses

12 Environment Ontology (ENVO): Needs
• Need for consistent description of the environmental origins of tissue, pathogen, and metagenomics samples
• Need for the labeling of samples and artifacts in museum collections

13 Environment Ontology (ENVO): Goals
• ENVO should comprise classes (terms) referring to key environment types that may be used to facilitate the retrieval and integration of a broad range of biological data.
• Interoperability with the numerous biological and biomedical ontologies compliant with Open Biomedical and Biological Ontologies (OBO) Foundry principles.
• A standardized and semantically controlled representation, as in the Gene Ontology (GO).
• Usable by both specialists and non-experts.

14 Environment Ontology (ENVO): Download
• http://purl.obolibrary.org/obo/envo/releases/2013-09-24/envo.owl
• http://bioportal.bioontology.org/ontologies/ENVO
• http://environmentontology.org/
• OBO: OBO-Edit ontology development tool
• OWL
• CSV

15 Cases: Bin data
• The ability to “bin” data into interesting categories for purposes of comparison
• To test the coverage, utility, and usability of the term list
• A small experiment was carried out in late 2006 for the Ribosomal Database Project (RDP; http://rdp.cme.msu.edu/; Cole et al., 2007).

16 Cases: Bin data
• Manually classified into habitats the 168,911 rRNA sequences marked as environmental in RDP release 9.44 (November 2006).
• Split host-associated into separate categories for plant-associated and animal-associated (including human).
• isolation_source
• The reference titles
• Not present

17 Cases: Bin data
• The biggest category was animal-associated, and a large fraction of these sequences were human-associated.

18 Cases: GenBank Case
• The metadata about habitat or isolation source occur in many diverse forms, including PDF tables, densely written materials and methods sections, supplementary material, and even referenced work.
• Free-text metadata are already available.
• The “isolation_source” field from GenBank gene records

19 Cases: GenBank Case
• Identify probable classes based on the presence of specific key words in each entry.
• Habitat-Lite terms plus synonyms: for “waste water,” the terms used for matching were “waste water,” “waste-water,” “wastewater,” “sewage,” “sewerage,” etc.
• Specializations: for “food,” the terms used for matching included specific kinds of foods, for example, “milk,” “cheese,” “beer,” etc.
• This is a pattern-matching approach.
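The key-word matching described on this slide can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used in the study: the function names (`compile_patterns`, `classify`, `coverage`) are made up for this example, the category table holds only the terms quoted on the slides rather than the full 17-category list, and whole-word, case-insensitive matching is an assumption.

```python
import re

# Illustrative subset only: these terms and synonyms are the ones quoted on the
# slides; the full 17-category Habitat-Lite list is not reproduced here.
KEYWORDS = {
    "waste water": ["waste water", "waste-water", "wastewater", "sewage", "sewerage"],
    "food": ["food", "milk", "cheese", "beer"],
    "soil": ["soil"],
}


def compile_patterns(keywords):
    """Build one case-insensitive, whole-word regex per category (assumed rule)."""
    patterns = {}
    for category, words in keywords.items():
        # Longest alternatives first so multi-word phrases are tried before short ones.
        ordered = sorted(words, key=len, reverse=True)
        alternation = "|".join(re.escape(w) for w in ordered)
        patterns[category] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns


PATTERNS = compile_patterns(KEYWORDS)


def classify(isolation_source):
    """Return every category whose key words occur in the free-text entry."""
    return sorted(c for c, p in PATTERNS.items() if p.search(isolation_source))


def coverage(entries):
    """Fraction of distinct entries that map to at least one category."""
    distinct = set(entries)
    if not distinct:
        return 0.0
    mapped = sum(1 for entry in distinct if classify(entry))
    return mapped / len(distinct)
```

An entry may match more than one category (for example, a soil sample amended with wastewater), so `classify` returns a list rather than a single label. Whole-word matching also means plural or inflected forms would need to be listed explicitly as synonyms.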

20 Cases: GenBank Case
• Of the almost 35,000 distinct entries in the isolation_source field, some 22,000 (63%) contained specific words or phrases that could be mapped to the 17 Habitat-Lite categories.

21 Cases: GOLD
• Habitat field plus Isolation field
• Exact matches for 84% of GOLD Habitat terms, with an additional term “aquatic.”
• The three most frequent terms (“host,” “aquatic,” and “soil”) covered 75% of GOLD habitat data.
• Six Habitat-Lite terms were not seen at all in this smaller data set (“air,” “freshwater,” “extreme,” “microbial mat,” “fossil,” “terrestrial”).
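Figures like "the three most frequent terms covered 75%" and "six terms were not seen at all" come from simple frequency counts over the assigned labels. The sketch below shows one way such counts could be computed; the helper names are hypothetical, and any labels passed in would be the reader's own data, not the GOLD data reported on the slide.

```python
from collections import Counter


def top_k_coverage(labels, k):
    """Return the k most frequent labels and the fraction of records they cover."""
    counts = Counter(labels)
    total = sum(counts.values())
    top = counts.most_common(k)
    covered = sum(n for _, n in top)
    fraction = covered / total if total else 0.0
    return [term for term, _ in top], fraction


def unseen_terms(vocabulary, labels):
    """Return vocabulary terms that never occur among the assigned labels."""
    seen = set(labels)
    return [term for term in vocabulary if term not in seen]
```

Running `unseen_terms` against the full term list is a quick check on whether a vocabulary is over-specified for a given data set.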

22 Cases: GOLD
• Comparison of automated mapping and expert mapping
• The need for annotation guidelines to handle situations where a term might be placed in several categories.
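A comparison of automated and expert mappings could be tallied along the following lines. This is only a sketch under assumptions: the slides do not say how agreement was scored, so the exact/partial/conflict split and the function name `compare_mappings` are illustrative choices. Each mapping is assumed to be a dict from an entry to a set of category names.

```python
def compare_mappings(automated, expert):
    """Compare two mappings of entry -> set of category names (assumed format)."""
    shared = sorted(automated.keys() & expert.keys())
    exact, partial, conflict = [], [], []
    for entry in shared:
        auto_cats = set(automated[entry])
        expert_cats = set(expert[entry])
        if auto_cats == expert_cats:
            exact.append(entry)      # identical category sets
        elif auto_cats & expert_cats:
            partial.append(entry)    # overlap, but not identical
        else:
            conflict.append(entry)   # no category in common
    # Entries the expert placed in several categories: candidates for guidelines.
    multi = [e for e in shared if len(set(expert[e])) > 1]
    agreement = len(exact) / len(shared) if shared else 0.0
    return {
        "agreement": agreement,
        "exact": exact,
        "partial": partial,
        "conflict": conflict,
        "multi_category": multi,
    }
```

Listing the multi-category entries separately makes it easier to see where annotation guidelines are needed, since those are the cases where two annotators could both be defensibly right.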

23 References
Hirschman, L., Clark, C., Cohen, K. B., Mardis, S., Luciano, J., Kottmann, R., ... & Field, D. (2008). Habitat-Lite: a GSC case study based on free text terms for environmental metadata. OMICS: A Journal of Integrative Biology, 12(2), 129-136.
Buttigieg, P. L., Morrison, N., Smith, B., Mungall, C. J., Lewis, S. E., & ENVO Consortium. (2013). The environment ontology: contextualising biological and biomedical entities. Journal of Biomedical Semantics, 4, 43.
Pafilis, E., Frankild, S. P., Schnetzer, J., Fanini, L., Faulwetter, S., Pavloudi, C., ... & Jensen, L. J. (2015). ENVIRONMENTS and EOL: identification of Environment Ontology terms in text and the annotation of the Encyclopedia of Life. Bioinformatics, 31(11), 1872-1874.

24 Thank you!

