Published by Tabitha Hunt Modified over 9 years ago
1
Habitat-Lite & EnvO Jin Mao Postdoc, School of Information, University of Arizona Nov. 20, 2015
2
Outline Habitat-Lite ENVO Cases
3
Habitat-Lite The association of organisms with their environments is a key issue in exploring biodiversity patterns (Pafilis et al., 2015). There is a need to facilitate the capture of metadata describing the growing number of genomic and metagenomic projects, including information about isolation source and habitat (Field et al., 2008a; Morrison et al., 2006). Motivations
4
Habitat-Lite Habitat: the place or environment where an organism naturally or normally lives and grows. Sample source (Isolated from): the environmental context in which a sample is collected (Morrison et al., 2006). Definition
5
Habitat-Lite The literature is scattered and the metadata is difficult to find, even by expert manual extraction. Related fields in databases: sparse, free text. Lack of standardization in vocabulary and definitions. Challenge
6
Habitat-Lite Short-term: develop a lightweight controlled vocabulary (Habitat-Lite) within the EnvO framework to capture high-level habitat and environmental metadata. Long-term: develop a repeatable process for other types of metadata by identifying key terms based on usage in databases and the open literature. Goal
7
Habitat-Lite Survey the terms used in a number of relevant sources. Select a set of high-level terms as a strawman for the first iteration of the Habitat-Lite term list. Discuss with annotators at NCBI. Construction Method
8
Habitat-Lite Construction Method Seed Terms. Experiments: “bin” existing entries. Usable for human and semiautomated annotation. A “minimal set” of habitat terms that provided good coverage of entries in key resources: NCBI microbial genomes, 16S sequences. Patterns and biases in the complete genome collection.
9
Habitat-Lite Term List
10
Habitat-Lite Term List
11
Environment Ontology (ENVO) Biological: data from environmental samples Biomedical: physical environment of organisms Environment-aware analyses Background
12
Environment Ontology (ENVO) Need for consistent description of the environmental origins of tissue, pathogen, and metagenomics samples Need for the labeling of samples and artifacts in museum collections Needs
13
Environment Ontology (ENVO) ENVO should be composed of classes (terms) referring to key environment types that may be used to facilitate the retrieval and integration of a broad range of biological data. Interoperability with the numerous biological and biomedical ontologies compliant with Open Biomedical and Biological Ontologies (OBO) Foundry principles. A standardized and semantically controlled representation, as in the Gene Ontology (GO). Both for specialists and for non-experts. Goals
14
Environment Ontology (ENVO) http://purl.obolibrary.org/obo/envo/releases/2013-09-24/envo.owl http://bioportal.bioontology.org/ontologies/ENVO http://environmentontology.org/ Formats: OBO (OBO-Edit ontology development tool), OWL, CSV. Download
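To give a feel for the OBO download, here is a minimal sketch of reading `[Term]` stanzas from OBO-format text with plain Python. The sample text and the `EX:` identifiers are invented for illustration and are not real ENVO terms; the parser handles only the `id`, `name`, and `is_a` tags and is not a complete OBO reader.

```python
# Invented sample in OBO flat-file syntax; the EX: ids are NOT real ENVO terms.
SAMPLE = """\
format-version: 1.2

[Term]
id: EX:0000001
name: example habitat

[Term]
id: EX:0000002
name: example soil
is_a: EX:0000001 ! example habitat

[Typedef]
id: part_of
name: part of
"""


def parse_obo_terms(text):
    """Collect [Term] stanzas into a dict keyed by term id."""
    terms = {}
    current = None  # the stanza being filled, or None outside a [Term]
    for raw in text.splitlines():
        # Drop trailing " ! comment" text, then surrounding whitespace.
        line = raw.split(" ! ")[0].strip()
        if not line:
            continue
        if line.startswith("["):
            # Only [Term] stanzas are kept; other stanza types are skipped.
            current = {"name": None, "is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines and non-Term stanzas
        tag, value = line.split(":", 1)
        value = value.strip()
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
    return terms


if __name__ == "__main__":
    for term_id, term in parse_obo_terms(SAMPLE).items():
        print(term_id, term["name"], term["is_a"])
```

For real work, a dedicated ontology library is likely a better choice than a hand-written parser; the sketch only shows how simple the tag-value layout is.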
15
Cases The ability to “bin” data into interesting categories for purposes of comparison. To test the coverage, utility, and usability, a small experiment was carried out in late 2006 for the Ribosomal Database Project (RDP; http://rdp.cme.msu.edu/; Cole et al., 2007). Bin data
16
Cases Manually classify into habitats the 168,911 rRNA sequences marked as environmental in RDP release 9.44 (November 2006). Split host-associated into separate categories for plant associated and animal (including human) associated. Based on the isolation_source field or, where that did not exist, the reference titles. Bin data
17
Cases The biggest category was animal associated, and a large fraction of these were human associated. Bin data
18
Cases The metadata about habitat or isolation source occurs in many diverse forms, including PDF tables, densely written materials and methods sections, supplementary material, and even in referenced work. Free-text metadata already available: the “isolation_source” field from GenBank gene records. GenBank Case
19
Cases To identify probable classes based on the presence of specific key words in each entry. Habitat-Lite terms + synonyms: for “waste water,” the terms used for matching were “waste water,” “waste-water,” “wastewater,” “sewage,” “sewerage,” etc. Specializations: for “food,” the terms used for matching included specific kinds of foods, for example, “milk,” “cheese,” “beer,” etc. This is a pattern-matching approach. GenBank Case
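The key-word matching described on this slide can be sketched roughly as follows. This is an illustration under assumptions, not the original implementation: only the “waste water” and “food” key words come from the slide, while the table layout, the function names `classify` and `coverage`, and the whole-word, case-insensitive matching rule are choices made for the example.

```python
import re

# Key words per category. The words themselves are from the slide;
# the dict layout is an assumption made for this sketch.
KEYWORDS = {
    "waste water": ["waste water", "waste-water", "wastewater", "sewage", "sewerage"],
    "food": ["food", "milk", "cheese", "beer"],
}


def compile_patterns(keywords):
    """Build one case-insensitive, whole-word regex per category."""
    patterns = {}
    for category, words in keywords.items():
        # Longest alternatives first so multi-word phrases are tried first.
        ordered = sorted(words, key=len, reverse=True)
        alternation = "|".join(re.escape(word) for word in ordered)
        patterns[category] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns


PATTERNS = compile_patterns(KEYWORDS)


def classify(entry):
    """Return the sorted categories whose key words occur in the entry."""
    return sorted(cat for cat, pattern in PATTERNS.items() if pattern.search(entry))


def coverage(entries):
    """Fraction of entries that map to at least one category."""
    if not entries:
        return 0.0
    return sum(1 for entry in entries if classify(entry)) / len(entries)


if __name__ == "__main__":
    # Invented isolation_source strings, for illustration only.
    samples = ["Municipal wastewater treatment plant", "raw milk cheese", "hot spring sediment"]
    for sample in samples:
        print(sample, "->", classify(sample))
    print("coverage:", coverage(samples))
```

An entry can match more than one category under this scheme, which is the kind of overlap that annotation guidelines would need to resolve.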
20
Cases GenBank Case Of the almost 35,000 distinct entries in the isolation_source field, some 22,000 (63%) contained specific words or phrases that could be mapped to the 17 Habitat-Lite categories.
21
Cases Habitat field plus Isolation field. Exact matches for 84% of GOLD Habitat terms, with an additional term “aquatic.” The three most frequent terms (“host,” “aquatic,” and “soil”) covered 75% of GOLD habitat data. Six Habitat-Lite terms were not seen at all in this smaller data set (“air,” “freshwater,” “extreme,” “microbial mat,” “fossil,” “terrestrial”). GOLD
22
Cases GOLD Comparison of automated mapping and expert mapping. The need for annotation guidelines to handle situations where a term might be placed in several categories.
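One way to compare an automated mapping with an expert mapping is sketched below. The function name `compare_mappings`, the exact-set agreement rule, and the toy data are all assumptions for illustration; the slides do not say how the comparison was scored, and the example terms are invented rather than taken from GOLD.

```python
def compare_mappings(automated, expert):
    """Compare two mappings of term -> set of categories.

    Agreement here means the two category sets are identical, which is
    an assumption of this sketch rather than the method from the slides.
    """
    shared = sorted(set(automated) & set(expert))
    disagreements = [term for term in shared if automated[term] != expert[term]]
    # Terms that either side placed in several categories: the cases
    # that annotation guidelines would have to settle.
    multi_category = [
        term for term in shared
        if len(automated[term]) > 1 or len(expert[term]) > 1
    ]
    agreed = len(shared) - len(disagreements)
    return {
        "agreement": agreed / len(shared) if shared else 0.0,
        "disagreements": disagreements,
        "multi_category": multi_category,
    }


if __name__ == "__main__":
    # Invented toy data; category names reuse Habitat-Lite terms from the slides.
    automated = {
        "hot spring": {"aquatic"},
        "rhizosphere soil": {"soil"},
        "human gut": {"host"},
        "lake sediment": {"freshwater"},
    }
    expert = {
        "hot spring": {"aquatic", "extreme"},
        "rhizosphere soil": {"soil"},
        "human gut": {"host"},
        "lake sediment": {"freshwater"},
    }
    print(compare_mappings(automated, expert))
```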
23
References Hirschman, L., Clark, C., Cohen, K. B., Mardis, S., Luciano, J., Kottmann, R., ... & Field, D. (2008). Habitat-Lite: a GSC case study based on free text terms for environmental metadata. OMICS: A Journal of Integrative Biology, 12(2), 129-136. Buttigieg, P. L., Morrison, N., Smith, B., Mungall, C. J., Lewis, S. E., & ENVO Consortium. (2013). The environment ontology: contextualising biological and biomedical entities. Journal of Biomedical Semantics, 4, 43. Pafilis, E., Frankild, S. P., Schnetzer, J., Fanini, L., Faulwetter, S., Pavloudi, C., ... & Jensen, L. J. (2015). ENVIRONMENTS and EOL: identification of Environment Ontology terms in text and the annotation of the Encyclopedia of Life. Bioinformatics, 31(11), 1872-1874.
24
Thank you!