Using Ontologies for Annotation of Genomic Data
Barry Smith, University at Buffalo
Outline
1. Who am I?
2. How to find your data
3. How to do biology across the genome
4. How to extend the GO methodology to clinical and translational medicine
5. Anatomy Ontologies: An OBO Foundry success story
6. The Infectious Disease Ontology
7. Towards a controlled vocabulary for community-based medicine
8. The Community Ontology and its branches
9. The Environment Ontology: A new type of patient data
Who am I?
NCBO: National Center for Biomedical Ontology (NIH Roadmap Center)
− Stanford Medical Informatics
− University of California, San Francisco Medical Center
− Berkeley Drosophila Genome Project
− Cambridge University Department of Genetics
− The Mayo Clinic
− University at Buffalo (PI of Dissemination and Ontology Best Practices)
Who am I?
− Duke/Dallas CTSA Ontology Consortium
− Cleveland Clinic Semantic Database in Cardiothoracic Surgery
− Gene Ontology Scientific Advisory Board
− Biomedical Informatics Research Network (BIRN) Ontology Task Force
− Advancing Clinico-Genomic Trials on Cancer (ACGT)
2. How to find your data
Multiple kinds of data in multiple kinds of silos
– Lab / pathology data
– Electronic Health Record data
– Clinical trial data
– Patient histories
– Medical imaging
– Microarray data
– Protein chip data
– Flow cytometry
– Mass spec
– Genotype / SNP data
How to find your data?
How to find and integrate other people’s data?
How to reason with data when you find it?
How to understand the significance of the data you collected 3 years earlier?
Part of the solution must involve consensus-based, standardized terminologies and coding schemes.
Making data (re-)usable through standards
Standards provide
– common structure and terminology
– single data source for review (less redundant data)
Standards allow
– use of common tools and techniques
– common training
– single validation of data
Problems with standards
– Standards involve considerable costs of retooling, maintenance, training,...
– Not all standards are of equal quality
– Bad standards create lasting problems
NIH Mandates for Sharing of Research Data
Investigators submitting an NIH application seeking $500,000 or more in any single year are expected to include a plan for data sharing.
Program Announcement Number: PAR
Title: Data Ontologies for Biomedical Research (R01)
– NIH Blueprint for Neuroscience Research
– National Cancer Institute (NCI)
– National Center for Research Resources (NCRR)
– National Eye Institute (NEI)
– National Heart Lung and Blood Institute (NHLBI)
– National Human Genome Research Institute (NHGRI)
– National Institute on Alcohol Abuse and Alcoholism (NIAAA)
– National Institute of Biomedical Imaging and Bioengineering (NIBIB)
– National Institute of Child Health and Human Development (NICHD)
– National Institute on Drug Abuse (NIDA)
– National Institute of Environmental Health Sciences (NIEHS)
– National Institute of General Medical Sciences (NIGMS)
– National Institute of Mental Health (NIMH)
– National Institute of Neurological Disorders and Stroke (NINDS)
– National Institute of Nursing Research (NINR)
Purpose. Optimal use of informatics tools and data resources depends upon explicit understandings of concepts related to the data upon which they compute. This is typically accomplished by a tool or resource adopting a formal controlled vocabulary and ontology.
Currently, there is no convenient way to map the knowledge that is contained in one data set to that in another data set, primarily because of differences in language and structure... in some areas there are emerging standards. Examples include: the Unified Medical Language System (UMLS), the Gene Ontology, the caBIG project, Open Biomedical Ontologies (OBO)
NIH anticipates that, once important data sets in a topical area have been unified, others in that area will adopt the emerging standard. The nucleation points should be able to interact with each other, e.g. through the use of the tools made freely available by the National Center for Biomedical Ontology (NCBO) or by caBIG.
Another determinant of ontology acceptance is the degree to which the ontology conforms to best practices governing ontology design and construction.... the applicant should specify the criteria with which the ontology will conform. Criteria have been developed by the Vocabulary and Common Data Element Work Group of caBIG and by the OBO Foundry.
3. How to do biology across the genome
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFES IPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVIS VMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVY TLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLER CHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKY GYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERL KRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRAC ALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVC KLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDD NNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGI SLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLK TLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPW MDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEY ATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGS RFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSG TTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV How to do biology across the genome?
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDR KRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTL SLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYM FLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRA CALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCAC TARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTR RIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDP NQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGS RFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCS FSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEI YMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPV RNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQS QFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMF NLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVV WIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGG LCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIE RMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTAST NVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATT TESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTS ATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTN SNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSEN MNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEAL AVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTR GKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKG GVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSM LIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDG RFDILLCRDSSREVGE 19
what cellular component? what molecular function? what biological process?
– through annotation of data
– and through curation of literature
– three types of data
Figure from Clark et al., 2005 (legend: part_of, is_a)
The Gene Ontology
Gene Ontology Consortium
– WormBase
– Gramene
– FlyBase
– Rat Genome Database
– DictyBase
– Mouse Genome Database
– The Arabidopsis Information Resource
– The Zebrafish Information Network
– Berkeley Drosophila Genome Project
– Saccharomyces Genome Database
Benefits of GO
1. rooted in basic experimental biology
2. links people to data and to literature
3. links data to data
   – across species (human, mouse, yeast, fly...)
   – across granularities (molecule, cell, organ, organism, population)
4. links medicine to biological science
5. cumulation of scientific knowledge in algorithmically tractable form
A strategy for translational medicine
Sjöblom T, et al. analyzed 13,023 genes in 11 breast and 11 colorectal cancers. Using functional information captured by GO, they identified 189 genes as being mutated at significant frequency and thus as providing targets for diagnostic and therapeutic intervention.
Science Oct 13;314(5797):
4. How to extend the GO methodology to clinical and translational medicine: Open Biomedical Ontologies
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
GRANULARITY / RELATION TO TIME | CONTINUANT: INDEPENDENT | CONTINUANT: DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
Figure from Clark et al., 2005 (legend: part_of, is_a)
Goal of the OBO Foundry
all biomedical research data should cumulate to form a single, algorithmically processable whole
Smith, et al. Nature Biotechnology, Nov
OBO FOUNDRY CRITERIA
– The ontology is open and available to be used by all.
– The ontology is instantiated in a common formal language and shares a common formal architecture.
– The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.
CRITERIA
– The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
– They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.
Mature OBO Foundry ontologies
– Cell Ontology (CL)
– Chemical Entities of Biological Interest (ChEBI)
– Foundational Model of Anatomy (FMA)
– Gene Ontology (GO)
– Phenotypic Quality Ontology (PaTO)
– Relation Ontology (RO)
– Sequence Ontology (SO)
Foundry ontologies being built ab initio
– Common Anatomy Reference Ontology (CARO)
– Ontology for Biomedical Investigations (OBI)
– Protein Ontology (PRO)
– RNA Ontology (RnaO)
– Subcellular Anatomy Ontology (SAO)
Ontologies in planning phase
– Environment Ontology (EnvO)
– Infectious Disease Ontology (IDO)
– Biobank/Biorepository Ontology
– Food Ontology
– Allergy Ontology
– Vaccine Ontology
5. Anatomy Ontologies: An OBO Foundry success story
Anatomy Ontologies
– Fish Multi-Species Anatomy Ontology (NSF funding received)
– Ixodidae and Argasidae (Tick) Anatomy Ontology
– Mosquito Anatomy Ontology (MAO)
– Spider Anatomy Ontology (SPD)
– Xenopus Anatomy Ontology (XAO)
undergoing reform: Drosophila and Zebrafish Anatomy Ontologies
Ontologies facilitate grouping of annotations
Annotation counts: brain 20, hindbrain 15, rhombomere 10
Query brain without ontology: 20
Query brain with ontology: 45
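The arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three terms are assumed to be linked by part_of (rhombomere part_of hindbrain part_of brain), and the dictionaries and function names are hypothetical, not taken from any anatomy ontology or tool.

```python
# Assumed part_of edges (child -> parent) between the slide's three terms.
PART_OF = {"rhombomere": "hindbrain", "hindbrain": "brain"}

# Direct annotation counts as given on the slide.
DIRECT_COUNTS = {"brain": 20, "hindbrain": 15, "rhombomere": 10}


def self_and_ancestors(term):
    """Yield the term itself, then every term it is transitively part_of."""
    while term is not None:
        yield term
        term = PART_OF.get(term)


def count_without_ontology(term):
    """Count only the annotations made directly to the query term."""
    return DIRECT_COUNTS.get(term, 0)


def count_with_ontology(term):
    """Count annotations to the query term and to all of its parts."""
    return sum(
        n for t, n in DIRECT_COUNTS.items() if term in self_and_ancestors(t)
    )


print(count_without_ontology("brain"))  # 20
print(count_with_ontology("brain"))     # 45 = 20 + 15 + 10
```

Without the part_of links, a query for brain sees only the 20 direct annotations; with them, annotations to hindbrain and rhombomere are grouped under brain as well.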
Multiple axes of classification
– Functional: cardiovascular system, nervous system
– Spatial: head, trunk, limb
– Developmental: endoderm, germ ring, lens placode
– Structural: tissue, organ, cell
– Stage: developmental staging series
CARO – Common Anatomy Reference Ontology
for the first time provides guidelines for model organism researchers who wish to achieve comparability of annotations
Haendel et al., “CARO: The Common Anatomy Reference Ontology”, in: Burger (ed.), Anatomy Ontologies for Bioinformatics, Springer, in press.
6. IDO: The Infectious Disease Ontology
We have data
– TBDB: Tuberculosis Database, including Microarray data
– VFDB: Virulence Factor DB
– TropNetEurop Dengue Case Data
– ISD: Influenza Sequence Database at LANL
– PathPort: Pathogen Portal Project
– ...
We need to annotate this data to allow retrieval and integration of
– sequence and protein data for pathogens
– case report data for patients
– clinical trial data for drugs, vaccines
– epidemiological data for surveillance, prevention
– ...
Goal: to make data deriving from different sources comparable and computable
IDO needs to work with
– Disease Ontology (DO) + SNOMED CT
– Gene Ontology Immunology Branch
– Phenotypic Quality Ontology (PATO)
– Protein Ontology (PRO)
– Sequence Ontology (SO)
– ...
We need common controlled vocabularies to describe these data in ways that will assure comparability and cumulation
What content is needed to adequately cover the infectious domain?
– Host-related terms (e.g. carrier, susceptibility)
– Pathogen-related terms (e.g. virulence)
– Vector-related terms (e.g. reservoir)
– Terms for the biology of disease pathogenesis (e.g. evasion of host defense)
– Population-level terms (e.g. epidemic, endemic, pandemic)
IDO Processes
IDO Qualities
IDO Roles
IDO provides a common template
IDO works like CARO. It contains terms (like ‘pathogen’, ‘vector’, ‘host’) which apply to organisms of all species involved in infectious disease and its transmission.
Disease- and organism-specific ontologies are built as refinements of the IDO core.
Malaria Vectors
Of 422 species of Anopheles worldwide, about 40 are significant vectors for malaria in humans.
The IDO Malaria ontology will contain those terms which apply to all types of malarial plasmodium infection.
Disease-specific IDO test projects
MITRE, Mount Sinai, UTSouthwestern – Influenza
– Stuart Sealfon, Joanne Luciano
IMBB/VectorBase – Vector borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus)
– Kristos Louis
Colorado State University – Dengue Fever
– Saul Lozano-Fuentes
Duke – Tuberculosis
– Carol Dukes-Hamilton
Cleveland Clinic – Infective Endocarditis
– Sivaram Arabandi
University of Michigan – Brucellosis
– Yongqun He
7. Towards a controlled vocabulary for community-based medicine
All OBO Foundry ontologies work in the same way
– we have data (biosample, haplotype, clinical data, survey data,...)
– we need to make this data available for semantic search and algorithmic processing
– we create a consensus-based ontology for annotating the data
to enhance alignment of data about instances (communities, places,...)
to enhance alignment of data about relevant types of entities (origin, community, cell type, race, family...)
to enhance coordination of research
8. The Community Ontology and its branches
Community / Population Ontology
− family, clan
− ethnicity
− religion
− diet
− social networking
− education (literacy...)
− healthcare (economics...)
− household forms
− demography
− public health
− ...
GRANULARITY / RELATION TO TIME | CONTINUANT: INDEPENDENT | CONTINUANT: DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
GRANULARITY / RELATION TO TIME | CONTINUANT: INDEPENDENT | CONTINUANT: DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Family, Community, Deme, Population; Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
GRANULARITY / RELATION TO TIME | CONTINUANT: INDEPENDENT | CONTINUANT: DEPENDENT | OCCURRENT
COMPLEX OF ORGANISMS | Family, Community, Deme, Population | Population Phenotype | Population Process
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
9. The Environment Ontology: A new type of patient data
GRANULARITY / RELATION TO TIME | CONTINUANT: INDEPENDENT | CONTINUANT: DEPENDENT | OCCURRENT
COMPLEX OF ORGANISMS | Family, Community, Deme, Population | Population Phenotype | Population Process
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cell Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
ENVIRONMENT
GRANULARITY / RELATION TO TIME | CONTINUANT: INDEPENDENT | ENVIRONMENT
COMPLEX OF ORGANISMS | Family, Community, Deme, Population | Environment of population
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); (FMA, CARO) | Environment of single organism
CELL AND CELLULAR COMPONENT | Cell (CL); Cell Component (FMA, GO) | Environment of cell
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular environment
GRANULARITY / RELATION TO TIME | CONTINUANT: INDEPENDENT | ENVIRONMENT
COMPLEX OF ORGANISMS | Family, Community, Deme, Population | Environment of population
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); (FMA, CARO) | Environment of single organism*
CELL AND CELLULAR COMPONENT | Cell (CL); Cell Component (FMA, GO) | Environment of cell
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular environment
* The sum total of the conditions and elements that make up the surroundings and influence the development and actions of an individual.
GRANULARITY | ENVIRONMENT
COMPLEX OF ORGANISMS | biome / biotope, territory, habitat, neighborhood,...
ORGAN AND ORGANISM | work environment, home environment; host/symbiont environment;...
CELL AND CELLULAR COMPONENT | extracellular matrix; chemokine gradient;...
MOLECULE | hydrophobic surface; virus localized to cellular substructure; active site on protein; pharmacophore...
clinical data includes
– clinical records
– clinical trial data
– demographic data
– National Hospital Discharge Survey
– National Ambulatory Medical Care Surveys
– MEDPAR: Medicare’s national claims database
The Environment Ontology
– OBO Foundry
– Genomic Standards Consortium
– Natural Environment Research Council (UK)
– USDA, Gramene, J. Craig Venter Institute...
Applications of EnvO in biology
How EnvO currently works for information retrieval
Retrieve all experiments on organisms obtained from:
– deep-sea thermal vents
– arctic ice cores
– rainforest canopy
– alpine melt zone
Retrieve all data on organisms sampled from:
– hot and dry environments
– cold and wet environments
– a height above 5,000 meters
Retrieve all the omic data from soil organisms subject to:
– moderate heavy metal contamination
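The retrieval queries listed above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the is_a edges, the grouping terms, the sample records, and the altitude field are hypothetical and are not actual EnvO terms, identifiers, or data.

```python
# Hypothetical is_a edges (specific environment -> broader classes).
# These are illustrative labels only, not real EnvO content.
IS_A = {
    "deep-sea thermal vent": ["marine environment"],
    "arctic ice core": ["cold environment"],
    "alpine melt zone": ["cold environment", "wet environment"],
    "rainforest canopy": ["wet environment"],
}

# Hypothetical sample records annotated with an environment term.
SAMPLES = [
    {"id": "S1", "environment": "deep-sea thermal vent", "altitude_m": -2500},
    {"id": "S2", "environment": "arctic ice core", "altitude_m": 10},
    {"id": "S3", "environment": "alpine melt zone", "altitude_m": 5200},
    {"id": "S4", "environment": "rainforest canopy", "altitude_m": 40},
]


def subsumed_by(term, query):
    """Return True if term equals query or reaches it via is_a edges."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current == query:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, []))
    return False


def retrieve(query, above_m=None):
    """Return ids of samples whose environment falls under the query term,
    optionally restricted to samples taken above a given height."""
    return [
        s["id"]
        for s in SAMPLES
        if subsumed_by(s["environment"], query)
        and (above_m is None or s["altitude_m"] > above_m)
    ]


print(retrieve("cold environment"))                # ['S2', 'S3']
print(retrieve("wet environment"))                 # ['S3', 'S4']
print(retrieve("cold environment", above_m=5000))  # ['S3']
```

The point of the sketch is that a query phrased at the level of a broad class (cold, wet) returns samples annotated with more specific terms, because the annotation vocabulary carries the class hierarchy.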
extending EnvO to clinical and translational research
– we have public health, community and population data
– we need to make this data available for search and algorithmic processing
– we create a consensus-based ontology which can interoperate with ontologies for neighboring domains of medicine and basic biology
Environment = totality of circumstances external to a living organism or group of organisms
– pH
– evapotranspiration
– turbidity
– available light
– predominant vegetation
– predatory pressure
– nutrient limitation
– …
extend EnvO to the clinical domain
– dietary patterns (Food Ontology: FAO, USDA)... allergies
– neighborhood patterns: built environment, living conditions, climate, social networking, crime, transport, education, religion, work, health, hygiene
– disease patterns: bio-environment (bacteriological,...), patterns of disease transmission (links to IDO)
a new type of patient data
a patient’s environmental history
use EnvO and the community ontology to mine relations between disease phenotypes and environmental patterns and patterns of community behavior
with thanks to
CARO: Fabian Neuhaus (NIST), Melissa Haendel (ZFin), David Sutherland (Flybase)
EnvO: Dawn Field, Norman Morrison (NERC)
IDO: Lindsay Cowell (Duke)
OBO Foundry: Michael Ashburner, Suzanna Lewis, Chris Mungall (Flybase, GO), Alan Ruttenberg (MIT, Neurocommons)
NCBO: NIH RFA-RM
PRO: NIH R01 GM
ACGT: European Commission IST