Ontologically Modeling Sample Variables in Gene Expression Data James Malone EBI, Cambridge, UK.


1 Ontologically Modeling Sample Variables in Gene Expression Data James Malone malone@ebi.ac.uk EBI, Cambridge, UK

2 Overview
- Application background
- Motivation for ontologies: questions we want to answer
- Methodology
- Ontology and application
- Future work and things we'd like to do

3 Gene Expression: Archive to Atlas
(Diagram labels: AE/GEO acquire; >250,000 assays; >10,000 experiments; re-annotate & summarize; ATLAS; ArrayExpress; curation)

4 Gene Expression Sample Variable Annotations

Annotations                 Archive    Atlas
Species                     330        9
Samples                     238,000    34,650
Annotations on samples      860,700    101,830
Unique sample annotations   37,500     6,600
Assays (Hybridizations)     246,000    30,000
Annotations on assays       569,700    67,000
Unique assay annotations    25,000     4,000

5 Use Cases
- Query support (e.g., query for 'cancer' and also get 'leukemia')
- Data visualisation, e.g., presenting an ontology tree to the user of what is in the database
- Data integration by ontology terms, e.g., we assume that 'kidney' in independent studies means roughly the same thing, so we can count how many kidney samples we have in the database
- Intelligent template generation for different experiment types in submission or data presentation
- Summary-level data
- Nonsense detection, e.g., telling us that something marked as cancer cannot also be marked as healthy
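The query-support and nonsense-detection use cases can be sketched with a toy hierarchy. This is only an illustration under stated assumptions: the `IS_A` edges and the `DISJOINT` pair are hypothetical and are not taken from EFO, and the function names are invented for this example.

```python
from collections import defaultdict

# Hypothetical is-a edges (child -> parent); not taken from EFO itself.
IS_A = {
    "leukemia": "cancer",
    "adenocarcinoma": "cancer",
    "cervical adenocarcinoma": "adenocarcinoma",
    "cancer": "disease",
}

# Hypothetical pairs of annotations that should never co-occur on one sample.
DISJOINT = {frozenset({"cancer", "healthy"})}


def ancestors(term):
    """Return the term followed by every ancestor reachable via is-a."""
    seen = [term]
    while term in IS_A:
        term = IS_A[term]
        seen.append(term)
    return seen


def expand_query(term):
    """Return the query term plus all of its descendants."""
    children = defaultdict(list)
    for child, parent in IS_A.items():
        children[parent].append(child)
    result, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(children[current])
    return result


def is_nonsense(annotations):
    """True if the annotations (or their ancestors) include a disjoint pair."""
    closure = set()
    for annotation in annotations:
        closure.update(ancestors(annotation))
    return any(pair <= closure for pair in DISJOINT)
```

With this toy data, `expand_query("cancer")` also returns 'leukemia', and `is_nonsense(["leukemia", "healthy"])` flags the contradiction because 'leukemia' is a kind of 'cancer'.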

6 Questions we want to answer
- Diverse nature of annotations on data
- Need to support complex queries which contain semantic information, e.g. which genes are under-expressed in brain samples in human or mouse
- If we annotate with [one term], do we get this data?
(Diagram labels: cancer; adenocarcinoma)

7 Primary Question: Where to place our semantics?
(Diagram labels: cancer; adenocarcinoma; Atlas/AE)

8 Decoupling knowledge from data
(Diagram label: Atlas/AE)

9 Methodology: Reference vs Application Ontology
- There is debate in the community about the difference; here is our thesis
- A reference ontology describes a knowledge space; an explicitly delineated part of a domain.
(Diagram labels: Cell type; Human Anatomy; GO Process; Biomedicine)

10 Methodology: Reference vs Application Ontology
- An application ontology describes an application or data space; an explicitly delineated part of a domain.
- It should consume reference ontologies to meet application needs
(Diagram labels: Cell type; Human Anatomy; GO Process; Biomedicine)

11 Building the Experimental Factor Ontology
- We consume parts of reference ontologies from the domain
- We construct new classes and relations to answer our use cases
- The aim is reuse of existing resources, shared frameworks and mapping of equivalencies where they exist
(Diagram labels: EFO; Disease Ontology; Anatomy Reference Ontology; Ontology for Biomedical Investigations; Chemical Entities of Biological Interest (ChEBI); Various Species Anatomy Ontologies; Relation Ontology; Text mining)

12 Identify Upper Level Structure
- We have taken a BFO-lite approach, hiding labels from users for application purposes and sometimes using a different definition
- Upper-level classes: information content entity (IAO); site (BFO); material entity (BFO); processual entity (BFO); specifically dependent continuant (BFO)
- Specifically dependent continuant: A continuant [snap:Continuant] that inheres in or is borne by other entities. Every instance of A requires some specific instance of B which must always be the same.
- Material property: A property or characteristic of some other entity. For example, the mouse has the colour white.

13 Adding New Classes @ www.ebi.ac.uk/efo/tools
- We wish to maximise our interoperability
- Submitters and other groups use many ontologies
- Trade-off: being open to their data and preferences vs imposing a more ordered view on semantics
- Our goal: where orthogonality exists, we aim to import only that class. Where it does not, we perform 'mappings' in our EFO classes via annotation property references (in a similar way to xrefs)
- E.g. for ChEBI classes, import the ChEBI URI; for 'cancer', create an EFO class and add multiple mappings

14 Creating Class Mappings
- For overlapping ontologies, we aim to create a 'mapping class'
- We use semi-automated text mining with the "double-metaphone" algorithm
- We match the values in our database to ontology class labels and definitions
- We also perform mappings from EFO to other ontologies, so that EFO: cancer = NCI: cancer, DO: cancer et al.
- Mappings are sanity checked before being added to the ontology
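The matching step, database values against ontology class labels, can be sketched as below. Note the substitution: the slide names the double-metaphone phonetic algorithm, whereas this sketch uses plain string similarity from Python's standard `difflib` as a stand-in, so it illustrates the workflow rather than the actual EFO tooling. The identifiers, labels and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical ontology class labels keyed by made-up identifiers.
ONTOLOGY_LABELS = {
    "EX:0001": "leukemia",
    "EX:0002": "adenocarcinoma",
    "EX:0003": "kidney",
}


def normalise(text):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def best_match(value, labels=ONTOLOGY_LABELS, threshold=0.8):
    """Return (class_id, score) for the closest label, or None below threshold.

    String similarity is a stand-in here for the phonetic double-metaphone
    comparison named on the slide.
    """
    scored = [
        (SequenceMatcher(None, normalise(value), normalise(label)).ratio(), class_id)
        for class_id, label in labels.items()
    ]
    score, class_id = max(scored)
    return (class_id, score) if score >= threshold else None
```

A candidate such as `best_match("Leukaemia")` would then be queued for the manual sanity check mentioned above rather than added to the ontology automatically.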

15 Keeping Up To Date with External Classes
- Use of a tool to automatically update metadata every release (monthly)
- Uses BioPortal web services to access the latest definition and synonyms
(Diagram label: Class URI/ID)

16 Detecting Change in External Ontologies
- Bubastis tool for detecting axiomatic changes between two ontologies (in our case, two versions of the same ontology)
- @todo: detect annotation property changes
- We also detect missing annotation properties with the Watchman tool (not released yet), presently used mainly for labels
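The idea of an axiomatic diff between two versions of one ontology can be illustrated with set operations. This is a minimal sketch and not how Bubastis itself is implemented: axioms are modelled as plain strings per class, and the class identifiers and snapshots `v1` and `v2` are hypothetical.

```python
def diff_axioms(old, new):
    """Compare two {class_id: set_of_axiom_strings} snapshots.

    Returns a dict listing classes that were added, removed, or changed;
    each changed class lists the axioms it gained and lost.
    """
    report = {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": {},
    }
    for class_id in old.keys() & new.keys():
        gained = new[class_id] - old[class_id]
        lost = old[class_id] - new[class_id]
        if gained or lost:
            report["changed"][class_id] = {
                "gained": sorted(gained),
                "lost": sorted(lost),
            }
    return report


# Hypothetical snapshots of two releases of one external ontology.
v1 = {
    "EX:cancer": {"SubClassOf EX:disease"},
    "EX:leukemia": {"SubClassOf EX:cancer"},
}
v2 = {
    "EX:cancer": {"SubClassOf EX:disease"},
    "EX:leukemia": {"SubClassOf EX:cancer", "SubClassOf EX:blood_disease"},
    "EX:lymphoma": {"SubClassOf EX:cancer"},
}
```

Calling `diff_axioms(v1, v2)` reports the new class and the extra axiom on the existing one, which is the kind of change that would prompt a review of imported classes.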

17 Creating Relations and Equivalent Classes
(Diagram labels: cell line (HeLa); organism part (cervix); cell type (epithelial); disease (cervical adenocarcinoma); species (human))
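One way to picture the relations in this example is as subject-predicate-object triples with a simple pattern match. The slide lists only the related entities, so every relation name below is a hypothetical placeholder rather than an actual EFO or Relation Ontology property.

```python
# Hypothetical relation names; the slide lists only the related entities.
TRIPLES = {
    ("HeLa", "is_a", "cell line"),
    ("HeLa", "derives_from_organism_part", "cervix"),
    ("HeLa", "derives_from_cell_type", "epithelial"),
    ("HeLa", "bearer_of_disease", "cervical adenocarcinoma"),
    ("HeLa", "derives_from_species", "human"),
}


def match(subject=None, predicate=None, obj=None, triples=TRIPLES):
    """Return sorted triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )
```

A query such as `match(obj="cervix")` then finds the cell line through its organism part, even though the sample itself was annotated only as 'HeLa'.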

18 Structure for queries

19 Gene Expression Atlas: Linking data to the ontology
(Diagram labels: Assay Table; Sample Table; Ontology Term Table; Query; OWL Model; Database formulated query)
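A database-formulated query over a sample table and an ontology term table might look like the following sketch. The schema is an assumption made for illustration (the slide names the tables but not their columns): here the ontology term table stores one row per ancestor/descendant pair, so a query for a term also retrieves samples annotated with its descendants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (sample_id TEXT PRIMARY KEY, annotation TEXT);
    -- One row per (ancestor, descendant) pair, including each term with itself.
    CREATE TABLE ontology_term (ancestor TEXT, descendant TEXT);
""")

# Hypothetical sample annotations.
conn.executemany("INSERT INTO sample VALUES (?, ?)", [
    ("S1", "leukemia"),
    ("S2", "cervical adenocarcinoma"),
    ("S3", "kidney"),
])

# Hypothetical hierarchy, flattened from an ontology into pairs.
conn.executemany("INSERT INTO ontology_term VALUES (?, ?)", [
    ("cancer", "cancer"),
    ("cancer", "leukemia"),
    ("cancer", "adenocarcinoma"),
    ("cancer", "cervical adenocarcinoma"),
    ("kidney", "kidney"),
])


def samples_for(term):
    """Return ids of samples annotated with the term or any descendant."""
    rows = conn.execute(
        """SELECT s.sample_id FROM sample AS s
           JOIN ontology_term AS t ON t.descendant = s.annotation
           WHERE t.ancestor = ? ORDER BY s.sample_id""",
        (term,),
    )
    return [row[0] for row in rows]
```

Keeping the hierarchy in its own table means the sample annotations stay untouched when the ontology changes; only the term table needs regenerating.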

20 Gene Expression Atlas @ www.ebi.ac.uk/gxa
- Query for cell adhesion genes in all 'organism parts'
- 'View on EFO'

21 ArrayExpress Archive @ www.ebi.ac.uk/arrayexpress

22 Future Work: Linked Data
- Linking data by dereferenceable URI for human and machine, e.g. http://www.ebi.ac.uk/gxa/Experiment12345
(Slide footer: Developing an Ontology from the Application Up, malone@ebi.ac.uk)

23 Future Work: RDF Triple Store @ www.ebi.ac.uk/efo/semanticweb/atlas
- Q: Is a SPARQL query against an RDF triple store quicker than a SPARQL query translated into SQL?
(Diagram labels: OWL Ontology; Atlas Data; RDFizer; SPARQL; SPARQL; RDF Triple Store; SQL Translation Layer)

24 Future Work: Data Integration
- Consuming reference ontologies and mapping to multiple ontologies where overlap exists offers us maximum interoperability
- The advantage of triple stores is not yet apparent
- Impetus required: "should we champion this technology?"
(Diagram labels: RDF triple query; Atlas; Swiss Prot; Amino Acid Ontology)

25 Summary
- We have created a sustainable approach to consuming multiple reference ontologies
- Tooling solutions expedite the process
- We consider EFO to be a 'view' of such ontologies for our application needs
- The primary aim of this work is to enable novel research with the experimental data we have
- Specifically, we can answer new questions, integrate across our data resources, and visualise and summarise the data
- Our belief is that describing such data should be the driving force behind ontology development
- Future work will look at linked data and RDF triple stores

26 Acknowledgements
- Ontology creation: James Malone, Tomasz Adamusiak, Ele Holloway, Helen Parkinson, Jie Zheng (U Penn)
- Ontology mapping tools and text mining evaluation: Tim Rayner, Holly Zheng, Margus Lukk
- GUI development: Misha Kapushesky, Pasha Kurnosov, Anna Zhukova, Nikolay Kolesinkov
- External review and anatomy: Jonathan Bard, Jie Zheng
- ArrayExpress production staff
- EBI Rebholz Group (Whatizit text mining tool)
- Many source ontologies for terms and definitions, esp. Disease Ontology, Cell Type Ontology, FMA, NCIT, OBI
- Funders: EC (Gen2Phen, FELICS, MUGEN, EMERALD, ENGAGE, SLING), EMBL, NIH
- Eric Neumann, Joanne Luciano and Alan Ruttenberg
- W3C & HCLS Group: Eric Prud'hommeaux and Scott Marshall
- OBI developers

