Exploiting semantic technologies to build an application ontology

Slides:



Advertisements
Similar presentations
The ArrayExpress Gene Expression Database: a Software Engineering and Implementation Perspective Ugis Sarkans European Bioinformatics Institute.
Advertisements

Welcome to mini-symposium on ontologies for biological sample description EMBL-EBI Wellcome Trust Genome Campus Deceber 5, 2001.
EleMAP: An Online Tool for Harmonizing Data Elements using Standardized Metadata Registries and Biomedical Vocabularies Jyotishman Pathak, PhD 1 Janey.
Semantic Web Tools for Authoring and Using Analysis Results Richard Fikes Robert McCool Deborah McGuinness Sheila McIlraith Jessica Jenkins Knowledge Systems.
Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power.
EBI is an Outstation of the European Molecular Biology Laboratory. UniProt Jennifer McDowall, Ph.D. Senior InterPro Curator Protein Sequence Database:
BTN323: INTRODUCTION TO BIOLOGICAL DATABASES Day2: Specialized Databases Lecturer: Junaid Gamieldien, PhD
LÊ QU Ố C HUY ID: QLU OUTLINE  What is data mining ?  Major issues in data mining 2.
1 ArrayExpress and MAGE Jamboree II Ugis Sarkans, EBI.
EBI is an Outstation of the European Molecular Biology Laboratory. MAGE-TAB - The ArrayExpress Production Experience Helen Parkinson, PhD.
Enriching the Ontology for Biomedical Investigations (OBI) to Improve Its Suitability for Web Service Annotations Chaitanya Guttula, Alok Dhamanaskar,
1 Yolanda Gil Information Sciences InstituteJanuary 10, 2010 Requirements for caBIG Infrastructure to Support Semantic Workflows Yolanda.
Viewing & Getting GO COST Functional Modeling Workshop April, Helsinki.
Computational Biology and Informatics Laboratory Development of an Application Ontology for Beta Cell Genomics Based On the Ontology for Biomedical Investigations.
The MGED Society Facilitating Data Sharing and Integration with Standards CTSA Omics Data Standards Working Group Chris Stoeckert Dept. of Genetics and.
School of Computing FACULTY OF ENGINEERING Developing a methodology for building small scale domain ontologies: HISO case study Ilaria Corda PhD student.
Resource Curation and Automated Resource Discovery.
Ontologically Modeling Sample Variables in Gene Expression Data James Malone EBI, Cambridge, UK.
Rapid Development of an Ontology of Coriell Cell Lines Chao Pang, Tomasz Adamusiak, Helen Parkinson and James Malone
VectorBase Gene expression data in VectorBase Fotis Kafatos, George Christophides, Bob MacCallum & Seth Redmond Imperial College London (thanks also to.
EBI is an Outstation of the European Molecular Biology Laboratory. Anatomy ontology ArrayExpress Helen Parkinson,
Gene Expression Data Annotation – an application of the cell type ontology Helen Parkinson, PhD 19 May 2010.
Open Terminology Portal (TOP) Frank Hartel, Ph.D. Associate Director, Enterprise Vocabulary Services National Cancer Institute, Center for Biomedical Informatics.
Ontologies GO Workshop 3-6 August Ontologies  What are ontologies?  Why use ontologies?  Open Biological Ontologies (OBO), National Center for.
LREC 2008 Marrakech1 Clustering Related Terms with Definitions Scott Piao, John McNaught and Sophia Ananiadou
MIAMExpress and the development of annotation ontologies for gene expression experiments Ele Holloway Microarray Informatics European Bioinformatics Institute.
The Functional Genomics Experiment Object Model (FuGE) Andrew Jones, School of Computer Science, University of Manchester MGED Society.
A plant-specific annotation and submission tool for the incorporation of Arabidopsis gene expression data into ArrayExpress, the EBI’s public DNA microarray.
Biological Signal Detection for Protein Function Prediction Investigators: Yang Dai Prime Grant Support: NSF Problem Statement and Motivation Technical.
Alvis Brazma, Johan Rung, Ugis Sarkans, Thomas Schlitt, Jaak Vilo European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge,
Using Domain Ontologies to Improve Information Retrieval in Scientific Publications Engineering Informatics Lab at Stanford.
Statistical Testing with Genes Saurabh Sinha CS 466.
- EVS Overview - Biomedical Terminology and Ontology Resources Frank Hartel, Ph.D. Director, Enterprise Vocabulary Services NCI Center for Bioinformatics.
Master headline RDFizing the EBI Gene Expression Atlas James Malone, Electra Tapanari
16/11/ Semantic Web Services Language Requirements Presenter: Emilia Cimpian
1 Aligning the Parasite Experiment Ontology and the Ontology for Biomedical Investigations Using AgreementMaker Valerie Cross, Cosmin Stroe Xueheng Hu,
Proposed Research Problem Solving Environment for T. cruzi Intuitive querying of multiple sets of heterogeneous databases Formulate scientific workflows.
GeWorkbench Overview Support Team Molecular Analysis Tools Knowledge Center Columbia University and The Broad Institute of MIT and Harvard.
Approach to building ontologies A high-level view Chris Wroe.
ISA Project update CaNano November 12 th,2012 Philippe Rocca-Serra.
Describing Bioinformatic Metadata at EBI James Malone
NeOn Components for Ontology Sharing and Reuse Mathieu d’Aquin (and the NeOn Consortium) KMi, the Open Univeristy, UK
Advanced Gene Selection Algorithms Designed for Microarray Datasets Limitation of current feature selection methods: –Ignores gene/gene interaction: single.
Anatomy Ontologies & Potential Users: Bridging the Gap Ravensara Travillian European Bioinformatics Institute
Ontology Driven Data Collection for EuPathDB Jie Zheng, Omar Harb, Chris Stoeckert Center for Bioinformatics, University of Pennsylvania.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
ArrayExpress Ugis Sarkans EMBL - EBI
OncoTrack Bioinformatics Workshop Max Planck Institute for Molecular Genetics, Berlin Wednesday 6 th November 2013 TimeSubject 13:30-15:00 Introduction.
EBI is an Outstation of the European Molecular Biology Laboratory. Semantic Interoperability Framework Sarala M. Wimalaratne (RICORDO project)
Cheminformatics and Metabolism Team The EBI Enzyme Portal.
Semantic Graph Mining for Biomedical Network Analysis: A Case Study in Traditional Chinese Medicine Tong Yu HCLS
David Amar, Tom Hait, and Ron Shamir
Stanford University, Stanford, CA, USA
Databases, Ontologies and Text mining Session Introduction Part 2
Mental Functioning and the Gene Ontology
Software Configuration Management
Data challenges in the pharmaceutical industry
Using the Drupal Content Management Software (CMS) as a framework for OMICS/Imaging-based collaboration.
Doron Goldfarb & Yann LE FRANC
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
Ontology Evolution: A Methodological Overview
An ecosystem of contributions
CCO: concept & current status
2. An overview of SDMX (What is SDMX? Part I)
A Snapshot of the OWL Web
Collaborative RO1 with NCBO
Single Sample Expression-Anchored Mechanisms Predict Survival in Head and Neck Cancer Yang et al Presented by Yves A. Lussier MD PhD The University.
Service-enabling Biomedical Research Enterprise
Tantan Liu, Fan Wang, Gagan Agrawal The Ohio State University
Presentation transcript:

Exploiting semantic technologies to build an application ontology James Malone PhD, Helen Parkinson PhD, Tomasz Adamusiak Phd, MD, Ele Holloway

Overview Motivation Methodology for creating the ontology Our use cases Annotating HTP experimental data Integrating clinical data Methodology for creating the ontology Semi-automated mapping and manual curation Current ontology usage Future use Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk

Our Use Cases Query support (e.g, query for 'cancer' and get also 'leukemia') Over-representation analysis in groups of samples (analogous to the use of GO terms in over-representation analysis in groups of genes) Data visualisation – e.g., presenting an ontology tree to the user of what is in the database Data integration by ontology terms – e.g., we assume that 'kidney' in independent studies roughly means the same, so we can count how many kidney samples we have in the database Intelligent template generation for different experiment types in submission or data presentation Summary level data Nonsense detection – e.g. telling us that something marked as cancer can not be marked as healthy Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk 3 24.06.2018

Scope of Experimental Factor Ontology (EFO) Modelling all of the experimental factors that are currently present in the ArrayExpress repository Experimental factors are variable aspects of an experiment design which can be used to describe an experiment Scope is primarily determined by data currently held in ArrayExpress clinical conditions level (e.g. disease) developmental stage sample level species level Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk 4

‘Experimental Factors’ Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk 5 24.06.2018 Developing an Experimental Factor Ontology

Annotating High Throughput Data Text mining at data acquisition Ontology driven queries Data mining ATLAS Summarize Ranked gene/ condition queries acquire 246,000 assays Experiment queries > 200 species Public/Private Genes in Expts Re-annotate Gene level queries, 9 species Public Only (whatizit) Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk 6 6 24.06.2018 24.06.2018 6

Integrating Clinical Data Use cases include: Primary aim: expose our gene expression experimental data Secondary aim: harmonizing clinical data for study designs (e.g. GWAS) ArrayExpress EFO Smoking History Tobacco age of first use Clin1 Clin2 Smoking History Alcohol consumption Ex1 Ex2 Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk 7 7

Building the Experimental Factor Ontology Position of EFO in the ‘bigger picture’ Key is orthogonal coverage, reuse of existing resources and shared frameworks Chemical Entities of Biological Interest (ChEBI) Relation Ontology Cell Type Ontology Text mining Various Species Anatomy Ontologies Disease Ontology Anatomy Reference Ontology EFO Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk 8 24.06.2018

Semi-automated mapping text to ontology Following an evaluation from Tim Rayner we selected Double Metaphone algorithm Perform matching of our values in database to ontology class labels and definitions. Also perform mappings from EFO to other ontologies, so that EFO: cancer = NCI: cancer, DO: cancer et al. Sanity checking over mappings before adding to ontology Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk

Mapping using Agent Technology User Bioontologies Search Engines Repositories Querying Mechanism Component 2: ontology mappings Component 3: ontology discovery Component 1: MAS architecture Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk

What does agent technology buy us? Annotation consistency EFO_1001214 is now inconsistent because DO_15654 has new parent Richer mappings (hence annotations) EFO_1000156 can have new mappings because new cancer class found in MIT ontology New potentially relevant ontologies New ontology found relating to molecular + pathways Semantic web compatible (i.e. can be deployed as standards compliant service) Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk

EFO Axes process material material property site information EFO Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk

Process process Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk

Information information Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk

Material material Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk

Material Property material property Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk

Using the ontology: Querying Public repository of gene expression data Multiple sources – direct submissions, external databases >200 species 8400 experiments, 246,000 assays Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk 17 24.06.2018 17

Using the ontology: Atlas Querying Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk 18 24.06.2018 18

Using the ontology: Exposing data via external resources NCBO Bioportal Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk 19 24.06.2018 Developing an Experimental Factor Ontology 19

Using the ontology: ISACreator (www.ebi.ac.uk/bioinvindex) Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk

Using the Ontology: Detecting Nonsense: Enforcing correctness species (human) cell line (Hela) organism part (cervix) cell type (epithelial) disease (cervical adenocarcinoma) Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk 21

Using the Ontology: Detecting Nonsense: Enforcing correctness species (human) organism part (hair follicle) disease (cardiovascular disease) Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk 22

Future Work for EFO Mapping in external ids– Snomed-CT, FMA, ChEBI, Brenda tissue ontology – and on requested API development for serving external ids from ArrayExpress Atlas API Working with external ontologies to produce validated cross products Extensions for clinical data integration Gen2Phen, Engage Extensions for mouse model of human disease queries Addressing ‘temporal dimension’ Addition of units Improving query implementation in ArrayExpress Atlas – GUI changes Addition of synonyms Semantic clustering of experiments Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk

Conclusion Ontology development for text mining, annotation, query Built with our needs in mind, however covers a wide range of experimental variables across a wide range of technologies, extensible, open source Xref’d to existing ontology resources when possible Text mining works, reduces the workload 1.0 is released on April 1st 2009 0.10 version currently available in OLS and NCBO bioportal http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=EFO http://www.ebi.ac.uk/microarray-srv/efo/ Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk

Acknowledgments Ontology creation: James Malone (malone@ebi.ac.uk), Helen Parkinson, Tomasz Adamusiak, Ele Holloway Mapping tools and text mining evaluation: Tim Rayner, Holly Zheng External Specialist Review: Trish Whetzel, Jonathan Bard AE Team: Alvis Brazma ,Anna Farne, Ele Holloway, Margus Lukk, Eleanor Williams, Tony Burdet, Misha Kapushesky EBI Rebholz Group (Whatizit text mining tool) EC (Gen2Phen,FELICS,MUGEN, EMERALD, ENGAGE, SLING), EMBL, NIH Exploiting semantic technologies to build an application ontology malone@ebi.ac.uk 25 24.06.2018 25