RightField: Semantic Enrichment of Systems Biology Data using Spreadsheets
Katy Wolstencroft, myGrid, SysMO-DB, University of Manchester


Outline
What RightField does
Origins: SysMO-DB project and data sharing in Systems Biology
How RightField works
Evaluation: how successfully it works
Extensions and future directions

RightField
A tool for embedding ranges of ontology terms into spreadsheets, so that users of those spreadsheets can semantically annotate their data from simple drop-down lists
A tool for automatically extracting semantically annotated metadata from spreadsheets and producing RDF

RightField
Annotation benefits
  Makes annotation quicker and more efficient
  Standardises annotation
  Hides the ontology complexity from the users
RDF production benefits
  Querying over heterogeneous data files
  Semantic searching and reasoning
  Standard format for interoperability
Hides semantic web tools from end users
  Spreadsheets and web browsers

SEEK: Systems Biology Data Sharing
SysMO Consortium
  Systems Biology of MicroOrganisms
  Pan-European
  > 100 research groups
  > 320 scientists
  Distributed, interdisciplinary projects
  Expected to pool data and results and disseminate
  Microbiologists, molecular biologists, biochemists, mathematicians... not many informaticians
The SEEK
  A platform for sharing Systems Biology data and models
  Web-based environment for sharing within a consortium and disseminating to the community (an eLaboratory)
  Standards compliant
  Fitting in with laboratory practices

~1900 assets: People (350), Investigations (35), Studies (87), Assays, Data sets, Models (60), SOPs, Publications (165)

Consortia using SEEK: SSFH, CISBIC, JenAge, SyBaCol, Rosage, Yeast Glycolysis, Forsys

Types of data
Multiple omics: genomics, transcriptomics, proteomics, metabolomics, fluxomics, reactomics
Images
Molecular biology
Reaction kinetics
Models: metabolic, gene network, kinetic
Relationships between data sets/experiments
  Procedures, experiments, data, results and models
Analysis of data

Minimum Information Model
What is the least amount of information required to: find, interpret, understand, reuse
Different for different data sets:
  CIMR: Core Information for Metabolomics Reporting
  MIABE: Minimal Information About a Bioactive Entity
  MIACA: Minimal Information About a Cellular Assay
  MIAME: Minimum Information About a Microarray Experiment
  MIAME/Env: MIAME / Environmental transcriptomic experiment
  MIAME/Nutr: MIAME / Nutrigenomics
  MIAME/Plant: MIAME / Plant transcriptomics
  MIAME/Tox: MIAME / Toxicogenomics
  MIAPA: Minimum Information About a Phylogenetic Analysis
  MIAPAR: Minimum Information About a Protein Affinity Reagent
  MIAPE: Minimum Information About a Proteomics Experiment
  MIARE: Minimum Information About a RNAi Experiment
  MIASE: Minimum Information About a Simulation Experiment

Not quite available “off the shelf”
Loose guidelines or checklists
Specific formats (generally in XML)
Specific formats with associated ontologies
Remaining questions for the scientists:
  How do we generate standards compliant data?
  Which vocabularies/ontologies should I use?
  How do I know which ontology terms to use where?

Data, MIBBI model and ontologies
Microarray
  MIBBI model: MIAME (Minimum Information about a Microarray Experiment)
  Ontologies: MGED
Proteomics
  MIBBI model: MIAPE (Minimum Information about a Proteomics Experiment)
  Ontologies: PSI-MI, PSI-MS, PSI-MOD
Interaction experiments
  MIBBI model: MIMIX (Minimum Information about a Molecular Interaction Experiment)
  Ontologies: PSI-MI Protein-Protein Interaction
Systems Biology Models
  MIBBI model: MIRIAM (Minimal Information Required In the Annotation of biochemical Models)
  Ontologies: SBO (Systems Biology Ontology)
Systems Biology Model Simulation
  MIBBI model: MIASE (Minimum Information About a Simulation Experiment)
  Ontologies: KISAO (Kinetic Simulation Algorithm Ontology)

Templates and Vocabularies
[Diagram labels: SOP, Data, Construction, Validation, Metabolomics, Mass Spec, Transcriptomics, Proteomics, Fluxomics, Investigations, Studies, Assays]

Fitting in with Laboratory practices
Scientists can continue to do what they have always done
Scientists remain in control
Embedding semantics into the tools already in use
Excel, excel, excel...

The End Result
Ontology terms for marked-up cells in drop-down boxes

RightField Architecture
Java: platform independent
OWL API: loading ontologies and reasoning
Apache POI HSSF libraries: loading and saving of Excel spreadsheets

Availability Open source

How RightField Works, part 1
[Diagram: informaticians/ontologists use the RightField client to select a “portion” of ontology terms from an ontology and embed those terms into an Excel workbook; the marked-up workbook is saved in plain Excel for end users]

Loading Ontologies from BioPortal
Published ontologies
Multiple versions
You can also load local ontologies from file or URL

JERM = “Just Enough Results Model”
What type of data is it? Microarray, growth curve, enzyme activity…
What was measured? Gene expression, OD, metabolite concentration…
What do the values in the datasets mean? Units, time series, repeats…
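The three JERM questions above can be read as a small completeness checklist. The sketch below is only an illustration of that idea: the element names (`data_type`, `measured`, `units`) are hypothetical stand-ins chosen for this example, not the actual JERM schema.

```python
from typing import Dict, List, Optional

# Hypothetical element names standing in for the three JERM questions.
REQUIRED_ELEMENTS = ("data_type", "measured", "units")


def missing_elements(record: Dict[str, Optional[str]]) -> List[str]:
    """Return the required elements that are absent or blank in a record."""
    missing = []
    for name in REQUIRED_ELEMENTS:
        value = record.get(name)
        if value is None or not value.strip():
            missing.append(name)
    return missing


dataset = {"data_type": "growth curve", "measured": "OD", "units": ""}
print(missing_elements(dataset))  # ['units']
```

A check of this kind only says whether an element was filled in; it says nothing about whether the value came from a controlled vocabulary, which is the part RightField addresses.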

Excel workbook loaded into RightField with multiple worksheets

Class hierarchies of loaded ontologies. Multiple ontologies shown in separate tabs

Selected parent term from the ontology
Methods for specifying ontology terms
Term lists for selected cells
Value Type and Property
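As a rough illustration of how a selected parent term could yield the term list for a cell, the sketch below builds a list from a toy hierarchy and then accepts only values from that list, much as a drop-down would. The hierarchy, its labels, and the two selection methods (direct children and all descendants) are assumptions made for this example; the slides do not enumerate the methods RightField offers.

```python
from typing import Dict, List

# Hypothetical mini-hierarchy (parent term -> direct children).
# Labels are illustrative and not taken from a real ontology.
HIERARCHY: Dict[str, List[str]] = {
    "assay type": ["transcriptomics", "proteomics", "metabolomics"],
    "transcriptomics": ["microarray", "RNA sequencing"],
}


def direct_children(parent: str) -> List[str]:
    """Terms exactly one level below the selected parent."""
    return sorted(HIERARCHY.get(parent, []))


def all_descendants(parent: str) -> List[str]:
    """Every term below the selected parent, at any depth."""
    found: List[str] = []
    stack = list(HIERARCHY.get(parent, []))
    while stack:
        term = stack.pop()
        if term not in found:
            found.append(term)
            stack.extend(HIERARCHY.get(term, []))
    return sorted(found)


def is_allowed(value: str, term_list: List[str]) -> bool:
    """Mimic a drop-down: only values from the term list are accepted."""
    return value in term_list


term_list = all_descendants("assay type")
print(is_allowed("microarray", term_list))   # True
print(is_allowed("micro-array", term_list))  # False
```

Restricting entry to the list is what removes spelling variants such as "micro-array" from the annotations.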

Excel workbook with marked-up cells

Marking-up Columns or Rows

Ontology Languages
RDFS: RDF Schema
OBO: Open Biomedical Ontologies
OWL: Web Ontology Language

Provenance and Identifiers
Term Label: the human-readable term label
Term IRI: the (unique) term identifier
Ontology IRI: the ontology that defines the term
Ontology Version: the version of the ontology
Physical Location: the (web) location of the ontology
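The five items listed above could be held together as one record per embedded term. The following is a minimal sketch under that assumption; the class name, field names and every `example.org` IRI are hypothetical and do not reflect how RightField stores this information internally.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TermProvenance:
    """One embedded term plus the information needed to trace it back."""
    term_label: str         # human-readable label shown to the user
    term_iri: str           # unique identifier of the term
    ontology_iri: str       # ontology that defines the term
    ontology_version: str   # version of that ontology
    physical_location: str  # (web) location the ontology was loaded from


# Hypothetical values, for illustration only.
example = TermProvenance(
    term_label="microarray",
    term_iri="http://example.org/onto/assays#Microarray",
    ontology_iri="http://example.org/onto/assays",
    ontology_version="1.2",
    physical_location="http://example.org/downloads/assays-1.2.owl",
)

record = asdict(example)
print(sorted(record))
```

Keeping the label separate from the IRI means the displayed text can be ambiguous or change between versions while the identifier still points at one term.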

Ontology Information
Ontologies encapsulated
  Scientists can work offline
  Ensures same versions of ontologies used for a series of experiments
  No special macros or plugins required, just Excel or Open Office
Versions and URIs captured in hidden worksheets
  Provenance
  Comparisons between sheets
  Linking back to the vocabularies
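Because versions and URIs travel with each workbook, two sheets can be compared to see whether they were annotated against the same ontology versions. The sketch below assumes that the hidden-worksheet information has already been read into a plain mapping from ontology IRI to version string; that mapping, the function name and the IRIs are all hypothetical.

```python
from typing import Dict, List


def version_mismatches(sheet_a: Dict[str, str], sheet_b: Dict[str, str]) -> List[str]:
    """Return ontology IRIs used by both sheets but recorded with different versions."""
    shared = sheet_a.keys() & sheet_b.keys()
    return sorted(iri for iri in shared if sheet_a[iri] != sheet_b[iri])


# Hypothetical provenance read from two workbooks.
march = {"http://example.org/onto/assays": "1.2", "http://example.org/onto/units": "3.0"}
april = {"http://example.org/onto/assays": "1.3", "http://example.org/onto/units": "3.0"}

print(version_mismatches(march, april))  # ['http://example.org/onto/assays']
```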

Metadata Extraction and Querying
[Diagram labels: Populate, Extract, RDF Graph, Store / Reuse]
Generates RDF triples for each marked-up cell
  Simple RDF, or conforming to ontology models
Storage and querying solutions
  Virtuoso triple store
Linked data compliance
  HTML and XML interfaces and a REST API already exist
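To make "RDF triples for each marked-up cell" concrete, here is a minimal sketch that writes two N-Triples lines for one cell. It is not RightField's actual output format: the subject scheme (sheet IRI plus cell reference), the `example.org` IRIs and the `assayType` property are assumptions for illustration. Only the `rdfs:label` IRI is a real, standard identifier.

```python
from typing import List

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text: str) -> str:
    """Escape the characters that N-Triples requires escaping in a literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def cell_to_triples(sheet_iri: str, cell_ref: str, property_iri: str,
                    term_iri: str, label: str) -> List[str]:
    """Describe one marked-up cell: cell -> property -> term, plus the term's label."""
    subject = f"<{sheet_iri}#{cell_ref}>"  # assumed subject scheme
    return [
        f"{subject} <{property_iri}> <{term_iri}> .",
        f'<{term_iri}> <{RDFS_LABEL}> "{escape_literal(label)}" .',
    ]


triples = cell_to_triples(
    sheet_iri="http://example.org/data/sheet1",           # hypothetical
    cell_ref="B2",
    property_iri="http://example.org/jerm#assayType",     # hypothetical
    term_iri="http://example.org/onto/assays#Microarray", # hypothetical
    label="microarray",
)
print("\n".join(triples))
```

Lines in this form can be loaded into a triple store such as the Virtuoso store mentioned above.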

Ontology Annotations and Properties

RightField Annotation Evaluation
Does RightField improve the quantity and consistency of data annotation?
Improvements in annotation consistency
  Assay type
  Technology type
  Experimental conditions
  Factors studied
  Organism and strains

RightField Annotation Evaluation
JERM Metadata Element Scores
[Table columns: Dataset ID, RightField Template, Pre-RightField Template]

Metadata Extraction and Querying
No current ‘standard’ RDF format for MIBBI models (although it is in progress)
RDF vs ‘traditional’ relational approaches
  RDF more flexible in dealing with optional and changing metadata elements
  RDF allows aggregation between different types of experimental data
  E.g. biological samples, experimental conditions
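The flexibility and aggregation points above can be illustrated without any RDF library by treating a graph as a set of (subject, predicate, object) tuples. This is a toy sketch, not a triple store: the `ex:` names are hypothetical, and `match` is a simple pattern filter rather than SPARQL.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def match(graph: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Two hypothetical datasets of different experimental types.
transcriptomics = {
    ("ex:dataset1", "ex:organism", "ex:Yeast"),
    ("ex:dataset1", "ex:assayType", "ex:Microarray"),
}
metabolomics = {
    ("ex:dataset2", "ex:organism", "ex:Yeast"),
    ("ex:dataset2", "ex:assayType", "ex:MassSpec"),
    ("ex:dataset2", "ex:growthMedium", "ex:MinimalMedium"),  # optional element
}

# Aggregation is a set union; no shared table schema is needed.
merged = transcriptomics | metabolomics
yeast_datasets = [s for s, _, _ in match(merged, p="ex:organism", o="ex:Yeast")]
print(yeast_datasets)  # ['ex:dataset1', 'ex:dataset2']
```

The optional `ex:growthMedium` statement appears for only one dataset and needs no schema change, which is the contrast with a fixed relational table that the slide draws.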

RightField LifeCycle

Future Work
Visualising nodes with large numbers of terms
Ontology label ambiguities
Linked Data output for SysMO SEEK and related resources

Other Work Using RightField
KupKB: Kidney and Urinary Pathway knowledge base
Knowledge bases for inflammatory bowel disease and Chagas disease
BioBanking sample annotation
Annotation of historical samples
  ‘Patient records’ for Egyptian mummies, Manchester Museum

RightField Extension: Populous
Generic tool for populating ontology templates
Supports validation at the point of data entry
Expressive pattern language for OWL ontology generation
Helps biologists with ontology design patterns
Simon Jupp, Robert Stevens, University of Manchester

Summary
RightField-enabled spreadsheets show a marked increase in the consistency of annotation when compared with free text annotation or other template approaches.
The success comes from embedding the semantics in tools already in use and hiding the complexity from users.

Acknowledgements
Stuart Owen, Katy Wolstencroft, Carole Goble, Wolfgang Mueller, Olga Krebs, Matthew Horridge