Published by Lesley Blair. Modified over 9 years ago.
1
RightField: Semantic Enrichment of Systems Biology Data Using Spreadsheets
Katy Wolstencroft
myGrid, SysMO-DB, University of Manchester
2
Outline
- What RightField does
- Origins: the SysMO-DB project and data sharing in Systems Biology
- How RightField works
- Evaluation: how well it works
- Extensions and future directions
3
RightField
- A tool for embedding ranges of ontology terms into spreadsheets, so that the users of those spreadsheets can semantically annotate their data by choosing from simple drop-down lists
- A tool for automatically extracting semantically annotated metadata from spreadsheets and producing RDF
4
RightField
Annotation benefits:
- Makes annotation quicker and more efficient
- Standardises annotation
- Hides ontology complexity from users
RDF production benefits:
- Querying over heterogeneous data files
- Semantic searching and reasoning
- Standard format for interoperability
- Hides Semantic Web tools from end users, who work only with spreadsheets and web browsers
5
SEEK: Systems Biology Data Sharing
SysMO Consortium (Systems Biology of MicroOrganisms)
- Pan-European: more than 100 research groups and more than 320 scientists
- Distributed, interdisciplinary projects
- Expected to pool data and results, and to disseminate them
- Microbiologists, molecular biologists, biochemists, mathematicians... but not many informaticians
The SEEK
- A platform for sharing Systems Biology data and models
- A web-based environment for sharing within a consortium and disseminating to the community (an eLaboratory)
- Standards compliant
- Fits in with laboratory practices
6
~1,900 assets
- People: 350
- Investigations: 35
- Studies: 87
- Assays: 167
- Data sets: 930
- Models: 60
- SOPs: 140
- Publications: 165
7
Consortia using SEEK
SSFH, CISBIC, JenAge, SyBaCol, Rosage, Yeast Glycolysis, Forsys
8
Types of Data
- Multiple omics: genomics, transcriptomics, proteomics, metabolomics, fluxomics, reactomics
- Images
- Molecular biology
- Reaction kinetics
- Models: metabolic, gene network, kinetic
- Relationships between data sets/experiments: procedures, experiments, data, results and models
- Analysis of data
9
Minimum Information Model
What is the least amount of information required to find, interpret, understand and reuse data? The answer is different for different data sets.
- CIMR: Core Information for Metabolomics Reporting
- MIABE: Minimal Information About a Bioactive Entity
- MIACA: Minimal Information About a Cellular Assay
- MIAME: Minimum Information About a Microarray Experiment
- MIAME/Env: MIAME / Environmental transcriptomic experiment
- MIAME/Nutr: MIAME / Nutrigenomics
- MIAME/Plant: MIAME / Plant transcriptomics
- MIAME/Tox: MIAME / Toxicogenomics
- MIAPA: Minimum Information About a Phylogenetic Analysis
- MIAPAR: Minimum Information About a Protein Affinity Reagent
- MIAPE: Minimum Information About a Proteomics Experiment
- MIARE: Minimum Information About a RNAi Experiment
- MIASE: Minimum Information About a Simulation Experiment
10
Not quite available "off the shelf"
Minimum information standards exist as:
- Loose guidelines or checklists
- Specific formats (generally in XML)
- Specific formats with associated ontologies
Remaining questions for the scientists:
- How do we generate standards-compliant data?
- Which vocabularies/ontologies should I use?
- How do I know which ontology terms to use where?
11
| Data | MIBBI Model | Ontologies |
|---|---|---|
| Microarray | MIAME: Minimum Information about a Microarray Experiment | MGED |
| Proteomics | MIAPE: Minimum Information about a Proteomics Experiment | PSI-MI, PSI-MS, PSI-MOD |
| Interaction experiments (protein-protein interaction) | MIMIX: Minimum Information about a Molecular Interaction Experiment | PSI-MI |
| Systems Biology models | MIRIAM: Minimal Information Required In the Annotation of biochemical Models | SBO: Systems Biology Ontology |
| Systems Biology model simulation | MIASE: Minimum Information About a Simulation Experiment | KISAO: Kinetic Simulation Algorithm Ontology |
12
Data Templates and Vocabularies
- Construction and validation of SOPs and data templates
- Areas: metabolomics, mass spec, transcriptomics, proteomics, fluxomics
- Organised by investigations, studies and assays
13
Fitting in with Laboratory Practices
- Scientists can continue to do what they have always done
- Scientists remain in control
- Embed semantics into the tools already in use: Excel, Excel, Excel...
14
The End Result
Ontology terms for marked-up cells in drop-down boxes
15
RightField Architecture
- Java: platform independent
- OWL API: loading ontologies and reasoning
- Apache POI HSSF libraries: loading and saving Excel spreadsheets
16
Availability
Open source: http://www.rightfield.org.uk
17
How RightField Works: Part 1
- Informaticians/ontologists use the RightField client to take an Excel workbook and an ontology, select a "portion" of ontology terms, and embed those terms into the workbook
- The marked-up workbook is saved as plain Excel and handed to end users
19
Loading Ontologies from BioPortal
- Published ontologies
- Multiple versions
- Local ontologies can also be loaded from a file or URL
20
JERM = "Just Enough Results Model"
- What type of data is it? Microarray, growth curve, enzyme activity...
- What was measured? Gene expression, OD (optical density), metabolite concentration...
- What do the values in the dataset mean? Units, time series, repeats...
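The three JERM questions work like a small minimum-information checklist. A minimal sketch of a completeness check, assuming a hypothetical flat metadata record whose field names (`data_type`, `measured`, `value_meaning`) are invented for illustration and are not JERM's actual schema:

```python
# Hypothetical field names standing in for the three JERM questions;
# they are illustrative and not taken from the JERM specification.
JERM_QUESTIONS = {
    "data_type": "What type of data is it?",
    "measured": "What was measured?",
    "value_meaning": "What do the values in the dataset mean?",
}


def missing_jerm_fields(record):
    """Return the JERM fields that are absent or blank in `record`, in question order."""
    missing = []
    for field in JERM_QUESTIONS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


example = {"data_type": "Microarray", "measured": "Gene expression", "value_meaning": " "}
print(missing_jerm_fields(example))  # ['value_meaning']
```

A whitespace-only entry counts as missing, which is the usual failure mode of free-text metadata columns.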
21
Excel workbook loaded into RightField with multiple worksheets
22
Class hierarchies of loaded ontologies. Multiple ontologies shown in separate tabs
23
- Selected parent term from the ontology
- Methods for specifying ontology terms
- Term lists for selected cells
- Value type and property
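Choosing a parent term and a method for specifying terms amounts to walking the class hierarchy below that parent. A minimal sketch, assuming a hypothetical toy hierarchy stored as a parent-to-children dictionary and two illustrative modes (direct subclasses only, or all descendants); RightField takes its hierarchies from the loaded ontologies, not from a dictionary like this:

```python
from collections import deque

# Hypothetical toy class hierarchy: parent label -> direct subclass labels.
HIERARCHY = {
    "assay type": ["transcriptomics", "proteomics", "metabolomics"],
    "transcriptomics": ["microarray", "RNA-seq"],
    "metabolomics": ["targeted metabolomics"],
}


def term_range(parent, hierarchy, direct_only=False):
    """Collect the terms offered in a drop-down for `parent`.

    direct_only=True returns only the immediate subclasses; otherwise all
    descendants are returned in breadth-first order, each listed once even
    if it is reachable through several parents.
    """
    if direct_only:
        return list(hierarchy.get(parent, []))
    seen, ordered = set(), []
    queue = deque(hierarchy.get(parent, []))
    while queue:
        term = queue.popleft()
        if term in seen:
            continue
        seen.add(term)
        ordered.append(term)
        queue.extend(hierarchy.get(term, []))
    return ordered


print(term_range("assay type", HIERARCHY, direct_only=True))
print(term_range("assay type", HIERARCHY))
```

The `seen` set matters because biomedical ontologies often use multiple inheritance; without it a term could appear twice in one drop-down. The parent term itself is deliberately left out of the list in this sketch.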
24
Excel workbook with marked-up cells
25
Marking-up Columns or Rows
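Whether a term list is attached to one cell, a column or a row, it covers a rectangular cell range. In a spreadsheet such drop-downs are typically implemented as list-type data validation; the sketch below only models the check in plain Python, assuming A1-style ranges and a hypothetical `MARKUP` dictionary from range to permitted terms. It is not RightField's code.

```python
import re

_CELL = re.compile(r"^([A-Z]+)(\d+)$")


def parse_cell(ref):
    """Convert an A1-style reference such as 'B7' to (column_index, row), with A = 1."""
    match = _CELL.match(ref.upper())
    if not match:
        raise ValueError(f"not an A1 cell reference: {ref!r}")
    letters, row = match.groups()
    col = 0
    for ch in letters:  # base-26 column letters: A=1 ... Z=26, AA=27
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return col, int(row)


def in_range(cell, cell_range):
    """True if `cell` lies inside a rectangular range such as 'B2:B100' or 'A5:F5'."""
    start, end = cell_range.split(":")
    (c1, r1), (c2, r2) = parse_cell(start), parse_cell(end)
    c, r = parse_cell(cell)
    return min(c1, c2) <= c <= max(c1, c2) and min(r1, r2) <= r <= max(r1, r2)


def check_entry(cell, value, markup):
    """Return True if `value` is allowed in `cell` under the given mark-up.

    Cells outside every marked-up range accept any value (free text).
    """
    for cell_range, terms in markup.items():
        if in_range(cell, cell_range):
            return value in terms
    return True


# Hypothetical mark-up: column B, rows 2-100, restricted to three terms.
MARKUP = {"B2:B100": ["microarray", "growth curve", "enzyme activity"]}
print(check_entry("B3", "micro array", MARKUP))  # False
```

Marking up a row instead of a column only changes the range string, for example 'A5:F5'.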
26
Ontology Languages
- RDFS: RDF Schema
- OBO: Open Biomedical Ontologies
- OWL: Web Ontology Language
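OBO ontologies are commonly distributed in the OBO flat-file format. To show what a loader has to pull out of such a file, here is a minimal stdlib-only sketch that reads `[Term]` stanzas and keeps only the `id`, `name` and `is_a` tags. It is not the OWL API's OBO parser, it ignores the rest of the format, and the `EX:` identifiers are hypothetical.

```python
def parse_obo_terms(text):
    """Parse [Term] stanzas into {id: {"name": ..., "is_a": [...]}}.

    Only the id, name and is_a tags are read; other stanzas (e.g. [Typedef])
    and other tags are ignored.
    """
    terms, current, in_term = {}, None, False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            in_term = line == "[Term]"
            current = None
            continue
        if not in_term or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        value = value.split(" !", 1)[0].strip()  # drop trailing "! label" comments
        if tag == "id":
            current = terms.setdefault(value, {"name": None, "is_a": []})
        elif current is not None and tag == "name":
            current["name"] = value
        elif current is not None and tag == "is_a":
            current["is_a"].append(value)
    return terms


SAMPLE = """format-version: 1.2

[Term]
id: EX:0001
name: assay type

[Term]
id: EX:0002
name: transcriptomics
is_a: EX:0001 ! assay type

[Typedef]
id: part_of
name: part of
"""
print(parse_obo_terms(SAMPLE)["EX:0002"])
```

The `is_a` links recovered here are exactly the parent-child edges needed to build a class hierarchy for term selection.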
27
Provenance and Identifiers
- Term label: the human-readable term label
- Term IRI: the unique term identifier
- Ontology IRI: the ontology that defines the term
- Ontology version: the version of the ontology
- Physical location: the (web) location of the ontology
28
Ontology Information
- Ontologies are encapsulated in the workbook:
  - Scientists can work offline
  - The same ontology versions are used across a series of experiments
  - No special macros or plugins are required, just Excel or OpenOffice
- Versions and URIs are captured in hidden worksheets, supporting:
  - Provenance
  - Comparisons between sheets
  - Linking back to the vocabularies
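Keeping the five provenance fields with each embedded term is what lets a workbook show plain labels while still resolving them to IRIs later. A minimal sketch, assuming a hypothetical hidden-sheet layout of one header row plus one row per term; the field names and `example.org` IRIs are illustrative, and RightField's actual hidden-worksheet format is not described in these slides.

```python
from dataclasses import astuple, dataclass, fields


@dataclass(frozen=True)
class TermProvenance:
    """Provenance for one embedded term (field names are illustrative)."""
    label: str              # human-readable term label
    term_iri: str           # unique term identifier
    ontology_iri: str       # ontology that defines the term
    ontology_version: str   # version of that ontology
    physical_location: str  # (web) location the ontology was loaded from


def hidden_sheet_rows(terms):
    """Lay out provenance records as rows for a hidden worksheet, header first."""
    header = [f.name for f in fields(TermProvenance)]
    return [header] + [list(astuple(t)) for t in terms]


def label_to_iri(rows):
    """Read hidden-sheet rows back into a label -> term IRI lookup."""
    header, *body = rows
    label_col, iri_col = header.index("label"), header.index("term_iri")
    return {row[label_col]: row[iri_col] for row in body}


term = TermProvenance(
    label="microarray",
    term_iri="http://example.org/onto#Microarray",
    ontology_iri="http://example.org/onto",
    ontology_version="1.0",
    physical_location="http://example.org/onto.owl",
)
rows = hidden_sheet_rows([term])
print(label_to_iri(rows))
```

Storing the version next to the IRI is what makes it possible to tell whether two sheets were annotated against the same release of an ontology.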
29
Metadata Extraction and Querying
Workflow: Populate → Extract → RDF graph → Store / reuse
- Generates RDF triples for each marked-up cell
- Simple RDF, or RDF conforming to ontology models
- Storage and querying solutions:
  - Virtuoso triple store
  - Linked Data compliance
  - Already available: HTML and XML interfaces and a REST API
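For the "simple RDF" case, each marked-up cell can become one triple: the cell as subject, the property chosen during mark-up as predicate, and the selected term's IRI as object. A minimal stdlib-only sketch; the subject naming scheme, the `hasTechnologyType` property and the `example.org` IRIs are hypothetical, and plain N-Triples output is an assumption rather than RightField's documented format. Loading into a store such as Virtuoso is not shown.

```python
from urllib.parse import quote


def cell_triples(dataset_iri, sheet, cells):
    """Yield one (subject, predicate, object) triple per marked-up cell.

    `cells` holds (cell_ref, property_iri, term_iri) tuples. The subject
    pattern <dataset>/<sheet>/<cell> is a hypothetical naming scheme; the
    sheet name is percent-encoded so spaces cannot break the IRI.
    """
    for cell_ref, property_iri, term_iri in cells:
        subject = f"{dataset_iri}/{quote(sheet, safe='')}/{cell_ref}"
        yield subject, property_iri, term_iri


def to_ntriples(triples):
    """Serialise IRI-only triples, one N-Triples statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


marked_up = [
    ("B3", "http://example.org/jerm#hasTechnologyType", "http://example.org/onto#Microarray"),
]
print(to_ntriples(cell_triples("http://example.org/dataset/42", "Sheet 1", marked_up)))
```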
30
Ontology Annotations and Properties
31
RightField Annotation Evaluation
Does RightField improve the quantity and consistency of data annotation?
Improvements in annotation consistency for:
- Assay type
- Technology type
- Experimental conditions
- Factors studied
- Organism and strains
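The slides do not spell out how quantity and consistency were scored. One simple, hypothetical way to make the two questions concrete is to count, for a single metadata element such as organism, how many datasets filled it in (quantity) and how many distinct spellings they used (consistency). The values below are invented for illustration and are not the evaluation data.

```python
def annotation_stats(values):
    """Return (filled, distinct) for one metadata element across datasets.

    `filled` counts non-blank entries (quantity); `distinct` counts the
    different exact spellings used (fewer means more consistent annotation).
    """
    filled = [v.strip() for v in values if v and v.strip()]
    return len(filled), len(set(filled))


# Invented values for the "organism" element, for illustration only.
free_text = ["E. coli", "E.coli", "Escherichia coli", "", "e coli"]
drop_down = ["Escherichia coli"] * 4 + [""]

print(annotation_stats(free_text))  # (4, 4)
print(annotation_stats(drop_down))  # (4, 1)
```

Exact string comparison is deliberately strict: it treats "E. coli" and "E.coli" as different, which is precisely the inconsistency a drop-down list removes.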
32
RightField Annotation Evaluation
JERM Metadata Element Scores
Dataset ID | RightField Template | Pre-RightField Template
598616244 599319402 7211985 86820362 6912788
33
Metadata Extraction and Querying
- No current "standard" RDF format for MIBBI models (although one is in progress)
- RDF vs "traditional" relational approaches:
  - RDF is more flexible in dealing with optional and changing metadata elements
  - RDF allows aggregation across different types of experimental data, e.g. biological samples and experimental conditions
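The flexibility and aggregation points can be illustrated with triples held as plain Python tuples: a dataset that records an extra, optional element simply contributes an extra triple, and combining datasets is a set union. A minimal sketch with hypothetical `ex:` identifiers; a real deployment would use an RDF store and SPARQL rather than this in-memory pattern matcher.

```python
def match(graph, s=None, p=None, o=None):
    """Return the triples in `graph` matching a pattern, sorted; None is a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Two hypothetical datasets with different metadata elements.
transcriptomics = {
    ("ex:array1", "ex:organism", "ex:Ecoli"),
    ("ex:array1", "ex:technologyType", "ex:Microarray"),
}
growth_curves = {
    ("ex:growth1", "ex:organism", "ex:Ecoli"),
    ("ex:growth1", "ex:technologyType", "ex:GrowthCurve"),
    ("ex:growth1", "ex:temperature", "37"),  # optional element only this dataset records
}

# Aggregation is a set union: no shared table schema has to be agreed first.
graph = transcriptomics | growth_curves

print([s for s, _, _ in match(graph, p="ex:organism", o="ex:Ecoli")])
# ['ex:array1', 'ex:growth1']
```

In a relational design the optional temperature would need a nullable column or a schema change; here it is just one more triple.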
34
RightField LifeCycle
35
Future Work
- Visualising nodes with large numbers of terms
- Ontology label ambiguities
- Linked Data output for SysMO SEEK and related resources
36
Other Work Using RightField
- KupKB: Kidney and Urinary Pathway Knowledge Base (http://www.kupkb.org)
- Knowledge bases for inflammatory bowel disease and Chagas disease
- BioBanking sample annotation
- Annotation of historical samples: "patient records" for Egyptian mummies, Manchester Museum
37
RightField Extension: Populous
- Generic tool for populating ontology templates
- Supports validation at the point of data entry
- Expressive pattern language for OWL ontology generation
- Helps biologists with ontology design patterns
http://www.e-lico.eu/populous
Simon Jupp, Robert Stevens, University of Manchester
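Populating an ontology template from spreadsheet rows, with validation at the point of entry, can be sketched in a few lines. This is not Populous's pattern language: the pattern below is a hypothetical string template that emits OWL Manchester syntax, and the class and property names (`CellType`, `part_of`, `Podocyte`, `Glomerulus`) are illustrative.

```python
from string import Template

# Hypothetical pattern written in OWL Manchester syntax; not Populous's own language.
PATTERN = Template("Class: $cell\n    SubClassOf: CellType, part_of some $anatomy")


def apply_pattern(rows, allowed_anatomy):
    """Turn spreadsheet rows into Manchester-syntax class frames.

    Each row is (cell_name, anatomy_term). Rows whose anatomy term is not in
    the permitted set are rejected, mimicking validation at data entry.
    """
    frames, errors = [], []
    for cell, anatomy in rows:
        if anatomy not in allowed_anatomy:
            errors.append(f"{cell}: '{anatomy}' is not a permitted anatomy term")
            continue
        frames.append(PATTERN.substitute(cell=cell, anatomy=anatomy))
    return frames, errors


frames, errors = apply_pattern(
    [("Podocyte", "Glomerulus"), ("MysteryCell", "Kidny")],
    {"Glomerulus", "Kidney"},
)
print(frames[0])
print(errors)
```

Separating the pattern from the rows is the design point: a biologist fills in rows, while the ontology engineer controls the axiom shape once.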
38
Summary
- RightField-enabled spreadsheets show a marked increase in annotation consistency compared with free-text annotation or other template approaches
- The success comes from embedding semantics while hiding their complexity
39
Acknowledgements
Stuart Owen, Katy Wolstencroft, Carole Goble, Wolfgang Mueller, Olga Krebs, Matthew Horridge
http://www.sysmo-db.org