RightField: Semantic Enrichment of Systems Biology Data using Spreadsheets Katy Wolstencroft myGrid, SysMO-DB University of Manchester.


1 RightField: Semantic Enrichment of Systems Biology Data using Spreadsheets Katy Wolstencroft myGrid, SysMO-DB University of Manchester

2 Outline What RightField does Origins - SysMO-DB project and data sharing in Systems Biology How RightField works Evaluation – how successfully it works Extensions and future directions

3 RightField A tool for embedding ranges of ontology terms into spreadsheets to allow the users of those spreadsheets to semantically annotate their data from simple drop-down lists A tool for automatically extracting semantically annotated metadata from spreadsheets and producing RDF
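As a rough illustration of the first point, restricting a cell to an embedded range of terms amounts to checking an entry against a fixed list. The sketch below is a simplified model in Python, not RightField's own code (RightField itself is a Java tool); the names `allowed_terms` and `validate_cell` are hypothetical, and the labels are borrowed from the data types mentioned later in the talk.

```python
# Hypothetical model of a cell restricted to a drop-down list of term labels.
# This is an illustration only; it is not how RightField is implemented.

def validate_cell(value, allowed_terms):
    """Return True if the cell value is one of the embedded term labels."""
    return value.strip() in allowed_terms

# Illustrative labels; a real template would embed labels taken from an ontology.
allowed_terms = {"microarray", "growth curve", "enzyme activity"}

print(validate_cell("growth curve", allowed_terms))
print(validate_cell("Growth-Curve (approx.)", allowed_terms))
```

The second call shows why free-text entry hurts consistency: a near miss is simply not a member of the controlled list.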

4 RightField Annotation benefits Makes annotation quicker and more efficient Standardises annotation Hides the ontology complexity from the users RDF production benefits Querying over heterogeneous data files Semantic searching and reasoning Standard format for interoperability Hides semantic web tools from end users Spreadsheets and web browsers

5 SEEK: Systems Biology Data Sharing
SysMO Consortium: Systems Biology of MicroOrganisms. Pan-European; > 100 research groups; > 320 scientists. Distributed, interdisciplinary projects, expected to pool data and results and disseminate. Microbiologists, molecular biologists, biochemists, mathematicians... not many informaticians.
The SEEK: a platform for sharing Systems Biology data and models. Web-based environment for sharing within a consortium and disseminating to the community (an eLaboratory). Standards compliant. Fitting in with laboratory practices.

6 ~ 1900 assets: People - 350, Investigations - 35, Studies - 87, Assays - 167, Data sets - 930, Models - 60, SOPs - 140, Publications - 165

7 Consortia using SEEK: SSFH, CISBIC, JenAge, SyBaCol, Rosage, Yeast Glycolysis, Forsys

8 Types of data Multiple omics: genomics, transcriptomics, proteomics, metabolomics, fluxomics, reactomics. Images. Molecular biology. Reaction kinetics. Models: metabolic, gene network, kinetic. Relationships between data sets/experiments: procedures, experiments, data, results and models. Analysis of data.

9 Minimum Information Model What is the least amount of information required to: Find, Interpret, Understand, Reuse? Different for different data sets:
CIMR: Core Information for Metabolomics Reporting
MIABE: Minimal Information About a Bioactive Entity
MIACA: Minimal Information About a Cellular Assay
MIAME: Minimum Information About a Microarray Experiment
MIAME/Env: MIAME / Environmental transcriptomic experiment
MIAME/Nutr: MIAME / Nutrigenomics
MIAME/Plant: MIAME / Plant transcriptomics
MIAME/Tox: MIAME / Toxicogenomics
MIAPA: Minimum Information About a Phylogenetic Analysis
MIAPAR: Minimum Information About a Protein Affinity Reagent
MIAPE: Minimum Information About a Proteomics Experiment
MIARE: Minimum Information About a RNAi Experiment
MIASE: Minimum Information About a Simulation Experiment

10 Not quite available “off the shelf” Loose guidelines or checklists Specific formats (generally in XML) Specific formats with associated ontologies Remaining questions for the scientists: How do we generate standards compliant data? Which vocabularies/ontologies should I use? How do I know which ontology terms to use where?

11 Data, MIBBI Model, Ontologies
Microarray: MIAME (Minimum Information about a Microarray Experiment); ontology: MGED
Proteomics: MIAPE (Minimum Information about a Proteomics Experiment); ontologies: PSI-MI, PSI-MS, PSI-MOD
Interaction experiments: MIMIX (Minimum Information about a Molecular Interaction Experiment); ontology: PSI-MI (Protein-Protein Interaction)
Systems Biology Models: MIRIAM (Minimal Information Required In the Annotation of biochemical Models); ontology: SBO (Systems Biology Ontology)
Systems Biology Model Simulation: MIASE (Minimum Information About a Simulation Experiment); ontology: KISAO (Kinetic Simulation Algorithm Ontology)

12 SOP Data Templates and Vocabularies Construction Validation SOP Metabolomics Mass Spec Transcriptomics Proteomics Fluxomics Investigations Studies Assays

13 Fitting in with Laboratory practices Scientists can continue to do what they have always done Scientists remain in control Embedding semantics into the tools already in use Excel, excel, excel.....

14 The End Result Ontology terms for marked-up cells in drop-down boxes

15 RightField Architecture Java Platform Independent OWL API Loading ontologies and reasoning Apache POI HSSF libraries Loading and saving of Excel Spreadsheets

16 Availability Open source http://www.rightfield.org.uk

17 How RightField Works – part 1 (diagram: Informaticians/ontologists load an Excel Workbook and an Ontology into the RightField Client; a “Portion” of ontology terms is Embedded into the Excel Workbook; the Marked-up workbook is Saved in plain Excel for End Users)

18

19 Loading Ontologies from BioPortal Published ontologies Multiple versions You can also load local ontologies from file or URL

20 JERM = “Just Enough Results Model” What type of data is it? Microarray, growth curve, enzyme activity… What was measured? Gene expression, OD, metabolite concentration… What do the values in the datasets mean? Units, time series, repeats…

21 Excel workbook loaded into RightField with multiple worksheets

22 Class hierarchies of loaded ontologies. Multiple ontologies shown in separate tabs

23 Selected parent term from the ontology Methods for specifying ontology terms Term lists for selected cells Value Type and Property
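To make the idea of deriving a term list from a selected parent term more concrete, here is a minimal Python sketch over a toy class hierarchy. It assumes two ways of specifying terms (direct subclasses only, or all descendants); the slide does not list the actual methods, and the hierarchy, term labels and function names are all hypothetical.

```python
# Toy class hierarchy: parent label -> direct subclass labels (hypothetical terms).
HIERARCHY = {
    "assay type": ["transcriptomics", "proteomics", "metabolomics"],
    "transcriptomics": ["microarray", "RNA-seq"],
    "proteomics": ["mass spectrometry"],
}

def direct_subclasses(parent):
    """Term list containing only the immediate children of the parent term."""
    return sorted(HIERARCHY.get(parent, []))

def all_subclasses(parent):
    """Term list containing every descendant of the parent term."""
    found = []
    stack = list(HIERARCHY.get(parent, []))
    while stack:
        term = stack.pop()
        if term not in found:
            found.append(term)
            # Descend into the children of this term, if it has any.
            stack.extend(HIERARCHY.get(term, []))
    return sorted(found)

print(direct_subclasses("assay type"))
print(all_subclasses("transcriptomics"))
```

Either list could then serve as the set of permitted values for the selected cells.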

24 Excel workbook with marked-up cells

25 Marking-up Columns or Rows

26 Ontology Languages RDFS - RDF Schema OBO - Open Biomedical Ontologies OWL - Web Ontology Language

27 Provenance and Identifiers Term Label: the human-readable term label. Term IRI: the (unique) term identifier. Ontology IRI: the ontology that defines the term. Ontology Version: the version of the ontology. Physical Location: the (web) location of the ontology.
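The five provenance fields on this slide can be pictured as one record per annotated term. The Python sketch below is only an illustration of that record; the class name, field names and every IRI in it are hypothetical placeholders, not values RightField writes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TermProvenance:
    """One record per annotated term, mirroring the fields listed on the slide."""
    term_label: str         # human-readable label
    term_iri: str           # unique term identifier
    ontology_iri: str       # ontology that defines the term
    ontology_version: str   # version of that ontology
    physical_location: str  # (web) location the ontology was loaded from

# All IRIs below are made-up placeholders under example.org.
example = TermProvenance(
    term_label="microarray",
    term_iri="http://example.org/onto#Microarray",
    ontology_iri="http://example.org/onto",
    ontology_version="http://example.org/onto/1.0",
    physical_location="http://example.org/downloads/onto-1.0.owl",
)

print(example.term_label, example.term_iri)
```

Making the record immutable (`frozen=True`) reflects the intent that provenance captured at annotation time should not change afterwards.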

28 Ontology Information Ontologies encapsulated Scientists can work offline Ensures same versions of ontologies used for a series of experiments No special macros or plugins required, just Excel or Open Office Versions and URIs captured in hidden worksheets Provenance Comparisons between sheets Linking back to the vocabularies
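Because versions and URIs travel with each workbook, comparing sheets can be as simple as checking whether the same ontology appears with more than one version. The following Python sketch assumes the provenance has already been read out of the workbooks as (ontology IRI, version) pairs; the file names, IRIs and the `mixed_versions` helper are hypothetical.

```python
from collections import defaultdict

def mixed_versions(workbooks):
    """Return ontologies that appear with more than one version across workbooks."""
    seen = defaultdict(set)
    for records in workbooks.values():
        for ontology_iri, version in records:
            seen[ontology_iri].add(version)
    return {iri: versions for iri, versions in seen.items() if len(versions) > 1}

# Hypothetical provenance read from three workbooks in one experiment series.
workbooks = {
    "run1.xls": [("http://example.org/onto", "1.0")],
    "run2.xls": [("http://example.org/onto", "1.0")],
    "run3.xls": [("http://example.org/onto", "1.1")],
}

print(mixed_versions(workbooks))
```

An empty result would indicate that the whole series was annotated against the same ontology versions.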

29 Metadata Extraction and Querying (diagram: Populate, Extract, RDF Graph, Store / Reuse) Generates RDF triples for each marked-up cell Simple RDF, or conforming to ontology models Storage and querying solutions Virtuoso triple store Linked data compliance Already HTML and XML interface and REST API
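As a hedged sketch of what "RDF triples for each marked-up cell" might look like, the Python below serialises one annotated cell as N-Triples lines using plain string formatting. The triple shape (data file as subject, the cell's property as predicate, the chosen term as object) is an assumption for illustration, as are the function names and all example.org IRIs; only the `rdfs:label` IRI is a real vocabulary term.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def cell_to_triples(dataset_iri, property_iri, term_iri, term_label):
    """Return N-Triples lines linking a data file to the term chosen in one cell."""
    return [
        f"<{dataset_iri}> <{property_iri}> <{term_iri}> .",
        f'<{term_iri}> <{RDFS_LABEL}> "{escape_literal(term_label)}" .',
    ]

# Hypothetical IRIs for the data file, the property and the ontology term.
triples = cell_to_triples(
    "http://example.org/data/1",
    "http://example.org/jerm#hasAssayType",
    "http://example.org/onto#Microarray",
    "microarray",
)
print("\n".join(triples))
```

Lines in this form can be loaded into a triple store and queried alongside triples extracted from other spreadsheets.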

30 Ontology Annotations and Properties

31 RightField Annotation Evaluation Does RightField improve the quantity and consistency of data annotation? Improvements in annotation consistency Assay type Technology type Experimental conditions Factors studied Organism and strains

32 RightField Annotation Evaluation JERM Metadata Element Scores Columns: Dataset ID, RightField Template, Pre-RightField Template Rows: 598616244 599319402 7211985 86820362 6912788

33 Metadata Extraction and Querying No current ‘standard’ RDF format for MIBBI models (although it is in progress) RDF vs ‘traditional’ relational approaches RDF more flexible in dealing with optional and changing metadata elements RDF allows aggregation between different types of experimental data E.g. biological samples, experimental conditions
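The flexibility argument can be illustrated with a toy triple list: records need not share the same set of properties, and triples from different kinds of data can simply be concatenated and queried together. This Python sketch uses plain tuples rather than an RDF library, and every identifier and value in it is hypothetical.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

def describe(triples, subject):
    """Collect whatever properties happen to be present for a subject."""
    return {p: o for _, p, o in match(triples, s=subject)}

# Two kinds of experimental data with different, partly optional, properties.
samples = [
    ("sample1", "organism", "yeast"),
    ("sample1", "growthTemperature", "30"),
    ("sample2", "organism", "yeast"),  # no temperature recorded
]
arrays = [
    ("array7", "derivedFrom", "sample1"),
    ("array7", "assayType", "microarray"),
]

merged = samples + arrays  # aggregation is just the union of the triples
print(describe(merged, "sample2"))
print(match(merged, p="derivedFrom"))
```

In a relational layout the missing temperature would need a nullable column or a schema change, whereas here the triple is simply absent.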

34 RightField LifeCycle

35 Future Work Visualising nodes with large numbers of terms Ontology label ambiguities Linked Data output for SysMO SEEK and related resources

36 Other Work Using RightField KupKB – Kidney and Urinary Pathway knowledge base (http://www.kupkb.org) Knowledge bases for inflammatory bowel disease and Chagas disease BioBanking sample annotation Annotation of historical samples ‘Patient records’ for Egyptian mummies, Manchester Museum

37 RightField Extension: Populous Generic tool for populating ontology templates Supports validation at the point of data entry Expressive Pattern language for OWL Ontology generation Helps biologists with ontology design patterns http://www.e-lico.eu/populous Simon Jupp, Robert Stevens, University of Manchester

38 Summary RightField-enabled spreadsheets show a marked increase in the consistency of annotation when compared with free text annotation or other template approaches. Success from embedding and hiding semantics and complexity

39 Acknowledgements Stuart Owen, Katy Wolstencroft, Carole Goble, Wolfgang Mueller, Olga Krebs, Matthew Horridge http://www.sysmo-db.org

