Describing and Annotating Experimental Data: Hands On.


1 Describing and Annotating Experimental Data: Hands On

2 Objectives: To explore and understand semantic resources available for data annotation, and to try tools designed to assist both data integration specialists and end-user scientists.

3 Topics: Exploring ontology resources available for data annotation; exploring a semantic application that makes use of these resources; using spreadsheet templates for the semantic annotation of data; creating spreadsheet templates for semantic data annotation.

4 1: Exploring Ontologies. Go to BioPortal (http://bioportal.bioontology.org/). Which ontologies are currently the most popular? What are they for? What format are they described in? In the top banner of the website, select ‘Browse’. Filter for all OWL ontologies and explore the EDAM ontology. What is its purpose? Which projects is it used in?

5 2: Exploring BioPortal. Find the SysMO JERM ontology in BioPortal. What is it about? We will use the JERM and other ontologies in later exercises.
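
The same ontology lookups can also be scripted rather than done through the web pages. The sketch below only builds a search URL for the BioPortal REST service; it assumes the service lives at https://data.bioontology.org, that it accepts `q`, `ontologies` and `apikey` query parameters, and that JERM and EDAM are registered under the acronyms `JERM` and `EDAM`. Check these assumptions against the BioPortal API documentation, and replace the placeholder key with your own.

```python
from urllib.parse import urlencode

# Assumed base URL of the BioPortal REST service (verify in the API docs).
BIOPORTAL_API = "https://data.bioontology.org"


def build_search_url(query, api_key, ontologies=None):
    """Return a term-search URL, optionally restricted to some ontologies."""
    params = {"q": query, "apikey": api_key}
    if ontologies:
        # Ontologies are assumed to be identified by acronym, comma-separated.
        params["ontologies"] = ",".join(ontologies)
    return f"{BIOPORTAL_API}/search?{urlencode(params)}"


# "YOUR_API_KEY" is a placeholder, not a working key.
url = build_search_url("transcriptomics", "YOUR_API_KEY", ["JERM", "EDAM"])
print(url)
```

Opening the printed URL in a browser (with a valid key) should return the matching terms as JSON, which is the same information the ‘Browse’ and search pages display.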

6 3: Other Ontology Resources. BioPortal allows anyone to upload and share their ontologies. The EBI Ontology Lookup Service (OLS) maintains a list of ontologies used by, or useful with, EBI data. Explore the EDAM ontology in the EBI OLS: http://www.ebi.ac.uk/ols/beta

7 4: Exploring Experimental Data. Go to the FAIRDOM Hub (https://fairdomhub.org/). As an anonymous visitor, you will only be able to see things that are open to the public. Find all models associated with yeast. Find all metabolomics experiments. These queries use the semantic annotation automatically extracted during data uploads.

8 4: Exploring Experimental Data. In one of the metabolomics assay descriptions, click on the ‘Assay Type: Metabolomics’ link. At the top of the page, use the arrow keys to explore the Experimental Assay Type hierarchy and find out how many Gene Expression Profiling experiments are publicly available.

9 5: Using RightField-Enabled Spreadsheets. Find the investigation ‘Central Carbon Metabolism of Sulfolobus solfataricus’ in the FAIRDOM Hub. How many data files are associated with this investigation? Find and download the GAPDH kinetic data from this investigation. This file has been annotated with RightField. Explore the drop-down lists in yellow. These are terms from the JERM ontology. When the file is uploaded to the FAIRDOM Hub, these values are automatically extracted as RDF triples.

10 6: Model Annotation. Go to the ‘reconstituted gluconeogenesis system in S. solfataricus’ model from the same investigation. The model is in SBML format (Systems Biology Markup Language) and semantically annotated with terms from GO, ChEBI, SBO and others. We can use these annotations to find other data files that might be relevant input or validation for the model. Click on ‘Find related data files’. How many data files overlap with this model?

11 6: Model Annotation. Go back to the ‘reconstituted gluconeogenesis system in S. solfataricus’ model page. Select ‘simulate model’; this gives you an interactive view of the model. Mouse over the model and right-click on atp. Follow the annotation links to find the KEGG entry for ATP. This view also semantically links to structural information and triggers searches in remote databases.
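
To see what the annotation links behind a species such as atp look like, it can help to read them straight out of the SBML. The sketch below uses a small, made-up SBML fragment (it is not the FAIRDOM Hub model) that follows the usual SBML convention of RDF annotations with `bqbiol:is` qualifiers, and lists the annotation URIs attached to each species. The identifiers.org URIs for ATP in ChEBI and KEGG are used for illustration only.

```python
import xml.etree.ElementTree as ET

# Illustrative SBML fragment; NOT taken from the model used in this exercise.
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="demo">
    <listOfSpecies>
      <species id="atp" metaid="meta_atp" compartment="cell">
        <annotation>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                   xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
            <rdf:Description rdf:about="#meta_atp">
              <bqbiol:is>
                <rdf:Bag>
                  <rdf:li rdf:resource="http://identifiers.org/chebi/CHEBI:15422"/>
                  <rdf:li rdf:resource="http://identifiers.org/kegg.compound/C00002"/>
                </rdf:Bag>
              </bqbiol:is>
            </rdf:Description>
          </rdf:RDF>
        </annotation>
      </species>
    </listOfSpecies>
  </model>
</sbml>"""

NS = {
    "sbml": "http://www.sbml.org/sbml/level2/version4",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
# ElementTree stores namespaced attributes as "{namespace}name".
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"


def species_annotations(sbml_text):
    """Map each species id to the list of annotation URIs attached to it."""
    root = ET.fromstring(sbml_text)
    result = {}
    for species in root.iterfind(".//sbml:species", NS):
        uris = [li.get(RDF_RESOURCE) for li in species.iterfind(".//rdf:li", NS)]
        result[species.get("id")] = uris
    return result


for species_id, uris in species_annotations(SBML).items():
    print(species_id, uris)
```

A real model may use a different SBML level and version, in which case the `sbml` namespace in `NS` has to be changed to match the file.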

12 7: Using RightField. Download and install RightField according to the instructions on the website (http://www.rightfield.org.uk). Download the GEOArray Excel file here: http://www.myexperiment.org/files/1412.html This is a data upload template from the GEO website (Gene Expression Omnibus, http://www.ncbi.nlm.nih.gov/geo/). We are going to mark up this template with terms from appropriate common vocabularies (in this case, the MGED ontology and the SysMO-JERM ontology).

13 7: Creating a new RightField Template. Open RightField. It will ask you if you want to start with a spreadsheet you have already created. Select ‘yes’ and navigate to the GEO template you just downloaded. Go to ‘File’ and ‘Open from BioPortal’. In the pop-up window, search for ‘Microarray’ and load the Microarray and Gene Expression Data ontology (MGED ontology). Repeat the process to load the SysMO-JERM ontology.

14 7: Creating a new RightField Template. As you can see, some cells have already been marked up for you (shown in green). We can now mark up the rest. Select the cell to the right of Assay_type in the spreadsheet. We will set this field to any transcriptomics assay type (from the JERM ontology). In the search box in the top right-hand corner, search for ‘AssayType’ and navigate down the tree until you find ‘Transcriptomics’. Select “Subclasses” from the types of allowed values. You will now see all the possible types of transcriptomics displayed at the bottom of the screen. Tick the box to include a property and select “hasType”. Click ‘apply’. Now this field is a drop-down box of all transcriptomics types.
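
The “Subclasses” option can be thought of as collecting every term that sits below the selected class in the ontology tree. The sketch below illustrates that idea on a tiny, hypothetical hierarchy; the term names are invented for the example and are not the real JERM classes, and it assumes that all descendants (not only direct children) are offered in the drop-down.

```python
# Hypothetical fragment of an assay-type hierarchy, stored as child -> parent.
# These names are illustrative only, not the actual JERM terms.
PARENT = {
    "Transcriptomics": "AssayType",
    "Metabolomics": "AssayType",
    "Microarray": "Transcriptomics",
    "RNA-Seq": "Transcriptomics",
    "Two-colour microarray": "Microarray",
}


def subclasses(term, parent_of):
    """Return all direct and indirect subclasses of `term`, sorted by name."""
    # Invert the child -> parent map into parent -> [children].
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)

    # Walk down the tree from `term`, collecting every descendant.
    found, stack = [], [term]
    while stack:
        for child in children.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return sorted(found)


# The values that would populate the drop-down for this toy hierarchy.
print(subclasses("Transcriptomics", PARENT))
```

A class with no children, such as "Metabolomics" in this toy hierarchy, yields an empty list, which corresponds to an empty drop-down.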

15 7: Creating a new RightField Template. Repeat this process for the cell to the right of ‘Project’. This time select “instances” from the allowed values (if you accidentally select ‘subclasses’, you will see there are none this time; this term simply lists all the individual project names in the SysMO consortium). Tick the box to include a property, and select “is_investigated_by”. Click ‘apply’ and this field will be marked up with a drop-down list of projects.

16 7: Creating a new RightField Template. Repeat the process for technology type and organism (from the JERM ontology), and for experiment design type and quality control type (from the MGED ontology). If you wish, you can also mark up some sample characteristics, using both ontologies. HINT: You can mark up ranges of spreadsheet cells (e.g. whole columns or rows). If you don’t know which property is appropriate, you can leave this part blank; SEEK will add the default property “hasAssociatedItem” in these cases. HINT: If you want to know which properties can be selected, explore the JERM and MGED ontologies in an ontology editor of your choice (e.g. TopBraid Composer or Protégé).

17 7: Creating a new RightField Template. When the template is finished, save it and open it in Excel. You will now see yellow drop-down boxes of terms for everything you set in RightField. Now you can share this file with your colleagues. Experiment annotation will be uniform and standards-compliant across your project. Nobody else needs to use RightField or navigate the ontologies; they just need to select values in the spreadsheet.

18 8: Extracting RDF. RightField has a dual function: as well as setting up the semantic content of spreadsheets, it can generate and extract the RDF of annotated data. Using Excel, fill in the spreadsheet template you have just created with values of your choosing. Save the annotated spreadsheet with a different name. Open it in RightField once more. Go to File -> Extract RDF. In the pop-up window, add an appropriate root identifier URI (you can use the same root as in the RDF session, followed by JERM/dataset). View and save the RDF you created.
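
To make the role of the root identifier clearer, the sketch below shows one plausible way an annotated spreadsheet could be turned into RDF: the root URI plus a dataset name gives the subject, the property chosen in RightField gives the predicate, and the selected ontology term gives the object. This is an illustration only, not RightField's actual output format. The `example.org` namespaces, the dataset name and the term URIs are all hypothetical stand-ins.

```python
from urllib.parse import quote


def to_ntriples(root_uri, dataset_id, annotations):
    """Serialise (property URI, term URI) pairs as N-Triples lines.

    The subject is built from the root identifier and the dataset id.
    """
    subject = f"{root_uri.rstrip('/')}/{quote(dataset_id)}"
    lines = [
        f"<{subject}> <{predicate}> <{term}> ."
        for predicate, term in annotations
    ]
    return "\n".join(lines)


# Hypothetical root identifier and namespace; substitute your own.
ROOT = "http://example.org/JERM/dataset"
JERM = "http://example.org/jerm#"

# One pair per annotated cell: the property and the term picked in the drop-down.
cell_annotations = [
    (JERM + "hasType", JERM + "Microarray"),
    (JERM + "hasAssociatedItem", JERM + "Saccharomyces_cerevisiae"),
]

rdf_text = to_ntriples(ROOT, "geo_array_1", cell_annotations)
print(rdf_text)
```

Comparing this with the RDF that RightField displays shows how the root identifier you enter ends up in the subject of every extracted statement.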

