Capture, integration, and sharing of functional genomic data Steve Oliver Professor of Genomics School of Biological Sciences University of Manchester.

What are biologists interested in?
–Complete organisms are much too complicated.
–Only very well understood systems have well-defined pathways.
–Many biologists focus on one or a small number of genes.

GENOME TRANSCRIPTOME PROTEOME METABOLOME

The nature of proteomics experiment data
Sample generation
–Origin of sample: hypothesis, organism, environment, preparation, paper citations
Sample processing
–Gels (1D/2D) and columns: images, gel type and ranges, band/spot coordinates; stationary and mobile phases, flow rate, temperature, fraction details
Mass spectrometry
–Machine type, ion source, voltages
In silico analysis
–Peak lists, database name + version, partial sequence, search parameters, search hits, accession numbers

A Systematic Approach to Modelling, Capturing and Disseminating Proteomics Experimental Data

The PEDRo UML schema in reduced form

The Framework Around PEDRo
1. Lab-generated data is encoded using the PEDRo data entry tool, producing an XML (PEML) file for local storage or submission.
2. Locally stored PEML files may be viewed in a web browser (with XSLT), allowing web pages to be quickly generated from datasets.
3. Upon receipt of a PEML file at the repository site, a validation tool checks the file before entering it into the database.
4. The repository (a relational database) holds submitted data, allowing various analyses to be performed, or data to be extracted as a PEML file or another format.
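Step 3 (validate, then load) can be sketched roughly as follows. This is an illustration only: the element names `sample_id` and `organism`, the `sample` table and the helper names are hypothetical stand-ins, not the real PEML schema or the PEDRo validation tool.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical required elements; the real PEML schema is far richer.
REQUIRED = ("sample_id", "organism")


def validate(peml_text):
    """Return a list of problems; an empty list means the file passes."""
    try:
        root = ET.fromstring(peml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    for tag in REQUIRED:
        node = root.find(tag)
        if node is None or not (node.text or "").strip():
            problems.append(f"missing or empty <{tag}>")
    return problems


def submit(conn, peml_text):
    """Validate first; only files that pass are entered into the database."""
    problems = validate(peml_text)
    if problems:
        return problems
    root = ET.fromstring(peml_text)
    conn.execute(
        "INSERT INTO sample (sample_id, organism) VALUES (?, ?)",
        (root.findtext("sample_id").strip(), root.findtext("organism").strip()),
    )
    return []


# An in-memory relational database stands in for the repository.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (sample_id TEXT PRIMARY KEY, organism TEXT)")

good = "<experiment><sample_id>S1</sample_id><organism>yeast</organism></experiment>"
bad = "<experiment><sample_id>S2</sample_id></experiment>"

good_result = submit(conn, good)
bad_result = submit(conn, bad)
stored = conn.execute("SELECT sample_id, organism FROM sample").fetchall()
```

Rejected files never reach the database, so the repository only ever holds data that passed the check.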

INTEGRATION

Why integrate data?
–“These 200 genes are up-regulated in my experiment. Are any of their protein products known to interact?”
–Data are stored at a variety of sites and in a variety of formats.
–Existing databases (MIPS, SGD, BIND, SCPD, KEGG) are designed mainly for browsing.
–Need databases that allow complex queries.
–Need to be easily usable by biologists.
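The example question above joins two kinds of data: an expression result and a set of known protein interactions. A minimal sketch of that query, once both are held in one place, might look like this. The gene identifiers and the interaction pairs are made-up placeholders, and `interactions_within` is a hypothetical helper, not part of any database named here.

```python
def interactions_within(genes, pairs):
    """Return the interaction pairs whose two partners are both in `genes`."""
    hits = {
        tuple(sorted(pair))  # treat (a, b) and (b, a) as the same interaction
        for pair in pairs
        if pair[0] in genes and pair[1] in genes
    }
    return sorted(hits)


# Placeholder identifiers standing in for the up-regulated genes.
up_regulated = {"G001", "G002", "G003", "G004"}

# Placeholder interaction data, as might be pooled from several sources.
known_interactions = [
    ("G002", "G001"),
    ("G002", "G005"),  # partner G005 is not up-regulated
    ("G003", "G004"),
    ("G006", "G007"),
    ("G001", "G002"),  # same interaction reported by a second source
]

result = interactions_within(up_regulated, known_interactions)
```

With browsing-only databases, the same answer would need a manual lookup for each of the 200 genes.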

Genome Information Management System (GIMS)
Paton NW, Khan SA, Hayes A, Moussouni F, Brass A, Eilbeck K, Goble CA, Hubbard SJ, Oliver SG (2000) Conceptual modelling of genomic information. Bioinformatics 16,

GIMS
Integrates genomic and functional data. Consists of two parts:
–GIMS Database
–GIMS User Interface

GIMS data warehouse
Diagram labels: SGD, MIPS, maxD; GIMS Database; Analysis Library; Canned Queries; Browser

Database implementation
–Uses the object database FastObjects.
–All database classes and analysis programs are written in Java.
–Allows close integration of the programming language with the database.
–Allows fast access to database data from application programs.
–Allows data to be stored in a way that reflects the underlying mechanisms in the organism.
–Very flexible and extensible.

GIMS Contents
Data type (data source):
–DNA sequences, chromosome locations of coding regions, e.g. ORFs, tRNAs, centromeres, telomeres etc. (MIPS)
–Predicted protein sequences, pI, molecular weight, number of transmembrane regions (MIPS)
–Protein attributes, e.g. cellular location, function, protein class, Prosite motifs, phenotype (MIPS)
–Protein interaction data: affinity purification, yeast two-hybrid, genetic interactions (Ho et al., 2002; Gavin et al., 2002; MIPS; Uetz et al., 2000; Ito et al., 2001)

GIMS Contents (continued)
Data type (data source):
–Metabolic data: reactions, compounds and enzymes (L-compound, L-enzyme)
–Transcription factors (SCPD)
–Transcriptome data (Stanford Microarray Database; University of Manchester, BBSRC COGEME Project)
–Ontology data (GO)
–Sequence similarity (SGD)

GIMS User Interface
–Java application.
–Can download from
–Communicates with database via RMI.
–On start-up, application is sent information about database classes and canned queries.
–Very flexible.
–Allows user to browse database, ask canned queries, and store and combine data sets.
–Can save results as txt, html or xml.

Selecting Canned Queries
Callouts: query categories; queries in selected category; initially empty store.

Parameterising a Query
Callouts: previously selected query; parameters for specific run (selects down-regulated genes in the nucleus).

Viewing the Results
Callouts: result collection; operations on collections.

Selecting a Second Query

Setting Its Parameters
Callout: parameters for specific run (selects down-regulated genes in the same experiment that are transcription factors).

Obtaining Its Results

Inter-relating Results
Callouts: collections selected for operating on; remove one result from the other.

Result of Difference
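The walkthrough above (run two canned queries, store both result collections, then remove one from the other) amounts to a set difference over stored collections. A minimal sketch, assuming a hypothetical `ResultStore` class and placeholder gene identifiers; this is not the GIMS client code, which the slides describe as a Java application.

```python
class ResultStore:
    """Holds named result collections and combines them with set operations."""

    def __init__(self):
        self._collections = {}

    def save(self, name, genes):
        self._collections[name] = frozenset(genes)

    def get(self, name):
        return self._collections[name]

    def difference(self, first, second, name):
        """Store and return the members of `first` that are not in `second`."""
        self.save(name, self.get(first) - self.get(second))
        return self.get(name)


store = ResultStore()

# Result of the first canned query: down-regulated genes in the nucleus.
store.save("down_nuclear", {"G001", "G002", "G003", "G004"})

# Result of the second: down-regulated genes that are transcription factors.
store.save("down_tf", {"G002", "G004", "G005"})

# Down-regulated nuclear genes that are not transcription factors.
remaining = store.difference("down_nuclear", "down_tf", "down_nuclear_not_tf")
```

Because the difference is saved under its own name, it can feed further operations in the same way as a query result.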

GIMS empowers the biologist

Resources at the centre
–Provenance record on how the data was produced
–Workflows that could be used to generate this data
–People who have registered an interest in this data
–Ontologies describing data
–Services that can use or produce this data
–Annotations
–Data holdings
–Relevant literature
–Related data

Biologists at the centre
–Provenance record of workflow runs they have made
–People
–Ontologies
–Preferences for services
–Notes
–Data holdings
–Literature
–Workflows they wrote or used
–People they collaborate with

myGrid
–EPSRC UK e-Science pilot project.
–Open Source Upper Middleware for Bioinformatics.
–(Web) Service-based architecture -> Grid services.
–42 months, 24 months in.
–Prototype v1 Release Sept 2004; some services available now.

Workflows are in silico experiments
Annotation pipeline: What is known about my candidate gene?
Diagram labels: Medline, OMIM, GO, BLAST, EMBL, DQP Query

Application: Workbench demonstrator
The myGrid service components are used in a demonstration application called the “myGrid WorkBench”, which provides a common point of use for the services. We can select data from the myGrid Information Repository (mIR), select a workflow based on its semantic description, and examine the results.

e-Science: Provenance
Like a bench experiment, myGrid records the materials and methods it has used for an in silico experiment in a provenance log. The log captures where, what, when and how the experiment was run.
–Derivation paths ~ workflows, queries
–Annotations ~ notes
–Evolution paths ~ workflow → workflow
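One way to picture a provenance log is a wrapper that records where, what, when and how each step ran, alongside its result. The sketch below is a toy illustration: `ProvenanceEntry`, `run_logged` and the `gc_content` step are hypothetical names, not the myGrid provenance format or API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class ProvenanceEntry:
    what: str       # name of the step that was run
    where: str      # service or host that ran it
    when: datetime  # start time (UTC)
    how: dict       # parameters the step was called with
    result: Any     # output of the step


def run_logged(log, what, where, func, **params):
    """Run `func(**params)` and append a provenance entry describing the run."""
    when = datetime.now(timezone.utc)
    result = func(**params)
    log.append(ProvenanceEntry(what, where, when, dict(params), result))
    return result


def gc_content(sequence):
    """Toy analysis step: fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in sequence) / len(sequence)


log = []
value = run_logged(log, "gc_content", "local", gc_content, sequence="ATGC")
```

Each entry keeps the inputs next to the output, so a result can later be traced back to the step and parameters that produced it.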

e-Science: Notification
A notification service can inform the mIR and the user (proxy) that data, workflows, services, etc. have changed and thus prompt actions over data in the mIR. Notifications are presented to the user with a client in the workbench environment.
–User registers interest in notification topics.
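Registering interest in topics and being told when something changes is the publish/subscribe pattern. A minimal in-process sketch follows; the `NotificationService` class and the topic names are hypothetical, and the real myGrid service is a distributed component rather than a local object.

```python
from collections import defaultdict


class NotificationService:
    """Delivers each published message to callbacks registered for its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def register(self, topic, callback):
        """Register interest in a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Notify every subscriber of `topic`; return how many were notified."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, message)
        return len(callbacks)


inbox = []  # stands in for the client that presents notifications to the user

service = NotificationService()
service.register("workflow.changed", lambda topic, msg: inbox.append((topic, msg)))

delivered = service.publish("workflow.changed", "annotation pipeline updated")
ignored = service.publish("data.changed", "no one registered for this topic")
```

Only topics the user registered for reach the inbox, so the user controls which changes prompt action.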

The myGrid Team
Matthew Addis, Nedim Alpdemir, Rich Cawley, Vijay Dialani, Alvaro Fernandes, Justin Ferris, Rob Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Claire Jennings, Ananth Krishna, Xiaojian Liu, Darren Marvin, Karon Mee, Simon Miles, Luc Moreau, Juri Papay, Norman Paton, Simon Pearce, Steve Pettifer, Milena Radenkovic, Peter Rice, Angus Roberts, Alan Robinson, Martin Senger, Nick Sharman, Paul Watson, Anil Wipat and Chris Wroe.

Need GRID to empower the biologist