The Functional Genomics Experiment Object Model (FuGE) Andrew Jones, School of Computer Science, University of Manchester MGED Society.

What is FuGE?
Various groups have tried to fuse MAGE and PEDRo in the past
  – Such a model would be difficult to manage
FuGE is a model of the common components of functional genomics experiments
Aims to help the development of data standards
Should allow some cross-compatibility between different ‘omics experiments
Microarray & proteome standards will use parts of FuGE for some data formats

So, what is FuGE?
An object model in UML (close to 1st stable release)
An XML Schema (in development)
A software API (will be created from UML)
FuGE uses ontologies extensively, such as the MGED Ontology or its successor (FuGO)
Developed by members of MGED / PSI with input from cross-omics experimentalists, e.g. RSBI

What is FuGE not…?
Not an effort to create one data standard for all lab techniques
  – This problem is hard at the technical level, and getting agreement from all groups is very hard
Not a model for metabolomics metadata
  – But it might help in the development of one
  – …and we would like to encourage input from the metabolomics community

FuGE Structure
2 sections: Common and Bio
Common – components that aid the development of a rich data standard
  – Protocols, external references, auditing and security settings
Bio – biology-specific components
  – Biological (or chemical) materials, bio sequences
  – Summary of an investigation structure
  – References to the data model specific to each domain

Protocols
Protocols have a set of ordered atomic actions
  – Actions are user-entered text or ontology terms
Protocols can be associated with Software and Equipment
Protocols, Software and Equipment can have a set of defined Parameters
Mechanism for defining a standard protocol, and an instance of a protocol (date, operator…)
Nested protocols can be defined for representing complex procedures
  – An Action can be a reference to another Protocol
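
The protocol structure described on this slide can be sketched in a few lines of Python. This is only an illustration, not the FuGE API: the class names mirror the slide's vocabulary, but every field name, the `flatten` helper and the example protocols, date and operator are assumptions made up for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Parameter:
    name: str
    default: Optional[str] = None


@dataclass
class Protocol:
    name: str
    actions: List["Action"] = field(default_factory=list)  # list order = step order
    parameters: List[Parameter] = field(default_factory=list)
    software: List[str] = field(default_factory=list)
    equipment: List[str] = field(default_factory=list)


@dataclass
class Action:
    # Either free text / an ontology term, or a reference to another Protocol.
    step: Union[str, Protocol]


@dataclass
class ProtocolApplication:
    """One concrete run of a Protocol (the 'instance' on the slide)."""
    protocol: Protocol
    date: str
    operator: str
    parameter_values: Dict[str, str] = field(default_factory=dict)


def flatten(protocol: Protocol) -> List[str]:
    """Expand nested protocols into a single ordered list of atomic steps."""
    steps: List[str] = []
    for action in protocol.actions:
        if isinstance(action.step, Protocol):
            steps.extend(flatten(action.step))
        else:
            steps.append(action.step)
    return steps


# Hypothetical example data.
wash = Protocol("wash", actions=[Action("rinse with buffer"), Action("dry")])
hybridise = Protocol(
    "hybridise",
    actions=[Action("apply sample"), Action(wash), Action("scan")],
    parameters=[Parameter("temperature_C", "42")],
)
run = ProtocolApplication(
    hybridise,
    date="2006-01-01",
    operator="A. N. Other",
    parameter_values={"temperature_C": "45"},
)

print(flatten(hybridise))
```

Keeping the standard protocol and its application as separate objects means the same protocol definition can be reused across many dated runs, each with its own parameter values.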

FuGE Workflow
[Figure: Material → Treatment → Material → Treatment → Material → Treatment → Material → Data Acquisition → Data → Data Transformation → Data]
Legend: Material and Data = inputs and outputs of Protocols; Treatment, Data Acquisition and Data Transformation = instances of some Protocol
Materials defined using terms from ontologies
Treatments defined by Protocols
Data represented in domain specific format
FuGE is the “glue” for sticking components together
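
One way to picture the workflow chain is as a list of protocol applications, each consuming and producing Materials or Data. The sketch below assumes a simple linear chain; `Node`, `Step`, `lineage` and the sample names are hypothetical and are not taken from the FuGE model itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    kind: str  # "Material" or "Data"


@dataclass
class Step:
    protocol: str          # name of the protocol that was applied
    inputs: List[Node]
    outputs: List[Node]


def lineage(target: Node, steps: List[Step]) -> List[str]:
    """Trace back from a Node and return the protocols applied, earliest first."""
    trail: List[str] = []
    current = target
    while True:
        producer = next(
            (s for s in steps if any(o is current for o in s.outputs)), None
        )
        if producer is None:
            break  # reached a starting material with no recorded producer
        trail.append(producer.protocol)
        current = producer.inputs[0]  # linear chain assumed: follow first input
    return list(reversed(trail))


# Hypothetical chain: Material -> Treatment -> ... -> Data.
tissue = Node("tissue sample", "Material")
extract = Node("extract", "Material")
labelled = Node("labelled extract", "Material")
raw = Node("raw data", "Data")
normalised = Node("normalised data", "Data")

steps = [
    Step("extraction", [tissue], [extract]),
    Step("labelling", [extract], [labelled]),
    Step("data acquisition", [labelled], [raw]),
    Step("data transformation", [raw], [normalised]),
]

print(lineage(normalised, steps))
```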

Other useful components
Each object can be tagged with audit info:
  – Who made a change, when, what type of change
Security information:
  – Users and groups for accessing/changing data
Consistent mechanism for identifying objects
  – Life Science Identifiers (LSIDs) used to uniquely ID components
  – Objects can be referenced across documents
Mechanism for linking to external databases, literature refs and ontologies
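
As a rough sketch of how identification and auditing fit together, the Python below builds an LSID in the standard `urn:lsid:authority:namespace:object[:revision]` form and attaches a small audit trail to an object. The class layout, the `record` method, the `registry` dictionary and the `example.org` authority are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


def make_lsid(authority: str, namespace: str, object_id: str,
              revision: Optional[str] = None) -> str:
    """Build an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = ["urn", "lsid", authority, namespace, object_id]
    if revision:
        parts.append(revision)
    return ":".join(parts)


@dataclass
class AuditEntry:
    who: str
    when: datetime
    action: str  # e.g. "creation" or "modification"


@dataclass
class Identifiable:
    identifier: str
    name: str = ""
    audit_trail: List[AuditEntry] = field(default_factory=list)

    def record(self, who: str, action: str) -> None:
        """Append who changed the object, when, and what kind of change it was."""
        self.audit_trail.append(
            AuditEntry(who, datetime.now(timezone.utc), action)
        )


# Hypothetical usage: a registry lets another document refer to the object by ID.
registry: Dict[str, Identifiable] = {}
sample = Identifiable(make_lsid("example.org", "material", "M001"), "liver sample")
sample.record("ajones", "creation")
registry[sample.identifier] = sample

print(sample.identifier)
```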

Investigation model
Stores a summary of the investigation to facilitate queries
Purpose of investigation (hypothesis)
Design of the investigation
  – e.g. strain differences, gene knockout, drug doses, time course
Stores the important variables
  – Values from ontology, e.g. gene names, units etc…
Links from variables to relevant data items
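
A minimal sketch of such an investigation summary, assuming that each variable is stored as a factor whose values carry references to data items. All class and field names, the `data_for` query and the example dose values are hypothetical rather than taken from the FuGE model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FactorValue:
    value: str
    unit: str = ""                      # ideally an ontology term
    data_refs: List[str] = field(default_factory=list)


@dataclass
class Factor:
    name: str                           # e.g. "drug dose", "time point"
    values: List[FactorValue] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    hypothesis: str
    factors: List[Factor] = field(default_factory=list)

    def data_for(self, factor_name: str, value: str) -> List[str]:
        """Return references to data items recorded for one factor value."""
        refs: List[str] = []
        for factor in self.factors:
            if factor.name != factor_name:
                continue
            for fv in factor.values:
                if fv.value == value:
                    refs.extend(fv.data_refs)
        return refs


# Hypothetical drug-dose design.
study = Investigation(
    title="Dose response study",
    hypothesis="Treatment changes expression in a dose-dependent way",
    factors=[
        Factor("drug dose", [
            FactorValue("0", "mg/kg", ["data:ctrl-1", "data:ctrl-2"]),
            FactorValue("10", "mg/kg", ["data:dose10-1"]),
        ])
    ],
)

print(study.data_for("drug dose", "10"))
```

Because the summary links variables directly to data items, a query such as "all data for the 10 mg/kg group" does not need to walk the full protocol chain.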

Benefits of shared components
Queries over common annotation
  – Samples, hypotheses, protocols
Shared software for experimental annotation and analysis
  – Microarrays, proteomics and metabolomics (and other experiments!) performed in same lab
Developing standards for each technique is a hard problem
  – Shared resources could alleviate the problems (audit, security, identifying objects, ontologies)

Using FuGE in Practice
1. Import parts of the UML or XML Schema and extend with domain-specific components
   Example: attempting to integrate FuGE with our Manchester metabolomics database
2. Reference a FuGE entry for investigation structure and bio samples
3. Define ontologies and use FuGE as it is for experimental metadata
   This would not include a format for mass spec or NMR data, which would also be needed
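
The first option, extending shared components with domain-specific ones, might look like the following in code. This is a sketch under stated assumptions: `Material` stands in for a generic FuGE-style class, and `MetaboliteExtract` with its fields is an invented domain extension, not part of FuGE or of the Manchester database.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Material:
    """Stand-in for a generic, shared material class."""
    identifier: str
    name: str
    characteristics: Dict[str, str] = field(default_factory=dict)


@dataclass
class MetaboliteExtract(Material):
    """Hypothetical domain-specific extension for a metabolomics database."""
    solvent: str = ""
    volume_ul: float = 0.0


extract = MetaboliteExtract(
    identifier="M002",
    name="serum extract",
    characteristics={"organism": "Homo sapiens"},
    solvent="methanol",
    volume_ul=200.0,
)

# Generic tools written against Material still work on the extension.
print(isinstance(extract, Material), extract.solvent)
```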

Conclusions
FuGE was created to solve the general problem:
  – What are the common requirements for a “functional genomics” data standard?
MGED will use FuGE for generating MAGE version 2
PSI evaluating FuGE for protein separation standard format
FuGE-based systems being implemented by a number of organisations
FuGE could help develop a metabolome format

Acknowledgements
FuGE has been developed in collaboration with many groups, including:
  – Angel Pizarro (U Penn)
  – Paul Spellman (Lawrence Berkeley)
  – Michael Miller (Rosetta)
  – Members of Fred Hutchinson CRC, Seattle
  – RSBI
  – Various other members of MGED and PSI

Describable Identifiable

Common.Description
Many classes inherit from Describable
Link to Audit / Security details
URI and text description

Protocol

Audit

Investigation

Material

Common.Data
Ordered set of Dimensions
Data stored in Matrix
Matrix must be extended with subclasses
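
The data slide can be read as an abstract matrix indexed by an ordered set of dimensions, with concrete storage left to subclasses. The sketch below assumes row-major, in-memory storage. `Dimension`, `Matrix` and `InMemoryMatrix` are illustrative names and the gene/sample labels are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dimension:
    name: str
    elements: List[str]  # ordered labels along this dimension


class Matrix:
    """Abstract base: subclasses decide how values are stored."""

    def __init__(self, dimensions: List[Dimension]):
        self.dimensions = dimensions  # order matters

    def get(self, *labels: str) -> float:
        raise NotImplementedError("extend Matrix with a concrete subclass")


class InMemoryMatrix(Matrix):
    """Hypothetical subclass holding values in a flat, row-major list."""

    def __init__(self, dimensions: List[Dimension], values: List[float]):
        super().__init__(dimensions)
        expected = 1
        for dim in dimensions:
            expected *= len(dim.elements)
        if len(values) != expected:
            raise ValueError(f"expected {expected} values, got {len(values)}")
        self.values = values

    def get(self, *labels: str) -> float:
        # Convert one label per dimension into a row-major offset.
        offset = 0
        for dim, label in zip(self.dimensions, labels):
            offset = offset * len(dim.elements) + dim.elements.index(label)
        return self.values[offset]


genes = Dimension("gene", ["g1", "g2"])
samples = Dimension("sample", ["s1", "s2", "s3"])
matrix = InMemoryMatrix([genes, samples], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

print(matrix.get("g2", "s1"))
```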