The European Bioinformatics Institute MIAME and Ontologies for Sample Description Helen Parkinson Microarray Informatics Team European Bioinformatics Institute.


The European Bioinformatics Institute MIAME and Ontologies for Sample Description Helen Parkinson Microarray Informatics Team European Bioinformatics Institute EMBO Course, October 2001

Talk Structure
- ArrayExpress - a public database for microarray data and integration of ontologies
- Ontologies for gene expression data
- Submission and annotation tool

Problems of microarray data analysis
- Size of the datasets
- Different platforms - nylon, glass
- Different technologies on platforms - oligo/spotted
- Referencing external databases which are not stable
- Sample annotation
- Array annotation
- Need for LIMS systems and the need for bioinformaticians

General MIAME principles
- Recorded info should be sufficient to interpret and replicate the experiment
- Information should be structured so that querying and automated data analysis and mining are feasible

A gene expression database from the data analyst's point of view
[Figure: gene expression matrix. Labels: Samples, Genes, Gene expression levels, Sample annotations, Gene annotations]
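The analyst's view above can be sketched as a small data structure. This is only an illustration, not the ArrayExpress schema: the class name `ExpressionDataset`, its fields and the toy values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExpressionDataset:
    """Genes-by-samples matrix with annotations on both axes (hypothetical layout)."""
    genes: List[str]                  # row labels
    samples: List[str]                # column labels
    levels: List[List[float]]         # levels[i][j] = expression of gene i in sample j
    gene_annotations: Dict[str, dict] = field(default_factory=dict)
    sample_annotations: Dict[str, dict] = field(default_factory=dict)

    def level(self, gene: str, sample: str) -> float:
        """Look up one expression level by gene and sample label."""
        return self.levels[self.genes.index(gene)][self.samples.index(sample)]

    def samples_where(self, key: str, value: str) -> List[str]:
        """Return the samples whose annotation `key` equals `value`."""
        return [s for s in self.samples
                if self.sample_annotations.get(s, {}).get(key) == value]


# Toy data, invented for illustration only.
dataset = ExpressionDataset(
    genes=["geneA", "geneB"],
    samples=["s1", "s2", "s3"],
    levels=[[1.0, 2.5, 0.3], [0.0, 4.1, 2.2]],
    gene_annotations={"geneA": {"database": "EMBL"}},
    sample_annotations={
        "s1": {"organism_part": "thymus"},
        "s2": {"organism_part": "liver"},
        "s3": {"organism_part": "thymus"},
    },
)
```

Keeping the annotations next to the matrix is what lets a query such as `dataset.samples_where("organism_part", "thymus")` select columns by biology rather than by file name.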

Gene Annotation
- Can be given by links to gene sequence databases, and GO can be used on the analysis side (function, process, cell compartment)
- MIAME is flexible: it allows many kinds of sequence identifiers, or even the sequence itself
- In some cases it is more useful to include a real sequence than an inaccurate ID
- In the end we will need a mapping from a gene list to all the spots on all arrays; this is non-trivial given the problems with names
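One way to picture the gene-list-to-spots mapping is to index spots by a sequence identifier and resolve gene names to that identifier first. The sketch below assumes a hypothetical synonym table and made-up accessions (`ACC0001`, `ACC0002`); none of these names come from ArrayExpress.

```python
from collections import defaultdict

# Hypothetical spot records: (array id, spot id, sequence accession of the reporter).
SPOTS = [
    ("array1", "A01", "ACC0001"),
    ("array1", "A02", "ACC0002"),
    ("array2", "B07", "ACC0001"),
]

# Hypothetical synonym table: several gene names may point at one accession.
NAME_TO_ACCESSION = {"geneA": "ACC0001", "GENE-A": "ACC0001", "geneB": "ACC0002"}


def build_spot_index(spots):
    """Group (array, spot) pairs by the accession they carry."""
    index = defaultdict(list)
    for array_id, spot_id, accession in spots:
        index[accession].append((array_id, spot_id))
    return index


def spots_for_genes(gene_names, name_to_accession, index):
    """Map each gene name to every spot on every array; unknown names map to []."""
    result = {}
    for name in gene_names:
        accession = name_to_accession.get(name)
        result[name] = list(index.get(accession, [])) if accession else []
    return result
```

For example, `spots_for_genes(["GENE-A"], NAME_TO_ACCESSION, build_spot_index(SPOTS))` finds the same reporter on both arrays even though neither array uses that name.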

Sample annotation
- Gene expression data only have meaning in the context of detailed sample descriptions
- If the data is going to be interpreted by independent parties, sample information has to be searchable and in the database
- Controlled vocabularies and ontologies (species, cell types, compound nomenclature, treatments, etc.) are needed for unambiguous sample description

Standardisation of microarray data and annotations - MGED group
The goal of the group is to facilitate the adoption of standards for DNA-array experiment annotation and data representation, as well as the introduction of standard experimental controls and data normalisation methods. It includes most of the world's largest microarray laboratories and companies (TIGR, Affymetrix, Stanford, Sanger, Agilent, etc.).

Sample annotation - what can be done?
- Build an ontology for gene expression data (MGED)
- Use existing ontologies and link them in
- Incorporate the ontology into the database
- Develop internal editing tools for the ontology
- Develop a browser or other interface for the ontology and link it to LIMS
- Some use of free-text descriptions is unavoidable (curation workload)
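The split between controlled terms and free text could look roughly like the check below. It is a sketch under assumptions: the vocabulary contents, the field names and the three status labels are invented for illustration (the treatment types are borrowed from the MIAME examples in this talk), and this is not how MIAMExpress implements it.

```python
# Hypothetical controlled vocabularies, keyed by annotation field.
CONTROLLED_VOCABULARY = {
    "sex": {"female", "male", "unknown"},
    "treatment_type": {"small molecule", "heat shock", "cold shock", "food deprivation"},
}


def check_annotation(field_name, value, vocabulary=CONTROLLED_VOCABULARY):
    """Classify a submitted value.

    Returns "controlled" if the value is a known term, "needs_curation" if the
    field has a vocabulary but the value is not in it, and "free_text" if no
    vocabulary exists for the field.
    """
    terms = vocabulary.get(field_name)
    if terms is None:
        return "free_text"
    if value.strip().lower() in terms:
        return "controlled"
    return "needs_curation"
```

Values that come back as `"needs_curation"` are the ones that generate curator workload: either the submitter made a typo or the vocabulary is missing a term.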

Use case scenarios
- Return a summary of all experiments that use a specified type of biosource (primary source).
- Group the experiments according to treatment.
- Return a summary of all experiments done examining effects of a specified treatment.
- Group the experiments according to biosource.
- Return a summary of all experiments measuring the expression of a specified gene.
- Indicate when experiments confirm results, provide new information, or conflict.
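The first scenario (experiments for a given biosource, grouped by treatment) can be sketched with an in-memory SQLite table. The single-table schema, the column names and the rows are assumptions for illustration; the real ArrayExpress model is far richer.

```python
import sqlite3


def experiments_by_treatment(conn, biosource):
    """Return {treatment: [experiment ids]} for experiments using `biosource`."""
    rows = conn.execute(
        "SELECT treatment, experiment_id FROM experiment "
        "WHERE biosource = ? ORDER BY treatment, experiment_id",
        (biosource,),
    ).fetchall()
    grouped = {}
    for treatment, experiment_id in rows:
        grouped.setdefault(treatment, []).append(experiment_id)
    return grouped


# Hypothetical one-table schema and toy rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiment (experiment_id TEXT, biosource TEXT, treatment TEXT)"
)
conn.executemany(
    "INSERT INTO experiment VALUES (?, ?, ?)",
    [
        ("E1", "thymus", "dexamethasone"),
        ("E2", "thymus", "heat shock"),
        ("E3", "liver", "dexamethasone"),
        ("E4", "thymus", "dexamethasone"),
    ],
)
```

The query only works because `biosource` and `treatment` hold consistent terms; with free-text values the same `WHERE` clause would silently miss experiments.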

MIAME: Minimum Information About a Microarray Experiment
[Figure: the 6 parts of a microarray experiment (Experiment, Array, Sample, Hybridisation, Data, Normalisation), with external links to Publication, Gene (e.g., EMBL) and Source (e.g., Taxonomy)]

MGED Biomaterial (sample) Ontology
- Under construction by Chris Stoeckert, using OilEd (though other tools exist)
- Motivated by MIAME and coordinated with the database model
- We will extend classes, provide constraints, define terms, provide new terms and develop CVs for submissions (EBI)

Part of the MGED biomaterial ontology
class Age
documentation: The time period elapsed since an identifiable point in the life cycle of an organism. If a developmental stage is specified, the identifiable point would be the beginning of that stage. Otherwise the identifiable point must be specified such as planting.
type: primitive
superclasses: BiosourceProperty
constraints:
  slot-constraint has_measurement has-value Measurement
  slot-constraint initial_time_point has-value one-of (planting beginning_of_stage)
used in slots: initial_time_point
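To make the two slot constraints concrete, here is a rough Python rendering of the `Age` class. It is a sketch, not a translation of the ontology: the `Measurement` fields (`value`, `unit`) are assumptions, since the slide only names the class, and the runtime checks stand in for what an ontology reasoner would enforce.

```python
from dataclasses import dataclass

# The one-of values listed in the initial_time_point slot constraint.
ALLOWED_INITIAL_TIME_POINTS = ("planting", "beginning_of_stage")


@dataclass(frozen=True)
class Measurement:
    """Hypothetical shape of a Measurement; the slide does not define its slots."""
    value: float
    unit: str


@dataclass(frozen=True)
class Age:
    """Age as a BiosourceProperty with the two slot constraints checked on creation."""
    has_measurement: Measurement
    initial_time_point: str

    def __post_init__(self):
        if not isinstance(self.has_measurement, Measurement):
            raise TypeError("has_measurement must be a Measurement")
        if self.initial_time_point not in ALLOWED_INITIAL_TIME_POINTS:
            raise ValueError(
                "initial_time_point must be one of %r" % (ALLOWED_INITIAL_TIME_POINTS,)
            )
```

With this, `Age(Measurement(3.0, "weeks"), "planting")` is accepted, while an unlisted starting point such as `"birth"` raises `ValueError`, which mirrors how a closed `one-of` list rejects terms that have not yet been added to the ontology.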

MIAME Section on Sample Source and Treatment
- organism (NCBI taxonomy)
- cell source - provider
- cell type (if derived from primary source(s))
- sex
- age
- growth conditions
- development stage
- organism part (tissue)
- animal/plant strain or line
- genetic variation (e.g., gene knockout, transgenic variation)
- individual
- individual genetic characteristics (e.g., disease alleles, polymorphisms)
- disease state or normal
- target cell type
- cell line and source (if applicable)
- in vivo treatments (organism or individual treatments)
- in vitro treatments (cell culture conditions)
- treatment type (e.g., small molecule, heat shock, cold shock, food deprivation)
- compound
- is additional clinical information available (link)
- separation technique (e.g., none, trimming, microdissection, FACS)
- laboratory protocol for sample treatment……

Examples of usable external ontologies
- NCBI taxonomy database
- Jackson Lab mouse strains and genes
- Edinburgh mouse atlas anatomy
- HUGO nomenclature for human genes
- Chemical and compound ontologies - Merck index
- TAIR
- FlyBase
- GO

Excerpts from a Sample Description
courtesy of M. Hoffman, S. Schmidtke, Lion BioSciences
Organism: Mus musculus [NCBI taxonomy browser]
Cell source: in-house bred mice (contact:
Sex: female [MGED]
Age: weeks after birth [MGED]
Growth conditions: normal controlled environment, °C average temperature, housed in cages according to EU legislation, specified pathogen free conditions (SPF), 14 hours light cycle, 10 hours dark cycle
[Developmental stage]: stage 28 (juvenile (young) mice) [GXD "Mouse Anatomical Dictionary"]
Organism part: thymus [GXD "Mouse Anatomical Dictionary"]
Strain or line: C57BL/6 [International Committee on Standardized Genetic Nomenclature for Mice]
Genetic Variation: Inbr (J) 150. Origin: substrains 6 and 10 were separated prior to This substrain is now probably the most widely used of all inbred strains. Substrain 6 and 10 differ at the H9, Igh2 and Lv loci. Maint. by J,N, Ola. [International Committee on Standardized Genetic Nomenclature for Mice]
Treatment: in vivo [MGED] [intraperitoneal] injection of [Dexamethasone] into mice, 10 microgram per 25 g bodyweight of the mouse
Compound: drug [MGED] synthetic [glucocorticoid] [dexamethasone], dissolved in PBS
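In this excerpt each value is followed by the vocabulary it was taken from, in square brackets. A minimal parser for the simple `Field: value [source]` entries might look as follows. The function name and regular expression are illustrative assumptions, and entries with several bracketed terms (such as the Treatment entry) are deliberately out of scope.

```python
import re

# Field name, then a value, then an optional trailing "[source]" tag.
LINE_PATTERN = re.compile(
    r"^(?P<field>[^:]+):\s*(?P<value>.*?)\s*(?:\[\s*(?P<source>[^\]]+?)\s*\])?$"
)


def parse_annotation(line):
    """Split 'Field: value [source]' into (field, value, source or None)."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("not a 'Field: value' line: %r" % line)
    return match.group("field").strip(), match.group("value"), match.group("source")
```

Applied to `"Sex: female [ MGED ]"`, this yields the field `Sex`, the value `female` and the source `MGED`; an entry without a bracketed source returns `None` in the third position, which marks it as free text.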

Introduction to the database
- ArrayExpress is implemented in Oracle
- The submission tool is a different implementation of the ArrayExpress model in MySQL
- Faster, easier to update
- Short-term solution to the problem of data submission

ArrayExpress conceptual model
[Figure: conceptual model. Labels: Experiment, Array, Sample, Hybridisation, Data, Normalisation; external links: Publication, Gene (e.g., EMBL), Source (e.g., Taxonomy)]

[Figure: ArrayExpress architecture. Labels: ArrayExpress Database (MAGE-OM Model); Curation Database; Submission tool (User Login, Array Submission, Protocol Sub., Experiment submission); Large Scale Submissions (MAGE-ML format); Submitter LIMS; Query Interface for Public (Browse Arrays, Browse Protocols); Data Analysis Tools (Expression Profiler); Data File Export; External Applications; External Databases, EMBL, Ontology Resources… etc]

MIAMExpress
- Based on MIAME concepts and questionnaire
- Experiment, Array, Protocol submissions
- CV/Ontology wherever possible
- Future versions: organism-specific pages and related linked ontologies
- Allows user-driven ontology development
- Will be developed according to user needs
- Will also need to be an update tool

Design Considerations
- Speed and ease of use, scalability
- Need to browse existing protocols and array designs in ArrayExpress
- Requirement for curator control over submissions
- Submissions tracking
- Future use as a LIMS
- Flexibility

Features of MIAMExpress
- Creates a user login account instead of on-the-fly submissions, so sessions can be saved
- Allows existing protocols to be copied, saved and linked to more than one hyb/expt
- Forms the basis of a LIMS using the ArrayExpress model
- Will be available as a standalone tool for local installation
- Is open source and free
- Will be supported by curation staff and developers

Expected Users
- Users with limited local bioinformatics support
- Users of bought-in arrays without LIMS
- Small-scale users with self-made arrays, who will need to provide a description
- Commercial array descriptions will be provided

Acknowledgments
- Whole Microarray Informatics Team, EBI, esp. Alvis Brazma, Mohammad Shojatalab and Ugis Sarkans
- Industry Support team, EBI
- MGED steering committee
- MIAME working group
- Chris Stoeckert, U. Penn. and members of MGED

Demo Version of MIAMExpress
- Coming soon to
- Beta tester recruitment