Susanna-Assunta Sansone (Toxicogenomics project coordinator) Microarray Informatics Team EMBL- EBI (European Bioinformatics Institute) Transcriptome Symposium,


MIAME and ArrayExpress – a standard for microarray gene expression data and the public database at EBI
Susanna-Assunta Sansone (Toxicogenomics project coordinator), Microarray Informatics Team, EMBL-EBI (European Bioinformatics Institute)
Transcriptome Symposium, April 2002, CHU Pitié-Salpêtrière, Université Paris VI

Why have a public database?
- EMBL-EBI: centre for research and services in bioinformatics that builds and maintains public databases: EMBL Nucleotide Sequence, SWISS-PROT, Ensembl, MSD, etc.
- Practical reasons:
  - Easy data access
  - Resolves local storage issues
  - Common data exchange formats can be developed
- Scientific reasons:
  - Curation can be applied
  - Annotation can be controlled
  - Additional information that is missing in publications can be stored
  - Improves data comparison!
- A public standard can be applied

Talk structure
- MIAME standard
- MIAME annotation challenge: MGED BioMaterial Ontology
- Uses of MIAME concepts:
  - ArrayExpress: a public repository for gene expression data
  - MIAMExpress: submission and annotation tool

MIAME standard

Standard for microarray data - Why?
- Size of dataset
- Different platforms: nylon, glass
- Different technologies: oligos, spotted
- References to external databases are not stable!
- Array annotation
- Sample annotation
- Data sharing needs a standardized way to annotate and record the information!

Standard for microarray data - MGED Group
- Microarray Gene Expression Data Group: EBI + world's largest microarray labs and companies (Sanger, Stanford, TIGR, Université d'Aix-Marseille II, Affymetrix, Agilent, NCBI, DDBJ, etc.)
- MGED Group aims to:
  - Facilitate adoption of standards for:
    - Experiment annotation
    - Data representation
  - Introduce standards for:
    - Experimental controls
    - Data normalization methods

General MIAME principles
- Minimum information about a microarray experiment
- NOT a formal specification BUT a set of guidelines
- Sufficient information must be recorded to:
  - Correctly interpret and verify the results
  - Replicate the experiments
- Structured information must be recorded to:
  - Query and correctly retrieve the data
  - Analyse the data
- MIAME: Brazma et al., Nature Genetics, 2001

MIAME 6 parts of a microarray experiment
- Sample: sample source, sample treatments, extraction protocol, labeling protocol
- Array: array design information, location of each element, description of each element
- Hybridisation: hybridization protocol
- Image: scanning protocol, software specifications
- Quantification matrix: analysis protocol, software specifications

MIAME 6 parts of a microarray experiment
- Experiment: groups several Array / Sample / Hybridisation units
- Final data: 3 data processing levels; lack of gene expression measurement units!
- Normalisation: strategy, algorithm, control array elements
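The parts named on the two slides above can be sketched as a small data structure. This is only an illustration in Python: the class and field names are assumptions made for the example, not the MIAME wording and not the MAGE-OM classes.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    source: str
    treatments: list = field(default_factory=list)
    extraction_protocol: str = ""
    labeling_protocol: str = ""


@dataclass
class ArrayElement:
    location: str      # position of the element on the array
    description: str   # what the element represents


@dataclass
class Hybridisation:
    sample: Sample
    array_elements: list
    protocol: str = ""


@dataclass
class Experiment:
    # An experiment groups several hybridisations and records how
    # the final data were normalised.
    hybridisations: list = field(default_factory=list)
    normalisation_strategy: str = ""


# Hypothetical values, for illustration only.
sample = Sample(source="mouse liver", treatments=["fenofibrate"])
element = ArrayElement(location="block 1, row 1, column 1",
                       description="hypothetical probe")
experiment = Experiment(hybridisations=[Hybridisation(sample, [element])])
print(experiment.hybridisations[0].sample.source)
```

Keeping sample, array and hybridisation as separate records mirrors the slide's point that each part needs its own annotation.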

MIAME – Annotation challenge
- Annotation implementations are required!
  - Avoid/reduce free text descriptions
  - Use of controlled terms
  - Definitions and sources for each term
  - Removal of synonyms, or use of synonym mappings
  - Data curation at source (LIMS)
  - Integration of controlled terms in query interfaces
- Facilitate data queries and analysis
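The idea of replacing free text with controlled terms through a synonym mapping can be sketched as follows. The synonym table and function names are hypothetical and chosen only for this example; a real implementation would draw its terms from the external sources named in the talk.

```python
# Hypothetical synonym table: free-text variant -> controlled term.
SYNONYMS = {
    "mouse": "Mus musculus",
    "mice": "Mus musculus",
    "mus musculus": "Mus musculus",
    "hepatic tissue": "liver",
    "liver": "liver",
}


def to_controlled_term(text):
    """Return the controlled term for a free-text value, or None if unknown."""
    return SYNONYMS.get(text.strip().lower())


def annotate(values):
    """Split values into mapped terms and leftovers that need curation."""
    mapped, unmapped = [], []
    for value in values:
        term = to_controlled_term(value)
        if term is None:
            unmapped.append(value)
        else:
            mapped.append(term)
    return mapped, unmapped


mapped, unmapped = annotate(["Mice", " Liver ", "hepatocyte prep"])
print(mapped)
print(unmapped)
```

Values that cannot be mapped are returned separately, which is where curation at source would step in.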

A gene expression database from the data analyst's point of view
- Gene expression matrix: genes and transcription units × samples
- Gene expression levels?

A gene expression database from the data analyst's point of view
- Gene expression matrix: genes and transcription units × samples, holding gene expression levels
- Array description: gene annotations
- Sample annotations: source, treatment
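A minimal sketch of the analyst's view described above: one matrix of expression levels plus annotations on each axis. All gene names, sample names and numbers are invented for illustration, and the row/column orientation is an assumption of this example.

```python
genes = ["gene_a", "gene_b", "gene_c"]      # hypothetical identifiers
samples = ["sample_1", "sample_2"]

# Array description: annotations attached to genes.
gene_annotations = {
    "gene_a": {"process": "hypothetical process 1"},
    "gene_b": {"process": "hypothetical process 2"},
    "gene_c": {"process": "hypothetical process 1"},
}

# Sample annotations: source and treatment.
sample_annotations = {
    "sample_1": {"source": "liver", "treatment": "control"},
    "sample_2": {"source": "liver", "treatment": "fenofibrate"},
}

# expression[i][j] is the level of genes[i] in samples[j] (invented numbers).
expression = [
    [1.0, 2.5],
    [0.8, 0.9],
    [3.1, 1.2],
]


def levels(gene, **sample_filter):
    """Expression levels of one gene in the samples matching the filter."""
    row = expression[genes.index(gene)]
    return [
        row[col]
        for col, name in enumerate(samples)
        if all(sample_annotations[name].get(k) == v
               for k, v in sample_filter.items())
    ]


print(levels("gene_a", treatment="fenofibrate"))
```

Without the sample annotations the filter in `levels` has nothing to work on, which is the slide's argument for recording them in a structured way.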

MIAME - Gene annotation
- Unambiguous identification
- Synonyms!
  - Community approved names
  - Alternative to gene names
- Usable external sources, e.g.:
  - EMBL-GenBank: sequence accession numbers
  - Jackson Lab: approved mouse gene names
  - HUGO: approved human gene names
  - GO categories: function, process, location

MIAME - Sample annotation
- Gene expression data only have a meaning in the context of detailed sample descriptions!
- Usable external sources, e.g.:
  - NCBI Taxonomy: organisms
  - Jackson Lab: mouse strain names
  - Mouse Anatomical Dictionary: mouse anatomy
  - ChemID: compounds
  - ICD-9: disease classification
- More is needed...

Annotation – implementations required!
- Need an ontology to describe the sample:
  - Defining controlled vocabularies and...
  - ...using existing external ontologies
- Integrate the ontology in LIMS and databases:
  - Develop browser or interface for the ontology
  - Develop internal editing tools for the ontology
- However, some free text description is unavoidable

Talk structure
- MIAME standard
- MIAME annotation challenge: MGED BioMaterial Ontology

What are CVs and ontologies?
- Controlled Vocabulary (CV):
  - A restricted set of terms used to describe something; in the simplest case it could be a list
- An ontology is more than a CV:
  - Describes the relationships between the terms in a structured way, provides semantics and constraints
  - Captures knowledge and makes it machine processable
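The difference between the two can be sketched in a few lines. The terms and the `is_a` relations below are invented for illustration and are not taken from the MGED ontology.

```python
# A controlled vocabulary in its simplest form: a fixed list of allowed terms.
ORGANISM_PART_CV = {"liver", "kidney", "heart"}

# An ontology also records relationships between terms.
# Hypothetical "is_a" relations: child term -> parent term.
IS_A = {
    "liver": "organ",
    "kidney": "organ",
    "organ": "anatomical structure",
}


def ancestors(term):
    """Follow is_a links upwards and return every parent term in order."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_a(term, ancestor):
    """True if `ancestor` can be reached from `term` through is_a links."""
    return ancestor in ancestors(term)


print("liver" in ORGANISM_PART_CV)   # the CV can only say a term is allowed
print(ancestors("liver"))            # the ontology can also say what it is
```

With the relations in place a query for "organ" can retrieve samples annotated as "liver", which a flat list cannot support.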

Sample annotation – MGED BioMaterial Ontology
- Under construction by Chris Stoeckert (Univ. of Penn.) and MGED members
- Uses OilEd (rdf, daml and html files available)
- Motivated by MIAME and guided by 'case scenarios'
- Defines terms, provides constraints, develops CVs for sample annotation
- Also links to external CVs and ontologies
- Will be extended to other parts of a microarray experiment that need to be described

Sample annotation – MGED BioMaterial Ontology: an example
Sample source and treatment description, and its correct annotation using the MGED BioMaterial Ontology classes and corresponding external references:
"Seven week old C57BL/6N mice were treated with fenofibrate. Liver was dissected out, RNA prepared………"

MGED BioMaterial Ontology
- Classes: ©-BioMaterialDescription; ©-Biosource Property; ©-Organism; ©-Age; ©-DevelopmentStage; ©-Sex; ©-StrainOrLine; ©-BiosourceProvider; ©-OrganismPart; ©-BioMaterialManipulation; ©-EnvironmentalHistory; ©-CultureCondition; ©-Temperature; ©-Humidity; ©-Light; ©-PathogenTests; ©-Water; ©-Nutrients; ©-Treatment; ©-CompoundBasedTreatment (Compound) (Treatment_application) (Measurement)
- Instances: 7 weeks after birth; Female; Charles River, Japan; 22 ± 2 °C; 55 ± 5%; 12 hours light/dark cycle; Specified pathogen free conditions; ad libitum; MF, Oriental Yeast, Tokyo, Japan; in vivo, oral gavage; 100 mg/kg body weight
- External References: NCBI Taxonomy; Mouse Anatomical Dictionary; International Committee on Standardized Genetic Nomenclature for Mice; ChemIDplus
- Referenced terms: Mus musculus musculus id: ; Stage 28; C57BL/6; Liver; Fenofibrate, CAS
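One way to hold such an annotation is as a list of (class, value, source) records. The sketch below uses class names, values and sources that appear on the slide, but the pairing of each value with a class and a source is an assumption of this example, since the slide layout did not survive; the `Annotation` type is hypothetical.

```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    ontology_class: str            # class name from the ontology
    value: str                     # term or instance used for this sample
    source: Optional[str] = None   # external reference, if any


# Pairings below are assumed for illustration.
annotations = [
    Annotation("Organism", "Mus musculus musculus", "NCBI Taxonomy"),
    Annotation("Age", "7 weeks after birth"),
    Annotation("StrainOrLine", "C57BL/6",
               "International Committee on Standardized Genetic "
               "Nomenclature for Mice"),
    Annotation("OrganismPart", "Liver", "Mouse Anatomical Dictionary"),
    Annotation("Compound", "Fenofibrate", "ChemIDplus"),
]


def without_source(items):
    """Classes whose value is a local instance rather than an external term."""
    return [a.ontology_class for a in items if a.source is None]


print(without_source(annotations))
```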

Talk structure
- MIAME standard
- Sample annotation: MGED BioMaterial Ontology
- Uses of MIAME concepts:
  - ArrayExpress: a public repository for gene expression data
  - MIAMExpress: submission and annotation tool

Uses of MIAME concepts
- Specifies the content of the information:
  - Sufficient
  - Structured
- Uses:
  - Creation of MIAME-compliant LIMS or databases, e.g. ArrayExpress
  - Development of submission/annotation tools for generating MIAME-compliant information, e.g. MIAMExpress

ArrayExpress – data flow
[Diagram. Components shown: users; EBI web server (browse, query); ArrayExpress central database (data warehouse); curation database; image server; MAGE-ML loader and output; update; submissions from MIAMExpress and from LIMS.]

ArrayExpress - details
- ArrayExpress central database (data warehouse)
- Implementation in ORACLE of the MAGE-OM model:
  - Microarray gene expression - Object Model
  - OMG approved standard (MGED and Rosetta, 2001)
  - Model developed in UML
- Object model-based query mechanism:
  - Automatic mapping to SQL
- Independent of:
  - Experimental platform
  - Image analysis method
  - Normalization method
- MAGE-ML data loader:
  - Microarray gene expression - Mark-up Language, generated from the model

ArrayExpress – conceptual model
[Diagram: the MIAME 6 parts of a microarray experiment (Experiment, Array, Sample, Hybridisation, Final data, Normalisation).]

ArrayExpress – simplified model
- Classes are represented by boxes
- Classes describe objects
- Related classes are grouped together in packages
- MAGE-OM has 16 packages, ~150 tables

ArrayExpress - data (via MAGE-ML)
- Currently (available as example annotated and curated data sets):
  - Human data - EMBL (ironchip)
  - Yeast data - EMBL
  - S. pombe - Sanger Institute
- Near future:
  - Array descriptions - TIGR
  - Array description - Affymetrix
  - Mouse data - TIGR and HGMP
  - Anopheles data - EMBL
  - Direct pipeline - Sanger Institute LIMS
  - Data - DESPRAD partners
  - Toxicogenomics data - ILSI HESI

ArrayExpress – query interface
First release: 12 January 2002

ArrayExpress – link to Expression Profiler
[Diagram. Components shown:]
- EPCLUST: expression data
- GENOMES: sequence, function, annotation
- SPEXS: discover patterns
- PATMATCH: visualise patterns
- SEQLOGO
- URLMAP: provide links
- EP:GO: GeneOntology
- EP:PPI: protein-protein interactions
- External data, tools: pathways, function, etc.

ArrayExpress – curation effort
- User support and help documentation:
  - Ontologies and CVs
  - Minimize free text, removal of synonyms
  - Help on MAGE-ML format and MAGE-OM
- MIAME compliance check
- Curation at source (LIMS)
- Aim: to provide high-quality, well-annotated data and allow automated data analysis
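A compliance check of the kind listed above could, in its simplest form, look for required fields that were left empty. The sketch below is not the ArrayExpress check: the required-field table and all names are assumptions, loosely based on the parts of an experiment named earlier in the talk.

```python
# Hypothetical list of fields a submission must fill in, grouped by part.
REQUIRED_FIELDS = {
    "sample": ["source", "treatment", "extraction_protocol",
               "labeling_protocol"],
    "array": ["design", "element_locations", "element_descriptions"],
    "hybridisation": ["protocol"],
}


def compliance_report(submission):
    """Return 'part.field' for every required field that is missing or empty."""
    problems = []
    for part, names in REQUIRED_FIELDS.items():
        section = submission.get(part, {})
        for name in names:
            if not str(section.get(name, "")).strip():
                problems.append(f"{part}.{name}")
    return problems


# Invented submission: one empty field and one missing part.
submission = {
    "sample": {"source": "liver", "treatment": "fenofibrate",
               "extraction_protocol": "", "labeling_protocol": "P-2"},
    "array": {"design": "A-1", "element_locations": "layout file",
              "element_descriptions": "annotation file"},
}
print(compliance_report(submission))
```

A report like this can be returned to the submitter before a curator looks at the data, which fits the slide's emphasis on curation at source.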

MIAMExpress  Submission and annotation tool: Curators will monitor the submissions  Based on MIAME concepts: Experiment, Array and Protocol submissions Generates MIAME-compliant information  Uses MGED BioMaterial Ontology terms: Terms and required fields are explained  Allows user driven ontology development: User can provide new terms and their sources  Allows browsing: Array descriptions Protocols MIAMExpress - details

MIAMExpress  Version 1 launch in December 2002  Expected users: Limited local bioinformatic support No LIMS on site Small scale users with custom made arrays  Can be installed as local version: As a lab-book to annotate your experiment As part of a LIMS  Interfaces: Version 1 is general Future versions, application specific interfaces - Species specific - Toxicogenomics specific (ILSI- HESI) MIAMExpress - details

ArrayExpress - future
- Load public data into ArrayExpress: TIGR, EMBL, ILSI HESI, DESPRAD partners
- Improve query interfaces
- Launch MIAMExpress v.1 (Dec. 2002)
- MIAMExpress v.2:
  - Extended according to user needs
  - Integrated MGED ontology
  - Increased usability, flexibility and scalability
- Develop curation tools

Acknowledgments
- Microarray Informatics Team at EBI (19 members):
  - Alvis Brazma (Team Leader and MGED President)
  - Helen Parkinson (Curation Coordinator)
  - Mohammad Shojatalab (MIAMExpress Database Programmer)
  - Ugis Sarkans (ArrayExpress Database Development Coordinator)
  - Jaak Vilo (Expression Profiler)
  - Curators and Programmers
- MGED members and working groups:
  - Alvis Brazma (MGED President, MIAME)
  - Chris Stoeckert, U. Penn. (MGED Ontology Working Group)

Resources and messages
- Open source resources:
  - ArrayExpress and MIAMExpress schema, access to code
  - MIAME document and glossary
  - MAGE-ML DTD and annotation examples
  - MGED Ontology and other resources
- Be aware of MIAME!
  - Nature and Lancet have already expressed their interest
  - Funding agencies
- Join MGED meetings, tutorials and mailing lists:
  - MGED-5 meeting in Japan (Sept. 2002)
  - Ontology for BioSample description, EBI (Nov. 2002)