RADical microarray data: standards, databases, and analysis Chris Stoeckert, Ph.D. University of Pennsylvania Yale Microarray Data Analysis Workshop December.


RADical microarray data: standards, databases, and analysis Chris Stoeckert, Ph.D. University of Pennsylvania Yale Microarray Data Analysis Workshop December 5, 2003

Science 298: , 2002

Science 298: , 2002

Very few “stemness” genes were common between the two studies. Why?
- Inherent problem of testing the stemness hypothesis using a profiling approach?
  - Summary by Fortunel et al. (Science 2003), who did a third study and found only one common “stemness” gene.
- Or did experimental and computational differences reduce the overlap?
  - ~66% overlap if only the hematopoietic bone marrow samples are considered (Ivanova et al. Science 2003)

To compare experiments, you need some minimum information about the microarray experiments. MIAME formalizes that minimum information. (Ivanova et al. Science 2003)

MIAME and MAGE are Defined Standards from the Microarray Gene Expression Data (MGED) Society
- MIAME: a document that outlines the minimum information that should be reported about a microarray experiment to enable its unambiguous interpretation and reproduction
  Nature Genetics (2001), 29:
- MAGE: consists of three parts: an object model (MAGE-OM); a document exchange format derived directly from the object model (MAGE-ML); and software toolkits (MAGE-stk), which seek to enable users to create MAGE-ML
  Genome Biology (2002), 3: research
- In addition, the MGED Ontology provides the language (vocabulary and relationships) for MIAME and MAGE.
  Comparative & Functional Genomics (2003), 4:

Applying MGED Standards
(Legend: MIAME/MAGE info; MGED Ontology terms)
- Experiment design:
  - Name: cell_comparison_design
  - Type:
    - development_or_differentiation_design
    - species_design
    - cell_type_comparison_design
- Experiment Factors:
  - hematopoietic cell population (LT-HSC, ST-HSC, HSC, LCP, MBC)
    Type: BioMaterialCharacteristicCategory: targeted_cell_type
  - mouse developmental stage (fetal, adult)
    Type: BioMaterialCharacteristicCategory: developmental_stage
  - species (human, mouse)
    Type: BioMaterialCharacteristicCategory: organism
  - stem cell type (hematopoietic, embryonic, neural)
    Type: BioMaterialCharacteristicCategory: cell_type
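As a rough illustration of how the annotations on this slide could be held in a program, here is a minimal Python sketch. The class names `ExperimentFactor` and `ExperimentDesign` and the helper `factors_by_category` are hypothetical and are not part of MAGE-OM, MAGE-stk, or RAD; only the design name, design types, factor values, and ontology category terms come from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExperimentFactor:
    """One experimental factor and its MGED Ontology category (hypothetical class)."""
    name: str
    values: List[str]
    category: str  # a BioMaterialCharacteristicCategory term from the slide


@dataclass
class ExperimentDesign:
    """Design-level annotation for a study (hypothetical class, not MAGE-OM)."""
    name: str
    design_types: List[str]
    factors: List[ExperimentFactor] = field(default_factory=list)

    def factors_by_category(self, category: str) -> List[ExperimentFactor]:
        """Return the factors annotated with the given ontology category."""
        return [f for f in self.factors if f.category == category]


# Values below are copied from the slide; the structure is an assumption.
stem_cell_design = ExperimentDesign(
    name="cell_comparison_design",
    design_types=[
        "development_or_differentiation_design",
        "species_design",
        "cell_type_comparison_design",
    ],
    factors=[
        ExperimentFactor(
            "hematopoietic cell population",
            ["LT-HSC", "ST-HSC", "HSC", "LCP", "MBC"],
            "targeted_cell_type",
        ),
        ExperimentFactor(
            "mouse developmental stage", ["fetal", "adult"], "developmental_stage"
        ),
        ExperimentFactor("species", ["human", "mouse"], "organism"),
        ExperimentFactor(
            "stem cell type", ["hematopoietic", "embryonic", "neural"], "cell_type"
        ),
    ],
)
```

Because every factor carries a controlled-vocabulary category, two studies annotated this way can be compared by category (for example, `stem_cell_design.factors_by_category("organism")`) rather than by free-text matching.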

RAD Enables Use of MGED Standards
- RNA Abundance Database (RAD)
  - Can search for experiments/studies based on annotations
  - Graphs of the study are generated automatically
- RAD Study-Annotator for entering annotations
  - MIAME-based
  - Incorporates the MGED Ontology
- MR_T (MAGE-RAD Translator) for exporting in MAGE
- Get RAD
  - All source code available

RAD view of stem cell study

RAD Study-Annotator collects MIAME and Uses the MGED Ontology

RAD helps you publish!
[Diagram labels: Study-Annotator, RAD, MAGE-RAD Translator, ArrayExpress]
Journals are requiring deposition of microarray experiments in a public repository.

Patterns of Differential Gene Expression

PaGE
- PaGE stands for Patterns from Gene Expression.
  - A goal is to compare patterns across more than 2 groups to look at co-regulation.
  - Focuses on fold-change significance, as t-statistics are not really applicable to describing co-regulation.
  - PaGE was developed by our group at Penn! (Manduchi et al. Bioinformatics)
- PaGE uses the False Discovery Rate (FDR).
  - FDR = # false positives / (# false positives + # true positives)
  - PaGE takes a minimum confidence level as a parameter and finds all genes which exceed this confidence.
  - Each gene is reported with its own confidence. FDR = 1 - Confidence
- PaGE uses ratios of means: B/A, C/A, D/A, where A, B, C, and D are group means for each gene and A is the reference group.
  - Use permutations to generate the random distribution of ratios.
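The quantities named on this slide can be sketched in a few lines of Python. This is only an illustration of the slide's definitions (ratio of group means, a permutation-based random distribution of ratios, and FDR = 1 - confidence); it is not PaGE's actual implementation, and the function names and the toy intensity values are assumptions made for the example. It assumes strictly positive intensities so that the ratios are defined.

```python
import random
from statistics import mean


def ratio_of_means(group, reference):
    """Ratio of a group's mean intensity to the reference group's mean."""
    return mean(group) / mean(reference)


def permuted_ratios(group, reference, n_perm=1000, seed=0):
    """Random distribution of ratios obtained by shuffling the group labels."""
    rng = random.Random(seed)
    pooled = list(group) + list(reference)
    k = len(group)
    ratios = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # First k values play the role of the group, the rest the reference.
        ratios.append(mean(pooled[:k]) / mean(pooled[k:]))
    return ratios


def fdr(false_positives, true_positives):
    """FDR = # false positives / (# false positives + # true positives)."""
    total = false_positives + true_positives
    return false_positives / total if total else 0.0


def confidence(false_positives, true_positives):
    """Confidence as used on the slide: 1 - FDR."""
    return 1.0 - fdr(false_positives, true_positives)


# Toy intensities for one gene: group B versus reference group A (made up).
group_b = [4.0, 6.0]
group_a = [1.0, 3.0]
observed = ratio_of_means(group_b, group_a)
null_ratios = permuted_ratios(group_b, group_a, n_perm=200)
```

Comparing `observed` against `null_ratios` across all genes gives an estimate of how many genes would pass a given ratio cutoff by chance, which is the false-positive count that feeds the FDR formula.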

Mouse Hematopoietic Stem Cell PaGEs
[Figure labels: Group A/0, Group B/1, Group C/2, Group D/3]

Mouse Hematopoietic Stem Cell PaGEs

StemCellDB: Available real soon!

Summary
- Standards
  - Using MIAME, MAGE, and the MGED Ontology improves your experiment
- Databases
  - Databases like RAD facilitate using standards
- Analysis
  - PaGE provides profiles using differential expression with False Discovery Rate based on ratios.

Acknowledgements
- MGED
  - MIAME, MAGE, and Ontology Working Groups
- RAD
  - Elisabetta Manduchi, Trish Whetzel, Junmin Liu, Angel Pizarro, Greg Grant, Hongxian He, Matt Mailman
- PaGE
  - Greg Grant, Junmin Liu, Elisabetta Manduchi
- Stem cells
  - Ihor Lemischka, Kateri Moore, Natalia Ivanova, Jason Hackney, Laurie Kramer
  - Hongxian He, Greg Grant, Lyle Ungar