RADical microarray data: standards, databases, and analysis
  Chris Stoeckert, Ph.D.
  University of Pennsylvania
  Yale Microarray Data Analysis Workshop
  December 5, 2003
Science 298: , 2002
Science 298: , 2002
Very few “stemness” genes were common between the two studies. Why?
  Inherent problem of testing the stemness hypothesis using a profiling approach?
    Summary by Fortunel et al. (Science 2003), who did a third study and found only one common “stemness” gene.
  Or did experimental and computational differences reduce the overlap?
    ~66% overlap if only the hematopoietic bone marrow samples are considered (Ivanova et al. Science 2003)
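One way to make "overlap" between two gene lists concrete is sketched below. This is a minimal illustration, not the calculation used in any of the cited studies: the function name `overlap_fraction`, the choice to normalize by the smaller list, and the gene identifiers are all assumptions made for the example.

```python
def overlap_fraction(list_a, list_b):
    """Fraction of the smaller gene list that also appears in the other list.

    Normalizing by the smaller list is an assumption; other definitions
    (for example, dividing by the union) give different percentages.
    """
    a, b = set(list_a), set(list_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical placeholder identifiers, not genes from either study.
study_1 = ["g1", "g2", "g3", "g4", "g5", "g6"]
study_2 = ["g2", "g3", "g7"]

# Two of the three genes in the smaller list are shared.
print(overlap_fraction(study_1, study_2))
```

Because the result depends on the definition and on which samples feed each list, the samples and the computation both need to be reported before two overlap figures can be compared.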
To compare experiments, you need some minimum information about the microarray experiments. MIAME formalizes that minimum information.
  (Ivanova et al. Science 2003)
MIAME and MAGE are Defined Standards from the Microarray Gene Expression Data (MGED) Society
  MIAME - a document that outlines the minimum information that should be reported about a microarray experiment to enable its unambiguous interpretation and reproduction
    Nature Genetics (2001), 29:
  MAGE - consists of three parts: an object model (MAGE-OM); a document exchange format derived directly from the object model (MAGE-ML); and software toolkits (MAGE-stk), which seek to enable users to create MAGE-ML
    Genome Biology (2002), 3: research
  In addition, the MGED Ontology provides the language (vocabulary and relationships) for MIAME and MAGE.
    Comparative & Functional Genomics (2003), 4:
Applying MGED Standards
  Experiment design:
    Name: cell_comparison_design
    Type:
      development_or_differentiation_design
      species_design
      cell_type_comparison_design
  Experiment Factors:
    hematopoietic cell population (LT-HSC, ST-HSC, HSC, LCP, MBC)
      Type: BioMaterialCharacteristicCategory: targeted_cell_type
    mouse developmental stage (fetal, adult)
      Type: BioMaterialCharacteristicCategory: developmental_stage
    species (human, mouse)
      Type: BioMaterialCharacteristicCategory: organism
    stem cell type (hematopoietic, embryonic, neural)
      Type: BioMaterialCharacteristicCategory: cell_type
  Legend: MIAME/MAGE info; MGED Ontology terms
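The annotations on this slide can be pictured as a small data structure. The sketch below is a hypothetical Python representation with a simple completeness check. The key names (`name`, `types`, `factors`, `category`, `values`) and the function `missing_annotations` are illustrative assumptions; they are not MAGE-OM class names or the RAD schema. Only the annotation values come from the slide.

```python
# Hypothetical in-memory form of the slide's annotations.
# Key names are illustrative; values are the terms shown on the slide.
experiment_design = {
    "name": "cell_comparison_design",
    "types": [
        "development_or_differentiation_design",
        "species_design",
        "cell_type_comparison_design",
    ],
    "factors": [
        {"name": "hematopoietic cell population",
         "category": "targeted_cell_type",
         "values": ["LT-HSC", "ST-HSC", "HSC", "LCP", "MBC"]},
        {"name": "mouse developmental stage",
         "category": "developmental_stage",
         "values": ["fetal", "adult"]},
        {"name": "species",
         "category": "organism",
         "values": ["human", "mouse"]},
        {"name": "stem cell type",
         "category": "cell_type",
         "values": ["hematopoietic", "embryonic", "neural"]},
    ],
}


def missing_annotations(design):
    """Return a list of problems; the list is empty if nothing is missing."""
    problems = []
    for key in ("name", "types", "factors"):
        if not design.get(key):
            problems.append(f"missing {key}")
    for factor in design.get("factors", []):
        for key in ("name", "category", "values"):
            if not factor.get(key):
                problems.append(f"factor {factor.get('name', '?')!r} missing {key}")
    return problems


print(missing_annotations(experiment_design))
```

A check of this kind is one way to catch an incomplete description before submission; it does not verify that the terms are valid MGED Ontology terms.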
RAD Enables Use of MGED Standards
  RNA Abundance Database (RAD)
    Can search for experiments/studies based on annotations
    Graphs of a study are generated automatically
  RAD Study-Annotator for entering annotations
    MIAME-based
    Incorporates the MGED Ontology
  MR_T for exporting in MAGE
  Get RAD
    All source code available
RAD view of stem cell study
RAD Study-Annotator collects MIAME information and uses the MGED Ontology
RAD helps you publish!
  Journals are requiring deposition of microarray experiments in a public repository.
  [Diagram labels: Study-Annotator, RAD, MAGE-RAD Translator, ArrayExpress]
Patterns of Differential Gene Expression
PaGE
  PaGE stands for Patterns from Gene Expression.
    A goal is to compare patterns across more than 2 groups to look at co-regulation.
    Focuses on fold-change significance, as t-statistics are not really applicable to describing co-regulation
    PaGE was developed by our group at Penn!
      Manduchi et al. Bioinformatics
  PaGE uses the False Discovery Rate (FDR).
    FDR = # false positives / (# false positives + # true positives)
    PaGE takes a minimum confidence level as a parameter, and finds all genes which exceed this confidence.
    Each gene is reported with its own confidence. FDR = 1 - Confidence
  PaGE uses ratios of means: B/A, C/A, D/A
    Where A, B, C, and D are group means for each gene and A is the reference group.
    Use permutations to generate the random distribution of ratios.
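The idea of estimating an FDR from permuted ratios of means can be sketched in a few lines. This is not the PaGE implementation, and the published algorithm differs in its details. It is a minimal illustration that assumes positive intensities, a single comparison group against the reference, and one fixed ratio cutoff. The function names, the `cutoff` parameter, and the toy data are hypothetical.

```python
import random
from statistics import mean


def ratio_of_means(values, labels, group, reference):
    """Mean of `group` samples divided by mean of `reference` samples."""
    num = mean(v for v, lab in zip(values, labels) if lab == group)
    den = mean(v for v, lab in zip(values, labels) if lab == reference)
    return num / den


def estimate_fdr(data, labels, group, reference, cutoff, n_perm=200, seed=0):
    """Estimate the FDR for calling a gene 'up' when its ratio >= cutoff.

    data   : dict mapping gene name -> list of intensities, one per sample
    labels : list of group labels, one per sample
    Returns None when no gene passes the cutoff (the FDR is then undefined).
    """
    rng = random.Random(seed)
    observed = sum(
        ratio_of_means(v, labels, group, reference) >= cutoff
        for v in data.values()
    )
    if observed == 0:
        return None
    shuffled = list(labels)
    false_calls = 0
    for _ in range(n_perm):
        # Shuffling labels keeps group sizes but breaks any real group effect.
        rng.shuffle(shuffled)
        false_calls += sum(
            ratio_of_means(v, shuffled, group, reference) >= cutoff
            for v in data.values()
        )
    expected_false = false_calls / n_perm  # average calls under the null
    return min(1.0, expected_false / observed)


# Toy data with hypothetical gene names; A is the reference group.
labels = ["A", "A", "A", "B", "B", "B"]
data = {
    "gene_up": [1.0, 1.1, 0.9, 4.0, 4.2, 3.8],
    "gene_flat": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
}
fdr = estimate_fdr(data, labels, group="B", reference="A", cutoff=3.0)
print("FDR:", fdr, "confidence:", 1 - fdr)
```

In this sketch the confidence is simply 1 - FDR at the chosen cutoff, matching the relation on the slide. Reporting a confidence per gene, as the slide describes, would need the estimate repeated over a range of cutoffs.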
Mouse Hematopoietic Stem Cell PaGEs
  [Figure labels: Group A/0, Group B/1, Group C/2, Group D/3]
Mouse Hematopoietic Stem Cell PaGEs
StemCellDB: Available real soon!
Summary
  Standards
    Using MIAME, MAGE, and the MGED Ontology improves your experiment
  Databases
    Databases like RAD facilitate using standards
  Analysis
    PaGE provides profiles using differential expression with False Discovery Rate based on ratios.
Acknowledgements
  MGED
    MIAME, MAGE, and Ontology Working Groups
  RAD
    Elisabetta Manduchi, Trish Whetzel, Junmin Liu, Angel Pizarro, Greg Grant, Hongxian He, Matt Mailman
  PaGE
    Greg Grant, Junmin Liu, Elisabetta Manduchi
  Stem cells
    Ihor Lemischka, Kateri Moore, Natalia Ivanova, Jason Hackney, Laurie Kramer
    Hongxian He, Greg Grant, Lyle Ungar