Department of Statistics, University of California, Berkeley, and Division of Genetics and Bioinformatics, Walter and Eliza Hall Institute of Medical Research.

Presentation transcript:

Statistical Analysis of cDNA microarrays II
Terry Speed

Outline
- Different types of questions asked in microarray experiments
- Cluster analysis
- Single gene method
- A synthesis

Gene Expression Data
Gene expression data on p genes for n samples, arranged as a matrix with genes as rows and mRNA samples (sample1, sample2, sample3, sample4, sample5, …) as columns.
Gene expression level of gene i in mRNA sample j =
- Log(Red intensity / Green intensity) (two-colour cDNA arrays), or
- Log(Avg. PM - Avg. MM) (oligonucleotide arrays with perfect-match and mismatch probes)

Experiments, horses for courses
mRNA levels compared in many different contexts:
- Tumour cell lines
- Different tissues, same organism
- Same tissue, different organisms (wt, ko, tg)
- Same tissue, same organism (trt vs ctl)
- Time course experiments
No single method of analysis can be appropriate for all. Rather, each type of experiment requires its own analysis.

Cluster Analysis
- Can cluster genes, cell samples, or both.
- Strengthens signal when averages are taken within clusters of genes (Eisen).
- Useful (essential?) when seeking new subclasses of cells, tumours, etc.
- Leads to readily interpreted figures.

Clusters
Figure taken from Alizadeh, A. et al., "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling", Nature, February 2000.

Discovering sub-groups

Which genes have changed?
This is a common enough question. We will illustrate one approach when replicates are available.
GOAL: Identify genes with altered expression in the livers of one line of mice with very low HDL cholesterol levels compared to inbred control mice.
Experiment: Apo AI knock-out mouse model
- 8 knockout (ko) mice and 8 control (ctl) mice (C57Bl/6).
- 16 hybridisations: mRNA from each of the 16 mice is labelled with Cy5, pooled mRNA from control mice is labelled with Cy3.
- Probes: ~6,000 cDNAs, including 200 related to lipid metabolism.

Which genes have changed?
1. For each gene and each hybridisation (8 ko + 8 ctl), use $M = \log_2(R/G)$.
2. For each gene, form the t statistic
   $$t = \frac{\bar{M}_{ko} - \bar{M}_{ctl}}{\sqrt{\tfrac{1}{8}\left(s_{ko}^2 + s_{ctl}^2\right)}}$$
   where $\bar{M}_{ko}$ and $\bar{M}_{ctl}$ are the averages of the 8 ko Ms and the 8 ctl Ms, and $s_{ko}$ and $s_{ctl}$ are the SDs of the 8 ko Ms and the 8 ctl Ms.
3. Form a histogram of the 6,000 t values.
4. Do a normal Q-Q plot; look for values "off the line".
5. Adjust for multiple testing.
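Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `log_ratio` and `t_statistic` are hypothetical, and the code uses the general unequal-sample-size form of the standard error, which reduces to the formula on the slide when both groups contain 8 mice.

```python
import math
from statistics import mean, stdev


def log_ratio(red, green):
    """Return M = log2(R/G) for one spot (hypothetical helper)."""
    return math.log2(red / green)


def t_statistic(ko, ctl):
    """Two-sample t statistic with a separate variance estimate per group.

    ko and ctl are lists of M values for one gene. With 8 values in each
    list the denominator equals sqrt(1/8 * (SD_ko^2 + SD_ctl^2)).
    """
    se = math.sqrt(stdev(ko) ** 2 / len(ko) + stdev(ctl) ** 2 / len(ctl))
    return (mean(ko) - mean(ctl)) / se
```

Applying `t_statistic` to every gene in turn gives the ~6,000 t values that go into the histogram and the Q-Q plot.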

Histogram: Apo AI

Plot of t-statistics

Assigning p-values to measures of change
- Estimate p-values for each comparison (gene) by using the permutation distribution of the t-statistics.
- For each possible permutation of the trt / ctl labels, compute the two-sample t-statistic t* for each gene.
- The unadjusted p-value for a particular gene is estimated by the proportion of permutations for which |t*| exceeds the observed |t|.
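The permutation procedure for a single gene might look like the sketch below. It is not the authors' code: `permutation_p_value` is a hypothetical name, the enumeration covers every way of assigning the group labels (12,870 relabellings for 8 vs 8), and the comparison uses `>=` rather than a strict inequality so that the observed labelling counts and the estimate is never zero.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def t_statistic(a, b):
    """Two-sample t statistic with a separate variance estimate per group."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se


def permutation_p_value(ko, ctl):
    """Unadjusted two-sided permutation p-value for one gene.

    Assumes no relabelling produces two groups that both have zero
    variance (that would make the t statistic undefined).
    """
    values = list(ko) + list(ctl)
    n = len(values)
    observed = abs(t_statistic(ko, ctl))
    count = 0
    total = 0
    # Each combination picks which samples get the first group's label.
    for idx in combinations(range(n), len(ko)):
        chosen = set(idx)
        a = [values[i] for i in idx]
        b = [values[i] for i in range(n) if i not in chosen]
        total += 1
        if abs(t_statistic(a, b)) >= observed:
            count += 1
    return count / total
```

With 16 hybridisations the full enumeration is cheap for one gene; for thousands of genes a random subset of relabellings is a common alternative.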

Multiple Testing
Problem: We have just performed ~6000 tests!
=> need to control the family-wise false positive rate (Type I error).
=> use adjusted p-values.
- Bonferroni adjustment. Multiply p-values by number of tests. Too conservative, doesn't take into account the dependence structure between the genes.
- Westfall & Young. Estimate adjusted p-values using the permutation distribution of statistics which take into account the dependence structure between the genes. Less conservative.
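Both adjustments can be illustrated with a short sketch. The slide does not say which Westfall & Young variant was used, so the code below assumes the simpler single-step maxT version: for each relabelling it records the largest |t*| over all genes, and a gene's adjusted p-value is the proportion of relabellings whose maximum reaches that gene's observed |t|. The names `bonferroni` and `single_step_max_t` are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def t_statistic(a, b):
    """Two-sample t statistic with a separate variance estimate per group."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def single_step_max_t(data, n_ko):
    """Single-step maxT adjusted p-values (an assumed Westfall & Young variant).

    data is a list of genes; each gene is a list of M values whose first
    n_ko entries are the ko samples and the rest the ctl samples.
    """
    n = len(data[0])
    observed = [abs(t_statistic(g[:n_ko], g[n_ko:])) for g in data]
    exceed = [0] * len(data)
    total = 0
    for idx in combinations(range(n), n_ko):
        chosen = set(idx)
        rest = [i for i in range(n) if i not in chosen]
        # The same relabelling is applied to every gene, which is how the
        # dependence between genes enters the null distribution.
        max_t = max(
            abs(t_statistic([g[i] for i in idx], [g[i] for i in rest]))
            for g in data
        )
        total += 1
        for k, obs in enumerate(observed):
            if max_t >= obs:
                exceed[k] += 1
    return [c / total for c in exceed]
```

Because every gene is compared with the same maximum, genes with larger observed |t| always receive adjusted p-values that are no larger than those of genes with smaller |t|.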

Apo AI: Adjusted and unadjusted p-values for the 50 genes with the largest absolute t-statistics.

Apo AI: Genes with adjusted p-value < 0.01

Limitations
Cluster analyses:
1) Usually outside the normal framework of statistical inference.
2) Less appropriate when only a few genes are likely to change.
3) Need lots of experiments.
Single gene tests:
1) May be too noisy in general to show much.
2) May not reveal coordinated effects of positively correlated genes.
3) Hard to relate to pathways.

A synthesis
- We and others (Stanford) are working on methods which try to combine the best of both of the preceding approaches.
- Try to find clusters of genes and average their responses to reduce noise and enhance interpretability.
- Use testing to assign significance with averages of clusters of genes as we did with single genes.

Clustering genes
Let p = number of genes.
1. Calculate within class correlation.
2. Perform hierarchical clustering which will produce (2p-1) clusters of genes.
3. Average within clusters of genes.
4. Perform testing on averages of clusters of genes as if they were single genes.
E.g. p=5: genes 1 to 5 are clusters 1 to 5, and the merges give
- Cluster 6 = (1,2)
- Cluster 7 = (1,2,3)
- Cluster 8 = (4,5)
- Cluster 9 = (1,2,3,4,5)
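Steps 1 to 3 can be sketched as follows. Several choices here are assumptions, not taken from the slides: "within class correlation" is read as the correlation of gene profiles after centring each gene separately within the ko and ctl classes, the clustering uses average linkage, and all function names are hypothetical. Genes are indexed from 0 in the code, whereas the slide numbers them from 1.

```python
import math
from statistics import mean


def center_within_class(profile, n_ko):
    """Subtract the class mean from each value (first n_ko values are ko)."""
    ko, ctl = profile[:n_ko], profile[n_ko:]
    return [x - mean(ko) for x in ko] + [x - mean(ctl) for x in ctl]


def correlation(x, y):
    """Correlation of two profiles that are already centred."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


def hierarchical_clusters(data, n_ko):
    """Return the 2p-1 clusters: p singletons followed by p-1 merges."""
    resid = [center_within_class(g, n_ko) for g in data]
    p = len(data)
    corr = [[correlation(resid[i], resid[j]) for j in range(p)] for i in range(p)]
    clusters = [(i,) for i in range(p)]
    active = list(clusters)
    while len(active) > 1:
        best = None
        for a in range(len(active)):
            for b in range(a + 1, len(active)):
                # Average linkage: mean correlation over all cross pairs.
                link = mean(corr[i][j] for i in active[a] for j in active[b])
                if best is None or link > best[0]:
                    best = (link, a, b)
        _, a, b = best
        merged = tuple(sorted(active[a] + active[b]))
        active = [c for k, c in enumerate(active) if k not in (a, b)] + [merged]
        clusters.append(merged)
    return clusters


def cluster_averages(data, clusters):
    """Average the member genes of each cluster, sample by sample."""
    n = len(data[0])
    return [[mean(data[g][s] for g in c) for s in range(n)] for c in clusters]
```

Each row returned by `cluster_averages` can then be tested exactly like a single gene (step 4), for example with a two-sample t statistic and permutation-based adjusted p-values.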

Data - Ro1
Transgenic mice with a modified Gi-coupled receptor (Ro1).
Experiment: induced expression of Ro1 in mice.
- 8 control (ctl) mice
- 9 treatment mice, eight weeks after Ro1 was induced.
Long-term question: Which groups of genes work together?
Based on paper: "Conditional expression of a Gi-coupled receptor causes ventricular conduction delay and a lethal cardiomyopathy", see Redfern C. et al., PNAS, April 25; also (Conklin lab, UCSF).

Histogram: Cluster of genes (1703, 3754)

Top 15 averages of gene clusters
Table columns: Group ID, T, Correlation.
Clusters listed:
- (1703, 3754)
- (6194, 1703, 3754)
- (4572, 4772, 5809)
- (2534, 1343, 1954)
- (6089, 5455, 3236, 4014)
Annotation: Might be influenced by 3754.

Limitation
- Hard to extend this method to negatively correlated clusters of genes. Need to consider together with other methods.
- Need to identify high averages of clusters of genes that are due to high averages from sub-clusters of those genes.

Acknowledgments
- Yee Hwa Yang
- Sandrine Dudoit
- Natalie Roberts
- Ben Bolstad
- Ingrid Lonnstedt
- Karen Vranizan
- WEHI Bioinformatics group
- Matt Callow (LBL)
- Bruce Conklin (UCSF)