P. J. Munson, National Institutes of Health, Nov. 2001. A "Consistency" Test for Determining the Significance of Gene Expression Changes on Replicate Samples and Two Convenient Variance-stabilizing Transformations

Page 1: A "Consistency" Test for Determining the Significance of Gene Expression Changes on Replicate Samples and Two Convenient Variance-stabilizing Transformations
Peter J. Munson, Ph.D.
Mathematical and Statistical Computing Laboratory, DCB, CIT, NIH

Page 2: Introduction
Math. Stat. Comp. Lab. at NIH
Run Affy LIMS database
- Started Dec 2000, stores >700 chips
- Serves 3 core facilities at NIH
Study 1
- 2 treatments, 5 time points, 6 subjects, 60 U95A chips, PBMC cells
Study 2
- 3 treatments, 5 time points, 5 subjects, 75 Hu6800 chips, human cells in culture
Study 3
- 4 doses, 2 time points, 20 subjects, 20 RG U34A chips, blood cells

Page 3: Outline
Development of Consistency Test
Variance-stabilizing transforms
- Generalized Logarithm, GLog
- Adaptive transform for Average Diff, TAD
Normalization
- Normal quantile + adaptive transform
Application
Probe-pair data visualization:
- Parallel Axis Coordinate Display

Page 4: Comparing Two Cell Lines
Data from Carlisle, et al., Mol. Carcinogen., 2000
Don’t subtract background
Ignore background-level points
Calibrate on median intensity of each cell type
Over 3-fold change = outside dashed lines
Are these expression level changes significant? Real?

Page 5: Duplicate Experiments and "Consistency" Plot Identifies Real Changes in Expression
[Figure: labeled points are Vimentin and Keratin 5]

Page 6: Replication Permits Calculation of Significance (P-values)
4 false positives out of 5760 spots: P ≈ 4/5760 ≈ 0.0007

Page 7: Consistency Plot
Compare duplicate experiments on the log ratio scale
Set cutoffs for over- and under-expression
Calculate the number detected, D
Assuming independence, calculate the expected number, E, above both or below both cutoffs
Estimate the false positive rate, E/D
Plot annotations: D = 24, E = 0.6, E/D ≈ 3%; D = 24; D = 16 (one further E value is missing in the source)
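The counting procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function name `consistency_counts`, the use of strict inequalities at the cutoffs, and the pooling of the "above both" and "below both" regions into a single D and E are all assumptions.

```python
import numpy as np

def consistency_counts(x1, x2, hi, lo):
    """Count consistent calls in two replicate log-ratio vectors.

    Returns (D, E, E/D): the number detected beyond the same cutoff in both
    replicates, the number expected under independence, and their ratio.
    Pooling the up and down regions is an assumption of this sketch.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n = x1.size
    up1, up2 = x1 > hi, x2 > hi          # over-expressed in each replicate
    dn1, dn2 = x1 < lo, x2 < lo          # under-expressed in each replicate
    detected = int(np.sum(up1 & up2) + np.sum(dn1 & dn2))
    # Under independence the joint probability is the product of the marginals.
    expected = n * (up1.mean() * up2.mean() + dn1.mean() * dn2.mean())
    rate = expected / detected if detected else float("nan")
    return detected, expected, rate
```

With the values annotated on the slide, E/D = 0.6/24 = 2.5%, which the slide reports as 3%.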

Page 8: p53 +/+ cells, 6 hrs, replicate reciprocal experiment

Page 9: Consistency Test on Relative Expression
DEFINE:
$x(g, i)$ = relative expression value for gene $g$ ($= 1, \ldots, n$) in experiment $i$ ($= 1, \ldots, m$)
$F_i(X)$ = empirical cdf of $x_i$ across genes (spots)
$c = \min_j x(g, j)$, across experiments
THEN, assuming that $\{ x(g, i),\ g = 1, \ldots, n \}$ are an independent sample from distribution $F_i$, the probability that $x(g, i)$ is consistently large is:
$p_{up}(g) = \Pr(X_i \ge c \text{ for all } i) = \prod_i (1 - F_i(c))$

Page 10: Consistency Test on Relative Expression - 2
DEFINE:
$x(g, i)$ = relative expression value for gene $g$ ($= 1, \ldots, n$) in experiment $i$ ($= 1, \ldots, m$)
$p_{up}(g) = \prod_i \left(1 - F_i(\min_j x(g, j))\right)$
$p_{dn}(g) = \prod_i F_i(\max_j x(g, j))$
THEN the expected number of false positives is:
$E(g) = n \cdot p(g)$
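The definitions above translate fairly directly into array code. The sketch below is an illustration under stated assumptions rather than the author's implementation: the name `consistency_pvalues` is hypothetical, and the upper tail $1 - F_i(c)$ is estimated as the fraction of genes with value greater than or equal to $c$, so that the gene's own value is counted and the product is never zero. The lower tail uses the empirical cdf as written, the fraction of genes less than or equal to the maximum.

```python
import numpy as np

def consistency_pvalues(x):
    """Consistency-test p-values for an (n genes, m experiments) array.

    Returns (p_up, p_dn, e_up, e_dn), where e = n * p is the expected number
    of false positives at each gene's own threshold.
    """
    x = np.asarray(x, dtype=float)
    n, m = x.shape
    c_up = x.min(axis=1)      # c = min_j x(g, j): smallest value per gene
    c_dn = x.max(axis=1)      # max_j x(g, j): largest value per gene
    p_up = np.ones(n)
    p_dn = np.ones(n)
    for i in range(m):
        col = np.sort(x[:, i])
        # Fraction of genes in experiment i with value >= c_up (ties counted;
        # this tie convention is an assumption of the sketch).
        p_up *= (n - np.searchsorted(col, c_up, side="left")) / n
        # Empirical cdf F_i at c_dn: fraction of genes with value <= c_dn.
        p_dn *= np.searchsorted(col, c_dn, side="right") / n
    return p_up, p_dn, n * p_up, n * p_dn
```

A gene that ranks highest in every one of m experiments gets p_up = (1/n)^m, so with n = 5760 spots and m = 2 replicates the expected number of false positives at that threshold is 1/5760.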

Page 11: Assumptions of Consistency Test
Independence between experiments
“Exchangeability” of genes
Homogeneity of variance across genes (i.e. across expression intensity)
Does NOT require:
- Identical distribution in separate experiments
But variance homogeneity is violated for Affy Avg. Diff. data

Page 12: Variance Stabilizing Transformations
Logarithm
Box-Cox, power
Generalized Logarithm, GLog
Adaptive, TAD

Page 13: Model Variance as Function of Mean AD

Page 14: Model Variance as Function of Mean AD
$\mathrm{Var}(y) = a_0$
$\mathrm{Var}(y) = a_0 + a_1 y$
$\mathrm{Var}(y) = a_0 + a_1 y + a_2 y^2$
$\mathrm{Var}(y) = a_2 y^2$ => use logarithms
What about: $\mathrm{Var}(y) = a_0 + a_2 y^2$?

Page 15: Generalized Log Transform (G-Log)
$\mathrm{Var}(y) = a_0 + a_2 y^2 = a_0 \left(1 + (y/c)^2\right)$
where $c = \sqrt{a_0 / a_2}$ = (s.d. at $y = 0$) / CV, e.g. $c = 10 / 0.1 = 100$
$\mathrm{GLog}(y; c) = \mathrm{sign}(y) \, \ln\left\{ |y/c| + \sqrt{1 + y^2/c^2} \right\}$
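A short sketch of the G-Log formula may help when checking it numerically. The function names `glog` and `glog_scale` are illustrative, not from the slides. As written, the formula is algebraically the inverse hyperbolic sine of y/c, which gives a convenient cross-check.

```python
import numpy as np

def glog_scale(sd_at_zero, cv):
    """Scale c = sqrt(a0 / a2) = (s.d. at y = 0) / CV."""
    return sd_at_zero / cv

def glog(y, c):
    """GLog(y; c) = sign(y) * ln(|y/c| + sqrt(1 + y^2 / c^2))."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.log(np.abs(y / c) + np.sqrt(1.0 + (y / c) ** 2))
```

For example, `glog(y, glog_scale(10, 0.1))` uses the slide's c = 100. The transform is approximately linear (y/c) for |y| much smaller than c and approximately logarithmic (ln(2y/c)) for y much larger than c, and unlike the plain logarithm it is defined for zero and negative Average Difference values.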

Page 16: Quantile Normalization for AD (before)

Page 17: Quantile Normalization for AD (after)

Page 18: Normal Quantile Transform after GLog(AD) (it’s almost linear)
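The slides do not spell out how the normal quantile transform, NQ, is computed. One common construction, sketched here as an assumption, replaces each value on a chip by the standard normal quantile of its rank. The function name, the plotting position (rank - 0.5)/n, and the arbitrary breaking of ties are choices made for this sketch only.

```python
import numpy as np
from statistics import NormalDist

def normal_quantile_transform(values):
    """Map the values of one chip to standard normal quantiles of their ranks."""
    v = np.asarray(values, dtype=float)
    n = v.size
    ranks = np.empty(n)
    # Rank 1 = smallest value; ties are broken by position (an assumption).
    ranks[np.argsort(v, kind="stable")] = np.arange(1, n + 1)
    probs = (ranks - 0.5) / n            # plotting positions strictly in (0, 1)
    nd = NormalDist()                    # standard normal, mean 0 and s.d. 1
    return np.array([nd.inv_cdf(p) for p in probs])
```

Because the output depends only on ranks, every chip transformed this way ends up with the same set of values, which is what makes the chips directly comparable after normalization.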

Page 19: Adaptive Transform of AD (TAD) - 1
Model variance (over many replicates) vs. mean AD
Plot: Log(SD), or the Wilson-Hilferty transform SD^(2/3), vs. mean of NQ(AD)
Fit a smooth function, g, which predicts SD

Page 20: Adaptive Transform of AD (TAD) - 2
$T(X) = \int_{-\infty}^{X} \frac{1}{g(u)} \, du$
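The two TAD steps, fitting g and integrating 1/g, can be sketched numerically. Several choices below are assumptions, not the author's method: a low-degree polynomial fitted to log(SD) stands in for the unspecified smooth function g, the lower integration limit is replaced by the smallest observed mean (which changes T only by an additive constant), and the integral is evaluated with the trapezoid rule on a grid. The name `fit_tad` is hypothetical.

```python
import numpy as np

def fit_tad(means, sds, degree=2, grid_size=512):
    """Build a variance-stabilizing transform T(x) = integral of 1/g.

    means, sds: per-gene mean and standard deviation over replicates
    (all sds must be positive). Returns a function T that accepts arrays.
    """
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    # Smooth model for SD: polynomial in the mean on the log(SD) scale,
    # so the predicted SD, g, is always positive.
    coef = np.polyfit(means, np.log(sds), degree)
    grid = np.linspace(means.min(), means.max(), grid_size)
    g = np.exp(np.polyval(coef, grid))
    integrand = 1.0 / g
    # Cumulative trapezoid rule, starting from T = 0 at the lowest grid point.
    steps = np.diff(grid) * 0.5 * (integrand[1:] + integrand[:-1])
    t_grid = np.concatenate(([0.0], np.cumsum(steps)))

    def transform(x):
        # np.interp holds the end values, so inputs outside the fitted
        # range are clamped rather than extrapolated.
        return np.interp(x, grid, t_grid)

    return transform
```

If g models the SD well, a difference of one unit on the transformed scale corresponds to roughly one standard deviation at any intensity, which is the property the consistency test's variance-homogeneity assumption needs.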

Page 21: Adaptive Transform of AD (TAD)

Page 22: Consistency Test p-values
[Figure panel labels: Time 1 vs. Time 0, Time 2 vs. Time 0; Treatment, Sham]

Page 23: Results of Study 1 (5 time points, 2 treatments, 6 subjects)

Page 24: Probe Pair Data, Delta TAD = 2
Parallel Axis Coordinate Display

Page 25: Probe Pair Data, Delta TAD = 0.5

Page 26: Probe Pair Data, Delta TAD = -1.5

Page 27: Probe Pair Data, Delta TAD = -0.5

Page 28: Acknowledgements
MSCL: Lynn Young, Vinay Prabhu, Jennifer Barb, Howard Shindel
CIT: Andrew Schwartz, Steve Bailey
CC: Robert Danner, Anthony Suffredini, Peter Eichacker, James Shelhamer, Eric Gerstenberger
NCI: Sayed Daoud, Yves Pommier, John Weinstein, David Krizman, Alex Carlisle
UC Davis: David Rocke