1-18-2005. Randomization issues. Two-sample t-test vs paired t-test. I made a mistake in creating the dataset, so previous analyses will not be comparable.

Presentation transcript:

Outline
- Randomization issues
- Two-sample t-test vs paired t-test
- I made a mistake in creating the dataset, so previous analyses will not be comparable with new ones
- Issues with background subtraction
- limma as the general tool for analyzing microarray data

limma
limma is a package for the analysis of microarray data, especially the use of linear models for analyzing designed experiments and the assessment of differential expression.
- Specially constructed data objects represent various aspects of microarray data
- Specially constructed "object methods" import, normalize, display and analyze microarray data
- All objects and methods are transparent
- All objects can be accessed and modified outside of limma
- Unique in implementing the empirical Bayes procedure for identifying differentially expressed genes by "borrowing" information across genes (everything so far has been gene by gene)

Measurement Error Model With Additive Background
- There are other models for accounting for the background signal
- Simple subtraction of the background intensities often introduces additional variability in the observed signal
- The problem is that we use a single-observation estimate of the background term B
- With this in mind, various strategies have been proposed to pool background information from more than one spot to estimate B
[Figure: foreground (F) and background (B); old model and new model]
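The variability argument above can be illustrated with a small simulation. This is only a sketch under assumed conditions: additive Gaussian noise, a constant true signal and background, and made-up noise levels. It is not the model from the slide, and the pooled estimate used here (the median of all background measurements) is just one simple stand-in for the pooling strategies mentioned.

```r
set.seed(1)                        # reproducible simulation
n_spots  <- 5000
true_sig <- 200                    # hypothetical true signal, same for every spot
true_bg  <- 100                    # hypothetical true background level

# Foreground = signal + background + noise; background is measured once per spot
foreground <- true_sig + true_bg + rnorm(n_spots, sd = 20)
background <- true_bg + rnorm(n_spots, sd = 20)

# Simple subtraction: each spot uses its own single background measurement
local_corrected  <- foreground - background

# Pooled subtraction: one background estimate shared by all spots
pooled_corrected <- foreground - median(background)

sd_local  <- sd(local_corrected)   # carries foreground noise plus background noise
sd_pooled <- sd(pooled_corrected)  # carries essentially only foreground noise
```

Under these assumptions both corrections are centered on the true signal, but the spot-by-spot subtraction has the larger spread, because the noise of the single background measurement is added to the noise of the foreground.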

limma
Data to import:
File descriptions:
Spot descriptions:
Importing data: source("...")

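limma
library(limma)
data.directory<-"..."
# Targets file: one row per array, with file names and the sample on each channel
targets<-readTargets("...")
# Spot types file: rules for labelling control and blank spots
spottypes<-readSpotTypes("...")
# Read the GenePix files; wtflags(0) gives flagged spots weight 0
LimmadataC<-read.maimages(files=targets$FileName, source="genepix",
    path=data.directory,
    columns=list(Gf="F532 Median", Gb="B532 Median", Rf="F635 Median", Rb="B635 Median"),
    annotation=c("Name","ID","Block","Row","Column"),
    wt.fun=wtflags(0))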

RGList class
> attributes(LimmadataC)
$names
[1] "R" "G" "Rb" "Gb" "weights" "targets" "genes"
$class
[1] "RGList"
attr(,"package")
[1] "limma"

RGList class
> LimmadataC$genes[1,]
  Name ID Block Row Column
1 no name Rn
> LimmadataC$R[1:3,]
51-C1-3-vs-W W2-3-vs-C C3-3-vs-W W4-3-vs-C C5-3-vs-W W6-3-vs-C6-5
[1,]
[2,]
[3,]
> LimmadataC$Rb[1:3,]
51-C1-3-vs-W W2-3-vs-C C3-3-vs-W W4-3-vs-C C5-3-vs-W W6-3-vs-C6-5
[1,]
[2,]
[3,]

RGList class
LimmadataC$genes$Status<-controlStatus(spottypes,LimmadataC)
LimmadataC$weights[LimmadataC$genes$ID=="Blank"]<-0
LimmadataC$printer<-getLayout(LimmadataC$genes)
> attributes(LimmadataC)
$names
[1] "R" "G" "Rb" "Gb" "weights" "targets" "genes" "printer"
$class
[1] "RGList"
attr(,"package")
[1] "limma"
> LimmadataC$genes[1,]
  Name ID Block Row Column Status
1 no name Rn cDNA

Plotting data in an RGList object
> plotMA(LimmadataC,array=1,xlim=c(-1,16),ylim=c(-3,8))

limma
- plotMA automatically subtracts the background intensities before plotting the data
- It does not plot data with weight 0
- To plot all data, or the data without background subtraction, you need to do a little work
source("...")
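What ends up on such a plot can be worked out by hand. The sketch below uses five made-up spots (not the course data) and the usual definitions M = log2(R/G) and A = (log2 R + log2 G)/2; the helper `ma_values` is a hypothetical name, not a limma function.

```r
# Hypothetical intensities for five spots
Rf <- c(1200, 300, 150, 5000,  80)   # red foreground
Rb <- c( 100, 120, 160,  110,  90)   # red background
Gf <- c( 900, 310, 400, 2500, 200)   # green foreground
Gb <- c( 100, 100, 100,  100, 100)   # green background
w  <- c(   1,   1,   1,    0,   1)   # spot weights; 0 marks a flagged spot

# M and A are only defined when both channels are positive
ma_values <- function(R, G) {
  ok <- R > 0 & G > 0
  M <- rep(NA_real_, length(R))
  A <- rep(NA_real_, length(R))
  M[ok] <- log2(R[ok] / G[ok])
  A[ok] <- 0.5 * log2(R[ok] * G[ok])
  data.frame(M = M, A = A)
}

corrected <- ma_values(Rf - Rb, Gf - Gb)   # background subtracted
raw       <- ma_values(Rf, Gf)             # no background subtraction

keep <- w > 0                              # drop zero-weight spots
n_corrected_plotted <- sum(keep & !is.na(corrected$M))
n_raw_plotted       <- sum(keep & !is.na(raw$M))
```

Spot 3 shows how a point is lost: its green channel is well above background, but the red foreground is below the red background, so the log ratio is undefined after subtraction. Passing `corrected$A[keep]` and `corrected$M[keep]` to `plot()` would draw the remaining points.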

limma
> NBLimmadataC<-backgroundCorrect(LimmadataC,method="none")
> attributes(NBLimmadataC)
$names
[1] "R" "G" "weights" "targets" "genes" "printer"
$class
[1] "RGList"
attr(,"package")
[1] "limma"
Note that the background measurements (Rb, Gb) are gone

Scatter with and without background subtraction
- Background-subtracted data are more spread out
- More data points are retained without background subtraction

Plotting all data points
- We want to plot data points with weight 0 as well
- Create datasets with and without background subtraction and set all weights to 1
SpotsPerArray<-dim(LimmadataC$R)[1]
Narrays<-dim(LimmadataC$R)[2]
Limmadata<-LimmadataC
Limmadata$weights[1:SpotsPerArray,1:Narrays]<-1
NBLimmadata<-NBLimmadataC
NBLimmadata$weights[1:SpotsPerArray,1:Narrays]<-1

Plotting all data points
[Figure: MA plots. Panels: background subtracted vs raw; all data vs zero-weight data removed]

Which one to use?
- Removing points with weight zero seems reasonable
- Subtracting background costs us some data points even when one channel is above background, because only differences of log-transformed measurements are used
- Subtracting background seems to increase the variability, but it is unclear how this would affect the results
- For now, proceed without background subtraction, but compare results at the end
- Exploring other proposed background-adjustment methods also seems like a good idea

Data Analysis
Loess normalization
source("eh3.uc.edu/LimmaLoess.R")
> NNBLimmadataC<-normalizeWithinArrays(NBLimmadataC, method="loess")
> attributes(NNBLimmadataC)
$names
[1] "weights" "targets" "genes" "printer" "M" "A"
$class
[1] "MAList"
attr(,"package")
[1] "limma"
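The idea behind within-array loess normalization can be sketched in base R: fit a smooth curve of M against A and subtract it, so that the intensity-dependent dye bias is removed. The data below are simulated with an assumed bias curve, and `stats::loess` with these settings is a stand-in; limma's own fitting routine differs in detail.

```r
set.seed(2)
n <- 2000
A <- runif(n, 6, 14)                       # simulated average log intensities

# Hypothetical intensity-dependent dye bias, no true differential expression
bias <- 1.5 * exp(-(A - 6) / 3) - 0.3
M    <- bias + rnorm(n, sd = 0.3)          # simulated log ratios

# Robust local regression of M on A; the fitted curve estimates the bias
fit    <- loess(M ~ A, span = 0.3, degree = 1, family = "symmetric")
M_norm <- M - fitted(fit)                  # normalized log ratios
```

After the subtraction the normalized log ratios are centered near zero across the whole intensity range, which is what the loess-normalized MA plot is meant to show.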

Loess-normalized data

Paired t-test using limma
source("...")
> design<-modelMatrix(targets, ref="C")
Found unique target names:
 C W
> design
 W
51-C1-3-vs-W W2-3-vs-C C3-3-vs-W W4-3-vs-C C5-3-vs-W W6-3-vs-C
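For a dye-swap experiment with one reference, the design column can be written down by hand. The sketch below assumes a targets table with `Cy3` and `Cy5` columns and six alternating arrays; the table is made up for illustration, and the construction is meant to mirror what `modelMatrix(targets, ref="C")` returns rather than to reproduce its code.

```r
# Hypothetical targets table: C and W alternate between the two dyes
targets <- data.frame(
  Cy3 = c("C", "W", "C", "W", "C", "W"),
  Cy5 = c("W", "C", "W", "C", "W", "C"),
  stringsAsFactors = FALSE
)

# M = log2(Cy5 / Cy3), so an array measures +(W - C) when W is on Cy5
# and -(W - C) when W is on Cy3
design <- matrix((targets$Cy5 == "W") - (targets$Cy3 == "W"),
                 ncol = 1, dimnames = list(NULL, "W"))
```

The resulting column is 1, -1, 1, -1, 1, -1, the same sign pattern used when checking the limma fit by hand.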

Paired t-test using limma
> LimmaPTT<-lmFit(MA,design)
Error in .class1(object) : Object "MA" not found
> LimmaPTT<-lmFit(NNBLimmadataC,design)
> attributes(LimmaPTT)
$names
[1] "coefficients" "stdev.unscaled" "sigma" "df.residual"
[5] "cov.coefficients" "pivot" "method" "design"
[9] "genes" "Amean"
$class
[1] "MArrayLM"
attr(,"package")
[1] "limma"

Paired t-test using limma
> mean(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[2,])
[1]
> var(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[2,])^0.5
[1]
> 1/(6^0.5)
[1]
> mean(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[2,])/((var(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[2,])^0.5)*(1/(6^0.5)))
[1]
>
> LimmaPTT$coefficients[2]
[1]
> LimmaPTT$stdev.unscaled[2]
[1]
> LimmaPTT$sigma[2]
[1]
> LimmaPTT$coefficients[2]/(LimmaPTT$sigma[2]*LimmaPTT$stdev.unscaled[2])
[1]
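The same comparison can be reproduced in base R for a single gene. The M-values below are simulated (the effect size and noise level are assumptions), and `lm` without an intercept stands in for the per-gene fit: its coefficient, residual standard deviation and unscaled standard error play the roles of `coefficients`, `sigma` and `stdev.unscaled`.

```r
set.seed(3)
signs <- c(1, -1, 1, -1, 1, -1)                  # dye-swap design column
M     <- signs * 0.7 + rnorm(6, sd = 0.3)        # simulated M-values, one gene

# Paired t-statistic by hand: sign-corrected log ratios are the paired differences
d         <- signs * M
t_by_hand <- mean(d) / (sd(d) * (1 / sqrt(length(d))))

# The one-sample t-test on d gives the same statistic
t_builtin <- unname(t.test(d)$statistic)

# The same numbers from a linear model with no intercept
fit            <- lm(M ~ 0 + signs)
coefficient    <- unname(coef(fit))                    # equals mean(d)
sigma          <- summary(fit)$sigma                   # equals sd(d)
stdev_unscaled <- sqrt(summary(fit)$cov.unscaled[1, 1])  # equals 1/sqrt(6)
t_from_lm      <- coefficient / (sigma * stdev_unscaled)
```

All three t-statistics agree, which is why the ordinary t from the fitted model is a paired t-test in this design.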

Paired t-test using limma
> mean(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[1,])
[1]
> var(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[1,])^0.5
[1]
> 1/(6^0.5)
[1]
> mean(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[1,])/((var(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[1,])^0.5)*(1/(6^0.5)))
[1]
>
> LimmaPTT$coefficients[1]
[1]
> LimmaPTT$stdev.unscaled[1]
[1] 0.5
> LimmaPTT$sigma[1]
[1]
> LimmaPTT$coefficients[1]/(LimmaPTT$sigma[1]*LimmaPTT$stdev.unscaled[1])
[1]
> NNBLimmadataC$weights[1,]
51-C1-3-vs-W W2-3-vs-C C3-3-vs-W W4-3-vs-C C5-3-vs-W W6-3-vs-C
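An unscaled standard deviation of 0.5 instead of 1/sqrt(6) is what a weighted fit gives when only four of the six arrays carry weight. The sketch below assumes 0/1 weights and a made-up pattern of two flagged spots, with simulated M-values; it uses base R's weighted `lm` as a stand-in for the limma fit.

```r
set.seed(4)
signs   <- c(1, -1, 1, -1, 1, -1)             # dye-swap design column
weights <- c(1, 1, 0, 1, 0, 1)                # hypothetical: two flagged spots
M       <- signs * 0.5 + rnorm(6, sd = 0.2)   # simulated M-values, one gene

# Weighted least squares; zero-weight arrays do not contribute
fit            <- lm(M ~ 0 + signs, weights = weights)
stdev_unscaled <- sqrt(summary(fit)$cov.unscaled[1, 1])   # 1/sqrt(4)
df_residual    <- fit$df.residual                          # 4 - 1
t_weighted     <- unname(coef(fit)) / (summary(fit)$sigma * stdev_unscaled)

# By hand, the t-statistic must use only the arrays with nonzero weight
d         <- (signs * M)[weights > 0]
t_by_hand <- mean(d) / (sd(d) / sqrt(length(d)))
```

A hand calculation over all six arrays would therefore not match the fitted values for such a gene; restricting it to the spots with nonzero weight does.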

Paired t-test using limma
> dfp<-LimmaPTT$df.residual>0
> LimmaPTT$LimmaTStat<-LimmaPTT$coefficients/(LimmaPTT$sigma*LimmaPTT$stdev.unscaled)
> LimmaPTT$LimmaTPvalue<-rep(NA,SpotsPerArray)
> LimmaPTT$LimmaTPvalue[dfp]<-2*pt(abs(LimmaPTT$LimmaTStat[dfp]),LimmaPTT$df.residual[dfp],lower.tail=FALSE)
> attributes(LimmaPTT)
$names
[1] "coefficients" "stdev.unscaled" "sigma" "df.residual"
[5] "cov.coefficients" "pivot" "method" "design"
[9] "genes" "Amean" "LimmaTStat" "LimmaTPvalue"
$class
[1] "MArrayLM"
attr(,"package")
[1] "limma"
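A two-sided p-value depends only on the size of the t-statistic, so the upper tail has to be taken at |t|. The sketch below uses made-up statistics and degrees of freedom to show the calculation, including a gene with no residual degrees of freedom.

```r
t_stat <- c(2.5, -2.5, 0.3, NA)    # hypothetical t-statistics
df     <- c(5, 5, 5, 0)            # hypothetical residual degrees of freedom

# Only genes with a statistic and positive residual df get a p-value
usable  <- !is.na(t_stat) & df > 0
p_value <- rep(NA_real_, length(t_stat))
p_value[usable] <- 2 * pt(abs(t_stat[usable]), df[usable], lower.tail = FALSE)

# Without abs(), a negative statistic gives a value above 1
p_without_abs <- 2 * pt(-2.5, 5, lower.tail = FALSE)
```

With the absolute value, t = 2.5 and t = -2.5 receive the same p-value, and every computed value lies between 0 and 1.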

limma so far
- Facilitates easy data import and normalization
- Keeps track of "bad" spots
- Running the basic t-test takes a bit of additional work
- Using the empirical Bayes statistics as implemented in limma would be even easier
- Empirical Bayes is generally BETTER than the simple t-test; we will talk about this type of analysis next week
- limma also allows fitting models with multiple factors, which we will also talk about next week
- Next time: multiple hypothesis testing and p-value adjustments
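The "borrowing" of information across genes can be previewed with the variance-shrinkage step of the moderated t-statistic. This is only a sketch: the per-gene variances are simulated, and the prior degrees of freedom `d0` and prior variance `s02` are fixed, made-up values, whereas limma estimates them from all genes.

```r
set.seed(5)
n_genes <- 1000
df_gene <- 5                                       # residual df with six paired arrays
s2 <- 0.09 * rchisq(n_genes, df_gene) / df_gene    # simulated per-gene sample variances

d0  <- 4       # hypothetical prior degrees of freedom
s02 <- 0.09    # hypothetical prior variance

# Each gene's variance is pulled toward the common prior value
s2_moderated <- (d0 * s02 + df_gene * s2) / (d0 + df_gene)
```

The moderated variances are less spread out than the raw ones, so genes with an accidentally tiny sample variance no longer produce huge t-statistics. The moderated t divides the coefficient by `sqrt(s2_moderated)` times the unscaled standard deviation and is referred to a t-distribution with `d0 + df_gene` degrees of freedom.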