1-24-061 limma Data to import:


limma
Data to import:
File descriptions:
Spot descriptions:
Importing data: source("

limma
> library(limma)
> data.directory<-"
> targets<-readTargets("
> targets
  array experiment cy3
  _MO-S N17-72SV2-3-vs-CSV2-5-B.gpr Nic-WT72hr_
  _MO-S N73-CSV3-3-vs-72SV3-5-A.gpr Ctl-WT00hr_
  _MO-S N98-CSV1-3-vs-72SV1-5-B.gpr Ctl-WT00hr_1
  cy5 date
1 Ctl-WT00hr_2 7/21/
  Nic-WT72hr_3 7/21/
  Nic-WT72hr_1 7/18/2003

limma
> spottypes<-readSpotTypes("
> spottypes
  SpotType ID      Name      Color
1 cDNA     *       *         black
2 Blank    *Blank* *         blue
3 Control  *       *control* red
4 Empty    *Empty* *         blue
5 empty    *empty* *         blue
>

RGList class
> LimmaDataNickel<-read.maimages(files=targets$experiment,source="genepix", path = data.directory, names=paste("Cy5_",targets$cy5,"_VS_Cy3_",targets$cy3,sep=""),
+ columns=list(Gf = "F532 Median",Gb ="B532 Median", Rf = "F635 Median", Rb = "B635 Median"),
+ annotation=c("Name","ID","Block","Row","Column"),wt.fun=wtflags(0))
Read
Read
Read
> attributes(LimmaDataNickel)
$names
[1] "R" "G" "Rb" "Gb" "weights" "targets" "genes"
$class
[1] "RGList"
attr(,"package")
[1] "limma"

RGList class
> LimmaDataNickel$R[1:3,]
     Cy5_Ctl-WT00hr_2_VS_Cy3_Nic-WT72hr_2 Cy5_Nic-WT72hr_3_VS_Cy3_Ctl-WT00hr_3
[1,]
[2,]
[3,]
     Cy5_Nic-WT72hr_1_VS_Cy3_Ctl-WT00hr_1
[1,] 726
[2,] 248
[3,] 120
> LimmaDataNickel$Rb[1:3,]
     Cy5_Ctl-WT00hr_2_VS_Cy3_Nic-WT72hr_2 Cy5_Nic-WT72hr_3_VS_Cy3_Ctl-WT00hr_3
[1,]
[2,]
[3,]
     Cy5_Nic-WT72hr_1_VS_Cy3_Ctl-WT00hr_1
[1,] 129
[2,] 127
[3,] 127
>

RGList class
> LimmaDataNickel$weights[LimmaDataNickel$genes$ID=="Blank"]<-0
> LimmaDataNickel$weights[LimmaDataNickel$genes$ID=="empty"]<-0
> LimmaDataNickel$printer<-getLayout(LimmaDataNickel$genes)
> attributes(LimmaDataNickel)
$names
[1] "R" "G" "Rb" "Gb" "weights" "targets" "genes" "printer"
$class
[1] "RGList"
attr(,"package")
[1] "limma"

RGList class
> LimmaDataNickel$genes[1,]
  Name ID Block Row Column
1 D
> LimmaDataNickel$genes$Status<-controlStatus(spottypes,LimmaDataNickel)
Matching patterns for:
ID Name
Found cDNA
Found 288 Blank
Found 0 Control
Found 0 Empty
Found 381 empty
Setting attributes: values Color
>
> LimmaDataNickel$genes[1,]
  Name ID Block Row Column Status
1 D cDNA
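To make the "Matching patterns" output above easier to read, here is a rough base-R sketch of how wildcard patterns from a spot-type table could be turned into a status label for each spot. It is not limma's implementation of controlStatus: the helper assign_status, the example IDs and names, and the rule that later rows override earlier ones are all assumptions made for illustration.

```r
# Hypothetical spot annotation; IDs and names are made up for illustration.
genes <- data.frame(
  ID   = c("D0001", "Blank", "empty", "D0002", "Blank"),
  Name = c("geneA", "Blank", "empty", "geneB", "Blank"),
  stringsAsFactors = FALSE
)

# Spot-type table laid out like the spottypes output (subset of rows).
spottypes <- data.frame(
  SpotType = c("cDNA", "Blank", "empty"),
  ID       = c("*", "*Blank*", "*empty*"),
  Name     = c("*", "*", "*"),
  stringsAsFactors = FALSE
)

# A row of spottypes matches a spot when both its ID pattern and its Name
# pattern match. Assumption of this sketch: later rows override earlier ones.
assign_status <- function(genes, spottypes) {
  status <- rep(NA_character_, nrow(genes))
  for (i in seq_len(nrow(spottypes))) {
    hit <- grepl(utils::glob2rx(spottypes$ID[i]), genes$ID) &
           grepl(utils::glob2rx(spottypes$Name[i]), genes$Name)
    status[hit] <- spottypes$SpotType[i]
  }
  status
}

status <- assign_status(genes, spottypes)
print(table(status))
```

The catch-all cDNA row comes first so that the more specific Blank and empty rows can overwrite it.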

Plotting data in an RGList object
par(mfrow=c(3,1))
plotMA(LimmaDataNickel,array=1,xlim=c(-1,16),ylim=c(-3,8),zero.weights=TRUE)
plotMA(LimmaDataNickel,array=2,xlim=c(-1,16),ylim=c(-3,8),zero.weights=TRUE)
plotMA(LimmaDataNickel,array=3,xlim=c(-1,16),ylim=c(-3,8),zero.weights=TRUE)

limma
plotMA automatically subtracts the background intensities before plotting the data.
It plots M = log2(Cy5) - log2(Cy3) on the y-axis and A = [log2(Cy5) + log2(Cy3)]/2 on the x-axis.
It does not plot spots with weight zero unless you ask it to.
If you want to plot all the data, or the data without background subtraction, you need to do a little extra work.
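As a small illustration of the M and A definitions, the following base-R sketch computes both from background-subtracted intensities and marks which spots would be plotted when zero-weight spots are excluded. All intensities and weights are made up, and the variable names (Rf, Rb, Gf, Gb, w) are chosen for this example only.

```r
# Made-up foreground (f) and background (b) intensities for four spots.
Rf <- c(900, 300, 1700, 5100); Rb <- rep(100, 4)  # Cy5 channel
Gf <- c(300, 500,  900, 5100); Gb <- rep(100, 4)  # Cy3 channel
w  <- c(1, 1, 0, 1)                               # spot weights; 0 marks a "bad" spot

# Background-subtracted intensities.
R <- Rf - Rb
G <- Gf - Gb

# Log-ratio (M) and average log-intensity (A).
M <- log2(R) - log2(G)
A <- (log2(R) + log2(G)) / 2

# Spots with weight zero are left out of the plot by default.
keep <- w > 0
print(data.frame(M = M, A = A, plotted = keep))
```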

Background Adjustments
> NBLimmaDataNickel<-backgroundCorrect(LimmaDataNickel,method="none")
> attributes(NBLimmaDataNickel)
$names
[1] "R" "G" "weights" "targets" "genes" "printer"
$class
[1] "RGList"
attr(,"package")
[1] "limma"
Note that the background measurements (Rb and Gb) are gone.
limma offers a whole range of other background adjustment procedures.

Plotting data in an RGList object
> par(mfrow=c(3,1))
> plotMA(LimmaDataNickel,array=1,xlim=c(-1,16),ylim=c(-3,8),zero.weights=FALSE)
> plotMA(LimmaDataNickel,array=2,xlim=c(-1,16),ylim=c(-3,8),zero.weights=FALSE)
> plotMA(LimmaDataNickel,array=3,xlim=c(-1,16),ylim=c(-3,8),zero.weights=FALSE)

Plotting data in an RGList object
> par(mfrow=c(3,1))
> plotMA(NBLimmaDataNickel,array=1,xlim=c(-1,16),ylim=c(-3,8),zero.weights=FALSE)
> plotMA(NBLimmaDataNickel,array=2,xlim=c(-1,16),ylim=c(-3,8),zero.weights=FALSE)
> plotMA(NBLimmaDataNickel,array=3,xlim=c(-1,16),ylim=c(-3,8),zero.weights=FALSE)

Which one to use?
Removing points with weight zero seems reasonable.
Subtracting background costs us some data points: a spot is lost even if one of its channels is above background, because only differences of log-transformed measurements are used.
Subtracting background seems to increase the variability, but it is unclear how this would affect the results.
For now, proceed without background subtraction, but compare the results at the end.
Exploring other proposed background-adjustment methods also seems like a good idea.
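The loss of data points can be explored with a small simulation. The sketch below is only an illustration under made-up assumptions (a constant background of 120, exponentially distributed true signal, normal measurement noise); it counts how many spots have no usable log-ratio after subtraction and lets the spread of M be compared with and without subtraction.

```r
set.seed(1)
n  <- 2000
bg <- 120                                  # hypothetical constant background

# Hypothetical observed foreground: background + true signal + noise.
Rf <- pmax(1, bg + rexp(n, rate = 1 / 150) + rnorm(n, sd = 25))  # Cy5
Gf <- pmax(1, bg + rexp(n, rate = 1 / 150) + rnorm(n, sd = 25))  # Cy3

R_sub <- Rf - bg
G_sub <- Gf - bg

# Log-ratios without and with background subtraction.
M_raw <- log2(Rf) - log2(Gf)
# log2 of a non-positive number is NaN or -Inf, so such spots drop out.
M_sub <- suppressWarnings(log2(R_sub) - log2(G_sub))

lost        <- !is.finite(M_sub)
one_channel <- xor(R_sub <= 0, G_sub <= 0)  # exactly one channel at or below background

cat("spots lost after subtraction:", sum(lost), "of", n, "\n")
cat("lost although one channel was above background:", sum(lost & one_channel), "\n")
cat("SD of M without subtraction:", sd(M_raw), "\n")
cat("SD of M with subtraction:   ", sd(M_sub[!lost]), "\n")
```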

Within Array Normalization
> NNBLimmaDataNickel<-normalizeWithinArrays(NBLimmaDataNickel,method="none")
> attributes(NNBLimmaDataNickel)
$names
[1] "weights" "targets" "genes" "printer" "M" "A"
$class
[1] "MAList"
attr(,"package")
[1] "limma"
We are left with log-ratios (M) and average log-intensities (A), the same quantities shown in the scatter plot produced by plotMA.

Checking Out M and A
> NNBLimmaDataNickel$M[1:3,]
     Cy5_Ctl-WT00hr_2_VS_Cy3_Nic-WT72hr_2 Cy5_Nic-WT72hr_3_VS_Cy3_Ctl-WT00hr_3
[1,]
[2,]
[3,]
     Cy5_Nic-WT72hr_1_VS_Cy3_Ctl-WT00hr_1
[1,]
[2,]
[3,]
> NNBLimmaDataNickel$A[1:3,]
     Cy5_Ctl-WT00hr_2_VS_Cy3_Nic-WT72hr_2 Cy5_Nic-WT72hr_3_VS_Cy3_Ctl-WT00hr_3
[1,]
[2,]
[3,]
     Cy5_Nic-WT72hr_1_VS_Cy3_Ctl-WT00hr_1
[1,]
[2,]
[3,]
>
Homework suggestion: Calculate these directly from R and G

Two-Sample t-test
> NBLimmaDataNickelRG<-log2(cbind(NBLimmaDataNickel$R,NBLimmaDataNickel$G))
> dimnames(NBLimmaDataNickelRG)[[2]]
[1] "Cy5_Ctl-WT00hr_2_VS_Cy3_Nic-WT72hr_2"
[2] "Cy5_Nic-WT72hr_3_VS_Cy3_Ctl-WT00hr_3"
[3] "Cy5_Nic-WT72hr_1_VS_Cy3_Ctl-WT00hr_1"
[4] "Cy5_Ctl-WT00hr_2_VS_Cy3_Nic-WT72hr_2"
[5] "Cy5_Nic-WT72hr_3_VS_Cy3_Ctl-WT00hr_3"
[6] "Cy5_Nic-WT72hr_1_VS_Cy3_Ctl-WT00hr_1"
> Nic<-c(2,3,4)
> Ctl<-c(1,5,6)

Two-Sample t-test
> MNic<-apply(NBLimmaDataNickelRG[,Nic],1,mean,na.rm=TRUE)
> VNic<-apply(NBLimmaDataNickelRG[,Nic],1,var,na.rm=TRUE)
> MCtl<-apply(NBLimmaDataNickelRG[,Ctl],1,mean,na.rm=TRUE)
> VCtl<-apply(NBLimmaDataNickelRG[,Ctl],1,var,na.rm=TRUE)
> NNic<-apply(!is.na(NBLimmaDataNickelRG[,Nic]),1,sum,na.rm=TRUE)
> NCtl<-apply(!is.na(NBLimmaDataNickelRG[,Ctl]),1,sum,na.rm=TRUE)
>
> VNicCtl<-(((NNic-1)*VNic)+((NCtl-1)*VCtl))/(NCtl+NNic-2)
>
> DF<-NNic+NCtl-2
>
> TStat<-abs(MNic-MCtl)/((VNicCtl*((1/NNic)+(1/NCtl)))^0.5)
> TPvalue<-2*pt(TStat,DF,lower.tail=FALSE)
> TStat[1]
[1]
> TPvalue[1]
[1]
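The pooled-variance formulas used above can be checked for a single gene against R's built-in t.test with var.equal=TRUE. This is a minimal sketch on made-up log2 intensities; the vectors nic and ctl stand in for one row of the Nic and Ctl columns.

```r
# Made-up log2 intensities for one gene: three Nic and three Ctl measurements.
set.seed(42)
nic <- rnorm(3, mean = 9, sd = 0.5)
ctl <- rnorm(3, mean = 8, sd = 0.5)

n1 <- length(nic)
n2 <- length(ctl)

# Pooled variance, t-statistic and two-sided p-value, as in the slide's code.
pooled_var <- ((n1 - 1) * var(nic) + (n2 - 1) * var(ctl)) / (n1 + n2 - 2)
t_manual   <- abs(mean(nic) - mean(ctl)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
p_manual   <- 2 * pt(t_manual, df = n1 + n2 - 2, lower.tail = FALSE)

# Reference: equal-variance two-sample t-test from base R.
ref <- t.test(nic, ctl, var.equal = TRUE)

print(c(manual = t_manual, t.test = abs(unname(ref$statistic))))
print(c(manual = p_manual, t.test = ref$p.value))
```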

Paired t-test
> dimnames(NNBLimmaDataNickel)[[2]]
[1] "Cy5_Ctl-WT00hr_2_VS_Cy3_Nic-WT72hr_2"
[2] "Cy5_Nic-WT72hr_3_VS_Cy3_Ctl-WT00hr_3"
[3] "Cy5_Nic-WT72hr_1_VS_Cy3_Ctl-WT00hr_1"
> NNBLimmaDataNickelLR<-NNBLimmaDataNickel$M
> NNBLimmaDataNickelLR[,1]<-(-NNBLimmaDataNickelLR[,1])
> MLR<-apply(NNBLimmaDataNickelLR,1,mean,na.rm=TRUE)
> VLR<-apply(NNBLimmaDataNickelLR,1,var,na.rm=TRUE)
> NLR<-apply(!is.na(NNBLimmaDataNickelLR),1,sum,na.rm=TRUE)
>
> PTDF<-NLR-1
>
> PTStat<-abs(MLR)/((VLR*(1/NLR))^0.5)
> PTPvalue<-2*pt(PTStat,PTDF,lower.tail=FALSE)
> PTStat[1]
[1]
> PTPvalue[1]
[1]
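The paired analysis is a one-sample t-test on the within-array log-ratios after the dye-swapped array has had its sign flipped. The sketch below checks the formulas for one gene against base R's t.test; the three M values are made up for illustration.

```r
# Made-up log-ratios M = log2(Cy5) - log2(Cy3) for one gene on three arrays.
m <- c(-1.1, 0.8, 1.3)

# On array 1 the control is labelled with Cy5, so its sign is flipped
# to express every log-ratio as Nic minus Ctl.
lr <- m * c(-1, 1, 1)

n        <- sum(!is.na(lr))
t_manual <- abs(mean(lr)) / sqrt(var(lr) / n)
p_manual <- 2 * pt(t_manual, df = n - 1, lower.tail = FALSE)

# Reference: one-sample t-test of the log-ratios against zero.
ref <- t.test(lr, mu = 0)

print(c(manual = t_manual, t.test = abs(unname(ref$statistic))))
print(c(manual = p_manual, t.test = ref$p.value))
```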

Two-sample vs Paired t-test
> par(mfrow=c(2,2))
> plot(MNic-MCtl,MLR)
> plot(VNicCtl,VLR)
> plot(TStat,PTStat)
> plot(-log10(TPvalue),-log10(PTPvalue),xlim=c(0,4),ylim=c(0,4))

Limma Analysis
> dimnames(NNBLimmaDataNickel)[[2]]
[1] "Cy5_Ctl-WT00hr_2_VS_Cy3_Nic-WT72hr_2"
[2] "Cy5_Nic-WT72hr_3_VS_Cy3_Ctl-WT00hr_3"
[3] "Cy5_Nic-WT72hr_1_VS_Cy3_Ctl-WT00hr_1"
> design<-c(-1,1,1)
>
> LimmaFit<-lmFit(NNBLimmaDataNickel,design)
> attributes(LimmaFit)
$names
[1] "coefficients" "stdev.unscaled" "sigma" "df.residual"
[5] "cov.coefficients" "pivot" "method" "design"
[9] "genes" "Amean"
$class
[1] "MArrayLM"
attr(,"package")
[1] "limma"

Limma Analysis
> par(mfrow=c(1,1))
> plot(LimmaFit$coefficients,MLR)

Limma Analysis
> complete.data<-apply(NNBLimmaDataNickel$weights,1,sum)==3
> sum(complete.data)
[1] 9621
> par(mfrow=c(1,1))
> plot(LimmaFit$coefficients[complete.data],MLR[complete.data])

Limma Analysis
> LimmaFitTstat<- abs(LimmaFit$coefficients/(LimmaFit$sigma*LimmaFit$stdev.unscaled))
> LimmaFitPvalue<-2*pt(LimmaFitTstat,LimmaFit$df.residual,lower.tail=FALSE)
>
> par(mfrow=c(1,2))
> plot(LimmaFitTstat[complete.data],PTStat[complete.data])
> plot(-log10(LimmaFitPvalue[complete.data]),-log10(PTPvalue[complete.data]))
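For spots with complete data, the limma t-statistic built from coefficients, sigma and stdev.unscaled should agree with the paired t-test, because a least-squares fit of M on the design c(-1,1,1) without an intercept is the same calculation. The sketch below shows this for one gene using base R's lm as a stand-in for the per-gene fit; the M values are made up, and the equivalence is only claimed for the equally weighted case.

```r
# Made-up log-ratios for one gene on arrays 1-3, and the design from the slide.
m <- c(-1.2, 0.9, 1.4)
d <- c(-1, 1, 1)

# Least-squares fit of M on the design without an intercept.
fit            <- lm(m ~ 0 + d)
coef_ls        <- unname(coef(fit))      # plays the role of "coefficients"
sigma_ls       <- summary(fit)$sigma     # plays the role of "sigma"
stdev_unscaled <- sqrt(1 / sum(d^2))     # sqrt of (X'X)^-1, as in "stdev.unscaled"
t_ls           <- abs(coef_ls) / (sigma_ls * stdev_unscaled)

# Paired t-test on the sign-flipped log-ratios.
lr       <- m * d
t_paired <- abs(mean(lr)) / sqrt(var(lr) / length(lr))

print(c(coefficient = coef_ls, mean_log_ratio = mean(lr)))
print(c(t_least_squares = t_ls, t_paired = t_paired))
```

Spots with a zero weight on some array presumably differ because the fit drops those observations while the hand-computed mean does not, which is why the comparison is restricted to complete.data.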

limma so far
Facilitates easy data import and normalization.
Keeps track of "bad" spots.
Running the basic t-test takes a bit of additional work.
Using the empirical Bayes statistics as implemented in limma would be even easier.
Empirical Bayes is generally better than the simple t-test, since it borrows information across genes when estimating each gene's variance.
We will talk about this type of analysis next week.
limma also allows fitting models with multiple factors, which we will also talk about next week.
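As a preview, here is a minimal sketch of what the empirical Bayes step could look like, using limma's lmFit, eBayes and topTable on a simulated matrix of log-ratios. The data, the gene names and the column name NicVsCtl are made up for illustration, and the code runs only if the limma package is installed; it is not the analysis of the nickel data.

```r
if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(7)
  n_genes <- 200

  # Simulated log-ratios for three arrays (rows = genes), pure noise at first.
  M <- matrix(rnorm(n_genes * 3, sd = 0.4), nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("array", 1:3)))

  # Same dye-swap design as in the slides: array 1 has Ctl in Cy5.
  design <- matrix(c(-1, 1, 1), ncol = 1, dimnames = list(NULL, "NicVsCtl"))

  # Give the first 10 genes a true log2 fold change of 2 (Nic vs Ctl).
  M[1:10, ] <- M[1:10, ] + outer(rep(2, 10), design[, 1])

  fit <- limma::lmFit(M, design)   # gene-wise least-squares fits
  eb  <- limma::eBayes(fit)        # moderated t-statistics
  top <- limma::topTable(eb, coef = 1, number = 5)
  print(top)
} else {
  message("limma is not installed; skipping the empirical Bayes sketch")
}
```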