Genomic Profiles of Brain Tissue in Humans and Chimpanzees.

Data
The samples are hybridized to the Affymetrix GeneChip® Human Genome U95B [HG_U95B] (A-AFFY-1) array.
Samples: 3 humans and 3 chimps, 7 brain regions. Some samples have multiple hybridizations to other human (Hu) arrays.
I selected 4 regions and 1 sample per biological replicate per region: 6 individuals × 4 regions = 24 arrays.

Data
Our setup is this: each brain i provides samples from the cortex, cerebellum, caudate and Broca's area, and each brain is from a human or a chimp. This is called a split-plot design: species is a property of the whole brain, while brain region varies within each brain. The cortex and cerebellum from the same brain are more similar than the cortex and cerebellum from different brains of the same species, which induces a random effect for biological replicate. We will not deal with this in the analysis, but limma can handle 1 random effect.
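To see why the split-plot layout induces correlation, here is a small base-R simulation. It is not part of the original analysis; the means, variances and number of brains are arbitrary assumptions. Each simulated brain adds one shared random effect to both of its regions, so regions from the same brain are correlated while regions from different brains are not.

```r
# Simulated split-plot data: one shared random effect per brain.
# All numbers below are illustrative assumptions, not estimates from the arrays.
set.seed(1)
n_brains     <- 5000
brain_effect <- rnorm(n_brains, sd = 1)          # random effect of biological replicate
cortex       <- 2.0 + brain_effect + rnorm(n_brains, sd = 1)
cerebellum   <- 1.5 + brain_effect + rnorm(n_brains, sd = 1)

# Same brain: theoretical correlation is var(brain) / (var(brain) + var(error)) = 0.5
same_brain <- cor(cortex, cerebellum)

# Different brains: pair each cortex with the next brain's cerebellum
diff_brains <- cor(cortex, c(cerebellum[-1], cerebellum[1]))

print(round(c(same_brain = same_brain, different_brains = diff_brains), 2))
```

In limma, one such random effect is usually handled by passing a brain identifier as the block argument: duplicateCorrelation(brain.exprs, design, block=brain) estimates a consensus within-brain correlation, which is then supplied as lmFit(brain.exprs, design, block=brain, correlation=corfit$consensus.correlation). Here brain is a hypothetical factor with one level per individual, e.g. rep(1:6, each=4) if the arrays are stored brain by brain, which the slides do not state.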

Pre-processing
Since I do not have enough memory to read in all the arrays and then normalize them, I used justRMA(), which stores only the normalized probeset summaries:
brains=justRMA()
"affy" automatically downloaded the HG_U95Av2 cdf to attach probeset names. The probeset names can be recovered with the geneNames accessor (featureNames in current Bioconductor releases):
gNames=geneNames(brains)
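justRMA() computes RMA expression values: background correction, quantile normalization across arrays, and median-polish summarization of probes into probesets. As a rough illustration of the normalization step only, here is a simplified base-R quantile normalization of a toy matrix. It is a sketch, not affy's implementation, which may treat ties differently.

```r
# Simplified quantile normalization (toy data; ties broken by order of appearance).
quantile_normalize <- function(x) {
  ranks    <- apply(x, 2, rank, ties.method = "first")  # rank of each value within its array
  ref_dist <- rowMeans(apply(x, 2, sort))                # mean of the k-th smallest values across arrays
  out      <- apply(ranks, 2, function(r) ref_dist[r])   # replace each value by the reference quantile
  dimnames(out) <- dimnames(x)
  out
}

set.seed(2)
toy <- matrix(rexp(20), nrow = 5,
              dimnames = list(paste0("probe", 1:5), paste0("array", 1:4)))
qn <- quantile_normalize(toy)
print(round(qn, 3))
```

After normalization every array has exactly the same distribution of values, while the ordering of probes within each array is unchanged.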

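Pre-processing
We do not want to use the Affymetrix control probesets in the analysis, so we should eliminate them. Listing the last probeset names shows that the controls (whose names begin with AFFX) come at the end:
length(gNames)
cbind(12500:12625,gNames[12500:12625])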
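Instead of relying on the row positions of the controls, they could also be dropped by name, since Affymetrix control probeset IDs begin with "AFFX". A minimal sketch on a made-up vector of IDs (illustrative only; on the real data the gNames vector would be used):

```r
# Illustrative probeset IDs; on the real data gNames would be used instead.
probe_ids <- c("100_g_at", "1000_at", "41880_at", "AFFX-BioB-5_at", "AFFX-CreX-3_at")

is_control <- grepl("^AFFX", probe_ids)   # TRUE for Affymetrix control probesets
keep_ids   <- probe_ids[!is_control]
print(keep_ids)

# With the expression matrix, the same logical vector would select rows:
# brain.exprs <- exprs(brains)[!grepl("^AFFX", gNames), ]
```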

Pre-processing
Keeping only the first 12558 rows drops the control probesets:
brain.exprs=exprs(brains)[1:12558,]
We also need the sample information, which we will use to define the treatments. This is stored in the "phenotype data":
pData(brains)
Only the array names have been stored. To add the species and brain region information, we need to add columns to the phenotype data (a copy is kept as brain.pheno for building design matrices):
pData(brains)=cbind(pData(brains),
  species=c(rep("chimp",12),rep("human",12)),
  tissue=rep(c("pcortex","caudate","cerebellum","broca"),6),
  trts=c(rep(c("ca","cb","cc","cd"),3), rep(c("ha","hb","hc","hd"),3)))
brain.pheno=pData(brains)
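Because species, tissue and trts are typed in by hand, it is worth checking that they agree. A base-R sketch that rebuilds the three columns as on the slide and verifies the layout: three arrays per species-by-region cell, and trts equal to the species initial plus the region letter (a = pcortex, b = caudate, c = cerebellum, d = broca).

```r
species <- c(rep("chimp", 12), rep("human", 12))
tissue  <- rep(c("pcortex", "caudate", "cerebellum", "broca"), 6)
trts    <- c(rep(c("ca", "cb", "cc", "cd"), 3), rep(c("ha", "hb", "hc", "hd"), 3))

# Each species-by-region cell should contain 3 arrays (one per individual)
cell_counts <- table(species, tissue)
print(cell_counts)

# trts should be the species initial followed by the region letter
region_letter <- letters[match(tissue, c("pcortex", "caudate", "cerebellum", "broca"))]
expected_trts <- paste0(substr(species, 1, 1), region_letter)
stopifnot(identical(trts, expected_trts))
```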

Running Limma
There are 3 steps to running limma:
1. Fit a linear model to each gene by least squares (lmFit).
2. Moderate the t-tests with an empirical Bayes (eBayes) model for the gene-wise variances.
3. Use multiple comparisons adjustments to select genes that have statistically significant differential expression.
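The three steps can be mimicked in base R on simulated data, which helps show what each one does. This is a toy sketch, not limma's code: the data, the effect size and especially the prior (d0, s0²) are assumptions fixed by hand, whereas limma's eBayes() estimates the prior from the distribution of gene-wise variances. The moderated variance is the weighted average (d0·s0² + d·s²)/(d0 + d), and the moderated t has d0 + d degrees of freedom.

```r
# Toy illustration of limma's three steps in base R.
# Data, effect size and the prior (d0, s0^2) are illustrative assumptions.
set.seed(3)
n_genes <- 1000
group   <- factor(rep(c("chimp", "human"), each = 3))
X       <- model.matrix(~ group)                       # intercept + human indicator
Y       <- matrix(rnorm(n_genes * 6), nrow = n_genes)   # genes in rows, arrays in columns
Y[1:50, group == "human"] <- Y[1:50, group == "human"] + 5   # 50 truly changed genes

# Step 1: least squares fit for every gene at once
fit  <- lm.fit(X, t(Y))
beta <- fit$coefficients[2, ]                   # human - chimp difference per gene
d    <- fit$df.residual                         # residual df: 6 arrays - 2 parameters = 4
s2   <- colSums(fit$residuals^2) / d            # gene-wise residual variances
unscaled_sd <- sqrt(solve(crossprod(X))[2, 2])   # sqrt(1/3 + 1/3)

# Step 2: moderated t, shrinking each s2 toward a common prior value
d0      <- 4                                    # assumed prior degrees of freedom
s02     <- median(s2)                           # assumed prior variance
s2_post <- (d0 * s02 + d * s2) / (d0 + d)
t_mod   <- beta / (sqrt(s2_post) * unscaled_sd)
p_mod   <- 2 * pt(-abs(t_mod), df = d0 + d)

# Step 3: Benjamini-Hochberg adjustment, then select genes
p_adj    <- p.adjust(p_mod, method = "BH")
selected <- which(p_adj < 0.05)
cat(length(selected), "genes selected;", sum(selected <= 50), "of them truly changed\n")
```

With only 4 residual degrees of freedom per gene, borrowing the prior's 4 extra degrees of freedom is what makes the per-gene tests usable.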

Limma Models
A typical model for Y=log(expression) for these data would be:
$Y = \mu + s + r + (sr) + \varepsilon$
where μ is the overall mean (the gene's average over all the arrays), s is the species main (average) effect, r is the brain region main effect, (sr) is the species-by-region interaction, and ε is the random error. Limma can fit this model, but the resulting F-statistic tests that all the parameters are 0, including μ.
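Writing the model cell by cell makes the parameter count explicit. Indexing species by i and region by j (subscripts added here for clarity), each of the 2 × 4 = 8 species-by-region cells has mean

$$\mu_{ij} = \mu + s_i + r_j + (sr)_{ij}, \qquad i = 1,2,\; j = 1,\dots,4.$$

After the usual identifiability constraints this leaves 1 + (2 − 1) + (4 − 1) + (2 − 1)(4 − 1) = 1 + 1 + 3 + 3 = 8 free parameters, one per cell, so with 24 arrays there are 24 − 8 = 16 residual degrees of freedom per gene. Since μ is essentially never 0 for log-expression, an overall test that includes μ says nothing about differential expression.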

Limma Models
Computationally, limma does regression on a design matrix. If we understand design matrices, we can better understand how to get the information we need from limma. (Here I move to the board and explain indicator variables and contrasts.)

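Creating a Design Matrix with a Formula
factorial=model.matrix(~species*tissue,data=brain.pheno)
This sets up a design matrix with a column of 1's, (0,1) indicators for the levels of species and tissue, and their products for the interaction. The levels are put in alphabetical order, and the first level of each factor is omitted (it becomes the baseline). This factorial model is the one behind the two-way ANOVA we teach in Statistics class (type 3 tests additionally require sum-to-zero coding such as contr.sum), but this coding is not very convenient for limma, which provides a t-test for each regression coefficient. For example, the specieshuman coefficient is the human-chimp difference in the baseline region (broca), not the average species effect.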
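A base-R sketch of what this formula produces, using a toy phenotype table with the same layout as brain.pheno and noise-free made-up responses. It shows the alphabetical baselines (chimp and broca) and confirms that the species coefficient is the difference in the baseline region rather than the average difference.

```r
# Toy phenotype table with the same layout as brain.pheno (24 arrays)
pheno <- data.frame(
  species = c(rep("chimp", 12), rep("human", 12)),
  tissue  = rep(c("pcortex", "caudate", "cerebellum", "broca"), 6)
)
factorial <- model.matrix(~ species * tissue, data = pheno)
print(colnames(factorial))   # intercept, 1 species, 3 tissue and 3 interaction columns

# Noise-free toy responses: chimp mean 5 everywhere; human 6, 7, 8, 9 by region
human_means <- c(pcortex = 6, caudate = 7, cerebellum = 8, broca = 9)
y <- ifelse(pheno$species == "chimp", 5, human_means[pheno$tissue])
b <- coef(lm(y ~ species * tissue, data = pheno))

# Species coefficient = human - chimp in broca only (9 - 5), not the average (7.5 - 5)
print(c(coefficient = unname(b["specieshuman"]), average_difference = mean(human_means) - 5))
```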

Creating a Design Matrix
Generally we are interested in questions like: Does the mean expression of this gene differ between chimps and humans? Does the expression of this gene in the cortex differ between chimps and humans? These are most readily expressed as contrasts among means. What I find most convenient is to start by setting up a design matrix for the treatments using the cell means model, which has one (0,1) indicator column per treatment and no intercept. This provides the required estimate of error variance as well as names for the columns of the design matrix, which are useful for setting up a contrast matrix.
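A base-R sketch of the cell means model on toy responses (the values of y are arbitrary). It shows the column names that makeContrasts() will later refer to, that each coefficient is just the sample mean of its treatment, and that the error variance is pooled over all 8 treatments.

```r
trts   <- factor(c(rep(c("ca", "cb", "cc", "cd"), 3), rep(c("ha", "hb", "hc", "hd"), 3)))
design <- model.matrix(~ trts - 1)       # one indicator column per treatment, no intercept
print(colnames(design))                  # "trtsca" ... "trtshd"

set.seed(5)
y   <- rnorm(24, mean = as.integer(trts))   # arbitrary toy responses
fit <- lm(y ~ trts - 1)
b   <- coef(fit)

# Each coefficient is simply the sample mean of its treatment
stopifnot(isTRUE(all.equal(as.vector(b), as.vector(tapply(y, trts, mean)))))

# Residual variance pools all treatments: 24 arrays - 8 means = 16 df
print(fit$df.residual)
```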

Creating a Design Matrix
design=model.matrix(~trts-1,data=brain.pheno)
fitmeans=lmFit(brain.exprs,design)
We then decide what contrasts we want to do. For example, we might be most interested in differences between human and chimp in each brain region: ca-ha, cb-hb, cc-hc, cd-hd. Or we might want to know if the difference between 2 regions is the same in human and chimp: (ca-cb)-(ha-hb). We can have as many of these contrasts as we want.

Creating a Design Matrix
I also like to get an F-test, which is an overall test of differential expression. For this, any set of T-1 linearly independent contrasts will do, where T is the number of treatments (here T=8). Limma provides an F-test no matter what contrasts are in the contrast matrix: it tests that all of those contrasts are 0 simultaneously, which in general is not a standard test. It corresponds to the usual ANOVA F-test only when T-1 independent contrasts are provided.
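The claim that T-1 independent contrasts reproduce the ANOVA F-test can be checked numerically. A base-R sketch on toy data (ordinary least squares, without limma's variance moderation): it builds 7 independent contrasts (each treatment mean minus the first), computes the F-statistic for testing them all equal to 0, and compares it with the one-way ANOVA F from anova().

```r
set.seed(4)
trts <- factor(c(rep(c("ca", "cb", "cc", "cd"), 3), rep(c("ha", "hb", "hc", "hd"), 3)))
y    <- rnorm(24, mean = as.integer(trts) / 2)   # arbitrary toy responses
X    <- model.matrix(~ trts - 1)                 # cell means design, T = 8
fit  <- lm(y ~ trts - 1)
b    <- coef(fit)
s2   <- sum(residuals(fit)^2) / fit$df.residual

# T - 1 = 7 linearly independent contrasts: each treatment mean minus the first
L <- cbind(-1, diag(7))                          # 7 x 8, one contrast per row
stopifnot(qr(L)$rank == 7)

est    <- L %*% b
V      <- L %*% solve(crossprod(X)) %*% t(L)     # unscaled covariance of the contrasts
F_cont <- drop(t(est) %*% solve(V, est)) / (nrow(L) * s2)

# The usual one-way ANOVA F-test for "all treatment means equal"
F_anova <- anova(lm(y ~ trts))[["F value"]][1]
print(c(contrasts = F_cont, anova = F_anova))
```

Any other set of 7 independent contrasts spans the same hypothesis and gives the same F.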

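Creating a Design Matrix
e.g. the main contrasts of interest are the average human minus the average chimp, and human minus chimp in each region. With the design columns in the order (ca, cb, cc, cd, ha, hb, hc, hd):
ave H - ave C = (ha+hb+hc+hd)/4-(ca+cb+cc+cd)/4, coefficients (-1/4, -1/4, -1/4, -1/4, 1/4, 1/4, 1/4, 1/4)
difference in a=cortex: ha-ca, coefficients (-1, 0, 0, 0, 1, 0, 0, 0)
etc.
Dividing by 4 makes the main contrast a difference of averages; without it the estimate is 4 times larger, although its t-statistic is unchanged.
contrast.matrix=makeContrasts(
  main=(trtsha+trtshb+trtshc+trtshd)/4-(trtsca+trtscb+trtscc+trtscd)/4,
  cortex=trtsha-trtsca,
  caudate=trtshb-trtscb,
  cereb=trtshc-trtscc,
  broca=trtshd-trtscd,
  levels=design)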
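The contrast matrix is just a matrix of coefficients with one column per contrast and rows in the order of the design columns. A base-R sketch that builds the averaged version by hand and applies it to made-up cell means. Note the rank: because main is the average of the four region contrasts, the five columns span only 4 dimensions, so an F-test built from them cannot be the 7-df overall ANOVA test; at most it tests that there is no species difference in any region.

```r
lev <- paste0("trts", c("ca", "cb", "cc", "cd", "ha", "hb", "hc", "hd"))
cm  <- matrix(0, nrow = 8, ncol = 5,
              dimnames = list(lev, c("main", "cortex", "caudate", "cereb", "broca")))

cm[paste0("trts", c("ha", "hb", "hc", "hd")), "main"] <-  1 / 4
cm[paste0("trts", c("ca", "cb", "cc", "cd")), "main"] <- -1 / 4
region_pairs <- list(cortex = c("ha", "ca"), caudate = c("hb", "cb"),
                     cereb  = c("hc", "cc"), broca   = c("hd", "cd"))
for (nm in names(region_pairs)) {
  cm[paste0("trts", region_pairs[[nm]][1]), nm] <-  1   # human
  cm[paste0("trts", region_pairs[[nm]][2]), nm] <- -1   # minus chimp
}

# Apply to made-up cell means: chimp 5 everywhere, human 6, 7, 8, 9 by region
cell_means <- setNames(c(5, 5, 5, 5, 6, 7, 8, 9), lev)
est <- drop(t(cm) %*% cell_means)
print(est)

# main is the average of the four region contrasts, so the rank is 4, not 5
print(qr(cm)$rank)
```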

Fitting the Contrasts
fit.contrasts=contrasts.fit(fitmeans,contrast.matrix)
#eBayes step
efit.contrasts=eBayes(fit.contrasts)
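contrasts.fit() turns the cell-mean coefficients β into contrast estimates C'β and updates their unscaled standard deviations (stdev.unscaled), which eBayes() then multiplies by the moderated residual standard deviation to form each t-statistic. For the balanced cell means design with 3 arrays per treatment, these can be checked by hand: sqrt(1/3 + 1/3) for a single human-chimp difference and sqrt(8 × (1/4)² / 3) for the averaged main contrast. A base-R sketch of the linear algebra, not limma's code:

```r
trts   <- factor(c(rep(c("ca", "cb", "cc", "cd"), 3), rep(c("ha", "hb", "hc", "hd"), 3)))
design <- model.matrix(~ trts - 1)

# Two columns of the contrast matrix: average species difference and cortex difference
cm <- cbind(main   = rep(c(-1, 1) / 4, each = 4),
            cortex = c(-1, 0, 0, 0, 1, 0, 0, 0))
rownames(cm) <- colnames(design)

unscaled_cov   <- solve(crossprod(design))   # (X'X)^{-1}: diagonal with 1/3 here
stdev_unscaled <- sqrt(diag(t(cm) %*% unscaled_cov %*% cm))
print(round(stdev_unscaled, 3))
```

The main contrast averages over 24 arrays and so has half the standard error of a single-region difference.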

Selecting the Differentially Expressed Genes
It is nice to look at the p-values before adjusting for multiple comparisons:
par(mfrow=c(2,3))
for (i in 1:5) {
  hist(efit.contrasts$p.value[,i], main=colnames(efit.contrasts$p.value)[i], xlab="p-value")
}
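What to look for in these histograms: under the null hypothesis p-values are uniformly distributed, so genes that are not differentially expressed fill the histogram roughly evenly, and differentially expressed genes add a spike near 0. A base-R sketch with simulated p-values (a 90%/10% mixture chosen purely for illustration), using hist(plot = FALSE) to get the bin counts:

```r
set.seed(6)
p_null <- runif(9000)                              # genes with no differential expression
p_alt  <- 2 * pnorm(-abs(rnorm(1000, mean = 3)))   # differentially expressed genes
p      <- c(p_null, p_alt)

h <- hist(p, breaks = seq(0, 1, by = 0.1), plot = FALSE)
print(h$counts)
# The first bin holds the spike; the other nine bins are roughly flat
```

A histogram that instead rises toward 1, or has bumps in the middle, usually signals a problem with the model or the data rather than differential expression.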

Selecting the Differentially Expressed Genes
topTable(efit.contrasts,coef="broca",number=20,adjust.method="BH")
or look at the q-values and pi0:
library(qvalue)
q.broca=qvalue(efit.contrasts$p.value[,"broca"])
q.broca$pi0
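Both quantities are simple to compute from a vector of p-values. The base-R sketch below reproduces the Benjamini-Hochberg adjustment reported by p.adjust(method = "BH"), and computes Storey's estimate of pi0 (the proportion of genes that are not differentially expressed) at a single tuning value lambda = 0.5. qvalue's default pi0 estimate smooths over a grid of lambda values instead, so its answer will differ somewhat. The simulated p-values are an assumption for illustration.

```r
set.seed(7)
p <- c(runif(9000), 2 * pnorm(-abs(rnorm(1000, mean = 3))))   # 90% null genes

# Benjamini-Hochberg: adjusted p_(i) = min over j >= i of n * p_(j) / j, capped at 1
bh_adjust <- function(p) {
  n   <- length(p)
  o   <- order(p, decreasing = TRUE)           # largest p-value first
  adj <- pmin(1, cummin(n / (n:1) * p[o]))     # running minimum from the top down
  adj[order(o)]                                # back to the original gene order
}
p_bh <- bh_adjust(p)

# Storey's estimate of the proportion of null genes at lambda = 0.5:
# null p-values above lambda occur at rate (1 - lambda), alternatives rarely do
lambda <- 0.5
pi0 <- mean(p > lambda) / (1 - lambda)
print(c(pi0 = pi0, n_BH_05 = sum(p_bh < 0.05)))
```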