SPH 247 Statistical Analysis of Laboratory Data, May 5, 2015


limma-voom

We have seen how limma can fit a linear model to each gene in a microarray experiment.
This approach is not limited to microarrays and can also be used with RNA-Seq data.
The main problem is that the linear models used in limma assume the error variance is the same at all expression levels.
In RNA-Seq, however, the variance of the counts rises with the mean.
Voom estimates the mean-variance relationship and applies observation weights to address this problem.
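To make the variance problem concrete, here is a minimal base-R sketch on simulated data. The negative binomial model, the dispersion value (size = 10), and all variable names are assumptions chosen for illustration; none of this comes from the course scripts. It shows only the general idea of inverse-variance weighting: voom itself derives its weights from a fitted mean-variance trend on the log scale.

```r
# Simulated illustration (not the Bottomly data): count variance rises with the mean.
set.seed(1)
means <- c(5, 50, 500)     # hypothetical mean counts for three genes
n.rep <- 2000              # many replicates so the variance estimates are stable

# Assumed negative binomial counts: variance = mu + mu^2 / size
counts <- sapply(means, function(m) rnbinom(n.rep, mu = m, size = 10))
obs.var <- apply(counts, 2, var)
print(data.frame(mean = means, variance = round(obs.var, 1)))

# Weighted least squares in base R: noisier observations get smaller weights.
y <- c(counts[1:50, 1], counts[1:50, 3])
group <- factor(rep(c("low", "high"), each = 50), levels = c("low", "high"))
w <- rep(1 / obs.var[c(1, 3)], each = 50)   # inverse-variance weights
fit <- lm(y ~ group, weights = w)
print(coef(fit))
```

An unweighted fit would treat the high-mean, high-variance observations as being just as precise as the low-mean ones, which is the equal-variance assumption the slide warns about.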

Bottomly Data

Two inbred mouse strains: C57BL/6J (10 animals) and DBA/2J (11 animals).
Gene expression in striatal neurons was measured by RNA-Seq and expression arrays.
There are 36,536 unique genes, of which 11,870 had a count of at least 10 across the 21 samples.
We can find differentially expressed genes with limma-voom.

Data Files: bottomly_count_table.txt

One row per gene, plus one header row of column names.
22 columns: 1 column of gene names, 10 C57BL/6J samples, 11 DBA/2J samples.
The header row begins with "gene" followed by the SRX sample identifiers; each data row begins with an ENSMUSG gene identifier.

Data Files: bottomly_phenodata.txt

21 rows, one per sample, and 5 columns:

    sample.id  num.tech.reps  strain  experiment.number  lane.number

The first five rows shown on the slide are C57BL/6J samples whose sample.id values begin with SRX.

Three experiments (4, 6, 7), 7 lanes per experiment:
    C57BL/6J in lanes 6, 7, 8 of experiment 4
    C57BL/6J in lanes 1, 2, 3, 5 of experiment 6
    C57BL/6J in lanes 1, 2, 3 of experiment 7

Two possible factors:
    strain (intended)
    experiment number (block?)
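Before deciding whether experiment number can serve as a blocking factor, it helps to cross-tabulate it against strain. The sketch below rebuilds the layout from the lane assignments listed on the slide rather than reading the file. The DBA/2J counts per experiment are an inference (7 lanes per experiment minus the C57BL/6J lanes), and the ordering of samples is hypothetical.

```r
# Samples per strain and experiment, from the lane layout described on the slide.
exp.levels  <- c("4", "6", "7")
c57.per.exp <- c(3, 4, 3)        # lanes 6-8; lanes 1, 2, 3, 5; lanes 1-3
dba.per.exp <- 7 - c57.per.exp   # inferred, not stated on the slide

strain <- factor(c(rep("C57BL/6J", sum(c57.per.exp)),
                   rep("DBA/2J",   sum(dba.per.exp))))
exp.num <- factor(c(rep(exp.levels, times = c57.per.exp),
                    rep(exp.levels, times = dba.per.exp)))

layout <- table(strain, exp.num)
print(layout)
```

The inferred DBA/2J counts sum to 11, which agrees with the 11 animals reported for that strain. Because both strains appear in every experiment, the strain effect can still be estimated after adjusting for experiment.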

Input code

    source("input.bottomly.R")

The script, with the span that was truncated in this transcript marked by [...]:

    tmp [...] >= min.count), ]
    data2 <- read.table("bottomly_phenodata.txt", header=T)
    strain <- factor(as.character(data2$strain))
    exp.num <- factor(as.character(data2$experiment.number))

1) Reads the count data from file.
2) Removes the gene names as a column, but adds them as row names.
3) Filters the data to eliminate genes with a mean count of less than one per sample (11175 genes retained).
4) Reads the characteristics of the samples.
5) Creates variables for strain and experiment number.
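Steps 2 and 3 can be sketched as follows. This is a plausible reconstruction under stated assumptions, not the original script: the small in-memory table stands in for the count file, the gene names are made up, and both the use of rowMeans and the value of min.count are guesses based on the description "mean count of less than one per sample".

```r
# Hypothetical stand-in for the table read from bottomly_count_table.txt:
# a gene-name column followed by one column per sample.
tmp <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD"),   # made-up gene names
  s1 = c(0, 12, 1, 0),
  s2 = c(1, 30, 0, 0),
  s3 = c(0, 25, 3, 2),
  stringsAsFactors = FALSE
)

count.data <- tmp[, -1]              # drop the gene-name column
rownames(count.data) <- tmp[, 1]     # keep the gene names as row names

min.count <- 1                       # assumed threshold: mean of one count per sample
count.data <- count.data[(rowMeans(count.data) >= min.count), ]
print(count.data)
```

Filtering on the row mean is equivalent to filtering on the row sum with a threshold equal to the number of samples, so with 21 samples a gene needs at least 21 counts in total to be kept.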

Analysis Code

    source("limma.analysis.R")

Contents of the script:

    require(limma)
    require(edgeR)
    dge <- DGEList(counts=count.data, genes=rownames(count.data), group=strain)
    dge <- calcNormFactors(dge)
    design1 <- model.matrix(~strain)
    v1 <- voom(dge, design1, plot = TRUE)
    fit1 <- lmFit(v1, design1)
    fit1 <- eBayes(fit1)

1) Creates the dge object.
2) Normalizes.
3) Builds the design matrix from strain only.
4) Log transforms and applies variance weights with voom.
5) Fits a linear model to each gene.
6) "Shrinks" the variances.
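To see what the strain-only design matrix in step 3 contains, the sketch below builds it in base R from a strain factor with the group sizes given earlier (10 C57BL/6J, 11 DBA/2J). The sample ordering here is an assumption; in the real data the order follows the rows of the phenodata file.

```r
# Strain factor with the group sizes from the slides (sample order is hypothetical).
strain <- factor(rep(c("C57BL/6J", "DBA/2J"), times = c(10, 11)))

design1 <- model.matrix(~strain)
print(colnames(design1))   # intercept plus an indicator column for DBA/2J
print(colSums(design1))    # 21 samples in total, 11 of them DBA/2J
```

Because C57BL/6J sorts first, it is the reference level: the intercept estimates its mean log expression, and the second coefficient estimates the DBA/2J minus C57BL/6J difference for each gene.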


Results

    > names(fit1)
     [1] "coefficients"     "stdev.unscaled"   "sigma"            "df.residual"
     [5] "cov.coefficients" "pivot"            "genes"            "Amean"
     [9] "method"           "design"           "df.prior"         "s2.prior"
    [13] "var.prior"        "proportion"       "s2.post"          "t"
    [17] "df.total"         "p.value"          "lods"             "F"
    [21] "F.p.value"
    > colnames(fit1$p.value)
    [1] "(Intercept)"  "strainDBA/2J"
    > hist(fit1$p.value[,2])
    > sum(fit1$p.value[,2] < 1e-4)
    [1] 385
    > sum(p.adjust(fit1$p.value[,2],"fdr") < 0.05)
    [1] 1011

Column 2 of the p-value matrix holds the strain effect: 385 genes have an unadjusted p-value below 1e-4, and 1011 genes have an FDR-adjusted p-value below 0.05.
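The "fdr" method of p.adjust is the Benjamini-Hochberg adjustment. As a minimal sketch of what that call does, the code below applies it to simulated p-values (the mixture of uniform and beta draws is purely hypothetical) and checks the result against a manual computation of the same step-up rule.

```r
# Hypothetical p-values: mostly null (uniform) plus some small ones.
set.seed(2)
p <- c(runif(900), rbeta(100, 0.5, 20))

p.fdr <- p.adjust(p, "fdr")          # "fdr" is an alias for "BH"

# Manual Benjamini-Hochberg: scale each p-value by m / rank,
# then enforce monotonicity from the largest p-value downwards.
m  <- length(p)
o  <- order(p, decreasing = TRUE)
ro <- order(o)
p.manual <- pmin(1, cummin(m / (m:1) * p[o]))[ro]

print(all.equal(p.fdr, p.manual))
print(c(raw = sum(p < 1e-4), fdr = sum(p.fdr < 0.05)))
```

An adjusted p-value is never smaller than the raw one, so a fixed cutoff applied to the adjusted values is always at least as strict as the same cutoff applied to the raw values.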


Exercise

Edit the code so that exp.num is fit as well as strain.
Compare the number of FDR-significant genes. What is the overlap?

    design2 <- model.matrix(~exp.num+strain)

The p.value matrix will have four columns instead of two.
Note that blocking variables should come first in the formula if any kind of anova will be done, because sequential tests adjust each term only for the terms listed before it.
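As a starting point for the exercise, the sketch below shows the four columns of the blocked design and one way to compute an overlap between two sets of significant genes. The factors are rebuilt from the lane layout described earlier (sample order is hypothetical, and the DBA/2J counts per experiment are inferred). The sig1 and sig2 vectors are made-up placeholders for the FDR calls from the two fits, not real results.

```r
# Factors rebuilt from the slide's layout (order hypothetical, DBA/2J counts inferred).
exp.levels <- c("4", "6", "7")
strain  <- factor(rep(c("C57BL/6J", "DBA/2J"), times = c(10, 11)))
exp.num <- factor(c(rep(exp.levels, times = c(3, 4, 3)),
                    rep(exp.levels, times = c(4, 3, 4))))

design2 <- model.matrix(~exp.num + strain)
print(colnames(design2))   # the strain effect is now the fourth column

# Placeholder significance calls for four made-up genes under each model.
sig1 <- c(g1 = TRUE, g2 = TRUE,  g3 = FALSE, g4 = TRUE)   # strain-only model
sig2 <- c(g1 = TRUE, g2 = FALSE, g3 = TRUE,  g4 = TRUE)   # blocked model

both <- intersect(names(sig1)[sig1], names(sig2)[sig2])
print(both)
print(table(model1 = sig1, model2 = sig2))
```

With the real fits, the logical vectors would come from thresholding the FDR-adjusted strain p-values, taking column 2 of the p.value matrix for the strain-only fit and column 4 for the blocked fit.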