AARHUS UNIVERSITET, Faculty of Agricultural Sciences
Introduction to analysis of microarray data
David Edwards


The Microarray Study Process

Study Objectives
- Class comparison: differential expression
- Class prediction: classification
- Class discovery: clustering

Differential Expression
How can we identify genes whose expression level changes across the conditions in the study?

Analysis Strategy
The study may aim to:
- Compare two groups (e.g. treatment vs control)
- Compare more than two groups
- Make more than one comparison (e.g. 2 treatments at 3 timepoints)
As a first approximation, we can think of our approach as:
1. Choose the appropriate analysis method for a single gene.
2. Apply it to all genes, correcting for multiplicity (e.g. by controlling the FDR).

Additive and multiplicative scales
- Most statistical models assume an additive scale and constant variance.
- Gene expression appears to work more on a multiplicative scale (fold changes rather than expression differences), and the variance of gene expression depends on its absolute level.
- Conclusion: transform the data by taking logarithms (conventionally base 2).
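A small numeric illustration of why the log scale suits fold changes. The intensities are made up for the example; only the arithmetic matters.

```python
import math

# Hypothetical raw intensities: both genes double under treatment.
low_before, low_after = 100.0, 200.0
high_before, high_after = 1000.0, 2000.0

# On the raw (additive) scale the two changes look very different.
raw_diff_low = low_after - low_before
raw_diff_high = high_after - high_before

# On the log2 scale both are the same change of one unit (a 2-fold increase).
log_diff_low = math.log2(low_after) - math.log2(low_before)
log_diff_high = math.log2(high_after) - math.log2(high_before)

# A halving is the mirror image of a doubling on the log2 scale.
log_half = math.log2(0.5)
```

With base 2, a log difference of +1 reads directly as "2-fold up" and -1 as "2-fold down".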

Fold Change & Log Ratios
We have transformed our data by taking logarithms, so differences are log-ratios (log fold changes):
$\log(a/b) = \log(a) - \log(b)$
With two-channel (cDNA) data, the numbers we analyze are (usually) the within-spot log-ratios:
$M = \log(R) - \log(G)$
To estimate the log fold change across replicate slides, we compute the average log-ratio across the replicates.
With one-channel (Affymetrix) data, the numbers we analyze are the logs of the expression measures (e.g. RMA). To estimate the log fold change between two groups of arrays, we compute the average log-expression within each group and take the difference:
$LR = \frac{1}{n_1}\sum_{i=1}^{n_1} Y_{1i} - \frac{1}{n_2}\sum_{i=1}^{n_2} Y_{2i}$
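The two estimates described above can be sketched for a single gene as follows. The function names and all intensity values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def log_ratio(red, green):
    """Within-spot log-ratio M = log2(R) - log2(G) for one two-channel spot."""
    return math.log2(red) - math.log2(green)

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def log_fold_change(group1, group2):
    """LR = mean of group 1 minus mean of group 2, for log2 expression values."""
    return mean(group1) - mean(group2)

# Hypothetical (R, G) intensities for one gene on three replicate slides.
spots = [(800.0, 200.0), (1600.0, 400.0), (400.0, 100.0)]
m_values = [log_ratio(r, g) for r, g in spots]
avg_m = mean(m_values)        # average log-ratio across the replicates
fold_change = 2 ** avg_m      # the same estimate on the multiplicative scale

# Hypothetical one-channel log2 expression values for one gene in two groups.
treated = [9.0, 9.5, 8.5]
control = [7.0, 7.5, 6.5]
lr = log_fold_change(treated, control)
```

In both cases the estimate is 2 on the log2 scale, i.e. a 4-fold change.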

Analysis

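(Figure: the same single-gene analysis is repeated in turn, "... then for gene 2, ... then for gene ...", across the genes on the array.)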

Some examples of methods
- Two-sample t-test
- Linear regression: $y_t = y_0 + \beta Z$, where $y_0$ is the baseline expression (before treatment), $Z$ is the group indicator (0 = control, 1 = treatment) and $\beta$ is the group effect
- ANOVA models
- Non-parametric tests
- ...
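As a rough sketch of the first two methods for a single gene, the code below computes the classical equal-variance two-sample t-statistic and the least-squares group effect from a regression on a 0/1 indicator. Two assumptions are made here that the slide does not state: the t-test is the pooled-variance version, and the regression treats the baseline as a common intercept. The data values are invented.

```python
import math
import statistics

def two_sample_t(x, y):
    """Equal-variance two-sample t-statistic and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / se, df

def group_effect(control, treatment):
    """Least-squares slope for Z = 0 (control) or 1 (treatment), with an intercept."""
    z = [0.0] * len(control) + [1.0] * len(treatment)
    y = list(control) + list(treatment)
    z_bar, y_bar = statistics.mean(z), statistics.mean(y)
    sxy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    return sxy / sxx

# Hypothetical log2 expression values for one gene.
control = [7.0, 7.5, 6.5]
treatment = [9.0, 9.5, 8.5]

t_stat, df = two_sample_t(treatment, control)
beta = group_effect(control, treatment)
```

With a single 0/1 covariate the estimated group effect equals the difference in group means, so the regression and the t-test address the same comparison.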

Multiplicity
- Typically a list of p-values is obtained, one per gene.
- Now we need to select the genes that are likely to be differentially expressed.
- If we used p < 0.05 as the criterion with 20000 genes, we would expect about 1000 (= 0.05 x 20000) genes to be selected even if there were no differential expression at all.
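The figure of 1000 can be checked with a short simulation. It relies on the standard fact that p-values from a continuous test are uniformly distributed on (0, 1) when the null hypothesis holds; the seed and variable names are arbitrary choices for this sketch.

```python
import random

n_genes = 20000
alpha = 0.05

# Expected number of genes passing p < alpha when no gene is differentially expressed.
expected_false_positives = alpha * n_genes

# Simulate null p-values: under the null hypothesis they are uniform on (0, 1).
rng = random.Random(1)   # fixed seed so the run is reproducible
null_p_values = [rng.random() for _ in range(n_genes)]
observed_false_positives = sum(p < alpha for p in null_p_values)
```

The observed count fluctuates around the expected value from run to run, but stays close to 1000.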

Multiplicity
- If we select genes using the criterion $p < \alpha/N$, where $N$ is the total number of genes (Bonferroni's correction), this controls the familywise error rate: Pr(any type I error) = Pr(any false selections) $\le \alpha$.
- But this is usually too stringent.
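Bonferroni's criterion is simple enough to write out directly. The function name and the p-values below are hypothetical.

```python
def bonferroni_select(p_values, alpha=0.05):
    """Return the indices of genes with p < alpha / N, where N = number of genes."""
    threshold = alpha / len(p_values)
    return [gene for gene, p in enumerate(p_values) if p < threshold]

# Hypothetical p-values for five genes; the threshold is 0.05 / 5 = 0.01.
p_values = [0.00001, 0.004, 0.03, 0.2, 0.6]
selected = bonferroni_select(p_values)
```

With 20000 genes the same rule requires p < 0.0000025, which shows why the correction is often too stringent for microarray studies.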

False Discovery Rate
- FDR = expected proportion of false positives among the selected genes.
- Two uses:
  - If the top 100 genes are selected for further study, what proportion may be expected to be false positives?
  - If we want a proportion of 5% false positives, how many genes should be selected?
- Adjusted p-values (q-values) can be defined such that selecting genes with $q_g < \alpha$ results in FDR $\le \alpha$.
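One common way to obtain such adjusted p-values is the Benjamini-Hochberg step-up procedure. The slide does not say which procedure is meant, so the sketch below is an assumption in that respect; the function name and the p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original gene order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])   # indices by increasing p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(n, 0, -1):
        gene = order[rank - 1]
        running_min = min(running_min, p_values[gene] * n / rank)
        adjusted[gene] = running_min
    return adjusted

# Hypothetical p-values for five genes.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
q_values = bh_adjust(p_values)
selected = [gene for gene, q in enumerate(q_values) if q < 0.05]
```

Here the third and fourth genes have raw p-values below 0.05 but adjusted values just above it, so only the first two genes are selected at a 5% FDR.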

LIMMA Package: Linear Models for Microarray Data
- arbitrarily complex experiments: linear models, contrasts
- empirical Bayes methods for differential expression: t-tests, F-tests, posterior odds
- inference methods for duplicate spots, technical replication
- analyse log-ratios or log-intensities
- spot quality weights
- control of FDR across genes and contrasts
- stemmed heat diagrams, Venn diagrams
- pre-processing: background correction, within and between array normalization
Analysis of differential expression studies

Empirical Bayes Methods in Limma
- Problem with ordinary t-tests here: small estimates of the S.D. can arise by chance, giving false positives.
- Limma uses an empirical Bayes approach:
  - The gene variances are given a prior distribution (estimated from the sample distribution of the variances across genes). Each variance is then updated using the data for that gene to obtain a posterior distribution, and an estimate is derived from the posterior distribution.
  - This shrinks the variances towards the prior mean. The estimate is then substituted into the classical t-statistic (with the "degrees of freedom" adjusted), giving the so-called moderated t-test.
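The shrinkage step can be sketched as follows, using the posterior variance from Smyth (2004): a weighted average of the prior variance and the gene's own sample variance, weighted by their degrees of freedom. This is not Limma's code. Limma estimates the prior variance and prior degrees of freedom from all genes, whereas here they are simply supplied as hypothetical numbers, as are the gene's own values.

```python
import math

def moderated_t(effect, s2, df, v, prior_s2, prior_df):
    """Shrink s2 towards prior_s2; return (moderated t, total degrees of freedom).

    effect: estimated log fold change for the gene
    s2, df: the gene's sample variance and its residual degrees of freedom
    v: unscaled variance factor of the estimate, e.g. 1/n1 + 1/n2 for two groups
    """
    posterior_s2 = (prior_df * prior_s2 + df * s2) / (prior_df + df)
    return effect / math.sqrt(posterior_s2 * v), prior_df + df

# Hypothetical gene with a small effect and, by chance, a very small variance estimate.
effect, s2, df, v = 0.3, 0.0004, 4, 1 / 3 + 1 / 3

# With no prior information (prior_df = 0) this reduces to the ordinary t-statistic.
ordinary_t, ordinary_df = moderated_t(effect, s2, df, v, prior_s2=0.0, prior_df=0)

# With a hypothetical prior (variance 0.05 on 4 degrees of freedom) the variance is shrunk.
mod_t, mod_df = moderated_t(effect, s2, df, v, prior_s2=0.05, prior_df=4)
```

The ordinary t-statistic is inflated by the chance small variance, while the moderated version is far less extreme and is referred to a t-distribution with more degrees of freedom.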

- There is good evidence that this is more robust than the classical approach.
- Given a prior estimate $p$ of the proportion of differentially expressed (DE) genes, the posterior probability $p_g$ that gene $g$ is DE can be calculated. The B-statistic given by Limma is the log-odds, i.e. $\log(O_g)$ with $O_g = p_g/(1 - p_g)$. This is useful for ranking genes.
- Smyth, GK (2004). Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments. Stat. Appl. in Genetics and Mol. Biol., 3, 1.