Statistical Analysis of Microarray Data By H. Bjørn Nielsen.

1 Statistical Analysis of Microarray Data By H. Bjørn Nielsen

2 The DNA Array Analysis Pipeline (diagram labels): Question, Experimental Design, Array design, Probe design, Buy Chip/Array, Sample Preparation, Hybridization, Image analysis, Normalization, Expression Index Calculation, Comparable Gene Expression Data, Statistical Analysis, Fit to Model (time series), Advanced Data Analysis (Clustering, PCA, Classification, Promoter Analysis, Meta analysis, Survival analysis, Regulatory Network)

3 What's the question? Typically we want to identify differentially expressed genes. Example: alcohol dehydrogenase is expressed at a higher level when alcohol is added to the media. (Figure: alcohol dehydrogenase expression without alcohol vs. with alcohol.)

4 He’s going to say it... However, the measurements contain stochastic noise. There is no way around it: statistics.

5 Noisy measurements → statistics → p-value. You can choose to think of statistics as a black box, but you still need to understand how to interpret the results.

6 P-value: the output of the statistics. The p-value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true; loosely, the chance of rejecting the null hypothesis by coincidence. For gene expression analysis we can say: the chance that a gene is categorized as differentially expressed by coincidence.

7 The statistics give us a p-value for each gene. We can rank the genes according to the p-value. But we can’t really trust the p-value in a strict statistical way!

8 Why not? For two reasons: 1. We rarely fulfill all the assumptions of the statistical test. 2. We have to take multiple testing into account.

9 The t-test Assumptions 1. The observations in the two categories must be independent 2. The observations should be normally distributed 3. The sample size must be ‘large’ (>30 replicates)

10 Multi-testing? In a typical microarray analysis we test thousands of genes. If we use a significance level of 0.05 and test 1000 genes, we expect 50 genes to be significant by chance, even if none is truly differentially expressed: 1000 × 0.05 = 50.
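
The arithmetic on this slide can be checked with a small simulation. This is a minimal sketch, assuming every gene is truly unchanged, so that each p-value is uniformly distributed between 0 and 1. The function names and the seed are hypothetical, chosen only for illustration.

```python
import random


def expected_false_positives(n_tests, alpha):
    """Expected number of tests called significant by chance alone."""
    return n_tests * alpha


def simulate_null_hits(n_tests, alpha, seed=0):
    """Count chance 'hits' when no gene is truly differentially expressed.

    Assumption: under the null hypothesis a p-value is uniform on [0, 1],
    so each simulated p-value is drawn with rng.random().
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n_tests) if rng.random() < alpha)


if __name__ == "__main__":
    print(expected_false_positives(1000, 0.05))  # the slide's 1000 x 0.05
    print(simulate_null_hits(1000, 0.05))        # varies around that value
```

The simulated count fluctuates from run to run (or seed to seed) around the expected value, which is why a fixed 0.05 cutoff cannot be read as a per-gene error guarantee across thousands of genes.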

11 Volcano Plot (Figure: p-value plotted against log2 fold change (M).)
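
The coordinates of one gene in a volcano plot can be sketched as follows. This is an illustration under stated assumptions, not the presenter's code: the inputs are taken to be mean intensities on the raw (not log-transformed) scale, the p-value axis is shown as -log10(p) as is conventional, and the function name `volcano_coordinates` is hypothetical.

```python
import math


def volcano_coordinates(mean_control, mean_treated, p_value):
    """Return (M, -log10 p) for one gene.

    M is the log2 fold change of treated over control.
    Assumption: both means are positive raw intensities.
    """
    m = math.log2(mean_treated / mean_control)
    neg_log_p = -math.log10(p_value)
    return m, neg_log_p


if __name__ == "__main__":
    # A gene whose intensity rises fourfold, with p = 0.001.
    print(volcano_coordinates(100.0, 400.0, 0.001))
```

Genes far from M = 0 and high on the p-value axis are those with both a large change and strong statistical support.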

12 What's inside the black box ‘statistics’? t-test or ANOVA

13 The t-test: 1. Calculate T. 2. Look up T in a table.

14 The t-test II: The t-test tests for a difference in means (μ). (Figure: density of the intensity of gene x for wt and mutant, with means μ_wt and μ_mut.)

15 The t-test III: The t statistic is based on the sample mean and variance.
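
The formula shown on this slide did not survive in the transcript. As a minimal sketch of how a t statistic is built from sample means and variances, the code below uses the unequal-variance (Welch) form; that choice is an assumption, and the slide may have shown the pooled-variance form instead. The name `welch_t` and the example numbers are hypothetical.

```python
import math
import statistics


def welch_t(wt, mut):
    """Return (t, degrees of freedom) for two groups of replicates.

    t  = (mean_mut - mean_wt) / sqrt(var_wt/n_wt + var_mut/n_mut)
    df = Welch-Satterthwaite approximation
    Assumption: each group has at least two replicates with nonzero variance.
    """
    n1, n2 = len(wt), len(mut)
    m1, m2 = statistics.mean(wt), statistics.mean(mut)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    q1 = statistics.variance(wt) / n1
    q2 = statistics.variance(mut) / n2
    t = (m2 - m1) / math.sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    t, df = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
    print(t, df)
```

The resulting t and degrees of freedom are what would then be looked up in a t table to obtain the p-value.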

16 Conclusion: Array data contain stochastic noise, so statistics is needed to conclude on differential expression. We can’t really trust the p-value, but the statistics can rank genes. The capacity/needs of downstream processes can be used to set the cutoff. FDR can be estimated. The t-test is used for two-category tests; ANOVA is used for multiple categories.
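
The slides do not say how the FDR is estimated. One widely used option is the Benjamini-Hochberg procedure, sketched here as an assumption rather than as the presenter's method. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    For the gene with rank r (1 = smallest p) among n genes the raw
    estimate is p * n / r; a running minimum taken from the largest
    rank downwards keeps the adjusted values monotone.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Ranking genes by adjusted value gives the same order as ranking by p-value, so the cutoff can still be set by how many genes the downstream work can handle, with the adjusted value read as an estimate of the false discovery rate at that cutoff.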

