

Microarray Data Analysis

Differential Gene Expression Analysis
The Experiment: A microarray experiment measures gene expression in rats (>5000 genes).
The rats are split into two groups: WT (wild-type rat) and KO (knock-out treatment rat).
Each group is measured under similar conditions (paired experiment).
Questions: Which genes are affected by the treatment? How significant is the effect? How big is the effect?

Hypothesis Testing
The analysis uses hypothesis-testing methodology. For each gene (>5,000):
Pose the null hypothesis ($H_0$) that the gene is not affected.
Pose the alternative hypothesis ($H_a$) that the gene is affected.
Use statistical techniques to calculate the p-value: the probability, if $H_0$ is true, of observing a difference at least as large as the one measured.
If the p-value is below some critical value, reject $H_0$ and accept $H_a$.
The issues:
Estimation of variance: limited sample size (few replicates).
Normal distribution assumptions: with so few replicates, large-sample results such as the law of large numbers do not apply.
Multiple testing: one test per gene, so thousands of tests per experiment.
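The multiple-testing issue can be made concrete with a rough calculation. The figures below are assumptions chosen for illustration, not values from the slides: $m = 5000$ tested genes, a critical value of $\alpha = 0.05$, and the worst case in which no gene is truly affected. Under those assumptions the expected number of genes that pass the threshold by chance alone is:

```latex
\mathbb{E}[\text{false rejections}] = m \,\alpha = 5000 \times 0.05 = 250
```

So a list of "significant" genes at this threshold could plausibly contain a few hundred chance findings, which is why effect size is examined alongside the p-value later on.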

Statistics 101
Comparing Two Independent Samples
  Z test for the difference in two means (variance known)
  t test for the difference in two means (variance unknown)
F Test for the Difference in Two Variances
Comparing Two Related Samples
  t test for the mean difference
Wilcoxon Rank-Sum Test: Difference in Two Medians

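Normal Distribution and Confidence Intervals
[Figure: normal distribution with a central area of $1 - \alpha = 0.95$ and an area of $\alpha/2 = 0.025$ in each tail.]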
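As a small illustration of the 95% level on this slide, the sketch below computes the standard normal critical value and the resulting two-sided interval. The function name `normal_interval` and its arguments are hypothetical, and the standard error is assumed to be known.

```python
from statistics import NormalDist


def normal_interval(mean, std_err, confidence=0.95):
    """Two-sided interval for a mean, assuming a known standard error."""
    alpha = 1.0 - confidence
    # Critical value z with P(Z <= z) = 1 - alpha/2 for a standard normal Z.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return mean - z * std_err, mean + z * std_err, z


low, high, z = normal_interval(mean=0.0, std_err=1.0)
print(round(z, 2))  # 1.96
```

With few replicates the standard error is itself estimated, which is the reason the following slides switch from the normal distribution to the t-distribution.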

Hypothesis Testing: Two Sample Tests
[Figure: two panels, "Test for equal variances" and "Test for equal means", each showing Population 1 and Population 2 under $H_0$ and under $H_a$.]

Normal Distribution vs t-Distribution
[Figure: the difference between the normal distribution and the t-distribution, shown as two curves.]

Single Sample t-test
t-test: Used to compare the mean of a sample to a known number (often 0).
Assumptions: Subjects are randomly drawn from a population, and the distribution of the mean being tested is normal.
Test: The hypotheses for a single sample t-test are:
$H_0: \mu = \mu_0$
$H_a: \mu \neq \mu_0$
(where $\mu_0$ denotes the hypothesized value to which you are comparing a population mean)
p-value: the probability, if $H_0$ is true, of a result at least as extreme as the one observed. A small p-value means a small risk of error in rejecting the hypothesis of no difference between the population mean and $\mu_0$.

Independent Group t-test
Independent group t-test: Used to compare the means of two independent groups.
Assumptions: Subjects are randomly assigned to one of two groups. The distributions of the means being compared are normal with equal variances.
Test: The hypotheses for the comparison of two independent groups are:
$H_0: \mu_1 = \mu_2$ (the means of the two groups are equal)
$H_a: \mu_1 \neq \mu_2$ (the means of the two groups are not equal)
A low p-value for this test (less than 0.05, for example) means that there is evidence to reject the null hypothesis in favour of the alternative hypothesis.

Paired t-test
Paired t-test: Most commonly used to evaluate the difference in means between two groups. Used to compare means on the same or related subjects over time or in differing circumstances. Compares the differences in mean and variance between two data sets.
Example: Test scores of a group of patients who have been given a certain medicine, compared with those of matched patients who have received a placebo.
Assumptions: The observed data are from the same subject or from a matched subject and are drawn from a population with a normal distribution.
Can work with very small samples.

Characteristics: Subjects are often tested in a before-after situation (across time, with some intervention occurring, such as a diet), or subjects are paired, such as with twins or with subjects as alike as possible. An extension of this test is the repeated-measures ANOVA.
Test: The paired t-test is actually a test that the mean difference between the two observations is 0. So, if D represents the difference between observations, the hypotheses are:
$H_0: D = 0$ (the difference between the two observations is 0)
$H_a: D \neq 0$ (the difference is not 0)

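Calculating t-test (t statistic)
First calculate the t statistic, then calculate the p-value.
For the paired Student's t-test, t is calculated using the following formula:
$$t = \frac{\bar{d}}{\sigma(d) / \sqrt{n}}$$
where $\bar{d}$ is calculated by
$$\bar{d} = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i),$$
$\sigma(d)$ is the standard deviation of the differences $d_i = x_i - y_i$, and $n$ is the number of pairs being tested.
For an unpaired (independent group) Student's t-test, the following formula is used:
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\sigma(x)^2 / n(x) + \sigma(y)^2 / n(y)}}$$
where $\sigma(x)$ is the standard deviation of x and $n(x)$ is the number of elements in x.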
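The two t statistics can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the software used in the presentation: the paired statistic is taken as the mean difference divided by its standard error, the unpaired statistic uses the separate-variance standard error (which coincides with the pooled form when both groups have the same size), and the function names and sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic: mean of the differences over its standard error."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(x, y)]  # per-pair differences
    n = len(d)                         # number of pairs
    return mean(d) / (stdev(d) / math.sqrt(n))


def unpaired_t(x, y):
    """Unpaired t statistic using each group's own sample variance."""
    std_err = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(x) - mean(y)) / std_err


# Hypothetical expression values for one gene, 4 paired observations.
wt = [2.0, 4.0, 6.0, 8.0]
ko = [1.0, 2.0, 3.0, 4.0]
print(paired_t(wt, ko), unpaired_t(wt, ko))
```

On the same data the paired statistic is larger here because pairing removes the subject-to-subject variation from the denominator.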

Setting Up the Hypothesis
Two tail: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$ OR $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \neq 0$
Left tail: $H_0: \mu_1 \geq \mu_2$, $H_1: \mu_1 < \mu_2$ OR $H_0: \mu_1 - \mu_2 \geq 0$, $H_1: \mu_1 - \mu_2 < 0$
Right tail: $H_0: \mu_1 \leq \mu_2$, $H_1: \mu_1 > \mu_2$ OR $H_0: \mu_1 - \mu_2 \leq 0$, $H_1: \mu_1 - \mu_2 > 0$

Calculating t-test (p value)
When carrying out a test, a p-value can be calculated based on the t value and the degrees of freedom. There are three methods for calculating p:
One tailed >: $p = 1 - P(t)$
One tailed <: $p = P(t)$
Two tailed: $p = 2\,(1 - P(|t|))$
where P is calculated in the following way:
$$P(t) = \int_{-\infty}^{t} \frac{1}{\sqrt{v}\, B(1/2,\, v/2)} \left(1 + \frac{u^2}{v}\right)^{-(v+1)/2} du$$
and B is the beta function:
$$B(a, b) = \int_0^1 s^{a-1} (1 - s)^{b-1}\, ds$$
The number of degrees of freedom (v) is calculated as:
Paired: $n - 1$, where n is the number of pairs.
Unpaired: $n(x) + n(y) - 2$.
This value should normally be greater than 1.
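For readers who want to see where the p-value comes from, here is a rough standard-library sketch. It assumes the usual Student's t density written with the beta function and integrates it numerically with Simpson's rule; the function names and the `tail` labels are hypothetical, and real analyses would normally rely on a statistics package instead.

```python
import math


def beta(a, b):
    """Beta function expressed through the gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def t_density(u, v):
    """Density of Student's t-distribution with v degrees of freedom."""
    return (1.0 + u * u / v) ** (-(v + 1) / 2.0) / (math.sqrt(v) * beta(0.5, v / 2.0))


def t_cdf(t, v, steps=2000):
    """P(T <= t): integrate the density from 0 to |t| with Simpson's rule."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, v) + t_density(upper, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, v)
    area = total * h / 3.0
    # The density is symmetric about 0, so P(T <= 0) = 0.5.
    return 0.5 + area if t >= 0 else 0.5 - area


def p_value(t, v, tail="two"):
    """p-value for a t statistic with v degrees of freedom."""
    if tail == "greater":
        return 1.0 - t_cdf(t, v)
    if tail == "less":
        return t_cdf(t, v)
    if tail == "two":
        return 2.0 * (1.0 - t_cdf(abs(t), v))
    raise ValueError("tail must be 'greater', 'less' or 'two'")


# 2.776 is the tabulated two-tailed 5% critical value for 4 degrees of freedom,
# so this prints a value close to 0.05.
print(p_value(2.776, 4))
```

For a paired test on 4 pairs the degrees of freedom would be 3, so the call would be `p_value(t, 3)`.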

t-test Calculation & Interpretation
Results of the t-test: If the p-value associated with the t-test is small (usually set at p < 0.05), there is evidence to reject the null hypothesis in favour of the alternative. If the p-value is larger (p > 0.05), there is not enough evidence to reject the null hypothesis, and you cannot conclude that the mean differs from the hypothesized value.
[Figure: t-distribution with the "Reject H" region marked.]
Use statistics or data-mining software to calculate t and p.

Graphical Interpretation
The graphical comparison lets you see the distributions of the two groups. If the p-value is low, chances are there will be little overlap between the two distributions. If the p-value is not low, there will be a fair amount of overlap between the two groups. A number of options are available in the comparison graph for examining the two groups, including box plots, means, medians, and error bars.

Back to the Gene Expression Problem
The Experiment: A microarray experiment measures gene expression in rats (>5000 genes).
The rats are split into two groups: WT (wild-type rat) and KO (knock-out treatment rat).
Each group is measured under similar conditions (paired experiment).
Questions: Which genes are affected by the treatment? How significant is the effect? How big is the effect?
[Figure: 5000 red groups and 5000 blue groups.]

A Data Analysis Pipeline
To find genes that behave differently between the two classes, the pipeline runs a t-test for each gene between the two classes. The results of the t-test are joined to the original table, providing a p-value for each gene. The lower the p-value, the stronger the evidence that the gene differs between the two classes.

The Final Table
Two more nodes are used. The first derives a value for the effect: the difference of the logged mean expression values for each class. The second transforms the p-value onto a log scale to give a measure of significance.
Effect = log(WT) - log(KO)
Significance = -log(p)
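The two derived columns can be sketched as follows. This is an illustration under assumptions: the effect uses base-2 logs of the class means and the significance uses a base-10 log, matching the axis conventions mentioned in the later slides, and the function names and sample values are hypothetical.

```python
import math
from statistics import mean


def effect(wt, ko):
    """Difference of the logged mean expression values: log2(WT) - log2(KO)."""
    return math.log2(mean(wt)) - math.log2(mean(ko))


def significance(p):
    """Negative base-10 log of the p-value; larger means more significant."""
    return -math.log10(p)


# Hypothetical gene: WT expression is twice the KO expression, p = 0.01.
print(effect([4.0, 4.0], [2.0, 2.0]))  # 1.0
print(significance(0.01))              # 2.0
```

Taking the log of the class mean rather than the mean of the logs is a design choice made here for simplicity; the slides do not say which of the two the pipeline uses.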

Visualise the Result: Volcano Plot
Effect vs. significance. Items that have both a large effect and high significance can be identified easily. Choosing log scales is a matter of convenience. The effect can be either +ve or -ve.
[Figure: volcano plot of effect (X axis, from -ve effect to +ve effect) against significance (Y axis, from high p to low p), with regions labelled "High Effect & Significance" and "Boring stuff".]
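Selecting the interesting regions of a volcano plot amounts to thresholding both axes. The sketch below assumes each gene is represented as a `(name, effect, significance)` tuple; the function name, the default thresholds (a two-fold change and p < 0.01) and the gene names are all hypothetical choices for illustration.

```python
def select_hits(genes, min_abs_effect=1.0, min_significance=2.0):
    """Split genes into up- and down-regulated hits.

    genes: iterable of (name, effect, significance) tuples.
    A gene is a hit when it is significant enough and its effect is
    large enough in either direction.
    """
    up, down = [], []
    for name, eff, sig in genes:
        if sig < min_significance:
            continue  # not significant: the "boring" lower part of the plot
        if eff >= min_abs_effect:
            up.append(name)
        elif eff <= -min_abs_effect:
            down.append(name)
    return up, down


table = [
    ("geneA", 1.8, 3.2),   # large +ve effect, highly significant
    ("geneB", -2.1, 2.5),  # large -ve effect, significant
    ("geneC", 0.2, 4.0),   # significant but tiny effect
    ("geneD", 2.5, 0.4),   # large effect but not significant
]
print(select_hits(table))  # (['geneA'], ['geneB'])
```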

Numerical Interpretation (Significance)
Using log 10 for the Y axis:
Significance above 1 means p < 0.1 (1 decimal place)
Significance above 2 means p < 0.01 (2 decimal places)

Numerical Interpretation (Effect)
Using log 2 for the X axis:
Effect = 1: expression has doubled, $2^{1} = 2$ (2 raised to the power of 1), a two-fold change.
Effect = -1: expression has halved, $2^{-1} = 0.5$ (2 raised to the power of -1).
Fold change: technical jargon for comparing gene expression values.
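Reading values back off the log-scaled axes is a matter of inverting the logs. A minimal sketch, assuming a base-2 effect axis and a base-10 significance axis, with hypothetical helper names:

```python
def fold_change(effect):
    """Convert a log2 effect back into a fold change."""
    return 2.0 ** effect


def p_from_significance(significance):
    """Convert a -log10(p) significance back into a p-value."""
    return 10.0 ** (-significance)


print(fold_change(1))    # 2.0, expression doubled
print(fold_change(-1))   # 0.5, expression halved
print(p_from_significance(2))
```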

Interpretation of t-test
[Figure: two graphs of the individual fold changes fc1 to fc4, one for the red points and one for the green point.]
The graph above plots the fold change for each measurement (WT1 vs KO1, WT2 vs KO2, WT3 vs KO3) for the red points. Notice that all individual fold changes are +ve and high, and that the variation in value is small.
The graph to the right plots the fold change for each measurement (WT1 vs KO1, WT2 vs KO2, WT3 vs KO3) for the green point. Notice that all individual fold changes are -ve and high, and that the variation in value is small.

Interpretation of t-test
[Figure: two graphs of the individual fold changes fc1 to fc4, one for each chosen point.]
The graph above plots the fold change for each measurement (WT1 vs KO1, WT2 vs KO2, WT3 vs KO3) for the chosen point. Notice that all individual fold changes are +ve and high, but that the variation in value is large.
The graph to the right plots the fold change for each measurement (WT1 vs KO1, WT2 vs KO2, WT3 vs KO3) for the chosen point. Notice that the individual fold changes are both +ve and -ve and high, and that the variation in value is high.

Summary
The t-test is good for small samples (in our case, 4 paired observations).
A data analysis pipeline is suited to repetitive tasks (the same task applied to every gene), and its visual representation is intuitive.
The volcano plot is good for large sets of such observations (1000 sets, each of 4 paired observations).