1-3-20041 Assume we have two experimental conditions (j=1,2) We measure expression of all genes n times under both experimental conditions (n two- channel.

Presentation transcript:

Identifying Differentially Expressed Genes

Assume we have two experimental conditions (j = 1, 2). We measure the expression of all genes n times under both experimental conditions (n two-channel microarrays). Focusing on a single gene, x_ij is the i-th measurement under condition j.

Statistical models for expression measurements under two different conditions: μ1, μ2 and σ are unknown model parameters. μj represents the average expression measurement over a large number of replicated experiments, and σ represents the variability of the measurements.

The question of whether the gene is differentially expressed corresponds to assessing whether μ1 ≠ μ2. The strength of evidence in the observed data that this is the case is expressed in terms of a p-value.

P-value

Estimate the model parameters based on the data.
Calculate the t-statistic, which summarizes the information about our hypothesis of interest (μ1 ≠ μ2).
Establish the null distribution of the t-statistic (its distribution assuming the “null hypothesis” that μ1 = μ2). The null distribution in this case turns out to be the t-distribution with n-1 degrees of freedom.
The p-value is the probability, under the null distribution, of observing a value as extreme as or more extreme than the one calculated from the data (t*).
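The steps above can be sketched in R for a single gene. This is only a minimal illustration, assuming the n two-channel arrays give paired measurements, so the test is applied to the within-array differences (which is consistent with the n-1 degrees of freedom quoted above). The vectors x1 and x2 and their values are made up for the example and are not data from the presentation.

```r
# Hypothetical log2 expression values for one gene on n = 6 two-channel arrays
# (illustrative numbers only; not taken from the presentation's data).
x1 <- c(8.1, 7.9, 8.4, 8.0, 8.3, 8.2)  # condition j = 1
x2 <- c(7.6, 7.7, 7.9, 7.4, 8.0, 7.8)  # condition j = 2

d <- x1 - x2   # within-array differences
n <- length(d)

# Step 1: estimate the model parameters (mean difference and its variability)
d_bar <- mean(d)
s_d   <- sd(d)

# Step 2: t-statistic summarizing the evidence that mu1 != mu2
t_star <- d_bar / (s_d / sqrt(n))

# Step 3: two-tailed p-value from the t-distribution with n - 1 degrees of freedom
p_value <- 2 * pt(abs(t_star), df = n - 1, lower.tail = FALSE)

# Cross-check against the built-in paired t-test
check <- t.test(x1, x2, paired = TRUE)
print(c(t_star = t_star, p_value = p_value))
```

The built-in t.test(..., paired = TRUE) should report the same statistic and p-value, which is a quick way to confirm that the hand calculation follows the intended recipe.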

t-distribution

The number of experimental replicates affects the precision at two levels:
1. Everything else being equal, an increase in sample size increases t*.
2. Everything else being equal, an increase in sample size “shrinks” the null distribution.

Suppose that t* = 3. How does the p-value change depending on the sample size alone?
Two-tailed p-values for t* = 3: p-value = 0.2 (df = 1), p-value = 0.1 (df = 2), p-value = 0.01 (df = 10), p-value = 0.003 (df = 100)

t-distribution

#Plotting t-distributions with different degrees of freedom
x<-seq(-5,5,.1)
plot(x,dt(x,100),type="l",col="black",lwd=2,ylab="")
lines(x,dt(x,10),col="green",lwd=2)
lines(x,dt(x,2),col="blue",lwd=2)
lines(x,dt(x,1),col="red",lwd=2)
legend(2, y =0.4, c("df = 1; p-value = ","df = 2","df = 10","df = 100"), col = c("red","blue","green","black"), lty=rep("solid",4), lwd=2)

#Calculating two-tailed p-values
> 2*pt(3,100,lower.tail=FALSE)
[1]
> 2*pt(3,10,lower.tail=FALSE)
[1]
> 2*pt(3,2,lower.tail=FALSE)
[1]
> 2*pt(3,1,lower.tail=FALSE)
[1]
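The two effects of sample size listed on the t-distribution slide can also be seen together in a small table. The sketch below assumes a fixed observed mean difference and standard deviation (the values 0.5 and 0.4 are arbitrary, chosen only for illustration) and a one-sample or paired setting with n - 1 degrees of freedom.

```r
# Assumed, illustrative summary statistics (not from the presentation):
# the same mean difference and standard deviation observed at several sample sizes.
mean_diff <- 0.5
sd_diff   <- 0.4
n         <- c(2, 3, 6, 11, 101)

# Effect 1: with everything else fixed, t* grows like sqrt(n)
t_star <- mean_diff / (sd_diff / sqrt(n))

# Effect 2: the null distribution (t with n - 1 df) becomes narrower as n grows,
# so the same t* corresponds to a smaller two-tailed p-value
p_value <- 2 * pt(t_star, df = n - 1, lower.tail = FALSE)

effect <- data.frame(n = n, df = n - 1, t_star = t_star, p_value = p_value)
print(effect)
```

Reading down the table, t_star increases and p_value decreases with n, even though the observed difference and variability never change.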

Performing t-test

>loadURL("
> SimpleData[1,]
Name ID W1 C1 W2 C2 W3 C3 C4 W4 C5 W5 C6 W6
1 no name Rn
> LSimpleData<-SimpleData
> LSimpleData[,3:14]<-log(SimpleData[,3:14],base=2)
> LSimpleData[1,]
Name ID W1 C1 W2 C2 W3 C3 C4 W4 C5 W5 C6 W6
1 no name Rn
> grep("W",dimnames(SimpleData)[[2]])
[1]
> grep("C",dimnames(SimpleData)[[2]])
[1]
> W<-grep("W",dimnames(SimpleData)[[2]])
> C<-grep("C",dimnames(SimpleData)[[2]])
> t.test(LSimpleData[1,W],LSimpleData[1,C],var.equal=TRUE)

Two Sample t-test

data: LSimpleData[1, W] and LSimpleData[1, C]
t = , df = 10, p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
sample estimates:
mean of x mean of y

Performing t-test

> MW<-mean(t(LSimpleData[1,W]))
> MW
[1]
> MC<-mean(t(LSimpleData[1,C]))
> MC
[1]
> VW<-var(t(LSimpleData[1,W]))
> VW
> VC<-var(t(LSimpleData[1,C]))
> VC
> NW<-sum(!is.na(LSimpleData[1,W]))
> NW
[1] 6
> NC<-sum(!is.na(LSimpleData[1,C]))
> NC
[1] 6
> VWC<-(((NW-1)*VW)+((NC-1)*VC))/(NC+NW-2)
> VWC
> DF<-NW+NC-2
> DF
[1] 10
> TStat<-abs(MW-MC)/((VWC*((1/NW)+(1/NC)))^0.5)
> TStat
> TPvalue<-2*pt(TStat,DF,lower.tail=FALSE)
> TPvalue
>
>t.test(LSimpleData[1,W],LSimpleData[1,C],var.equal=TRUE)

Two Sample t-test

data: LSimpleData[1, W] and LSimpleData[1, C]
t = , df = 10, p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
sample estimates:
mean of x mean of y

source( source(
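The hand calculation in this session can be wrapped in a small function and compared with t.test. This is a sketch rather than the presenter's code: pooled_t_test is a hypothetical helper name (not part of base R), and w_vals and c_vals are made-up log2 values standing in for one row of the data, since the original numbers are not shown.

```r
# Pooled-variance (equal-variance) two-sample t-test as a reusable function.
# pooled_t_test is a hypothetical helper, not a base R function.
pooled_t_test <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  nx <- length(x)
  ny <- length(y)
  # Pooled variance: weighted average of the two sample variances
  pooled_var <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
  df <- nx + ny - 2
  t_stat <- abs(mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
  p_value <- 2 * pt(t_stat, df, lower.tail = FALSE)  # two-tailed p-value
  list(t = t_stat, df = df, p.value = p_value)
}

# Made-up log2 expression values for one gene (6 replicates per condition)
w_vals <- c(9.2, 9.5, 9.1, 9.4, 9.6, 9.3)
c_vals <- c(8.8, 9.0, 8.7, 9.1, 8.9, 8.6)

manual  <- pooled_t_test(w_vals, c_vals)
builtin <- t.test(w_vals, c_vals, var.equal = TRUE)

print(unlist(manual))
print(c(t = unname(builtin$statistic), p.value = builtin$p.value))
```

With six replicates per condition the degrees of freedom come out as 10, matching the df reported in the session, and the manual t-statistic and p-value should agree with the built-in test up to the sign of t.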

Statistical Inference and Statistical Significance – P-value

Statistical inference consists of drawing conclusions about the measured phenomenon (e.g. gene expression) in terms of probabilistic statements based on observed data. The p-value is one way of doing this.
The p-value is NOT the probability of the null hypothesis being true.
Rigorous interpretation of the p-value is tricky. It was introduced to measure the level of evidence against the “null hypothesis” or, better to say, in favor of a “positive experimental finding”. In this context a p-value smaller than 0.01 could be interpreted as stronger evidence than a p-value of 0.01.
Establishing statistical significance (is a difference in expression level statistically significant or not) requires that we establish “cut-off” points for our “measure of significance” (the p-value).
For various historic reasons the cut-off 0.05 is generally used to establish “statistical significance”. It is a rather arbitrary cut-off, but it is taken as a gold standard.
Originally the p-value was introduced as a descriptive measure to be used in conjunction with other criteria to judge the strength of evidence one way or another.

Statistical Inference and Statistical Significance – Hypothesis Testing

The 5% cut-off point comes from the hypothesis testing world. In this world the exact magnitude of the p-value does not matter. It only matters whether it is smaller than the pre-specified statistical significance cut-off (α).
The null hypothesis is rejected in favor of the alternative hypothesis at a significance level of α = 0.05 if p-value < 0.05.
A Type I error is committed when the null hypothesis is falsely rejected.
A Type II error is committed when the null hypothesis is not rejected but it is false.
By following this “decision making scheme” you will on average falsely reject 5% of true null hypotheses.
If such a “decision making scheme” is adopted to identify differentially expressed genes on a microarray, 5% of non-differentially expressed genes will be falsely implicated as differentially expressed.
A family-wise Type I error is committed if any of a set of null hypotheses is falsely rejected.
Establishing statistical significance is a necessary but not sufficient step in assuring the “reproducibility” of a scientific finding. This is an important point that will be discussed further when we start talking about issues in experimental design.
The other essential ingredient is a “representative sample” from the “population of interest”. This is still a murky point in molecular biology experimentation.
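The 5% claim and the family-wise error can be checked with a short simulation. The sketch below assumes normally distributed measurements, 10,000 genes none of which is differentially expressed, and 6 replicates per condition; all of these numbers are illustrative choices, not values from the presentation. The family-wise formula additionally assumes the tests are independent, which is a simplification for real microarray data.

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable

n_genes <- 10000  # assumed number of genes, none differentially expressed
n_rep   <- 6      # assumed replicates per condition
alpha   <- 0.05

# Simulate null data: both conditions share the same mean and variance,
# then run an equal-variance two-sample t-test for each gene
p_values <- replicate(n_genes, {
  x <- rnorm(n_rep)
  y <- rnorm(n_rep)
  t.test(x, y, var.equal = TRUE)$p.value
})

# Fraction of true null hypotheses falsely rejected (expected to be close to alpha)
false_positive_rate <- mean(p_values < alpha)

# Probability of at least one false rejection among m independent tests
m <- c(1, 10, 100, 1000)
family_wise_error <- 1 - (1 - alpha)^m

print(false_positive_rate)
print(data.frame(m = m, family_wise_error = family_wise_error))
```

About 5% of the simulated genes fall below the cut-off even though none is truly different, and the chance of at least one false rejection approaches 1 quickly as the number of tests grows.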