An Introduction to Statistical Inference
Mike Wasikowski
June 12, 2008

Statistics

- Up until now, we have looked at probability: analyzing data in whose generation chance played some part
- Statistics has two main branches: estimation of parameters, and testing hypotheses about parameters
- To use statistical analysis, we must ensure we have a random sample from the population
- The methods described here are classical methods: "probability of data D given hypothesis H"
- Bayesian methods, "probability of hypothesis H given data D", are also sometimes used

Contents

- What is an estimator?
- Unbiased estimators
- Biased estimators
- Parametric hypothesis tests
- Nonparametric hypothesis tests
- Multiple tests/experiments

Classical Estimation Methods

- Probability distributions P_X(x; θ) and density functions f_X(x; θ) have parameters
- We can use the observed value x of X to estimate θ
- To estimate the parameters, we must use multiple iid observations x_1, x_2, ..., x_n
- An estimator of the parameter θ is a function of the RVs X_1, X_2, ..., X_n, written either as θ̂(X_1, X_2, ..., X_n) or simply as θ̂
- The value of θ̂ is the estimate of θ

Desirable Properties of Estimators

- Unbiased: E(θ̂) = θ
- Small variance: the observed value of θ̂ should be close to θ
- Normal distribution, either exactly or approximately: lets us use the properties of the normal distribution to derive properties of θ̂

Estimating μ

- Use the sample mean X̄ to estimate μ
- The mean value of X̄ is μ, so X̄ is unbiased
- The variance of X̄ is σ^2/n, so it is small when n is large
- The central limit theorem tells us that the distribution of X̄ is approximately normal for a large number of observations
- Our estimated value of μ is the observed mean x̄
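
As a minimal sketch of this idea (not from the original slides; the data below are simulated), the sample mean as an estimator of μ in Python:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=100)  # hypothetical sample: true mu = 10, sigma = 2

x_bar = x.mean()   # unbiased estimate of mu; its variance is sigma^2 / n
print(x_bar)       # close to 10, and closer on average as n grows
```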

Confidence Intervals

- From an earlier section, for large n, P(X̄ - 2σ/sqrt(n) < μ < X̄ + 2σ/sqrt(n)) ≈ 0.95
- The probability that the random interval (X̄ - 2σ/sqrt(n), X̄ + 2σ/sqrt(n)) contains μ is approximately 95%
- The observed value of the interval, given all the x_i, is (x̄ - 2σ/sqrt(n), x̄ + 2σ/sqrt(n))

Estimating σ^2

- We can form an unbiased estimator of σ^2 by σ̂^2 = Σ(X_i - X̄)^2 / (n-1)
- Our estimated value of σ^2 is s^2 = Σ(x_i - x̄)^2 / (n-1)
- One potential problem: unless n is very large, the variance of this estimator will itself be large
- The estimated variance of X̄ is σ̂^2/n = Σ(X_i - X̄)^2 / (n(n-1))
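
A minimal sketch of the unbiased variance estimator (simulated data; numpy's ddof argument selects the n-1 denominator):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=3.0, size=50)  # hypothetical sample: true sigma^2 = 9

s2 = x.var(ddof=1)         # divides by n-1: the unbiased estimator above
s2_biased = x.var(ddof=0)  # divides by n: biased, underestimates sigma^2 on average
print(s2, s2_biased)
```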

Estimated Confidence Intervals

- We then have the approximate 95% confidence interval for μ as (X̄ - 2S/sqrt(n), X̄ + 2S/sqrt(n))
- The observed interval from the data is (x̄ - 2s/sqrt(n), x̄ + 2s/sqrt(n))
- Again, a warning: unless n is very large, this interval will be wide and may not be useful
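
Putting the two estimates together, a minimal sketch of the estimated 95% interval (simulated data):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=10.0, scale=2.0, size=100)  # hypothetical sample
n, x_bar, s = len(x), x.mean(), x.std(ddof=1)

half_width = 2 * s / np.sqrt(n)
print((x_bar - half_width, x_bar + half_width))  # approximate 95% CI for mu
```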

Binomial and Multinomial Probability Estimates

- Consider the binomial RV Y with parameter p and index n
- The mean value of Y/n is p, and the variance of Y/n is p(1-p)/n
- By the above, p̂ = Y/n is an unbiased estimator of p
- The typical estimate of the variance of p̂ is p̂(1-p̂)/n = y(n-y)/n^3, where y is the number of successes
- This estimate is biased; the unbiased estimate is y(n-y)/(n^3 - n^2), analogously to the σ^2 estimate
- An estimate of {p_i} is calculated similarly by converting the multinomial problem into a series of binomial problems
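
A minimal numeric sketch of these estimates (the counts are hypothetical):

```python
n, y = 200, 83                              # hypothetical: 83 successes in 200 trials

p_hat = y / n                               # unbiased estimator of p
var_plug_in = y * (n - y) / n**3            # biased plug-in estimate of p(1-p)/n
var_unbiased = y * (n - y) / (n**3 - n**2)  # unbiased estimate of p(1-p)/n
print(p_hat, var_plug_in, var_unbiased)
```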

Biased Estimators

- Not all estimators are unbiased
- A biased estimator θ̂ is one where E(θ̂) differs from θ
- Bias = E(θ̂) - θ
- We assess the accuracy of θ̂ by MSE rather than variance: MSE(θ̂) = E((θ̂ - θ)^2) = Var(θ̂) + Bias(θ̂)^2
- When E(θ̂) = θ + O(n^-1), we call the estimator asymptotically unbiased; its MSE and variance then differ by O(n^-2)
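
A minimal simulation sketch of the MSE decomposition, comparing the unbiased variance estimator (denominator n-1) to the biased one (denominator n); for normal data with small n, the biased version happens to have the smaller MSE:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, n, trials = 9.0, 10, 100_000  # hypothetical setup

samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
for ddof in (1, 0):                     # ddof=1: unbiased, ddof=0: biased
    est = samples.var(axis=1, ddof=ddof)
    bias = est.mean() - sigma2
    mse = np.mean((est - sigma2) ** 2)  # approximates Var(est) + bias^2
    print(f"ddof={ddof}: bias={bias:.3f}, MSE={mse:.3f}")
```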

Why use biased estimators?

- Some parameters cannot be estimated in an unbiased manner
- A biased estimator is preferable to an unbiased one when its MSE is smaller than the variance (which is the MSE) of the unbiased estimator

Hypothesis Testing

- Test a null hypothesis (H_0) against an alternate hypothesis (H_1 or H_a)
- Five steps:
  1. Declare the null hypothesis and the alternate hypothesis
  2. Select the significance level α
  3. Determine the test statistic to be used
  4. Determine which observed values of the test statistic would lead to rejection of H_0
  5. Use the data to determine whether the observed value of the test statistic meets or exceeds the significance point from step 4

Declaring Hypotheses

- We must declare the null and alternate hypotheses before seeing any data, to avoid bias
- Hypotheses can be simple (specifying all values of the unknown parameters) or complex (not specifying all values of the unknown parameters)
- The natural alternate hypothesis is complex
- Alternate hypotheses can be either one-sided (θ > θ_0 or θ < θ_0) or two-sided (θ != θ_0)

Selecting the Significance Level

- Two types of errors can be made in a hypothesis test
- Type I: reject H_0 when it is true
- Type II: fail to reject H_0 when it is false
- Unless we have limitless observations, we cannot make the probabilities of both errors arbitrarily small
- The typical method is to focus on type I errors and fix α at a suitably low value
- Common values of α are 1% and 5%

Choosing the Test Statistic

- There is much theory available for choosing good test statistics
- Chapter 9 (Alex) discusses finding the optimal test statistic: the one that, for a given type I error rate and number of observations, minimizes the rate of type II errors

Finding Significance Points

- Find the value of the significance point K for the test statistic
- General example: for α = 0.05, P(type I error) = P(X >= K | H_0) = 0.05
- If the RV is discrete, it may be impossible to find a value of K such that the rate of type I errors is exactly α
- In practice, we err conservatively and round the value of K up

Finding Conclusions

- Compare the value of the test statistic derived from the observations to the significance point K
- Two conclusions can be drawn from a hypothesis test: fail to reject the null, or reject the null in favor of the alternate
- A hypothesis test never tells you whether a hypothesis is true or false

P-values

- An equivalent method skips calculating the significance point K
- Instead, calculate the achieved significance level (p-value) of the test statistic
- Then compare the p-value to α
- If p-value <= α, reject H_0; if p-value > α, fail to reject H_0

Power of Hypothesis Tests

- Recall that step 3 involves choosing an optimal test statistic
- If both hypotheses are simple, the choice of α implicitly determines β, the rate of type II errors
- The power of a hypothesis test is 1 - β, the rate of avoiding type II errors
- If we have a complex alternate hypothesis, the probability of rejecting H_0 depends on the actual values of the parameters, so there is no unique value of β
- Chapter 9 discusses how to find the power of tests with complex alternate hypotheses
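
A minimal sketch of a power calculation for the one-sided Z-test of the next slide, under hypothetical parameter values:

```python
from math import sqrt
from scipy.stats import norm

mu0, mu1, sigma, n, alpha = 0.0, 0.5, 1.0, 25, 0.05  # hypothetical values

z_crit = norm.ppf(1 - alpha)           # significance point, about 1.645
shift = (mu1 - mu0) * sqrt(n) / sigma  # mean of Z when mu = mu1
power = 1 - norm.cdf(z_crit - shift)   # P(reject H0 | mu = mu1)
print(power)                           # about 0.80 for these values
```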

Z-test

- Classic example: what is the mean of data drawn from a normal distribution with known variance?
- H_0: μ = μ_0, H_1: μ > μ_0
- Use X̄ as our optimal test statistic
- The RV Z = (X̄ - μ_0)sqrt(n)/σ has the N(0,1) distribution when H_0 is true
- For α = 0.05, the significance point is Z >= 1.645
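
A minimal sketch of the Z-test on simulated data (all values hypothetical):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
mu0, sigma = 10.0, 2.0
x = rng.normal(loc=10.8, scale=sigma, size=30)  # simulated sample; true mean > mu0

z = (x.mean() - mu0) * np.sqrt(len(x)) / sigma
p_value = 1 - norm.cdf(z)  # one-sided p-value for H1: mu > mu0
print(z, p_value)          # reject H0 at alpha = 0.05 if z >= 1.645
```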

One-sample t-test

- When σ is unknown, we must estimate the variance with s^2
- We then use the one-sample t-test, t = (x̄ - μ_0)sqrt(n)/s
- If we know that X_1, X_2, ..., X_n are NID(μ, σ^2), the H_0 distribution of T = (X̄ - μ_0)sqrt(n)/S is well known
- It is called the t-distribution with n-1 degrees of freedom
- T is asymptotically equal to Z, but differs greatly from it for small n
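
A minimal sketch using scipy's built-in one-sample t-test (simulated data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(loc=10.8, scale=2.0, size=15)  # small simulated sample

t_stat, p_value = stats.ttest_1samp(x, popmean=10.0)  # two-sided by default
print(t_stat, p_value)
```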

Two-sample t-test

- What if we need to compare two different RVs?
- Example: a repeated experiment comparing two methods
- H_0: μ_1 = μ_2, H_1: μ_1 != μ_2
- Consider X_11, X_12, ..., X_1m ~ NID(μ_1, σ^2) and X_21, X_22, ..., X_2n ~ NID(μ_2, σ^2) to be the RVs from which our observations are drawn
- Use the two-sample t-test
- Large positive or negative values of the statistic cause rejection of H_0

Two-sample t-test

- The t-distributed RV is T = (X̄_1 - X̄_2) / (S_p sqrt(1/m + 1/n)), where S_p^2 = ((m-1)S_1^2 + (n-1)S_2^2)/(m+n-2) is the pooled variance estimate
- Under H_0, T has the t-distribution with m+n-2 degrees of freedom
- The observed value of the RV replaces X̄_1, X̄_2, and S_p with their observed values
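
A minimal sketch using scipy's two-sample t-test with the pooled (equal-variance) statistic above (simulated data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x1 = rng.normal(loc=10.0, scale=2.0, size=12)  # simulated method 1
x2 = rng.normal(loc=11.5, scale=2.0, size=14)  # simulated method 2

t_stat, p_value = stats.ttest_ind(x1, x2, equal_var=True)  # pooled variance
print(t_stat, p_value)
```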

Paired t-test

- Suppose the values of X_1i and X_2i are logically paired in some manner
- We can instead perform a paired t-test, using D_i = X_1i - X_2i for our test
- H_0: μ_D = 0, H_1: μ_D != 0
- Then use T = D̄ sqrt(n)/S_D as our test statistic
- This method can eliminate sources of variance
- It is the beginning of the ideas behind ANOVA, where we break variation into different components
- It is also the foundation of the F-test, a test of the ratio between two variances
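
A minimal sketch of the paired test (simulated before/after measurements):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
before = rng.normal(loc=10.0, scale=2.0, size=20)        # simulated pairs
after = before + rng.normal(loc=0.6, scale=0.5, size=20)

t_stat, p_value = stats.ttest_rel(before, after)  # same as a one-sample t-test on the differences
print(t_stat, p_value)
```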

Chi-square test

- Consider a multinomial distribution
- H_0: p_i equals a specified value for each i = 1..k; H_1: at least one p_i differs from its specified value
- Use X^2 as our test statistic: X^2 = Σ(Y_i - np_i)^2 / (np_i)
- Larger observed values of X^2 lead to rejection of H_0
- When H_0 is true and n is large, X^2 ~ chi-square distribution with k-1 degrees of freedom
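
A minimal sketch using scipy's goodness-of-fit test (hypothetical counts for a fair six-sided die):

```python
from scipy.stats import chisquare

observed = [18, 24, 16, 22, 25, 15]  # hypothetical counts, n = 120
expected = [20] * 6                  # H0: p_i = 1/6 for each face

x2, p_value = chisquare(observed, f_exp=expected)  # k-1 = 5 degrees of freedom
print(x2, p_value)
```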

Association tests

- Compare elements of a population by placing each into one of a number of categories for two properties
- Fisher's exact test compares two different binary properties of a population
- H_0: the two properties are independent of one another; H_1: the two properties are dependent in some manner
- We can also use the chi-square test on tables with an arbitrary number of rows and columns
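
A minimal sketch of both tests on a hypothetical 2x2 contingency table:

```python
from scipy.stats import chi2_contingency, fisher_exact

table = [[12, 5],   # hypothetical 2x2 counts for two binary properties
         [7, 16]]

odds_ratio, p_exact = fisher_exact(table)          # exact test for 2x2 tables
x2, p_chi2, dof, expected = chi2_contingency(table)  # works for larger tables too
print(p_exact, p_chi2)
```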

Hypothesis Testing with the Maximum as Test Statistic

- Bioinformatics has several areas where the maximum of many RVs is a useful test statistic
- In BLAST, local alignment of sequences, we only care about the most likely alignment
- Let X_1, X_2, ..., X_n ~ N(μ_i, 1)
- H_0: μ_i = 0 for all i; H_1: one μ_i > 0 and the rest μ_i = 0
- The optimal test statistic is X_max
- Reject H_0 if P(X_max > x_max | H_0) < α
- Use the equation 1 - F(x_max)^n to find the p-value, where F is the N(0,1) cdf
- Some options still exist if we cannot calculate the cdf; one possibility is the total variation distance
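
A minimal sketch of this p-value computation (the observed maximum is hypothetical):

```python
from scipy.stats import norm

n, x_max = 100, 3.8                 # hypothetical: observed maximum of 100 values

p_value = 1 - norm.cdf(x_max) ** n  # P(X_max > x_max | H0): F^n is the cdf of the maximum
print(p_value)                      # reject H0 if below alpha
```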

Nonparametric Tests

- The two-sample t-test is a distribution-dependent test: it relies on the RVs having the normal distribution
- If we use the t-test when at least one of the underlying RVs is not normal, the calculated p-value gives an invalid testing procedure
- Nonparametric, or distribution-free, tests avoid the problems of using tests specific to a distribution

Permutation Tests

- Avoid the assumption of a normal distribution
- We have RVs X_11, X_12, ..., X_1m iid and X_21, X_22, ..., X_2n iid, with possibly differing distributions
- Assume that X_1i is independent of X_2j for all (i, j)
- H_0: the X_1i are distributed identically to the X_2j; H_1: the distributions differ
- There are Q = C(m+n, m) possible placements (permutations) of X_11, X_12, ..., X_1m, X_21, X_22, ..., X_2n into two groups of sizes m and n
- H_0 says each of the Q placements has the same probability of arising

Permutation Tests

- Calculate the test statistic for each permutation
- Reject H_0 if the observed value of the statistic is among the most extreme 100α% of these values
- The choice of test statistic depends on what we think may differ between the two distributions: a t-statistic if we suspect different means, an F-statistic if different variances
- Problems with these tests: granularity with too few samples, and computational complexity with too many (see the sketch below)
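
A minimal sketch of a two-sample permutation test on the difference of means (simulated data); with larger samples one samples permutations at random rather than enumerating all C(m+n, m):

```python
import itertools
import numpy as np

rng = np.random.default_rng(8)
x1 = rng.normal(loc=0.0, scale=1.0, size=6)  # simulated group 1 (m = 6)
x2 = rng.normal(loc=1.2, scale=1.0, size=6)  # simulated group 2 (n = 6)

pooled = np.concatenate([x1, x2])
observed = x1.mean() - x2.mean()

# Enumerate all C(12, 6) = 924 placements of the pooled values into group 1.
count, total = 0, 0
for idx in itertools.combinations(range(len(pooled)), len(x1)):
    mask = np.zeros(len(pooled), dtype=bool)
    mask[list(idx)] = True
    diff = pooled[mask].mean() - pooled[~mask].mean()
    count += abs(diff) >= abs(observed)  # two-sided comparison
    total += 1

print(count / total)  # exact permutation p-value
```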

Mann-Whitney Test

- A frequently used alternative to the two-sample t-test
- The observed values x_11, x_12, ..., x_1m and x_21, x_22, ..., x_2n are listed in increasing order
- Each observation is associated with its rank in this list
- The sum of all the ranks is (m+n)(m+n+1)/2
- H_0: the X_1i and X_2j are identically distributed; H_1: at least one parameter of the distributions differs
- For large sample sizes, use the central limit theorem to test the null hypothesis with a z-score
- For small sample sizes, we can calculate an exact p-value as a permutation test
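
A minimal sketch via scipy (simulated, deliberately non-normal data):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(9)
x1 = rng.exponential(scale=1.0, size=20)  # simulated group 1
x2 = rng.exponential(scale=2.0, size=25)  # simulated group 2

u_stat, p_value = mannwhitneyu(x1, x2, alternative="two-sided")
print(u_stat, p_value)
```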

Wilcoxon Signed-rank Test

- A test for the value of the median of a generic continuous RV; if the distribution is symmetric, it also tests the mean
- H_0: med = M_0, H_1: med != M_0
- Calculate the absolute differences |x_i - M_0|, order them from smallest to largest, and give each value a rank
- The observed test statistic is the sum of the ranks of the positive differences
- Use the central limit theorem to compare groups with a large number of samples
- We can also calculate an exact p-value as a permutation test for small sample sizes
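
A minimal sketch via scipy, testing a hypothetical median M_0 = 10 (simulated data; scipy's wilcoxon tests the differences against zero):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(10)
x = rng.normal(loc=10.6, scale=1.0, size=25)  # simulated sample
m0 = 10.0                                     # hypothetical median under H0

w_stat, p_value = wilcoxon(x - m0)  # signed-rank test on the differences
print(w_stat, p_value)
```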

Multiple Associated Tests

- If we test many associated hypotheses, each with a true H_0, chance will lead to one or more being rejected
- A family-wide p-value can be used to avoid this result
- If we want a family-wide significance level of 0.05, each test should use α = 0.05/g, where g is the number of different tests we are performing
- This correction applies even if the tests are not independent of one another (recall the indicator variable discussion)
- An obvious problem: if we perform many different tests, this procedure results in a very low required p-value to reject H_0 in each individual test
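
A minimal sketch of this (Bonferroni) correction on hypothetical p-values:

```python
alpha_family, p_values = 0.05, [0.003, 0.021, 0.04, 0.31]  # hypothetical results

g = len(p_values)
alpha_per_test = alpha_family / g  # 0.0125 here
rejected = [p <= alpha_per_test for p in p_values]
print(rejected)                    # only the first test rejects H0
```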

Multiple Experiments

- In science, it is common to repeat tests to verify results
- What if the p-value of each test is close to α but not below it?
- Use a combined p-value to assess the significance of each p-value in conjunction with the others
- When every H_0 is true, V = -2 log(P_1 P_2 ... P_k) is a quantity with a chi-square distribution with 2k degrees of freedom
- This can result in seeing significant results even when no individual null hypothesis was rejected
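
A minimal sketch of this combination (Fisher's method) on hypothetical p-values; note the logarithm is the natural log:

```python
import numpy as np
from scipy.stats import chi2

p_values = [0.08, 0.06, 0.09]       # hypothetical: none below alpha = 0.05

k = len(p_values)
v = -2 * np.log(np.prod(p_values))  # equivalently -2 * sum(log p_i)
combined_p = chi2.sf(v, df=2 * k)   # upper tail of chi-square with 2k df
print(v, combined_p)                # combined p is below 0.05 here
```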

Questions?