Inferential Statistics


Inferential Statistics Jin Guo

Inferential Statistics
Definition: the branch of statistics concerned with drawing conclusions about a population from a sample.
Sample: representative, typically random.
Main functions:
- Estimating population parameters
- Testing statistically based hypotheses

Estimating Population Parameters
Estimating parameters related to central tendency (the mean), variability (the standard deviation), and proportion (P).
Example: Estimating a Population Mean
Consider the distribution of the means of an infinite number of random samples drawn from a normal distribution (the sampling distribution of the mean).
- Mean: equal to the parameter we are trying to estimate (the mean of the population), provided the estimator is unbiased.
- Standard deviation: the standard error of the mean.
In probability theory, the central limit theorem (CLT) states that, given certain conditions, the mean of a sufficiently large number of independent random variables, each with a well-defined mean and well-defined variance, will be approximately normally distributed.
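The behaviour of the sampling distribution of the mean can be checked with a small simulation. The sketch below is only an illustration: the population mean (50), standard deviation (10), sample size (25), and number of samples are hypothetical values that do not come from the slides.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]

# Hypothetical population values, chosen only for illustration.
MU, SIGMA, N = 50.0, 10.0, 25

sample_means = simulate_sample_means(MU, SIGMA, N, num_samples=5000)
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = SIGMA / N ** 0.5  # standard error of the mean = sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.2f} (population mean {MU})")
print(f"sd of sample means:   {sd_of_means:.2f} (theoretical SE {theoretical_se:.2f})")
```

With a finite number of samples the two printed pairs should agree closely rather than exactly.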

Estimating Population Parameters
Point and Interval Estimates
- Point estimate: uses a single value of a statistic to estimate the population parameter.
- Interval estimate: defined by two numbers, between which a population parameter is said to lie.
A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.
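As a rough sketch of the difference between the two kinds of estimate, the code below computes a point estimate and a 95% confidence interval for a proportion. It assumes the normal-approximation (Wald) interval, which is one of several possible methods, and the counts (60 successes out of 150) are hypothetical.

```python
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Return the point estimate and a normal-approximation (Wald) interval."""
    p_hat = successes / n                               # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    se = (p_hat * (1 - p_hat) / n) ** 0.5               # estimated standard error
    return p_hat, (p_hat - z * se, p_hat + z * se)

# Hypothetical survey result, chosen only for illustration.
point_estimate, (lower, upper) = proportion_confidence_interval(60, 150)
print(f"point estimate: {point_estimate:.3f}")
print(f"95% interval:   ({lower:.3f}, {upper:.3f})")
```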

Testing hypotheses
Example: Assume there are 1,000,000 on-line students in this course. I claim that 80 percent of them are very satisfied with today’s class.

Testing hypotheses
Statistical hypotheses
- Null hypothesis: sample observations result purely from chance.
- Alternative hypothesis: sample observations are influenced by some non-random cause.
Outcome: reject the null hypothesis or fail to reject the null hypothesis.
Decision errors
- Type I error: rejecting a null hypothesis that is true. The probability of committing this error is called the significance level.
- Type II error: failing to reject a null hypothesis that is false. The probability of not committing this error is called the power of the test.
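The two error probabilities can be estimated by simulation. The sketch below assumes a two-tailed z-test for a mean with a known standard deviation; the null mean (0), true mean under the alternative (0.5), sample size (25), and trial count are hypothetical values chosen for illustration.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mu, null_mu=0.0, sigma=1.0, n=25, alpha=0.05,
                   trials=4000, seed=1):
    """Fraction of simulated samples for which the z-test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(true_mu, sigma) for _ in range(n))
        z = (sample_mean - null_mu) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

# H0 is true here, so every rejection is a Type I error.
type_i_rate = rejection_rate(true_mu=0.0)
# H0 is false here, so the rejection rate estimates the power of the test.
power = rejection_rate(true_mu=0.5)

print(f"estimated Type I error rate: {type_i_rate:.3f}")
print(f"estimated power:             {power:.3f}")
```

The first estimate should land near the significance level, and one minus the second estimates the Type II error probability for this particular alternative.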

Testing hypotheses
Decision rules:
- P-value: the probability of observing a test statistic at least as extreme as the observed value S, assuming the null hypothesis is true. If the P-value is less than the significance level, we reject the null hypothesis.
- Region of acceptance: a range of values for the test statistic, defined so that the chance of making a Type I error is equal to the significance level. If the test statistic falls within the region of acceptance, the null hypothesis is not rejected.
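The region-of-acceptance rule can be sketched in a few lines. This example assumes a two-tailed test whose test statistic follows a standard normal distribution; the function names and the sample z values are hypothetical.

```python
from statistics import NormalDist

def region_of_acceptance(alpha):
    """Bounds on z such that P(z falls outside | H0 true) equals alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return -z_crit, z_crit

def decide(z, alpha=0.05):
    """Apply the decision rule to an observed z statistic."""
    low, high = region_of_acceptance(alpha)
    return "fail to reject H0" if low <= z <= high else "reject H0"

low, high = region_of_acceptance(0.05)
print(f"region of acceptance: [{low:.2f}, {high:.2f}]")
print(decide(1.0))  # inside the region
print(decide(2.5))  # outside the region
```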

Testing hypotheses
One-Tailed Test
A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling distribution. For example, suppose the null hypothesis states that the mean is less than or equal to 10. The alternative hypothesis would be that the mean is greater than 10. The region of rejection would consist of a range of numbers located on the right side of the sampling distribution; that is, a set of numbers greater than 10.
Two-Tailed Test
A test of a statistical hypothesis, where the region of rejection is on both sides of the sampling distribution. For example, suppose the null hypothesis states that the mean is equal to 10. The alternative hypothesis would be that the mean is less than 10 or greater than 10. The region of rejection would consist of a range of numbers located on both sides of the sampling distribution; that is, the region of rejection would consist partly of numbers that were less than 10 and partly of numbers that were greater than 10.
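To make the distinction concrete, the sketch below computes one-tailed and two-tailed P-values for the same z statistic. It assumes the test statistic is standard normal under the null hypothesis; the function name and the observed value z = 2.0 are hypothetical.

```python
from statistics import NormalDist

def tail_p_value(z, tail="two"):
    """P-value of a z statistic for a right-, left-, or two-tailed test."""
    std_normal = NormalDist()
    if tail == "right":   # H1: parameter is greater than the null value
        return 1 - std_normal.cdf(z)
    if tail == "left":    # H1: parameter is less than the null value
        return std_normal.cdf(z)
    if tail == "two":     # H1: parameter differs from the null value
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'right', 'left', or 'two'")

z_observed = 2.0  # hypothetical test statistic
right_p = tail_p_value(z_observed, "right")
two_p = tail_p_value(z_observed, "two")
print(f"right-tailed P-value: {right_p:.4f}")
print(f"two-tailed P-value:   {two_p:.4f}")
```

For a symmetric distribution such as the normal, the two-tailed P-value is twice the one-tailed value in the direction of the observed statistic.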

Testing hypotheses
Procedure:
1. State the hypotheses: include a null hypothesis and an alternative hypothesis, which are mutually exclusive.
2. Formulate an analysis plan: specify the significance level and the test method. The test method includes a test statistic (mean score, proportion, difference between means, difference between proportions, z-score, t-score, chi-square, etc.) and a sampling distribution.
3. Analyze sample data: calculate the test statistic and the P-value.
4. Interpret the results: compare the P-value to the significance level, and reject the null hypothesis when the P-value is less than the significance level.

Testing hypotheses
Test methods:
- One-sample tests: a sample is being compared to the population from a hypothesis.
- Two-sample tests: comparing two samples, typically experimental and control samples from a scientifically controlled experiment.
- Paired tests: comparing two samples where members are paired between samples so the difference between the members becomes the sample.
Chi-squared tests use the same calculations and the same probability distribution for different applications:
- Chi-squared tests for variance are used to determine whether a normal population has a specified variance. The null hypothesis is that it does.
- Chi-squared tests of independence are used for deciding whether two variables are associated or are independent. The variables are categorical rather than numeric. Such a test can be used to decide whether left-handedness is correlated with libertarian politics (or not). The null hypothesis is that the variables are independent. The numbers used in the calculation are the observed and expected frequencies of occurrence (from contingency tables).
- Chi-squared goodness of fit tests are used to determine the adequacy of curves fit to data. The null hypothesis is that the curve fit is adequate. It is common to determine curve shapes to minimize the mean square error, so it is appropriate that the goodness-of-fit calculation sums the squared errors.
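The calculation behind a chi-squared test of independence can be sketched as follows. The 2x2 contingency table is hypothetical data invented for illustration, and the critical value 3.841 (chi-square distribution, 1 degree of freedom, 0.05 significance level) is taken from standard tables rather than computed.

```python
def chi_squared_statistic(observed):
    """Chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected frequency if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof

# Hypothetical counts: rows = left/right-handed, columns = two political groups.
table = [[20, 30],
         [30, 20]]
stat, dof = chi_squared_statistic(table)

CRITICAL_VALUE_DF1_ALPHA_05 = 3.841  # from a standard chi-square table
reject_independence = stat > CRITICAL_VALUE_DF1_ALPHA_05
print(f"chi-squared = {stat:.2f}, df = {dof}, reject H0: {reject_independence}")
```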

Testing hypotheses
Test methods:
- Z-tests: comparing means under stringent conditions regarding normality and a known standard deviation.
- T-tests: comparing means under relaxed conditions (less is assumed).
- F-tests (analysis of variance, ANOVA): comparing two variances. They are commonly used when deciding whether groupings of data by category are meaningful.
- Chi-squared tests use the same calculations and the same probability distribution for different applications: chi-squared tests for variance, chi-squared tests of independence, chi-squared goodness of fit tests.

Testing hypotheses
Purpose                                  Test method
Means                                    One-sample t-test
Difference between means                 Two-sample t-test
Proportions                              One-sample z-test
Difference between proportions           Two-proportion z-test
Regression slope                         Linear regression t-test
Difference between matched pairs         Matched-pairs t-test
Difference between variances             Two-sample F-test
Goodness of fit                          Chi-square goodness of fit test
Homogeneity                              Chi-square test for homogeneity
Independence                             Chi-square test for independence
One-sample tests are appropriate when a sample is being compared to the population from a hypothesis. The population characteristics are known from theory or are calculated from the population.
Two-sample tests are appropriate for comparing two samples, typically experimental and control samples from a scientifically controlled experiment.
Paired tests are appropriate for comparing two samples where it is impossible to control important variables. Rather than comparing two sets, members are paired between samples so the difference between the members becomes the sample. Typically the mean of the differences is then compared to zero.
Z-tests are appropriate for comparing means under stringent conditions regarding normality and a known standard deviation.
T-tests are appropriate for comparing means under relaxed conditions (less is assumed).
Tests of proportions are analogous to tests of means (the 50% proportion).
Chi-squared tests use the same calculations and the same probability distribution for different applications: tests for variance, tests of independence, and goodness of fit tests.
F-tests (analysis of variance, ANOVA) are commonly used when deciding whether groupings of data by category are meaningful. If the variance of test scores of the left-handed in a class is much smaller than the variance of the whole class, then it may be useful to study lefties as a group. The null hypothesis is that two variances are the same, so the proposed grouping is not meaningful.
Reference: http://en.wikipedia.org/wiki/Category:Statistical_tests

Back to our example
Assume there are 1,000,000 on-line students in this course. I claim that 80 percent of them are very satisfied with today’s class. To test this claim, I survey 100 students through email, using simple random sampling. Among the sampled students, 73 percent say they are very satisfied. Based on these findings, can we reject the hypothesis that 80% of the students are very satisfied? Use a 0.05 level of significance.

Solution
State the null hypothesis and the alternative hypothesis:
- Null hypothesis: P = 0.80
- Alternative hypothesis: P ≠ 0.80
Formulate an analysis plan:
- Significance level: 0.05
- Test method: one-sample z-test (for testing proportions)

Solution
Conditions for the test method:
- The sampling method is simple random sampling.
- Each sample point can result in just two possible outcomes. We call one of these outcomes a success and the other a failure.
- The sample includes at least 10 successes and 10 failures. (Some texts say that 5 successes and 5 failures are enough.)
- The population size is at least 10 times as big as the sample size.
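The numeric conditions above can be checked mechanically for the survey in the example. In this sketch the function name and dictionary keys are hypothetical; the sampling-method and two-outcome conditions are properties of the study design and are not checked by the code.

```python
def check_proportion_test_conditions(n, p_hat, population_size, min_count=10):
    """Check the sample-size conditions for a one-sample z-test of a proportion."""
    successes = round(n * p_hat)
    failures = n - successes
    return {
        "enough_successes": successes >= min_count,
        "enough_failures": failures >= min_count,
        "population_large_enough": population_size >= 10 * n,
    }

# Values from the example: 100 students surveyed, 73% very satisfied,
# 1,000,000 students in the population.
conditions = check_proportion_test_conditions(n=100, p_hat=0.73,
                                              population_size=1_000_000)
for name, satisfied in conditions.items():
    print(f"{name}: {satisfied}")
```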

Z-test
Test statistic
A z-score (standard score) indicates how many standard deviations an element is from the mean. It can be calculated from the formula:
z = (X - μ) / σ
where z is the z-score, X is the value of the element, μ is the population mean, and σ is the standard deviation.
Interpreting z-scores: a z-score is the normal random variable of a standard normal distribution.
- A z-score equal to 0 represents an element equal to the mean.
- A z-score less than 0 represents an element less than the mean.
- A z-score greater than 0 represents an element greater than the mean.
- A z-score equal to 1 represents an element that is 1 standard deviation greater than the mean; a z-score equal to 2, 2 standard deviations greater than the mean; etc.
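The formula translates directly into code. The values used below (a mean of 100 and a standard deviation of 15) are hypothetical and serve only to illustrate the interpretation rules.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical population: mean 100, standard deviation 15.
print(z_score(100, 100, 15))  # element equal to the mean
print(z_score(85, 100, 15))   # element below the mean
print(z_score(130, 100, 15))  # element 2 standard deviations above the mean
```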

Solution
Analyze sample data: we calculate the standard deviation of the sampling distribution (σ) under the null hypothesis and compute the z-score test statistic (z).
σ = sqrt[ P * (1 - P) / n ] = sqrt[ (0.8 * 0.2) / 100 ] = 0.04
z = (p - P) / σ = (0.73 - 0.80) / 0.04 = -1.75
where P is the hypothesized value of the population proportion in the null hypothesis, p is the sample proportion, and n is the sample size.
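The arithmetic above can be reproduced in a few lines. The variable names are hypothetical; the numbers are the ones given in the example.

```python
import math

P_NULL = 0.80    # hypothesized population proportion (null hypothesis)
p_sample = 0.73  # observed sample proportion
n = 100          # sample size

# Standard deviation of the sampling distribution under the null hypothesis.
sigma = math.sqrt(P_NULL * (1 - P_NULL) / n)
# Test statistic: how many standard deviations the sample proportion is from P.
z = (p_sample - P_NULL) / sigma

print(f"sigma = {sigma:.2f}")
print(f"z = {z:.2f}")
```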

Solution
Interpret results:
- P-value: because this is a two-tailed test, the P-value is the probability that the z-score is less than -1.75 or greater than 1.75. We use the Normal Distribution Calculator to find P(z < -1.75) = 0.04 and P(z > 1.75) = 0.04. Thus, the P-value = 0.04 + 0.04 = 0.08.
- Conclusion: since the P-value (0.08) is greater than the significance level (0.05), we cannot reject the null hypothesis.
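In place of an online calculator, the tail probabilities can be obtained from the standard library. This is a sketch that assumes Python 3.8 or later, where `statistics.NormalDist` is available; the variable names are hypothetical.

```python
from statistics import NormalDist

Z_OBSERVED = -1.75  # test statistic from the worked example
ALPHA = 0.05        # significance level

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
lower_tail = std_normal.cdf(-abs(Z_OBSERVED))     # P(z < -1.75)
upper_tail = 1 - std_normal.cdf(abs(Z_OBSERVED))  # P(z > 1.75)
p_value = lower_tail + upper_tail                 # two-tailed P-value

decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
print(f"P-value = {p_value:.2f}: {decision}")
```

The unrounded tail probabilities are slightly above 0.04, which does not change the conclusion.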

More about inferential statistics
- http://www.socialresearchmethods.net/kb/statinf.php
- http://en.wikipedia.org/wiki/Inferential_statistics
- http://en.wikipedia.org/wiki/Statistical_hypothesis_testing
- http://stattrek.com/
- Statistics textbooks

Questions?