A small taste of inferential statistics

Presentation transcript:

CS1512 Foundations of Computing Science 2 Lecture 5 Inferential statistics

A small taste of inferential statistics Originals from the University of San Diego, adapted by K.van Deemter

Reasons for sampling If you want to know something about a population, your results would be most accurate if you could study the entire population. But it is often not feasible (cost, time) to study the whole population.

An example … We suspect that there is less crime in Aberdeen than the national average. How can we test this? We do not have the funds to measure the crime rate in every street in Abdn, so we take a random sample of one or more streets.

An example ... Sampling in general: study a sample, and try to draw conclusions about the sample space (population) as a whole. The larger the sample, the more accurately it will tend to reflect the properties of the population. In this example: we calculate how much crime, on average, the streets in our sample have experienced, and compare it to the national average.

A simplistic approach involving a sample of one Suppose UK crime is normally distributed, with a mean (μ) of 4 crimes per street and a known standard deviation σ. Now choose a sample of one Abdn street, which happens to have experienced 2 crimes. If Aberdeen crime levels were the same as the national average, how probable would it be to find 2 or fewer crimes in a given street? Recall that this can be computed from the mean and standard deviation of a normally distributed population. If this is highly unlikely, then we say “it looks as if Abdn has less crime than the national average”.
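The probability asked for here can be sketched in a few lines of Python. The mean of 4 and the observation of 2 crimes come from the slide; the standard deviations tried below are made-up values for illustration only, since the lecture gives no number for σ.

```python
from statistics import NormalDist

NATIONAL_MEAN = 4.0   # crimes per street, from the slide
OBSERVED = 2.0        # crimes in the one sampled Abdn street

def prob_at_most(observed, mean, sd):
    """P(X <= observed) for a normal population with the given mean and sd."""
    return NormalDist(mu=mean, sigma=sd).cdf(observed)

# Hypothetical standard deviations (assumptions, not from the lecture).
for sd in (1.0, 2.0, 4.0):
    p = prob_at_most(OBSERVED, NATIONAL_MEAN, sd)
    print(f"sd = {sd}: P(X <= 2) = {p:.3f}")
```

The smaller the assumed σ, the rarer a street with 2 or fewer crimes would be; with a large σ the same observation is quite ordinary.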

But ... National crime may not be normally distributed. The standard deviation σ of the number of crimes per street may be very high. As a result, finding 2 or fewer crimes in a street may not be so improbable. For these reasons, a more sophisticated approach is called for. The trick is to look at a larger sample and focus on the sample mean.

A more sophisticated approach involving larger samples What is the probability of obtaining the sample mean that you did? Compare your sample to other samples of the same size from the same population. To make calculations easy, suppose your variable can have values 2,4,6,8 only (e.g. two crimes, four crimes, etc). Consider all possible samples of two: {2,4}, {4,2},{2,6}, {6,2}, {4,4},...

Creating a Sampling Distribution of the Mean Although there are 16 different possible samples, there are not 16 different sample means possible. The ones that are possible have different probabilities.
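As a rough sketch, the 16 samples and their means can be enumerated directly. This assumes, as the listed samples suggest, that the four values are equally likely and that samples are ordered and drawn with replacement; the function name is only illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

VALUES = (2, 4, 6, 8)  # the four possible values from the slide

def sampling_distribution(values, n):
    """Map each possible sample mean to its probability, over ordered samples of size n."""
    samples = list(product(values, repeat=n))
    counts = Counter(Fraction(sum(s), n) for s in samples)
    return {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

dist = sampling_distribution(VALUES, 2)
for sample_mean, probability in dist.items():
    print(f"mean {sample_mean}: probability {probability}")
```

Only seven distinct means arise from the 16 samples, and the middle value (5) is the most probable because more samples produce it.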

The sampling distribution of the mean: has the same mean as the original distribution; tends to be (almost) normally distributed; has a smaller standard deviation. The larger the sample size n, the smaller the standard deviation of the mean. There is a formula which says how the new standard deviation depends on the old one (σ) and the sample size n. In case you’re curious: σ_m = σ / √n.
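These properties can be checked numerically on the small example with values 2, 4, 6, 8 and samples of two. The sketch below assumes equally likely values and sampling with replacement, and compares the standard deviation of the sample means with σ divided by the square root of n.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

VALUES = (2, 4, 6, 8)  # equally likely values (assumption, as in the small example)
N = 2                  # sample size

# The mean of every possible ordered sample of size N.
sample_means = [mean(s) for s in product(VALUES, repeat=N)]

population_sd = pstdev(VALUES)      # sigma of the original distribution
sd_of_means = pstdev(sample_means)  # standard deviation of the sample means

print("means:", mean(VALUES), mean(sample_means))
print("sd of means:", sd_of_means, "sigma / sqrt(n):", population_sd / sqrt(N))
```

Both printed means agree, and the two standard deviations on the second line agree up to rounding.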

Creating a Sampling Distribution of the Mean

Sampling Distribution of the Mean This distribution describes the entire spectrum of sample means that could occur just by chance. In other words, the sampling distribution of the mean allows us to determine whether, among the set of random possibilities, the one observed sample mean can be viewed as a common outcome or a rare outcome.

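Using the Sampling Distribution of the Mean to Determine Probability [Figure: the sampling distribution of the mean, showing the probability of obtaining a particular sample mean. The central region is labelled “common outcome”; the two tails are labelled “rare outcome”.]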

How would statisticians handle this? But we were not gambling on the likelihood that one particular sample mean will occur. E.g., our guess was not: “average crime in Aberdeen is 3 crimes per street”. Our guess was that crime in the average Aberdeen street was below the national average. How would statisticians handle this?

The correct procedure (just a sketch!) We start with the hypothesis that the crime rate on average in Aberdeen is the same as the national average. This is called the null hypothesis (H0). This is roughly the opposite of what you try to confirm (which is called the alternative hypothesis HA, or the research hypothesis): that there’s less crime in Aberdeen. To test the null hypothesis, we ask what sample means would occur if many samples of the same size were drawn at random from our population and our null hypothesis were true. Then we compare our sample mean with the means in this sampling distribution. We’re not paying attention to the difference between one- and two-tailed tests here, so the expression “the opposite” is a bit imprecise.

An example… Suppose that the relationship between our sample mean and those of the sampling distribution of the mean looks like this… [Figure: the sampling distribution of the mean, with our hypothesized value and our obtained value marked.]

An example… If so, our sample mean is one that could reasonably occur if the null hypothesis is true, and we will retain this hypothesis as one that could be true. (i.e., The crime rate of Aberdeen could be the same as the national average.)

An example… On the other hand, if the relationship between our sample mean and those of the sampling distribution of the mean looks like this…

An example… Our sample mean is so deviant that it would be quite unusual to obtain such a value when our hypothesis is true. In this case, we would reject our hypothesis and conclude that it is more likely that the crime rate of Abdn is not the same as the national average. The population represented by the sample differs significantly from the comparison population.

Going into this a bit more deeply (no need to understand this in detail) But how deviant is deviant enough? In other words, how unlikely must our result be under H0 before we reject H0? In some areas a probability of 0.05 is generally agreed to be small enough (95% certainty). In areas where errors are costly (e.g., medicine), it’s often chosen as low as 0.01 (99% certainty). This is called the decision rule. We say that the difference between the observed mean m and the hypothesised mean μ is significant if the decision rule decides that m is unlikely to have come about by accident. 0.01, 0.05, etc. are also called levels of significance.

Critical Values We can use the tables to calculate the critical values, which separate the upper 2.5% and lower 2.5% of sample means from the remainder.
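Instead of printed tables, the critical values can be looked up with Python's standard library. This is a minimal sketch for a two-tailed test on the standard normal distribution; the function name is illustrative.

```python
from statistics import NormalDist

def two_tailed_critical_values(alpha):
    """Return (lower, upper) Z values that cut off alpha/2 in each tail."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    upper = standard_normal.inv_cdf(1 - alpha / 2)
    return -upper, upper

for alpha in (0.05, 0.01):
    lower, upper = two_tailed_critical_values(alpha)
    print(f"alpha = {alpha}: reject H0 if Z < {lower:.2f} or Z > {upper:.2f}")
```

A significance level of 0.05 puts 2.5% in each tail, which gives the familiar cut-offs of about plus and minus 1.96.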

Another example… A psychologist is working with people who have had surgery. The psychologist thinks that people may recover from the operation more quickly if friends and family are in the room with them after the operation. It is known that time to recover from this kind of surgery is normally distributed with a mean of 12 days and a standard deviation of 5 days. The procedure of having friends and family in the room for the period after the surgery is done with 9 randomly selected patients. The patients recover in an average of 8 days. Using the .01 level of significance, what should the researcher conclude?

Statistical analysis of example For illustration, we show here how this experiment is analysed statistically. H0 is the null hypothesis; HA is the alternative hypothesis (research hypothesis). A test statistic says how far the sample mean is from the population mean. An often-used test statistic is Z. Z involves the sample mean m, the hypothesised mean μ, and the standard deviation of the means.

Statistical analysis of example An often-used test statistic is Z. Z involves the sample mean m, the hypothesised mean μ, and the standard deviation of the means. We have seen that the standard deviation of the means is σ / √n. The formula for Z is Z = (m - μ) / (σ / √n), that is, the difference between m and μ, compared with the new standard deviation.

Statistical analysis of example
State the research hypothesis: Is it true that patients who have friends and family with them following surgery recover more or less quickly than people who do not?
State the statistical hypothesis: H0: μ = 12 days; HA: μ ≠ 12 days.
Set decision rule: at the .01 level of significance (two-tailed), reject H0 if Z ≤ -2.58 or Z ≥ 2.58.
Calculate the test statistic: Z = (8 - 12) / (5 / √9) = -2.40.
Decide if results are significant: Retain H0, since -2.40 > -2.58.
Interpret results as relating to the statistical hypothesis: Patients who have friends and family with them did not recover significantly faster, or slower, than patients who do not have social support.
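The same analysis can be sketched in Python, using only the figures from the surgery example (mean 12, standard deviation 5, 9 patients, sample mean 8, significance level .01). The two-tailed p-value is an extra not shown on the slides.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, hypothesised_mean, population_sd, n):
    """Distance of the sample mean from the hypothesised mean, in standard errors."""
    standard_error = population_sd / sqrt(n)
    return (sample_mean - hypothesised_mean) / standard_error

# Figures from the surgery example.
z = z_statistic(sample_mean=8, hypothesised_mean=12, population_sd=5, n=9)

alpha = 0.01
critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cut-off
p_value = 2 * NormalDist().cdf(-abs(z))         # two-tailed p-value

print(f"Z = {z:.2f}, critical value = +/-{critical:.2f}, p = {p_value:.4f}")
print("reject H0" if abs(z) > critical else "retain H0")
```

Because the absolute value of Z stays below the cut-off, the sketch retains H0, matching the conclusion above.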

Does it follow that “friends and family do not have the predicted effect”? No! You may have used too few subjects, for example. The facts did point in the right direction (because recovery was 4 days faster, on average), so maybe do a bigger experiment. An experiment can never confirm the null hypothesis, only disconfirm it.
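To illustrate why too few subjects can matter, here is a small sketch of how Z would change with sample size. It rests on a strong assumption that is not in the lecture: that a bigger experiment would again find an average recovery time of 8 days.

```python
from math import sqrt
from statistics import NormalDist

POP_MEAN, POP_SD = 12, 5   # from the surgery example
ASSUMED_SAMPLE_MEAN = 8    # assumption: a bigger sample gives the same mean
CRITICAL = NormalDist().inv_cdf(1 - 0.01 / 2)  # two-tailed, .01 level

def z_for_n(n):
    """Z statistic if n patients recovered in ASSUMED_SAMPLE_MEAN days on average."""
    return (ASSUMED_SAMPLE_MEAN - POP_MEAN) / (POP_SD / sqrt(n))

for n in (9, 10, 11, 16, 25):
    z = z_for_n(n)
    verdict = "reject H0" if abs(z) > CRITICAL else "retain H0"
    print(f"n = {n:2d}: Z = {z:.2f} -> {verdict}")
```

Under that assumption, the same 4-day difference becomes significant once the sample is slightly larger, because the standard deviation of the means shrinks as n grows.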

Summing up inferential statistics This is essentially what’s been done when you read that: one medicine is more effective than another; one user interface is better liked than another; one computer program runs faster than another, on typical input. In most cases, people are comparing one sample with another (rather than with a completely known population, as in our examples). Still, the techniques are always similar. Note the qualification “on typical input”. You don’t always need statistics for questions of this kind, for example when you’re only interested in the worst case.

Summing up statistics and probability We’ve covered some key concepts only (plus a quick illustration of how these concepts can be used in hypothesis testing). More from Professor Hunter, who will talk about simulations and random number generators. More in year 2, when you learn about HCI. In the lectures on probability, we wrote “P(q) = a”, where 0 <= a <= 1. Now we move on to Symbolic Logic, where we focus on the cases where a=0 or a=1.