Yea, we finally get to do Statistics! Kert Viele Department of Statistics University of Kentucky

Scientific Experimentation Scientific theories make predictions. In a scientific experiment to test a theory, we create a new situation in which the theory makes a specific prediction, and we observe whether that prediction comes true. If the prediction does come true, it is evidence in favor of the theory (typically not proof). If the prediction does not come true, the theory must be revised, or perhaps completely discarded.

A simple theory A very simple theory is that “All ravens are black”. This theory only makes predictions about ravens, so the natural way to test it is to observe ravens. If you observe black ravens, this confirms the theory (it doesn’t prove it, though; we’d have to observe all ravens to do that). If you observe a white raven, the theory is proven wrong.

Statistical Theories Statistical hypotheses are hypotheses that discuss probabilities. Saying “this coin is fair” means that in the long run 50% of the flips will be heads. Many theories can be placed in this framework. The hypothesis “all ravens are black”, in statistical terms, is that the probability a raven is black is 100%, and thus the probability a raven is not black is 0%.

If the impossible happens…discard the theory In statistical terms, what happens when we observe a non-black raven? We can go “hmm, what were the odds of that? If the hypothesis were true, that would NEVER happen.” Therefore, the hypothesis is NOT true.

With statistics, nothing is impossible! Fundamental idea – if the hypothesis says an event is impossible but it happens anyway, the hypothesis is wrong. Unfortunately, with most statistical hypotheses, nothing is impossible. Let’s go back to the hypothesis that a particular coin is fair.

What about unlikely events? Suppose we flip the coin 100 times and observe 98 heads and 2 tails. Most people would be more than a little suspicious of whether the coin is fair. The reasonable source of this suspicion is that observing 98 heads and 2 tails is quite unlikely. However, it is NOT impossible.
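To put a number on “quite unlikely”, here is a minimal sketch using Python’s scipy (my choice of tool, not the slides’): it computes the chance of 98 or more heads in 100 flips of a fair coin.

```python
from scipy.stats import binom

# P(98 or more heads in 100 flips of a fair coin).
# binom.sf(k, n, p) gives P(X > k), so k = 97 yields P(X >= 98).
print(binom.sf(97, 100, 0.5))   # about 4e-27: astronomically unlikely

# But nothing is impossible: even 0 heads has positive probability.
print(binom.pmf(0, 100, 0.5))   # about 7.9e-31, tiny but nonzero
```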

Back to the drawing board In fact, no outcome is impossible. You can get anywhere from 0 to 100 heads in 100 flips. If our standard were to disprove a hypothesis only when something impossible (according to the hypothesis) occurred, we would never make any progress with the hypothesis “this coin is fair”. We’d always be in limbo.

The “unlikely” standard Since the “impossibility” standard is unavailable for most statistical hypotheses, we use a weaker standard (the best we have). Impossible = probability 0. Unlikely = a small probability (more than 0, but small); for historical reasons, unlikely = 5%, typically. We work with “if something unlikely occurs, then the hypothesis is likely wrong”, which is called “rejecting the hypothesis”.

Summary until now…still awake? The statistical idea of hypothesis tests is based on concluding that a hypothesis is likely wrong when an unlikely event occurs. This is of course weaker than completely disproving a hypothesis when an impossible event occurs, but hey, it’s what we’ve got.

Implementing the “unlikely” standard The standard procedure in a statistical hypothesis test is to “reject” the hypothesis when the data fall in a rejection region. The rejection region is set so that, if the hypothesis is true, the data have only a 5% (unlikely) chance of landing in it. In most standard situations, the hypothesis translates (via probability theory) into an approximate normal distribution for the data.

Both are 5%; which do you reject?

Reject in the tails Generally, you reject the hypothesis if the observed data fall too far from the hypothesized value (i.e., too far into the tails). For normal distributions, you get the usual rule: “reject if the data fall more than 1.96 standard deviations from the predicted mean”. The 1.96 is often just rounded to 2 for “eyeballing” purposes.
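A minimal sketch of this decision rule; the value of σ and the observation are made up, since the slide doesn’t specify them:

```python
from scipy.stats import norm

mu0, sigma = 10.0, 3.0     # hypothesized mean; sigma is an assumed value
x = 14.2                   # a hypothetical observation

z_crit = norm.ppf(0.975)   # about 1.96: splits the 5% into two 2.5% tails
z = (x - mu0) / sigma      # how many standard deviations x is from mu0
print(z, z_crit, abs(z) > z_crit)   # reject if the last value is True
```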

You got a problem with that? You may disagree with 5% as a reasonable definition of “unlikely”. You may use something else, but keep in mind that 5% is just “the way it is” with respect to many journal publications, etc. The probability calculations have all been worked out in many common situations, and most standard statistical packages let you enter data (from Excel or another package) and click a few buttons to get an answer. Just remember: know what the buttons mean when you click them!

Normal distributions In most common situations, the hypothesis you are testing implies the data will have a normal (bell-shaped) distribution. Normal distributions have two parameters, a mean μ (mu, unless you are from England, where it is “moo”) and a standard deviation σ (sigma). μ determines the center of the distribution, while σ determines the spread.

Different Normal Distributions

Small σ’s are good. In a scientific context, μ is fixed by the process we are studying (the coin has some probability of landing heads; our animals have a certain level of activity). We unfortunately don’t know what μ is, and we can’t control it. σ, on the other hand, is much more controllable: anything that eliminates noise in our data decreases σ.

Why are small σ’s good? Consider two competing scientific theories, one which states μ = μ₀ and another which states μ ≠ μ₀. Under the null hypothesis μ = μ₀, we expect a normal distribution to appear, centered at μ₀. The value of σ depends on the inherent noise in the problem AND on our experimental design choices. For the example that follows, μ₀ = 10.

Our hypothesis testing procedure Reject in the red area, do not reject in the green area

We can make mistakes… Recall that we “reject” the hypothesis if our observed data fall too far from the hypothesized value (i.e., out in the tails of the null distribution). This is what we called the rejection region. However, since nothing is impossible under most statistical hypotheses, there is a chance we will make a mistake.

Type I error We fixed the rejection region so that, when the null hypothesis is true, we have a 5% chance of incorrectly rejecting the hypothesis. This is called the type I error rate, or “size”, of the test. This of course also means that, when the null hypothesis is true, we have a 95% chance of making the correct decision.
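A small simulation of this 5% guarantee; μ₀ = 10 comes from the slides, while σ = 3 is an assumption (the slides never state σ):

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, sigma = 10.0, 3.0

# Draw 100,000 observations from the null distribution and apply the
# "more than 1.96 standard deviations from mu0" rejection rule.
x = rng.normal(mu0, sigma, size=100_000)
print(np.mean(np.abs(x - mu0) > 1.96 * sigma))  # close to 0.05 by construction
```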

What if the hypothesis is false? If our hypothesis is not true, then the real value of μ is something other than μ₀, and our data come from a different distribution. We keep the same rejection region. What does this look like?

With μ=14, but with the same rejection region. There’s more red.

Type II error Again, when the null distribution is the right one, we have a 5% chance of making a mistake and a 95% chance of not making a mistake. When the alternative hypothesis is true, we have different probabilities. For the graph shown here, the probability of rejecting is 26.6%. But wait: when the alternative hypothesis is true, rejecting is a good thing.

Type II error, continued When the alternative hypothesis is true, we want to reject the null hypothesis. The probability of doing so is called the “power” of the test. When the alternative hypothesis is true, failing to reject is called a type II error. A good test has low probabilities of both type I and type II error.

Decreasing σ results in a better chance of finding the truth. Suppose we were able to decrease σ, making the normal distributions “thinner”. This changes the rejection region, because we still fix the probability of type I error at 5%; basically, it makes the “acceptance region”, the green area, smaller. The increased accuracy lets us “hit a smaller target”, so to speak.

Reminder of the old null distribution

The new improved, smaller σ version

Under the alternative distribution, the data appears in the red more often.

All graphs together for comparison

The effect of decreasing σ When the larger σ is used, the power is 26.6%. When the smaller σ is used, the power increases to 98.0%, so the probability of type II error is only 2%. Decreasing σ is a VERY good thing.
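The slides never state the two σ values, but σ = 3 and σ = 1 reproduce the quoted powers of 26.6% and 98.0%, so this sketch assumes them:

```python
from scipy.stats import norm

mu0, mu_true = 10.0, 14.0   # null mean (from the slides) and alternative mean
z = norm.ppf(0.975)         # about 1.96

for sigma in (3.0, 1.0):    # assumed "large" and "small" sigma
    lo, hi = mu0 - z * sigma, mu0 + z * sigma   # rejection region boundaries
    # Power: probability the data land in the rejection region when mu = 14.
    power = norm.cdf(lo, mu_true, sigma) + norm.sf(hi, mu_true, sigma)
    print(sigma, round(power, 3))   # about 0.266 for sigma=3, 0.98 for sigma=1
```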

How do you decrease σ? We focus a lot of effort on decreasing σ, which gives us better power and a better chance of making a strong scientific conclusion. So how do we decrease σ? Remove noise from the experiment. There are several ways to do this.

Larger sample sizes are better, but more expensive in many ways. Increase sample size! The larger the sample size, the thinner the normal distributions are. Of course, this goal of thinner normals competes with other demands on our time, our budget, and possibly ethical concerns (you don’t want to expose people to a potentially dangerous drug any more than you have to, for example). Never run an experiment with too small a sample size, though; you’ll get NOTHING out of it!
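The mechanism here is the standard error of the mean, σ/√n: quadrupling the sample size halves the spread of the sampling distribution. A quick illustration, with an assumed per-observation σ of 3:

```python
import math

sigma = 3.0                         # assumed per-observation standard deviation
for n in (10, 40, 160):
    print(n, sigma / math.sqrt(n))  # the standard error halves as n quadruples
```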

Blocking on known sources of noise. If we know a specific variable will affect our responses, we will often try to control for that variable. A quick example – if you are teaching, students who did well on last year’s standardized tests will tend to do well on this year’s standardized tests, and there is only so much that will change that.

Example You are interested in investigating the relative effectiveness of two different types of strength training (method A and method B). Your response variable is the amount of weight that can be lifted at the end of the training. You also know the amount each subject can currently lift.

Example continued Presumably people who are stronger now will be stronger later; the issue is by how much. One option is to fit an ANCOVA using the current weight as a continuous covariate. If you are concerned about linearity, a randomized block design is sensible as well.
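One way to fit such an ANCOVA is with statsmodels’ formula interface; the data and column names below are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical subjects: current lift, training method, final lift.
df = pd.DataFrame({
    "baseline": [100, 105, 120, 125, 140, 145],
    "method":   ["A", "B", "A", "B", "A", "B"],
    "final":    [112, 118, 130, 141, 151, 162],
})

# Final lift modeled from training method, adjusting for baseline strength.
model = smf.ols("final ~ C(method) + baseline", data=df).fit()
print(model.summary())
```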

Conducting the experiment To block on current strength, take all the subjects and rank them by the weight they can currently lift. Divide the sorted list into blocks of 2 people each (2 = the number of training methods). Within each block, assign one person to method A and one to method B, at random. This makes it FAR less likely that method A or method B ends up with an abundance of “strong people”, compared with simply randomly assigning all subjects to groups A and B.
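A sketch of that block-then-randomize step; the subjects and weights are made up:

```python
import random

# Hypothetical subjects and the weight (lbs) each can currently lift.
current = {"ann": 95, "bo": 150, "cy": 110, "di": 160, "ed": 120, "fay": 140}

ranked = sorted(current, key=current.get)   # rank by current strength
for i in range(0, len(ranked), 2):          # blocks of 2 = number of methods
    block = ranked[i:i + 2]
    random.shuffle(block)                   # randomize within the block
    print(dict(zip(block, ["A", "B"])))     # one subject per training method
```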

Moral of the randomization Block on what you know makes a difference; the randomization will likely equalize the groups on what you DON’T know makes a difference. It’s possible there aren’t any other factors and the randomization doesn’t matter, but it’s easy and it potentially helps. Finding out about another important variable later is a bad scene.

Repeated measures Animals vary amongst themselves, through genetic or environmental factors. Where possible, we would like to control this variation. We do this by applying each treatment to each animal; this is “blocking on animal”. Blocking on animal is often not possible, for example when the animal may be changed by the treatment. When you block on animal, randomize the order of the treatments so that no treatment is consistently placed first or last in the treatment order.
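A tiny sketch of randomizing treatment order within each animal (the animal and treatment names are hypothetical):

```python
import random

treatments = ["control", "drug_A", "drug_B"]
for animal in ["rat_1", "rat_2", "rat_3", "rat_4"]:
    # A fresh random order per animal, so no treatment is always first or last.
    print(animal, random.sample(treatments, k=len(treatments)))
```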

Control groups Whenever making comparisons between treatments, you must have a group of animals for each potential treatment, including the “current”, or “default”, treatment. Without such a group, any difference you observe could be due simply to the fact that the animals were involved in an experiment.