Null Hypothesis Significance Testing: What the heck have we been doing this whole time?

Some thoughts
“statistical significance testing retards the growth of scientific knowledge; it never makes a positive contribution” (Schmidt & Hunter, 1997, p. 37).
“The almost universal reliance on merely refuting the null hypothesis is a terrible mistake, is basically unsound, poor scientific strategy, and one of the worst things that ever happened in the history of psychology” (Meehl, 1978, p. 817).
Cohen (1994) suggested that “Statistical Hypothesis Inference Testing” would produce a more appropriate acronym.
What is NHST, what isn’t it, and why is it overused and misused?

What is hypothesis testing about?
Use an inferential procedure to examine the credibility of a hypothesis about a population, based on the probability of our sample data.

How is NHST made possible?
The sampling distribution tells us how much variability to expect in a statistic from sample to sample.
We can then see whether our sample statistic differs by more than the random error we would expect when sampling from a population with a particular hypothesized value for that statistic.
Example: is the mean of test scores from this school all that different from the national average?
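
The school example can be sketched as a one-sample z-test. This is only an illustration: the national mean (500), the national SD (100), the school's mean (530) and its size (64 students) are made-up numbers, and the function name `one_sample_z` is hypothetical. It also assumes the population SD is known, which is what makes a z-test (rather than a t-test) appropriate.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two-sided p) for a sample mean tested against a known population."""
    # Standard error: the SD of the sampling distribution of the mean.
    standard_error = pop_sd / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Probability of a z at least this far from 0, in either direction, under H0.
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical numbers, not taken from the slides.
z, p = one_sample_z(sample_mean=530, pop_mean=500, pop_sd=100, n=64)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.40, p = 0.0164
```

The standard error (100 / 8 = 12.5) is the "random error we would expect" from sampling; the school's 30-point difference is 2.4 of those units away from the national average.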

Logic of the NHST
If we believe something to be different, why do we start by hypothesizing that things are the same?
Falsification: we cannot prove things true.
The null provides a basis for the statistical test.
It gives us somewhere to start, i.e. something to test; we don’t know the precise value of an alternative.

Hypothesis testing
The explosion: NHST was almost non-existent prior to 1940; today it is used almost exclusively.
Psychology used to see extremely controlled, low-N or N-of-1 studies. The idea was to get rid of the error beforehand.
However, the practical side of psychology (esp. in education) wanted to see different groups tested.

Fisher vs. Neyman-Pearson
For heavyweight stats champion thingy

Fisher
Rejected the Bayesian model of p(H|D), which he claimed was too subjective, in favor of the frequentist approach of p(D|H).
His work in the early part of the 20th century eventually led to the near-universal use of many of his techniques in the field of psychology.

Fisher’s “level of significance”
How is it determined?
Early Fisher: set some acceptable standard, say .05.
Later Fisher: state the exact level as a communication to researchers.

Neyman-Pearson’s α
The ‘level of significance’ must be set before the experiment in order to interpret it as a long-run frequency of error (Type I): the α level.
Also added β (Type II error), power (1 - β), and the alternative hypothesis.
So now that we have this new sort of thing to worry about (α), how do we make it more confusing? Set the standard α level at... .05.

So what does a significant result mean?
Fisher: an epistemic interpretation about the likelihood of the null hypothesis (how much do we believe in the false null); p is a property of the data.
N-P: a behavioristic interpretation (reject or don’t) that refers to repeated experimentation; p is a property of the test. In fact we don’t really have a p-value to report: our statistic either falls in our region of rejection or it doesn’t.

So what does a non-significant result mean?
Fisher: nothing; we can’t prove the null (we can only disprove it).
N-P: act as if the null were true.

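What the p-value means
The probability obtained tells us: if the null hypothesis were true, how probable it would be to obtain a sample statistic (mean, difference between means, etc.) of the kind observed, i.e. one at least as extreme as ours.
That is P(D|H0): the probability of the data conditional on H0.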
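
One way to see this definition in action is to build the null world by brute force: draw many samples from a population where H0 is true and count how often the sample mean lands at least as far from the null value as ours did. The sketch below assumes a normal population and uses made-up numbers (null mean 500, SD 100, n = 64, observed mean 530); the function name `simulated_p_value` is hypothetical.

```python
import random


def simulated_p_value(observed_mean, null_mean, null_sd, n, reps=5000, seed=1):
    """Estimate a two-sided p-value by simulating sample means under H0."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed_distance = abs(observed_mean - null_mean)
    at_least_as_extreme = 0
    for _ in range(reps):
        # One sample of size n drawn from a world where H0 is true.
        sample = [rng.gauss(null_mean, null_sd) for _ in range(n)]
        sample_mean = sum(sample) / n
        if abs(sample_mean - null_mean) >= observed_distance:
            at_least_as_extreme += 1
    return at_least_as_extreme / reps


# Hypothetical numbers, not taken from the slides.
p_sim = simulated_p_value(observed_mean=530, null_mean=500, null_sd=100, n=64)
print(f"simulated p is roughly {p_sim:.3f}")
```

Note what the loop conditions on: every simulated sample comes from a population where the null is true. Nothing in it says how likely the null itself is.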

What we want it to mean
We want the p-value to be a probability about a hypothesis: P(H0|D), the probability of H0 conditional on the data.
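
To get from P(D|H0) to P(H0|D) you need Bayes' rule, and that requires a prior for H0 and the probability of the data under the alternative. The sketch below is a simplified illustration, not a general recipe: it treats "the data" as simply "the test came out significant", and the prior (.80), α (.05) and power (.50) are assumed values chosen for the example. The function name `posterior_null` is hypothetical.

```python
def posterior_null(prior_null, p_sig_given_null, p_sig_given_alt):
    """P(H0 | significant result) from Bayes' rule."""
    prior_alt = 1 - prior_null
    # Joint probability of "H0 true AND significant result".
    joint_null = p_sig_given_null * prior_null
    # Total probability of a significant result under either hypothesis.
    p_significant = joint_null + p_sig_given_alt * prior_alt
    return joint_null / p_significant


# Assumed values: prior P(H0) = .80, alpha = .05, power = .50.
post = posterior_null(prior_null=0.80, p_sig_given_null=0.05, p_sig_given_alt=0.50)
print(f"P(significant | H0) = 0.05, but P(H0 | significant) = {post:.3f}")  # 0.286
```

With these assumed inputs the two conditional probabilities differ by a factor of almost six, which is why reading p as "the probability the null is true" goes wrong.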

Decisions and reality (N-P approach)
                           State of the World
Research Decision          H0 true                   H0 false
Reject H0                  Type I error              Correct rejection
Fail to reject H0          Correct fail to reject    Type II error

Probabilities
                           State of the World
Research Decision          H0 true                   H0 false
Reject H0                  Type I, p = α             p = 1 - β = Power
Fail to reject H0          p = 1 - α                 Type II, p = β
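
The cells of this table are long-run frequencies, so they can be checked by repeating an experiment many times. The sketch below is a rough simulation under assumed conditions: a two-sided z-test with known SD of 1, n = 25, α = .05, and a true effect of half an SD when H0 is false. The function name `rejection_rate` and all the numbers are illustrative choices, not from the slides.

```python
import random
from math import sqrt
from statistics import NormalDist


def rejection_rate(true_mean, null_mean=0.0, sd=1.0, n=25, alpha=0.05,
                   reps=4000, seed=7):
    """Proportion of simulated experiments in which a two-sided z-test rejects H0."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    standard_error = sd / sqrt(n)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        z = (sum(sample) / n - null_mean) / standard_error
        if abs(z) > critical_z:
            rejections += 1
    return rejections / reps


type_i_rate = rejection_rate(true_mean=0.0)  # H0 true: rejections are Type I errors
power = rejection_rate(true_mean=0.5)        # H0 false: rejections are correct
type_ii_rate = 1 - power                     # H0 false: failures to reject

print(f"Type I rate (alpha) is roughly {type_i_rate:.3f}")
print(f"Power (1 - beta) is roughly {power:.3f}")
print(f"Type II rate (beta) is roughly {type_ii_rate:.3f}")
```

The Type I rate should hover near the α that was set in advance, while power depends on the assumed effect size and n. That is the N-P point: these are properties of the procedure, not of any single result.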

Psych today: the hybrid
A mix of the Fisher and N-P interpretations of the p-value, incorrect inferences about the probabilities of hypotheses or error rates, and a dogmatic approach to scientific investigation.

What’s the alternative?
“a magic alternative to NHST, some other objective mechanical ritual to replace it. It doesn’t exist” (Cohen, 1997, p. 31)
Goodness of fit intervals (pretty tricky stats)
Bayesian methods (which do give P(H|D), but have their own problems)
Confidence intervals
Effect sizes
Graphs and more descriptives
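
Two of the options listed above, effect sizes and confidence intervals, are easy to compute by hand. The sketch below uses two small made-up groups and hypothetical function names (`cohens_d`, `mean_difference_ci`). The interval uses a normal approximation to keep the code in the standard library; with samples this small a t-based interval would be somewhat wider, so treat the numbers as illustrative only.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = ((n_a - 1) * variance(group_a)
                       + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


def mean_difference_ci(group_a, group_b, confidence=0.95):
    """Normal-approximation confidence interval for the difference in means."""
    difference = mean(group_a) - mean(group_b)
    standard_error = sqrt(variance(group_a) / len(group_a)
                          + variance(group_b) / len(group_b))
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return difference - z * standard_error, difference + z * standard_error


# Made-up scores for illustration only.
treatment = [12, 15, 14, 10, 13, 16, 14, 15]
control = [10, 11, 9, 12, 10, 13, 11, 10]

d = cohens_d(treatment, control)
low, high = mean_difference_ci(treatment, control)
print(f"d = {d:.2f}, 95% CI for the difference: ({low:.2f}, {high:.2f})")
```

Unlike a bare "significant / not significant" verdict, these two numbers say how big the difference is and how precisely it has been estimated.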

Solutions
1. Don’t forget to use the noggin when conducting analyses; don’t let stat programs or textbooks tell you what is “significant”.
2. There are other ways to analyze data without using NHST. But don’t fall into the same trap of rigid thinking with those either.
3. Focus on effect sizes, report as much information as possible, and let others know exactly why you came to your conclusions.
4. Collect good data (not as easy as it sounds) and have good theories and clear ideas driving the motivation for your research.

Resources
Gigerenzer, G. (1993). The Superego, the Ego, and the Id in Statistical Reasoning. In Keren & Lewis (Eds.), Data Analysis in the Behavioral Sciences.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49.
Hubbard, R. & Bayarri, M. J. (2003). Confusion Over Measures of Evidence (p's) Versus Errors (α's) in Classical Statistical Testing. The American Statistician, 57(3), 171-178.
Oakes, M. Statistical Inference: A Commentary for the Social and Behavioral Sciences. Chichester: John Wiley & Sons.
Abelson, R. (1995). Statistics as Principled Argument. Mahwah, NJ: Erlbaum.

Quotes
agn.html