

Three Common Misinterpretations of Significance Tests and p-values
1. The p-value indicates the probability that the results are due to sampling error or “chance.”
2. A statistically significant result is a “reliable” result.
3. A statistically significant result is a powerful, important result.

Misinterpretation # 1 The p-value is a conditional probability: the probability of observing a specific range of sample statistics GIVEN (i.e., conditional upon) that the null hypothesis is true, $P(D \mid H_0)$. This is not equivalent to the probability of the null hypothesis being true, given the data: $P(H_0 \mid D) \neq P(D \mid H_0)$.

Misinterpretation # 1 It is this latter question (i.e., “How likely is it that the results are due to sampling error or chance?”) that tends to motivate researchers to use significance tests. However, these tests do not answer this question directly. To answer it, one needs to consider additional pieces of information: (a) the likelihood that the null hypothesis is true before doing the study, (b) the probability of observing the data given other hypotheses (e.g., the alternative hypothesis), and (c) the probability that other hypotheses are true before doing the study.

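Bayes’ Theorem Bayes’ theorem provides a way to combine these different pieces of information. For a null hypothesis $H_0$ and a single alternative $H_1$: $P(H_0 \mid D) = \frac{P(D \mid H_0)\,P(H_0)}{P(D \mid H_0)\,P(H_0) + P(D \mid H_1)\,P(H_1)}$. Note: You don’t need to memorize this formula, but please be able to use it and understand it.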

Columns: $P(H_0)$, $P(H_1)$, $P(D \mid H_0)$, $P(D \mid H_1)$, $P(H_0 \mid D)$
Case 1: Here, $P(H_0 \mid D) = P(D \mid H_0)$
Case 2: Here, $P(H_0 \mid D) > P(D \mid H_0)$
Case 3: Here, $P(H_0 \mid D) < P(D \mid H_0)$
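The three cases can be illustrated with a minimal Python sketch. The numbers below are hypothetical and are not the values from the original slide. The function name `posterior_h0` is also an assumption made for this example. The sketch treats H0 and H1 as the only two hypotheses, so P(H1) = 1 - P(H0).

```python
def posterior_h0(prior_h0, p_data_given_h0, p_data_given_h1):
    """Return P(H0 | D) by Bayes' theorem, assuming H0 and H1 are exhaustive."""
    prior_h1 = 1.0 - prior_h0
    numerator = p_data_given_h0 * prior_h0
    denominator = numerator + p_data_given_h1 * prior_h1
    return numerator / denominator


# Hypothetical inputs: the likelihoods are held fixed, only the prior changes.
P_D_GIVEN_H0 = 0.05
P_D_GIVEN_H1 = 0.95

for prior_h0 in (0.5, 0.9, 0.1):
    post = posterior_h0(prior_h0, P_D_GIVEN_H0, P_D_GIVEN_H1)
    if abs(post - P_D_GIVEN_H0) < 1e-9:
        relation = "="
    elif post > P_D_GIVEN_H0:
        relation = ">"
    else:
        relation = "<"
    print(f"P(H0)={prior_h0:.1f}  P(H0|D)={post:.4f}  {relation}  P(D|H0)={P_D_GIVEN_H0}")
```

With these assumed inputs, the same P(D|H0) of .05 corresponds to quite different values of P(H0|D), depending on how plausible the null hypothesis was before the study.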

Misinterpretation # 2 Is a significant result a “reliable,” easily replicated result? Not necessarily. The p-value is a poor indicator of the replicability of a finding. Replicability (assuming a real effect exists, that is, that the null hypothesis is false) is primarily a function of statistical power.

Misinterpretation # 2 If a study had statistical power of 80%, what is the probability of obtaining a “significant” result twice? The probability of two independent events both occurring is the product of their individual probabilities: .80 × .80 = .64. If power = 50%? .50 × .50 = .25. Bottom line: The likelihood of replicating a result is determined by statistical power, not the p-value derived from a significance test. When the power of the test is low, the likelihood of a long-run series of successful replications is even lower.
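The arithmetic above extends to any number of studies. The following is a minimal sketch, assuming the studies are independent and share the same power. The function name `prob_all_significant` is made up for this example.

```python
def prob_all_significant(power, n_studies):
    """Probability that every one of n independent studies is significant,
    assuming each study has the same statistical power."""
    return power ** n_studies


# Two studies reproduce the slide's numbers; longer runs show the decline.
for power in (0.80, 0.50):
    for n_studies in (2, 3, 5):
        p = prob_all_significant(power, n_studies)
        print(f"power={power:.2f}, studies={n_studies}: {p:.3f}")
```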

Misinterpretation # 3 Is a significant result a powerful, important result? Not necessarily. The importance of the result, of course, depends on the issue at hand, the theoretical context of the finding, etc.

Misinterpretation # 3 We can measure the practical or theoretical significance of an effect using an index of effect size. An effect size is a quantitative index of the strength of the relationship between two variables. Some common measures of effect size that we’ve discussed in this class are correlations, regression weights, and R-squared. (These same indices can be used when one or more of the variables of interest is categorical.)
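As a rough illustration of two of these indices, the sketch below computes a Pearson correlation and the corresponding R-squared for a simple two-variable case. The data are invented purely for the example, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.3f}, R-squared = {r ** 2:.3f}")
```

With a single predictor, R-squared is simply the square of the correlation, so both indices describe the strength of the same relationship on different scales.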

Some common effect sizes in the “real world”
r ≈ .30 Correlation between SAT and college GPA
r ≈ .25 Correlation between personality as a child and personality as an adult
r ≈ .30 Effect of psychotherapy on psychological well-being
r ≈ .01 Effect of aspirin on heart attacks

Misinterpretation # 3 Importantly, the same effect size can have different p-values, depending on the sample size of the study. For example, a correlation of .30 would not be statistically significant with a sample size of 30, but would be statistically significant with a sample size of 130. Bottom line: The p-value is a poor way to evaluate the practical “significance” of a research result.
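The sample-size example can be checked with a short sketch. It assumes a two-tailed test at the conventional .05 level, which the slide does not state. It also uses the Fisher z approximation rather than the exact t test, so the p-values are approximate. The function name `approx_p_value_for_r` is made up for this example.

```python
import math


def approx_p_value_for_r(r, n):
    """Approximate two-tailed p-value for a correlation r with sample size n,
    using the Fisher z transformation (a normal approximation, not the exact t test)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals the two-tailed area under the standard normal curve.
    return math.erfc(abs(z) / math.sqrt(2))


ALPHA = 0.05  # assumed significance level

for n in (30, 130):
    p = approx_p_value_for_r(0.30, n)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"r=.30, n={n}: p is about {p:.4f} ({verdict})")
```

The effect size is identical in both runs. Only the sample size changes, and with it the verdict of the significance test.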