Testing Theories: Three Reasons Why Data Might not Match the Theory Psych 437

Theory testing Part of what differentiates science from non-science is the process of theory testing. When a theory has been articulated carefully, it should be possible to derive quantitative hypotheses from the theory—propositions about what should be observed under specific conditions.

Example Example: Imagine that we have a theory concerning the relationship between television habits and obesity. According to our theory, there is a correlation between the amount of television that people watch and their obesity levels. Our theory, however, does not assume that this correlation is due to a causal relationship between the two variables. Rather, our theory assumes that exercise causally influences obesity, and that people tend to get less exercise when they are watching TV.

[Path diagram from the slide: TV viewing −→ Exercise −→ Obesity, i.e., television viewing reduces exercise, and exercise in turn reduces obesity.]

Hypotheses Given this simple theory, there are a number of hypotheses that we can derive. Here are four:
– there will be a positive correlation between television viewing and obesity
– there will be a negative correlation between time spent exercising and obesity
– there will be a negative correlation between television viewing and time spent exercising
– if we hold exercise constant (statistical control), we will not observe a correlation between television viewing and obesity

Hypotheses Notice that each of these implications is inherently quantitative:
– people who watch more television will be more obese (greater than; less than; correlation—all of these are quantitative statements)
– if we were to hold exercise constant, we would not observe a relationship between television viewing and weight (precise numerical prediction: zero correlation)
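As a rough illustration of how these four hypotheses could be checked against data, here is a minimal Python sketch using simulated scores. The variable names, sample size, and effect sizes are illustrative assumptions, not values from the lecture; "holding exercise constant" is approximated by correlating regression residuals (a partial correlation).

```python
# A minimal sketch (hypothetical, simulated data) of checking the four hypotheses.
import numpy as np

rng = np.random.default_rng(0)
n = 500
tv = rng.normal(size=n)                          # TV viewing (standardized, assumed)
exercise = -0.6 * tv + rng.normal(size=n)        # more TV -> less exercise (assumed)
obesity = -0.6 * exercise + rng.normal(size=n)   # less exercise -> more obesity (assumed)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print("TV & obesity:        ", corr(tv, obesity))        # predicted positive
print("exercise & obesity:  ", corr(exercise, obesity))  # predicted negative
print("TV & exercise:       ", corr(tv, exercise))       # predicted negative

def residuals(y, x):
    # Remove the linear effect of x from y (simple regression residuals).
    slope, intercept = np.polyfit(x, y, 1)
    return y - (intercept + slope * x)

# Partial correlation: TV and obesity with exercise statistically held constant.
print("TV & obesity, exercise held constant:",
      corr(residuals(tv, exercise), residuals(obesity, exercise)))  # predicted near 0
```

Using residuals is one simple way to approximate statistical control; a multiple regression would serve the same purpose.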

Directional predictions The first kind of hypothesis is what we call a directional hypothesis or a directional prediction. A directional prediction concerns the direction of a difference between two groups or the sign (+ vs. -) of a correlation between two variables. Example: If we had two groups of people, those who watch TV and those who do not, we would predict the average weight of the TV group to be higher than that of the no-TV group (M_TV > M_no-TV).

Testing directional predictions How do we test directional predictions? Virtually all approaches to testing quantitative hypotheses are based on the logic that the difference between a prediction and an empirical observation should be small: (T – O), where T is the value the theory predicts and O is the value actually observed.

Testing directional predictions We can apply this logic easily in this circumstance. We simply find the average weight for the two groups and compare them. If our hypothesis (that M_TV > M_no-TV) is correct, then M_TV − M_no-TV should be > 0. Thus, we find support for our hypothesis if this difference is greater than zero (i.e., positive). We disconfirm the hypothesis if the difference is less than or equal to zero.
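A minimal sketch of that comparison, with made-up weights for two small groups (the numbers are purely illustrative):

```python
# A minimal sketch of the directional test: compare the group means and check the sign.
import numpy as np

tv_group = np.array([82.0, 91.5, 77.3, 88.0, 95.2])     # kg, hypothetical values
no_tv_group = np.array([74.1, 80.0, 69.8, 85.5, 72.4])  # kg, hypothetical values

difference = tv_group.mean() - no_tv_group.mean()
print(f"M_TV - M_no-TV = {difference:.2f}")
print("Consistent with the directional hypothesis" if difference > 0
      else "Inconsistent with the directional hypothesis")
```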

The weakness of directional tests It is important to keep in mind that testing directional hypotheses is a relatively weak way to test a theory. Imagine for a moment that our model is wrong and that, in addition, there is no association between TV viewing and obesity. The measured correlation between TV viewing and obesity will not literally be 0, for many reasons, some of which we’ll discuss later. In this circumstance, we have a 50% chance of getting the prediction correct, even if the theory is misguided.

Point predictions Sometimes our hypothesis may be precise enough to make a specific, rather than a merely directional, quantitative prediction. In the previous example, our theory allows us to derive a point prediction: the difference between the groups will be zero (exactly zero) when exercise is held constant. How do we test point predictions?

Testing point predictions
Person   X    Y
a        1    1
b        3    3
c        6    2
d       12    4
r = .16
The correlation between these two variables is .16. Clearly the prediction (a correlation of zero) is incorrect, but not by much. How do we quantify how much? (predicted – observed)² = (0 – .16)² ≈ .025
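For concreteness, a tiny sketch of that calculation, using the observed correlation of .16 from the slide and the theory's point prediction of zero:

```python
# A minimal sketch: quantifying how far a point prediction missed.
predicted_r = 0.00   # the theory's point prediction (zero correlation)
observed_r = 0.16    # the correlation reported on the slide
squared_error = (predicted_r - observed_r) ** 2
print(f"(predicted - observed)^2 = {squared_error:.4f}")  # 0.0256, roughly the .025 on the slide
```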

Riskiness Point predictions are much more risky for a theory. Why? Assuming all possible observations are equally likely, precise predictions are simply more likely to be wrong (a point prediction might have only something like a 1-in-20 chance of being right by luck, whereas a directional prediction has a 1-in-2 chance). Thus, when the predictions turn out to be pretty close to what is observed, the theory gets "more credit."

Riskiness As a result, scholars are typically more tolerant of errors when they involve precise predictions. A theory that makes a precise prediction and gets it "not quite right" is generally considered more successful than a theory that makes a weak directional prediction and gets it exactly right.

Evaluating point predictions How much of an error is too much of an error when a point prediction is being tested? There is no "standard" for making this decision, and, arguably, the amount of error that one is willing to tolerate may vary from one research context to the next. It is useful, however, to examine the comparative accuracy of alternative or competing hypotheses—hypotheses derived from different theories.

Competing theories and hypotheses In our example, there could be an alternative theory predicting that the correlation between television viewing and obesity should equal .50. If we were to observe a correlation of .16, the original theory clearly is more accurate than the alternative, despite the fact that it was slightly off. (If the alternative theory predicted a correlation of about .32, the evidence would be equivocal: .16 falls exactly between the two predictions of .00 and .32.)
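A minimal sketch of this comparative logic, using the correlation values from the slide (absolute error is used here for simplicity; squared error would rank the theories the same way):

```python
# A minimal sketch: the theory whose point prediction lies closer to the
# observed correlation fares better in a comparative test.
observed_r = 0.16
predictions = {"original theory": 0.00, "alternative theory": 0.50}

for name, pred in predictions.items():
    print(f"{name}: |predicted - observed| = {abs(pred - observed_r):.2f}")
# original theory: 0.16, alternative theory: 0.34 -> the original theory is closer
```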

What do errors mean? What does it mean when there is a difference between the value predicted and the value observed?
– There are variables that matter that were not included in the theory
– Imprecision in measurement (noise in the data)
– Sampling error

What do errors mean? Where does error come from?
– (1) Psychologically interesting variables that matter, but are not included in the theory
This represents a problem with the theory. It might be incomplete (not too much of a problem), or just dead wrong (big problem). Let's assume that, in reality, some variable of interest (e.g., obesity) is a positive function of x (e.g., television viewing) and the square of x: y = 2 + 2x + 2x²

Psychologically interesting variables that matter, but are not included in the model real model: y = 2 + 2x + 2x² Thus, if we had 5 people with scores on x of –2, –1, 0, 1, and 2, their values of y would be 6, 2, 2, 6, and 14, respectively.

Psychologically interesting variables that matter, but are not included in the model If we were to test a model with just one predictor variable (e.g., x), we would clearly make some errors in prediction: y = 2 + 2x. In this case, the errors are not huge, but they exist nonetheless. These errors are due to the fact that the theory is incomplete. (Note: This does not necessarily mean the theory is horribly flawed.)
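Here is a small sketch, using the five x scores from the slide, of the prediction errors that arise when the quadratic term is omitted. The reduced model's intercept and slope are kept at 2 and 2 for simplicity, as on the slide; an actually fitted regression line would differ somewhat.

```python
# A minimal sketch of error due to an omitted (quadratic) term in the model.
import numpy as np

x = np.array([-2, -1, 0, 1, 2])
y_true = 2 + 2 * x + 2 * x ** 2   # true model: 6, 2, 2, 6, 14
y_reduced = 2 + 2 * x             # reduced model without the x^2 term: -2, 0, 2, 4, 6

errors = y_true - y_reduced       # prediction errors caused by the omitted term
print("true values:      ", y_true)
print("reduced-model fit:", y_reduced)
print("errors:           ", errors)
```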

What do errors mean? Where does error come from?
– (2) Imprecision in measurement (noise in the data)
This represents a problem with the data, and the measurement process that produced it. It is not a problem with the theory per se.

Imprecision in measurement (noise in the data) As we discussed previously, a measured score can be broken down into three components: O = T + E + S
– O = observed score
– T = "true" score
– E = random error component
– S = systematic error component (we'll ignore this component for now, since we could construe it as a variable that was omitted from the model)
Ignoring S, we are left with O = T + E.

Another Example Here we have the same model, but random errors of measurement, e, have been added to each observation of y = 2 + 2x + 2x². [Table on the original slide: columns y, e, and (y + e) for the five observations.]
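Since the slide's specific error values are not recoverable, here is a sketch that generates its own random errors (the error standard deviation of 1.0 is an arbitrary assumption) and adds them to the model's true scores:

```python
# A minimal sketch of measurement error: observed score = true score + random error.
import numpy as np

rng = np.random.default_rng(1)
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 2 + 2 * x + 2 * x ** 2               # true scores (T): 6, 2, 2, 6, 14
e = rng.normal(scale=1.0, size=x.size)   # random measurement error (E), assumed SD = 1.0
observed = y + e                         # observed scores (O = T + E)

for yi, ei, oi in zip(y, e, observed):
    print(f"y = {yi:5.1f}   e = {ei:+.2f}   y + e = {oi:6.2f}")
```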

What do errors mean? Where does error come from?
– (3) Sampling error
This represents a problem with the data too, and the sampling process that generated it. It is not a problem with the theory per se.

Sampling error Error that occurs in data due to inadequacies in sampling from a population
– Population: the group of interest (e.g., all people with access to televisions)
– Sample: a subset of the population that is studied (i.e., people in this class)
Note: In published research, the problem of sampling error seems to be the one that concerns psychologists the most, and we'll discuss some of the methods psychologists use to deal with it in our next lecture.
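A minimal sketch of sampling error: repeated samples drawn from the same population yield different sample correlations, even though the population value never changes. The population correlation (.30), population size, and sample size are illustrative assumptions.

```python
# A minimal sketch of sampling error in an estimated correlation.
import numpy as np

rng = np.random.default_rng(2)
pop_n, rho = 100_000, 0.30
tv = rng.normal(size=pop_n)
obesity = rho * tv + np.sqrt(1 - rho ** 2) * rng.normal(size=pop_n)  # population r ~ .30

sample_rs = []
for _ in range(5):
    idx = rng.choice(pop_n, size=50, replace=False)   # draw one sample of 50 people
    sample_rs.append(np.corrcoef(tv[idx], obesity[idx])[0, 1])

print("population r (approx.):", rho)
print("sample rs:", [round(r, 2) for r in sample_rs])  # each sample misses .30 by some amount
```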

One of the advantages of comparative theory testing It is important to note that these last two sources of error (measurement error and sampling error) are specific to a data set. They are properties of the data (i.e., the measures used or the way the sample was obtained), not the theory. As such, these two problems are not fatal if we are testing two competing theories. Why? Because the errors will count against both theories, and what is left over is the difference in the predictions of the two theories.