Testing hypotheses: Continuous variables



Review: testing a hypothesis using categorical variables

Hypothesis: Lower income → higher murder rate

To test this hypothesis we can build tables and calculate percentages. Note that we recoded two continuous variables, income and murder rate, so that they became categorical. Interpreting percentages is a bit "loosey-goosey." For a more precise estimate of the relationship between the variables we can use the frequencies table to calculate the chi-square (χ²) statistic. We'll do that later...

Frequencies table:
               High murder   Low murder
  Low income        3             1
  High income       2             4

Percentages table:
               High murder   Low murder
  Low income       75%           25%
  High income      33%           67%
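The chi-square calculation promised above can be sketched in a few lines of plain Python. This is an illustration, not the course's own worksheet; the cell counts come from the frequencies table.

```python
# Hand-computing the chi-square statistic for the 2x2 frequencies table.
# For each cell: (observed - expected)^2 / expected, where
# expected = row total * column total / grand total.

table = [[3, 1],   # low income:  high murder, low murder
         [2, 4]]   # high income: high murder, low murder

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (observed - expected) ** 2 / expected

print(round(chi_square, 3))   # -> 1.667
```

The statistic alone is not a verdict; it still has to be compared against a critical value for the table's degrees of freedom, which is the "later" step the slide defers.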

CORRELATION

- r statistic
- scattergram
- cloud of dots
- line of best fit
- intercorrelations
- "control" variables

Correlation statistic - r

Correlation: a measure of the strength of an association (relationship) between continuous variables.

Values of r range from -1 to +1:
- -1 is a perfect negative association (correlation), meaning that as the scores of one variable increase, the scores of the other decrease at exactly the same rate.
- +1 is a perfect positive association, meaning that both variables go up or down together, in perfect harmony.
- Intermediate values of r (close to zero) indicate a weak or no relationship. An r of exactly zero (never seen in real life) means no relationship: the variables do not change or "vary" together, except as might happen through chance alone.

Remember that "negative" doesn't mean "no" relationship. A negative relationship is just as much a relationship as a positive relationship.
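The behavior of r can be illustrated with a small hand-rolled function. This is a minimal sketch in plain Python (a real analysis would use a statistics package); the two data sets are invented to show the -1 and +1 extremes.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient: ranges from -1 to +1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sd_x = sqrt(sum((xi - mean_x) ** 2 for xi in x))
    sd_y = sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (sd_x * sd_y)

# Both variables rise together: perfect positive relationship
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]), 6))   # -> 1.0

# One rises as the other falls: perfect negative relationship
print(round(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]), 6))   # -> -1.0
```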

Scattergrams

Scattergrams depict the joint distribution of two continuous variables. Each case has a score on each variable, and a "dot" is placed where these scores intersect. When part of a hypothesis, the independent variable's scores go on the X (horizontal) axis and the dependent variable's scores go on the Y (vertical) axis. The lowest actual scores (not necessarily zero) and the highest actual scores go on the extremes (using scales with regular intervals is fine). Together the dots form a "cloud."

Hypothesis: Lower income → higher murder rate (median income on the X axis, murder rate on the Y axis)

Testing a hypothesis using continuous variables

Hypothesis: Lower income → higher murder rate

[Figures: distribution of cities by median income; distribution of cities by murder rate]

Two "scattergrams" - each with a "cloud" of dots

NOTE: The independent variable (X) is always placed on the horizontal axis; the dependent variable (Y) is always placed on the vertical axis.

[Figures: one scattergram with r = +1, one with r = -1]

Can changes in one variable be predicted by changes in the other?

Can changes in one variable be predicted by changes in the other?

As X changes in value, does Y move correspondingly, either in the same or the opposite direction? Here there seems to be no connection between X and Y. One cannot predict values of Y from values of X. (r = 0)

Can changes in one variable be predicted by changes in the other?

Here, as X changes in value by one unit, Y also changes in value by one unit. Knowing the value of X, one can predict the value of Y. X and Y go up and down together, meaning a positive relationship. (r = +1)

Can changes in one variable be predicted by changes in the other?

Here, as X changes in value by one unit, Y also changes in value by one unit. Knowing the value of X, one can predict the value of Y. X and Y go up and down in opposite directions, meaning a negative relationship. (r = -1)

Computing r using the "line of best fit"

To arrive at a value of r, a straight line is placed through the cloud of dots (the actual, "observed" data). The line is placed so that the cumulative distance between itself and the dots is minimized. The smaller this distance, the higher the r.

r's are normally calculated with computers. The paired scores (each X/Y combination) and the means of X and Y are used to compute:
- a, where the line crosses the Y axis (the intercept)
- b, the slope of the line

When relationships are very strong or very weak, one can estimate the value of r by simply examining the graph.
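The computer's job can be sketched as follows: a minimal Python function (an illustration, not the course's formula sheet) that computes a and b with the usual least-squares formulas, using the paired scores and the means of X and Y just as described above. The example data are invented.

```python
def best_fit_line(x, y):
    """Least-squares line y = a + b*x, minimizing squared vertical distances."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # b (slope): co-variation of X and Y divided by the variation of X
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
        sum((xi - mean_x) ** 2 for xi in x)
    # a (intercept): where the line crosses the Y axis
    a = mean_y - b * mean_x
    return a, b

a, b = best_fit_line([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(a, b)   # -> 0.0 2.0, i.e., the line y = 0 + 2x
```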

"Line of best fit"

The line of best fit predicts a value for one variable given the value of the other (in the graph, if x = .5 the line predicts y = 2.3; if y = 5 it predicts x = 3.4). There will be a difference between these estimated values and the actual, known ("observed") values. This difference is called a "residual" or an "error of the estimate." As the error between the known and predicted values decreases - as the dots cluster more tightly around the line - the absolute value of r (whether + or -) increases.
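The prediction-and-residual idea can be sketched in a few lines. The line coefficients and the observed scores below are hypothetical, invented only for illustration; in practice a and b come from the least-squares fit.

```python
# Hypothetical fitted line: y = 1.0 + 0.5 * x
a, b = 1.0, 0.5

x_values   = [1, 2, 3, 4]            # invented observed X scores
observed_y = [1.4, 2.1, 2.4, 3.2]    # invented observed Y scores

for x, y in zip(x_values, observed_y):
    predicted = a + b * x            # the line's estimate of Y
    residual = y - predicted         # "error of the estimate"
    print(x, predicted, round(residual, 2))
```

Small residuals mean the dots hug the line and |r| is high; large residuals mean a loose cloud and an r near zero.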

A perfect fit: the line of best fit goes "through" each dot

[Figures: r = +1.0, a perfect fit; r = -1.0, a perfect fit]

An intermediate fit yields an intermediate value of r

There is a moderate cumulative distance between the line of best fit and the "cloud" of dots. (r = +.65)

A poor fit yields a low value of r

There is a large cumulative distance between the line of best fit and the "cloud" of dots. (r = -.19)

HYPOTHESIS TESTING

- r² and R² - the coefficient of determination
- extreme scores
- restricted range
- partial correlation and control variables
- other correlation techniques

R-squared (r² or R²), the coefficient of determination

The proportion of the change in the dependent variable (also known as the "effect" variable), in percentage terms, that is accounted for by change in the independent variable (also known as the "predictor" variable). It is obtained by squaring the correlation coefficient (r).

"Little" r-squared (r²) depicts the explanatory power of a single independent/predictor variable. "Big" R-squared (R²) combines the effects of multiple independent/predictor variables; it is the more commonly used.
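The squaring step is simple arithmetic; a quick check, using an example value of r (not a statistic from the slides):

```python
r = -0.6                 # example correlation coefficient
r_squared = r ** 2       # squaring drops the sign and gives variance explained
print(f"{r_squared:.2f}")   # -> 0.36, i.e., 36% of the change in the DV
```

Note that squaring always yields a positive number, so r² says how much is explained but not in which direction; for direction you go back to the sign of r.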

How to "read" a scattergram

Hypothesis: Lower income → higher murder rate

- Move along the IV. Do the values of the DV change in a consistent direction?
- Look across the IV. Does knowing the value of the IV help you predict the value of the DV?
- Place a straight line through the cloud of dots, trying to minimize the overall distance between the line and the dots. Is the line at a pronounced angle?

To the extent that you can answer "yes" to each of these, there is a relationship. Here r = -.6, so r² = .36: change in the IV accounts for thirty-six percent of the change in the DV. A moderate-to-strong relationship, in the hypothesized direction - hypothesis confirmed!

Class exercise

Hypothesis 1: Height → Weight
Hypothesis 2: Age → Weight

[Data table: Height (inches), Weight, and Age for each case]

- Build a scattergram for your assigned hypothesis.
- Be sure that the independent variable is on the X axis, smallest value on the left, largest on the right, just like when graphing any distribution.
- Be sure that the dependent variable is on the Y axis, smallest value on the bottom, largest on top.
- Place a dot representing each case at the intersection of its values on X and Y.
- Place a STRAIGHT line where it minimizes the overall distance between itself and the cloud of dots.
- Use this overall distance to estimate a possible value of r, from -1 (perfect negative relationship), to 0 (no relationship), to +1 (perfect positive relationship).

Remember that "negative" doesn't mean "no" relationship. Negative relationships are just as much relationships as positive relationships.

Impact of extreme scores

With all cases included, there is a weak to moderate positive relationship; with the extreme cases removed, a very weak negative relationship. Extreme scores can be produced by measurement errors or other circumstances (here, it could be chronic illness or a hereditary disorder). To prevent confusion, such cases are often dropped, but notice should always be given.

Effects of restricted range

Hypothesis: Age → Height. People get taller as they age, right? Yet in this sample, age has no relationship with height. Why? Because the range for age is severely restricted: every case is already an adult! What do we learn from this? KNOW YOUR DATA!

Intercorrelations

Might associations between variables be distorted by their relationships with other variables ("intercorrelations")?

Issue: Whenever we measure the effect of a variable, we inevitably include the effects of other variables with which our variable of interest is related. Example: when we measure the effect of poverty on crime, part of the effect reflects the variable education, with which poverty is related.

Research articles often begin data analysis with a "correlation matrix" that displays the bivariate (two-variable) correlations between all continuous variables.

Hypothesis: fewer gun laws → more gun homicides (from Police Issues)
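A correlation matrix can be sketched as follows. The variable names echo the gun-law example, but the scores are invented for illustration; real matrices come from a statistics package.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))

# Hypothetical scores for three variables measured on the same five cases
data = {
    "law_score":     [8, 6, 7, 3, 2],
    "poverty":       [10, 12, 11, 16, 18],
    "gun_homicides": [2.1, 3.0, 2.5, 5.2, 6.0],
}

# Every variable correlated with every other: the bivariate correlation matrix
names = list(data)
for row in names:
    print(row, [round(pearson_r(data[row], data[col]), 2) for col in names])
```

The diagonal is always 1.0 (each variable correlates perfectly with itself), and the matrix is symmetric, so only the triangle above or below the diagonal is usually reported.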

Hypothesis: fewer gun laws → more gun homicides

Poverty is strongly associated with law scores and with gun homicides. Could the association between law scores and gun homicides reflect, at least in part, the relationship between poverty and gun homicides? We use "partial correlation" to remove the influence of poverty from the relationship between law scores and gun homicides. We do so by statistically "controlling" for poverty: poverty becomes a "control" variable.

Controlling for poverty using "partial correlation"

Sure enough, when we control for poverty, the relationship between law score and gun homicides (originally -.366*) becomes non-significant. Poverty was exaggerating the influence of law score on gun homicide.

To be fair, is it also working the other way around? Let's test the relationship between poverty and gun homicides, controlling for law score. The original relationship between poverty and gun homicides (-.397*) decreases only slightly. So the more likely cause of changes in gun homicides is changes in poverty, not changes in law scores.

THINK BACK! This process accomplishes the same thing for continuous variables as first-order partial tables did for categorical variables. So, what about age and weight?
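The standard first-order partial correlation formula removes a control variable's influence using only the three zero-order correlations. In this sketch the -.366 zero-order value comes from the slide, but the other two correlations are hypothetical stand-ins chosen for illustration.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r_law_homicide = -0.366   # zero-order r from the slide
r_law_poverty  = -0.50    # hypothetical: law score vs. poverty
r_pov_homicide =  0.60    # hypothetical: poverty vs. gun homicides

# Law score vs. gun homicides, with poverty's influence removed
print(round(partial_r(r_law_homicide, r_law_poverty, r_pov_homicide), 3))  # -> -0.095
```

With these (invented) inputs the partial correlation shrinks toward zero, which is exactly the pattern the slide describes: controlling for poverty weakens the law-score relationship.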

Hypothesis: Age → weight (older → heavier)

Partial correlation of age → weight, controlling for height (sample of 19 youths, ages 2-20). The relationship between age and weight decreases only slightly, from .990 to .850, so our hypothesis remains well confirmed.

[Tables: zero-order correlations between age and weight; partial correlation controlling for height]

Miscellaneous stuff...

"Spearman's r": a correlation technique for ordinal categorical variables (e.g., Low/Medium/High).

Changing the level of measurement from continuous to categorical: height and weight scores can be recoded into a 2 x 2 table (SHORT/TALL by LIGHT/HEAVY) showing the number of cases in each cell.
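Spearman's r is simply Pearson's r computed on ranks, with tied scores given the average of their ranks. A minimal sketch, using invented ordinal data coded Low = 1, Medium = 2, High = 3:

```python
from math import sqrt

def ranks(values):
    """Assign ranks 1..n, giving tied scores the average of their ranks."""
    ordered = sorted(values)
    result = []
    for v in values:
        first = ordered.index(v)                 # 0-based position of first occurrence
        count = ordered.count(v)                 # number of ties
        result.append(first + (count + 1) / 2)   # average rank for the tied group
    return result

def spearman_rho(x, y):
    """Spearman's correlation: Pearson's r applied to the ranked scores."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / sqrt(sum((a - mx) ** 2 for a in rx) *
                      sum((b - my) ** 2 for b in ry))

# Invented ordinal data: Low=1, Medium=2, High=3
education = [1, 1, 2, 2, 3, 3]
income    = [1, 2, 1, 3, 2, 3]
print(round(spearman_rho(education, income), 2))   # -> 0.5
```

Because only rank order matters, Spearman's r is also less sensitive than Pearson's r to the extreme scores discussed earlier.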

Some parting thoughts

If we do not use probability (e.g., random) sampling:
- Our results apply only to the cases we "observed" and coded.
- Accounting for the influence of other variables can be tricky.
- r and related statistics are often unimpressive, and describing what they mean can be tricky.

If we use probability sampling:
- Our results can be extended to the population.
- But how accurate will our results be? After all, statistics (e.g., r, R²) will vary from sample to sample. That, actually, is a good thing. If we sample correctly, procedures we will learn (i.e., "inferential statistics") will allow us to estimate the difference between sample statistics and the actual population parameters. That's called "error."
- Together, the statistical results and the error will allow us to interpret our results with far greater precision than is possible without probability sampling. Stand by!

Exam preview

You will apply what you have learned about populations, samples, sampling methods and building a scattergram to the "College Education and Police Job Performance" article.

- You will be given a hypothesis and data from a sample. There will be two variables - the dependent variable and the independent variable. Both will be categorical, and each will have two levels (e.g., low/high).
- You will build a table containing the frequencies (number of cases). You will build another table with the percentages. You will analyze the results: are they consistent with the hypothesis?
- You will be given the same data as above, broken down by a control variable. It will also be categorical, with two levels. You will build first-order partial tables, one with frequencies (number of cases), the other with percentages, for each level of the control variable.
- You will be asked whether introducing the control variable affects your assessment of the hypothesized zero-order relationship. This requires that you separately compare the results for each level of the control variable to the zero-order table. Does introducing the control variable tell us anything new?
- You will be given another hypothesis and data. There will be two variables - the dependent variable and the independent variable. Both are continuous. You will build a scattergram and draw in a line of best fit.
- You will state whether the scattergram supports the hypothesis. Be careful! First, is there a relationship between the variables? Second, is it in the same direction (positive or negative) as the hypothesized relationship?