Simple Correlation Scatterplots & r Interpreting r Outcomes vs. RH:

Slides:



Advertisements
Similar presentations
Bivariate &/vs. Multivariate
Advertisements

Simple Linear Regression and Correlation by Asst. Prof. Dr. Min Aung.
Transformations & Data Cleaning
Inference for Regression
Pearson’s X 2 Correlation vs. X² (which, when & why) Qualitative/Categorical and Quantitative Variables Contingency Tables for 2 Categorical Variables.
Describing Relationships Using Correlation and Regression
Chapter 6: Correlational Research Examine whether variables are related to one another (whether they vary together). Correlation coefficient: statistic.
Correlation CJ 526 Statistical Analysis in Criminal Justice.
CJ 526 Statistical Analysis in Criminal Justice
Behavioural Science II Week 1, Semester 2, 2002
Multiple Regression Models Advantages of multiple regression Important preliminary analyses Parts of a multiple regression model & interpretation Differences.
Power Analysis for Correlation & Multiple Regression Sample Size & multiple regression Subject-to-variable ratios Stability of correlation values Useful.
Multiple Regression Models: Some Details & Surprises Review of raw & standardized models Differences between r, b & β Bivariate & Multivariate patterns.
Fall 2006 – Fundamentals of Business Statistics 1 Chapter 13 Introduction to Linear Regression and Correlation Analysis.
Simple Regression correlation vs. prediction research prediction and relationship strength interpreting regression formulas –quantitative vs. binary predictor.
Chapter 13 Introduction to Linear Regression and Correlation Analysis
Some Details about Bivariate Stats Tests Conceptualizing the four stats tests Conceptualizing NHST, critical values and p-values NHST and Testing RH: Distinguishing.
REGRESSION AND CORRELATION
Bivariate & Multivariate Regression correlation vs. prediction research prediction and relationship strength interpreting regression formulas process of.
Simple Correlation RH: correlation questions/hypotheses and related statistical analyses –simple correlation –differences between correlations in a group.
Today Concepts underlying inferential statistics
Using Statistics in Research Psych 231: Research Methods in Psychology.
An Introduction to Classification Classification vs. Prediction Classification & ANOVA Classification Cutoffs, Errors, etc. Multivariate Classification.
Chapter 7 Correlational Research Gay, Mills, and Airasian
Pearson’s r Scatterplots for 2 Quantitative Variables Research and Null Hypotheses for r Casual Interpretation of Correlation Results (& why/why not) Computational.
Review for Exam 2 Some important themes from Chapters 6-9 Chap. 6. Significance Tests Chap. 7: Comparing Two Groups Chap. 8: Contingency Tables (Categorical.
Effect Sizes, Power Analysis and Statistical Decisions Effect sizes -- what and why?? review of statistical decisions and statistical decision errors statistical.
Relationships Among Variables
Correlation and Regression A BRIEF overview Correlation Coefficients l Continuous IV & DV l or dichotomous variables (code as 0-1) n mean interpreted.
1/2555 สมศักดิ์ ศิวดำรงพงศ์
Chapter 14 – Correlation and Simple Regression Math 22 Introductory Statistics.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Section 10-1 Review and Preview.
Statistical Fundamentals: Using Microsoft Excel for Univariate and Bivariate Analysis Alfred P. Rovai Hypothesis Testing PowerPoint Prepared by Alfred.
User Study Evaluation Human-Computer Interaction.
Experimental Research Methods in Language Learning Chapter 11 Correlational Analysis.
Data Analysis (continued). Analyzing the Results of Research Investigations Two basic ways of describing the results Two basic ways of describing the.
Association between 2 variables
The Correlational Research Strategy
CORRELATIONS: TESTING RELATIONSHIPS BETWEEN TWO METRIC VARIABLES Lecture 18:
Psych 230 Psychological Measurement and Statistics Pedro Wolf September 23, 2009.
Correlation Analysis. Correlation Analysis: Introduction Management questions frequently revolve around the study of relationships between two or more.
Pearson’s r & X 2 Correlation vs. X² (which, when & why) Qualitative/Categorical and Quantitative Variables Scatterplots for 2 Quantitative Variables Research.
Correlation & Regression Chapter 15. Correlation It is a statistical technique that is used to measure and describe a relationship between two variables.
Chapter 16 Data Analysis: Testing for Associations.
Correlation Assume you have two measurements, x and y, on a set of objects, and would like to know if x and y are related. If they are directly related,
Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder.
Digression - Hypotheses Many research designs involve statistical tests – involve accepting or rejecting a hypothesis Null (statistical) hypotheses assume.
Agresti/Franklin Statistics, 1 of 88 Chapter 11 Analyzing Association Between Quantitative Variables: Regression Analysis Learn…. To use regression analysis.
The Correlational Research Strategy Chapter 12. Correlational Research The goal of correlational research is to describe the relationship between variables.
Correlation They go together like salt and pepper… like oil and vinegar… like bread and butter… etc.
Simple Correlation Scatterplots & r –scatterplots for continuous - binary relationships –H0: & RH: –Non-linearity Interpreting r Outcomes vs. RH: –Supporting.
Chapter 7 Calculation of Pearson Coefficient of Correlation, r and testing its significance.
Introduction to Statistics for the Social Sciences SBS200, COMM200, GEOG200, PA200, POL200, or SOC200 Lecture Section 001, Fall 2015 Room 150 Harvill.
26134 Business Statistics Week 4 Tutorial Simple Linear Regression Key concepts in this tutorial are listed below 1. Detecting.
Jump to first page Inferring Sample Findings to the Population and Testing for Differences.
Educational Research Inferential Statistics Chapter th Chapter 12- 8th Gay and Airasian.
Inferential Statistics Psych 231: Research Methods in Psychology.
Descriptive Statistics Psych 231: Research Methods in Psychology.
26134 Business Statistics Week 4 Tutorial Simple Linear Regression Key concepts in this tutorial are listed below 1. Detecting.
Inferential Statistics Psych 231: Research Methods in Psychology.
Correlation and Simple Linear Regression
Inferential Statistics:
Correlation and Simple Linear Regression
Review for Exam 2 Some important themes from Chapters 6-9
Correlation and Simple Linear Regression
Simple Linear Regression and Correlation
Inferential Statistics
Correlation and Simple Linear Regression
Correlation and Simple Linear Regression
Presentation transcript:

Simple Correlation Scatterplots & r Interpreting r Outcomes vs. RH: Linearity, strength & direction scatterplots for continuous - binary relationships Quantitative & binary predictors H0: & RH: Non-linearity Interpreting r Outcomes vs. RH: Supporting vs. “contrary” results Outcomes vs. Population Correct vs. Error results

A scatterplot a graphical depiction of the relationship between two quantitative (or binary) variables each participant’s x & y values depicted as a point in x-y space Pearson’s correlation coefficient (r value) summarizes the direction and strength of the linear relationship between two quantitative variables into a single number (range from -1.00 to 1.00) you should always examine the scatterplot before considering the correlation between two variable NHST can be applied to test if the correlation in the sample is sufficiently large to reject H0: of no linear relationship between the variables in the population A linear regression formula allows us to take advantage of this relationship to estimate or predict the value of one variable (the criterion) from the other (the predictor). prediction should only be applied if the relationship between the variables is “linear” and “substantial”

Example of a “scatterplot” Puppy Age (x) Eats (y) Sam Ding Ralf Pit Seff … Toby 5 4 3 2 1 8 20 12 4 24 .. 16 2 4 1 .. 3 Amount Puppy Eats (pounds) 4 8 12 16 20 24 Age of Puppy (weeks)

When examining a scatterplot, we look for three things... relationship no relationship linear non-linear direction (if linear) positive negative strength strong moderate weak No relationship nonlinear, strong Hi Hi Lo Lo Lo Hi Lo Hi linear, positive, weak linear, positive, strong linear, negative, moderate Hi Hi Hi Lo Lo Lo Lo Lo Hi Lo Hi Hi

We can use correlation to examine the relationship between a quantitative predictor variable and a quantitative criterion variable. Y Y Y strong + weak + +1.00 X X X A positive r tells us those higher X values tend to have higher Y values Y Y Y strong - weak - .00 X X X A negative r tells us those with lower X values tend to have higher Y values A nonsignificant r tells us there is no linear relationship between X & Y

A nonsignificant r tells us the groups have “equivalent” means on Y We can also use correlation to examine the relationship between a binary predictor variable and a quantitative criterion variable. Y Y Y strong + weak + +1.00 grp 1 grp 2 grp 1 grp 2 grp 1 grp 2 A positive r tells us the group with the higher X code as the higher mean Y Y Y Y strong - weak - .00 grp 1 grp 2 grp 1 grp 2 grp 1 grp 2 A negative r tells us the group with the lower X code as the higher mean Y A nonsignificant r tells us the groups have “equivalent” means on Y

For each of the following show the envelope for the H0: and the RH: People who have more depression before therapy will be those who have more depression after therapy. Instructor isn’t related to practice. H0: RH: Depression after Depression before H0: RH: # Errors Those who study more have fewer errors on the spelling test Study H0: RH: Practice 0 1 Instructor

For each of the following show the envelope for the H0: and the RH: People who score better on the pretest will be those who tend to score worse on the posttest I predict that snapping turtles (coded 1) will eat more crickets than painted turtles (coded 0). H0: RH: Post-test Pretest H0: RH: Depression You can’t predict depression from the number of therapy sessions # Sessions H0: RH: # Crickets 0 1 Species

The Pearson’s correlation ( r ) summarizes the direction and strength of the linear relationship shown in the scatterplot r has a range from -1.00 to 1.00 1.00 a perfect positive linear relationship 0.00 no linear relationship at all -1.00 a perfect negative linear relationship r assumes that the relationship is linear if the relationship is not linear, then the r-value is an underestimate of the strength of the relationship at best and meaningless at worst For a non-linear relationship, r will be based on a “rounded out” envelope -- leading to a misrepresentative r

Extreme Non-linear relationship r value is “misinformative” Scatterplot as correlation “sees it” actual scatterplot notice... there is an x-y relationship regression line has 0 slope & r = 0 -- no linear relationship

Moderate Non-linear relationship r value is an underestimate of the strength of the nonlinear relationship Scatterplot as correlation “sees it” actual scatterplot notice... there is an x-y relationship regression line has non-0 slope & r ~= 0 but, the regression line not a great representation of the bivariate relationship

NHST testing with r H0: r = 0.00 is the same as r2 = 0.00 get used to working with both r (the correlation between the 2 vars ) and r2 (the “variance shared between the 2 vars)” Performing the significance test software will usually provide an exact p-value (use p < .05) a general formula is … r² N = sample size F = ------------------------ (1 - r²) / (N - 2) Find F-critical using df = 1 & N-2

What “retaining H0:” and “Rejecting H0:” means... When you retain H0: you’re concluding… The linear relationship between these variables in the sample is not strong enough to allow me to conclude there is a linear relationship between them in the population represented by the sample. When you reject H0: you’re concluding… The linear relationship between these variables in the sample is strong enough to allow me to conclude there is a linear relationship between them in the population represented by the sample.

effect significance vs. effect size vs. shared variance The p-value (value range 1.0 – 0) tells the probability of making a Type I error if you reject the H0: based on the sample data e.g., p = .10 means “if we reject H0: based on these data there is a 10% chance that there really is no relationship between the variables in the population represented by the sample” The usual “acceptable risk” is less than 5% or p < .05 r (range -1.0 – 1.0) tells strength and direction of the bivariate relationship between Y & X “large enough to be interesting” value vary across research areas , but a common guideline is .10 = small, .30 = medium and .50 = large r2 (range 0 – 1.0) tells how much of the Y variability is “accounted for,” “predicted from” or “caused by” X e.g., r=.30 means that .302 (9%) of the Y variability is accounted for by X “large enough to be interesting” will vary across research areas , but a common guideline is 1% = small, 10% = medium and 25% = large

Interpret each of the following (significance, strength & direction) For age & social skills: r = .25, p = .043. For practice and performance errors: r = -.52, p = .015 For age and performance: r = -.33, p = .231 For gender (m=1, f=2) and social skills: r = .14, p = .004 For gender (m=1, f=2) and performance: r = -.31, p = .029 For gender (m=1, f=2) and practice: r = .11, p = .098 Sig – medium – positive  Older adolescents tend to have higher social skills scores Sig – large – negative  Those who practiced more tended to have fewer errors Nonsig – medium? - negative ? There is no linear relationship between age and performance Sig – small – positive  Females had higher mean on social skills scores Sig – medium – negative Males had higher mean performance Nonsig – small? – positive?  No mean practice difference between males & females

Statistical Conclusion Errors In the population there are only three possibilities... … and three possible statistical decisions In the Population -r r = 0 +r Outcomes -r r = 0 +r Type I error Correctly rejected H0: Type III error Type II error Type II error Correctly retained H0: Type I error Type III error Correctly rejected H0: Please note that this is a different question than whether the results “match” the RH: This is about whether the results from the sample are “correct” – whether the results are “represent the population. This is about statistical conclusion validity

The 9 outcomes come in 5 types … Type I error -- “false alarm” - finding a significant mean difference between the conditions in the study when there really isn’t a difference between the populations Type II error -- “miss” - finding no difference between the conditions of the study when there really is a difference between the populations Type III error -- “misspecification” - finding a difference between the conditions of the study that is different from the the difference between the populations Correctly retained H0: -- finding no difference between the conditions of the study when there really is no difference between the populations Correctly rejected H0: -- finding a difference between the conditions of the study that is the same as the the difference between the populations

Correct rejection Type III Correct retention Type I Type II Practice with statistical decision errors ... We found that students who did more homework problems tended to have higher exam scores, which is what the other studies have found. Correct rejection We found that students who did more homework problems tended to have lower exam scores. All other studies found the opposite effect. Type III We found that students who did more homework problems and those who did fewer problems tended to have about the same exam scores, which is what the other studies have found. Correct retention We found that students who did more homework problems tended to have lower exam scores. Ours is the only study with this finding. Can’t tell -- what DID the other studies find? We found that students who did more homework problems tended to have lower exam scores. Ours is the only study with this finding, others find no relationship. Type I We found that students who did more homework problems and those who did fewer problems tended to have about the same exam scores. Everybody else has found that homework helps. Type II

correlation RH: vs. outcomes There are only three possible Research Hypotheses … and three possible statistical outcomes Research Hypotheses -r r = 0 +r Outcomes -r r = 0 +r ? ? ? ? ? ? ? ? So, there are only 9 possible combinations of RH: & Outcomes … of 4 types “effect as expected” “unexpected null” “unexpected effect” “backward effect” Results contrary to RH:

The direction of the r might be opposite that of the RH: Keep in mind that rejecting H0: does not guarantee support for the research hypothesis? Why not ??? The direction of the r might be opposite that of the RH: The RH: might be that’s there’s no correlation (RH: = H0:) ? ? ? ? Remember !!! Our purpose is not to “Reject the H0:” … nor even to “support our RH:” … Our real purpose is for our results to represent the relationship between the constructs in the target population !!!!!

A quick focus on the two that are most often confused … Type III Statistical Decision Error When our significant findings have a direction or pattern different from that found in the population A difference between “the effect we found” and “the effect we should have found” “Results contrary to our RH:” When our findings have a direction or pattern different from what we had hypothesized A difference between “the effect we found” and “the effect we hypothesized” A result can be BOTH!!!!! (Or neither, or one, or the other !!!)

+ direction/pattern H0: - direction/pattern RH:, statistical conclusions & statistical decision errors ... RH: supported Unexpected H0: Unexpected effect “backward” Results Statistical Decision + direction/pattern (p < .05) H0: (p > .05) - direction/pattern (p < .05 RH: + direction/pattern H0: - direction/pattern Correct rejection, Type I or Type III Correct rejection, Type I or Type III Correct rejection, Type I or Type III Correct retention or Type II Correct retention or Type II Correct retention or Type II Correct rejection, Type I or Type III Correct rejection, Type I or Type III Correct rejection, Type I or Type III

Our RH: was incorrect, not supported, but our results were right!!! Lets practice … Our RH: was that there will be a negative correlation between performance on the GRE and cumulative GPA. We found r = .47, p = .016. These results are “contrary to our RH:” -- a significant relationship in the opposite direction from the RH: A literature review revealed 105 other studies involving these two variables, each of which found a correlation between .43 and .61 (all p < .05). The consistent results of these other studies suggests that our finding was a correct rejection – what we found “does describe the relationship between these variables in the population”. Our RH: was incorrect, not supported, but our results were right!!!

Our RH: was incorrect, not supported & our results were wrong!!! Another … Our RH: was that there will be a positive correlation between the severity of depression at the beginning of therapy and the amount of improvement a patient shows during the first six weeks of therapy. We found r = .27, p = .085. These results are “contrary to our RH:” -- a nonsignificant relationship isn’t the RH: +r The 14 studies of these two variables which followed ours each found a correlation between -.33 and -.41 (all p < .05). The consistent findings of these other studies suggests that our finding was a Type II error – what we found “doesn’t describe the likely relationship between these variables in the population”. Our RH: was incorrect, not supported & our results were wrong!!!

Our RH: was incorrect but supported & our results were wrong !!! Try this one … Our RH: was that there will be a positive correlation between social skills and comfort in an unfamiliar social situation. We found r (82) = .37, p = .016. These results “support our RH:” - a significant relationship in the RH: direction A literature review revealed 22 other studies involving these two variables, each of which found a correlation between -.13 and .11 (all p > .05) The consistent results of these other studies suggests that our finding was a Type I error – what we found “does not describe the relationship between these variables in the population”. Our RH: was incorrect but supported & our results were wrong !!!

Our finding was consistent with earlier research! Last one … Our RH: is that there will be a positive correlation between how much a person likes to compliment people and the number of close friends a person reports. We found r (58) = .30, p < .05. These results “support our RH:” -- a significant, positive relationship, as hypothesized A literature review revealed 8 other studies of these two variables, each of which found a correlation between .25 and .32 (all p < .05). Our finding was consistent with earlier research! The “researchers Trifecta” RH: is correct & supported and the results are correct 1!!! Keep in mind … There are 27 combinations of RH: (+ 0 -), Results (+ 0 -) and Population value (+ 0 -). “Success” depends more on a consistent agreement of the last two than of the first two!