Major Points: Scatterplots. The correlation coefficient (correlations on ranks). Factors affecting correlations. Testing for significance. Intercorrelation matrices.

Presentation transcript:

Major Points: Scatterplots. The correlation coefficient (correlations on ranks). Factors affecting correlations. Testing for significance. Intercorrelation matrices. Other kinds of correlations.

The Problem Are two variables related? Does one increase as the other increases? e. g. skills and income Does one decrease as the other increases? e. g. health problems and nutrition How can we get a graphical representation of the degree of relationship?

Relation between fathers' and sons' heights: Pearson (1896). Reliability.

Another dataset: Heart Disease and Cigarettes. Landwehr & Watkins report data on heart disease and cigarette smoking in 21 developed countries. Data have been rounded for computational convenience; the results were not affected.

Scatterplot of Heart Disease: CHD mortality goes on the y axis, cigarette consumption on the x axis. What does each dot represent? Best-fitting line included for clarity.

[Figure: scatterplot of CHD Mortality per 10,000 (y axis) against Cigarette Consumption per Adult per Day (x axis); the point {X = 6, Y = 11} is marked.]

What Does the Scatterplot Show? As smoking increases, so does coronary heart disease mortality. The relationship looks strong. Not all data points are on the line; this gives us “residuals” or “errors of prediction”.

[Figure: Example scatterplots, one showing a high correlation and one showing a low correlation.]

Scatter plots: r = .00

r = .15

r = .40

r = .81

r = .99

r = -.79. Guessing correlations: from Rice University

Another way to visualize a correlation. [Figure labels: Variance in A, Variance in B, Covariance.]

What is a Correlation Coefficient? A measure of degree of relationship. Sign refers to direction. Based on covariance: a measure of the degree to which large scores go with large scores, and small scores with small scores. Pearson’s correlation coefficient is most often used.

Cigarette Consumption and Coronary Heart Disease Mortality for 21 countries. Cigarette Consumption: per adult per day. Coronary Heart Disease: mortality per 10,000 population.

Covariance: The formula. Index of the degree to which both lists of numbers covary. When would cov_XY be large and positive? When would cov_XY be large and negative?
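The formula itself did not survive in this transcript. A reasonable reading, assuming the slide used the standard sample covariance with N - 1 in the denominator, is:

```latex
\mathrm{cov}_{XY} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{N - 1}
```

Under this definition cov_XY is large and positive when above-average X values are paired with above-average Y values (and below-average with below-average), and large and negative when above-average X values are paired with below-average Y values.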

Calculation: cov_XY = [value missing]; s_X = 2.33; s_Y = 6.69

Correlation Coefficient: Symbolized by r. Covariance ÷ (product of standard deviations).
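As a minimal sketch of "covariance ÷ (product of standard deviations)", the following computes r from scratch. The function names and the five data pairs are hypothetical illustrations, not the Landwehr & Watkins values, and the N - 1 denominator is an assumption about the slide's formula.

```python
import math


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance: summed cross-products of deviations over N - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 cases")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def std_dev(values):
    """Sample standard deviation (square root of a variable's covariance with itself)."""
    return math.sqrt(covariance(values, values))


def pearson_r(xs, ys):
    """Pearson r = cov_XY / (s_X * s_Y)."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


# Hypothetical toy data (NOT the values from the slides).
cigarettes = [2, 4, 6, 8, 10]
chd_mortality = [5, 9, 10, 16, 20]

print("cov_XY =", covariance(cigarettes, chd_mortality))
print("r      =", round(pearson_r(cigarettes, chd_mortality), 3))
```

Because the same N - 1 appears in the covariance and in both standard deviations, it cancels, so r would be identical with N in the denominator.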

Correlation in a random sample: Generated 6 sets of random numbers (100 each). The correlation matrix.
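The matrix from this slide is not preserved, but the exercise is easy to re-create. The sketch below draws 6 sets of 100 uniform random numbers and prints their intercorrelation matrix. The seed, the uniform distribution, and the helper names are assumptions, so the numbers will not match the original slide.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation from sums of squared deviations and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Square matrix of pairwise Pearson correlations between the columns."""
    k = len(columns)
    return [[pearson_r(columns[i], columns[j]) for j in range(k)] for i in range(k)]


rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability
columns = [[rng.random() for _ in range(100)] for _ in range(6)]
matrix = correlation_matrix(columns)

for row in matrix:
    print("  ".join(f"{r:6.2f}" for r in row))
```

The variables are independent by construction, so the population correlation is 0, yet the off-diagonal sample values scatter around 0 rather than sitting exactly on it. That scatter is sampling error.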

Factors Affecting r: Range restrictions. Outliers. Nonlinearity (e.g., anxiety and performance). Heterogeneous subsamples. Everyday examples.

The effect of outliers on correlations. Dataset: 20 cases selected from darts and pros. [Figure: DARTS against Pros.] r = .80

Dataset: one case altered to give more extreme values DARTS Pros r =

Summary of effect of outliers: A few extreme values can have extreme effects, especially when the sample size is small. You cannot randomly toss out data! You need to have a theoretical or statistical justification.
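To see how much a single case can matter in a small sample, here is a sketch with ten hypothetical cases (the darts and pros data are not reproduced in this transcript). One y value is altered to an extreme value and r is recomputed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from sums of squared deviations and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: ten cases with a strong positive relationship.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

# Same data with only the last case altered to an extreme value.
ys_altered = ys[:-1] + [-20]

r_original = pearson_r(xs, ys)
r_altered = pearson_r(xs, ys_altered)

print(f"original r = {r_original:.2f}")
print(f"altered  r = {r_altered:.2f}")
```

In this toy example one altered case out of ten is enough to turn a strong positive correlation into a negative one. The same change would matter far less among thousands of cases.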

Restriction of range: Countries with low consumption. Data with restricted range, truncated at 5 cigarettes per day. [Figure: CHD Mortality per 10,000 against Cigarette Consumption per Adult per Day.]

r between grades in high school and grades in college. Scatter plot for 250 students who vary on high school GPA. Scatter plot for students who have a GPA equal to or greater than 3.5.
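The GPA example can be simulated. The sketch below is only an illustration under stated assumptions: the population parameters are made up (chosen so the true correlation is about .6), and it uses 5000 simulated students rather than 250 so that sampling noise does not obscure the effect.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation from sums of squared deviations and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # arbitrary fixed seed
N_STUDENTS = 5000        # assumption: larger than the slide's 250

# Hypothetical population: college GPA depends linearly on high school GPA plus noise.
hs_gpa = [rng.gauss(3.0, 0.5) for _ in range(N_STUDENTS)]
college_gpa = [0.6 * h + rng.gauss(1.0, 0.4) for h in hs_gpa]

# Restrict the range: keep only students with high school GPA >= 3.5.
restricted = [(h, c) for h, c in zip(hs_gpa, college_gpa) if h >= 3.5]
hs_restricted = [h for h, _ in restricted]
college_restricted = [c for _, c in restricted]

r_full = pearson_r(hs_gpa, college_gpa)
r_restricted = pearson_r(hs_restricted, college_restricted)

print(f"all students:    n = {len(hs_gpa)}, r = {r_full:.2f}")
print(f"GPA >= 3.5 only: n = {len(restricted)}, r = {r_restricted:.2f}")
```

The underlying relationship is identical in both groups. Only the spread of high school GPA differs, and that alone lowers r in the restricted group.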

Effect of linear transformations of data: Linear transformations have no effect on Pearson's correlation coefficient. Example: r between height and weight is the same regardless of whether height is measured in inches, feet, centimeters or even miles. This is a very desirable property, since the choice among measurement scales that are linear transformations of each other is often arbitrary.

An example: Scores on the Scholastic Aptitude Test (SAT) range from 200 to 800, which is an arbitrary range. You could subtract 100 points from each score and multiply each score by 3. Scores on the SAT would then span a different range, but the test would remain the same. r between SAT and some other variable (such as college grade point average) would not be affected by this linear transformation.
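The SAT example can be checked numerically. In this sketch the six score and GPA pairs are hypothetical; the transformation (subtract 100, then multiply by 3) is the one described above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from sums of squared deviations and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical SAT scores and college GPAs for six students.
sat = [450, 520, 600, 640, 710, 780]
gpa = [2.4, 2.9, 2.8, 3.3, 3.5, 3.9]

# Linear transformation from the example: subtract 100, then multiply by 3.
sat_rescaled = [(s - 100) * 3 for s in sat]

r_raw = pearson_r(sat, gpa)
r_rescaled = pearson_r(sat_rescaled, gpa)

print(f"r with raw SAT scores:      {r_raw:.4f}")
print(f"r with rescaled SAT scores: {r_rescaled:.4f}")
```

One caveat worth keeping in mind: this holds exactly when the multiplier is positive, as it is for any change of units. Multiplying by a negative number keeps the size of r but flips its sign.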

Nonlinear relationships. Example: Anxiety and Performance, r = .07
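A small sketch shows why r can be near zero even when two variables are tightly related. The inverted-U data below are hypothetical (performance is best at moderate anxiety); they are not the data behind the slide's r = .07.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from sums of squared deviations and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical inverted-U: performance peaks at a moderate anxiety level of 4.
anxiety = [1, 2, 3, 4, 5, 6, 7]
performance = [10 - (a - 4) ** 2 for a in anxiety]

r_curved = pearson_r(anxiety, performance)

print("performance:", performance)
print(f"r = {r_curved:.2f}")
```

Performance here is completely determined by anxiety, yet the rising and falling halves of the curve cancel in the cross-products. Pearson's r measures only the linear part of a relationship, which is why a scatterplot should be inspected first.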

The interpretation of a correlation coefficient: Ranges from –1 to 1. No correlation in the data means you will get an r of 0 or near it. Suffers from sampling error (like everything else!), so you need to estimate the true population correlation from the sample correlation.

Correlations in the sample differ from the correlations in the population by some amount (sampling error). Sometimes the sample correlation is higher than the population correlation, sometimes it is lower, and rarely is it exactly on target. How do you know when to accept and when to reject a correlation?

Possible ways to decide: Accept it if it fits your hypothesis, reject it otherwise! Toss a coin. Democratically: ask your officemates to vote.

Fisherian Statistics: Null and Alternative Hypothesis. Sampling error implies that sometimes the results we obtain will be due to chance (since not every sample will accurately resemble the population). The null hypothesis expresses the idea that an observed difference is due to chance. For example: There is no difference between the norms regarding the use of email and voice mail.

The alternative hypothesis (the experimental hypothesis) is often the one that you formulate: there is a correlation between people’s perception of a website’s reliability and the probability of their buying something on the site. Why bother to have a null hypothesis? Can you reject the null hypothesis?

An Example: Relationship between browsing and buying on an electronic commerce site. Data gathered from server logs. Hypothesis: Those who browse longer also tend to purchase. The hypothesis can be framed in another way: There is no relationship between time spent browsing and likelihood of purchase (null hypothesis).

Testing the significance of r: Population parameter = ρ. Null hypothesis H0: ρ = 0. What would a true null mean here? What would a false null mean here? Alternative hypothesis (H1).

Tables of Significance: Table in Appendix E.2. For N - 2 = 19 df, r_crit = .433. Our correlation > .433, so reject H0. The correlation is significant: more cigarette consumption is associated with more CHD mortality.
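The tabled critical value can be reproduced from the t distribution, since t = r·sqrt(N - 2) / sqrt(1 - r²) with N - 2 df. The sketch below assumes the table is for a two-tailed test at α = .05, where the standard critical t for 19 df is 2.093; the function names and the observed r of .50 are hypothetical.

```python
import math


def t_from_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r * r)


def critical_r(t_crit, df):
    """Smallest |r| that reaches the critical t for the given df."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Standard t-table value: two-tailed alpha = .05, df = 19 (assumed test settings).
T_CRIT_DF19 = 2.093
N_COUNTRIES = 21

r_crit = critical_r(T_CRIT_DF19, N_COUNTRIES - 2)
print(f"critical r for df = 19: {r_crit:.3f}")

# Hypothetical observed correlation, used only to show the decision rule.
r_observed = 0.50
if abs(r_observed) > r_crit:
    print("Reject H0: the correlation is significant.")
else:
    print("Do not reject H0.")
```

Comparing |r| with r_crit is equivalent to comparing the t statistic with its critical value, so the table is a shortcut that skips the t computation.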

SPSS Printout: The SPSS printout gives a test of significance. Double asterisks with a footnote indicate p < .01.

SPSS Printout

SPSS printout for scatterplot

A matrix of scatterplots and intercorrelations for OPTIM, RELINFL, RELINV and RELHOPE. Surviving coefficients (cell positions lost in extraction): .167**, .266**, .272**, .419**, .449**, .544**, with 1.000 on the diagonal. ** Correlation is significant at the 0.01 level (2-tailed).

A review of scatterplots (next three slides): Infant mortality and number of physicians. Life expectancy and health care expenditures. Cancer rate and solar radiation.