Correlation and regression 1: Correlation Coefficient

Presentation transcript:

Correlation and regression 1: Correlation Coefficient Friday 8th February 2013

Looking at the relationship between two interval-ratio variables
When we want to know how two variables are related to one another, the pattern of the data points on a scatterplot can illustrate various relationships, including:
- data correlation
- positive or direct relationships between variables
- negative or inverse relationships between variables
- non-linear patterns

Thinking about lines…
What can we measure?
- Gradient: a measure of how the line slopes
- Intercept: where the line cuts the y axis
- Correlation: a measure of how well the line fits the data
Equation for a line: y = a + bx
- a is the point at which the line crosses the y axis (when x = 0).
- b is a measure of the slope (the amount of change in y that occurs with a 1-unit change in x).
[Figure: the line y = 1.5 + 0.5x plotted on axes running from 0 to 5]
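As a small illustration of the intercept and the slope, the following Python sketch evaluates the example line y = 1.5 + 0.5x at a few points. The function name `line` is an assumption made for this illustration only.

```python
def line(x, a=1.5, b=0.5):
    """Return y = a + b*x, where a is the intercept and b is the slope."""
    return a + b * x

# At x = 0 the line crosses the y axis at the intercept a.
intercept = line(0)

# A 1-unit change in x changes y by the slope b.
slope = line(3) - line(2)

# Tabulate the line over the range shown on the slide's axes.
for x in range(6):
    print(x, line(x))
```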

Linear relationship
The technique of line-fitting, known as regression, is used to fit a line to a scatter of points; the correlation measures how well that line fits. The closer the data points lie to a straight line on the graph, the stronger the linear relationship between the variables and the higher the correlation. The following scatterplot shows a strong linear relationship between the two variables. We say that these two variables are highly correlated.
[Figure: scatterplot showing a strong linear relationship]

Positive and negative relationships
Positive or direct relationships: If the points cluster around a line that runs from the lower left to the upper right of the graph area, then the relationship between the two variables is positive or direct. An increase in the value of x is more likely to be associated with an increase in the value of y. The closer the points are to the line, the stronger the relationship.
Negative or inverse relationships: If the points tend to cluster around a line that runs from the upper left to the lower right of the graph, then the relationship between the two variables is negative or inverse. An increase in the value of x is more likely to be associated with a decrease in the value of y.

There are lots of online sites where you can explore this topic. Three examples:
- http://argyll.epsb.ca/jreed/math9/strand4/scatterPlot.htm
  This site lets you produce your own scatter plot, produce a line of best fit, practice interpolating data points on the line, and look at the correlation coefficient.
- http://www.stat.berkeley.edu/~stark/Java/Html/Correlation.htm
  This site lets you alter a scatter plot and add your own points, see the point of averages, standard deviation lines, and correlation coefficient, as well as plot the regression line and more.
- http://www.stat.uiuc.edu/courses/stat100/java/GCApplet/GCAppletFrame.html
  This site allows you to guess correlations.
You can also take a look at Chapter 8 of Statistics for the Terrified.

Working out the correlation coefficient (Pearson’s r)
Pearson’s r tells us how much one variable changes as the values of another change: their covariation.
Variation is measured with the standard deviation, which measures the average variation of each variable from the mean of that variable.
Covariation is measured by calculating the amount by which each value of x varies from the mean of x and the amount by which each value of y varies from the mean of y, multiplying each pair of differences together, and finding the average (by dividing by n-1).
Pearson’s r is calculated by dividing this by (SD of x) × (SD of y) in order to standardize it, with both SDs also computed by dividing by n-1.
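The calculation described above can be sketched in Python as follows. This is a minimal illustration, not a library routine: the function name `pearson_r` and the sample values are assumptions, and both the covariation and the SDs divide by n-1 so that the divisors match.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariation divided by (SD of x) * (SD of y).

    The covariation and both SDs all divide by n - 1.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Average product of the paired deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Raises ZeroDivisionError if either variable is constant (SD of 0).
    return cov / (sd_x * sd_y)

# Made-up values, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 3))
```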

Working out the correlation coefficient (Pearson’s r)
This can also be calculated as the average of the products of the standardized values of x and y, using the same divisor (n-1) as for the SDs:
$r = \frac{\sum z_x z_y}{n - 1}$
Because r is standardized, it will always fall between +1 and -1. A correlation of either 1 or -1 means perfect linear association between the two variables. A correlation of 0 means that there is no linear association.
Note: correlation does not mean causation. We can only investigate causation by reference to our theory. However (thinking about it the other way round) there is unlikely to be causation if there is no correlation.
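The standardized-values route and the limits of r can be checked with a short Python sketch. The helper names `standardize` and `r_from_z` and the data are assumptions made for illustration. The last case is a reminder that r measures linear association only: y depends entirely on x there, yet r comes out at about 0.

```python
import math

def standardize(values):
    """Convert values to standardized units: (value - mean) / SD, SD dividing by n - 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def r_from_z(xs, ys):
    """Sum the products of the standardized values and divide by n - 1."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

xs = [-2, -1, 0, 1, 2]
perfect_positive = r_from_z(xs, [2 * x + 1 for x in xs])  # points on a rising line
perfect_negative = r_from_z(xs, [-3 * x for x in xs])     # points on a falling line
curved = r_from_z(xs, [x ** 2 for x in xs])               # y depends on x, but not linearly

print(perfect_positive, perfect_negative, curved)
```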

Worked Example:
x    y     x in standardized units    y in standardized units    Product
1    5     -1.5                       -0.5                        0.75
3    9     -0.5                        0.5                       -0.25
4    7      0.0                        0.0                        0.00
5    1      0.5                       -1.5                       -0.75
7    13     1.5                        1.5                        2.25
Average of x = 4, SD = 2
Average of y = 7, SD = 4
Note: reminder of how to standardize scores: standardized score = (score - average) / SD
Sum of the products: 0.75 + -0.25 + 0 + -0.75 + 2.25 = 2.00
The SDs of 2 and 4 are computed by dividing by n, so the products are averaged by dividing by n as well: r = 2.00/5 = 0.4. (Computing the SDs with n-1 and dividing the sum of the products by n-1 gives the same value of r.)

Explained Variation
Pearson’s r measures the strength of association between two variables. It does not tell you how much of the variation in y is explained by x. To get this you need to calculate r², known as the coefficient of determination. In this example r² = 0.4 × 0.4 = 0.16. Therefore 16% of the variation in y is explained by x.
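The worked example can be checked with a short Python sketch, using the data from the table (x = 1, 3, 4, 5, 7 and y = 5, 9, 7, 1, 13). The variable names are assumptions. The quoted SDs of 2 and 4 divide by n, so the sketch also divides the sum of the products by n; dividing by n-1 instead would only be consistent with SDs that were themselves computed with n-1.

```python
import statistics

x = [1, 3, 4, 5, 7]
y = [5, 9, 7, 1, 13]
n = len(x)

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
# pstdev is the population SD (divides by n); it reproduces the SDs of 2 and 4.
sd_x, sd_y = statistics.pstdev(x), statistics.pstdev(y)

# Standardize each score: (score - average) / SD.
zx = [(v - mean_x) / sd_x for v in x]
zy = [(v - mean_y) / sd_y for v in y]
products = [a * b for a, b in zip(zx, zy)]

# The SDs divided by n, so the products are averaged over n as well.
r = sum(products) / n
r_squared = r ** 2  # coefficient of determination

print(products)
print(r, r_squared)
```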

Going back to the line…
The regression line for y on x estimates the average value of y corresponding to each value of x.
Associated with each increase of one SD in x there is an increase of r SDs in y, on average.
[Figure: regression line on x and y axes, with labels: the regression estimate, point of averages, SDx, r × SDy]
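The link between r and the regression line can be sketched in Python. This is a minimal illustration under stated assumptions: the function name `regression_line` is hypothetical, the data are those of the worked example, and the sketch relies on the standard results that the slope is b = r × SDy / SDx and that the line passes through the point of averages.

```python
import statistics

def regression_line(xs, ys):
    """Return (a, b) for the regression line y = a + b*x of y on x."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # Population SDs (divide by n), matched by the divisor n used for r below.
    sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n * sd_x * sd_y)
    b = r * sd_y / sd_x      # one SD of x goes with r SDs of y
    a = mean_y - b * mean_x  # makes the line pass through the point of averages
    return a, b

x = [1, 3, 4, 5, 7]
y = [5, 9, 7, 1, 13]
a, b = regression_line(x, y)

# The regression estimate at the average of x is the average of y.
estimate_at_mean = a + b * statistics.mean(x)
print(a, b, estimate_at_mean)
```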