Download presentation
Presentation is loading. Please wait.
Published byArnold Park Modified over 9 years ago
2
Introduction to Correlation and Regression Introduction to Correlation and Regression Rahul Jain
3
Outline Outline Introduction Introduction Linear Correlation Linear Correlation Regression Regression Simple Linear Regression Simple Linear Regression Model/Formulas Model/Formulas
4
Outline continued Applications Applications Real-life Applications Real-life Applications Practice Problems Practice Problems Internet Resources Internet Resources Applets Applets Data Sources Data Sources
5
Correlation Correlation Correlation A measure of association between two numerical variables. A measure of association between two numerical variables. Example (positive correlation) Example (positive correlation) Typically, in the summer as the temperature increases people are thirstier. Typically, in the summer as the temperature increases people are thirstier.
6
Specific Example Specific Example For seven random summer days, a person recorded the temperature and their water consumption, during a three-hour period spent outside. For seven random summer days, a person recorded the temperature and their water consumption, during a three-hour period spent outside. Temperature (F) Water Consumption (ounces) 7516 8320 85 25 8527 9232 9748 99 48
7
How would you describe the graph?
8
How “strong” is the linear relationship?
9
Measuring the Relationship Pearson’s Sample Correlation Coefficient, r measures the direction and the strength of the linear association between two numerical paired variables. measures the direction and the strength of the linear association between two numerical paired variables.
10
Direction of Association Positive Correlation Positive CorrelationNegative Correlation
11
Strength of Linear Association r value Interpretation 1 perfect positive linear relationship 0no linear relationship perfect negative linear relationship
12
Strength of Linear Association
13
Other Strengths of Association r value Interpretation 0.9strong association 0.5moderate association 0.25weak association
14
Other Strengths of Association
15
Formula = the sum n = number of paired items x i = input variabley i = output variable x = x-bar = mean of x’s y = y-bar = mean of y’s s x = standard deviation of x’s s y = standard deviation of y’s
16
403.715 Coefficient of Determination: R 2 A quantification of the significance of estimated model is denoted by R 2. R 2 > 85% = significant model R 2 < 85% = model is perceived as inadequate Low R 2 will suggest a need for additional predictors for modeling the mean of Y
17
Regression Regression Regression Specific statistical methods for finding the “line of best fit” for one response (dependent) numerical variable based on one or more explanatory (independent) variables. Specific statistical methods for finding the “line of best fit” for one response (dependent) numerical variable based on one or more explanatory (independent) variables.
18
Curve Fitting vs. Regression Regression Regression Includes using statistical methods to assess the "goodness of fit" of the model. (ex. Correlation Coefficient) Includes using statistical methods to assess the "goodness of fit" of the model. (ex. Correlation Coefficient)
19
Regression: 3 Main Purposes To describe (or model) To describe (or model) To predict ( or estimate) To predict ( or estimate) To control (or administer) To control (or administer)
20
Simple Linear Regression Statistical method for finding Statistical method for finding the “line of best fit” the “line of best fit” for one response (dependent) numerical variable for one response (dependent) numerical variable based on one explanatory (independent) variable. based on one explanatory (independent) variable.
21
Least Squares Regression GOAL - minimize the sum of the square of the errors of the data points. GOAL - minimize the sum of the square of the errors of the data points. This minimizes the Mean Square Error This minimizes the Mean Square Error
22
Example Example Plan an outdoor party. Plan an outdoor party. Estimate number of soft drinks to buy per person, based on how hot the weather is. Estimate number of soft drinks to buy per person, based on how hot the weather is. Use Temperature/Water data and regression. Use Temperature/Water data and regression.
23
Steps to Reaching a Solution Draw a scatterplot of the data. Draw a scatterplot of the data.
24
Steps to Reaching a Solution Draw a scatterplot of the data. Draw a scatterplot of the data. Visually, consider the strength of the linear relationship. Visually, consider the strength of the linear relationship.
25
Steps to Reaching a Solution Draw a scatterplot of the data. Draw a scatterplot of the data. Visually, consider the strength of the linear relationship. Visually, consider the strength of the linear relationship. If the relationship appears relatively strong, find the correlation coefficient as a numerical verification. If the relationship appears relatively strong, find the correlation coefficient as a numerical verification.
26
Steps to Reaching a Solution Draw a scatterplot of the data. Draw a scatterplot of the data. Visually, consider the strength of the linear relationship. Visually, consider the strength of the linear relationship. If the relationship appears relatively strong, find the correlation coefficient as a numerical verification. If the relationship appears relatively strong, find the correlation coefficient as a numerical verification. If the correlation is still relatively strong, then find the simple linear regression line. If the correlation is still relatively strong, then find the simple linear regression line.
27
Interpreting and Visualizing Interpreting the result: Interpreting the result: y = ax + b y = ax + b The value of a is the slope The value of a is the slope The value of b is the y-intercept The value of b is the y-intercept r is the correlation coefficient r is the correlation coefficient r 2 is the coefficient of determination r 2 is the coefficient of determination
28
5. Interpreting and Visualizing Write down the equation of the line in slope intercept form. Write down the equation of the line in slope intercept form. Press Y= and enter the equation under Y1. (Clear all other equations.) Press Y= and enter the equation under Y1. (Clear all other equations.) Press GRAPH and the line will be graphed through the data points. Press GRAPH and the line will be graphed through the data points.
29
Questions ???
30
Interpretation in Context Regression Equation: Regression Equation: y=1.5*x - 96.9 y=1.5*x - 96.9 Water Consumption = 1.5*Temperature - 96.9 1.5*Temperature - 96.9
31
Interpretation in Context Slope = 1.5 (ounces)/(degrees F) Slope = 1.5 (ounces)/(degrees F) for each 1 degree F increase in temperature, you expect an increase of 1.5 ounces of water drank. for each 1 degree F increase in temperature, you expect an increase of 1.5 ounces of water drank.
32
Interpretation in Context y-intercept = -96.9 y-intercept = -96.9 For this example, when the temperature is 0 degrees F, then a person would drink about -97 ounces of water. For this example, when the temperature is 0 degrees F, then a person would drink about -97 ounces of water. That does not make any sense! That does not make any sense! Our model is not applicable for x=0. Our model is not applicable for x=0.
33
Prediction Example Predict the amount of water a person would drink when the temperature is 95 degrees F. Predict the amount of water a person would drink when the temperature is 95 degrees F. Solution: Substitute the value of x=95 (degrees F) into the regression equation and solve for y (water consumption). If x=95, y=1.5*95 - 96.9 = 45.6 ounces. Solution: Substitute the value of x=95 (degrees F) into the regression equation and solve for y (water consumption). If x=95, y=1.5*95 - 96.9 = 45.6 ounces.
34
Simple Linear Regression Model The model for simple linear regression is There are mathematical assumptions behind the concepts that we are covering today. we are covering today.
35
Formulas Prediction Equation:
36
Standard error of the estimate
37
Real Life Applications Cost Estimating for Future Space Flight Vehicles (Multiple Regression)
38
Real Life Applications Estimating Seasonal Sales for Department Stores (Periodic) Estimating Seasonal Sales for Department Stores (Periodic)
39
Real Life Applications Predicting Student Grades Based on Time Spent Studying Predicting Student Grades Based on Time Spent Studying
40
Real Life Applications...... What ideas can you think of? What ideas can you think of? What ideas can you think of that your students will relate to? What ideas can you think of that your students will relate to?
41
Practice Problems Is there any correlation between shoe size and height? Is there any correlation between shoe size and height? Does gender make a difference in this analysis? Does gender make a difference in this analysis?
42
Practice Problems Can the number of points scored in a basketball game be predicted by Can the number of points scored in a basketball game be predicted by The time a player plays in the game? The time a player plays in the game? By the player’s height? By the player’s height? Idea modified from Steven King, Aiken, SC. NCTM presentation 1997.) Idea modified from Steven King, Aiken, SC. NCTM presentation 1997.)
43
Resources Data Analysis and Statistics. Curriculum and Evaluation Standards for School Mathematics. Addenda Series, Grades 9-12. NCTM. 1992. Data Analysis and Statistics. Curriculum and Evaluation Standards for School Mathematics. Addenda Series, Grades 9-12. NCTM. 1992. Data and Story Library. Internet Website. http://lib.stat.cmu.edu/D ASL/ 2001. Data and Story Library. Internet Website. http://lib.stat.cmu.edu/D ASL/ 2001. http://lib.stat.cmu.edu/D ASL/http://lib.stat.cmu.edu/D ASL/
44
Internet Resources Correlation Correlation Guessing Correlations - An interactive site that allows you to try to match correlation coefficients to scatterplots. University of Illinois, Urbanna Champaign Statistics Program. http://www.stat.uiuc.edu/~stat100/j ava/guess/GCApplet.html Guessing Correlations - An interactive site that allows you to try to match correlation coefficients to scatterplots. University of Illinois, Urbanna Champaign Statistics Program. http://www.stat.uiuc.edu/~stat100/j ava/guess/GCApplet.html http://www.stat.uiuc.edu/~stat100/j ava/guess/GCApplet.html http://www.stat.uiuc.edu/~stat100/j ava/guess/GCApplet.html
45
Internet Resources Regression Regression Effects of adding an Outlier. Effects of adding an Outlier. W. West, University of South Carolina. W. West, University of South Carolina. http://www.stat.sc.edu/~west/ javahtml/Regression.html http://www.stat.sc.edu/~west/ javahtml/Regression.html http://www.stat.sc.edu/~west/ javahtml/Regression.html http://www.stat.sc.edu/~west/ javahtml/Regression.html
46
Internet Resources Regression Regression Estimate the Regression Line. Compare the mean square error from different regression lines. Can you find the minimum mean square error? Rice University Virtual Statistics Lab. http://www.ruf.rice.edu/~lane/stat_si m/reg_by_eye/index.html Estimate the Regression Line. Compare the mean square error from different regression lines. Can you find the minimum mean square error? Rice University Virtual Statistics Lab. http://www.ruf.rice.edu/~lane/stat_si m/reg_by_eye/index.html http://www.ruf.rice.edu/~lane/stat_si m/reg_by_eye/index.html http://www.ruf.rice.edu/~lane/stat_si m/reg_by_eye/index.html
47
Internet Resources: Data Sets Data and Story Library. Data and Story Library. Excellent source for small data sets. Search for specific statistical methods (e.g. boxplots, regression) or for data concerning a specific field of interest (e.g. health, environment, sports). http://lib.stat.cmu.edu/DASL/ Excellent source for small data sets. Search for specific statistical methods (e.g. boxplots, regression) or for data concerning a specific field of interest (e.g. health, environment, sports). http://lib.stat.cmu.edu/DASL/ http://lib.stat.cmu.edu/DASL/
48
Internet Resources Other Other Statistics Applets. Using Web Applets to Assist in Statistics Instruction. Robin Lock, St. Lawrence University. http://it.stlawu.edu/~rlock/maa99/ Statistics Applets. Using Web Applets to Assist in Statistics Instruction. Robin Lock, St. Lawrence University. http://it.stlawu.edu/~rlock/maa99/ http://it.stlawu.edu/~rlock/maa99/
49
Internet Resources Other Other Ten Websites Every Statistics Instructor Should Bookmark. Robin Lock, St. Lawrence University. http://it.stlawu.edu/~rlock/10site s.html Ten Websites Every Statistics Instructor Should Bookmark. Robin Lock, St. Lawrence University. http://it.stlawu.edu/~rlock/10site s.html http://it.stlawu.edu/~rlock/10site s.html http://it.stlawu.edu/~rlock/10site s.html
50
For More Information… On-line version of this presentation http://www.mtsu.edu/~stats /corregpres/index.html More information about regression Visit STATS @ MTSU web site http://www.mtsu.edu/~stats
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.