Introduction to Correlation and Regression


1 Introduction to Correlation and Regression
Ginger Holmes Rowell, Ph.D., Associate Professor of Mathematics, Middle Tennessee State University

2 Outline
Introduction
Linear Correlation
Regression
    Simple Linear Regression
    Using the TI-83
    Model/Formulas

3 Outline continued
Applications
    Real-life Applications
    Practice Problems
Internet Resources
    Applets
    Data Sources

4 Correlation
A measure of association between two numerical variables.
Example (positive correlation): Typically, in the summer, as the temperature increases, people are thirstier. Consider the two numerical variables temperature and water consumption. We would expect that the higher the temperature, the more water a given person would consume. Thus we would say that in the summertime, temperature and water consumption are positively correlated.

5 Specific Example
For seven random summer days, a person recorded the temperature and their water consumption during a three-hour period spent outside. (The data are shown in the table, with the temperatures in increasing order.)
Temperature (F)    Water Consumption (ounces)
75                 16
83                 20
85                 25
85                 27
92                 32
97                 48
99

6 How would you describe the graph?
This graph helps us visualize what appears to be a somewhat linear relationship between temperature and the amount of water one drinks.

7 How “strong” is the linear relationship?

8 Measuring the Relationship
Pearson’s sample correlation coefficient, r, measures the direction and the strength of the linear association between two paired numerical variables.

9 Direction of Association
The association can be either positive or negative.
Positive correlation: as the x variable increases, so does the y variable. Example: In the summer, as the temperature increases, so does thirst.
Negative correlation: as the x variable increases, the y variable decreases. Example: As the price of an item increases, the number of items sold decreases.

10 Strength of Linear Association
r value    Interpretation
1          perfect positive linear relationship
0          no linear relationship
-1         perfect negative linear relationship
The strength of the linear association is measured by the sample correlation coefficient, r. r can be any value from -1 to +1. The closer r is to one in magnitude, the stronger the linear association. If r equals zero, then there is no linear association between the two variables.

11 Strength of Linear Association

12 Other Strengths of Association
r value    Interpretation
0.9        strong association
0.5        moderate association
0.25       weak association
No other values of r have precise definitions of strength. See the chart below.
Note: All of the values in this second table are positive, so the associations are positive. The same strength interpretations hold for negative values of r; only the direction of the association would change.

13 Other Strengths of Association

14 Formula
Σ = the sum
n = number of paired items
x_i = input variable
y_i = output variable
x̄ (x-bar) = mean of the x's
ȳ (y-bar) = mean of the y's
s_x = standard deviation of the x's
s_y = standard deviation of the y's
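The formula itself did not survive in this transcript. The standard form of Pearson's r that uses exactly the symbols listed above is $r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$, and the sketch below applies it to the temperature/water data. Two assumptions: the slide may have shown an algebraically equivalent form, and the seventh water value is missing from the transcript, so 48 is assumed here because it is consistent with the r² of 92.7% quoted later. The function names are illustrative only.

```python
from math import sqrt

# Temperature (F) and water consumption (ounces) from the example table.
# ASSUMPTION: the last water value (48) is missing from the transcript and
# was chosen because it is consistent with the r^2 of 92.7% quoted later.
temperature = [75, 83, 85, 85, 92, 97, 99]
water = [16, 20, 25, 27, 32, 48, 48]


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def pearson_r(xs, ys):
    """Sum of the products of the standardized x and y values, over n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_std(xs), sample_std(ys)
    z_products = (
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    )
    return sum(z_products) / (n - 1)


r = pearson_r(temperature, water)
print(f"r = {r:.3f}")
```

Under the stated assumption about the data, r comes out at about 0.963, a strong positive linear association.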

15 Regression
Specific statistical methods for finding the “line of best fit” for one response (dependent) numerical variable based on one or more explanatory (independent) variables.

16 Curve Fitting vs. Regression
Regression includes using statistical methods to assess the "goodness of fit" of the model (e.g., the correlation coefficient).

17 Regression: 3 Main Purposes
To describe or model a set of data with one dependent variable and one (or more) independent variables.
To predict or estimate the values of the dependent variable based on given value(s) of the independent variable(s).
To control or administer standards from a usable statistical relationship.

18 Simple Linear Regression
Statistical method for finding the “line of best fit” for one response (dependent) numerical variable based on one explanatory (independent) variable.
Simple: only one independent variable.
Linear in the independent variable: the independent variable appears only to the first power.

19 Least Squares Regression
GOAL: minimize the sum of the squares of the errors of the data points.
Regression analysis tries to fit a model to one response variable based on one or more explanatory variables. In most cases, there will be error. See the graph below for an example of simple linear regression. The actual data points (x, y) are the blue dots. The least squares linear model (regression line) of the response (y) variable based on the explanatory (x) variable is shown in black. The errors (residuals), the vertical distances from each data point to its predicted value, are shown in red. The goal, in general, is to minimize the errors from the actual data to the regression line. The least squares line minimizes the sum of the squares of the errors, and therefore also the mean square error.
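To make the least squares idea concrete, here is a minimal sketch that fits the line from the usual closed-form sums and then checks that nudging the line in either direction increases the sum of squared errors. It is an illustration, not the presenter's own computation. The helper names are hypothetical, and the last water value (48) is an assumption because that cell is missing from the transcript.

```python
# Temperature (F) and water consumption (ounces) from the example table.
# ASSUMPTION: the last water value (48) is missing from the transcript and
# was chosen because it is consistent with the r^2 of 92.7% quoted later.
temperature = [75, 83, 85, 85, 92, 97, 99]
water = [16, 20, 25, 27, 32, 48, 48]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the squared errors."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


def sum_squared_errors(slope, intercept, xs, ys):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


slope, intercept = least_squares_line(temperature, water)
sse_best = sum_squared_errors(slope, intercept, temperature, water)

# Any other line should give a larger sum of squared errors.
sse_shifted = sum_squared_errors(slope, intercept + 1.0, temperature, water)
sse_steeper = sum_squared_errors(slope + 0.1, intercept, temperature, water)

print(f"least squares line: y = {slope:.2f}x + ({intercept:.2f})")
print(f"SSE: best={sse_best:.1f} shifted={sse_shifted:.1f} steeper={sse_steeper:.1f}")
```

With the assumed data, the fitted slope is about 1.45 and the intercept about -96.8, in line with the rounded values used later in the deck.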

20 Example: Plan an outdoor party.
Estimate the number of soft drinks to buy per person, based on how hot the weather is. Use the Temperature/Water data and regression.
If you were planning an outdoor party in the summertime, you might need to estimate how many soft drinks to buy. You will, of course, need to know how many people will be attending. However, as you determine the number of soft drinks for each person, you might also want to consider how hot it will be. The temperature is forecast to be around 95 degrees on the day of the party. (In real life, you would probably make an estimate based on your previous experience and make a logical increase based on the temperature.) We will use the hypothetical Temperature/Water example data to determine how thirsty people might be.

24 Steps to Reaching a Solution
1. Draw a scatterplot of the data.
2. Visually, consider the strength of the linear relationship.
3. If the relationship appears relatively strong, find the correlation coefficient as a numerical verification.
4. If the correlation is still relatively strong, then find the simple linear regression line.

25 Our Next Steps Learn to Use the TI-83 for Correlation and Regression.
Interpret the Results (in the Context of the Problem).

26 Finding the Solution: TI-83
Using the TI-83 graphing calculator:
Turn on the calculator diagnostics.
Enter the data.
Graph a scatterplot of the data.
Find the equation of the regression line and the correlation coefficient.
Graph the regression line on a graph with the scatterplot.

27 Preliminary Step: Turn the Diagnostics On
Press 2nd 0 (for Catalog).
Scroll down to DiagnosticOn. The marker points to the right of the words.
Press ENTER.
Press ENTER again. The word Done should appear on the right-hand side of the screen.

28 Example
Temperature (F)    Water Consumption (ounces)
75                 16
83                 20
85                 25
85                 27
92                 32
97                 48
99

29 1. Enter the Data into Lists
Press STAT. Under EDIT, select 1: Edit.
Enter the x-values (input) into L1.
Enter the y-values (output) into L2.
After the data are entered in the lists, press 2nd MODE to quit and return to the home screen.
Note: If you need to clear out a list, for example list 1, place the cursor on L1, then hit CLEAR and ENTER.

30 2. Set up the Scatterplot
Press 2nd Y= (STAT PLOTS).
Select 1: PLOT 1 and hit ENTER.
Use the arrow keys to move the cursor down to On and hit ENTER.
Arrow down to Type: and select the first graph under Type.
Under Xlist: enter L1.
Under Ylist: enter L2.
Under Mark: select any of these.

31 3. View the Scatterplot
Press 2nd MODE to quit and return to the home screen.
To plot the points, press ZOOM and select 9: ZoomStat. The scatterplot will then be graphed.

32 4. Find the regression line.
Press STAT.
Press CALC.
Select 4: LinReg(ax + b).
Press 2nd 1 (for List 1).
Press the comma key.
Press 2nd 2 (for List 2).
Press ENTER.

33 5. Interpreting and Visualizing
Interpreting the result: y = ax + b
The value of a is the slope.
The value of b is the y-intercept.
r is the correlation coefficient.
r² is the coefficient of determination.

34 5. Interpreting and Visualizing
Write down the equation of the line in slope intercept form. Press Y= and enter the equation under Y1. (Clear all other equations.)  Press GRAPH and the line will be graphed through the data points.

35 Questions ???

36 Interpretation in Context
Regression Equation: y = 1.5*x - 96.9
Water Consumption = 1.5*Temperature - 96.9

37 Interpretation in Context
Slope = 1.5 (ounces)/(degrees F): for each 1 degree F increase in temperature, you expect an increase of 1.5 ounces in the amount of water consumed.

38 Interpretation in Context
y-intercept = -96.9
For this example, when the temperature is 0 degrees F, a person would drink about -97 ounces of water. That does not make any sense! Our model is not applicable for x = 0, and we cannot expect it to be accurate at such low temperatures, because the sample data were taken in the summertime, with temperatures ranging from 75 to 99 degrees Fahrenheit. Thus the model only predicts for temperatures in approximately that range.

39 Prediction Example
Predict the amount of water a person would drink when the temperature is 95 degrees F.
Solution: Substitute the value x = 95 (degrees F) into the regression equation y = 1.5*x - 96.9 and solve for y (water consumption). If x = 95, then y = 1.5*95 - 96.9 = 45.6 ounces.
Note: Since 95 degrees F is in the range of values that were used to find the regression equation, we can use this equation to predict the water consumption. If the desired prediction had been for 50 degrees F, then the model should not be used, since that value of x does not fall in the range of predictability for this model.
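The prediction rule, including the warning about extrapolation, can be sketched as a small function. The coefficients are the rounded values quoted on the slides (slope 1.5, intercept -96.9). The function name and the choice to raise an error outside 75 to 99 degrees F are illustrative assumptions; the slides only say the model applies in "approximately that range".

```python
# Rounded coefficients quoted on the slides: y = 1.5*x - 96.9
SLOPE = 1.5
INTERCEPT = -96.9

# Temperature range (degrees F) of the sample the line was fitted to.
MIN_TEMP_F = 75
MAX_TEMP_F = 99


def predict_water_ounces(temp_f):
    """Predict water consumption, refusing to extrapolate beyond the sample.

    Hypothetical helper: the hard cutoff at the sample range is a design
    choice for this sketch, not something the slides prescribe.
    """
    if not MIN_TEMP_F <= temp_f <= MAX_TEMP_F:
        raise ValueError(
            f"{temp_f} F is outside the fitted range "
            f"{MIN_TEMP_F}-{MAX_TEMP_F} F; the model should not be used"
        )
    return SLOPE * temp_f + INTERCEPT


print(round(predict_water_ounces(95), 1))  # 45.6

try:
    predict_water_ounces(50)
except ValueError as err:
    print(err)
```

Because the slope is rounded to 1.5 before substituting, the result is sensitive to rounding; carrying more decimal places from the calculator output would give a somewhat different prediction.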

40 Strength of the Association: r²
Coefficient of Determination (r²)
General interpretation: The coefficient of determination tells the percent of the variation in the response variable that is explained (determined) by the model and the explanatory variable.

41 Interpretation of r²
Example: r² = 92.7%.
Interpretation: Almost 93% of the variability in the amount of water consumed is explained by outside temperature using this model.
Note: This means that approximately 7% of the variation in water consumption is not explained by temperature using this model. So perhaps there is another variable that accounts for the remaining portion of the variation. Can you think of a reasonable variable? Multiple regression is the name of the method that would include more than one independent variable to predict the amount of water people would drink.
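The "explained variation" reading of r² can be checked directly: compare the squared errors around the regression line (unexplained) with the squared deviations around the mean of y (total). The sketch below does this for the temperature/water data. The variable names are illustrative, and the last water value (48) is an assumption because that cell is missing from the transcript.

```python
# Temperature (F) and water consumption (ounces) from the example table.
# ASSUMPTION: the last water value (48) is missing from the transcript and
# was chosen because it is consistent with the r^2 of 92.7% quoted above.
temperature = [75, 83, 85, 85, 92, 97, 99]
water = [16, 20, 25, 27, 32, 48, 48]

n = len(water)
x_bar = sum(temperature) / n
y_bar = sum(water) / n

# Least squares slope and intercept.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temperature, water))
s_xx = sum((x - x_bar) ** 2 for x in temperature)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Total variation in y, and the part left over after fitting the line.
total_variation = sum((y - y_bar) ** 2 for y in water)
unexplained_variation = sum(
    (y - (slope * x + intercept)) ** 2 for x, y in zip(temperature, water)
)

r_squared = 1 - unexplained_variation / total_variation
unexplained_share = unexplained_variation / total_variation

print(f"explained:   {r_squared:.1%}")
print(f"unexplained: {unexplained_share:.1%}")
```

With the assumed data, the explained share is about 92.7% and the unexplained share about 7.3%, matching the interpretation above.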

42 Questions ???

43 Simple Linear Regression Model
The model for simple linear regression is a straight line plus a random error term. There are mathematical assumptions behind the concepts that we are covering today.

44 Formulas Prediction Equation:
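The formulas on these two slides were images and are not in the transcript. For reference, the standard textbook forms are sketched below, written with the a (slope) and b (y-intercept) of the LinReg(ax + b) output and the symbols from the correlation formula slide. The Greek-letter notation for the model is an assumption, since the presenter's own notation is unknown.

```latex
% Model: response = straight line + random error (notation assumed)
y = \beta_0 + \beta_1 x + \varepsilon

% Prediction equation fitted from the sample (a = slope, b = y-intercept)
\hat{y} = a x + b

% Least squares estimates
a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2}
  = r \, \frac{s_y}{s_x},
\qquad
b = \bar{y} - a \, \bar{x}
```

The second form of the slope, r·s_y/s_x, shows why the slope and the correlation coefficient always have the same sign.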

45 Real Life Applications
Cost Estimating for Future Space Flight Vehicles (Multiple Regression)

46 Nonlinear Application
Predicting when Solar Maximum Will Occur
solar/predict.htm
Predicting the Solar Maximum: Planning for satellite orbits and space missions often requires advance knowledge of solar activity levels. NASA scientists are using new techniques to predict sunspot maxima years in advance.

47 Real Life Applications
Estimating Seasonal Sales for Department Stores (Periodic)

48 Real Life Applications
Predicting Student Grades Based on Time Spent Studying

49 Real Life Applications
What ideas can you think of that your students will relate to?

50 Practice Problems Measure Height vs. Arm Span
Find line of best fit for height. Predict height for one student not in data set. Check predictability of model. Compare the height and arm span of students in the class, except for one student. Test for the strength of association between the two variables. Measure the arm span of the one student and ask the class to predict the student's height.

51 Practice Problems Is there any correlation between shoe size and height? Does gender make a difference in this analysis?

52 Practice Problems
Can the number of points scored in a basketball game be predicted by the time a player plays in the game? By the player’s height?
Have students search the Internet or other sources for data on basketball players. In particular, find the number of points scored in a game and the number of minutes played in that same game. Based on the data, could the number of points scored in a game be predicted by the amount of time a player plays in a game? Have students suggest alternative ways to set up the situation that would provide better prediction capabilities and ask them to find the data to check their theories.
(Idea modified from Steven King, Aiken, SC. NCTM presentation 1997.)

53 Resources
Data Analysis and Statistics. Curriculum and Evaluation Standards for School Mathematics. Addenda Series, Grades 9-12. NCTM.
Data and Story Library. Internet Website.

54 Internet Resources Correlation
Guessing Correlations - An interactive site that allows you to try to match correlation coefficients to scatterplots. University of Illinois, Urbana-Champaign Statistics Program.

55 Internet Resources Regression Effects of adding an Outlier.
W. West, University of South Carolina.

56 Internet Resources Regression
Estimate the Regression Line. Compare the mean square error from different regression lines. Can you find the minimum mean square error? Rice University Virtual Statistics Lab.

57 Internet Resources: Data Sets
Data and Story Library. Excellent source for small data sets. Search for specific statistical methods (e.g. boxplots, regression) or for data concerning a specific field of interest (e.g. health, environment, sports).

58 Internet Resources: Data Sets
FEDSTATS. "The gateway to statistics from over 100 U.S. Federal agencies" "Kid's Pages." (not all related to statistics)

59 Internet Resources Other
Statistics Applets. Using Web Applets to Assist in Statistics Instruction. Robin Lock, St. Lawrence University.

60 Internet Resources Other
Ten Websites Every Statistics Instructor Should Bookmark. Robin Lock, St. Lawrence University.

61 For More Information… On-line version of this presentation
/corregpres/index.html More information about regression Visit MTSU web site

