Published by Richard Preston. Modified over 8 years ago.
1
CHAPTER 4, REGRESSION ANALYSIS… EXPLORING ASSOCIATIONS BETWEEN VARIABLES
2
RELATIONSHIPS BETWEEN... Talk to the person next to you. Think of two things that you believe may be related. For example, height and weight are generally related: the taller the person, the more they tend to weigh. Or the age of your car and its value: generally, the older a car, the less it is worth. Write two numerical categories that you believe are related on the side board.
4
DO YOU BELIEVE THERE IS A RELATIONSHIP BETWEEN... TIME SPENT STUDYING AND GPA? # OF CIGARETTES SMOKED DAILY & LIFE EXPECTANCY? SALARY AND EDUCATION LEVEL? AGE AND HEIGHT?
5
RELATIONSHIPS When we consider data that comes in pairs (two variables measured on each individual), the data is referred to as bivariate data. Much of the bivariate data we will examine is numeric. There may or may not exist a relationship (an association) between the 2 variables. Does one variable influence the other? Or vice versa? Or do the two variables just ‘go together’ by chance? Or is the relationship influenced by one or more other variables that we are unaware of? Does one variable ‘cause’ the other? Caution! Discuss what you wrote on side board.
6
BIVARIATE DATA Proceed much as we did with univariate distributions… (review: which graphical models do we typically use with univariate numerical data?) Still graph: use visual model(s) to describe the data (scatterplot; LSRL, the Least Squares Regression Line). Still look at overall patterns and deviations from those patterns (DOFS: Direction, Outlier(s), Form, Strength); review: how did we look for patterns in univariate numeric data, and what did we use? Still analyze numerical summaries (descriptive statistics).
7
BIVARIATE DISTRIBUTIONS Explanatory variable, x, ‘factor,’ may help predict or explain changes in response variable; usually on horizontal axis Response variable, y, measures an outcome of a study, usually on vertical axis
8
BIVARIATE DATA DISTRIBUTIONS For example... Alcohol (explanatory) and body temperature (response). Generally, the more alcohol consumed, the lower the body temperature. Still use caution with ‘cause.’ Sometimes we don’t have variables that are clearly explanatory and response. Sometimes neither variable is clearly the explanatory one, such as ACT scores and SAT scores, or activity level and physical fitness. Discuss with a partner for 1 minute; come up with another situation (or look at the board) where we have two variables that are related, but neither is clearly explanatory or response.
9
GRAPHICAL MODELS… Many graphing models display uni-variate data exclusively (review). Discuss for 30 seconds and share out. Main graphical representation used to display bivariate data (two quantitative variables) is scatterplot.
10
SCATTERPLOTS Scatterplots show relationship between two quantitative variables measured on the same individuals or objects. Each individual/object in data appears as a point (x, y) on the scatterplot. Plot explanatory variable (if there is one) on horizontal axis. If no distinction between explanatory and response, either can be plotted on horizontal axis. Label both axes. Scale both axes with uniform intervals (but scales don’t have to match)
11
LABEL & SCALE SCATTERPLOT VARIABLES: CLEARLY EXPLANATORY AND RESPONSE??
12
CREATING & INTERPRETING SCATTERPLOTS Let’s collect some data; work in pairs; play rock-paper-scissors to choose which person will be measured; on the board write your height (to the nearest ½ inch) and your hand span (to the nearest 16th of an inch). Input into Stat Crunch & create a scatter plot; which is our explanatory and which is our response variable? Let’s do some predicting... to the best of our ability...
13
INTERPRETING SCATTERPLOTS Look for overall patterns (DOFS) including: direction: up or down, + or – association? outliers/deviations: individual value(s) falls outside overall pattern; no outlier rule for bi-variate data – unlike uni-variate data form: linear? curved? clusters? gaps? strength: how closely do the points follow a clear form? Strong, weak, moderate?
14
MEASURING LINEAR ASSOCIATION Scatterplots (bi-variate data) show direction, outliers/ deviation(s), form, strength of relationship between two quantitative variables Linear relationships are important; common, simple pattern; linear relationships are our focus in this course Linear relationship is strong if points are close to a straight line; weak if scattered about Other relationships (quadratic, logarithmic, etc.)
17
LET’S GO BACK TO HEIGHT AND HAND SPAN... With a partner, look at the scatter plot and analyze it through DOFS (direction, outlier(s), form, strength). Three minutes... If someone has a height of 59”, what would you (generally) expect their hand span to be? If someone has a height of 71”, what would you (generally) expect their hand span to be?
18
CREATING & INTERPRETING SCATTERPLOTS Go to my website, download the COC Math 140 Survey Data Fall 2015 OR Spring 2016. Copy & paste columns (‘Height’ And ‘Weight’) Is data messy? Does it need to be ‘fixed?’... Hint, scan for ordered pairs (this is bivariate data); each and every point must be an ordered pair. Graph it; do we need to evaluate any points (any possible inaccuracies?)
19
CREATING & INTERPRETING SCATTERPLOTS ‘Height’ & ‘Weight’ Create a scatter plot of the data. Analyze (DOFS) Let’s do some predictions... It is difficult to do predictions sometimes? We will get back to this with a ‘better’ model...
20
HOW STRONG ARE THESE RELATIONSHIPS? WHICH ONE IS STRONGER?
21
MEASURING LINEAR ASSOCIATION: CORRELATION OR “R” Sometimes our eyes are not a good judge Need to specify just how strong or weak a linear relationship is with bivariate data Need a numeric measure Correlation or ‘r’
22
MEASURING LINEAR ASSOCIATION: CORRELATION OR “R” * Correlation (r) is a numeric measure of direction and strength of a linear relationship between two quantitative variables Correlation (r) is always between -1 and 1 Correlation (r) is not resistant (look at formula; based on mean) r doesn’t tell us about individual data points, but rather trends in the data * Never calculate by formula; use Stat Crunch (dependent on having raw data)
23
CALCULATING CORRELATION “R”
24
MEASURING LINEAR ASSOCIATION: CORRELATION OR “R” r ≈ 0: weak or no linear relationship. r close to 1: strong positive linear relationship. r close to -1: strong negative linear relationship. Go back to our height/hand span data & calculate ‘r,’ correlation; then practice calculating ‘r’ with our height and weight data (in Stat Crunch: stats, summary stats, correlation).
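The class computes r in Stat Crunch, but it can help to see what the software is doing. The sketch below uses the usual sample formula (the sum of the products of paired z-scores, divided by n - 1). The function name `correlation` and the six height/hand span pairs are hypothetical, made up for illustration; they are not the class data.

```python
import math

def correlation(xs, ys):
    """Sample correlation r: sum of products of paired z-scores, over n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Multiply each pair of z-scores, add them up, divide by n - 1.
    z_products = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                     for x, y in zip(xs, ys))
    return z_products / (n - 1)

# Hypothetical heights (inches) and hand spans (inches); NOT the class data.
heights = [60, 62, 65, 67, 70, 72]
hand_spans = [7.0, 7.5, 7.75, 8.0, 8.5, 9.0]

r = correlation(heights, hand_spans)
print(f"r = {r:.3f}")
```

Because these made-up points rise almost in a straight line, r comes out close to 1, which matches the "strong positive linear relationship" reading above.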
26
GUESS THE CORRELATION WWW.ROSSMANCHANCE.COM/APPLETS (ALSO STAT CRUNCH) ‘March Madness’ bracket-style Guess the Correlation tournament. Playing cards; match up head-to-head competition/rounds. Look at a scatterplot, make your guess. Student who is closest survives until the next round.
27
CORRELATION & REGRESSION APPLET PARTNER ACTIVITY Go to www.whfreeman.com/tps5e. Go to applets. Go to Correlation & Regression. Now download (from my website) COC Math 140 Chapter 4 Correlation Partner Activity & follow the directions. Partner up with someone you have not partnered with yet; this should take no more than 15-20 minutes, including the write-up; print out & turn in with both your names on it.
28
CAUTION… INTERPRETING CORRELATION Note: be careful when addressing form in scatterplots Strong positive linear relationship ► correlation ≈ 1 But Correlation ≈ 1 does not necessarily mean relationship is linear; always plot data!
29
R ≈ 0.816 FOR EACH OF THESE
30
FACTS ABOUT CORRELATION Correlation doesn’t care which variable is considered explanatory and which is considered response; can switch x & y; still same correlation (r) value. Try with height & hand span; try with height & weight data from Math 140 Fall 2015 data. CAUTION! Switching x & y WILL change your scatterplot; try with our data sets!… just won’t change ‘r’
31
FACTS ABOUT CORRELATION r is in standard units, so r doesn’t change if units are changed. If we change from yards to feet, or years to months, or gallons to liters... r is not affected. + r, positive association; - r, negative association
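Both facts (switching x and y, and changing units) can be checked numerically. This is a minimal sketch, not the Stat Crunch procedure: it uses the ages and heights table from this deck, and the helper name `correlation` plus the conversions to months and centimeters are illustrative choices.

```python
import math

def correlation(xs, ys):
    """Sample correlation r from sums of deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Ages (years) and heights (inches) from the AGES & HEIGHTS slide.
ages_years = [0, 1, 4, 5, 8]
heights_in = [18, 28, 40, 42, 49]

r_original = correlation(ages_years, heights_in)

# Switch which variable plays the role of x: r stays the same.
r_swapped = correlation(heights_in, ages_years)

# Change units (years -> months, inches -> centimeters): r stays the same.
ages_months = [12 * a for a in ages_years]
heights_cm = [2.54 * h for h in heights_in]
r_converted = correlation(ages_months, heights_cm)

print(f"{r_original:.4f} {r_swapped:.4f} {r_converted:.4f}")
```

All three printed values agree, even though the scatterplot itself would look different after swapping the axes.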
32
FACTS ABOUT CORRELATION Correlation is always between -1 & 1. Makes no sense for r = 13 or r = -5. r = 0 means no linear relationship (r close to 0 means a very weak one). r = 1 or -1 means a perfect linear association (r close to 1 or -1 means a strong one).
34
FACTS ABOUT CORRELATION Both variables must be quantitative, numerical. Doesn’t make any sense to discuss r for qualitative or categorical data Correlation is not resistant (like mean and SD). Be careful using r when outliers are present (think of the formula, think of our partner activity)
35
FACTS ABOUT CORRELATION r isn’t enough! … if we just consider r, it could be misleading; we must also consider the distribution’s mean, standard deviation, graphical representation, etc. Correlation does not imply causation; e.g., # of ice cream sales in a given week and # of pool accidents
36
ABSURD EXAMPLES… CORRELATION DOES NOT IMPLY CAUSATION… Did you know that eating chocolate makes winning a Nobel Prize more likely? The correlation between per capita chocolate consumption and the number of Nobel laureates per 10 million people for 23 selected countries is r = 0.791 Did you know that statistics is causing global warming? As the number of statistics courses offered has grown over the years, so has the average global temperature!
37
LEAST SQUARES REGRESSION Last section… scatterplots of two quantitative variables r measures strength and direction of linear relationship of scatterplot
38
WHAT WOULD WE EXPECT THE SODIUM LEVEL TO BE IN A HOT DOG THAT HAS 170 CALORIES?
39
LEAST SQUARES REGRESSION BETTER model to summarize overall pattern by drawing a line on scatterplot Not any line; we want a best-fit line over scatterplot Least Squares Regression Line (LSRL) or Regression Line
40
LEAST-SQUARES REGRESSION LINE
41
LET’S DO SOME PREDICTING BY USING THE LSRL... About how much would a home cost if it were: 2,000 square feet? 2,600 square feet? 1,600 square feet?
42
LET’S DO SOME PREDICTING BY USING THE LSRL... About how large would a home be if it were worth: $450,000? $350,000? $220,000? Also, let’s discuss where the x and y axes start...
43
LEAST SQUARES REGRESSION EQUATION TO PREDICT VALUES
44
LEAST SQUARES REGRESSION EQUATION Typical to be asked to interpret slope & y-intercept of the equation of the LSRL, in context Caution: Interpret the slope of the equation of LSRL as the predicted or average change or expected change in the response variable given a unit change in the explanatory variable NOT change in y for a unit change in x; LSRL is a model; models are not perfect
45
INTERPRET SLOPE & Y- INTERCEPT... Notice the embedded context in the equation of the LSRL
46
LSRL: OUR DATA Go back to our data (height & hand span; then height & weight). Create scatter plot; then put LSRL on our scatter plot; also determine the equation of the LSRL Stat Crunch: stat, regression, simple linear, x variable, y variable, graphs, fitted line plot
47
LSRL: OUR DATA Look at graph of our LSRL for our data Look at our LSRL equation for our data Our line fits scatterplot well (best fit) but not perfectly Make some predictions… do we use our graph or our equation? Which is easier? Which is better? More on this in a minute... Interpret our y-intercept; does it make sense? Interpretation of our slope?
48
ANOTHER EXAMPLE… VALUE OF A TRUCK
49
TRUCK EXAMPLE…
50
AGES & HEIGHTS…
Age (years) | Height (inches)
0 | 18
1 | 28
4 | 40
5 | 42
8 | 49
51
LET’S REVIEW FOR A MOMENT… Input data into Stat Crunch Create scatterplot and describe scatterplot (what do we include in a description?) Calculate r (different from slope; why?), equation of LSRL; interpret equation of LSRL in context; does y-intercept make sense? Create a graph of LSRL Based on the graph of the LSRL or the equation of the LSRL (you choose), make a prediction as to the height of a person at age 35.
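The review steps can be sketched in code for the ages and heights table. This assumes the usual least-squares formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x); the function name `least_squares_line` is a hypothetical helper, and Stat Crunch remains the tool used in class.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Ages (years) and heights (inches) from the AGES & HEIGHTS slide.
ages = [0, 1, 4, 5, 8]
heights = [18, 28, 40, 42, 49]

slope, intercept = least_squares_line(ages, heights)
print(f"predicted height = {intercept:.2f} + {slope:.2f} * age")

# Prediction at age 35, far outside the observed ages of 0 to 8.
predicted_at_35 = intercept + slope * 35
print(f"predicted height at age 35: {predicted_at_35:.1f} inches")
```

The line predicts roughly 152 inches (over 12 feet) at age 35, which is a useful prompt for discussing why the prediction makes no sense for an age so far beyond the data.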
53
LSRL: OUR DATA Extrapolation: Use of a regression line (or equation of a regression line) for prediction outside the range of values of the explanatory variable, x, used to obtain the line/equation of the line. Such predictions are often not accurate. Friends don’t let friends extrapolate!
54
CALCULATING THE EQUATION OF THE LSRL: WHAT IF WE DON’T HAVE THE RAW DATA?
55
CALCULATING THE EQUATION FOR THE LSRL: WHAT IF WE DON’T HAVE THE RAW DATA?
56
EXAMPLE: CREATING EQUATION OF LSRL (WITHOUT RAW DATA)
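The worked example for this slide is not available in this transcript. As a hedged sketch of the standard approach, the LSRL can be built from five summary values: slope b = r · (s_y / s_x) and y-intercept a = ȳ − b · x̄. The function name `lsrl_from_summary` is hypothetical, and the ages and heights table from this deck is used only to produce summary values to feed into it.

```python
import statistics

def lsrl_from_summary(mean_x, s_x, mean_y, s_y, r):
    """LSRL from summary statistics: b = r * s_y / s_x, a = mean_y - b * mean_x."""
    slope = r * s_y / s_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Summary values computed from the AGES & HEIGHTS table, standing in for
# a problem that gives only means, standard deviations, and r.
ages = [0, 1, 4, 5, 8]
heights = [18, 28, 40, 42, 49]
n = len(ages)
mean_x, mean_y = statistics.mean(ages), statistics.mean(heights)
s_x, s_y = statistics.stdev(ages), statistics.stdev(heights)
r = sum((x - mean_x) * (y - mean_y)
        for x, y in zip(ages, heights)) / ((n - 1) * s_x * s_y)

slope, intercept = lsrl_from_summary(mean_x, s_x, mean_y, s_y, r)
print(f"y-hat = {intercept:.2f} + {slope:.2f} x")
```

The result agrees with the line fitted directly from the raw data, which is the point: the summary statistics carry everything needed to write the equation.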
61
DETOUR… MEMORY MONDAY (OR WAY-BACK WEDNESDAY)… What is r? What is r’s range? r tells us how strong the linear pattern in the scatterplot is, and its direction. ‘r’ ranges from -1 to 1. ‘r’ describes the scatterplot only (not LSRL). Why do we want/need ‘r’?
62
NOW…
63
NOW... We need a numerical measurement that tells us how well the LSRL fits (accurately describes) the scatter plot points, the data. Coefficient of Determination, or r²
64
COEFFICIENT OF DETERMINATION … Do all the points on the scatterplot fall exactly on the LSRL? Sometimes too high and sometimes too low Is LSRL a good model to use for a particular data set? How well does our model fit our data?
65
COEFFICIENT OF DETERMINATION OR R² “R-sq” in software (Stat Crunch) output. Always 0 ≤ r² ≤ 1. Never calculate by hand; always use Stat Crunch. No need to memorize formula; trust me... It’s ugly!
67
COEFFICIENT OF DETERMINATION OR R² Interpretation of r²: We say, “x% of the variation in (y variable) is explained by the least squares regression line relating (y variable) to (x variable).” Let’s practice calculating r² and interpreting it for height and hand span data; and height and weight data. Stat, regression, simple linear,... Remember this describes the LSRL, not the scatter plot
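For anyone curious what Stat Crunch reports as "R-sq", here is a minimal sketch using the ages and heights table from this deck. It assumes the standard definition r² = 1 − SSE/SST (variation left over after the line, compared with total variation in y); the function name `r_squared` is a hypothetical helper.

```python
def r_squared(xs, ys):
    """Coefficient of determination for the least squares line of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # SSE: squared vertical misses of the line; SST: total variation in y.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Ages (years) and heights (inches) from the AGES & HEIGHTS slide.
ages = [0, 1, 4, 5, 8]
heights = [18, 28, 40, 42, 49]

r2 = r_squared(ages, heights)
print(f"{100 * r2:.1f}% of the variation in height is explained by the "
      f"least squares regression line relating height to age.")
```

The printed sentence follows the interpretation template from the slide, with the y variable (height) and x variable (age) filled in.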
68
GENERAL FACTS TO REMEMBER ABOUT BIVARIATE DATA Distinction between explanatory and response variables. If switched, scatterplot changes and LSRL changes (but what doesn’t change?) LSRL minimizes the sum of the squared vertical distances from the data points to the line (vertical distances only)
69
GENERAL FACTS TO REMEMBER ABOUT BIVARIATE DATA
71
CORRELATION & REGRESSION WISDOM Which of the following scatterplots has the highest correlation?
72
CORRELATION & REGRESSION WISDOM All r = 0.816; all have same exact LSRL equation Lesson: Always graph your data! … because correlation and regression describe only linear relationships
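The four scatterplots are not reproduced in this transcript. Four data sets that share r ≈ 0.816 and the same regression line match the well-known Anscombe's quartet, so the sketch below assumes (without confirmation from the slides) that this is the data shown. The helper name `summarize` is hypothetical.

```python
import math

def summarize(xs, ys):
    """Return (r, slope, intercept) for the least squares line of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    slope = s_xy / s_xx
    return s_xy / math.sqrt(s_xx * s_yy), slope, mean_y - slope * mean_x

# Anscombe's quartet (assumed to be the data behind the slide).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (r, slope, intercept) in results.items():
    print(f"{name}: r = {r:.3f}, y-hat = {intercept:.2f} + {slope:.3f} x")
```

The numerical summaries are nearly identical across the four sets, yet only the first looks like a plain linear scatter when graphed, which is why the numbers alone are not enough.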
73
CORRELATION & REGRESSION WISDOM Correlation and regression describe only linear relationships
74
CORRELATION & REGRESSION WISDOM Correlation is not causation! Association does not imply causation… want a Nobel Prize? Eat some chocolate! How about Methodist ministers & rum imports?
Year | Number of Methodist Ministers in New England | Cuban Rum Imported to Boston (in # of barrels)
1860 | 63 | 8,376
1865 | 48 | 6,506
1870 | 53 | 7,005
1875 | 64 | 8,486
1890 | 85 | 11,265
1900 | 80 | 10,547
1915 | 140 | 18,559
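To see just how strong this nonsense association is, the correlation for the ministers and rum table can be computed directly. This is a small sketch with a hypothetical helper named `correlation`; the numbers are the ones in the table above.

```python
import math

def correlation(xs, ys):
    """Sample correlation r from sums of deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Methodist ministers in New England and barrels of Cuban rum imported
# to Boston, for 1860, 1865, 1870, 1875, 1890, 1900, 1915.
ministers = [63, 48, 53, 64, 85, 80, 140]
rum_barrels = [8376, 6506, 7005, 8486, 11265, 10547, 18559]

r = correlation(ministers, rum_barrels)
print(f"r = {r:.4f}")
```

The value is very close to 1, yet nobody would argue that ministers cause rum imports (or the reverse); both counts simply move together from year to year.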
75
BEWARE OF NONSENSE ASSOCIATIONS… r = 0.9749, but no economic relationship between these variables Strong association is due entirely to the fact that both imports & health spending grew rapidly in these years. Common year is other variable. Any two variables that both increase over time will show a strong association. Doesn’t mean one explains the other or influences the other
76
CORRELATION & REGRESSION WISDOM Correlation is not resistant; always plot data and look for unusual trends. … what if Bill Gates walked into a bar?
77
CORRELATION & REGRESSION WISDOM
78
OUTLIERS & INFLUENTIAL POINTS All influential points are outliers, but not all outliers are influential points. Outliers: observations that lie outside the overall pattern
79
OUTLIERS & INFLUENTIAL POINTS Influential points/observations: If removed would significantly change LSRL (slope and/or y-intercept)
80
INPUT FOLLOWING DATA... Calculate the equation of the LSRL
# Hours Spent Studying for the Stats Test | Percentage Earned on the Stats Test
3 | 89
4 | 92
4.5 | 94
1 | 85
1.5 | 86
1 | 83
81
INPUT FOLLOWING DATA... Now, calculate the equation of the LSRL again with this additional piece of data... What do you observe about the scatter plot and the equation of the LSRL?
# Hours Spent Studying for the Stats Test | Percentage Earned on the Stats Test
3 | 89
4 | 92
4.5 | 94
1 | 85
1.5 | 86
1 | 83
7 | 70
82
INPUT FOLLOWING DATA... Now, calculate the equation of the LSRL again with this (slightly different) set of data... What do you observe about the scatter plot and the equation of the LSRL?
# Hours Spent Studying for the Stats Test | Percentage Earned on the Stats Test
3 | 89
4 | 92
4.5 | 94
1 | 85
1.5 | 86
1 | 83
1.5 | 92
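The three study-time data sets can be compared side by side. This sketch fits the LSRL to the original six points, then to the set with (7, 70) added, then to the set with (1.5, 92) added. The function name `least_squares_line` is a hypothetical helper; in class the same work is done in Stat Crunch.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the LSRL for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x

# (hours studied, percentage earned) from the three slides.
base = [(3, 89), (4, 92), (4.5, 94), (1, 85), (1.5, 86), (1, 83)]
with_far_point = base + [(7, 70)]      # far from the rest in x and off the pattern
with_high_score = base + [(1.5, 92)]   # unusual y, but x is typical

fits = {
    "original": least_squares_line(base),
    "plus (7, 70)": least_squares_line(with_far_point),
    "plus (1.5, 92)": least_squares_line(with_high_score),
}
for label, (slope, intercept) in fits.items():
    print(f"{label}: y-hat = {intercept:.2f} + {slope:.2f} x")
```

Adding (7, 70) flips the slope from positive to negative, so that point is influential. Adding (1.5, 92) nudges the line only slightly, so it sits outside the pattern without being influential.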
83
CLASS ACTIVITY…
1. Groups of 2; go to COC Math 140 Fall 2015 data OR Spring 2016 data; choose 2 numerical categories that you believe are associated. Be sure to go through your data and ‘clean’ it up; justify any ‘cleaning’ you do.
2. Create scatterplot and describe the association between the two variables using DOFS. Calculate the correlation of the scatter plot (r).
3. Do you think that a regression line is appropriate for your data? Why or why not?
4. Even if you believe a line is not appropriate for your data, go ahead and create LSRL graph & calculate equation of the LSRL; calculate the coefficient of determination (r²) & interpret r².
5. Interpret the slope and the y-intercept of the LSRL in context.
Continue on next slide for more questions....
84
CLASS ACTIVITY…
6. Make a prediction (you can use your LSRL graph or your equation of the LSRL; your choice).
7. If there are outliers and/or influential point(s) on your scatter plot, circle them in red and label them appropriately as ‘outlier’ and/or ‘influential point.’
8. Print everything out, put each group member’s name on it, and turn it in.