CHAPTER 4, REGRESSION ANALYSIS… EXPLORING ASSOCIATIONS BETWEEN VARIABLES.

RELATIONSHIPS BETWEEN... Talk to the person next to you. Think of two things that you believe may be related. For example, height and weight are generally related: the taller the person, the more they tend to weigh. Or the age of your car and its value: generally, the older a car, the less it is worth. On the side board, share two numerical categories that you believe are related.

DO YOU BELIEVE THERE IS A RELATIONSHIP BETWEEN... TIME SPENT STUDYING AND GPA? # OF CIGARETTES SMOKED DAILY AND LIFE EXPECTANCY? SALARY AND EDUCATION LEVEL? AGE AND HEIGHT?

RELATIONSHIPS When we consider data that comes in pairs or two’s or has two variables, the data is referred to as bivariate data. Much of the bivariate data we will examine is numeric. There may or may not exist a relationship/an association between the 2 variables. Does one variable influence the other? Or vice versa? Or do the two variables just ‘go together’ by chance? Or is the relationship influenced by another variable(s) that we are unaware of? Does one variable ‘cause’ the other? Caution! Discuss what you wrote on side board.

BIVARIATE DATA Proceed similarly as univariate distributions … (review... which graphical models do we typically use with univariate numerical data?) Still graph (use visual model(s) to describe data; scatter plot; LSRL; Least Squares Regression Line) Still look at overall patterns and deviations from those patterns (DOFS; Direction, Outlier(s), Form, Strength); review how did we look for patterns in univariate numeric data; what did we use? Still analyze numerical summary (descriptive statistics)

BIVARIATE DISTRIBUTIONS Explanatory variable, x, ‘factor,’ may help predict or explain changes in response variable; usually on horizontal axis Response variable, y, measures an outcome of a study, usually on vertical axis

BIVARIATE DATA DISTRIBUTIONS For example... Alcohol (explanatory) and body temperature (response). Generally, the more alcohol consumed, the lower the body temperature. Still use caution with ‘cause.’ Sometimes we don’t have variables that are clearly explanatory and response. Sometimes neither variable is clearly the explanatory one, such as ACT scores and SAT scores, or activity level and physical fitness. Discuss with a partner for 1 minute; come up with another situation (or look at the board) where we have two variables that are related, but neither is clearly explanatory or response.

GRAPHICAL MODELS… Many graphing models display uni-variate data exclusively (review). Discuss for 30 seconds and share out. Main graphical representation used to display bivariate data (two quantitative variables) is scatterplot.

SCATTERPLOTS Scatterplots show relationship between two quantitative variables measured on the same individuals or objects. Each individual/object in data appears as a point (x, y) on the scatterplot. Plot explanatory variable (if there is one) on horizontal axis. If no distinction between explanatory and response, either can be plotted on horizontal axis. Label both axes. Scale both axes with uniform intervals (but scales don’t have to match)

LABEL & SCALE SCATTERPLOT VARIABLES: CLEARLY EXPLANATORY AND RESPONSE??

CREATING & INTERPRETING SCATTERPLOTS Let’s collect some data; work in pairs; do rock-paper-scissors to choose which person will be measured; on the board write your height (to nearest ½ inch) and your hand span (to nearest 16th of an inch) Input into Stat Crunch & create scatter plot; which is our explanatory and which is our response variable? Let’s do some predicting... to the best of our ability...

INTERPRETING SCATTERPLOTS Look for overall patterns (DOFS) including: direction: up or down, + or – association? outliers/deviations: individual value(s) falls outside overall pattern; no outlier rule for bi-variate data – unlike uni-variate data form: linear? curved? clusters? gaps? strength: how closely do the points follow a clear form? Strong, weak, moderate?

MEASURING LINEAR ASSOCIATION Scatterplots (bi-variate data) show direction, outliers/ deviation(s), form, strength of relationship between two quantitative variables Linear relationships are important; common, simple pattern; linear relationships are our focus in this course Linear relationship is strong if points are close to a straight line; weak if scattered about Other relationships (quadratic, logarithmic, etc.)

LET’S GO BACK TO HEIGHT AND HAND SPAN... With a partner, look at the scatter plot and analyze it through DOFS (direction, outlier(s), form, strength) Three minutes... If someone has a height of 59”, what would you (generally) expect their hand span to be? If someone has a height of 71”, what would you (generally) expect their hand span to be?

CREATING & INTERPRETING SCATTERPLOTS Go to my website, download the COC Math 140 Survey Data Fall 2015 OR Spring Copy & paste columns (‘Height’ And ‘Weight’) Is data messy? Does it need to be ‘fixed?’... Hint, scan for ordered pairs (this is bivariate data); each and every point must be an ordered pair. Graph it; do we need to evaluate any points (any possible inaccuracies?)

CREATING & INTERPRETING SCATTERPLOTS ‘Height’ & ‘Weight’ Create a scatter plot of the data. Analyze (DOFS) Let’s do some predictions... It is difficult to do predictions sometimes? We will get back to this with a ‘better’ model...

HOW STRONG ARE THESE RELATIONSHIPS? WHICH ONE IS STRONGER?

MEASURING LINEAR ASSOCIATION: CORRELATION OR “R” Sometimes our eyes are not a good judge Need to specify just how strong or weak a linear relationship is with bivariate data Need a numeric measure Correlation or ‘r’

MEASURING LINEAR ASSOCIATION: CORRELATION OR “R” * Correlation (r) is a numeric measure of direction and strength of a linear relationship between two quantitative variables Correlation (r) is always between -1 and 1 Correlation (r) is not resistant (look at formula; based on mean) r doesn’t tell us about individual data points, but rather trends in the data * Never calculate by formula; use Stat Crunch (dependent on having raw data)

CALCULATING CORRELATION “R”

MEASURING LINEAR ASSOCIATION: CORRELATION OR “R” r ≈ 0 ► not strong linear relationship r close to 1 ► strong positive linear relationship r close to -1 ► strong negative linear relationship Go back to our height/hand span data & calculate ‘r,’ correlation; then practice calculating ‘r’ with our height and weight data (in Stat Crunch, stats, summary stats, correlation)
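For anyone curious what Stat Crunch is doing behind the ‘correlation’ menu, here is a minimal sketch in Python. It uses the standard textbook definition of r (the average product of z-scores, dividing by n - 1), which may or may not be displayed the same way on the formula slide. The function name `correlation` and the height/hand span numbers are hypothetical, made up for illustration; they are not the class data.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r as the average product of z-scores (sample version, n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    # Each term is (z-score of x) * (z-score of y) for one individual.
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical heights (inches) and hand spans (inches), NOT the class data.
heights = [60, 62, 65, 67, 70, 72]
hand_spans = [7.0, 7.5, 7.75, 8.25, 8.5, 9.0]

r = correlation(heights, hand_spans)
print(f"r = {r:.3f}")  # close to 1: strong positive linear relationship
```

Because the formula is built from means and standard deviations, a single unusual point can move r a lot, which is why r is not resistant.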

GUESS THE CORRELATION (ALSO STAT CRUNCH) ‘March Madness’ bracket-style Guess the Correlation tournament Playing cards; match up head-to-head competition/rounds Look at a scatterplot, make your guess Student who is closest survives until the next round

CORRELATION & REGRESSION APPLET PARTNER ACTIVITY Go to Go to applets Go to Correlation & Regression Now download (from my website) COC Math 140 Chapter 4 Correlation Partner Activity & follow the directions. Partner up with someone you have not partnered with yet; this should take no more than minutes, including the write-up; print out & turn in with both your names on it.

CAUTION… INTERPRETING CORRELATION Note: be careful when addressing form in scatterplots Strong positive linear relationship ► correlation ≈ 1 But Correlation ≈ 1 does not necessarily mean relationship is linear; always plot data!

R ≈ FOR EACH OF THESE

FACTS ABOUT CORRELATION Correlation doesn’t care which variable is considered explanatory and which is considered response; can switch x & y; still same correlation (r) value Try with height & hand span; try with height & weight data from Math 140 Fall 2015 data CAUTION! Switching x & y WILL change your scatterplot; try with our data sets!… just won’t change ‘r’

FACTS ABOUT CORRELATION r is in standard units, so r doesn’t change if units are changed If we change from yards to feet, or years to months, or gallons to liters... r is not affected + r, positive association - r, negative association
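A quick way to convince yourself of these facts (switching x & y, changing units) is to try them on a small data set. The sketch below assumes hypothetical heights and weights (not the Math 140 survey data) and a helper named `correlation` written just for this example.

```python
from statistics import mean

def correlation(xs, ys):
    """Pearson r computed from sums of squares and cross-products."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data, NOT the class survey.
heights_in = [60, 62, 65, 67, 70, 72]
weights_lb = [115, 130, 140, 155, 170, 190]

# Change units: inches -> centimeters, pounds -> kilograms.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]

r_original = correlation(heights_in, weights_lb)
r_metric = correlation(heights_cm, weights_kg)    # units changed
r_swapped = correlation(weights_lb, heights_in)   # x and y switched

print(f"original: {r_original:.4f}")
print(f"metric:   {r_metric:.4f}")
print(f"swapped:  {r_swapped:.4f}")
```

All three printed values agree (up to rounding), even though the scatterplots would look different.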

FACTS ABOUT CORRELATION Correlation is always between -1 & 1 Makes no sense for r = 13 or r = -5 r = 0 means no linear relationship (r near 0 means a very weak one) r = 1 or -1 means a perfect linear association (r near 1 or -1 means a strong one)

FACTS ABOUT CORRELATION Both variables must be quantitative, numerical. Doesn’t make any sense to discuss r for qualitative or categorical data Correlation is not resistant (like mean and SD). Be careful using r when outliers are present (think of the formula, think of our partner activity)

FACTS ABOUT CORRELATION r isn’t enough! … if we just consider r, it could be misleading; we must also consider the distribution’s mean, standard deviation, graphical representation, etc. Correlation does not imply causation; e.g., # of ice cream sales in a given week and # of pool accidents

ABSURD EXAMPLES… CORRELATION DOES NOT IMPLY CAUSATION… Did you know that eating chocolate makes winning a Nobel Prize more likely? The correlation between per capita chocolate consumption and the number of Nobel laureates per 10 million people for 23 selected countries is r = Did you know that statistics is causing global warming? As the number of statistics courses offered has grown over the years, so has the average global temperature!

LEAST SQUARES REGRESSION Last section… scatterplots of two quantitative variables r measures strength and direction of linear relationship of scatterplot

WHAT WOULD WE EXPECT THE SODIUM LEVEL TO BE IN A HOT DOG THAT HAS 170 CALORIES?

LEAST SQUARES REGRESSION BETTER model to summarize overall pattern by drawing a line on scatterplot Not any line; we want a best-fit line over scatterplot Least Squares Regression Line (LSRL) or Regression Line

LEAST-SQUARES REGRESSION LINE

LET’S DO SOME PREDICTING BY USING THE LSRL... About how much would a home cost if it were: 2,000 square feet? 2,600 square feet? 1,600 square feet?

LET’S DO SOME PREDICTING BY USING THE LSRL... About how large would a home be if it were worth: $450,000? $350,000? $220,000? Also, let’s discuss where the x and y axes start...

LEAST SQUARES REGRESSION EQUATION TO PREDICT VALUES

LEAST SQUARES REGRESSION EQUATION Typical to be asked to interpret slope & y-intercept of the equation of the LSRL, in context. Caution: Interpret the slope of the equation of the LSRL as the predicted (average, expected) change in the response variable for a one-unit change in the explanatory variable, NOT as the actual change in y for a unit change in x; the LSRL is a model, and models are not perfect

INTERPRET SLOPE & Y- INTERCEPT... Notice the embedded context in the equation of the LSRL

LSRL: OUR DATA Go back to our data (height & hand span; then height & weight). Create scatter plot; then put LSRL on our scatter plot; also determine the equation of the LSRL Stat Crunch: stat, regression, simple linear, x variable, y variable, graphs, fitted line plot
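If you want to see what the ‘simple linear’ regression menu is computing, here is a minimal sketch using the standard least-squares formulas (slope = Sxy / Sxx, intercept = ȳ - slope · x̄). The function name `lsrl` and the data are hypothetical, chosen only for illustration.

```python
from statistics import mean

def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical heights (inches) and hand spans (inches), NOT the class data.
heights = [60, 62, 65, 67, 70, 72]
hand_spans = [7.0, 7.5, 7.75, 8.25, 8.5, 9.0]

a, b = lsrl(heights, hand_spans)
print(f"hand span-hat = {a:.2f} + {b:.3f} * height")

# Predict within the range of the data (66 inches is between 60 and 72).
predicted = a + b * 66
print(f"predicted hand span at 66 inches: {predicted:.2f}")
```

In context, the slope says each additional inch of height goes with a predicted increase of about 0.16 inch in hand span for this made-up data. The y-intercept is negative, which makes no sense as the hand span of someone 0 inches tall; that is a reminder that the intercept is often outside the range of the data.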

LSRL: OUR DATA Look at graph of our LSRL for our data Look at our LSRL equation for our data Our line fits scatterplot well (best fit) but not perfectly Make some predictions… do we use our graph or our equation? Which is easier? Which is better? More on this in a minute... Interpret our y-intercept; does it make sense? Interpretation of our slope?

ANOTHER EXAMPLE… VALUE OF A TRUCK

TRUCK EXAMPLE…

AGES & HEIGHTS… Age (years) | Height (inches)

LET’S REVIEW FOR A MOMENT… Input data into Stat Crunch Create scatterplot and describe scatterplot (what do we include in a description?) Calculate r (different from slope; why?), equation of LSRL; interpret equation of LSRL in context; does y-intercept make sense? Create a graph of LSRL Based on the graph of the LSRL or the equation of the LSRL (you choose), make a prediction as to the height of a person at age 35.

LSRL: OUR DATA Extrapolation: Use of a regression line (or equation of a regression line) for prediction outside the range of values of the explanatory variable, x, used to obtain the line/equation of the line. Such predictions are often not accurate. Friends don’t let friends extrapolate!
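One way to keep yourself honest is to have your prediction step flag any x value outside the range used to fit the line. The sketch below assumes a hypothetical line, height-hat = 30 + 2.5 · age, imagined as fitted to children ages 2 to 12; the function name `predict` and every number here are made up for illustration.

```python
def predict(intercept, slope, x, x_min, x_max):
    """Return (prediction, is_extrapolation) for y-hat = intercept + slope * x.

    x_min and x_max are the smallest and largest x values used to fit the line.
    """
    y_hat = intercept + slope * x
    return y_hat, not (x_min <= x <= x_max)

# Hypothetical LSRL: height-hat (inches) = 30 + 2.5 * age, fitted on ages 2-12.
for age in (10, 35):
    height, extrapolated = predict(30, 2.5, age, x_min=2, x_max=12)
    note = "EXTRAPOLATION, do not trust" if extrapolated else "within data range"
    print(f"age {age}: predicted height {height} inches ({note})")
```

At age 35 this made-up line predicts a height of 117.5 inches, nearly 10 feet, which is exactly the kind of nonsense extrapolation produces.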

CALCULATING THE EQUATION OF THE LSRL: WHAT IF WE DON’T HAVE THE RAW DATA?

CALCULATING THE EQUATION FOR THE LSRL: WHAT IF WE DON’T HAVE THE RAW DATA?

EXAMPLE: CREATING EQUATION OF LSRL (WITHOUT RAW DATA)
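A minimal sketch of this calculation, assuming the standard textbook formulas: slope b = r · (s_y / s_x) and y-intercept a = ȳ - b · x̄. The function name `lsrl_from_summary` and the summary statistics below are hypothetical, not values from the slides.

```python
def lsrl_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (intercept, slope) of the LSRL from summary statistics only."""
    if s_x <= 0:
        raise ValueError("s_x must be positive")
    slope = r * s_y / s_x
    # The LSRL always passes through the point (x-bar, y-bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical summary statistics: heights (inches) and weights (pounds).
a, b = lsrl_from_summary(x_bar=66, s_x=4, y_bar=150, s_y=20, r=0.8)
print(f"weight-hat = {a:.1f} + {b:.1f} * height")
```

With these made-up numbers the slope is 4 pounds per inch and the intercept is -114 pounds, so plugging in the mean height of 66 inches gives back the mean weight of 150 pounds.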

DETOUR… MEMORY MONDAY (OR WAY-BACK WEDNESDAY)… What is r? What is r’s range? r tells us how linear (and direction) scatterplot is. ‘r’ ranges from -1 to 1. ‘r’ describes the scatterplot only (not LSRL) Why do we want/need ‘r’?

NOW…

NOW... We need a numerical measurement that tells us how well the LSRL fits/accurately describes the scatter plot points, the data. Coefficient of Determination, or r²

COEFFICIENT OF DETERMINATION … Do all the points on the scatterplot fall exactly on the LSRL? Sometimes too high and sometimes too low Is LSRL a good model to use for a particular data set? How well does our model fit our data?

COEFFICIENT OF DETERMINATION OR R² “R-sq” in software (Stat Crunch) output Always 0 ≤ r² ≤ 1 Never calculate by hand; always use Stat Crunch No need to memorize formula; trust me... It’s ugly!

COEFFICIENT OF DETERMINATION OR R² Interpretation of r²: We say, “x% of the variation in (y variable) is explained by the least squares regression line relating (y variable) to (x variable).” Let’s practice calculating r² and interpreting it for height and hand span data; and height and weight data. Stat, regression, simple linear,... Remember this describes the LSRL, not the scatter plot
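For the curious, here is a sketch of what “R-sq” measures, assuming the usual definition: 1 minus (sum of squared residuals from the LSRL) divided by (total sum of squared deviations of y from its mean). For a straight-line fit this equals the square of the correlation r. The function name `r_squared` and the data are hypothetical.

```python
from statistics import mean

def r_squared(xs, ys):
    """Coefficient of determination for the least-squares line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    # Variation left over after using the line (squared residuals).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation in y around its mean.
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical heights (inches) and hand spans (inches), NOT the class data.
heights = [60, 62, 65, 67, 70, 72]
hand_spans = [7.0, 7.5, 7.75, 8.25, 8.5, 9.0]

r2 = r_squared(heights, hand_spans)
print(f"{100 * r2:.1f}% of the variation in hand span is explained by "
      "the least squares regression line relating hand span to height")
```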

GENERAL FACTS TO REMEMBER ABOUT BIVARIATE DATA Distinction between explanatory and response variables. If switched, scatterplot changes and LSRL changes (but what doesn’t change?) LSRL minimizes the sum of the squared vertical distances from the data points to the line (vertical distances only)

GENERAL FACTS TO REMEMBER ABOUT BIVARIATE DATA

CORRELATION & REGRESSION WISDOM Which of the following scatterplots has the highest correlation?

CORRELATION & REGRESSION WISDOM All r = 0.816; all have same exact LSRL equation Lesson: Always graph your data! … because correlation and regression describe only linear relationships
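You can check this lesson numerically. The sketch below assumes the four scatterplots are the well-known Anscombe quartet (the plots themselves are not reproduced here, but r = 0.816 with identical LSRLs matches it); the data values are the published quartet, and the helper name `summarize` is made up for this example.

```python
from statistics import mean

def summarize(xs, ys):
    """Return (r, intercept, slope) for one data set."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return sxy / (sxx * syy) ** 0.5, y_bar - slope * x_bar, slope

# Anscombe's quartet (assumed to be the data behind the four plots).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (r, a, b) in results.items():
    print(f"{name}: r = {r:.3f}, y-hat = {a:.2f} + {b:.3f}x")
```

The four summaries are nearly identical, yet only one of the four data sets is a plain linear pattern; the numbers alone cannot tell you that.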

CORRELATION & REGRESSION WISDOM Correlation and regression describe only linear relationships

CORRELATION & REGRESSION WISDOM Correlation is not causation! Association does not imply causation… want a Nobel Prize? Eat some chocolate! How about Methodist ministers & rum imports? Table columns: Year | Number of Methodist Ministers in New England | Cuban Rum Imported to Boston (in # of barrels)

BEWARE OF NONSENSE ASSOCIATIONS… r = , but no economic relationship between these variables Strong association is due entirely to the fact that both imports & health spending grew rapidly in these years. Common year is other variable. Any two variables that both increase over time will show a strong association. Doesn’t mean one explains the other or influences the other

CORRELATION & REGRESSION WISDOM Correlation is not resistant; always plot data and look for unusual trends. … what if Bill Gates walked into a bar?

CORRELATION & REGRESSION WISDOM

OUTLIERS & INFLUENTIAL POINTS All influential points are outliers, but not all outliers are influential points. Outliers: observations that lie outside the overall pattern

OUTLIERS & INFLUENTIAL POINTS Influential points/observations: If removed would significantly change LSRL (slope and/or y-intercept)

INPUT FOLLOWING DATA... Calculate the equation of the LSRL # Hours Spent Studying for the Stats Test Percentage Earned on the Stats Test

INPUT FOLLOWING DATA... Now, calculate the equation of the LSRL again with this additional piece of data... What do you observe about the scatter plot and the equation of the LSRL? # Hours Spent Studying for the Stats Test Percentage Earned on the Stats Test

INPUT FOLLOWING DATA... Now, calculate the equation of the LSRL again with this (slightly different) set of data... What do you observe about the scatter plot and the equation of the LSRL? # Hours Spent Studying for the Stats Test Percentage Earned on the Stats Test
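Here is a sketch of this kind of before-and-after comparison, using hypothetical study-time numbers (not the data from the slides) and a helper named `lsrl` written for this example. One extra point far from the others in the x direction is enough to change the line completely.

```python
from statistics import mean

def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: hours studied and percentage earned on the test.
hours = [1, 2, 3, 4, 5]
percent = [60, 65, 70, 75, 80]

a_without, b_without = lsrl(hours, percent)

# Add one hypothetical student who studied 15 hours but earned only 50%.
a_with, b_with = lsrl(hours + [15], percent + [50])

print(f"without the point: percent-hat = {a_without:.1f} + {b_without:.2f} * hours")
print(f"with the point:    percent-hat = {a_with:.1f} + {b_with:.2f} * hours")
```

With the extra point included, the slope flips from positive to negative, so that point is influential for this made-up data.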

CLASS ACTIVITY…
1. Groups of 2; go to COC Math 140 Fall 2015 data OR Spring 2016 data; choose 2 numerical categories that you believe are associated. Be sure to go through your data and ‘clean’ it up; justify any ‘cleaning’ you do.
2. Create a scatterplot and describe the association between the two variables using DOFS. Calculate the correlation of the scatter plot (r).
3. Do you think that a regression line is appropriate for your data? Why or why not?
4. Even if you believe a line is not appropriate for your data, go ahead and create the LSRL graph & calculate the equation of the LSRL; calculate the coefficient of determination (r²) & interpret r².
5. Interpret the slope and the y-intercept of the LSRL in context.
Continue on next slide for more questions....

CLASS ACTIVITY…
6. Make a prediction (you can use your LSRL graph or your equation of the LSRL; your choice).
7. If there are any outliers and/or influential points on your scatter plot, circle them in red and label them appropriately as ‘outlier’ and/or ‘influential point.’
8. Print everything, put each group member’s name on it, and turn it in.