Describing relationships …


1 Describing relationships …

2 Relationships between ...
Talk to the person next to you. Think of two things that you believe may be related. For example, height and weight are generally related... The taller the person, generally, the more they weigh. Share out.

3 Do you believe there is a relationship between...
Time spent studying and GPA? Number of cigarettes smoked daily and life expectancy? Salary and education level? Age and height? Age of an automobile and its value? Possibly discuss categorical vs. numerical data (spiral)

4 Relationships When we consider (possible) relationships between 2 (numeric) variables, the data is referred to as bi-variate data. There may or may not exist a relationship/an association between the 2 variables. Does one variable ‘cause’ the other? Caution! Does one variable influence the other? Or is the relationship influenced by another variable(s) that we are unaware of?

5 CSI Stats: The case of Mrs. Flynn’s missing bear…
Mrs. Flynn’s bear is missing. The only people with access to Mrs. Flynn’s room are the other GV math teachers. She asks her colleagues if they know where the bear is; no one confesses to the crime. But the next day, Mrs. Flynn catches a break: she finds a clear handprint. The careless culprit has left behind crucial evidence! At this point, Mrs. Flynn calls in the CSI Stats team (your class) to help her identify the prime suspect in “The Case of the Missing Bear.” Could also, or instead, use number of hours you slept last night and number of hours you spent on HW; does there seem to be a relationship?

6 Starting our investigation…
1. Ask a partner to measure your right hand span to the nearest centimeter (cm). Hand span is the maximum distance from the tip of the thumb to the tip of the pinkie finger on a person’s fully stretched-out hand. 2. Write your measured hand span as well as your height (both in cm) on the board with two columns, hand span (cm) and height (cm). Use a blue marker if you are male; use a black marker if you are female. Note: to calculate your height in cm, multiply your height in inches by 2.54. 3. Copy the data table onto your paper. Next, let’s plot our ordered pairs (hand span, height). Draw your axes and label the horizontal axis “Hand span (cm)” and the vertical axis “Height (cm).” Scaling… Give each person a piece of graph paper

7 Investigation continued…
4. Since neither hand span nor height can be close to 0 cm, we want to start our horizontal and vertical scales at larger numbers. Scale the horizontal axis in 0.5-cm increments starting with 15 cm. Scale the vertical axis in 5-cm increments starting with 135 cm. 5. What does the graph tell us about the relationship between hand span and height? Summarize your observations in a sentence or two (on your graph paper). 6. Now Mrs. Flynn will give you a copy of the handprint found at the scene and the math department roster. Which math teacher does your group believe is the “prime suspect”? Justify your answer with appropriate statistical evidence. Let’s see if that teacher really does have the bear.

8 Note… keep this data! We will be using it again soon.

9 Bivariate Data Proceed similarly as uni-variate distributions … Still graph (use model to describe data; scatter plot; LSRL) Still look at overall patterns and deviations from those patterns (DOFS) Still analyze numerical summary (2-var stats) DOFS – direction, outlier(s), form, strength

10 Bivariate Distributions
Explanatory variable, x, ‘factor,’ may help predict or explain changes in response variable; usually on horizontal axis Response variable, y, measures an outcome of a study, usually on vertical axis Like hand span helped us predict the height… helping us find our bear thief

11 Bivariate Data Distributions
For example ... Alcohol (explanatory) and body temperature (response). Generally, the more alcohol consumed, the lower the body temperature. Still use caution with ‘cause.’ Sometimes we don’t have variables that are clearly explanatory and response. Sometimes there could be two ‘explanatory’ variables. Examples: Discuss with a partner for 1 minute

12 Explanatory & Response or Two Explanatory Variables?
ACT Score and SAT Score Activity level and physical fitness SAT Math and SAT Verbal Scores

13 Graphical models… Many graphing models display uni-variate data exclusively (review). Discuss for 30 seconds and share out. Main graphical representation used to display bivariate data (two quantitative variables) is scatterplot.

14 Scatterplots Scatterplots show relationship between two quantitative variables measured on the same individuals Each individual in data appears as a point (x, y) on the scatterplot. Plot explanatory variable (if there is one) on horizontal axis. If no distinction between explanatory and response, either can be plotted on horizontal axis. Label both axes. Scale both axes with uniform intervals (but scales don’t have to match) Like bear thief example

15 Label & Scale Scatterplot Variables: Clearly Explanatory and Response??
Can either be explanatory and response? Yes, in this case. Notice it doesn’t start at 0 on either axis. That’s ok for scatterplots… can be deceptive for bar graphs though.

16 Creating & Interpreting Scatterplots
Let’s go back to our hand span and height data. Input hand span into L1 and height into L2 (careful when inputting). Calculator: a DIM MISMATCH error means the two lists have different lengths

17 Creating & Interpreting Scatterplots
2nd – stat plot make sure only one plot is on select first graph (picture of scatterplot) reference L1 and L2 choose the mark you wish to use zoom – 9 (forces your data to fit your screen) trace: provides (x,y) coordinate for each point on scatterplot

18 Interpreting Scatterplots
Look for overall patterns (DOFS) including: direction: up or down, + or – association? outliers/deviations: individual value(s) that fall outside the overall pattern; no outlier rule for bi-variate data, unlike uni-variate data. form: linear? curved? clusters? gaps? strength: how closely do the points follow a clear form? (“r” describes strength of linear relationships only) DOFS – direction, outlier(s), form, strength

19 Scatterplots: Note Might be asked to graph a scatterplot from data Might need to sketch what’s on calculator Use trace to do so Doesn’t have to be 100% exactly accurate; do your best Scaling, labeling: a must!

20 Adding Categorical Variables to Scatterplots
Input male data into L3 & L4 and female data into L5 & L6 Turn on two stat plots (one with L3 & L4 and the other with L5 & L6) Be sure to choose different marks for each scatterplot (like + for one and . for the other) This embeds categorical data into scatterplot

21 Measuring Linear Association: Correlation or “r”
Scatterplots (bi-variate data) show direction, outliers/deviation(s), form, strength of relationship between two quantitative variables. Linear relationships are important; common, simple pattern. Linear relationship is strong if points are close to a straight line; weak if scattered about. Other relationships (quadratic, logarithmic, etc.) We need more than ‘strong’ or ‘weak’… leads us into r or correlation

22 How strong are these relationships? Which one is stronger?

23 Measuring Linear Association: Correlation or “r”
Eyes are not a good judge Need to specify just how strong or weak a linear relationship is Need a numeric measure Correlation or ‘r’

24 Measuring Linear Association: Correlation or “r”
* Correlation (r) measures direction and strength of a linear relationship between two quantitative variables Correlation (r) is always between -1 and 1 Correlation (r) is not resistant (look at formula; based on mean) * Never calculate by formula; use calculator (dependent on having raw data)

25 Measuring Linear Association: Correlation or “r”
r ≈ 0 → not a strong linear relationship; r close to 1 → strong positive linear relationship; r close to -1 → strong negative linear relationship

26 To estimate the correlation from a scatterplot, have students imagine drawing an oval around the points. The rounder the oval, the closer the correlation will be to 0. The longer and skinnier the oval, the closer the correlation will be to + or – 1. Rounder ovals look like 0’s and skinnier ovals look like tilted 1’s.

27 Guess the correlation www.rossmanchance.com/applets
‘March Madness’ bracket-style Guess the Correlation tournament. Number off; randomly choose numbers to match up head-to-head competition/rounds. Look at a scatterplot, each write down your guess on notecards and reveal at same time. Student who is closest survives until the next round. Page 151 in TE in textbook; make # cards for choosing

28 Correlation & regression applet
then statistical applets, then Correlation & Regression applet Work with one partner Follow activity directions; press red ‘clear’ button after every scatter plot you create You and your partner are to verbally discuss all questions, but only need to turn in your summaries Should take less than 15 minutes total; book computers in core

29 Caution… interpreting correlation
So, the moral of the story is… be careful when addressing form in scatterplots Strong positive linear relationship ► correlation ≈ 1 But Correlation ≈ 1 does not necessarily mean relationship is linear; always plot data! Y=x^2; correlation = 0.97; these are not bi-conditional (bi-conditionals go either way and are still true)
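The caution above can be checked numerically. The following is a minimal sketch, not part of the original deck: it assumes x-values 0 through 10 (chosen only for illustration) and a small helper, `pearson_r`, written for this example. Every point lies exactly on the curve y = x^2, yet r still comes out close to 1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = list(range(11))          # assumed x-values: 0, 1, ..., 10
ys = [x ** 2 for x in xs]     # every point lies on the parabola y = x^2
r_parabola = pearson_r(xs, ys)
print(round(r_parabola, 2))   # high, even though the form is curved
```

The exact value depends on which x-values are used, which is why plotting the data matters more than the number itself.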

30 Calculating Correlation “r”
$n$, $x_1, x_2, \dots$, $\bar{x}$, $y_1, y_2, \dots$, $\bar{y}$, $s_x$, $s_y$, … Not the way to go; explain formula
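For reference, the symbols on this slide belong to the standard definition of the sample correlation. The slide's own rendering of the formula did not survive in the transcript, so the version below is the usual textbook form, offered as an assumption about what was shown:

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Each factor is a z-score, so r is an average of products of standardized values and carries no units.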

31 Calculating Correlation “r”
Always use your calculator; never calculate by hand 2 lists --bivariate data (same # of data in each list; like (x, y)) Be sure diagnostic is on (catalog, d, arrow down, enter) stat – calc- linreg (8) – two lists used for data Now, interpret the value of r relating to our data (remember, r gives us information about direction and strength of scatterplot) Let’s practice using our hand span and height data from our class.
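For readers without the calculator at hand, the same computation can be sketched in Python. The hand span and height values below are made up for illustration and are not the class data; `correlation` is a helper written for this sketch.

```python
from statistics import mean, stdev

# Made-up measurements in cm; NOT the values collected in class.
hand_span = [17.0, 18.5, 19.0, 20.0, 21.5, 22.0, 23.0]
height = [152.0, 160.0, 163.0, 168.0, 175.0, 178.0, 185.0]

def correlation(xs, ys):
    """r as the sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)

r = correlation(hand_span, height)
print(f"r = {r:.3f}")   # positive and close to 1 for this made-up data
```

Both lists must have the same length, which mirrors the calculator's requirement that L1 and L2 hold the same number of entries.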

32 Facts about Correlation
Correlation doesn’t care which variable is considered explanatory and which is considered response. Can switch x & y: L1, L2 or L2, L1. Still same correlation (r) value. CAUTION! Switching x & y WILL change your scatterplot… just not ‘r’

33 Facts about Correlation
r is in standard units, so r doesn’t change if units are changed. If we change from yards to feet, r is not affected. + r, positive association; - r, negative association
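Both facts (swapping x and y, and changing units) can be confirmed with a short sketch. The distances and times below are hypothetical, and `pearson_r` is a helper written for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical distances (yards) and times (seconds).
yards = [10, 20, 30, 40, 50]
seconds = [2.1, 3.9, 6.2, 7.8, 10.1]
feet = [3 * y for y in yards]            # same distances, new units

r_yards = pearson_r(yards, seconds)
r_feet = pearson_r(feet, seconds)        # units changed
r_swapped = pearson_r(seconds, yards)    # x and y switched
print(r_yards, r_feet, r_swapped)        # all three agree
```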

34 Facts about Correlation
Correlation is always between -1 & 1. Makes no sense for r = 13 or r = -5. r = 0 means no linear relationship (r near 0 means a weak one). r = 1 or -1 means a perfect linear association. Only makes sense with linear relationships

35

36 Facts about Correlation
Both variables must be quantitative, numerical. Doesn’t make any sense to discuss r for qualitative or categorical data Correlation is not resistant (like mean and SD). Be careful using r when outliers are present
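A small sketch of what "not resistant" means in practice. The data are invented so the arithmetic is easy to follow: six points on a perfect line, then one added outlier.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data: six points exactly on the line y = 2x.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]
r_clean = pearson_r(xs, ys)
print(r_clean)          # 1.0

# Add a single outlier at (7, 0) and recompute.
r_with_outlier = pearson_r(xs + [7], ys + [0])
print(r_with_outlier)   # 0.25
```

One point out of seven moves r from a perfect 1 down to 0.25, which is why the scatterplot should be inspected before r is trusted.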

37 Facts about Correlation
r isn’t enough! … mean, standard deviation, graphical representation. Correlation does not imply causation; e.g., # students who own cell phones and # students passing AP exams

38 Absurd examples… correlation does not imply causation…
Did you know that eating chocolate makes winning a Nobel Prize more likely? The correlation between per capita chocolate consumption and the number of Nobel laureates per 10 million people for 23 selected countries is r = Did you know that AP Statistics is causing global warming? As the number of AP Statistics exams has grown over the years, so has the average global temperature!

39 Homework… Page 159, # 3, 5, 7, 9, 12, 15, 17, 23, 25. MC: Page 163, #27 – 32; then FRQ as review (page 164 #34) 3-1 Section Quiz: tomorrow

40 Least Squares Regression
Last section… scatterplots of two quantitative variables r measures strength and direction of linear relationship of scatterplot

41 About how much sodium would we expect in a hot dog that has 170 calories?
Hard to estimate from SP; better if we had a line on the SP…. LSRL

42 Least Squares Regression
BETTER to summarize overall pattern by drawing a line on scatterplot Not any line; we want a best-fit line over scatterplot Least Squares Regression Line (LSRL)

43 Least-Squares Regression Line

44 Least Squares Regression (predicts values)
LSRL Model: $\hat{y} = a + bx$. $\hat{y}$ is predicted value of response variable; a is y-intercept of LSRL; b is slope of LSRL; slope is predicted (expected) rate of change; x is explanatory variable

45 Least Squares Regression (predicts values)
Often will be asked to interpret slope of LSRL & y-intercept, in context. Caution: Interpret slope of LSRL as the predicted or average change or expected change in the response variable given a unit change in the explanatory variable; NOT change in y for a unit change in x; LSRL is a model; models are not perfect

46 LSRL: Our Data Go back to whole-class data on hand-span and height Create scatterplot; & do 2-var stats Now let’s determine LSRL for our data

47 LSRL: Our Data Stat – calc – linreg – L1, L2, vars – yvars – enter – enter LinReg y = a + bx (careful… $\hat{y} = a + bx$… or variables in context) a = ___ b = ___ $r^2$ = ___ r = ___

48 LSRL: Our Data AND, as a bonus, you get your LSRL graphed for you (due to the vars – yvars – enter – enter) on top of your scatterplot if you wish Go to y = and be sure nothing else is selected Go to stat plot and be sure only your scatterplot is selected

49 LSRL: Our Data Look at graph of our LSRL for our data Look at our LSRL equation for our data Our line fits scatterplot well (best fit) but not perfectly Make some predictions… what if our hand span was … Interpret our y-intercept; does it make sense? Interpretation of our slope?

50 Another example… value of a truck
Data points do not fit exactly on line; best fit; only prediction, expected values given a certain # of miles driven; it’s a model… models are not perfect; what if we had driven truck 100,000 miles… what do we expect truck would be valued at?; and vice versa

51 Truck example… Suppose we were given the LSRL equation for our truck data as $\widehat{\text{price}} = 38{,}257 - 0.1629(\text{miles driven})$. We want to find a more precise estimation of the value if we have driven 100,000 miles. Use the LSRL equation. Using graph, estimate price if we have driven 40,000 miles. Then use the above LSRL equation to calculate the predicted value of the truck. (100,000 , 21,967); (40,000 , 31,741); sometimes given predicted price, what is the approximate mileage according to our model? Sometimes use graph and sometimes use LSRL equation
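The two predictions on this slide, and the "vice versa" question of recovering mileage from a predicted price, can be worked as follows. The intercept and slope come from the slide; the function names are chosen for this sketch.

```python
INTERCEPT = 38257.0   # predicted price in dollars at 0 miles (from the slide)
SLOPE = -0.1629       # predicted change in price per mile driven (from the slide)

def predicted_price(miles):
    """Predicted price from the truck LSRL."""
    return INTERCEPT + SLOPE * miles

def miles_for_price(price):
    """Solve the LSRL equation for miles, given a predicted price."""
    return (price - INTERCEPT) / SLOPE

print(round(predicted_price(100_000)))   # 21967
print(round(predicted_price(40_000)))    # 31741
print(round(miles_for_price(21967)))     # 100000
```

These are predicted values from a model, not guaranteed prices for any particular truck.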

52 Ages & Heights…
Age (years): (missing), 1, 4, 5, 8
Height (inches): 18, 28, 40, 42, 49
Input into calculators

53 Let’s review for a moment, shall we …
Input into lists. Create scatterplot and describe scatterplot (what do we include in a description?) Calculate r (btw, different from slope; why?), $r^2$, equation of LSRL; interpret equation of LSRL in context; does y-intercept make sense? Based on this data, make a prediction as to the height of a person at age 25.

54 LSRL: Our Data Extrapolation: Use of a regression line for prediction outside the range of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate. Friends don’t let friends extrapolate! Maybe look at our data from hand-span and height. What if we had a handspan of 1 cm? Does that even make sense?

55 Calculating the equation of the LSRL: What if we don’t have the raw data?
We still can calculate the equation for the LSRL, but it is a little more time consuming. Note: Every LSRL goes through the point $(\bar{x}, \bar{y})$. Formula for slope of LSRL: $b = r \frac{s_y}{s_x}$. LSRL: $\hat{y} = a + bx$

56 Calculating the equation for the LSRL: What if we don’t have the raw data?
Equation of LSRL: $\hat{y} = a + bx$. If you do not have raw data, but still need to calculate a LSRL, you will be given: $\bar{x}$, $\bar{y}$, $r$ (or $r^2$), $s_y$, and $s_x$. Remember, $(\bar{x}, \bar{y})$ is an ordered pair that is on the graph of the LSRL

57 Example: Creating Equation of LSRL (without raw data)
$\widehat{BAL}$ = a + b (# of beers consumed) (equation of LSRL in context – better than x & y) Remember, slope formula of LSRL: $b = r \frac{s_y}{s_x}$. Givens: $\bar{x} = 4.8125$, $\bar{y}$ = , $s_x = 2.1975$, $s_y = 0.0441$, and $r^2 = 0.80$. Calculate slope for equation of LSRL

58 Example: Creating Equation of LSRL (without raw data)
$\widehat{BAL}$ = a + b (# of beers consumed) Givens: $\bar{x} = 4.8125$, $\bar{y}$ = , $s_x = 2.1975$, $s_y = 0.0441$, and $r^2 = 0.80$. So, slope = b = 0.0179. Remember, equations of all LSRL’s go through $(\bar{x}, \bar{y})$… so what’s next?

59 Example: Creating Equation of LSRL (without raw data)
$\widehat{BAL}$ = a + b (# of beers consumed) Givens: $\bar{x} = 4.8125$, $\bar{y}$ = , $s_x = 2.1975$, $s_y = 0.0441$, and $r^2 = 0.80$. $\bar{y} = a + b\bar{x}$. Substitute $(\bar{x}, \bar{y})$ into equation

60 Example: Creating Equation of LSRL (without raw data)
= a + (0.0179) ( ) and solve for ‘a’. $\widehat{BAL}$ = a + b (# of beers consumed). $\widehat{BAL}$ = (# of beers consumed)
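The two steps of this example (slope from r, s_y, and s_x, then intercept from the point of means) can be sketched as below. The value of the mean of y did not survive in this transcript, so only the slope is evaluated with the slide's givens; the function names are chosen for this sketch, and taking r as the positive square root of 0.80 is an assumption based on the positive association between beers and BAL.

```python
from math import sqrt

def slope_from_summary(r, s_x, s_y):
    """Slope of the LSRL: b = r * s_y / s_x."""
    return r * s_y / s_x

def intercept_through_means(x_bar, y_bar, b):
    """Intercept chosen so the line passes through (x_bar, y_bar)."""
    return y_bar - b * x_bar

# Givens from the slide. r^2 = 0.80 allows r = +0.894 or -0.894;
# the positive root is assumed here.
s_x, s_y, r_squared = 2.1975, 0.0441, 0.80
r = sqrt(r_squared)

b = slope_from_summary(r, s_x, s_y)
print(round(b, 4))   # 0.0179, matching the slope used on the slide
```

Once the mean of y is known, `intercept_through_means(4.8125, y_bar, b)` would give a, completing the equation.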

61 Statistical Software Output

62 Interpreting Software output…
Age vs. Gesell Score Gesell score; when first word was spoken and a later aptitude test

63

64 If we have raw data, how does the calculator identify the line of best fit (aka LSRL)?

65 If we have raw data, how does the calculator identify the line of best fit (aka LSRL)?
LSRL is the line that creates the least “left-overs”: it makes the sum of the squared residuals as small as possible

66 If we have raw data, how does the calculator identify the line of best fit (aka LSRL)?
Correlation and regression applet. Remember, all of this is for linear data; the model (LSRL) is for linear data only. Demo to class; purposely create a SP that is curved also
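What "least squares" means can be seen in a small sketch. The data are made up, and the closed-form slope and intercept used here are the standard least-squares formulas rather than a description of the calculator's internals.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar      # forces the line through (x-bar, y-bar)
    return a, b

def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.1, 5.9, 8.3, 9.8]

a, b = fit_lsrl(xs, ys)
best = sse(xs, ys, a, b)

# Nudging the intercept or slope away from the fit leaves more "left-over".
nudged = [sse(xs, ys, a + 0.5, b),
          sse(xs, ys, a, b + 0.2),
          sse(xs, ys, a - 0.3, b - 0.1)]
print(a, b, best, nudged)
```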

67 Detour… throw back Thursday…
What is r? What is r’s range? r tells us how linear (and direction) scatterplot is. ‘r’ ranges from -1 to 1. ‘r’ describes the scatterplot only (not LSRL)

68 Now… We need a measurement/graph that tells us how well the LSRL fits. Graphical tool: Residuals Plot. Numeric tool: Coefficient of Determination, or $r^2$

69 Graphical Tool: Residuals Plot
Do all the points on the scatterplot fall exactly on the LSRL? Sometimes too high and sometimes too low Is LSRL a good model to use for a particular data set? “Residuals” or left-overs

70 Graphical Tool: Residuals Plot
Difference between observed value of response variable and value predicted by regression line (LSRL). Residuals = observed responses – expected responses. Resids = observed y – predicted y = $y - \hat{y}$
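As a worked instance of residual = observed y – predicted y, the sketch below reuses the truck LSRL given earlier in the deck. The observed price of $33,000 for a truck with 40,000 miles is hypothetical, as is the function name.

```python
INTERCEPT = 38257.0   # from the truck LSRL in this deck
SLOPE = -0.1629       # from the truck LSRL in this deck

def residual(miles, observed_price):
    """Observed price minus the price predicted by the LSRL."""
    predicted = INTERCEPT + SLOPE * miles
    return observed_price - predicted

# Hypothetical truck: 40,000 miles, actually priced at $33,000.
res = residual(40_000, 33_000)
print(round(res))   # 1259
```

A positive residual means the point sits above the line (the model under-predicted); a negative residual means it sits below.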

71 Graphical Tool: Residuals Plot
Go back to our whole-class hand span and height data we collected. stat – calc – LSRL a + bx – L1, L2, y1. You get a, b, $r^2$, r, and … this process automatically creates a list of your residuals

72 Graphical Tool: Residuals Plot
We plot the residuals (left overs, points on scatterplot that are above or below LSRL) to determine if a line is the best model to describe our scatterplot of bivariate data Perhaps a line isn’t the best model…. Maybe a quadratic curve or a log curve or square root function is a better model for the data

73 Graphical Tool: Residuals Plot
Let’s graph our residuals and see if our distribution is best modeled by LSRL Need residuals list (obviously) before we can graph residuals; so must go through previous process to get residuals list 2nd – stat plot, only one plot on, select scatterplot (first graph option), explanatory list for xlist, resid for ylist, zoom - 9

74 Graphical Tool: Residuals Plot (truck example)
On left is scatter plot and LSRL; on right is residuals plot

75 Graphical Tool: Residuals Plot
To show that a linear model fits scatterplot data well, residuals plot should have no obvious pattern, random, unstructured In the below case, linear model is a good model for the data

76 Graphical Tool: Residuals Plot
If there is an obvious pattern, prediction model of LSRL (or a line) may not be the best model to use to describe scatterplot data.

77 Graphical Tool: Residuals Plot
Residuals ideally should be relatively ‘small’ How do we determine ‘small?’ Up for debate. ‘s’ in software output is standard deviation of residuals

78 Standard deviation of residuals…
You will never have to calculate by hand; will always be given to you.

79 Numerical Tool: Coefficient of Determination or $r^2$
Remember, we need a measurement/graph that tells us how well the LSRL fits the data (how well does linear model fit the data). Graphical tool: Residuals Plot. Now, numeric tool: Coefficient of Determination, or $r^2$

80 Numerical Tool: Coefficient of Determination or $r^2$
“R-sq” software output. Always $0 \le r^2 \le 1$. Never calculate by hand; always use calculator. No need to memorize formula

81 Numerical Tool: Coefficient of Determination or $r^2$
Remember “r”: correlation, direction and strength of linear relationship of scatterplot; $-1 \le r \le 1$. $r^2$, coefficient of determination: fraction of the variation in the values of y that is explained by LSRL; describes the LSRL; $0 \le r^2 \le 1$

82 Numerical Tool: Coefficient of Determination or $r^2$
Interpretation of $r^2$: We say, “x% of variation in (y variable) is explained by LSRL relating (y variable) to (x variable).”

83 Residuals Plot vs. $r^2$ Always do both, always.
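Although the deck says not to compute the coefficient of determination by hand, seeing it once can make "fraction of variation explained" concrete. This sketch uses made-up data and the standard identity: 1 minus (sum of squared residuals) / (total variation in y). It then checks that this matches the square of the correlation.

```python
from math import sqrt

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.1, 5.9, 8.3, 9.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y

b = sxy / sxx            # LSRL slope
a = y_bar - b * x_bar    # LSRL intercept

# Variation left over after using the line (sum of squared residuals).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse / syy
r = sxy / sqrt(sxx * syy)
print(round(r_squared, 4), round(r ** 2, 4))   # the two agree
```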

84 Facts to remember about LSRL
Distinction between explanatory and response variables. If switched, scatterplot changes and LSRL changes (but what doesn’t change?) LSRL minimizes distances from data points to line only vertically

85 Facts to remember about LSRL
$b = r \frac{s_y}{s_x}$. Close relationship between correlation (r) and slope of LSRL; but r and b are (often) not the same; when would r and b have the same value? LSRL always passes through $(\bar{x}, \bar{y})$. Don’t have to have raw data to identify the equation of LSRL

86 Facts to remember about LSRL
Correlation (r) describes direction and strength of straight-line relationships in scatterplots. Coefficient of determination ($r^2$) is the fraction of variation in values of y explained by LSRL

87 Correlation & Regression Wisdom
Which of the following scatterplots has the highest correlation?

88 Correlation & Regression Wisdom
All r = 0.816; all have same exact LSRL equation Lesson: Always graph your data! … because correlation and regression describe only linear relationships
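The slide's figure is not in the transcript. The value r = 0.816 with identical LSRLs matches the well-known Anscombe quartet, so this sketch assumes that is what was shown and uses the first two of its four data sets: one roughly linear, one clearly curved.

```python
from math import sqrt

def r_and_line(xs, ys):
    """Return (r, a, b): correlation plus LSRL intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx
    return sxy / sqrt(sxx * syy), y_bar - b * x_bar, b

# Anscombe's data sets I and II (assumed to be the ones pictured).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

r1, a1, b1 = r_and_line(x, y1)   # roughly linear scatter
r2, a2, b2 = r_and_line(x, y2)   # smooth curve, not a line
print(round(r1, 3), round(a1, 2), round(b1, 2))
print(round(r2, 3), round(a2, 2), round(b2, 2))
```

The summaries agree to about three decimal places even though only the first data set is reasonably described by a line.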

89 Correlation & Regression Wisdom
Correlation and regression describe only linear relationships

90 Correlation & Regression Wisdom
Correlation is not causation! Association does not imply causation… want a Nobel Prize? Eat some chocolate! How about Methodist ministers & rum imports?
Year | Number of Methodist Ministers in New England | Cuban Rum Imported to Boston (in # of barrels)
1860 | 63 | 8,376
1865 | 48 | 6,506
1870 | 53 | 7,005
1875 | 64 | 8,486
1890 | 85 | 11,265
1900 | 80 | 10,547
1915 | 140 | 18,559
Sometimes there are ‘nonsense’ variables.
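Computing r for the ministers and rum figures on this slide shows how strong a nonsense association can look. The data are taken from the slide; `pearson_r` is a helper written for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Values from the slide, for 1860, 1865, 1870, 1875, 1890, 1900, 1915.
ministers = [63, 48, 53, 64, 85, 80, 140]
rum_barrels = [8376, 6506, 7005, 8486, 11265, 10547, 18559]

r_nonsense = pearson_r(ministers, rum_barrels)
print(round(r_nonsense, 3))   # very close to 1
```

A correlation this close to 1 says nothing about ministers causing rum imports; both simply rose over the same years.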

91 Beware of nonsense associations…
r = , but no economic relationship between these variables Strong association is due entirely to the fact that both imports & health spending grew rapidly in these years. Common year is other variable. Any two variables that both increase over time will show a strong association. Doesn’t mean one explains the other or influences the other

92 Correlation & Regression Wisdom
Correlation is not resistant; always plot data and look for unusual trends.

93 Correlation & Regression Wisdom
Extrapolation! Don’t do it… ever. Example: Growth data from children from age 1 month to age 12 years … LSRL: predicted height = 1.5 ft + 0.25 (age in years). What is the predicted height of a 40-year-old?

94 Outliers & Influential Points
All influential points are outliers, but not all outliers are influential points.

95 Outliers & Influential Points
Outlier: observation lies outside overall pattern Points that are outliers in the ‘y’ direction of scatterplot have large residuals. Points that are outliers in the ‘x’ direction of scatterplot may not necessarily have large residuals.

96 Outliers & Influential Points
Influential points/observations: If removed would significantly change LSRL (slope and/or y-intercept)
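The definition can be illustrated by fitting the LSRL with and without one suspicious point. The data below are invented: five points on the line y = x, plus one point far out in the x direction.

```python
def lsrl_slope(xs, ys):
    """Slope of the least-squares line for the given points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Invented data: five points exactly on y = x.
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]
slope_without = lsrl_slope(xs, ys)

# Add one point far to the right that does not follow the pattern.
slope_with = lsrl_slope(xs + [20], ys + [5])
print(slope_without, round(slope_with, 3))
```

Removing that single point changes the slope from roughly 0.15 back to 1, so it is influential; a point in the middle of the x-range with the same vertical miss would move the line far less.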

97 Homework… Page 193, # 35, 37, 39, 41, 45, 47, 49, 51, 53, 55, 57, 59, 69 MC: Page 198 #71 – 78; FRQs FRAPPY Chapter 3 review exercises (pg 202) Chapter 3 AP Statistics Practice Test (pg 203; 1-10 MC; FRQ).

