Describing relationships …


Relationships between ... Talk to the person next to you. Think of two things that you believe may be related. For example, height and weight are generally related... The taller the person, generally, the more they weigh. Share out.

Do you believe there is a relationship between... Time spent studying and GPA? # of cigarettes smoked daily and life expectancy? Salary and education level? Age and height? Age of an automobile and value of the automobile? Possibly discuss categorical vs. numerical data (spiral)

Relationships When we consider (possible) relationships between 2 (numeric) variables, the data is referred to as bi-variate data. There may or may not exist a relationship/an association between the 2 variables. Does one variable ‘cause’ the other? Caution! Does one variable influence the other? Or is the relationship influenced by another variable(s) that we are unaware of?

CSI Stats: The Case of Mrs. Flynn’s Missing Bear… Mrs. Flynn’s bear is missing. The only people with access to Mrs. Flynn’s room are the other GV math teachers. She asks her colleagues if they know where the bear is; no one confesses to the crime. But the next day, Mrs. Flynn catches a break: she finds a clear handprint. The careless culprit has left behind crucial evidence! At this point, Mrs. Flynn calls in the CSI Stats team (your class) to help her identify the prime suspect in “The Case of the Missing Bear.” Alternative (or additional) activity: compare # of hours you slept last night with # of hours you spent on HW; does there seem to be a relationship?

Starting our investigation… 1. Ask a partner to measure your right hand span to the nearest centimeter (cm). Hand span is the maximum distance from the tip of the thumb to the tip of the pinkie finger on a person’s fully stretched-out hand. 2. Write your measured hand span as well as your height (both in cm) on the board with two columns, hand span (cm) and height (cm). Use a blue marker if you are male; use a black marker if you are female. Note: to calculate your height in cm, multiply your height in inches by 2.54. 3. Copy the data table onto your paper. Next, let’s plot our ordered pairs (hand span, height). Draw your axes and label the horizontal axis “Hand span (cm)” and the vertical axis “Height (cm).” Scaling… Give each person a piece of graph paper

Investigation continued… 4. Since neither hand span nor height can be close to 0 cm, we want to start our horizontal and vertical scales at larger numbers. Scale the horizontal axis in 0.5-cm increments starting with 15 cm. Scale the vertical axis in 5-cm increments starting with 135 cm. 5. What does the graph tell us about the relationship between hand span and height? Summarize your observations in a sentence or two (on your graph paper). 6. Now Mrs. Flynn will give you a copy of the handprint found at the scene and the math department roster. Which math teacher does your group believe is the “prime suspect”? Justify your answer with appropriate statistical evidence. Let’s see if that teacher really does have the bear.

Note… keep this data! We will be using it again soon.

Bivariate Data Proceed similarly as uni-variate distributions … Still graph (use model to describe data; scatter plot; LSRL) Still look at overall patterns and deviations from those patterns (DOFS) Still analyze numerical summary (2-var stats) DOFS – direction, outlier(s), form, strength

Bivariate Distributions Explanatory variable, x, ‘factor,’ may help predict or explain changes in response variable; usually on horizontal axis Response variable, y, measures an outcome of a study, usually on vertical axis Like hand span helped us predict the height… helping us find our bear thief

Bivariate Data Distributions For example ... Alcohol (explanatory) and body temperature (response). Generally, the more alcohol consumed, the lower the body temperature. Still use caution with ‘cause.’ Sometimes we don’t have variables that are clearly explanatory and response. Sometimes there could be two ‘explanatory’ variables. Examples: Discuss with a partner for 1 minute

Explanatory & Response or Two Explanatory Variables? ACT Score and SAT Score Activity level and physical fitness SAT Math and SAT Verbal Scores

Graphical models… Many graphing models display uni-variate data exclusively (review). Discuss for 30 seconds and share out. Main graphical representation used to display bivariate data (two quantitative variables) is scatterplot.

Scatterplots Scatterplots show relationship between two quantitative variables measured on the same individuals Each individual in data appears as a point (x, y) on the scatterplot. Plot explanatory variable (if there is one) on horizontal axis. If no distinction between explanatory and response, either can be plotted on horizontal axis. Label both axes. Scale both axes with uniform intervals (but scales don’t have to match) Like bear thief example

Label & Scale Scatterplot Variables: Clearly Explanatory and Response?? Can either be explanatory and response? Yes, in this case. Notice it doesn’t start at 0 on either axis. That’s ok for scatterplots… can be deceptive for bar graphs though.

Creating & Interpreting Scatterplots Let’s go back to our hand span and height data. Input hand span into L1 and height into L2 (careful when inputting). Calculator: a DIM MISMATCH error means the two lists do not have the same number of entries

Creating & Interpreting Scatterplots 2nd – stat plot; make sure only one plot is on; select first graph (picture of scatterplot); reference L1 and L2; choose the mark you wish to use; zoom – 9 (forces your data to fit your screen); trace: provides (x, y) coordinate for each point on scatterplot

Interpreting Scatterplots Look for overall patterns (DOFS), including: direction: up or down, + or – association? outliers/deviations: individual value(s) that fall outside the overall pattern (no outlier rule for bi-variate data, unlike uni-variate data); form: linear? curved? clusters? gaps? strength: how closely do the points follow a clear form? (“r” describes strength of linear relationships only) DOFS – direction, outlier(s), form, strength

Scatterplots: Note Might be asked to graph a scatterplot from data Might need to sketch what’s on calculator Use trace to do so Doesn’t have to be 100% exactly accurate; do your best Scaling, labeling: a must!

Adding Categorical Variables to Scatterplots Input male data into L3 & L4 and female data into L5 & L6 Turn on two stat plots (one with L3 & L4 and the other with L5 & L6) Be sure to choose different marks for each scatterplot (like + for one and . for the other) This embeds categorical data into scatterplot

Measuring Linear Association: Correlation or “r” Scatterplots (bi-variate data) show direction, outliers/ deviation(s), form, strength of relationship between two quantitative variables Linear relationships are important; common, simple pattern Linear relationship is strong if points are close to a straight line; weak if scattered about Other relationships (quadratic, logarithmic, etc.) We need more than ‘strong’ or ‘weak’… leads us into r or correlation

How strong are these relationships? Which one is stronger?

Measuring Linear Association: Correlation or “r” Eyes are not a good judge Need to specify just how strong or weak a linear relationship is Need a numeric measure Correlation or ‘r’

Measuring Linear Association: Correlation or “r” * Correlation (r) measures direction and strength of a linear relationship between two quantitative variables Correlation (r) is always between -1 and 1 Correlation (r) is not resistant (look at formula; based on mean) * Never calculate by formula; use calculator (dependent on having raw data)

Measuring Linear Association: Correlation or “r” r ≈ 0 → not a strong linear relationship; r close to 1 → strong positive linear relationship; r close to -1 → strong negative linear relationship

To estimate the correlation from a scatterplot, have student imagine drawing an oval around the points. The rounder the oval, the closer the correlation will be to 0. The longer and skinnier the oval, the closer the correlation will be to + or – 1. Rounder ovals look like 0’s and skinnier ovals look like tilted 1’s.

Guess the correlation www.rossmanchance.com/applets ‘March Madness’ bracket-style Guess the Correlation tournament Number off; randomly choose numbers to match up head-to- head competition/rounds Look at a scatterplot, each write down your guess on notecards and reveal at same time Student who is closest survives until the next round Page 151 in TE in textbook; make # cards for choosing

Correlation & regression applet www.whfreeman.com/tps5e then statistical applets, then Correlation & Regression applet Work with one partner Follow activity directions; press red ‘clear’ button after every scatter plot you create You and your partner are to verbally discuss all questions, but only need to turn in your summaries Should take less than 15 minutes total; book computers in core

Caution… interpreting correlation So, the moral of the story is… be careful when addressing form in scatterplots. A strong positive linear relationship ► correlation ≈ 1, but correlation ≈ 1 does not necessarily mean the relationship is linear; always plot data! Example: y = x^2 can give correlation = 0.97. The statement is not a bi-conditional (bi-conditionals go either way and are still true)
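A quick numerical check of this caution, as a minimal sketch. The x-values 0 through 10 are an assumption chosen for illustration (the slide does not say which values produced 0.97), so the exact r will differ a little from the slide's figure.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r computed from raw data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Assumed x-values; y = x^2 is a curved relationship, not a linear one.
xs = list(range(0, 11))
ys = [x ** 2 for x in xs]

r_curved = correlation(xs, ys)
print(f"r = {r_curved:.3f}")  # close to 1 even though the form is curved
```

A high r here says only that a line captures much of the trend; the scatterplot is what reveals the curve.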

Calculating Correlation “r” The formula uses n, $x_1, x_2, \ldots$, $\bar{x}$, $y_1, y_2, \ldots$, $\bar{y}$, $s_x$, $s_y$, … Not the way to go; explain formula
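For reference, the usual textbook formula built from these ingredients is shown below. The slide's own formula image did not survive in the transcript, so this is the standard form rather than a copy of it.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Each factor is a z-score, which is why r has no units, and because it is built on means and standard deviations, it is not resistant.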

Calculating Correlation “r” Always use your calculator; never calculate by hand 2 lists --bivariate data (same # of data in each list; like (x, y)) Be sure diagnostic is on (catalog, d, arrow down, enter) stat – calc- linreg (8) – two lists used for data Now, interpret the value of r relating to our data (remember, r gives us information about direction and strength of scatterplot) Let’s practice using our hand span and height data from our class.
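Outside the calculator, the same value can be checked with a few lines of code. This is a minimal sketch; the six (hand span, height) pairs are hypothetical stand-ins for the class data, which is not in the slides.

```python
import statistics

# Hypothetical class data (not from the slides): hand span and height in cm.
hand_span = [18.0, 19.5, 20.0, 21.0, 22.5, 23.0]
height = [160.0, 165.0, 170.0, 172.0, 180.0, 185.0]

def correlation(xs, ys):
    """r = sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample SDs
    z_products = [((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

r = correlation(hand_span, height)
print(f"r = {r:.3f}")  # positive and close to 1 for this made-up data
```

The sign of r gives the direction and its distance from 0 gives the strength, just as with the calculator output.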

Facts about Correlation Correlation doesn’t care which variable is considered explanatory and which is considered response. Can switch x & y (L1, L2 or L2, L1); still same correlation (r) value. CAUTION! Switching x & y WILL change your scatterplot… just not ‘r’

Facts about Correlation r is computed from standardized values (it has no units), so r doesn’t change if units are changed. If we change from yards to feet, r is not affected. + r, positive association; - r, negative association
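Both facts (switching x and y, and changing units) can be verified numerically. The sketch below uses hypothetical hand span and height values, not the class data.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r computed from raw data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data in centimeters.
hand_span_cm = [18.0, 19.5, 20.0, 21.0, 22.5, 23.0]
height_cm = [160.0, 165.0, 170.0, 172.0, 180.0, 185.0]
height_in = [h / 2.54 for h in height_cm]  # same heights, in inches

r_xy = correlation(hand_span_cm, height_cm)
r_yx = correlation(height_cm, hand_span_cm)      # x and y switched
r_inches = correlation(hand_span_cm, height_in)  # units changed

print(r_xy, r_yx, r_inches)  # all three agree up to rounding error
```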

Facts about Correlation Correlation is always between -1 & 1. Makes no sense for r = 13 or r = -5. r = 0 means no linear relationship (r near 0 means a weak one). r = 1 or -1 means a perfect linear association (all points fall exactly on a line). Only makes sense with linear relationships

Facts about Correlation Both variables must be quantitative, numerical. Doesn’t make any sense to discuss r for qualitative or categorical data Correlation is not resistant (like mean and SD). Be careful using r when outliers are present

Facts about Correlation r isn’t enough! … also need mean, standard deviation, graphical representation. Correlation does not imply causation; e.g., # students who own cell phones and # students passing AP exams

Absurd examples… correlation does not imply causation… Did you know that eating chocolate makes winning a Nobel Prize more likely? The correlation between per capita chocolate consumption and the number of Nobel laureates per 10 million people for 23 selected countries is r = 0.791 Did you know that AP Statistics is causing global warming? As the number of AP Statistics exams has grown over the years, so has the average global temperature!

Homework… Page 159, # 3, 5, 7, 9, 12, 15, 17, 23, 25. MC: Page 163, #27 – 32; then FRQ as review (page 164 #34) 3-1 Section Quiz: tomorrow

Least Squares Regression Last section… scatterplots of two quantitative variables r measures strength and direction of linear relationship of scatterplot

About how much sodium would we expect in a hot dog that has 170 calories? Hard to estimate from SP; better if we had a line on the SP…. LSRL

Least Squares Regression BETTER to summarize overall pattern by drawing a line on scatterplot Not any line; we want a best-fit line over scatterplot Least Squares Regression Line (LSRL)

Least-Squares Regression Line

Least Squares Regression (predicts values) LSRL Model: $\hat{y} = a + bx$; $\hat{y}$ is predicted value of response variable; a is y-intercept of LSRL; b is slope of LSRL; slope is predicted (expected) rate of change; x is explanatory variable

Least Squares Regression (predicts values) Often will be asked to interpret slope of LSRL & y-intercept, in context. Caution: Interpret slope of LSRL as the predicted (or average, or expected) change in the response variable given a unit change in the explanatory variable, NOT as the change in y for a unit change in x. LSRL is a model; models are not perfect

LSRL: Our Data Go back to whole-class data on hand-span and height Create scatterplot; & do 2-var stats Now let’s determine LSRL for our data

LSRL: Our Data Stat – calc – linreg – L1, L2, vars – yvars – enter – enter. Output: LinReg y = a + bx (careful… write $\hat{y} = a + bx$… or variables in context), a = , b = , $r^2$ = , r =
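The a and b that LinReg reports can be reproduced by hand. A minimal sketch follows, with hypothetical hand span and height data standing in for the class data; the function name `least_squares_line` is made up for this example.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: line passes through (x-bar, y-bar)
    return a, b

# Hypothetical class data (cm), not from the slides.
hand_span = [18.0, 19.5, 20.0, 21.0, 22.5, 23.0]
height = [160.0, 165.0, 170.0, 172.0, 180.0, 185.0]

a, b = least_squares_line(hand_span, height)
print(f"predicted height = {a:.2f} + {b:.2f} * (hand span)")
```

In context, the slope reads as the predicted change in height (cm) for each additional 1 cm of hand span.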

LSRL: Our Data AND, as a bonus, you get your LSRL graphed for you (due to the vars – yvars – enter – enter) on top of your scatterplot if you wish Go to y = and be sure nothing else is selected Go to stat plot and be sure only your scatterplot is selected

LSRL: Our Data Look at graph of our LSRL for our data Look at our LSRL equation for our data Our line fits scatterplot well (best fit) but not perfectly Make some predictions… what if our hand span was … Interpret our y-intercept; does it make sense? Interpretation of our slope?

Another example… value of a truck Data points do not fit exactly on line; best fit; only prediction, expected values given a certain # of miles driven; it’s a model… models are not perfect; what if we had driven truck 100,000 miles… what do we expect truck would be valued at?; and vice versa

Truck example… Suppose we were given the LSRL equation for our truck data as $\widehat{\text{price}} = 38{,}257 - 0.1629(\text{miles driven})$. We want to find a more precise estimation of the value if we have driven 100,000 miles. Use the LSRL equation. Using graph, estimate price if we have driven 40,000 miles. Then use the above LSRL equation to calculate the predicted value of the truck. Predicted points: (100,000, 21,967); (40,000, 31,741). Sometimes we are given a predicted price and asked: what is the approximate mileage according to our model? Sometimes use graph and sometimes use LSRL equation
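The two predictions, and the reverse question (price given, mileage wanted), can be sketched in code. The coefficients come from the slide; the function names are made up for this example.

```python
INTERCEPT = 38257.0  # predicted price (dollars) at 0 miles, from the slide
SLOPE = -0.1629      # predicted change in price (dollars) per mile driven

def predicted_price(miles):
    """Predicted truck price from the slide's LSRL."""
    return INTERCEPT + SLOPE * miles

def miles_for_price(price):
    """Solve the LSRL for miles: approximate mileage at a predicted price."""
    return (price - INTERCEPT) / SLOPE

print(round(predicted_price(100_000)))  # 21967
print(round(predicted_price(40_000)))   # 31741
print(round(miles_for_price(21_967)))   # 100000
```

These are predictions from a model, so an actual truck with that mileage will usually sell for somewhat more or less.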

Ages & Heights…
Age (years) | Height (inches)
? | 18
1 | 28
4 | 40
5 | 42
8 | 49
Input into calculators

Let’s review for a moment, shall we … Input into lists. Create scatterplot and describe scatterplot (what do we include in a description?). Calculate r (btw, different from slope; why?), $r^2$, equation of LSRL; interpret equation of LSRL in context; does y-intercept make sense? Based on this data, make a prediction as to the height of a person at age 25.

LSRL: Our Data Extrapolation: Use of a regression line for prediction outside the range of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate. Friends don’t let friends extrapolate! Maybe look at our data from hand-span and height. What if we had a handspan of 1 cm? Does that even make sense?

Calculating the equation of the LSRL: What if we don’t have the raw data? We still can calculate the equation for the LSRL, but it is a little more time consuming. Note: Every LSRL goes through the point $(\bar{x}, \bar{y})$. Formula for slope of LSRL: $b = r \frac{s_y}{s_x}$. LSRL: $\hat{y} = a + bx$

Calculating the equation for the LSRL: What if we don’t have the raw data? Equation of LSRL: $\hat{y} = a + bx$. If you do not have raw data, but still need to calculate a LSRL, you will be given: $\bar{x}$, $\bar{y}$, $r$ (or $r^2$), $s_y$, and $s_x$. Remember, $(\bar{x}, \bar{y})$ is an ordered pair that is on the graph of the LSRL

Example: Creating Equation of LSRL (without raw data) $\widehat{\text{BAL}}$ = a + b (# of beers consumed) (equation of LSRL in context; better than x & y). Remember, slope formula of LSRL: $b = r \frac{s_y}{s_x}$. Givens: $\bar{x} = 4.8125$, $\bar{y} = 0.07375$, $s_x = 2.1975$, $s_y = 0.0441$, and $r^2 = 0.80$. Calculate slope for equation of LSRL

Example: Creating Equation of LSRL (without raw data) $\widehat{\text{BAL}}$ = a + b (# of beers consumed). Givens: $\bar{x} = 4.8125$, $\bar{y} = 0.07375$, $s_x = 2.1975$, $s_y = 0.0441$, and $r^2 = 0.80$. Since the association is positive, $r = \sqrt{0.80} \approx 0.894$, so slope = b = 0.894(0.0441/2.1975) ≈ 0.0179. Remember, equations of all LSRL’s go through $(\bar{x}, \bar{y})$… so what’s next?

Example: Creating Equation of LSRL (without raw data) $\widehat{\text{BAL}}$ = a + b (# of beers consumed). Givens: $\bar{x} = 4.8125$, $\bar{y} = 0.07375$, $s_x = 2.1975$, $s_y = 0.0441$, and $r^2 = 0.80$. $\hat{y} = a + 0.0179x$. Substitute $(\bar{x}, \bar{y})$ into equation

Example: Creating Equation of LSRL (without raw data) 0.07375 = a + (0.0179)(4.8125) and solve for ‘a’: a ≈ -0.0124. $\widehat{\text{BAL}}$ = a + b (# of beers consumed); $\widehat{\text{BAL}}$ = -0.0124 + 0.0179 (# of beers consumed)
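The same calculation from summary statistics can be sketched in code. The givens are the slide's; taking the positive square root of r² is an assumption that matches a positive association between beers and BAL.

```python
import math

# Givens from the slide.
x_bar, y_bar = 4.8125, 0.07375  # mean beers consumed, mean BAL
s_x, s_y = 2.1975, 0.0441       # standard deviations
r_squared = 0.80

r = math.sqrt(r_squared)  # positive root assumed (positive association)
b = r * s_y / s_x         # slope: b = r * (s_y / s_x)
a = y_bar - b * x_bar     # intercept: line passes through (x_bar, y_bar)

# Hand arithmetic usually rounds the slope first, which shifts a slightly.
b_rounded = round(b, 4)
a_from_rounded_b = y_bar - b_rounded * x_bar

print(f"b = {b:.4f}")                            # 0.0179
print(f"a (unrounded b) = {a:.4f}")              # -0.0126
print(f"a (rounded b) = {a_from_rounded_b:.4f}")  # -0.0124
```

The small gap between the two intercepts comes only from when the slope is rounded; either version predicts essentially the same BAL over the range of the data.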

Statistical Software Output

Interpreting Software output… Age vs. Gesell Score Gesell score; when first word was spoken and a later aptitude test

If we have raw data, how does the calculator identify the line of best fit (aka LSRL)?

If we have raw data, how does the calculator identify the line of best fit (aka LSRL)? LSRL is the line that creates the least “left-overs,” aka least residuals: it makes the sum of the squared residuals as small as possible

If we have raw data, how does the calculator identify the line of best fit (aka LSRL)? www.whfreeman.com/tps5e correlation and regression applet. Remember all of this is for linear data; the model (LSRL) is for linear data only. Demo to class; purposely create a SP that is curved also
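The "least left-overs" idea can be checked directly: any other line gives a larger sum of squared residuals than the LSRL does. A minimal sketch with hypothetical data follows; the two competing lines are arbitrary choices for comparison.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def sum_squared_residuals(a, b, xs, ys):
    """Sum of (observed y - predicted y)^2 for the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data (cm), not from the slides.
hand_span = [18.0, 19.5, 20.0, 21.0, 22.5, 23.0]
height = [160.0, 165.0, 170.0, 172.0, 180.0, 185.0]

a, b = least_squares_line(hand_span, height)
sse_best = sum_squared_residuals(a, b, hand_span, height)
sse_steeper = sum_squared_residuals(a, b + 0.5, hand_span, height)  # tilt the line
sse_shifted = sum_squared_residuals(a + 2.0, b, hand_span, height)  # slide it up

print(sse_best, sse_steeper, sse_shifted)  # the first value is the smallest
```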

Detour… throw back Thursday… What is r? What is r’s range? r tells us how linear (and direction) scatterplot is. ‘r’ ranges from -1 to 1. ‘r’ describes the scatterplot only (not LSRL)

Now… We need a measurement/graph that tells us how well the LSRL fits. Graphical tool: Residuals Plot. Numeric tool: Coefficient of Determination, or $r^2$

Graphical Tool: Residuals Plot Do all the points on the scatterplot fall exactly on the LSRL? Sometimes too high and sometimes too low Is LSRL a good model to use for a particular data set? “Residuals” or left-overs

Graphical Tool: Residuals Plot Residual: difference between observed value of response variable and value predicted by regression line (LSRL). Residuals = observed responses – expected responses. Resids = observed y – predicted y = $y - \hat{y}$

Graphical Tool: Residuals Plot Go back to our whole-class hand span and height data we collected. stat – calc – LinReg(a + bx) – L1, L2, Y1. You get a, b, $r^2$, r, and … this process automatically creates a list of your residuals

Graphical Tool: Residuals Plot We plot the residuals (left overs, points on scatterplot that are above or below LSRL) to determine if a line is the best model to describe our scatterplot of bivariate data Perhaps a line isn’t the best model…. Maybe a quadratic curve or a log curve or square root function is a better model for the data

Graphical Tool: Residuals Plot Let’s graph our residuals and see if our distribution is best modeled by LSRL Need residuals list (obviously) before we can graph residuals; so must go through previous process to get residuals list 2nd – stat plot, only one plot on, select scatterplot (first graph option), explanatory list for xlist, resid for ylist, zoom - 9
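The residual list that the calculator stores can be built by hand in the same way. A minimal sketch with hypothetical data follows; plotting is left out, but the (x, residual) pairs printed here are exactly the points that a residuals plot shows.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical data (cm), not from the slides.
hand_span = [18.0, 19.5, 20.0, 21.0, 22.5, 23.0]
height = [160.0, 165.0, 170.0, 172.0, 180.0, 185.0]

a, b = least_squares_line(hand_span, height)

# residual = observed y - predicted y
residuals = [y - (a + b * x) for x, y in zip(hand_span, height)]

for x, resid in zip(hand_span, residuals):
    print(f"hand span {x:5.1f}   residual {resid:6.2f}")
```

For any LSRL the residuals sum to zero, so the plot always has points on both sides of the horizontal axis; what matters is whether they show a pattern.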

Graphical Tool: Residuals Plot (truck example) On left is scatter plot and LSRL; on right is residuals plot

Graphical Tool: Residuals Plot To show that a linear model fits scatterplot data well, residuals plot should have no obvious pattern, random, unstructured In the below case, linear model is a good model for the data

Graphical Tool: Residuals Plot If there is an obvious pattern, prediction model of LSRL (or a line) may not be the best model to use to describe scatterplot data.

Graphical Tool: Residuals Plot Residuals ideally should be relatively ‘small’ How do we determine ‘small?’ Up for debate. ‘s’ in software output is standard deviation of residuals

Standard deviation of residuals… You will never have to calculate by hand; will always be given to you.
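For reference only, the standard definition behind the ‘s’ in regression output is given below. The slides themselves do not show a formula, so this is the usual textbook form.

```latex
s = \sqrt{ \frac{ \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 }{ n - 2 } }
```

It is read as the typical size of a prediction error, in the units of the response variable.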

Numerical Tool: Coefficient of Determination or $r^2$ Remember, we need a measurement/graph that tells us how well the LSRL fits the data (how well does linear model fit the data). Graphical tool: Residuals Plot. Now, numeric tool: Coefficient of Determination, or $r^2$

Numerical Tool: Coefficient of Determination or $r^2$ “R-sq” in software output. Always $0 \le r^2 \le 1$. Never calculate by hand; always use calculator. No need to memorize formula

Numerical Tool: Coefficient of Determination or $r^2$ Remember “r”: correlation, direction and strength of linear relationship of scatterplot, $-1 \le r \le 1$. $r^2$: coefficient of determination, fraction of the variation in the values of y that is explained by the LSRL; describes the LSRL, $0 \le r^2 \le 1$

Numerical Tool: Coefficient of Determination or $r^2$ Interpretation of $r^2$: We say, “x% of variation in (y variable) is explained by LSRL relating (y variable) to (x variable).”
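The phrase "fraction of variation explained" can be made concrete: compare the squared residuals from the LSRL with the squared deviations from the mean of y. The sketch below uses hypothetical data and checks that this fraction equals the square of r.

```python
import math

# Hypothetical data (cm), not from the slides.
hand_span = [18.0, 19.5, 20.0, 21.0, 22.5, 23.0]
height = [160.0, 165.0, 170.0, 172.0, 180.0, 185.0]

n = len(hand_span)
mean_x = sum(hand_span) / n
mean_y = sum(height) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hand_span, height))
sxx = sum((x - mean_x) ** 2 for x in hand_span)
syy = sum((y - mean_y) ** 2 for y in height)

b = sxy / sxx                  # LSRL slope
a = mean_y - b * mean_x        # LSRL intercept
r = sxy / math.sqrt(sxx * syy) # correlation

# Variation left over after using the LSRL vs. total variation in y.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(hand_span, height))
sst = syy
r_squared = 1 - sse / sst

print(f"r^2 = {r_squared:.3f}, r squared directly = {r ** 2:.3f}")
```

With this made-up data, roughly 98% of the variation in height is explained by the LSRL relating height to hand span.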

Residuals Plot vs. $r^2$ Always do both, always.

Facts to remember about LSRL The distinction between explanatory and response variables matters. If switched, scatterplot changes and LSRL changes (but what doesn’t change?). LSRL minimizes the (squared) vertical distances from data points to the line, not horizontal or perpendicular distances

Facts to remember about LSRL $b = r \frac{s_y}{s_x}$. Close relationship between correlation (r) and slope of LSRL; but r and b are (often) not the same; when would r and b have the same value? LSRL always passes through $(\bar{x}, \bar{y})$. Don’t have to have raw data to identify the equation of LSRL

Facts to remember about LSRL Correlation (r) describes direction and strength of straight-line relationships in scatterplots. Coefficient of determination ($r^2$) is the fraction of variation in values of y explained by LSRL

Correlation & Regression Wisdom Which of the following scatterplots has the highest correlation?

Correlation & Regression Wisdom All r = 0.816; all have same exact LSRL equation Lesson: Always graph your data! … because correlation and regression describe only linear relationships

Correlation & Regression Wisdom Correlation and regression describe only linear relationships

Correlation & Regression Wisdom Correlation is not causation! Association does not imply causation… want a Nobel Prize? Eat some chocolate! How about Methodist ministers & rum imports?
Year | Number of Methodist Ministers in New England | Cuban Rum Imported to Boston (in # of barrels)
1860 | 63 | 8,376
1865 | 48 | 6,506
1870 | 53 | 7,005
1875 | 64 | 8,486
1890 | 85 | 11,265
1900 | 80 | 10,547
1915 | 140 | 18,559
Sometimes there are ‘nonsense’ variables.
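To see just how strong this nonsense association is, the correlation for the ministers and rum table can be computed. This is a minimal sketch using the seven rows from the slide.

```python
import math

# Data from the slide's table.
ministers = [63, 48, 53, 64, 85, 80, 140]
rum_barrels = [8376, 6506, 7005, 8486, 11265, 10547, 18559]

def correlation(xs, ys):
    """Pearson correlation r computed from raw data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_rum = correlation(ministers, rum_barrels)
print(f"r = {r_rum:.4f}")  # very close to 1
```

An r this close to 1 still says nothing about ministers causing rum imports; both counts simply grew over the same years.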

Beware of nonsense associations… r = 0.9749, but no economic relationship between these variables Strong association is due entirely to the fact that both imports & health spending grew rapidly in these years. Common year is other variable. Any two variables that both increase over time will show a strong association. Doesn’t mean one explains the other or influences the other

Correlation & Regression Wisdom Correlation is not resistant; always plot data and look for unusual trends.

Correlation & Regression Wisdom Extrapolation! Don’t do it… ever. Example: Growth data from children from age 1 month to age 12 years … LSRL: $\widehat{\text{height}} = 1.5\ \text{ft} + 0.25(\text{age in years})$. What is the predicted height of a 40-year old?

Outliers & Influential Points All influential points are outliers, but not all outliers are influential points.

Outliers & Influential Points Outlier: observation lies outside overall pattern Points that are outliers in the ‘y’ direction of scatterplot have large residuals. Points that are outliers in the ‘x’ direction of scatterplot may not necessarily have large residuals.

Outliers & Influential Points Influential points/observations: If removed would significantly change LSRL (slope and/or y-intercept)
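A small numerical illustration of an influential point, as a sketch only: the six data points are hypothetical, chosen so that five lie exactly on a line and one sits far out in the x direction.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical data: five points exactly on y = 2x, plus one point far out in x.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 6, 8, 10, 5]

a_all, b_all = least_squares_line(xs, ys)              # all six points
a_trim, b_trim = least_squares_line(xs[:-1], ys[:-1])  # far point removed

print(f"with the point:    y-hat = {a_all:.2f} + {b_all:.2f}x")
print(f"without the point: y-hat = {a_trim:.2f} + {b_trim:.2f}x")

# Residual of the far point, measured from the line fitted to all six points.
far_point_residual = ys[-1] - (a_all + b_all * xs[-1])
print(f"residual of far point: {far_point_residual:.2f}")
```

Removing the single far point changes the slope from nearly flat to 2, yet that point's own residual is modest because it pulls the line toward itself, which is why outliers in the x direction do not necessarily have large residuals.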

Homework… Page 193, # 35, 37, 39, 41, 45, 47, 49, 51, 53, 55, 57, 59, 69 MC: Page 198 #71 – 78; FRQs FRAPPY Chapter 3 review exercises (pg 202) Chapter 3 AP Statistics Practice Test (pg 203; 1-10 MC; 11-13 FRQ).