i.e. How to get an A on the big project I’m about to assign you…

Presentation transcript:

Regression Notes, i.e. How to get an A on the big project I’m about to assign you…

PHASE I. Scatter-Plots and Bivariate (2-variable) Statistics

Describing Variables. One-Variable Descriptors: SHAPE, CENTER, SPREAD, anything unusual? Two-Variable Descriptors: LINEARITY? DIRECTION? SCATTER? anything unusual?

What do you see in these scatter plots? [Scatter plot: Mean January Air Temperatures for 30 U.S. Locations; x-axis: Latitude (°N), y-axis: Temperature (°C).] LINEAR TREND, NEGATIVE ASSOCIATION, CONSTANT SCATTER. ANYTHING UNUSUAL?

What do you see in these scatter plots? [Scatter plot: % of population who are Internet Users vs GDP per capita for 202 Countries; x-axis: GDP per capita (thousands of dollars), y-axis: Internet Users (%).] NON-LINEAR TREND, POSITIVE ASSOCIATION, NON-CONSTANT SCATTER and an OUTLIER!!!

What do you see in these scatter plots? [Scatter plot: Average Age Americans are First Married; x-axis: Year, y-axis: Age.] 2 SEPARATE, NON-LINEAR TRENDS (…gap in data in 1940s?), NEGATIVE ASSOCIATION UNTIL ~1970, THEN POSITIVE, NO SCATTER

What to look for in scatter plots: 1. Trend: Linear or non-linear?

What to look for in scatter plots: 1. Trend: Positive or negative association?

What to look for in scatter plots: 2. Scatter: Strong or weak relationship?

What to look for in scatter plots: 2. Scatter: Constant or non-constant scatter?

What to look for in scatter plots: 3. Anything unusual: Outlier

What to look for in scatter plots: 3. Anything unusual: Groupings

Rank relationships: weakest (1) to strongest (4). [Four scatter plots, ranked 4, 2, 1, 3.]

Correlation Coefficient r

Correlation Coefficient little r – what is it? r measures the strength of a linear relationship. Your calculator will find it for you. -1 ≤ r ≤ 1. r is a multiple of the slope.
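For anyone curious what the calculator is doing when it reports r, here is a minimal Python sketch. The helper name `correlation` and the latitude/temperature values are made up for illustration; they are not the data behind the slides.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r for paired data (always between -1 and 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented example: temperature falling steadily as latitude rises
latitude = [35, 37, 39, 41, 43, 45]
temperature = [19.5, 18.8, 17.6, 16.9, 15.7, 14.8]

r = correlation(latitude, temperature)
print(round(r, 3))  # strongly negative, close to -1
```

A value near -1 or 1 means the points sit close to a straight line; the sign matches the direction of the association.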

r – when can it be used? Only use r if the scatter plot is linear. [Scatter plot of y against x with r = 0.99]

r – when can it be used? Don’t use r if the scatter plot is non-linear! [Scatter plot with r = 0.00]
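A quick sketch of why r is the wrong tool for a curved pattern. The points below are invented: they lie exactly on the curve y = x², so y is perfectly predictable from x, yet r comes out as zero because r only measures straight-line association.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up points on a perfect parabola, symmetric about x = 0
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curve = correlation(xs, ys)
print(r_curve)  # essentially zero despite the perfect (curved) relationship
```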

r – when can it be used? Tick the plots where it’s OK to use a correlation coefficient to describe the strength of the relationship:

r – what does it tell you? How close the points in the scatter plot come to lying on the line. [Example scatter plots of y against x: r = 0.99 and r = 0.57.] Difficult Ones

Playing with Outliers (1)… [Scatter plot: LINEAR TREND, POSITIVE ASSOCIATION, MOSTLY CONSTANT SCATTER and an OUTLIER!!!] What will happen to the correlation coefficient if we remove the tallest 12th grader? Bigger or smaller? Hint: …correlation measures how linear the data is.

Playing with Outliers (2)… [Scatter plot: LINEAR TREND, POSITIVE ASSOCIATION, CONSTANT SCATTER and an OUTLIER!!!] What will happen to the correlation coefficient if we remove the elephant? Bigger or smaller? Hint: …make your brain zoom in on that main cluster of points.
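The two outlier questions can be explored with a small experiment. The numbers below are made up (they are not the 12th-grader or elephant data), and the sketch only illustrates that removing an outlier can move r in either direction, depending on where the outlier sits relative to the trend.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Case 1: a tight linear cluster plus one point far OFF the trend (last point).
x1 = [1, 2, 3, 4, 5, 6, 3]
y1 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.0, 12.0]
r1_with = correlation(x1, y1)
r1_without = correlation(x1[:-1], y1[:-1])

# Case 2: a loose cluster plus one far-away point IN LINE with the trend (last point).
x2 = [1, 2, 3, 4, 5, 50]
y2 = [3, 1, 4, 2, 5, 50]
r2_with = correlation(x2, y2)
r2_without = correlation(x2[:-1], y2[:-1])

print(round(r1_with, 2), round(r1_without, 2))  # removing the off-trend point raises r
print(round(r2_with, 2), round(r2_without, 2))  # removing the far in-line point lowers r
```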

PHASE II. Lurking Variables

Guess which are correlated with Test Scores? 4 are… 4 aren’t… Highly educated parents; Mom’s age >30 at birth; Mom stays home until Kindergarten; Intact family (live with mom and dad); Attended Head Start program; Parents have money; Move to a better neighborhood; Low birthweight (including premature)

Guess which are correlated with Test Scores? 3 are… 4 aren’t… Parents speak English; Family goes to museums, zoos, concerts…; Parents involved in PTA; Child spanked regularly; Watches TV a lot; Parents own a lot of books; Child is read to every day

Life Expectancy Example. [Scatter plot: Life Expectancy and Availability of Doctors for a Sample of 40 Countries; x-axis: People per Doctor, y-axis: Life Expectancy.] Non-linear trend, negative association, fairly constant scatter. Can you suggest how to increase life expectancy in a country? Get fewer people per doctor! Duh!

Life Expectancy Example. [Scatter plot: Life Expectancy and Availability of Televisions for a Sample of 40 Countries; x-axis: People per Television, y-axis: Life Expectancy.] Can you suggest how to increase life expectancy in a country? Get fewer people per TV?!? BEWARE LURKING VARIABLES!!!

Kinds of Lurking Variables (1) CAUSATION. “People who take showers have better organizational skills.” Perceived correlation: Shower (x) and Organized (y). Maybe changes in x CAUSE changes in y.

Kinds of Lurking Variables (2) CAUSATION again... but in reverse. “People who take showers have better organizational skills.” Perceived correlation: Shower (x) and Organized (y). Maybe changes in y CAUSE changes in x.

Kinds of Lurking Variables (3) COMMON RESPONSE. “People who take showers have better organizational skills.” Perceived correlation: Shower (x) and Organized (y); Good Habits in General (z). Maybe something else, z, is causing changes in both x and y at the same time!

Kinds of Lurking Variables (4) CONFOUNDING. “People who take showers have better organizational skills.” Perceived correlation: Shower (x) and Organized (y); Good Habits in General (z). …we don’t know which variable (x or z) is causing the changes in y. They’re hopelessly mixed up with each other.
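The common-response case (3) can be simulated. In this sketch everything is invented: a hypothetical "good habits" score z drives both a showering score x and an organization score y, and neither x nor y has any effect on the other. The seed, sample size, and noise levels are arbitrary choices.

```python
import math
import random

def correlation(xs, ys):
    """Correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)  # fixed seed so the run is repeatable
n = 2000

# z: the lurking variable (hypothetical "good habits in general" score)
z = [random.gauss(0, 1) for _ in range(n)]
# x and y each depend on z plus their own independent noise, not on each other
shower = [zi + random.gauss(0, 0.5) for zi in z]
organized = [zi + random.gauss(0, 0.5) for zi in z]

r_observed = correlation(shower, organized)

# Subtract z's contribution from both variables: only independent noise remains
r_without_z = correlation(
    [s - zi for s, zi in zip(shower, z)],
    [o - zi for o, zi in zip(organized, z)],
)

print(round(r_observed, 2))   # strong positive correlation
print(round(r_without_z, 2))  # close to zero once z is accounted for
```

In real data z is usually not measured, which is exactly why it lurks.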

LURKING VARIABLES Heh, Heh, Heh…

How Regression gets you in Trouble… Famous examples of strong correlations: Instances of drunkenness in those below 18 years of age are significantly lower than for those above. (Clearly children can hold their drink better than adults)

How Regression gets you in Trouble… Famous examples of strong correlations: Whenever ice cream sales rise, so does the number of shark attacks. (eating ice cream makes you tastier?)

How Regression gets you in Trouble… Famous examples of strong correlations: As vocabulary in infancy rises, so does appetite. (learning words makes you hungry?)

How Regression gets you in Trouble… Famous examples of strong correlations: The more fire trucks you send to a fire, the worse the damage is. (fire trucks cause damage?)

How Regression gets you in Trouble… Famous examples of strong correlations: The more you pay teachers in a town, the more expensive alcohol is.

How Regression gets you in Trouble… Famous examples of strong correlations: In Scandinavia, storks appear more often on the rooftops of families with more babies.

How Regression gets you in Trouble… Famous examples of strong correlations: Left-handed people die earlier than right-handed people. (No. Older people grew up in an era when being left-handed was discouraged. Righties are more common among older people; lefties are more common among the young. When you look at deaths, lefties die younger because lefties in general are younger!)

Deer and cattle orient themselves along a north/south axis when grazing.

How Regression gets you in Trouble… Correlation is not Causation

The story: The smoking ban in Wales "caused" a 13% fall in heart attacks from October to December 2007, compared with the same period in 2006. The flaw: The ban began in April. In April, we also observed a 13% fall in heart attacks. Presumably the ban "caused" me to spill my coffee, for that happened during April too.

!! TRADE UNIONS SECURE BETTER PAY !! See? Union membership can get you as much as 30% more pay!!

!! TRADE UNIONS SECURE BETTER PAY !! Perceived correlation: Union Membership (x) and Better Pay! (y). Possible lurking variables (z): Education level of employee; Experience level of employee; Age of employee.

Gapminder

PHASE III. Residuals and Least Squares Regression Lines (LSRL)

Residuals = Actual – Predicted. [Plot: prediction line y = 5 + 2x.] The actual point is (8, 25). The predicted point is (8, 21), since 5 + 2(8) = 21. Residual = Actual – Predicted: 4 = 25 – 21
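The residual on this slide can be reproduced in a few lines. The line y = 5 + 2x and the point (8, 25) come from the slide; the function name `predict` is just an illustrative choice.

```python
def predict(x, intercept=5.0, slope=2.0):
    """Predicted y on the slide's prediction line y = 5 + 2x."""
    return intercept + slope * x

actual_x, actual_y = 8, 25          # the actual point from the slide
predicted_y = predict(actual_x)     # where the line says y should be
residual = actual_y - predicted_y   # Residual = Actual - Predicted

print(predicted_y, residual)  # 21.0 4.0
```

A positive residual means the actual point sits above the line; a negative one means it sits below.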

“Actual” – “Predicted” for each point: 7, -3, 17, 1, -10, -4

Least Squares Regression: We’ll try to get the Least Squares. Σ (Resids)² = 439.2988
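To make "Σ (Resids)²" concrete, the sketch below squares and adds the six residuals listed on the previous slide (7, -3, 17, 1, -10, -4). This assumes those six numbers are the complete set for one candidate line. The transcript does not say which line or data produced the 439.2988 figure, so the two totals are not expected to match.

```python
# Residuals as listed on the "Actual - Predicted" slide
residuals = [7, -3, 17, 1, -10, -4]

# Square each residual so positives and negatives cannot cancel, then add them up
sum_of_squares = sum(e ** 2 for e in residuals)

print(sum(residuals))   # 8   (plain sum: positives and negatives mostly cancel)
print(sum_of_squares)   # 464 (sum of squares: every miss counts)
```

"Least squares" means choosing the line that makes this second total as small as possible.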

Least Squares Regression Line Facts: There is one and only one LSRL for every set of bivariate data. Σ Residuals = 0 (just like the deviations from the mean when you compute the st.dev). Your calculator will give you the one equation with the “least” amount of squares: the “Least Squares Regression Line” (LSRL). The LSRL must go through the point (x̄, ȳ), the mean of x and the mean of y. You’ll only have to calculate the LSRL by hand once (…heh, heh)
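These facts can be checked numerically. The sketch below uses five made-up points (not data from the slides) and the standard slope and intercept formulas; the function names are illustrative. It then compares the fitted line with the nearby line y = 5 + 2x to show that the fitted one has the smaller sum of squared residuals.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of (actual - predicted)^2 for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Invented data for illustration only
xs = [4, 6, 8, 10, 12]
ys = [14, 15, 25, 24, 30]

intercept, slope = least_squares_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

residual_total = sum(residuals)                       # fact: residuals sum to 0
gap_at_mean = (intercept + slope * mean_x) - mean_y   # fact: line passes through the mean point
ssr_lsrl = sum_squared_residuals(xs, ys, intercept, slope)
ssr_other = sum_squared_residuals(xs, ys, 5.0, 2.0)   # a different, nearby line

print(round(intercept, 2), round(slope, 2))
print(round(residual_total, 9), round(gap_at_mean, 9))
print(round(ssr_lsrl, 2), round(ssr_other, 2))  # the LSRL total is the smaller one
```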

PHASE IV. Your Project

Example of analysis: “Going Crackers”. An example of the type of work you’ll be doing for your REGRESSION ASSIGNMENT. You start with some raw data… 1. Predict the energy content of a cracker with 25% fat content. 2. If you reduced the salt content by 100 mg, how would the fat change?

Example of analysis: “Going Crackers”. 1. Predict the energy content of a cracker with 25% fat content. [Scatter plot: ENERGY against FAT.] (ENERGY) = 380 + 4.98 (FAT) = 380 + 4.98 (25) = 504.5
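The prediction is just the regression equation with the fat value plugged in. A minimal sketch, using the coefficients 380 and 4.98 from the slide; the function name is an illustrative choice, and the energy units are whatever the original data used (the transcript does not say).

```python
def predict_energy(fat_percent, intercept=380.0, slope=4.98):
    """Predicted energy content from the slide's line ENERGY = 380 + 4.98 * FAT."""
    return intercept + slope * fat_percent

energy_at_25 = predict_energy(25)
print(round(energy_at_25, 1))  # 504.5
```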

Example of analysis: “Going Crackers”. 2. If you reduced the salt content by 100 mg, how would the fat change? [Scatter plot: FAT against SALT.] (FAT) = -2.7 + 0.0237 (SALT). The predicted fat content would drop by 0.0237 × 100 ≈ 2.4 percentage points.

Problem 2 Analysis: The data suggest a linear trend. The association is positive with constant scatter about the trend line. It is reasonable to do a linear regression.

Problem 2 Analysis: The LSRL is y = -2.7 + 0.0237x. The slope of the fitted line is 0.0237, which tells us that, on average, each 100 mg decrease in salt content is associated with a decrease in total fat content of about 2.4 percentage points (0.0237 × 100 ≈ 2.4). The moderate relationship (r = 0.69) means that predicting such a drop will not necessarily be highly accurate.
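The slope interpretation is a one-line calculation: the predicted change in y is the slope times the change in x. The sketch below uses the slide's slope of 0.0237 and assumes salt is measured in mg and fat in percent, as the analysis above implies; the function name is illustrative.

```python
def predicted_change(slope, change_in_x):
    """Predicted change in y on a regression line for a given change in x."""
    return slope * change_in_x

slope_fat_on_salt = 0.0237  # from the LSRL: FAT = -2.7 + 0.0237 * SALT
fat_change = predicted_change(slope_fat_on_salt, -100)  # salt reduced by 100 mg

print(round(fat_change, 2))  # -2.37, i.e. about 2.4 percentage points less fat
```

Note that the intercept (-2.7) drops out: it shifts every prediction equally, so it has no effect on the change between two predictions.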