June 3, 2008 - Stat 111 - Lecture 5 - Correlation
Exploring Data: Numerical Summaries of Relationships between
Statistics 111 - Lecture 5

Presentation transcript:


Administrative Notes
Homework 2 due Monday, June 8th
Start the homework tonight
– It’s long
– There are things that will confuse you
You will need JMP for the homework
– Download it (link on my website)
– Use a Wharton computer (link to the Wharton account information on website)

Outline
Correlation
Linear Regression
Linear Prediction

Course Overview
Collecting Data
Exploring Data
Probability Intro.
Inference
Comparing Variables
Relationships between Variables
Means, Proportions, Regression, Contingency Tables

Examples from Last Class
60 US Cities: Education vs. mortality
Vietnam draft lottery: draft order vs. date
[Scatterplot; labeled cities: New Orleans, Baltimore, San Jose, York, Lancaster, Denver]

How two variables vary together
We have already looked at single-variable measures of spread, such as the standard deviation
We now have pairs of observations on two variables, X and Y
We need a statistic that summarizes how they are spread out “together”

Correlation
The correlation of two variables X and Y is written r
In computing r, we divide by the standard deviations of both X and Y, so correlation has no units
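The formula shown on this slide did not survive in the transcript. A standard textbook definition of the sample correlation, consistent with the remark about dividing by both standard deviations, is sketched below. The notation ($n$ for the number of pairs, $\bar{x}$ and $\bar{y}$ for the sample means, $s_x$ and $s_y$ for the sample standard deviations) is an assumption and may differ from what the slide used.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right),
\qquad
s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2},
\quad
s_y = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2}
```

Each factor in the sum is a deviation measured in standard-deviation units, so the units of X and Y cancel.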

Correlation
Correlation is a measure of the strength of the linear relationship between variables X and Y
Correlation has a range between -1 and 1
r = 1 means the relationship between X and Y is exactly positive linear
r = -1 means the relationship between X and Y is exactly negative linear
r = 0 means that there is no linear relationship between X and Y
r near 0 means there is a weak linear relationship between X and Y
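As a rough illustration of the endpoints r = 1 and r = -1, the sketch below computes the sample correlation by hand in Python. The function name `correlation` and the small data set are made up for illustration; the course itself uses JMP, and the n - 1 convention is an assumption based on the usual textbook definition.

```python
import math

def correlation(xs, ys):
    """Sample correlation r, using the (n - 1) convention (an assumption)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations of X and Y
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    # Average product of standardized deviations
    total = sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Made-up data: exact straight lines with positive and negative slope
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = correlation(xs, [2 * x + 1 for x in xs])     # exactly positive linear
r_down = correlation(xs, [10 - 3 * x for x in xs])  # exactly negative linear
print(round(r_up, 6), round(r_down, 6))
```

Up to floating-point rounding, the two results sit at the ends of the allowed range.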

Measure of Strength
Correlation roughly measures how “tight” the linear relationship is between two variables

Examples
Education and Mortality: r = -0.51
Draft Order and Birthday: r = -0.22

Cautions about Correlation
Correlation is only a good statistic to use if the relationship is roughly linear
Correlation cannot be used to measure non-linear relationships
Always plot your data to make sure that the relationship is roughly linear!

Correlation with Non-linear Data
A high correlation does not always mean a strong linear relationship
A low correlation does not always mean no relationship (just no linear relationship)
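Both cautions can be seen on made-up data. In the sketch below (not from the lecture), Y = X² over a symmetric range is a perfect relationship with correlation zero, while Y = X³ over 1 to 10 is clearly curved yet still has a high correlation. The helper `correlation` is a hypothetical name written for this example.

```python
import math

def correlation(xs, ys):
    """Sample correlation r computed from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Perfect but non-linear relationship: a symmetric parabola
xs_sym = [-3, -2, -1, 0, 1, 2, 3]
r_parabola = correlation(xs_sym, [x ** 2 for x in xs_sym])

# Curved relationship that still produces a high correlation
xs_pos = list(range(1, 11))
r_cubic = correlation(xs_pos, [x ** 3 for x in xs_pos])

print(f"parabola: r = {r_parabola:.2f}")
print(f"cubic:    r = {r_cubic:.2f}")
```

A scatterplot would reveal the curvature in both cases immediately, which is why plotting comes before computing r.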

Linear Regression
If our X and Y variables do show a linear relationship, we can calculate a best fit line in addition to the correlation
Y = a + b·X
The values a and b together are called the regression coefficients
a is called the intercept
b is called the slope

Example: US Cities
Mortality = a + b·Education
How do we determine our “best” line, i.e. the best regression coefficients a and b?

Finding the “Best” Line
Want the line that has the smallest total “Y-residuals” (the vertical distances between each observed Y and the line)
Must square the “Y-residuals” and add them up
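The expression on this slide is missing from the transcript. Written out in the usual way, with the line Y = a + b·X and observed pairs $(x_i, y_i)$ (notation assumed here), the quantity being minimized is the sum of squared Y-residuals:

```latex
\mathrm{SSR}(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a + b\,x_i) \bigr)^2
```

Squaring keeps positive and negative residuals from cancelling each other out.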

Best values for slope and intercept
The line with the smallest total squared residuals is called the “least-squares” line
We need calculus to find the a and b that give the “least-squares” line
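The calculus leads to a closed-form answer. The sketch below uses the standard least-squares formulas (slope = sum of cross-deviations divided by sum of squared X-deviations, intercept = mean of Y minus slope times mean of X); these formulas are not shown in the transcript, and the function names and data are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) minimizing the sum of squared Y-residuals for Y = a + b*X."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Total of squared vertical distances between the data and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data for illustration only (not the lecture's city data)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(a, b, xs, ys)
# Any other line, such as this slightly nudged one, has a larger total
nudged = sum_squared_residuals(a + 0.1, b - 0.05, xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, best = {best:.4f}, nudged = {nudged:.4f}")
```

Comparing `best` with `nudged` is a quick sanity check that the fitted line really does have the smaller total.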

Example: Education and Mortality
Mortality ≈ 1353 - 38 · Education (coefficients rounded)
Negative association means negative slope b

JMP Output
Mortality ≈ 1353 - 38 · Education (coefficients rounded)
Note that JMP gives correlation in terms of r²
Need to take the square root and give it the correct sign: r = -0.51
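A minimal sketch of that conversion: take the square root of r² and attach the sign of the fitted slope. The r² value of 0.26 below is an assumption chosen to be consistent with r = -0.51; the actual JMP figure is not in the transcript, and the function name is hypothetical.

```python
import math

def r_from_r_squared(r_squared, slope):
    """Recover r from r^2, taking its sign from the regression slope."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r^2 must lie between 0 and 1")
    # copysign gives the magnitude sqrt(r^2) with the sign of the slope
    return math.copysign(math.sqrt(r_squared), slope)

# Assumed r^2 (not from the transcript); slope of about -38 from the lecture
r = r_from_r_squared(0.26, -38.0)
print(round(r, 2))
```

The square root alone would always be positive, so the slope is the only source of the sign.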

Interpreting the regression line
The slope coefficient b is the average change in the Y variable when the X variable increases by one
Eg. increasing education by one year means an average decrease of ≈ 38 deaths per 100,000
The intercept coefficient a is the average value of the Y variable when the X variable is equal to zero
Eg. an uneducated city (Education = 0) would have an average mortality of 1353 deaths per 100,000

Example: Vietnam Draft Order
Draft Order = a + b · Birthday
Slightly negative slope b means later birthdays have a lower draft order

Example: Crack Cocaine Sentences
Relationship seems to be non-linear

Sentence ≈ 90 + 0.27 · Quantity (coefficients rounded)
An extra gram of crack means an extra 0.27 months (about 8 days) of sentence on average
Intercept is not always interpretable: having no crack gets you a 90-month sentence!

Prediction
The regression equation can be used to predict our Y variable for a specific value of our X variable
Eg: Sentence ≈ 90 + 0.27 · Quantity (coefficients rounded)
A person caught with 175 grams has a predicted sentence of about 137.5 months
Eg: Mortality ≈ 1353 - 38 · Education (coefficients rounded)
A city with only 6 median years of education has a predicted mortality of about 1127 deaths per 100,000
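The prediction step is just plugging X into the fitted line. The sketch below assumes the rounded coefficients quoted in the lecture (intercept 90 and slope 0.27 for sentences; intercept 1353 and slope about -38 for mortality), so its results differ slightly from the slide's 137.5 and 1127, which presumably came from unrounded coefficients. The function name `predict` is hypothetical.

```python
def predict(intercept, slope, x):
    """Predicted Y from the regression line Y = intercept + slope * X."""
    return intercept + slope * x

# Rounded coefficients taken from the lecture's interpretation slides
sentence_months = predict(90, 0.27, 175)  # 175 grams of crack
mortality = predict(1353, -38, 6)         # 6 median years of education

print(f"Predicted sentence:  {sentence_months:.2f} months")
print(f"Predicted mortality: {mortality} deaths per 100,000")
```

With these rounded inputs the predictions come out near 137 months and 1125 deaths per 100,000, close to the slide's figures.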

Problems with Prediction
The quality of predictions depends on whether the assumption of a linear relationship is true
Eg: crack sentences for large quantities

Problems with Prediction
We shouldn’t extrapolate predictions beyond the range of data for our X variable
Eg: Sentence ≈ 90 + 0.27 · Quantity (coefficients rounded)
Having no crack still has a predicted sentence of 90 months!
Eg: Mortality ≈ 1353 - 38 · Education (coefficients rounded)
A city with median education of 35 years is predicted to have no deaths at all!
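One way to guard against extrapolation in code is to refuse predictions outside the observed X range. The sketch below is illustrative only: the function name is hypothetical, and the education range of 9 to 13 years is an assumed placeholder, since the transcript does not give the actual range of the city data.

```python
def predict_within_range(intercept, slope, x, x_min, x_max):
    """Predict Y only when x lies inside the observed range of X."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"X = {x} is outside the observed range [{x_min}, {x_max}]"
        )
    return intercept + slope * x

# Hypothetical observed range of median education (not from the transcript)
EDU_MIN, EDU_MAX = 9.0, 13.0

# Inside the range: a prediction is returned (rounded lecture coefficients)
inside = predict_within_range(1353, -38, 11, EDU_MIN, EDU_MAX)

# Far outside the range: the 35-year example from the slide is refused
try:
    predict_within_range(1353, -38, 35, EDU_MIN, EDU_MAX)
    refused = False
except ValueError as err:
    refused = True
    print(err)
```

Raising an error rather than returning a number forces the caller to notice that the line is being used where there is no data to support it.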

Break! Take 5

How to Use JMP
Scatterplots
Correlation
Regression
– Hand calculating (using Distribution)
– Have JMP calculate (using Fit Y by X)

Next Class
Probability
– This is the hardest part of the course.
Moore, McCabe and Craig: Section