Scatterplot and trendline


1 Scatterplot and trendline

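2 Scatterplot: A scatterplot explores the relationship between two quantitative variables by plotting each observation as a point, with one variable on the x-axis and the other on the y-axis. Example: [figure]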

3 What can we tell from a scatterplot? The direction of the relationship (positive, negative, or no correlation). The strength of the relationship (roughly: strong if |r| is about 0.8 or more, weak if |r| is about 0.5 or less). The form of the relationship (linear, quadratic, cubic, etc.).
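A minimal sketch of reading direction and strength off a correlation value, assuming the rough cut-offs above (about 0.8 for strong, about 0.5 for weak). The function name `describe_r` and the label "moderate" for values between the two cut-offs are hypothetical additions, not from the slides. The form of the relationship cannot be read from r at all; it has to be judged from the plot.

```python
def describe_r(r: float) -> str:
    """Describe a correlation coefficient using rough rules of thumb.

    The cut-offs (|r| >= 0.8 strong, |r| <= 0.5 weak) are informal guidelines;
    "moderate" for the gap between them is an assumption added here.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    # Direction comes from the sign of r.
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        return "no linear correlation"
    # Strength comes from the absolute value of r.
    strength = abs(r)
    if strength >= 0.8:
        label = "strong"
    elif strength <= 0.5:
        label = "weak"
    else:
        label = "moderate"
    return f"{label} {direction}"


for value in (0.5, 0.8, 0.2, -0.8):
    print(value, describe_r(value))
```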

4 Some examples (i): r = 0.5. Weak: the points are scattered. Positive (upward trend). The form is hard to tell; roughly linear?

5 Some examples (ii): r = 0.8. Strong: the points are compact. Positive. Clear linear pattern.

6 Some examples (iii): r = 0.2. Very weak, almost no pattern: the points are all over the plot. It is very hard to tell whether the relationship is positive or negative.

7 Some examples (iv): r = 0. No pattern: the points fall everywhere in the plot. We cannot tell whether there is an upward or a downward trend.

8 Some examples (v): r = -0.8. Strong relationship. Negative relationship (downward trend). Linear pattern.

9 Some examples (vi): r = -0.2. Not very different from plot (iii).
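To get a feel for what data with r = 0.5, 0.8, 0.2, 0, -0.8 and -0.2 look like, one can simulate them. This sketch assumes normally distributed data and uses the standard construction y = rho * x + sqrt(1 - rho^2) * e, with x and e independent standard normals, which gives a population correlation of rho. The function names `simulate_pairs` and `sample_r` are hypothetical. The sample r only approximates the target and gets closer as n grows.

```python
import math
import random


def simulate_pairs(rho, n, seed=0):
    """Draw n (x, y) pairs whose population correlation is rho.

    y = rho * x + sqrt(1 - rho^2) * e with x, e independent N(0, 1),
    so Var(y) = 1 and Cov(x, y) = rho.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        pairs.append((x, rho * x + math.sqrt(1.0 - rho * rho) * e))
    return pairs


def sample_r(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Target correlations from the example plots (i) to (vi).
for rho in (0.5, 0.8, 0.2, 0.0, -0.8, -0.2):
    print(rho, round(sample_r(simulate_pairs(rho, 2000)), 2))
```

Feeding the pairs to any plotting tool reproduces the qualitative look of the plots above: compact bands for |r| near 0.8, shapeless clouds near 0.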

10 What is r? r is called the correlation coefficient. There are several different ways of calculating a correlation coefficient (for example, Spearman's rank correlation). The one we use most frequently is the Pearson product-moment correlation coefficient (or simply the Pearson correlation coefficient).

11 How to calculate r? Formula to be introduced later.
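The slides postpone the formula. For reference, the standard definition of Pearson's r for n pairs (x_i, y_i) with means x̄ and ȳ is:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

A direct translation into Python, checked against the data of the numeric example (slide 19). The function name `pearson_r` is illustrative, not from the slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sums of squared deviations from the means.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Data from the numeric example (slide 19).
xs = [10, 15, 20, 25, 30, 35, 40, 45, 50]
ys = [41, 41, 42, 38, 53, 56, 59, 59, 71]
print(round(pearson_r(xs, ys), 7))  # 0.9194795
```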

12 Other facts about r: It ranges from -1 to +1. Its sign shows the direction of the correlation. Its absolute value shows the strength of the correlation. *** It only measures linear correlation.

13 Example: y = x^2. r is almost 0 (r = -0.016). *** But there is clearly a quadratic relationship between x and y!
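The slide's r = -0.016 comes from its own data, which are not listed. A sketch with hypothetical x values placed symmetrically around 0 shows the same effect even more sharply: y is completely determined by x, yet Pearson's r is exactly 0 in exact arithmetic, because the falling half and the rising half of the parabola cancel out.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = list(range(-10, 11))   # hypothetical x values, symmetric around 0
ys = [x ** 2 for x in xs]   # a perfect (quadratic) relationship
print(pearson_r(xs, ys))    # 0 up to floating-point rounding
```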

14 How to use correlation: make predictions. Given a value of x and the correlation between x and y (together with the means and standard deviations of x and y), we can predict the value of y. This is an example of model fitting in statistics.
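One standard way to turn a correlation into a prediction is y_hat = ȳ + r * (s_y / s_x) * (x - x̄), where s_x and s_y are the standard deviations; this is the least-squares regression line written in terms of r. A minimal sketch using Python's `statistics` module; the function name `predict_from_correlation` is hypothetical, and the data are those of the numeric example (slide 19).

```python
import statistics


def predict_from_correlation(x_new, xs, ys):
    """Predict y at x_new from r, the means, and the standard deviations."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    n = len(xs)
    # Pearson r written with sample standard deviations (n - 1 denominator).
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    # Move r standard deviations of y for every standard deviation of x.
    return y_bar + r * (s_y / s_x) * (x_new - x_bar)


xs = [10, 15, 20, 25, 30, 35, 40, 45, 50]
ys = [41, 41, 42, 38, 53, 56, 59, 59, 71]
print(round(predict_from_correlation(10, xs, ys), 2))  # 36.04
```

The predicted value 36.04 matches the Y.hat reported for X = 10 in the regression results later in the deck.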

15 Another classification of variables: In terms of their role in the model, variables fall into two classes: independent (explanatory, predictor, x) variables and dependent (response, y) variables.

16 What a statistical model does: It gives us a measure of the relationship between two (or more) variables. It gives us a measure of how well the model performs, which matters because there are always many possible models to choose from. It enables us to make predictions using the relationship identified in the model.

17 Graphical illustration of the model: trendline. r = 0.8: positive, strong, linear.

18 Regression: Regression is one way of fitting a statistical model. For the data above, we have Y = b0 + b1*X + error, where b0 is called the intercept and b1 the regression coefficient (slope). The error term is a “must-have” part of any statistical model.

19 Numeric example
Data:
X: 10 15 20 25 30 35 40 45 50
Y: 41 41 42 38 53 56 59 59 71
r = 0.9194795

20 Results of a regression (i): Intercept = 28.5111, slope = 0.7533. The line through the middle of the points is called the trendline or regression line. The vertical distance between an individual point and the line is called the “residual”.
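The intercept and slope above can be reproduced with the standard ordinary least-squares formulas, b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)^2 and b0 = ȳ - b1 * x̄. A minimal sketch; `fit_line` is a hypothetical helper name, not from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept b0 and slope b1 for Y = b0 + b1*X + error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


xs = [10, 15, 20, 25, 30, 35, 40, 45, 50]
ys = [41, 41, 42, 38, 53, 56, 59, 59, 71]
b0, b1 = fit_line(xs, ys)
print(f"Intercept = {b0:.4f}, Slope = {b1:.4f}")  # Intercept = 28.5111, Slope = 0.7533
```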

21 Results of a regression (ii)
X:         10     15     20     25     30     35     40     45     50
Y:         41     41     42     38     53     56     59     59     71
Y.hat:  36.04  39.81  43.58  47.34  51.11  54.88  58.64  62.41  66.18
Resid:   4.96   1.19  -1.58  -9.34   1.89   1.12   0.36  -3.41   4.82
Y.hat is the predicted value of Y given X and the fitted regression model. Residual = Y - Y.hat; this is the error in our model.
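A sketch that recomputes the Y.hat and residual rows of the table from the least-squares coefficients, so the numbers can be checked rather than taken on trust. Variable names are illustrative.

```python
xs = [10, 15, 20, 25, 30, 35, 40, 45, 50]
ys = [41, 41, 42, 38, 53, 56, 59, 59, 71]

# Least-squares coefficients (intercept b0, slope b1).
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * x for x in xs]              # predicted values Y.hat
resid = [y - yh for y, yh in zip(ys, y_hat)]   # residuals = Y - Y.hat

print("Y.hat:", " ".join(f"{v:.2f}" for v in y_hat))
print("Resid:", " ".join(f"{v:.2f}" for v in resid))
```

Summing `resid` gives 0 up to floating-point rounding, which is the first condition on the next slide.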

22 How do we get the regression model? We find the intercept and slope that satisfy the following conditions: the sum of all residuals is 0, and the sum of the squared residuals is minimized.
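A short derivation, assuming ordinary least squares with an intercept, shows where the two conditions come from. Write the residuals as e_i = y_i - b_0 - b_1 x_i and set the partial derivatives of the sum of squared residuals to zero. The first resulting equation is exactly the condition that the residuals sum to 0, so that condition is a consequence of the minimization rather than an extra requirement.

$$\mathrm{SSE}(b_0, b_1) = \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^2$$

$$\frac{\partial\,\mathrm{SSE}}{\partial b_0} = -2\sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right) = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} e_i = 0$$

$$\frac{\partial\,\mathrm{SSE}}{\partial b_1} = -2\sum_{i=1}^{n} x_i \left(y_i - b_0 - b_1 x_i\right) = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} x_i e_i = 0$$

Solving these two equations gives

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

For the numeric example, x̄ = 30, ȳ = 460/9, Σ(x - x̄)(y - ȳ) = 1130 and Σ(x - x̄)^2 = 1500, so

$$b_1 = \frac{1130}{1500} \approx 0.7533, \qquad b_0 = \frac{460}{9} - \frac{1130}{1500} \times 30 \approx 28.5111$$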

23 How do we measure how good this model is? One measure is called r-squared. For this model, r^2 = 0.8454425. This means that of all the variation observed in Y, about 84.5% is explained by the predictor X. The rest is error.

24 How is r-squared related to our measure of correlation? Hint: it is called... r-squared.

25 Yes, it is the square of the correlation between X and Y: 0.9194795^2 = 0.8454425.
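A sketch that computes r-squared directly as the share of variation explained, r^2 = 1 - SSE/SST, where SST = Σ(y - ȳ)^2 is the total variation in Y and SSE = Σ(y - y_hat)^2 is the variation left in the residuals, and then compares it with the squared Pearson correlation for the numeric example.

```python
import math

xs = [10, 15, 20, 25, 30, 35, 40, 45, 50]
ys = [41, 41, 42, 38, 53, 56, 59, 59, 71]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)   # total variation in Y (SST)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Fit the line and measure the unexplained variation (SSE).
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse / s_yy                 # share of variation explained by X
r = s_xy / math.sqrt(s_xx * s_yy)          # Pearson correlation

print(round(r_squared, 7), round(r ** 2, 7))  # 0.8454425 0.8454425
```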

26 Some things to know: The relationship between r-squared and the correlation only holds for regression with one predictor. The trendline (regression model) only works for X values within the range of our data, or not too far from it. Here the X values range from 10 to 50, so we can predict Y at X = 26 but not at X = 126. Correlation does not imply causation. Example: children's shoe size vs. reading ability (both increase with age).
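A minimal sketch of guarding against extrapolation: the prediction helper refuses X values outside the observed range. The names `fit_line`, `predict` and the `margin` parameter are hypothetical; how far outside the data is still "not too far" is a judgment call, so the margin defaults to 0.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def predict(x_new, xs, ys, margin=0.0):
    """Predict Y at x_new, refusing to extrapolate beyond the observed X range.

    `margin` (hypothetical) widens the allowed range [min(xs), max(xs)].
    """
    lo, hi = min(xs) - margin, max(xs) + margin
    if not lo <= x_new <= hi:
        raise ValueError(f"x = {x_new} is outside the allowed range [{lo}, {hi}]")
    b0, b1 = fit_line(xs, ys)
    return b0 + b1 * x_new


xs = [10, 15, 20, 25, 30, 35, 40, 45, 50]
ys = [41, 41, 42, 38, 53, 56, 59, 59, 71]
print(round(predict(26, xs, ys), 2))  # 48.1
try:
    predict(126, xs, ys)
except ValueError as err:
    print("refused:", err)
```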

