1. Association
• Variables
  – Response: an outcome variable whose values exhibit variability.
  – Explanatory: a variable that we use to try to explain the variability in the response.

2. Association
• There is an association between two variables if values of one variable are more likely to occur with certain values of a second variable.

3. Picturing Association
• Two Categorical (Qualitative)
  – Cross-tabs table, mosaic plot.
• Two Numerical (Quantitative)
  – Scatter diagram.

4. Categorical Data
• Who?
  – Students in a statistics class at Penn State University.
• What?
  – “With whom is it easiest to make friends?” Opposite sex, same sex, no difference.
  – Gender: male, female.

5. Cross-tabs Table: With whom is it easiest to make friends?

          Same Sex   Opposite Sex   No Diff   Total
Female       16           58           63      137
Male         13           15           40       68
Total        29           73          103      205

6. Bar Graph: With whom is it easiest to make friends?

7. Percentages: With whom is it easiest to make friends?
Each cell shows count (row %).

          Same Sex      Opposite Sex   No Diff       Total
Female    16 (11.7%)    58 (42.3%)     63 (46.0%)    137 (100%)
Male      13 (19.1%)    15 (22.1%)     40 (58.8%)     68 (100%)
Total     29            73             103           205
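Each row percentage is a cell count divided by its row total. A minimal Python sketch of that arithmetic, using the counts from the table (the dictionary layout and the function name `row_percentages` are illustrative choices, not part of the original slides):

```python
# Counts from the cross-tabs table: rows are gender, columns are the answer
# to "With whom is it easiest to make friends?"
counts = {
    "Female": {"Same Sex": 16, "Opposite Sex": 58, "No Diff": 63},
    "Male": {"Same Sex": 13, "Opposite Sex": 15, "No Diff": 40},
}


def row_percentages(table):
    """Return each cell as a percentage of its row total, rounded to 0.1%."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: round(100 * n / total, 1) for col, n in cells.items()}
    return result


for row, pcts in row_percentages(counts).items():
    print(row, pcts)
```

Dividing by the row total, rather than by the grand total, is what makes females and males comparable even though the sample has about twice as many females (137) as males (68).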

8. Mosaic Plot

9. Interpretation
• More than 50% of males (58.8%) say no difference, while less than 50% of females (46.0%) say no difference.
• Females are about twice as likely as males to say opposite sex (42.3% vs. 22.1%).
• Males are more likely than females to say same sex (19.1% vs. 11.7%), about 1.6 times as likely.
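The “times as likely” comparisons come from dividing one group’s row proportion by the other’s. A short sketch that computes both ratios from the counts, to show how close each is to 2 (the helper name `ratio_of_proportions` is illustrative):

```python
# Counts from the cross-tabs table.
female = {"Same Sex": 16, "Opposite Sex": 58, "No Diff": 63}
male = {"Same Sex": 13, "Opposite Sex": 15, "No Diff": 40}
female_total = sum(female.values())  # 137
male_total = sum(male.values())      # 68


def ratio_of_proportions(count_a, total_a, count_b, total_b):
    """How many times as likely group A is as group B to give an answer."""
    return (count_a / total_a) / (count_b / total_b)


opposite = ratio_of_proportions(female["Opposite Sex"], female_total,
                                male["Opposite Sex"], male_total)
same = ratio_of_proportions(male["Same Sex"], male_total,
                            female["Same Sex"], female_total)
print(f"Females vs. males, opposite sex: {opposite:.2f}")  # 1.92
print(f"Males vs. females, same sex: {same:.2f}")          # 1.64
```

The opposite-sex ratio is close to 2; the same-sex ratio is nearer 1.6.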

10. Scatter Plot
• Statistics is about … variation.
• Recognize, quantify, and try to explain variation.
• Variation in two quantitative variables is displayed in a scatter plot.

11. Scatter Plot
• The numerical variable on the vertical axis, y, is the response variable.
• The numerical variable on the horizontal axis, x, is the explanatory variable.

12. Scatter Plot
• Example: Body mass (kg) and bite force (N) for Canidae.
  – y, Response: Bite force (N).
  – x, Explanatory: Body mass (kg).
  – Cases: 28 species of Canidae.

14. Positive Association
• Positive association
  – Above-average values of bite force are associated with above-average values of body mass.
  – Below-average values of bite force are associated with below-average values of body mass.

15. Scatter Plot
• Example: Outside temperature and amount of natural gas used.
  – Response: Natural gas used (1000 ft³).
  – Explanatory: Outside temperature (°C).
  – Cases: 26 days.

17. Negative Association
• Negative association
  – Above-average values of gas use are associated with below-average temperatures.
  – Below-average values of gas use are associated with above-average temperatures.

18. Association
• Positive
  – As x goes up, y tends to go up.
• Negative
  – As x goes up, y tends to go down.
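One way to make “above average goes with above average” concrete is to multiply each case’s deviations from the two means: same-side pairs give a positive product and opposite-side pairs a negative one, so the sign of the sum indicates the direction. A minimal sketch; the data are made-up illustrative values, not the bite-force or natural-gas data:

```python
from statistics import mean


def association_direction(xs, ys):
    """Classify direction by the sign of the sum of (x - x_bar)(y - y_bar)."""
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "none"


# Hypothetical data, for illustration only.
mass = [2, 5, 9, 14, 20]           # as x goes up ...
force = [40, 90, 120, 200, 260]    # ... y tends to go up
temperature = [-10, -5, 0, 5, 10]  # as x goes up ...
gas = [15, 12, 10, 6, 4]           # ... y tends to go down

print(association_direction(mass, force))       # positive
print(association_direction(temperature, gas))  # negative
```

Dividing that sum by n − 1 gives the sample covariance; the correlation coefficient rescales it so that it has no units.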

19. Correlation
• Linear association
  – How closely do the points on the scatter plot follow a straight line?
  – The correlation coefficient gives the direction of, and quantifies the strength of, the linear association between two quantitative variables.

20. Correlation
• Standardize y.
• Standardize x.
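Standardizing rewrites each value as a z-score, the number of standard deviations it lies above or below its mean; the correlation coefficient is then the sum of the products of paired z-scores divided by n − 1. In the usual textbook notation (assumed here; the slide names the steps without symbols), with sample means $\bar{x}, \bar{y}$, sample standard deviations $s_x, s_y$, and n cases:

```latex
z_x = \frac{x - \bar{x}}{s_x}, \qquad
z_y = \frac{y - \bar{y}}{s_y}, \qquad
r = \frac{1}{n - 1} \sum z_x z_y
```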

22. Correlation Coefficient
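A minimal Python sketch of the correlation coefficient computed as the sum of products of z-scores divided by n − 1, using only the standard library. The body-mass and bite-force values below are made up for illustration and are not the Canidae measurements:

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson correlation: sum of products of z-scores divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)


# Hypothetical body masses (kg) and bite forces (N), for illustration only.
mass = [3.0, 6.5, 10.0, 15.0, 22.0, 30.0]
force = [55.0, 110.0, 160.0, 215.0, 330.0, 420.0]
print(round(correlation(mass, force), 4))
```

Points that lie exactly on an increasing straight line give r = 1, and points on a decreasing line give r = −1.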

23. Correlation Coefficient
• Body mass and bite force: r = 0.9807

24. Correlation Coefficient
• There is a very strong positive correlation (linear association) between body mass and bite force for the various species of Canidae.

25. JMP
• Analyze > Multivariate Methods > Multivariate
• Y, Columns: Body mass, BF ca (bite force at the canine)

27. Correlation Properties
• The sign of r indicates the direction of the association.
• The value of r is always between –1 and +1.
• Correlation has no units.
• Correlation is not affected by changes of center or scale (adding a constant to either variable, or multiplying it by a positive constant, as in a change of units).
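These properties can be checked numerically. The sketch below uses hypothetical data and repeats a small z-score `correlation` helper so that it stands alone. Converting kilograms to pounds (a change of scale) and shifting the origin (a change of center) leave r unchanged, while multiplying by a negative constant flips only its sign:

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson correlation computed from z-scores."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    total = sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Hypothetical data, for illustration only.
mass_kg = [3.0, 6.5, 10.0, 15.0, 22.0, 30.0]
force_n = [55.0, 110.0, 160.0, 215.0, 330.0, 420.0]

r = correlation(mass_kg, force_n)
mass_lb = [2.20462 * m for m in mass_kg]    # change of scale (units)
mass_shifted = [m - 10.0 for m in mass_kg]  # change of center
mass_negated = [-m for m in mass_kg]        # negative scale factor

print(round(r, 6))
print(round(correlation(mass_lb, force_n), 6))       # same as r
print(round(correlation(mass_shifted, force_n), 6))  # same as r
print(round(correlation(mass_negated, force_n), 6))  # -r
print(-1 <= r <= 1)                                  # True
```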

28. Algebra Review
• The equation of a straight line: y = mx + b
  – m is the slope: the change in y over the change in x, or rise over run.
  – b is the y-intercept: the value where the line cuts the y-axis.

30. Review
• y = 3x + 2
  – x = 0: y = 2 (the y-intercept).
  – x = 3: y = 11.
  – The change in y (+9) divided by the change in x (+3) gives the slope, 3.
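The same rise-over-run calculation, written as a small function (the name `slope_intercept` is illustrative) and applied to the two points from the slide, (0, 2) and (3, 11):

```python
def slope_intercept(p1, p2):
    """Return (m, b) for the line y = m*x + b through two points with different x."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    m = (y2 - y1) / (x2 - x1)  # rise over run
    b = y1 - m * x1            # solve y1 = m*x1 + b for b
    return m, b


m, b = slope_intercept((0, 2), (3, 11))
print(m, b)  # 3.0 2.0
```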

31. Linear Regression
• Example: Body mass (kg) and bite force (N) for Canidae.
  – y, Response: Bite force (N).
  – x, Explanatory: Body mass (kg).
  – Cases: 28 species of Canidae.

32. Correlation Coefficient
• Body mass and bite force: r = 0.9807

33. Correlation Coefficient
• There is a strong correlation (linear association) between body mass and bite force for the various species of Canidae.

34. Linear Model
• The linear model is the equation of a straight line through the data.
• A point on the straight line through the data gives a predicted value of y, denoted $\hat{y}$ (“y-hat”).

35. Residual
• The difference between the observed value of y and the predicted value of y, $\hat{y}$, is called the residual.
• Residual = $y - \hat{y}$

37. Line of “Best Fit”
• There are lots of straight lines that go through the data.
• The line of “best fit” is the line for which the sum of squared residuals is the smallest: the least squares line.

38. Line of “Best Fit”
• Some residuals are positive and some are negative, but they sum to zero.
• The line passes through the point $(\bar{x}, \bar{y})$.

39. Line of “Best Fit”, $\hat{y} = b_0 + b_1 x$
• Least squares slope: $b_1 = r\,\dfrac{s_y}{s_x}$
• Least squares intercept: $b_0 = \bar{y} - b_1\bar{x}$
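A minimal sketch of the least squares slope and intercept, computed as $b_1 = r\,s_y/s_x$ and $b_0 = \bar{y} - b_1\bar{x}$, followed by checks of two properties of the least squares line: the residuals sum to (numerically) zero and the line passes through $(\bar{x}, \bar{y})$. The data are hypothetical, and the function name `least_squares` is illustrative:

```python
from statistics import mean, stdev


def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x       # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1


# Hypothetical data, for illustration only.
mass = [3.0, 6.5, 10.0, 15.0, 22.0, 30.0]
force = [55.0, 110.0, 160.0, 215.0, 330.0, 420.0]

b0, b1 = least_squares(mass, force)
residuals = [y - (b0 + b1 * x) for x, y in zip(mass, force)]
print(f"y_hat = {b0:.3f} + {b1:.3f} x")
print(abs(sum(residuals)) < 1e-9)                        # residuals sum to ~0
print(abs((b0 + b1 * mean(mass)) - mean(force)) < 1e-9)  # passes through (x_bar, y_bar)
```

Algebraically, $r\,s_y/s_x$ equals $\sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$, the other form in which the slope is often written.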

40. Least Squares Estimates
• Table: Body mass, x, and Bite force, y.

41. Least Squares Estimates

42. Interpretation
• Slope: for a 1 kg increase in body mass, the bite force increases, on average, by 13.428 N.
• Intercept: there is no reasonable interpretation of the intercept in this context, because no canid has a body mass of 0 kg.

44. Prediction
• Least squares line

45. Residual
• Body mass, x = 25 kg
• Bite force, y = 351.5 N
• Predicted, $\hat{y}$ = 366.1 N
• Residual, $y - \hat{y}$ = 351.5 – 366.1 = –14.6 N
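The residual arithmetic for the 25 kg case, in code. The fitted intercept is not given in the transcript, so the value computed at the end (about 30.4 N) is only what the slope of 13.428 N/kg and the prediction of 366.1 N at 25 kg imply when combined; treat it as a rough derived figure, not the reported estimate:

```python
slope = 13.428  # N per kg, from the slope interpretation
x = 25.0        # body mass (kg)
y = 351.5       # observed bite force (N)
y_hat = 366.1   # predicted bite force (N)

residual = y - y_hat  # observed minus predicted
print(round(residual, 1))  # -14.6

# Intercept implied by the slope and this single prediction
# (an approximation from rounded numbers, not the fitted intercept).
implied_intercept = y_hat - slope * x
print(round(implied_intercept, 1))  # 30.4
```

A negative residual means the observed bite force lies below the line: this species bites about 14.6 N less hard than the model predicts for its body mass.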

46. Residuals
• Residuals help us see whether the linear model makes sense.
• Plot residuals versus the explanatory variable.
  – If the plot is a random scatter of points, then the linear model is the best we can do.

48. Interpretation of the Plot
• The residuals are scattered randomly. This indicates that the linear model is an appropriate model for the relationship between body mass and bite force for Canidae.

49. $r^2$ or $R^2$
• The square of the correlation coefficient gives the fraction of the variation in y that is accounted for, or explained, by the linear relationship with x.

50. Body Mass and Bite Force
• r = 0.9807
• $r^2 = (0.9807)^2 = 0.962$, or 96.2%
• 96.2% of the variation in bite force can be explained by the linear relationship with body mass.
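Squaring r is a single arithmetic step. For a least squares line, the same number also equals $1 - \text{SSE}/\text{SST}$: one minus the share of the total variation in y (SST) that is left over in the residuals (SSE). The sketch checks the slide’s arithmetic and then that identity on hypothetical data:

```python
from statistics import mean

# The slide's arithmetic.
r = 0.9807
print(round(r ** 2, 3))  # 0.962

# Check R^2 = 1 - SSE/SST = r^2 for a least squares line (hypothetical data).
xs = [3.0, 6.5, 10.0, 15.0, 22.0, 30.0]
ys = [55.0, 110.0, 160.0, 215.0, 330.0, 420.0]
x_bar, y_bar = mean(xs), mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
sst = sum((y - y_bar) ** 2 for y in ys)  # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation

r_data = sxy / (sxx * sst) ** 0.5
print(abs((1 - sse / sst) - r_data ** 2) < 1e-9)  # True
```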

51. Regression Conditions
• Quantitative variables: both variables should be quantitative.
• Linear model: does the scatter diagram show a reasonably straight line?
• Outliers: watch out for outliers, as they can be very influential.

52. Regression Cautions
• Beware of extraordinary points.
• Don’t extrapolate beyond the data.
• Don’t infer that x causes y just because there is a good linear model relating the two variables.

53. Extraordinary Points

54. Don’t Extrapolate
• Explanatory (x): Average outdoor temperature (°C).
• Response (y): Amount of natural gas used (1000 cu ft).

55. Don’t Extrapolate

56. Don’t Extrapolate
• Explanatory (x = 20): Average outdoor temperature (°C).
• Response (y): Amount of natural gas used (1000 cu ft).

57. Correlation vs. Causation
• Don’t confuse correlation with causation.
  – There is a strong positive correlation between the number of crimes committed in communities and the number of 2nd graders in those communities. Both tend to be larger in communities with more people, a lurking variable.
• Beware of lurking variables.

