Lecture 4 Chapter 3. Bivariate Associations. Objectives (PSLS Chapter 3) Relationships: Scatterplots and correlation  Bivariate data  Scatterplots (2.

1 Lecture 4 Chapter 3. Bivariate Associations

2 Objectives (PSLS Chapter 3) Relationships: Scatterplots and correlation
- Bivariate data
- Scatterplots (2 way graphs) {D Level Award}
- Interpreting scatterplots (2 way graphs) {D Level Award}
- Adding categorical variables to scatterplots (3 way graphs) {B Level}
- The correlation coefficient r (correlation award) {D Level Award}
- Facts about correlation (correlation award) {D Level Award}

3 Bivariate data
- For each individual studied, we record data on two variables.
- We then examine whether there is a relationship between these two variables: Do changes in one variable tend to be associated with specific changes in the other variable?

Here we have two quantitative variables recorded for each of 16 students: (1) how many beers they drank and (2) their resulting blood alcohol content (BAC).

Student ID   Number of Beers   Blood Alcohol Content
1            5                 0.1
2            2                 0.03
3            9                 0.19
6            7                 0.095
7            3                 0.07
9            3                 0.02
11           4                 0.07
13           5                 0.085
4            8                 0.12
5            3                 0.04
8            5                 0.06
10           5                 0.05
12           6                 0.1
14           7                 0.09
15           1                 0.01
16           4                 0.05

4 Scatterplots A scatterplot is used to display quantitative bivariate data. Each variable makes up one axis. Each individual is a point on the graph. [Figure: scatterplot of BAC against number of beers, shown next to the Student / Beers / BAC data table from the previous slide.]
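To make "each individual is a point" concrete, here is a minimal sketch that draws the 16 beers/BAC pairs on a character grid. The helper `text_scatter` and the grid size are assumptions made for illustration only; they are not part of the lecture, and a real plot would normally use a graphing package.

```python
# Beers/BAC pairs from the slide's data table (one tuple per student).
data = [(5, 0.10), (2, 0.03), (9, 0.19), (7, 0.095), (3, 0.07), (3, 0.02),
        (4, 0.07), (5, 0.085), (8, 0.12), (3, 0.04), (5, 0.06), (5, 0.05),
        (6, 0.10), (7, 0.09), (1, 0.01), (4, 0.05)]


def text_scatter(points, rows=10, cols=30):
    """Return a crude character-grid scatterplot (hypothetical helper)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * cols for _ in range(rows)]
    for x, y in points:
        # Map each value onto the grid so the points span the full width/height.
        col = round((x - x_min) / (x_max - x_min) * (cols - 1))
        row = round((y - y_min) / (y_max - y_min) * (rows - 1))
        grid[rows - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * cols


plot = text_scatter(data)
print(plot)  # x axis: number of beers, y axis: BAC
```

Points that land in the same grid cell overlap, so the picture may show fewer than 16 marks.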

5 Explanatory and response variables A response (dependent) variable measures an outcome of a study. An explanatory (independent) variable may explain or influence changes in a response variable. When there is an obvious explanatory variable, it is plotted on the x (horizontal) axis of the scatterplot. [Figure: scatterplot with the explanatory variable, number of beers, on the x axis and the response variable, BAC, on the y axis.]

6 How to scale a scatterplot [Figure: the same data shown in all four plots.] Baldi’s recommendation: Both variables should be given a similar amount of space:
- Plot is roughly square
- Points should occupy all of the domain and range.

7 Interpreting scatterplots
- After plotting two variables on a scatterplot, we describe the overall pattern of the relationship. Specifically, we look for …
  - Form: linear, curved, clusters, no pattern
  - Direction: positive, negative, no direction
  - Strength: how closely do the points fit the “form”? What is the slope?
- … and clear deviations from that pattern
  - Outliers of the relationship

8 Form [Figure: three example scatterplots labeled Linear, Curvilinear/Nonlinear, and No relationship.]

9 Direction Positive association: High values of one variable tend to occur together with high values of the other variable. Negative association: High values of one variable tend to occur together with low values of the other variable.
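One rough way to check direction numerically is to count how many individuals are on the same side of the mean for both variables. This is only a sketch using the beers/BAC data from the earlier slide; the counting rule and the variable names are illustrative assumptions, not a method given in the lecture.

```python
# Beers and BAC values from the slide's data table, in the same student order.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

mean_x = sum(beers) / len(beers)
mean_y = sum(bac) / len(bac)

# "Same side": high with high or low with low (product of deviations > 0).
same = sum(1 for x, y in zip(beers, bac) if (x - mean_x) * (y - mean_y) > 0)
# "Opposite side": high with low or low with high (product of deviations < 0).
opposite = sum(1 for x, y in zip(beers, bac) if (x - mean_x) * (y - mean_y) < 0)

if same > opposite:
    direction = "positive"
elif opposite > same:
    direction = "negative"
else:
    direction = "no clear direction"

print(same, opposite, direction)
```

For this data set most students fall on the same side of both means, which matches the upward trend visible in the scatterplot.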

10 Strength The strength of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form, and the slope of the distribution.

11 Outliers An outlier in this context is a data value that has a very low probability of occurrence (i.e., it is atypical or unexpected). In a scatterplot, outliers are points that fall outside of the overall pattern of the relationship.

12 Describe these scatterplots.

13 Adding categorical variables to scatterplots Two or more relationships can be compared on a single scatterplot when we use different symbols for groups of points on the graph. The graph compares the association between thorax length and longevity of male fruit flies that are allowed to reproduce (green) or not (purple). The pattern is similar in both groups (linear, positive association), but male fruit flies not allowed to reproduce tend to live longer than reproducing male fruit flies of the same size.

14 The correlation coefficient: r The correlation coefficient is a measure of the direction and strength of a relationship. It is calculated using the mean and the standard deviation of both the x and y variables. Example: Time to swim: x̄ = 35, s_x = 0.7. Pulse rate: ȳ = 140, s_y = 9.5.
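The slide's formula is not reproduced in this transcript, so the sketch below assumes the standard textbook definition: r is the sum of the products of the standardized x and y values, divided by n − 1. The swim-time data are not listed here either, so the example uses the beers/BAC data from the earlier slide.

```python
from statistics import mean, stdev

# Beers and BAC values from the slide's data table, in the same student order.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def correlation(xs, ys):
    """Pearson r as the average product of standardized values (n - 1 divisor)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)


r = correlation(beers, bac)
print(round(r, 3))
```

The result is positive and close to 1, consistent with a strong positive linear association between beers and BAC.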

15 r doesn’t distinguish explanatory and response variables r treats x and y symmetrically. “Time to swim” is the explanatory variable here and belongs on the x axis. However, in either plot r is the same (r = −0.75).
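The symmetry is easy to check numerically. The sketch below swaps the roles of x and y for the beers/BAC data from the earlier slide (the swim-time data are not listed in this transcript); `pearson_r` is a helper written for this illustration.

```python
from math import sqrt

# Beers and BAC values from the slide's data table, in the same student order.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r_xy = pearson_r(beers, bac)  # beers treated as x, BAC as y
r_yx = pearson_r(bac, beers)  # roles swapped
print(r_xy, r_yx)             # equal up to floating-point rounding
```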

16 r has no unit r is built from the standardized value of x (unitless) and the standardized value of y (unitless), so r itself has no unit. Example: r = −0.75.
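Because r is built from standardized values, changing the units of measurement should leave it unchanged. The sketch below rescales the beers/BAC data from the earlier slide; the conversion factors (12 ounces per beer, BAC multiplied by 1000) are assumptions chosen purely for illustration.

```python
from math import sqrt

# Beers and BAC values from the slide's data table, in the same student order.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


ounces = [12 * b for b in beers]      # assumed 12 oz per beer (illustrative)
bac_scaled = [1000 * y for y in bac]  # arbitrary change of scale (illustrative)

r_original = pearson_r(beers, bac)
r_rescaled = pearson_r(ounces, bac_scaled)
print(r_original, r_rescaled)         # same value up to rounding
```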

17 r ranges from −1 to +1 Strength is indicated by the absolute value of r. Direction is indicated by the sign of r (+ or −).

18 r is not resistant to outliers Correlations are calculated using means and standard deviations, and thus are NOT resistant to outliers. Just moving one point away from the linear pattern here weakens the correlation from −0.91 to −0.75 (closer to zero).
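The same effect can be reproduced with the beers/BAC data from the earlier slide (the data behind the −0.91 and −0.75 figures are not listed in this transcript). In this sketch one student's BAC is changed to a made-up value to pull that point away from the linear pattern; the altered value is hypothetical and exists only to show the sensitivity of r.

```python
from math import sqrt

# Beers and BAC values from the slide's data table, in the same student order.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r_original = pearson_r(beers, bac)

# Hypothetical change: the student with 9 beers gets BAC 0.01 instead of 0.19.
bac_moved = list(bac)
bac_moved[2] = 0.01
r_moved = pearson_r(beers, bac_moved)

print(round(r_original, 3), round(r_moved, 3))
```

Moving that single point pulls r noticeably closer to zero, even though the other 15 points are untouched.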

19 Objectives (PSLS Chapter 3) Relationships: Scatterplots and correlation
- Bivariate data
- Scatterplots (2 way graphs) {D Level Award}
- Interpreting scatterplots (2 way graphs) {D Level Award}
- Adding categorical variables to scatterplots (3 way graphs) {B Level}
- The correlation coefficient r (correlation award) {D Level Award}
- Facts about correlation (correlation award) {D Level Award}

