BIVARIATE/MULTIVARIATE DESCRIPTIVE STATISTICS Displaying and analyzing the relationship between continuous variables.


1 BIVARIATE/MULTIVARIATE DESCRIPTIVE STATISTICS Displaying and analyzing the relationship between continuous variables

2 Correlation and Regression Correlation: measure of the strength of an association (relationship) between continuous variables Regression: predicting the value of a continuous dependent variable (y) based on the value of a continuous independent variable (x)

3 Correlation statistic - r Values of r range from -1 to +1. -1 is a perfect negative association (correlation), meaning that as the scores of one variable increase, the scores of the other variable decrease by an exactly proportional amount. +1 is a perfect positive association, meaning that both variables go up or down together, in lock-step. Values of r close to zero indicate a weak or no linear relationship. An r of exactly zero (never in real life) means no linear relationship: the variables do not change or “vary” together except by chance.
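The range of r can be illustrated with a small sketch. The function name `pearson_r` and the three tiny data sets are made up for illustration; they are not part of the presentation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                      # hypothetical scores
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # y rises in lock-step with x
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises
r_flat = pearson_r(x, [3, 1, 4, 1, 3])   # no linear trend in these points

print(round(r_up, 3), round(r_down, 3), round(r_flat, 3))
```

The first two data sets fall exactly on a straight line, so r is +1 and -1; the third was chosen so that the cross-products cancel and r comes out at zero.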

4 Two “scattergrams”, each with a “cloud” of dots. [Figure: two scatterplots of Y against X, one with r = +1 and one with r = -1] Can changes in one variable be predicted by changes in the other?

5 [Figure: scatterplot of Y against X with r = 0] Can changes in one variable be predicted by changes in the other?

6 “Line of best fit” To arrive at a value of “r”, a straight line is placed through the cloud of dots (the actual “observed” data). A linear relationship between the variables is assumed. The line is placed so that the overall distance between itself and the dots is minimized (in ordinary least squares, the sum of the squared vertical distances). [Figure: scatterplot of Y against X with a fitted line]

7 “Line of best fit” To place this line in the cloud of dots it is necessary to compute a and b from all of the observed (known) pairs of x and y values. –a = where the line crosses the y axis –b = “slope”, or no. of units that the value of y changes when x changes one unit When x is the “independent variable”:
$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}$$
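A minimal sketch of the slope and intercept calculation: b is the sum of cross-products of deviations from the means divided by the sum of squared deviations of x, and a is the mean of y minus b times the mean of x. The function name `fit_line` and the five data points are assumptions made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: cross-product of deviations over squared deviations of x
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    # Intercept: the line passes through the point (mean_x, mean_y)
    a = mean_y - b * mean_x
    return a, b

xs = [1, 2, 3, 4, 5]   # hypothetical independent variable
ys = [2, 4, 5, 4, 5]   # hypothetical dependent variable
a, b = fit_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
```

For these made-up points the slope is about 0.6 and the intercept about 2.2, so each one-unit increase in x predicts a 0.6-unit increase in y.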

8 y = a + bx a = where the line crosses the y axis b = “slope”, or no. of units that y changes when x changes one unit [Figure: fitted line on a plot of Y against X, with the intercept a marked on the Y axis]

9 How closely will a straight line fit the “observed” (actual) data? A perfect fit yields an r of +1 or -1. [Figure: two scatterplots of Y against X with every dot on the line, labeled +1.0 and -1.0]

10 An intermediate fit yields an intermediate value of r. [Figure: scatterplot of Y against X with r = +.65]

11 A poor fit yields a low value of r. [Figure: scatterplot of Y against X with r = -.19]

12 The line of best fit predicts a value for one variable given the value of the other variable. There will be a difference between these estimated values and the actual, known (“observed”) values. This difference is called a “residual” or an “error of the estimate.” As the error between the known and predicted values decreases (as the dots cluster more tightly around the line), the absolute value of r (ignoring the + or - sign) increases. [Figure: scatterplot of Y against X with a fitted line and two example readings: if x = .5, y = 2.3; if y = 5, x = 3.4]
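Residuals can be sketched as the observed y minus the value predicted by the fitted line. The data points below are made up for illustration, and the line is fitted by ordinary least squares, which is an assumption about how the line of best fit is obtained.

```python
xs = [1, 2, 3, 4, 5]   # hypothetical observed x values
ys = [2, 4, 5, 4, 5]   # hypothetical observed y values

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x

# Predicted value for each observed x, and the residual (observed - predicted)
predicted = [a + b * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

for x, y, p, e in zip(xs, ys, predicted, residuals):
    print(f"x={x}  observed={y}  predicted={p:.1f}  residual={e:+.1f}")
```

With a least-squares line the positive and negative residuals cancel, so their sum is zero up to rounding; what shrinks as the dots tighten around the line is their squared size.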

13 Measurement scales can be changed from continuous to categorical. [Figure: scatterplot of Y against X]

14 To evaluate a relationship between categorical variables, count the cell frequencies, then compare changes in the distribution of the dependent variable as the value of the independent variable changes. [Figure: plot of Y against X]

15 Coefficient of determination ($r^2$) Proportion of the variation in the dependent variable that is accounted for by variation in the independent variable. The squared multiple correlation ($R^2$) is the proportion of the variation in the dependent variable that is accounted for by the combined effects of multiple independent variables –Relative contribution of each independent variable can be estimated
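For a single independent variable, the coefficient of determination can be checked two ways: by squaring r, and as one minus the ratio of residual variation to total variation in y. The sketch below uses made-up data and hypothetical variable names.

```python
import math

xs = [1, 2, 3, 4, 5]   # hypothetical independent variable
ys = [2, 4, 5, 4, 5]   # hypothetical dependent variable

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)   # total variation in y

# Route 1: square the correlation coefficient
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Route 2: share of total variation not left in the residuals
b = sxy / sxx
a = mean_y - b * mean_x
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained_share = 1 - ss_residual / syy

print(round(r_squared, 3), round(explained_share, 3))
```

Both routes give about 0.6 for these points, meaning roughly 60 percent of the variation in y is accounted for by x.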

16 Other correlation/regression techniques “Partial correlation” –Using a control or “test” variable to assess its potential influence on a bivariate (two-variable) relationship –All variables must be continuous “Spearman’s r” (Spearman’s rank correlation, also written rho) is used to assess the correlation between two ordinal variables “Logit” and “logistic” regression are used when one desires to use regression techniques and the dependent variable is categorical with only two possible values; categorical independent variables can also be included –Create “dummy” variables coded 0 or 1
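Spearman's coefficient can be sketched as the ordinary correlation computed on ranks rather than raw scores. The helper names `ranks`, `pearson_r` and `spearman_r` and the data are assumptions for illustration; tied values receive the average of the ranks they span.

```python
import math

def ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_r(xs, ys):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]          # rises with x, but not in a straight line
rho = spearman_r(x, y)
r = pearson_r(x, y)
print(round(rho, 3), round(r, 3))
```

Because y always increases when x does, the rank correlation is 1 even though the raw-score r is slightly below 1; only the ordering matters, which is why the measure suits ordinal variables.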

17 Correlation matrix Load “Height weight gender age.sav” or “.xls” Choose Analyze|Correlate|Bivariate Load all variables

18 Scattergram Usually display only two variables at a time Graphs|Scatter/dot|Simple Convention is to place independent variable on “X” axis Optional: add fit line

19 Controlling for third variable Analyze|Correlate|Partial Place “Age” in “Controlling for” Did the original, “zero-order” relationship between height and weight change? –Any reduction suggests that the independent variables are “intercorrelated” –When we measure height or weight, the effect of the other variable winds up being included
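Outside the menu-driven procedure, a first-order partial correlation can be sketched from the three zero-order correlations. The function name `partial_r` and the three correlation values are hypothetical; they are not results from the height/weight/age data file.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y with the control variable z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical zero-order correlations (not taken from the data file)
r_height_weight = 0.70
r_height_age = 0.50
r_weight_age = 0.60

r_partial = partial_r(r_height_weight, r_height_age, r_weight_age)
print(f"zero-order r = {r_height_weight:.2f}, partial r = {r_partial:.2f}")
```

With these made-up inputs the partial correlation is about 0.58, lower than the zero-order 0.70, which is the kind of reduction the slide treats as a sign that the variables are intercorrelated.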

20 Recoding from continuous to categorical (“Height weight gender age REC.sav” and “.xls”) Recode Height –61 to 68 inches: Short –69 inches +: Tall Recode Weight –100-145 pounds: Light –146 pounds +: Heavy Run crosstabs –Does there appear to be a strong association when these variables are categorical?
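The recode-and-crosstab step can be sketched outside the statistics package as well. The cut points follow the slide (68 inches and below is Short, 145 pounds and below is Light); the eight height/weight pairs and the function names are made up for illustration and do not come from the data file.

```python
from collections import Counter

def recode_height(inches):
    # 61 to 68 inches: Short; 69 inches and up: Tall
    return "Short" if inches <= 68 else "Tall"

def recode_weight(pounds):
    # 100 to 145 pounds: Light; 146 pounds and up: Heavy
    return "Light" if pounds <= 145 else "Heavy"

# Hypothetical (height in inches, weight in pounds) pairs
people = [(62, 110), (64, 120), (66, 150), (67, 130),
          (69, 160), (70, 140), (72, 180), (74, 190)]

# Count the cell frequencies of the 2 x 2 table
table = Counter((recode_height(h), recode_weight(w)) for h, w in people)

# Compare the weight distribution within each height category
for height in ("Short", "Tall"):
    row_total = sum(table[(height, w)] for w in ("Light", "Heavy"))
    for weight in ("Light", "Heavy"):
        share = table[(height, weight)] / row_total
        print(f"{height:5} {weight:5} n={table[(height, weight)]}  {share:.0%}")
```

Reading across the rows shows how the distribution of the dependent variable (weight) shifts as the independent variable (height) changes category, which is the comparison the crosstab is meant to support.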

