Presentation is loading. Please wait.

Presentation is loading. Please wait.

Relationship between two variables Two quantitative variables: correlation and regression methods Two qualitative variables: contingency table methods.

Similar presentations


Presentation on theme: "Relationship between two variables Two quantitative variables: correlation and regression methods Two qualitative variables: contingency table methods."— Presentation transcript:

1 Relationship between two variables Two quantitative variables: correlation and regression methods Two qualitative variables: contingency table methods One quantitative, one qualitative : two-sample methods already considered… We’ll begin with two quantitative variables, continuous measurement variables usually, X and Y. There are usually two situations giving rise to X and Y: 1.bivariate sampling: assume we select pairs at random from a bivariate population 2.fixed-X sampling: an experiment is performed where the X’s are fixed in advance and we observe the values of Y that correspond to those X’s. In either of these cases we may use the correlation coefficient and regression to look for the association between X and Y. Think of Y as the response and X the explanatory variable (though in the first case above, we may not have an explanatory/response situation…)

2 we define the population correlation coefficient as the value of  ranges from -1 to +1 and measures the strength of the linear relationship between X and Y. -1 corresponds to perfect negative correlation, +1 to perfect positive correlation. If rho = 0 then there is no linear association between X and Y and we say they are uncorrelated variables.. The usual parametric statistic to test the hypothesis that rho=0 is the Pearson product- moment correlation coefficient, r, given by the formula below: The t-statistic (n-2 df) below can be used to test the null hypothesis that rho=0:

3 Of course, the parametric assumptions are that the (X,Y)-pairs are from a bivariate normal distribution… We may also consider the slope of the so-called regression line relating Y to X: We estimate the unknown slope and intercept via least-squares (the important formulas are given on pages 146-147) and we use the statistic given in the middle of p.147 to test the null hypothesis that    As with our statistic to test 0 correlation, this statistic has n-2 df and assumes the errors are normally distributed. Let’s use SAS and R to show how they implement the correlation and regression tests… Go over Example 5.1.1 on page 149… we’ll do the permutation test next, but first, try SAS and R: –in SAS, the procedures of interest here are PROC PLOT (always look at your data), PROC CORR (to get correlations and to test rho=0), and PROC REG (to get estimates for the slope and intercept and to test hypotheses about them). PROC REG also will give plots… –in R use the lm function (fit linear models) and look at the various components

4 Let’s try to reproduce the empirical distribution of the correlation coefficient, r, as given in Figure 5.1.3 on page 153 – use R (see R#9, page 2). Note the correspondence between the transformed r (Z=r(sqrt(n-1))) percentiles and the standard normal percentiles (see Table 5.1.3 on page 152). HW: Read Chapter 5, through page 153. Do problems #1, 3 and 4 on page 189 Then for next time, we’ll begin our discussion of the Spearman’s rank correlation coefficient…


Download ppt "Relationship between two variables Two quantitative variables: correlation and regression methods Two qualitative variables: contingency table methods."

Similar presentations


Ads by Google