Interpreting a correlation: is there a causal link between X and Y (testable with an experiment), or do X and Y share common causes? What happens to the correlation if we control for (hold constant) the value of a third variable? In a correlational design this is how we control causes that cannot be manipulated experimentally.
PARTIAL CORRELATION
Use: when the association between two variables is not "pure" but is influenced by a so-called third variable; when, in examining the relation between X and Y, we pool several groups that do not have equal arithmetic means on X (Z influences X).
Example: the correlation between body height (V) and hair length (L) is negative. Explanation: women are on average shorter than men, and women have on average longer hair. We obtain the "pure" V - L association by removing from the V - L correlation the influence of the sex - V and sex - L associations = partial correlation (a simulation sketch follows below).
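A minimal Python simulation of this pooling effect (the means, SDs, and sample size below are invented for illustration, not taken from the slides): pooled across sexes the height-hair correlation is clearly negative, even though hair length is unrelated to height within each sex.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
sex = rng.integers(0, 2, n)  # 0 = male, 1 = female (hypothetical coding)

# Height and hair length depend on sex only; within a sex they are unrelated.
height = np.where(sex == 0, 178.0, 165.0) + rng.normal(0, 6, n)  # cm
hair = np.where(sex == 0, 8.0, 35.0) + rng.normal(0, 8, n)       # cm

print("pooled r(V, L):", np.corrcoef(height, hair)[0, 1])  # clearly negative
for s, label in [(0, "men"), (1, "women")]:
    m = sex == s
    print(f"r(V, L) within {label}:", np.corrcoef(height[m], hair[m])[0, 1])  # ~ 0
```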
Partial correlation with continuous variables
Decomposition of the variance of Y (Venn diagram of circles x, y, z):
a ... variance of Y explained uniquely by X
b ... variance of Y explained jointly by X and Z
c ... variance of Y explained uniquely by Z
e ... variance of Y explained by neither X nor Z (unexplained variance of Y)
r^2_{xy.z} = a / (a + e)
(a + e) ... the share of the variance that remains unexplained after Z is taken into account
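In practice the same quantity is computed from the three pairwise correlations; a minimal Python sketch of the standard first-order partial-correlation formula (equivalent to the area ratio above, though the formula itself is not shown on the slide):

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation r_xy.z from the three pairwise r's."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))
```

Applied to the height/hair simulation above, partial_corr(height, hair, sex) comes out near zero.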
MULTIPLE CORRELATION
Coefficient of multiple correlation (using the Venn areas defined above): R^2_{y.xz} = (a + b + c) / var Y
Multiple correlation. Statistical significance of R: F = (R^2 / k) / ((1 - R^2) / (N - k - 1)), with N ... number of data points, k ... number of predictors, df1 = k, df2 = N - k - 1.
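A sketch of the overall F test in Python, assuming SciPy is available; for the two-predictor case R^2 itself can be obtained from the pairwise correlations via the standard identity used below (function names illustrative):

```python
import scipy.stats as stats

def multiple_R2(r_xy, r_zy, r_xz):
    """Two-predictor R^2 of Y on X and Z from the three pairwise correlations."""
    return (r_xy**2 + r_zy**2 - 2 * r_xy * r_zy * r_xz) / (1 - r_xz**2)

def R_significance(R2, N, k):
    """Overall F test for R: df1 = k, df2 = N - k - 1."""
    F = (R2 / k) / ((1 - R2) / (N - k - 1))
    return F, stats.f.sf(F, k, N - k - 1)  # F statistic and its p value
```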
MULTIPLE REGRESSION
Multiple regression: several predictors ("IVs"), e.g. X1 and X2, and one criterion ("DV"), e.g. Y. Y' is a linear combination of the X's; the task is to find weights b for the predictors such that the variance of Y is explained as well as possible. The MR equation for 2 predictors is Y' = a + b1·X1 + b2·X2; in a 3D plot it is the plane that best fits the points Y (a fitting sketch follows below).
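A minimal least-squares sketch of this fit (function name illustrative; any linear-algebra route gives the same plane):

```python
import numpy as np

def fit_two_predictors(y, x1, x2):
    """Weights of the best-fitting plane Y' = a + b1*X1 + b2*X2."""
    X = np.column_stack([np.ones_like(x1), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    a, b1, b2 = coef
    return a, b1, b2
```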
Multiple regression. Simple regression: explained variance = r^2_{XY}. When adding new predictors we must also take the correlations among the predictors into account, so that the same variance is not counted twice in the explanation. With the Venn areas defined above:
R^2_{y.xz} = (a + b + c) / var Y
c / var Y = r^2_{zy(x)} ... semi-partial correlation (used in multiple regression)
c / (c + e) = r^2_{zy.x} ... partial correlation
R^2_{y.xz} = r^2_{xy} + r^2_{zy(x)} (a numerical check follows below)
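A quick numerical check of the last identity on simulated data (the generating coefficients are invented); the semi-partial correlation is obtained by correlating Y with the residual of Z after regressing Z on X:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
z = 0.5 * x + rng.normal(size=200)   # the predictors correlate with each other
y = x + z + rng.normal(size=200)

# semi-partial r_zy(x): the part of Z that X does not explain, correlated with Y
z_resid = z - np.polyval(np.polyfit(x, z, 1), x)
r_zy_x = np.corrcoef(z_resid, y)[0, 1]
r_xy = np.corrcoef(x, y)[0, 1]

# full R^2 from the two-predictor regression
X = np.column_stack([np.ones_like(x), x, z])
y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
R2 = 1 - np.var(y - y_hat) / np.var(y)

print(R2, r_xy**2 + r_zy_x**2)  # the two values agree
```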
Multiple regression: raw scores give the weights b; standardized scores give the β coefficients (standardized b). k ... number of predictors; df1 = k, df2 = N - k - 1.
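Conversion between the two kinds of weights as a one-line sketch (arguments: the raw weight b, the corresponding predictor X, and the criterion Y):

```python
import numpy as np

def beta_from_b(b, x, y):
    """Standardized weight: beta = b * s_x / s_y."""
    return b * np.std(x, ddof=1) / np.std(y, ddof=1)
```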
Multiple regression. Significance of the gain in R^2 after adding new predictors: F = ((R_b^2 - R_a^2) / (k_b - k_a)) / ((1 - R_b^2) / (N - k_b - 1)), where R_b ... multiple correlation after including k_b predictors, R_a ... multiple correlation after including k_a predictors (k_b > k_a); df1 = k_b - k_a, df2 = N - k_b - 1.
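The same F-change test as a Python sketch, assuming SciPy (function name illustrative):

```python
import scipy.stats as stats

def f_change(R2_b, R2_a, k_b, k_a, N):
    """F test for the gain in R^2 when going from k_a to k_b predictors."""
    df1, df2 = k_b - k_a, N - k_b - 1
    F = ((R2_b - R2_a) / df1) / ((1 - R2_b) / df2)
    return F, stats.f.sf(F, df1, df2)
```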
Example of multiple regression: three indicators of job performance, WORK_1, WORK_2, WORK_3; we predict WORK_1 from WORK_2 and WORK_3.
1. Review of the data: descriptive statistics
3. Computing the regression parameters: the coefficient of multiple correlation.

Tolerance: the tolerance of a variable is defined as 1 minus the squared multiple correlation of this variable with all other independent variables in the regression equation. The smaller the tolerance of a variable, the more redundant its contribution to the regression is (i.e., it is redundant with the contribution of the other independent variables). If the tolerance of any variable in the regression equation is zero (or very close to zero), the regression equation cannot be evaluated (the matrix is said to be ill-conditioned, and it cannot be inverted). If the tolerance of a variable entered into the regression equation is less than the default tolerance value (.01), that variable is 99 percent redundant with (nearly identical to) the variables already in the equation. Forcing highly redundant variables into the regression equation is not only questionable in terms of the relevance of the results; the resulting estimates (regression coefficients) also become increasingly unreliable.

Adjusted R-square corrects for the fact that with a large number of independents, R^2 can become artificially high simply because chance variation in some independents "explains" small parts of the variance of the dependent. At the extreme, when there are as many independents as cases in the sample, R^2 will always be 1.0. The adjustment lowers R^2 as k, the number of independents, increases. Put another way, adjusted R^2 is a downward adjustment of R^2 to account for one model having more degrees of freedom than another. With few independents, R^2 and adjusted R^2 will be close; with many independents, adjusted R^2 may be noticeably lower (and in rare cases may even be negative). The greater the number of independents, the more the researcher is expected to report the adjusted coefficient. Always use adjusted R^2 when comparing models with different numbers of independents. Gujarati (2006: 229) also recommends: "Even when we are not comparing two regression models, it is a good practice to find the adjusted R2 value because it explicitly takes into account the number of variables included in the model." Adjusted R^2 = 1 - (1 - R^2)(N - 1) / (N - k - 1), where N is the sample size and k is the number of terms in the model not counting the constant (i.e., the number of independents). In SPSS: Analyze, Regression, Linear; click the Statistics button and make sure Estimates is checked to get adjusted R-square.

Prediction of WORK_1: WORK_1' = 0.232 + 0.504·WORK_2 + 0.479·WORK_3. One predictor uniquely explains 6.2 % of the variance and the other 6.9 %; 35.7 % of the variance is explained jointly; together the two predictors explain 53.6 % of the variance of WORK_1.
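Both diagnostics are easy to reproduce outside SPSS/STATISTICA; a minimal numpy sketch (function names illustrative; X is the predictor matrix without the constant column):

```python
import numpy as np

def tolerance(X, j):
    """Tolerance of predictor j: 1 - R^2 of column j on the remaining columns."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    fitted = A @ np.linalg.lstsq(A, X[:, j], rcond=None)[0]
    r2 = 1 - np.var(X[:, j] - fitted) / np.var(X[:, j])
    return 1 - r2

def adjusted_R2(R2, N, k):
    """Adjusted R^2 = 1 - (1 - R^2)(N - 1) / (N - k - 1)."""
    return 1 - (1 - R2) * (N - 1) / (N - k - 1)
```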
Inspecting the residuals: the relation between the predicted values and the residuals; the relation between the observed values and the residuals (a plotting sketch follows below).
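A minimal matplotlib sketch of the two diagnostic plots (names illustrative; y and y_pred are the observed and predicted criterion values, e.g. WORK_1):

```python
import matplotlib.pyplot as plt

def residual_plots(y, y_pred):
    """Residuals vs. predicted values and residuals vs. observed values."""
    resid = y - y_pred
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
    ax1.scatter(y_pred, resid)
    ax1.axhline(0, linestyle="--")
    ax1.set(xlabel="predicted values", ylabel="residuals")
    ax2.scatter(y, resid)
    ax2.axhline(0, linestyle="--")
    ax2.set(xlabel="observed values", ylabel="residuals")
    plt.show()
```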