Published by Ross Merritt. Modified over 9 years ago.
1 MULTIVARIATE VARIABLE: n-th object, m-th variable
2 STATISTICAL DEPENDENCE: CORRELATION – relationship between QUANTITATIVE (measured) data; CONTINGENCY – relationship between QUALITATIVE (descriptive) data
3 CORRELATION: simple – for two variables; multiple – for more than two variables; partial – describes the relationship of two variables in a multivariate data set (excluding the influence of all other variables)
4 CORRELATION: positive, negative
5 CORRELATION: TOTAL VARIABILITY = MODEL VARIABILITY + RESIDUAL VARIABILITY
6 CORRELATION: COEFFICIENT OF DETERMINATION, COEFFICIENT OF CORRELATION
7 COEFFICIENT OF DETERMINATION quantifies what part of the total variability of the response is explained by the model. Example values: r² = 0.9, r² = 1, r² = 0.05.
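The variability decomposition behind r² can be sketched numerically. A minimal Python example with invented data, fitting a least-squares line and computing r² as the explained share of the total variability:

```python
# Invented data for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares line y = a + b*x
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x
fitted = [a + b * x for x in xs]

ss_total = sum((y - mean_y) ** 2 for y in ys)             # total variability
ss_resid = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual variability
ss_model = sum((f - mean_y) ** 2 for f in fitted)         # model variability

r2 = 1 - ss_resid / ss_total   # part of total variability explained by the model
print(round(r2, 3))            # → 0.998 (nearly linear data)
```

For a least-squares fit with an intercept, ss_model + ss_resid equals ss_total, so the two ways of writing r² (model/total or 1 − residual/total) agree.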
8 COEFFICIENT OF CORRELATION (simple correlation): Pearson; Spearman (rank correlation)
9 PEARSON COEFFICIENT OF CORRELATION = standardised covariance; assumes a BIVARIATE normal distribution
10 COVARIANCE: a measure of linear relationship; its absolute value is bounded above by the product of the standard deviations; its magnitude depends on the units of the variables, so standardisation is necessary
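Standardising the covariance by both standard deviations yields the Pearson coefficient. A small sketch in pure Python with invented data:

```python
import math

# Invented data for illustration
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Sample covariance: unit-dependent, bounded in magnitude by sx * sy
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))

r = cov / (sx * sy)   # dimensionless, always in [-1, 1]
print(round(r, 3))    # → 0.8
```

Rescaling either variable (e.g. metres to centimetres) changes cov, sx, and sy by the same factor, so r itself is unchanged; that is the point of the standardisation.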
11 PEARSON COEFFICIENT OF CORRELATION – basic properties: it is a dimensionless measure of correlation; 0 to 1 for positive correlation, 0 to −1 for negative correlation; 0 means there is no linear relationship between the variables (the relationship can still be nonlinear!) or the relationship is not statistically significant on the basis of the available data; 1 or −1 indicates a functional (perfect) relationship; the value of the correlation coefficient is the same for the dependence of x1 on x2 and for the reverse dependence of x2 on x1.
12 SPEARMAN CORRELATION COEFFICIENT: a nonparametric correlation coefficient based on ranks; d_i = difference between the ranks of X and Y in one row
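With no tied values, the rank-based coefficient reduces to the classic formula ρ = 1 − 6·Σd²/(n(n² − 1)). A minimal sketch with invented data:

```python
# Invented data; the x values happen to be in increasing order
xs = [10.0, 20.0, 30.0, 40.0, 50.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]

def ranks(values):
    # Rank 1 = smallest value; this sketch assumes no ties
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

rx, ry = ranks(xs), ranks(ys)
n = len(xs)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))   # squared rank differences
rho = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(rho)   # → 0.8
```

Because only ranks enter the formula, any monotone transformation of the data (logs, square roots) leaves ρ unchanged, which is why extreme values carry less weight than in the Pearson coefficient.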
13 SPEARMAN CORRELATION COEFFICIENT and influential points (extremes): Pearson R = −0.412 (influential points are fully counted); Spearman R = +0.541 (the influence of extreme points is strongly limited)
14 CONFIDENCE INTERVAL OF R (CI): the CI covers the possible values of the population correlation coefficient with probability 1 − α. Because the distribution of the correlation coefficient is not normal, we must use the Fisher transformation, which is approximately normally distributed with mean E(Z) = Z(ρ) and variance D(Z) = 1/(n − 3).
15 CONFIDENCE INTERVAL OF R (CI) – procedure: transform R to the Fisher value Z(R); compute the half-width of the CI of the transformed value; obtain the lower and upper boundaries of the CI on the Fisher scale; retransform Z(R) back to the correlation scale to get the lower and upper boundaries of the CI of the correlation coefficient.
16 CONFIDENCE INTERVAL OF R (CI) – example: R = 0.95305; Fisher value fisherz(0.95305) = 1.864; CI on the Fisher scale: 1.211 to 2.517; retransformed: fisherz2r(1.2107) = 0.83689, fisherz2r(2.5174) = 0.98707; CI of the correlation coefficient: 0.837 to 0.987 (point estimate 0.953).
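The example's numbers can be reproduced in a few lines: `math.atanh` is exactly the Fisher transformation and `math.tanh` its inverse. The sample size n = 12 is an assumption here (it makes the half-width 1.96/√(n − 3) match the interval shown):

```python
import math

r = 0.95305
n = 12         # assumed sample size; not stated in the example
z_crit = 1.96  # two-sided 95% normal quantile

z = math.atanh(r)                  # Fisher transformation Z(r) = 0.5*ln((1+r)/(1-r))
half = z_crit / math.sqrt(n - 3)   # half-width of the CI on the Fisher scale
z_lo, z_hi = z - half, z + half
r_lo, r_hi = math.tanh(z_lo), math.tanh(z_hi)   # retransform to the r scale

print(round(z, 3), round(r_lo, 3), round(r_hi, 3))   # → 1.864 0.837 0.987
```

Note that the interval is not symmetric around the point estimate 0.953; the back-transformation compresses the upper side because r cannot exceed 1.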
17 REGRESSION ANALYSIS: measured values vs model values; independent (explanatory) variable; dependent (explained) variable = response
18 REGRESSION MODEL: y = Xβ + ε, where y is the response, X the explanatory variable(s), β the regression parameters, and ε the random error
19 REGRESSION MODEL: y = a + b·x, with intercept a, regression parameter (slope) b, independent (explanatory) variable x, and response y
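For the straight-line case the least-squares estimates have closed forms: b = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)² and a = ȳ − b·x̄. A sketch with invented data:

```python
# Invented data roughly following y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
    / sum((x - mx) ** 2 for x in xs)   # slope (regression parameter)
a = my - b * mx                        # intercept

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # estimates of the error term
print(round(a, 2), round(b, 2))   # → 1.04 1.99
```

With an intercept in the model, the residuals sum to zero by construction, which is a quick sanity check on any hand-rolled fit.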
20 CONFIDENCE INTERVAL OF MODEL: the fitted values of the regression model are only point estimates. The band between the lower and upper boundaries is the area where all possible models computed from any sample (coming from the same population) appear with probability 1 − α; a vertical cut through the band gives the CI of one model value.
21 CI OF Y VALUES – PREDICTION INTERVAL: an estimate of an interval in which future observations will fall with probability 1 − α
22 CONFIDENCE INTERVAL OF MODEL (CI), PREDICTION INTERVAL OF RESPONSE (PI)
23 COMPARISON OF REGRESSION MODELS: Akaike information criterion (AIC), computed from RSS (residual sum of squares) and m (number of parameters). The smaller the AIC, the better the model (from the statistical point of view!).
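One common least-squares form of the criterion is AIC = n·ln(RSS/n) + 2m, up to an additive constant that cancels when comparing models fitted to the same data. A sketch with invented RSS values:

```python
import math

def aic(rss, n, m):
    # Least-squares form of the Akaike information criterion
    # (constant terms dropped; only differences between models matter)
    return n * math.log(rss / n) + 2 * m

# Invented example: a 2-parameter line vs a 3-parameter quadratic on n = 20 points
aic_line = aic(rss=0.50, n=20, m=2)
aic_quad = aic(rss=0.48, n=20, m=3)

# The extra parameter must buy a large enough drop in RSS; here it does not
better = "line" if aic_line < aic_quad else "quadratic"
print(better)   # → line
```

The 2m term is the penalty: adding a parameter always lowers RSS a little, so AIC only prefers the bigger model when the improvement in fit outweighs the added complexity.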
24 REGRESSION DIAGNOSTICS – diagnostics of residuals: normality, homoscedasticity (constant variance), independence
25 REGRESSION DIAGNOSTICS: heteroscedasticity can be detected with the Breusch–Pagan test (and many others…); the weighted OLS method is a remedy
26 REGRESSION DIAGNOSTICS
27 REGRESSION DIAGNOSTICS: influential points
28 REGRESSION DIAGNOSTICS – HAT VALUES (leverages): the hat matrix H relates the fitted values to the observed values and describes the influence each observed value has on each fitted value. The diagonal elements of the hat matrix are the leverages, which describe the influence each observed value has on the fitted value for that same observation.
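For a straight-line fit the leverages can be computed without forming the full hat matrix: h_i = 1/n + (x_i − x̄)²/Σ(x − x̄)². A sketch with one deliberately extreme x value (invented data):

```python
# Invented data: the last x value lies far from the others
xs = [1.0, 2.0, 3.0, 4.0, 10.0]

n = len(xs)
mx = sum(xs) / n
sxx = sum((x - mx) ** 2 for x in xs)

# Diagonal of the hat matrix for the model y = a + b*x
leverages = [1 / n + (x - mx) ** 2 / sxx for x in xs]

# The leverages sum to the number of parameters (2: intercept + slope),
# and the extreme x value carries by far the largest leverage.
print([round(h, 2) for h in leverages])   # → [0.38, 0.28, 0.22, 0.2, 0.92]
```

Leverage depends only on the x values, not on y: a point can have high leverage (and thus the potential to distort the fit) even when its y value happens to lie on the line.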
29 REGRESSION DIAGNOSTICS – Cook's distance: measures the effect of deleting a given observation. Data points with large residuals (outliers) and/or high leverage may distort the outcome and accuracy of a regression.
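Cook's distance combines both ingredients just mentioned, the residual and the leverage; one standard form is D_i = e_i²/(m·s²) · h_i/(1 − h_i)². A sketch for a straight-line fit with one invented outlier:

```python
# Invented data: the last y value is a deliberate outlier
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 15.0]

n, m = len(xs), 2                      # m = number of parameters (a, b)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)

b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
a = my - b * mx
resid = [y - (a + b * x) for x, y in zip(xs, ys)]

s2 = sum(e ** 2 for e in resid) / (n - m)        # residual variance estimate
hat = [1 / n + (x - mx) ** 2 / sxx for x in xs]  # leverages

# Cook's distance: large residual AND large leverage both inflate D_i
cook = [e ** 2 / (m * s2) * h / (1 - h) ** 2 for e, h in zip(resid, hat)]
worst = cook.index(max(cook))
print(worst, round(max(cook), 2))   # → 4 2.24
```

A common rule of thumb flags observations with D_i above roughly 1; here the outlier at x = 5 clearly exceeds that, while the remaining points stay well below it.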
30 REGRESSION DIAGNOSTICS – DFFITS: a scaled measure of the change in the predicted value for the i-th observation, calculated by deleting the i-th observation. A large value indicates that the observation is very influential in its neighborhood of the X space. A general cutoff to consider is 2; the recommended size-adjusted cutoff is 2·sqrt(m/n).
31 REGRESSION DIAGNOSTICS – DFBETAS: scaled measures of the change in each parameter estimate, calculated by deleting the i-th observation. The general cutoff value is 2; the size-adjusted cutoff is 2/sqrt(n).