Published by Ross Merritt. Modified over 9 years ago.
1 MULTIVARIATE VARIABLE: n-th object, m-th variable
2 STATISTICAL DEPENDENCE: CORRELATION – relationship between QUANTITATIVE (measured) data; CONTINGENCY – relationship between QUALITATIVE (descriptive) data
3 CORRELATION: simple – for two variables; multiple – for more than two variables; partial – describes the relationship of two variables in a multivariate data set (excluding the influence of all other variables)
4 CORRELATION: positive, negative
5 CORRELATION: TOTAL VARIABILITY = MODEL VARIABILITY + RESIDUAL VARIABILITY
6 CORRELATION: COEFFICIENT OF DETERMINATION, COEFFICIENT OF CORRELATION
7 COEFFICIENT OF DETERMINATION quantifies what part of the total variability of the response is explained by the model. Example values: r² = 0.9, r² = 1, r² = 0.05.
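The variability decomposition behind r² can be sketched numerically. A minimal Python example with invented data, fitting a least-squares line and computing r² as the explained share of the total variability:

```python
# Invented data for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares line y = a + b*x
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x
fitted = [a + b * x for x in xs]

ss_total = sum((y - mean_y) ** 2 for y in ys)             # total variability
ss_resid = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual variability
ss_model = sum((f - mean_y) ** 2 for f in fitted)         # model variability

r2 = 1 - ss_resid / ss_total   # part of total variability explained by the model
print(round(r2, 3))            # → 0.998 (nearly linear data)
```

For a least-squares fit with an intercept, ss_model + ss_resid equals ss_total, so the two ways of writing r² (model/total or 1 − residual/total) agree.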
8 COEFFICIENT OF CORRELATION (simple correlation): Pearson; Spearman (rank correlation)
9 PEARSON COEFFICIENT OF CORRELATION = standardised covariance; assumes a BIVARIATE normal distribution
10 COVARIANCE: a measure of linear relationship; its absolute value is bounded above by the product of the standard deviations; its magnitude depends on the units of the variables, so standardisation is necessary
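Standardising the covariance by both standard deviations yields the Pearson coefficient. A small sketch in pure Python with invented data:

```python
import math

# Invented data for illustration
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Sample covariance: unit-dependent, bounded in magnitude by sx * sy
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))

r = cov / (sx * sy)   # dimensionless, always in [-1, 1]
print(round(r, 3))    # → 0.8
```

Rescaling either variable (e.g. metres to centimetres) changes cov, sx, and sy by the same factor, so r itself is unchanged; that is the point of the standardisation.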
11 PEARSON COEFFICIENT OF CORRELATION – basic properties: it is a dimensionless measure of correlation; 0 to 1 for positive correlation, 0 to −1 for negative correlation; 0 means there is no linear relationship between the variables (the relationship can still be nonlinear!) or the relationship is not statistically significant on the basis of the available data; 1 or −1 indicates a functional (perfect) relationship; the value of the correlation coefficient is the same for the dependence of x1 on x2 and for the reverse dependence of x2 on x1.
12 SPEARMAN CORRELATION COEFFICIENT: a nonparametric correlation coefficient based on ranks; d_i = difference between the ranks of X and Y in one row
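With no tied values, the rank-based coefficient reduces to the classic formula ρ = 1 − 6·Σd²/(n(n² − 1)). A minimal sketch with invented data:

```python
# Invented data; the x values happen to be in increasing order
xs = [10.0, 20.0, 30.0, 40.0, 50.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]

def ranks(values):
    # Rank 1 = smallest value; this sketch assumes no ties
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

rx, ry = ranks(xs), ranks(ys)
n = len(xs)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))   # squared rank differences
rho = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(rho)   # → 0.8
```

Because only ranks enter the formula, any monotone transformation of the data (logs, square roots) leaves ρ unchanged, which is why extreme values carry less weight than in the Pearson coefficient.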
13 SPEARMAN CORRELATION COEFFICIENT and influential points (extremes): Pearson R = −0.412 (influential points are fully counted); Spearman R = +0.541 (the influence of extreme points is strongly limited)
14 CONFIDENCE INTERVAL OF R (CI): the CI covers the possible values of the population correlation coefficient with probability 1 − α. Because the distribution of the correlation coefficient is not normal, we must use the Fisher transformation, which is approximately normally distributed with mean E(Z) = Z(ρ) and variance D(Z) = 1/(n − 3).
15 CONFIDENCE INTERVAL OF R (CI) – procedure: transform R to the Fisher value Z(R); compute the half-width of the CI of the transformed value; obtain the lower and upper boundaries of the CI on the Fisher scale; retransform Z(R) back to the correlation scale to get the lower and upper boundaries of the CI of the correlation coefficient.
16 CONFIDENCE INTERVAL OF R (CI) – example: R = 0.95305; Fisher value fisherz(0.95305) = 1.864; CI on the Fisher scale: 1.211 to 2.517; retransformed: fisherz2r(1.2107) = 0.83689, fisherz2r(2.5174) = 0.98707; CI of the correlation coefficient: 0.837 to 0.987 (point estimate 0.953).
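The example's numbers can be reproduced in a few lines: `math.atanh` is exactly the Fisher transformation and `math.tanh` its inverse. The sample size n = 12 is an assumption here (it makes the half-width 1.96/√(n − 3) match the interval shown):

```python
import math

r = 0.95305
n = 12         # assumed sample size; not stated in the example
z_crit = 1.96  # two-sided 95% normal quantile

z = math.atanh(r)                  # Fisher transformation Z(r) = 0.5*ln((1+r)/(1-r))
half = z_crit / math.sqrt(n - 3)   # half-width of the CI on the Fisher scale
z_lo, z_hi = z - half, z + half
r_lo, r_hi = math.tanh(z_lo), math.tanh(z_hi)   # retransform to the r scale

print(round(z, 3), round(r_lo, 3), round(r_hi, 3))   # → 1.864 0.837 0.987
```

Note that the interval is not symmetric around the point estimate 0.953; the back-transformation compresses the upper side because r cannot exceed 1.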
17 REGRESSION ANALYSIS: measured values vs model values; independent (explanatory) variable; dependent (explained) variable = response
18 REGRESSION MODEL: y = Xβ + ε, where y is the response, X the explanatory variable(s), β the regression parameters, and ε the random error
19 REGRESSION MODEL: y = a + b·x, with intercept a, regression parameter (slope) b, independent (explanatory) variable x, and response y
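For the straight-line case the least-squares estimates have closed forms: b = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)² and a = ȳ − b·x̄. A sketch with invented data:

```python
# Invented data roughly following y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
    / sum((x - mx) ** 2 for x in xs)   # slope (regression parameter)
a = my - b * mx                        # intercept

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # estimates of the error term
print(round(a, 2), round(b, 2))   # → 1.04 1.99
```

With an intercept in the model, the residuals sum to zero by construction, which is a quick sanity check on any hand-rolled fit.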
20 CONFIDENCE INTERVAL OF MODEL: the fitted values of the regression model are only point estimates. The band between the lower and upper boundaries is the area where all possible models computed from any sample (coming from the same population) appear with probability 1 − α; a vertical cut through the band gives the CI of one model value.
21 CI OF Y VALUES – PREDICTION INTERVAL: an estimate of an interval in which future observations will fall with probability 1 − α
22 CONFIDENCE INTERVAL OF MODEL (CI), PREDICTION INTERVAL OF RESPONSE (PI)
23 COMPARISON OF REGRESSION MODELS: Akaike information criterion (AIC), computed from RSS (residual sum of squares) and m (number of parameters). The smaller the AIC, the better the model (from the statistical point of view!).
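One common least-squares form of the criterion is AIC = n·ln(RSS/n) + 2m, up to an additive constant that cancels when comparing models fitted to the same data. A sketch with invented RSS values:

```python
import math

def aic(rss, n, m):
    # Least-squares form of the Akaike information criterion
    # (constant terms dropped; only differences between models matter)
    return n * math.log(rss / n) + 2 * m

# Invented example: a 2-parameter line vs a 3-parameter quadratic on n = 20 points
aic_line = aic(rss=0.50, n=20, m=2)
aic_quad = aic(rss=0.48, n=20, m=3)

# The extra parameter must buy a large enough drop in RSS; here it does not
better = "line" if aic_line < aic_quad else "quadratic"
print(better)   # → line
```

The 2m term is the penalty: adding a parameter always lowers RSS a little, so AIC only prefers the bigger model when the improvement in fit outweighs the added complexity.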
24 REGRESSION DIAGNOSTICS – diagnostics of residuals: normality, homoscedasticity (constant variance), independence
25 REGRESSION DIAGNOSTICS: heteroscedasticity can be detected with the Breusch–Pagan test (and many others…); the weighted OLS method is a remedy
26 REGRESSION DIAGNOSTICS
27 REGRESSION DIAGNOSTICS: influential points
28 REGRESSION DIAGNOSTICS – HAT VALUES (leverages): the hat matrix H relates the fitted values to the observed values and describes the influence each observed value has on each fitted value. The diagonal elements of the hat matrix are the leverages, which describe the influence each observed value has on the fitted value for that same observation.
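For a straight-line fit the leverages can be computed without forming the full hat matrix: h_i = 1/n + (x_i − x̄)²/Σ(x − x̄)². A sketch with one deliberately extreme x value (invented data):

```python
# Invented data: the last x value lies far from the others
xs = [1.0, 2.0, 3.0, 4.0, 10.0]

n = len(xs)
mx = sum(xs) / n
sxx = sum((x - mx) ** 2 for x in xs)

# Diagonal of the hat matrix for the model y = a + b*x
leverages = [1 / n + (x - mx) ** 2 / sxx for x in xs]

# The leverages sum to the number of parameters (2: intercept + slope),
# and the extreme x value carries by far the largest leverage.
print([round(h, 2) for h in leverages])   # → [0.38, 0.28, 0.22, 0.2, 0.92]
```

Leverage depends only on the x values, not on y: a point can have high leverage (and thus the potential to distort the fit) even when its y value happens to lie on the line.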
29 REGRESSION DIAGNOSTICS – Cook's distance: measures the effect of deleting a given observation. Data points with large residuals (outliers) and/or high leverage may distort the outcome and accuracy of a regression.
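Cook's distance combines both ingredients just mentioned, the residual and the leverage; one standard form is D_i = e_i²/(m·s²) · h_i/(1 − h_i)². A sketch for a straight-line fit with one invented outlier:

```python
# Invented data: the last y value is a deliberate outlier
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 15.0]

n, m = len(xs), 2                      # m = number of parameters (a, b)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)

b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
a = my - b * mx
resid = [y - (a + b * x) for x, y in zip(xs, ys)]

s2 = sum(e ** 2 for e in resid) / (n - m)        # residual variance estimate
hat = [1 / n + (x - mx) ** 2 / sxx for x in xs]  # leverages

# Cook's distance: large residual AND large leverage both inflate D_i
cook = [e ** 2 / (m * s2) * h / (1 - h) ** 2 for e, h in zip(resid, hat)]
worst = cook.index(max(cook))
print(worst, round(max(cook), 2))   # → 4 2.24
```

A common rule of thumb flags observations with D_i above roughly 1; here the outlier at x = 5 clearly exceeds that, while the remaining points stay well below it.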
30 REGRESSION DIAGNOSTICS – DFFITS: a scaled measure of the change in the predicted value for the i-th observation, calculated by deleting the i-th observation. A large value indicates that the observation is very influential in its neighborhood of the X space. A general cutoff to consider is 2; the recommended size-adjusted cutoff is 2·sqrt(m/n).
31 REGRESSION DIAGNOSTICS – DFBETAS: scaled measures of the change in each parameter estimate, calculated by deleting the i-th observation. The general cutoff value is 2; the size-adjusted cutoff is 2/sqrt(n).