Statistics in Data Mining on Finance by Jian Chen
11/11/2018, o:\tem\regression.ppt

Linear Regression
Two variables: Y = f(x) + e, where f(x) is a polynomial
Observations: (x1, y1), …, (xn, yn)
Linear case: f(x) = a + b x
The line of "best" fit:
  Least squares: minimize the sum of the squared residuals (convenient in that it permits statistical testing)
  Least absolute value: minimize the sum of the absolute residuals
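The least-squares criterion for f(x) = a + b x has a closed-form solution, which can be sketched as below. This is a minimal illustration, not code from the slides: the function name `fit_line` and the sample data are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing the sum of squared residuals of y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares about the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


# Illustrative observations (hypothetical, not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")  # prints: a = 2.20, b = 0.60
```

The least-absolute-value criterion has no comparable closed form and is usually solved iteratively, which is one practical reason least squares is the default.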
Linear (Least Squares) Regression
How to estimate the "goodness of fit"?
  Error (sum of squared residuals)
  Total variation (total sum of squares)
  Standard deviation of the residuals: Se (95 %)
  The sample correlation coefficient: r
  r2: the R-squared of the regression equation
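The goodness-of-fit quantities can be computed directly from the residuals. The sketch below uses the standard textbook definitions, Se = sqrt(SSE / (n - 2)) and R-squared = 1 - SSE / SST, which are assumed here because the slide's own formulas are not preserved. The function name `goodness_of_fit` and the data are illustrative.

```python
import math


def goodness_of_fit(xs, ys):
    """Fit y = a + b*x by least squares and return common fit statistics."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    # Error: sum of squared residuals about the fitted line
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = s_yy  # total variation of Y about its mean
    return {
        "sse": sse,
        "sst": sst,
        "se": math.sqrt(sse / (n - 2)),      # standard deviation of residuals
        "r2": 1 - sse / sst,                 # R-squared
        "r": s_xy / math.sqrt(s_xx * s_yy),  # sample correlation coefficient
    }


# Illustrative observations (hypothetical, not from the slides)
stats = goodness_of_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

In the two-variable case the square of r equals R-squared, which the output can be used to confirm.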
Linear Regression
[Figure: fitted line Y = a + bX; the error Ei is the gap between an observed value and its predicted value]
Linear Regression
Test of regression coefficients: is b = 0? (p-value)
Multiple regression:
  Model selection: backward, forward
  Multiple correlation R2 (the corrected R2)
  Se (95 %)
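A sketch of the test of b = 0 and of the corrected R2, under stated assumptions: the t statistic is b divided by its standard error Se / sqrt(Sxx), and the corrected R2 is taken to be the usual adjusted form 1 - (1 - R2)(n - 1)/(n - k - 1) for k regressors. The standard library has no t-distribution, so instead of a p-value the example compares against the tabulated two-sided 5 % critical value for 3 degrees of freedom (3.182). All names and data are illustrative.

```python
import math


def slope_t_statistic(xs, ys):
    """Return the t statistic for H0: b = 0 in the fit y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2))  # standard deviation of residuals
    se_b = se / math.sqrt(s_xx)    # standard error of the slope
    return b / se_b


def adjusted_r2(r2, n, k):
    """Adjust R-squared for k regressors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Illustrative observations (hypothetical, not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
T_CRIT = 3.182  # two-sided 5 % critical value of t with n - 2 = 3 df

t = slope_t_statistic(xs, ys)
print(f"t = {t:.3f}, reject b = 0 at 5 %: {abs(t) > T_CRIT}")
print(f"adjusted R2 = {adjusted_r2(0.6, n=5, k=1):.4f}")
```

With only five observations the slope is not significant here even though R2 is 0.6, which illustrates why the test and the corrected R2 are reported alongside the raw fit.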
Linear Regression (2)
Multicollinearity: Y, xi, a xi + b
  Effects: identification, correctness
Other issues in the linear model:
  Use of dummy variables
  Transformations: (log Yi, log Xi), nonlinear, probit, logit, etc.
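The identification problem can be seen numerically: if one regressor is an exact linear function a xi + b of another, the matrix of the normal equations (X'X) is singular, so the coefficients cannot be determined uniquely. The sketch below checks this with a determinant; the helper names `gram_matrix` and `det3` and the data are assumptions made for the illustration.

```python
def gram_matrix(columns):
    """Return X'X for a design matrix given as a list of columns."""
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in columns]
            for ci in columns]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Illustrative regressors (hypothetical, not from the slides)
x = [1, 2, 3, 4, 5]
ones = [1] * len(x)               # intercept column
z = [2 * xi + 1 for xi in x]      # exact linear function of x
w = [xi ** 2 for xi in x]         # not a linear function of x

collinear_det = det3(gram_matrix([ones, x, z]))
regular_det = det3(gram_matrix([ones, x, w]))
print("det with collinear regressor:", collinear_det)
print("det with non-collinear regressor:", regular_det)
```

In practice regressors are rarely exactly collinear; a determinant close to zero then shows up as unstable coefficients with large standard errors rather than as an outright failure.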
Section 2
Heteroscedasticity (p. 145): test?
Serial correlation? => time series
Regression diagnostics: outliers
Forecasting:
  At x = xp, what is Y?
  What is the average of the predicted Y?
  Point estimates, confidence intervals
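The two forecasting questions lead to the same point estimate but different intervals. A minimal sketch, assuming the standard simple-regression formulas: the interval for the average Y at xp uses Se * sqrt(1/n + (xp - x_bar)^2 / Sxx), while the interval for a single new Y adds 1 under the square root. The function name `forecast`, the data, and the hard-coded critical value are illustrative assumptions.

```python
import math


def forecast(xs, ys, xp, t_crit):
    """Return (point estimate, half-width for mean Y, half-width for a new Y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2))
    y_hat = a + b * xp  # point estimate at x = xp
    # Uncertainty grows as xp moves away from the mean of the observed x
    spread = 1 / n + (xp - x_bar) ** 2 / s_xx
    mean_half_width = t_crit * se * math.sqrt(spread)      # average of Y at xp
    pred_half_width = t_crit * se * math.sqrt(1 + spread)  # one new Y at xp
    return y_hat, mean_half_width, pred_half_width


# Illustrative observations (hypothetical, not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
T_CRIT = 3.182  # two-sided 5 % critical value of t with 3 df

y_hat, mean_hw, pred_hw = forecast(xs, ys, xp=3, t_crit=T_CRIT)
print(f"point estimate: {y_hat:.2f}")
print(f"95 % interval for the average Y: +/- {mean_hw:.3f}")
print(f"95 % interval for a single Y:    +/- {pred_hw:.3f}")
```

The interval for a single observation is always the wider of the two, because it includes the residual variation around the line as well as the uncertainty in the line itself.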
Linear Regression
Regression tools: SAS, SPSS, S-PLUS