Published by Pearl Briggs. Modified over 9 years ago.
1
Linear Discriminant Analysis (LDA)
2
Goal
To classify observations into 2 or more groups using k classification (discriminant) functions, one per class; the dependent variable Y is categorical with k classes.
Assumptions
Multivariate normal distribution: the independent variables are normally distributed within each class/group.
Similar group covariances: the variances within each group, and the correlations between variables, should be similar across groups.
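Under the two assumptions above (normal classes sharing one covariance matrix), the textbook Gaussian LDA rule gives each class j a linear classification function f_j(x) = xᵀΣ⁻¹μ_j − ½ μ_jᵀΣ⁻¹μ_j + ln π_j, where μ_j is the class mean, Σ the pooled within-group covariance and π_j the class prior; an observation goes to the class with the largest f_j. The NumPy sketch below illustrates that rule under these assumptions. It is not the exact procedure of any particular statistics package, and `fit_lda`, `predict_lda` and the example data are hypothetical.

```python
import numpy as np


def fit_lda(X, y):
    """Fit linear classification functions f_j(x) = x @ coef[j] + const[j].

    Assumes normal classes sharing one covariance matrix (the slide's two
    assumptions); priors are estimated from the class frequencies.
    """
    classes = np.unique(y)
    n = X.shape[0]
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Pooled within-group covariance matrix.
    scatter = sum((X[y == c] - m).T @ (X[y == c] - m) for c, m in zip(classes, means))
    pooled = scatter / (n - len(classes))
    priors = np.array([np.mean(y == c) for c in classes])
    inv = np.linalg.inv(pooled)
    coef = means @ inv  # row j holds pooled^-1 @ mean_j
    const = -0.5 * np.sum(coef * means, axis=1) + np.log(priors)
    return classes, coef, const


def predict_lda(model, X):
    """Score every observation with all k functions and take the largest."""
    classes, coef, const = model
    scores = X @ coef.T + const  # column j holds f_j(x)
    return classes[np.argmax(scores, axis=1)]


# Hypothetical data: three well-separated 2-D groups labelled 1, 2, 3.
rng = np.random.default_rng(1)
centers = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
X = np.vstack([rng.normal(c, 1.0, size=(30, 2)) for c in centers])
y = np.repeat([1, 2, 3], 30)

model = fit_lda(X, y)
print("training accuracy:", np.mean(predict_lda(model, X) == y))
```

Estimating the priors from class frequencies is one common choice; equal priors would simply drop the ln π_j term's differences between classes.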
3
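Dependent Variable
Must be categorical with 2 or more classes (groups). If there are only 2 classes, discriminant analysis gives essentially the same result as multiple regression on a 0/1-coded dependent variable: the regression coefficients are proportional to the discriminant function coefficients.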
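The two-class link between discriminant analysis and regression can be checked numerically. This NumPy sketch, on hypothetical random data, regresses a 0/1-coded Y on three predictors and separately computes Fisher's discriminant direction S_W⁻¹(m₁ − m₀) from the within-group scatter matrix; the two coefficient vectors should differ only by a positive scale factor. `ols_slopes` and `fisher_direction` are illustrative names.

```python
import numpy as np


def ols_slopes(X, y):
    """OLS slopes of y on X (an intercept is fitted, then dropped)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]


def fisher_direction(X, y):
    """Fisher's discriminant direction S_W^-1 (m1 - m0) for a 0/1 coded y."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    s_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)  # within-group scatter
    return np.linalg.solve(s_w, m1 - m0)


# Hypothetical data: 12 observations per class, class 1 shifted by +1.
rng = np.random.default_rng(0)
X = rng.normal(size=(24, 3))
X[12:] += 1.0
y = np.repeat([0.0, 1.0], 12)

beta = ols_slopes(X, y)
w = fisher_direction(X, y)
scale = (beta @ w) / (w @ w)  # least-squares fit of beta = scale * w
print("regression slopes:  ", beta)
print("scale * Fisher dir.:", scale * w)
```

Because the scale factor is positive, ranking observations by the regression's predicted Y or by the discriminant score gives the same order; only the cutoff used to split the two classes can differ.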
4
Independent Variables
May be continuous or categorical. Categorical variables are converted into binary (dummy) variables, as in multiple linear regression.
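Dummy coding turns one categorical column into several 0/1 columns, one per category except a reference category that is dropped to avoid perfect collinearity with the intercept. The plain-Python sketch below mirrors the Age (18-30) and Age (50+) dummies used later in this deck; the middle age group's label "31-49" and the helper `dummy_code` are hypothetical.

```python
def dummy_code(values, levels, reference):
    """Return (column names, rows of 0/1 dummies) for one categorical variable.

    The reference level gets no column: it is the case where all dummies are 0.
    """
    kept = [lvl for lvl in levels if lvl != reference]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows


# Hypothetical age groups; "31-49" is assumed to be the reference category.
ages = ["18-30", "50+", "31-49", "18-30"]
names, rows = dummy_code(ages, ["18-30", "31-49", "50+"], reference="31-49")
print(names)
print(rows)
```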
5
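Output Example
Assume the dependent variable has 3 classes (Y = 1, 2, 3). f1, f2 and f3 are the values of the three classification functions, and Pred. Y is the class whose function value is largest.
Y | x11 | x12 | x13 | x14 | f1 | f2 | f3 | Pred. Y
1 | 20 | 25 | 10 | 12 | 85 | 78 | 58 | 1
1 | 18 | 16 | 14 | 12 | 80 | 68 | 65 | 1
…..
2 | 15 | | 16 | 17 | 75 | 84 | 70 | 2
2 | 14 | 16 | 17 | 18 | 70 | 88 | 67 | 2
…..
3 | 8 | 9 | 9 | 11 | 95 | 86 | 105 | 3
3 | 10 | 8 | 8 | | 96 | 84 | 100 | 3
…..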
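The assignment step in this example is an argmax over the function values: each row's Pred. Y is the position (1, 2 or 3) of its largest f. The sketch uses the f1, f2, f3 values from the example rows; `predicted_class` is a hypothetical helper.

```python
# (actual Y, f1, f2, f3) from the example rows.
ROWS = [
    (1, 85, 78, 58),
    (1, 80, 68, 65),
    (2, 75, 84, 70),
    (2, 70, 88, 67),
    (3, 95, 86, 105),
    (3, 96, 84, 100),
]


def predicted_class(scores):
    """Classes are numbered 1..k; return the one with the largest score."""
    return 1 + max(range(len(scores)), key=lambda j: scores[j])


for y, *f in ROWS:
    print("Y =", y, "f =", f, "Pred. Y =", predicted_class(f))
```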
6
Binary Dependent: Regression
If the dependent variable has only 2 classes, multiple regression can be used. Sample data:
Y (Status) | X1 (Age 18-30) | X2 (Age 50+) | X3 (Income)
0 | 1 | 0 | 30
0 | 1 | 0 | 32
…..
0 | 0 | 0 | 50
0 | 0 | 0 | 28
0 | 0 | 0 | 75
…..
1 | 0 | 1 | 100
1 | 0 | 1 | 90
1 | 0 | 1 | 95
7
Regression Output
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.833615561
R Square | 0.694914903
Adjusted R Square | 0.649152139
Standard Error | 0.301479577
Observations | 24
ANOVA
 | df | SS | MS | F | Significance F
Regression | 3 | 4.140534632 | 1.380178211 | 15.18516005 | 2.19698E-05
Residual | 20 | 1.817798702 | 0.090889935 | |
Total | 23 | 5.958333333 | | |
 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | -0.337942024 | 0.22002876 | -1.535899327 | 0.14023269 | -0.796913973 | 0.121029925
X1 | -0.160950017 | 0.155728156 | -1.033531901 | 0.313691534 | -0.485793257 | 0.163893223
X2 | 0.426373823 | 0.153140052 | 2.784208421 | 0.011449703 | 0.106929273 | 0.745818373
Income | 0.013571735 | 0.003078379 | 4.408727859 | 0.00027065 | 0.007150349 | 0.019993121
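This output can be reproduced with ordinary least squares in NumPy. The sketch assumes that the 24 rows listed on the Classification slide are the full sample behind this output (the observation count matches); it regresses Y on an intercept, X1, X2 and Income, then recomputes R Square as 1 − SS(residual)/SS(total).

```python
import numpy as np

# (Y, X1, X2, X3) for the 24 observations on the Classification slide:
# Y = Status, X1 = Age (18-30), X2 = Age (50+), X3 = Income.
DATA = np.array([
    [0, 1, 0, 30], [0, 1, 0, 32], [0, 1, 0, 40], [0, 1, 0, 38],
    [0, 1, 0, 55], [0, 1, 0, 56], [0, 0, 0, 45], [0, 0, 0, 40],
    [0, 0, 0, 65], [0, 0, 0, 50], [0, 0, 0, 28], [1, 0, 0, 75],
    [1, 0, 0, 50], [1, 1, 0, 80], [1, 0, 0, 100], [1, 0, 0, 90],
    [1, 0, 0, 95], [1, 0, 1, 75], [1, 0, 1, 50], [1, 0, 1, 85],
    [1, 0, 1, 40], [1, 0, 1, 88], [1, 0, 0, 78], [1, 0, 1, 65],
], dtype=float)

y = DATA[:, 0]
A = np.column_stack([np.ones(len(y)), DATA[:, 1:]])  # intercept + X1, X2, X3
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ coef
r_squared = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

print("Intercept, X1, X2, Income:", np.round(coef, 6))
print("R Square:", round(r_squared, 4))
```

The coefficients and R Square agree with the SUMMARY OUTPUT up to rounding, and `fitted` reproduces the Predicted Y column of the classification table.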
8
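Classification
Y (Status) | X1 (Age 18-30) | X2 (Age 50+) | X3 (Income) | Predicted Y | Class
0 | 1 | 0 | 30 | -0.0917 | 0
0 | 1 | 0 | 32 | -0.0646 | 0
0 | 1 | 0 | 40 | 0.0440 | 0
0 | 1 | 0 | 38 | 0.0168 | 0
0 | 1 | 0 | 55 | 0.2476 | 0
0 | 1 | 0 | 56 | 0.2611 | 0
0 | 0 | 0 | 45 | 0.2728 | 0
0 | 0 | 0 | 40 | 0.2049 | 0
0 | 0 | 0 | 65 | 0.5442 | 1
0 | 0 | 0 | 50 | 0.3406 | 0
0 | 0 | 0 | 28 | 0.0421 | 0
1 | 0 | 0 | 75 | 0.6799 | 1
1 | 0 | 0 | 50 | 0.3406 | 0
1 | 1 | 0 | 80 | 0.5868 | 1
1 | 0 | 0 | 100 | 1.0192 | 1
1 | 0 | 0 | 90 | 0.8835 | 1
1 | 0 | 0 | 95 | 0.9514 | 1
1 | 0 | 1 | 75 | 1.1063 | 1
1 | 0 | 1 | 50 | 0.7670 | 1
1 | 0 | 1 | 85 | 1.2420 | 1
1 | 0 | 1 | 40 | 0.6313 | 1
1 | 0 | 1 | 88 | 1.2827 | 1
1 | 0 | 0 | 78 | 0.7207 | 1
1 | 0 | 1 | 65 | 0.9706 | 1
Classification rule in this case: if Pred. Y > 0.5 then Class = 1; otherwise Class = 0.
This model yielded 2 misclassifications out of 24. How good is R-square?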
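The 0.5 cutoff rule can be applied directly to the Predicted Y column to re-count the misclassifications. The pairs below are transcribed from the table; `classify` is a hypothetical helper.

```python
# (actual Y, Predicted Y) for the 24 rows of the classification table.
PAIRS = [
    (0, -0.0917), (0, -0.0646), (0, 0.0440), (0, 0.0168), (0, 0.2476), (0, 0.2611),
    (0, 0.2728), (0, 0.2049), (0, 0.5442), (0, 0.3406), (0, 0.0421), (1, 0.6799),
    (1, 0.3406), (1, 0.5868), (1, 1.0192), (1, 0.8835), (1, 0.9514), (1, 1.1063),
    (1, 0.7670), (1, 1.2420), (1, 0.6313), (1, 1.2827), (1, 0.7207), (1, 0.9706),
]


def classify(pred_y, cutoff=0.5):
    """Slide's rule: Class = 1 if Pred. Y > cutoff, else 0."""
    return 1 if pred_y > cutoff else 0


errors = [(y, p) for y, p in PAIRS if classify(p) != y]
print(len(errors), "misclassifications out of", len(PAIRS))
print("misclassified (Y, Pred. Y):", errors)
```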
9
Crosstab of Pred. Y and Y
For large datasets, one can group (format) the Predicted Y values into score bands and crosstab them against Y to see how accurately the model classifies the data (fictitious results shown here). The Good and Bad columns give the number of observations in each band by actual Y class.
Predicted Y × 1000 | Good | Bad
900 to 1000 | 410 | 50
850 to 900 | 390 | 70
800 to 850 | 370 | 90
750 to 800 | 350 | 110
700 to 750 | 330 | 130
650 to 700 | 310 | 150
600 to 650 | 290 | 170
550 to 600 | 270 | 190
500 to 550 | 250 | 210
450 to 500 | 230 | 230
400 to 450 | 210 | 250
350 to 400 | 190 | 270
300 to 350 | 170 | 290
250 to 300 | 150 | 310
200 to 250 | 130 | 330
150 to 200 | 110 | 350
100 to 150 | 90 | 370
50 to 100 | 70 | 390
0 to 50 | 50 | 410
Total | 4370 | 4370
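A crosstab like this can be built by converting Predicted Y × 1000 into 50-point bands and counting actual Y in each band. The sketch applies that idea to the 24 predictions from the Classification slide rather than to a large dataset. It assumes negative scores fall in the 0 to 50 band and that scores of 900 or more (including those above 1000) fall in the single 900 to 1000 band, mirroring the table's layout; `score_band` and `crosstab` are hypothetical helpers.

```python
from collections import Counter

# (actual Y, Predicted Y) for the 24 rows of the classification table.
PAIRS = [
    (0, -0.0917), (0, -0.0646), (0, 0.0440), (0, 0.0168), (0, 0.2476), (0, 0.2611),
    (0, 0.2728), (0, 0.2049), (0, 0.5442), (0, 0.3406), (0, 0.0421), (1, 0.6799),
    (1, 0.3406), (1, 0.5868), (1, 1.0192), (1, 0.8835), (1, 0.9514), (1, 1.1063),
    (1, 0.7670), (1, 1.2420), (1, 0.6313), (1, 1.2827), (1, 0.7207), (1, 0.9706),
]


def score_band(pred_y, width=50, top=900):
    """Lower edge of the band for Predicted Y * 1000.

    Assumption: scores below 0 go to the lowest band and scores of `top` or
    more go to the single top band (900 to 1000 on the slide).
    """
    score = max(0.0, pred_y * 1000)
    return min(int(score // width) * width, top)


def crosstab(pairs):
    """Count observations per (band, actual Y) cell."""
    return Counter((score_band(p), y) for y, p in pairs)


table = crosstab(PAIRS)
for band in sorted({b for b, _ in table}, reverse=True):
    upper = 1000 if band == 900 else band + 50
    print(f"{band} to {upper}: Y=1 {table[(band, 1)]}, Y=0 {table[(band, 0)]}")
```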
10
Kolmogorov-Smirnov Test
Use the crosstab from the previous slide to conduct the KS test and determine:
1. the cutoff score,
2. the classification accuracy, and
3. forecasts of model performance.
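In this setting the KS statistic is the largest gap between the cumulative share of Goods and the cumulative share of Bads when the bands are accumulated from the highest score down, and the band where the gap peaks serves as the cutoff score. The sketch below applies this to the crosstab counts, assuming observations in bands at or above the cutoff are classified Good, and reports the accuracy at that cutoff. It assumes the 450 to 500 Bad count is 230 and the 0 to 50 Good count is 50, as the table's regular pattern and its total of 4370 imply; `ks_cutoff` is a hypothetical helper, and exact fractions are used so that tied gaps compare exactly.

```python
from fractions import Fraction

# Lower edge of each band (Predicted Y * 1000), highest first, with counts
# from the crosstab on the previous slide (fictitious data).
BANDS = [900, 850, 800, 750, 700, 650, 600, 550, 500, 450,
         400, 350, 300, 250, 200, 150, 100, 50, 0]
GOOD = [410, 390, 370, 350, 330, 310, 290, 270, 250, 230,
        210, 190, 170, 150, 130, 110, 90, 70, 50]
BAD = [50, 70, 90, 110, 130, 150, 170, 190, 210, 230,
       250, 270, 290, 310, 330, 350, 370, 390, 410]


def ks_cutoff(bands, good, bad):
    """Return (KS statistic, cutoff band, accuracy at that cutoff)."""
    total_good, total_bad = sum(good), sum(bad)
    cum_good = cum_bad = 0
    best = None  # (gap, band, goods at/above band, bads at/above band)
    for band, g, b in zip(bands, good, bad):
        cum_good += g
        cum_bad += b
        gap = abs(Fraction(cum_good, total_good) - Fraction(cum_bad, total_bad))
        if best is None or gap > best[0]:  # strict '>' keeps the higher band on ties
            best = (gap, band, cum_good, cum_bad)
    ks, cutoff, good_above, bad_above = best
    # Correct = Goods at/above the cutoff + Bads below it.
    correct = good_above + (total_bad - bad_above)
    return ks, cutoff, Fraction(correct, total_good + total_bad)


ks, cutoff, accuracy = ks_cutoff(BANDS, GOOD, BAD)
print(f"KS = {float(ks):.4f}, cutoff = {cutoff}, accuracy = {float(accuracy):.4f}")
```

For this symmetric table the peak gap is 1800/4370 (about 0.41), reached at the 500 to 550 band and tied by the 450 to 500 band; at the 500 cutoff, 6170 of the 8740 observations (about 70.6%) are classified correctly.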