
1 DISCRIMINANT ANALYSIS

2 Discriminant Analysis
• Discriminant analysis builds a predictive model for group membership. The model is composed of a discriminant function (or, for more than two groups, a set of discriminant functions) based on linear combinations of the predictor variables that provide the best discrimination between the groups.
• The functions are generated from a sample of cases for which group membership is known; the functions can then be applied to new cases that have measurements for the predictor variables but unknown group membership.
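A minimal sketch of this fit-then-classify workflow. It uses scikit-learn's LinearDiscriminantAnalysis rather than the SPSS procedure the slides describe, and the numbers below are made up for illustration:

```python
# Fit a discriminant function on cases with known group membership,
# then classify new cases whose group membership is unknown.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Training sample: two predictor measurements per case and known group codes (0/1)
X_train = np.array([[25, 3.1], [40, 2.2], [33, 4.0],
                    [51, 1.8], [29, 3.5], [45, 2.0]])
y_train = np.array([0, 1, 0, 1, 0, 1])

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

# New cases with measured predictors but unknown group membership
X_new = np.array([[38, 2.5], [27, 3.8]])
print(lda.predict(X_new))       # predicted group codes
print(lda.transform(X_new))     # discriminant scores for the new cases
```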

3 Discriminant Analysis
• The grouping (dependent) variable can have more than two values. The codes for the grouping variable must be integers, however, and you need to specify their minimum and maximum values. Cases with values outside these bounds are excluded from the analysis.

4 DA vs. Linear Regression
• Linear regression is limited to cases where the dependent variable (Y) is an interval variable, so that the combination of predictors (X) will, through the regression equation, produce estimated Y values for given weighted combinations of X values.
• Many interesting variables, however, are categorical: making a profit or not, holding a particular credit card, owning, renting or paying a mortgage on a house, being employed or unemployed, satisfied versus dissatisfied employees, which customers are likely to buy a product and which are not, and so on. This is where DA is used.

5 DA is used when
• The dependent variable is categorical and the predictor (independent) variables are at interval level, such as age, income, attitudes, perceptions, and years of education, although dummy variables can be used as predictors, as in multiple regression. (In logistic regression, by contrast, the IVs can be of any level of measurement.)
• There are more than two DV categories, unlike logistic regression, which is limited to a dichotomous dependent variable.

6 Discriminant analysis linear equation
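The slide presents the equation as an image; the usual form of the linear discriminant function (a reconstruction of the standard form, not copied from the slide) is

$$D = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p$$

where $D$ is the discriminant score, $b_0$ is a constant, the $b_i$ are the (unstandardized) discriminant coefficients or weights, and the $X_i$ are the predictor variables.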

7 Assumptions of discriminant analysis
• The observations are a random sample.
• Each predictor variable is normally distributed.
• Each of the allocations for the dependent categories in the initial classification is correctly classified.
• The within-group variance-covariance matrices should be equal across groups.
• The groups or categories should be defined before the data are collected.

8 Assumptions of discriminant analysis (continued)
• There must be at least two groups or categories, with each case belonging to only one group, so that the groups are mutually exclusive and collectively exhaustive (for instance, three groups corresponding to three available levels of housing-loan amounts).
• Group sizes of the dependent variable should not be grossly different and should be at least five times the number of independent variables.

9 Statistical analysis in DA
• The aim is to combine (weight) the variable scores in some way so that a single new composite variable, the discriminant score, is produced.
• It is hoped that each group will have a normal distribution of discriminant scores.
• The degree of overlap between the discriminant score distributions can then be used as a measure of the success of the technique.

10 Discriminant Distribution (figure)

11 Discriminant Distribution (figure)

12 Discriminant Distribution (figure)

13 Discriminant Analysis in SPSS
• The example below uses discriminant analysis to classify the employees of a company into two groups, smokers and non-smokers, on the basis of several independent variables.
• The dependent variable is 'smoke', with two categories: smokers and non-smokers.
• The predictor variables are age, days absent sick from work last year, self-concept score, anxiety score, and attitude to anti-smoking at work score.
• The aim of the analysis is to determine whether these variables discriminate between those who smoke and those who do not.
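A sketch of an analogous analysis outside SPSS, using pandas and scikit-learn; the file name and column names (smoke, age, absent, selfcon, anxiety, attitude) are hypothetical, not taken from the slides:

```python
# Load the (hypothetical) employee data and fit a linear discriminant
# function predicting smoking status from the five IVs.
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

df = pd.read_csv("smoking_survey.csv")                   # hypothetical data file
predictors = ["age", "absent", "selfcon", "anxiety", "attitude"]

X = df[predictors].values
y = df["smoke"].values                                   # 0 = non-smoker, 1 = smoker

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

print(lda.score(X, y))   # proportion of cases classified correctly
print(lda.coef_)         # discriminant weights (scaled differently from SPSS's b coefficients)
```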

14 SPSS OUTPUTS

15 Group Statistics (table)

16 Tests of Equality of Group Means
There is strong statistical evidence of significant differences between the means of the smoker and non-smoker groups for all IVs, with self-concept and anxiety producing very high F values, indicating that they are the strongest discriminators.
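The F tests behind this table are one-way ANOVAs of each predictor across the two groups; a sketch of reproducing them under the same hypothetical data as above:

```python
# One-way ANOVA of each IV across the smoker / non-smoker groups,
# mirroring SPSS's "Tests of Equality of Group Means" table.
import pandas as pd
from scipy import stats

df = pd.read_csv("smoking_survey.csv")                   # hypothetical data file
predictors = ["age", "absent", "selfcon", "anxiety", "attitude"]

for var in predictors:
    smokers = df.loc[df["smoke"] == 1, var]
    nonsmokers = df.loc[df["smoke"] == 0, var]
    f_stat, p_value = stats.f_oneway(smokers, nonsmokers)
    print(f"{var}: F = {f_stat:.2f}, p = {p_value:.4f}")
```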

17 Within-Group Intercorrelation Matrices
A good discriminator should have low intercorrelations with the other predictors. The Pooled Within-Groups Matrices table also supports the use of these IVs, as the intercorrelations are low.

18 Log determinants and Box's M tables
• A basic assumption of DA is that the variance-covariance matrices are equivalent across groups. Box's M tests the null hypothesis that the covariance matrices do not differ between the groups formed by the dependent variable. The test should not be significant, so that the null hypothesis that the groups do not differ can be retained.

19 Log determinants and Box's M tables (continued)
• For this assumption to hold, the log determinants of the group covariance matrices should be roughly equal.
• When testing with Box's M, we are looking for a non-significant M, indicating that the matrices are similar and do not differ significantly.

20 Log Determinants
In this case the log determinants appear similar. Where three or more groups exist and M is significant, groups with very small log determinants should be deleted from the analysis.
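A sketch of how the log determinants in this table can be computed directly, again using the hypothetical data from the earlier sketches:

```python
# Log determinants of each group's covariance matrix, plus the pooled
# within-groups matrix, as reported in SPSS's Log Determinants table.
import numpy as np
import pandas as pd

df = pd.read_csv("smoking_survey.csv")                   # hypothetical data file
predictors = ["age", "absent", "selfcon", "anxiety", "attitude"]

pooled_sum = 0.0
for group, subset in df.groupby("smoke"):
    data = subset[predictors].values
    cov = np.cov(data, rowvar=False)                     # within-group covariance matrix
    print(f"group {group}: log determinant = {np.linalg.slogdet(cov)[1]:.3f}")
    pooled_sum = pooled_sum + (len(data) - 1) * cov

pooled = pooled_sum / (len(df) - df["smoke"].nunique())  # pooled within-groups matrix
print(f"pooled within-groups: {np.linalg.slogdet(pooled)[1]:.3f}")
```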

21 BOX’s M Test Box’s M is 176.474 with F = 11.615 which is significant at p <.000. However, since the sample size is large, a significant result is not regarded as too important. Rahul Chandra

22 Eigenvalues
Function: this indicates the first or second canonical linear discriminant function. The number of functions is equal to the number of discriminating variables if there are more groups than variables; otherwise it is one less than the number of levels of the grouping variable. In this example the dependent variable has only two levels and there are five predictors, so only one (2 - 1 = 1) function is created. Each function is a projection of the data onto a dimension that best separates or discriminates between the groups.

23 Eigenvalue and Canonical Correlation
• Eigenvalues are related to the canonical correlations and describe how much discriminating ability a function possesses; the magnitude of the eigenvalue is indicative of the function's discriminating ability.
• The canonical correlation is the correlation between the predictor variables and the discriminant function.
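The standard relationship between a function's eigenvalue $\lambda$ and its canonical correlation $r_c$ (not shown on the slide, but implied by it) is

$$r_c = \sqrt{\frac{\lambda}{1 + \lambda}}$$

so a large eigenvalue corresponds to a canonical correlation close to 1.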

24 Wilks' Lambda
Wilks' lambda indicates the significance of the discriminant function. The table indicates a highly significant function (p < .001) and gives the proportion of total variability not explained by differences between the groups: here 35.6% is unexplained.

25 More on Wilks' Lambda
• Wilks' lambda tests the null hypothesis that the function, and all functions that follow, have no discriminating ability. This hypothesis is tested using a chi-square statistic.
• The test has to be significant; otherwise the discriminant function is questionable as a discriminator.
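For a single discriminant function the usual formulas (a reconstruction of the standard ones, not taken from the slides) are

$$\Lambda = \frac{1}{1 + \lambda}, \qquad \chi^2 = -\left[N - 1 - \frac{p + g}{2}\right]\ln\Lambda$$

where $N$ is the number of cases, $p$ the number of predictors, and $g$ the number of groups; the statistic is referred to a chi-square distribution with $p(g - 1)$ degrees of freedom. With 35.6% of variability unexplained, $\Lambda \approx 0.356$, which corresponds to an eigenvalue of roughly $\lambda \approx 1.81$.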

26 Discriminant Function Coefficients
These unstandardized coefficients (b) are used to construct the discriminant function (equation), which operates just like a regression equation.

27 Discriminant Function (figure)

28 Group Centroids
A further way of interpreting discriminant analysis results is to describe each group in terms of its profile, using the group means of the predictor variables. These group means are called centroids.
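A sketch of computing the centroids directly, both as profiles of predictor means and as the mean discriminant score per group (again using the hypothetical data and model from the earlier sketches):

```python
# Group centroids: per-group means of the predictors (the group profiles)
# and per-group means of the discriminant scores.
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

df = pd.read_csv("smoking_survey.csv")                   # hypothetical data file
predictors = ["age", "absent", "selfcon", "anxiety", "attitude"]

lda = LinearDiscriminantAnalysis().fit(df[predictors].values, df["smoke"].values)

print(df.groupby("smoke")[predictors].mean())            # profile of each group

df["score"] = lda.transform(df[predictors].values)[:, 0]
print(df.groupby("smoke")["score"].mean())               # centroid on the discriminant function
```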

29 Classification Table (table)
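The classification table itself is a cross-tabulation of actual versus predicted group membership together with the percentage of cases classified correctly; a sketch under the same hypothetical setup:

```python
# Classification table: cross-tabulate actual and predicted group
# membership and report the overall hit rate.
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix

df = pd.read_csv("smoking_survey.csv")                   # hypothetical data file
predictors = ["age", "absent", "selfcon", "anxiety", "attitude"]

lda = LinearDiscriminantAnalysis().fit(df[predictors].values, df["smoke"].values)
predicted = lda.predict(df[predictors].values)

print(confusion_matrix(df["smoke"], predicted))          # rows: actual, columns: predicted
print((predicted == df["smoke"].values).mean())          # proportion classified correctly
```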

