Published by Vivien Cooper. Modified over 9 years ago.
1
CHAPTER 26: Discriminant Analysis
Tables, Figures, and Equations
From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon. http://www.pcord.com
5
Purposes:
1. Summarizing the differences between groups (often used as a follow-up to clustering, to help describe the groups); "descriptive discriminant analysis." With community data, you could use indicator species analysis as a nonparametric alternative.
2. Multivariate testing of whether or not two or more groups differ significantly from each other. For ecological community data this is better done with MRPP, thus avoiding the assumptions listed below.
3. Determining the dimensionality of group differences.
4. Checking for misclassified items.
7
Purposes (cont.):
5. Predicting group membership or classifying new cases ("predictive discriminant analysis").
6. Comparing occupied vs. unoccupied habitat to determine the habitat characteristics that allow or prevent a species' existence. DA has been widely used for this purpose in wildlife studies and rare plant studies.
8
Assumptions
1. Homogeneous within-group variances.
2. Multivariate normality within groups.
3. Linearity among all pairs of variables.
4. Prior probabilities.
11
How it works
The "direct" procedure is described below.
1. Calculate the variance/covariance matrix for each group.
2. Calculate the pooled variance/covariance matrix (S_p) from the above matrices.
3. Calculate the between-group variance (S_g) for each variable.
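Steps 1 to 3 can be sketched in NumPy as follows. This is a minimal illustration, not the book's code: the function name `pooled_and_between`, the toy data, and the scaling choices are assumptions. Here S_p is the pooled covariance (weighted by within-group degrees of freedom) and S_g is the between-group sums-of-squares-and-cross-products matrix. Other scalings are in use and change the eigenvalues but not the directions of the discriminant axes.

```python
import numpy as np

def pooled_and_between(A, groups):
    """Return (S_p, S_g) for an n x p data matrix A and n group labels.

    Scaling is an assumption of this sketch: S_p is a pooled covariance,
    S_g is a between-group SSCP matrix.
    """
    A = np.asarray(A, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, p = A.shape
    grand_mean = A.mean(axis=0)
    S_p = np.zeros((p, p))
    S_g = np.zeros((p, p))
    for k in labels:
        A_k = A[groups == k]
        n_k = A_k.shape[0]
        # Step 1: variance/covariance matrix for group k (rows are sample units)
        S_k = np.cov(A_k, rowvar=False)
        # Step 2: accumulate, weighting by within-group degrees of freedom
        S_p += (n_k - 1) * S_k
        # Step 3: between-group contribution from the group-mean deviation
        d = (A_k.mean(axis=0) - grand_mean).reshape(-1, 1)
        S_g += n_k * (d @ d.T)
    S_p /= (n - len(labels))
    return S_p, S_g

# Toy data: two groups of three sample units, two variables (hypothetical values)
A = [[1, 2], [2, 3], [3, 5], [6, 7], [7, 9], [8, 9]]
groups = [0, 0, 0, 1, 1, 1]
S_p, S_g = pooled_and_between(A, groups)
print(S_p)
print(S_g)
```

Each group needs at least two sample units and the data at least two variables for `np.cov` to return a usable matrix here.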
12
4. Maximize the F-ratio:
$F = \frac{y' S_g y}{y' S_p y}$
where y is the eigenvector associated with a particular discriminant function. We seek y to maximize F.
13
5. Maximize this ratio by finding the partial derivatives with a characteristic equation:
$|S_p^{-1} S_g - \lambda I| = 0$
The number of roots is g-1, where g is the number of groups. In other words, the number of functions (axes) derived is one less than the number of groups. The eigenvalues (λ) express the percent of variance among groups explained by those axes.
14
6. Solve for each eigenvector y (also known as the "canonical variates" or "discriminant functions").
15
7. Locate points (sample units) on each axis.
$X = AY$
X = scores (coordinates) for n rows (sample units) on m dimensions, where m = g-1.
A = original data matrix of n rows by p columns.
Y = matrix of m eigenvectors with loadings for p variables. Each eigenvector is known as a discriminant function.
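The eigenanalysis and scoring steps might look like this in NumPy. It is a sketch under stated assumptions: the function name `discriminant_axes` and the input matrices are hypothetical, the axes are taken as eigenvectors of the product of the inverse of S_p and S_g, and scores are computed as the product of the data matrix A and the eigenvector matrix Y, which is what the dimensions listed above imply. Eigenvector sign and length are arbitrary, so software packages may report rescaled or sign-flipped versions of the same axes.

```python
import numpy as np

def discriminant_axes(S_p, S_g, n_groups):
    """Return (eigenvalues, Y) for the first g-1 discriminant axes."""
    # Solve S_p M = S_g rather than forming the inverse explicitly
    M = np.linalg.solve(S_p, S_g)
    eigvals, eigvecs = np.linalg.eig(M)
    # The roots are real in theory; drop any tiny imaginary parts
    eigvals, eigvecs = eigvals.real, eigvecs.real
    # Sort from largest to smallest eigenvalue and keep m = g - 1 axes
    order = np.argsort(eigvals)[::-1]
    keep = order[: n_groups - 1]
    return eigvals[keep], eigvecs[:, keep]

# Hypothetical inputs: pooled within-group and between-group matrices
S_p = np.array([[1.0, 1.25], [1.25, 11.0 / 6.0]])
S_g = np.array([[37.5, 37.5], [37.5, 37.5]])
A = np.array([[1, 2], [2, 3], [3, 5], [6, 7], [7, 9], [8, 9]], dtype=float)

eigvals, Y = discriminant_axes(S_p, S_g, n_groups=2)
X = A @ Y  # scores: n sample units by m axes
print(eigvals)
print(X)
```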
16
These unstandardized discriminant functions Y can be used as (linear) prediction equations, assigning scores to unclassified items. Standardized discriminant function coefficients are scaled to unit variance. The absolute values of these coefficients indicate the relative importance of the individual variables in contributing to the discriminant function.
17
8. Classification phase.
a. Derive a classification equation for each group, one term in the equation for each variable, plus a constant.
b. Insert data values for a given SU to calculate a classification score for each group for that SU.
c. The SU is assigned to the group in which it had the highest score.
18
8. Classification phase, cont.
The coefficients in the equation are derived from:
the p × p within-group variance-covariance matrix (S_p), and
the p × 1 vector of the means for each variable in group k, M_k.
First, calculate W by dividing each term of S_p by the within-group degrees of freedom. Then:
$C_k = W^{-1} M_k$
The constant is derived as:
$c_{k0} = -\frac{1}{2} C_k' M_k$
The constant and the coefficients in C_k define a linear equation of the usual form, one equation for each group k.
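The classification phase can be sketched as below. The sketch assumes the standard linear classification functions, with coefficients equal to the inverse of W times M_k and a constant of -0.5 times the product of those coefficients with M_k. The function names, the toy W, and the group means are hypothetical. The optional `priors` argument adds the natural log of each group's prior probability to the constant, which is a common convention and an assumption here rather than something the slides spell out.

```python
import numpy as np

def classification_functions(W, means, priors=None):
    """Build (constant, coefficients) per group.

    W: p x p within-group covariance matrix.
    means: dict mapping group label -> length-p mean vector M_k.
    priors: optional dict of prior probabilities (assumed convention).
    """
    funcs = {}
    for k, M_k in means.items():
        M_k = np.asarray(M_k, dtype=float)
        C_k = np.linalg.solve(W, M_k)   # coefficients: W^{-1} M_k
        c_k0 = -0.5 * (C_k @ M_k)       # constant term
        if priors is not None:
            c_k0 += np.log(priors[k])   # adjust for unequal priors
        funcs[k] = (c_k0, C_k)
    return funcs

def classify(x, funcs):
    """Return (best group, all scores) for one sample unit x."""
    x = np.asarray(x, dtype=float)
    scores = {k: c0 + C @ x for k, (c0, C) in funcs.items()}
    return max(scores, key=scores.get), scores

# Hypothetical within-group covariance and group means
W = np.array([[1.0, 1.25], [1.25, 11.0 / 6.0]])
means = {"a": [2.0, 10.0 / 3.0], "b": [7.0, 25.0 / 3.0]}
funcs = classification_functions(W, means)

label_low, scores_low = classify([2.0, 3.0], funcs)
label_high, scores_high = classify([7.0, 8.0], funcs)
print(label_low, label_high)
```

Each sample unit goes to the group with the highest score, so only differences between group scores matter.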
19
Summary statistics
Wilks' lambda (Λ). Wilks' Λ is the error sum of squares divided by the sum of the effect sum of squares and the error sum of squares. Thus, it is the proportion of variance among the objects not explained by the discriminant functions. It ranges from zero (perfect separation of groups) to one (no separation of groups). Statistical significance of Λ is tested with a chi-square approximation.
Chi-square (derived from Wilks' lambda).
Variance explained.
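As a rough illustration of how these statistics relate, the sketch below computes Wilks' lambda from the eigenvalues of the discriminant functions and then a chi-square statistic. Two assumptions are made that the slides do not state. First, the eigenvalues come from the error and effect sums-of-squares matrices, so that lambda is the product of 1 / (1 + eigenvalue). Second, the chi-square approximation is Bartlett's, which is one common choice. The function names and input values are hypothetical.

```python
import math

def wilks_lambda(eigenvalues):
    """Wilks' lambda as the product of 1 / (1 + eigenvalue)."""
    lam = 1.0
    for ev in eigenvalues:
        lam *= 1.0 / (1.0 + ev)
    return lam

def bartlett_chi_square(lam, n, p, g):
    """Bartlett's chi-square approximation for Wilks' lambda.

    n = number of sample units, p = number of variables, g = number of groups.
    Returns (chi-square statistic, degrees of freedom).
    """
    chi2 = -(n - 1 - (p + g) / 2.0) * math.log(lam)
    df = p * (g - 1)
    return chi2, df

# Hypothetical example: one discriminant function with eigenvalue 1.0
lam = wilks_lambda([1.0])
chi2, df = bartlett_chi_square(lam, n=20, p=2, g=2)
print(lam, chi2, df)
```

Larger eigenvalues push lambda toward zero (strong separation), while eigenvalues near zero leave it near one.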
20
Figure 26.1. Comparison of DA and PCA. Groups are tighter in DA than in PCA because DA maximizes group separation while PCA maximizes the representation of variance among individual points. Groups were superimposed on an ordination of pine species in ecological trait space (after McCune 1988). Pinus resinosa was not assigned to a group, so it does not appear in the DA ordination.
21
Table 26.1. Predictions of goshawk nesting sites from DA compared to actual results, in one case using equal prior probabilities, in the other case using prior probabilities based on the occupancy rate of landscape cells. The first value of 0.83 means that 83% of the sites that were predicted by DA to be nesting sites actually were nesting sites.
22
[Table 26.1 body not recovered. Prior probabilities compared: equal (0.5, 0.5) and unequal (0.93, 0.07).]
23
EQUAL priors:
No. non-nests predicted nests (false positives) = p(predicted nest but not nest) × number of non-nests = 0.17 × 93 = 15.8
No. nests predicted non-nests (false negatives) = p(predicted not nest but nest) × number of nests = 0.17 × 7 = 1.2
Total number of errors = 15.8 + 1.2 = 17
24
UNEQUAL priors:
No. non-nests predicted nests (false positives) = p(predicted nest but not nest) × number of non-nests = 0.02 × 93 = 1.9
No. nests predicted non-nests (false negatives) = p(predicted not nest but nest) × number of nests = 0.52 × 7 = 3.6
Total number of errors = 1.9 + 3.6 = 5.5
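The error counts for the two sets of priors can be checked with a few lines of Python. The function name `expected_errors` is hypothetical. The rates and the counts of 93 non-nests and 7 nests are the ones used in the two calculations above.

```python
def expected_errors(p_false_pos, p_false_neg, n_non_nests, n_nests):
    """Return (false positives, false negatives, total errors)."""
    false_pos = p_false_pos * n_non_nests  # non-nests predicted to be nests
    false_neg = p_false_neg * n_nests      # nests predicted to be non-nests
    return false_pos, false_neg, false_pos + false_neg

equal = expected_errors(0.17, 0.17, n_non_nests=93, n_nests=7)
unequal = expected_errors(0.02, 0.52, n_non_nests=93, n_nests=7)

for name, (fp, fn, total) in (("equal", equal), ("unequal", unequal)):
    print(f"{name} priors: {fp:.1f} false positives, "
          f"{fn:.1f} false negatives, {total:.1f} total errors")
```

With unequal priors the false negative rate rises sharply, but because non-nests far outnumber nests the total number of errors still drops.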