
1 Design of Hierarchical Classifiers for Efficient and Accurate Pattern Classification
M N S S K Pavan Kumar Advisor : Dr. C. V. Jawahar

2 Pattern Classification
Given a sample x, find the label corresponding to it.
A classifier is an algorithm that takes x and returns a label between 1 and N.
Binary classification: N = 2. Multiclass classification: N > 2.
Evaluation is usually done as the probability of correct classification.

3 Multiclass Classification
Many standard approaches:
Neural Networks, Decision Trees
Direct extensions
Combinations of component classifiers

4 Decision Directed Acyclic Graph
[Figure: a 5-class DDAG of pairwise nodes (1,5), (2,5), (1,4), ..., (1,2) with leaves 1 to 5; a sample x from class 3 is routed through it.]

5 Decision Directed Acyclic Graph
[Figure: the same 5-class DDAG; a sample x from class 5 is routed through it.]

6 Decision Directed Acyclic Graph
[Figure: the same 5-class DDAG; a sample x from class 4 is routed through it.]

7 Decision Directed Acyclic Graph
[Figure: the same 5-class DDAG.]
There are multiple paths.

8 Decision Directed Acyclic Graph
[Figure: the same 5-class DDAG.]
A DDAG can be improved by improving individual nodes.

9 Decision Directed Acyclic Graph
[Figure: the same 5-class DDAG.]
A DDAG can be improved by improving individual nodes.
The architecture is fixed for a given sequence of classes.

10 Decision Directed Acyclic Graph
[Figure: the 5-class DDAG with its class order changed; the root is now (3,5) and the leaves appear in a new order.]
A DDAG can be improved by improving individual nodes.
A DDAG can also be improved by changing the class order.
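To make the routing concrete, here is a minimal sketch of DDAG evaluation (illustrative, not the thesis code): `ddag_classify` and the toy `rule` are hypothetical names, and any trained pairwise classifier can stand in for `node`.

```python
# Minimal DDAG evaluation sketch: each node (i, j) eliminates one class,
# so classification takes exactly len(classes) - 1 binary decisions.
def ddag_classify(x, classes, node):
    remaining = list(classes)
    while len(remaining) > 1:
        i, j = remaining[0], remaining[-1]
        if node(i, j, x) == i:
            remaining.pop()       # j is eliminated
        else:
            remaining.pop(0)      # i is eliminated
    return remaining[0]

# Toy pairwise rule standing in for trained binary classifiers:
rule = lambda i, j, x: i if abs(x - i) <= abs(x - j) else j
print(ddag_classify(3.2, [1, 2, 3, 4, 5], rule))   # -> 3
```

Note that the class order passed in fixes the architecture, which is exactly the degree of freedom the reordering slides exploit.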

11 Features at Each Node: Images as Features
Computer vision problems have a large number of features.
Principal Component Analysis (PCA): project the data onto the axes that preserve maximum variance.
PCA is good for representation, but not for discrimination.
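As a concrete illustration, a minimal PCA sketch with scikit-learn; the data and dimensions here are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1000, 400)      # e.g. 1000 images flattened to 400 features
pca = PCA(n_components=50)         # keep the 50 highest-variance axes
X_reduced = pca.fit_transform(X)   # project the data onto those axes
print(X_reduced.shape)             # (1000, 50)
```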

12 Features at Each Node
Pairwise Linear Discriminant Analysis (LDA) is more effective: Fisher Linear Discriminant, Optimal Discriminant Vectors.
However, it requires a large number of feature extractions and a large number of matrices to be stored.
LDA performs better, but is computationally expensive.
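A sketch of why pairwise LDA is expensive, assuming a feature matrix `X` and integer labels `y` (illustrative, not the thesis implementation): every one of the N(N-1)/2 class pairs gets its own discriminant matrix, all of which must be stored and applied.

```python
from itertools import combinations
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def pairwise_lda_matrices(X, y):
    matrices = {}
    for i, j in combinations(np.unique(y), 2):   # N(N-1)/2 pairs
        mask = (y == i) | (y == j)
        lda = LinearDiscriminantAnalysis().fit(X[mask], y[mask])
        matrices[(i, j)] = lda.scalings_         # projection for node (i, j)
    return matrices
```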

13 Solution
[Figure: a 4-class DDAG with pairwise nodes (1,4), (2,4), (1,3), (3,4), (2,3), (1,2) and leaves 1 to 4.]

14 Solution
[Figure: the same DDAG; node (1,4) is given its own dimensionality-reduction matrix M14.]

15 Solution
[Figure: the same DDAG; node (2,3) is also given its own matrix M23.]

16 Solution
[Figure: the same DDAG; nodes (3,4) and (1,2) get matrices M34 and M12.]

17 Solution
[Figure: the 4-class DDAG with all six matrices M12, M13, M14, M23, M24, M34 attached to their nodes.]
4 classes, 6 classifiers, 6 dimensionality reductions.
Total number of features extracted: (N − 1) × reduced_dimension.

18 Solution
[Figure: the same DDAG with its six per-node matrices.]
Example: 400 classes, with 400 features reduced to 50 per node. This results in 3,990,000 projection vectors overall (79,800 pairwise nodes × 50), and 19,950 feature extractions for a single DDAG evaluation (399 nodes × 50).
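The counts can be checked with a few lines of arithmetic (d = 50 is the reduced dimension from the slide):

```python
N, d = 400, 50
pairs = N * (N - 1) // 2       # 79800 pairwise nodes, each with its own matrix
total = pairs * d              # 3990000 projection vectors stored overall
per_eval = (N - 1) * d         # 19950 features extracted in one DDAG pass
print(pairs, total, per_eval)  # 79800 3990000 19950
```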

19 Solution
[Figure: the 4-class DDAG with all six per-node LDA matrices.]
LDA is effective, but highly complex in space and time.

20 Solution
[Figure: the 4-class DDAG shown with its six matrices M12, ..., M34 pulled out as separate transformations.]

21 Solution
Stack all the transformations into a single matrix:
M = [M12; M13; M14; M23; M24; M34], the per-node matrices stacked row-wise.

22 Solution
[Figure: the stacked matrix M alongside the DDAG.]
This matrix is rank deficient.

23 Solution
This matrix is rank deficient: use a reduced representation.

24 Solution
This matrix is rank deficient and has many similar rows; clustering, SVD, etc. may be used (a sketch follows).
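A minimal sketch of the SVD route, with made-up shapes: stack the per-node matrices, truncate to the effective rank, and keep one shared basis plus small per-node coefficient matrices.

```python
import numpy as np

Ms = [np.random.rand(5, 400) for _ in range(6)]  # six per-node projections
M = np.vstack(Ms)                                # stacked matrix, (30, 400)

U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = int((s > 1e-3 * s[0]).sum())                 # effective rank of M
B = Vt[:k]                                       # shared basis, (k, 400)
coeffs = M @ B.T                                 # per-row coordinates in B

# At run time, extract features once as B @ x; each node then applies only
# its small slice of `coeffs` to the shared k-dimensional representation.
```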

25 Remarks
Only one-time feature extraction.
Results in a reduced LDA matrix, retaining the discriminant capacity.

Accuracy (%) of Compressed LDA vs. a boosted classifier:

Dataset     Compressed LDA   Boosted
Pendigits   95.99            97.24
Optdigits   98.34            98.92

Features retained vs. accuracy (%) by method:

Dataset     Method           Feat   Acc
Pendigits   Direct           16     95.77
Pendigits   PCA              13     95.46
Pendigits   LDA              225    95.99
Pendigits   Compressed LDA   11     95.99
Optdigits   Direct           64     97.90
Optdigits   PCA              41     98.63
Optdigits   Compressed LDA   36     98.34

26 Motivating Example
[Figure: two 4-class DDAGs, before and after reordering the class list.]
Priors: {0.3, 0.1, 0.2, 0.4}; all node classifiers are 90% correct.
A class at list position i traverses max(N − i, i − 1) nodes, so its contribution to the accuracy is its prior times a power of 0.9, e.g. a term 0.4 · (0.9)³ when class 4 sits at an end of the list.
Reordering the class list gives a 43.8% reduction in error!

27 Formulation
[Figure: a 4-class DDAG.]
Number of classes = N; priors = P_i; error at each node = q.
Relevant path length for list position i: l_i = max(N − i, i − 1).
Number of relevant paths of length l to node r: N_rl.
Prefer central positions in the list for high-prior classes.
Optimal order: maximize Σ_i P_i (1 − q)^{l_i}.
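Under these assumptions (independent node errors at rate q, and the relevant path length above), the objective can be evaluated directly; `expected_accuracy` is a hypothetical helper written for illustration.

```python
def expected_accuracy(priors_in_list_order, q=0.1):
    # Sum of P_i * (1 - q)^{l_i}, with l_i = max(N - i, i - 1) for position i.
    N = len(priors_in_list_order)
    return sum(p * (1 - q) ** max(N - i, i - 1)
               for i, p in enumerate(priors_in_list_order, start=1))

print(expected_accuracy([0.3, 0.1, 0.2, 0.4]))  # high priors at the ends
print(expected_accuracy([0.1, 0.4, 0.3, 0.2]))  # high priors central: higher
```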

28 Disadvantage of a DDAG
A DDAG can provide only a class label.
The previous formulation is insufficient; a new DDAG classification protocol is proposed.

29 Maximizing DDAG Accuracy
[Figure: a 4-class DDAG highlighting a generic internal node (i, j) and the paths through it.]

30 DDAG Design is NP-Hard
Constructing an optimal decision tree is NP-hard, and DDAG design is reducible to the optimal decision tree problem.
Approximate algorithms are the only resort.

31 Proposed Algorithms
Three greedy algorithms (the first is sketched below):
Prefer high-prior classes at the center of the DDAG class list.
Prefer high-performance classifiers at the root nodes of the DDAG.
Prefer high-error classes at the center of the DDAG.
Empirical results show that the approximation error is close to half that of the optimal graph.
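A sketch of the first heuristic under the path-length model of the formulation slide (the helper name is hypothetical): assign the highest-prior classes to the list positions with the shortest relevant paths.

```python
def centre_out_order(priors):
    """priors: dict class -> prior. Returns a class list whose central
    positions (shortest relevant paths) hold the highest-prior classes."""
    N = len(priors)
    # list positions 1..N, ranked by relevant path length max(N - i, i - 1)
    positions = sorted(range(1, N + 1), key=lambda i: max(N - i, i - 1))
    classes = sorted(priors, key=priors.get, reverse=True)  # descending prior
    order = [None] * N
    for pos, c in zip(positions, classes):
        order[pos - 1] = c
    return order

print(centre_out_order({1: 0.3, 2: 0.1, 3: 0.2, 4: 0.4}))  # -> [3, 4, 1, 2]
```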

32 Complexities of Classification
Classifier   Space   T-Best   T-Worst   T-Avg
1 vs Rest    O(N)    O(N)     O(N)      O(N)
1 vs 1       O(N²)   O(N²)    O(N²)     O(N²)
DDAG         O(N²)   O(N)     O(N)      O(N)
BHC          O(N)    O(1)     O(N)      O(log(N))

33 Binary Hierarchical Classifiers
[Figure: a 5-class BHC tree. Root: {1,4,5} vs {2,3}; left subtree: 4 vs {1,5}, then 1 vs 5; right subtree: 2 vs 3.]

34 Graph Partitioning
[Figure: a 5-class problem shown as raw data and as a similarity graph, with candidate partitions {1,4} vs {2,3,5} and {1,2,4,5} vs {3}.]
On the data, the objective is compact clusters; we prefer linear cuts with a large margin.
On the similarity graph, the objective is to maximize the cut; we prefer linear cuts.
None of the partitioning schemes is universally good for all problems (No Free Lunch Theorem).

35 Graph Partitioning
[Figure: the data and graph views of the same 5-class problem.]
Simple workaround: use locally best partitions.
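One way to realize "locally best" with large-margin cuts, sketched under the assumption that partitions are scored by linear-SVM margin (1/||w||); this brute-force version is only viable for small class counts.

```python
from itertools import combinations
import numpy as np
from sklearn.svm import LinearSVC

def best_partition(X, y, classes):
    """Try every bipartition of `classes` and keep the one whose linear
    SVM separates the two groups with the largest margin."""
    cl = list(classes)
    keep = np.isin(y, cl)                    # use only samples of these classes
    Xc, yc = X[keep], y[keep]
    best, best_margin = None, -np.inf
    for r in range(1, len(cl) // 2 + 1):     # enumerate bipartitions (small N)
        for left in combinations(cl, r):
            labels = np.isin(yc, left).astype(int)      # left group vs rest
            svm = LinearSVC(C=1.0).fit(Xc, labels)
            margin = 1.0 / np.linalg.norm(svm.coef_)    # SVM margin ~ 1/||w||
            if margin > best_margin:
                best, best_margin = (set(left), set(cl) - set(left)), margin
    return best
```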

36 Margin Improvement
[Figure: removing class 2 from one side of the partition improves the margin.]
Let some classes appear on both sides; don't insist on mutually exclusive partitions.

37 Trees with Overlapping Partitions
[Figure: a 6-class tree with overlapping partitions. Root: 1,2 – 3 – 4,5,6 (class 3 on both sides); children: 1,2 – 3 and 3,4 – 5 – 6; then 3,4 – 5 and 5,6.]

38 Comments
The complexity remains O(log(N)).
A different criterion is used for removing bad classes.

39 Configurable Hybrid Classifiers
DDAG: high accuracy, large size. BHC: moderate accuracy, small size.
Take advantage of both: if classification is easy, use a BHC; otherwise use a DDAG (see the sketch below).
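A sketch of the configurable idea with hypothetical callables (`easiness`, `split`, and `train_ddag` are stand-ins, not the thesis API): easy groups of classes get a cheap BHC split, hard groups fall back to an accurate DDAG.

```python
def build_hybrid(classes, data, easiness, split, train_ddag, threshold=0.9):
    # Hard (or trivially small) groups become DDAG leaves; easy groups are
    # split BHC-style and the two halves are built recursively.
    if len(classes) <= 2 or easiness(classes, data) < threshold:
        return ("ddag", train_ddag(classes, data))
    left, right = split(classes, data)
    return ("bhc",
            build_hybrid(left, data, easiness, split, train_ddag, threshold),
            build_hybrid(right, data, easiness, split, train_ddag, threshold))
```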

40
[Figure: image-only slide; no transcript text.]

41 Results on OCR datasets

42 Classifiability
Use expected error to select appropriate classifiers.
Classifiability measures how easy or difficult it is to classify a set of classes; it is computable from co-occurrence matrices.
We proposed a pairwise classifiability measure: L_pairwise = (2 / (N(N − 1))) Σ_{i<j} L_ij.
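A sketch of evaluating the measure, assuming `L[i][j]` already holds an expected-error estimate for the class pair (i, j), e.g. derived from co-occurrence statistics (names are illustrative):

```python
from itertools import combinations

def pairwise_classifiability(L, classes):
    # L_pairwise = 2 / (N (N - 1)) * sum over pairs i < j of L_ij
    N = len(classes)
    pair_sum = sum(L[i][j] for i, j in combinations(classes, 2))
    return 2.0 / (N * (N - 1)) * pair_sum
```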

43 Generalization Capacity of Proposed Algorithms
The probability of error a classifier makes on unseen samples is called the generalization error.
Large margins help: better features in a DDAG, better partitions in a BHC.
Use a classifier of the required complexity at each step (Occam's Razor): efficient feature representations and simpler BHC partitions both require less complex classifiers.
Architecture-level generalization: hybrid classifiers use an architecture of the required complexity at each node, thereby improving generalization.
We have demonstrated the generalization of the algorithms empirically.

44 Conclusions
Formulations, analysis, and algorithms were presented:
to design DDAGs using robust feature representations;
to design DDAGs using node reordering;
to design hierarchical classifiers with better generalization;
to design hybrid hierarchical classifiers.

45 Future Work
Design based on simple algorithms may improve the current high-performance classifiers.
Promising directions:
Feature-based partitioning vs. class-based partitioning
Trees with overlapping partitions
Efficient DDAG design algorithms
Configurability in classifier design

46 Thank You

