1 Classifying Categorical Data Risi Thonangi M.S. Thesis Presentation Advisor: Dr. Vikram Pudi

2 Presentation Outline: I. Introduction II. Related Work III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets VI. Improving the Execution Time VII. ACME and Naïve Bayes VIII. Experimental Results IX. Conclusions and Future Work

3 Presentation Outline: I. Introduction (I. The classification problem, II. Preliminaries) II. Related Work III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets VI. Improving the Execution Time VII. ACME and Naïve Bayes VIII. Experimental Results IX. Conclusions and Future Work

6 The Classification Problem. Training records and their class labels: {i1, i2, i3} → c1; {i2, i3} → c1; {i1, i4} → c2; {i1, i3} → c1. Attributes (items): I = {i1, i2, i3, i4}; classes: C = {c1, c2}. Query: {i1, i2, i4} → ?

7 Formal Problem Statement. Given a dataset D, learn from it to classify a potentially unseen record q (the query) to its correct class. Each record r_i is described by boolean attributes I = {i1, i2, …, i|I|} and is labeled with one of the classes C = {c1, c2, …, c|C|}. I can also be viewed as a set of items.

8 Preliminaries. itemset: a set of items, e.g. {i1, i2, i3}; P(.): a probability distribution; frequent itemset: an itemset whose frequency is above a given threshold σ; σ: support threshold; τ: confidence threshold; {i1, i2} → {i3}: an association rule (AR); {i1, i2} → c1: a classification association rule (CAR).
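
Since the rest of the talk builds on these definitions, here is a minimal, hypothetical Python sketch (not taken from the thesis) showing how support and CAR confidence would be computed on the toy dataset from the classification-problem slide above; the function names are mine.

```python
# Toy dataset from the earlier slides: (itemset, class label)
DATASET = [
    ({"i1", "i2", "i3"}, "c1"),
    ({"i2", "i3"}, "c1"),
    ({"i1", "i4"}, "c2"),
    ({"i1", "i3"}, "c1"),
]

def support(itemset, records=DATASET):
    """Fraction of records that contain every item of `itemset`."""
    return sum(itemset <= items for items, _ in records) / len(records)

def car_confidence(itemset, label, records=DATASET):
    """Confidence of the CAR `itemset -> label`, i.e. P(label | itemset)."""
    covered = [cls for items, cls in records if itemset <= items]
    return covered.count(label) / len(covered) if covered else 0.0

print(support({"i1", "i3"}))               # 0.5
print(car_confidence({"i1", "i3"}, "c1"))  # 1.0
```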

9 Presentation Outline: I. Introduction II. Related Work (I. Classification based on Associations (CBA), II. Classification based on Multiple Association Rules (CMAR), III. Large Bayes (LB) Classifier) III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets VI. Improving the Execution Time VII. ACME and Naïve Bayes VIII. Experimental Results IX. Conclusions and Future Work

11 Classification based on Associations (CBA) [Bing Liu, KDD-98]: the first classifier to use the association-rule paradigm. Steps in CBA: mine CARs satisfying the support and confidence thresholds; sort all CARs by confidence; classify using the highest-confidence rule that matches the query. Disadvantages: single-rule classification makes it not robust, and it cannot handle fully confident associations.

13 Disadvantages of CBA: single-rule classification. Suppose the classifier has 3 rules: i1 → c1 (support 0.3, confidence 0.8); i2, i3 → c2 (support 0.7, confidence 0.7); i2, i4 → c2 (support 0.8, confidence 0.7). CBA classifies the query {i1, i2, i3, i4} to class c1, which might be incorrect: being a single-rule classifier, CBA cannot weigh the combined evidence of multiple rules.

15 Fully Confident Associations. An association {i1, i2} → {i3} is fully confident if its confidence is 100%, which means P(~i3, i1, i2) = 0. If CBA includes the CAR {i1, i2, i3} → c1, it will also include {i1, i2} → c1. If the query {i1, i2, ~i3} arrives for classification, it is classified to c1 using {i1, i2} → c1, even though P(~i3, i1, i2) = 0: CBA does not check all statistical relationships.

16 Classification based on Multiple ARs (CMAR) [Wenmin Li, ICDM-01]: uses multiple CARs in the classification step. Steps in CMAR: mine CARs satisfying the support and confidence thresholds; sort all CARs by confidence; find all CARs that match the given query; group them by class label; classify the query to the class whose group of CARs has the maximum weight.

18 CMAR (contd.). The rules that match query q are grouped by class label, e.g. rules labeled c1: i1 → c1; rules labeled c2: i2, i3 → c2 and i2, i4 → c2. Output the class whose group has the highest weight.
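
As a rough illustration of the multi-rule idea, here is a hypothetical Python sketch. CMAR's actual group weight (a weighted chi-square measure in the original paper) is not given on these slides, so summed confidence is used purely as a stand-in weighting function.

```python
# Hypothetical sketch of CMAR-style multi-rule classification; the weight used
# here (sum of confidences) is a stand-in, not CMAR's actual formula.
from collections import defaultdict

# rules: (antecedent itemset, class label, confidence)
RULES = [
    ({"i1"}, "c1", 0.8),
    ({"i2", "i3"}, "c2", 0.7),
    ({"i2", "i4"}, "c2", 0.7),
]

def classify_multi_rule(query, rules=RULES):
    """Group matching rules by class and return the class with the largest weight."""
    weights = defaultdict(float)
    for antecedent, label, confidence in rules:
        if antecedent <= query:              # rule matches the query
            weights[label] += confidence     # stand-in for CMAR's group weight
    return max(weights, key=weights.get) if weights else None

print(classify_multi_rule({"i1", "i2", "i3", "i4"}))  # 'c2' under this stand-in weight
```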

19 CMAR disadvantages: no proper statistical justification is given for the weighting formula it employs, and it cannot handle fully confident associations.

20 Large Bayes (LB) Classifier [Meretakis, KDD-99]: builds P(q | ci) using frequent itemsets of ci that are subsets of q. Steps in LB: mine frequent itemsets; prune the frequent itemsets; calculate P(q) using a product approximation; classify to the class with the highest probability.

22 LB: pruning frequent itemsets. The immediate subset of s obtained by removing item i is denoted s\i (e.g. for s = {i1, i2}, s\i1 denotes {i2}). The symbol I stands for interestingness. P_{j,k}(s) denotes the estimate of s's frequency calculated from the frequencies of s\j and s\k.

24 LB pruner disadvantages: it assumes an itemset's interestingness does not depend on items not occurring in it (e.g. for s = {i1, i2}, I(s) depends only on {i1} and {i2}, not on {i3}); it assumes an itemset's interestingness can be calculated from pairs of immediate subsets; and it uses a global interestingness measure for all classes, even though an itemset can be informative in one class but not in another.

28 LB: calculating P(q). P(q) is calculated approximately from the frequencies of frequent itemsets. Example: P(i1, i2, i3, i4, i5) = P(i2, i5) · P(i3 | i5) · P(i1, i4 | i2); the frequencies of {i2, i5}, {i3, i5}, {i5}, {i1, i4, i2} and {i2} must be available. Itemsets are selected iteratively until all items in q are covered, and there can be many such product approximations. Heuristic: iteratively select the itemset s that introduces the fewest new items; among contenders, pick the s with the highest I(s).
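
A simplified sketch of this greedy construction, under two stated assumptions: itemset frequencies are given as a plain dictionary, and frequency stands in for the interestingness measure I(s) used for tie-breaking. It is an illustration of the heuristic, not LB's implementation.

```python
# Greedy product approximation of P(q) from frequent-itemset frequencies.
FREQ = {  # hypothetical frequent-itemset frequencies for one class
    frozenset({"i2", "i5"}): 0.30,
    frozenset({"i3", "i5"}): 0.25,
    frozenset({"i5"}): 0.50,
    frozenset({"i1", "i2", "i4"}): 0.20,
    frozenset({"i2"}): 0.40,
}

def product_approximation(query, freq=FREQ):
    covered, prob = set(), 1.0
    while covered != query:
        # candidates must add at least one new query item and contain no items outside q
        candidates = [s for s in freq if (s - covered) and s <= query]
        if not candidates:
            break  # query cannot be fully covered by the available itemsets
        s = min(candidates, key=lambda s: (len(s - covered), -freq[s]))
        overlap = s & covered
        denom = freq.get(overlap, 1.0)   # P(overlap); 1.0 when the overlap is empty
        prob *= freq[s] / denom          # contributes P(new items | overlap)
        covered |= s
    return prob

# one possible approximation of P(q) for the query on the slide
print(product_approximation(frozenset({"i1", "i2", "i3", "i4", "i5"})))
```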

29 Disadvantages of estimating P(q) this way: LB computes the probability of q as if q were an itemset; it uses an approximation of P(q) and hence assumes independences between items; a better product approximation may exist; and it cannot handle fully confident associations. For instance, if there is a rule {i1, i2} → i3, i.e. P(i1, i2, i3) = P(i1, i2), the product approximation for q = {i1, i2, i4} is still built as P(i1, i2) · P(i4 | i1, i2).

30 Presentation Outline: I. Introduction II. Related Work III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets VI. Improving the Execution Time VII. ACME and Naïve Bayes VIII. Experimental Results IX. Conclusions and Future Work

31 Contributions: frequent itemsets + maximum entropy = a very accurate, robust and theoretically appealing classifier; fixed the existing maximum-entropy model to work with frequent itemsets; made the approach scalable to large databases; proved that Naïve Bayes is a specialization of ACME.

32 Presentation Outline: I. Introduction II. Related Work III. Contributions IV. ACME Classifier (I. Philosophy, II. Steps involved) V. Handling Non-Closed Itemsets VI. Improving the Execution Time VII. ACME and Naïve Bayes VIII. Experimental Results IX. Conclusions and Future Work

35 ACME Classifier. Main philosophy: use data-mining principles to mine informative patterns from a dataset and build a statistical model using these patterns. ACME does not assume any independence that is not reflected in the dataset. ACME has three steps: the mining step, the learning step and the classifying step.

37 Mining Informative Patterns. The dataset D is split by class label into D1 (records labeled c1) and D2 (records labeled c2).

38 Mining Informative Patterns. Each Di is mined with Apriori to obtain the constraints of class ci, which are then passed through a confidence-based pruner to obtain the non-redundant constraints of class ci.

39 Mining constraints. Let Si denote the set of itemsets that are frequent in class ci, and let S denote the set of itemsets that are frequent in at least one class. The constraints of ci are denoted by Ci.
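
One plausible reconstruction of the definitions referred to here, pairing each itemset with its class-specific frequency (the thesis's exact formulas may differ):

```latex
S_i = \{\, s \subseteq I \;:\; \mathrm{freq}(s, D_i) \ge \sigma \,\}, \qquad
S   = \bigcup_{i=1}^{|C|} S_i, \qquad
C_i = \{\, (s,\ P_i(s)) : s \in S \,\},
```

where P_i(s) denotes the frequency of itemset s among the records of D_i.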

40 Pruning constraints. Constraints are pruned based on how well they differentiate between the classes; for example, whether an itemset s = {i1, i2} in S is pruned depends on how much its frequency differs across the classes.

45 Learning from constraints. The constraints Ci of class ci (available from the mining step) are turned into the statistical distribution Pi of class ci, which will be used in the classification step. How? The defining characteristic of Pi is that it should satisfy every constraint in Ci.

48 An Example. Output of the mining step for c1: C1 = {(i1, 0.5), (i2, 0.5)}. Many distributions over (i1, i2) satisfy these constraints, e.g. the uniform distribution P(0,0) = P(0,1) = P(1,0) = P(1,1) = 0.25, or P(0,0) = 0.4, P(0,1) = 0.1, P(1,0) = 0.1, P(1,1) = 0.4. Choose the distribution with the highest entropy.
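
A small Python check (mine, not from the thesis) that both candidate distributions satisfy the constraints P(i1 = 1) = 0.5 and P(i2 = 1) = 0.5, and that the uniform one has the higher entropy, so the maximum-entropy principle picks it.

```python
from math import log2

# distributions over (i1, i2): {(i1, i2): probability}
UNIFORM = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
SKEWED  = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.40}

def marginal(p, position):
    """P(item at `position` = 1)."""
    return sum(prob for outcome, prob in p.items() if outcome[position] == 1)

def entropy(p):
    return -sum(prob * log2(prob) for prob in p.values() if prob > 0)

for name, p in [("uniform", UNIFORM), ("skewed", SKEWED)]:
    print(name, marginal(p, 0), marginal(p, 1), round(entropy(p), 3))
# uniform 0.5 0.5 2.0
# skewed  0.5 0.5 1.722
```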

50 Learning step. The solution distribution P for class ci produced by the learning step should satisfy the constraints of the class and have the highest possible entropy. P can be modeled as a log-linear model [Good, AMS-63].
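
The log-linear form itself appears on the slide as a formula. A standard way to write such a maximum-entropy model over constraints (s_1, f_1), …, (s_k, f_k), consistent with the later description of scaling μ values (the thesis's exact notation may differ), is:

```latex
P(x) \;=\; \mu_0 \prod_{j=1}^{k} \mu_j^{\, g_j(x)},
\qquad
g_j(x) \;=\;
\begin{cases}
1 & \text{if } s_j \subseteq x,\\
0 & \text{otherwise,}
\end{cases}
```

where μ0 normalizes the distribution and each μ_j is adjusted until P assigns frequency f_j to itemset s_j.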

53 Learning step. Advantage of the log-linear approach: find μ values such that the distribution P satisfies the constraints; entropy is then automatically maximized. How are the μ values found? With the Generalized Iterative Scaling (GIS) algorithm [Darroch, AMS-72]: initialize every μ to 1 and iteratively scale the μ's until the distribution P satisfies all constraints.
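
Below is a simplified sequential iterative-scaling sketch in the spirit of GIS (my construction, not the thesis's implementation, and not the exact GIS update): enumerate the outcome space explicitly and repeatedly rescale each μ toward its constraint's target frequency. Brute-forcing all 2^|I| outcomes means it is only meant for tiny examples.

```python
from itertools import chain, combinations

def all_outcomes(items):
    """Every subset of `items`, each outcome being the set of items that are 1."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def fit_log_linear(items, constraints, iterations=200):
    """constraints: list of (itemset, target frequency). Returns P over outcomes."""
    outcomes = all_outcomes(items)
    mu = [1.0] * len(constraints)

    def distribution():
        weights = {}
        for x in outcomes:
            w = 1.0
            for m, (s, _) in zip(mu, constraints):
                if s <= x:
                    w *= m
            weights[x] = w
        z = sum(weights.values())
        return {x: w / z for x, w in weights.items()}

    for _ in range(iterations):
        for j, (s, f) in enumerate(constraints):
            p = distribution()
            expected = sum(prob for x, prob in p.items() if s <= x)
            mu[j] *= f / expected          # scale toward the target frequency
    return distribution()

# running example: constraints (i1, 0.5) and (i2, 0.5)
P = fit_log_linear(["i1", "i2"], [(frozenset({"i1"}), 0.5), (frozenset({"i2"}), 0.5)])
for outcome in sorted(P, key=len):
    print(sorted(outcome), round(P[outcome], 3))   # converges to 0.25 each
```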

58 Classification step. The query q is classified using the distributions P1, …, P|C| learned for the classes; it is assigned to the class under which it is most probable.
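
The exact decision rule appears on the slide as a formula. A minimal sketch, assuming the usual generative rule of maximizing the class prior times the class-conditional probability of the query (the thesis may weight these differently), with the priors and per-class distributions as hypothetical inputs:

```python
# c* = argmax_i P(c_i) * P_i(q), with P_i learned e.g. by the GIS sketch above.
def classify(query, class_priors, class_distributions):
    """class_distributions maps class -> {outcome frozenset: probability}."""
    def score(label):
        return class_priors[label] * class_distributions[label].get(query, 0.0)
    return max(class_priors, key=score)

# toy usage with two classes over the 2-item domain of the running example
priors = {"c1": 0.75, "c2": 0.25}
dists = {"c1": {frozenset({"i1", "i2"}): 0.25, frozenset({"i1"}): 0.25,
                frozenset({"i2"}): 0.25, frozenset(): 0.25},
         "c2": {frozenset({"i1", "i2"}): 0.10, frozenset({"i1"}): 0.40,
                frozenset({"i2"}): 0.10, frozenset(): 0.40}}
print(classify(frozenset({"i1"}), priors, dists))  # 'c1' (0.1875 vs 0.10)
```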

59 Presentation Outline: I. Introduction II. Related Work III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets (I. Failure of the Log-Linear Model, II. Fix to the Log-Linear Model) VI. Improving the Execution Time VII. ACME and Naïve Bayes VIII. Experimental Results IX. Conclusions and Future Work

60 Handling Non-Closed Constraints. Example: (i1 i2, 0.5) is non-closed if (i1, 0.5) is also in the system of constraints. In general, a constraint (si, fi) is non-closed iff there exists another constraint (sj, fj) whose itemset sj is a proper subset of si and whose frequency fj equals fi.
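
Written out, based on the example above (the thesis's exact statement may differ):

```latex
(s_i, f_i)\ \text{is non-closed}
\;\iff\;
\exists\,(s_j, f_j) \in C_i:\ s_j \subset s_i \ \wedge\ f_j = f_i .
```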

61 61 Log-Linear’s Failure to Accommodate Non-Closed constraints Theorem: The log-linear model does not have a solution in the presence of non-closed constraints.

62 62 Log-Linear’s Failure to Accommodate Non-Closed constraints Theorem: Proof: pp 24—25 [Thesis] The log-linear model does not have a solution in the presence of non-closed constraints. Let ( i 1 i 2, 0.5 ) and ( i 1, 0.5 ) are the only two constraints in C i. Let q = { i 1, ~i 2 } is the query to be classified.

63 63 Log-Linear’s Failure to Accommodate Non-Closed constraints Theorem: Proof: pp 24—25 [Thesis] The log-linear model does not have a solution in the presence of non-closed constraints. Let ( i 1 i 2, 0.5 ) and ( i 1, 0.5 ) are the only two constraints in C i. Let q = { i 1, ~i 2 } is the query to be classified.

64 64 Log-Linear’s Failure to Accommodate Non-Closed constraints Theorem: Proof: pp 24—25 [Thesis] The log-linear model does not have a solution in the presence of non-closed constraints. Let ( i 1 i 2, 0.5 ) and ( i 1, 0.5 ) are the only two constraints in C i. Let q = { i 1, ~i 2 } is the query to be classified.

65 65 Log-Linear’s Failure to Accommodate Non-Closed constraints Theorem: Proof: pp 24—25 [Thesis] The log-linear model does not have a solution in the presence of non-closed constraints. Let ( i 1 i 2, 0.5 ) and ( i 1, 0.5 ) are the only two constraints in C i. Let q = { i 1, ~i 2 } is the query to be classified.

66 66 Log-Linear’s Failure to Accommodate Non-Closed constraints Theorem: Proof: pp 24—25 [Thesis] The log-linear model does not have a solution in the presence of non-closed constraints. Let ( i 1 i 2, 0.5 ) and ( i 1, 0.5 ) are the only two constraints in C i. Let q = { i 1, ~i 2 } is the query to be classified. This is a contradiction

67 67 Log-Linear’s Failure to Accommodate Non-Closed constraints Theorem: Proof: pp 24—25 [Thesis] The log-linear model does not have a solution in the presence of non-closed constraints. Let ( i 1 i 2, 0.5 ) and ( i 1, 0.5 ) are the only two constraints in C i. Let q = { i 1, ~i 2 } is the query to be classified. There is no possible [μ]-vector which can fit the distribution P to the given set of constraints.

69 Fix to the Log-Linear Model. An important note: the product form of the log-linear model is what maximizes entropy. The problem with the existing log-linear model is that it cannot learn zero probabilities without setting μ values to zero. Solution: keep the product form of the classical log-linear model, explicitly assign zero probability to the affected queries, and remove them from the learning step.
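
One way to write the modified model described here, with Z_i denoting the set of queries that are explicitly assigned zero probability (a reconstruction; the thesis's exact formulation may differ):

```latex
P(x) \;=\;
\begin{cases}
0 & \text{if } x \in Z_i,\\[4pt]
\mu_0 \displaystyle\prod_{j=1}^{k} \mu_j^{\, g_j(x)} & \text{otherwise,}
\end{cases}
```

with the μ's learnt by GIS over the outcomes outside Z_i.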

74 The Modified Log-Linear Model. Example: (i1 i2, 0.5) and (i1, 0.5) are two constraints in Ci, and (i1 i2, 0.5) is a non-closed constraint. The table entry for the query {i1, ~i2} is explicitly set to probability zero, while the remaining entries of P are defined using the log-linear model and learnt with the GIS algorithm.

77 Learning the Modified Log-Linear Model has 3 steps: find the non-closed constraints; find the queries that satisfy them; run GIS on the remaining queries. Recall that a constraint (si, fi) is non-closed iff there exists another constraint (sj, fj) whose itemset is a proper subset of si with the same frequency.

94 An Efficient Method to Find Non-Closed Constraints. Steps: store all constraints in a prefix tree, then traverse the tree and flag the non-closed constraints. Running example: (i1, f1), (i2, f2), (i3, f3), (i1 i2, f4), (i1 i3, f5), (i2 i3, f6), (i1 i2 i3, f7). To determine whether a node (constraint) is non-closed, the frequencies of its immediate subsets are needed; during the traversal, each node's set of subsets is built from its parent's set of subsets. The whole procedure costs O(|I| · |C|).
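
A minimal sketch of the same check using a hash map instead of an explicit prefix tree (the prefix-tree traversal on the slides achieves the same effect while sharing subset computations between parent and child nodes). Each constraint with more than one item looks up its immediate subsets, so the work per constraint is proportional to |I|.

```python
# Flag non-closed constraints: (s_i, f_i) is non-closed iff some immediate
# subset s_i \ {item} is itself a constraint with the same frequency.
def find_non_closed(constraints):
    """constraints: dict mapping frozenset itemset -> frequency."""
    non_closed = set()
    for itemset, freq in constraints.items():
        if len(itemset) < 2:
            continue
        for item in itemset:                       # at most |I| lookups per constraint
            subset = itemset - {item}
            if constraints.get(subset) == freq:
                non_closed.add(itemset)
                break
    return non_closed

CONSTRAINTS = {
    frozenset({"i1"}): 0.5,
    frozenset({"i2"}): 0.3,
    frozenset({"i1", "i2"}): 0.5,   # same frequency as its subset {i1} -> non-closed
    frozenset({"i2", "i3"}): 0.2,
}
print(find_non_closed(CONSTRAINTS))  # {frozenset({'i1', 'i2'})}
```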

102 Learning the Modified Log-Linear Model: find the non-closed constraints, find the queries that satisfy them (in the running prefix-tree example, the queries {i1, i2, i3} and {i2, i3} are flagged and their probability is set to zero), and run GIS on the remaining queries.

103 Advantages of the Modified Log-Linear Model: it can handle non-closed constraints, and its learning procedure does not consider all possible queries, which makes it efficient. Queries pruned per dataset:

Dataset        # Cons   Pruned q's
Austra(354)    263      10.1%
Waveform(99)   24       1.3%
Cleve(246)     204      9.96%
Diabetes(85)   61       7.42%
German(54)     44       9.66%
Heart(115)     95       55.1%
Breast(189)    180      8.89%
Lymph(29)      14       12.1%
Pima(87)       55       8.6%

104 Presentation Outline: I. Introduction II. Related Work III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets VI. Improving the Execution Time VII. ACME and Naïve Bayes VIII. Experimental Results IX. Conclusions and Future Work

108 Improving the Execution Time. The learning step is computationally expensive for domains with a large number of items I, since it runs GIS over the constraint set C and the query space 2^I (the power set of I). Divide I into parts I1, I2, …, Im such that no constraint (s, f) overlaps two parts; run GIS for each Ij and build the corresponding distribution Pj; combine P1, P2, …, Pm in naïve-Bayes fashion. Example: I = {a, b, c, d} with constraints {a}, {a, b} and {c, d}; split I into I1 = {a, b} and I2 = {c, d}, learn log-linear models P1(·) for I1 and P2(·) for I2, and compute e.g. P(b, c) = P1(b) · P2(c).
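
A sketch of the divide-and-combine idea (assumptions: constraints are given as item sets, and the per-part distributions have already been learnt, e.g. with the GIS sketch above, as dictionaries over subsets of each part): split I into parts that share no constraint, then multiply per-part probabilities naive-Bayes style, as in P(b, c) = P1(b) · P2(c) on the slide.

```python
def split_items(items, constraint_itemsets):
    """Connected components of items, where each constraint links all its items."""
    parent = {i: i for i in items}
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for s in constraint_itemsets:
        first, *rest = list(s)
        for other in rest:
            parent[find(other)] = find(first)   # union the items of one constraint
    parts = {}
    for i in items:
        parts.setdefault(find(i), set()).add(i)
    return list(parts.values())

def combined_probability(query, parts, part_distributions):
    """P(query) = product over parts of P_j(query restricted to part I_j)."""
    prob = 1.0
    for part, dist in zip(parts, part_distributions):
        prob *= dist[frozenset(query & part)]
    return prob

print(split_items({"a", "b", "c", "d"}, [{"a"}, {"a", "b"}, {"c", "d"}]))
# e.g. [{'a', 'b'}, {'c', 'd'}]  (order may vary)

parts = [{"a", "b"}, {"c", "d"}]                # fixed order for the toy usage
P1 = {frozenset(): 0.2, frozenset({"a"}): 0.3,
      frozenset({"b"}): 0.1, frozenset({"a", "b"}): 0.4}
P2 = {frozenset(): 0.25, frozenset({"c"}): 0.25,
      frozenset({"d"}): 0.25, frozenset({"c", "d"}): 0.25}
print(combined_probability({"b", "c"}, parts, [P1, P2]))  # 0.1 * 0.25 = 0.025
```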

110 Presentation Outline: I. Introduction II. Related Work III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets VI. Improving the Execution Time VII. ACME and Naïve Bayes (I. Equivalence under certain conditions) VIII. Experimental Results IX. Conclusions and Future Work

113 ACME and Naïve Bayes Classifiers. ACME: constraints (i1, f1), (i2, f2), (i1 i2, f3), …; build P using the log-linear model. Naïve Bayes: constraints (i1, f1), (i2, f2), …; build P using the Naïve Bayes model.

116 ACME and Naïve Bayes Classifiers. Is there any relationship between ACME and NB? Yes: Naïve Bayes is a specialization of ACME. Theorem: when there are no conditional dependencies between the items in the domain, ACME performs the same as NB. Proof: pp. 34–37 of the thesis. Steps in the proof: (1) when there are no conditional dependencies between items, all constraints other than unigrams are redundant; (2) a log-linear model built with unigram frequencies yields the same probability distribution as Naïve Bayes.
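
To see why step (2) is plausible, note that with only unigram constraints the log-linear form factorizes over items. Writing x_j ∈ {0, 1} for the presence of item i_j in a record x (notation chosen here for illustration, not the thesis's):

```latex
P(x \mid c)
\;=\; \prod_{j} P(x_j \mid c)
\;=\; \Big(\prod_{j} P(i_j = 0 \mid c)\Big)
      \prod_{j} \left(\frac{P(i_j = 1 \mid c)}{P(i_j = 0 \mid c)}\right)^{x_j}
\;=\; \mu_0 \prod_{j} \mu_j^{\, x_j},
```

i.e. the Naïve Bayes factorization is exactly a log-linear model with μ0 = ∏_j P(i_j = 0 | c) and μ_j = P(i_j = 1 | c) / P(i_j = 0 | c).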

117 Presentation Outline: I. Introduction II. Related Work III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets VI. Improving the Execution Time VII. ACME and Naïve Bayes VIII. Experimental Results IX. Conclusions and Future Work

118 Experiments. Datasets were chosen from the UCI Machine Learning Repository. ACME was compared with Naïve Bayes (NB), the C4.5 decision-tree classifier, the CBA association-rule classifier and the TAN Bayesian-network classifier. ACME performed the best on average.

119 Performance of ACME compared with the other classifiers (classification accuracy, %):

Dataset     NB      C4.5    CBA     TAN     ACME
Austra      84.34   85.5    84.9    -       85.5
Breast      97.28   95.13   96.3    96.56   96.49
Cleve       83.82   76.23   82.8    82.5    83.82
Diabetes    75.78   72.39   74.5    76.30   77.86
German      70.0    70.9    73.4    73      71.3
Heart       83.7    80      81.87   81.85   82.96
Lymph       83.1    77.0    77.8    84.45   78.4
Pima        76.17   74.34   72.9    76.3    77.89
Waveform    80.82   76.44   80.0    81.52   83.08

120 Presentation Outline: I. Introduction II. Related Work III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets VI. Improving the Execution Time VII. ACME and Naïve Bayes VIII. Experimental Results IX. Conclusions and Future Work

121 Conclusion: frequent itemsets + maximum entropy yield an accurate, theoretically appealing classifier.

122 Future Work: an approximate algorithm for GIS; identifying redundant constraints; removing user input (the support and confidence thresholds are currently supplied by the user).

123 Questions?

