
Classifying Categorical Data Risi Thonangi M.S. Thesis Presentation Advisor: Dr. Vikram Pudi

2 Presentation Outline I. Introduction II. Related Work III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets VI. Improving the Execution time VII. ACME and Naïve Bayes VIII. Experimental Results IX. Conclusions and Future Work

3 Presentation Outline I. Introduction (I. The classification problem, II. Preliminaries) II. Related Work III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets VI. Improving the Execution time VII. ACME and Naïve Bayes VIII. Experimental Results IX. Conclusions and Future Work

4 The Classification Problem
record            class
i1, i2, i3        c1
i2, i3            c1
i1, i4            c2
i1, i3            c1
Attributes (items): I = { i1, i2, i3, i4 }    Classes: C = { c1, c2 }

6 The Classification Problem (contd.) A query record { i1, i2, i4 } arrives whose class is unknown; the task is to predict it.

7 Formal Problem Statement Given a dataset D, learn from it to classify a potentially unseen record `q' (the query) to its correct class. Each record ri is described by boolean attributes I = { i1, i2, …, i|I| } and is labeled with one of the classes C = { c1, c2, …, c|C| }. I = { i1, i2, …, i|I| } can also be viewed as a set of items.

8 Preliminaries
itemset: a set of items, e.g. { i1, i2, i3 }
P(.): a probability distribution
frq-itemset: an itemset whose frequency is above a given threshold σ
σ: support threshold
τ: confidence threshold
{ i1, i2 } → { i3 }: an association rule (AR)
{ i1, i2 } → c1: a classification association rule (CAR)

9 Presentation Outline I. Introduction II. Related Work (I. Classification based on Associations (CBA), II. Classification based on Multiple Association Rules (CMAR), III. Large Bayes (LB) Classifier) III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets VI. ACME and Naïve Bayes VII. Experimental Results VIII. Conclusions and Future Work

10 Classification based on Associations (CBA) [Bing Liu – KDD98] First Classifier that used the paradigm of Association Rules Steps in CBA: –Mine for CARs satisfying support and confidence thresholds –Sort all CARs based on confidence –Classify using the rule that satisfies the query and has the highest confidence

11 Classification based on Associations (CBA) [Bing Liu – KDD98] First classifier that used the paradigm of association rules. Steps in CBA: –Mine for CARs satisfying the support and confidence thresholds –Sort all CARs based on confidence –Classify using the rule that satisfies the query and has the highest confidence. Disadvantages: –Single-rule based classification (not robust) –Cannot handle fully confident associations

13 Disadvantages with CBA: Single-Rule Classification Let the classifier have 3 rules:
– i1 → c1 (support: 0.3, confidence: 0.8)
– i2, i3 → c2 (support: 0.7, confidence: 0.7)
– i2, i4 → c2 (support: 0.8, confidence: 0.7)
The query { i1, i2, i3, i4 } will be classified to class c1 by CBA, which might be incorrect: CBA, being a single-rule classifier, cannot consider the combined effect of multiple matching rules.
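To make the single-rule behaviour concrete, here is a minimal, hypothetical sketch of CBA-style classification (the data structures and the tie-breaking rule are illustrative assumptions, not code from the thesis):

```python
from dataclasses import dataclass

@dataclass
class CAR:
    antecedent: frozenset  # itemset on the left-hand side of the rule
    label: str             # predicted class
    support: float
    confidence: float

def cba_classify(query, cars, default_class=None):
    """Return the class of the highest-confidence rule whose antecedent
    is contained in the query (CBA uses exactly one rule per query)."""
    # Sort by confidence, breaking ties by support (a common convention).
    for rule in sorted(cars, key=lambda r: (r.confidence, r.support), reverse=True):
        if rule.antecedent <= query:
            return rule.label
    return default_class

rules = [
    CAR(frozenset({"i1"}), "c1", 0.3, 0.8),
    CAR(frozenset({"i2", "i3"}), "c2", 0.7, 0.7),
    CAR(frozenset({"i2", "i4"}), "c2", 0.8, 0.7),
]
# The slide's example: the two c2 rules together are stronger evidence,
# but CBA still answers c1 because only the single best rule is consulted.
print(cba_classify(frozenset({"i1", "i2", "i3", "i4"}), rules))  # -> c1
```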

15 Fully Confident Associations An association { i1, i2 } → { i3 } is fully confident if its confidence is 100%. This means P( ~i3, i1, i2 ) = 0. If CBA includes the CAR { i1, i2, i3 } → c1, it will also include { i1, i2 } → c1. If the query { i1, i2, ~i3 } arrives for classification, it is classified to c1 using { i1, i2 } → c1, even though P( ~i3, i1, i2 ) = 0. CBA does not check for all such statistical relationships.

16 Classification based on Multiple ARs (CMAR) [WenminLi-ICDM01] Uses multiple CARs in the classification step Steps in CMAR: –Mine for CARs satisfying support and confidence thresholds –Sort all CARs based on confidence –Find all CARs which satisfy the given query –Group them based on their class label –Classify the query to the class whose group of CARs has the maximum weight

18 CMAR contd. The CARs that satisfy query `q' (the set R) are grouped by class label: rules labeled c1 (e.g. i1 → c1) and rules labeled c2 (e.g. i2, i3 → c2 and i2, i4 → c2). The class whose group has the highest weight is output.
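A minimal sketch of this multi-rule scheme. CMAR's actual group weight is a weighted chi-square measure; the summed confidence used below is only a simple stand-in, and the rule representation is an assumption:

```python
from collections import defaultdict, namedtuple

Rule = namedtuple("Rule", "antecedent label confidence")

def cmar_classify(query, rules, default_class=None):
    """Group the rules whose antecedent is contained in the query by class
    label and output the class with the largest group weight (summed
    confidence here, as an illustrative stand-in for CMAR's weighted
    chi-square)."""
    weights = defaultdict(float)
    for r in rules:
        if r.antecedent <= query:        # the rule satisfies the query
            weights[r.label] += r.confidence
    return max(weights, key=weights.get) if weights else default_class

rules = [Rule(frozenset({"i1"}), "c1", 0.8),
         Rule(frozenset({"i2", "i3"}), "c2", 0.7),
         Rule(frozenset({"i2", "i4"}), "c2", 0.7)]
print(cmar_classify(frozenset({"i1", "i2", "i3", "i4"}), rules))  # -> c2
```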

19 CMAR Disadvantages No proper statistical explanation given for the mathematical formulae that were employed Cannot handle Fully Confident Associations

20 Large Bayes (LB) Classifier [Meretakis-KDD99] Build P( q|c i ) using frequent itemsets in c i which are subsets of `q’ Steps in LB: –Mine for frequent itemsets –Prune the frequent itemsets –Calculate P( q ) using a product approximation –Classify to the class with the highest probability

22 LB: Pruning FRQ-Itemsets The immediate subset of `s' obtained by removing the item `i' is denoted `s\i'. –Ex: if s = { i1, i2 } then s\i1 denotes the set { i2 }. The symbol `I' stands for interestingness. Pj,k( s ) denotes the estimate of P( s ) calculated from the frequencies of `s\j' and `s\k'.

24 LB Pruner: Disadvantages Assumes an itemset’s interestingness does not depend on items not occurring in it. –Ex: For s = { i 1, i 2 }, I( s ) is only dependent on { i 1 } and { i 2 } but not on { i 3 } Assumes an itemset’s interestingness can be calculated from pairs of immediate-subsets Uses a global information measure for all classes. –itemsets can be informative in one class but not in another.

26 LB: Calculating P(q) LB approximately calculates P( q ) using the frequencies of frequent itemsets. –Ex: P( i1, i2, i3, i4, i5 ) = P( i2, i5 ) · P( i3 | i5 ) · P( i1, i4 | i2 ) –For this, the following itemsets should be available: { i2, i5 }, { i3, i5 }, { i5 }, { i1, i4, i2 }, { i2 }. Itemsets are selected iteratively until all the items in `q' are covered; there can be many product approximations. Heuristic: iteratively select the itemset `s' such that –the number of new items in `s' is the least, and –if there are contenders, pick the `s' with the highest I( s ).
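A rough sketch of this greedy product-approximation step, under the assumption that the needed overlap itemsets are also present in the frequency table (illustrative code, not the thesis's implementation):

```python
def lb_product_approximation(query, freq, interestingness):
    """Greedy product approximation of P(query) in the spirit of LB:
    repeatedly pick a frequent itemset s contained in the query that adds
    the fewest new (uncovered) items, ties broken by higher interestingness,
    and multiply in P(new part of s | overlap of s with the covered part).
    `freq` maps frozensets of items to relative frequencies and is assumed
    to contain every itemset and overlap this loop asks for."""
    items = frozenset(query)
    covered = frozenset()
    p = 1.0
    while covered != items:
        candidates = [s for s in freq if s <= items and (s - covered)]
        if not candidates:
            return None  # the query cannot be covered by the mined itemsets
        s = min(candidates,
                key=lambda s: (len(s - covered), -interestingness.get(s, 0.0)))
        overlap = s & covered
        p *= freq[s] / (freq[overlap] if overlap else 1.0)
        covered |= s
    return p
```

For the slide's example this yields a chain of conditionals like P( i2, i5 ) · P( i3 | i5 ) · P( i1, i4 | i2 ), although the exact order can differ from the slide depending on which itemsets are available in `freq`.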

29 Estimating P(q): Disadvantages LB calculates the probability of q as if it were an itemset. It uses an approximation of P( q ) and hence assumes independences between items; there could be a better product approximation. It cannot handle fully confident associations: –if there exists a rule { i1, i2 } → i3, i.e. P( i1, i2, i3 ) = P( i1, i2 ), –then the product approximation for q = { i1, i2, i4 } is still built as P( i1, i2 ) · P( i4 | i1, i2 ).

30 Presentation Outline I. Introduction II. Related Work III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets VI. Improving the Execution time VII. ACME and Naïve Bayes VIII. Experimental Results IX. Conclusions and Future Work

31 Contributions Frequent Itemsets + Maximum Entropy = a very accurate, robust and theoretically appealing classifier. Fixed the existing maximum entropy model to work with frequent itemsets. Made the approach scalable to large databases. Proved that Naïve Bayes is a specialization of ACME.

32 Presentation Outline I. Introduction II. Related Work III. Contributions IV. ACME Classifier (I. Philosophy, II. Steps involved) V. Handling Non-Closed Itemsets VI. Improving the Execution time VII. ACME and Naïve Bayes VIII. Experimental Results IX. Conclusions and Future Work

33 ACME Classifier The main philosophy of ACME: use data mining principles to mine informative patterns from a dataset and build a statistical model using these patterns. ACME does not explicitly assume any independence that is not represented in the dataset. ACME has three steps: –Mining step –Learning step –Classifying step

37 Mining Informative Patterns The dataset D (the records together with their class labels) is partitioned by class: D1 holds the records labeled c1 and D2 holds the records labeled c2.

38 Mining Informative Patterns (contd.) For each class ci, Apriori is run on Di to mine the constraints of class ci, and a confidence-based pruner reduces them to the non-redundant constraints of class ci.

39 Mining constraints Let Si denote the set of itemsets which are frequent in ci, i.e. whose frequency in Di is at least the support threshold σ. Let S denote the set of itemsets which are frequent in at least one class, i.e. S = S1 ∪ … ∪ S|C|. The constraints of ci, denoted Ci, are the pairs ( s, f ) for the itemsets s in S, where f is the frequency of s in Di.

40 Pruning constraints Constraints are pruned based on how well they differentiate between classes. –Ex: s = { i1, i2 }, an itemset in S, is pruned when its occurrence does not help distinguish one class from another, and kept when it does (the pruner is confidence-based).
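A small, brute-force sketch of these two steps on the toy dataset from the introduction; the pruning test used here (keep an itemset only if it points to some class with confidence at least τ) is a simple stand-in for the thesis's confidence-based pruner:

```python
from itertools import combinations

def mine_constraints(records, labels, sigma, tau):
    """Constraints of class c: pairs (itemset, frequency of the itemset in
    the records of c), for every itemset that is frequent (>= sigma) in at
    least one class and survives a confidence-based pruning test."""
    classes = sorted(set(labels))
    by_class = {c: [set(r) for r, l in zip(records, labels) if l == c]
                for c in classes}

    def freq(s, recs):
        return sum(s <= r for r in recs) / len(recs) if recs else 0.0

    # Candidate itemsets: every subset of every record (fine for tiny data).
    candidates = set()
    for r in records:
        for k in range(1, len(r) + 1):
            candidates.update(frozenset(c) for c in combinations(r, k))

    frequent = {s for s in candidates
                if any(freq(s, by_class[c]) >= sigma for c in classes)}

    def confidence(s, c):
        matching = [l for r, l in zip(records, labels) if s <= set(r)]
        return matching.count(c) / len(matching) if matching else 0.0

    kept = {s for s in frequent
            if max(confidence(s, c) for c in classes) >= tau}
    return {c: {s: freq(s, by_class[c]) for s in kept} for c in classes}

data = [["i1", "i2", "i3"], ["i2", "i3"], ["i1", "i4"], ["i1", "i3"]]
labels = ["c1", "c1", "c2", "c1"]
constraints = mine_constraints(data, labels, sigma=0.5, tau=0.6)
```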

43 Learning from constraints The constraints of class ci ( Ci ), available from the mining step, are turned into a statistical distribution of class ci ( Pi ), which will be used in the classification step. How? The defining characteristic of Pi: it should satisfy every constraint in Ci.

46 An Example Output of the mining step for c1: C1 = { ( i1, 0.5 ), ( i2, 0.5 ) }. Many distributions P over the assignments of ( i1, i2 ) satisfy these constraints; among them, choose the distribution with the highest entropy.

49 Learning step The solution distribution P for class ci produced by the learning step should: –satisfy the constraints of the class –have the highest entropy possible. [Good-AMS63] P can be modeled as a log-linear model, in its standard form P( x ) = μ0 · ∏ μs over the constraint itemsets s contained in x.

51 Learning step (contd.) The advantage of the log-linear approach: it suffices to find μ values such that the distribution P satisfies the constraints; entropy is then automatically maximized. How to find the μ values? The Generalized Iterative Scaling (GIS) algorithm [Darroch-AMS72]: initialize every μ to 1, and iteratively scale all μ's until the distribution P satisfies all constraints.
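A compact sketch of this learning step on a toy item domain. It uses a plain sequential scaling update (closer to iterative proportional fitting than to the exact GIS update cited in the thesis), so treat it as illustrative only:

```python
from itertools import product

def learn_maxent(items, constraints, iters=200):
    """Fit P(x) = mu0 * prod(mu_s for constraint itemsets s contained in x)
    so that P reproduces each constraint frequency. `constraints` maps
    frozensets of items to target frequencies. Simple iterative scaling,
    an illustrative cousin of the GIS algorithm used in the thesis."""
    assignments = [frozenset(i for i, bit in zip(items, bits) if bit)
                   for bits in product([0, 1], repeat=len(items))]
    mu = {s: 1.0 for s in constraints}

    def distribution():
        weights = {}
        for x in assignments:
            w = 1.0
            for s, m in mu.items():
                if s <= x:
                    w *= m
            weights[x] = w
        z = sum(weights.values())              # mu0 is 1/z
        return {x: w / z for x, w in weights.items()}

    for _ in range(iters):
        for s, target in constraints.items():
            p = distribution()
            expected = sum(p[x] for x in assignments if s <= x)
            if expected > 0:
                mu[s] *= target / expected     # scale toward the constraint
    return distribution()

P1 = learn_maxent(["i1", "i2"],
                  {frozenset({"i1"}): 0.5, frozenset({"i2"}): 0.5})
# With only these unigram constraints, maximum entropy treats i1 and i2
# as independent: P1[frozenset({"i1", "i2"})] comes out as 0.25.
```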

56 Classification step (the classification formulae on these slides are not preserved in the transcript; a sketch of the decision rule follows below)
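Since the original formula is not preserved, here is a minimal sketch of the natural decision rule for this setup: class prior times the probability that the class's learned distribution assigns to the query. The use of class priors is an assumption on my part, not a quote of the thesis:

```python
def acme_classify(query, class_priors, class_distributions):
    """Pick the class c maximizing P(c) * P_c(query), where P_c is the
    distribution learned for class c (for example by learn_maxent above)
    and the query is the frozenset of items it contains."""
    q = frozenset(query)
    return max(class_priors,
               key=lambda c: class_priors[c] * class_distributions[c].get(q, 0.0))
```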

59 Presentation Outline I. Introduction II. Related Work III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets (I. Failure of the Log-Linear Model, II. Fix to the Log-Linear Model) VI. ACME and Naïve Bayes VII. Experimental Results VIII. Conclusions and Future Work

60 Handling Non-Closed constraints Example of a non-closed constraint: ( i1 i2, 0.5 ) is non-closed if ( i1, 0.5 ) is also in the system of constraints. In general, a constraint ( si, fi ) is non-closed iff there exists another constraint ( sj, fj ) such that sj ⊂ si and fj = fi.
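Spelling out why the example is problematic (my own working, using the two constraints just stated):

```latex
P(i_1) \;=\; P(i_1, i_2) + P(i_1, \neg i_2)
\quad\Longrightarrow\quad
P(i_1, \neg i_2) \;=\; 0.5 - 0.5 \;=\; 0 .
```

So any distribution satisfying both constraints must assign probability zero to queries containing i1 but not i2, which is exactly what the strictly positive product form of the classical log-linear model cannot produce.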

61 Log-Linear’s Failure to Accommodate Non-Closed constraints Theorem: The log-linear model does not have a solution in the presence of non-closed constraints.

62 Log-Linear's Failure to Accommodate Non-Closed Constraints Theorem: the log-linear model does not have a solution in the presence of non-closed constraints. Proof: pp 24-25 [Thesis]. Sketch: let ( i1 i2, 0.5 ) and ( i1, 0.5 ) be the only two constraints in Ci, and let q = { i1, ~i2 } be the query to be classified.

66 Log-Linear's Failure to Accommodate Non-Closed Constraints (contd.) The two constraints force P( i1, ~i2 ) = 0, while the product form of the log-linear model keeps every assignment's probability strictly positive. This is a contradiction: there is no possible [μ]-vector which can fit the distribution P to the given set of constraints.

69 Fix to the Log-Linear An important note: the product form of the log-linear model is what maximizes entropy. The problem with the existing log-linear model: it cannot learn zero probabilities without setting μ values to zero. Solution: keep the product form of the classical log-linear model, explicitly assign zero probability to the required queries, and remove them from the learning step.

70 The Modified Log-Linear Model Example: ( i1 i2, 0.5 ) and ( i1, 0.5 ) are two constraints in Ci, and ( i1 i2, 0.5 ) is a non-closed constraint. The assignments forced to probability zero (here, those containing i1 but not i2) are set aside explicitly; P over the remaining assignments is defined using the log-linear model and learnt with the GIS algorithm.

75 Learning the Modified Log-Linear Model Learning the modified log-linear model has 3 steps: –Find the non-closed constraints –Find the queries which satisfy them (these get probability zero) –Run GIS on the remaining queries. Recall: a constraint ( si, fi ) is non-closed iff there exists another constraint ( sj, fj ) such that sj ⊂ si and fj = fi.

78 An Efficient Method to find Non-Closed Constraints Steps involved: –Store all constraints in a prefix-tree –Traverse the prefix tree and flag non-closed constraints. Example constraint set: ( i1, f1 ), ( i2, f2 ), ( i3, f3 ), ( i1 i2, f4 ), ( i1 i3, f5 ), ( i2 i3, f6 ), ( i1 i2 i3, f7 ), stored in a prefix tree rooted at { } with item-labelled edges.

82 An Efficient Method to find Non-Closed Constraints (contd.) To determine whether a node such as { i1, i2, i3 } is non-closed, we need the frequencies of its immediate subsets { i1, i2 }, { i1, i3 } and { i2, i3 }. The traversal uses the parent's set of subsets to build the child's set of subsets, so these frequencies are at hand when each node is visited. The whole procedure runs in O( |I| · |C| ).
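As a concrete stand-in for the prefix-tree traversal (not a reproduction of it), here is a short dictionary-based check that flags the same constraints. It relies on the fact that, because support is anti-monotone and the mined constraint set is downward closed, comparing each constraint with its immediate subsets is enough:

```python
def find_non_closed(constraints):
    """`constraints` maps frozensets of items to frequencies. A constraint
    (s, f) is flagged non-closed when some immediate subset of s is also a
    constraint with the same frequency; by anti-monotonicity of support this
    detects any equal-frequency proper subset. The thesis achieves the same
    with a single prefix-tree traversal."""
    non_closed = set()
    for s, f in constraints.items():
        for item in s:
            sub = s - {item}
            if sub and sub in constraints and constraints[sub] == f:
                non_closed.add(s)
                break
    return non_closed

C = {frozenset({"i1"}): 0.5, frozenset({"i2"}): 0.4,
     frozenset({"i1", "i2"}): 0.5}
print(find_non_closed(C))  # flags frozenset({'i1', 'i2'})
```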

95 Learning the Modified Log-Linear Model (contd.) The following slides walk these three steps on the example prefix tree: the non-closed constraints are flagged, the queries that satisfy them are marked "Prob is zero" and dropped from learning (the walked example highlights { i1, i2, i3 } and { i2, i3 }), and GIS is run on the remaining queries.

103 Advantages of the Modified Log-Linear Model It can handle non-closed constraints. In the learning procedure it does not consider all possible queries, and hence is efficient. The slide's table lists, per dataset, the number of constraints and the percentage of queries pruned: Austra 354, Waveform 99, Cleve 246, Diabetes 85, German 54, Heart 115, Breast 189, Lymph 29, Pima 87 constraints; most of the percentage values are garbled in the transcript.

104 Presentation Outline I. Introduction II. Related Work III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets VI. Improving the Execution time VII. ACME and Naïve Bayes VIII. Experimental Results IX. Conclusions and Future Work

105 Improving the Execution time The learning step is computationally inefficient for domains with a large number of items I: it involves running GIS with the constraint set C over the query space 2^I (the power set of I).

106 Improving the Execution time Divide I into subparts I1, I2, …, Im such that no constraint ( s, f ) overlaps two subparts. Run GIS for each Ij and build the corresponding distribution Pj. Combine P1, P2, …, Pm using naïve Bayes. Ex: I = {a,b,c,d} and the constraints are {a}, {a,b} and {c,d}. Split I into I1 = {a,b} and I2 = {c,d}, learn log-linear models P1(.) for I1 and P2(.) for I2; then P(b,c) = P1(b) · P2(c).
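A small sketch of this decomposition: a union-find over items that puts two items in the same part exactly when some constraint contains both (the per-part learner is assumed to be something like learn_maxent above):

```python
def partition_items(items, constraint_itemsets):
    """Union-find over items: two items end up in the same part iff some
    constraint itemset contains both, so no constraint straddles two parts."""
    parent = {i: i for i in items}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def union(a, b):
        parent[find(a)] = find(b)

    for s in constraint_itemsets:
        s = list(s)
        for other in s[1:]:
            union(s[0], other)

    parts = {}
    for i in items:
        parts.setdefault(find(i), set()).add(i)
    return list(parts.values())

parts = partition_items(["a", "b", "c", "d"],
                        [{"a"}, {"a", "b"}, {"c", "d"}])
# Two parts, {'a', 'b'} and {'c', 'd'}: each gets its own GIS run, and the
# full distribution is the product of the per-part distributions.
```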

110 Presentation Outline I. Introduction II. Related Work III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets VI. Improving the Execution time VII. ACME and Naïve Bayes (I. Equivalence under certain conditions) VIII. Experimental Results IX. Conclusions and Future Work

111 ACME and Naïve Bayes Classifiers ACME uses constraints ( i1, f1 ), ( i2, f2 ), ( i1 i2, f3 ), … and builds P using the log-linear model. Naïve Bayes uses constraints ( i1, f1 ), ( i2, f2 ), … and builds P using the naïve Bayes model.

116 ACME and Naïve Bayes Classifiers Is there any relationship between ACME and NB? Yes: Naïve Bayes is a specialization of ACME. Theorem: under no conditional dependencies between the items in the domain, ACME performs the same as NB. Proof: pp 34-37 [Thesis]. Steps in the proof: (1) When there are no conditional dependencies between items, all constraints other than unigrams are redundant. (2) The log-linear model built with unigram frequencies yields the same probability distribution as Naïve Bayes.
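A short sketch (my own notation, not the thesis's) of why step (2) holds: with only unigram constraints ( ik, fk ), the log-linear form factorizes into exactly the independent per-item model that Naïve Bayes builds for each class.

```latex
P(x) \;=\; \mu_0 \prod_{k:\, i_k \in x} \mu_k
      \;=\; \prod_{k:\, i_k \in x} f_k \;\prod_{k:\, i_k \notin x} (1 - f_k),
\qquad \text{taking } \mu_k = \frac{f_k}{1 - f_k},\;\; \mu_0 = \prod_k (1 - f_k).
```

This choice of μ's satisfies every unigram constraint P( ik ) = fk, so it is the maximum-entropy solution, and it is the product of independent per-item factors that Naïve Bayes uses within each class.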

117 Presentation Outline I. Introduction II. Related Work III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets VI. Improving the Execution time VII. ACME and Naïve Bayes VIII. Experimental Results IX. Conclusions and Future Work

118 Experiments Datasets were chosen from the UCI Machine Learning Repository. ACME was compared with Naïve Bayes (NB), the C4.5 decision tree classifier, the CBA association-rule based classifier, and TAN Bayesian-network based classifiers. ACME performed the best on average.

119 Performance Performance of ACME compared to the other classifiers: the slide tabulates accuracy on each dataset (Austra, Breast, Cleve, Diabetes, German, Heart, Lymph, Pima, Waveform) for NB, C4.5, CBA, TAN and ACME; the numeric entries are not preserved in the transcript.

120 Presentation Outline I. Introduction II. Related Work III. Contributions IV. ACME Classifier V. Handling Non-Closed Itemsets VI. Improving the Execution time VII. ACME and Naïve Bayes VIII. Experimental Results IX. Conclusions and Future Work

121 Conclusion Frequent Itemsets + Maximum Entropy = an accurate, theoretically appealing classifier.

122 Future Work An approximate algorithm for GIS. Identifying redundant constraints. Removing user input –the support and confidence thresholds are currently input by the user.

123 Questions?