
Mining Optimal Decision Trees from Itemset Lattices. Dr. Siegfried Nijssen, Dr. Elisa Fromont. KDD 2007

Introduction
Decision trees
– Popular prediction mechanism
– Efficient, easy-to-understand algorithms
– Easily interpreted models
Surprisingly, mining decision trees under constraints has not received much attention.

Introduction
Examples of constrained decision tree queries:
– Finding the most accurate tree on training data in which each leaf covers at least n examples.
– Finding the k most accurate trees on training data in which the majority class in each leaf covers at least n examples more than any of the minority classes.
– Finding the smallest decision tree in which each leaf contains at least n examples and the expected accuracy on unseen examples is maximized.
– Finding the smallest or shallowest decision tree whose accuracy is higher than minacc.

Motivation
Decision tree algorithms do exist, so what's the problem?
– Heuristics are used to decide where to split the tree, greedily, from the top down.
– Sometimes the heuristic is off!
– A tree can be produced, but it might be sub-optimal.
– Maybe a different heuristic would be better?
– How do we know?

Motivation
What is needed is an exact method for finding optimal decision trees under various constraints. This would make it possible to:
– Prove how good a heuristic is.
– Prove that trends and theories observed in small, simple data sets hold true in larger, more complex data sets.

Motivation
The authors suggest that problem complexity has been a deterrent.
– The problem is NP-complete.
– Small instances can still be computed exactly.
– Frequent itemset mining techniques can be leveraged to do so.

Model
Frequent itemset terminology
– Items: I = {i1, i2, …, im}
– Transactions: D = {T1, T2, …, Tn}
– TID-set: t(I) ⊆ {1, 2, …, n}, the IDs of the transactions that contain I
– Frequency: freq(I) = |t(I)|
– Support: support(I) = freq(I) / |D|
– "Frequent itemset": support(I) ≥ minsup
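To make these definitions concrete, here is a minimal sketch in Python; the function names and the toy database are my own illustration, not from the paper.

```python
def tid_set(itemset, transactions):
    """t(I): IDs of the transactions that contain every item of I."""
    return {tid for tid, t in enumerate(transactions) if itemset <= t}

def freq(itemset, transactions):
    """freq(I) = |t(I)|"""
    return len(tid_set(itemset, transactions))

def support(itemset, transactions):
    """support(I) = freq(I) / |D|"""
    return freq(itemset, transactions) / len(transactions)

# Toy database D with items represented as strings.
D = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
I = {"a", "b"}
print(tid_set(I, D))             # {0, 2}
print(freq(I, D))                # 2
print(support(I, D))             # 0.5
minsup = 0.4
print(support(I, D) >= minsup)   # True: I is a frequent itemset
```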

Model
We are interested in finding frequent itemsets in databases whose examples are labeled with classes. This leads to class association rules I → c(I), where c(I) is the class from the set of classes C with the highest frequency among the examples covered by I.
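A hedged illustration of such a rule (names and data are again my own): for an itemset I, pick the majority class among the examples that contain I.

```python
from collections import Counter

def majority_class(itemset, transactions, labels):
    """c(I): the most frequent class among the examples covered by itemset I."""
    covered = [labels[tid] for tid, t in enumerate(transactions) if itemset <= t]
    return Counter(covered).most_common(1)[0][0]

D = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
labels = ["pos", "neg", "pos", "neg"]        # one class label per example
print(majority_class({"a"}, D, labels))      # 'pos', i.e. the rule {a} -> pos
```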

Model
Decision tree classification
– Examples are sorted down the tree
– Each node tests an attribute of an example
– Each edge represents a value of the attribute
– Attributes are assumed to be binary
– Input to a decision tree learner is a matrix B, where B_ij contains the value of attribute i in example j

Model
Observation: transform the binary matrix B into transactional form D such that T_j = { i | B_ij = 1 } ∪ { ¬i | B_ij = 0 }. Then sorting examples down a tree built on B corresponds to sorting them by the itemsets occurring in D.
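A small sketch of this transformation; the encoding of items and negated items is my own choice, not prescribed by the paper.

```python
def to_transactions(B):
    """B[j][i] is the value of attribute i in example j (0 or 1).
    Each example becomes a transaction with item i or its negation."""
    transactions = []
    for row in B:
        t = set()
        for i, value in enumerate(row):
            t.add(("i", i) if value == 1 else ("not_i", i))  # ("not_i", i) stands for ¬i
        transactions.append(t)
    return transactions

B = [[1, 0, 1],
     [0, 0, 1],
     [1, 1, 0]]
for T in to_transactions(B):
    print(sorted(T))
```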

Model
Paths in the tree correspond to itemsets; leaves identify classes. If an example contains the itemset given by the path to a leaf, the example is assigned the class of that leaf.

Model
Decision tree learners typically specify coverage requirements on leaves. This corresponds to setting a minimum support threshold on the corresponding association rules.

Model
The accuracy of a tree is derived from the number of misclassified examples:
accuracy(T) = (|D| − e(T)) / |D|, where
e(T) = Σ e(I) over all I in leaves(T), and
e(I) = freq(I) − freq_{c(I)}(I), the number of examples in leaf I that do not belong to its majority class c(I).
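As a worked example (helper names are my own, not the paper's): compute e(I) per leaf from its class counts and combine the leaf errors into accuracy(T).

```python
from collections import Counter

def leaf_error(class_counts):
    """e(I) = freq(I) - freq_{c(I)}(I): examples in the leaf outside its majority class."""
    total = sum(class_counts.values())
    return total - max(class_counts.values())

def tree_accuracy(leaves, num_examples):
    """accuracy(T) = (|D| - e(T)) / |D|, with e(T) summed over the leaves."""
    e_T = sum(leaf_error(c) for c in leaves)
    return (num_examples - e_T) / num_examples

# Two leaves covering 6 + 4 = 10 examples.
leaves = [Counter({"pos": 5, "neg": 1}), Counter({"neg": 3, "pos": 1})]
print(tree_accuracy(leaves, 10))   # (10 - (1 + 1)) / 10 = 0.8
```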

Model Itemsets form a lattice containing many decision trees.

Method
Finding decision trees under constraints is similar to querying a database. The query has three parts:
– Constraints on individual nodes
– Constraints on the overall tree
– A preference for a specific tree instance

Method
Individual node constraints
– Q1 = { T | T ∈ DecisionTrees, ∀ I ∈ paths(T): p(I) }
– Such a tree is a locally constrained decision tree.
– The predicate p(I) represents the constraint.
– Simple case: p(I) := (freq(I) ≥ minfreq)
– Two types of local constraints:
  Coverage: frequency
  Pattern: itemset size
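A minimal sketch of a local constraint predicate p(I) combining a coverage constraint with a pattern-size constraint; the function name, the maxsize parameter, and the toy data are my own assumptions.

```python
def make_local_constraint(transactions, minfreq, maxsize):
    """Build p(I): coverage constraint (frequency) plus pattern constraint (itemset size)."""
    def p(itemset):
        frequency = sum(1 for t in transactions if itemset <= t)  # freq(I)
        return frequency >= minfreq and len(itemset) <= maxsize
    return p

D = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
p = make_local_constraint(D, minfreq=2, maxsize=2)
print(p({"a"}))            # True: frequent enough and small enough
print(p({"a", "b", "c"}))  # False: too large, and freq 1 < 2
```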

Method
Constraints on the overall tree
– Q2 = { T | T ∈ Q1, q(T) }
– Such trees are globally constrained decision trees.
– q(T) is a conjunction of (optional) constraints on four quantities:
  e(T): error of the tree on training data
  ex(T): expected error on unseen examples
  size(T): number of nodes in the tree
  depth(T): longest path from root to leaf
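A hedged sketch of such a conjunction q(T), taking the tree's summary statistics as a plain dict; this representation and the threshold values are my assumptions, not the paper's.

```python
def q(stats, max_error=5, max_size=15, max_depth=4):
    """q(T): conjunction of optional global constraints on error, size, and depth."""
    return (stats["e"] <= max_error
            and stats["size"] <= max_size
            and stats["depth"] <= max_depth)

print(q({"e": 3, "size": 7, "depth": 3}))   # True
print(q({"e": 3, "size": 7, "depth": 6}))   # False: too deep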

Method
Preference for a specific tree instance
– Q3: output argmin over T ∈ Q2 of [ r1(T), r2(T), …, rn(T) ], where each ri is one of { e, ex, size, depth }
– The tuples of r values are compared lexicographically and define a ranking.
– Since each ri is minimized, the ordering of r is not relevant.
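Python tuples already compare lexicographically, so the preference can be illustrated (with made-up tree statistics) by taking the minimum of ranking tuples.

```python
# Illustrative only: candidate trees summarized as (name, e, size, depth).
candidates = [
    ("T1", 4, 9, 3),
    ("T2", 2, 15, 5),
    ("T3", 2, 11, 4),
]

# Rank by [e(T), size(T), depth(T)]; Python compares the tuples lexicographically.
best = min(candidates, key=lambda t: (t[1], t[2], t[3]))
print(best[0])   # 'T3': ties with T2 on error, but is smaller
```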

Algorithm

Algorithm (Part 2)

Contributions
Dynamic programming solution
– When an optimal (sub)tree is computed for an itemset, it is stored; it may or may not end up as a subtree of the final tree.
– Requests for the same subtree are answered by fetching it from the store instead of recomputing it.
– Accessing the data can be implemented in one of four ways.
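The sketch below is not the paper's DL8 pseudocode (that appears on the untranscribed Algorithm slides); it is a simplified illustration of the memoization idea: the best subtree for an itemset of path tests is computed once, cached, and fetched on later requests. All data, names, and the depth bound are illustrative.

```python
from functools import lru_cache
from collections import Counter

# Toy data: binary attribute rows plus a class label per example.
X = [(1, 0, 1), (1, 1, 1), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
y = ["pos", "pos", "neg", "neg", "pos"]
NUM_ATTRS = 3

def covered(itemset):
    """Examples consistent with a set of (attribute, value) tests."""
    return [j for j, row in enumerate(X)
            if all(row[a] == v for a, v in itemset)]

@lru_cache(maxsize=None)
def best_tree(itemset, depth):
    """Best (error, tree) using at most `depth` further splits on the examples
    covered by `itemset`; memoized so each itemset is solved only once."""
    idx = covered(itemset)
    counts = Counter(y[j] for j in idx)
    majority = counts.most_common(1)[0][0] if idx else None
    best = (len(idx) - max(counts.values()) if idx else 0, ("leaf", majority))
    if depth == 0 or not idx:
        return best
    for a in range(NUM_ATTRS):
        if any(attr == a for attr, _ in itemset):
            continue  # attribute already tested on this path
        e0, t0 = best_tree(itemset | frozenset({(a, 0)}), depth - 1)
        e1, t1 = best_tree(itemset | frozenset({(a, 1)}), depth - 1)
        if e0 + e1 < best[0]:
            best = (e0 + e1, ("split", a, t0, t1))
    return best

print(best_tree(frozenset(), depth=2))
# (0, ('split', 0, ('leaf', 'neg'), ('leaf', 'pos')))
```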

Contributions
Data access is required to compute the frequency counts needed at three key points in the algorithm. There are four approaches:
– Simple
– FIM
– Constrained FIM
– Closure-based single step

Contributions
Simple method
– Itemset frequencies are computed while the algorithm is executing.
– Calling DL8-Recursive for an itemset I results in a scan of the data for I, during which the frequency of I can be calculated.
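A rough sketch of what such a scan might compute (names are hypothetical): one pass over the examples containing I yields both freq(I) and the per-class counts needed for leaf errors.

```python
from collections import Counter

def scan(itemset, transactions, labels):
    """One scan of the data for itemset I: total and per-class frequency counts."""
    per_class = Counter(labels[j] for j, t in enumerate(transactions) if itemset <= t)
    return sum(per_class.values()), per_class   # freq(I) and freq_c(I) for each class c

D = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
labels = ["pos", "neg", "pos", "neg"]
print(scan({"a"}, D, labels))   # (3, Counter({'pos': 2, 'neg': 1}))
```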

Contributions
FIM – frequent itemset miners
– Every itemset must satisfy p.
– If p is a minimum frequency constraint, preprocess the data with a frequent itemset miner to determine the itemsets that qualify.
– Use only these itemsets in the algorithm.

Contributions
Constrained FIM
– Identifies an itemset's relevancy while running the frequent itemset miner.
– Some itemsets are frequent but have infrequent counterparts (the sibling branches a tree would need), so no tree satisfying the local constraint on every path can contain them.
– This method removes such itemsets.

Contributions
Closure-based single step

Experiments

Related Work