Rule Induction with Extension Matrices
Yuen F. Helbig and Dr. Xindong Wu

Outline
- Extension matrix approach for rule induction
- The MFL and MCV optimization problems
- The AE1 solution
- The HCV solution
- Noise handling and discretization in HCV
- Comparison of HCV with ID3-like algorithms, including C4.5 and C4.5rules

a  Number of attributes X a  a th attribute e   Vector of positive examples e –  Vector of negative examples  Value of a th attribute in the k th positive example n  Number of negative examples p  Number of positive examples (r ij ) axb  ij th element of axb matrix A(i,j)  ij th element of matrix A Extension Matrix Terminology

Extension Matrix Definitions
- A positive example is an example that belongs to a known class, say 'Play'
- All the other examples are called negative examples

(overcast, mild, high, windy) => Play
(rainy, hot, high, windy) => Don't Play

Negative Example Matrix
The negative example matrix (NEM) stacks all n negative examples as rows:
NEM = (v-_ij)_{n×a}, where v-_ij is the value of the j-th attribute in the i-th negative example.

Extension Matrix
The extension matrix EM_k of a positive example e+_k against NEM is defined by:
  EM_k(i,j) = *  (a dead element)  when v+_jk = NEM(i,j)
  EM_k(i,j) = NEM(i,j)             when v+_jk ≠ NEM(i,j)
A dead element is one that cannot distinguish e+_k from the i-th negative example.
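
The construction is mechanical. As a hedged illustration (build_nem, build_em, and the DEAD marker are our names for this sketch, not the paper's), a minimal Python rendering of both definitions:

DEAD = None   # our marker for the dead element '*'

def build_nem(negative_examples):
    """NEM is just the n x a matrix whose i-th row is the i-th negative example."""
    return [list(e) for e in negative_examples]

def build_em(nem, pos_example):
    """EM(i,j) is dead where NEM(i,j) equals the positive example's j-th value
    (such an element cannot discriminate); it keeps NEM(i,j) otherwise."""
    return [[DEAD if v == pos_example[j] else v
             for j, v in enumerate(row)]
            for row in nem]

# The Play / Don't Play illustration from the definitions slide:
nem = build_nem([("rainy", "hot", "high", "windy")])
em = build_em(nem, ("overcast", "mild", "high", "windy"))
# em == [["rainy", "hot", DEAD, DEAD]]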

Example Extension Matrix: the Negative Example Matrix (NEM) and a positive example [matrix figure not preserved]

Example Extension Matrix: the extension matrix (EM) of the positive example against the NEM [matrix figure not preserved]

Paths in Extension Matrices
A set of n non-dead elements, one from each of the n rows of an extension matrix, is called a path.
e.g., {[X1 ≠ 1], [X2 ≠ 0], [X1 ≠ 1]} and {[X1 ≠ 1], [X3 ≠ 1], [X2 ≠ 0]} are paths in the extension matrix above [matrix figure not preserved].
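
A path therefore exists iff every row keeps at least one non-dead element. Continuing the Python sketch above (DEAD as before; paths() does brute-force enumeration and is exponential, for illustration only):

from itertools import product

def has_path(em):
    """A path picks one non-dead element from every row, so a path
    exists iff no row of the extension matrix is entirely dead."""
    return all(any(v is not DEAD for v in row) for row in em)

def paths(em):
    """Enumerate every path as a tuple of (row, column, value) picks."""
    live = [[(i, j, v) for j, v in enumerate(row) if v is not DEAD]
            for i, row in enumerate(em)]
    return product(*live)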

Conjunctive Formulas
A path in EM_k, the extension matrix of the k-th positive example against NEM, corresponds to a conjunctive formula (cover) that covers e+_k against NEM, and vice versa.

Extension Matrix Disjunction
The disjunction matrix (EMD) of a group of positive examples {e+_k1, ..., e+_km} against NEM is defined by:
  EMD(i,j) = NEM(i,j)  when NEM(i,j) ≠ v+_jk for all of the examples in the group
  EMD(i,j) = *         otherwise
A path in the EMD of {e+_k1, ..., e+_km} against NEM corresponds to a conjunctive formula (cover) which covers all of {e+_k1, ..., e+_km} against NEM, and vice versa.
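
Under the same sketch, the disjunction of two extension matrices built over one NEM is element-wise: keep the value only where both matrices are non-dead (when both are non-dead they hold the same NEM value, so either can be kept):

def disjunction(em1, em2):
    """EMD of two extension matrices over the same NEM."""
    return [[v1 if (v1 is not DEAD and v2 is not DEAD) else DEAD
             for v1, v2 in zip(row1, row2)]
            for row1, row2 in zip(em1, em2)]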

EMD Example: the NEM of the training set, and the EMD as each positive example is added in turn [matrix figures not preserved]

MFL and MCV (1)
- The minimum formula problem (MFL): generating a conjunctive formula that covers a positive example, or an intersecting group of positive examples, against NEM and has the minimum number of different conjunctive selectors
- The minimum cover problem (MCV): seeking a cover that covers all positive examples in PE against NEM and has the minimum number of conjunctive formulae, with each conjunctive formula being as short as possible

MFL and MCV (2)
- Both problems are NP-hard
- Two complete algorithms are designed to solve them when each attribute domain D_i (i = 1, ..., a) satisfies |D_i| = 2:
  - O(na·2^a) for MFL
  - O(n·2^a·4^a + p·a^2·4^a) for MCV
- When |D_i| > 2, the domain can be decomposed into several domains, each having base 2

AE1 Heuristic
- Start the search from columns with the most non-dead elements
- Simplify redundancy by deductive inference rules in mathematical logic

Problems with AE1
- It can easily lose the optimum solution: here, AE1 will select [X2 ≠ 0], [X1 ≠ 1], and [X3 ≠ 1], instead of [X1 ≠ 1] and [X3 ≠ 1] [matrix figure not preserved]
- Simplifying redundancy for MFL and MCV is itself NP-hard

What is HCV?
HCV is an extension-matrix-based rule induction algorithm which is:
- Heuristic
- Attribute-based
- Noise-tolerant
It divides the positive examples into intersecting groups, uses the HFL heuristics to find a conjunctive formula that covers each intersecting group, and has low-order polynomial time complexity at induction time.

HCV Issues
- The HCV algorithm
- The HFL heuristics
- Speed and efficiency
- Noise handling capabilities
- Dealing with numeric and nominal data
- Accuracy and description compactness

HCV Algorithm (1)

Procedure HCV(EM_1, ..., EM_p; Hcv)
  integer n, a, p
  matrix EM_1(n,a), ..., EM_p(n,a), D(p)
  set Hcv
  S1: D ← 0      /* D(j) = 1 (j = 1, ..., p) indicates that EM_j has been
                    put into an intersecting group */
      Hcv ← {}   /* initialization */
  S2: for i = 1 to p do
        if D(i) = 0 then {
          EM ← EM_i

HCV Algorithm (2) for j = i+1 to p, do if D(j) = 0 then {EM2  EM EM j If there exists at least one path in EM2 then { EM  EM2, D(j)  1 } } next j call HFL(EM; Hfl) Hcv  Hcv  Hfl } next i Return (Hcv)

HFL - Fast Strategy
[X5 ∈ {normal, dry-peep}] can be a possible selector, which will cover all 5 rows [matrix figure not preserved].

HFL - Precedence
Selectors [X1 ≠ 1] and [X3 ≠ 1] are two inevitable selectors in the extension matrix above [matrix figure not preserved].

HFL - Elimination
Attribute X2 can be eliminated by X3 [matrix figure not preserved].

HFL - Least Frequency
Attribute X1 can be eliminated and there still exists a path [matrix figure not preserved].

HFL Algorithm (1)

Procedure HFL(EM; Hfl)
  S0: Hfl ← {}
  S1: /* the fast strategy */
      Try the fast strategy on all those rows which haven't been covered;
      if successful, add a corresponding selector to Hfl and return(Hfl)
  S2: /* the precedence strategy */
      Apply the precedence strategy to the uncovered rows;
      if some inevitable selectors are found, add them to Hfl,
      label all the rows they cover, and go to S1

HFL Algorithm (2)

  S3: /* the elimination strategy */
      Apply the elimination strategy to those attributes that have neither
      been selected nor eliminated;
      if an eliminable selector is found, reset all the elements in the
      corresponding column to *, and go to S2
  S4: /* the least-frequency strategy */
      Apply the least-frequency strategy to those attributes which have
      neither been selected nor eliminated, and find a least-frequency
      selector;
      reset all the elements in the corresponding column to *, and go to S2
  Return(Hfl)
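
A rough sketch of this control flow, under stated simplifications: selectors are returned as (attribute, values) pairs, and S3 and S4 are collapsed into a single least-frequency column elimination, so this is cruder than the real four-strategy HFL:

def hfl(em):
    """Simplified HFL: fast strategy, precedence strategy, then drop the
    least-frequent column and retry, until all rows are covered."""
    if not em:
        return []
    em = [list(row) for row in em]            # local working copy
    n, a = len(em), len(em[0])
    covered = [False] * n
    rules = []

    def live_cols(i):
        return [j for j in range(a) if em[i][j] is not DEAD]

    def add_selector(j):
        values = {em[i][j] for i in range(n) if em[i][j] is not DEAD}
        rules.append((j, values))
        for i in range(n):                    # label every row it covers
            if em[i][j] is not DEAD:
                covered[i] = True

    while not all(covered):
        uncovered = [i for i in range(n) if not covered[i]]
        # S1, fast strategy: a column non-dead in every uncovered row.
        fast = next((j for j in range(a)
                     if all(em[i][j] is not DEAD for i in uncovered)), None)
        if fast is not None:
            add_selector(fast)
            return rules
        # S2, precedence strategy: a row with a single non-dead element
        # makes that column's selector inevitable.
        forced = {live_cols(i)[0] for i in uncovered if len(live_cols(i)) == 1}
        if forced:
            for j in forced:
                add_selector(j)
            continue
        # S3/S4 collapsed: every uncovered row now has >= 2 live elements,
        # so dropping the least-frequent live column still leaves a path.
        counts = {j: sum(em[i][j] is not DEAD for i in uncovered)
                  for j in range(a)
                  if any(em[i][j] is not DEAD for i in uncovered)}
        least = min(counts, key=counts.get)
        for i in range(n):
            em[i][least] = DEAD
    return rules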

Complexity of HFL
- S1: O(na)
- S2: O(na)
- S3: O(na^2)
- S4: O(na)
- Overall: O(a(na + na + na^2 + na)) = O(na^3)

Complexity of HCV
- Worst-case time complexity: [formula not preserved in the transcript]
- Space requirement: 2na

HCV Example

The worked example runs HCV on a small data set with attributes including ESR and Auscultation; the matrices themselves are figures that were not preserved. The slides step through:
- The NEM of the training set
- EM_1 through EM_5, the extension matrices of positive examples 1 to 5
- EM_1 ∨ EM_2, EM_1 ∨ EM_2 ∨ EM_3, and so on up to EM_1 ∨ ... ∨ EM_5, as each positive example is merged into a single intersecting group
- HFL Step 1 (Fast Strategy): HFL Rules = {}
- HFL Step 2 (Precedence): HFL Rules = {}
- HFL Step 3 (Elimination): HFL Rules = {}
- HFL Step 4 (Least-Frequency), applied twice: HFL Rules = {}
- HFL Step 2 (Precedence): HFL Rules = {ESR = fast}
- HFL Step 1 (Fast Strategy): HFL Rules = {ESR = fast, Auscultation = normal}
- Finally, the rule generated by HCV is compared with the rule generated by C4.5rules [figure not preserved]

HCV versus AE1
- The use of the disjunction matrix
- A reasonable solution to MFL and MCV
- Noise handling
- Discretization of attributes

HCV Noise Handling
- Don't-care values are treated as dead elements
- Approximate partitioning
- Stopping criteria

Discretization of Attributes
- Information gain heuristic
- Stopping criteria for splitting:
  - Stop if the information gain at all cut points is the same
  - Stop if the number of examples to split is less than a certain number
  - Limit the total number of intervals
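
A hedged sketch of the information gain heuristic for one binary cut point (entropy and best_cut are our names; the recursive splitting and the three stopping rules above would wrap around this):

import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut(values, labels):
    """Return (cut_point, gain) for the boundary between sorted values of a
    numeric attribute that maximizes information gain, or (None, 0.0)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([lab for _, lab in pairs])
    best_point, best_gain = None, 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                     # no boundary between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = (base - len(left) / n * entropy(left)
                     - len(right) / n * entropy(right))
        if gain > best_gain:
            best_point = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_gain = gain
    return best_point, best_gain

# e.g. best_cut([1, 2, 8, 9], ['n', 'n', 'y', 'y']) -> (5.0, 1.0)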

Comparison (1)
Table 1: Number of rules and conditions, using the Monk 1, 2, and 3 datasets as training sets 1, 2, and 3 respectively [table not preserved]

Comparison (2)
Table 2: Accuracy [table not preserved]

Comparison (3) [figure not preserved]

Conclusions
- Rules generated by HCV take the form of variable-valued logic rules, rather than decision trees
- HCV generates very compact rules in low-order polynomial time
- HCV provides noise handling and discretization of continuous attributes
- Its predictive accuracy is comparable to the ID3 family of algorithms, viz. C4.5 and C4.5rules