Rule-Based Classifiers

Rule-Based Classifier
Classify records by using a collection of “if…then…” rules.
Rule: (Condition) → y
– where Condition is a conjunction of attribute tests and y is the class label
– LHS: rule antecedent or condition
– RHS: rule consequent
Examples of classification rules:
  (Blood Type = Warm) ∧ (Lay Eggs = Yes) → Birds
  (Taxable Income < 50K) ∧ (Refund = Yes) → Evade = No

Rule-based Classifier (Example)
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Application of Rule-Based Classifier
A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule.
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
– The rule R1 covers a hawk ⇒ Bird
– The rule R3 covers the grizzly bear ⇒ Mammal
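In code, a rule's antecedent can be held as a set of attribute tests, and "covers" is then a single conjunction over those tests. A minimal sketch in Python, assuming instances are plain dictionaries; the hawk's attribute values beyond those mentioned in R1 are assumptions made for illustration, and none of the names come from a particular library.

```python
# Rules R1-R5 encoded as (conditions, class) pairs, where conditions maps
# each tested attribute to its required value.
RULES = [
    ({"Give Birth": "no",  "Can Fly": "yes"},       "Birds"),       # R1
    ({"Give Birth": "no",  "Live in Water": "yes"}, "Fishes"),      # R2
    ({"Give Birth": "yes", "Blood Type": "warm"},   "Mammals"),     # R3
    ({"Give Birth": "no",  "Can Fly": "no"},        "Reptiles"),    # R4
    ({"Live in Water": "sometimes"},                "Amphibians"),  # R5
]

def covers(conditions, instance):
    """A rule covers an instance if every test in its antecedent holds."""
    return all(instance.get(attr) == value for attr, value in conditions.items())

# Hypothetical hawk record (attribute values assumed for illustration):
hawk = {"Give Birth": "no", "Can Fly": "yes", "Live in Water": "no", "Blood Type": "warm"}
print(covers(RULES[0][0], hawk))   # True -> R1 covers the hawk, so it is classified as a Bird
```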

Rule Coverage and Accuracy
Coverage of a rule:
– Fraction of records that satisfy the antecedent of the rule
Accuracy of a rule:
– Fraction of records that satisfy both the antecedent and consequent of the rule (over those that satisfy the antecedent)
Example: (Status = Single) → No
  Coverage = 40%, Accuracy = 50%
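These two definitions translate directly into counts over a data set. A minimal sketch, assuming records are dictionaries with the class label stored under a "class" key; the five-record data set below is purely hypothetical and chosen only so the numbers match the 40% / 50% figures quoted above (it is not the table from the original slides).

```python
def covers(conditions, record):
    return all(record.get(a) == v for a, v in conditions.items())

def coverage(conditions, records):
    """Fraction of records that satisfy the rule's antecedent."""
    return sum(covers(conditions, r) for r in records) / len(records)

def accuracy(conditions, consequent, records, class_key="class"):
    """Among records satisfying the antecedent, fraction whose class matches the consequent."""
    covered = [r for r in records if covers(conditions, r)]
    return sum(r[class_key] == consequent for r in covered) / len(covered) if covered else 0.0

# Hypothetical five-record data set, chosen only so the numbers match the slide:
records = [
    {"Status": "Single",   "class": "No"},
    {"Status": "Single",   "class": "Yes"},
    {"Status": "Married",  "class": "No"},
    {"Status": "Married",  "class": "No"},
    {"Status": "Divorced", "class": "Yes"},
]
rule = ({"Status": "Single"}, "No")
print(coverage(rule[0], records))            # 0.4 -> 40% coverage
print(accuracy(rule[0], rule[1], records))   # 0.5 -> 50% accuracy
```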

Decision Trees vs. rules
From trees to rules. Easy: converting a tree into a set of rules.
– One rule for each leaf:
  – The antecedent contains a condition for every node on the path from the root to the leaf.
  – The consequent is the class assigned by the leaf.
– Straightforward, but the rule set might be overly complex.
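The one-rule-per-leaf conversion can be written as a short recursive walk over the tree. A minimal sketch, assuming a decision tree encoded as nested dictionaries keyed by (attribute, value) tests; the small weather-style tree at the bottom is purely hypothetical and only exercises the function.

```python
def tree_to_rules(node, path=()):
    """Return one (antecedent, class) rule per leaf of a decision tree.

    A leaf is a plain class label; an internal node is a dict mapping
    (attribute, value) tests to subtrees."""
    if not isinstance(node, dict):           # leaf: the class assigned by the leaf
        return [(list(path), node)]
    rules = []
    for (attribute, value), subtree in node.items():
        # extend the antecedent with the condition for this node
        rules.extend(tree_to_rules(subtree, path + ((attribute, value),)))
    return rules

# Hypothetical two-level tree used only to illustrate the conversion:
tree = {
    ("Outlook", "sunny"):    {("Humidity", "high"): "no", ("Humidity", "normal"): "yes"},
    ("Outlook", "overcast"): "yes",
    ("Outlook", "rainy"):    {("Windy", "true"): "no", ("Windy", "false"): "yes"},
}
for antecedent, label in tree_to_rules(tree):
    print(antecedent, "->", label)
```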

Decision Trees vs. rules
From rules to trees. More difficult: transforming a rule set into a tree.
– A tree cannot easily express disjunction between rules.
Example:
  If a and b then x
  If c and d then x
– The corresponding tree contains identical subtrees (⇒ the “replicated subtree problem”).

A tree for a simple disjunction

How does a Rule-based Classifier Work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
– A lemur triggers rule R3, so it is classified as a mammal.
– A turtle triggers both R4 and R5.
– A dogfish shark triggers none of the rules.

Desiderata for a Rule-Based Classifier
Mutually exclusive rules
– No two rules are triggered by the same record.
– This ensures that every record is covered by at most one rule.
Exhaustive rules
– There exists a rule for each combination of attribute values.
– This ensures that every record is covered by at least one rule.
Together, these properties ensure that every record is covered by exactly one rule.
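Both properties can be checked mechanically by counting how many rules each record triggers. A minimal sketch under the same dictionary-based rule encoding as the earlier snippets; note that is_exhaustive here checks exhaustiveness over a supplied set of records rather than over every possible combination of attribute values, and the function names are ours.

```python
def covers(conditions, record):
    return all(record.get(a) == v for a, v in conditions.items())

def triggered(rules, record):
    """Number of rules whose antecedent the record satisfies."""
    return sum(covers(conditions, record) for conditions, _ in rules)

def is_mutually_exclusive(rules, records):
    """Every record is covered by at most one rule."""
    return all(triggered(rules, r) <= 1 for r in records)

def is_exhaustive(rules, records):
    """Every record is covered by at least one rule (checked over the given records)."""
    return all(triggered(rules, r) >= 1 for r in records)
```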

Rules
Non-mutually exclusive rules
– A record may trigger more than one rule
– Solution? Ordered rule set
Non-exhaustive rules
– A record may not trigger any rules
– Solution? Use a default class

Ordered Rule Set
Rules are rank ordered according to their priority (e.g. based on their quality).
– An ordered rule set is known as a decision list.
When a test record is presented to the classifier:
– It is assigned to the class label of the highest-ranked rule it has triggered.
– If none of the rules fired, it is assigned to the default class.
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
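In code, an ordered rule set is simply a list scanned top to bottom: the first rule whose antecedent holds supplies the class, and a default class catches everything else. A minimal sketch; the attribute values assumed for the lemur, turtle and dogfish shark are illustrations, not data from the slides.

```python
RULES = [  # rules R1-R5 in priority order (a decision list)
    ({"Give Birth": "no",  "Can Fly": "yes"},       "Birds"),
    ({"Give Birth": "no",  "Live in Water": "yes"}, "Fishes"),
    ({"Give Birth": "yes", "Blood Type": "warm"},   "Mammals"),
    ({"Give Birth": "no",  "Can Fly": "no"},        "Reptiles"),
    ({"Live in Water": "sometimes"},                "Amphibians"),
]

def classify(record, rules, default="Unknown"):
    """Return the class of the highest-ranked rule the record triggers, else the default."""
    for conditions, label in rules:
        if all(record.get(a) == v for a, v in conditions.items()):
            return label
    return default

# Assumed attribute values for the three test records discussed above:
lemur   = {"Give Birth": "yes", "Can Fly": "no", "Live in Water": "no",        "Blood Type": "warm"}
turtle  = {"Give Birth": "no",  "Can Fly": "no", "Live in Water": "sometimes", "Blood Type": "cold"}
dogfish = {"Give Birth": "yes", "Can Fly": "no", "Live in Water": "yes",       "Blood Type": "cold"}
print(classify(lemur, RULES), classify(turtle, RULES), classify(dogfish, RULES))
# -> Mammals Reptiles Unknown  (the turtle matches R4 before R5; the shark falls to the default)
```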

Building Classification Rules: Sequential Covering
1. Start from an empty rule
2. Grow a rule using some Learn-One-Rule function
3. Remove training records covered by the rule
4. Repeat Steps (2) and (3) until a stopping criterion is met
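These steps map onto a very small loop. The sketch below is a schematic of sequential covering, not any particular published implementation: it assumes each record stores its label under a "class" key, and learn_one_rule stands for any rule-growing procedure (such as the PRISM-style growth shown later) that returns a dict of attribute tests, or an empty result when no further rule can be grown.

```python
def sequential_covering(records, target_class, learn_one_rule, max_rules=100):
    """Grow rules for target_class one at a time, removing covered records after each rule."""
    remaining = list(records)                # start from the full training set
    rules = []
    while any(r["class"] == target_class for r in remaining) and len(rules) < max_rules:
        conditions = learn_one_rule(remaining, target_class)   # step 2: grow one rule
        if not conditions:                   # stopping criterion: no useful rule could be grown
            break
        rules.append((conditions, target_class))
        remaining = [r for r in remaining    # step 3: remove records covered by the rule
                     if not all(r.get(a) == v for a, v in conditions.items())]
    return rules                             # steps 2-3 repeat until the stopping criterion
```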

This approach is called a covering approach because at each stage a rule is identified that covers some of the instances.

Example: generating a rule
Possible rule set for class “b”:
  If x ≤ 1.2 then class = b
  If x > 1.2 and y ≤ 2.6 then class = b
More rules could be added for a “perfect” rule set.

A simple covering algorithm
– Generates a rule by adding tests that maximize the rule’s accuracy.
– Similar to the situation in decision trees: the problem of selecting an attribute to split on.
  – But: a decision tree inducer maximizes overall purity, whereas here each new test (growing the rule) reduces the rule’s coverage.

Selecting a test
Goal: maximize accuracy
– t: total number of instances covered by the rule
– p: positive examples of the class covered by the rule
– t − p: number of errors made by the rule
⇒ Select the test that maximizes the ratio p/t
We are finished when p/t = 1 or the set of instances can’t be split any further.

Example: contact lenses data

Age | Spectacle prescription | Astigmatism | Tear production rate | Recommended Lenses
young | myope | no | reduced | none
young | myope | no | normal | soft
young | myope | yes | reduced | none
young | myope | yes | normal | hard
young | hypermetrope | no | reduced | none
young | hypermetrope | no | normal | soft
young | hypermetrope | yes | reduced | none
young | hypermetrope | yes | normal | hard
pre-presbyopic | myope | no | reduced | none
pre-presbyopic | myope | no | normal | soft
pre-presbyopic | myope | yes | reduced | none
pre-presbyopic | myope | yes | normal | hard
pre-presbyopic | hypermetrope | no | reduced | none
pre-presbyopic | hypermetrope | no | normal | soft
pre-presbyopic | hypermetrope | yes | reduced | none
pre-presbyopic | hypermetrope | yes | normal | none
presbyopic | myope | no | reduced | none
presbyopic | myope | no | normal | none
presbyopic | myope | yes | reduced | none
presbyopic | myope | yes | normal | hard
presbyopic | hypermetrope | no | reduced | none
presbyopic | hypermetrope | no | normal | soft
presbyopic | hypermetrope | yes | reduced | none
presbyopic | hypermetrope | yes | normal | none

Example: contact lenses data
The numbers on the right show the fraction of “correct” instances in the set singled out by each choice. In this case, “correct” means that the recommendation is “hard.”
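Those fractions can be recomputed directly from the table above. A minimal sketch in Python: it encodes the 24 rows and, for the candidate rule “If ? then recommendation = hard”, prints p/t for every test A = v, where t counts the instances the test covers and p counts how many of those recommend “hard”. The short attribute names in the code are our shorthands for the column headers; with this data, astigmatism = yes and tear rate = normal tie for the best ratio at 4/12.

```python
from collections import defaultdict

ATTRS = ["age", "prescription", "astigmatism", "tear rate"]
TABLE = """
young myope no reduced none
young myope no normal soft
young myope yes reduced none
young myope yes normal hard
young hypermetrope no reduced none
young hypermetrope no normal soft
young hypermetrope yes reduced none
young hypermetrope yes normal hard
pre-presbyopic myope no reduced none
pre-presbyopic myope no normal soft
pre-presbyopic myope yes reduced none
pre-presbyopic myope yes normal hard
pre-presbyopic hypermetrope no reduced none
pre-presbyopic hypermetrope no normal soft
pre-presbyopic hypermetrope yes reduced none
pre-presbyopic hypermetrope yes normal none
presbyopic myope no reduced none
presbyopic myope no normal none
presbyopic myope yes reduced none
presbyopic myope yes normal hard
presbyopic hypermetrope no reduced none
presbyopic hypermetrope no normal soft
presbyopic hypermetrope yes reduced none
presbyopic hypermetrope yes normal none
"""
DATA = [dict(zip(ATTRS + ["lenses"], line.split())) for line in TABLE.strip().splitlines()]

def score_tests(data, target):
    """For each candidate test A = v, count (p, t): target-class covered and total covered."""
    counts = defaultdict(lambda: [0, 0])
    for row in data:
        for attr in ATTRS:
            key = (attr, row[attr])
            counts[key][1] += 1                        # t: instances covered by the test
            counts[key][0] += row["lenses"] == target  # p: covered instances of the target class
    return counts

for (attr, value), (p, t) in sorted(score_tests(DATA, "hard").items(),
                                    key=lambda kv: (-kv[1][0] / kv[1][1], -kv[1][0])):
    print(f"{attr} = {value}: {p}/{t}")
```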

Modified rule and resulting data
The rule isn’t very accurate, getting only 4 correct out of the 12 instances it covers. So it needs further refinement.

Further refinement

Modified rule and resulting data Should we stop here? Perhaps. But let’s say we are going for exact rules, no matter how complex they become. So, let’s refine further.

Further refinement

The result

Pseudo-code for PRISM

For each class C
  Initialize E to the instance set
  While E contains instances in class C
    Create a rule R with an empty left-hand side that predicts class C
    Until R is perfect (or there are no more attributes to use) do
      For each attribute A not mentioned in R, and each value v,
        consider adding the condition A = v to the left-hand side of R
      Select A and v to maximize the accuracy p/t
        (break ties by choosing the condition with the largest p)
      Add A = v to R
    Remove the instances covered by R from E

The RIPPER algorithm is similar, but it uses information gain instead of p/t and, as a heuristic, processes the classes C in ascending order of occurrence.
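Below is a runnable Python sketch of the PRISM pseudo-code. It reuses the contact-lens table from the earlier slide, re-encoded inline so the snippet is self-contained (same shorthand column names as the previous sketch). A rule for a class is grown by repeatedly adding the test A = v with the highest p/t, breaking ties by larger p, until the rule is perfect or attributes run out; the covered instances are then removed and the process repeats. The names are ours and this is an illustration of the algorithm’s logic rather than a reference implementation; for the “hard” class, the first rule it grows should be the conjunction astigmatism = yes, tear rate = normal, prescription = myope traced in the preceding slides.

```python
ATTRS = ["age", "prescription", "astigmatism", "tear rate"]
TABLE = """
young myope no reduced none
young myope no normal soft
young myope yes reduced none
young myope yes normal hard
young hypermetrope no reduced none
young hypermetrope no normal soft
young hypermetrope yes reduced none
young hypermetrope yes normal hard
pre-presbyopic myope no reduced none
pre-presbyopic myope no normal soft
pre-presbyopic myope yes reduced none
pre-presbyopic myope yes normal hard
pre-presbyopic hypermetrope no reduced none
pre-presbyopic hypermetrope no normal soft
pre-presbyopic hypermetrope yes reduced none
pre-presbyopic hypermetrope yes normal none
presbyopic myope no reduced none
presbyopic myope no normal none
presbyopic myope yes reduced none
presbyopic myope yes normal hard
presbyopic hypermetrope no reduced none
presbyopic hypermetrope no normal soft
presbyopic hypermetrope yes reduced none
presbyopic hypermetrope yes normal none
"""
DATA = [dict(zip(ATTRS + ["lenses"], line.split())) for line in TABLE.strip().splitlines()]

def covers(rule, row):
    return all(row[a] == v for a, v in rule)

def prism(data, attrs, label="lenses"):
    """PRISM: for each class, repeatedly grow a perfect rule, then remove what it covers."""
    all_rules = []
    for cls in sorted({row[label] for row in data}):
        remaining = list(data)                            # initialize E to the instance set
        while any(row[label] == cls for row in remaining):
            rule, covered = [], list(remaining)           # rule with an empty left-hand side
            while any(row[label] != cls for row in covered):      # until R is perfect...
                unused = [a for a in attrs if a not in {u for u, _ in rule}]
                if not unused:                            # ...or no more attributes to use
                    break
                best = None                               # (p/t, p, attribute, value)
                for a in unused:
                    for v in {row[a] for row in covered}:
                        sub = [row for row in covered if row[a] == v]
                        p = sum(row[label] == cls for row in sub)
                        cand = (p / len(sub), p, a, v)
                        if best is None or cand[:2] > best[:2]:
                            best = cand                   # maximize p/t; break ties by larger p
                rule.append((best[2], best[3]))           # add A = v to R
                covered = [row for row in covered if row[best[2]] == best[3]]
            all_rules.append((rule, cls))
            remaining = [row for row in remaining if not covers(rule, row)]  # remove covered
    return all_rules

for rule, cls in prism(DATA, ATTRS):
    print(" and ".join(f"{a} = {v}" for a, v in rule) or "(always)", "->", cls)
```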

Separate and conquer
Methods like PRISM (for dealing with one class) are separate-and-conquer algorithms:
– First, a rule is identified.
– Then, all instances covered by the rule are separated out.
– Finally, the remaining instances are “conquered”.
Difference from divide-and-conquer methods:
– The subset covered by a rule doesn’t need to be explored any further.