Learning Rules from Data


Olcay Taner Yıldız, Ethem Alpaydın
yildizol@cmpe.boun.edu.tr
Department of Computer Engineering, Boğaziçi University

Rule Induction?
- Derive meaningful rules from data
- Mainly used in classification problems
- Attribute types: continuous, discrete

Example data:

Name   Income     Owns a house?   Marital status   Default
Ali    25,000 $   Yes             Married          No
Veli   18,000 $   No              Married          Yes

Rules
- Disjunction: conjunctions are joined with ORs
- Conjunction: propositions are joined with ANDs
- Proposition: a condition on a single attribute
  - Continuous attribute: defines a subinterval
  - Discrete attribute: equals one of the attribute's values
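
These three levels map directly onto a small data structure. Below is a minimal sketch in Python of how such rules could be represented and evaluated; the class and field names are illustrative, not from the talk:

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Proposition:
    """A condition on a single attribute."""
    attribute: str
    op: str                      # '>' or '<' for continuous, '==' for discrete
    value: Union[float, str]

    def holds(self, record: dict) -> bool:
        x = record[self.attribute]
        if self.op == '>':
            return x > self.value
        if self.op == '<':
            return x < self.value
        return x == self.value   # discrete attribute: equality test

@dataclass
class Conjunction:
    """Propositions joined with ANDs."""
    props: List[Proposition]

    def holds(self, record: dict) -> bool:
        return all(p.holds(record) for p in self.props)

@dataclass
class Ruleset:
    """Conjunctions joined with ORs: the class fires if any conjunction holds."""
    conjs: List[Conjunction]

    def classify(self, record: dict) -> bool:
        return any(c.holds(record) for c in self.conjs)

# Usage: a one-conjunction ruleset over the example data above.
rs = Ruleset([Conjunction([Proposition('income', '>', 20000)])])
print(rs.classify({'income': 25000}))   # True
```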

How to generate rules?
Rule induction techniques:
- Via trees: C4.5Rules
- Directly from data: Ripper

C4.5Rules (Quinlan, 93)
- Create a decision tree using C4.5
- Convert the tree to a ruleset by writing each path from the root to a leaf as a rule

C4.5Rules (example)
[Figure: decision tree over x1 (yearly-income) and x2 (savings); root test x1 > q1, then x2 > q2, with leaves OK and DEFAULT]
Rules:
IF (x1 > q1) AND (x2 > q2) THEN Y = OK
IF (x1 < q1) OR ((x1 > q1) AND (x2 < q2)) THEN Y = DEFAULT
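
The conversion itself is a straightforward tree walk: every root-to-leaf path becomes one conjunction. A minimal sketch follows; the tree encoding is improvised for illustration, and the pruning and reordering that C4.5Rules also performs are omitted:

```python
def tree_to_rules(node, path=()):
    """node is either a leaf label (str) or (attribute, threshold, left, right),
    where left is taken when attribute <= threshold and right otherwise."""
    if isinstance(node, str):                       # leaf: one rule per path
        cond = " AND ".join(path) if path else "TRUE"
        return [f"IF {cond} THEN Y = {node}"]
    attr, thr, left, right = node
    return (tree_to_rules(left,  path + (f"({attr} <= {thr})",)) +
            tree_to_rules(right, path + (f"({attr} > {thr})",)))

# The tree from the slide: split on x1 (yearly-income), then x2 (savings).
tree = ("x1", "q1", "DEFAULT", ("x2", "q2", "DEFAULT", "OK"))
for rule in tree_to_rules(tree):
    print(rule)
# IF (x1 <= q1) THEN Y = DEFAULT
# IF (x1 > q1) AND (x2 <= q2) THEN Y = DEFAULT
# IF (x1 > q1) AND (x2 > q2) THEN Y = OK
```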

RIPPER (Cohen, 95)
- Learn rules for one class at a time: the target class is positive, all other classes are negative
- Two steps:
  - Initialization: learn rules one by one, pruning each rule immediately after it is grown
  - Optimization: since the search is greedy, pass k times over the learned rules to optimize them

RIPPER (Initialization)
- Split (pos, neg) into a grow set and a prune set
- Grow a conjunction on the grow set, adding propositions one by one:
  IF (x1 > q1) AND (x2 > q2) AND (x2 < q3) THEN Y = OK
- Prune the conjunction on the prune set, removing propositions:
  IF (x1 > q1) AND (x2 < q3) THEN Y = OK
- If MDL < best MDL + 64: add the conjunction to the ruleset; else stop
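
A sketch of this grow/prune step, assuming FOIL-style information gain for growing and the (p - n)/(p + n) prune-set metric; the MDL-based stopping test from the slide is left out for brevity, and all names are illustrative:

```python
import math

def covers(rule, r):
    """rule: list of (attr, op, value) propositions, ANDed together."""
    return all((r[a] > v) if op == '>' else (r[a] <= v) for a, op, v in rule)

def foil_gain(p0, n0, p1, n1):
    """Information gain of a specialization, as used when growing rules."""
    if p1 == 0:
        return -1.0
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def grow_rule(pos, neg, attributes):
    """Add the best proposition one at a time until no negatives are covered."""
    rule = []
    while neg:
        best, best_gain = None, 0.0
        for a in attributes:
            for v in sorted({r[a] for r in pos + neg}):
                for op in ('>', '<='):
                    p1 = [r for r in pos if covers([(a, op, v)], r)]
                    n1 = [r for r in neg if covers([(a, op, v)], r)]
                    g = foil_gain(len(pos), len(neg), len(p1), len(n1))
                    if p1 and g > best_gain:
                        best, best_gain = (a, op, v, p1, n1), g
        if best is None:          # no proposition improves the rule
            break
        a, op, v, pos, neg = best
        rule.append((a, op, v))
    return rule

def prune_rule(rule, prune_pos, prune_neg):
    """Drop trailing propositions while the prune-set score does not worsen."""
    def score(r):
        p = sum(covers(r, x) for x in prune_pos)
        n = sum(covers(r, x) for x in prune_neg)
        return (p - n) / (p + n) if p + n else -1.0
    while len(rule) > 1 and score(rule[:-1]) >= score(rule):
        rule = rule[:-1]
    return rule
```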

Ripper (Optimization)
Repeat k times, for each rule:
- Current rule: IF (x1 > q1) AND (x2 < q3) THEN Y = OK
- Generate a revision rule by adding propositions:
  IF (x1 > q1) AND (x2 < q3) AND (x1 > q3) THEN Y = OK
- Generate a replacement rule by regrowing from scratch:
  IF (x1 > q4) THEN Y = OK
- Compare the current rule with the revision and the replacement, and keep the best according to MDL
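
Continuing the sketch above, one optimization pass might look like the following. Real Ripper chooses among the three candidates by the MDL of the whole ruleset; total error on the training split is used here as a simpler stand-in:

```python
def optimize(rules, pos, neg, attributes, k=2):
    """Pass k times over the rules; for each rule build a replacement
    (regrown from scratch) and a revision (the rule with extra propositions
    grown onto it), then keep whichever of the three scores best."""
    def total_error(rule):
        fp = sum(covers(rule, r) for r in neg)       # covered negatives
        fn = sum(not covers(rule, r) for r in pos)   # missed positives
        return fp + fn
    for _ in range(k):
        for i, rule in enumerate(rules):
            replacement = grow_rule(list(pos), list(neg), attributes)
            cov_p = [r for r in pos if covers(rule, r)]
            cov_n = [r for r in neg if covers(rule, r)]
            revision = rule + grow_rule(cov_p, cov_n, attributes)
            candidates = [rule, revision] + ([replacement] if replacement else [])
            rules[i] = min(candidates, key=total_error)
    return rules
```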

Minimum Description Length
Description length of a ruleset = description length of the rules + description length of the exceptions.
Description length of a rule (choosing k of n possible propositions):
S = ||k|| + k log2(n / k) + (n - k) log2(n / (n - k))
Description length of the exceptions (fp false positives among the C covered examples, fn false negatives among the U uncovered ones):
S = log2(|D| + 1) + fp (-log2(e / 2C)) + (C - fp) (-log2(1 - e / 2C)) + fn (-log2(fn / 2U)) + (U - fn) (-log2(1 - fn / 2U))
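
The rule term is the classic cost of naming which k of n candidate propositions a rule uses, and can be computed directly. A small sketch; taking ||k|| = log2(k + 1) is an assumption, since the talk does not spell out how k itself is encoded:

```python
import math

def rule_description_length(n, k):
    """Bits to identify which k of n possible propositions a rule uses:
    ||k|| + k*log2(n/k) + (n-k)*log2(n/(n-k)).
    ||k|| is taken as log2(k+1) here, one common encoding of the integer k."""
    bits_for_k = math.log2(k + 1)
    if k == 0 or k == n:          # the entropy term vanishes at the extremes
        return bits_for_k
    return (bits_for_k
            + k * math.log2(n / k)
            + (n - k) * math.log2(n / (n - k)))
```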

Ripper*
- Finding the best condition by trying all possible split points is time-consuming
- Shortcut: Linear Discriminant Analysis, which gives the split point analytically
- To be more robust:
  - Instances further than 3σ from the mean are removed
  - If the number of examples < 20, the shortcut is not used
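
For a single continuous attribute, the LDA shortcut amounts to fitting one Gaussian per class with a pooled variance and solving for the point where the two discriminants are equal. A sketch under those assumptions; whether Ripper* uses exactly this estimator is not stated on the slides:

```python
import math
import statistics

def trim_outliers(xs):
    """Drop instances farther than 3 standard deviations from the mean,
    the robustness step mentioned on the slide."""
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [x for x in xs if abs(x - m) <= 3 * s]

def lda_split_point(x_pos, x_neg):
    """Analytic split point for one attribute: the point where two
    equal-variance Gaussian class models have the same posterior."""
    if len(x_pos) + len(x_neg) < 20:       # too few examples: fall back
        return None                        # to exhaustive split search
    x_pos, x_neg = trim_outliers(x_pos), trim_outliers(x_neg)
    m1, m2 = statistics.mean(x_pos), statistics.mean(x_neg)
    n1, n2 = len(x_pos), len(x_neg)
    v = ((n1 - 1) * statistics.variance(x_pos) +
         (n2 - 1) * statistics.variance(x_neg)) / (n1 + n2 - 2)   # pooled
    log_prior_ratio = math.log(n2 / n1)    # ln P(neg) - ln P(pos)
    return (m1 + m2) / 2 + v * log_prior_ratio / (m1 - m2)
```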

Experiments
- 29 datasets from the UCI repository
- 10-fold cross-validation
- Comparisons made with a one-sided t test
- Three algorithms compared: C4.5Rules, Ripper, Ripper*
- Comparison criteria: error rate, complexity of the rulesets, learning time

Error Rate (I)
- Ripper and its variant perform better than C4.5Rules
- Ripper* performs similarly to Ripper
- C4.5Rules has an advantage when the number of rules is small (exhaustive search)

Error Rate (II)
[Table: pairwise error-rate comparison of C4.5Rules, Ripper, and Ripper* over the 29 datasets, with totals; the cell layout was lost in extraction. Values as extracted: - 4 7 12 2 10 1 5 9]

Ruleset Complexity (I)
- Ripper and Ripper* produce significantly fewer rules than C4.5Rules
- C4.5Rules starts from an unpruned tree, which yields a very large initial ruleset

Ruleset Complexity (II)
[Table: pairwise ruleset-size comparison of C4.5Rules, Ripper, and Ripper*, with totals; the cell layout was lost in extraction. Values as extracted: - 1 26 10 27 2 3 11]

Learning Time (I)
- Ripper* is faster than Ripper, which is faster than C4.5Rules
- C4.5Rules: O(N^3)
- Ripper: O(N log^2 N)
- Ripper*: O(N log N)

Learning Time (II)
[Table: pairwise learning-time comparison of C4.5Rules, Ripper, and Ripper*; the cell layout was lost in extraction. Values as extracted: - 2 23 25 13 27]

Conclusion
- Compared two rule induction algorithms: C4.5Rules and Ripper
- Proposed a shortcut for learning conditions using LDA (Ripper*)
- Ripper is better than C4.5Rules
- Ripper* improves the learning time of Ripper without decreasing its performance