Data Mining CSCI 307, Spring 2019 Lecture 9

Data Mining CSCI 307, Spring 2019 Lecture 9 Output: Rules

Manually Build a Decision Tree: first, install the package we need. We want to use the "User Classify" facility in WEKA, and in WEKA 3.8.1 it must be installed as a package. From the window that pops up, scroll down and choose "userClassifier" to install it. You should get a message that it was installed successfully.

Manually Build a Decision Tree: use the "User Classify" facility in WEKA. Click on the Classify tab, then click Choose / trees / UserClassifier. Visualize the data. Two-way split: find a pair of attributes that discriminates the class well and draw a polygon around those instances. Next slide: use petal-length and petal-width to "isolate" the Iris versicolor class, then switch to view the tree.

We can construct the tree manually for a bit and then select a machine learning algorithm to finish. This group contains only one "mistake" -- a lone virginica. We still need to refine this further, but only 3 versicolors contaminate this side.

Interactive decision tree construction (you can try this on your own): Load segment-challenge.arff and look at the dataset. Select UserClassifier (a tree classifier) and use the test set segment-test.arff. Examine the data visualizer and the tree visualizer. Plot region-centroid-row vs. intensity-mean. Use the Rectangle, Polygon, and Polyline selection tools to make several selections, then right-click in the tree visualizer and accept the tree. Given enough time, we could produce a "perfect" tree for the dataset, but would it perform well on the test data?

Classification Rules: a popular alternative to decision trees, e.g. "if attributeONE and attributeTWO then CLASS is x". Antecedent (pre-condition): a series of tests, just like the tests at the nodes of a decision tree; the tests are usually logically ANDed together (but may also be general logical expressions). Consequent (conclusion): the class, set of classes, or probability distribution assigned by the rule. Individual rules are often logically ORed together; conflicts arise if different conclusions apply.
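
To make the antecedent/consequent idea concrete, here is a minimal, hypothetical Python sketch (the encoding and names are assumptions, not from the lecture): a rule is a conjunction of attribute tests plus a class, and it fires on an instance only if every test passes.

# Hypothetical encoding: a rule is (list of (attribute, value) tests, predicted class).
rule = ([("outlook", "sunny"), ("humidity", "high")], "no")

def rule_fires(rule, instance):
    tests, _ = rule
    # The antecedent holds only if every ANDed test matches the instance.
    return all(instance.get(attr) == value for attr, value in tests)

instance = {"outlook": "sunny", "humidity": "high", "windy": False}
if rule_fires(rule, instance):
    print("predicted class:", rule[1])   # -> predicted class: no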

From Trees to Rules: converting a tree into a set of rules is easy. One rule for each leaf: the antecedent contains a condition for every node on the path from the root to the leaf, and the consequent is the class assigned by the leaf. This produces rules that are unambiguous, so it does not matter in which order they are executed. But the resulting rules are unnecessarily complex, so pruning is needed to remove redundant tests/rules.
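
As an illustration of the one-rule-per-leaf idea, here is a small hypothetical Python sketch (the nested-tuple tree encoding is an assumption, not from the lecture; the tree shown is the familiar weather-data tree). It walks every root-to-leaf path and emits the accumulated tests as a rule.

# Hypothetical encoding: an internal node is (attribute, {value: subtree}); a leaf is a class label.
tree = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rainy": ("windy", {"true": "no", "false": "yes"}),
})

def tree_to_rules(node, path=()):
    if isinstance(node, str):
        # Leaf: the path from the root is the antecedent, the leaf is the consequent.
        return [(list(path), node)]
    attribute, branches = node
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, path + ((attribute, value),)))
    return rules

for antecedent, consequent in tree_to_rules(tree):
    tests = " and ".join(f"{a} = {v}" for a, v in antecedent)
    print(f"if {tests} then play = {consequent}")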

Example

From Rules to Trees #1: transforming a rule set into a tree is more difficult, because a tree cannot easily express disjunction between rules. Example: rules that test different attributes, such as
if a and b then x
if c and d then x
The symmetry needs to be broken: we must choose a single test for the root node, and the corresponding tree then contains identical subtrees (==> the "replicated subtree problem").

A Decision Tree for a Simple Disjunction
if a and b then x
if c and d then x
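
The slide's figure is not reproduced in this transcript. As a rough sketch (using the same hypothetical nested-tuple encoding as the earlier tree-to-rules example, and assuming a, b, c, d are yes/no attributes), one possible tree for this disjunction is shown below; the subtree that tests c and then d has to appear twice, which is the replicated subtree problem.

# One possible tree for: "if a and b then x" OR "if c and d then x".
disjunction_tree = ("a", {
    "yes": ("b", {
        "yes": "x",
        "no":  ("c", {"yes": ("d", {"yes": "x", "no": "not x"}), "no": "not x"}),
    }),
    "no":  ("c", {"yes": ("d", {"yes": "x", "no": "not x"}), "no": "not x"}),
})
# The subtree testing c and then d appears twice: the "replicated subtree problem".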

From Rules to Trees #2: transforming rules into a tree is more difficult in general, but sometimes it is not. Example: the Exclusive-Or problem. Here, we want the class to be a only when the two attributes have opposite values.

The Exclusive-Or Problem: What would the rules look like for this problem? What would the tree look like?
if x = 1 and y = 0 then class = a
if x = 0 and y = 1 then class = a
if x = 0 and y = 0 then class = b
if x = 1 and y = 1 then class = b
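
As a rough sketch of one possible answer to the tree question (using the same hypothetical nested-tuple encoding as the earlier examples): the XOR rules translate directly into a compact tree that tests x at the root and y underneath, with no replicated subtrees, which is why this particular conversion is not difficult.

# One possible tree for the exclusive-or rules above.
xor_tree = ("x", {
    "1": ("y", {"0": "a", "1": "b"}),
    "0": ("y", {"1": "a", "0": "b"}),
})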

From Rules to Trees #3: transforming a rule set into a tree is more difficult because a tree cannot easily handle default clauses. Example: four attributes, each of which can be 1, 2, or 3.
if x = 1 and y = 1 then class = a
if z = 1 and w = 1 then class = a
otherwise class = b
This leads to the replicated subtree problem again.

Decision Tree with a Replicated Subtree
if x = 1 and y = 1 then class = a
if z = 1 and w = 1 then class = a
otherwise class = b

"Nuggets" of Knowledge: one reason rules are popular is that each rule seems to be an independent piece of knowledge, so it seems easy to add a rule to an existing rule base, whereas adding to a tree may cause total reshaping. Here's the problem: this view ignores how rules are executed. There are two ways of executing a rule set. In an ordered set of rules (a "decision list"), the order is important for interpretation. In an unordered set of rules, rules may overlap and lead to different conclusions for the same instance. So maybe the "ease" of adding a new rule is an illusion and not fact.
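
Here is a minimal, hypothetical Python sketch (same rule encoding as the earlier sketch; names are assumptions) contrasting the two execution strategies: a decision list stops at the first rule that fires, while an unordered rule set collects every conclusion and can therefore produce conflicts.

# Each rule: (list of (attribute, value) tests, predicted class).
rules = [
    ([("outlook", "sunny"), ("humidity", "high")], "no"),
    ([("outlook", "sunny")], "yes"),
]

def fires(rule, instance):
    return all(instance.get(a) == v for a, v in rule[0])

def decision_list(rules, instance, default="unknown"):
    # Ordered execution: the first rule that fires decides the class.
    for rule in rules:
        if fires(rule, instance):
            return rule[1]
    return default

def unordered(rules, instance):
    # Unordered execution: every rule that fires contributes a conclusion.
    return {rule[1] for rule in rules if fires(rule, instance)}

instance = {"outlook": "sunny", "humidity": "high"}
print(decision_list(rules, instance))   # -> no (first match wins)
print(unordered(rules, instance))       # -> both 'no' and 'yes' fire: a conflict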

Interpreting Rules: What if two or more rules conflict, e.g. different rules lead to different conclusions for the same instance? What if no rule applies to a test instance? Neither situation can happen with decision trees, or with rules that are read directly off a decision tree, but it can happen with rule sets in general.

Straightforward case: a form of closed-world assumption, when the class is "boolean" and only one outcome is expressed. Assumption: if an instance does not belong to class "yes", it belongs to class "no". Trick: only learn rules for class "yes" and use a default rule for "no". Then the order of the rules is not important and there are no conflicts. The rule set can be written in disjunctive normal form, i.e. an OR of a bunch of AND conditions:
if x = 1 and y = 1 then class = a
if z = 1 and w = 1 then class = a
otherwise class = b
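
As a minimal sketch (hypothetical Python, not from the lecture), the rule set above collapses to a single disjunctive-normal-form test for class a, with the closed-world assumption supplying class b as the default.

def classify(x, y, z, w):
    # Disjunctive normal form: an OR of ANDed conditions for class "a";
    # the closed-world assumption makes everything else class "b".
    if (x == 1 and y == 1) or (z == 1 and w == 1):
        return "a"
    return "b"

print(classify(1, 1, 3, 2))   # -> a
print(classify(2, 3, 1, 1))   # -> a
print(classify(2, 2, 2, 2))   # -> b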

Association Rules: association rules can predict any attribute and combinations of attributes, and they are not intended to be used together as a set (versus classification rules, which are intended to be used together). Problem: there is an immense number of possible associations, so the output needs to be restricted to show only the most predictive associations ==> only those with high support and high confidence. Coverage (AKA support): the number of instances the rule predicts correctly. Accuracy (AKA confidence): the number of instances it predicts correctly, as a proportion of all instances it applies to.

Outlook   Temperature  Humidity  Windy  Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Example: if temperature = cool then humidity = normal
There are 4 cool days, all with normal humidity ==> Support = 4, confidence = 100%

(Same weather data table as on the previous slide.)
Example: if play = yes then windy = false ==> Support = ?, confidence = ?
This rule might not meet the minimum support and confidence thresholds, so it would not be generated, but it is a useful exercise in computing support and confidence.

Support and Confidence of a Rule
Support: the number of instances the rule predicts correctly.
Confidence: the number of correct predictions, as a proportion of all instances the rule applies to.
Example: if temperature = cool then humidity = normal
4 cool days with normal humidity ==> Support = 4, confidence = 100%
Normally, minimum support and confidence are pre-specified (e.g. 58 rules with support >= 2 and confidence >= 95% for the weather data).
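
To make the counting concrete, here is a small hypothetical Python sketch (not from the lecture) that computes the support and confidence of a rule directly from the weather data shown above.

# Weather data, one dict per instance.
data = [
    {"outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "windy": False, "play": "no"},
    {"outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "windy": True,  "play": "no"},
    {"outlook": "overcast", "temperature": "hot",  "humidity": "high",   "windy": False, "play": "yes"},
    {"outlook": "rainy",    "temperature": "mild", "humidity": "high",   "windy": False, "play": "yes"},
    {"outlook": "rainy",    "temperature": "cool", "humidity": "normal", "windy": False, "play": "yes"},
    {"outlook": "rainy",    "temperature": "cool", "humidity": "normal", "windy": True,  "play": "no"},
    {"outlook": "overcast", "temperature": "cool", "humidity": "normal", "windy": True,  "play": "yes"},
    {"outlook": "sunny",    "temperature": "mild", "humidity": "high",   "windy": False, "play": "no"},
    {"outlook": "sunny",    "temperature": "cool", "humidity": "normal", "windy": False, "play": "yes"},
    {"outlook": "rainy",    "temperature": "mild", "humidity": "normal", "windy": False, "play": "yes"},
    {"outlook": "sunny",    "temperature": "mild", "humidity": "normal", "windy": True,  "play": "yes"},
    {"outlook": "overcast", "temperature": "mild", "humidity": "high",   "windy": True,  "play": "yes"},
    {"outlook": "overcast", "temperature": "hot",  "humidity": "normal", "windy": False, "play": "yes"},
    {"outlook": "rainy",    "temperature": "mild", "humidity": "high",   "windy": True,  "play": "no"},
]

def support_confidence(antecedent, consequent, data):
    # Instances the rule applies to: those matching every antecedent test.
    applies = [d for d in data if all(d[a] == v for a, v in antecedent.items())]
    # Instances predicted correctly: those that also match the consequent.
    correct = [d for d in applies if all(d[a] == v for a, v in consequent.items())]
    support = len(correct)
    confidence = support / len(applies) if applies else 0.0
    return support, confidence

# if temperature = cool then humidity = normal  ==>  (4, 1.0), i.e. support 4, confidence 100%
print(support_confidence({"temperature": "cool"}, {"humidity": "normal"}, data))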

Interpreting Association Rules
(Same weather data table as before.)
Interpretation is not obvious. The rule
if windy = false and play = no then outlook = sunny and humidity = high
is not a shorthand for the pair of rules
if windy = false and play = no then outlook = sunny
if windy = false and play = no then humidity = high
But the first rule does mean that the following holds:
if humidity = high and windy = false and play = no then outlook = sunny
Recall from Discrete Structures: binding more means stronger! A formula is STRONGER if it restricts the state more; a formula is WEAKER when fewer restrictions are in place. The state true is the weakest (true in all states); the state false is the strongest (true in no states). Note, though, that in the rule
if windy = false and play = no then outlook = sunny and humidity = high
the consequent is stronger than the consequent of
if humidity = high and windy = false and play = no then outlook = sunny