Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.5: Mining Association Rules Rodney Nielsen.


Data Mining: Practical Machine Learning Tools and Techniques
Chapter 4: Algorithms: The Basic Methods
Section 4.5: Mining Association Rules
Rodney Nielsen
Many of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall

Algorithms: The Basic Methods
● Inferring rudimentary rules
● Naïve Bayes, probabilistic model
● Constructing decision trees
● Constructing rules
● Association rule learning
● Linear models
● Instance-based learning
● Clustering

Mining Association Rules
● Baseline method for finding association rules:
  – Use separate-and-conquer method
  – Treat every possible combination of attribute values as a separate class
● Two problems:
  – Computational complexity
  – Resulting number of rules (which would have to be pruned on the basis of support and confidence)
● But: we can look for high-support rules directly!

Item Sets
● Support: number of instances correctly covered by association rule
  – The same as the number of instances covered by all tests in the rule (LHS and RHS!)
● Item: one test/attribute-value pair
● Item set: all items occurring in a rule
● Goal: only rules that exceed pre-defined support
  – Do it by finding all item sets with the given minimum support and generating rules from them!

Weather Data

Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No
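To make the definitions on the previous slide concrete, here is a minimal Python sketch (not part of the original slides; representing items as (attribute, value) pairs is an illustrative choice) that encodes this table and counts support:

```python
# Weather data, one tuple per instance, columns as in the table above.
weather = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]
ATTRS = ("Outlook", "Temp", "Humidity", "Windy", "Play")

# An item is one test, i.e. an (attribute, value) pair; each instance
# becomes the set of items it satisfies.
instances = [set(zip(ATTRS, row)) for row in weather]

def support(item_set, instances):
    """Support = number of instances covered by all tests in the set."""
    return sum(1 for inst in instances if item_set <= inst)

print(support({("Outlook", "Sunny")}, instances))                        # 5
print(support({("Humidity", "Normal"), ("Windy", "False")}, instances))  # 4
```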

Item Sets for Weather Data
● In total: 12 one-item sets, 47 two-item sets, 39 three-item sets, 6 four-item sets, and 0 five-item sets (with minimum support of two)
● Examples:
  One-item sets:   Outlook = Sunny (5); Temperature = Cool (4); …
  Two-item sets:   Outlook = Sunny, Temperature = Hot (2); Outlook = Sunny, Humidity = High (3); …
  Three-item sets: Outlook = Sunny, Temperature = Hot, Humidity = High (2); Outlook = Sunny, Humidity = High, Windy = False (2); …
  Four-item sets:  Outlook = Sunny, Temperature = Hot, Humidity = High, Play = No (2); Outlook = Rainy, Temperature = Mild, Windy = False, Play = Yes (2); …
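These counts can be checked by brute force. A sketch reusing `instances` and `support` from the snippet above (the attribute filter just skips contradictory pairs such as Outlook = Sunny together with Outlook = Rainy):

```python
from itertools import combinations

# The 12 distinct items (attribute-value pairs) occurring in the data.
items = sorted({item for inst in instances for item in inst})

MIN_SUPPORT = 2
for k in range(1, 6):
    frequent = [set(c) for c in combinations(items, k)
                if len({attr for attr, _ in c}) == k   # k distinct attributes
                and support(set(c), instances) >= MIN_SUPPORT]
    print(k, len(frequent))   # expected: 12, 47, 39, 6, 0
```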

Generating Rules From an Item Set
● Once all item sets with minimum support have been generated, we can turn them into rules
● Example item set: Humidity = Normal, Windy = False, Play = Yes (4)
● With N = 3 items there are 2^N − 1 = 7 potential rules:

If Humidity = Normal and Windy = False then Play = Yes              4/4
If Humidity = Normal and Play = Yes then Windy = False              4/6
If Windy = False and Play = Yes then Humidity = Normal              4/6
If Humidity = Normal then Windy = False and Play = Yes              4/7
If Windy = False then Humidity = Normal and Play = Yes              4/8
If Play = Yes then Humidity = Normal and Windy = False              4/9
If True then Humidity = Normal and Windy = False and Play = Yes     4/14
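Enumerating the 2^N − 1 splits is straightforward with itertools. In this sketch (the function name is mine, and it reuses `support` and `instances` from above), every non-empty subset of the item set serves as the consequent and the remainder as the antecedent:

```python
from itertools import combinations

def rules_from_item_set(item_set, instances):
    """Yield (antecedent, consequent, confidence) for all 2^N - 1 rules."""
    sup = support(item_set, instances)
    for c in range(1, len(item_set) + 1):          # consequent size
        for consequent in combinations(sorted(item_set), c):
            antecedent = item_set - set(consequent)
            # An empty antecedent ("If True ...") is covered by all instances.
            yield antecedent, set(consequent), sup / support(antecedent, instances)

item_set = {("Humidity", "Normal"), ("Windy", "False"), ("Play", "Yes")}
for ante, cons, conf in rules_from_item_set(item_set, instances):
    print(sorted(ante), "=>", sorted(cons), f"{conf:.2f}")
```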

Rules for Weather Data
● Rules with support > 1 and confidence = 100%:

     Association rule                                      Sup.  Conf.
1.   Humidity = Normal, Windy = False ⇒ Play = Yes          4    100%
2.   Temperature = Cool ⇒ Humidity = Normal                 4    100%
3.   Outlook = Overcast ⇒ Play = Yes                        4    100%
4.   Temperature = Cool, Play = Yes ⇒ Humidity = Normal     3    100%
…
58.  Outlook = Sunny, Temperature = Hot ⇒ Humidity = High   2    100%

● In total: 3 rules with support four, 5 with support three, 50 with support two

Example Rules From the Same Set
● Item set: Temperature = Cool, Humidity = Normal, Windy = False, Play = Yes (2)
● Resulting rules with 100% confidence:
  Temperature = Cool, Windy = False ⇒ Humidity = Normal, Play = Yes
  Temperature = Cool, Windy = False, Humidity = Normal ⇒ Play = Yes
  Temperature = Cool, Windy = False, Play = Yes ⇒ Humidity = Normal

Generating Item Sets Efficiently
● How can we efficiently find all frequent item sets?
● Finding one-item sets is easy
● Idea: use one-item sets to generate two-item sets, two-item sets to generate three-item sets, …
  – If (A B) is a frequent item set, then (A) and (B) have to be frequent item sets as well!
  – In general: if X is a frequent k-item set, then all (k−1)-item subsets of X are also frequent
  – Compute k-item sets by merging (k−1)-item sets
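One possible Python realization of this merge-and-prune step (a sketch under the assumption that item sets are kept as lexicographically sorted tuples; the function name is mine):

```python
def candidate_k_item_sets(frequent):
    """Merge sorted (k-1)-item tuples sharing their first k-2 items,
    then prune candidates that have an infrequent (k-1)-subset."""
    frequent = sorted(frequent)
    frequent_set = set(frequent)        # stands in for the hash table
    candidates = []
    for i, a in enumerate(frequent):
        for b in frequent[i + 1:]:
            if a[:-1] != b[:-1]:        # must agree on all but the last item
                break                   # sorted order: later ones differ too
            candidate = a + (b[-1],)
            # Keep only candidates whose (k-1)-subsets are all frequent.
            subsets = (candidate[:j] + candidate[j + 1:]
                       for j in range(len(candidate)))
            if all(s in frequent_set for s in subsets):
                candidates.append(candidate)
    return candidates
```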

Example
● Given: five three-item sets (A B C), (A B D), (A C D), (A C E), (B C D)
  – Lexicographically ordered!
● Candidate four-item sets:
  (A B C D)  OK: (A B C), (A B D), (A C D), (B C D) are all frequent
  (A C D E)  Not OK because of (C D E)
● Final check by counting instances in dataset!
● (k−1)-item sets are stored in a hash table
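Running the sketch from the previous slide on these five sets reproduces the example; each surviving candidate would still need its support counted against the dataset:

```python
frequent_3 = [("A", "B", "C"), ("A", "B", "D"), ("A", "C", "D"),
              ("A", "C", "E"), ("B", "C", "D")]
print(candidate_k_item_sets(frequent_3))
# [('A', 'B', 'C', 'D')] -- (A C D E) is pruned because (C D E) is not frequent
```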

Generating Rules Efficiently
● We are looking for all high-confidence rules
  – Support of antecedent obtained from hash table
  – But: brute-force method is (2^N − 1) rules per item set
● Better way: building (c + 1)-consequent rules from c-consequent ones
  – Observation: a (c + 1)-consequent rule can only hold if all corresponding c-consequent rules also hold
● Resulting algorithm similar to procedure for large item sets
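A sketch of this level-wise rule generation, reusing `support` and `candidate_k_item_sets` from the earlier snippets. This is an illustrative reconstruction of the idea, not the book's exact procedure: (c + 1)-item consequents are only formed by merging c-item consequents that already produced high-confidence rules.

```python
def high_confidence_consequents(item_set, instances, min_conf=1.0):
    """Return consequents c (as sorted tuples) such that the rule
    item_set - c => c meets min_conf, growing c one item at a time."""
    sup = support(item_set, instances)
    # 1-consequent rules first.
    consequents = [(item,) for item in sorted(item_set)
                   if sup / support(item_set - {item}, instances) >= min_conf]
    result = list(consequents)
    # A (c+1)-consequent candidate is only tried if all of its
    # c-consequent versions already held.
    while consequents:
        consequents = [c for c in candidate_k_item_sets(consequents)
                       if len(c) < len(item_set)   # keep antecedent non-empty
                       and sup / support(item_set - set(c), instances) >= min_conf]
        result.extend(consequents)
    return result
```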

Example
● 1-consequent rules:
  If Outlook = Sunny and Windy = False and Play = No then Humidity = High (2/2)
  If Humidity = High and Windy = False and Play = No then Outlook = Sunny (2/2)
● Corresponding 2-consequent rule:
  If Windy = False and Play = No then Outlook = Sunny and Humidity = High (2/2)
● Final check of antecedent against hash table!

Association Rules: Discussion
● Above method makes one pass through the data for each different item set size
  – Other possibility: generate (k+2)-item sets just after (k+1)-item sets have been generated
  – Result: more (k+2)-item sets than necessary will be considered, but fewer passes through the data
  – Makes sense if data too large for main memory
● Practical issue: generating a certain number of rules (e.g., by incrementally reducing minimum support)

Other Issues
● Standard ARFF format very inefficient for typical market basket data
  – Attributes represent items in a basket, and most items are usually missing
  – Data should be represented in sparse format
● Instances are also called transactions
● Confidence is not necessarily the best measure
  – Example: milk occurs in almost every supermarket transaction
  – Other measures have been devised (e.g., lift)
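Lift divides a rule's confidence by the consequent's baseline frequency, so values near 1 mean the antecedent adds no information. A small sketch (the helper reuses `support` and `instances` from the earlier snippets):

```python
def lift(antecedent, consequent, instances):
    """conf(A => C) / P(C), i.e. P(A and C) / (P(A) * P(C))."""
    n = len(instances)
    conf = (support(antecedent | consequent, instances)
            / support(antecedent, instances))
    return conf / (support(consequent, instances) / n)

# Outlook = Overcast => Play = Yes has 100% confidence, but Play = Yes
# already holds in 9 of 14 instances, so the lift is only 14/9 = 1.56.
print(lift({("Outlook", "Overcast")}, {("Play", "Yes")}, instances))
```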