©Jiawei Han and Micheline Kamber

Slides:



Advertisements
Similar presentations
Association rule mining
Advertisements

Association Rule and Sequential Pattern Mining for Episode Extraction Jonathan Yip.
Association Rules Mining
Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.
CSE 634 Data Mining Techniques
Data Mining Techniques Association Rule
3/3/20081 Data Warehousing and Data Mining. 3/3/20082 Why Data Mining? — Potential Applications Database analysis and decision support –Market analysis.
FUNGSI MAYOR Assosiation. What Is Association Mining? Association rule mining: –Finding frequent patterns, associations, correlations, or causal structures.
Pertemuan XIV FUNGSI MAYOR Assosiation. What Is Association Mining? Association rule mining: –Finding frequent patterns, associations, correlations, or.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Association Rules Mining Part III. Multiple-Level Association Rules Items often form hierarchy. Items at the lower level are expected to have lower support.
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Warehousing/Mining 1 Data Warehousing/Mining Comp 150 DW Chapter 6: Mining Association Rules in Large Databases Instructor: Dan Hebert.
Mining Association Rules in Large Databases
6/23/2015CSE591: Data Mining by H. Liu1 Association Rules Transactional data Algorithm Applications.
Mining Association Rules
Mining Frequent Patterns I: Association Rule Discovery Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Mining Association Rules
Association Rule Mining. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and closed patterns.
Mining Association Rules in Large Databases. What Is Association Rule Mining?  Association rule mining: Finding frequent patterns, associations, correlations,
Pattern Recognition Lecture 20: Data Mining 3 Dr. Richard Spillman Pacific Lutheran University.
Association Discovery from Databases Association rules are a simple formalism for expressing positive connections between columns in a 0/1 matrix. A classical.
Mining Association Rules between Sets of Items in Large Databases presented by Zhuang Wang.
Information Systems Data Analysis – Association Mining Prof. Les Sztandera.
Mining various kinds of Association Rules
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining III COMP Seminar GNET 713 BCB Module Spring 2007.
Fast Algorithms For Mining Association Rules By Rakesh Agrawal and R. Srikant Presented By: Chirayu Modi.
Association Rule Mining Data Mining and Knowledge Discovery Prof. Carolina Ruiz and Weiyang Lin Department of Computer Science Worcester Polytechnic Institute.
Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives  Understand when linear regression is an appropriate.
Data Mining Find information from data data ? information.
Lecture 4: Association Market Basket Analysis Analysis of Customer Behavior and Service Modeling.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.
UNIT-5 Mining Association Rules in Large Databases LectureTopic ********************************************** Lecture-27Association rule mining Lecture-28Mining.
Mining Frequent Patterns, Associations, and Correlations Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.
Data Mining  Association Rule  Classification  Clustering.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Chapter 8 Association Rules. Data Warehouse and Data Mining Chapter 10 2 Content Association rule mining Mining single-dimensional Boolean association.
Data Mining Association Rules Mining Frequent Itemset Mining Support and Confidence Apriori Approach.
CS685 : Special Topics in Data Mining, UKY The UNIVERSITY of KENTUCKY Association Rule Mining CS 685: Special Topics in Data Mining Jinze Liu.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining COMP Seminar BCB 713 Module Spring 2011.
2016年6月14日星期二 2016年6月14日星期二 2016年6月14日星期二 Data Mining: Concepts and Techniques1 Mining Frequent Patterns, Associations, and Correlations (Chapter 5)
Mining Association Rules in Large Database This work is created by Dr. Anamika Bhargava, Ms. Pooja Kaul, Ms. Priti Bali and Ms. Rajnipriya Dhawan and licensed.
Data Mining Find information from data data ? information.
Association Rule Mining
UNIT-5 Mining Association Rules in Large Databases
Association Rule Mining
Association rule mining
Association Rules Repoussis Panagiotis.
Mining Association Rules
Knowledge discovery & data mining Association rules and market basket analysis--introduction UCLA CS240A Course Notes*
Association Rules.
Data Mining Association Rule Mining
I. Association Market Basket Analysis.
©Jiawei Han and Micheline Kamber
Association Rule Mining
Data Mining II: Association Rule mining & Classification
Mining Association Rules in Large Databases
Data Mining Association Rules Assoc.Prof.Songül Varlı Albayrak
Association Rule Mining
Transactional data Algorithm Applications
Association Rule Mining
Analysis of Customer Behavior and Service Modeling
Find Patterns Having P From P-conditional Database
Unit 3 MINING FREQUENT PATTERNS ASSOCIATION AND CORRELATIONS
Frequent patterns and Association Rules
I. Association Market Basket Analysis.
Department of Computer Science National Tsing Hua University
A Data Mining Tutorial David Madigan
CIS671-Knowledge Discovery and Data Mining
Presentation transcript:

Data Mining: Concepts and Techniques — Slides for Textbook — — Chapter 6 — ©Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser University, Canada http://www.cs.sfu.ca February 17, 2019 Data Mining: Concepts and Techniques

What Is Association Mining? Association rule mining: Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories. Applications: Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc. Examples. Rule form: “Body ® Head [support, confidence]”. buys(x, “diapers”) ® buys(x, “beers”) [0.5%, 60%] major(x, “CS”) ^ takes(x, “DB”) ® grade(x, “A”) [1%, 75%] February 17, 2019 Data Mining: Concepts and Techniques

Association Rule: Basic Concepts Given: (1) database of transactions, (2) each transaction is a list of items (purchased by a customer in a visit) Find: all rules that correlate the presence of one set of items with that of another set of items E.g., 98% of people who purchase tires and auto accessories also get automotive services done Applications *  Maintenance Agreement (What the store should do to boost Maintenance Agreement sales) Home Electronics  * (What other products should the store stocks up?) Attached mailing in direct marketing Detecting “ping-pong”ing of patients, faulty “collisions” February 17, 2019 Data Mining: Concepts and Techniques

Rule Measures: Support and Confidence Customer buys both Customer buys diaper Find all the rules X & Y  Z with minimum confidence and support support, s, probability that a transaction contains {X & Y & Z} confidence, c, conditional probability that a transaction having {X & Y} also contains Z Customer buys beer Let minimum support 50%, and minimum confidence 50%, we have A  C (50%, 66.6%) C  A (50%, 100%) February 17, 2019 Data Mining: Concepts and Techniques

Association Rule Mining: A Road Map Boolean vs. quantitative associations (Based on the types of values handled) buys(x, “SQLServer”) ^ buys(x, “DMBook”) ® buys(x, “DBMiner”) [0.2%, 60%] age(x, “30..39”) ^ income(x, “42..48K”) ® buys(x, “PC”) [1%, 75%] Single dimension vs. multiple dimensional associations (see ex. Above) Single level vs. multiple-level analysis What brands of beers are associated with what brands of diapers? Various extensions Correlation, causality analysis Association does not necessarily imply correlation or causality February 17, 2019 Data Mining: Concepts and Techniques

Mining Association Rules—An Example Min. support 50% Min. confidence 50% For rule A  C: support = support({A & C}) = 50% confidence = support({A & C})/support({A}) = 66.6% The Apriori principle: Any subset of a frequent itemset must be frequent February 17, 2019 Data Mining: Concepts and Techniques

Mining Frequent Itemsets: the Key Step Find the frequent itemsets: the sets of items that have minimum support A subset of a frequent itemset must also be a frequent itemset i.e., if {AB} is a frequent itemset, both {A} and {B} should be a frequent itemset Iteratively find frequent itemsets with cardinality from 1 to k (k-itemset) Use the frequent itemsets to generate association rules. February 17, 2019 Data Mining: Concepts and Techniques

Data Mining: Concepts and Techniques The Apriori Algorithm Join Step: Ck is generated by joining Lk-1with itself Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset Pseudo-code: Ck: Candidate itemset of size k Lk : frequent itemset of size k L1 = {frequent items}; for (k = 1; Lk !=; k++) do begin Ck+1 = candidates generated from Lk; for each transaction t in database do increment the count of all candidates in Ck+1 that are contained in t Lk+1 = candidates in Ck+1 with min_support end return k Lk; February 17, 2019 Data Mining: Concepts and Techniques

The Apriori Algorithm — Example Database D L1 C1 Scan D C2 C2 L2 Scan D C3 L3 Scan D February 17, 2019 Data Mining: Concepts and Techniques

Multi-level Association: Redundancy Filtering Some rules may be redundant due to “ancestor” relationships between items. Example milk  wheat bread [support = 8%, confidence = 70%] 2% milk  wheat bread [support = 2%, confidence = 72%] We say the first rule is an ancestor of the second rule. A rule is redundant if its support is close to the “expected” value, based on the rule’s ancestor. February 17, 2019 Data Mining: Concepts and Techniques

Quantitative Association Rules Numeric attributes are dynamically discretized Such that the confidence or compactness of the rules mined is maximized. 2-D quantitative association rules: Aquan1  Aquan2  Acat Cluster “adjacent” association rules to form general rules using a 2-D grid. Example: age(X,”30-34”)  income(X,”24K - 48K”)  buys(X,”high resolution TV”) February 17, 2019 Data Mining: Concepts and Techniques

Mining Distance-based Association Rules Binning methods do not capture the semantics of interval data Distance-based partitioning, more meaningful discretization considering: density/number of points in an interval “closeness” of points in an interval February 17, 2019 Data Mining: Concepts and Techniques

Interestingness Measurements Objective measures Two popular measurements: support; and confidence Subjective measures (Silberschatz & Tuzhilin, KDD95) A rule (pattern) is interesting if it is unexpected (surprising to the user); and/or actionable (the user can do something with it) February 17, 2019 Data Mining: Concepts and Techniques

Criticism to Support and Confidence Example 1: (Aggarwal & Yu, PODS98) Among 5000 students 3000 play basketball 3750 eat cereal 2000 both play basket ball and eat cereal play basketball  eat cereal [40%, 66.7%] is misleading because the overall percentage of students eating cereal is 75% which is higher than 66.7%. play basketball  not eat cereal [20%, 33.3%] is far more accurate, although with lower support and confidence February 17, 2019 Data Mining: Concepts and Techniques