Generating Non-Redundant Association Rules Mohammed J. Zaki.

Yaeer Master©2 Outline
- Introduction
- Association Rules (reminder)
- Closed Frequent Itemsets
- Generating Rules
- Complexity Analysis
- Experimental Evaluation

Introduction
- Association rule discovery: the set of association rules can grow unwieldy, especially as we lower the frequency requirement (support).
- Many rules are redundant.
- The number of redundant rules can be exponential in the length of the longest frequent itemset.
- For dense datasets it is not feasible to mine all frequent itemsets.

Solution:
- Use closed frequent itemsets:
  - The set is orders of magnitude smaller.
  - There is no loss of information.
- Create a "generating set" of rules.
- Algorithm for mining closed itemsets: CHARM.

Association Rules

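Mining Association Rules
- Find all frequent itemsets:
  - The search space contains $2^m$ itemsets ($m$ = number of items); the problem is NP-complete.
  - Assuming a bound on the transaction length, the cost is $O(r \cdot n \cdot 2^L)$, where $r$ = number of maximal frequent itemsets, $n$ = number of transactions, and $L$ = length of the longest frequent itemset.
- Generate confident rules:
  - Each frequent itemset of size $k$ yields $2^k$ potential rules.
  - Complexity: $O(f \cdot 2^L)$, where $f$ = number of frequent itemsets and $L$ = length of the longest frequent itemset.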
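
To make these counts concrete, here is a brute-force sketch (not CHARM, and not the paper's method) that scans every itemset of a small database, keeps the frequent ones, and counts the non-trivial antecedent/consequent splits of each ($2^k - 2$ per $k$-itemset, once empty sides are excluded). The six-transaction database `DB` and all helper names are hypothetical; the database was chosen so that its tidsets agree with the values quoted on the later slides (e.g. t(A) = 1345, t(ACDW) = 45).

```python
from itertools import combinations

# Hypothetical example database (assumption): tid -> items.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
ITEMS = sorted(set("".join(DB.values())))


def support(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for items in DB.values() if set(itemset) <= set(items))


def frequent_itemsets(minsup):
    """Naive scan of all 2^m - 1 non-empty itemsets (m = len(ITEMS))."""
    return ["".join(c)
            for k in range(1, len(ITEMS) + 1)
            for c in combinations(ITEMS, k)
            if support(c) >= minsup]


def candidate_rules(itemset):
    """Every split X -> Y with X, Y non-empty and X ∪ Y = itemset (2^k - 2 rules)."""
    rules = []
    for k in range(1, len(itemset)):
        for lhs in combinations(itemset, k):
            rhs = "".join(x for x in itemset if x not in lhs)
            rules.append(("".join(lhs), rhs))
    return rules


freq = frequent_itemsets(minsup=3)  # minsup = 50% of 6 transactions
rule_count = sum(len(candidate_rules(x)) for x in freq)
print(len(freq), rule_count)
```

Even on this toy input every rule of every frequent itemset is generated before any confidence filtering, which is the cost the closed-itemset framework tries to avoid.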

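Closed Frequent Itemsets – Defining a Galois Connection
- Let $\mathcal{I}$ be the set of items, $\mathcal{T}$ the set of transaction identifiers (tids), and $\delta \subseteq \mathcal{I} \times \mathcal{T}$ the relation "item $x$ occurs in transaction $y$" (written $x\,\delta\,y$).
- The mappings:
  - $t: \mathcal{P}(\mathcal{I}) \to \mathcal{P}(\mathcal{T})$, $t(X) = \{y \in \mathcal{T} \mid \forall x \in X,\ x\,\delta\,y\}$
  - $i: \mathcal{P}(\mathcal{T}) \to \mathcal{P}(\mathcal{I})$, $i(Y) = \{x \in \mathcal{I} \mid \forall y \in Y,\ x\,\delta\,y\}$
- The pair $(t, i)$ defines a Galois connection between the partially ordered sets $\mathcal{P}(\mathcal{I})$ and $\mathcal{P}(\mathcal{T})$, both ordered by $\subseteq$.
- Galois connection (order-reversing form): for all $X \subseteq \mathcal{I}$ and $Y \subseteq \mathcal{T}$: $X \subseteq i(Y) \iff Y \subseteq t(X)$.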

Galois Connection (cont.)
Properties:
- $X_1 \subseteq X_2 \implies t(X_1) \supseteq t(X_2)$
- $Y_1 \subseteq Y_2 \implies i(Y_1) \supseteq i(Y_2)$
- $X \subseteq i(t(X))$ and $Y \subseteq t(i(Y))$
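
A minimal sketch of the two mappings, assuming a hypothetical six-transaction database `DB` that is not part of these slides' text (it was chosen so that its tidsets agree with the values quoted here, such as t(ACW) = 1345 and i(245) = CDW). It checks the Galois connection and the properties above by brute force over every itemset and tidset, which is only practical for toy data; the names `t`, `i` and `powerset` are illustrative.

```python
from itertools import combinations

# Hypothetical example database (assumption): tid -> set of items.
DB = {1: set("ACTW"), 2: set("CDW"), 3: set("ACTW"),
      4: set("ACDW"), 5: set("ACDTW"), 6: set("CDT")}
ITEMS = frozenset().union(*DB.values())
TIDS = frozenset(DB)


def t(X):
    """t(X): tids of the transactions that contain every item of X."""
    return frozenset(tid for tid, items in DB.items() if set(X) <= items)


def i(Y):
    """i(Y): items shared by every transaction in Y (all items if Y is empty)."""
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return frozenset(common)


def powerset(s):
    """All subsets of s, as frozensets."""
    s = sorted(s)
    return [frozenset(c) for k in range(len(s) + 1) for c in combinations(s, k)]


subsets_I, subsets_T = powerset(ITEMS), powerset(TIDS)
# X ⊆ i(Y)  <=>  Y ⊆ t(X)
galois = all((X <= i(Y)) == (Y <= t(X)) for X in subsets_I for Y in subsets_T)
# Both mappings reverse inclusion.
antitone_t = all(t(X1) >= t(X2) for X1 in subsets_I for X2 in subsets_I if X1 <= X2)
antitone_i = all(i(Y1) >= i(Y2) for Y1 in subsets_T for Y2 in subsets_T if Y1 <= Y2)
# Round trips only grow a set.
round_trip = (all(X <= i(t(X)) for X in subsets_I)
              and all(Y <= t(i(Y)) for Y in subsets_T))
print(galois, antitone_t, antitone_i, round_trip)
```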

Example
- $t(ACW) = t(A) \cap t(C) \cap t(W) = 1345$
- $i(245) = CDW$
- $ACW \subseteq ACDW$, so $t(ACW) \supseteq t(ACDW)$: $t(ACW) = 1345 \supseteq 45 = t(ACDW)$

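Closure Operator
- A map $c: \mathcal{P}(S) \to \mathcal{P}(S)$ is a closure operator if, for all $X, Y \subseteq S$, it satisfies:
  - Extension: $X \subseteq c(X)$
  - Monotonicity: $X \subseteq Y \implies c(X) \subseteq c(Y)$
  - Idempotency: $c(c(X)) = c(X)$
- Closure by composition:
  - $c_{it}(X) = i \circ t(X) = i(t(X))$ on itemsets
  - $c_{ti}(Y) = t \circ i(Y) = t(i(Y))$ on tidsets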
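
The three axioms can be checked by brute force for both compositions. This sketch assumes the same kind of hypothetical six-transaction database `DB` (chosen to match the tidsets on these slides); `is_closure_operator` and the other helpers are illustrative names, not part of any library.

```python
from itertools import combinations

# Hypothetical example database (assumption): tid -> set of items.
DB = {1: set("ACTW"), 2: set("CDW"), 3: set("ACTW"),
      4: set("ACDW"), 5: set("ACDTW"), 6: set("CDT")}
ITEMS = frozenset().union(*DB.values())
TIDS = frozenset(DB)


def t(X):
    return frozenset(tid for tid, items in DB.items() if set(X) <= items)


def i(Y):
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return frozenset(common)


def c_it(X):
    """Closure on itemsets: c_it(X) = i(t(X))."""
    return i(t(X))


def c_ti(Y):
    """Closure on tidsets: c_ti(Y) = t(i(Y))."""
    return t(i(Y))


def powerset(s):
    s = sorted(s)
    return [frozenset(c) for k in range(len(s) + 1) for c in combinations(s, k)]


def is_closure_operator(c, universe):
    """Brute-force check of extension, monotonicity and idempotency on P(universe)."""
    subsets = powerset(universe)
    extension = all(X <= c(X) for X in subsets)
    monotone = all(c(X) <= c(Y) for X in subsets for Y in subsets if X <= Y)
    idempotent = all(c(c(X)) == c(X) for X in subsets)
    return extension and monotone and idempotent


print(is_closure_operator(c_it, ITEMS), is_closure_operator(c_ti, TIDS))
```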

Closure Operator – Round Trip

Closed Itemset – Definition
A closed itemset X is an itemset that is equal to its closure, i.e. $c_{it}(X) = X$.
Example: $c_{it}(AC) = i(t(AC)) = i(1345) = ACW$.
Conclusion: AC is not closed; ACW is closed.
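
Since every closure is itself closed, one simple (and deliberately naive) way to list the closed itemsets of a small database is to take the closure of every itemset. The sketch below assumes a hypothetical six-transaction database `DB` consistent with the tidsets on these slides and a support threshold of 3 transactions (50%); it is not the CHARM algorithm.

```python
from itertools import combinations

# Hypothetical example database (assumption): tid -> set of items.
DB = {1: set("ACTW"), 2: set("CDW"), 3: set("ACTW"),
      4: set("ACDW"), 5: set("ACDTW"), 6: set("CDT")}
ITEMS = frozenset().union(*DB.values())


def t(X):
    return frozenset(tid for tid, items in DB.items() if set(X) <= items)


def i(Y):
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return frozenset(common)


def c_it(X):
    """Closure of an itemset: c_it(X) = i(t(X))."""
    return i(t(X))


def is_closed(X):
    """X is closed iff it equals its own closure."""
    return c_it(X) == frozenset(X)


def powerset(s):
    s = sorted(s)
    return [frozenset(c) for k in range(len(s) + 1) for c in combinations(s, k)]


closed = {c_it(X) for X in powerset(ITEMS)}               # every closure is closed
frequent_closed = {X for X in closed if len(t(X)) >= 3}   # minsup = 3 of 6
print(is_closed("AC"), is_closed("ACW"))
print(len(closed), len(frequent_closed))
```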

Closed vs. Frequent Itemsets

Concept – Definition
- For any closed itemset X, there exists a closed tidset Y with the property $Y = t(X)$.
- The pair $X \times Y$ is called a concept.

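Galois Lattice
- A concept $X_1 \times Y_1$ is a sub-concept of $X_2 \times Y_2$, written $X_1 \times Y_1 \le X_2 \times Y_2$, iff $X_1 \subseteq X_2$ (equivalently, iff $Y_2 \subseteq Y_1$).
- Let $\mathcal{B}(\delta)$ be the set of all concepts.
- The ordered set $(\mathcal{B}(\delta), \le)$ is a complete lattice, called the Galois lattice.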

Galois Lattice of Concepts

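Frequent Closed Itemsets vs. Frequent Itemsets
- Lattice operations:
  - Join: $(X_1 \times Y_1) \vee (X_2 \times Y_2) = c_{it}(X_1 \cup X_2) \times (Y_1 \cap Y_2)$
  - Meet: $(X_1 \times Y_1) \wedge (X_2 \times Y_2) = (X_1 \cap X_2) \times c_{ti}(Y_1 \cup Y_2)$
- Frequent concept: a concept whose support is at least minsup, where the support of a concept is the cardinality of its closed tidset.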

Join and Meet – Example
- Join: $(ACDW \times 45) \vee (CDT \times 56) = c_{it}(ACDW \cup CDT) \times (45 \cap 56) = ACDTW \times 5$
- Meet: $(ACDW \times 45) \wedge (CDT \times 56) = (ACDW \cap CDT) \times c_{ti}(45 \cup 56) = CD \times 2456$
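
A small sketch of the two lattice operations on concepts represented as (closed itemset, tidset) pairs, reproducing the example above. It assumes a hypothetical database `DB` whose tidsets match the slides (t(ACDW) = 45, t(CDT) = 56); `concept`, `join`, `meet` and `show` are illustrative names.

```python
# Hypothetical example database (assumption): tid -> set of items.
DB = {1: set("ACTW"), 2: set("CDW"), 3: set("ACTW"),
      4: set("ACDW"), 5: set("ACDTW"), 6: set("CDT")}
ITEMS = frozenset().union(*DB.values())


def t(X):
    return frozenset(tid for tid, items in DB.items() if set(X) <= items)


def i(Y):
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return frozenset(common)


def c_it(X):
    return i(t(X))


def c_ti(Y):
    return t(i(Y))


def concept(X):
    """Concept generated by itemset X: closed itemset c_it(X) with tidset t(X)."""
    return (c_it(X), t(X))


def join(a, b):
    """(X1 x Y1) v (X2 x Y2) = c_it(X1 ∪ X2) x (Y1 ∩ Y2)."""
    (X1, Y1), (X2, Y2) = a, b
    return (c_it(X1 | X2), Y1 & Y2)


def meet(a, b):
    """(X1 x Y1) ^ (X2 x Y2) = (X1 ∩ X2) x c_ti(Y1 ∪ Y2)."""
    (X1, Y1), (X2, Y2) = a, b
    return (X1 & X2, c_ti(Y1 | Y2))


def show(c):
    X, Y = c
    return "".join(sorted(X)) + " x " + "".join(str(y) for y in sorted(Y))


a, b = concept("ACDW"), concept("CDT")
print(show(join(a, b)))  # ACDTW x 5
print(show(meet(a, b)))  # CD x 2456
```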

Frequent Concepts

Lemma 1: The support of an itemset X is equal to the support of its closure, i.e. $\sigma(X) = \sigma(c_{it}(X))$. Therefore all frequent itemsets are uniquely determined by the frequent closed itemsets, and can be obtained via the join operation on the frequent concepts.
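
Lemma 1 means the frequent closed itemsets and their supports are enough to answer any support query: $c_{it}(X)$ is the smallest closed superset of X, so it has the largest support among the closed supersets. The sketch below stores only the frequent closed itemsets of a hypothetical database `DB` (consistent with the slides' tidsets, minsup = 3 of 6) and recovers the support of every frequent itemset from them; the helper names are illustrative.

```python
from itertools import combinations

# Hypothetical example database (assumption): tid -> set of items.
DB = {1: set("ACTW"), 2: set("CDW"), 3: set("ACTW"),
      4: set("ACDW"), 5: set("ACDTW"), 6: set("CDT")}
ITEMS = frozenset().union(*DB.values())
MINSUP = 3  # 50% of the six transactions


def t(X):
    return frozenset(tid for tid, items in DB.items() if set(X) <= items)


def i(Y):
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return frozenset(common)


def c_it(X):
    return i(t(X))


def powerset(s):
    s = sorted(s)
    return [frozenset(c) for k in range(len(s) + 1) for c in combinations(s, k)]


# Keep only the frequent closed itemsets and their supports.
frequent_closed = {}
for X in powerset(ITEMS):
    C = c_it(X)
    if len(t(C)) >= MINSUP:
        frequent_closed[C] = len(t(C))


def support_from_closed(X):
    """σ(X) = σ(c_it(X)) = max support over closed supersets; 0 if X is infrequent."""
    X = frozenset(X)
    return max((s for C, s in frequent_closed.items() if X <= C), default=0)


frequent = [X for X in powerset(ITEMS) if X and len(t(X)) >= MINSUP]
print(len(frequent_closed), len(frequent))
print(all(support_from_closed(X) == len(t(X)) for X in frequent))
```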

Redundant Rules
Definition: A rule $R_1: X_1 \xrightarrow{p} Y_1$ is more general than a rule $R_2: X_2 \xrightarrow{p} Y_2$, denoted $R_1 \prec R_2$, provided that $R_2$ can be generated by adding items to the antecedent or consequent of $R_1$ (i.e. $X_1 \subseteq X_2$ and $Y_1 \subseteq Y_2$). Among a set of rules with equal confidence $p$, the non-redundant rules are those that are most general.
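
The "more general" relation is a pair of subset tests, so filtering a set of equal-confidence rules down to the most general ones is direct. This is a hypothetical sketch (the names `rule`, `more_general` and `non_redundant` are illustrative) applied to the three 100% confidence rules TW → A, TW → AC and CTW → A used later in these slides.

```python
def rule(lhs, rhs):
    """A rule as an (antecedent, consequent) pair of frozensets."""
    return (frozenset(lhs), frozenset(rhs))


def more_general(r1, r2):
    """r1 ≺ r2: r2 adds items to r1's antecedent and/or consequent."""
    (x1, y1), (x2, y2) = r1, r2
    return x1 <= x2 and y1 <= y2 and r1 != r2


def non_redundant(rules):
    """Most general rules; all rules are assumed to share the same confidence."""
    return [r for r in rules if not any(more_general(q, r) for q in rules)]


rules = [rule("TW", "A"), rule("TW", "AC"), rule("CTW", "A")]
kept = non_redundant(rules)
print(["".join(sorted(x)) + " -> " + "".join(sorted(y)) for x, y in kept])
```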

Rule Generation
Lemma 2 (Transitivity): Let $X_1, X_2, X_3$ be frequent closed itemsets with $X_1 \subseteq X_2 \subseteq X_3$. If $X_1 \xrightarrow{p} X_2$ and $X_2 \xrightarrow{q} X_3$, then $X_1 \xrightarrow{pq} X_3$.
Observation: it is sufficient to consider rules among adjacent concepts.

Rule Generation – 100% Confidence
- Lemma 3: An association rule $X_1 \xrightarrow{p} X_2$ has confidence $p = 1.0$ if and only if $t(X_1) \subseteq t(X_2)$.
- 100% confidence rules are those directed from a super-concept to a sub-concept, i.e. down-arcs in the lattice.
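
Lemma 3 follows from $t(X_1 \cup X_2) = t(X_1) \cap t(X_2)$: the confidence $|t(X_1) \cap t(X_2)| / |t(X_1)|$ equals 1 exactly when $t(X_1) \subseteq t(X_2)$. The sketch below checks this exhaustively on a hypothetical database `DB` consistent with the slides' tidsets, computing confidence as an exact `Fraction`; `conf` and `powerset` are illustrative names.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical example database (assumption): tid -> set of items.
DB = {1: set("ACTW"), 2: set("CDW"), 3: set("ACTW"),
      4: set("ACDW"), 5: set("ACDTW"), 6: set("CDT")}
ITEMS = frozenset().union(*DB.values())


def t(X):
    return frozenset(tid for tid, items in DB.items() if set(X) <= items)


def conf(X1, X2):
    """Confidence of X1 -> X2: σ(X1 ∪ X2) / σ(X1), as an exact fraction."""
    return Fraction(len(t(set(X1) | set(X2))), len(t(X1)))


def powerset(s):
    s = sorted(s)
    return [frozenset(c) for k in range(len(s) + 1) for c in combinations(s, k)]


subsets = powerset(ITEMS)
lemma3 = all((conf(X1, X2) == 1) == (t(X1) <= t(X2))
             for X1 in subsets for X2 in subsets if t(X1))
print(lemma3)
print(conf("TW", "A"), t("TW") <= t("A"))
```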

Theorem 1: Let $R = \{R_1, \ldots, R_n\}$ be a set of rules $R_i: X_1^i \to X_2^i$ with 100% confidence ($p_i = 1$ for all $i$), such that $I = c_{it}(X_1^i)$ and $I = c_{it}(X_1^i \cup X_2^i)$ for all rules $R_i$. Let $R_I$ denote the most general 100% confidence rule in $R$. Then all rules $R_i \neq R_I$ are more specific than $R_I$, and thus are redundant.

Example: TW → A, TW → AC, CTW → A
- $c_{it}(TW \cup A) = c_{it}(ATW) = ACTW$
- $c_{it}(TW \cup AC) = ACTW$
- $c_{it}(CTW \cup A) = ACTW$
The most general of these rules is TW → A.
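
The example can be checked mechanically: all three rules have confidence 1 and the same closure ACTW, so they fall into the same group and only the most general one needs to be kept. This sketch assumes a hypothetical database `DB` consistent with the slides' tidsets; `conf` and `c_it` are illustrative helpers.

```python
from fractions import Fraction

# Hypothetical example database (assumption): tid -> set of items.
DB = {1: set("ACTW"), 2: set("CDW"), 3: set("ACTW"),
      4: set("ACDW"), 5: set("ACDTW"), 6: set("CDT")}
ITEMS = frozenset().union(*DB.values())


def t(X):
    return frozenset(tid for tid, items in DB.items() if set(X) <= items)


def i(Y):
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return frozenset(common)


def c_it(X):
    return i(t(X))


def conf(lhs, rhs):
    """σ(lhs ∪ rhs) / σ(lhs) as an exact fraction."""
    return Fraction(len(t(set(lhs) | set(rhs))), len(t(lhs)))


rules = [("TW", "A"), ("TW", "AC"), ("CTW", "A")]
for lhs, rhs in rules:
    closure = "".join(sorted(c_it(set(lhs) | set(rhs))))
    print(f"{lhs} -> {rhs}: confidence {conf(lhs, rhs)}, closure {closure}")
```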

Rule Generation – Confidence < 100%
- These rules go from sub-concepts to super-concepts, i.e. they correspond to up-arcs.
- Rules between non-adjacent concepts can be derived by transitivity. For example, from $C \xrightarrow{0.83} W$ and $W \xrightarrow{0.8} A$ we obtain $C \xrightarrow{0.67} A$, since $pq = 0.83 \times 0.8 \approx 0.67$.
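
With exact fractions the transitivity example is an identity rather than an approximation: along the closed chain C ⊆ CW ⊆ ACW the confidences telescope, $\frac{\sigma(CW)}{\sigma(C)} \cdot \frac{\sigma(ACW)}{\sigma(CW)} = \frac{\sigma(ACW)}{\sigma(C)}$. The sketch assumes a hypothetical database `DB` consistent with the slides' tidsets; since $c_{it}(W) = CW$ in that database, W → A has the same confidence as CW → A.

```python
from fractions import Fraction

# Hypothetical example database (assumption): tid -> set of items.
DB = {1: set("ACTW"), 2: set("CDW"), 3: set("ACTW"),
      4: set("ACDW"), 5: set("ACDTW"), 6: set("CDT")}


def t(X):
    return frozenset(tid for tid, items in DB.items() if set(X) <= items)


def conf(X1, X2):
    """σ(X1 ∪ X2) / σ(X1) as an exact fraction."""
    return Fraction(len(t(set(X1) | set(X2))), len(t(X1)))


# Closed chain C ⊆ CW ⊆ ACW (Lemma 2 is stated for frequent closed itemsets).
p = conf("C", "CW")     # σ(CW) / σ(C)
q = conf("CW", "ACW")   # σ(ACW) / σ(CW)
pq = conf("C", "ACW")   # σ(ACW) / σ(C)
print(p, q, pq, p * q == pq)  # 5/6 4/5 2/3 True
```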

Theorem 2: Let $R = \{R_1, \ldots, R_n\}$ be a set of rules $R_i: X_1^i \xrightarrow{p} X_2^i$ with confidence $p < 1.0$ ($p_i = p$ for all $i$), such that $I_1 = c_{it}(X_1^i)$ and $I_2 = c_{it}(X_1^i \cup X_2^i)$ for all rules $R_i$. Let $R_I$ denote the rule $I_1 \xrightarrow{p} I_2$. Then all rules $R_i \neq R_I$ are more specific than $R_I$, and thus are redundant.

Generating Set
Combining the two sets gives a generating set for rules with minsup = 50% and minconf = 80%:
{TW → A, A → W, W → C, T → C, D → C, W → A (0.8), C → W (0.83)}
All association rules can be derived from this set.
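
The confidence < 100% part of such a set can be sketched as the up-arcs between adjacent frequent closed itemsets whose confidence reaches minconf. This is an illustrative sketch only: it assumes a hypothetical database `DB` consistent with the slides' tidsets, minsup = 3 of 6 transactions (50%) and minconf = 80%, and it does not compute the 100% confidence rules. It prints each up-arc lo → hi as lo → (hi \ lo), so it reports CW → A where the slide writes W → A; the two have the same support and confidence because $c_{it}(W) = CW$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical example database (assumption): tid -> set of items.
DB = {1: set("ACTW"), 2: set("CDW"), 3: set("ACTW"),
      4: set("ACDW"), 5: set("ACDTW"), 6: set("CDT")}
ITEMS = frozenset().union(*DB.values())
MINSUP, MINCONF = 3, Fraction(4, 5)  # 50% support, 80% confidence


def t(X):
    return frozenset(tid for tid, items in DB.items() if set(X) <= items)


def i(Y):
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return frozenset(common)


def c_it(X):
    return i(t(X))


def powerset(s):
    s = sorted(s)
    return [frozenset(c) for k in range(len(s) + 1) for c in combinations(s, k)]


# Frequent closed itemsets and their supports.
supp = {}
for X in powerset(ITEMS):
    C = c_it(X)
    if len(t(C)) >= MINSUP:
        supp[C] = len(t(C))


def adjacent(lo, hi):
    """hi directly covers lo among the frequent closed itemsets."""
    return lo < hi and not any(lo < Z < hi for Z in supp)


# Up-arcs lo -> hi with confidence σ(hi)/σ(lo) >= MINCONF.
up_rules = {}
for lo in supp:
    for hi in supp:
        if adjacent(lo, hi):
            p = Fraction(supp[hi], supp[lo])
            if p >= MINCONF:
                up_rules["".join(sorted(lo)) + " -> " + "".join(sorted(hi - lo))] = p

for r, p in sorted(up_rules.items()):
    print(r, p)
```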

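Complexity of Rule Generation
(Let $l$ be the length of the longest frequent itemset, and assume all of its subsets are frequent.)
- Traditional: each frequent $i$-itemset yields $2^i - 2$ rules, so the number of rules is $\sum_{i=1}^{l} \binom{l}{i}(2^i - 2) = O(3^l)$.
- New framework:
  - Best case: one closed itemset, no rules.
  - Worst case: all frequent itemsets are closed.
  - Number of rules: at most one per lattice edge, $\sum_{i=1}^{l} i\binom{l}{i} = l\,2^{l-1} = O(l\,2^l)$.
  - Reduction factor: $O\big((3/2)^l / l\big)$.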
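
A quick numeric check of the worst-case counts, assuming every subset of an $l$-itemset is frequent: it evaluates $\sum_{i=1}^{l} \binom{l}{i}(2^i - 2)$ for the traditional framework and $\sum_{i=1}^{l} i\binom{l}{i} = l\,2^{l-1}$ (one rule per edge of the Boolean lattice) for the worst case of the closed-itemset framework. The function names are hypothetical, and this only evaluates the two sums; it does not mine anything.

```python
from math import comb


def traditional_rules(l):
    """All subsets of an l-itemset frequent; an i-itemset yields 2^i - 2 rules."""
    return sum(comb(l, i) * (2 ** i - 2) for i in range(1, l + 1))


def closed_worst_case_rules(l):
    """All subsets closed: at most one rule per edge of the Boolean lattice."""
    return sum(comb(l, i) * i for i in range(1, l + 1))


for l in (5, 10, 20):
    trad, new = traditional_rules(l), closed_worst_case_rules(l)
    print(l, trad, new, round(trad / new, 1))
```

The ratio grows roughly like $(3/2)^l / l$, so the gap widens quickly with the length of the longest frequent itemset.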

Experimental Evaluation

Number of Rules: Traditional vs. Closed Itemsets

Conclusion
- The new framework based on closed itemsets can drastically reduce the rule set, which can then be presented to the user in a succinct manner.
- Future work:
  - Interactive visualization and exploration of mined associations, generating rules on demand based on the user's interest.
  - Finding a minimal generating set.