1 Mining the Smallest Association Rule Set for Predictions Jiuyong Li, Hong Shen, and Rodney Topor Proceedings of the 2001 IEEE International Conference.

Presentation transcript:

1 Mining the Smallest Association Rule Set for Predictions Jiuyong Li, Hong Shen, and Rodney Topor Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM’01), 29 Nov.-2 Dec. 2001, pp. 361–368. J. Li, H. Shen and R. Topor, “Mining the informative rule set for prediction,” Journal of Intelligent Information Systems, 22:2, pp , 2004, Kluwer Academic. Advisor : Jia-Ling Koh Speaker : Chen-Yi Lin Department of Information & Computer Education, NTNU

2 Outline: Introduction; The informative rule set; Algorithm; Experimental Results; Conclusions

3 Introduction (1/2) The key problems with association rule mining are the high cost of generating association rules and the large number of rules that are normally generated.

4 Introduction (2/2) Goal: mine the smallest association rule set efficiently for subsequent prediction. Informative rule set: the rule set necessary for prediction. Direct method: the set is mined without generating all frequent itemsets first.

5 The informative rule set (1/8) The prediction for an itemset P from a rule set R is a sequence of items Q. Ex: Itemset: a, b, c, d. R1: a, b => c; R2: a, b => d. Q: {c}; Q: {c, d}. (In the slide's figure, P marks the itemset for which the prediction is made.)
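A minimal sketch of how such a prediction sequence could be built. It assumes that rules are applied in descending order of confidence and that a rule fires when its antecedent is a subset of P. The ordering, the function name `predict`, the choice P = {a, b}, and the confidence values are assumptions for illustration and are not given on the slide.

```python
from typing import FrozenSet, List, Tuple

# (antecedent, single consequent item, confidence)
Rule = Tuple[FrozenSet[str], str, float]

def predict(p: FrozenSet[str], rules: List[Rule]) -> List[str]:
    """Return the prediction sequence Q for itemset p (hypothetical helper)."""
    q: List[str] = []
    # Assumption: higher-confidence rules contribute their prediction first.
    for antecedent, item, _conf in sorted(rules, key=lambda r: -r[2]):
        # A rule applies when its antecedent is contained in p;
        # each item is predicted at most once.
        if antecedent <= p and item not in q:
            q.append(item)
    return q

# R1 and R2 from the slide; the confidences 0.9 and 0.7 are made up.
rules: List[Rule] = [
    (frozenset("ab"), "c", 0.9),  # R1: a, b => c
    (frozenset("ab"), "d", 0.7),  # R2: a, b => d
]

print(predict(frozenset("ab"), rules[:1]))  # R1 only
print(predict(frozenset("ab"), rules))      # R1 and R2
```

With R1 alone the sketch yields the single prediction c, and with both rules it yields c followed by d, matching the two Q values on the slide.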

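6 The informative rule set (2/8) Let $R_A$ be an association rule set and $R_A^1$ the set of single-target rules in $R_A$. A set $R_I$ is informative over $R_A$ if (1) $R_I \subset R_A^1$; (2) for all $r \in R_I$, there does not exist an $r' \in R_I$ such that $r' \subset r$ and $\mathrm{conf}(r') \ge \mathrm{conf}(r)$; and (3) for all $r'' \in R_A^1 - R_I$, there exists an $r \in R_I$ such that $r'' \supset r$ and $\mathrm{conf}(r'') \le \mathrm{conf}(r)$. Here $r' \subset r$ means that $r'$ has the same consequent as $r$ and an antecedent that is a proper subset of the antecedent of $r$.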

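7 The informative rule set (3/8) Each rule is followed by (support, confidence). Ex1: r = ac=>b (0.5, 0.75), r' = a=>b (0.67, 0.8): X. The more general rule r' has higher confidence, so r cannot be in the informative rule set. Ex2: r'' = ac=>b (0.5, 0.75), r = a=>b (0.67, 0.8): O. The omitted rule r'' is covered by the more general rule r, whose confidence is at least as high.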

8 The informative rule set (4/8)
Transaction database (min_sup = 0.5, min_conf = 0.5):
Tid   Items
100   a, b, c
200   a, b, c
300   a, b, c
400   a, b, d
500   a, c, d
600   b, c, d
There are 12 association rules, each followed by (support, confidence):
a=>b (0.67, 0.8)    a=>c (0.67, 0.8)    b=>c (0.67, 0.8)
b=>a (0.67, 0.8)    c=>a (0.67, 0.8)    c=>b (0.67, 0.8)
ab=>c (0.5, 0.75)   ac=>b (0.5, 0.75)   bc=>a (0.5, 0.75)
a=>bc (0.5, 0.6)    b=>ac (0.5, 0.6)    c=>ab (0.5, 0.6)
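The 12 rules can be checked with a brute-force enumeration over the six transactions. This is only a verification sketch for a toy database, not the mining algorithm of the paper; the names `DB`, `count`, and `mine_rules` are hypothetical.

```python
from itertools import combinations

# Transaction database from the slide.
DB = {
    100: {"a", "b", "c"},
    200: {"a", "b", "c"},
    300: {"a", "b", "c"},
    400: {"a", "b", "d"},
    500: {"a", "c", "d"},
    600: {"b", "c", "d"},
}
MIN_SUP, MIN_CONF = 0.5, 0.5

def count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in DB.values() if itemset <= t)

def mine_rules():
    """Map (antecedent, consequent) to (support, confidence), rounded to 2 digits."""
    items = sorted(set().union(*DB.values()))
    rules = {}
    for k in range(2, len(items) + 1):
        for combo in combinations(items, k):
            whole = frozenset(combo)
            n = count(whole)
            if n / len(DB) < MIN_SUP:
                continue  # the whole itemset is not frequent
            # Split the frequent itemset into antecedent and consequent.
            for m in range(1, k):
                for ante in combinations(combo, m):
                    x = frozenset(ante)
                    conf = n / count(x)
                    if conf >= MIN_CONF:
                        rules[(x, whole - x)] = (round(n / len(DB), 2), round(conf, 2))
    return rules

rules = mine_rules()
print(len(rules))
for (x, y), (s, c) in rules.items():
    print("".join(sorted(x)), "=>", "".join(sorted(y)), (s, c))
```

Only ab, ac, bc, and abc reach the support threshold among the multi-item itemsets, which is why every rule involves a, b, and c only.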

9 The informative rule set (5/8) Ex: (1) Every transaction identified by the rule ab=>c is also identified by rule a=>c or b=>c, so ab=>c can be omitted from the informative rule set without losing predictive capability. (2) Rules a=>b and a=>c provide the predictions b and c with higher confidence (0.8) than rule a=>bc (0.6), so a=>bc can also be omitted without losing predictive capability. Hence only 6 rules are left in the informative rule set: { a=>b (0.67, 0.8), a=>c (0.67, 0.8), b=>c (0.67, 0.8), b=>a (0.67, 0.8), c=>a (0.67, 0.8), c=>b (0.67, 0.8) }
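The reduction from 12 rules to 6 can be reproduced by a simple post-filter. This sketch assumes two filtering steps: keep only single-target rules, then drop any rule for which a more general rule (same consequent, smaller antecedent) has confidence at least as high. It filters an already generated rule set, so it is not the direct algorithm of the paper; `RULES` and `informative` are hypothetical names.

```python
# The 12 rules from the slides: (antecedent, consequent) -> (support, confidence).
RULES = {
    ("a", "b"): (0.67, 0.8), ("a", "c"): (0.67, 0.8), ("b", "c"): (0.67, 0.8),
    ("b", "a"): (0.67, 0.8), ("c", "a"): (0.67, 0.8), ("c", "b"): (0.67, 0.8),
    ("ab", "c"): (0.5, 0.75), ("ac", "b"): (0.5, 0.75), ("bc", "a"): (0.5, 0.75),
    ("a", "bc"): (0.5, 0.6), ("b", "ac"): (0.5, 0.6), ("c", "ab"): (0.5, 0.6),
}

def informative(rules):
    """Post-filter a rule set down to its informative subset (illustrative only)."""
    # Step 1: keep single-target rules, i.e. one item in the consequent.
    single = {(frozenset(x), y): sc for (x, y), sc in rules.items() if len(y) == 1}
    keep = {}
    for (x, y), (s, c) in single.items():
        # Step 2: a rule is dominated when a rule with the same consequent and a
        # proper subset of its antecedent has confidence at least as high.
        dominated = any(
            y2 == y and x2 < x and c2 >= c
            for (x2, y2), (_s2, c2) in single.items()
        )
        if not dominated:
            keep[("".join(sorted(x)), y)] = (s, c)
    return keep

result = informative(RULES)
for (x, y), sc in sorted(result.items()):
    print(x, "=>", y, sc)
```

On this example the filter keeps exactly the six rules with a single-item antecedent.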

10 The informative rule set (6/8) If $\mathrm{supp}(X \neg Z) = \mathrm{supp}(XY \neg Z)$, then rule XY=>Z does not belong to the informative rule set (because Z is identified by X=>Z). Ex: X = a, Y = b, and Z = c. The transactions that contain X but not Z are the same as those that contain XY but not Z (TID = {400}), so ab=>c is excluded (X).

11 The informative rule set (7/8) Upward-closed properties for generating informative rule sets: (1) If $\mathrm{supp}(X \neg Z) = \mathrm{supp}(XY \neg Z)$, then rule XY=>Z and all more specific rules do not occur in the informative rule set. (2) If $\mathrm{supp}(X) = \mathrm{supp}(XY)$, then for any Z, rule XY=>Z and all more specific rules do not occur in the informative rule set.
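A small sketch of the two pruning tests on the example transaction database. The conditions are assumed to be supp(X¬Z) = supp(XY¬Z) for a single target Z and supp(X) = supp(XY) for all targets; this reading is an assumption inferred from the example with TID = {400}. The helper names `tids`, `prune_for_target`, and `prune_for_all_targets` are hypothetical.

```python
# Transaction database from the earlier slide.
DB = {
    100: {"a", "b", "c"},
    200: {"a", "b", "c"},
    300: {"a", "b", "c"},
    400: {"a", "b", "d"},
    500: {"a", "c", "d"},
    600: {"b", "c", "d"},
}

def tids(include, exclude=None):
    """TIDs of transactions containing all items of include and, optionally, not the item exclude."""
    return {
        tid for tid, t in DB.items()
        if set(include) <= t and (exclude is None or exclude not in t)
    }

def prune_for_target(x, y, z):
    """Assumed test: supp(X not-Z) == supp(XY not-Z), so XY=>Z can be skipped."""
    return len(tids(x, z)) == len(tids(x + y, z))

def prune_for_all_targets(x, y):
    """Assumed test: supp(X) == supp(XY), so XY=>Z can be skipped for every Z."""
    return len(tids(x)) == len(tids(x + y))

print(tids("a", "c"), tids("ab", "c"))   # transactions with a (or ab) but without c
print(prune_for_target("a", "b", "c"))   # ab=>c
print(prune_for_all_targets("a", "b"))   # any ab=>Z
```

For X = a, Y = b, Z = c both TID sets consist of transaction 400 only, so the first test prunes ab=>c. The second test does not fire here because a occurs in five transactions and ab in four.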

12 The informative rule set (8/8) Ex1: a=>d or b=>d (O); ab=>d and ab…=>d (X)

13 Algorithm (1/2) A fully expanded candidate tree over the set of items {1, 2, 3, 4}. (Figure annotations: identity set, label.)
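The shape of a fully expanded tree over {1, 2, 3, 4} can be sketched as a set-enumeration tree, in which each child extends its parent's identity set by one item that comes later in the item order. Treating the candidate tree as a plain set-enumeration tree is an assumption made here. The per-node label from the figure is not modeled, and the names `Node`, `expand`, and `count_nodes` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    identity: Tuple[int, ...]  # the identity set of the node, kept in item order
    children: List["Node"] = field(default_factory=list)

def expand(identity: Tuple[int, ...], items: Tuple[int, ...]) -> Node:
    """Build the fully expanded subtree rooted at the given identity set."""
    node = Node(identity)
    # Only items after the last item of the identity set may extend it,
    # so every itemset appears exactly once in the tree.
    start = items.index(identity[-1]) + 1 if identity else 0
    for item in items[start:]:
        node.children.append(expand(identity + (item,), items))
    return node

def count_nodes(node: Node) -> int:
    return 1 + sum(count_nodes(child) for child in node.children)

root = expand((), (1, 2, 3, 4))
print(count_nodes(root))                                # nodes including the empty root
print([c.identity for c in root.children[0].children])  # children of node {1}
```

The tree has one node per subset of the four items, 16 including the empty root, which is what makes pruning whole subtrees worthwhile.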

14 Algorithm (2/2)

15 Experimental Results (1/4) Sizes of different rule sets

16 Experimental Results (2/4) Generating time for different rule sets

17 Experimental Results (3/4) The number of database scans

18 Experimental Results (4/4) The number of candidate nodes

19 Conclusions The informative rule set reduces the rule set needed for prediction sequences. A direct algorithm mines the informative rule set efficiently without generating all frequent itemsets first. It uses fewer candidates and fewer database accesses.