1 On Mining General Temporal Association Rules in a Publication Database Chang-Hung Lee, Cheng-Ru Lin and Ming-Syan Chen, Proceedings of the 2001 IEEE.


1 On Mining General Temporal Association Rules in a Publication Database Chang-Hung Lee, Cheng-Ru Lin and Ming-Syan Chen, Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM’01), 29 Nov.-2 Dec. 2001, pp. 337–344. Advisor : Jia-Ling Koh Speaker : Chen-Yi Lin Department of Information & Computer Education, NTNU

2 Outline
–Introduction
–Problem Description
–General Temporal Association Rules (Progressive Partition Miner, PPM)
–Experimental Results
–Conclusions

3 Introduction (1/3)
A publication database is a set of transactions in which each transaction T is a set of items, and each item has its own individual exhibition period.

4 Introduction (2/3)

Bookstore Transaction Database
  Date    TID  Itemset
  Jan-01  T1   BD
          T2   BCD
          T3   BC
          T4   AD
  Feb-01  T5   BCE
          T6   DE
          T7   ABC
          T8   CDE
  Mar-01  T9   BCEF
          T10  BF
          T11  AD
          T12  BDF

Item Information
  Item  Publication Date
  A     Jan-95
  B     Apr-96
  C     July-97
  D     Aug-00
  E     Feb-01
  F     Mar-01

min_sup=30%, min_conf=75%
Frequent itemsets: {B, C, D, E, BC}
Frequent association rule: C => B

5 Introduction (3/3)
The current model of association rule mining is not able to handle the publication database:
–Lack of consideration of the exhibition period of each individual item
–Lack of an equitable support counting basis for each item

6 Problem Description (1/4)

Bookstore Transaction Database D = P1+P2+P3 (time granularity: month)
  Partition  Date    TID  Itemset
  P1         Jan-01  T1   BD
                     T2   BCD
                     T3   BC
                     T4   AD
  P2         Feb-01  T5   BCE
                     T6   DE
                     T7   ABC
                     T8   CDE
  P3         Mar-01  T9   BCEF
                     T10  BF
                     T11  AD
                     T12  BDF

db^{1,3} covers P1 through P3, db^{2,3} covers P2 through P3, and db^{3,3} covers P3 only.

7 Problem Description (2/4)
Maximal common exhibition period (MCP) of items
–MCP(x): the MCP value of item x
–For example: MCP(C)=(1, 3) and MCP(E)=(2, 3) => MCP(CE)=(2, 3)
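The MCP example above can be sketched in a few lines. This is only an illustration: the function name `mcp` and the `START` table are hypothetical, and the sketch assumes (as the examples on this slide suggest) that every item stays on exhibition until the last partition, so the MCP of an itemset starts at the latest start partition among its items.

```python
# Assumed start partitions for the bookstore example: items published before
# Jan-01 start in partition 1, E (Feb-01) in partition 2, F (Mar-01) in 3.
START = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 2, "F": 3}
N_PARTITIONS = 3  # P1, P2, P3


def mcp(itemset, start=START, n=N_PARTITIONS):
    """Return the maximal common exhibition period (t, n) of an itemset."""
    # The common period begins when the last item of the set appears.
    return (max(start[item] for item in itemset), n)


print(mcp("C"))   # (1, 3)
print(mcp("E"))   # (2, 3)
print(mcp("CE"))  # (2, 3)
```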

8 Problem Description (3/4)
An association rule (X => Y)^{MCP(XY)} is called a general temporal association rule.
(X => Y)^{MCP(XY)} is frequent if and only if
–supp((X => Y)^{MCP(XY)}) >= min_supp
–and conf((X => Y)^{MCP(XY)}) >= min_conf
–For example: (C => E)^{2,3} is a general temporal association rule. In db^{2,3} (8 transactions), CE occurs 3 times and C occurs 4 times, so supp = 3/8 = 37.5% and conf = 3/4 = 75%. With min_supp=30% and min_conf=75%, (C => E)^{2,3} is frequent.
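As a rough check of the (C => E)^{2,3} example, the sketch below counts support and confidence over the sub-database db^{2,3} of the bookstore data. The helper names (`sub_db`, `occurrences`, `supp_conf`) are made up for this illustration, and support is taken relative to the size of db^{t,n}, which is the reading that makes the slide's example come out as frequent.

```python
# Bookstore transactions from the slides, grouped by partition P1..P3.
PARTITIONS = {
    1: ["BD", "BCD", "BC", "AD"],
    2: ["BCE", "DE", "ABC", "CDE"],
    3: ["BCEF", "BF", "AD", "BDF"],
}


def sub_db(t, n=3):
    """Transactions of db^{t,n}: partitions P_t through P_n."""
    return [set(tx) for p in range(t, n + 1) for tx in PARTITIONS[p]]


def occurrences(itemset, db):
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for tx in db if set(itemset) <= tx)


def supp_conf(x, y, t, n=3):
    """Support and confidence of (x => y)^{t,n}, both measured on db^{t,n}."""
    db = sub_db(t, n)
    both = occurrences(set(x) | set(y), db)
    return both / len(db), both / occurrences(x, db)


supp, conf = supp_conf("C", "E", t=2)
print(supp, conf)  # 0.375 0.75
```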

9 Problem Description (4/4)
When a maximal temporal k-itemset X^{t,n} is frequent in data set db^{t,n}, each of its corresponding sub-itemsets is also frequent in db^{t,n}.
–For example, in the bookstore data: BC^{1,3} is frequent => B^{1,3} and C^{1,3} are also frequent.

10 General Temporal Association Rules (1/6)

Bookstore Transaction Database D
  Partition  Date    TID  Itemset
  P1         Jan-01  T1   BD
                     T2   BCD
                     T3   BC
                     T4   AD
  P2         Feb-01  T5   BCE
                     T6   DE
                     T7   ABC
                     T8   CDE
  P3         Mar-01  T9   BCEF
                     T10  BF
                     T11  AD
                     T12  BDF

min_sup=30%, min_conf=75%

11 General Temporal Association Rules (2/6)
Scan DB (1): candidate 2-itemsets (C2) after each partition is processed, with the partition in which counting started and the count so far.

P1
  C2  start  count
  BD  1      2
  BC  1      2
  CD  1      1
  AD  1      1

P1+P2
  C2  start  count
  BD  1      2
  BC  1      4
  BE  2      1
  CE  2      2
  DE  2      2
  AB  2      1
  AC  2      1
  CD  2      1

P1+P2+P3
  C2  start  count
  BC  1      5
  CE  2      3
  DE  2      2
  BE  3      1
  BF  3      3
  CF  3      1
  EF  3      1
  AD  3      1
  BD  3      1
  DF  3      1
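The progressive counting shown in these tables can be approximated with the sketch below. It is a simplified reading of the first scan, not the paper's full PPM procedure: the function name `first_scan` is hypothetical, and the pruning rule (drop a 2-itemset once its count falls below min_sup times the number of transactions seen since its start partition) is an assumption that happens to reproduce the start and count values on the slide.

```python
from itertools import combinations

# Bookstore transactions from the slides, one list per partition P1..P3.
PARTITIONS = [
    ["BD", "BCD", "BC", "AD"],
    ["BCE", "DE", "ABC", "CDE"],
    ["BCEF", "BF", "AD", "BDF"],
]


def first_scan(partitions, min_sup):
    """Return surviving candidate 2-itemsets as {itemset: (start, count)}."""
    candidates = {}  # itemset -> [start partition, count since start]
    sizes = []       # number of transactions in each partition seen so far
    for h, part in enumerate(partitions, start=1):
        sizes.append(len(part))
        for tx in part:
            for pair in combinations(sorted(tx), 2):
                key = "".join(pair)
                if key in candidates:
                    candidates[key][1] += 1
                else:
                    candidates[key] = [h, 1]  # counting starts in partition h
        # Prune itemsets that are not frequent over the partitions since start.
        for key in list(candidates):
            start, count = candidates[key]
            seen = sum(sizes[start - 1:h])
            if count < min_sup * seen:
                del candidates[key]
    return {key: tuple(value) for key, value in candidates.items()}


print(first_scan(PARTITIONS, 0.30))
# {'BC': (1, 5), 'CE': (2, 3), 'BF': (3, 3)}
```

An itemset that is pruned and later reappears restarts with a new start partition, which is why BD shows start 1 in the first two tables and start 3 in the last one.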

12 General Temporal Association Rules (3/6)
After the 1st scan of database D, we have the candidate itemsets (listed as C1 and C2 on the next slide). No candidate k-itemset is generated for k >= 3.

13 General Temporal Association Rules (4/6)
Scan DB (2)

Candidate itemsets (before pruning)
      Itemset      count  SR
  C1  {B^{1,3}}    8      4
      {B^{3,3}}    3      2
      {C^{1,3}}    6      4
      {C^{2,3}}    4      3
      {E^{2,3}}    4      3
      {F^{3,3}}    3      2
  C2  {BC^{1,3}}   5      4
      {BF^{3,3}}   3      2
      {CE^{2,3}}   3      3

Frequent itemsets (after pruning)
      Itemset      count
  L1  {B^{1,3}}    8
      {B^{3,3}}    3
      {C^{1,3}}    6
      {C^{2,3}}    4
      {E^{2,3}}    4
      {F^{3,3}}    3
  L2  {BC^{1,3}}   5
      {BF^{3,3}}   3
      {CE^{2,3}}   3
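The counts in the second scan can be reproduced with the sketch below. It assumes that the second number next to each candidate on the slide is the minimum count required in db^{t,3}, computed as ceil(min_sup × |db^{t,3}|); that interpretation is inferred from the numbers, not stated on the slide. The function name `second_scan` and the candidate encoding (items, start partition) are illustrative.

```python
import math

# Bookstore transactions from the slides, grouped by partition P1..P3.
PARTITIONS = {
    1: ["BD", "BCD", "BC", "AD"],
    2: ["BCE", "DE", "ABC", "CDE"],
    3: ["BCEF", "BF", "AD", "BDF"],
}

# Candidates as (items, start partition t); each is counted over db^{t,3}.
CANDIDATES = [
    ("B", 1), ("B", 3), ("C", 1), ("C", 2), ("E", 2), ("F", 3),
    ("BC", 1), ("BF", 3), ("CE", 2),
]


def second_scan(candidates, partitions, min_sup_pct, n=3):
    """Map each candidate to (count, required count, is_frequent)."""
    result = {}
    for items, t in candidates:
        db = [set(tx) for p in range(t, n + 1) for tx in partitions[p]]
        count = sum(1 for tx in db if set(items) <= tx)
        # Assumed threshold: smallest integer count reaching min_sup in db^{t,n}.
        required = math.ceil(min_sup_pct * len(db) / 100)
        result[(items, t)] = (count, required, count >= required)
    return result


for (items, t), row in second_scan(CANDIDATES, PARTITIONS, 30).items():
    print(f"{items}^({t},3)", row)
```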

14 General Temporal Association Rules (5/6)
After the 2nd scan of database D, the frequent itemsets yield the following rules:

  Rule             Supp.    Conf.
  (B=>C)^{1,3}     41.67%   62.50%
  (C=>B)^{1,3}     41.67%   83.33%
  (B=>F)^{3,3}     75.00%   100.00%
  (F=>B)^{3,3}     75.00%   100.00%
  (C=>E)^{2,3}     37.50%   75.00%
  (E=>C)^{2,3}     37.50%   75.00%

After pruning with min_conf=75%:

  Rule             Supp.    Conf.
  (C=>B)^{1,3}     41.67%   83.33%
  (B=>F)^{3,3}     75.00%   100.00%
  (F=>B)^{3,3}     75.00%   100.00%
  (C=>E)^{2,3}     37.50%   75.00%
  (E=>C)^{2,3}     37.50%   75.00%
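Rule generation from the frequent itemsets can be sketched as follows, using only the counts from the previous slide. This is a minimal illustration limited to 2-itemsets; the names `COUNTS`, `DB_SIZE`, and `generate_rules` are hypothetical, and support is taken relative to the size of db^{t,3}.

```python
# Counts of frequent temporal itemsets, keyed by (items, start partition t).
COUNTS = {
    ("B", 1): 8, ("B", 3): 3, ("C", 1): 6, ("C", 2): 4,
    ("E", 2): 4, ("F", 3): 3,
    ("BC", 1): 5, ("BF", 3): 3, ("CE", 2): 3,
}
# Number of transactions in db^{t,3} for t = 1, 2, 3.
DB_SIZE = {1: 12, 2: 8, 3: 4}


def generate_rules(counts, db_size, min_conf):
    """Return (x, y, t, supp, conf) for every 2-itemset rule meeting min_conf."""
    rules = []
    for (items, t), both in counts.items():
        if len(items) != 2:
            continue  # this sketch only derives rules from 2-itemsets
        for x, y in ((items[0], items[1]), (items[1], items[0])):
            conf = both / counts[(x, t)]
            if conf >= min_conf:
                rules.append((x, y, t, both / db_size[t], conf))
    return rules


for x, y, t, supp, conf in generate_rules(COUNTS, DB_SIZE, 0.75):
    print(f"({x}=>{y})^({t},3)  supp={supp:.2%}  conf={conf:.2%}")
```

With min_conf=75%, only (B=>C)^{1,3} is dropped, since its confidence is 5/8 = 62.5%.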

15 General Temporal Association Rules (6/6)
The flowchart of PPM:
–Partition database based on exhibition periods
–Produce candidate 2-TIs (1st scan of the database)
–Use candidate 2-TIs to produce candidate k-TIs and k-SIs
–Generate frequent k-TIs and k-SIs (2nd scan of the database)
–Rule generation

16 Experimental Results (1/3)
Meaning of various parameters:
  |D|   Number of transactions in the database
  |T|   Average size of the transactions
  |I|   Average size of the maximal frequent itemsets
  |L|   Number of maximal potentially frequent itemsets (default 2000)
  N     Number of items (default 10000)
  |Pi|  Number of transactions in the partition database Pi

17 Experimental Results (2/3)
Relative performance

18 Experimental Results (3/3)
Scaleup performance

19 Conclusions
Algorithm PPM is particularly well suited to efficient mining of transaction databases, video rental store records, library rental records, book rental records, and transactions in electronic commerce.