An Efficient Algorithm for Mining Frequent Sequences by a New Strategy without Support Counting. Ding-Ying Chiu, Yi-Hung Wu, Arbee L.P. Chen. ICDE 2004. Speaker: Ming Jing Tsai

Presentation transcript:

1 An Efficient Algorithm for Mining Frequent Sequences by a New Strategy without Support Counting. Ding-Ying Chiu, Yi-Hung Wu, Arbee L.P. Chen. ICDE 2004. Speaker: Ming Jing Tsai

2 Strategies
- Candidate pruning
- Database partitioning
- Customer reducing
- DISC: Direct Sequence Comparison
  - Reducing the costs of support counting
  - Reducing the decomposition of customer sequences

3 Order of sequences
- Identify the leftmost items located in different transactions in two sequences having common prefixes.
- Examine the leftmost distinct items in alphabetical order.
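The ordering can be sketched in Python. This is a minimal sketch, not the authors' definition: it assumes, based on the orderings visible on slides 7 and 13 (for example (a)(a,c) before (a)(a)(c), and (b,f,g) before (b)(f)(b)), that two sequences are compared at the leftmost position where they differ, first by item and then by transaction number. The names `flatten` and `precedes` are hypothetical.

```python
def flatten(sequence):
    """Flatten a sequence of itemsets into (item, transaction number) pairs.

    Example: [("b", "f"), ("b",)] -> (("b", 1), ("f", 1), ("b", 2)).
    """
    return tuple(
        (item, t)
        for t, itemset in enumerate(sequence, start=1)
        for item in sorted(itemset)
    )


def precedes(alpha, beta):
    """Return True if alpha comes strictly before beta in the assumed order."""
    # Tuple comparison stops at the leftmost differing position. At that
    # position the items are compared alphabetically; the transaction
    # numbers decide only when the items are equal (assumption, see lead-in).
    return flatten(alpha) < flatten(beta)


if __name__ == "__main__":
    # (a)(a,c) versus (a)(a)(c): same items, the third item sits in an
    # earlier transaction in the first sequence.
    print(precedes([("a",), ("a", "c")], [("a",), ("a",), ("c",)]))
```

Representing a sequence as (item, transaction number) pairs lets Python's built-in tuple comparison do the work, so the same key can be passed to `sorted` when a list of sequences has to be ordered.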

4 DISC: frequent k-sequences
CID  Customer sequence          3-minimum subsequence
1    (a,e,g)(b)(h)(f)(c)(b,f)   (a)(b)(b)
2    (b)(d,f)(e)                (b)(d)(e)
3    (b,f,g)                    (b,f,g)
4    (f)(a,g)(b,f,h)(b,f)       (a)(b)(b)

5 3-sorted database
CID  Customer sequence          3-minimum subsequence
1    (a,e,g)(b)(h)(f)(c)(b,f)   (a)(b)(b)
4    (f)(a,g)(b,f,h)(b,f)       (a)(b)(b)
2    (b)(d,f)(e)                (b)(d)(e)
3    (b,f,g)                    (b,f,g)
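A brute-force way to reproduce the 3-minimum subsequences and the 3-sorted database is sketched below. It is an illustration only: the real algorithm avoids enumerating all subsequences, and the comparison order (item first, then transaction number, at the leftmost differing position) is an assumption inferred from the slides. `k_subsequences`, `k_minimum`, and `k_sorted` are hypothetical names.

```python
from itertools import combinations

# Customer sequences from the slide, keyed by CID.
DB = {
    1: [("a", "e", "g"), ("b",), ("h",), ("f",), ("c",), ("b", "f")],
    2: [("b",), ("d", "f"), ("e",)],
    3: [("b", "f", "g")],
    4: [("f",), ("a", "g"), ("b", "f", "h"), ("b", "f")],
}


def k_subsequences(customer, k):
    """Yield every k-item subsequence as (item, transaction number) pairs."""
    flat = [(item, t) for t, itemset in enumerate(customer)
            for item in sorted(itemset)]
    for combo in combinations(flat, k):
        out, number, last_t = [], 0, None
        for item, t in combo:
            if t != last_t:          # a new original transaction starts
                number += 1          # renumber transactions as 1, 2, 3, ...
                last_t = t
            out.append((item, number))
        yield tuple(out)


def k_minimum(customer, k):
    """Smallest k-subsequence of one customer, or None if it is too short."""
    return min(k_subsequences(customer, k), default=None)


def k_sorted(db, k):
    """List of (k-minimum subsequence, CID), sorted by subsequence then CID."""
    entries = [(k_minimum(seq, k), cid) for cid, seq in db.items()]
    return sorted(e for e in entries if e[0] is not None)


if __name__ == "__main__":
    for sub, cid in k_sorted(DB, 3):
        print(cid, sub)
```

Ties between equal minimum subsequences are broken by CID here, which is a choice made for this sketch rather than something the slides specify.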

6 Compare α_1 and α_δ
- k-minimum subsequences in the k-sorted database: α_1 at the first position, α_δ at the δ-th position
- Conditional k-minimum sequence
- α_1 = α_δ: α_1 is frequent; the next potential frequent k-sequence is > α_δ
- α_1 ≠ α_δ: α_1 is not frequent; the next potential frequent k-sequence is ≥ α_δ

7 Re-sorting the 3-sorted database
CID  Customer sequence          3-minimum subsequence
2    (b)(d,f)(e)                (b)(d)(e)
4    (f)(a,g)(b,f,h)(b,f)       (b,f)(b)
3    (b,f,g)                    (b,f,g)
1    (a,e,g)(b)(h)(f)(c)(b,f)   (b)(f)(b)
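The comparison of α_1 with α_δ and the re-sorting step can be sketched as follows. This is a simplified reading of the slides, not the published algorithm: conditional minimum subsequences are found by brute-force enumeration, the Apriori checks against frequent (k-1)-sequences are omitted, and the sequence order is the assumed "item first, then transaction number" comparison. All function names are hypothetical.

```python
from itertools import combinations

DB = {
    1: [("a", "e", "g"), ("b",), ("h",), ("f",), ("c",), ("b", "f")],
    2: [("b",), ("d", "f"), ("e",)],
    3: [("b", "f", "g")],
    4: [("f",), ("a", "g"), ("b", "f", "h"), ("b", "f")],
}


def k_subsequences(customer, k):
    """Yield every k-item subsequence as (item, transaction number) pairs."""
    flat = [(item, t) for t, itemset in enumerate(customer)
            for item in sorted(itemset)]
    for combo in combinations(flat, k):
        out, number, last_t = [], 0, None
        for item, t in combo:
            if t != last_t:
                number, last_t = number + 1, t
            out.append((item, number))
        yield tuple(out)


def conditional_min(customer, k, bound, strict):
    """Smallest k-subsequence that is >= bound (or > bound when strict)."""
    ok = (lambda s: s > bound) if strict else (lambda s: s >= bound)
    return min((s for s in k_subsequences(customer, k) if ok(s)), default=None)


def initial_entries(db, k):
    """k-sorted list of (k-minimum subsequence, CID) pairs."""
    entries = []
    for cid, seq in db.items():
        sub = conditional_min(seq, k, (), False)  # every tuple is >= ()
        if sub is not None:
            entries.append((sub, cid))
    return sorted(entries)


def disc_step(entries, db, k, delta):
    """Compare alpha_1 with alpha_delta once and re-sort the list.

    Returns (re-sorted entries, alpha_1 if it is frequent else None).
    """
    alpha_1, alpha_delta = entries[0][0], entries[delta - 1][0]
    frequent = alpha_1 if alpha_1 == alpha_delta else None
    strict = frequent is not None  # frequent: move strictly past alpha_delta
    updated = []
    for sub, cid in entries:
        if sub < alpha_delta or (strict and sub == alpha_delta):
            sub = conditional_min(db[cid], k, alpha_delta, strict)
        if sub is not None:  # customers with nothing left are dropped
            updated.append((sub, cid))
    return sorted(updated), frequent


def disc(db, k, delta):
    """Frequent k-sequences, found by repeated alpha_1/alpha_delta comparison."""
    entries, found = initial_entries(db, k), []
    while len(entries) >= delta:
        entries, frequent = disc_step(entries, db, k, delta)
        if frequent is not None:
            found.append(frequent)
    return found


if __name__ == "__main__":
    step, _ = disc_step(initial_entries(DB, 3), DB, 3, 3)
    print([cid for _, cid in step])
```

With δ=3 a single step on the 3-sorted database yields the CID order 2, 4, 3, 1, matching the re-sorted table on the slide.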

8 Advantages
- No candidate sequences are generated.
- The cost of decomposing customer sequences is reduced.
- Frequent k-sequences can be discovered directly.

9 DISC_ALL

10 Running example, δ=3
CID  Customer sequence
1    (a,d)(d)(a,g,h)(c)
2    (b)(a)(f)(a,c,e,g)(c)
3    (a,g)
4    (a,f,g)(a,e,g,h)(c,g,h)
5    (b,f)(b,e)(e,f,h)
6    (d,f)(d,f,g,h)
7    (b,f,g)(c,e,h)
Item supports: a 4, b 3, c 4, d 2, e 4, f 5, g 6, h 5
First-level partition (figure labels): (a) (b) (a) (b) (d)
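The item supports and a first-level partitioning of the running example can be reproduced with a short sketch. It assumes that each customer sequence is placed in the partition of its smallest item, which is consistent with the (a), (b), and (d) labels on the slide but is not spelled out there. `item_supports` and `first_level_partitions` are hypothetical names.

```python
from collections import defaultdict

# Customer sequences of the running example, keyed by CID.
DB = {
    1: [("a", "d"), ("d",), ("a", "g", "h"), ("c",)],
    2: [("b",), ("a",), ("f",), ("a", "c", "e", "g"), ("c",)],
    3: [("a", "g")],
    4: [("a", "f", "g"), ("a", "e", "g", "h"), ("c", "g", "h")],
    5: [("b", "f"), ("b", "e"), ("e", "f", "h")],
    6: [("d", "f"), ("d", "f", "g", "h")],
    7: [("b", "f", "g"), ("c", "e", "h")],
}


def item_supports(db):
    """Number of customers whose sequence contains each item."""
    supports = defaultdict(int)
    for seq in db.values():
        for item in set().union(*seq):  # each item counted once per customer
            supports[item] += 1
    return dict(supports)


def first_level_partitions(db):
    """Group CIDs by the minimum item of their sequence (assumed rule)."""
    partitions = defaultdict(list)
    for cid, seq in db.items():
        partitions[min(set().union(*seq))].append(cid)
    return dict(partitions)


if __name__ == "__main__":
    print(sorted(item_supports(DB).items()))
    print(first_level_partitions(DB))
```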

11 First-level partition 1, λ=a, δ=3
CID  Customer sequence
1    (a,d)(d)(a,g,h)(c)
2    (b)(a)(f)(a,c,e,g)(c)
3    (a,g)
4    (a,f,g)(a,e,g,h)(c,g,h)
Counting arrays (rows Sup and Last_CID): one over (a) (b) (c) (d) (e) (f) (g) (h), one over (_a) (_b) (_c) (_d) (_e) (_f) (_g) (_h)
Frequent 2-sequences: (a)(a), (a)(c), (a)(g), (a,g)
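The counting arrays can be sketched as a table that stores a support count and the last CID that contributed to it, so that each customer is counted at most once per 2-sequence. This is an assumed reading of the Sup and Last_CID rows: the key ("seq", x) stands for (λ)(x) and ("set", x) for (λ,x), corresponding to the (x) and (_x) columns. The function name and key encoding are hypothetical.

```python
# Customer sequences in the partition for lambda = "a", keyed by CID.
PARTITION = {
    1: [("a", "d"), ("d",), ("a", "g", "h"), ("c",)],
    2: [("b",), ("a",), ("f",), ("a", "c", "e", "g"), ("c",)],
    3: [("a", "g")],
    4: [("a", "f", "g"), ("a", "e", "g", "h"), ("c", "g", "h")],
}


def count_two_sequences(partition, lam):
    """Return {2-sequence key: [sup, last_cid]} for sequences starting with lam.

    Every customer in the partition is assumed to contain lam.
    """
    table = {}

    def bump(key, cid):
        entry = table.setdefault(key, [0, None])
        if entry[1] != cid:      # Last_CID check: count each customer once
            entry[0] += 1
            entry[1] = cid

    for cid, seq in partition.items():
        # Min point: the first transaction that contains lam.
        min_point = next(i for i, itemset in enumerate(seq) if lam in itemset)
        for i, itemset in enumerate(seq):
            if i < min_point:
                continue
            for x in itemset:
                if i > min_point:
                    bump(("seq", x), cid)   # (lam)(x): x after the min point
                if lam in itemset and x != lam:
                    bump(("set", x), cid)   # (lam, x): x together with lam
    return table


def frequent_two_sequences(partition, lam, delta):
    """Keep the 2-sequences whose support reaches delta."""
    table = count_two_sequences(partition, lam)
    return {key: sup for key, (sup, _) in table.items() if sup >= delta}


if __name__ == "__main__":
    print(frequent_two_sequences(PARTITION, "a", 3))
```

For λ=a and δ=3 the surviving keys correspond to (a)(a), (a)(c), (a)(g), and (a,g), the frequent 2-sequences listed on the slide.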

12 Whether an item x to the right of the min point can be removed
- Condition 1: the transaction containing x also contains λ.
- Condition 2: the min point is to the left of the transaction containing x.
- x can be removed in any of the following cases:
  - Condition 1 does not hold, and [...] is not frequent.
  - Condition 1 holds, condition 2 does not hold, and [...] is not frequent.
  - Conditions 1 and 2 both hold, and [...] and [...] are not frequent.

13 DISC, λ=(a), δ=3
Columns: CID, 3-minimum subsequence, customer sequence, Apriori pointer
CID  3-minimum subsequence  Customer sequence
1    (a)(a)(c)              (a)(a,g)(c)
2    (a)(a,c)               (b)(a)(a,c,g)(c)
4    (a)(a)(c)              (a,g)(a,g)(c,g)
The 2-sorted list (No., frequent 2-sequence): 1 (a)(a); 2 (a)(c); 3 (a)(g); 4 (a,g)
3-order DB (CID): 2 (a)(a,c); 1 (a)(a)(c); 4 (a)(a,g)
3-order DB (CID): 1 (a)(a)(c); 4; 2 (a)(a,g)
Frequent 3-sequences: (a)(a,g) removed (a)(c,g) 2 2 2

14 Bi-level
Counting arrays (rows Sup and Last_CID): one over (a) (b) (c) (d) (e) (f) (g) (h), one over (_a) (_b) (_c) (_d) (_e) (_f) (_g) (_h)
CID  Customer sequence
1    (a)(a,g)(c)
2    (b)(a)(a,c,g)(c)
4    (a,g)(a,g)(c,g)
Frequent 4-sequence: (a)(a,g)(c)

15 First-level partition 2
CID  Customer sequence          First-level partitioning
1    (a,d)(d)(a,g,h)(c)         (c)
2    (b)(a)(f)(a,c,e,g)(c)      (b)
3    (a,g)                      removed
4    (a,f,g)(a,e,g,h)(c,g,h)    (c)
5    (b,f)(b,e)(e,f,h)          (b)
6    (d,f)(d,f,g,h)             (d)
7    (b,f,g)(c,e,h)             (b)

16 Experiment
- Intel P4 2.8 GHz with 512 MB main memory, Windows XP
- IBM data generator
- Compared with PrefixSpan and its pseudo-projection version, named Pseudo

17 Parameter

18 Different database size δ=

19 Different minimum supports: DB=10k, Slen=8, Tlen=8, Seq.patlen=8

20 Multi-level partitioning, DB=10k
$\mathrm{NRR}_Q = \frac{1}{N_Q} \sum_{P} \frac{\mathrm{Size}_P}{\mathrm{Size}_Q}$, where P is a child partition of Q
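As a small arithmetic illustration of the NRR formula, the sketch below averages the ratios Size_P / Size_Q over the child partitions of Q. The slide does not define N_Q; taking it to be the number of child partitions is an assumption made here, and the sizes in the example are hypothetical values, not measurements from the experiments.

```python
def nrr(parent_size, child_sizes):
    """NRR_Q = (1 / N_Q) * sum(Size_P / Size_Q) over child partitions P."""
    n_q = len(child_sizes)  # assumption: N_Q is the number of child partitions
    return sum(size_p / parent_size for size_p in child_sizes) / n_q


if __name__ == "__main__":
    # Hypothetical sizes: a partition of 100 sequences with children of 50 and 25.
    print(nrr(100, [50, 25]))  # (0.5 + 0.25) / 2 = 0.375
```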

21 Dynamic DISC-all: Customers = 50k, Items = 1000, θ: number of transactions per customer

22 Compare on different θ