Multi-dimensional Sequential Pattern Mining. Helen Pinto, Jiawei Han, Jian Pei, Ke Wang, Qiming Chen, Umeshwar Dayal. From: 10th ACM International Conference on Information and Knowledge Management (CIKM 2001).


1 Multi-dimensional Sequential Pattern Mining. Helen Pinto, Jiawei Han, Jian Pei, Ke Wang, Qiming Chen, Umeshwar Dayal. From: 10th ACM International Conference on Information and Knowledge Management (CIKM 2001), Atlanta. Presented by 阮士峰, second-year student, in-service master's program.

2 Outline
- Why multidimensional sequential pattern mining?
- Problem definition
- UniSeq
- Algorithms Dim-Seq and Seq-Dim
- Experimental results
- Conclusions

3 Why Sequential Pattern Mining?
Sequential pattern mining finds time-related frequent patterns (frequent subsequences). Many data and applications are time-related:
- customer shopping patterns and telephone calling patterns
- natural disasters (e.g., earthquakes, hurricanes)
- disease and treatment
- stock market fluctuations
- Weblog click-stream analysis
- DNA sequence analysis

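4 Sequential Pattern: Basics
A sequence database is a table of (Seq. ID, sequence) pairs. A sequence such as <…> is an ordered list of elements, and each element is a set of items. A sequence <…> is a subsequence of <…> when each of its elements is contained in a distinct element of the longer sequence, in the same order. Given support threshold min_sup = 2, <…> is a sequential pattern, since at least two sequences in the database contain it.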
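The definitions on this slide can be checked with a short Python sketch. It represents each element as a frozenset of single-character items; the helper names (`seq`, `is_subsequence`, `support`) and the toy database are hypothetical, chosen only for illustration, and are not the slide's own example.

```python
from typing import FrozenSet, List, Sequence

Element = FrozenSet[str]  # one element = a set of items


def seq(*elements: str) -> List[Element]:
    """Build a sequence from strings, e.g. seq("a", "bc") stands for <a(bc)>."""
    return [frozenset(e) for e in elements]


def is_subsequence(alpha: Sequence[Element], beta: Sequence[Element]) -> bool:
    """True if there are indices j1 < j2 < ... with alpha[i] a subset of beta[j_i].

    Matching each element of alpha to the earliest possible element of beta
    (greedy) never misses a valid embedding.
    """
    j = 0
    for a in alpha:
        while j < len(beta) and not a <= beta[j]:
            j += 1
        if j == len(beta):
            return False
        j += 1  # the next element of alpha must map to a strictly later element
    return True


def support(alpha: Sequence[Element], db: Sequence[Sequence[Element]]) -> int:
    """Number of sequences in db that contain alpha as a subsequence."""
    return sum(1 for s in db if is_subsequence(alpha, s))


# Hypothetical toy database, one sequence per Seq. ID.
db = [
    seq("a", "abc", "ac", "d", "cf"),
    seq("ad", "c", "bc", "ae"),
    seq("ef", "ab", "df", "c", "b"),
    seq("e", "g", "af", "c", "b", "c"),
]
min_sup = 2
print(support(seq("ab", "c"), db) >= min_sup)  # True: support is 2
```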

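5 Multi-dimensional Sequence Database

cid | Cust_grp     | City     | Age_grp | sequence
10  | Business     | Boston   | Middle  | <…>
20  | Professional | Chicago  | Young   | <…>
30  | Business     | Chicago  | Middle  | <…>
40  | Education    | New York | Retired | <…>

Here * matches any value of a dimension. If min_sup = 2, P = (*, Chicago, *, <…>) is an MD sequential pattern: it matches tuples 20 and 30.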
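A minimal Python sketch of how an MD pattern matches a tuple: each dimension value must be equal or `*`, and the pattern's sequence must be a subsequence of the tuple's sequence. The dimension values come from the table; the sequences are hypothetical, picked so that (*, Chicago, *, <b f>) matches tuples 20 and 30, as P does on the slide. Names such as `matches`, `S`, and `WILDCARD` are illustrative.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

WILDCARD = "*"
Element = FrozenSet[str]


def S(*elements: str) -> List[Element]:
    """S("bf", "ce") stands for the sequence <(bf)(ce)>."""
    return [frozenset(e) for e in elements]


def is_subsequence(alpha: Sequence[Element], beta: Sequence[Element]) -> bool:
    j = 0
    for a in alpha:
        while j < len(beta) and not a <= beta[j]:
            j += 1
        if j == len(beta):
            return False
        j += 1
    return True


def matches(pattern: Tuple, row: Tuple) -> bool:
    """pattern = (v1, ..., vm, alpha), row = (d1, ..., dm, sequence).

    Every dimension value must be '*' or equal to the row's value, and alpha
    must be a subsequence of the row's sequence.
    """
    *p_dims, alpha = pattern
    *r_dims, sequence = row
    dims_ok = all(p == WILDCARD or p == d for p, d in zip(p_dims, r_dims))
    return dims_ok and is_subsequence(alpha, sequence)


# Dimension values from the table above; the sequences are hypothetical.
rows: Dict[int, Tuple] = {
    10: ("Business", "Boston", "Middle", S("bd", "c", "b", "a")),
    20: ("Professional", "Chicago", "Young", S("bf", "ce", "fg")),
    30: ("Business", "Chicago", "Middle", S("ah", "a", "b", "f")),
    40: ("Education", "New York", "Retired", S("be", "ce")),
}

P = (WILDCARD, "Chicago", WILDCARD, S("b", "f"))
matching = [cid for cid, row in rows.items() if matches(P, row)]
print(matching)  # [20, 30] -> support 2, so P is frequent for min_sup = 2
```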

6 Problem Definition
Sequential patterns are useful, for example: "try a 100-hour free Internet access package" → "subscribe to the 15 hours/month package" → "upgrade to the 30 hours/month package" → "upgrade to the unlimited package". They support marketing, product design, and product development.
Problem: plain sequential patterns lack focus, because different groups of customers may have different patterns.
MD sequential pattern mining integrates multi-dimensional analysis and sequential pattern mining.

7 UniSeq
Embed the MD information into the sequences, turning Table 1 (SDB, the database on slide 5) into Table 2 (SDB_MD), which lists each cid with the MD-extension of its sequence. Then mine the extended sequence database with a sequential pattern mining method.
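One way to realize the embedding, sketched in Python under two assumptions not stated on the slide: the MD values of a tuple become one extra element placed in front of its sequence, and each value is tagged with its dimension name (for example `City=Chicago`) so that equal strings in different dimensions cannot collide. The sequences and the names `md_extend` and `S` are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

Element = FrozenSet[str]
DIMENSIONS = ("Cust_grp", "City", "Age_grp")


def S(*elements: str) -> List[Element]:
    return [frozenset(e) for e in elements]


def md_extend(dims: Sequence[str], sequence: List[Element]) -> List[Element]:
    """Prepend the tuple's dimension values as a single (tagged) element."""
    md_element = frozenset(f"{name}={value}" for name, value in zip(DIMENSIONS, dims))
    return [md_element] + list(sequence)


def is_subsequence(alpha: Sequence[Element], beta: Sequence[Element]) -> bool:
    j = 0
    for a in alpha:
        while j < len(beta) and not a <= beta[j]:
            j += 1
        if j == len(beta):
            return False
        j += 1
    return True


# SDB: dimension values from slide 5, hypothetical sequences.
sdb: List[Tuple[Tuple[str, str, str], List[Element]]] = [
    (("Business", "Boston", "Middle"), S("bd", "c", "b", "a")),
    (("Professional", "Chicago", "Young"), S("bf", "ce", "fg")),
    (("Business", "Chicago", "Middle"), S("ah", "a", "b", "f")),
    (("Education", "New York", "Retired"), S("be", "ce")),
]
sdb_md = [md_extend(dims, s) for dims, s in sdb]

# The MD pattern (*, Chicago, *, <b f>) becomes the plain sequence <(City=Chicago) b f>.
pattern = [frozenset({"City=Chicago"})] + S("b", "f")
print(sum(is_subsequence(pattern, s) for s in sdb_md))  # 2
```

Because a `*` simply means that no item of that dimension appears in the pattern's first element, an ordinary sequential pattern miner can run on SDB_MD unchanged.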

8 UniSeq (cont.)
The sequence database SDB_MD can be mined with PrefixSpan. In its first scan of the database, PrefixSpan finds all the single-item frequent sequences: <…>:2, <…>:2, <…>:2, <…>:2, <…>:4, <…>:3, <…>:2, and <…>:2. The complete set of sequential patterns can then be partitioned into 8 subsets, one for each of these prefixes.
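PrefixSpan's first scan counts, for each item, how many sequences contain it. Below is a Python sketch over a hypothetical SDB_MD built with an assumed encoding (MD values as one leading element, tagged with their dimension names); `frequent_items`, `md`, and `S` are illustrative names.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

Element = FrozenSet[str]


def S(*elements: str) -> List[Element]:
    return [frozenset(e) for e in elements]


def md(cust: str, city: str, age: str) -> Element:
    """Hypothetical tagged MD element placed at the front of each sequence."""
    return frozenset({f"Cust_grp={cust}", f"City={city}", f"Age_grp={age}"})


def frequent_items(db: List[List[Element]], min_sup: int) -> Dict[str, int]:
    """First PrefixSpan scan: support of every length-1 sequence <x>.

    An item is counted at most once per sequence, however often it occurs.
    """
    counts: Counter = Counter()
    for sequence in db:
        counts.update(set().union(*sequence))
    return {item: n for item, n in counts.items() if n >= min_sup}


# Hypothetical SDB_MD (dimension values from slide 5, made-up sequences).
sdb_md = [
    [md("Business", "Boston", "Middle")] + S("bd", "c", "b", "a"),
    [md("Professional", "Chicago", "Young")] + S("bf", "ce", "fg"),
    [md("Business", "Chicago", "Middle")] + S("ah", "a", "b", "f"),
    [md("Education", "New York", "Retired")] + S("be", "ce"),
]
for item, n in sorted(frequent_items(sdb_md, 2).items()):
    print(f"<{item}>:{n}")
```

With min_sup = 2 this toy data yields eight frequent length-1 sequences, and each one heads one subset of the partitioned search space.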

9 UniSeq (cont.)
Example: the <…>-projected database contains two postfix sequences, <…> and <…>. PrefixSpan outputs the sequential pattern <…> and then mines this projected database. Its frequent items are <…> and <…>, which form the sequential patterns <…>:2 and <…>:2, respectively. Likewise, the <…>-projected database contains postfix sequences for <…> and <…>, with one frequent item between them; mining it finds <…>:2, which corresponds to the MD sequential pattern (*, Chicago, *, <…>).

10 Mine Sequential Patterns by Prefix Projections
Step 1: find the length-1 sequential patterns <…>, <…>, <…>, <…>, <…>, <…>.
Step 2: divide the search space. The complete set of sequential patterns can be partitioned into 6 subsets: the ones having prefix <…>; …; the ones having prefix <…>.

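11 Find Seq. Patterns with Prefix <…>
Only the projections with respect to <…> need to be considered. The <…>-projected database is <…>, <…>, <…>, <…>. Find all the length-2 sequential patterns having prefix <…>: <…>, <…>, <…>, <…>, <…>, <…>. Then further partition them into 6 subsets: those having prefix <…>; …; those having prefix <…>.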
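The projection step can be sketched in Python for a length-1 prefix <x>. The sketch keeps the remainder of the matched element separately (the "_" part in PrefixSpan's notation), because items there, and items in later elements that also contain x, extend the prefix within the same element. The toy database and the names `project` and `count_extensions` are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Optional, Tuple

Element = FrozenSet[str]
# A postfix: the rest of the matched element (PrefixSpan's "_x" notation),
# followed by all later elements.
Postfix = Tuple[Element, List[Element]]


def S(*elements: str) -> List[Element]:
    return [frozenset(e) for e in elements]


def project(sequence: List[Element], item: str) -> Optional[Postfix]:
    """<item>-projection: cut the sequence at the first element containing item."""
    for i, element in enumerate(sequence):
        if item in element:
            rest_of_element = frozenset(x for x in element if x > item)
            return rest_of_element, sequence[i + 1:]
    return None  # the sequence does not contain item


def count_extensions(pdb: List[Postfix], item: str) -> Tuple[Counter, Counter]:
    """Supports of length-2 patterns with prefix <item>.

    seq_ext[x] counts <item x>: x occurs in a later element.
    set_ext[x] counts <(item x)>: x is in the "_" part, or in a later
    element that also contains item.
    """
    seq_ext, set_ext = Counter(), Counter()
    for partial, rest in pdb:
        s_items, i_items = set(), set(partial)
        for element in rest:
            s_items |= element
            if item in element:
                i_items |= {x for x in element if x > item}
        seq_ext.update(s_items)  # each sequence counts at most once per item
        set_ext.update(i_items)
    return seq_ext, set_ext


# Hypothetical toy database.
db = [
    S("a", "abc", "ac", "d", "cf"),
    S("ad", "c", "bc", "ae"),
    S("ef", "ab", "df", "c", "b"),
    S("e", "g", "af", "c", "b", "c"),
]
pdb_a = [p for p in (project(s, "a") for s in db) if p is not None]
seq_ext, set_ext = count_extensions(pdb_a, "a")
print(sorted(x for x, n in seq_ext.items() if n >= 2))  # ['a', 'b', 'c', 'd', 'f']
print(sorted(x for x, n in set_ext.items() if n >= 2))  # ['b']
```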

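12 Completeness of PrefixSpan
The search forms a tree rooted at SDB. From SDB, PrefixSpan finds the length-1 sequential patterns <…>, <…>, <…>, <…>, <…>, <…>. For each of them, e.g. <…>, it builds the <…>-projected database and finds the length-2 sequential patterns having that prefix, <…>, …, <…>; each of those gets its own projected database in turn, and likewise for the other length-1 prefixes. Because every sequential pattern has exactly one prefix of each shorter length, the subsets never overlap and together cover every pattern, so no pattern is missed or found twice.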
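The recursive partitioning can be condensed into a compact Python sketch of PrefixSpan for sequences of itemsets. It assumes items are comparable strings kept in sorted order within an element, grows each prefix either by a sequence extension (a new element) or an itemset extension (an item added to the last element), and recurses only on frequent extensions. It is an illustrative reimplementation, not the authors' code, and the toy database is hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Element = FrozenSet[str]
Postfix = Tuple[Element, List[Element]]  # ("_" part, later elements)
Pattern = Tuple[Tuple[str, ...], ...]    # e.g. (("a", "b"), ("c",)) = <(ab)c>


def S(*elements: str) -> List[Element]:
    return [frozenset(e) for e in elements]


def _project_seq(postfix: Postfix, x: str) -> Optional[Postfix]:
    """Sequence extension: first later element that contains x."""
    _, rest = postfix
    for i, element in enumerate(rest):
        if x in element:
            return frozenset(y for y in element if y > x), rest[i + 1:]
    return None


def _project_set(postfix: Postfix, last: Set[str], x: str) -> Optional[Postfix]:
    """Itemset extension: x in the "_" part, or first later element with last and x."""
    partial, rest = postfix
    if x in partial:
        return frozenset(y for y in partial if y > x), rest
    for i, element in enumerate(rest):
        if last <= element and x in element:
            return frozenset(y for y in element if y > x), rest[i + 1:]
    return None


def prefixspan(db: List[List[Element]], min_sup: int) -> Dict[Pattern, int]:
    results: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, pdb: List[Postfix]) -> None:
        last = set(prefix[-1]) if prefix else set()
        seq_ext, set_ext = Counter(), Counter()
        for partial, rest in pdb:
            s_items, i_items = set(), set(partial)
            for element in rest:
                s_items |= element
                if last and last <= element:
                    i_items |= {y for y in element if y > max(last)}
            seq_ext.update(s_items)
            set_ext.update(i_items)
        # Each frequent extension defines one disjoint subset of the search space.
        for x, sup in sorted(set_ext.items()):
            if sup >= min_sup:
                new = prefix[:-1] + (prefix[-1] + (x,),)
                results[new] = sup
                projected = [_project_set(p, last, x) for p in pdb]
                mine(new, [p for p in projected if p is not None])
        for x, sup in sorted(seq_ext.items()):
            if sup >= min_sup:
                new = prefix + ((x,),)
                results[new] = sup
                projected = [_project_seq(p, x) for p in pdb]
                mine(new, [p for p in projected if p is not None])

    mine((), [(frozenset(), list(s)) for s in db])
    return results


db = [  # hypothetical toy database
    S("a", "abc", "ac", "d", "cf"),
    S("ad", "c", "bc", "ae"),
    S("ef", "ab", "df", "c", "b"),
    S("e", "g", "af", "c", "b", "c"),
]
patterns = prefixspan(db, min_sup=2)
print(patterns[(("a", "b"), ("c",))])  # 2, i.e. <(ab)c> has support 2
```

Every pattern is produced exactly once, from its unique shorter prefix, which mirrors the completeness argument on this slide.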

13 Efficiency of PrefixSpan
- No candidate sequence needs to be generated.
- Projected databases keep shrinking.
- The major cost of PrefixSpan is constructing the projected databases.
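Since building projected databases dominates the cost, the PrefixSpan paper also describes pseudo-projection: when the database fits in main memory, a projected database is kept as pointers into the original sequences instead of copied postfixes. A minimal Python sketch of the sequence-extension step only; `Pointer`, `partial`, and `project_seq` are hypothetical names, and the toy database is made up.

```python
from typing import FrozenSet, List, NamedTuple, Optional

Element = FrozenSet[str]


class Pointer(NamedTuple):
    """A pseudo-projected postfix of db[sid]: no elements are copied.

    The postfix is the "_" part (items of db[sid][pos - 1] after `after`)
    followed by the elements db[sid][pos:].
    """
    sid: int
    pos: int
    after: Optional[str]  # last matched item; None before anything is matched


def S(*elements: str) -> List[Element]:
    return [frozenset(e) for e in elements]


def partial(db: List[List[Element]], p: Pointer) -> Element:
    """Reconstruct the "_" part on demand."""
    if p.after is None:
        return frozenset()
    return frozenset(y for y in db[p.sid][p.pos - 1] if y > p.after)


def project_seq(db: List[List[Element]], pointers: List[Pointer], x: str) -> List[Pointer]:
    """Sequence extension by x: move each pointer past the first element containing x."""
    out = []
    for p in pointers:
        sequence = db[p.sid]
        for i in range(p.pos, len(sequence)):
            if x in sequence[i]:
                out.append(Pointer(p.sid, i + 1, x))
                break
    return out


db = [  # hypothetical toy database, kept once in memory
    S("a", "abc", "ac", "d", "cf"),
    S("ad", "c", "bc", "ae"),
    S("ef", "ab", "df", "c", "b"),
    S("e", "g", "af", "c", "b", "c"),
]
start = [Pointer(sid, 0, None) for sid in range(len(db))]
proj_a = project_seq(db, start, "a")    # <a>-projected database
proj_ab = project_seq(db, proj_a, "b")  # <ab>-projected database
print(len(proj_ab))  # 4: every sequence contains <ab>
```

Each pointer is three small fields regardless of postfix length, so the projected databases cost little memory; the trade-off is an extra indirection on every scan.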

14 Dim-Seq
First find the MD-patterns, e.g., (*, Chicago, *), in the database on slide 5. For each MD-pattern, form the projected sequence database, e.g., <…> and <…> (the sequences of tuples 20 and 30) for (*, Chicago, *). Then find the sequential patterns in that projected database, e.g., (*, Chicago, *, <…>).
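A compact Python sketch of the Dim-Seq order of work. To stay short it assumes each element holds a single item (so a sequence is a string such as "bcfg"), and it enumerates MD-patterns by brute force rather than with the BUC-like miner the authors use. The dimension values come from slide 5; the sequences and all function names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

# Simplification for this sketch: each element holds exactly one item,
# so a sequence is just a string such as "bcfg".
Rows = List[Tuple[Tuple[str, ...], str]]


def prefixspan_items(db: List[str], min_sup: int,
                     prefix: Tuple[str, ...] = ()) -> Dict[Tuple[str, ...], int]:
    """PrefixSpan restricted to single-item elements."""
    out: Dict[Tuple[str, ...], int] = {}
    counts: Counter = Counter()
    for s in db:
        counts.update(set(s))  # count each item once per sequence
    for x, sup in sorted(counts.items()):
        if sup >= min_sup:
            pattern = prefix + (x,)
            out[pattern] = sup
            projected = [s[s.index(x) + 1:] for s in db if x in s]
            out.update(prefixspan_items(projected, min_sup, pattern))
    return out


def md_patterns(rows: Rows, min_sup: int) -> Dict[Tuple[str, ...], List[str]]:
    """Frequent MD-patterns with their covered sequences (brute force, not BUC)."""
    m = len(rows[0][0])
    domains = [sorted({dims[i] for dims, _ in rows}) + ["*"] for i in range(m)]
    found = {}
    for pat in product(*domains):
        cover = [s for dims, s in rows if all(p in ("*", d) for p, d in zip(pat, dims))]
        if len(cover) >= min_sup:
            found[pat] = cover
    return found


def dim_seq(rows: Rows, min_sup: int) -> Dict[tuple, int]:
    """Dim-Seq: MD-patterns first, then sequential patterns in each projected database."""
    result = {}
    for md, projected in md_patterns(rows, min_sup).items():
        for seq_pat, sup in prefixspan_items(projected, min_sup).items():
            result[(*md, seq_pat)] = sup
    return result


# Dimension values from slide 5; the single-item sequences are hypothetical.
rows: Rows = [
    (("Business", "Boston", "Middle"), "dcba"),
    (("Professional", "Chicago", "Young"), "bcfg"),
    (("Business", "Chicago", "Middle"), "habf"),
    (("Education", "New York", "Retired"), "bce"),
]
patterns = dim_seq(rows, min_sup=2)
print(patterns[("*", "Chicago", "*", ("b", "f"))])  # 2
```

Every frequent MD-pattern triggers a separate sequential-pattern run, which is one plausible reason Dim-Seq scales poorly when many MD-patterns are frequent.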

15 Seq-Dim
First find the sequential patterns, e.g., <…>, in the database on slide 5. For each sequential pattern, form the projected MD-database, e.g., (Professional, Chicago, Young) and (Business, Chicago, Middle) for <…>. Then mine the MD-patterns in it, e.g., (*, Chicago, *, <…>).
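The Seq-Dim order can be sketched in Python too: mine sequential patterns first, remembering which tuples contain each one, then mine MD-patterns among just those tuples. This sketch assumes single-item elements (a sequence is a string) and uses brute-force MD-pattern enumeration in place of the BUC-like miner; the sequences and function names are hypothetical, while the dimension values come from slide 5.

```python
from itertools import product
from typing import Dict, List, Tuple

# Simplification: one item per element, so a sequence is a string.
Rows = List[Tuple[Tuple[str, ...], str]]


def seq_patterns(db: List[Tuple[int, str]], min_sup: int,
                 prefix: Tuple[str, ...] = ()) -> Dict[Tuple[str, ...], List[int]]:
    """Single-item PrefixSpan that also records which rows contain each pattern."""
    out: Dict[Tuple[str, ...], List[int]] = {}
    rows_with: Dict[str, List[int]] = {}
    for rid, s in db:
        for x in set(s):  # each row is recorded at most once per item
            rows_with.setdefault(x, []).append(rid)
    for x, rids in sorted(rows_with.items()):
        if len(rids) >= min_sup:
            pattern = prefix + (x,)
            out[pattern] = rids
            projected = [(rid, s[s.index(x) + 1:]) for rid, s in db if x in s]
            out.update(seq_patterns(projected, min_sup, pattern))
    return out


def md_patterns(dims_list: List[Tuple[str, ...]], min_sup: int) -> Dict[Tuple[str, ...], int]:
    """Frequent MD-patterns by brute force (a stand-in for the BUC-like miner)."""
    m = len(dims_list[0])
    domains = [sorted({d[i] for d in dims_list}) + ["*"] for i in range(m)]
    found = {}
    for pat in product(*domains):
        n = sum(all(p in ("*", v) for p, v in zip(pat, d)) for d in dims_list)
        if n >= min_sup:
            found[pat] = n
    return found


def seq_dim(rows: Rows, min_sup: int) -> Dict[tuple, int]:
    """Seq-Dim: sequential patterns first, then MD-patterns in each projected MD-database."""
    result = {}
    indexed = [(rid, s) for rid, (_, s) in enumerate(rows)]
    for seq_pat, rids in seq_patterns(indexed, min_sup).items():
        projected_md = [rows[rid][0] for rid in rids]
        for md, sup in md_patterns(projected_md, min_sup).items():
            result[(*md, seq_pat)] = sup
    return result


rows: Rows = [  # dimension values from slide 5, hypothetical sequences
    (("Business", "Boston", "Middle"), "dcba"),
    (("Professional", "Chicago", "Young"), "bcfg"),
    (("Business", "Chicago", "Middle"), "habf"),
    (("Education", "New York", "Retired"), "bce"),
]
patterns = seq_dim(rows, min_sup=2)
print(patterns[("*", "Chicago", "*", ("b", "f"))])  # 2
```

Dim-Seq and Seq-Dim return the same (MD-pattern, sequential pattern) pairs; they differ only in which sub-problem is solved first, and therefore in how much work the second step does.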

16 Dim-Seq and Seq-Dim
The multi-dimensional sequential pattern mining problem can be reduced to two sub-problems: sequential pattern mining and MD-pattern mining. As introduced before, sequential pattern mining can be done efficiently by PrefixSpan. For MD-pattern mining, we adopt a BUC-like algorithm.

17 BUC algorithm
Kevin Beyer and Raghu Ramakrishnan. Bottom-up computation of sparse and iceberg CUBEs. Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, Philadelphia, Pennsylvania, United States, May 31-June 3, 1999.

18 Mining MD-Patterns (BUC-like)
BUC processes the lattice of MD-patterns over the database on slide 5 bottom-up, level by level: All; (cust-grp, *, *), (*, city, *), (*, *, age-grp); (cust-grp, city, *), (cust-grp, *, age-grp); (cust-grp, city, age-grp).
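BUC computes an iceberg cube bottom-up: it records the count of the current partition, then for each remaining dimension partitions the tuples by value and recurses only into partitions that meet the threshold. A minimal Python sketch of that idea for MD-patterns, using the dimension values from slide 5; `buc_md_patterns` is an illustrative name, not the authors' implementation.

```python
from typing import Dict, List, Tuple

WILDCARD = "*"


def buc_md_patterns(rows: List[Tuple[str, ...]], min_sup: int) -> Dict[Tuple[str, ...], int]:
    """Bottom-up (BUC-style) computation of all MD-patterns with count >= min_sup."""
    out: Dict[Tuple[str, ...], int] = {}
    m = len(rows[0]) if rows else 0

    def recurse(part: List[Tuple[str, ...]], start: int, pattern: List[str]) -> None:
        out[tuple(pattern)] = len(part)
        for d in range(start, m):
            groups: Dict[str, List[Tuple[str, ...]]] = {}
            for r in part:  # partition the current tuples on dimension d
                groups.setdefault(r[d], []).append(r)
            for value in sorted(groups):
                # Iceberg pruning: a partition below min_sup cannot contain
                # a frequent, more specific pattern, so it is skipped.
                if len(groups[value]) >= min_sup:
                    pattern[d] = value
                    recurse(groups[value], d + 1, pattern)
                    pattern[d] = WILDCARD

    if len(rows) >= min_sup:
        recurse(rows, 0, [WILDCARD] * m)
    return out


dims = [  # (Cust_grp, City, Age_grp) from slide 5
    ("Business", "Boston", "Middle"),
    ("Professional", "Chicago", "Young"),
    ("Business", "Chicago", "Middle"),
    ("Education", "New York", "Retired"),
]
for pattern, count in buc_md_patterns(dims, min_sup=2).items():
    print(pattern, count)
```

Recursing only on dimensions after d means each MD-pattern is visited at most once, and the pruning test keeps the search inside the frequent part of the lattice.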

19 Experimental Results
The experiments were run on a Pentium III PC with 1 GB of main memory, using Microsoft Visual C++. In this dataset, the number of items is set to 10,000 and the number of sequences is 10,000. The average number of items within each element is 2.5, and the average number of elements in one sequence is 8.

20 Scalability Over Dimensionality

21 Scalability Over Cardinality

22 Scalability Over Support Threshold

23 Scalability Over Database Size

24 Pros & Cons of Algorithms
- Seq-Dim is efficient and scalable; it is the fastest in most cases.
- UniSeq is also efficient and scalable; it is the fastest with low dimensionality.
- Dim-Seq has poor scalability.

25 Conclusions
- MD sequential pattern mining is interesting and useful.
- MD sequential patterns can be mined efficiently with UniSeq, Dim-Seq, and Seq-Dim.
- Future work: applications of sequential pattern mining.

End of presentation.

27 References (1)
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB'94.
R. Agrawal and R. Srikant. Mining sequential patterns. ICDE'95.
K. Beyer and R. Ramakrishnan. Bottom-up computation of sparse and iceberg CUBEs. Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, Philadelphia, Pennsylvania, United States, May 31-June 3, 1999.
C. Bettini, X. S. Wang, and S. Jajodia. Mining temporal relationships with multiple granularities in time sequences. Data Engineering Bulletin, 21:32-38.
M. Garofalakis, R. Rastogi, and K. Shim. SPIRIT: Sequential pattern mining with regular expression constraints. VLDB'99.
J. Han, G. Dong, and Y. Yin. Efficient mining of partial periodic patterns in time series database. ICDE'99.
J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu. FreeSpan: Frequent pattern-projected sequential pattern mining. KDD'00.

28 References (2)
J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. SIGMOD'00.
H. Lu, J. Han, and L. Feng. Stock movement and n-dimensional inter-transaction association rules. DMKD'98, pages 12:1-12:7.
H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1.
B. Özden, S. Ramaswamy, and A. Silberschatz. Cyclic association rules. ICDE'98.
J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. ICDE'01.
R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. EDBT'96, pages 3-17.