Sequential Pattern Mining Using A Bitmap Representation

Presentation transcript:

Sequential Pattern Mining Using A Bitmap Representation Authors: Jay Ayres, Johannes Gehrke, Tomi Yiu and Jason Flannick Source: The Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, 2002.

Outline: Introduction. SPAM (Sequential PAttern Mining) algorithm: lexicographic tree for sequences, depth-first tree traversal, pruning, S-step, I-step. Data representation: bitmap.

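S = ({a}, {b, c}) is a sequence. The support of S in database D is SupD(S), the number of customer sequences in D that contain S. S is a frequent sequential pattern if SupD(S) >= Min Support. In the slide's example database, SupD(S) = SupD(({a}, {b, c})) = 2.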
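The support count on this slide can be sketched in a few lines of Python. The database `db` below is a hypothetical toy example (the slide's own table did not survive in the transcript), chosen so that the pattern ({a}, {b, c}) has support 2; the function names `contains` and `support` are illustrative assumptions.

```python
def contains(customer_seq, pattern):
    """Return True if `pattern` is a subsequence of `customer_seq`.

    Both are lists of item sets, ordered by time. Each pattern itemset must
    be a subset of a distinct, later transaction (greedy left-to-right match).
    """
    pos = 0
    for itemset in pattern:
        # Advance to the next transaction that is a superset of this itemset.
        while pos < len(customer_seq) and not itemset <= customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1  # the next pattern itemset must match a strictly later transaction
    return True


def support(db, pattern):
    """Number of customer sequences in `db` that contain `pattern`."""
    return sum(1 for seq in db if contains(seq, pattern))


# Hypothetical toy database: one list of transactions per customer.
db = [
    [{"a", "b", "d"}, {"b", "c", "d"}, {"b", "c", "d"}],
    [{"b"}, {"a", "b", "c"}],
    [{"a", "b"}, {"b", "c", "d"}],
]

S = [{"a"}, {"b", "c"}]
min_support = 2
print(support(db, S), support(db, S) >= min_support)
```

The second customer does not support S because {a} only appears in that customer's last transaction, so no later transaction can supply {b, c}.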

SPAM (Sequential pattern mining): S = ({a, b, c}, {a, b}). Sequence length (number of items): Length(S) = 5. Sequence size (number of itemsets): Size(S) = 2. Sequence-extended sequence: S' = ({a, b, c}, {a, b}, {a}). Itemset-extended sequence: S' = ({a, b, c}, {a, b, d}).
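As a small sketch of these definitions, a sequence can be modelled as a tuple of frozensets. The helper names `length`, `size`, `s_extend`, and `i_extend` are assumptions made for illustration, not names from the presentation.

```python
def size(seq):
    """Sequence size: the number of itemsets."""
    return len(seq)


def length(seq):
    """Sequence length: the total number of items over all itemsets."""
    return sum(len(itemset) for itemset in seq)


def s_extend(seq, item):
    """Sequence extension: append a new itemset {item} at the end."""
    return seq + (frozenset([item]),)


def i_extend(seq, item):
    """Itemset extension: add `item` to the last itemset."""
    return seq[:-1] + (seq[-1] | {item},)


S = (frozenset("abc"), frozenset("ab"))
print(length(S), size(S))   # 5 items in 2 itemsets
print(s_extend(S, "a"))     # ({a, b, c}, {a, b}, {a})
print(i_extend(S, "d"))     # ({a, b, c}, {a, b, d})
```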

SPAM (Sequential pattern mining): [Figure, shown on two consecutive slides: lexicographic sequence tree for Items = {a, b} with Max Size = 3, drawn from Level 1 to Level 6; each edge is either a sequence extension or an itemset extension.]
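The tree in the figure can be regenerated with a short depth-first enumeration. This is a sketch under assumptions: itemsets are stored as sorted tuples, an itemset extension may only add an item larger than the last item of the last itemset (so every sequence is generated once), and the names `children` and `dfs` are illustrative.

```python
def children(seq, items, max_size):
    """Children of `seq` in the lexicographic sequence tree."""
    kids = []
    # Sequence extensions: append a new single-item itemset.
    if len(seq) < max_size:
        for item in items:
            kids.append(seq + ((item,),))
    # Itemset extensions: add a larger item to the last itemset.
    if seq:
        last = seq[-1]
        for item in items:
            if item > last[-1]:
                kids.append(seq[:-1] + (last + (item,),))
    return kids


def dfs(seq, items, max_size, visit):
    """Visit every descendant of `seq` in depth-first order."""
    for child in children(seq, items, max_size):
        visit(child)
        dfs(child, items, max_size, visit)


nodes = []
dfs((), ("a", "b"), 3, nodes.append)

# Level = sequence length (total number of items).
deepest = max(sum(len(itemset) for itemset in seq) for seq in nodes)
print(len(nodes), deepest)
```

With two items and at most three itemsets per sequence, each itemset is one of {a}, {b}, {a, b}, so the tree has 3 + 9 + 27 = 39 nodes below the root, and the deepest node ({a, b}, {a, b}, {a, b}) sits on level 6.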

SPAM (Sequential pattern mining): Pruning. [Figure: pruning example for Items = {a, b, c, d}.]
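A minimal sketch of depth-first search with Apriori-style pruning follows. It assumes that each node carries two candidate lists (items to try as sequence extensions and items to try as itemset extensions) and that only the items that turned out frequent at a node are passed on to its children. Support is counted naively here rather than with bitmaps, and the database `db` and all function names are hypothetical.

```python
def contains(customer_seq, pattern):
    """True if `pattern` (list of sets) is a subsequence of `customer_seq`."""
    pos = 0
    for itemset in pattern:
        while pos < len(customer_seq) and not itemset <= customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1
    return True


def support(db, pattern):
    return sum(1 for seq in db if contains(seq, pattern))


def dfs_pruning(db, seq, s_cands, i_cands, min_sup, out):
    """Extend `seq` only with candidates that stay frequent (Apriori pruning)."""
    # S-step pruning: keep only items whose sequence extension is frequent.
    s_temp = [i for i in s_cands if support(db, seq + [{i}]) >= min_sup]
    # I-step pruning: keep only items whose itemset extension is frequent.
    i_temp = [i for i in i_cands
              if support(db, seq[:-1] + [seq[-1] | {i}]) >= min_sup]
    for i in s_temp:
        child = seq + [{i}]
        out.append(child)
        dfs_pruning(db, child, s_temp, [j for j in s_temp if j > i], min_sup, out)
    for i in i_temp:
        child = seq[:-1] + [seq[-1] | {i}]
        out.append(child)
        dfs_pruning(db, child, s_temp, [j for j in i_temp if j > i], min_sup, out)


# Hypothetical toy database over Items = {a, b, c, d}.
db = [
    [{"a", "b", "d"}, {"b", "c", "d"}, {"b", "c", "d"}],
    [{"b"}, {"a", "b", "c"}],
    [{"a", "b"}, {"b", "c", "d"}],
]
frequent = []
dfs_pruning(db, [], ["a", "b", "c", "d"], [], min_sup=2, out=frequent)
print(len(frequent))
```

At the node ({a}), for example, the extension ({a}, {a}) is infrequent, so item a is dropped from the sequence-extension candidates of every node in that subtree.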

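Data Representation: Bitmap. A customer sequence whose size lies between 2^k + 1 and 2^(k+1) is stored in a 2^(k+1)-bit bitmap section. For a sequence of size 3: 2^k + 1 <= 3 <= 2^(k+1) gives k = 1, so the section has 4 bits.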
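The vertical bitmap layout can be sketched as follows, assuming one bitmap per item, one section per customer, one bit per transaction, and a smallest section width of 4 bits. Sections are plain Python lists of 0/1 for readability rather than packed machine words; `section_bits`, `build_bitmaps`, and the database `db` are hypothetical.

```python
def section_bits(n_transactions):
    """Smallest power of two >= n_transactions, with a minimum of 4 bits."""
    bits = 4
    while bits < n_transactions:
        bits *= 2
    return bits


def build_bitmaps(db):
    """Map each item to a list of per-customer bit sections."""
    items = sorted(set().union(*(t for seq in db for t in seq)))
    bitmaps = {item: [] for item in items}
    for seq in db:
        width = section_bits(len(seq))
        for item in items:
            # Bit j is 1 if the item occurs in the customer's j-th transaction.
            section = [1 if item in t else 0 for t in seq]
            section += [0] * (width - len(seq))  # pad unused positions with 0
            bitmaps[item].append(section)
    return bitmaps


# Hypothetical toy database: one list of transactions per customer.
db = [
    [{"a", "b", "d"}, {"b", "c", "d"}, {"b", "c", "d"}],
    [{"b"}, {"a", "b", "c"}],
    [{"a", "b"}, {"b", "c", "d"}],
]
bitmaps = build_bitmaps(db)
for item, sections in bitmaps.items():
    print(item, sections)
```

The first customer has three transactions, so its section is padded to 4 bits, matching the k = 1 case on the slide.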

S-type (sequence extension, S-step): S = ({a}); S' = ({a}, {b}), S' = ({a}, {c}), …
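On bitmaps, a sequence extension can be sketched in two moves: transform the bitmap of S so that, in each customer section, every position after the first set bit becomes 1 and everything up to and including it becomes 0, then AND the result with the bitmap of the appended item. The bitmaps for a and b below are hypothetical toy data (three customers, 4-bit sections), and the function names are assumptions.

```python
def s_step_transform(section):
    """Zero out bits up to the first 1, set all later bits to 1."""
    if 1 not in section:
        return [0] * len(section)
    k = section.index(1)
    return [0] * (k + 1) + [1] * (len(section) - k - 1)


def s_step(seq_bitmap, item_bitmap):
    """Bitmap of the sequence-extended pattern (S followed by {item})."""
    return [
        [x & y for x, y in zip(s_step_transform(seq_sec), item_sec)]
        for seq_sec, item_sec in zip(seq_bitmap, item_bitmap)
    ]


def support(bitmap):
    """A customer supports the pattern if its section has any bit set."""
    return sum(1 for section in bitmap if any(section))


# Hypothetical bitmaps for ({a}) and for item b.
bitmap_a = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
bitmap_b = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 0, 0]]

bitmap_a_then_b = s_step(bitmap_a, bitmap_b)
print(bitmap_a_then_b, support(bitmap_a_then_b))
```

Padding bits may become 1 after the transformation, but they are cleared again by the AND because the item bitmap is 0 in padded positions.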

I-type (itemset extension, I-step): S = ({a}); S' = ({a, b}), S' = ({a, c}), …
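An itemset extension needs no transformation: the bitmap of S is simply ANDed with the bitmap of the added item, since both must occur in the same transaction. The sketch below reuses hypothetical toy bitmaps for a and b (three customers, 4-bit sections); `i_step` and `support` are illustrative names.

```python
def i_step(seq_bitmap, item_bitmap):
    """Bitmap of the itemset-extended pattern: position-wise AND."""
    return [
        [x & y for x, y in zip(seq_sec, item_sec)]
        for seq_sec, item_sec in zip(seq_bitmap, item_bitmap)
    ]


def support(bitmap):
    """A customer supports the pattern if its section has any bit set."""
    return sum(1 for section in bitmap if any(section))


# Hypothetical bitmaps for ({a}) and for item b.
bitmap_a = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
bitmap_b = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 0, 0]]

bitmap_ab = i_step(bitmap_a, bitmap_b)
print(bitmap_ab, support(bitmap_ab))
```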

Experiments and results: D3 C2.5 T3. Algorithms compared: SPAM, SPADE, PrefixSpan.

[Figures: comparison of SPADE, SPAM, and PrefixSpan on small, medium-sized, and large databases.]

Conclusions: SPAM searches by depth-first (DFS) traversal, using S-type and I-type extensions. It is efficient on large databases but inefficient on small databases, and it is space-inefficient in comparison to SPADE.