Published by Bethany Adams. Modified over 6 years ago.
1
Sequential Pattern Mining Using A Bitmap Representation
Authors: Jay Ayres, Johannes Gehrke, Tomi Yiu and Jason Flannick
Source: The Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, 2002.
2
Outline
- Introduction
- SPAM (Sequential PAttern Mining) algorithm
  - Lexicographic tree for sequences
  - Depth-first tree traversal
  - Pruning: S-step and I-step
  - Data representation: bitmap
4
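S = ({a}, {b, c}) is a sequence.
The support of S in database D, Sup_D(S), is the number of customer sequences in D that contain S.
S is a frequent sequential pattern if Sup_D(S) >= Min Support.
In the example database, Sup_D(({a}, {b, c})) = 2.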
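The example database on this slide was an image, so the sketch below uses a hypothetical three-customer database to show how Sup_D(S) can be counted by testing whether each customer sequence contains S. The names `contains`, `support` and `is_frequent` are illustrative, and this naive containment test is not SPAM's bitmap method.

```python
from typing import List, Sequence, Set

# Hypothetical representation: a customer sequence is a list of itemsets
# (transactions) in time order; a pattern uses the same shape.
CustomerSeq = List[Set[str]]


def contains(customer: CustomerSeq, pattern: CustomerSeq) -> bool:
    """True if each itemset of `pattern` is a subset of a distinct transaction
    of `customer`, with those transactions in the same order as the pattern."""
    pos = 0
    for itemset in pattern:
        # Greedily match the earliest remaining transaction that covers itemset.
        while pos < len(customer) and not itemset <= customer[pos]:
            pos += 1
        if pos == len(customer):
            return False
        pos += 1  # the next itemset must match a strictly later transaction
    return True


def support(db: Sequence[CustomerSeq], pattern: CustomerSeq) -> int:
    """Sup_D(S): number of customer sequences in D that contain S."""
    return sum(1 for customer in db if contains(customer, pattern))


def is_frequent(db: Sequence[CustomerSeq], pattern: CustomerSeq, min_sup: int) -> bool:
    return support(db, pattern) >= min_sup


# Hypothetical database (not the one from the slide).
db = [
    [{"a"}, {"b", "c"}],               # contains ({a}, {b, c})
    [{"a", "d"}, {"c"}, {"b", "c"}],   # contains it: {a} in t0, {b, c} in t2
    [{"b", "c"}, {"a"}],               # does not: {b, c} comes before {a}
]
S = [{"a"}, {"b", "c"}]
print(support(db, S))  # 2
```

Greedy earliest matching is enough here: taking the first transaction that covers an itemset never rules out a match for the itemsets that follow.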
5
SPAM (Sequential pattern mining)
S = ({a, b, c}, {a, b})
Sequence length: Length(S) = 5 (total number of items)
Sequence size: Size(S) = 2 (number of itemsets)
Sequence-extended sequence: S' = ({a, b, c}, {a, b}, {a})
Itemset-extended sequence: S' = ({a, b, c}, {a, b, d})
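A minimal sketch of these definitions, assuming a sequence is stored as a tuple of itemsets and each itemset as a sorted tuple of items. The helper names `length`, `size`, `s_extend` and `i_extend` are hypothetical; `i_extend` enforces that the added item is larger than every item already in the last itemset, which is why {d} (not {a}) is used for the itemset extension above.

```python
from typing import Tuple

# Hypothetical representation: tuple of itemsets, each a sorted tuple of items.
Seq = Tuple[Tuple[str, ...], ...]


def length(s: Seq) -> int:
    """Total number of items across all itemsets."""
    return sum(len(itemset) for itemset in s)


def size(s: Seq) -> int:
    """Number of itemsets in the sequence."""
    return len(s)


def s_extend(s: Seq, item: str) -> Seq:
    """Sequence extension: append a new itemset {item} at the end."""
    return s + ((item,),)


def i_extend(s: Seq, item: str) -> Seq:
    """Itemset extension: add `item` to the last itemset; the item must be
    lexicographically larger than every item already in that itemset."""
    last = s[-1]
    if item <= last[-1]:
        raise ValueError("item must be larger than the last itemset's items")
    return s[:-1] + (last + (item,),)


S = (("a", "b", "c"), ("a", "b"))
print(length(S), size(S))  # 5 2
print(s_extend(S, "a"))    # (('a', 'b', 'c'), ('a', 'b'), ('a',))
print(i_extend(S, "d"))    # (('a', 'b', 'c'), ('a', 'b', 'd'))
```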
6
SPAM (Sequential pattern mining)
[Figure: lexicographic sequence tree for Items = {a, b} with Max Size = 3, Levels 1 to 6; edges marked as sequence-extended or item-extended]
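The tree in the figure can be sketched by generating each node's children with S-steps (append a new itemset) and I-steps (add a larger item to the last itemset), then walking it depth first. This is an illustrative enumeration of the full search space under the slide's parameters (items {a, b}, Max Size = 3), with no support counting; `children` and `dfs` are hypothetical names, and `items` is assumed to be sorted.

```python
from typing import Iterator, List, Tuple

# A sequence is a tuple of itemsets; each itemset is a sorted tuple of items.
Seq = Tuple[Tuple[str, ...], ...]


def children(node: Seq, items: List[str], max_size: int) -> Iterator[Tuple[str, Seq]]:
    """Yield (kind, child) for each child of `node`: S-step children first,
    then I-step children, each in item order."""
    if len(node) < max_size:
        for i in items:
            yield "S", node + ((i,),)  # sequence-extended child
    if node:  # the root (empty sequence) has no itemset to extend
        last = node[-1]
        for i in items:
            if i > last[-1]:  # keeps each itemset sorted, so no duplicates
                yield "I", node[:-1] + (last + (i,),)  # itemset-extended child


def dfs(node: Seq, items: List[str], max_size: int,
        depth: int = 0) -> Iterator[Tuple[int, str, Seq]]:
    """Depth-first traversal yielding (depth, kind, sequence) for every node
    below `node`. Each step adds one item, so depth equals sequence length."""
    for kind, child in children(node, items, max_size):
        yield depth + 1, kind, child
        yield from dfs(child, items, max_size, depth + 1)


nodes = list(dfs((), ["a", "b"], max_size=3))
print(len(nodes))                            # 39
print(max(depth for depth, _, _ in nodes))   # 6
```

With two items each itemset is one of {a}, {b} or {a, b}, so there are 3 + 9 + 27 = 39 sequences of size 1 to 3; the deepest node is ({a, b}, {a, b}, {a, b}), of length 6.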
7
SPAM (Sequential pattern mining)
[Figure: lexicographic sequence tree for Items = {a, b} with Max Size = 3, Levels 1 to 6]
8
SPAM (Sequential pattern mining)
Pruning
[Figure: pruning example for Items = {a, b, c, d}]
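The pruning figure did not survive extraction. As a hedged sketch, assuming the Apriori-style S-step and I-step pruning described in the SPAM paper: only items whose extension is frequent at a node (`s_temp`, `i_temp`) are passed down as candidates to its children. Support is counted here with a naive containment test on a hypothetical database over items {a, b, c, d}, not with bitmaps; `dfs_pruning` and `contains` are illustrative names.

```python
from typing import Callable, List, Tuple

Seq = Tuple[Tuple[str, ...], ...]


def contains(customer: Seq, pattern: Seq) -> bool:
    """Ordered subsequence test: each pattern itemset must be a subset of a
    strictly later transaction than the previous one."""
    pos = 0
    for itemset in pattern:
        while pos < len(customer) and not set(itemset) <= set(customer[pos]):
            pos += 1
        if pos == len(customer):
            return False
        pos += 1
    return True


def dfs_pruning(node: Seq, s_cands: List[str], i_cands: List[str],
                is_frequent: Callable[[Seq], bool], out: List[Seq]) -> None:
    # S-step pruning: keep only items whose sequence extension is frequent.
    s_temp = [i for i in s_cands if is_frequent(node + ((i,),))]
    for i in s_temp:
        child = node + ((i,),)
        out.append(child)
        # Apriori: a child only needs items that survived here,
        # S-step with s_temp and I-step with the s_temp items greater than i.
        dfs_pruning(child, s_temp, [j for j in s_temp if j > i], is_frequent, out)
    # I-step pruning: keep only items whose itemset extension is frequent.
    i_temp = [i for i in i_cands if is_frequent(node[:-1] + (node[-1] + (i,),))]
    for i in i_temp:
        child = node[:-1] + (node[-1] + (i,),)
        out.append(child)
        dfs_pruning(child, s_temp, [j for j in i_temp if j > i], is_frequent, out)


# Hypothetical database over items {a, b, c, d}; minimum support 2.
db = [
    (("a", "b"), ("c",)),
    (("a",), ("c",), ("d",)),
    (("a", "b"), ("d",)),
]
min_sup = 2


def frequent(s: Seq) -> bool:
    return sum(1 for customer in db if contains(customer, s)) >= min_sup


patterns: List[Seq] = []
dfs_pruning((), ["a", "b", "c", "d"], [], frequent, patterns)
print(patterns)
```

For example, ({a}, {b}) is infrequent here, so b is dropped from the S-step candidates of every child of ({a}) and ({a}, {c}, {b}) is never generated.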
9
Data Representation – Bitmap
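$2^{k} + 1 \le 3 \le 2^{k+1}$ holds for k = 1, so a customer sequence with 3 transactions is stored in a bitmap section of $2^{k+1} = 4$ bits.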
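A sketch of the vertical bitmap layout, assuming each customer gets a section whose width is the smallest power of two at least its number of transactions (the $2^{k+1}$ rule above), with bit j of the section set when the item occurs in transaction j. Sections are kept as separate Python ints rather than one packed bit vector, no minimum section width is imposed, and the database and helper names (`section_bits`, `build_bitmaps`, `show`) are hypothetical.

```python
from typing import Dict, List, Set


def section_bits(n_transactions: int) -> int:
    """Bits reserved for one customer: the smallest power of two that is at
    least n_transactions, i.e. 2^(k+1) when 2^k + 1 <= n <= 2^(k+1)."""
    bits = 1
    while bits < n_transactions:
        bits *= 2
    return bits


def build_bitmaps(db: List[List[Set[str]]]) -> Dict[str, List[int]]:
    """One vertical bitmap per item: one int section per customer, where bit j
    (least significant bit = transaction 0) is set iff the item occurs in
    that customer's j-th transaction."""
    items = sorted({i for customer in db for t in customer for i in t})
    bitmaps = {i: [0] * len(db) for i in items}
    for c, customer in enumerate(db):
        for j, transaction in enumerate(customer):
            for i in transaction:
                bitmaps[i][c] |= 1 << j
    return bitmaps


def show(section: int, width: int) -> str:
    """Render a section with transaction 0 on the left."""
    return "".join("1" if (section >> j) & 1 else "0" for j in range(width))


# Hypothetical database: customers with 3, 2 and 5 transactions.
db = [
    [{"a", "b"}, {"c"}, {"a"}],
    [{"b"}, {"a", "c"}],
    [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"d"}],
]
widths = [section_bits(len(customer)) for customer in db]
bitmaps = build_bitmaps(db)
print(widths)                                                # [4, 2, 8]
print([show(s, w) for s, w in zip(bitmaps["a"], widths)])    # ['1010', '01', '10010000']
```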
10
S-type (S-step): S = ({a}) → S' = ({a}, {b}), S' = ({a}, {c}), …
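A minimal sketch of an S-step on bitmaps, assuming the layout where bit j of a customer's section marks transaction j: the bitmap of S is first transformed (within each section, bits after the first set bit become 1 and the rest become 0), then ANDed with the new item's bitmap. Support is the number of non-empty sections. The bitmaps and names (`s_transform`, `s_step`, `support`) are hypothetical.

```python
from typing import List


def s_transform(bitmap: List[int], widths: List[int]) -> List[int]:
    """S-step transform: within each customer section, clear every bit up to
    and including the first set bit and set every later bit to 1."""
    out = []
    for section, width in zip(bitmap, widths):
        if section == 0:
            out.append(0)  # S never occurs for this customer
            continue
        first = (section & -section).bit_length() - 1  # earliest transaction
        full = (1 << width) - 1
        out.append(full & ~((1 << (first + 1)) - 1))
    return out


def s_step(bitmap_s: List[int], bitmap_item: List[int], widths: List[int]) -> List[int]:
    """Bitmap of S' = S followed by {item}: transformed(S) AND bitmap(item)."""
    return [x & y for x, y in zip(s_transform(bitmap_s, widths), bitmap_item)]


def support(bitmap: List[int]) -> int:
    """Number of customers whose section has at least one bit set."""
    return sum(1 for section in bitmap if section)


# Hypothetical 4-bit sections for three customers; bit j = transaction j.
widths = [4, 4, 4]
a = [0b0101, 0b0010, 0b0000]
b = [0b0010, 0b0001, 0b0100]
c = [0b1000, 0b0100, 0b0010]

ab = s_step(a, b, widths)  # ({a}, {b})
ac = s_step(a, c, widths)  # ({a}, {c})
print(support(ab), support(ac))  # 1 2
```

Only the first occurrence of S matters in the transform: any later transaction containing the new item completes the sequence extension.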
11
I-type (I-step): S = ({a}) → S' = ({a, b}), S' = ({a, c}), …
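The I-step needs no transform in this sketch: the added item must occur in the same transaction in which S's last itemset is matched, so the bitmap of S' is simply the bitwise AND of the two bitmaps, section by section. The bitmaps and function names below are hypothetical, using the same bit-j-is-transaction-j layout.

```python
from typing import List


def i_step(bitmap_s: List[int], bitmap_item: List[int]) -> List[int]:
    """Bitmap of S' = S with `item` added to its last itemset: a plain AND,
    since the item must occur in the same transaction as S's last itemset."""
    return [x & y for x, y in zip(bitmap_s, bitmap_item)]


def support(bitmap: List[int]) -> int:
    """Number of customers whose section has at least one bit set."""
    return sum(1 for section in bitmap if section)


# Hypothetical 4-bit sections for three customers; bit j = transaction j.
a = [0b0101, 0b0010, 0b0001]
b = [0b0100, 0b0011, 0b0000]
c = [0b0001, 0b1000, 0b0100]

print(support(i_step(a, b)), support(i_step(a, c)))  # 2 1
```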
12
Experiments and results
[Figure: SPAM vs. SPADE vs. PrefixSpan on dataset D3 C2.5 T3]
13
[Figures: SPADE, SPAM and PrefixSpan on small and medium-sized databases]
14
[Figure: comparison on a large database]
16
Conclusions
- SPAM searches the lexicographic tree with a depth-first traversal using S-type and I-type extensions
- Efficient on large databases but inefficient on small databases
- Space-inefficient compared to SPADE