Published by Stephen Aldous Heath. Modified over 9 years ago.
1
Incremental Discovery of Sequential Patterns (ACM SIGMOD '96 Data Mining Workshop)
2
Abstract
– produces the updated patterns by scanning only the affected part of the database and data structure
– handles dynamic changes to the minimum support and confidence without recomputation
3
Suffix Trees
sequence S:
– a list of records ordered by position number, starting with 1
– delimited by the special symbol $, which occurs only at the right end of the list
– e.g. S = 123523423$
suffix tree T for S:
– whose paths are the suffixes of S
– whose terminal nodes correspond uniquely to positions within S
4
Suffix Trees
properties of suffix tree T for S:
– T1: An arc of T may represent any nonempty subsequence of S
– T2: Each nonterminal node of T, except the root, must have at least two offspring arcs
– T3: The subsequences represented by sibling arcs of T must begin with different records
5
Example 2.1 S = 123523423$
6
Suffix Trees
Straightforward construction: O(n^2)
Linear-time construction:
– starts each search at the lowest possible level, by maintaining suffix links between nonterminal nodes of the tree
– a suffix link from u to v is created if the paths from the root to u and v have the form xα and α, respectively
– u, v: nonterminal nodes; x: a single record; α: a subsequence
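The straightforward O(n^2) construction can be sketched as follows. This is a minimal illustration, not the linear-time algorithm with suffix links; the names `Node`, `build_suffix_tree`, and `leaf_positions` are hypothetical, and the sequence is assumed to be a Python string of single-character records ending in a unique `$`.

```python
class Node:
    """A suffix-tree node; `position` is set only on terminal nodes."""

    def __init__(self, position=None):
        self.children = {}        # first record of arc -> (arc label, child Node)
        self.position = position  # 1-based start of the suffix (terminals only)


def build_suffix_tree(s):
    """Insert every suffix of s starting from the root (quadratic worst case)."""
    # Assumption: the last symbol is a terminator that occurs nowhere else.
    assert s and s.count(s[-1]) == 1, "s must end with a unique terminator"
    root = Node()
    for i in range(len(s)):
        _insert(root, s[i:], i + 1)
    return root


def _insert(node, rest, position):
    while True:
        first = rest[0]
        if first not in node.children:
            # No sibling arc begins with this record (T3): add a terminal arc.
            node.children[first] = (rest, Node(position))
            return
        label, child = node.children[first]
        k = 0
        while k < len(label) and k < len(rest) and label[k] == rest[k]:
            k += 1
        if k == len(label):
            # The whole arc matched: continue the search below it.
            node, rest = child, rest[k:]
            continue
        # Mismatch inside the arc: split it so the new node has two arcs (T2).
        mid = Node()
        mid.children[label[k]] = (label[k:], child)
        mid.children[rest[k]] = (rest[k:], Node(position))
        node.children[first] = (label[:k], mid)
        return


def leaf_positions(node):
    """Positions of all terminal nodes in the subtree rooted at node."""
    if not node.children:
        return [node.position]
    found = []
    for _, child in node.children.values():
        found.extend(leaf_positions(child))
    return found


tree = build_suffix_tree("123523423$")
print(tree.children["2"][0])  # prints 23
```

For S = 123523423$, the suffixes starting at positions 2, 5 and 8 share the arc labelled 23, so the node below that arc has three terminal nodes in its subtree.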
7
Sequential Patterns
support of a subsequence α w.r.t. S:
– the number of positions in S at which α occurs
locus(α):
– the first node in T encountered after α is spelled out
v.support:
– the number of terminal nodes in the subtree rooted at a nonterminal node v
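As a small illustration of the support definition, the sketch below counts occurrences by direct scanning rather than through the suffix tree. Each occurrence of a subsequence is the start of one suffix of S, which is why the count of terminal nodes below the locus gives the same number. The function names are hypothetical, and S is assumed to be a string of single-character records.

```python
def occurrences(alpha, s):
    """1-based positions in s at which the subsequence alpha occurs."""
    if not alpha:
        raise ValueError("alpha must be nonempty")
    return [i + 1 for i in range(len(s)) if s.startswith(alpha, i)]


def support(alpha, s):
    """Number of positions in s at which alpha occurs (overlaps included)."""
    return len(occurrences(alpha, s))


S = "123523423$"
print(occurrences("23", S))  # prints [2, 5, 8]
print(support("123", S))     # prints 1
```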
8
Sequential Patterns
α → β is a pattern if
– locus(αβ).support ≥ min_sup and
– locus(αβ).support / locus(α).support ≥ min_conf
pattern tree (w.r.t. min_sup):
– the portion of T above all nonterminal nodes v such that v.support ≥ min_sup
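The pattern test can be sketched as below. The Greek symbols on this slide did not survive extraction, so the sketch assumes the usual rule reading: the support of the combined subsequence must reach min_sup, and its ratio to the support of the leading part must reach min_conf. Supports are counted by direct scanning instead of via locus nodes, and `support` and `is_pattern` are hypothetical names.

```python
def support(alpha, s):
    """Number of positions in s at which alpha occurs."""
    return sum(1 for i in range(len(s)) if s.startswith(alpha, i))


def is_pattern(alpha, beta, s, min_sup, min_conf):
    """Assumed reading: support(alpha+beta) >= min_sup and
    support(alpha+beta) / support(alpha) >= min_conf."""
    sup_a = support(alpha, s)
    if sup_a == 0:
        return False  # alpha never occurs, so the ratio is undefined
    sup_ab = support(alpha + beta, s)
    return sup_ab >= min_sup and sup_ab / sup_a >= min_conf


S = "123523423$"
print(is_pattern("2", "3", S, min_sup=2, min_conf=0.9))  # prints True
print(is_pattern("3", "5", S, min_sup=1, min_conf=0.5))  # prints False
```

In S, every 2 is followed by 3 (support 3 out of 3), whereas only one of the three occurrences of 3 is followed by 5.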
9
A Single Sequence
How to update T when S = αβγ is changed to αβ′γ
Update Strategy
– α*: the longest suffix of α which occurs in at least two different places in S
– β-splitters: those sequences of the form δγ, where δ is a nonempty suffix of α*β
– delete all β-splitters from T, and insert into T all β′-splitters of the form δγ, where δ is a nonempty suffix of α*β′
10
Update Strategy Example
S = 123523423$, β = 52 is replaced by β′ = 678
– α: 123, γ: 3423$, α*: 23
– delete all β-splitters from T
  α*β: 2352
  β-splitters: 2γ, 52γ, 352γ, 2352γ
– insert all β′-splitters into T
  α*β′: 23678
  β′-splitters: 8γ, 78γ, 678γ, 3678γ, 23678γ
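The example can be reproduced with a short sketch. It only enumerates the sequences to delete and insert and does not modify a suffix tree. Two assumptions are made because symbols were lost from the slides: the changed segment is written `beta`, with `alpha` before it and `gamma` after it, and the longest repeated suffix of `alpha` is measured against the whole of S. All function names are hypothetical.

```python
def count(sub, s):
    """Number of positions in s at which sub occurs."""
    return sum(1 for i in range(len(s)) if s.startswith(sub, i))


def longest_repeated_suffix(alpha, s):
    """Longest suffix of alpha occurring at least twice in s (assumed reading)."""
    for start in range(len(alpha)):  # longest candidates first
        if count(alpha[start:], s) >= 2:
            return alpha[start:]
    return ""


def splitters(alpha_star, beta, gamma):
    """Every nonempty suffix of alpha_star + beta, followed by gamma."""
    joined = alpha_star + beta
    return [joined[i:] + gamma for i in range(len(joined) - 1, -1, -1)]


alpha, beta, gamma = "123", "52", "3423$"
new_beta = "678"
S = alpha + beta + gamma                       # 123523423$

alpha_star = longest_repeated_suffix(alpha, S)
to_delete = splitters(alpha_star, beta, gamma)
to_insert = splitters(alpha_star, new_beta, gamma)

print(alpha_star)      # prints 23
print(len(to_delete))  # prints 4
print(len(to_insert))  # prints 5
```

Each sequence in `to_delete` is a suffix of the old S, and each sequence in `to_insert` is a suffix of the updated sequence, so only these paths of T need to be touched.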
12
Support by Occurrences
The database consists of multiple sequences
support of a subsequence α:
– the sum of the supports of α in all sequences
D = {S_1, …, S_k}
– replace D with S = {S_1 $_1, …, S_k $_k}
– to locate all paths of a sequence, a B+tree on pairs (id, head) is maintained, with id being the search key
13
Figure. Locating all suffixes of a sequence
14
Support by Sequences
Support of a subsequence α:
– the number of sequences in which α occurs
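The two support measures for a multi-sequence database can be contrasted with a minimal sketch. The database below is made-up illustration data, not taken from the slides; sequences are assumed to be strings of single-character records, and the function names are hypothetical.

```python
def occurrences(alpha, s):
    """Number of positions in s at which alpha occurs."""
    return sum(1 for i in range(len(s)) if s.startswith(alpha, i))


def support_by_occurrences(alpha, database):
    """Sum of the supports of alpha over all sequences."""
    return sum(occurrences(alpha, s) for s in database)


def support_by_sequences(alpha, database):
    """Number of sequences in which alpha occurs at least once."""
    return sum(1 for s in database if occurrences(alpha, s) > 0)


# Hypothetical database of three sequences.
D = ["1231", "2312", "44"]
print(support_by_occurrences("1", D))  # prints 3
print(support_by_sequences("1", D))    # prints 2
```

The record 1 occurs twice in the first sequence and once in the second, so it counts three times by occurrences but only twice by sequences.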