Presentation is loading. Please wait.

Presentation is loading. Please wait.

Incremental Discovery of Sequential Patterns (ACM-SIGMOD's 96 Data Mining Workshop)

Similar presentations


Presentation on theme: "Incremental Discovery of Sequential Patterns (ACM-SIGMOD's 96 Data Mining Workshop)"— Presentation transcript:

1 Incremental Discovery of Sequential Patterns (ACM-SIGMOD's 96 Data Mining Workshop)

2 Abstract produces the update patterns by scanning only the affected part of the database and data structure handles the dynamism of the minimum support and confidence without recomputat- ion

3 Suffix Trees sequence S : –a list of records ordered by position number starting with 1 –delimited by the special symbol $ that occurs only at the right end of the list –ex. S = 123523423$ suffix tree T for S –whose paths are the suffix of S –whose terminal nodes correspond uniquely to positions within S

4 Suffix Trees properties of suffix tree T for S –T1: An arc of T may represent any nonempty subsequence of S –T2: Each nonterminal node of T, except the root, must have at least two offspring arcs –T3: The subsequences represented by sibling arcs of T must begin with different records

5 Example 2.1 S = 123523423$

6 Suffix Trees Straightforward way  O(n 2 ) Linear Time Construction –is to start the search at the lowest possible level, by maintaining suffix links between nonterminal nodes of the tree –a suffix link from u to v is created if the paths from the root to u and v have the form x  and , respectively u, v : nonterminal node x : a single record,  : a subsequence

7 Sequential Patterns support of a subsequence  w.r.t S –the number of positions in S at which  occurs locus(  ) –the first node in T encounted after  is spelled out v.support –the number of terminal nodes in the subtree rooted at a nonterminal node v

8 Sequential Patterns  is a pattern if –locus(  ).support  min_sup and –locus(  ).support/locus(  ).support  min_conf pattern tree(w.r.t min_sup) –the portion of T above all nonterminal nodes v such that v.support  min_sup

9 A Single Sequence How to update T when S =    Update Strategy –  * : the longest suffix of  which occurs in at least two different places in  –  -splitters : those sequences of the form , where  is a nonempty suffix of  *  –delete all  -splitters from T, and insert into T all  -splitters of the form , where  is a nonempty suffix of  * 

10 Update Strategy Example S = 123523423$,  = 52   = 678 –  : 123,  = 3423$,  * : 23 –delete all  -splitters from T  *  : 2352  -splitters : 2 , 52 , 352 , 2352  –insert all  -splitters into T  *  : 23678  -splitters : 8 , 78 , 678 , 3678 , 23678 

11

12 Support by Occurrences The database consists of multiple sequences support of a sequence  –the sum of the support of  in all sequences D = {S 1, …, S k } replace D with S = {S 1 $ 1, …,S k $ k } to locate all paths of a sequences, a B+tree on pairs (id, head) is maintained with id being the search key

13 Figure. Locating all suffixes of a sequence

14 Support by Sequences Support of a subsequence  –the number of sequences in which  occurs


Download ppt "Incremental Discovery of Sequential Patterns (ACM-SIGMOD's 96 Data Mining Workshop)"

Similar presentations


Ads by Google