Online Mining (Recently) Maximal Frequent Itemsets over Data Streams. Hua-Fu Li, Suh-Yin Lee, Man Kwan Shan, RIDE-SDMA '05. Speaker: 董原賓. Advisor: 柯佳伶.


1 Online Mining (Recently) Maximal Frequent Itemsets over Data Streams
Hua-Fu Li, Suh-Yin Lee, Man Kwan Shan, RIDE-SDMA '05
Speaker: 董原賓
Advisor: 柯佳伶

2 Introduction
Difficulties of data stream mining:
  huge
  high speed
  continuous
Solution:
  one-pass algorithm
  summary data structure
  mines the maximal frequent itemsets

3 Definition
Ψ = {i_1, i_2, …, i_n}: a set of items
W_i: basic window i
Data stream = [W_1, W_2, …, W_N): an infinite sequence of basic windows
N: the window identifier of the latest basic window
Current length of the data stream: CL = |W_1| + |W_2| + … + |W_N|
Example (three transactions per basic window, so CL = 3 × N):
  W_1: abc, bcd, acd
  W_2: cd, abd, bc
  ···
  W_N: a, b, cd
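
The current-length definition can be checked with a few lines of Python. This is only an illustrative sketch: the function name `current_length` is an assumption, and each transaction is written as a string of single-character items, as on the slide.

```python
def current_length(windows):
    """Return CL = |W_1| + |W_2| + ... + |W_N| for the windows seen so far."""
    return sum(len(window) for window in windows)


# The three basic windows drawn on the slide, three transactions each.
windows = [
    ["abc", "bcd", "acd"],  # W_1
    ["cd", "abd", "bc"],    # W_2
    ["a", "b", "cd"],       # W_N with N = 3
]

cl = current_length(windows)
print(cl)  # 9, i.e. 3 x N with N = 3
```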

4 Definition
X.tsup: true support of itemset X
X.esup: estimated support of itemset X, with 1 ≤ X.esup ≤ X.tsup
X.CL = |W_j| + |W_{j+1}| + … + |W_N|, where W_j is the first window containing X in the summary data structure
s: minimum support threshold
ε: maximum support error threshold
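
To make X.CL and the two thresholds concrete, here is a minimal sketch. The function names and the three labels returned by `classify` are assumptions made for illustration; the slides themselves only state that an item is pruned when X.esup < ε × X.CL and that an itemset is reported as frequent when its esup ≥ s × CL.

```python
def itemset_cl(window_sizes, first_window):
    """X.CL: total size of windows W_j .. W_N, where j = first_window (1-based)."""
    return sum(window_sizes[first_window - 1:])


def classify(esup, cl, s, eps):
    """Compare an estimated support against the two thresholds.

    The label strings are hypothetical names, not terminology from the slides.
    """
    if esup >= s * cl:
        return "frequent"
    if esup < eps * cl:
        return "prunable"
    return "retained"


window_sizes = [3, 3, 3]           # |W_1|, |W_2|, |W_3|
cl = itemset_cl(window_sizes, 2)   # X first appeared in W_2, so X.CL = 3 + 3
print(cl, classify(2, cl, s=0.3, eps=0.2))
```

An item that falls between the two thresholds is kept in the summary because it may still become frequent later, which is why ε is chosen smaller than s.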

5 Data Stream Mining for Maximal Frequent Itemsets (DSM-MFI)
Step 1: read a window of transactions
Step 2: construct and maintain the summary data structure
Step 3: prune the infrequent information
Step 4: search for the maximal frequent itemsets

6 Summary Frequent Itemsets forest (SFI-forest)
Composed of an FI-list and a set of SFI-trees
SFI-tree node fields:
  item-id: the item identifier
  esup: the number of transactions reaching the node with this item-id
  window-id: the identifier of the current basic window, assigned when the node is created
  node-link: links to the next node with the same item-id in the same SFI-tree

7 Summary Frequent Itemsets forest (SFI-forest)
FI-list entry fields:
  item-id: the item identifier
  esup: the number of transactions containing the item
  window-id: the identifier of the current basic window, assigned when the entry is created
  head link: links to the root node of item-id.SFI-tree

8 Summary Frequent Itemsets forest (SFI-forest)
Each SFI-tree has its own opposite frequent item list (OFI-list)
OFI-list entry: (item-id, esup, window-id, head link)
head link: links to the first node carrying the item-id in the SFI-tree
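
The three record types described on these slides might be laid out as follows. This is a sketch, not the authors' implementation: the class names are invented here, and the `children` map is an added assumption (a tree needs some way to reach child nodes, which the slides do not spell out).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SFINode:
    """A node of an SFI-tree."""
    item_id: str
    esup: int
    window_id: int
    node_link: Optional["SFINode"] = None  # next node with the same item-id
    children: Dict[str, "SFINode"] = field(default_factory=dict)  # assumption


@dataclass
class OFIEntry:
    """An entry of the OFI-list attached to one SFI-tree."""
    item_id: str
    esup: int
    window_id: int
    head_link: Optional[SFINode] = None  # first node carrying item_id in the tree


@dataclass
class FIEntry:
    """An entry of the FI-list; owns one SFI-tree and its OFI-list."""
    item_id: str
    esup: int
    window_id: int
    head_link: Optional[SFINode] = None  # root of item_id.SFI-tree
    ofi_list: Dict[str, OFIEntry] = field(default_factory=dict)


# Structures for item a after inserting the single transaction abc in window 1.
root = SFINode("a", 1, 1)
node_b = SFINode("b", 1, 1)
node_c = SFINode("c", 1, 1)
root.children["b"] = node_b
node_b.children["c"] = node_c
entry_a = FIEntry("a", 1, 1, head_link=root)
entry_a.ofi_list["b"] = OFIEntry("b", 1, 1, head_link=node_b)
entry_a.ofi_list["c"] = OFIEntry("c", 1, 1, head_link=node_c)
```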

9 Example
W_1 = abc, bcd, acd; entries are labelled (item-id, esup, window-id, node link)
T = abc
FI-list: (1,1,1) (2,1,1) (3,1,1)
Transaction Projection (T) → abc, bc, c
SFI-tree-maintenance (abc): a.SFI-tree nodes 1:1:1, 2:1:1, 3:1:1; a.OFI-list (2,1,1) (3,1,1)
SFI-tree-maintenance (bc): b.SFI-tree nodes 2:1:1, 3:1:1; b.OFI-list (3,1,1)
SFI-tree-maintenance (c): c.SFI-tree node 3:1:1; c.OFI-list empty

10 Example
W_1 = abc, bcd, acd; entries are labelled (item-id, esup, window-id, node link)
T = bcd
FI-list: (1,1,1) (2,1,2) (3,1,2) (4,1,1)
Transaction Projection (T) → bcd, cd, d
SFI-tree-maintenance (bcd), SFI-tree-maintenance (cd), SFI-tree-maintenance (d)
[Figure: b.SFI-tree and c.SFI-tree updated, d.SFI-tree and d.OFI-list created; a.SFI-tree unchanged]

11 Example
W_1 = abc, bcd, acd; entries are labelled (item-id, esup, window-id, node link)
T = acd
FI-list: (1,1,2) (2,1,2) (3,1,3) (4,1,2)
Transaction Projection (T) → acd, cd, d
SFI-tree-maintenance (acd)
[Figure: a.SFI-tree, b.SFI-tree, c.SFI-tree, d.SFI-tree and a.OFI-list after the third transaction]
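
The transaction projection used in this example can be sketched in Python. To keep the sketch short, each SFI-tree is replaced by a `Counter` of item-suffix strings, which is a simplification and not the tree structure from the slides; the function names are likewise assumptions. Running it on W_1 gives per-item counts of 2, 2, 3, 2 for a, b, c, d, which agree with the last component of the FI-list tuples listed after the third transaction (so the tuples in the figures appear to carry the count in the third position).

```python
from collections import Counter


def item_suffix_projections(transaction):
    """abc -> ['abc', 'bc', 'c'] (items are single characters, already sorted)."""
    return [transaction[i:] for i in range(len(transaction))]


def build_summary(window, window_id):
    """Simplified stand-in for FI-list plus SFI-trees.

    fi_list maps item -> [esup, window_id of first appearance];
    suffixes maps item -> Counter of item-suffix projections starting with it.
    """
    fi_list = {}
    suffixes = {}
    for transaction in window:
        for suffix in item_suffix_projections(transaction):
            head = suffix[0]
            entry = fi_list.setdefault(head, [0, window_id])
            entry[0] += 1  # one more transaction contains this item
            suffixes.setdefault(head, Counter())[suffix] += 1
    return fi_list, suffixes


fi_list, suffixes = build_summary(["abc", "bcd", "acd"], window_id=1)
print({item: entry[0] for item, entry in fi_list.items()})
```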

12 Pruning infrequent items from SFI-forest
X: a 1-itemset in the FI-list
If X.esup < ε × X.CL, then X and its supersets are deleted from the SFI-forest
Step 1: delete item-id.OFI-list, item-id.SFI-tree, and the entry with item-id from the FI-list
Step 2: remove the infrequent item from the other OFI-lists by traversing the FI-list

13 Pruning infrequent items from SFI-forest
Step 3: delete the infrequent item from the other SFI-trees
Step 4: reconstruct the SFI-trees, either by reinserting the modified item-suffix transactions or by joining the remaining subtrees into the SFI-tree
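
The pruning rule and its steps can be illustrated on a simplified summary in which each SFI-tree is replaced by a `Counter` of item-suffix tuples. That representation, the function name `prune`, and the sample suffix data are assumptions for this sketch; only the threshold test X.esup < ε × X.CL and the order of the steps come from the slides. Rebuilding each counter corresponds to the "reinsert the modified item-suffix transactions" option.

```python
from collections import Counter


def prune(fi_list, suffixes, window_sizes, eps):
    """Drop items with esup < eps * CL and strip them from the other structures.

    fi_list: item -> (esup, window_id of first appearance, 1-based)
    suffixes: item -> Counter of item-suffix tuples that start with that item
    """
    infrequent = set()
    for item, (esup, window_id) in fi_list.items():
        cl = sum(window_sizes[window_id - 1:])
        if esup < eps * cl:
            infrequent.add(item)
    # Step 1: remove the item's own entry and its own structure.
    for item in infrequent:
        del fi_list[item]
        suffixes.pop(item, None)
    # Steps 2-4: remove the item elsewhere and re-merge the shortened suffixes.
    for item in list(suffixes):
        rebuilt = Counter()
        for suffix, count in suffixes[item].items():
            kept = tuple(i for i in suffix if i not in infrequent)
            rebuilt[kept] += count
        suffixes[item] = rebuilt
    return infrequent


# Hypothetical data: one window of 12 transactions, eps = 0.2, threshold 2.4.
fi_list = {"a": (3, 1), "b": (2, 1), "c": (3, 1), "d": (3, 1)}
suffixes = {
    "a": Counter({("a", "b", "c"): 1, ("a", "c", "d"): 2}),
    "b": Counter({("b", "c"): 2}),
    "c": Counter({("c", "d"): 3}),
    "d": Counter({("d",): 3}),
}
removed = prune(fi_list, suffixes, window_sizes=[12], eps=0.2)
print(removed)
```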

14 Example
s = 0.3, ε = 0.2
FI-list: (1,1,3) (2,1,2) (3,1,3) (4,1,3)
a.CL = b.CL = c.CL = d.CL = 12; 12 × 0.2 = 2.4
[Figure: a.SFI-tree, b.SFI-tree, c.SFI-tree, d.SFI-tree and their OFI-lists before pruning]

15 Determining maximal frequent itemsets
There are k frequent 1-itemsets, e_1, e_2, …, e_k, in the FI-list
o_1, o_2, …, o_j: the items in e_i.OFI-list
Generate a candidate maximal frequent (j+1)-itemset E = (e_i, o_1, o_2, …, o_j)
Start from the frequent item with the smallest estimated support
Traverse the path via the node links to count E's estimated support

16 Determining maximal frequent itemsets
If E.esup ≥ s × e_i.CL, then E is a maximal frequent itemset (MFI)
Otherwise, enumerate the subsets of E of size |E| − 1 and test them in the same way, until the set of all maximal frequent itemsets with respect to entry e_i is found
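
The top-down search for one FI-list entry can be sketched as follows. Support counting by node-link traversal is replaced here by a scan over a `Counter` of item-suffix tuples, which is a simplification; the function names and the sample data are assumptions, and the result is maximal only with respect to the chosen entry, as on the slide.

```python
from collections import Counter
from itertools import combinations


def estimated_support(candidate, suffixes):
    """Sum the counts of stored suffixes that contain every item of candidate."""
    return sum(count for suffix, count in suffixes.items()
               if candidate <= set(suffix))


def maximal_for_entry(e, ofi_items, suffixes, s, cl):
    """Top-down search: largest candidate first, then all subsets one size smaller."""
    found = []
    for size in range(len(ofi_items), -1, -1):
        for combo in combinations(ofi_items, size):
            candidate = frozenset((e,) + combo)
            if any(candidate <= m for m in found):
                continue  # contained in an itemset already reported as maximal
            if estimated_support(candidate, suffixes) >= s * cl:
                found.append(candidate)
    return found


# Hypothetical suffix counts for entry b, with s = 0.3 and CL = 5 (threshold 1.5).
b_suffixes = Counter({("b", "c", "d"): 1, ("b", "c"): 1})
result = maximal_for_entry("b", ["c", "d"], b_suffixes, s=0.3, cl=5)
print(result)
```

In this run the full candidate {b, c, d} has support 1, below 1.5, so the search drops to the size-2 subsets and reports {b, c}; the singleton {b} is skipped because it is already covered.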

17 Example
s = 0.3, ε = 0.2
FI-list: (1,1,3) (2,1,2) (3,1,3) (4,1,3)
a.CL = b.CL = c.CL = d.CL = 5; 5 × 0.3 = 1.5
Calculate support (bcd); calculate support (bc) = 1
[Figure: a.SFI-tree, b.SFI-tree, c.SFI-tree, d.SFI-tree and their OFI-lists]

18 Sliding Window Mining over Data Streams
Modifications:
  use the DSM-MFI algorithm to construct an SFI-forest_i for each basic window W_i
  find the local maximal frequent itemsets (local MFI_i); all local MFIs are stored in a queue
  a global MFI-list stores all local MFIs from W_1 to W_N

19 Sliding Window Mining over Data Streams
When basic window N+1 arrives:
  remove local MFI_1 from the queue
  subtract the support of local MFI_1 from the global MFI-list
  use the DSM-MFI algorithm to mine all local maximal frequent itemsets of W_{N+1}
  increase the support in the global MFI-list, or insert local MFI_{N+1} into it
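
The queue and global-list bookkeeping for the sliding window might look like the sketch below. The class name `SlidingMFI` and its method are invented for illustration, and the per-window mining step is not shown: each call simply receives an already-mined mapping from itemset to support.

```python
from collections import Counter, deque


class SlidingMFI:
    """Keep the local MFIs of the most recent windows and their merged supports."""

    def __init__(self, max_windows):
        self.max_windows = max_windows
        self.queue = deque()         # local MFI results, oldest first
        self.global_mfi = Counter()  # itemset -> support over the current windows

    def add_window(self, local_mfi):
        """local_mfi: dict mapping itemset -> support within the new basic window."""
        if len(self.queue) == self.max_windows:
            oldest = self.queue.popleft()
            for itemset, support in oldest.items():
                self.global_mfi[itemset] -= support
                if self.global_mfi[itemset] <= 0:
                    del self.global_mfi[itemset]
        self.queue.append(local_mfi)
        for itemset, support in local_mfi.items():
            self.global_mfi[itemset] += support


# Hypothetical local results for three basic windows, keeping the last two.
miner = SlidingMFI(max_windows=2)
miner.add_window({frozenset("bc"): 2})
miner.add_window({frozenset("bc"): 1, frozenset("cd"): 2})
miner.add_window({frozenset("cd"): 1})
print(dict(miner.global_mfi))
```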

20 Experiment
Environment: 1 GHz IBM x24, 384 MB memory, Visual C++ 6.0
Parameters: s = 0.1%, ε = 0.01%
IBM synthetic datasets: T10.I5.D1000K and T30.I20.D1000K
The data is broken into 20 basic windows to simulate streaming data
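
Breaking a static dataset into basic windows can be simulated as below. The slides do not say how the split was made, so equal-sized consecutive chunks are an assumption here, as is the function name. With the D1000K datasets (1,000,000 transactions) and 20 windows, such a split would give 50,000 transactions per window.

```python
def split_into_windows(transactions, n_windows):
    """Split a list into n_windows consecutive chunks of near-equal size."""
    size, extra = divmod(len(transactions), n_windows)
    windows, start = [], 0
    for i in range(n_windows):
        end = start + size + (1 if i < extra else 0)
        windows.append(transactions[start:end])
        start = end
    return windows


# Small stand-in for a dataset: 100 dummy transactions, 20 basic windows.
dummy = [f"t{i}" for i in range(100)]
basic_windows = split_into_windows(dummy, 20)
print(len(basic_windows), len(basic_windows[0]))
```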

21 Experiment

