Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window. Yun Chi, Haixun Wang, Philip S. Yu, Richard R. Muntz, ICDM 2004. Adviser: Jia-Ling Koh. Speaker: Shu-Ning Shin. Date: 2005.5.6
Introduction Algorithm Moment: mines closed frequent itemsets in the most recent N transactions of a data stream. Data structure: closed enumeration tree (CET), which maintains: the closed frequent itemsets; the boundary between closed frequent itemsets and the rest.
Problem Lexicographic order: Closed frequent itemset: a frequent itemset none of whose supersets has the same support. Example: items Σ = {A, B, C, D}, window size N = 4, minimum support s = ½.
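The definition above can be checked by brute force on the slide's parameters. This is a minimal sketch, not the Moment algorithm: the four transactions in `WINDOW` are an assumption (the slide text does not list them), chosen to agree with the node types named on the CET slide, and the function names are hypothetical.

```python
from itertools import combinations

# Assumed window contents (tid -> items); not given in the slide text.
WINDOW = {1: {"C", "D"}, 2: {"A", "B"}, 3: {"A", "B", "C"}, 4: {"A", "B", "C"}}
ITEMS = ["A", "B", "C", "D"]
MIN_COUNT = 2  # s = 1/2 of N = 4 transactions


def support(itemset, window):
    """Number of transactions in the window that contain every item of itemset."""
    return sum(1 for items in window.values() if itemset <= items)


def closed_frequent_itemsets(window, items, min_count):
    """Enumerate all itemsets, keep the frequent ones, then filter the closed ones."""
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(set(combo), window)
            if s >= min_count:
                frequent[frozenset(combo)] = s
    closed = {}
    for itemset, s in frequent.items():
        # A superset with the same support is itself frequent, so it is
        # enough to look for it among the frequent itemsets.
        if not any(itemset < other and s == t for other, t in frequent.items()):
            closed["".join(sorted(itemset))] = s
    return closed


print(closed_frequent_itemsets(WINDOW, ITEMS, MIN_COUNT))
```

Under the assumed window, A is frequent but not closed because AB has the same support, which is the kind of redundancy the CET avoids storing as a result.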
CET (1) Four types of itemset nodes. Infrequent: infrequent gateway node (dashed circle), e.g. D. Frequent but not closed: unpromising gateway node (dashed rectangle), e.g. AC; intermediate node, e.g. A. Closed: closed node (solid rectangle), e.g. ABC.
CET (2) Property 1: if nI is an infrequent gateway node, then any node nJ where J ⊃ I represents an infrequent itemset. Property 2: if nI is an unpromising gateway node, then nI is not closed, and none of nI's descendants is closed. Property 3: if nI is an intermediate node, then nI is not closed and nI has closed descendants.
Moment: Build CET (1) Node nI stores: itemset I, node type, support, tid_sum. Hash table: stores all closed frequent itemsets, hashed on (support, tid_sum). To check whether nI is an unpromising gateway node, hash on the (support, tid_sum) of nI and test whether there exists a closed node nJ where J ⊃ I with the same support.
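The node record and the hash-table lookup described above might be sketched as follows. The names `Node`, `ClosedTable`, `summarize`, and `has_closed_superset` are hypothetical, and the window contents are an assumption not given on the slide. Matching on (support, tid_sum) only narrows the candidates; the sketch still compares itemsets inside the bucket.

```python
from dataclasses import dataclass

# Assumed window contents (tid -> items); not given in the slide text.
WINDOW = {1: frozenset("CD"), 2: frozenset("AB"),
          3: frozenset("ABC"), 4: frozenset("ABC")}


@dataclass
class Node:
    itemset: frozenset
    node_type: str
    support: int   # number of window transactions containing itemset
    tid_sum: int   # sum of the tids of those transactions


def summarize(itemset, window):
    """Return (support, tid_sum) of itemset over the window."""
    tids = [tid for tid, items in window.items() if itemset <= items]
    return len(tids), sum(tids)


class ClosedTable:
    """Closed nodes bucketed by their (support, tid_sum) pair."""

    def __init__(self):
        self._buckets = {}

    def insert(self, node):
        self._buckets.setdefault((node.support, node.tid_sum), []).append(node)

    def has_closed_superset(self, itemset, support, tid_sum):
        """True if a stored closed node with the same key strictly contains itemset."""
        bucket = self._buckets.get((support, tid_sum), [])
        return any(itemset < node.itemset for node in bucket)


table = ClosedTable()
abc = frozenset("ABC")
table.insert(Node(abc, "closed", *summarize(abc, WINDOW)))

ac = frozenset("AC")
print(table.has_closed_superset(ac, *summarize(ac, WINDOW)))  # AC shares (2, 7) with ABC
```

Keying on the pair rather than on support alone keeps buckets small, since two itemsets with equal support but different supporting transactions usually differ in tid_sum.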
Moment: Build CET (2)
Moment: Build CET (3) Items Σ = {A, B, C, D}; call Explore(n{i}) for each i in Σ. Tree: root ψ with children A, B, C, D.
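A depth-first build in the spirit of Explore can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' code: the window contents are assumed (the slide text does not list them), supports are recomputed by scanning the window, and `stats`, `explore`, and `build_cet` are hypothetical names. Itemsets are written as strings of items in lexicographic order.

```python
# Assumed window contents (tid -> items); not given in the slide text.
WINDOW = {1: "CD", 2: "AB", 3: "ABC", 4: "ABC"}
ITEMS = "ABCD"
MIN_COUNT = 2  # s = 1/2 of N = 4


def stats(itemset):
    """Return (support, tid_sum) of itemset by scanning the window."""
    tids = [tid for tid, items in WINDOW.items() if set(itemset) <= set(items)]
    return len(tids), sum(tids)


def explore(itemset, right_siblings, node_types, closed_table):
    support, tid_sum = stats(itemset)
    if support < MIN_COUNT:
        node_types[itemset] = "infrequent gateway"
        return
    # Left check: a closed superset found earlier with the same (support, tid_sum).
    bucket = closed_table.get((support, tid_sum), [])
    if any(set(itemset) < set(j) for j in bucket):
        node_types[itemset] = "unpromising gateway"
        return
    # Children come from joining with each frequent right sibling;
    # siblings share a prefix, so only the sibling's last item is new.
    frequent = [s for s in right_siblings if stats(s)[0] >= MIN_COUNT]
    children = [itemset + s[-1] for s in frequent]
    for i, child in enumerate(children):
        explore(child, children[i + 1:], node_types, closed_table)
    if any(stats(child)[0] == support for child in children):
        node_types[itemset] = "intermediate"
    else:
        node_types[itemset] = "closed"
        closed_table.setdefault((support, tid_sum), []).append(itemset)


def build_cet():
    """Explore every single item under the root and return itemset -> node type."""
    node_types, closed_table = {}, {}
    roots = list(ITEMS)
    for i, item in enumerate(roots):
        explore(item, roots[i + 1:], node_types, closed_table)
    return node_types


for itemset, node_type in sorted(build_cet().items()):
    print(itemset, node_type)
```

With the assumed window, the sketch labels D an infrequent gateway, AC an unpromising gateway, A intermediate, and ABC closed, matching the examples on the CET (1) slide. Gateway nodes are never expanded, which is where the pruning comes from.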
Moment: Add CET (1)
Moment: Add CET (2) Adding transaction tid 5 = {A, C, D}: call Addition(nψ, t5, D, minsup). F = {D}. Updated supports: A 4, C 4, D 2, AC 3, AD 1, CD 2. New nodes: AD, CD.
Moment: Delete CET (1)
Moment: Delete CET (2) Deleting transaction tid 1: F = {D}. Updated supports: C 3, D 1.
Moment: Update CET (3) Deleting transaction tid 2. Updated supports: A 3, B 2, AB 2.
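The support figures in the addition and deletion examples can be reproduced with a small bookkeeping sketch. It is a brute-force stand-in that tracks (support, tid_sum) for every subset of each transaction. It does not implement Moment's Addition or Deletion routines, which touch only the affected CET nodes. The class name `WindowCounts` and the initial four transactions are assumptions.

```python
from itertools import combinations


class WindowCounts:
    """Keeps (support, tid_sum) per itemset; itemsets are sorted strings such as 'AC'."""

    def __init__(self):
        self.counts = {}

    @staticmethod
    def _subsets(items):
        ordered = sorted(items)
        for k in range(1, len(ordered) + 1):
            for combo in combinations(ordered, k):
                yield "".join(combo)

    def add(self, tid, items):
        """A new transaction raises support and tid_sum of each of its subsets."""
        for key in self._subsets(items):
            s, t = self.counts.get(key, (0, 0))
            self.counts[key] = (s + 1, t + tid)

    def delete(self, tid, items):
        """An expired transaction lowers support and tid_sum of each of its subsets."""
        for key in self._subsets(items):
            s, t = self.counts[key]
            self.counts[key] = (s - 1, t - tid)

    def support(self, key):
        return self.counts.get(key, (0, 0))[0]


w = WindowCounts()
# Assumed initial window; the slide text does not list these transactions.
for tid, items in [(1, "CD"), (2, "AB"), (3, "ABC"), (4, "ABC")]:
    w.add(tid, items)

w.add(5, "ACD")     # tid 5 arrives
print({k: w.support(k) for k in ["A", "C", "D", "AC", "AD", "CD"]})
w.delete(1, "CD")   # tid 1 expires
print({k: w.support(k) for k in ["C", "D"]})
w.delete(2, "AB")   # tid 2 expires
print({k: w.support(k) for k in ["A", "B", "AB"]})
```

Because both counters change by a fixed amount per transaction, a node can be updated without rescanning the window, which is what makes the (support, tid_sum) key cheap to maintain.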
Experiment (1) Dataset: T20I4D100K Window Size N = 100000
Experiment (2)
Experiment (3) Real dataset: BMS-WebView-1. Items: 497; transactions: 59602. Window size N = 50000