CFI-Stream: Mining Closed Frequent Itemsets in Data Streams Nan Jiang,Le Gruenwald SIGKDD’06 報告者:林靜怡 2006/10/04
Introduction mining Closed frequent itemsets computes and maintains closed itemsets online and incrementally perform the closure checking output the current closed frequent itemsets in real time based on users’ specified thresholds
Definition D:data stream I = { , , …, } :a set of n elements, called items T: subsets of all the transactions X: subsets of all the items appearing in a data stream
Definition C(X):the smallest closed set containing X Definition 1 An itemset X is said to be closed if and only if C(X)= f(g(X)) = f•g(X) = X
Algorithm CFI-Stream algorithm DIrect Update (DIU) tree perform the closure checking online over a data stream sliding window Conditions need to check for closed itemsets check when performing addition and deletion operations on the DIU tree
DIU tree maintain the current closed itemsets k levels in the DIU tree, each level i stores the closed i-itemsets
DIU tree Each node in the DIU tree stores a closed itemset its current support information links to its parent and children nodes
Add a Transaction to the DIU Tree T1:original transaction set t:new arrived transaction Conditions to Check for Closed Itemsets (1) t is in the T1, if the largest itemset X it contains is not currently in the DIU tree ->check for all X’s subsets Y, which are in T1
(2) when t is not in T1, for each its subset Y, if Y is in T1, we need to check
Closure Checking for Addition
C,D 2 A,B 3 A,B,C CD C CD 2 1 3 4 A,B,C 2 1 3 1 AB ABC 1 2
Delete a Transaction in DIU Tree Conditions to Check for Closed Itemsets When the number of the transactions with same itemset of X is equal to zero, if Y is a subset of X, and Y is a closed itemset in the original transaction set
Closure Checking for Deletion
C,D 2 A,B 3 A,B,C 4 A,B,C 2 3 C 2 3 1 AB CD 2 ABC
Experiment Synthetic datasets T10.I6.D100K and T5.I4.D100K
Experiment
Experiment