Published by Rosemary Clark. Modified over 6 years ago.
1
Efficient Closed Pattern Mining in Strongly Accessible Set Systems
Mario Boley, Tamás Horváth, Axel Poigné, Stefan Wrobel
Fraunhofer IAIS, Sankt Augustin & University of Bonn, Germany
2
Closed Frequent Patterns
data mining definition: frequent patterns that cannot be further enlarged without changing their support
example: closed frequent itemsets
- compact representation of frequent itemsets
- number of frequent itemsets can be exponentially larger than that of closed frequent itemsets
[Table: example transaction database over items A, B, C, D, E; rows not recoverable]
3
The Closed Set Mining (CSM) Problem
in many cases, the closed frequent pattern mining problem is an instance of the Closed Set Mining problem:
Given a finite ground set $E$, a membership oracle $M_F : 2^E \to \{0,1\}$ defining a family $F \subseteq 2^E$ with $\emptyset \in F$, and a closure operator $\rho : F \to F$, list the family $\rho(F)$ of closed sets, i.e., $\rho(F) = \{ \rho(X) : X \in F \}$.
example: closed frequent itemsets
- $E$: set of items
- $F$: family of frequent itemsets
- oracle $M_F$ decides whether an itemset is frequent
- for every $X \in F$, $\rho(X)$ is the intersection of the transactions containing $X$
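The itemset instance of the CSM problem can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names `support`, `is_frequent`, `closure` and the sample database `DB` are hypothetical, and transactions are assumed to be frozensets of item labels.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]

# Hypothetical sample database; not taken from the slides.
DB: List[Itemset] = [frozenset("ABC"), frozenset("AB"), frozenset("BD")]


def support(x: Itemset, db: List[Itemset]) -> int:
    """Number of transactions that contain every item of x."""
    return sum(1 for t in db if x <= t)


def is_frequent(x: Itemset, db: List[Itemset], threshold: int) -> bool:
    """Membership oracle: x belongs to F iff its support reaches the threshold."""
    return support(x, db) >= threshold


def closure(x: Itemset, db: List[Itemset]) -> Itemset:
    """Intersection of the transactions containing x (x must occur at least once)."""
    covering = [t for t in db if x <= t]
    if not covering:
        raise ValueError("closure is only defined for itemsets with support >= 1")
    return frozenset.intersection(*covering)
```

With this sample database, `closure(frozenset("A"), DB)` is `{A, B}`, because every transaction that contains A also contains B.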
4
Results on Mining Closed Sets
several positive complexity results in different communities, e.g.,
Formal Concept Analysis (Wille, ’82; Ganter & Wille, ’99)
- e.g., polynomial delay algorithm (Ganter & Reuter, ’91)
- assumption: $F$ is the power set of $E$
Closed Frequent Itemset Mining (Pasquier, Bastide, Taouil, & Lakhal, ’99)
- e.g., incremental polynomial time algorithm (Boros, Gurvich, Khachiyan, & Makino, ’03)
- assumption: $F$ is an independence system (closed under taking subsets)
question for this talk: What about closed frequent pattern mining problems with set systems not even closed under intersection?
5
Example: Track Mining
given a database of GPS-based recordings of spatio-temporal movements (tracks), list the set of closed frequent connected subgraphs
- movements of people or cars in a street network
- closed frequent connected subgraphs: ‘homogeneous’ connected subnetworks
model:
- street network: undirected graph $G = (V,E)$
- tracks: subsets of $E$
- embedding operator: subset relation, easy to decide
underlying set system: $F = \{ X \subseteq E : X \text{ is frequent and connected} \}$
$F$ is not closed under intersection
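A membership oracle for the track-mining set system could look like the sketch below. It is only an illustration under stated assumptions: edges are represented as tuples of two endpoint labels, the empty edge set is treated as connected, and the names `is_connected`, `is_member`, `TRACK_1`, `TRACK_2` as well as the two sample tracks are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Edge = Tuple[str, str]        # undirected edge, given by its two endpoints
EdgeSet = FrozenSet[Edge]


def is_connected(edges: EdgeSet) -> bool:
    """True if the edges form a single connected subgraph."""
    remaining = list(edges)
    if not remaining:
        return True  # assumption of this sketch: the empty set counts as connected
    reached = set(remaining.pop())  # vertices reached so far
    progress = True
    while remaining and progress:
        progress = False
        for edge in list(remaining):
            if reached & set(edge):      # edge touches the reached part
                reached.update(edge)
                remaining.remove(edge)
                progress = True
    return not remaining


def is_member(x: EdgeSet, db: List[EdgeSet], threshold: int) -> bool:
    """x is in F iff it is contained in at least `threshold` tracks and is connected."""
    frequent = sum(1 for t in db if x <= t) >= threshold
    return frequent and is_connected(x)


# Two hypothetical connected tracks whose intersection is not connected.
TRACK_1: EdgeSet = frozenset({("u", "v"), ("v", "w"), ("w", "x")})
TRACK_2: EdgeSet = frozenset({("u", "v"), ("u", "y"), ("y", "x"), ("w", "x")})
```

Here `TRACK_1 & TRACK_2` consists of the two edges `("u", "v")` and `("w", "x")`, which share no vertex, so the intersection of two members of the family need not be a member.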
6
Example
frequency threshold = 1
$F = \{ \{a,b,c,d,e,f,g,h,k\}, \{a,i,j,k\} \}$
intersection is not connected
[Figure: street network with edges labeled a to k]
7
Generators and Inductive Generators
$C$: $\rho$-closed element, i.e., $C \in \rho(F)$
generator of $C$: $X \in F$ such that $C = \rho(X)$
inductive generator of $C$: $C' \cup \{e\} \in F$ such that
- $C'$ is $\rho$-closed,
- $C = \rho(C' \cup \{e\})$ for some $e \in E \setminus C'$
example: $\rho(\emptyset) = \emptyset$, $\rho(a) = \rho(ac) = ac$, $\rho(ab) = \rho(abd) = \rho(abcd) = abcd$
- $abcd$ has a generator (e.g., $ab$), but no inductive generator
- $ac$ has an inductive generator (i.e., $a$)
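The example on this slide can be checked mechanically. The sketch below assumes that the family consists of exactly the six sets mentioned in the example (the empty set, a, ac, ab, abd, abcd); the slide does not state the family explicitly, so this is an assumption, and the helper names are hypothetical.

```python
def fs(s: str) -> frozenset:
    """Shorthand: fs("ac") is the set {a, c}."""
    return frozenset(s)


# Closure operator of the example, tabulated on the assumed family F.
RHO = {
    fs(""): fs(""),
    fs("a"): fs("ac"),
    fs("ac"): fs("ac"),
    fs("ab"): fs("abcd"),
    fs("abd"): fs("abcd"),
    fs("abcd"): fs("abcd"),
}
F = set(RHO)                  # assumed family: exactly the sets listed above
E = fs("abcd")                # ground set
CLOSED = set(RHO.values())    # the closed sets: {}, ac, abcd


def generators(c: frozenset) -> list:
    """All X in F whose closure is c."""
    return [x for x in F if RHO[x] == c]


def inductive_generators(c: frozenset) -> list:
    """All pairs (C', e) with C' closed, C' + e in F and closure(C' + e) == c."""
    found = []
    for c_prime in CLOSED:
        for e in E - c_prime:
            candidate = c_prime | {e}
            if candidate in F and RHO[candidate] == c:
                found.append((c_prime, e))
    return found
```

Under this assumption, `inductive_generators(fs("ac"))` finds the empty set extended by `a`, while `inductive_generators(fs("abcd"))` is empty even though `generators(fs("abcd"))` is not.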
8
The Closed Set Mining (CSM) Problem
Lemma: The CSM problem can be solved with polynomial delay if the membership oracle and the closure operator can be computed in polynomial time and for every $\rho$-closed set except $\rho(\emptyset)$, there exists an inductive generator.
proof sketch:
- traverse the digraph of $\rho$-closed sets in depth-first manner
- $(C',C)$ is an edge iff there is an $e \in E \setminus C'$ such that $C' \cup \{e\}$ is an inductive generator of $C$
- $\rho$-closed sets are stored in prefix trees
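The traversal in the proof sketch might be written as below. This is a simplified sketch, not the authors' algorithm: it keeps visited closed sets in a Python hash set instead of prefix trees, it makes no attempt to establish the polynomial delay bound, and the itemset instance (`DB`, `THRESHOLD`, `is_frequent`, `closure`) is a hypothetical example used only to exercise it.

```python
from typing import Callable, FrozenSet, Iterator

Subset = FrozenSet[str]


def list_closed_sets(
    ground_set: Subset,
    is_member: Callable[[Subset], bool],
    closure: Callable[[Subset], Subset],
) -> Iterator[Subset]:
    """Depth-first traversal of the digraph of closed sets, starting at closure({})."""
    start = closure(frozenset())
    seen = {start}        # stands in for the prefix tree of the slides
    stack = [start]
    while stack:
        c = stack.pop()
        yield c
        for e in sorted(ground_set - c):
            candidate = c | {e}
            if is_member(candidate):             # candidate is an inductive generator
                successor = closure(candidate)
                if successor not in seen:
                    seen.add(successor)
                    stack.append(successor)


# Hypothetical itemset instance.
DB = [frozenset("ABC"), frozenset("AB"), frozenset("BD")]
THRESHOLD = 2
GROUND = frozenset().union(*DB)


def is_frequent(x: Subset) -> bool:
    return sum(1 for t in DB if x <= t) >= THRESHOLD


def closure(x: Subset) -> Subset:
    # Only called on frequent itemsets, so at least one transaction contains x.
    return frozenset.intersection(*[t for t in DB if x <= t])
```

For this instance, `list(list_closed_sets(GROUND, is_frequent, closure))` contains exactly the two closed frequent itemsets `{B}` and `{A, B}`. The traversal reaches every closed set only when each closed set other than the closure of the empty set has an inductive generator, which is the condition of the lemma.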
9
Main Result for Strongly Accessible Set Systems
set system $(E,F)$ is strongly accessible if $\emptyset \in F$ and for every $X,Y \in F$ satisfying $X \subset Y$, there exists an $e \in Y \setminus X$ such that $X \cup \{e\} \in F$.
- there is a sequence $X = X_0, X_1, \ldots, X_k = Y$ of sets in $F$ s.t. $|X_i \setminus X_{i-1}| = 1$ for $i = 1, \ldots, k$
Thm: For any finite strongly accessible set system $(E,F)$ (i) given by a polynomial membership oracle and (ii) for any polynomially computable closure operator $\rho : F \to F$, $\rho(F)$ can be listed with polynomial delay.
proof sketch:
- show that every $\rho$-closed set except $\rho(\emptyset)$ has an inductive generator
- apply the previous lemma
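For small, explicitly listed families the definition of strong accessibility can be tested by brute force. The check below is a sketch for illustration only (the theorem works with a membership oracle, not an explicit list); `is_strongly_accessible`, `power_set` and `EXAMPLE` are hypothetical names, and `EXAMPLE` is the six-set family assumed from the generator example of the earlier slide.

```python
from itertools import combinations


def is_strongly_accessible(family) -> bool:
    """Brute-force check of the definition on an explicitly listed family."""
    fam = set(family)
    if frozenset() not in fam:
        return False
    for x in fam:
        for y in fam:
            # For every proper subset pair X < Y, some e in Y \ X must keep X + e in F.
            if x < y and not any((x | {e}) in fam for e in y - x):
                return False
    return True


def power_set(ground) -> set:
    """All subsets of the ground set, as frozensets."""
    items = sorted(ground)
    return {
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    }


# Assumed family behind the generator example: {}, a, ac, ab, abd, abcd.
EXAMPLE = {frozenset(s) for s in ["", "a", "ac", "ab", "abd", "abcd"]}
```

A power set passes the check, whereas `EXAMPLE` fails it: `ac` is a proper subset of `abcd`, but neither `abc` nor `acd` belongs to the family.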
10
Appl. 1: Closed Frequent Itemset Mining
Thm: The closed frequent itemset mining problem can be solved with polynomial delay.
proof sketch:
- family of frequent itemsets is an independence system, hence strongly accessible
- frequency can be decided in polynomial time, so the set system is given by a polynomial membership oracle
- closure operator: $\rho(X)$ = intersection of the transactions containing $X$, which can be computed in polynomial time
11
Appl. 2: Closed Frequent Connected Subgraph Mining
Given an undirected graph $G = (V,E)$, a transaction database $D$ of subgraphs of $G$, and an integer frequency threshold $t > 0$, list the family of closed frequent connected subgraphs of $D$.
Thm: The above problem can be solved with polynomial delay.
proof sketch:
- $F$: set of frequent connected subgraphs of $G$, not closed under intersection
- $F$ is strongly accessible
- membership is decidable in polynomial time
- closure of a frequent connected subgraph $X$: largest connected supergraph of $X$ in the intersection of the transactions containing $X$
- it is indeed a closure operator and can be computed in polynomial time
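The closure operator described in the proof sketch can be illustrated as follows. This is a sketch under several assumptions: subgraphs are represented as sets of edges, each edge is a tuple of two endpoint labels, the input is non-empty and connected (the slide does not say how the empty set is handled, so the sketch rejects it), and the sample database is hypothetical.

```python
from typing import FrozenSet, List, Tuple

Edge = Tuple[str, str]
EdgeSet = FrozenSet[Edge]


def connected_closure(x: EdgeSet, db: List[EdgeSet]) -> EdgeSet:
    """Largest connected superset of x inside the intersection of the transactions containing x."""
    if not x:
        raise ValueError("this sketch leaves the closure of the empty set undefined")
    covering = [t for t in db if x <= t]
    if not covering:
        raise ValueError("x must be contained in at least one transaction")
    common = frozenset.intersection(*covering)

    component = set(x)                           # edges collected so far
    vertices = {v for edge in x for v in edge}   # their endpoints
    changed = True
    while changed:
        changed = False
        for edge in common - component:
            if vertices & set(edge):             # edge touches the current component
                component.add(edge)
                vertices.update(edge)
                changed = True
    return frozenset(component)


# Hypothetical edges and transactions.
E1, E2, E3, E4 = ("u", "v"), ("v", "w"), ("w", "x"), ("y", "z")
DB: List[EdgeSet] = [
    frozenset({E1, E2, E3, E4}),
    frozenset({E1, E2, E4}),
    frozenset({E1, E3}),
]
```

For `x = {E2}` the transactions containing it intersect in `{E1, E2, E4}`; the edge `E4` is not attached to `E2`, so the closure is `{E1, E2}`.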
12
Closed Frequent Connected Subgraph Mining
Example: frequency threshold = 2 …
13
Appl. 3: Closed Frequent Subpath Mining
data mining definition: a path $P$ is closed frequent if it is frequent and has strictly larger support than any path $P'$ properly containing $P$
there is no closure operator corresponding to this definition
example: $D = \{ abc \}$, frequency threshold = 1
- $F = \{ \emptyset, a, b, c, ab, ac, bc \}$
- closed 1-frequent paths: $C = \{ ab, ac, bc \}$
- suppose there is a closure operator $\rho$ s.t. $\rho(F) = C$
- because of extensivity: $\rho(a)$ must be $ab$ or $ac$, say $ab$
- $\rho(a) = ab$ is not a subset of $\rho(ac) = ac$, contradicting monotonicity
[Figure: graph with edges labeled a, b, c]
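The non-existence claim for this small example can also be confirmed by exhaustive search. The sketch below enumerates every extensive map on the family and keeps those that are also monotone and idempotent with the required image; the function name `closure_operators_with_image` is hypothetical, and the search is only feasible because the family has seven elements.

```python
from itertools import product


def fs(s: str) -> frozenset:
    return frozenset(s)


F = [fs(""), fs("a"), fs("b"), fs("c"), fs("ab"), fs("ac"), fs("bc")]
C = {fs("ab"), fs("ac"), fs("bc")}


def closure_operators_with_image(family: list, image: set) -> list:
    """All closure operators on `family` (as dicts) whose set of values equals `image`."""
    # Extensivity is built in: each X may only be mapped to a superset of X.
    options = [[y for y in family if x <= y] for x in family]
    found = []
    for choice in product(*options):
        rho = dict(zip(family, choice))
        monotone = all(
            rho[x] <= rho[y] for x in family for y in family if x <= y
        )
        idempotent = all(rho[rho[x]] == rho[x] for x in family)
        if monotone and idempotent and set(rho.values()) == image:
            found.append(rho)
    return found
```

`closure_operators_with_image(F, C)` returns an empty list, in line with the argument on the slide that no closure operator has exactly the closed 1-frequent paths as its image.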
14
Appl. 3: Closed Frequent Subpath Mining
alternative definition:
- let $P$ be a path in $G$
- compute the intersection $G_P$ of the transactions containing $P$
- return the intersection of the maximal paths in $G_P$ that contain $P$
example: $D = \{ abc \}$, frequency threshold = 1
- $F = \{ \emptyset, a, b, c, ab, ac, bc \}$
- closed 1-frequent paths: $C' = \{ a, b, c, ab, ac, bc \}$
Thm: The set of closed frequent paths w.r.t. the alternative definition can be listed with polynomial delay.
[Figure: graph with edges labeled a, b, c]
15
An Open Problem
accessible set systems:
- for all $X \in F \setminus \{\emptyset\}$ there is an $e \in X$ such that $X \setminus \{e\} \in F$
- there is a sequence $\emptyset = X_0, X_1, \ldots, X_k = X$ s.t. $|X_i \setminus X_{i-1}| = 1$ for $i = 1, \ldots, k$
- $abcd$ has no inductive generator
Question: Can the positive result on strongly accessible set systems be generalized to accessible set systems?
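Accessibility can be tested on an explicitly listed family in the same brute-force spirit. In the sketch below, `is_accessible` is a hypothetical helper and `EXAMPLE` is the six-set family (the empty set, a, ac, ab, abd, abcd) assumed to underlie the generator example earlier in the talk; the talk itself does not list that family.

```python
def is_accessible(family) -> bool:
    """Every non-empty X in the family has some e in X with X - {e} in the family."""
    fam = set(family)
    return all(
        any((x - {e}) in fam for e in x)
        for x in fam
        if x  # the empty set needs no witness
    )


# Assumed family behind the generator example: {}, a, ac, ab, abd, abcd.
EXAMPLE = {frozenset(s) for s in ["", "a", "ac", "ab", "abd", "abcd"]}
```

`is_accessible(EXAMPLE)` holds, so under this assumption the family in which `abcd` has no inductive generator is an accessible set system.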