Efficient Closed Pattern Mining in Strongly Accessible Set Systems


1 Efficient Closed Pattern Mining in Strongly Accessible Set Systems
Mario Boley, Tamás Horváth, Axel Poigné, Stefan Wrobel
Fraunhofer IAIS, Sankt Augustin & University of Bonn, Germany

2 Closed Frequent Patterns
Data mining definition: frequent patterns that cannot be enlarged without changing their support.
- example: closed frequent itemsets
- compact representation of the frequent itemsets: the number of frequent itemsets can be exponentially larger than that of closed frequent itemsets
[figure: example transaction database over items A, B, C, D, E]
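A minimal brute-force sketch of the definition above, on a hypothetical one-transaction database (not the slide's example, whose figure is lost). The function names are illustrative only. Since support can only drop when a pattern grows, checking one-item extensions is enough to decide closedness.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closed_frequent_by_definition(items, transactions, threshold):
    """Brute force over all itemsets; exponential, for illustration only."""
    frequent = [
        frozenset(c)
        for k in range(len(items) + 1)
        for c in combinations(sorted(items), k)
        if support(frozenset(c), transactions) >= threshold
    ]
    # X is closed iff every one-item extension strictly lowers the support
    # (if some larger Y had equal support, so would X plus any item of Y - X).
    closed = [
        x for x in frequent
        if all(support(x | {i}, transactions) < support(x, transactions)
               for i in items - x)
    ]
    return frequent, closed


# Hypothetical database: a single transaction containing all four items.
frequent, closed = closed_frequent_by_definition(set("ABCD"), [set("ABCD")], 1)
print(len(frequent), [sorted(c) for c in closed])  # 16 [['A', 'B', 'C', 'D']]
```

With n items in that single transaction, all 2^n itemsets are frequent while only one is closed, which illustrates the exponential gap.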

3 The Closed Set Mining (CSM) Problem
In many cases, the closed frequent pattern mining problem is an instance of the Closed Set Mining problem: given a finite ground set $E$, a membership oracle $M_F : 2^E \to \{0,1\}$ defining a family $F \subseteq 2^E$ with $\emptyset \in F$, and a closure operator $\sigma : F \to F$ (extensive, monotone, and idempotent), list the family $\sigma(F)$ of closed sets, i.e., $\sigma(F) = \{\sigma(X) : X \in F\}$.
Example: closed frequent itemsets
- $E$: set of items
- $F$: family of frequent itemsets
- the oracle $M_F$ decides whether an itemset is frequent
- for every $X \in F$, $\sigma(X)$ is the intersection of the transactions containing $X$
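The two inputs of the CSM problem for the itemset example can be sketched as follows; the helper names and the toy database are hypothetical. The closure is only meant to be applied to members of $F$, which (with a threshold of at least 1) guarantees that some transaction contains $X$.

```python
from functools import reduce


def make_membership_oracle(transactions, threshold):
    """M_F for frequent itemsets: 1 iff X occurs in >= threshold transactions."""
    def oracle(x):
        return int(sum(1 for t in transactions if x <= t) >= threshold)
    return oracle


def make_closure(transactions):
    """sigma(X): intersection of the transactions containing X.

    Only meant for X in F; with threshold >= 1 some transaction contains X,
    so the intersection is never taken over an empty list.
    """
    def sigma(x):
        return reduce(frozenset.intersection,
                      [frozenset(t) for t in transactions if x <= t])
    return sigma


D = [set("ABC"), set("ABD"), set("AC")]  # hypothetical transactions
oracle, sigma = make_membership_oracle(D, 2), make_closure(D)
print(oracle(frozenset("AB")), oracle(frozenset("CD")))  # 1 0
print(sorted(sigma(frozenset("B"))))                     # ['A', 'B']
```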

4 Results on Mining Closed Sets
Several positive complexity results exist in different communities, e.g.:
- Formal Concept Analysis (Wille, ’82; Ganter & Wille, ’99)
  - e.g., polynomial delay algorithm (Ganter & Reuter, ’91)
  - assumption: $F$ is the power set of $E$
- Closed Frequent Itemset Mining (Pasquier, Bastide, Taouil, & Lakhal, ’99)
  - e.g., incremental polynomial time algorithm (Boros, Gurvich, Khachiyan, & Makino, ’03)
  - assumption: $F$ is an independence system (closed under taking subsets)
Question for this talk: what about closed frequent pattern mining problems whose set systems are not even closed under intersection?

5 Example: Track Mining
Given a database of GPS-based recordings of spatio-temporal movements (tracks) of people or cars in a street network, list the closed frequent connected subgraphs.
- closed frequent connected subgraphs: ‘homogeneous’ connected subnetworks
Model:
- street network: undirected graph $G = (V,E)$
- tracks: subsets of $E$
- embedding operator: subset relation (easy to decide)
- underlying set system: $F = \{X \subseteq E : X \text{ is frequent and connected}\}$
- $F$ is not closed under intersection
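The membership oracle of this model can be sketched as below, assuming edges are stored as vertex pairs and tracks as edge sets. The street network and both tracks are hypothetical; they are chosen so that two connected tracks share two edges at opposite ends, reproducing the "not closed under intersection" effect.

```python
def is_connected(edges):
    """True if the edges form a single connected subgraph (the empty set counts)."""
    edges = list(edges)
    if not edges:
        return True
    reached, frontier = {edges[0]}, [edges[0]]
    while frontier:  # flood fill over edges that share an endpoint
        u, v = frontier.pop()
        for e in edges:
            if e not in reached and set(e) & {u, v}:
                reached.add(e)
                frontier.append(e)
    return len(reached) == len(edges)


def in_F(x, tracks, threshold):
    """Membership oracle: X is frequent (subset embedding) and connected."""
    support = sum(1 for t in tracks if x <= t)
    return support >= threshold and is_connected(x)


# Hypothetical street network; edges are (u, v) pairs with u < v.
t1 = frozenset({(1, 2), (2, 3), (3, 4)})
t2 = frozenset({(1, 2), (1, 5), (4, 5), (3, 4)})
tracks = [t1, t2]
print(in_F(t1, tracks, 1), in_F(t2, tracks, 1))  # True True
print(sorted(t1 & t2), in_F(t1 & t2, tracks, 1))  # [(1, 2), (3, 4)] False
```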

6 Example
Frequency threshold = 1; $F$ contains $X = \{a,b,c,d,e,f,g,h,k\}$ and $Y = \{a,i,j,k\}$, but their intersection $X \cap Y = \{a,k\}$ is not connected.
[figure: street network with edges labeled a to k]

7 Generators and Inductive Generators
- $C$: $\sigma$-closed element, i.e., $C \in \sigma(F)$
- generator of $C$: $X \in F$ such that $C = \sigma(X)$
- inductive generator of $C$: $C' \cup \{e\} \in F$ such that
  - $C'$ is $\sigma$-closed, and
  - $C = \sigma(C' \cup \{e\})$ for some $e \in E \setminus C'$
Example: $\sigma(\emptyset) = \emptyset$, $\sigma(a) = \sigma(ac) = ac$, $\sigma(ab) = \sigma(abd) = \sigma(abcd) = abcd$
- $abcd$ has a generator (e.g., $ab$), but no inductive generator
- $ac$ has an inductive generator (namely $a = \emptyset \cup \{a\}$)
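The example can be checked mechanically. The slide does not list $F$ itself, so the set system below is an assumption: the smallest family that contains every set whose closure is given. Names are illustrative.

```python
# Assumption: F consists exactly of the sets whose closures the slide lists.
F = {frozenset(s) for s in ["", "a", "ab", "ac", "abd", "abcd"]}
sigma = {frozenset(k): frozenset(v) for k, v in {
    "": "", "a": "ac", "ac": "ac",
    "ab": "abcd", "abd": "abcd", "abcd": "abcd"}.items()}
E = frozenset("abcd")
closed = set(sigma.values())  # sigma(F) = {∅, ac, abcd}


def generators(c):
    """All X in F with sigma(X) = C."""
    return {x for x in F if sigma[x] == c}


def inductive_generators(c):
    """All C' ∪ {e} in F with C' sigma-closed, e in E - C', sigma(C' ∪ {e}) = C."""
    return {cp | {e} for cp in closed for e in E - cp
            if cp | {e} in F and sigma[cp | {e}] == c}


print(sorted("".join(sorted(x)) for x in generators(frozenset("abcd"))))  # ['ab', 'abcd', 'abd']
print(inductive_generators(frozenset("abcd")))  # set()
print(inductive_generators(frozenset("ac")))    # {frozenset({'a'})}
```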

8 The Closed Set Mining (CSM) Problem
Lemma: The CSM problem can be solved with polynomial delay if the membership oracle and the closure operator can be computed in polynomial time and every $\sigma$-closed set except $\sigma(\emptyset)$ has an inductive generator.
Proof sketch:
- traverse the digraph of $\sigma$-closed sets in depth-first manner, starting at $\sigma(\emptyset)$
- $(C', C)$ is an edge iff there is an $e \in E \setminus C'$ such that $C' \cup \{e\}$ is an inductive generator of $C$
- the $\sigma$-closed sets already visited are stored in prefix trees
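A minimal sketch of the traversal in the proof, assuming the oracle and the closure are given as Python callables. A plain Python set of frozensets stands in for the prefix tree, and no attempt is made to reproduce the delay analysis. The demo reuses the set system assumed for the slide-7 example: because $abcd$ has no inductive generator there, the traversal never reaches it.

```python
def list_closed_sets(ground, in_F, sigma):
    """Depth-first traversal of the digraph of sigma-closed sets from sigma(∅).

    Edge (C', C) iff C = sigma(C' ∪ {e}) for some e in ground - C' with
    C' ∪ {e} in F. A Python set of frozensets replaces the prefix tree.
    """
    root = sigma(frozenset())
    seen = {root}

    def visit(c):
        yield c
        for e in sorted(ground - c):
            x = c | {e}
            if in_F(x):
                child = sigma(x)
                if child not in seen:
                    seen.add(child)
                    yield from visit(child)

    yield from visit(root)


# Demo on the set system assumed for the example on slide 7.
F = {frozenset(s) for s in ["", "a", "ab", "ac", "abd", "abcd"]}
SIGMA = {frozenset(k): frozenset(v) for k, v in {
    "": "", "a": "ac", "ac": "ac",
    "ab": "abcd", "abd": "abcd", "abcd": "abcd"}.items()}
result = list(list_closed_sets(frozenset("abcd"), F.__contains__, SIGMA.__getitem__))
print([sorted(c) for c in result])  # [[], ['a', 'c']]
```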

9 Main Result for Strongly Accessible Set Systems
A set system $(E,F)$ is strongly accessible if $\emptyset \in F$ and for every $X, Y \in F$ satisfying $X \subsetneq Y$, there exists an $e \in Y \setminus X$ such that $X \cup \{e\} \in F$.
- i.e., there is a sequence $X = X_0 \subset X_1 \subset \dots \subset X_k = Y$ of sets in $F$ with $|X_i \setminus X_{i-1}| = 1$ for $i = 1,\dots,k$
Thm: For any finite strongly accessible set system $(E,F)$ (i) given by a polynomial membership oracle and (ii) for any polynomially computable closure operator $\sigma : F \to F$, $\sigma(F)$ can be listed with polynomial delay.
Proof sketch:
- show that every $\sigma$-closed set other than $\sigma(\emptyset)$ has an inductive generator
- apply the previous lemma
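A brute-force check of the definition for small, explicitly listed set systems (the function name is illustrative). One way to see the first proof step: for a closed $C \neq \sigma(\emptyset)$, pick a maximal closed $C' \subsetneq C$. Strong accessibility gives $e \in C \setminus C'$ with $C' \cup \{e\} \in F$, and monotonicity plus maximality force $\sigma(C' \cup \{e\}) = C$. The second test system is the one assumed for the slide-7 example.

```python
from itertools import combinations


def is_strongly_accessible(F):
    """∅ in F, and for all X ⊊ Y in F some e in Y - X has X ∪ {e} in F."""
    return frozenset() in F and all(
        any(x | {e} in F for e in y - x)
        for x in F for y in F if x < y)


power_set = {frozenset(c) for k in range(4) for c in combinations("abc", k)}
example = {frozenset(s) for s in ["", "a", "ab", "ac", "abd", "abcd"]}
print(is_strongly_accessible(power_set))  # True
print(is_strongly_accessible(example))    # False: ac ⊊ abcd, but abc, acd ∉ F
```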

10 Appl. 1: Closed Frequent Itemset Mining
Thm: The closed frequent itemset mining problem can be solved with polynomial delay.
Proof sketch:
- the family of frequent itemsets is an independence system ⇒ strongly accessible
- frequency can be decided in polynomial time ⇒ the set system is given by a polynomial membership oracle
- closure operator: $\sigma(X)$ = intersection of the transactions containing $X$ ⇒ can be computed in polynomial time
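Putting the pieces together for itemsets gives a small end-to-end sketch: the depth-first traversal over inductive generators with the intersection closure. It is a didactic illustration on a hypothetical database, not a tuned miner, and it requires a threshold that makes the empty itemset frequent.

```python
from functools import reduce


def closed_frequent_itemsets(transactions, threshold):
    """List closed frequent itemsets by DFS over inductive generators."""
    db = [frozenset(t) for t in transactions]
    items = frozenset().union(*db)

    def is_frequent(x):  # polynomial membership oracle
        return sum(1 for t in db if x <= t) >= threshold

    def sigma(x):  # intersection of the transactions containing x
        return reduce(frozenset.intersection, [t for t in db if x <= t])

    if not 1 <= threshold <= len(db):
        raise ValueError("need 1 <= threshold <= |D| so that the empty set is in F")
    root = sigma(frozenset())
    seen, out = {root}, []

    def visit(c):
        out.append(c)
        for e in sorted(items - c):
            x = c | {e}
            if is_frequent(x):
                child = sigma(x)
                if child not in seen:
                    seen.add(child)
                    visit(child)

    visit(root)
    return out


result = closed_frequent_itemsets([set("ABC"), set("ABD"), set("AC")], 2)
print([sorted(c) for c in result])  # [['A'], ['A', 'B'], ['A', 'C']]
```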

11 Appl. 2: Closed Frequent Connected Subgraph Mining
Given an undirected graph $G = (V,E)$, a transaction database $D$ of subgraphs of $G$, and an integer frequency threshold $t > 0$, list the family of closed frequent connected subgraphs of $D$.
Thm: The above problem can be solved with polynomial delay.
Proof sketch:
- $F$: set of frequent connected subgraphs of $G$
  - not closed under intersection
  - $F$ is strongly accessible
  - membership is decidable in polynomial time
- closure of a frequent connected subgraph $X$: the largest connected supergraph of $X$ in the intersection of the transactions containing $X$
  - it is indeed a closure operator and can be computed in polynomial time
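For a nonempty connected $X$, the "largest connected supergraph of $X$ in the intersection" is the connected component of that intersection containing $X$. The sketch below computes it with subgraphs represented as edge sets. The treatment of the empty pattern ($\sigma(\emptyset) = \emptyset$) is an assumption of this sketch, and the database reuses the hypothetical tracks from the track-mining sketch.

```python
from functools import reduce


def components(edges):
    """Split an edge set into its connected components (frozensets of edges)."""
    remaining, comps = set(edges), []
    while remaining:
        start = remaining.pop()
        comp, frontier = {start}, [start]
        while frontier:
            u, v = frontier.pop()
            touching = {e for e in remaining if set(e) & {u, v}}
            remaining -= touching
            comp |= touching
            frontier.extend(touching)
        comps.append(frozenset(comp))
    return comps


def connected_closure(x, db):
    """Component containing X of the intersection of the transactions containing X.

    Assumes X is connected and frequent; the empty pattern is mapped to
    itself (an assumption of this sketch).
    """
    if not x:
        return frozenset()
    common = reduce(frozenset.intersection, [t for t in db if x <= t])
    return next(c for c in components(common) if x <= c)


t1 = frozenset({(1, 2), (2, 3), (3, 4)})  # hypothetical tracks
t2 = frozenset({(1, 2), (1, 5), (4, 5), (3, 4)})
db = [t1, t2]
print(sorted(connected_closure(frozenset({(2, 3)}), db)))  # [(1, 2), (2, 3), (3, 4)]
print(sorted(connected_closure(frozenset({(1, 2)}), db)))  # [(1, 2)]
```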

12 Closed Frequent Connected Subgraph Mining
Example: frequency threshold = 2

13 Appl. 3: Closed Frequent Subpath Mining
Data mining definition: a path $P$ is closed frequent if it is frequent and has strictly larger support than any path $P'$ properly containing $P$.
There is no closure operator corresponding to this definition. Example:
- $D = \{abc\}$, frequency threshold = 1
- $F = \{\emptyset, a, b, c, ab, ac, bc\}$
- closed 1-frequent paths: $C = \{ab, ac, bc\}$
- suppose there is a closure operator $\sigma$ such that $\sigma(F) = C$
- by extensivity, $\sigma(a)$ must be $ab$ or $ac$, say $ab$, and $\sigma(ac) = ac$
- since $a \subseteq ac$ but $\sigma(a) = ab \not\subseteq \sigma(ac) = ac$, this contradicts monotonicity
[figure: the transaction abc, a graph with edges a, b, c]
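The impossibility argument can be confirmed by exhaustive search. The sketch enumerates all $3^7$ maps from $F$ into $C$ and keeps those whose image is all of $C$ and that are extensive and monotone. Every closure operator would have to be among them, so finding none confirms the claim. The function name is illustrative.

```python
from itertools import product

# Paths of the transaction abc (edge labels as on the slide) and the
# closed 1-frequent paths under the support-based definition.
F = [frozenset(s) for s in ["", "a", "b", "c", "ab", "ac", "bc"]]
C = [frozenset(s) for s in ["ab", "ac", "bc"]]


def count_extensive_monotone_maps(F, C):
    """Count maps sigma: F -> F with image exactly C that are extensive and monotone."""
    count = 0
    for values in product(C, repeat=len(F)):
        if set(values) != set(C):
            continue  # the image must be all of C
        sigma = dict(zip(F, values))
        extensive = all(x <= sigma[x] for x in F)
        monotone = all(sigma[x] <= sigma[y] for x in F for y in F if x <= y)
        if extensive and monotone:
            count += 1
    return count


print(count_extensive_monotone_maps(F, C))  # 0
```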

14 Appl. 3: Closed Frequent Subpath Mining
Alternative definition: let $P$ be a path in $G$
- compute the intersection $G_P$ of the transactions containing $P$
- return the intersection of the maximal paths in $G_P$ that contain $P$
Example:
- $D = \{abc\}$, frequency threshold = 1
- $F = \{\emptyset, a, b, c, ab, ac, bc\}$
- closed 1-frequent paths: $C' = \{a, b, c, ab, ac, bc\}$
Thm: The set of closed frequent paths w.r.t. the alternative definition can be listed with polynomial delay.
[figure: the transaction abc, a graph with edges a, b, c]
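A brute-force sketch of the alternative closure, suitable only for tiny graphs. Since the figure is lost, we assume the transaction $abc$ is a triangle on vertices 1, 2, 3, so that any two edges form a path but all three do not. Under that assumption the sketch reproduces $C'$ for the nonempty paths; the empty path is also mapped to itself.

```python
from functools import reduce
from itertools import combinations


def is_path(edges):
    """True if the (possibly empty) edge set forms a simple path."""
    edges = list(edges)
    if not edges:
        return True
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # A connected graph with |V| = |E| + 1 is a tree; max degree 2 makes it a path.
    if max(degree.values()) > 2 or len(degree) != len(edges) + 1:
        return False
    reached, frontier = {edges[0]}, [edges[0]]
    while frontier:  # connectivity check by flood fill
        u, v = frontier.pop()
        for e in edges:
            if e not in reached and set(e) & {u, v}:
                reached.add(e)
                frontier.append(e)
    return len(reached) == len(edges)


def alt_closure(p, db):
    """Intersection of the maximal paths of G_P that contain P (brute force)."""
    g_p = reduce(frozenset.intersection, [t for t in db if p <= t])
    containing = [frozenset(s) for k in range(len(g_p) + 1)
                  for s in combinations(sorted(g_p), k)
                  if p <= frozenset(s) and is_path(s)]
    maximal = [q for q in containing if not any(q < r for r in containing)]
    return reduce(frozenset.intersection, maximal)


a, b, c = (1, 2), (2, 3), (1, 3)  # assumption: abc is a triangle
db = [frozenset({a, b, c})]
paths = [frozenset(s) for k in range(4) for s in combinations([a, b, c], k) if is_path(s)]
closed = [p for p in paths if p and alt_closure(p, db) == p]
print(len(paths), len(closed))  # 7 6
```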

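15 An Open Problem
Accessible set systems:
- for all $X \in F \setminus \{\emptyset\}$ there is an $e \in X$ such that $X \setminus \{e\} \in F$
- i.e., for every $X \in F$ there is a sequence $\emptyset = X_0 \subset X_1 \subset \dots \subset X_k = X$ of sets in $F$ with $|X_i \setminus X_{i-1}| = 1$ for $i = 1,\dots,k$
- example: $abcd$ has no inductive generator (cf. slide 7)
Question: Can the positive result on strongly accessible set systems be generalized to accessible set systems?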
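The gap between the two notions can be illustrated with a brute-force accessibility check (the function name is illustrative). On the set system assumed for the slide-7 example, the check succeeds even though that system is not strongly accessible ($ac \subsetneq abcd$, yet neither $abc$ nor $acd$ belongs to it). This matches the remark that $abcd$ has no inductive generator there.

```python
def is_accessible(F):
    """∅ in F, and every nonempty X in F has an e in X with X - {e} in F."""
    return frozenset() in F and all(
        any(x - {e} in F for e in x) for x in F if x)


# Assumed set system of the slide-7 example (the slide does not list F).
example = {frozenset(s) for s in ["", "a", "ab", "ac", "abd", "abcd"]}
print(is_accessible(example))                         # True
print(is_accessible({frozenset(), frozenset("ab")}))  # False
```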

