Published by Todd Stokes. Modified over 9 years ago.
1
1 Mining Tree Queries in a Graph. Bart Goethals, Eveline Hoekx and Jan Van den Bussche, KDD ’05. Presenter: Ming Jing Tsai
2
2 Introduction: Mining tree patterns T in a single graph. Trees are generated incrementally in the number of nodes. Trees are unordered and rooted. For each tree T, all conjunctive queries are generated. Implemented with SQL.
3
3 Tree query pattern example: Selected nodes (constants): 0, 8. Existential node: ∃. Distinguished node: x.
4
4 Matching: A query Q matches in a graph G through a homomorphism h: for every edge (i, j) ∈ Q, (h(i), h(j)) ∈ G. Matchings are distinguished by the value they assign to x. Different values on existential nodes do not count as different matchings.
5
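5 [Figure: query Q with an existential node ∃ and selected nodes 0 and 8, matched against graph G.] Frequency = 3 (matching values 4, 5, 8).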
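The matching and frequency definitions on the two slides above can be illustrated with a minimal brute-force sketch. It assumes a query is given as a list of directed edges between named nodes, that selected nodes are fixed to constants, and that frequency is the number of distinct values taken by the distinguished nodes. The function name `frequency`, the node names, and the toy graph are hypothetical and are not taken from the slides; the authors' implementation uses SQL rather than enumeration.

```python
from itertools import product

def frequency(query_edges, graph_edges, distinguished, selected):
    """Count distinct assignments to the distinguished nodes over all homomorphisms.

    query_edges:   list of (parent, child) pairs over query node names
    graph_edges:   iterable of (u, v) pairs over graph node ids
    distinguished: list of query nodes whose values identify a matching
    selected:      dict mapping each selected query node to its constant
    """
    graph_edges = set(graph_edges)
    graph_nodes = sorted({v for edge in graph_edges for v in edge})
    query_nodes = sorted({v for edge in query_edges for v in edge})
    # Distinguished and existential nodes may take any graph node as value.
    free = [n for n in query_nodes if n not in selected]
    answers = set()
    for values in product(graph_nodes, repeat=len(free)):
        h = dict(selected)
        h.update(zip(free, values))
        # h is a homomorphism if every query edge maps onto a graph edge.
        if all((h[i], h[j]) in graph_edges for i, j in query_edges):
            # Existential values are dropped: only distinguished values count.
            answers.add(tuple(h[x] for x in distinguished))
    return len(answers)

# Hypothetical query: root x with an existential child e and a selected child c = 0.
query = [("x", "e"), ("x", "c")]
graph = [(4, 0), (4, 1), (5, 0), (5, 2), (5, 3), (6, 1)]
print(frequency(query, graph, distinguished=["x"], selected={"c": 0}))  # 2
```

Nodes 4 and 5 both have an edge to 0, so they are the two matching values of x; node 6 does not. Node 5 matches in several ways through different values of e, but is counted once.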
6
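6 Generate all trees: Trees are generated with an increasing number of nodes, in canonical order. Level sequence: the i-th number is the depth of the i-th node in preorder. Among all orderings of a tree, the canonical one is the lexicographically maximal level sequence, e.g. level sequence 012212 > 012122.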
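As a rough illustration of the level-sequence idea, the sketch below computes the level sequence of an ordered tree and the lexicographically maximal sequence over all child orderings. It assumes a tree is encoded as a nested list in which each node is the list of its children; that encoding and the function names are hypothetical choices for this example, not the authors' data structures.

```python
def level_sequence(tree, depth=0):
    """Depths of the nodes in preorder, for the child order as given."""
    seq = [depth]
    for child in tree:
        seq += level_sequence(child, depth + 1)
    return seq

def canonical_level_sequence(tree, depth=0):
    """Lexicographically maximal level sequence over all child orderings."""
    # Canonicalize each child subtree first, then place larger sequences first.
    child_seqs = sorted(
        (canonical_level_sequence(child, depth + 1) for child in tree),
        reverse=True,
    )
    seq = [depth]
    for child_seq in child_seqs:
        seq += child_seq
    return seq

# Root with two children: one has a single leaf, the other has two leaves.
tree = [[[]], [[], []]]
print(level_sequence(tree))            # [0, 1, 2, 1, 2, 2]
print(canonical_level_sequence(tree))  # [0, 1, 2, 2, 1, 2]
```

The two printed sequences correspond to 012122 and 012212 from the slide: the same unordered tree, with the second being the canonical ordering.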
7
7 Queries: Levelwise search. Fix a tree T and find all queries based on T whose frequency in G is at least k. A query is written Q = (∏, ∑, λ), where ∏ is the set of existential nodes, ∑ is the set of selected nodes, and λ is the labelling of the selected nodes.
8
8
9
9 To generate candidates efficiently, the algorithm uses candidacy tables and frequency tables.
10
10 CanTab_{∏,∑} and its parents: Each candidacy table can be computed by taking the natural join of the frequency tables of its parents (∏’, ∑’). The base case CanTab_{∅,{x}} is defined as the table with a single column x, holding all nodes of the graph G being mined.
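The natural-join step can be sketched in a few lines. The example assumes tables are lists of rows, each row a dict from column name (a selected node) to a graph node id, and that all rows of a table share the same columns. The function name `natural_join` and the sample values are hypothetical; the slides indicate that the actual system expresses this step in SQL.

```python
def natural_join(left, right):
    """Natural join of two tables given as lists of dicts (column -> value)."""
    if not left or not right:
        return []
    # Columns present in both tables must agree for two rows to combine.
    shared = sorted(set(left[0]) & set(right[0]))
    index = {}
    for row in right:
        index.setdefault(tuple(row[c] for c in shared), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[c] for c in shared), []):
            joined.append({**row, **match})
    return joined

# Hypothetical frequency tables of two parents, sharing column x2.
freq_x1_x2 = [{"x1": 0, "x2": 8}, {"x1": 0, "x2": 5}]
freq_x2_x3 = [{"x2": 8, "x3": 3}, {"x2": 7, "x3": 1}]
print(natural_join(freq_x1_x2, freq_x2_x3))  # [{'x1': 0, 'x2': 8, 'x3': 3}]
```

Only combinations whose projections are all frequent in the parent tables survive the join, which is what makes the result a candidate set in the levelwise sense.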
11
11 Example: ∏ = {x2}, ∑ = {x1, x3}. The expression is formulated and translated to SQL. [Figure: candidacy table and frequency table]
12
12 Equivalent queries: The goal is to avoid generating a query Q2 that is equivalent to an earlier query Q1. A containment mapping from Q1 to Q2 is a homomorphism that maps the distinguished variables of Q1 one-to-one to those of Q2, and likewise for the selected nodes. Case 1: Q1 has fewer nodes than Q2. Case 2: Q1 and Q2 have the same number of nodes.
13
13 Case 1, redundancy checking: Q2 contains redundant subtrees whose removal yields an equivalent query. A redundancy is a subtree C in the form of a linear chain of existential nodes such that the parent of C has another subtree that is at least as deep as C. [Figure: Q1 and Q2]
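The redundancy condition stated on this slide can be checked directly on a small tree encoding. The sketch assumes a node is a pair `(kind, children)` with kind "E" for existential, "S" for selected and "X" for distinguished; the encoding and the function names are hypothetical, and the check follows the wording of the slide rather than any published pseudocode.

```python
def depth(node):
    """Number of nodes on the longest downward path starting at this node."""
    _, children = node
    return 1 + max((depth(child) for child in children), default=0)

def is_existential_chain(node):
    """True if the subtree is a linear chain made only of existential nodes."""
    kind, children = node
    if kind != "E" or len(children) > 1:
        return False
    return all(is_existential_chain(child) for child in children)

def has_redundant_subtree(node):
    """True if some existential chain has a sibling subtree at least as deep."""
    _, children = node
    for i, child in enumerate(children):
        if is_existential_chain(child):
            siblings = (s for j, s in enumerate(children) if j != i)
            if any(depth(s) >= depth(child) for s in siblings):
                return True
    return any(has_redundant_subtree(child) for child in children)

# x with an existential child and a selected child: the existential one is redundant.
redundant = ("X", [("E", []), ("S", [])])
# The existential chain is deeper than its only sibling, so it is kept.
not_redundant = ("X", [("E", [("E", [])]), ("S", [])])
print(has_redundant_subtree(redundant), has_redundant_subtree(not_redundant))  # True False
```

In the first tree, any matching of the selected child already provides a witness for the existential child, so removing the existential node leaves an equivalent query.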
14
14 Case 2, canonical forms: Q1 and Q2 are isomorphic as trees. In the canonical form, existential nodes are written ∃, selected nodes C, and distinguished nodes X. Label pairs listed on the slide: (C, ∃), (∃, C), (∃, X), (C, X), (X, C), (X, X).
15
15 Experiment: Pentium 4 at 2.8 GHz, 1 GB main memory, Linux 2.6. Implemented in C++ with embedded SQL. Relational database: DB2 UDB v8.2.
16
16 Real datasets: a food web, a protein interaction graph, and a citation graph. k: frequency threshold. Size: maximal size of trees in the run. The runs take several hours.
17
17 Food web: 154 species dependent on Scotch Broom. Label 20 occurs in many frequent patterns; it is Orthotylus adenocarpi (an omnivorous plant pest). Frequency: 176.
18
18 Protein interaction graph: 1870 kinds of proteins of Saccharomyces cerevisiae (fermenting yeast, which helps bread rise). A small number of highly connected nodes occur.
19
19 Citation graph: KDD Cup 2003. 2500 papers in high-energy physics with 350,000 cross-references. Frequency: 1655.
20
20 Synthetic data, web graphs: Tree size: 5. Minsup: 4, 10, 25.
21
21 Uniform random graphs: Dense, uniform. Minsup: 10, 25. Edges: 47, 264, 997.