Published by Agus Setiabudi. Modified over 6 years ago.
1
Graph X-Ray: Fast Best-Effort Pattern Matching in Large Attributed Graphs
Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad
8/13/2007, KDD 2007, San Jose
2
Input: Query Graph and Attributed Data Graph
Output: Matching Subgraph
3
Terminology: ``Conform’’
First, we say the subgraph H_t conforms to the query graph H_q if it contains all the desired job titles and the connections between them.
Figure: Matching Subgraph conforms to Query Graph
4
Terminology: ``Interception’’
We allow indirect connections by introducing some extra nodes. For example, the connection between nodes 12 and 4 is indirect. We refer to this phenomenon as interception, to the extra nodes (e.g. node 13) as intermediate nodes, and to all remaining nodes (e.g. nodes 11, 12, 4 and 7) as matching nodes.
Figure: Matching Subgraph (one intermediate node, four matching nodes) and Query Graph. A path is an interception.
5
Terminology: ``Instantiate’’
Whenever we have a matching subgraph H_t, we say H_t instantiates the query graph H_q, and the matching nodes in H_t instantiate the nodes in the query graph. For example, we say node 11 in H_t instantiates the SEC node in the query graph, and so on.
Figure: Matching Subgraph H_t and Query Graph H_q. Node 11 instantiates the SEC node; H_t instantiates H_q.
6
Roadmap
Introduction: Problem Definition, Motivations
How to: Graph X-Ray
Experimental Results
Conclusion
7
Motivation: Why Not SQL?
Case 1: Exact match does not exist
Q: How to find an approximate answer?
Case 2: Too many exact matches
Q: How to rank them?
8
Motivation: Why Not SQL? (Cont.)
Case 3: Exact match might not be the best answer
``Find CEO who has heavy contact with Accountant’’
Q: How to find the right one?
Exact match: 1 direct connection
Inexact match: many indirect connections
9
Motivation: Efficiency
Why Not Subgraph Isomorphism?
Polynomial for a fixed size of pattern query
Q1: How to scale up linearly?
Q2: … and with a small slope?
10
Wish List
Effectiveness: both exact and inexact match; ranking among multiple results; ``best’’ answer (proximity-based)
Efficiency: scale linearly; scale with a small slope
G-Ray meets all!
11
Roadmap
Introduction: Problem Definition, Motivations
How to: Graph X-Ray
Experimental Results
Conclusion
12
Preliminary: Center-Piece Subgraph [Tong+]
Figure: Original Graph. Black: query nodes.
CePS is meta opt. in G-Ray!
13
Preliminary: Augmented Graph
Data nodes: 1, …, 13
Attribute nodes: a
Footnote: The augmented graph is crucial for computation!
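One way to picture the augmented graph is as the original data graph plus one extra node per attribute value, with each data node linked to the node for its own attribute. The sketch below follows that reading; the function name `build_augmented_graph`, the `("attr", value)` naming of attribute nodes, and the toy input are assumptions for illustration, not taken from the slides.

```python
from collections import defaultdict

def build_augmented_graph(edges, node_attr):
    """Return an undirected adjacency map over data nodes plus attribute nodes."""
    adj = defaultdict(set)
    # Keep every edge of the original data graph.
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Add one attribute node per attribute value and link each data node to it.
    for node, attr in node_attr.items():
        attr_node = ("attr", attr)  # hypothetical naming scheme for attribute nodes
        adj[node].add(attr_node)
        adj[attr_node].add(node)
    return dict(adj)

# Toy input, made up for illustration (not the 13-node graph on the slide).
edges = [(1, 2), (2, 3)]
node_attr = {1: "CEO", 2: "SEC", 3: "CEO"}
aug = build_augmented_graph(edges, node_attr)
```

With this layout, a proximity score from a data node to an attribute node can be read as closeness to the whole group of data nodes carrying that attribute.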
14
G-Ray: quick overview (for loop )
Step 1: SF
Step 2: NE
Step 3: BR
Step 4: NE
Step 5: BR
Step 6: NE
Step 7: BR
Step 8: BR
SF: Seed-Finder; NE: Neighborhood-Expander; BR: Bridge
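The eight steps above suggest a schedule: one Seed-Finder call, then a Neighborhood-Expander and Bridge pair for each newly instantiated query node, and a final Bridge for any query edge that is still unconnected. A minimal sketch of that reading follows. The function name `plan_steps`, the depth-first traversal, and the order of the four job titles around the loop are assumptions for illustration.

```python
def plan_steps(query_edges, seed):
    """List (kind, ...) steps for instantiating a query graph, starting at `seed`."""
    adj = {}
    for u, v in query_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    steps = [("SF", seed)]          # Seed-Finder instantiates the first query node
    matched = {seed}
    bridged = set()
    stack = [seed]
    while stack:
        u = stack[-1]
        # Pick a neighbor in the query graph that is not instantiated yet.
        nxt = next((v for v in adj[u] if v not in matched), None)
        if nxt is None:
            stack.pop()
            continue
        steps.append(("NE", u, nxt))  # Neighborhood-Expander instantiates nxt
        steps.append(("BR", u, nxt))  # Bridge connects the two matching nodes
        matched.add(nxt)
        bridged.add(frozenset((u, nxt)))
        stack.append(nxt)

    # Any query edge not bridged yet (e.g. the one closing a loop) gets a final Bridge.
    for u, v in query_edges:
        if frozenset((u, v)) not in bridged:
            steps.append(("BR", u, v))
    return steps

# Hypothetical loop query over the four job titles mentioned in the slides.
loop_query = [("SEC", "CEO"), ("CEO", "Accountant"),
              ("Accountant", "Manager"), ("Manager", "SEC")]
kinds = [step[0] for step in plan_steps(loop_query, "SEC")]
```

For this four-node loop the sketch yields the same SF, NE, BR, NE, BR, NE, BR, BR sequence as the overview.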
15
Seed-Finder
Q: How to instantiate the SEC node?
A: Node `11’ is close to some unknown data nodes for `CEO’, `Account.’ and `Manager’
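The answer above can be read as a scoring rule: among the data nodes carrying the seed attribute, prefer the one that is close to the attribute groups of all the other query nodes at once. The sketch below uses a product of proximities as one plausible way to express "close to all of them"; the slide does not give the exact formula, and the function name `find_seed`, the candidate node 5, and every proximity value are made up for illustration.

```python
from math import prod

def find_seed(candidates, other_attrs, prox):
    """Pick the candidate with the largest product of proximities to the other attributes."""
    def score(node):
        # A product rewards nodes that are reasonably close to every attribute,
        # and penalizes nodes that are far from even one of them.
        return prod(prox[(node, attr)] for attr in other_attrs)
    return max(candidates, key=score)

# Made-up proximity scores from two candidate SEC nodes to three attribute groups.
prox = {
    (11, "CEO"): 0.30, (11, "Accountant"): 0.20, (11, "Manager"): 0.25,
    (5, "CEO"): 0.40, (5, "Accountant"): 0.01, (5, "Manager"): 0.10,
}
seed = find_seed([11, 5], ["CEO", "Accountant", "Manager"], prox)
```

In this toy table node 5 is closest to `CEO` but far from `Accountant`, so node 11 wins under the product rule.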
16
Neighborhood-Expander
Q: How to instantiate the CEO node?
A: Footnote
Figure labels: Step 1, Step 2?, Step 3, Step 4?, Step 5, Step 6?
17
Bridge
Step 6: NE, Step 7: BR
A: Prim-like Alg. To maximize
Should block nodes 11 and 7
Footnote: Connection subgraph, or one single path?
18
Roadmap
Introduction: Problem Definition, Motivation
How to: Graph X-Ray
Experimental Results
Conclusion
Now, let’s see some experimental results.
19
Experimental Results
Datasets: DBLP
Node: author (315k)
Edge: co-authorship (1,800k)
Attribute: conference & year (13k), e.g. KDD-2001, SIGMOD…
We use DBLP to construct an attributed graph in which the nodes are authors and the attributes are conference and year. The edges are constructed from co-authorship relationships.
20
Effectiveness: star-query
Here is a star query: we want to find a star-shaped group of co-authors, with one author coming from each of PODS, IAT and ISBMS. We see that Dr. Philip Yu is in the center, and the remaining matching authors are well-known domain experts in each conference.
Figure: Query, Result
21
Effectiveness: line-query
And here is a line query: we want to find authors from 4 different conferences who collaborate in a line fashion.
Figure: Result
22
Effectiveness: loop-query
And this is a loop query.
Figure: Result
23
Efficiency
Figure: response time vs. # of edges
Scales linearly, with a small slope: 3-5 seconds at ~2 M edges
24
Roadmap
Introduction: Problem Definition, Motivation
How to: Graph X-Ray
Experimental Results
Conclusion
25
Conclusion
Graph X-Ray (G-Ray): best-effort pattern matching in large attributed graphs
Scales linearly with a small slope
More details in Poster Session: Monday (tonight), board number 8
26
G-Ray X-Ray Thank you!
27
Backup-slides
28
Proximity on Graph
a.k.a. relevance, closeness
Multi-faceted
Punish long paths
Edge weight
Figure: example graph with nodes 1 to 12
Now, I will introduce some key concepts behind G-Ray. Once we have these key concepts, the algorithm itself is quite straightforward. The first one is proximity on the graph: how can we measure the proximity, or in other words the relevance or closeness, between two nodes on the graph? Without going into the details, I want to claim that random walk with restart is a good solution for this problem.
How to: random walk with restart
29
Random walk with restart
Figure: example graph with nodes 1 to 12, each labeled with its random-walk-with-restart score, next to the ranking vector that lists one score per node (Node 1 to Node 12).
Nearby nodes, higher scores
More red, more relevant
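A ranking vector like the one on this slide can be computed by power iteration: a walker repeatedly moves to a uniformly chosen neighbor and, with some probability, jumps back to the starting node. The sketch below is a minimal version under stated assumptions: an unweighted, undirected graph in which every node has at least one neighbor, a restart probability of 0.15, and a fixed iteration count. The function name and the toy path graph are hypothetical, not the 12-node example from the slide.

```python
def random_walk_with_restart(adj, start, restart=0.15, iters=100):
    """Return a dict of steady-state visiting scores for a walk restarting at `start`."""
    nodes = sorted(adj)
    score = {n: (1.0 if n == start else 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            # Spread the non-restarting share of u's score evenly over its neighbors.
            share = (1.0 - restart) * score[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        # The restarting share always returns to the start node.
        nxt[start] += restart
        score = nxt
    return score

# Toy path graph 1 - 2 - 3 - 4, made up for illustration.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
scores = random_walk_with_restart(path, start=1)
```

On this toy path the scores sum to 1 and fall off with distance from the start beyond its immediate neighbor: node 2 scores higher than node 3, which scores higher than node 4.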
30
How to rank the results
Our goodness function:
Measure the proximity between any two matching nodes if they are required to be connected (two-way)
Multiply them together
In G-Ray, we approximately optimize this goodness function
If we have multiple matching subgraphs, we can rank them according to this goodness function
31
How to rank the results
Figure: matching subgraph with four matching nodes (12, 4, 7 and 11)
Goodness = Prox (12, 4) x Prox (4, 12)
 x Prox (7, 4) x Prox (4, 7)
 x Prox (11, 7) x Prox (7, 11)
 x Prox (12, 11) x Prox (11, 12)
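The goodness score above is a product of two-way proximities over every pair of matching nodes that the query requires to be connected. A minimal sketch of that computation follows. The function name `goodness` and the proximity table are assumptions, and the value 0.5 used for every entry is a placeholder rather than a result from the slides.

```python
from math import prod

def goodness(connected_pairs, prox):
    """Multiply Prox(u, v) and Prox(v, u) over every required connection (u, v)."""
    return prod(prox[(u, v)] * prox[(v, u)] for u, v in connected_pairs)

# The four required connections between matching nodes, as listed on the slide.
pairs = [(12, 4), (7, 4), (11, 7), (12, 11)]

# Placeholder proximity table: every direction of every pair gets 0.5.
prox = {}
for u, v in pairs:
    prox[(u, v)] = 0.5
    prox[(v, u)] = 0.5

g = goodness(pairs, prox)
```

Several candidate matching subgraphs can then be ranked by sorting on this score, highest first.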