Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad

Presentation transcript:

Graph X-Ray: Fast Best-Effort Pattern Matching in Large Attributed Graphs. Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad. LLNL. 8/13/2007, KDD 2007, San Jose.

Input: Query Graph, Attributed Data Graph. Output: Matching Subgraph.

Terminology: "Conform". First, we say that the subgraph H_t conforms to the query graph H_q if it contains all the desired job titles and the connections between them. The matching subgraph conforms to the query graph.

Terminology: "Interception". We allow indirect connections by introducing some extra nodes. For example, the connection between nodes 12 and 4 is indirect. We refer to this phenomenon as interception, to the extra nodes (e.g., node 13) as intermediate nodes, and to all remaining nodes (e.g., nodes 11, 12, 4 and 7) as matching nodes. Path 12-13-4 is an interception.

Terminology: "Instantiate". Whenever we have a matching subgraph H_t, we say that H_t instantiates the query graph H_q, and that the matching nodes in H_t instantiate the nodes in the query graph. For example, node 11 in H_t instantiates the SEC node in the query graph, and so on.

Roadmap: Introduction (Problem Definition, Motivations); How to: Graph X-Ray; Experimental Results; Conclusion.

Motivation: Why Not SQL? Case 1: Exact match does not exist. Q: How to find an approximate answer? Case 2: Too many exact matches. Q: How to rank them?

Motivation: Why Not SQL? (Cont.) Case 3: The exact match might not be the best answer. Example: "Find the CEO who has heavy contact with the Accountant." Q: How to find the right one? Exact match: 1 direct connection. Inexact match: many indirect connections.

Motivation: Efficiency. Why Not Subgraph Isomorphism? It is polynomial only for a fixed pattern (query) size. Q1: How to scale up linearly? Q2: … and with a small slope?

Wish List. Effectiveness: both exact and inexact match; ranking among multiple results; "best" answer (proximity-based). Efficiency: scale linearly; scale with a small slope. G-Ray meets all!

Roadmap: Introduction (Problem Definition, Motivations); How to: Graph X-Ray; Experimental Results; Conclusion.

Preliminary: Center-Piece Subgraph (CePS) [Tong+]. Original graph; black: query nodes. CePS is meta opt. in G-Ray!

Preliminary: Augmented Graph. Data nodes: 1, …, 13. Attribute nodes. The augmented graph is crucial for computation!
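One way to picture the augmented graph is as the data graph plus one extra node per attribute value, linked to every data node that carries that attribute. The sketch below assumes exactly that construction; the function name, the `("attr", label)` key convention and the toy ids are hypothetical and are not taken from the slides.

```python
from collections import defaultdict


def build_augmented_graph(edges, attributes):
    """Return an adjacency map holding both data nodes and attribute nodes.

    edges: iterable of (u, v) pairs between data nodes (treated as undirected).
    attributes: dict mapping each data node to its attribute label.
    Attribute nodes are keyed as ("attr", label) so they cannot collide
    with data node ids (an assumption of this sketch).
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for node, label in attributes.items():
        attr_node = ("attr", label)
        adj[node].add(attr_node)
        adj[attr_node].add(node)
    return dict(adj)


# Toy example with hypothetical ids and labels (not the graph on the slide).
toy_edges = [(1, 2), (2, 3)]
toy_attributes = {1: "CEO", 2: "SEC", 3: "CEO"}
augmented = build_augmented_graph(toy_edges, toy_attributes)
```

With this layout, "all data nodes with attribute CEO" is simply the neighbor set of `("attr", "CEO")`, which is why proximity to an attribute node can stand in for proximity to its not-yet-chosen data nodes.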

G-Ray: quick overview (for a loop query). Step 1: SF; Step 2: NE; Step 3: BR; Step 4: NE; Step 5: BR; Step 6: NE; Step 7: BR; Step 8: BR. SF: Seed-Finder; NE: Neighborhood-Expander; BR: Bridge.
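The eight-step sequence above is consistent with walking the query graph edge by edge: find a seed, then for each query edge expand to the next query node if it is still unmatched, and bridge the two endpoints. The sketch below only generates such a step schedule under that assumption; `gray_schedule`, the depth-first edge order and the arrangement of job titles around the loop are hypothetical illustrations, not the authors' implementation.

```python
def gray_schedule(query_edges, seed):
    """List (step, target) pairs for a connected query graph, starting at `seed`."""
    adj = {}
    for u, v in query_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    steps = [("SF", seed)]   # Seed-Finder instantiates the first query node
    matched = {seed}         # query nodes that already have a matching node
    bridged = set()          # query edges that already have a bridge
    stack = [seed]
    while stack:
        u = stack[-1]
        # Pick the next query edge at u that has not been bridged yet.
        nxt = next((v for v in adj.get(u, [])
                    if frozenset((u, v)) not in bridged), None)
        if nxt is None:
            stack.pop()
            continue
        bridged.add(frozenset((u, nxt)))
        if nxt not in matched:
            steps.append(("NE", nxt))    # Neighborhood-Expander
            matched.add(nxt)
            stack.append(nxt)
        steps.append(("BR", (u, nxt)))   # Bridge connects the two matches
    return steps


# Loop query over four job titles (the ordering around the loop is assumed).
loop_query = [("SEC", "CEO"), ("CEO", "Accountant"),
              ("Accountant", "Manager"), ("Manager", "SEC")]
schedule = gray_schedule(loop_query, "SEC")
step_kinds = [kind for kind, _ in schedule]
```

For this loop query, `step_kinds` comes out as SF, NE, BR, NE, BR, NE, BR, BR: the final BR closes the loop without a new NE because both endpoints are already matched.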

Seed-Finder. Q: How to instantiate the SEC node? A: Node 11 is close to some unknown data nodes for "CEO", "Accountant" and "Manager".
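A minimal sketch of this selection, assuming that "close to" is measured by a proximity score from each candidate SEC node to the attribute node of every other job title, and that the scores are combined by multiplication (mirroring the goodness function in the backup slides). The function name and all proximity values below are hypothetical.

```python
from math import prod


def find_seed(candidates, other_attributes, prox):
    """Return the candidate with the largest product of proximities.

    prox maps (candidate, attribute) to a proximity score; combining the
    scores by multiplication is an assumption of this sketch.
    """
    def score(node):
        return prod(prox[(node, attr)] for attr in other_attributes)

    return max(candidates, key=score)


# Hypothetical proximity scores for two data nodes labeled SEC.
toy_prox = {
    (11, "CEO"): 0.20, (11, "Accountant"): 0.10, (11, "Manager"): 0.15,
    (2, "CEO"): 0.30, (2, "Accountant"): 0.01, (2, "Manager"): 0.05,
}
seed = find_seed([11, 2], ["CEO", "Accountant", "Manager"], toy_prox)
```

A product favors candidates that are reasonably close to all of the other attributes: node 2 is closest to "CEO" here, but its weak link to "Accountant" drags its overall score below that of node 11.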

Neighborhood-Expander. Q: How to instantiate the CEO node (Step 1 → Step 2)? Likewise for Step 3 → Step 4 and Step 5 → Step 6.

Bridge (Step 6: NE, Step 7: BR). A: Prim-like algorithm, to maximize […]. Should block nodes 11 and 7. Connection subgraph, or one single path?

Roadmap: Introduction (Problem Definition, Motivation); How to: Graph X-Ray; Experimental Results; Conclusion. Now, let's see some experimental results.

Experimental Results. Dataset: DBLP. Node: author (315k). Edge: co-authorship (1,800k). Attribute: conference & year (13k), e.g., KDD-2001, SIGMOD… We use DBLP to construct an attributed graph, where the nodes are authors and the attributes are conference and year. The edges are constructed from co-authorship relationships.

Effectiveness: star-query. Here is a star query: we want to find a star-shaped group of co-authors, with one author coming from each of PODS, IAT and ISBMS. We see that Dr. Philip Yu is in the center, and the remaining matching authors are well-known domain experts in each conference. Query; Result.

Effectiveness: line-query. And here is a line query: we want to find authors from 4 different conferences who cooperate in a line fashion. Result.

Effectiveness: loop-query And this is a loop query. Result

Efficiency. [Plot: response time vs. # of edges.] Scales linearly with a small slope: 3-5 seconds at ~2M edges.

Roadmap: Introduction (Problem Definition, Motivation); How to: Graph X-Ray; Experimental Results; Conclusion.

Conclusion. Graph X-Ray (G-Ray): best-effort pattern matching in large attributed graphs; scales linearly with a small slope. More details in the Poster Session: Monday (tonight), board number 8.

G-Ray X-Ray www.cs.cmu.edu/~htong Thank you!

Backup-slides

Proximity on Graph. A.k.a. relevance, closeness. Multi-faceted: punish long paths; edge weight. Now, I will introduce some key concepts behind G-Ray. Once we have these key concepts, the algorithm itself is quite straightforward. The first one is proximity on the graph: how can we measure the proximity, or in other words the relevance or closeness, between two nodes on the graph? Without going into the details, I want to claim that random walk with restart is a good solution for this problem. How to: random walk with restart.

Random walk with restart. [Figure: example graph with nodes 1 to 12 and the ranking vector for node 4 over the other nodes; the scores shown include 0.22, 0.13, 0.10, 0.08, 0.05, 0.04, 0.03 and 0.02.] Nearby nodes, higher scores. More red, more relevant.
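As a rough illustration of how such a ranking vector can be obtained, here is a minimal power-iteration sketch of random walk with restart on an unweighted, undirected graph. The restart probability of 0.15, the iteration count and the toy path graph are assumptions of this sketch; the slides do not specify them.

```python
def random_walk_with_restart(adj, start, restart=0.15, iterations=100):
    """Return {node: score} for a walker that restarts at `start`.

    adj: dict mapping each node to a list of neighbors; every node must
    have at least one neighbor. At each step the walker jumps back to
    `start` with probability `restart`, otherwise it moves to a uniformly
    chosen neighbor of its current node.
    """
    nodes = list(adj)
    scores = {n: 0.0 for n in nodes}
    scores[start] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            # Spread the non-restarting mass of u evenly over its neighbors.
            share = (1.0 - restart) * scores[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        nxt[start] += restart
        scores = nxt
    return scores


# Toy path graph 1 - 2 - 3 - 4 (hypothetical, not the 12-node slide example).
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
ranking = random_walk_with_restart(path, start=1)
```

On this path the scores of nodes 2, 3 and 4 decrease with their distance from the start node, which is the "nearby nodes, higher scores" behavior; the scores always sum to 1, so they can be read as a ranking vector.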

How to rank the results. Our goodness function: measure the proximity between any two matching nodes that are required to be connected (in both directions), and multiply these proximities together. In G-Ray, we approximately optimize this goodness function. If we have multiple matching subgraphs, we can rank them according to this goodness function.

How to rank the results. Goodness = Prox(12, 4) × Prox(4, 12) × Prox(7, 4) × Prox(4, 7) × Prox(11, 7) × Prox(7, 11) × Prox(12, 11) × Prox(11, 12), where 12, 4, 7 and 11 are the matching nodes.
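The product above can be written as a short function over the required connections. This is a sketch only: the `prox` lookup table and its values are hypothetical placeholders (in practice they would come from a proximity measure such as random walk with restart), and the connection list is read off the formula on the slide.

```python
from math import prod


def goodness(prox, required_connections):
    """Multiply the two-way proximities over every required connection."""
    return prod(prox[(u, v)] * prox[(v, u)] for u, v in required_connections)


# Required connections between matching nodes, as in the slide's formula.
connections = [(12, 4), (7, 4), (11, 7), (12, 11)]

# Hypothetical proximity tables for two candidate matching subgraphs.
prox_a = {pair: 0.5 for u, v in connections for pair in ((u, v), (v, u))}
prox_b = {pair: 0.25 for u, v in connections for pair in ((u, v), (v, u))}

score_a = goodness(prox_a, connections)
score_b = goodness(prox_b, connections)

# Rank candidates by goodness, best first.
ranked = sorted([("A", score_a), ("B", score_b)],
                key=lambda item: item[1], reverse=True)
```

Because every factor lies between 0 and 1, a single weak connection pulls the whole product down, so the ranking favors subgraphs whose required connections are all reasonably strong.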