Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad


Graph X-Ray: Fast Best-Effort Pattern Matching in Large Attributed Graphs. Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad. 8/13/2007, KDD 2007, San Jose

Input: Query Graph, Attributed Data Graph. Output: Matching Subgraph.

Terminology: "Conform". We say the matching subgraph H_t conforms to the query graph H_q if it contains all the desired job titles and the connections between them.

Terminology: "Interception". We allow indirect connections by introducing extra nodes. For example, the connection between nodes 12 and 4 in the matching subgraph is indirect: path 12-13-4 is an interception. We refer to this phenomenon as interception and to the extra nodes (e.g. node 13) as intermediate nodes. All remaining nodes (e.g. nodes 11, 12, 4 and 7) are matching nodes.

Terminology: "Instantiate". Whenever we have a matching subgraph H_t, we say that H_t instantiates the query graph H_q, and that the matching nodes in H_t instantiate the nodes of the query graph. For example, node 11 in H_t instantiates the SEC node in the query graph, and so on.

Roadmap: Introduction (Problem Definition, Motivations); How to: Graph X-Ray; Experimental Results; Conclusion

Motivation: Why Not SQL? Case 1: An exact match does not exist. Q: How to find an approximate answer? Case 2: Too many exact matches. Q: How to rank them?

Motivation: Why Not SQL? (Cont.) Case 3: An exact match might not be the best answer. Example: "Find a CEO who has heavy contact with an Accountant." Q: How to find the right one? Exact match: 1 direct connection. Inexact match: many indirect connections.

Motivation: Efficiency. Why not subgraph isomorphism? It is polynomial for a fixed query-pattern size. Q1: How to scale up linearly? Q2: … and with a small slope?

Wish List. Effectiveness: both exact and inexact matches; ranking among multiple results; "best" answer (proximity-based). Efficiency: scale linearly; scale with a small slope. G-Ray meets all!

Roadmap: Introduction (Problem Definition, Motivations); How to: Graph X-Ray; Experimental Results; Conclusion

Preliminary: Center-Piece Subgraph [Tong+]. Figure labels: Q; Original Graph; black nodes are query nodes. CePS is meta opt. in G-Ray!

Preliminary: Augmented Graph. Data nodes: 1, …, 13. Attribute nodes: a. Footnote: the augmented graph is crucial for computation!
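The slide does not spell out how the augmented graph is built. A minimal sketch, assuming that each attribute value becomes one extra node linked to every data node carrying it, could look like this. The function name, the toy edges, and the attribute assignment are all hypothetical and not taken from the talk.

```python
from collections import defaultdict

def build_augmented_graph(edges, attributes):
    """Return adjacency sets over data nodes plus one node per attribute value.

    Assumption (not from the slides): an attribute node is linked to every
    data node that carries that attribute value.
    """
    adj = defaultdict(set)
    # Ordinary data-graph edges, stored as undirected.
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # One extra node per attribute value, tagged so it cannot clash with data ids.
    for node, attr in attributes.items():
        attr_node = ("attr", attr)
        adj[node].add(attr_node)
        adj[attr_node].add(node)
    return dict(adj)

# Hypothetical toy data: the edges and the job-title assignment are illustrative only.
edges = [(11, 12), (12, 13), (13, 4), (4, 7), (7, 11)]
attributes = {11: "SEC", 12: "CEO", 4: "Accountant", 7: "Manager"}
aug = build_augmented_graph(edges, attributes)
print(sorted(aug[11], key=str))
```

With attribute nodes in the graph, "which data nodes are close to attribute X" becomes an ordinary node-to-node proximity question, which is one plausible reading of why the slide calls the augmented graph crucial for computation.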

G-Ray: quick overview (for loop). Step 1: SF; Step 2: NE; Step 3: BR; Step 4: NE; Step 5: BR; Step 6: NE; Step 7: BR; Step 8: BR. Legend: SF = Seed-Finder, NE = Neighborhood-Expander, BR = Bridge.
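One way to read the eight steps above: find a seed, then walk the query edges, expanding to each not-yet-matched query node (NE) and bridging every query edge (BR). The sketch below only reproduces that step order; it is an assumed reading of the slide, not the authors' code. The function name and the ordering of job titles around the loop are hypothetical.

```python
def gray_schedule(query_edges, seed):
    """List (step, target) pairs in the order suggested by the overview slide.

    Assumption: each query edge is handled once; the far endpoint gets a
    Neighborhood-Expander (NE) step if it is not matched yet, and every
    edge gets a Bridge (BR) step.
    """
    steps = [("SF", seed)]          # Seed-Finder picks the first matching node
    matched = {seed}
    pending = list(query_edges)
    while pending:
        # Pick the first pending edge that touches an already matched query node.
        for edge in pending:
            u, v = edge
            if u in matched or v in matched:
                break
        else:
            raise ValueError("query graph is not connected")
        pending.remove(edge)
        src, dst = (u, v) if u in matched else (v, u)
        if dst not in matched:
            steps.append(("NE", dst))
            matched.add(dst)
        steps.append(("BR", (src, dst)))
    return steps

# Hypothetical 4-node loop query; the order of titles around the loop is assumed.
loop_query = [("SEC", "CEO"), ("CEO", "Accountant"),
              ("Accountant", "Manager"), ("Manager", "SEC")]
schedule = gray_schedule(loop_query, seed="SEC")
print([kind for kind, _ in schedule])
```

For a four-node loop this yields one SF step, three NE/BR pairs, and a final BR that closes the loop, which matches the eight steps listed on the slide.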

Seed-Finder ( ). Q: How to instantiate the SEC node? A: Node 11 is close to some unknown data nodes for CEO, Account. and Manager.

Neighborhood-Expander ( ). Q: How to instantiate the CEO node (Step 1 → Step 2)? A: Footnote: the same question applies to Step 3 → Step 4 and Step 5 → Step 6.

Bridge ( ). Step 6: NE; Step 7: BR. Q: ? A: Prim-like algorithm, to maximize ( ). Should block nodes 11 and 7. Footnote: connection subgraph, or one single path?

Roadmap: Introduction (Problem Definition, Motivation); How to: Graph X-Ray; Experimental Results; Conclusion. Now, let's see some experimental results.

Experimental Results: Datasets. DBLP. Node: author (315k). Edge: co-authorship (1,800k). Attribute: conference & year (13k), e.g. KDD-2001, SIGMOD… We use DBLP to construct an attributed graph in which the nodes are authors, the attribute is conference and year, and the edges are built from co-authorship relationships.

Effectiveness: star-query. Here is a star query: we want to find a star-shaped group of co-authors, with one author coming from each of PODS, IAT and ISBMS. We see that Dr. Philip S. Yu is in the center, and the remaining matching authors are well-known domain experts in each conference. Query / Result

Effectiveness: line-query. Here is a line query: we want to find authors from 4 different conferences who collaborate in a line fashion. Result

Effectiveness: loop-query And this is a loop query. Result

Efficiency. Figure: response time vs. # of edges. Scales linearly with a small slope: 3-5 seconds on ~2M edges.

Roadmap: Introduction (Problem Definition, Motivation); How to: Graph X-Ray; Experimental Results; Conclusion

Conclusion. Graph X-Ray (G-Ray): best-effort pattern matching in large attributed graphs; scales linearly with a small slope. More details in the Poster Session: Monday (tonight), board number 8.

G-Ray X-Ray www.cs.cmu.edu/~htong Thank you!

Backup-slides

Proximity on Graph. A.k.a. relevance, closeness. Multi-faceted. Punish long paths. Edge weight. (Figure: example graph with nodes 1-12.) Now I will introduce some key concepts behind G-Ray. Once we have these key concepts, the algorithm itself is quite straightforward. The first one is proximity on the graph: how can we measure the proximity, or in other words the relevance or closeness, between two nodes on the graph? Without going into the details, I want to claim that random walk with restart is a good solution for this problem. How to: random walk with restart.

Random walk with restart. Figure: example graph with nodes 1-12 and the ranking vector of per-node scores (values between 0.02 and 0.22). Nearby nodes get higher scores. More red, more relevant.
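The transcript gives only the resulting scores, not the computation. A minimal sketch of random walk with restart by plain power iteration is shown below, assuming an unweighted undirected graph and a restart probability of 0.15. The function name, the restart value, and the toy path graph are assumptions made for illustration; the talk's own fast solution is not reproduced here.

```python
def random_walk_with_restart(adj, query, restart=0.15, iters=100):
    """Approximate RWR scores from `query` by power iteration.

    adj maps each node to a non-empty set of neighbours (undirected graph).
    At every step the walker jumps back to `query` with probability `restart`
    and otherwise moves to a uniformly chosen neighbour.
    """
    nodes = sorted(adj)
    scores = {n: 0.0 for n in nodes}
    scores[query] = 1.0                      # the walk starts at the query node
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            share = (1.0 - restart) * scores[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share              # spread mass to neighbours
        nxt[query] += restart                # restart mass returns to the query
        scores = nxt
    return scores

# Hypothetical toy graph: a path 1 - 2 - 3 - 4, queried from node 2.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
rwr = random_walk_with_restart(path, query=2)
for node in sorted(rwr, key=rwr.get, reverse=True):
    print(node, round(rwr[node], 3))
```

In this toy run the query node receives the largest score and node 4, the farthest from the query, receives the smallest, in line with the "nearby nodes, higher scores" remark on the slide.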

How to rank the results. Our goodness function: measure the proximity between any two matching nodes that are required to be connected (two-way), and multiply these proximities together. In G-Ray, we approximately optimize this goodness function. If we have multiple matching subgraphs, we can rank them according to this goodness function.

How to rank the results. For matching nodes 11, 12, 4 and 7: Goodness = Prox (12, 4) x Prox (4, 12) x Prox (7, 4) x Prox (4, 7) x Prox (11, 7) x Prox (7, 11) x Prox (12, 11) x Prox (11, 12)
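The product above can be written as a short function: for every pair of matching nodes that the query requires to be connected, multiply the proximity in both directions. This is a sketch under stated assumptions. The name `goodness`, the dictionary layout of `prox`, and the placeholder proximity value of 0.5 are hypothetical; in practice the values would come from a proximity measure such as random walk with restart.

```python
def goodness(prox, required_pairs):
    """Multiply two-way proximities over all required connections.

    prox[(u, v)] is the proximity from node u to node v; it need not be symmetric.
    """
    score = 1.0
    for u, v in required_pairs:
        score *= prox[(u, v)] * prox[(v, u)]
    return score

# The four required connections from the slide's example.
required_pairs = [(12, 4), (7, 4), (11, 7), (12, 11)]

# Placeholder proximities (hypothetical values, chosen only to make the example run).
prox = {}
for u, v in required_pairs:
    prox[(u, v)] = 0.5
    prox[(v, u)] = 0.5

print(goodness(prox, required_pairs))
```

Given several candidate matching subgraphs, computing this score for each and sorting in descending order gives the ranking described on the previous slide.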