Distance-Constraint Reachability Computation in Uncertain Graphs

Distance-Constraint Reachability Computation in Uncertain Graphs. Ruoming Jin, Lin Liu (Kent State University); Bolin Ding (UIUC); Haixun Wang (MSRA).

Why Uncertain Graphs? Graph/network data is increasingly important: social networks, biological networks, traffic/transportation networks, and peer-to-peer networks. The probabilistic perspective has been getting more and more attention recently. Uncertainty is ubiquitous! Examples: protein-protein interaction networks (false positives > 45%) and social networks (probabilistic trust/influence model).

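Uncertain Graph Model. Each edge has an existence probability, and edges are independent. A graph with #Edge edges has 2^#Edge possible worlds (e.g., G1, G2). The weight of a possible world multiplies p for every edge that is present and (1-p) for every edge that is absent. Weight of G2: Pr(G2) = 0.5 * 0.7 * 0.2 * 0.6 * (1-0.5) * (1-0.4) * (1-0.9) * (1-0.1) * (1-0.3) = 0.0007938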
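
The weight of G2 above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `world_probability` is an illustrative assumption, and the two lists simply copy the factors shown on the slide (four present edges, five absent ones).

```python
from math import prod

def world_probability(present, absent):
    """Probability of one possible world under edge independence.

    present: existence probabilities of the edges that appear in the world
    absent:  existence probabilities of the edges that do not appear
    """
    return prod(present) * prod(1.0 - p for p in absent)

# Factors copied from the slide's example for G2.
present = [0.5, 0.7, 0.2, 0.6]
absent = [0.5, 0.4, 0.9, 0.1, 0.3]

pr_g2 = world_probability(present, absent)
print(pr_g2)
```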

Distance-Constraint Reachability (DCR) Problem. Given a distance constraint d and two vertices, a source s and a target t: what is the probability that s can reach t within distance d? This is a generalization of the two-terminal network reliability problem, which has no distance constraint.

Important Applications. Peer-to-peer (P2P) networks: communication happens only when node distance is limited. Social networks: trust/influence can be propagated only through a small number of hops. Traffic networks: travel distance (travel time) queries, e.g., what is the probability that we can reach the airport within one hour?

Example: Exact Computation, d = 2. First step: enumerate all possible worlds (2^9) with their probabilities Pr(G1), Pr(G2), Pr(G3), Pr(G4), … Second step: check each world for distance-constraint connectivity (1 if s reaches t within d, 0 otherwise) and sum: … + Pr(G1) * 0 + Pr(G2) * 1 + Pr(G3) * 0 + Pr(G4) * 1 + …
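
The two steps above can be sketched in Python as a brute-force enumeration. This is a minimal illustration under stated assumptions, not the authors' code: edges are treated as directed, distance is counted in hops, and the three-edge graph `toy` and the names `within_distance` and `exact_dcr` are made up for the example.

```python
from collections import deque
from itertools import product

def within_distance(edges, s, t, d):
    """BFS check: can s reach t in at most d hops using the given directed edges?"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue  # do not expand beyond the distance constraint
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def exact_dcr(prob, s, t, d):
    """Sum Pr(G) over all 2^|E| possible worlds G in which s reaches t within d."""
    edges = list(prob)
    total = 0.0
    for mask in product([False, True], repeat=len(edges)):
        weight = 1.0
        world = []
        for e, keep in zip(edges, mask):
            weight *= prob[e] if keep else 1.0 - prob[e]
            if keep:
                world.append(e)
        if within_distance(world, s, t, d):
            total += weight
    return total

# Hypothetical toy graph: edge -> existence probability.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.5}
print(exact_dcr(toy, "s", "t", 2))
```

The loop visits every possible world, so the cost doubles with each added edge, which is why the later slides turn to sampling.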

Approximating Distance-Constraint Reachability Computation. Hardness: two-terminal network reliability is #P-complete, and DCR is a generalization of it. Our goal is to approximate DCR through sampling, with an unbiased estimator, minimal variance, and low computational cost.

Start from the most intuitive estimators, right?

Direct Sampling Approach. Sampling process: sample n graphs, generating each graph by keeping every edge independently according to its edge probability.

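Direct Sampling Approach (Cont'd). Estimator: the average, over the n sampled graphs, of an indicator function that is 1 if s reaches t within d and 0 otherwise. The estimator is unbiased, and its variance shrinks in proportion to 1/n.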
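
A minimal Python sketch of direct sampling follows. The slide's own formulas did not survive in this transcript, so the code uses the standard Monte Carlo form: the estimate is the mean of the indicator over n sampled graphs, which has variance R(1-R)/n when R is the true probability. Directed edges, hop-count distance, the graph `toy`, and all function names are assumptions made for illustration.

```python
import random
from collections import deque

def within_distance(edges, s, t, d):
    """BFS check: can s reach t in at most d hops using the given directed edges?"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def direct_sampling(prob, s, t, d, n, rng):
    """Mean of the indicator 'does s reach t within d' over n sampled possible worlds."""
    hits = 0
    for _ in range(n):
        # Keep each edge independently with its existence probability.
        world = [e for e, p in prob.items() if rng.random() < p]
        hits += within_distance(world, s, t, d)
    return hits / n

# Hypothetical toy graph; its exact answer for d = 2 is 0.625.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.5}
estimate = direct_sampling(toy, "s", "t", 2, 20000, random.Random(0))
print(estimate)
```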

Path-Based Approach. Generate the path set: enumerate all paths from s to t with length ≤ d, using an enumeration method such as DFS.

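Path-Based Approach (Cont'd). Given the path set, the probability that at least one of its paths exists can be computed exactly by the inclusion-exclusion principle, or approximated by the Monte Carlo algorithm of R. M. Karp and M. G. Luby. The slide notes that the resulting estimator is unbiased and gives its variance.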
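
The exact variant of the path-based approach can be sketched as follows: enumerate the simple s-t paths with at most d edges by DFS, then apply inclusion-exclusion to the event that at least one path is fully present. This is an illustrative sketch only (directed edges, hop-count distance, hypothetical names and toy graph); it does not implement the Karp-Luby approximation, and it is exponential in the number of paths.

```python
from itertools import combinations

def paths_within(prob, s, t, d):
    """DFS enumeration of simple s-t paths with at most d edges (each as a frozenset of edges)."""
    adj = {}
    for u, v in prob:
        adj.setdefault(u, []).append(v)
    found = []

    def dfs(u, visited, used):
        if u == t:
            found.append(frozenset(used))
            return
        if len(used) == d:
            return  # length budget exhausted
        for v in adj.get(u, []):
            if v not in visited:
                dfs(v, visited | {v}, used + [(u, v)])

    dfs(s, {s}, [])
    return found

def path_based_dcr(prob, s, t, d):
    """Pr(at least one enumerated path exists), by inclusion-exclusion over path subsets."""
    paths = paths_within(prob, s, t, d)
    total = 0.0
    for k in range(1, len(paths) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(paths, k):
            union = frozenset().union(*subset)  # edges needed for all paths in the subset
            term = 1.0
            for e in union:
                term *= prob[e]
            total += sign * term
    return total

# Hypothetical toy graph: paths {s->t} and {s->a, a->t}.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.5}
print(path_based_dcr(toy, "s", "t", 2))
```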

Can we do better?

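Divide-and-Conquer Methodology. Example (figure): an enumeration tree that branches on one edge at a time, where +(u,v) means edge (u,v) is included and -(u,v) means it is excluded: +(s,a) / -(s,a), then +(a,t) / -(a,t), +(s,b) / -(s,b), …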

Divide and Conquer (Cont'd). The root holds all possible worlds; each branching splits them into graphs having e1 and graphs not having e1. Blue nodes: s can reach t. Red nodes: s cannot reach t. Summary: the number of leaf nodes is smaller than 2^|E|. Each possible world exists in only one leaf node. Reachability is the sum of the weights of the blue nodes. The leaf nodes form a nice sample space.
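
The enumeration tree can be sketched as a recursion that fixes one edge at a time and stops as soon as a node is decided: a blue leaf when the included edges already connect s to t within d, a red leaf when even all undecided edges cannot. This is a simplified illustration, not the authors' algorithm: edges are taken in a fixed order, treated as directed, distance is in hops, and `dc_dcr` and `toy` are hypothetical names.

```python
from collections import deque

def within_distance(edges, s, t, d):
    """BFS check: can s reach t in at most d hops using the given directed edges?"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def dc_dcr(prob, s, t, d):
    """Exact DCR by branching on edges; returns the total weight of the blue leaves."""
    edges = list(prob)

    def recurse(i, included):
        # included: edges fixed as present; edges[i:] are still undecided.
        if within_distance(included, s, t, d):
            return 1.0  # blue leaf: every world in this node reaches t
        if not within_distance(included + edges[i:], s, t, d):
            return 0.0  # red leaf: no world in this node reaches t
        e = edges[i]
        p = prob[e]
        return (p * recurse(i + 1, included + [e])
                + (1.0 - p) * recurse(i + 1, included))

    return recurse(0, [])

toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.5}
print(dc_dcr(toy, "s", "t", 2))
```

Because decided nodes are not expanded further, the tree has fewer leaves than there are possible worlds, matching the first point of the summary.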

How do we sample? Unequal probability sampling, with the leaf nodes as sample units. Pri: the sample unit weight, i.e., the sum of the possible worlds' probabilities in the node. qi: the sampling probability, determined by the properties of the coins tossed along the way. Two unequal probability sampling estimators: the Hansen-Hurwitz (HH) estimator and the Horvitz-Thompson (HT) estimator.

Hansen-Hurwitz (HH) Estimator. Sample size: n. Indicator: = 1 for a blue node, = 0 for a red node. The estimator is unbiased, and its variance depends on the weights Pri and the sampling probabilities qi. To minimize the variance, we have Pri = qi, where Pri is the leaf node weight and qi is the sampling probability. Since Pri = p(e1)*p(e2)*(1-p(e3))*…, this corresponds to tossing a coin at each tree node with odds p(e1) : 1 - p(e1), then p(e2) : 1 - p(e2), then p(e3) : 1 - p(e3), and so on.
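
The formulas on this slide were lost in the transcript. As a hedged reconstruction, the textbook Hansen-Hurwitz estimator for n leaves drawn with replacement is written out below. The symbols R (the true DCR probability), I_i (the blue/red indicator of leaf i), and i_k (the leaf drawn at step k) are notation assumed here, not taken from the slide.

```latex
% Textbook Hansen-Hurwitz estimator over leaves i_1, ..., i_n drawn with replacement
\hat{R}_{HH} = \frac{1}{n} \sum_{k=1}^{n} \frac{Pr_{i_k}\, I_{i_k}}{q_{i_k}},
\qquad
\mathrm{Var}(\hat{R}_{HH}) = \frac{1}{n} \sum_{i} q_i \left( \frac{Pr_i\, I_i}{q_i} - R \right)^{2}

% With the slide's choice q_i = Pr_i the weights cancel:
\hat{R}_{HH} = \frac{1}{n} \sum_{k=1}^{n} I_{i_k},
\qquad
\mathrm{Var}(\hat{R}_{HH}) = \frac{1}{n} \sum_{i} Pr_i \,(I_i - R)^{2} = \frac{R(1-R)}{n}
```

The last equality uses the fact that the blue leaves have total weight R and the red leaves have total weight 1 - R.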

Horvitz-Thompson (HT) Estimator. The estimator is built from the unique sample units. It is unbiased. To minimize the variance, we find Pri = qi. It has smaller variance than the HH estimator.

Can we further reduce the variance and computational cost?

Recursive Estimator. Instead of sampling the entire space n times, sample one sub-space n1 times and the other sub-space n2 times, with n1 + n2 = n. The estimator is unbiased. We cannot minimize the variance without knowing τ1 and τ2, the quantities being estimated in the two sub-spaces. Then what can we do?

Sample Allocation. We guess: what if n1 = n*p(e) and n2 = n*(1-p(e))? We find: variance reduced, for both the HH estimator and the HT estimator.
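
A short derivation can make the "variance reduced" claim concrete. It is a sketch under assumptions not stated on the slide: R is the true probability, τ1 and τ2 are the reachability probabilities in the sub-spaces with and without edge e, p = p(e), each sub-space estimate is the mean of n_j independent 0/1 indicators, and rounding of n1 and n2 is ignored.

```latex
% Decomposition on edge e and the corresponding estimator
R = p\,\tau_1 + (1-p)\,\tau_2,
\qquad
\hat{R} = p\,\hat{\tau}_1 + (1-p)\,\hat{\tau}_2

% Variance with independent sub-space estimates
\mathrm{Var}(\hat{R}) = p^{2}\,\frac{\tau_1(1-\tau_1)}{n_1} + (1-p)^{2}\,\frac{\tau_2(1-\tau_2)}{n_2}

% Proportional allocation n_1 = n p, n_2 = n (1-p)
\mathrm{Var}(\hat{R}) = \frac{p\,\tau_1(1-\tau_1) + (1-p)\,\tau_2(1-\tau_2)}{n}

% Gap to sampling the entire space n times, whose variance is R(1-R)/n
\frac{R(1-R)}{n} - \mathrm{Var}(\hat{R}) = \frac{p\,(1-p)\,(\tau_1 - \tau_2)^{2}}{n} \;\ge\; 0
```

Under these assumptions the allocation never increases the variance, and it strictly reduces it whenever τ1 and τ2 differ and 0 < p < 1.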

Sample Allocation (Cont'd). Sampling time reduced! With sample size n, directly allocate the samples: n1 = n*p(e1), n2 = n*(1-p(e1)), n3 = n1*p(e2), n4 = n1*(1-p(e2)), and so on. Toss coins only when the sample size is small.
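
The allocation scheme can be sketched in Python as a recursion that splits the sample budget down the enumeration tree and falls back to coin tossing once the budget is small. This is an illustration under assumptions, not the authors' implementation: edges are directed and processed in a fixed order, distance is in hops, budgets are rounded to integers, and the `threshold` parameter and all names are hypothetical.

```python
import random
from collections import deque

def within_distance(edges, s, t, d):
    """BFS check: can s reach t in at most d hops using the given directed edges?"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def recursive_estimate(prob, s, t, d, n, rng, threshold=5):
    """Recursive estimator with sample allocation n1 = n*p(e), n2 = n*(1-p(e))."""
    edges = list(prob)

    def sample_leaf(i, included):
        # Toss one coin per undecided edge until a blue (1) or red (0) leaf is reached.
        while True:
            if within_distance(included, s, t, d):
                return 1
            if not within_distance(included + edges[i:], s, t, d):
                return 0
            if rng.random() < prob[edges[i]]:
                included = included + [edges[i]]
            i += 1

    def recurse(i, included, budget):
        if within_distance(included, s, t, d):
            return 1.0
        if not within_distance(included + edges[i:], s, t, d):
            return 0.0
        p = prob[edges[i]]
        n1 = round(budget * p)
        n2 = budget - n1
        if budget <= threshold or n1 == 0 or n2 == 0:
            # Small budget: toss coins instead of allocating further.
            return sum(sample_leaf(i, included) for _ in range(budget)) / budget
        return (p * recurse(i + 1, included + [edges[i]], n1)
                + (1.0 - p) * recurse(i + 1, included, n2))

    return recurse(0, [], n)

toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.5}
estimate = recursive_estimate(toy, "s", "t", 2, 1000, random.Random(0))
print(estimate)
```

On a tree this shallow the budget never drops below the threshold, so no coins are tossed and the estimate is computed from the allocation alone; on larger graphs the coin-tossing branch handles the deep parts of the tree.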

Experimental Setup. Goal: measure relative error, variance, and computational time. System specification: 2.0GHz dual-core AMD Opteron CPU, 4.0GB RAM, Linux.

Experimental Results: synthetic datasets. Erdős-Rényi random graphs. Vertex#: 5000, edge density: 10, sample size: 1000. Queries are categorized by extracted-subgraph size (#edge), with 1000 queries for each category.

Experimental Results: real datasets. DBLP: 226,000 vertices, 1,400,000 edges. Yeast PPIN: 5499 vertices, 63796 edges. Fly PPIN: 7518 vertices, 51660 edges. Extracted subgraph size: 20 ~ 50 edges.

Conclusions. We propose a novel s-t distance-constraint reachability problem in uncertain graphs. An efficient exact computation algorithm is developed based on a divide-and-conquer scheme. Beyond the two classic reachability estimators, we introduce two unequal probability sampling estimators: the Hansen-Hurwitz (HH) estimator and the Horvitz-Thompson (HT) estimator. Based on the enumeration tree framework, two recursive estimators, Recursive HH and Recursive HT, are constructed to reduce estimation variance and time. Experiments demonstrate the accuracy and efficiency of our estimators.

Thank you! Questions?