1 On Querying Historical Evolving Graph Sequences Chenghui Ren $, Eric Lo *, Ben Kao $, Xinjie Zhu $, Reynold Cheng $ $ The University of Hong Kong $ {chren,

Presentation transcript:

1 On Querying Historical Evolving Graph Sequences Chenghui Ren $, Eric Lo *, Ben Kao $, Xinjie Zhu $, Reynold Cheng $ $ The University of Hong Kong $ {chren, kao, xjzhu, * Hong Kong Polytechnic University *

2 Motivation
• Graphs are widely used to model the world
• The world is ever changing; graphs evolve with time

3 Motivation
• How does the importance of a vertex change?
  • E.g., closeness centrality
[Figure: Evolving Graph Sequence (EGS)]

4 Motivation
• How does the shortest path between a and e change?
[Figure: Evolving Graph Sequence (EGS)]

5 Example Study on Facebook EGS: Shortest Path Query
The shortest-path distances between two particular Facebook users over a one-year period (365 snapshots).
Key moments: their distance changed. How did they get closer?

6 Problem Definition
Problem: Given a query (e.g., the shortest path between a and e), find the solution for each snapshot in the Evolving Graph Sequence (EGS).

7 Issues of Querying EGS
We are interested in EGSs whose snapshot graphs are:
a) Large
b) Numerous
c) Gradually evolving
Example: Facebook EGS
a) 60,000 vertices, 900,000 edges
b) 365 snapshots
c) 99%+ edges in common
We need:
• Efficient algorithms to process queries on EGSs
• Effective storage models to store EGSs

8 Outline
• Introduction
• Solution framework
• Storage models
• Experimental evaluation
• Conclusions

9 Baseline Algorithm
• Baseline algorithm: run a traditional algorithm directly on each snapshot in an EGS
  • E.g., breadth-first search for a shortest path query
• Not efficient
  • Graphs in an EGS are usually large and numerous
• Our goal: exploit graph redundancies in an EGS to make query processing faster
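The baseline described on this slide can be sketched as follows. This is a minimal illustration, assuming unweighted, undirected snapshots stored as adjacency dictionaries; the names `bfs_distance` and `baseline_query` and the toy graphs are hypothetical and do not come from the talk.

```python
from collections import deque

def bfs_distance(adj, source, target):
    """Unweighted shortest-path distance from source to target, or None if unreachable."""
    if source not in adj or target not in adj:
        return None
    if source == target:
        return 0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == target:
                    return dist[v]
                queue.append(v)
    return None

def baseline_query(snapshots, source, target):
    """Baseline: answer the query independently on every snapshot of the EGS."""
    return [bfs_distance(adj, source, target) for adj in snapshots]

# Toy EGS (assumed data): an edge a-d appears in snapshot 2, an edge a-e in snapshot 3.
g1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
g2 = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c", "e"}, "e": {"d"}}
g3 = {"a": {"b", "d", "e"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c", "e"}, "e": {"a", "d"}}

print(baseline_query([g1, g2, g3], "a", "e"))  # [4, 2, 1]
```

Every snapshot triggers a full traversal even though consecutive snapshots share almost all of their edges, which is the redundancy the slide proposes to exploit.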

10 Find-Verify-Fix (FVF) Framework An EGS

11 Find-Verify-Fix (FVF) Framework √ √ √ √

12 Preprocessing: Construct Representative Graphs

13 Preprocessing: Cluster Analysis
Segmentation clustering algorithm:
• A cluster consists of successive snapshots
• A cluster satisfies:
[Figure: EGS]
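The condition a cluster must satisfy did not survive in the transcript, so the sketch below substitutes an assumed one: a run of successive snapshots stays in one cluster while the Jaccard similarity between the run's intersection and union edge sets is at least a threshold `alpha`. The similarity measure, the greedy strategy, and the names `jaccard` and `segment_snapshots` are all illustrative assumptions, not the authors' definition.

```python
def jaccard(inter, union):
    """Stand-in similarity (assumption): |E_cap| / |E_cup|; 1.0 when both sets are empty."""
    return len(inter) / len(union) if union else 1.0

def segment_snapshots(edge_sets, alpha):
    """Greedily group successive snapshots into clusters.

    edge_sets: list of sets of edges, one set per snapshot, in time order.
    Returns a list of (first_index, last_index, intersection_edges, union_edges).
    """
    if not edge_sets:
        return []
    clusters = []
    start = 0
    inter, union = set(edge_sets[0]), set(edge_sets[0])
    for i in range(1, len(edge_sets)):
        new_inter = inter & edge_sets[i]
        new_union = union | edge_sets[i]
        if jaccard(new_inter, new_union) >= alpha:
            # Snapshot i is still similar enough: extend the current cluster.
            inter, union = new_inter, new_union
        else:
            # Close the current cluster and start a new one at snapshot i.
            clusters.append((start, i - 1, inter, union))
            start = i
            inter, union = set(edge_sets[i]), set(edge_sets[i])
    clusters.append((start, len(edge_sets) - 1, inter, union))
    return clusters

s0 = {("a", "b"), ("b", "c"), ("c", "d")}
s1 = s0 | {("d", "e")}
s2 = {("a", "b"), ("x", "y")}
for first, last, inter, union in segment_snapshots([s0, s1, s2], alpha=0.7):
    print(first, last, len(inter), len(union))
```

Because clusters are contiguous runs, a single left-to-right pass is enough, and the intersection and union edge sets come out as a by-product that can serve as the cluster's representative graphs.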

14 Query Processing Phase
• Types of queries we use FVF to solve:
  • Shortest path
  • Closeness centrality
  • Graph diameter

15 Shortest Path Query Processing FIND Representative Solutions

16 Shortest Path Query Processing VERIFY Representative Solutions Bounding property:
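The formula for the bounding property is missing from the transcript. One reading that is consistent with the find-verify-fix steps, assuming each cluster is represented by the intersection graph $G_\cap$ and the union graph $G_\cup$ of its snapshots (an assumption, since the slide text does not name the representative graphs), is the standard subgraph distance bound:

```latex
G_\cap \subseteq G_i \subseteq G_\cup
\;\Longrightarrow\;
d_{G_\cup}(u, v) \;\le\; d_{G_i}(u, v) \;\le\; d_{G_\cap}(u, v)
\quad \text{for every snapshot } G_i \text{ in the cluster.}
```

Adding edges can only shorten shortest paths and removing edges can only lengthen them, so when the two bounds coincide, a shortest path found in $G_\cap$ is a shortest path in every $G_i$ of the cluster.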

17 Shortest Path Query Processing VERIFY Representative Solutions √ × × ×

18 Shortest Path Query Processing VERIFY Representative Solutions √√ ×

19 Shortest Path Query Processing FIX Representative Solutions
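The three phases for a shortest path query can be sketched as follows. This is a minimal sketch, assuming undirected, unweighted snapshots without self-loops, given as sets of `frozenset({u, v})` edges, and assuming the representative graphs are the intersection and union of the cluster's snapshots. The function names are hypothetical, and the code is not the authors' implementation.

```python
from collections import deque

def to_adj(edges):
    """Build an adjacency dict from a set of two-element frozenset edges."""
    adj = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_path(adj, s, t):
    """Return one shortest path from s to t as a vertex list, or None if unreachable."""
    if s == t:
        return [s]
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in parent:
                parent[v] = u
                if v == t:
                    path = [t]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(v)
    return None

def path_in(path, edges):
    """True if every consecutive vertex pair of path is an edge in edges."""
    return all(frozenset((a, b)) in edges for a, b in zip(path, path[1:]))

def fvf_shortest_paths(cluster, s, t):
    """Answer the query for every snapshot in one non-empty cluster."""
    inter = set.intersection(*cluster)  # assumed representative graph G_cap
    union = set.union(*cluster)         # assumed representative graph G_cup
    # FIND: solve the query once on each representative graph.
    p_cap = bfs_path(to_adj(inter), s, t)
    p_cup = bfs_path(to_adj(union), s, t)
    results = []
    for edges in cluster:
        if p_cup is None:
            # Unreachable even in the union, hence unreachable in every snapshot.
            results.append((None, "verified"))
        elif p_cap is not None and len(p_cap) == len(p_cup):
            # VERIFY: upper and lower bounds coincide, so p_cap is optimal everywhere.
            results.append((p_cap, "verified"))
        elif path_in(p_cup, edges):
            # VERIFY: p_cup exists in this snapshot and meets the lower bound.
            results.append((p_cup, "verified"))
        else:
            # FIX: fall back to a direct search on this snapshot.
            results.append((bfs_path(to_adj(edges), s, t), "fixed"))
    return results

E = lambda u, v: frozenset((u, v))
snap1 = {E("a", "b"), E("b", "c"), E("c", "d"), E("d", "e")}
snap2 = snap1 | {E("a", "d")}
snap3 = set(snap2)
for path, status in fvf_shortest_paths([snap1, snap2, snap3], "a", "e"):
    print(path, status)
```

In this toy cluster only the first snapshot needs a direct search; the other two are answered by checking that the union graph's path exists in them, which costs time proportional to the path length rather than the graph size.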

20 Outline
• Introduction
• Solution framework
• Storage models
• Experimental evaluation
• Conclusions

21 EGS Storage Models
• Wikipedia dataset (365 snapshots, >1M articles, >20M hyperlinks)
Space cost: more than 365 × 20M = 7.3 billion hyperlinks
Aims of storage models:
1) Compress data to fit in memory
2) Support the application of the FVF algorithm framework
Effectiveness of our storage models: 50M hyperlinks for the baseline algorithm and 100M hyperlinks for the FVF algorithm, compared to 7.3 billion hyperlinks without compression
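The slide does not describe how the storage models work internally. To illustrate why gradually evolving snapshots compress well, here is a generic delta encoding: store the first snapshot in full and, for each later snapshot, only the edges added and removed relative to its predecessor. This is a common technique swapped in for illustration, not the authors' storage model; the function names and toy data are hypothetical.

```python
def encode_deltas(edge_sets):
    """Store the first snapshot in full plus (added, removed) edge sets per later snapshot."""
    base = set(edge_sets[0])
    deltas = []
    for prev, cur in zip(edge_sets, edge_sets[1:]):
        deltas.append((cur - prev, prev - cur))
    return base, deltas

def decode_snapshot(base, deltas, i):
    """Rebuild snapshot i (0-based) by replaying the first i deltas on the base."""
    edges = set(base)
    for added, removed in deltas[:i]:
        edges -= removed
        edges |= added
    return edges

def stored_edge_count(base, deltas):
    """Number of edge records kept by the delta encoding."""
    return len(base) + sum(len(a) + len(r) for a, r in deltas)

t0 = {("a", "b"), ("b", "c"), ("c", "d")}
t1 = t0 | {("d", "e")}
t2 = t1 - {("a", "b")}
base, deltas = encode_deltas([t0, t1, t2])

naive = sum(len(s) for s in (t0, t1, t2))
print(naive, stored_edge_count(base, deltas))  # 10 5
```

The trade-off is that rebuilding snapshot i replays i deltas, so a model that must support FVF would likely also want the representative graphs of each cluster available without replay; the slide's larger figure for FVF than for the baseline is consistent with storing such extra structure, though the transcript does not say so.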

22 Experimental Evaluation
• Datasets
  • Real datasets
    • Facebook-friendship
    • YouTube
    • Wikipedia
  • Synthetic datasets
• FVF vs. Baseline
  • Baseline: execute a graph algorithm on each snapshot independently
• Settings
  • C++, Linux, CPU: 2.83 GHz dual core, memory: 4 GB

23 Experimental Evaluation
• Dataset statistics
[Table: average graph edit similarity (ges) between successive snapshots]

24 Experimental Evaluation: Shortest Path Queries
500 queries

25 Experimental Evaluation: Shortest Path Queries
FBFriend dataset
• A cluster satisfies:
1. Fewer graphs in a cluster
2. More clusters
[Figure legend: Find Time, VF-Time, Residual-SPA Time]

26 Experimental Evaluation: Shortest Path Queries
FBFriend dataset
1. Fewer graphs in a cluster
2. More clusters

27 Experimental Evaluation: Shortest Path Queries
FBFriend dataset
1. Fewer graphs in a cluster
2. More clusters

28 Experimental Evaluation: Shortest Path Queries
FBFriend dataset

29 Experimental Evaluation: Closeness Centrality Queries
FBFriend dataset

30 Conclusions
• We proposed evolving graph sequences (EGSs) to model how the world evolves
• We demonstrated that interesting information can be obtained by posing queries on various EGSs
• We introduced the find-verify-fix (FVF) framework to query EGSs
• We discussed how to store EGSs
• Experiments showed that our FVF framework is efficient and that interesting information can be unveiled

31 Thank you! Chenghui Ren $, Eric Lo *, Ben Kao $, Xinjie Zhu $, Reynold Cheng $ $ The University of Hong Kong $ {chren, kao, xjzhu, * The Hong Kong Polytechnic University *

32 Related Work
• Distance-based queries on a single large graph [F. Wei 2010, Y. Xiao 2009]
  • Our work focuses on processing queries on an evolving graph sequence
• Graph databases [D. Shasha 2002, X. Yan 2005]
  • Different: their work usually supports only graph queries (e.g., sub/super-graph queries)
  • Similar: both aim to minimize the number of expensive graph operations
• Time-dependent graphs [B. Ding 2008]
  • Our work is different in two ways:
    • The node set is not fixed
    • We find answers on all snapshots