Path Computation in External Memory In this work we focus on undirected, unweighted graphs with small diameter. This fits well for real world graph data.

Slides:



Advertisements
Similar presentations
all-pairs shortest paths in undirected graphs
Advertisements

WSPD Applications.
CSE 390B: Graph Algorithms Based on CSE 373 slides by Jessica Miller, Ruth Anderson 1.
Dynamic Graph Algorithms - I
Efficient access to TIN Regular square grid TIN Efficient access to TIN Let q := (x, y) be a point. We want to estimate an elevation at a point q: 1. should.
Greed is good. (Some of the time)
Approximation, Chance and Networks Lecture Notes BISS 2005, Bertinoro March Alessandro Panconesi University La Sapienza of Rome.
Tutorial 6 of CSCI2110 Bipartite Matching Tutor: Zhou Hong ( 周宏 )
Data Structure and Algorithms (BCS 1223) GRAPH. Introduction of Graph A graph G consists of two things: 1.A set V of elements called nodes(or points or.
Advanced Data Structures
Fractional Cascading CSE What is Fractional Cascading anyway? An efficient strategy for dealing with iterative searches that achieves optimal.
I/O Efficient Algorithms. Problem Data is often too massive to fit in the internal memory The I/O communication between internal memory (fast) and external.
Data Transmission and Base Station Placement for Optimizing Network Lifetime. E. Arkin, V. Polishchuk, A. Efrat, S. Ramasubramanian,V. PolishchukA. EfratS.
Disk Access Model. Using Secondary Storage Effectively In most studies of algorithms, one assumes the “RAM model”: –Data is in main memory, –Access to.
MSSG: A Framework for Massive-Scale Semantic Graphs Timothy D. R. Hartley, Umit Catalyurek, Fusun Ozguner, Andy Yoo, Scott Kohn, Keith Henderson Dept.
CS Lecture 9 Storeing and Querying Large Web Graphs.
I/O-Algorithms Lars Arge Aarhus University February 16, 2006.
Sublinear Algorithms for Approximating Graph Parameters Dana Ron Tel-Aviv University.
1 On Compressing Web Graphs Michael Mitzenmacher, Harvard Micah Adler, Univ. of Massachusetts.
1 Data Structures and Algorithms Graphs I: Representation and Search Gal A. Kaminka Computer Science Department.
OS Fall ’ 02 Performance Evaluation Operating Systems Fall 2002.
Sublinear time algorithms Ronitt Rubinfeld Blavatnik School of Computer Science Tel Aviv University TexPoint fonts used in EMF. Read the TexPoint manual.
I/O-Algorithms Lars Arge Aarhus University February 9, 2006.
I/O-Algorithms Lars Arge Aarhus University February 14, 2008.
CS 206 Introduction to Computer Science II 03 / 25 / 2009 Instructor: Michael Eckmann.
Using Secondary Storage Effectively In most studies of algorithms, one assumes the "RAM model“: –The data is in main memory, –Access to any item of data.
Route Planning Vehicle navigation systems, Dijkstra’s algorithm, bidirectional search, transit-node routing.
Approximate Distance Oracles for Geometric Spanner Networks Joachim Gudmundsson TUE, Netherlands Christos Levcopoulos Lund U., Sweden Giri Narasimhan Florida.
External-Memory MST (Arge, Brodal, Toma). Minimum-Spanning Tree Given a weighted, undirected graph G=(V,E), the minimum-spanning tree (MST) problem is.
Fractional Cascading and Its Applications G. S. Lueker. A data structure for orthogonal range queries. In Proc. 19 th annu. IEEE Sympos. Found. Comput.
Querying Big Graphs within Bounded Resources 1 Yinghui Wu UC Santa Barbara Wenfei Fan University of Edinburgh Southwest Jiaotong University Xin Wang.
DETECTING NEAR-DUPLICATES FOR WEB CRAWLING Authors: Gurmeet Singh Manku, Arvind Jain, and Anish Das Sarma Presentation By: Fernando Arreola.
Computability Reports. More examples. Homework: Optimization. Other follow- ups. Start to plan presentation.
Neighbourhood Sampling for Local Properties on a Graph Stream A. Pavan, Iowa State University Kanat Tangwongsan, IBM Research Srikanta Tirthapura, Iowa.
Problems and MotivationsOur ResultsTechnical Contributions Membership: Maintain a set S in the universe U with |S| ≤ n. Given an x in U, answer whether.
Bin Yao Spring 2014 (Slides were made available by Feifei Li) Advanced Topics in Data Management.
X-Stream: Edge-Centric Graph Processing using Streaming Partitions
Oct 29, 2001CSE 373, Autumn External Storage For large data sets, the computer will have to access the disk. Disk access can take 200,000 times longer.
Selective Block Minimization for Faster Convergence of Limited Memory Large-scale Linear Models Kai-Wei Chang and Dan Roth Experiment Settings Block Minimization.
Representing and Using Graphs
Vladyslav Kolbasin Stable Clustering. Clustering data Clustering is part of exploratory process Standard definition:  Clustering - grouping a set of.
Graphs Data Structures and Algorithms A. G. Malamos Reference Algorithms, 2006, S. Dasgupta, C. H. Papadimitriou, and U. V. Vazirani Introduction to Algorithms,Third.
The LCA Problem Revisited
Fast, precise and dynamic distance queries Yair BartalHebrew U. Lee-Ad GottliebWeizmann → Hebrew U. Liam RodittyBar Ilan Tsvi KopelowitzBar Ilan → Weizmann.
Towards a Billion Routing Lookups per Second in Software  Author: Marko Zec, Luigi, Rizzo Miljenko Mikuc  Publisher: SIGCOMM Computer Communication Review,
Lars Arge Presented by Or Ozery. I/O Model Previously defined: N = # of elements in input M = # of elements that fit into memory B = # of elements per.
MotivationFundamental ProblemsProblems on Graphs Parallel processors are becoming common place. Each core of a multi-core processor consists of a CPU and.
Constant-Time LCA Retrieval Presentation by Danny Hermelin, String Matching Algorithms Seminar, Haifa University.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
Trees, Binary Search Trees, Balanced Trees, Graphs Graph Fundamentals Telerik Algo Academy
Fall 2008 NOEA - Computer Science1 Session 10 Graph Algorithms: Topological Order Minimum Spanning Trees (Prim) Shortest Path (Dijkstra)
CSE 421 Algorithms Richard Anderson Winter 2009 Lecture 5.
Graphs David Kauchak cs302 Spring Admin HW 12 and 13 (and likely 14) You can submit revised solutions to any problem you missed Also submit your.
Theory of Computational Complexity Probability and Computing Chapter Hikaru Inada Iwama and Ito lab M1.
Cohesive Subgraph Computation over Large Graphs
CSE 554 Lecture 5: Contouring (faster)
Algorithms for Finding Distance-Edge-Colorings of Graphs
CPS216: Data-intensive Computing Systems
Computing Connected Components on Parallel Computers
Paweł Gawrychowski, Nadav Krasnopolsky, Shay Mozes, Oren Weimann
Genomic Data Clustering on FPGAs for Compression
CS 3343: Analysis of Algorithms
Enumerating Distances Using Spanners of Bounded Degree
Randomized Algorithms CS648
Degree-aware Hybrid Graph Traversal on FPGA-HMC Platform
Sorting and Searching Tim Purcell NVIDIA.
CSE 373: Data Structures and Algorithms
CSE 417: Algorithms and Computational Complexity
Important Problem Types and Fundamental Data Structures
Indexing, Access and Database System Architecture
Presentation transcript:

Path Computation in External Memory In this work we focus on undirected, unweighted graphs with small diameter. This fits well for real world graph data as social network or web graphs. Determining the distance from one vertex to all other vertices in such a network is possible by a Breath-First Search (BFS). The best known BFS implementation for general graphs has an I/O- complexity of Ω(N/√B) I/Os. For graphs with small diameter exists an implementation with an I/O-complexity of O(d  scan(N)+sort(M)) I/Os. Social networks usually have a small diameter in O(log(N)) For our work we use the idea of a distance oracle published by Ajwani et al. in [1]. For the construction of their oracle the authors compute the BFS- tree of a small subset of the vertices and determine the distance of any pair of vertices u and v by dist(u)+dist(v)-2  dist(LCA) where LCA is the last common ancestor of u and v. We compute a set of BFS-trees with high degree vertices as root vertex. In real world data such vertices are often a centre vertex – an important part of the graph structure [1]. The Distance OraclePreliminary Results and Future Work Our distance oracle shall answer a single query using O(1) I/Os and O(n) batched queries in a row using O(1/B) I/Os. To achieve approximated distances close to the exact distance we determine for u and v their LCA in each tree. This is done by combining their inorder numbers in the BFS tree with the XOR operation [1]. For the construction of the inorder numbers we use a Huffman encoding for binary trees. Therefore some LCA vertices will be “dummy vertices” that have to be memorized due to the transformation into a binary tree. d(x,r) + d(y,r) – 2  d(lca(x,y),r) The result of the distance oracle is stored in two data structures: _IN and _DIST. In _IN for each vertex its distance to the root and its inorder number is stored for each tree. And in _DIST for each we store the inorder numbering and distance for all vertices including the dummy vertices. We use _DIST as a lookup table to determine the distance of the LCA. To achieve O(1/B) I/Os we scan _IN and _DIST (now as a vector in sorted order) in parallel for each tree separately and determine distance of u, v, the LCA and its distance with a few scanning and sorting steps. The best distance over all rounds is kept as the approximated distance. Preprocessing & Dynamic Aspects  For our distance oracle we have to compute a set of BFS trees once  To gain a faster preprocessing we can build a hierarchy instead −Compute a single large BFS tree and compute log(n) multi-BFS[2] tree sets with decreasing size of the trees  We restrict ourselves on graphs with small diameter. Therefore we can use MR-BFS instead of the more expensive MM-BFS.  Social network graphs are frequently updated. Therefore we investigate how to deal with that aspect. −One possibility is to update each BFS tree independently and do not wait until all trees of the set are updated. −For our hierarchical preprocessing dynamic updates can be easily done for the smaller sub trees using a constant fraction of I/Os. Only the few larger trees might take longer. References [1]Deepak Ajwani, W. Sean Kennedy, Alessandra Sala, Iraj Saniee. A Distance Approximation Oracle for Large Real-World Graphs Based on Graph Hyperbolicity. Unpublished manuscript. [2]Deepak Ajwani, Ulrich Meyer, David Veith. I/O-efficient Hierarchical Diameter Approximation. ESA I/O-efficient Approximate Distance Oracle for Real-world Graphs MADALGO – Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation David Veith Goethe University For our experiments we focused on SSDs. As in [1] we use 20 full BFS trees to build our oracle. The results on this poster are based on experiments with a web graph called sk-2005 with 50 million vertices and 1.8 billion edges (law.di.unimi.it/webdata/sk-2005/). The preprocessing result for 20 BFS trees has a size of 35 GB for sk-2005.l We tested on a machine with a 4.1GHz AMD quad core, 6 SSDs and 32 GB main memory (but we restricted ourselves to small main memory usage on this machine). For a single query we were able to achieve a answer time of 6ms in average by using a small block size of a 4 KB. About 400 mostly random I/Os were obtained for 20 BFS trees per query. For the batched queries we can accelerate the average time per query to 16µs for up to 42  |V| queries. Ulrich Meyer Goethe University Deepak Ajwani Bell Labs, Ireland Future work This work is still in progress. We want to investigate the accuracy of our distance oracle for different data sets and plan to reduce the average time per query by some new tricks. xy r