Arijit Khan Nanyang Technological University (NTU), Singapore

Slides:

Advertisements

Similar presentations

Google News Personalization: Scalable Online Collaborative Filtering

Advertisements

Efficient Event-based Resource Discovery Wei Yan*, Songlin Hu*, Vinod Muthusamy +, Hans-Arno Jacobsen +, Li Zha* * Chinese Academy of Sciences, Beijing.

Scalable Content-Addressable Network Lintao Liu

Resource Management §A resource can be a logical, such as a shared file, or physical, such as a CPU (a node of the distributed system). One of the functions.

1 Accessing nearby copies of replicated objects Greg Plaxton, Rajmohan Rajaraman, Andrea Richa SPAA 1997.

Scaling Distributed Machine Learning with the BASED ON THE PAPER AND PRESENTATION: SCALING DISTRIBUTED MACHINE LEARNING WITH THE PARAMETER SERVER – GOOGLE,

Web Caching Schemes1 A Survey of Web Caching Schemes for the Internet Jia Wang.

1 One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems Prasanna Ganesan Beverly Yang Hector Garcia-Molina Stanford University.

Storage Management and Caching in PAST, a large-scale, persistent peer- to-peer storage utility Authors: Antony Rowstorn (Microsoft Research) Peter Druschel.

Peer-to-Peer Based Multimedia Distribution Service Zhe Xiang, Qian Zhang, Wenwu Zhu, Zhensheng Zhang IEEE Transactions on Multimedia, Vol. 6, No. 2, April.

N-Tier Architecture.

Distributed Data Stores – Facebook Presented by Ben Gooding University of Arkansas – April 21, 2015.

Network Aware Resource Allocation in Distributed Clouds.

Panagiotis Antonopoulos Microsoft Corp Ioannis Konstantinou National Technical University of Athens Dimitrios Tsoumakos.

Multiprossesors Systems.. What are Distributed Databases ? “ A Logically interrelated collection of shared data ( and a description of this data) physically.

Lecture 12 Distributed Hash Tables CPE 401/601 Computer Network Systems slides are modified from Jennifer Rexford.

Interconnect Networks Basics. Generic parallel/distributed system architecture On-chip interconnects (manycore processor) Off-chip interconnects (clusters.

1 Adaptive Parallelism for Web Search Myeongjae Jeon Rice University In collaboration with Yuxiong He (MSR), Sameh Elnikety (MSR), Alan L. Cox (Rice),

Scalable and Topology-Aware Load Balancers in Charm++ Amit Sharma Parallel Programming Lab, UIUC.

AQWA Adaptive Query-Workload-Aware Partitioning of Big Spatial Data Dimosthenis Stefanidis Stelios Nikolaou.

Chapter 9: Web Services and Databases Title: NiagaraCQ: A Scalable Continuous Query System for Internet Databases Authors: Jianjun Chen, David J. DeWitt,

Large Scale Sharing Marco F. Duarte COMP 520: Distributed Systems September 19, 2004.

Gorilla: A Fast, Scalable, In-Memory Time Series Database

A Study of Data Partitioning on OpenCL-based FPGAs Zeke Wang (NTU Singapore), Bingsheng He (NTU Singapore), Wei Zhang (HKUST) 1.

William Stallings Data and Computer Communications

Yiting Xia, T. S. Eugene Ng Rice University

Optimizing Distributed Actor Systems for Dynamic Interactive Services

Impact of Interference on Multi-hop Wireless Network Performance

Cohesive Subgraph Computation over Large Graphs

Nanyang Technological University

Memory Hierarchy Ideal memory is fast, large, and inexpensive

Overview Parallel Processing Pipelining

Advanced Topics in Concurrency and Reactive Programming: Case Study – Google Cluster Majeed Kassis.

CSCI5570 Large Scale Data Processing Systems

Hydra: Leveraging Functional Slicing for Efficient Distributed SDN Controllers Yiyang Chang, Ashkan Rezaei, Balajee Vamanan, Jahangir Hasan, Sanjay Rao.

N-Tier Architecture.

Distributed Network Traffic Feature Extraction for a Real-time IDS

Pastry Scalable, decentralized object locations and routing for large p2p systems.

Parallel Databases.

CSE-291 Cloud Computing, Fall 2016 Kesden

Parallel Programming By J. H. Wang May 2, 2017.

PREGEL Data Management in the Cloud

Parallel Density-based Hybrid Clustering

Parallel Data Laboratory, Carnegie Mellon University

Sub-millisecond Stateful Stream Querying over

CHAPTER 3 Architectures for Distributed Systems

Augmented Sketch: Faster and More Accurate Stream Processing

PA an Coordinated Memory Caching for Parallel Jobs

Load Balancing Memcached Traffic Using SDN

Database Performance Tuning and Query Optimization

Plethora: Infrastructure and System Design

Accessing nearby copies of replicated objects

A Survey on Distributed File Systems

Chapter 16: Distributed System Structures

Query-Friendly Compression of Graph Streams

Towards Effective Partition Management for Large Graphs

MapReduce Simplied Data Processing on Large Clusters

HashKV: Enabling Efficient Updates in KV Storage via Hashing

Be Fast, Cheap and in Control

Mayank Bhatt, Jayasi Mehar

Xiaoyang Zhang1, Yuchong Hu1, Patrick P. C. Lee2, Pan Zhou1

TECHNICAL SEMINAR PRESENTATION

Pei Fan*, Ji Wang, Zibin Zheng, Michael R. Lyu

EECS 498 Introduction to Distributed Systems Fall 2017

Graph Indexing for Shortest-Path Finding over Dynamic Sub-Graphs

Chapter 11 Database Performance Tuning and Query Optimization

Database System Architectures

Caching 50.5* + Apache Kafka

Fast Accesses to Big Data in Memory and Storage Systems

Virtual Memory 1 1.

Presentation transcript:

Arijit Khan Nanyang Technological University (NTU), Singapore On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage Arijit Khan Nanyang Technological University (NTU), Singapore Gustavo Segovia ETH Zurich, Switzerland Donald Kossmann Microsoft Research, Redmond, USA

100M Ratings, 480K Users, 17K Movies Big Graphs Facebook: > 800 million active users Google: > 1 trillion indexed pages Web Graph Social Network 31 billion RDF triples in 2011 31 billion RDF triples in 2011 De Bruijn: 4k nodes (k = 20, … , 40) 100M Ratings, 480K Users, 17K Movies Information Network Biological Network Graphs in Machine Learning On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 1/ 32

Background: Distributed Graph Querying Systems First, partition the graph, and then place each partition on a separate server, where query answering over that partition takes place. State-of-the-art distributed graph querying systems (e.g., SEDGE [SIGMOD’12], Trinity [SIGMOD’13], Horton [PVLDB’13]) On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 2/ 32

Background: Distributed Graph Querying Systems Disadvantages Fixed Routing (less flexible) Balanced Graph Partitioning and Re-Partitioning State-of-the-art distributed graph querying systems On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 3/ 32

Background: Distributed Graph Querying Systems Disadvantages Fixed Routing (less flexible) The server which contains the query node can only handle that request  the router maintains a fixed routing table (or, a fixed routing strategy, e.g., modulo hashing). Less flexible with respect to query routing and fault tolerance, e.g., adding more machines will require updating the data partition and/or the routing table. State-of-the-art distributed graph querying systems Balanced Graph Partitioning and Re-Partitioning On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 4/ 32

Background: Distributed Graph Querying Systems Disadvantages Fixed Routing (less flexible) Balanced Graph Partitioning and Re-Partitioning (1) workload balancing to maximize parallelism, (2) locality of data access to minimize network communication  NP-hard, difficult in power-law graphs. later updates to graph structure or variations in query workloads  graph re-partitioning/ replication  online monitoring of workload changes, re-partitioning of the graph topology, and migration of graph data across servers are expensive. State-of-the-art distributed graph querying systems On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 5/ 32

Roadmap Distributed graph querying and graph partitioning Decoupled graph querying system Related work Smart graph query routing Experimental results Conclusions On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 6/ 32

Decoupled Graph Querying System we decouple query processing and graph storage into two separate tiers. This decoupling happens at a logical level. Decoupled architecture for graph querying On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 7/ 32

Decoupled Graph Querying System Benefits Flexible routing Less reliant on good partitioning across storage servers [Due to our smart query routing strategy – will be discussed soon!] Decoupled architecture for graph querying On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 8/ 32

Decoupled Graph Querying System Benefits Flexible routing A query processor no longer assigned a fixed part of the graph  equally capable of handling any request  facilitating load balancing and fault tolerance. The query router can send a request to any of the query processors  more flexible query routing, e.g., more query processors can be added (or, a query processor that is down can be replaced) without affecting the routing strategy. Decoupled architecture for graph querying On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 9/ 32

Decoupled Graph Querying System Benefits Flexible routing Each tier can be scaled-up independently. A certain workload is processing intensive  allocate more servers to the processing tier. Graph size increases over time  add more servers in the storage tier. Decoupled architecture, being generic, can be employed in many existing graph querying systems. Decoupled architecture for graph querying On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 10/ 32

Roadmap Distributed graph querying and graph partitioning Decoupled graph querying system Related work Smart graph query routing Experimental results Conclusions On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 11/ 32

Related Work: Decoupling Storage and Query Processors Facebook’s Memcached [NSDI’13] Google’s F1 [PVLDB’13] ScaleDB [http://scaledb.com/pdfs/TechnicalOverview.pdf] Loesing et. al. (On the Design and Scalability of Distributed Shared-Data Databases) [SIGMOD’15] Binnig et. al. (The End of Slow Networks: It’s Time for a Redesign) [PVLDB’16] Shalita et. al. (Social Hash: An Assignment Framework for Optimizing Distributed Systems Operations on Social Networks) [NSDI’16] On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 12/ 32

Decoupled Graph Querying System Disadvantages Query processors may need to communicate with the storage tier via the network  additional penalty to the response time for answering a query. May cause high contention rates on either the network, storage tier, or both. Decoupled architecture for graph querying On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 13/ 32

Our Contribution: Smart Query Routing We design a smart query routing logic to utilize the cache of query processors over such decoupled architecture. More cache hits  reduce communication among query processors and storage servers. More cache hits  less reliant on good partitioning across storage servers. Decoupled architecture for graph querying On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 14/ 32

Roadmap Distributed graph querying and graph partitioning Decoupled graph querying system Related work Smart graph query routing Experimental results Conclusions On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 15/ 32

h-Hop Traversal Queries Online, h-hop queries: explore a small region of the entire graph, and require fast response time. Start with a query node, and traverse its neighboring nodes up to a certain number of hops (i.e., h = 2, 3). Examples h-hop neighbor aggregation h-step random walk with restart h-hop reachability More complex queries, e.g., node labeling and classification, expert finding, ranking, discovering functional modules, complexes, and pathways On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 16/ 32

Objectives for Smart Query Routing Leverage each processor’s cached data Balance workload even if skewed or contains hotspot Make fast routing decisions [a small constant time, or << O(n) ] Have low storage overhead in the router [a small fraction of the input graph size] Decoupled architecture for graph querying On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 17/ 32

Challenges in Smart Query Routing Smart Routing Objectives Objectives are conflicting For maximum cache locality, router can send all queries to the same processor (assuming no cache eviction)  imbalanced workload in processors  lower throughput. router could inspect the cache of each processor before making a good routing decision  network delay. Hence, router must infer what is likely to be in each processor’s cache. Leverage each processor’s cached data Balance workload even if skewed or contains hotspot Make fast routing decisions Have low storage overhead in the router On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 18/ 32

Challenges in Smart Query Routing Smart Routing Objectives are conflicting! Topology-Aware Locality successive queries on nearby nodes must be sent to the same processor. It is likely that h-hop neighborhoods of these nodes significantly overlap. How the router knows about nearby nodes without storing the entire graph topology? - use landmark, graph embedding 2-hop neighborhoods of u and v overlap significantly On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 19/ 32

Challenges in Smart Query Routing Smart Routing Objectives are conflicting! Query Stealing Always Routing queries to processors that have the most useful cache data  workload imbalance if skew/ query hotspot  lower throughput. We perform query stealing at router  Whenever a processor is idle and is ready to handle a new query, if it does not have any other requests assigned to it, the router may “steal” a request and send to it which was intended for another processor. Query stilling by maintaining topology-aware locality (as much as possible). On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 20/ 32

Smart Routing-1: Landmark If two nodes are close to a given landmark, they are likely to be close themselves. On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 21/ 32

Smart Routing-1: Landmark Pre-processing Select a small set of L nodes as landmarks. Compute distance of every node to landmarks. Assign landmarks to query processors: Every processor is assigned a “pivot” landmark with the intent that pivot landmarks are as far from each other as possible. Each remaining landmark is assigned to the processor which contains its closest pivot landmark. If two nodes are close to a given landmark, they are likely to be close themselves. The distance of a node u to a processor p is defined as the minimum distance of u to any landmark that is assigned to processor p. This distance information is stored in the router, which requires O(nP) space and O(nL) time to compute. On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 22/ 32

Smart Routing-1: Landmark Online Routing a query on node u  the router verifies the pre-computed distance d(u, p) for every processor p  selects the one with the smallest d(u, p) value. Routing decision time: O(p) If two nodes are close to a given landmark, they are likely to be close themselves. Load-balancing via Query-stealing Route to smallest load-balanced distance. Nearby nodes are routed in similar way, maintaining topology-aware locality. On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 23/ 32

Smart Routing-2: Embed Pre-processing Embed a graph in a lower D-dimensional Euclidean plane. The hop-count distance between graph nodes are approximately preserved via their Euclidean distance. Storage = O(nD), time = O(|L|2D + n|L|D) Graph embedding in 2D Euclidean plane A benefit of embed routing is that the pre-processing is independent of the system topology, allowing more processors to be easily added at a later time. On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 24/ 32

Smart Routing-2: Embed Online Routing Exponential moving average to compute the mean of the processor’s cache contents. Router finds the distance between a query node u and a processor p, denoted as d(u, p), and defined as the distance of the query node’s co-ordinates to the historical mean of the processor’s cache contents. Route query on u to processor p with minimum d(u, p). Routing decision time: O(PD) Graph embedding in 2D Euclidean plane On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 25/ 32

Smart Routing-2: Embed Online Routing Exponential moving average to compute the mean of the processor’s cache contents. Router finds the distance between a query node u and a processor p, denoted as d(u, p), and defined as the distance of the query node’s co-ordinates to the historical mean of the processor’s cache contents. Route query on u to processor p with minimum d(u, p). Routing decision time: O(PD) Graph embedding in 2D Euclidean plane Load-balancing via Query-stealing On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 25/ 32

Roadmap Distributed graph querying and graph partitioning Decoupled graph querying system Related work Smart graph query routing Experimental results Conclusions On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 26/ 32

Experimental Setup 27/ 32 Graph Datasets Cluster Configuration 12 servers each having 2.4 GHz Intel Xeon processors, 0 – 4GB cache. interconnected by 40 Gbps Infiniband, and also by 10 Gbps Ethernet. Use a single core of each server with the following configuration: 1 server as router, 7 servers in the processing tier, 4 servers in the storage tier; and communication over Infiniband with remote direct memory access (RDMA). RAMCloud as storage tier. Graph is stored as adjacency list – every node-id is key, and the corresponding value is an array of its 1-hop neighbors. The graph is partitioned across storage servers via RAMCloud’s default and inexpensive hash partitioning scheme, MurmurHash3 over graph nodes. On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 27/ 32

List of Experiments Comparison with distributed graph systems (SEDGE [SIGMOD’12] with Giraph [SIGMOD’10], GraphLab [VLDB’12]) that use smart graph partitioning and re-portioning - Our method achieves up to an order of magnitude higher throughput even with inexpensive hash partitioning of the graph! Scalability with number of processors and storage servers Impact of cache size Impact of graph updates Sensitivity w.r.t. different parameters: query locality and hotspot, h-hop queries, load factor, smoothing parameter, embedding dimensionality, landmark numbers, minimum distance between a pair of landmarks Performance Metrics Query efficiency, Query throughput, Cache hit rates Baseline Routing Methods Next ready, No cache, Modular hash with query stealing On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 28/ 32

Performance with Varying Number of Query Processors Embed routing is able to sustain almost same cache hit rate with many query processors. Hence, its throughput scales linearly with query processors. On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 29/ 32

Impact of Cache Sizes Both smart routings – Embed and Landmark – utilize the cache well; and for the same amount of cache, they achieve lower response time compared to baseline routings. On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 30/ 32

Roadmap Distributed graph querying and graph partitioning Decoupled graph querying system Related work Smart graph query routing Experimental results Conclusions On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 31/ 32

Conclusions Decoupled graph querying system Smart query routing to achieve higher cache hits for h-hop traversal queries emphasize less on expensive graph partitioning and re-partitioning across storage tiers provide linear scalability in throughput with more number of query processors works well in the presence of query hotspots adaptive to workload changes and graph updates. Decoupled architecture for graph querying On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage: Arijit Khan (NTU Singapore), G. Segovia, and D. Kossmann 32/ 32