Be Fast, Cheap and in Control

Be Fast, Cheap and in Control with SwitchKV
Xiaozhou Li, Raghav Sethi, Michael Kaminsky, David G. Andersen, Michael J. Freedman

Goal: a fast and cost-effective key-value store
Target: cluster-level storage for modern cloud services
- Massive numbers of small key-value objects
- Highly skewed and dynamic workloads
- Aggressive latency and throughput performance goals
This talk: scale-out flash-based storage for this setting

Key challenge: dynamic load balancing
(figure: the set of hot keys shifts between time t and time t+x)
How to handle highly skewed and dynamic workloads?
Today's solution: data migration / replication
- system overhead
- consistency challenges

A fast, small cache can ensure load balancing
Need to cache only O(n log n) items to provide good load balance, where n is the number of backend nodes [Fan, SOCC'11]
(figure: the frontend cache node absorbs the hottest queries, so the flash-based backend nodes see less-hot queries and better-balanced loads)
E.g., 100 backends with hundreds of billions of items need a cache with only 10,000 entries
- How to efficiently serve queries with the cache and backend nodes?
- How to efficiently update the cache under dynamic workloads?
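
As a rough back-of-the-envelope check of this sizing rule, here is a minimal sketch; the constant factor c and the log base are assumptions, since the slide only gives the asymptotic bound and the 10,000-entry example.

```python
import math

def cache_entries_needed(n_backends, c=1.0):
    """Approximate cache size for good load balance across n backends,
    following the O(n log n) bound cited from [Fan, SOCC'11]."""
    return math.ceil(c * n_backends * math.log2(n_backends))

# With 100 backends, n * log2(n) is only ~665 entries; a 10,000-entry
# cache therefore leaves a comfortable constant-factor margin while
# staying tiny next to hundreds of billions of stored items.
print(cache_entries_needed(100))   # 665
```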

High overheads with traditional caching architectures
(figure: look-aside — clients query the cache and go to the backends on a miss; look-through — the cache sits in front of the backends as a load balancer and failure point, forwarding misses to the backends)
- The cache must process all queries and handle all misses
- In our case the cache is small, so the hit ratio can be low
- Throughput is bounded by the cache's I/O
- High latency for queries on uncached keys
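
For concreteness, a minimal look-aside read path as a client might implement it; the cache and backend client objects and their get/put methods are hypothetical.

```python
def lookaside_get(key, cache, backends):
    """Look-aside caching: the client asks the cache first for every
    query, so the cache is on the path of all requests, and a miss
    costs an extra round trip to the owning backend."""
    value = cache.get(key)                      # hop 1: cache
    if value is None:                           # miss
        backend = backends[hash(key) % len(backends)]
        value = backend.get(key)                # hop 2: backend
        cache.put(key, value)                   # populate for later hits
    return value
```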

SwitchKV: content-aware routing
(figure: clients, OpenFlow switches, a controller, the cache node, and the backends)
Switches route requests directly to the appropriate nodes
- Latency can be minimized for all queries
- Throughput can scale out with the number of backends
- Availability is not affected by cache node failures

Exploit SDN and switch hardware
Clients encode key information in packet headers
- Key hash in the destination MAC for read queries
- Destination backend ID in the IP for all queries
Switches maintain forwarding rules and route query packets
(figure: an incoming query packet first checks the L2 table, which holds an exact-match rule per cached key; a hit sends the packet out to the cache, while a miss falls through to the TCAM table, which holds a match rule per physical machine and sends the packet out to that backend)
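
A sketch of the client-side header encoding; the exact byte layout, the hash function, and the locally administered MAC prefix are illustrative assumptions, not details from the slide.

```python
import hashlib

def encode_read_query(key: bytes, backend_ip: str):
    """Place a hash of the key in the destination MAC (matched by the
    switch's per-cached-key L2 rules) and the owning backend's address
    in the destination IP (matched by the per-machine TCAM rules on an
    L2 miss), so the switch can route the query without parsing the
    payload."""
    digest = hashlib.sha1(key).digest()
    mac_bytes = b"\x02" + digest[:5]      # locally administered prefix + 40-bit key hash
    dst_mac = ":".join(f"{b:02x}" for b in mac_bytes)
    return dst_mac, backend_ip            # IP identifies the backend for all queries

dst_mac, dst_ip = encode_read_query(b"user:1234", "10.0.0.17")
```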

Keep the cache and switch rules updated
New challenges for cache updates
- Only the hottest O(n log n) items are cached
- Switch rule update rate is limited
Goal: react quickly to workload changes with minimal updates
(figure: backends report periodic top-k <key, load> lists and push bursty hot <key, value> pairs instantly; the cache fetches <key, value> pairs from the backends, and the controller issues the switch rule updates)
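
A rough sketch of the two update paths the figure describes; the report format, the cache/controller method names, and the merge policy are assumptions.

```python
import heapq
import time

def periodic_topk_update(controller, cache, backends, k=100, interval=1.0):
    """Periodic path: merge per-backend top-k <key, load> reports, keep
    the globally hottest keys cached, and install/remove the matching
    per-key switch rules via the controller."""
    while True:
        reports = [b.top_k_report(k) for b in backends]   # [(key, load), ...]
        hottest = heapq.nlargest(cache.capacity,
                                 (kv for report in reports for kv in report),
                                 key=lambda kv: kv[1])
        added, evicted = cache.replace({key for key, _ in hottest})
        controller.add_rules(added)       # exact-match rules -> cache
        controller.del_rules(evicted)
        time.sleep(interval)

def on_bursty_hot_key(controller, cache, key, value):
    """Instant path: a backend that detects a suddenly hot key pushes
    the <key, value> pair immediately instead of waiting for the next
    periodic report."""
    evicted = cache.insert(key, value)
    controller.add_rules([key])
    controller.del_rules(evicted)
```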

Evaluation
- How well does a fast, small cache improve system load balance and throughput?
- Does SwitchKV improve system performance compared to traditional architectures?
- Can SwitchKV react quickly to workload changes?

Evaluation Platform: reference backend
- 1 Gb link
- Intel Atom C2750 processor
- Intel DC P3600 PCIe-based SSD
- RocksDB with 120 million 1 KB objects
- 99.4K queries per second

Evaluation Platform
(testbed figure: a client server, a cache server running the Ryu controller, and two backend servers — four Xeon servers in total — connected by 40 GbE links through a Pica8 P-3922 switch running OVS 2.3)
Default settings in this talk:
- # of backends: 128
- backend throughput: 100 KQPS
- keyspace size: 10 billion
- key size: 16 bytes
- value size: 128 bytes
- query skewness: Zipf 0.99
- cache size: 10,000 entries
Uses Intel DPDK to transfer packets and modify headers efficiently
The client adjusts its sending rate to keep the loss rate between 0.5% and 1%

Throughput with and without caching
(figure: throughput of the cache with 10,000 entries, the backends' aggregate with the cache, and the backends' aggregate without the cache)

Throughput vs. number of backends
(backend rate limit: 50 KQPS, cache rate limit: 5 MQPS)

End-to-end latency vs. Throughput

Throughput with workload changes
(figure compares three update strategies: the traditional cache update method, periodic top-k updates only, and periodic top-k updates plus instant bursty hot key updates)
Workload: every 10 seconds, 200 cold keys become the hottest keys

Conclusion
SwitchKV: a high-performance and cost-efficient key-value store
- A fast, small cache guarantees backend load balancing
- Efficient content-aware OpenFlow switching: low (tail) latency, scalable throughput, high availability
- Keeps high performance under highly dynamic workloads