The Forgetful Bloom Filter (and its use in NoSQL Databases)
Rajath Subramanyam, Indranil Gupta, Luke Leslie, Wenting Wang
University of Illinois at Urbana-Champaign
DPRG: http://dprg.cs.uiuc.edu

Key-value/NoSQL Databases and Counters
- Key-value and NoSQL database systems are a growing segment in industry (21% CAGR, $3.4 B by 2020)
- They offer eventual consistency and orders-of-magnitude lower latency
- Examples: Apache Cassandra (Facebook), HBase (Yahoo/Hortonworks), Voldemort (LinkedIn), Riak (Basho), MongoDB, ...
- Used to implement counters that incrementally count "things": likes, tweets, updates, events, etc.

Need: Idempotence in Counters
- Basic counter operation: <inc> (an update, or +1)
- Each update is identified uniquely by (client id, client update sequence number)
- Applications using such counters require a guarantee of idempotence: exactly-once semantics
- Idempotence is known to be impossible in distributed systems with message losses and failures
- Example: a client sends an update to the server but receives no response [Cassandra JIRA-2495]

Guaranteeing Idempotence in Counters
Three possible solutions:
- Approach 1: Clients don't submit duplicate updates. Drawback: under-counts.
- Approach 2: Server maintains a list of all updates; duplicates are checked and rejected. Drawback: too much space.
- Approach 3: Server maintains a Bloom filter storing all client updates. Less space and few false positives, but the filter grows large and continuously over time, and old entries (updates) can't be deleted.

Bloom Filter - Refresher
- A compact way of representing a set of items; checking for existence in the set is cheap
- Some probability of false positives: an item not in the set may check as being in the set; never false negatives
- On insert, set all hashed bits; on check-if-present, return true if all hashed bits are set
- The false positive rate is low: e.g., with k = 4 hash functions, 100 items, and 3200 bits, the FP rate is about 0.02%
- But: the filter can't forget old entries, so it fills up over time; its size is fixed, and it cannot auto-scale without maintaining history
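For concreteness, here is a minimal sketch of the plain Bloom filter described above. It is illustrative only: the class name, the salted-hashCode hashing, and the parameters are assumptions, not the paper's implementation.

    import java.util.BitSet;

    /** Plain Bloom filter: k salted hash functions over a fixed-size bit map. */
    public class BloomFilter {
        private final BitSet bits;
        private final int numBits;   // e.g., 3200 bits
        private final int numHashes; // e.g., k = 4

        public BloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // i-th hash of the key; a real implementation would use MurmurHash or similar.
        private int bitIndex(String key, int i) {
            return Math.floorMod((key + "#" + i).hashCode(), numBits);
        }

        // Insert: set all k hashed bits.
        public void insert(String key) {
            for (int i = 0; i < numHashes; i++) bits.set(bitIndex(key, i));
        }

        // Check-if-present: true only if all k hashed bits are set.
        // May report a false positive, but never a false negative.
        public boolean mightContain(String key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(bitIndex(key, i))) return false;
            }
            return true;
        }
    }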

Introducing: Forgetful Bloom Filter (FBF)
- Kept at the server
- Maintains a moving window that stores recent updates
- Updates are automatically forgotten after a timeout
- The FBF is self-adaptive to meet a user-specified inaccuracy requirement

Simple FBF
- A Simple FBF consists of three constituent Bloom filters: a Past BF, a Present BF, and a Future BF
- Inserts go into the Present and Future BFs; membership checks consult all three

Simple FBF – Over Time
- The window slides forward over time: at each refresh, the Past BF is forgotten, the Present BF becomes the Past, the Future BF becomes the Present, and a new empty Future BF is added
- Inserts always go into the current Present and Future BFs; checks consult all three
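A minimal sketch of a Simple FBF built from three such filters, assuming the BloomFilter class above (class and method names are illustrative):

    /** Simple (1-)FBF: Past, Present, and Future Bloom filters over a sliding window. */
    public class SimpleFBF {
        private final int numBits;
        private final int numHashes;
        private BloomFilter past;
        private BloomFilter present;
        private BloomFilter future;

        public SimpleFBF(int numBits, int numHashes) {
            this.numBits = numBits;
            this.numHashes = numHashes;
            this.past = new BloomFilter(numBits, numHashes);
            this.present = new BloomFilter(numBits, numHashes);
            this.future = new BloomFilter(numBits, numHashes);
        }

        // Inserts go into the Present and Future filters.
        public void insert(String key) {
            present.insert(key);
            future.insert(key);
        }

        // Naive membership check: consult all three constituent filters.
        public boolean mightContain(String key) {
            return future.mightContain(key) || present.mightContain(key) || past.mightContain(key);
        }

        // Refresh (the sliding step pictured on the "Over Time" slides): the Past filter
        // is forgotten, the Present becomes the Past, the Future becomes the Present,
        // and a fresh empty Future is created.
        public void refresh() {
            past = present;
            present = future;
            future = new BloomFilter(numBits, numHashes);
        }
    }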

FBF Optimizations and Analysis
- Faster membership check: the Future and Past BFs do not overlap, so it suffices to check consecutive pairs of BFs, starting from the Future BF; the first hit results in a true answer
- Mathematical analysis of the FBF false positive probability (details in the paper), used to decide when to auto-scale the FBF
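One possible reading of the optimized check, as a drop-in method for the SimpleFBF sketch above: probe consecutive pairs of filters starting from the Future BF and stop at the first hit. The pairwise probing follows the slide; treating the oldest (Past) filter alone as a final hit is an added assumption, since an element close to expiry may survive only there.

    // Optimized membership check (sketch): consecutive pairs, newest first.
    public boolean mightContainOptimized(String key) {
        boolean inFuture  = future.mightContain(key);
        boolean inPresent = present.mightContain(key);
        if (inFuture && inPresent) return true;   // freshly inserted, before any refresh
        boolean inPast = past.mightContain(key);
        if (inPresent && inPast) return true;     // survived one refresh
        return inPast;                            // oldest filter alone: about to be forgotten
    }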

FBF Refreshing and N-FBF
- A "refresh" operation slides the Bloom filters toward the past
- Refreshes happen periodically, every t time units; t = refresh period (configurable)
- Also configurable: the number of Bloom filters
- N-FBF: N Past Bloom filters, 1 Present BF, and 1 Future BF (a Simple FBF is a 1-FBF)
- N-FBF refresh: the oldest Past BF is discarded; each remaining filter ages by one slot (the second-oldest Past becomes the oldest Past, ..., the Present becomes the newest Past, the Future becomes the Present), and a new empty Future BF is created
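A minimal N-FBF sketch, assuming the BloomFilter class from earlier and the refresh behavior described above (discard the oldest Past BF, age every other filter by one slot, add a fresh empty Future BF); the names and deque layout are illustrative:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.Iterator;

    /** N-FBF: 1 Future + 1 Present + N Past Bloom filters over a sliding window. */
    public class NFBF {
        // filters.peekFirst() is the Future BF; filters.peekLast() is the oldest Past BF.
        private final Deque<BloomFilter> filters = new ArrayDeque<>();
        private final int numBits;
        private final int numHashes;

        public NFBF(int nPast, int numBits, int numHashes) {
            this.numBits = numBits;
            this.numHashes = numHashes;
            for (int i = 0; i < nPast + 2; i++) {
                filters.addLast(new BloomFilter(numBits, numHashes));
            }
        }

        // Insert into the Future and Present filters only.
        public void insert(String key) {
            Iterator<BloomFilter> it = filters.iterator();
            it.next().insert(key);   // Future
            it.next().insert(key);   // Present
        }

        // Naive membership check over all constituent filters
        // (the optimized pairwise check from the earlier slide applies here as well).
        public boolean mightContain(String key) {
            for (BloomFilter bf : filters) {
                if (bf.mightContain(key)) return true;
            }
            return false;
        }

        // Refresh, every t time units: forget the oldest Past BF, add an empty Future BF.
        public void refresh() {
            filters.removeLast();                                   // oldest Past is forgotten
            filters.addFirst(new BloomFilter(numBits, numHashes));  // new empty Future
        }
    }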

Self-Adaptive N-FBF
- Given a user-specified SLO (upper bound) on the false positive probability, perform dynamic resizing of the N-FBF: adjust t (the refresh period) and N (the number of BFs in the N-FBF)
- Continuously measure false positives and use the analysis to calculate the false positive probability (FPP), then adjust t and N

Self-Adaptive N-FBF (2)
- Given the user-specified SLO on the false positive probability, continuously measure false positives and use the analysis to calculate the FPP
- If the measured FPP > 90% of the SLO FPP (in danger): increase space and sampling rate, via a multiplicative increase of N and an additive decrease of t
- If the measured FPP < 10% of the SLO FPP (too conservative): decrease space and sampling rate, via an additive decrease of N and an additive increase of t
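A sketch of this adaptation rule. The 90%/10% thresholds and the multiplicative/additive directions come from the slide; the concrete step sizes, bounds, and units are illustrative assumptions.

    /** Periodically adjusts N and the refresh period t to keep the FPP below the SLO. */
    public final class FbfAdapter {
        private int n;            // number of Past Bloom filters in the N-FBF
        private double refreshT;  // refresh period t (illustratively, in seconds)

        public FbfAdapter(int initialN, double initialT) {
            this.n = initialN;
            this.refreshT = initialT;
        }

        // Called periodically with the FPP estimated from measurements and the analysis.
        public void adapt(double measuredFpp, double sloFpp) {
            if (measuredFpp > 0.9 * sloFpp) {
                // In danger of violating the SLO: add space and refresh more often.
                n = n * 2;                                  // multiplicative increase of N
                refreshT = Math.max(1.0, refreshT - 1.0);   // additive decrease of t
            } else if (measuredFpp < 0.1 * sloFpp) {
                // Far below the SLO: give back space and refresh less often.
                n = Math.max(1, n - 1);                     // additive decrease of N
                refreshT = refreshT + 1.0;                  // additive increase of t
            }
        }

        public int numPastFilters() { return n; }
        public double refreshPeriod() { return refreshT; }
    }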

Integration Into Apache Cassandra – Background
- In Cassandra, counters are a special data structure (column family)
- Clients can issue update operations (e.g., +1's) on any key in that table
- Counters may be replicated

(Non-)Idempotence in Cassandra
- Cassandra v0.8-2.0 uses sharding and replication; the client sends requests to the same server
- Not idempotent: incorrect values can result from client retries (e.g., after a write timeout) or commit log replay
- Cassandra v2.1 eliminated remote and local shards and instead uses local locking
- Higher overhead due to locking, and it can still be non-idempotent due to client retries

FBFs in Cassandra
Integrated into Cassandra v0.8-2.0. The following changes are required:
- Client-side: each client operation is identified by a globally unique identifier, such as <client id, per-client sequence number>
- Server-side: an FBF is associated with each counter column; commitlog shards also carry the unique request id (to avoid duplication); the server immediately propagates the update to all other servers (which update their FBFs and commitlogs)
- When a server receives an update: it performs a membership check for the update id in the FBF; if present, the update is rejected; otherwise it is applied and added to the FBF, commitlog, and memtable (see the sketch after this list)
- The above steps are performed atomically
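A minimal sketch of the server-side dedup step just described. This is not Cassandra's actual code: the update-id format and class names are assumptions, the in-memory counter stands in for the real commitlog/memtable path, and replica propagation is omitted.

    /** Server-side counter that rejects (likely) duplicate updates via a per-counter FBF. */
    public class IdempotentCounterServer {
        private final NFBF fbf;          // per-counter forgetful Bloom filter (see earlier sketch)
        private long counterValue = 0;

        public IdempotentCounterServer(NFBF fbf) {
            this.fbf = fbf;
        }

        // Update id: "<clientId>:<perClientSeqNo>", globally unique per client operation.
        // The check-then-apply-then-insert steps are performed atomically (synchronized here).
        public synchronized boolean applyUpdate(String clientId, long seqNo, long delta) {
            String updateId = clientId + ":" + seqNo;
            if (fbf.mightContain(updateId)) {
                return false;            // likely a duplicate retry: reject, do not re-apply
            }
            counterValue += delta;       // apply the increment
            fbf.insert(updateId);        // remember the update id
            // In the real integration, the id also goes to the commitlog shard and memtable,
            // and the update is propagated to the other replicas.
            return true;
        }

        public synchronized long value() {
            return counterValue;
        }
    }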

I. FBF Data Structure Experiments
- The optimized membership check lowers false positives
- The analysis predicts false positives well

Self-Adaptive FBF
- More filters (N) leads to a lower FPP
- More frequent refresh leads to a lower FPP

FBF Self-Adaptation Over Time
- As more elements are inserted, the FPP goes up, but the FBF adjusts N and t to keep the FPP below the SLO

II. FBF Integrated into a Generic Key-Value Database
- Counts are 100% accurate when using the FBF; the default is 6% inaccurate
- The FBF increases latency by at most 10% (2 ms)

FBF vs. Very Large Bloom Filter (Recycled - RBF)
- The FBF uses the same space to achieve lower false positive probabilities

III. FBF Integrated into Apache Cassandra v2.0
- 100% accurate counts with the FBF; the default has 8% error

Takeaways
- A new data structure, the Forgetful Bloom Filter: it forgets entries after a while and self-adjusts to meet a false-positive-rate SLO
- Used as a compact way to maintain the list of received counter updates (server side)
- Uses space more efficiently than "just one large Bloom filter" to reach a lower false positive rate
- Avoids the locking of Cassandra v2.1; increases latency by at most 10%
- Integrated into Cassandra v2.0: 100% accurate, while the default is off by about 6-8%
DPRG: http://dprg.cs.uiuc.edu