Project Voldemort: What’s New
Alex Feinberg

The plan
• Introduction
• Motivation
• Inspiration
• Implementation
• Present day
• New features within the last months
• New features in active development
• The roadmap
• Wanted Features
• Q&A

Introduction
• Project Voldemort: a scalable, highly available, distributed key/value store
• Data Platform team at LinkedIn
  – Data driven features
  – The infrastructure to run them
• Original work by Jay Kreps, Bhupesh Bansal
• The presenter: just hired a month ago to work full time on Voldemort

Data Driven Features…

Motivation
• Data driven features are data intensive in terms of reads, writes and the size of the datasets
• Scaling a relational database: if data can’t be federated, the RDBMS becomes a de-facto K/V store
• SQL
  – Relational algebra is a powerful tool, but not a universal solution
  – Passing strings around is cumbersome, ORMs can be leaky abstractions

“The Exploits of a Mom” © XKCD

Non-relational Alternatives
• Memcached is an excellent in-memory key/value cache
  – Used extensively by high traffic websites, including LinkedIn
  – High throughput, low latency
  – Excellent scalability
• Hadoop
  – Used extensively by the Data Platform team
  – High average throughput, but high latency
  – Excellent scalability
• Wanted
  – Persistence and replication
  – Low latency
  – No single points of failure
  – Scalable: accommodate more data by adding more machines

Inspiration
• Amazon’s Dynamo
• SOSP paper late 2007
• Key-value store
• Consistent hashing, vector clocks
• Gossip protocol
• Hinted handoff, Merkle Trees

Consistent Hashing
• A key belongs to a partition
• A node can hold multiple partitions
• There is a tunable replication factor (N)
• If N is 3, a key mapped to partition P is written to P-1, P and P+1
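
The placement rule on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the ring size, the MD5-based hash, the round-robin partition-to-node layout, and the names `partition_for`, `replica_partitions` and `node_for` are all hypothetical and are not taken from Voldemort's code. For replication factors other than 3 the window around P is also an assumption, since the slide only gives the N = 3 case.

```python
import hashlib
from typing import List

NUM_PARTITIONS = 8  # assumed ring size, for illustration only
NUM_NODES = 4       # assumed cluster size; each node holds several partitions


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition using a stable hash of the key bytes."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def replica_partitions(p: int, n: int,
                       num_partitions: int = NUM_PARTITIONS) -> List[int]:
    """Partitions that receive a key mapped to partition p.

    For n = 3 this yields P-1, P and P+1 as on the slide, wrapping
    around the ring. Other values of n use a window starting at
    P - (n - 1) // 2, which is an assumption of this sketch.
    """
    start = -((n - 1) // 2)
    return [(p + off) % num_partitions for off in range(start, start + n)]


def node_for(partition: int, num_nodes: int = NUM_NODES) -> int:
    """Assumed round-robin layout: a node owns every num_nodes-th partition."""
    return partition % num_nodes
```

With 8 partitions and N = 3, a key that hashes to partition 0 is written to partitions 7, 0 and 1, which this layout places on three different nodes.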

Vector Clocks
• Build on Leslie Lamport’s logical clocks (Lamport is also the author of LaTeX)
• Want to determine the order of writes
• Total order demands strong consistency
  – Partial ordering: determine the “x came before y” relation in most cases
• Associate a vector clock with a value
  – Versioned value is a (value, vector clock) tuple
  – Multiple versioned values can exist for a key
  – We can use a vector clock to determine causality
  – If two versioned values aren’t causally related, allow the application to reconcile them
  – Shopping cart example

Vector Clocks: Initial State

Vector Clocks: Event Occurs

Vector Clocks: Multi-cast the Vector Clock

Vector Clocks: Node Becomes Partitioned

Vector Clocks: Causality Determined
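
The vector clock slides above can be condensed into a small comparison routine. This is a minimal sketch, assuming a clock is a mapping from node id to counter; the function names are hypothetical and do not mirror Voldemort's own classes.

```python
from typing import Dict

Clock = Dict[str, int]  # node id -> number of events seen from that node


def increment(clock: Clock, node: str) -> Clock:
    """Return a copy of the clock with the given node's counter advanced."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: Clock, b: Clock) -> str:
    """Classify a relative to b: 'before', 'after', 'equal' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # no causal order: the application must reconcile


def merge(a: Clock, b: Clock) -> Clock:
    """Element-wise maximum, used as the clock of a reconciled value."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}
```

In the shopping cart example, two carts whose clocks compare as "concurrent" would both be returned to the application, which might take the union of their items and store the result under the merged clock.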

Implementation
• Customization at all layers
  – Pluggable serialization (JSON, protocol buffers, Thrift) allows keys and values to be structures rather than just strings
• Tunable R, W, N parameters
• Storage engines
  – No persistent data structure that is good at everything
  – BDB is most popular
  – Read only stores
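
The slide does not spell out how R, W and N interact. In Dynamo-style stores the usual rule of thumb is that a read is guaranteed to see the latest successful write when R + W > N, because every read quorum then shares at least one replica with every write quorum. The sketch below checks that rule by brute force; the function names are hypothetical and the code is not part of Voldemort.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Rule of thumb: read and write quorums always intersect iff r + w > n."""
    return r + w > n


def overlap_by_enumeration(n: int, r: int, w: int) -> bool:
    """Check every possible read set against every possible write set."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )
```

For N = 3, choosing R = 2 and W = 2 gives overlapping quorums, while R = 1 and W = 1 trades that guarantee for lower latency.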

Present day
• Production use at LinkedIn
  – Multiple clusters
  – Data Platform usage
  – Other teams’ usage
  – Read only stores for data built out in Hadoop
• Production use outside of LinkedIn
  – Gilt Groupe, kaChing, others
• Revision control through git
  – Hosted on GitHub
• Active developer community, inside and outside LinkedIn

Recently Added: Read Only Stores
• Motivation
  – Offline batch/computing
  – Optimize the store for atomic swaps and rollbacks
  – Leverage what Hadoop provides
• Implementation
  – Memory mapped files
  – Integration with Hadoop
  – Driver program to initiate fetch and swap in parallel
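
One common way to get atomic swaps and cheap rollbacks for a read-only data set is to keep each build in its own versioned directory and repoint a single link at the active one. The sketch below shows that general technique on a POSIX filesystem. It is an assumption made for illustration and not a description of how Voldemort's read-only stores lay out their files; the `version-N` directory names and the `current` link are hypothetical.

```python
import os
import tempfile


def swap(store_dir: str, version: str) -> None:
    """Atomically repoint store_dir/current at the given version directory."""
    current = os.path.join(store_dir, "current")
    staging = current + ".tmp"
    if os.path.lexists(staging):
        os.remove(staging)
    os.symlink(version, staging)   # link target is relative to store_dir
    os.replace(staging, current)   # rename over the old link in one step


def active_version(store_dir: str) -> str:
    """Return the version directory that the current link points to."""
    return os.readlink(os.path.join(store_dir, "current"))


# Usage: fetch a new build next to the old one, swap, and roll back if needed.
store = tempfile.mkdtemp()
os.mkdir(os.path.join(store, "version-0"))
os.mkdir(os.path.join(store, "version-1"))
swap(store, "version-0")
swap(store, "version-1")   # new data goes live
swap(store, "version-0")   # rollback is the same operation
```

Because the old version stays on disk until it is explicitly deleted, a rollback costs no more than the swap itself.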

Recently Added: NIO
• Non-blocking IO, why?
  – Scalability and the c10k problem
• Java’s NIO framework
  – Added in Java 1.4, greatly improved in 1.5 and 1.6
  – Will use native scalable poll implementation
• Tricky to get good performance
• Contributed by Kirk True

NIO Performance and Scalability

Recently Added: Data Compression
• Motivation: smaller data size
  – Denormalized data leads to big blobs
  – Less to transfer between client and server
  – More of the data can be stored in main memory
  – Less to transfer from disk to memory
  – Compression/decompression is fast
  – If we’re I/O bound, fewer bytes to express the same data implies better performance
• Implementation
• Usage
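
The size argument on this slide is easy to see with a small experiment. The sketch below compresses a denormalized, repetitive value with zlib after JSON serialization. It is an illustration only: the choice of zlib, the `pack` and `unpack` helpers and the sample record are assumptions, and the slide does not say which codecs Voldemort uses or how they are configured.

```python
import json
import zlib


def pack(value) -> bytes:
    """Serialize a value to JSON and compress the resulting bytes."""
    raw = json.dumps(value).encode("utf-8")
    return zlib.compress(raw)


def unpack(blob: bytes):
    """Reverse of pack: decompress, then parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# A made-up denormalized record with many similarly shaped entries.
profile = {
    "member": 42,
    "positions": [
        {"title": "Engineer", "company": "Example Corp", "year": 1990 + i}
        for i in range(200)
    ],
}

raw_size = len(json.dumps(profile).encode("utf-8"))
packed_size = len(pack(profile))
print(raw_size, packed_size)  # the packed form is much smaller for data like this
```

The repeated field names and values are what make denormalized blobs compress well; already compact or random data would gain far less.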

Monitoring and Administration
• In place: JMX hooks
  – View statistics (how many queries are made? How long are they taking?)
  – Perform operations (analogous to SNMP traps)
• Admin Server
  – Functionality which is needed, but shouldn’t be performed by regular store clients
  – Ability to update and retrieve cluster/store metadata
  – Functionality to efficiently stream keys and values in a partition
• Network class loader/server side filtering

On The Roadmap
• Failure detection
• Large value support
• Publish/subscribe
• Rebalancing

On The Roadmap: Rebalancing
• Rebalancing: ability to add a server to a cluster while the cluster is still running
• Node enters a cluster, “steals” a partition from other nodes (fetches it as a stream using the admin protocol)
• Pull-based gossip protocol to let other nodes know that it’s in the cluster
  – Metadata about cluster membership treated as data, conflicts reconciled using vector clocks
• While the new node is transferring the partitions, gets sent to it are redirected to the donor node(s)
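
The redirect rule in the last bullet can be sketched with a toy in-memory model. Everything here is hypothetical: the `Node` class, its method names and the dictionary-backed partitions only illustrate the idea that a new node forwards gets to the donor until the streamed copy is complete. Writes, failures and the gossip step are left out.

```python
class Node:
    """Toy cluster node; partitions are plain dicts held in memory."""

    def __init__(self, name):
        self.name = name
        self.partitions = {}  # partition id -> {key: value}
        self.donors = {}      # partition id -> donor Node while a transfer runs

    def get(self, partition, key):
        donor = self.donors.get(partition)
        if donor is not None:
            # Transfer still in progress: redirect the get to the donor.
            return donor.get(partition, key)
        return self.partitions[partition].get(key)

    def stream(self, partition):
        """Yield a partition's entries one by one (stand-in for the admin stream)."""
        yield from list(self.partitions[partition].items())

    def begin_steal(self, partition, donor):
        """Mark the partition as migrating and return the donor's entry stream."""
        self.donors[partition] = donor
        self.partitions[partition] = {}
        return donor.stream(partition)

    def finish_steal(self, partition):
        """Take ownership: stop redirecting and drop the donor's copy."""
        donor = self.donors.pop(partition)
        del donor.partitions[partition]
```

A caller would copy the entries returned by `begin_steal` into the new node's partition and then call `finish_steal`; until that point every get for the partition is answered by the donor.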

Stability and Infrastructure
• Testing “in the cloud”
• Distributed systems have to be tested on multi-node clusters
• Distributed systems have complex failure scenarios
• A storage system, above all, must be stable
• Automated testing allows rapid iteration while maintaining confidence in systems’ correctness and stability
• EC2-based testing framework
• Tests are invoked programmatically
• Contributed by Kirk True
• Adaptable to other cloud hosting providers
• Will run on a regular basis
• Regular releases for new features and bug fixes
• Trunk stays stable

Wanted Features
• Clients for other languages
  – Outside of the JVM: Ruby, PHP (popular for web development)
  – On the JVM: JRuby, Scala, Clojure
  – Different languages have different idioms; Java’s idiom is objects with mutable state
• Views
  – Inspired by CouchDB
  – Want to change a value for a key without transferring that value back and forth
  – Example: adding to a list, incrementing a counter
  – Fewer collisions/conflicts
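
The views idea can be illustrated with a store that applies a named transformation on the server, so the client sends only a small argument instead of reading and rewriting the whole value. This is a sketch of the concept as described on the slide, not a proposed design: the `ViewStore` class, the `apply_view` method and the two built-in views are hypothetical.

```python
class ViewStore:
    """Toy store that can update a value in place through a named view."""

    def __init__(self):
        self._data = {}
        # Each view takes the current value (or None) and a small argument.
        self._views = {
            "append": lambda current, item: (current or []) + [item],
            "increment": lambda current, delta: (current or 0) + delta,
        }

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def apply_view(self, key, view, arg):
        """Transform the stored value server-side and return the new value."""
        updated = self._views[view](self._data.get(key), arg)
        self._data[key] = updated
        return updated


store_with_views = ViewStore()
store_with_views.apply_view("inbox:42", "append", "message-1")
store_with_views.apply_view("inbox:42", "append", "message-2")
store_with_views.apply_view("page-hits", "increment", 1)
```

Only the appended item or the increment crosses the network, which is also why concurrent updates are less likely to collide than with a full read-modify-write cycle.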

Contributions are Welcome
• Thriving open source community
  – Fork us on Github: http://github.com/voldemort/voldemort
  – Wiki: http://wiki.github.com/voldemort/voldemort
• Fun projects: http://wiki.github.com/voldemort/voldemort/fun-projects
  – IRC channel: #Voldemort on Freenode (irc.freenode.org)
• Want to work on this full time? LinkedIn is hiring!
• Just in the Data Platform group
• Other technologies: Scala, Hadoop, ZooKeeper, Lucene, Netty
• Projects: real time faceted search, distributed graph databases, machine learning, data mining, information retrieval / extraction, NLP
• Open source projects: Zoie, Bobo, Sensei-search, decomposer, kamikaze (three more on the way!)
• More elsewhere!
• Contact me
• http://www.linkedin.com/in/alexfeinberg
• afeinberg@linkedin.com

Questions?
