Kafka: high-throughput, persistent, multi-reader streams

LinkedIn SNA (Search, Network, Analytics)
– Worked on a number of open source projects at LinkedIn (Voldemort, Azkaban, …)
– Hadoop, data products

Problem
How do you model and process stream data for a large website?

Examples
– Tracking and logging: who/what/when/where
– Metrics: state of the servers
– Queuing: buffer between online and “nearline” processing
– Change capture: database updates
– Messaging (examples: numerous JMS brokers, RabbitMQ, ZeroMQ)

The Hard Parts
persistence, scale, throughput, replication, semantics, simplicity

Tracking

Tracking Basics
Example “events” (around 60 types):
– Search
– Page view
– Invitation
– Impressions
Avro serialization
Billions of events, hundreds of GBs/day
Most events have multiple consumers:
– Security services
– Data warehouse
– Hadoop
– News feed
– Ad hoc
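
To make the Avro point concrete, here is a hypothetical sketch of serializing a page-view event with Avro's generic API before publishing it to a topic. The schema and field names are invented for illustration; the talk does not show LinkedIn's actual event schemas.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

// Hypothetical sketch: Avro-serialize a "page view" event into a compact
// binary payload. Schema and fields are invented for illustration only.
public class AvroEventExample {
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"PageView\",\"fields\":["
      + "{\"name\":\"memberId\",\"type\":\"long\"},"
      + "{\"name\":\"page\",\"type\":\"string\"},"
      + "{\"name\":\"timestamp\",\"type\":\"long\"}]}");

    public static void main(String[] args) throws IOException {
        GenericRecord event = new GenericData.Record(SCHEMA);
        event.put("memberId", 42L);
        event.put("page", "/profile");
        event.put("timestamp", System.currentTimeMillis());

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(event, encoder);
        encoder.flush();
        System.out.println("serialized " + out.size() + " bytes");  // compact binary payload
    }
}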

Existing messaging systems
JMS:
– An API, not an implementation
– Not a very good API: weak or no distribution model, high complexity, painful to use
– Not cross-language
Existing systems seem to perform poorly with large datasets

Ideas
1. Eschew random-access persistent data structures
2. Allow very parallel consumption (e.g. Hadoop)
3. Explicitly distributed
4. Push/Pull, not Push/Push

Performance Test
Two Amazon EC2 large instances:
– Dual-core AMD 2.0 GHz
– rpm SATA drive
– 8 GB memory
200-byte messages
8 producer threads, 1 consumer thread
Kafka:
– Flush 10k messages
– Batch size = 50
ActiveMQ:
– syncOnWrite = false
– fileCursor

Performance Summary
Producer:
– 111,729.6 messages/sec
– 22.3 MB/sec
Consumer:
– 193,681.7 messages/sec
– 38.6 MB/sec
On our hardware:
– 50 MB/sec produced
– 90 MB/sec consumed

How can we get high performance with persistence?

Some tricks
Disks are fast when used sequentially:
– Single-thread linear read/write speed: > 300 MB/sec
– Reads are faster still when cached
– Appends are effectively O(1)
– Reads from a known offset are effectively O(1)
End-to-end message batching
Zero-copy network implementation (sendfile)
Zero-copy message processing APIs
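
On the JVM, sendfile-style zero copy is exposed as FileChannel.transferTo. Below is a minimal sketch, not Kafka's actual code, of streaming a log segment to a consumer socket that way; the file path and address are placeholders.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Minimal sketch: send file bytes straight from the page cache to the socket
// with transferTo, avoiding copies through user-space buffers.
public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        try (FileChannel log = FileChannel.open(Paths.get("/tmp/topic-0.log"), StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9092))) {
            long position = 0;
            long remaining = log.size();
            // transferTo may send fewer bytes than requested, so loop until done
            while (remaining > 0) {
                long sent = log.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}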

Implementation
– ~5k lines of Scala
– Standalone jar
– NIO socket server
– Zookeeper handles distribution, client state
– Simple protocol
– Python, Ruby, and PHP clients contributed
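
To illustrate the kind of work Zookeeper does here, a hypothetical sketch of a broker registering itself with an ephemeral znode so peers can discover it and notice when it goes away. The znode paths, data, and names are invented; this is not Kafka's actual Zookeeper layout.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical sketch: register under an ephemeral node; the node disappears
// automatically if the session dies, which is how liveness is tracked.
// Assumes the parent znodes (/demo/brokers) already exist.
public class ZkRegister {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { /* ignore watch events in this sketch */ });
        zk.create("/demo/brokers/broker-0",
                  "host1:9092".getBytes(),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE,
                  CreateMode.EPHEMERAL);   // removed automatically when the session ends
        Thread.sleep(10000);               // stay registered while "serving"
        zk.close();
    }
}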

Distribution
– Producer: randomly load balanced
– Consumer: balances M brokers, N consumers
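
A rough illustration of the balancing idea, assuming the simplest possible scheme (modulo assignment of partitions to consumers); this is not Kafka's actual rebalancing algorithm.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: spread M partitions over N consumers by taking
// partition index modulo the consumer count; a membership change simply
// triggers a recomputation.
public class BalanceSketch {
    static List<List<Integer>> assign(int partitions, int consumers) {
        List<List<Integer>> assignment = new ArrayList<>();
        for (int c = 0; c < consumers; c++) assignment.add(new ArrayList<>());
        for (int p = 0; p < partitions; p++) assignment.get(p % consumers).add(p);
        return assignment;
    }

    public static void main(String[] args) {
        // e.g. 8 partitions spread over 3 consumers
        System.out.println(assign(8, 3)); // [[0, 3, 6], [1, 4, 7], [2, 5]]
    }
}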

Hadoop InputFormat

Consumer State
– Data is retained for N days
– Client can calculate the next valid offset from any fetch response
– All server APIs are stateless
– Client can reload if necessary
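
A toy model of the stateless-server idea: the "broker" just serves whatever follows a requested position, and the client derives the next valid offset from the bytes it received, so the server keeps no per-consumer state. Offsets here are simplified byte positions; the real wire format differs.

import java.util.ArrayList;
import java.util.List;

// Toy model: the client owns its position and advances it from fetch results.
public class OffsetTracking {
    static byte[][] log = { "msg-0".getBytes(), "msg-00".getBytes(), "msg-000".getBytes() };

    // "Broker" side: return the sizes of messages starting at byteOffset (stateless).
    static List<Integer> fetch(long byteOffset) {
        List<Integer> sizes = new ArrayList<>();
        long pos = 0;
        for (byte[] m : log) {
            if (pos >= byteOffset) sizes.add(m.length);
            pos += m.length;
        }
        return sizes;
    }

    public static void main(String[] args) {
        long offset = 0;                  // client-held position
        for (int size : fetch(offset)) {
            offset += size;               // client computes the next valid offset itself
        }
        System.out.println("next offset = " + offset);  // 5 + 6 + 7 = 18
    }
}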

APIs

// Sending messages
client.send("topic", messages)

// Receiving messages
Iterable stream = client.createMessageStreams(…).get("topic").get(0)
for(message: stream) {
  // process messages
}
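
For comparison only, roughly the same send/receive flow with today's Java client (kafka-clients). This is not the 0.06-era API shown above; the broker address, topic, and group id are placeholders.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch with the modern client: send one message, then poll for messages.
public class ModernClientExample {
    public static void main(String[] args) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            producer.send(new ProducerRecord<>("topic", "hello"));   // sending messages
        }

        Properties c = new Properties();
        c.put("bootstrap.servers", "localhost:9092");
        c.put("group.id", "example-group");
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(Collections.singletonList("topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.value());                  // process messages
            }
        }
    }
}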

(0.06 release) Data is published to persistent topics and redistributed by primary key between stages
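
A sketch of what "redistributed by primary key" typically means: records with the same key hash to the same partition, so a downstream stage sees every record for that key. The hash function here is illustrative, not Kafka's partitioner.

// Sketch: key-based routing so equal keys land on the same downstream partition.
public class KeyedRouting {
    static int partitionFor(String key, int numPartitions) {
        // mask the sign bit so the result is a valid non-negative index
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 4;
        for (String memberId : new String[] {"alice", "bob", "alice"}) {
            System.out.println(memberId + " -> partition " + partitionFor(memberId, partitions));
            // "alice" maps to the same partition both times
        }
    }
}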

Also coming soon
– End-to-end block compression
– Contributed PHP, Ruby, Python clients
– Hadoop InputFormat, OutputFormat
– Replication

The End