Data Loss and Data Duplication in Kafka

Data Loss and Data Duplication in Kafka
Jayesh Thakrar

Kafka is a distributed, partitioned, replicated, durable commit log service. It provides the functionality of a messaging system, but with a unique design. Delivery semantics range from at most once (messages may be lost) and at least once (messages may be duplicated) to exactly once, where each message is delivered once and only once.

AGENDA
- Kafka Overview
- Data Loss
- Data Duplication
- Data Loss and Duplicate Prevention
- Monitoring

Kafka Overview

Kafka As A Log Abstraction
A producer client appends messages to a topic (app_events) hosted on a Kafka server (broker); consumer clients A and B each read the log independently at their own position.
Source: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

Topic Partitioning
On the broker, the topic (app_events) is split into multiple partitions, each an append-only log; producer and consumer clients work against individual partitions.
Source: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

Topic Partitioning – Scalability
Partition leaders and replicas are spread across Kafka brokers 0, 1 and 2, so producer and consumer load is distributed across the cluster.

Topic Partitioning – Redundancy
Each partition has a leader on one broker and replicas on the other brokers, so a single broker failure does not lose the partition.

Topic Partitioning – Redundancy/Durability
Followers keep their replicas up to date through pull-based inter-broker replication from each partition leader.

Topic Partitioning – Summary
- Log is sharded into partitions
- Messages are assigned to partitions by the API or a custom partitioner
- Partitions are assigned to brokers (manually or automatically)
- Partitions are replicated (as needed)
- Messages are ordered within each partition
- Message offset = absolute position of the message in its partition
- Partitions are stored on the filesystem as an ordered sequence of log segments (files)
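A minimal sketch (not part of the original slides) of how partition and offset surface in the Java producer API; the broker address, topic name, and message key below are illustrative assumptions:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class PartitionOffsetDemo {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // The key determines the partition (hash of the key by default).
                ProducerRecord<String, String> record =
                    new ProducerRecord<>("app_events", "user-42", "login");
                RecordMetadata meta = producer.send(record).get();
                // Offset = absolute position of this message within its partition.
                System.out.printf("partition=%d offset=%d%n",
                                  meta.partition(), meta.offset());
            }
        }
    }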

Other Key Concepts
- Cluster = collection of brokers
- Broker-id = a unique integer id assigned to each broker
- Controller = functionality within each broker responsible for leader assignment and management; only one broker is the active controller at a time
- Replica = a copy of a partition, identified by the broker-id that hosts it
- Assigned replicas = set of all replicas (broker-ids) for a partition
- ISR = In-Sync Replicas = the subset of assigned replicas (brokers) that are "in-sync / caught-up" with the leader (the ISR always includes the leader)
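As an illustration (not part of the talk), the leader, assigned replicas, and ISR of each partition can be read with the Java AdminClient, available in Kafka 0.11+; the broker address and topic name are assumptions:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.TopicDescription;
    import org.apache.kafka.common.TopicPartitionInfo;

    public class DescribeTopicDemo {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
            try (AdminClient admin = AdminClient.create(props)) {
                TopicDescription desc = admin.describeTopics(Collections.singletonList("app_events"))
                                             .all().get().get("app_events");
                for (TopicPartitionInfo p : desc.partitions()) {
                    // Leader, assigned replicas, and ISR per partition
                    System.out.printf("partition=%d leader=%d replicas=%s isr=%s%n",
                                      p.partition(), p.leader().id(), p.replicas(), p.isr());
                }
            }
        }
    }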

Data Loss

Data Loss: Inevitable
Up to 0.01% data loss. At 700 billion messages per day, that is up to 70 million lost messages per day.

Data Loss at the Producer

Kafka Producer API call-tree:
kafkaProducer.send()
  -> accumulator.append()  // buffer
  -> sender.send()         // network I/O

- Messages accumulate in the buffer in batches
- Batching is per partition; retries happen at the batch level
- Expired batches are dropped after retries are exhausted
- Error counts and other metrics are available via JMX

Data loss at the producer:
- Failure to close / flush the producer on termination
- Batches dropped due to communication or other errors when acks = 0, or on retry exhaustion
- Data produced faster than it can be delivered, causing BufferExhaustedException (deprecated in 0.10+; the producer now blocks for up to max.block.ms instead)
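A sketch (not from the slides) of the producer-side safeguards implied above: a send callback surfaces batches dropped after retry exhaustion, and flush()/close() on termination drains the accumulator; the broker address, topic, and config values are assumptions:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SafeProducerDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
            props.put("acks", "all");                            // wait for all in-sync replicas
            props.put("retries", "3");                           // retry transient failures
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            try {
                producer.send(new ProducerRecord<>("app_events", "key", "value"),
                    (metadata, exception) -> {
                        // Fires after retries are exhausted; log instead of losing silently.
                        if (exception != null) {
                            System.err.println("send failed: " + exception);
                        }
                    });
            } finally {
                producer.flush();   // drain the accumulator buffer
                producer.close();   // always close on termination
            }
        }
    }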

Data Loss at the Cluster (by Brokers)

When a broker crashes, the controller detects the failure via ZooKeeper and, for each partition hosted on that broker, walks a decision flow:
1. Was the crashed broker the leader for the partition?
2. If it was not the leader: was its replica in the ISR? If not, nothing needs to happen ("relax, everything will be fine").
3. If the replica was in the ISR: is the remaining ISR still >= min.insync.replicas? If yes, the cluster is still healthy; if not, producers using acks = all are blocked until the ISR recovers.
4. If the crashed broker was the leader: are other replicas still in the ISR? If yes, elect another leader from the ISR.
5. If no ISR replicas remain: is unclean leader election allowed (unclean.leader.election.enable)?
6. If unclean election is allowed: is any other (out-of-sync) replica available? If yes, elect it as leader.
7. Otherwise the partition becomes unavailable.

Non-leader broker crash: follows steps 1-3; at worst the ISR shrinks, and no data is lost.

Leader broker crash, Scenario 1: follows steps 1 and 4; another in-sync replica becomes leader and no data is lost.

Leader broker crash, Scenario 2: follows steps 1, 4, 5 and 6; no in-sync replica remains, so either an out-of-sync replica is elected through unclean leader election, with potential data loss depending on the acks config at the producer (see KAFKA-3919 and KAFKA-4215), or the partition becomes unavailable.

FROM KAFKA-3919

FROM KAFKA-4215

Config for Data Durability and Consistency

Producer config:
- acks = -1 (or all)
- max.block.ms (blocking when the buffer is full, default = 60000) and retries
- request.timeout.ms (default = 30000); on expiry it triggers retries

Topic config:
- min.insync.replicas = 2 (or higher)

Broker config:
- unclean.leader.election.enable = false
- timeout.ms (default = 30000): inter-broker timeout for acks
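A sketch collecting the durability-leaning producer settings above into one place; the broker address and the retry count are assumed values, and the topic/broker settings are noted in comments because they cannot be set from the producer:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class DurableProducerConfig {
        public static KafkaProducer<String, String> createDurableProducer() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
            props.put("acks", "all");                           // same as acks = -1
            props.put("retries", "5");                          // assumed retry count
            props.put("max.block.ms", "60000");                 // block (not drop) when buffer is full
            props.put("request.timeout.ms", "30000");           // expiry triggers a retry
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            // Topic-level: min.insync.replicas = 2 (or higher); broker-level:
            // unclean.leader.election.enable = false -- both set outside the producer.
            return new KafkaProducer<>(props);
        }
    }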

Config for Availability and Throughput

Producer config:
- acks = 0 (or 1)
- buffer.memory, batch.size, linger.ms (default = 0)
- request.timeout.ms, max.block.ms (default = 60000), retries
- max.in.flight.requests.per.connection

Topic config:
- min.insync.replicas = 1 (default)

Broker config:
- unclean.leader.election.enable = true
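For comparison, a sketch of the availability/throughput-leaning producer settings; every numeric value below is an assumed example, not a recommendation from the talk:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ThroughputProducerConfig {
        public static KafkaProducer<String, String> createThroughputProducer() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");          // assumed broker address
            props.put("acks", "1");                                     // or "0" for fire-and-forget
            props.put("buffer.memory", "67108864");                     // assumed 64 MB buffer
            props.put("batch.size", "65536");                           // assumed 64 KB batches
            props.put("linger.ms", "50");                               // assumed batching delay
            props.put("max.in.flight.requests.per.connection", "5");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            return new KafkaProducer<>(props);
        }
    }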

Data Duplication

Data Duplication: How It Occurs
- Producer side: with retries > 0, a batch whose acknowledgement times out is resent, so the same messages can be appended to the topic (app_events) more than once.
- Consumer side: after a restart from an unclean shutdown or crash, consumers A and B re-read messages whose offsets had not yet been committed, so those messages are consumed more than once.
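A hedged sketch of the usual at-least-once consumer loop that creates this duplication window: offsets are committed only after processing, so a crash between the two leads to re-delivery on restart. Broker address, group id, and topic are assumptions, and poll(Duration) assumes a newer Java client:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class AtLeastOnceConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
            props.put("group.id", "app_events_reader");          // assumed consumer group
            props.put("enable.auto.commit", "false");            // commit manually, after processing
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("app_events"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        process(record);                 // application logic
                    }
                    // Crash before this line => the batch is re-read after restart (duplicates).
                    consumer.commitSync();
                }
            }
        }

        private static void process(ConsumerRecord<String, String> record) {
            System.out.println(record.partition() + ":" + record.offset() + " " + record.value());
        }
    }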

Data Loss & Duplication Detection

How to Detect Data Loss & Duplication - 1
1) Producer sends msg to Kafka
2) Kafka acks with details (topic, partition, offset)
3) Producer inserts a record into an external store (Memcache / HBase / Cassandra / other), with key = topic, partition, offset and value = msg key or hash
4) Consumer reads the msg
5) Consumer validates the msg against the store:
   - If the key exists: not a duplicate; consume the msg and delete the key
   - If the key is missing: duplicate msg
Audit: any msgs remaining in the store are "lost" or "unconsumed" msgs
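A sketch of that per-message audit logic, assuming a simple key-value store is available; an in-memory map stands in for Memcache/HBase/Cassandra, and all class and method names are illustrative:

    import java.util.concurrent.ConcurrentHashMap;

    // Producer registers (topic, partition, offset) after the ack; consumer checks-and-deletes
    // on consumption; whatever is left over is lost or unconsumed.
    public class DeliveryAudit {
        // Key: "topic-partition-offset", value: message key or hash.
        private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

        private static String key(String topic, int partition, long offset) {
            return topic + "-" + partition + "-" + offset;
        }

        // Step 3: called from the producer's ack callback.
        public void recordProduced(String topic, int partition, long offset, String msgHash) {
            store.put(key(topic, partition, offset), msgHash);
        }

        // Step 5: called by the consumer for every message it reads.
        // Returns true if the message is a duplicate (already consumed or never registered).
        public boolean checkAndMarkConsumed(String topic, int partition, long offset) {
            return store.remove(key(topic, partition, offset)) == null;
        }

        // Audit: entries still in the store were produced but never consumed.
        public int unconsumedCount() {
            return store.size();
        }
    }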

How to Detect Data Loss & Duplication - 2
1) Producer sends msg to Kafka
2) Kafka acks with details
3) Producer maintains window statistics in an external store (Memcache / HBase / Cassandra / other), with key = source, time-window and value = msg count or some other checksum (e.g. totals)
4) Consumer reads the msg
5) Consumer validates the window statistics at the end of each interval
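A corresponding sketch of the window-statistics variant, again with an in-memory map standing in for the external store and a fixed, assumed one-minute window:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.LongAdder;

    // Producer and consumer each bump a counter per (source, time-window);
    // at the end of the interval the two counts are compared.
    public class WindowAudit {
        private static final long WINDOW_MS = 60_000;   // assumed window size
        private final ConcurrentHashMap<String, LongAdder> produced = new ConcurrentHashMap<>();
        private final ConcurrentHashMap<String, LongAdder> consumed = new ConcurrentHashMap<>();

        private static String window(String source, long timestampMs) {
            return source + "@" + (timestampMs / WINDOW_MS) * WINDOW_MS;
        }

        public void onProduced(String source, long timestampMs) {
            produced.computeIfAbsent(window(source, timestampMs), k -> new LongAdder()).increment();
        }

        public void onConsumed(String source, long timestampMs) {
            consumed.computeIfAbsent(window(source, timestampMs), k -> new LongAdder()).increment();
        }

        // Negative => loss; positive => duplication; zero => clean window.
        public long delta(String source, long timestampMs) {
            String w = window(source, timestampMs);
            long p = produced.getOrDefault(w, new LongAdder()).sum();
            long c = consumed.getOrDefault(w, new LongAdder()).sum();
            return c - p;
        }
    }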

Data Duplication: How to Minimize at the Consumer
If possible, the consumer looks up the last processed offset in the destination system at startup and resumes from the next offset, rather than relying only on offsets committed to Kafka.
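A sketch of that startup lookup using the Java consumer's rebalance listener; lastProcessedOffset() is a hypothetical placeholder for the query against the destination store, and broker, group, and topic names are assumptions:

    import java.time.Duration;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class ResumeFromDestination {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
            props.put("group.id", "app_events_reader");          // assumed consumer group
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("app_events"), new ConsumerRebalanceListener() {
                @Override public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
                @Override public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    for (TopicPartition tp : partitions) {
                        // Hypothetical lookup of the last offset written to the destination store.
                        long lastProcessed = lastProcessedOffset(tp);
                        consumer.seek(tp, lastProcessed + 1);   // resume right after it
                    }
                }
            });
            consumer.poll(Duration.ofMillis(500));               // triggers assignment + seek
        }

        // Placeholder: in practice this would query HBase / Cassandra / the destination system.
        private static long lastProcessedOffset(TopicPartition tp) {
            return -1;   // -1 + 1 = 0 => start from the beginning when nothing was processed yet
        }
    }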

Monitoring

Monitoring and Operations: JMX Metrics
Producer JMX metrics and consumer JMX metrics (shown as screenshots in the original slides).
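The same numbers can also be read programmatically from producer.metrics(). The sketch below assumes a newer Java client (where Metric.metricValue() is available) and watches record-error-rate and record-retry-rate, which are the metrics most relevant to drops and resends:

    import java.util.Map;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.common.Metric;
    import org.apache.kafka.common.MetricName;

    public class ProducerMetricsDemo {
        // Prints the producer metrics that indicate dropped or retried batches.
        public static void printDeliveryMetrics(KafkaProducer<String, String> producer) {
            Map<MetricName, ? extends Metric> metrics = producer.metrics();
            for (Map.Entry<MetricName, ? extends Metric> entry : metrics.entrySet()) {
                String name = entry.getKey().name();
                if (name.equals("record-error-rate") || name.equals("record-retry-rate")) {
                    // The same values are exported over JMX by the producer client.
                    System.out.println(name + " = " + entry.getValue().metricValue());
                }
            }
        }
    }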

Questions?

Jayesh Thakrar jthakrar@conversantmedia.com