Data Loss and Data Duplication in Kafka


1 Data Loss and Data Duplication in Kafka
Jayesh Thakrar

2 Kafka is a distributed, partitioned, replicated, durable commit log service. It provides the functionality of a messaging system, but with a unique design. Exactly once - each message is delivered once and only once

3 Agenda
- Kafka Overview
- Data Loss
- Data Duplication
- Data Loss and Duplicate Prevention
- Monitoring

4 Kafka Overview

5 Kafka As A Log Abstraction
[Diagram: a producer client writes to topic app_events on the Kafka server (Kafka broker); consumer clients A and B read from it.] Source:

6 Topic Partitioning
[Diagram: a producer or consumer client and a Kafka broker holding topic app_events as a series of partitions.] Source:

7 Topic Partitioning – Scalability
[Diagram: partition leaders and replicas spread across Kafka Brokers 0, 1, and 2, with producer and consumer clients.]

8 Topic Partitioning – Redundancy
[Diagram: partition leaders and replicas on Kafka Brokers 0, 1, and 2, with a producer/consumer client.]

9 Topic Partitioning – Redundancy/Durability
[Diagram: partition leaders and replicas on Kafka Brokers 0, 1, and 2, kept in sync by pull-based inter-broker replication.]

10 Topic Partitioning – Summary
- Log sharded into partitions
- Messages assigned to partitions by API or custom partitioner
- Partitions assigned to brokers (manual or automatic)
- Partitions replicated (as needed)
- Messages ordered within each partition
- Message offset = absolute position in partition
- Partitions stored on filesystem as ordered sequence of log segments (files)
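
The summary above can be illustrated with a toy in-memory log. This is only a sketch: the class name `PartitionedLog` is hypothetical, and `zlib.crc32` stands in for whatever hash a real partitioner uses. It shows that messages with the same key land in the same partition and that an offset is simply the position within that partition.

```python
import zlib


class PartitionedLog:
    """Toy in-memory topic: a list of append-only partitions (hypothetical name)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stand-in partitioner (assumption): hash the key, mod the partition count.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        offset = len(self.partitions[p]) - 1  # absolute position in the partition
        return p, offset


log = PartitionedLog(num_partitions=3)
first = log.append("user-42", "login")
second = log.append("user-42", "click")
```

Ordering is only guaranteed within one partition, so the two `user-42` messages keep their relative order, while messages with other keys may interleave freely.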

11 Other Key Concepts
- Cluster = collection of brokers
- Broker-id = a unique id (integer) assigned to each broker
- Controller = functionality within each broker responsible for leader assignment and management, with one being the active controller
- Replica = partition copy, represented (identified) by the broker-id
- Assigned replicas = set of all replicas (broker-ids) for a partition
- ISR = In-Sync Replicas = subset of assigned replicas (brokers) that are “in-sync/caught-up”* with the leader (ISR always includes the leader)

12 Data Loss

13 Data Loss: Inevitable
Up to 0.01% data loss
For 700 billion messages / day, that's up to 70 million / day

14 Data Loss at the Producer
Kafka Producer API
API call tree:
  kafkaProducer.send()
  ….
  accumulator.append() // buffer
  ….
  sender.send() // network I/O
- Messages accumulate in buffer in batches
- Batched by partition, retry at batch level
- Expired batches dropped after retries
- Error count and other metrics via JMX
Data Loss at Producer
- Failure to close / flush producer on termination
- Dropped batches due to communication or other errors when acks = 0 or retry exhaustion
- Data produced faster than delivery, causing BufferExhaustedException (deprecated in 0.10+)
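
The producer-side loss modes above can be mimicked with a small model. This is a sketch under stated assumptions, not the Kafka client: `ToyProducer`, `BufferFull`, and `failing_first` are hypothetical names, and the "network" is a function that returns True on ack. It shows a batch being dropped once retries are exhausted and a full buffer rejecting new messages.

```python
class BufferFull(Exception):
    """Raised when messages are produced faster than they are delivered."""


class ToyProducer:
    def __init__(self, send_fn, retries, max_buffered):
        self.send_fn = send_fn          # returns True on ack, False on failure
        self.retries = retries
        self.max_buffered = max_buffered
        self.buffer = []                # messages waiting to be sent as one batch
        self.dropped = 0                # error count, as a metric would expose

    def send(self, msg):
        if len(self.buffer) >= self.max_buffered:
            raise BufferFull("buffer full")
        self.buffer.append(msg)

    def flush(self):
        batch, self.buffer = self.buffer, []
        if not batch:
            return True
        for _attempt in range(1 + self.retries):  # first try plus retries
            if self.send_fn(batch):
                return True
        self.dropped += len(batch)      # retries exhausted: the batch is lost
        return False


def failing_first(n, sink):
    """Build a send function whose first n calls fail; later calls deliver."""
    calls = {"count": 0}

    def send_fn(batch):
        calls["count"] += 1
        if calls["count"] <= n:
            return False
        sink.extend(batch)
        return True

    return send_fn


lost_sink = []
unlucky = ToyProducer(failing_first(2, lost_sink), retries=1, max_buffered=10)
unlucky.send("a")
unlucky.send("b")
unlucky_ok = unlucky.flush()            # two attempts, both fail

ok_sink = []
patient = ToyProducer(failing_first(2, ok_sink), retries=2, max_buffered=10)
patient.send("a")
patient.send("b")
patient_ok = patient.flush()            # third attempt succeeds
```

A process that exits without calling `flush()` loses whatever is still in `buffer`, which is the "failure to close / flush" case from the slide.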

15 Data Loss at the Cluster (by Brokers)
[Flowchart: Broker crashes, detected by controller via ZooKeeper. Decision nodes: (1) Was it a leader? (2) Was it in ISR? (3) ISR >= min.insync.replicas? (4) Other replicas in ISR? (5) Allow unclean election? (6) Other replicas available? Outcomes: Elect another leader; Relax, everything will be fine; (7) Partition unavailable!!]

16 Non-leader broker crash
[Broker-crash flowchart from slide 15, repeated for this scenario.]

17 Leader broker crash: Scenario 1
[Broker-crash flowchart from slide 15, repeated for this scenario.]

18 Leader broker crash: Scenario 2
[Broker-crash flowchart from slide 15, repeated for this scenario.]

19 Data Loss at the Cluster (by Brokers)
[Broker-crash flowchart from slide 15, with an added note: potential data loss depending upon acks config at producer. See KAFKA-4215.]
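
The broker-crash flowchart can be read as a decision function. The transcript flattened the chart, so the edges between nodes below are an interpretation rather than the author's exact diagram, and the function and parameter names are hypothetical. The "potential data loss" label on the unclean-election path reflects the commonly documented risk of electing an out-of-sync replica; where the slide's own note attaches is not recoverable from the transcript.

```python
def on_broker_crash(was_leader, was_in_isr, remaining_isr, min_insync_replicas,
                    unclean_election_allowed, other_replicas_available):
    """Outcome for one partition after one of its brokers crashes.

    remaining_isr counts the in-sync replicas left after the crash.
    The branch structure is an assumed reading of the flowchart.
    """
    if not was_leader:
        if not was_in_isr:
            return "everything will be fine"       # nodes 1 -> 2 -> fine
        if remaining_isr >= min_insync_replicas:
            return "everything will be fine"       # node 3, enough replicas left
        return "partition unavailable"             # node 7
    if remaining_isr > 0:
        return "elect another leader"              # node 4, clean election
    if unclean_election_allowed and other_replicas_available:
        # nodes 5 and 6: an out-of-sync replica takes over
        return "elect another leader (potential data loss)"
    return "partition unavailable"                 # node 7


follower_down = on_broker_crash(False, True, 2, 2, False, True)
leader_down_clean = on_broker_crash(True, True, 1, 2, False, True)
leader_down_unclean = on_broker_crash(True, True, 0, 2, True, True)
leader_down_stuck = on_broker_crash(True, True, 0, 2, False, True)
```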

20 FROM KAFKA-3919

21 FROM KAFKA-4215

22 Config for Data Durability and Consistency
Producer config
- acks = -1 (or all)
- max.block.ms (blocking on buffer full, default = 60000) and retries
- request.timeout.ms (default = 30000): triggers retries
- timeout.ms (default = 30000): inter-broker timeout for acks
Topic config
- min.insync.replicas = 2 (or higher)
Broker config
- unclean.leader.election.enable = false
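
Written out as plain dictionaries, the durability settings above look as follows. This is a sketch: the property names come from the slide, while the `retries` value and the replication factor of 3 are assumed for illustration. The helper shows the arithmetic behind min.insync.replicas: how many in-sync replicas can drop out before acks=all writes stop being accepted.

```python
producer_config = {
    "acks": "all",                 # same as -1
    "retries": 5,                  # assumed value; the slide only names the setting
    "max.block.ms": 60000,
    "request.timeout.ms": 30000,
}
topic_config = {"min.insync.replicas": 2}
broker_config = {"unclean.leader.election.enable": False}


def tolerated_isr_losses(replication_factor, min_insync_replicas):
    """In-sync replicas that may drop out while acks=all writes still succeed."""
    return replication_factor - min_insync_replicas


# Assumed replication factor of 3 for the example topic.
headroom = tolerated_isr_losses(3, topic_config["min.insync.replicas"])
```

With a replication factor of 3 and min.insync.replicas = 2, one replica can fall out of the ISR without blocking producers; raising min.insync.replicas to 3 would leave no headroom.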

23 Config for Availability and Throughput
Producer config
- acks = 0 (or 1)
- buffer.memory, batch.size, linger.ms (default = 0)
- request.timeout.ms, max.block.ms (default = 60000), retries
- max.in.flight.requests.per.connection
Topic config
- min.insync.replicas = 1 (default)
Broker config
- unclean.leader.election.enable = true
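
To make the trade-off concrete, a small checker can flag which of these settings give up durability. This is an illustrative sketch: `loss_risks` is a hypothetical helper, and treating a missing key as the safe value is a simplification of the sketch, not a statement about Kafka defaults.

```python
def loss_risks(producer, topic, broker):
    """Return the setting names that favor availability over durability."""
    risks = []
    if str(producer.get("acks", "all")) in ("0", "1"):
        risks.append("acks")
    if int(topic.get("min.insync.replicas", 2)) < 2:
        risks.append("min.insync.replicas")
    if broker.get("unclean.leader.election.enable", False):
        risks.append("unclean.leader.election.enable")
    return risks


throughput_profile = loss_risks(
    {"acks": 1},
    {"min.insync.replicas": 1},
    {"unclean.leader.election.enable": True},
)
durable_profile = loss_risks(
    {"acks": "all"},
    {"min.insync.replicas": 2},
    {"unclean.leader.election.enable": False},
)
```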

24 Data Duplication

25 Data Duplication: How It Occurs
[Diagram: producer client -> Kafka broker (topic app_events) -> consumer clients A and B]
- Producer: the producer API retries, so messages are resent after a timeout when retries > 0
- Consumer: messages are consumed more than once after a restart from an unclean shutdown / crash
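
The producer half of this can be reproduced with a toy broker that stores the message but loses the acknowledgement. It is a sketch with hypothetical names (`ToyBroker`, `send_with_retries`), not the Kafka protocol: the producer cannot tell a lost ack from a lost message, so it resends and the log ends up with two copies.

```python
class ToyBroker:
    def __init__(self, lose_ack_on_attempts):
        self.log = []
        self.attempt = 0
        self.lose_ack_on_attempts = set(lose_ack_on_attempts)

    def produce(self, msg):
        self.attempt += 1
        self.log.append(msg)                                   # the write succeeds
        return self.attempt not in self.lose_ack_on_attempts   # the ack may not arrive


def send_with_retries(broker, msg, retries):
    for _attempt in range(1 + retries):
        if broker.produce(msg):
            return True
    return False


retrying_broker = ToyBroker(lose_ack_on_attempts=[1])
acked_with_retry = send_with_retries(retrying_broker, "order-1", retries=1)

no_retry_broker = ToyBroker(lose_ack_on_attempts=[1])
acked_without_retry = send_with_retries(no_retry_broker, "order-1", retries=0)
```

With retries enabled the message is acknowledged but stored twice; with retries = 0 it is stored once but reported as failed, which is the loss-versus-duplication trade-off in miniature.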

26 Data Loss & Duplication Detection

27 How to Detect Data Loss & Duplication - 1
1) Msg from producer to Kafka
2) Ack from Kafka with details
3) Producer inserts into store
4) Consumer reads msg
5) Consumer validates msg
   - If exists: not duplicate, consume msg, delete msg
   - If missing: duplicate msg
Audit: remaining msgs in store are "lost" or "unconsumed" msgs
Store: Memcache / HBase / Cassandra / other store
KEY | VALUE
Topic, Partition, Offset | Msg Key or Hash
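
The five steps above can be sketched with a dictionary standing in for Memcache / HBase / Cassandra. The class and method names are hypothetical, and the SHA-1 digest is just one possible choice for the "hash" value.

```python
import hashlib


class AuditStore:
    """Key: (topic, partition, offset). Value: message hash."""

    def __init__(self):
        self.pending = {}

    def record_produced(self, topic, partition, offset, payload):
        # Step 3: the producer inserts an entry once Kafka has acked the message.
        digest = hashlib.sha1(payload).hexdigest()
        self.pending[(topic, partition, offset)] = digest

    def check_consumed(self, topic, partition, offset):
        # Step 5: an existing entry means first delivery; a missing one, a duplicate.
        key = (topic, partition, offset)
        if key in self.pending:
            del self.pending[key]
            return "ok"
        return "duplicate"

    def unconsumed(self):
        # Audit: whatever remains was lost or has not been consumed yet.
        return sorted(self.pending)


store = AuditStore()
for offset, payload in enumerate([b"a", b"b", b"c"]):
    store.record_produced("app_events", 0, offset, payload)

results = [
    store.check_consumed("app_events", 0, 0),
    store.check_consumed("app_events", 0, 0),   # redelivered after a crash
    store.check_consumed("app_events", 0, 1),
]
leftover = store.unconsumed()
```

One caveat of this scheme as sketched: if a consumer reads a message before the producer has completed step 3, the lookup misses and the message is misreported as a duplicate.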

28 How to Detect Data Loss & Duplication - 2
1) Msg from producer to Kafka
2) Ack from Kafka with details
3) Producer maintains window stats
4) Consumer reads msg
5) Consumer validates window stats at end of interval
Store: Memcache / HBase / Cassandra / other store
KEY | VALUE
Source, time-window | Msg count or some other checksum (e.g. totals, etc)
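
The window-stats variant can be sketched with counters keyed by (source, time-window). The 60-second window and the function names are assumptions made for the example. A negative difference for a window suggests loss, a positive one suggests duplication.

```python
from collections import Counter


def window_of(timestamp_s, window_s=60):
    """Start of the window that a timestamp (in seconds) falls into."""
    return timestamp_s - (timestamp_s % window_s)


def count_by_window(events, window_s=60):
    """events: iterable of (source, timestamp_s) pairs."""
    counts = Counter()
    for source, ts in events:
        counts[(source, window_of(ts, window_s))] += 1
    return counts


def compare(produced, consumed):
    """Return {key: consumed - produced} for every window whose counts differ."""
    report = {}
    for key in set(produced) | set(consumed):
        diff = consumed.get(key, 0) - produced.get(key, 0)
        if diff:
            report[key] = diff
    return report


produced = count_by_window([("app1", 0), ("app1", 10), ("app1", 70)])
consumed = count_by_window([("app1", 0), ("app1", 70), ("app1", 70)])
report = compare(produced, consumed)
```

Counts are cheap but coarse: one lost and one duplicated message in the same window cancel out, which is why the slide also mentions other checksums such as totals.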

29 Data Duplication: How to Minimize at Consumer
[Diagram: producer client -> Kafka broker (topic app_events) -> consumer clients A and B]
If possible, look up the last processed offset in the destination at startup
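
A minimal sketch of that idea, assuming the destination stores each record's partition and offset alongside its value. The row layout and function names are hypothetical, and starting from offset 0 when nothing is stored is a choice made for the example.

```python
def resume_offset(destination_rows, partition):
    """Next offset to read: one past the highest offset already in the destination."""
    stored = [row["offset"] for row in destination_rows
              if row["partition"] == partition]
    return max(stored) + 1 if stored else 0


def consume(partition_log, destination_rows, partition):
    """Copy unprocessed messages from a partition (a list) into the destination."""
    start = resume_offset(destination_rows, partition)
    for offset in range(start, len(partition_log)):
        destination_rows.append(
            {"partition": partition, "offset": offset, "value": partition_log[offset]}
        )


partition_log = ["a", "b", "c", "d"]
destination = [
    {"partition": 0, "offset": 0, "value": "a"},
    {"partition": 0, "offset": 1, "value": "b"},
]
consume(partition_log, destination, partition=0)   # restart after a crash
consume(partition_log, destination, partition=0)   # a second restart adds nothing
stored_offsets = [row["offset"] for row in destination]
```

With the Java consumer, the equivalent step would be to seek to the looked-up offset (for example with `KafkaConsumer.seek`) before polling, instead of relying only on the committed offset.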

30 Monitoring

31 Monitoring and Operations: JMX Metrics
[Screenshots: producer JMX metrics; consumer JMX metrics]

32 Questions?

33 Jayesh Thakrar

