Connecting Apache Flink® to the World: Reviewing the streaming connectors
Robert Metzger, Aljoscha Krettek

What to expect from this talk

- Overview of all available connectors
- Kafka connector internals
- End-to-end exactly-once
- Apache Bahir and the future of connectors
- [Bonus] Message Queues and the Message Acknowledging Source

Connectors in Apache Flink®: "Hello World, let's connect"

Connectors in Flink 1.1

Connector          | Source | Sink | Notes
-------------------|--------|------|--------------------------------------
Streaming files    | yes    | yes  | Both source and sink are exactly-once
Apache Kafka       | yes    | yes  | Consumers (sources) exactly-once
Amazon Kinesis     | yes    | yes  | Consumers (sources) exactly-once
RabbitMQ / AMQP    | yes    | yes  | Consumers (sources) exactly-once
Elasticsearch      | no     | yes  | No guarantees
Apache Cassandra   | no     | yes  | Exactly-once with idempotent updates
Apache Nifi        | yes    | yes  | No guarantees
Redis              | no     | yes  | No guarantees

There is also a Twitter source and an ActiveMQ connector in Apache Bahir.
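
The "idempotent updates" entry for Cassandra deserves a concrete illustration. Below is a minimal sketch, assuming a local Cassandra cluster and a hypothetical example.wordcount table keyed by word: because the INSERT is keyed by the primary key, replaying records after a failure overwrites rows instead of duplicating them.

```java
import com.datastax.driver.core.Cluster;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.cassandra.CassandraSink;
import org.apache.flink.streaming.connectors.cassandra.ClusterBuilder;

public class CassandraIdempotentSinkExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Toy input; in practice this would come from an upstream connector.
        DataStream<Tuple2<String, Long>> counts =
                env.fromElements(Tuple2.of("flink", 1L), Tuple2.of("kafka", 2L));

        CassandraSink.addSink(counts)
                // Keyed INSERT: replaying the same tuple after a failure
                // overwrites the row instead of duplicating it.
                .setQuery("INSERT INTO example.wordcount (word, count) VALUES (?, ?);")
                .setClusterBuilder(new ClusterBuilder() {
                    @Override
                    protected Cluster buildCluster(Cluster.Builder builder) {
                        return builder.addContactPoint("127.0.0.1").build();
                    }
                })
                .build();

        env.execute("Idempotent Cassandra sink");
    }
}
```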

Streaming connectors by activity

Streaming connectors ordered by number of threads/mentions on the mailing list:
- Apache Kafka (250+) (since 0.7)
- Apache Cassandra (38) (since 1.1)
- ElasticSearch (34) (since 0.10)
- File sources (~30) (since 0.10)
- Redis (27) (since 1.0)
- RabbitMQ (11) (since 0.7)
- Kinesis (10) (since 1.1)
- Apache Nifi (5) (since 0.10)

The Apache Kafka Connector

Apache Kafka connector: Intro

"Apache Kafka is publish-subscribe messaging rethought as a distributed commit log." (Tagline quoted from the Apache Kafka project website.)

Apache Kafka connector: Consumer

- Flink has two main Kafka consumer implementations:
  - For Kafka 0.8, an implementation against the "SimpleConsumer" API of Kafka
  - For Kafka 0.9+, we are using the new Kafka consumer (KAFKA-1326)
- The producers are basically the same
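
A hedged sketch of constructing the 0.9+ consumer (the topic name, server address, and group id are placeholders; the 0.8 variant, FlinkKafkaConsumer08, is constructed the same way but additionally needs zookeeper.connect):

```java
import java.util.Properties;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaConsumerExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        // For the 0.8 consumer (FlinkKafkaConsumer08) you would also set:
        // props.setProperty("zookeeper.connect", "localhost:2181");
        props.setProperty("group.id", "my-flink-group");

        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), props));

        stream.print();
        env.execute("Kafka consumer example");
    }
}
```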

Kafka 0.8 Consumer

[Diagram: a Kafka cluster with four brokers hosting partitions of topicA and topicB, read by a Flink TaskManager.] Each TaskManager has one Consumer Thread, coordinating Fetcher Threads for each Kafka broker.

Kafka 0.8 Broker rebalance

[Diagram, step 1: a broker fails; step 2: its Fetcher Thread returns the affected partitions to the Consumer Thread.] The consumer is able to handle broker failures.

Kafka 0.8 Broker rebalance

[Diagram: the Consumer Thread now holds the returned partitions while Kafka rebalances.] On a failure, the Consumer Thread re-assigns partitions and spawns new threads as needed.

Kafka 0.8 Broker rebalance

[Diagram, step 3: Kafka reassigns the partitions to the remaining brokers; step 4: Flink assigns them to existing or new Fetcher Threads.] On a failure, the Consumer Thread re-assigns partitions and spawns new threads as needed.

Kafka 0.9+ Consumer

[Diagram: the four brokers are now read by a single Consumer Thread per TaskManager using the new Kafka consumer.] Since Kafka 0.9, the new Consumer API handles broker failures/rebalancing, offset committing, topic querying, …

Exactly-once for Kafka consumers

- The mechanism is the same for all connector versions
- Offsets are committed to ZooKeeper / the broker for group.id-based restarts and for external tools (at-least-once)
- Offsets are checkpointed with Flink state for exactly-once
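
On the consumer side, nothing special is required beyond enabling Flink's checkpointing; a minimal sketch (the 5-second interval is an arbitrary example):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EnableCheckpointing {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 5 seconds: on each checkpoint the Kafka consumer
        // snapshots its partition offsets as Flink state (exactly-once), and
        // commits them to ZooKeeper / the broker once the checkpoint completes
        // (visible to external tools, at-least-once on its own).
        env.enableCheckpointing(5000);

        // ... build the Kafka-sourced pipeline here, then:
        // env.execute();
    }
}
```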

[Diagram: two Kafka partitions each containing "a" through "e"; a Flink Kafka consumer, a map operator with a counter, ZooKeeper/broker offsets, and the Flink checkpoint coordinator. State: offsets = 0, 0; counter = 0.] This toy example is reading from a Kafka topic with two partitions, each containing "a", "b", "c", … as messages. The offset is set to 0 for both partitions, and a counter is initialized to 0.

[Diagram state: offsets = 1, 0; counter = 0; message "a" in flight.] The Kafka consumer starts reading messages from partition 0. Message "a" is in flight, and the offset for the first consumer has been set to 1.

[Diagram state: offsets = 2, 1; counter = 1; "b" and "a" in flight; the coordinator triggers a checkpoint at the source.] Message "a" arrives at the counter, which is set to 1. The consumers both read their next records ("b" and "a"), and the offsets are set accordingly. In parallel, the checkpoint coordinator decides to trigger a checkpoint at the source …

[Diagram state: offsets = 3, 1; counter = 2; pending checkpoint: offsets = 2, 1.] The source has created a snapshot of its state ("offsets = 2, 1"), which is now stored in the checkpoint coordinator. The sources emitted a checkpoint barrier after messages "a" and "b".

[Diagram state: offsets = 3, 2; pending checkpoint: offsets = 2, 1, counter = 3.] The map operator has received checkpoint barriers from both sources. It checkpoints its state (counter = 3) in the coordinator. At the same time, the consumers keep reading more data from the Kafka partitions.

[Diagram state: offsets = 3, 2; counter = 4; completed checkpoint: offsets = 2, 1, counter = 3; "notify checkpoint complete" sent to the source.] The checkpoint coordinator informs the Kafka consumer that the checkpoint has been completed, and the consumer commits the checkpointed offsets to ZooKeeper. Note that Flink does not rely on the Kafka offsets in ZooKeeper for restoring from failures.

[Diagram state: ZooKeeper/broker now stores offset partition 0: 2 and offset partition 1: 1.] The checkpointed offsets are now persisted in ZooKeeper. External tools such as the Kafka Offset Checker can see the lag of the consumer group.

[Diagram state: offsets = 4, 2; counter = 5.] Processing advances further.

[Diagram state: unchanged; a failure strikes the pipeline.] A failure happens (such as a worker failure).

[Diagram state: all operators reset to the completed checkpoint: offsets = 2, 1, counter = 3.] The checkpoint coordinator restores the state of all operators participating in the checkpointing. The Kafka sources start again from offsets 2 and 1, and the counter's value is 3.

[Diagram state: offsets = 3, 1; message "c" in flight again.] The system continues with the processing, and the counter's value is consistent across the worker failure.
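
At the API level, a Flink 1.1-era source takes part in this walkthrough by implementing the Checkpointed interface. Below is a simplified, hypothetical sketch of that pattern; the real Kafka consumer is considerably more involved, and fetchNextFromPartition is a stand-in, not a real API.

```java
import org.apache.flink.streaming.api.checkpoint.Checkpointed;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

// Hypothetical, simplified source: tracks one offset per partition and lets
// Flink snapshot/restore it, mirroring what the Kafka consumer does.
public class OffsetTrackingSource extends RichSourceFunction<String>
        implements Checkpointed<long[]> {

    private volatile boolean running = true;
    private long[] offsets = new long[]{0L, 0L}; // one offset per partition

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            // Emitting a record and advancing the offset happen under the
            // checkpoint lock, so snapshots see a consistent offset state.
            synchronized (ctx.getCheckpointLock()) {
                String record = fetchNextFromPartition(0); // hypothetical fetch
                ctx.collect(record);
                offsets[0]++;
            }
        }
    }

    @Override
    public long[] snapshotState(long checkpointId, long checkpointTimestamp) {
        return offsets.clone(); // becomes part of the Flink checkpoint
    }

    @Override
    public void restoreState(long[] state) {
        offsets = state; // on recovery, resume reading from these offsets
    }

    @Override
    public void cancel() {
        running = false;
    }

    private String fetchNextFromPartition(int partition) {
        return "record-" + partition + "-" + offsets[partition]; // stand-in
    }
}
```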

End-to-end exactly-once

Consistently move and process data

Sources, exactly-once:
- Apache Kafka
- Kinesis
- RabbitMQ / ActiveMQ
- File monitoring

Sinks, exactly-once:
- Rolling file sink

Sinks, exactly-once with idempotent updates:
- Apache Cassandra
- Elasticsearch
- Redis

Sinks, at-least-once (duplicates):
- Apache Kafka

Flink allows moving data between systems while keeping consistency.

Continuous File Monitoring

[Diagram: a monitoring task periodically queries a file system and hands (file path, offset) splits to parallel file readers, which emit the records.]
- The monitoring task checkpoints the last "modification time"
- The file readers checkpoint the current file + offset and the list of pending files to read
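
A hedged sketch of wiring up continuous monitoring (the directory path and the 10-second scan interval are placeholders; the exact readFile signature varies slightly across Flink versions, some of which take an additional FilePathFilter argument):

```java
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class ContinuousFileMonitoringExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(5000);

        String path = "hdfs:///logs/incoming"; // placeholder directory

        // A single monitoring task scans the directory every 10 seconds and
        // hands new/modified files to parallel readers as (path, offset) splits.
        DataStream<String> lines = env.readFile(
                new TextInputFormat(new Path(path)),
                path,
                FileProcessingMode.PROCESS_CONTINUOUSLY,
                10_000L);

        lines.print();
        env.execute("Continuous file monitoring");
    }
}
```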

Rolling / Bucketing File Sink

- System time bucketing
- Bucketing based on record data

[Diagram: bucketing operators writing incoming records into time buckets such as 8:00, 9:00, 10:00, and 11:00.]

Bucketing File Sink exactly-once

- On Hadoop 2.7+, we call truncate() to remove invalid data on restore
- On earlier versions, we write a metadata file with the valid offsets
- Downstream consumers must take the valid-offset metadata into account
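
A sketch of the Flink 1.1-era RollingSink with system-time bucketing (the output path is a placeholder; the pending-to-committed file lifecycle is driven by checkpoints, with the truncate()/metadata handling above applied on restore):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.DateTimeBucketer;
import org.apache.flink.streaming.connectors.fs.RollingSink;

public class RollingSinkExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpointing is what drives the pending -> committed file lifecycle.
        env.enableCheckpointing(5000);

        DataStream<String> events = env.fromElements("event-1", "event-2");

        RollingSink<String> sink = new RollingSink<>("hdfs:///results"); // placeholder path
        // System-time bucketing: one directory per hour.
        sink.setBucketer(new DateTimeBucketer("yyyy-MM-dd--HH"));

        events.addSink(sink);
        env.execute("Rolling file sink");
    }
}
```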

Kafka Producer: Avoid data loss

- Apache Kafka currently does not provide the infrastructure to produce in an exactly-once fashion
- By avoiding data loss, we can guarantee at-least-once

[Diagram: the Flink Kafka producer has unacknowledged = 7 records in flight to a Kafka broker partition.] On checkpoint, Flink calls flush() and waits until unacknowledged == 0, guaranteeing that the data has been written.
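
A hedged sketch of configuring the producer for at-least-once with the Flink 1.1-era API (broker address and topic are placeholders; setFlushOnCheckpoint is the switch that makes checkpoints wait for unacknowledged == 0):

```java
import java.util.Properties;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class AtLeastOnceKafkaProducer {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(5000);

        DataStream<String> results = env.fromElements("r1", "r2");

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        // Ask the broker to acknowledge only fully replicated writes.
        props.setProperty("acks", "all");

        FlinkKafkaProducer09<String> producer =
                new FlinkKafkaProducer09<>("output-topic", new SimpleStringSchema(), props);
        // Fail the job on write errors instead of just logging them ...
        producer.setLogFailuresOnly(false);
        // ... and block each checkpoint until all in-flight records are ACKed.
        producer.setFlushOnCheckpoint(true);

        results.addSink(producer);
        env.execute("At-least-once Kafka producer");
    }
}
```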

Apache Bahir and the future of connectors: What's next

Future of Connectors in Flink

- Kafka 0.10 support, with timestamps
- Dynamic scaling support for Kafka and other connectors
- Refactoring of the Kafka connector API

Apache Bahir™

- Bahir is a community specialized in connectors, allowing faster releases independent of engine releases.
- Apache Bahir™ was created to give community-contributed connectors a platform, following Apache governance.
- The Flink community decided to move some of our connectors there. Kafka, Kinesis, streaming files, … will stay in Flink!
- Flink connectors in Bahir: ActiveMQ, Redis, Flume sink, RethinkDB (incoming), streaming HBase (incoming).
- New connector contributions are welcome!

Disclaimer: The description of the Bahir community is my personal view. I am not a representative of the project.

Time for questions…

Connectors in Apache Flink

- Ask me now!
- Follow me on
- Ask the Flink community on
- Ask me privately on

Exactly-once for Message Queues

Message Queues supported by Flink

- Traditional message queues have different semantics than Kafka, Kinesis, etc.
- RabbitMQ: Advanced Message Queuing Protocol (AMQP), available in Apache Flink
- ActiveMQ: Java Message Service (JMS), available in Apache Bahir (no release yet)
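
A hedged sketch of the RabbitMQ source (host, credentials, and queue name are placeholders). Exactly-once additionally requires checkpointing and a non-parallel source, and the correlation IDs enable the de-duplication described on the following slides:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.rabbitmq.RMQSource;
import org.apache.flink.streaming.connectors.rabbitmq.common.RMQConnectionConfig;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class RabbitMQSourceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpointing must be enabled: messages are ACKed to RabbitMQ only
        // when the checkpoint that contains them completes.
        env.enableCheckpointing(5000);

        RMQConnectionConfig connectionConfig = new RMQConnectionConfig.Builder()
                .setHost("localhost")
                .setPort(5672)
                .setUserName("guest")
                .setPassword("guest")
                .setVirtualHost("/")
                .build();

        DataStream<String> stream = env.addSource(new RMQSource<>(
                connectionConfig,
                "my-queue",          // placeholder queue name
                true,                // use correlation IDs for de-duplication
                new SimpleStringSchema()))
            .setParallelism(1);      // exactly-once requires a non-parallel source

        stream.print();
        env.execute("RabbitMQ source");
    }
}
```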

Message Queue Semantics

[Diagram: the Flink Kafka consumer tracks an offset into the retained log, while the Flink RabbitMQ source has no such offset.]
- In MQs, messages are removed once they are consumed
- Replay is not possible

Message Acknowledging

Once a checkpoint has been completed by all operators, the messages in the queue are acknowledged, leading to their removal from the queue.

[Diagram: the RabbitMQ source buffers message IDs per checkpoint (checkpoint 1: id=1, 2, 3; checkpoint 2: id=4, 5, 6; both initially unconfirmed). When checkpoint 1 completes, the source sends ACKs for id=1, 2, 3 to the message queue and marks checkpoint 1 as confirmed.]

Message Acknowledging

In case of a failure, all the unacknowledged messages are consumed again.

[Diagram: the system fails while both checkpoints are still unconfirmed; after recovery, the queue still contains id=1 through id=8.] Messages are not lost and are sent again after recovery.

Message Acknowledging

What happens if the system fails after a checkpoint is completed, but before all messages have been acknowledged?

[Diagram: checkpoint 1 is confirmed and the ACKs for id=1, 2, 3 are in flight when the failure hits, so id=3 may be delivered again.] Flink stores a correlation ID for each (un-acked) message to de-duplicate on restore.
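
A simplified, hypothetical sketch of this acknowledging pattern, in the spirit of Flink's message-acknowledging source base class: buffer correlation IDs per checkpoint, ACK them only when that checkpoint completes, and keep a set of processed IDs in state to drop redelivered duplicates. QueueClient and QueueMessage are stand-ins, not a real MQ client API:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.runtime.state.CheckpointListener;
import org.apache.flink.streaming.api.checkpoint.Checkpointed;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

public class AckingQueueSource extends RichSourceFunction<String>
        implements CheckpointListener, Checkpointed<HashSet<String>> {

    public static class QueueMessage {
        public String correlationId;
        public String body;
    }

    public interface QueueClient {
        QueueMessage receive();          // blocking receive
        void ack(List<String> ids);      // acknowledge -> queue removes messages
    }

    private transient QueueClient queue; // assume it is created in open()
    private volatile boolean running = true;

    // IDs read since the last checkpoint, plus per-checkpoint ID batches
    // waiting for their checkpoint to complete.
    private List<String> sinceLastCheckpoint = new ArrayList<>();
    private final ArrayDeque<Tuple2<Long, List<String>>> pendingAcks = new ArrayDeque<>();

    // Correlation IDs already emitted, used to drop duplicates when
    // un-ACKed messages are redelivered after a restore.
    private HashSet<String> processedIds = new HashSet<>();

    @Override
    public void run(SourceContext<String> ctx) {
        while (running) {
            QueueMessage msg = queue.receive();
            synchronized (ctx.getCheckpointLock()) {
                if (!processedIds.add(msg.correlationId)) {
                    continue; // duplicate redelivery after restore: skip it
                }
                sinceLastCheckpoint.add(msg.correlationId);
                ctx.collect(msg.body);
            }
        }
    }

    @Override
    public HashSet<String> snapshotState(long checkpointId, long timestamp) {
        // Tie the IDs read since the last checkpoint to this checkpoint ID.
        pendingAcks.add(Tuple2.of(checkpointId, sinceLastCheckpoint));
        sinceLastCheckpoint = new ArrayList<>();
        return processedIds; // the de-duplication set goes into the checkpoint
    }

    @Override
    public void restoreState(HashSet<String> state) {
        processedIds = state;
    }

    @Override
    public void notifyCheckpointComplete(long checkpointId) {
        // ACK every batch belonging to this or an earlier checkpoint; only
        // now does the queue remove those messages.
        while (!pendingAcks.isEmpty() && pendingAcks.peek().f0 <= checkpointId) {
            queue.ack(pendingAcks.poll().f1);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```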