Streaming Analytics with Apache Flink 1.0


Streaming Analytics with Apache Flink 1.0 Stephan Ewen @stephanewen

Apache Flink Stack: Libraries; DataStream API (Stream Processing); DataSet API (Batch Processing); Runtime (Distributed Streaming Data Flow). Streaming and batch as first class citizens.

Today: Libraries; DataStream API (Stream Processing); DataSet API (Batch Processing); Runtime (Distributed Streaming Data Flow). Streaming and batch as first class citizens.

Streaming is the next programming paradigm for data applications, and you need to start thinking in terms of streams.

Streaming technology is enabling the obvious: continuous processing on data that is continuously produced

Continuous Processing with Batch: continuous ingestion; periodic (e.g., hourly) files; periodic batch jobs.

λ Architecture: "Batch layer": what we had before; "Stream layer": approximate early results.

A Stream Processing Pipeline: collect, store, analyze, serve.

A brief History of Flink
January '10: Stratosphere (Flink precursor)
April '14: Flink Project Incubation
December '14: Top Level Project
March '16: Release 1.0
Releases along the way: v0.5, v0.6, v0.7, v0.8, v0.9, v0.10
The academia gap: reading/writing papers, teaching, worrying about thesis.
Realizing this might be interesting to people beyond academia (even more so, actually).

Programs and Dataflows
    // Source
    val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer09(…))
    // Transformation
    val events: DataStream[Event] = lines.map((line) => parse(line))
    // Transformation (keyed on the parsed events, aggregated per window)
    val stats: DataStream[Statistic] = events
      .keyBy("sensor")
      .timeWindow(Time.seconds(5))
      .apply(new MyAggregationFunction())
    // Sink
    stats.addSink(new RollingSink(path))
Streaming Dataflow: Source [1] → map() [1] → keyBy()/window()/apply() [1] → Sink [1]; Source [2] → map() [2] → keyBy()/window()/apply() [2]

What makes Flink flink?
True Streaming: low latency; high throughput; well-behaved flow control (back pressure)
Event Time: make more sense of data; works on real-time and historic data
Stateful Streaming: windows & user-defined state; exactly-once semantics for fault tolerance; globally consistent savepoints
APIs & Libraries: flexible windows (time, count, session, roll-your-own); Complex Event Processing

Streaming Analytics by Example

Time-Windowed Aggregations
    case class Event(sensor: String, measure: Double)
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream: DataStream[Event] = env.addSource(…)
    stream
      .keyBy("sensor")
      .timeWindow(Time.seconds(5))
      .sum("measure")
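
What the 5-second window on this slide computes can be sketched without Flink. The following is a minimal plain-Java illustration, not Flink's implementation: the class name, the parallel-array input and the "sensor@windowStart" key format are assumptions made for brevity, and the timestamps are passed in explicitly rather than taken from a clock.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of a keyed tumbling-window sum (hypothetical helper, no Flink dependency). */
final class TumblingWindowSum {

    /** Start of the tumbling window that contains the timestamp (non-negative timestamps assumed). */
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    /** Sums measures per "sensor@windowStart"; the three arrays are read in parallel. */
    static Map<String, Double> sumPerWindow(String[] sensors, long[] timestampsMs,
                                            double[] measures, long sizeMs) {
        Map<String, Double> sums = new TreeMap<>();
        for (int i = 0; i < sensors.length; i++) {
            String key = sensors[i] + "@" + windowStart(timestampsMs[i], sizeMs);
            sums.merge(key, measures[i], Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        Map<String, Double> sums = sumPerWindow(
            new String[] {"a", "a", "b", "a"},
            new long[] {1000L, 4000L, 4500L, 6000L},
            new double[] {1.0, 2.0, 5.0, 4.0},
            5000L);
        System.out.println(sums);
    }
}
```

Each element falls into exactly one window per key, which is what distinguishes this tumbling form from the sliding form on the next slide.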

Time-Windowed Aggregations
    case class Event(sensor: String, measure: Double)
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream: DataStream[Event] = env.addSource(…)
    stream
      .keyBy("sensor")
      .timeWindow(Time.seconds(60), Time.seconds(5))
      .sum("measure")
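
With a window size of 60 seconds and a slide of 5 seconds, every element belongs to several overlapping windows. A minimal plain-Java sketch of that assignment follows; the class and method names are hypothetical, and it assumes windows start at multiples of the slide.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of sliding-window assignment (hypothetical helper, no Flink dependency). */
final class SlidingWindowAssigner {

    /** Returns the start of every window [start, start + size) that contains the timestamp. */
    static List<Long> windowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is not after the timestamp.
        long lastStart = timestampMs - (timestampMs % slideMs);
        // Walk backwards one slide at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // An element at t = 62 s with size 60 s and slide 5 s.
        System.out.println(windowStarts(62000L, 60000L, 5000L));
    }
}
```

The number of windows per element is size divided by slide, so the aggregation work grows with that ratio.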

Session-Windowed Aggregations (Flink 1.1 syntax)
    case class Event(sensor: String, measure: Double)
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream: DataStream[Event] = env.addSource(…)
    stream
      .keyBy("sensor")
      .window(EventTimeSessionWindows.withGap(Time.seconds(60)))
      .max("measure")
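
Session windows have no fixed size: a session ends when no event arrives for the configured gap. The plain-Java sketch below groups the timestamps of a single key into sessions. It is an illustration only; the class name is hypothetical, and treating a distance of exactly the gap as the start of a new session is an assumption, not a statement about Flink's boundary behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch of gap-based session grouping for one key (hypothetical helper). */
final class SessionWindows {

    /** Returns sessions as {firstTimestamp, lastTimestamp} pairs, in time order. */
    static List<long[]> sessions(long[] timestampsMs, long gapMs) {
        long[] sorted = timestampsMs.clone();
        Arrays.sort(sorted); // events may arrive out of order; sessions are defined on event time
        List<long[]> result = new ArrayList<>();
        for (long ts : sorted) {
            long[] current = result.isEmpty() ? null : result.get(result.size() - 1);
            if (current != null && ts - current[1] < gapMs) {
                current[1] = ts;                 // within the gap: extend the open session
            } else {
                result.add(new long[] {ts, ts}); // gap reached: open a new session
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (long[] s : sessions(new long[] {50000L, 0L, 30000L, 200000L, 210000L}, 60000L)) {
            System.out.println(s[0] + " .. " + s[1]);
        }
    }
}
```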

Pattern Detection
    case class Event(producer: String, evtType: Int, msg: String)
    // Alert carries the state in which the unexpected event was seen
    case class Alert(msg: String, state: Int)
    val stream: DataStream[Event] = env.addSource(…)
    stream
      .keyBy("producer")
      .flatMap(new RichFlatMapFunction[Event, Alert]() {
        // embedded key/value state store
        lazy val state: ValueState[Int] = getRuntimeContext.getState(…)
        def flatMap(event: Event, out: Collector[Alert]) = {
          val newState = state.value() match {
            case 0 if (event.evtType == 0) => 1
            case 1 if (event.evtType == 1) => 0
            case x => out.collect(Alert(event.msg, x)); 0
          }
          state.update(newState)
        }
      })
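
The state machine on this slide expects event types 0 and 1 to alternate per producer and raises an alert on anything else. A minimal plain-Java sketch of the same logic is shown below, with a HashMap standing in for the keyed state that Flink manages; the class name and the alert string format are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of the per-producer state machine (hypothetical helper, no Flink dependency). */
final class PatternDetector {
    // Stand-in for keyed state: one integer state per producer, defaulting to 0.
    private final Map<String, Integer> stateByProducer = new HashMap<>();
    private final List<String> alerts = new ArrayList<>();

    void onEvent(String producer, int evtType, String msg) {
        int state = stateByProducer.getOrDefault(producer, 0);
        int next;
        if (state == 0 && evtType == 0) {
            next = 1;                       // expected first event
        } else if (state == 1 && evtType == 1) {
            next = 0;                       // expected second event, pattern complete
        } else {
            alerts.add(producer + ": " + msg + " (state " + state + ")");
            next = 0;                       // unexpected event: alert and reset
        }
        stateByProducer.put(producer, next);
    }

    List<String> alerts() {
        return alerts;
    }

    public static void main(String[] args) {
        PatternDetector detector = new PatternDetector();
        detector.onEvent("p1", 0, "start");
        detector.onEvent("p1", 0, "start again"); // breaks the alternation
        System.out.println(detector.alerts());
    }
}
```

In the Flink version the map disappears: because the stream is keyed by producer, the ValueState is automatically scoped to the current key and is included in checkpoints.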

Many more: joining streams (e.g. combine readings from sensor); detecting patterns (CEP); applying (changing) rules or models to events; training and applying online machine learning models; …

(It's) About Time

The biggest change in moving from batch to streaming is handling time explicitly

Example: Windowing by Time
    case class Event(id: String, measure: Double, timestamp: Long)
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream: DataStream[Event] = env.addSource(…)
    stream
      .keyBy("id")
      .timeWindow(Time.seconds(15), Time.seconds(5))
      .sum("measure")

Different Notions of Time: Event Producer → Message Queue (partition 1, partition 2) → Flink Data Source → Flink Window Operator. Event Time is set at the event producer, Ingestion Time at the Flink data source, and Window Processing Time at the Flink window operator.

Event Time vs. Processing Time
Processing time: 1977 Episode IV; 1980 Episode V; 1983 Episode VI; 1999 Episode I; 2002 Episode II; 2005 Episode III; 2015 Episode VII
Event time: Episode I, Episode II, Episode III, Episode IV, Episode V, Episode VI, Episode VII

IoT / Mobile Applications: events occur on devices; events are stored in a log (queue / log); events are analyzed in a data streaming system (stream analysis).

Out of order Streams

Out of order Streams: first burst of events; second burst of events; out of order!

Out of order Streams: first burst of events; second burst of events; compared under instant event-at-a-time processing, arrival time windows, and event time windows.

Processing Time
    case class Event(id: String, measure: Double, timestamp: Long)
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // window by operator's processing time
    env.setStreamTimeCharacteristic(ProcessingTime)
    val stream: DataStream[Event] = env.addSource(…)
    stream
      .keyBy("id")
      .timeWindow(Time.seconds(15), Time.seconds(5))
      .sum("measure")

Ingestion Time
    case class Event(id: String, measure: Double, timestamp: Long)
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(IngestionTime)
    val stream: DataStream[Event] = env.addSource(…)
    stream
      .keyBy("id")
      .timeWindow(Time.seconds(15), Time.seconds(5))
      .sum("measure")

Event Time
    case class Event(id: String, measure: Double, timestamp: Long)
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(EventTime)
    val stream: DataStream[Event] = env.addSource(…)
    stream
      .keyBy("id")
      .timeWindow(Time.seconds(15), Time.seconds(5))
      .sum("measure")

Event Time
    case class Event(id: String, measure: Double, timestamp: Long)
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(EventTime)
    val stream: DataStream[Event] = env.addSource(…)
    // timestamps are read from the events and assumed to be ascending
    val tsStream = stream.assignAscendingTimestamps(_.timestamp)
    tsStream
      .keyBy("id")
      .timeWindow(Time.seconds(15), Time.seconds(5))
      .sum("measure")

Event Time
    case class Event(id: String, measure: Double, timestamp: Long)
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(EventTime)
    val stream: DataStream[Event] = env.addSource(…)
    // user-defined timestamp extraction and watermark generation
    val tsStream = stream.assignTimestampsAndWatermarks(
      new MyTimestampsAndWatermarkGenerator())
    tsStream
      .keyBy("id")
      .timeWindow(Time.seconds(15), Time.seconds(5))
      .sum("measure")
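
The slide does not show what MyTimestampsAndWatermarkGenerator does. One common strategy, assumed here purely for illustration, is to let the watermark trail the highest timestamp seen so far by a fixed bound on out-of-orderness. The plain-Java sketch below does not implement any Flink interface; its class and method names are hypothetical.

```java
/** Plain-Java sketch of a bounded-out-of-orderness watermark strategy (hypothetical helper). */
final class BoundedOutOfOrdernessWatermarks {
    private final long maxOutOfOrdernessMs;
    private long maxTimestampSeen = Long.MIN_VALUE;

    BoundedOutOfOrdernessWatermarks(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    /** Records the timestamp of an arriving event. */
    void onEvent(long timestampMs) {
        maxTimestampSeen = Math.max(maxTimestampSeen, timestampMs);
    }

    /** Watermark W(t): no further events with timestamp <= t are expected. */
    long currentWatermark() {
        if (maxTimestampSeen == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // nothing seen yet
        }
        return maxTimestampSeen - maxOutOfOrdernessMs;
    }

    /** An event is late if the watermark has already passed its timestamp. */
    boolean isLate(long timestampMs) {
        return timestampMs <= currentWatermark();
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessWatermarks wm = new BoundedOutOfOrdernessWatermarks(4L);
        for (long ts : new long[] {7L, 11L, 9L}) {
            wm.onEvent(ts);
        }
        System.out.println("watermark = " + wm.currentWatermark());
    }
}
```

A larger bound tolerates more disorder but delays window results by the same amount, which is the trade-off the following watermark slides illustrate.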

Watermarks (figure): two streams of events labeled with their event timestamps, with watermarks embedded in the stream. Stream (in order): timestamps 7, 9, 10, 11, 14, 15, 17, 18, 19, 20, 21, 23 with watermarks W(11) and W(20). Stream (out of order): timestamps 7, 11, 15, 9, 12, 14, 17, 22, 20, 19, 21 with watermarks W(11) and W(17).

Watermarks in Parallel (figure): Source (1) and Source (2) perform watermark generation and feed map (1), map (2) and then window (1), window (2). Events are shown as [id|timestamp] at the input streams (A|30, B|31, C|30, D|15, E|30, F|15, G|18, H|20, K|35, L|22, M|39, N|39, O|23, Q|44, R|37), with watermarks W(33) and W(17) in flight and the current event time shown at each operator (29, 17, 14, 33).
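
For an operator with several input streams, the usual rule is that its event time is the minimum of the latest watermarks received on each input, so a slow input holds the operator's clock back. A minimal plain-Java sketch of that rule follows; the class and method names are hypothetical and the code is not taken from Flink.

```java
import java.util.Arrays;

/** Plain-Java sketch: operator event time as the minimum watermark over all inputs. */
final class OperatorEventTime {
    private final long[] channelWatermarks;

    OperatorEventTime(int numInputs) {
        channelWatermarks = new long[numInputs];
        Arrays.fill(channelWatermarks, Long.MIN_VALUE); // no watermark received yet
    }

    /** Records a watermark from one input and returns the operator's event time. */
    long onWatermark(int channel, long watermark) {
        // Watermarks never move backwards on a channel.
        channelWatermarks[channel] = Math.max(channelWatermarks[channel], watermark);
        return currentEventTime();
    }

    long currentEventTime() {
        long min = Long.MAX_VALUE;
        for (long w : channelWatermarks) {
            min = Math.min(min, w);
        }
        return min;
    }

    public static void main(String[] args) {
        OperatorEventTime operator = new OperatorEventTime(2);
        operator.onWatermark(0, 33L);
        System.out.println(operator.onWatermark(1, 17L));
    }
}
```

With the two watermarks from the figure, W(33) on one input and W(17) on the other, the operator's event time is 17 until the slower input advances.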

Mixing Event Time and Processing Time
    case class Event(id: String, measure: Double, timestamp: Long)
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(EventTime)
    val stream: DataStream[Event] = env.addSource(…)
    val tsStream = stream.assignAscendingTimestamps(_.timestamp)
    tsStream
      .keyBy("id")
      .window(SlidingEventTimeWindows.of(seconds(15), seconds(5)))
      .trigger(new MyTrigger())
      .sum("measure")

Window Triggers: react to any combination of event time, processing time, and event data. Example of a mixed event time / processing time trigger: trigger when event time reaches the window end, OR when processing time reaches the window end plus 30 secs.

Trigger example
    public class EventTimeTrigger extends Trigger<Object, TimeWindow> {
        public TriggerResult onElement(Object evt, long time,
                                       TimeWindow window, TriggerContext ctx) {
            // fire when event time reaches the window end ...
            ctx.registerEventTimeTimer(window.maxTimestamp());
            // ... or when processing time reaches the window end plus 30 secs
            ctx.registerProcessingTimeTimer(window.maxTimestamp() + 30000);
            return TriggerResult.CONTINUE;
        }
        public TriggerResult onEventTime(long time, TimeWindow w, TriggerContext ctx) {
            return TriggerResult.FIRE_AND_PURGE;
        }
        public TriggerResult onProcessingTime(long time, TimeWindow w, TriggerContext c) {
            return TriggerResult.FIRE_AND_CONTINUE;
        }
    }
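
The decision this trigger encodes can be written as a single pure function, which makes the two firing conditions easy to test in isolation. The sketch below is plain Java and does not use Flink's Trigger API; the class name, the Result enum and the idea of passing the current watermark and wall-clock time as arguments are assumptions for illustration.

```java
/** Plain-Java sketch of the mixed event-time / processing-time firing rule (hypothetical helper). */
final class MixedTimeTrigger {
    static final long GRACE_MS = 30_000L;

    enum Result { CONTINUE, FIRE_AND_PURGE, FIRE_AND_CONTINUE }

    /**
     * @param windowMaxTimestampMs last timestamp that belongs to the window
     * @param watermarkMs          current event time at the operator
     * @param processingTimeMs     current wall-clock time at the operator
     */
    static Result evaluate(long windowMaxTimestampMs, long watermarkMs, long processingTimeMs) {
        if (watermarkMs >= windowMaxTimestampMs) {
            return Result.FIRE_AND_PURGE;     // event time reached the window end: final result
        }
        if (processingTimeMs >= windowMaxTimestampMs + GRACE_MS) {
            return Result.FIRE_AND_CONTINUE;  // event time is lagging: emit an early result
        }
        return Result.CONTINUE;               // keep collecting
    }

    public static void main(String[] args) {
        System.out.println(evaluate(14_999L, 10_000L, 50_000L));
    }
}
```

The early firing keeps the window contents, so the final event-time firing can still emit the complete result.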

Matters of State (Fault Tolerance, Reinstatements, etc)

Back to the Aggregation Example
    case class Event(id: String, measure: Double, timestamp: Long)
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream: DataStream[Event] = env.addSource(
      new FlinkKafkaConsumer09(topic, schema, properties))
    stream
      .keyBy("id")
      .timeWindow(Time.seconds(15), Time.seconds(5))  // stateful
      .sum("measure")

Fault Tolerance
Prevent data loss (reprocess lost in-flight events).
Recover state consistency (exactly-once semantics): pending windows & user-defined (key/value) state.
Checkpoint-based fault tolerance: periodically create checkpoints; recovery: resume from last completed checkpoint; Async. Barrier Snapshots (ABS) algorithm.

Checkpoints (figure): a data stream of events running from older records to newer records, with the state of the dataflow captured at point X and at point Y.

Checkpoint Barriers: markers, injected into the streams.
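
How an operator with several inputs can use these barriers is sketched below in plain Java. This is a simplified illustration of barrier alignment, not Flink's implementation: it assumes a single checkpoint in flight, models the operator state as the list of records processed so far, and uses hypothetical class and method names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Plain-Java sketch of barrier alignment at a multi-input operator (simplified, hypothetical). */
final class BarrierAligner {
    private final boolean[] barrierSeen;                      // per input: barrier already arrived?
    private final Deque<String> buffered = new ArrayDeque<>(); // records held back during alignment
    private final List<String> processed = new ArrayList<>();  // stands in for operator state
    private List<String> lastSnapshot = new ArrayList<>();
    private int barriersPending;

    BarrierAligner(int numInputs) {
        barrierSeen = new boolean[numInputs];
        barriersPending = numInputs;
    }

    void onRecord(int channel, String record) {
        if (barrierSeen[channel]) {
            buffered.add(record);   // belongs to the next checkpoint: hold it back
        } else {
            processed.add(record);  // belongs to the current checkpoint: process now
        }
    }

    void onBarrier(int channel) {
        if (barrierSeen[channel]) {
            return;                 // simplification: only one checkpoint in flight
        }
        barrierSeen[channel] = true;
        if (--barriersPending == 0) {
            lastSnapshot = new ArrayList<>(processed); // all barriers in: snapshot the state
            processed.addAll(buffered);                // then release the held-back records
            buffered.clear();
            Arrays.fill(barrierSeen, false);
            barriersPending = barrierSeen.length;
        }
    }

    List<String> lastSnapshot() { return lastSnapshot; }

    List<String> processed() { return processed; }

    public static void main(String[] args) {
        BarrierAligner op = new BarrierAligner(2);
        op.onRecord(0, "a1");
        op.onBarrier(0);
        op.onRecord(0, "a2");   // arrives after the barrier on input 0: buffered
        op.onRecord(1, "b1");
        op.onBarrier(1);
        System.out.println(op.lastSnapshot() + " then " + op.processed());
    }
}
```

The snapshot contains exactly the records that preceded the barrier on every input, which is what makes the checkpoint consistent across the dataflow.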

Checkpoint Procedure

Savepoints
A "Checkpoint" is a globally consistent point-in-time snapshot of the streaming program (point in stream, state).
A "Savepoint" is a user-triggered retained checkpoint.
Streaming programs can start from a savepoint (figure: Savepoint A, Savepoint B).

(Re)processing data (in batch)
Re-processing data (what-if exploration, to correct bugs, etc.) is usually done by running a batch job with a set of old files, using tools that map files to times.
Collection of files, by ingestion time, handed to the batch processor: 2016-3-1 12:00 am, 2016-3-1 1:00 am, 2016-3-1 2:00 am, …, 2016-3-11 10:00pm, 2016-3-11 11:00pm, 2016-3-12 12:00am, 2016-3-12 1:00am

Unclear Batch Boundaries
Files handed to the batch processor: 2016-3-1 12:00 am, 2016-3-1 1:00 am, 2016-3-1 2:00 am, …, 2016-3-11 10:00pm, 2016-3-11 11:00pm, 2016-3-12 12:00am, 2016-3-12 1:00am
What about sessions across batches?

(Re)processing data (streaming)
Draw savepoints at times that you will want to start new jobs from (daily, hourly, …).
Reprocess by starting a new job from a savepoint: it defines the start position in the stream (for example Kafka offsets) and initializes pending state (like partial sessions).
Run the new streaming program from the savepoint.

Continuous Data Sources
Stream of Kafka partitions: savepoint = Kafka offsets + operator state.
Stream view over sequence of files (2016-3-1 12:00 am, 2016-3-1 1:00 am, 2016-3-1 2:00 am, …, 2016-3-11 10:00pm, 2016-3-11 11:00pm, 2016-3-12 12:00am, 2016-3-12 1:00am): savepoint = file mod timestamp + file position + operator state. WIP (target: Flink 1.1).

Upgrading Programs
A program starting from a savepoint can differ from the program that created the savepoint.
Unique operator names match state and operator.
The mechanism can be used to fix bugs in programs, to evolve programs, parameters, libraries, …

State Backends
Large state is a collection of key/value pairs.
The state backend defines what data structure holds the state, plus how it is snapshotted.
Most common choices: main memory, snapshots to master; main memory, snapshots to dist. filesystem; RocksDB, snapshots to dist. filesystem.

Complex Event Processing Primer

Example: Temperature Monitoring: receiving temperature and power events from sensors; looking for temperatures repeatedly exceeding thresholds within a short time period (10 secs).
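
The slides that define the event types, patterns and alerts are image-only in this transcript, so the following plain-Java sketch only illustrates the stated goal under explicit assumptions: a single sensor, readings arriving in timestamp order, "repeatedly" meaning two readings above the threshold, and readings below the threshold in between being ignored. It does not use the Flink CEP library, and the class name is hypothetical.

```java
/** Plain-Java sketch: warn when two readings exceed a threshold within a time span (hypothetical). */
final class TemperatureMonitor {
    private static final long NONE = Long.MIN_VALUE;

    private final double threshold;
    private final long withinMs;
    private long lastHighTimestampMs = NONE; // timestamp of the previous reading above the threshold

    TemperatureMonitor(double threshold, long withinMs) {
        this.threshold = threshold;
        this.withinMs = withinMs;
    }

    /** Returns true if this reading and the previous high reading are at most withinMs apart. */
    boolean onReading(long timestampMs, double temperature) {
        if (temperature <= threshold) {
            return false; // normal reading: ignored, does not reset the pattern
        }
        boolean warn = lastHighTimestampMs != NONE
                && timestampMs - lastHighTimestampMs <= withinMs;
        lastHighTimestampMs = timestampMs;
        return warn;
    }

    public static void main(String[] args) {
        TemperatureMonitor monitor = new TemperatureMonitor(100.0, 10_000L);
        System.out.println(monitor.onReading(0L, 101.0));
        System.out.println(monitor.onReading(8_000L, 105.0));
    }
}
```

A pattern library expresses the same idea declaratively and handles per-key state, out-of-order events and timeouts, which this sketch leaves out.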

Event Types

Defining Patterns

Generating Alerts

An Outlook on Things to Come

Flink in the wild (company logos not preserved): 30 billion events daily; 2 billion events in 10 1Gb machines; data integration & distribution platform.

Roadmap
Dynamic Scaling, Resource Elasticity
Stream SQL
CEP enhancements
Incremental & asynchronous state snapshotting
Mesos support
More connectors, end-to-end exactly once
API enhancements (e.g., joins, slowly changing inputs)
Security (data encryption, Kerberos with Kafka)

I stream, do you?

Why does Flink stream flink?
True Streaming: low latency; high throughput; well-behaved flow control (back pressure)
Event Time: make more sense of data; works on real-time and historic data
Stateful Streaming: windows & user-defined state; exactly-once semantics for fault tolerance; globally consistent savepoints
APIs & Libraries: flexible windows (time, count, session, roll-your-own); Complex Event Processing

Addendum

Latency and Throughput

Low Latency and High Throughput
Frequently thought to be mutually exclusive:
Event-at-a-time → low latency, low throughput
Mini batch → high latency, high throughput
The above is not true! Very little latency has to be sacrificed for very high throughput.

The Effect of Buffering
The network stack does not always operate in event-at-a-time mode.
Optional buffering adds some milliseconds of latency but increases throughput.
No effect on application logic.

On a technical level: decouple all things.
Clocks: wall clock time (processing time); event time (watermarks & punctuations); consistency clock (logical checkpoint timestamps).
Buffering: windows (application logic); network (throughput tuning).
…

Decoupling clocks

Stream Alignment

On exactly-once guarantees: giant topic of confusion. Exactly what?

High Availability

High Availability Checkpoints (components: JobManager, Client, TaskManagers, Apache Zookeeper™)
1. Take snapshots
2. Persist snapshots
3. Send handles to JM
4. Create global checkpoint
5. Persist global checkpoint
6. Write handle to ZooKeeper

The Counting Pyramid of Needs