1
Running Apache Flink® Everywhere
Stephan Ewen
2
How is Flink deployed?
A two-minute search on the mailing list reveals:
Embedded Service (OSGi), Standalone Cluster, YARN Jobs, YARN Sessions, Docker/Kubernetes, Standalone Cloud, YARN->Myriad->Mesos, Docker on Mesos
3
How is Flink deployed?
A two-minute search on the mailing list reveals:
Embedded Service (OSGi), Standalone Cluster, YARN Jobs, YARN Sessions, Docker/Kubernetes, Standalone Cloud, Mesos Jobs, YARN->Myriad->Mesos, Docker on Mesos, Mesos Sessions (soon!)
4
How is Flink deployed?
Users mostly run isolated jobs or multi-job sessions.
Embedded Service (OSGi), Standalone Cluster, YARN Jobs, YARN Sessions, Docker/Kubernetes, Standalone Cloud, Mesos Jobs, YARN->Myriad->Mesos, Docker on Mesos, Mesos Sessions
5
Resource Management
Resources are controlled by the framework or by another service.
Embedded Service (OSGi), Standalone Cluster, YARN Jobs, YARN Sessions, Docker/Kubernetes, Standalone Cloud, Mesos Jobs, YARN->Myriad->Mesos, Docker on Mesos, Mesos Sessions
6
More dimensions coming up…
Dynamic Resources
  Number of TaskManagers changes over job lifetime
"Trusted" processes
  Run under superuser credential and dispatch jobs
Uniform vs. Heterogeneous Resources
  Run different functions in different size containers
  E.g., simple mapper in small container, heavy window operator in large container
No blocking on any process type
  YARN job needs to continue while ApplicationMaster is down
Avoiding "Job Submit" step
7
Reworking the Flink Process Model
8
Flink Improvement Proposal 6
Core Idea
  Creating composable building blocks
  Create different compositions for different scenarios
FLIP-6 design document:
Currently driving parties:
9
Recap: Current status (Standalone)
[Diagram: Standalone Flink Cluster]
Components: Client, JobManager, TaskManagers
Steps: (1) Register, (2) Submit Job, (3) Deploy Tasks
10
Recap: Current status (YARN)
[Diagram: YARN Cluster]
Components: Client, YARN ResourceManager, Application Master, JobManager, TaskManagers
Steps: (1) Submit YARN App. (FLINK), (2) Spawn AppMaster, (3) Poll status, (4) Start TaskManagers, (5) Register, (6) All TaskManagers started, (7) Submit Job, (8) Deploy Tasks
11
The Building Blocks
ResourceManager
  ClusterManager-specific
  May live across jobs
  Manages available Containers/TaskManagers
  Used to acquire / release resources
Dispatcher
  Lives across jobs
  Touch-point for job submissions
  Spawns JobManagers
  May spawn ResourceManager
JobManager
  Single job only, started per job
  Thinks in terms of "task slots"
  Deploys and monitors job/task execution
TaskManager
  Registers at ResourceManager
  Gets tasks from one or more JobManagers
12
The Building Blocks
[Diagram]
Components: ResourceManager, TaskManager, JobManager
Steps: (1) Request slots, (2) Start TaskManager, (3) Register, (4) Deploy Tasks
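The four steps in the diagram above can be sketched as a toy in-process model. This is only an illustration of the interaction pattern, not Flink code: the class layout, the method names (`request_slots`, `run`, `free_slots`) and the `slots_per_tm` parameter are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TaskManager:
    """Worker that offers a fixed number of task slots (toy model)."""
    name: str
    num_slots: int
    tasks: list = field(default_factory=list)

    def free_slots(self) -> int:
        return self.num_slots - len(self.tasks)


class ResourceManager:
    """Starts TaskManagers on demand and tracks the ones that registered."""

    def __init__(self, slots_per_tm: int = 2):
        self.slots_per_tm = slots_per_tm  # hypothetical sizing parameter
        self.registered = []

    def request_slots(self, n: int) -> list:
        # (1) A JobManager asks for n slots.
        while sum(tm.free_slots() for tm in self.registered) < n:
            # (2) Not enough free slots: start another TaskManager.
            tm = TaskManager(f"tm-{len(self.registered)}", self.slots_per_tm)
            # (3) The new TaskManager registers at the ResourceManager.
            self.registered.append(tm)
        granted = []
        for tm in self.registered:
            take = min(tm.free_slots(), n - len(granted))
            granted.extend([tm] * take)  # one list entry per granted slot
        return granted


class JobManager:
    """Runs exactly one job and thinks in terms of task slots."""

    def __init__(self, job_name: str, rm: ResourceManager):
        self.job_name = job_name
        self.rm = rm

    def run(self, tasks: list) -> None:
        slots = self.rm.request_slots(len(tasks))
        for task, tm in zip(tasks, slots):
            # (4) Deploy each task into one of the granted slots.
            tm.tasks.append(f"{self.job_name}/{task}")


rm = ResourceManager(slots_per_tm=2)
JobManager("job-a", rm).run(["source", "map", "sink"])
for tm in rm.registered:
    print(tm.name, tm.tasks)
```

In this model the JobManager never starts workers itself; it only asks for slots, which is what lets the same JobManager logic sit on top of different ResourceManager implementations.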
13
Building Flink-on-YARN
[Diagram: YARN Cluster]
Components: Client, YARN ResourceManager, Application Master, Flink-YARN ResourceManager, JobManager, TaskManagers
Steps: (1) Submit YARN App. (JobGraph / JARs), (2) Spawn AppMaster, (3) Request slots, (4) Start TaskManagers, (5) Register, (6) Deploy Tasks
14
Building Flink-on-YARN
Main differences from current YARN mode
  All containers started with JARs, config files in classpath
  Credentials & Secrets are strictly bound to a single job
  Slots are allocated/released as needed/freed
    Basic building block for elastic resource usage
  Client disconnects after submitting job, does not need to wait until TaskManagers are up
15
Building Flink-on-YARN (separate RM)
[Diagram: YARN Cluster]
Components: Client, YARN ResourceManager, Application Master, Flink-YARN ResourceManager, JobManager, TaskManagers
Steps: (1) Submit YARN App. (JobGraph / JARs), (2) Spawn AppMaster, (3) Start JobMngr, (4) Request slots, (4) Start TaskManagers, (5) Register, (6) Deploy Tasks
16
Building Flink-on-YARN (w/ dispatcher)
[Diagram: YARN Cluster]
Components: Client, Flink YARN Dispatcher, YARN ResourceManager, Application Master, Flink-YARN ResourceManager, JobManager, TaskManagers
Steps: (1) HTTP POST JobGraph/Jars, (2) Submit YARN App. (JobGraph / JARs), (3) Spawn AppMaster, (4) Request slots, (5) Start TaskManagers, (6) Register, (7) Deploy Tasks
17
Building Flink-on-Mesos
[Diagram: Mesos Cluster]
Components: Client, Flink Mesos Dispatcher, Mesos Master, Flink Master Process, Flink Mesos ResourceManager, JobManager, TaskManagers
Steps: (1) HTTP POST JobGraph/Jars, (2) Allocate container for Flink master, (3) Start Process (and supervise), (4) Request slots, (5) Start TaskManagers, (6) Register, (7) Deploy Tasks
18
Building Standalone
[Diagram: Standalone Cluster]
Components: Flink Cluster Client, Flink Master Process, Standby Master Process, Dispatcher, Standalone ResourceManager, JobManagers, TaskManagers
Steps: (1) Register, (1) Submit JobGraph/Jars, (2) Start JobMngr, (3) Request slots, (7) Deploy Tasks
19
Building Flink-on-Docker/K8S
[Diagram: Master Container and Worker Containers]
Components: Master Container, Flink Master Process, Program Runner, Flink-Container ResourceManager, JobManager, Worker Containers, TaskManagers
Steps: (1) Container framework starts Master & Worker Containers, (2) Run & Start, (3) Register, (4) Deploy Tasks
20
Building Flink-on-Docker/K8S
This is a blueprint for all setups where external services control resources and start new TaskManagers
  For example AWS EC2 Flink image with auto-scaling groups
Can be extended to have N equal containers, out of which one becomes master, remainder workers
With upcoming dynamic-scaling feature (see Till's talk), JobManager scales job to use all available resources
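The "N equal containers" idea can be illustrated with a small sketch: every container runs the same entry point and decides its role at startup by trying to grab a shared lock. The slides do not say how the master is chosen, so the lock-based election, the `MasterLock` class and `start_container` function are all assumptions for the example; a real deployment would use an external coordination service rather than an in-process lock.

```python
import threading


class MasterLock:
    """In-process stand-in for an external coordination service (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None

    def try_acquire(self, container_id: str) -> bool:
        # Non-blocking: only the first caller gets the lock.
        if self._lock.acquire(blocking=False):
            self.owner = container_id
            return True
        return False


def start_container(container_id: str, lock: MasterLock) -> str:
    """Same entry point for every container; the role is picked at startup."""
    return "master" if lock.try_acquire(container_id) else "worker"


lock = MasterLock()
roles = {f"c{i}": start_container(f"c{i}", lock) for i in range(4)}
print(roles)
```

Because all containers are built from one image, the external framework can scale the group up or down without knowing which instance is the master.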
21
Multi-Job Sessions
22
Example: YARN session
[Diagram: YARN Cluster]
Components: Client, YARN ResourceManager, ApplicationMaster, Dispatcher, Flink-YARN ResourceManager, JobManager (A), JobManager (B), TaskManagers
Steps: (1) Submit YARN App. (FLINK – session), (2) Spawn AppMaster, (3) Submit Job A, (4) Start JobMngr, (5) Request slots, (6) Start TaskManagers, (7) Register, (8, 12) Deploy Tasks, (9) Submit Job B, (10) Start JobMngr, (11) Request slots
23
Sessions vs. Jobs
  For each job submitted, the session will spawn its own JobManager
  All jobs run under session-user credentials
  ResourceManager holds on to containers for a certain time
    Jobs quickly following one another reuse containers (quicker response)
  Internally, sessions build on the dispatcher component
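A rough sketch of the session behaviour described above: a dispatcher that creates one JobManager per submitted job, and a ResourceManager that keeps released containers for a while so that follow-up jobs can reuse them. The names `SessionResourceManager`, `Dispatcher` and `idle_timeout`, and the explicit `now` timestamps, are assumptions chosen to keep the example deterministic; they are not taken from Flink.

```python
class SessionResourceManager:
    """Keeps released containers around for `idle_timeout` seconds."""

    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout
        self.idle = {}     # container id -> time at which it was released
        self.started = 0   # number of containers started so far

    def acquire(self, now: float) -> str:
        # Give up containers that have been idle for longer than the timeout.
        self.idle = {c: t for c, t in self.idle.items()
                     if now - t <= self.idle_timeout}
        if self.idle:
            container = next(iter(self.idle))
            del self.idle[container]
            return container  # reuse: no container start-up delay
        self.started += 1
        return f"container-{self.started}"

    def release(self, container: str, now: float) -> None:
        self.idle[container] = now


class Dispatcher:
    """Touch-point for job submissions; one JobManager per job."""

    def __init__(self, rm: SessionResourceManager):
        self.rm = rm
        self.job_managers = {}

    def submit(self, job_name: str, now: float) -> str:
        self.job_managers[job_name] = f"JobManager({job_name})"
        return self.rm.acquire(now)


rm = SessionResourceManager(idle_timeout=30)
dispatcher = Dispatcher(rm)

c_a = dispatcher.submit("A", now=0)
rm.release(c_a, now=10)
c_b = dispatcher.submit("B", now=20)    # within the timeout: container reused
rm.release(c_b, now=25)
c_c = dispatcher.submit("C", now=100)   # too late: a new container is started
print(c_a, c_b, c_c)
```

The timeout is the trade-off knob in this model: a longer value gives quicker responses for back-to-back jobs, a shorter one returns resources to the cluster sooner.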
24
Wrap-up
25
More stuff
Dynamically acquire/release resources
  Slots are allocated/released from ResourceManager as needed
  ResourceManager allocates/releases containers over time
  Strong interplay with "Dynamic Scaling" (cf. talk by Till yesterday)
Resource Profiles: Containers of different size
  Requests can pass a "profile" (CPU / memory / disk), or simply use "default profile"
  ResourceManagers for YARN & Mesos can allocate respective containers
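To make the resource-profile idea concrete, here is a minimal sketch of matching a request against free containers. The slide only names the dimensions (CPU / memory / disk) and the notion of a default profile; the field names, the default values and the smallest-fit selection rule in `pick_container` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceProfile:
    cpu_cores: float
    memory_mb: int
    disk_mb: int = 0

    def fits_into(self, container: "ResourceProfile") -> bool:
        """True if a container of the given size can satisfy this request."""
        return (self.cpu_cores <= container.cpu_cores
                and self.memory_mb <= container.memory_mb
                and self.disk_mb <= container.disk_mb)


# Hypothetical default used when a request does not pass a profile.
DEFAULT_PROFILE = ResourceProfile(cpu_cores=1.0, memory_mb=1024)


def pick_container(free: list,
                   profile: Optional[ResourceProfile] = None
                   ) -> Optional[ResourceProfile]:
    """Return the smallest free container that satisfies the request, if any."""
    wanted = profile or DEFAULT_PROFILE
    candidates = [c for c in free if wanted.fits_into(c)]
    if not candidates:
        return None  # caller would ask the cluster manager for a new container
    return min(candidates, key=lambda c: (c.memory_mb, c.cpu_cores))


small = ResourceProfile(cpu_cores=1.0, memory_mb=1024)
large = ResourceProfile(cpu_cores=4.0, memory_mb=8192, disk_mb=10000)
free = [large, small]

simple_mapper = pick_container(free)
heavy_window = pick_container(free, ResourceProfile(cpu_cores=2.0, memory_mb=4096))
too_big = pick_container(free, ResourceProfile(cpu_cores=8.0, memory_mb=1024))
print(simple_mapper, heavy_window, too_big)
```

This mirrors the earlier example of a simple mapper in a small container and a heavy window operator in a large one.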
26
Wrapping it up
It’s a zoo of cluster managers out there
  Following different paradigms
Usage patterns vary because of Flink's broad use cases
  Isolated long-running jobs vs. many short-lived jobs
  Shared clusters vs. per-user authenticated resources
We are making "jobs" and "sessions" explicit constructs
Flexible building blocks, composed in various ways to accommodate different scenarios
27
Appendix
28
Flink Streaming cornerstones
Make more sense of data
Works on real-time and historic data
Performant Streaming
Low latency
High Throughput
Well-behaved flow control (back pressure)
Event Time
Stateful Streaming
Windows & user-defined state
Exactly-once semantics for fault tolerance
Globally consistent savepoints
APIs
Libraries
Complex Event Processing
Flexible windows (time, count, session, roll-your-own)