Running Apache Flink® Everywhere

1 Running Apache Flink® Everywhere
Stephan Ewen

2 How is Flink deployed?
A two-minute search on the mailing list reveals:
Embedded Service (OSGI), Standalone Cluster, YARN Jobs, YARN Sessions, Docker/Kubernetes, Standalone Cloud, YARN->Myriad->Mesos, Docker on Mesos

3 How is Flink deployed?
A two-minute search on the mailing list reveals:
Embedded Service (OSGI), Standalone Cluster, YARN Jobs, YARN Sessions, Docker/Kubernetes, Standalone Cloud, Mesos Jobs, YARN->Myriad->Mesos, Docker on Mesos, Mesos Sessions (soon!)

4 How is Flink deployed?
Users run mostly isolated jobs or multi-job sessions
Embedded Service (OSGI), Standalone Cluster, YARN Jobs, YARN Sessions, Docker/Kubernetes, Standalone Cloud, Mesos Jobs, YARN->Myriad->Mesos, Docker on Mesos, Mesos Sessions

5 Resource Management
Resources controlled by the framework or another service.
Embedded Service (OSGI), Standalone Cluster, YARN Jobs, YARN Sessions, Docker/Kubernetes, Standalone Cloud, Mesos Jobs, YARN->Myriad->Mesos, Docker on Mesos, Mesos Sessions

6 More dimensions coming up…
Dynamic Resources
  Number of TaskManagers changes over job lifetime
"Trusted" processes
  Run under superuser credential and dispatch jobs
Uniform vs. Heterogeneous Resources
  Run different functions in different size containers
  E.g., simple mapper in small container, heavy window operator in large container
No blocking on any process type
  YARN job needs to continue while ApplicationMaster is down
Avoiding "Job Submit" step

7 Reworking the Flink Process Model

8 Flink Improvement Proposal 6
Core Idea
  Creating composable building blocks
  Create different compositions for different scenarios
FLIP-6 design document:
Currently driving parties:

9 Recap: Current status (Standalone)
[Diagram: Standalone Flink Cluster with Client, JobManager, and three TaskManagers]
(1) Register
(2) Submit Job
(3) Deploy Tasks

10 Recap: Current status (YARN)
[Diagram: YARN Cluster with Client, YARN ResourceManager, Application Master, JobManager, and TaskManagers]
(1) Submit YARN App. (FLINK)
(2) Spawn AppMaster
(3) Poll status
(4) Start TaskManagers
(5) Register
(6) All TaskManagers started
(7) Submit Job
(8) Deploy Tasks

11 The Building Blocks
ResourceManager
  ClusterManager-specific
  May live across jobs
  Manages available Containers/TaskManagers
  Used to acquire / release resources
Dispatcher
  Lives across jobs
  Touch-point for job submissions
  Spawns JobManagers
  May spawn ResourceManager
JobManager
  Single job only, started per job
  Thinks in terms of "task slots"
  Deploys and monitors job/task execution
TaskManager
  Registers at ResourceManager
  Gets tasks from one or more JobManagers
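The lifetimes listed above can be illustrated with a small Python sketch. This is not Flink code: the class and method names (`acquire`, `release`, `submit`, `finish`) and the fixed slot pool are assumptions made only for illustration. It shows a Dispatcher and ResourceManager that outlive individual jobs, and one JobManager spawned per submitted job.

```python
class ResourceManager:
    """Lives across jobs; hands out task slots from a fixed pool (hypothetical)."""

    def __init__(self, total_slots):
        self.free_slots = total_slots

    def acquire(self, count):
        if count > self.free_slots:
            raise RuntimeError("not enough free slots")
        self.free_slots -= count

    def release(self, count):
        self.free_slots += count


class JobManager:
    """Single job only: created per job, thinks in terms of task slots."""

    def __init__(self, job_name, parallelism, resource_manager):
        self.job_name = job_name
        self.parallelism = parallelism
        self.resource_manager = resource_manager
        resource_manager.acquire(parallelism)

    def finish(self):
        # Hand the slots back once the job is done.
        self.resource_manager.release(self.parallelism)


class Dispatcher:
    """Lives across jobs; touch-point for submissions; spawns JobManagers."""

    def __init__(self, resource_manager):
        self.resource_manager = resource_manager
        self.job_managers = {}

    def submit(self, job_name, parallelism):
        jm = JobManager(job_name, parallelism, self.resource_manager)
        self.job_managers[job_name] = jm
        return jm


rm = ResourceManager(total_slots=4)
dispatcher = Dispatcher(rm)
job_a = dispatcher.submit("A", parallelism=2)
job_b = dispatcher.submit("B", parallelism=1)
```

Each call to `submit` yields a separate JobManager, while both jobs draw slots from the same ResourceManager.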

12 The Building Blocks
[Diagram: ResourceManager, TaskManager, JobManager]
(1) Request slots
(2) Start TaskManager
(3) Register
(4) Deploy Tasks
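The four steps on this slide can be traced with a toy simulation. It is a sketch under assumptions, not the FLIP-6 implementation: all names, the `slots_per_tm` parameter, and the policy of always starting fresh TaskManagers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TaskManager:
    """Worker offering a fixed number of task slots (hypothetical model)."""
    name: str
    slots: int
    tasks: list = field(default_factory=list)


class ResourceManager:
    """Starts TaskManagers on request and records their registration."""

    def __init__(self, slots_per_tm=2):
        self.slots_per_tm = slots_per_tm
        self.registered = []
        self.log = []

    def request_slots(self, count):
        self.log.append(f"(1) request {count} slots")
        granted = []
        # Keep starting TaskManagers until enough slots are available.
        while sum(tm.slots for tm in granted) < count:
            tm = TaskManager(f"tm-{len(self.registered)}", self.slots_per_tm)
            self.log.append(f"(2) start {tm.name}")
            self.registered.append(tm)
            self.log.append(f"(3) register {tm.name}")
            granted.append(tm)
        return granted


class JobManager:
    """Requests slots for one job and deploys its tasks into them."""

    def __init__(self, resource_manager):
        self.resource_manager = resource_manager

    def run(self, tasks):
        task_managers = self.resource_manager.request_slots(len(tasks))
        # One list entry per free slot, so zip assigns one task per slot.
        free_slots = [tm for tm in task_managers for _ in range(tm.slots)]
        for task, tm in zip(tasks, free_slots):
            tm.tasks.append(task)
            self.resource_manager.log.append(f"(4) deploy {task} on {tm.name}")
        return task_managers


rm = ResourceManager(slots_per_tm=2)
task_managers = JobManager(rm).run(["map", "window", "sink"])
```

After the run, `rm.log` lists the steps in the order they occurred, starting with the slot request.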

13 Building Flink-on-YARN
[Diagram: YARN Cluster with Client, YARN ResourceManager, Application Master, Flink-YARN ResourceManager, JobManager, and three TaskManagers]
(1) Submit YARN App. (JobGraph / JARs)
(2) Spawn AppMaster
(3) Request slots
(4) Start TaskManagers
(5) Register
(6) Deploy Tasks

14 Building Flink-on-YARN
Main differences from current YARN mode
  All containers started with JARs, config files in classpath
  Credentials & Secrets are strictly bound to a single job
  Slots are allocated/released as needed/freed
    Basic building block for elastic resource usage
  Client disconnects after submitting job, does not need to wait until TaskManagers are up

15 Building Flink-on-YARN (separate RM)
[Diagram: YARN Cluster with Client, YARN ResourceManager, Application Master, Flink-YARN ResourceManager, JobManager, and three TaskManagers]
(1) Submit YARN App. (JobGraph / JARs)
(2) Spawn AppMaster
(3) Start JobMngr
(4) Start TaskManagers
(4) Request slots
(5) Register
(6) Deploy Tasks

16 Building Flink-on-YARN (w/ dispatcher)
[Diagram: YARN Cluster with Client, Flink YARN Dispatcher, YARN ResourceManager, Application Master, Flink-YARN ResourceManager, JobManager, and three TaskManagers]
(1) HTTP POST JobGraph/Jars
(2) Submit YARN App. (JobGraph / JARs)
(3) Spawn AppMaster
(4) Request slots
(5) Start TaskManagers
(6) Register
(7) Deploy Tasks

17 Building Flink-on-Mesos
[Diagram: Mesos Cluster with Client, Flink Mesos Dispatcher, Mesos Master, Flink Master Process, Flink Mesos ResourceManager, JobManager, and three TaskManagers]
(1) HTTP POST JobGraph/Jars
(2) Allocate container for Flink master
(3) Start Process (and supervise)
(4) Request slots
(5) Start TaskManagers
(6) Register
(7) Deploy Tasks

18 Building Standalone
[Diagram: Standalone Cluster with Flink Cluster Client, Standby Master Process, Flink Master Process, Standalone ResourceManager, Dispatcher, two JobManagers, and three TaskManagers]
(1) Register
(1) Submit JobGraph/Jars
(2) Start JobMngr
(3) Request slots
(7) Deploy Tasks

19 Building Flink-on-Docker/K8S
[Diagram: Master Container with Flink Master Process, Flink-Container ResourceManager, JobManager, and Program Runner; three Worker Containers, each with a TaskManager]
(1) Container framework starts Master & Worker Containers
(2) Run & Start
(3) Register
(4) Deploy Tasks

20 Building Flink-on-Docker/K8S
This is a blueprint for all setups where external services control resources and start new TaskManagers
  For example AWS EC2 Flink image with auto-scaling groups
Can be extended to have N equal containers, out of which one becomes master, remainder workers
With upcoming dynamic-scaling feature (see Till's talk), JobManager scales job to use all available resources

21 Multi-Job Sessions

22 Flink-YARN ResourceManager
Example: YARN session
[Diagram: YARN Cluster with Client, YARN ResourceManager, ApplicationMaster, Flink-YARN ResourceManager, Dispatcher, JobManager (A), JobManager (B), and three TaskManagers]
(1) Submit YARN App. (FLINK – session)
(2) Spawn AppMaster
(3) Submit Job A
(4) Start JobMngr
(5) Request slots
(6) Start TaskManagers
(7) Register
(8, 12) Deploy Tasks
(9) Submit Job B
(10) Start JobMngr
(11) Request slots

23 Sessions vs. Jobs
For each Job submitted, the session will spawn its own JobManager
All jobs run under session-user credentials
ResourceManager holds on to containers for a certain time
  Jobs quickly following one another reuse containers (quicker response)
Internally, sessions build on the dispatcher component
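The container-retention point can be made concrete with a short sketch. It assumes a simple idle timeout; the slide only says "a certain time", so the policy, the class name, and the explicit `now` timestamps (used instead of a real clock to keep the example deterministic) are all hypothetical.

```python
class SessionResourceManager:
    """Keeps released containers around for `idle_timeout` time units (hypothetical)."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.idle = {}      # container id -> time it became idle
        self.started = 0    # number of containers started so far

    def acquire(self, now):
        """Return (container_id, reused) for a job arriving at time `now`."""
        self._expire(now)
        if self.idle:
            container_id = next(iter(self.idle))
            del self.idle[container_id]
            return container_id, True
        self.started += 1
        return f"container-{self.started}", False

    def release(self, container_id, now):
        # Hold on to the container instead of giving it back immediately.
        self.idle[container_id] = now

    def _expire(self, now):
        # Drop containers that have been idle for the full timeout.
        self.idle = {
            cid: since
            for cid, since in self.idle.items()
            if now - since < self.idle_timeout
        }


rm = SessionResourceManager(idle_timeout=30)
first, first_reused = rm.acquire(now=0)
rm.release(first, now=10)
second, second_reused = rm.acquire(now=20)   # arrives soon after: reuse
rm.release(second, now=25)
third, third_reused = rm.acquire(now=100)    # arrives late: new container
```

A job that follows quickly gets the retained container, while one arriving after the timeout triggers a fresh container start.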

24 Wrap-up

25 More stuff
Dynamically acquire/release resources
  Slots are allocated/released from ResourceManager as needed
  ResourceManager allocates/releases containers over time
  Strong interplay with "Dynamic Scaling" (cf. talk by Till yesterday)
Resource Profiles: Containers of different size
  Requests can pass a "profile" (CPU / memory / disk), or simply use "default profile"
  ResourceManagers for YARN & Mesos can allocate respective containers
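A resource profile request might look like the following sketch. The `ResourceProfile` fields, the default values, and the smallest-fit selection rule are assumptions for illustration; the slides name only CPU, memory, and disk as profile dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceProfile:
    """Requested or offered resources (hypothetical field names and units)."""
    cpu_cores: float
    memory_mb: int
    disk_mb: int = 0

    def fits_into(self, container):
        return (
            self.cpu_cores <= container.cpu_cores
            and self.memory_mb <= container.memory_mb
            and self.disk_mb <= container.disk_mb
        )


# Assumed default used when a request does not pass its own profile.
DEFAULT_PROFILE = ResourceProfile(cpu_cores=1.0, memory_mb=1024)


def pick_container(request, available):
    """Return the smallest available container that satisfies `request`, or None."""
    candidates = [c for c in available if request.fits_into(c)]
    return min(candidates, key=lambda c: (c.memory_mb, c.cpu_cores), default=None)


small = ResourceProfile(cpu_cores=1.0, memory_mb=1024)
large = ResourceProfile(cpu_cores=4.0, memory_mb=8192)

mapper_container = pick_container(DEFAULT_PROFILE, [large, small])
window_container = pick_container(ResourceProfile(2.0, 4096), [small, large])
too_big = pick_container(ResourceProfile(8.0, 16384), [small, large])
```

Here the simple mapper lands in the small container and the heavier window operator in the large one, matching the heterogeneous-resources example from slide 6.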

26 Wrapping it up
It's a zoo of cluster managers out there
  Following different paradigms
Usage patterns vary because of Flink's broad use cases
  Isolated long running jobs vs. many short-lived jobs
  Shared clusters vs. per-user authenticated resources
We are making "jobs" and "sessions" explicit constructs
Flexible building blocks, composed in various ways to accommodate different scenarios

27 Appendix

28 Flink Streaming cornerstones
[Diagram: Flink streaming cornerstones; labels in source order]
Low latency
Make more sense of data
High Throughput
Works on real-time and historic data
Well-behaved flow control (back pressure)
Performant Streaming
Event Time
Windows & user-defined state
Stateful Streaming
APIs
Libraries
Complex Event Processing
Exactly-once semantics for fault tolerance
Globally consistent savepoints
Flexible windows (time, count, session, roll-your own)
