Introducing Flink on Mesos
Eron Wright – DELL

What is Apache Mesos?
A popular cluster manager (similar to YARN)
Makes available CPU, memory, & disk resources
Unique capabilities for storage services
Emerging as a foundation for data-centric, converged infrastructure
Provides a programming model for using cluster resources
A Mesos program is called a “framework”
Packaged into an open-source distribution called DCOS
Prescribes best practices related to Mesos frameworks, related services, etc.

Why Flink on Mesos?
Flink works best on a cluster manager
  – Easy to scale each job independently
  – Externalize scheduling logic (fairness, quota, …)
  – Good job isolation
Flink can benefit from unique Mesos capabilities
  – Disk resources
  – Dynamic resource management
  – Unique management features (e.g. inverse offers for controlled downscaling & maintenance)

Flink Master Process

Flink Master Process: Introduction
The Flink Master Process is:
  – The “Application Master” for a single Flink cluster
  – A Mesos framework!
Hosts numerous components:
  – Job Manager
  – Resource Manager (acts as Mesos scheduler)
  – Artifact Server (HTTP server for Mesos fetcher)
Responsible for TM scaling and recovery
  – Handles JobManager scale change requests
  – Stores task state in ZooKeeper
[Diagram: Mesos hosts host1 and host2 running the Master (JM, RM, HTTPD) and a TM]

Flink Master Process: How it Works
Offer handling:
  – Uses Netflix Fenzo as an optimizer
  – Gathers offers until all tasks launched
Recovery:
  – Stores intentional state in ZooKeeper
  – Master uses leader election
  – Mesos allows some time for recovery before killing tasks
Monitoring:
  – Detects task failure; launches replacement automatically.
[Diagram: Master and TM on host1 and host2, with Mesos. Numbered flow: 1. Register, 2. Resource Offers, 3. Optimize, 4. Launch, 5. Fetch (HTTP), 6. Status update]
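
The "gathers offers until all tasks launched" step can be illustrated with a small sketch. This is not Fenzo and not the Flink implementation: it swaps in a plain first-fit placement, and the names `Offer`, `TaskRequest`, `assign_tasks`, and `gather_until_launched` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """A simplified resource offer (hypothetical shape, not the Mesos protobuf)."""
    offer_id: str
    host: str
    cpus: float
    mem_mb: int


@dataclass
class TaskRequest:
    """Resources one Task Manager needs (hypothetical shape)."""
    task_id: str
    cpus: float
    mem_mb: int


def assign_tasks(pending, offers):
    """First-fit placement: put each pending task on the first offer with room.

    Returns (launches, still_pending), where launches is a list of
    (task_id, offer_id) pairs.
    """
    remaining = {o.offer_id: [o.cpus, o.mem_mb] for o in offers}
    launches = []
    still_pending = []
    for task in pending:
        for offer in offers:
            cpus, mem = remaining[offer.offer_id]
            if cpus >= task.cpus and mem >= task.mem_mb:
                # Deduct the task's share so later tasks see what is left.
                remaining[offer.offer_id] = [cpus - task.cpus, mem - task.mem_mb]
                launches.append((task.task_id, offer.offer_id))
                break
        else:
            # No offer in this batch could host the task; wait for more offers.
            still_pending.append(task)
    return launches, still_pending


def gather_until_launched(pending, offer_batches):
    """Consume batches of offers until every pending task has been placed."""
    launched = []
    for offers in offer_batches:
        if not pending:
            break
        launches, pending = assign_tasks(pending, offers)
        launched.extend(launches)
    return launched, pending


if __name__ == "__main__":
    tasks = [TaskRequest("tm-1", 2.0, 4096), TaskRequest("tm-2", 2.0, 4096)]
    batches = [
        [Offer("o1", "host1", 2.0, 4096)],
        [Offer("o2", "host2", 4.0, 8192)],
    ]
    print(gather_until_launched(tasks, batches))
```

A real optimizer such as Fenzo weighs more than CPU and memory (constraints, fitness functions, bin packing); the sketch only shows why the scheduler may need several rounds of offers before every task is placed.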

Flink Master Process (cont’d): Configuration
Framework Info
  – mesos.resourcemanager.framework.secret
  – mesos.resourcemanager.framework.principal
  – mesos.resourcemanager.framework.role
Mesos Master Info
  – mesos.master: IP address or ZK lookup info
  – mesos.failover-timeout
Note: no port configuration is necessary; Mesos automatically assigns ports.
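
As an illustration of how these keys might appear in a Flink configuration file, the sketch below renders them as flat `key: value` lines. The key names come from the slide; every value is a placeholder assumption (the ZooKeeper URL, role, principal, and secret are invented, and the timeout is assumed to be in seconds). The helper `render_flink_conf` is hypothetical.

```python
def render_flink_conf(settings):
    """Render settings as flat 'key: value' lines, one per entry."""
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


# All values below are placeholders, not recommended settings.
mesos_settings = {
    # Either a host:port pair or ZooKeeper lookup info for the Mesos master.
    "mesos.master": "zk://zk1.example.com:2181/mesos",
    # Assumed to be seconds; how long Mesos waits for the master to recover.
    "mesos.failover-timeout": 600,
    "mesos.resourcemanager.framework.role": "flink",
    "mesos.resourcemanager.framework.principal": "flink-principal",
    "mesos.resourcemanager.framework.secret": "change-me",
}

if __name__ == "__main__":
    print(render_flink_conf(mesos_settings))
```

Consistent with the note on the slide, no port keys appear in the fragment.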

Dispatcher: Introduction
A highly-available service for launching Flink clusters.
A Mesos framework!
Accessed via REST by the CLI
DCOS compatibility:
  – HTTP-based
  – Accessible via the Admin Router
  – (future) JWT authentication
Aligned with FLIP-6
[Diagram: the CLI talks to the Dispatcher, which launches Master and TM tasks across Mesos hosts host1 to host4]

Dispatcher (cont’d): Framework Hierarchy
Nesting of frameworks is a common Mesos pattern.
Here, Marathon launches the dispatcher, which launches the Flink Master Process, etc.
Architecturally, it avoids a dependency on the Marathon API. For example, Aurora could be used here in place of Marathon.
[Diagram: Marathon → Dispatcher → Master → TM (Task)]

Dispatcher (cont’d): Launching a Session
Use: CLI uploads files to dispatcher via HTTP
  – Flink Configuration
  – Supplemental files ( --ship )
  – Keytabs
  – Certificates
Dispatcher adds additional elements:
  – Configuration
    › ZooKeeper Namespace
  – Flink JAR
  – …
[Diagram: CLI → HTTP(S) → Dispatcher; Master and TM tasks on Mesos hosts host1 to host4 fetch via HTTP(S)]

Dispatcher (cont’d): Dispatcher Deployment Modes
Dispatcher is usable in two ways
Remote Mode:
  – Recommended for detached execution
Local Mode:
  – Recommended for simple, interactive sessions (e.g. flink shell)
[Diagram: Remote Mode: CLI → HTTP(S) → Dispatcher → Master. Local Mode: CLI + Dispatcher → Master]

Future Directions
Dynamic Scaling
  – Add/remove Task Managers in response to scale changes over a job’s lifetime
  – Support Mesos maintenance procedures (e.g. inverse offers)
Dispatcher Evolution (FLIP-6)
  – Generalize to support all deployment scenarios, unified CLI
  – Provide a centralized Web UI (incl. job history)
  – Authentication Support (e.g. OAuth 2.0)
Docker Image Support
  – Tracking the “Mesos unified containerizer”
Mesos Disk Support
  – Allocate multiple disks for Task Manager temp space
  – Scale up the I/O
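
The dynamic scaling idea (adding or removing Task Managers as the desired scale changes) boils down to comparing a desired count against the running set. The sketch below is one possible shape for that comparison, not the planned Flink design; `plan_scaling` and its return format are hypothetical, and the choice of which Task Managers to release (here, simply the last ones in sorted order) is an arbitrary assumption.

```python
def plan_scaling(desired_count, running_ids):
    """Compare the desired Task Manager count with the running set.

    Returns a dict with how many TMs to launch and which running TMs to release.
    """
    if desired_count < 0:
        raise ValueError("desired_count must be non-negative")
    running = sorted(running_ids)
    delta = desired_count - len(running)
    if delta >= 0:
        # Scale up (or no change): launch the missing Task Managers.
        return {"launch": delta, "release": []}
    # Scale down: keep the first desired_count TMs, release the rest.
    return {"launch": 0, "release": running[desired_count:]}


if __name__ == "__main__":
    print(plan_scaling(3, ["tm-1"]))
    print(plan_scaling(1, ["tm-2", "tm-1", "tm-3"]))
```

In a maintenance scenario, an inverse offer from Mesos would presumably inform which Task Managers are released rather than sort order.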

Project Status
Targeted for: Flink 1.2
Contributors:
  – Eron Wright (Dell EMC)
  – Maximilian Michels (data Artisans)
Design Doc:
  – Mesos Integration on Google Docs
JIRAs:
  – FLINK-1984 – Integrate Flink with Apache Mesos
Code:
  –