Apache Tez : Accelerating Hadoop Query Processing Page 1
Agenda
Overview of Tez
–Goals
–High-level architecture
–Status
–Roadmap
Hive on Tez
Pig on Tez
© Hortonworks Inc
Tez – Introduction
Distributed execution framework targeted at data-processing applications.
Based on expressing a computation as a dataflow graph.
Built on top of YARN, the resource management framework for Hadoop.
Open source, Apache-licensed Apache Incubator project.
Tez – Design Themes
Empowering End Users
Execution Performance
Tez – Empowering End Users
Expressive dataflow definition APIs
Flexible Input-Processor-Output runtime model
Data type agnostic
Simplifying deployment
Tez – Empowering End Users
Expressive dataflow definition APIs
–Enable definition of complex dataflow pipelines using simple graph connection APIs. Tez expands the logical plan at runtime.
–Targeted at data-processing applications like Hive and Pig, but not limited to them. Hive and Pig query plans map naturally to Tez dataflow graphs with no translation impedance.
[Figure: dataflow graph of vertices A through E, each expanded into two tasks, TaskA-1 and TaskA-2 through TaskE-1 and TaskE-2]
Tez – Empowering End Users
Expressive dataflow definition APIs
[Figure: Distributed Sort, with a Preprocessor Stage, a Sampler, a Partition Stage and an Aggregate Stage (two tasks per stage); edges labeled Samples and Ranges]
Tez – Empowering End Users
Flexible Input-Processor-Output runtime model
–Construct physical runtime executors dynamically by connecting different inputs, processors and outputs.
–End goal is a library of inputs, outputs and processors that can be programmatically composed to generate useful tasks.
[Figure: example task compositions]
IntermediateReduce: ShuffleInput, ReduceProcessor, FileSortedOutput
FinalReduce: ShuffleInput, ReduceProcessor, HDFSOutput
PairwiseJoin: Input1 and Input2, JoinProcessor, FileSortedOutput
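The composition idea on this slide can be sketched in a few lines of Java. This is a minimal illustration only: the `Input`, `Output`, `Processor`, `ListInput`, `CollectingOutput`, `ComposedTask` and `IpoDemo` types are hypothetical names invented here and are not the Tez API. The sketch wires two inputs, a join processor and one output into a task, loosely mirroring the PairwiseJoin composition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical interfaces for illustration only; these are NOT the Tez API.
interface Input { Iterator<String> read(); }
interface Output { void write(String record); }
interface Processor { void run(List<Input> inputs, Output output); }

// An input backed by an in-memory list, standing in for a real data source.
class ListInput implements Input {
    private final List<String> records;
    ListInput(List<String> records) { this.records = records; }
    public Iterator<String> read() { return records.iterator(); }
}

// An output that collects records in memory, standing in for a file output.
class CollectingOutput implements Output {
    final List<String> records = new ArrayList<>();
    public void write(String record) { records.add(record); }
}

// Emits every record of the first input that also appears in the second input.
class JoinProcessor implements Processor {
    public void run(List<Input> inputs, Output output) {
        List<String> right = new ArrayList<>();
        inputs.get(1).read().forEachRemaining(right::add);
        Iterator<String> left = inputs.get(0).read();
        while (left.hasNext()) {
            String record = left.next();
            if (right.contains(record)) {
                output.write(record);
            }
        }
    }
}

// A task is nothing more than a wiring of inputs, one processor and one output.
class ComposedTask {
    private final List<Input> inputs;
    private final Processor processor;
    private final Output output;

    ComposedTask(List<Input> inputs, Processor processor, Output output) {
        this.inputs = inputs;
        this.processor = processor;
        this.output = output;
    }

    void run() { processor.run(inputs, output); }
}

class IpoDemo {
    // Composes a pairwise-join task from two list inputs and returns what it wrote.
    static List<String> pairwiseJoin(List<String> a, List<String> b) {
        CollectingOutput out = new CollectingOutput();
        List<Input> inputs = Arrays.<Input>asList(new ListInput(a), new ListInput(b));
        new ComposedTask(inputs, new JoinProcessor(), out).run();
        return out.records;
    }

    public static void main(String[] args) {
        System.out.println(pairwiseJoin(Arrays.asList("a", "b", "c"), Arrays.asList("b", "c", "d")));
    }
}
```

Swapping `CollectingOutput` for a different `Output` implementation changes where the results go without touching the processor, which is the point of keeping the three roles separate.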
Tez – Empowering End Users
Data type agnostic
–Tez is concerned only with the movement of data: files and streams of bytes.
–Does not impose any data format on the user application. An MR application can use key-value pairs on top of Tez. Hive and Pig can use the tuple-oriented formats that are native to them.
[Figure: a Tez Task handles bytes as files and streams; user code works with key-value pairs or tuples]
Tez – Empowering End Users
Simplifying deployment
–Tez is a completely client-side application.
–No deployments to do. Simply upload Tez to any accessible FileSystem and change the local Tez configuration to point to it.
–Enables running different versions concurrently. Easy to test new functionality while keeping stable versions for production.
–Leverages YARN local resources.
[Figure: client machines running TezClient, HDFS holding Tez Lib 1 and Tez Lib 2, and Node Managers running TezTasks]
Tez – Empowering End Users
Expressive dataflow definition APIs
Flexible Input-Processor-Output runtime model
Data type agnostic
Simplifying usage
With powerful APIs come great responsibilities.
Tez is a framework on which end user applications can be built.
Tez – Execution Performance
Performance gains over MapReduce
Optimal resource management
Plan reconfiguration at runtime
Dynamic physical data flow decisions
Tez – Execution Performance
Performance gains over MapReduce
–Eliminate the replicated write barrier between successive computations.
–Eliminate the job launch overhead of workflow jobs.
–Eliminate the extra stage of map reads in every workflow job.
–Eliminate the queue and resource contention suffered by workflow jobs that start only after a predecessor job completes.
[Figure: Pig/Hive on MR compared with Pig/Hive on Tez]
Tez – Execution Performance
Optimal resource management
–Reuse YARN containers to launch new tasks.
–Reuse YARN containers to share objects across tasks.
[Figure: the Tez Application Master exchanges Start Task and Task Done messages with a TezTask Host in a YARN container; the host runs TezTask1 and TezTask2, which use Shared Objects]
Tez – Execution Performance
Plan reconfiguration at runtime
–Dynamic runtime concurrency control based on data size, user operator resources, available cluster resources and locality.
–Advanced changes in dataflow graph structure.
–Progressive graph construction in concert with the user optimizer.
[Figure: Stage 1 (50 maps, 100 partitions) feeding a reducer stage, with HDFS Blocks and YARN Resources as inputs to the decision; annotation: "Only 10 GB of data"]
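One way to picture the concurrency control described above is a helper that picks the reducer count only after the upstream output size is known. The sketch below is an assumption-laden illustration, not how Tez implements it: `ReducerParallelism`, `chooseReducers` and the 1 GB-per-reducer target are hypothetical, and only the "100 partitions" and "10 GB" figures come from the slide.

```java
// Hypothetical helper for illustration; not part of the Tez API.
class ReducerParallelism {

    // Picks a reducer count from the observed output size of the upstream stage.
    // The result never exceeds the number of partitions the upstream stage produced
    // and is never less than one.
    static int chooseReducers(long observedBytes, long targetBytesPerReducer, int maxPartitions) {
        if (observedBytes < 0 || targetBytesPerReducer <= 0 || maxPartitions <= 0) {
            throw new IllegalArgumentException("sizes must be positive and observedBytes non-negative");
        }
        // Ceiling division: how many reducers are needed to keep each near the target size.
        long needed = (observedBytes + targetBytesPerReducer - 1) / targetBytesPerReducer;
        return (int) Math.max(1L, Math.min((long) maxPartitions, needed));
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        // 100 partitions were planned, but only 10 GB of data arrived.
        // The 1 GB target per reducer is an assumed tuning value.
        System.out.println(chooseReducers(10 * gb, gb, 100));
    }
}
```

Under these assumed numbers the plan shrinks from 100 planned partitions to 10 reducers; with 500 GB of observed data the same call would stay capped at 100.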
Tez – Execution Performance
Dynamic physical data flow decisions
–Decide the type of physical byte movement and storage on the fly.
–Store intermediate data on a distributed store, a local store or in memory.
–Transfer bytes via blocking files, streaming, or anything on the spectrum in between.
[Figure: at runtime, a producer with small output is connected to its consumer in memory; otherwise producer and consumer are connected through a local file]
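A decision of this kind could look roughly like the following policy function. Everything here is a hypothetical sketch: the `Transport` enum, the `DataMovementPolicy` class, the memory budget parameter and the `mustSurviveNodeLoss` flag are assumptions made for illustration, since the slides do not state the actual criteria.

```java
// Hypothetical storage choices for intermediate data; names are illustrative only.
enum Transport { IN_MEMORY, LOCAL_FILE, DISTRIBUTED_STORE }

// Hypothetical policy for illustration; not part of the Tez API.
class DataMovementPolicy {

    // Chooses where a producer's output should live, based on facts known only at runtime.
    static Transport choose(long outputBytes, long memoryBudgetBytes, boolean mustSurviveNodeLoss) {
        if (outputBytes < 0 || memoryBudgetBytes < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        if (mustSurviveNodeLoss) {
            // Assumed rule: data that must outlive a node goes to the distributed store.
            return Transport.DISTRIBUTED_STORE;
        }
        if (outputBytes <= memoryBudgetBytes) {
            // Small outputs can be handed to the consumer without touching disk.
            return Transport.IN_MEMORY;
        }
        return Transport.LOCAL_FILE;
    }

    public static void main(String[] args) {
        long budget = 64L * 1024L * 1024L; // assumed 64 MB in-memory budget
        System.out.println(choose(1024L, budget, false));
        System.out.println(choose(10L * budget, budget, false));
        System.out.println(choose(1024L, budget, true));
    }
}
```

Because the inputs are plain runtime measurements, the same logical edge can be realized differently from one run to the next.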
Tez – Deep Dive – API
    // Register the vertices of the DAG.
    dag.addVertex(map1);
    dag.addVertex(map2);
    dag.addVertex(reduce1);
    dag.addVertex(reduce2);
    dag.addVertex(join1);

    // Describe each connection: source vertex, destination vertex, connection pattern,
    // source type, and the output and input classes used to move the data.
    Edge edge1 = new Edge(map1, reduce1, BIPARTITE, STABLE,
        OnFileSortedOutput.class, ShuffledMergedInput.class);
    Edge edge2 = new Edge(map2, reduce2, BIPARTITE, STABLE,
        OnFileSortedOutput.class, ShuffledMergedInput.class);
    Edge edge3 = new Edge(reduce1, join1, BIPARTITE, STABLE,
        OnFileSortedOutput.class, ShuffledMergedInput.class);
    Edge edge4 = new Edge(reduce2, join1, BIPARTITE, STABLE,
        OnFileSortedOutput.class, ShuffledMergedInput.class);

    // Add the edges to the DAG.
    dag.addEdge(edge1);
    dag.addEdge(edge2);
    dag.addEdge(edge3);
    dag.addEdge(edge4);
[Figure: DAG with vertices map1, map2, reduce1, reduce2 and join1; edges labeled Stable, Bipartite]
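To make the shape of this graph concrete, here is a self-contained sketch that builds the same five vertices and four edges and derives one valid start order. `SimpleDag` and its methods are hypothetical stand-ins written for this example and are not the Tez `DAG`, `Vertex` or `Edge` classes. The ordering uses Kahn's topological sort, chosen here only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a dataflow graph; NOT the Tez DAG class.
class SimpleDag {
    // Insertion-ordered maps keep the output deterministic.
    private final Map<String, List<String>> successors = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    void addVertex(String name) {
        successors.putIfAbsent(name, new ArrayList<>());
        inDegree.putIfAbsent(name, 0);
    }

    void addEdge(String from, String to) {
        if (!successors.containsKey(from) || !successors.containsKey(to)) {
            throw new IllegalArgumentException("both vertices must be added before the edge");
        }
        successors.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
    }

    // Kahn's algorithm: repeatedly emit a vertex whose producers have all been emitted.
    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : remaining.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String vertex = ready.poll();
            order.add(vertex);
            for (String next : successors.get(vertex)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != successors.size()) {
            throw new IllegalStateException("graph contains a cycle");
        }
        return order;
    }

    // Builds the graph from the slide: two map/reduce chains feeding one join.
    static SimpleDag slideExample() {
        SimpleDag dag = new SimpleDag();
        for (String v : new String[] {"map1", "map2", "reduce1", "reduce2", "join1"}) {
            dag.addVertex(v);
        }
        dag.addEdge("map1", "reduce1");
        dag.addEdge("map2", "reduce2");
        dag.addEdge("reduce1", "join1");
        dag.addEdge("reduce2", "join1");
        return dag;
    }

    public static void main(String[] args) {
        System.out.println(slideExample().executionOrder());
    }
}
```

The sequential order is only one valid linearization. An execution engine is free to run independent vertices such as map1 and map2 at the same time.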
Tez – Deep Dive – Execution
[Figure: execution of the DAG with vertices map1, map2, reduce1, reduce2 and join1]
Tez – Deep Dive – Scheduling
Vertex Scheduler: determines when tasks in a vertex can start.
DAG Scheduler: determines the priority of a task.
Task Scheduler: allocates containers from YARN and assigns them to tasks.
[Figure: for vertices map1 and reduce1, the steps Start vertex, Start tasks (Vertex Scheduler), Get Priority (DAG Scheduler) and Get container (Task Scheduler)]
Tez – Deep Dive – Task Execution
Start the task shell with the user-specified env, resources etc.
Fetch and instantiate the Input, Processor and Output objects.
Receive (incremental) input information and process the input.
Provide output information.
[Figure: a Task Attempt (logical, in the AM) and a Task Attempt (real, on the machine); labels: Env, cmd line, resources; Start container; Get Task; Task JVM with Input, Processor, Output; Data Information; Data Events]
Tez – Current status
Apache Incubator Project
–Rapid development. Over 330 JIRAs opened. Over 220 resolved.
–Growing community.
Focus on stability
–Testing and quality are the highest priority.
–Working on Tez+YARN to fix basic performance overheads.
–Code ready and deployed on multi-node environments.
DAG of MR processing is working
–Already functionally equivalent to MapReduce. Existing MapReduce jobs can be executed on Tez with few or no changes.
–Working Hive prototype that can target Tez for execution of queries (HIVE-4660).
–Work started on a prototype of Pig that can target Tez.
Tez – Current status
[Figure: typical pattern in a TPC-DS query: a Fact Table joined with Dimension Tables 1, 2 and 3, producing Result Tables 1, 2 and 3]
[Figure: optimization for small data sets: Fact Table and Dimension Table 1]
Both can now run as a single Tez job.
Tez – MRR Performance
[Figure: TPC-DS Query 12 with Hive on Tez]
Tez – Roadmap
Full DAG support
–Multi-way input and output.
–Other graph connection patterns.
Performance optimizations
–Container reuse
–Cross-task shared resources
–Using HDFS data caching
Runtime plan optimizations
–Automatic input (map) parallelism
–Automatic aggregation (reduce) parallelism
Usability
–Stability and testability
–Recovery and history
Tez – Community
Early adopters and contributors welcome
–Adopters to drive more scenarios. Contributors to make them happen.
Stay tuned for Tez meetups with deep dives on Tez architecture and using Tez
Useful links
–Work tracking:
–Code:
–High level design document and API specification:
–Developer list:
–User list:
–Issues list:
Tez – Takeaways
Distributed execution framework that works on computations represented as dataflow graphs
Naturally maps to execution plans produced by query optimizers
Execution architecture designed to enable dynamic performance optimizations at runtime
Open source Apache project – your use cases and code are welcome
It works and is already being used by Hive
Tez
Thanks for your time and attention!
Questions?