Advanced Hadoop Tuning and Optimizations

Presented by: Sanjay Sharma

Hadoop - The Good/Bad/Ugly
Hadoop is GOOD - that is why we are all here.
Hadoop is not BAD - else we would not be here.
Hadoop is sometimes UGLY - why?
- The out-of-the-box configuration is not friendly.
- It is difficult to debug.
- Performance tuning/optimization is a black art.

Configuration parameters

Compression
mapred.compress.map.output: compress map output
Default: false
Pros: faster disk writes, lower disk space usage, and less time spent on data transfer from mappers to reducers.
Cons: overhead of compression at the mappers and decompression at the reducers.
Suggestions: for large clusters and large jobs, set this property to true. The compression codec can also be set through the property mapred.map.output.compression.codec (default: org.apache.hadoop.io.compress.DefaultCodec).
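
As a concrete illustration, a minimal driver sketch (using the pre-0.21 property names quoted in these slides; the class name and the omitted mapper/reducer/paths are hypothetical) that enables map output compression:

    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.DefaultCodec;
    import org.apache.hadoop.mapred.JobConf;

    public class CompressedMapOutputJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(CompressedMapOutputJob.class);
            conf.setJobName("compressed-map-output");

            // Compress intermediate map output before it is spilled and shuffled.
            conf.setBoolean("mapred.compress.map.output", true);
            // Pick the codec; DefaultCodec is the slide's default, and LZO
            // trades ratio for speed where the native library is installed.
            conf.setClass("mapred.map.output.compression.codec",
                          DefaultCodec.class, CompressionCodec.class);

            // ... set mapper, reducer, input/output paths, then submit the job.
        }
    }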

Speculative Execution
mapred.map.tasks.speculative.execution / mapred.reduce.tasks.speculative.execution: enable/disable speculative execution of map/reduce tasks
Default: true
Pros: reduces the job time when a task progresses slowly due to memory unavailability or hardware degradation.
Cons: increases the job time when a task progresses slowly due to genuinely complex and large calculations. On a busy cluster, speculative execution can reduce overall throughput, since redundant tasks are executed in an attempt to bring down the execution time of a single job.
Suggestions: for large jobs where the average task completion time is significant (> 1 hr) due to complex and large calculations, and where high throughput is required, set speculative execution to false.
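
A minimal sketch (again assuming the old JobConf API used throughout these slides) of disabling speculation for such a job:

    import org.apache.hadoop.mapred.JobConf;

    public class NoSpeculationExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Long, CPU-bound tasks: a speculative second copy would only
            // burn slots without finishing sooner.
            conf.setMapSpeculativeExecution(false);     // mapred.map.tasks.speculative.execution
            conf.setReduceSpeculativeExecution(false);  // mapred.reduce.tasks.speculative.execution
        }
    }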

Number of Maps/Reducers
mapred.tasktracker.map.tasks.maximum / mapred.tasktracker.reduce.tasks.maximum: maximum number of map/reduce tasks run simultaneously by a tasktracker
Default: 2
Suggestions: the recommended range is (cores_per_node)/2 to 2x(cores_per_node), especially for large clusters. Set this value according to the hardware specification of the cluster nodes and the resource requirements of the map/reduce tasks. For example, for a node with 8 GB main memory, an 8-core CPU, and swap space:
Maximum memory required by a task: ~500 MB
Memory required by the tasktracker, datanode, and other processes: ~(1 + 1 + 1) = 3 GB
Maximum tasks that can be run: (8 - 3) GB / 500 MB = 10
The split between map and reduce slots (out of the maximum tasks) can be decided on the basis of the memory usage and computational complexity of the tasks. The memory available to each task JVM is controlled by the mapred.child.java.opts property; the default is -Xmx200m (200 MB). Other JVM options can also be provided in this property.
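
The slot arithmetic above, written out as a tiny sketch (the numbers are the slide's example, not universal constants):

    public class SlotSizing {
        public static void main(String[] args) {
            int totalRamMb  = 8 * 1024;  // 8 GB node
            int daemonRamMb = 3 * 1024;  // tasktracker + datanode + other processes
            int taskRamMb   = 500;       // per-task JVM footprint
            // (8 - 3) GB / 500 MB = 10 concurrent task slots
            System.out.println((totalRamMb - daemonRamMb) / taskRamMb);
        }
    }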

File block size
dfs.block.size: file system block size
Default: 67108864 bytes (64 MB)
Suggestions: in a small cluster with a large data set, the default block size creates a large number of map tasks. For example, with an input data size of 160 GB:
dfs.block.size = 64 MB: minimum no. of maps = (160 * 1024) / 64 = 2560
dfs.block.size = 128 MB: minimum no. of maps = (160 * 1024) / 128 = 1280
dfs.block.size = 256 MB: minimum no. of maps = (160 * 1024) / 256 = 640
In a small cluster (6-10 nodes) the map task creation overhead is considerable, so dfs.block.size should be large in this case, but still small enough to utilize all the cluster resources. Set the block size according to the size of the cluster, the map task complexity, the map task capacity of the cluster, and the average size of the input files.
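
One caveat worth a sketch: dfs.block.size only applies to files as they are written. For data already in HDFS, the number of maps can instead be reduced per job by coarsening the split size (mapred.min.split.size is the matching pre-0.21 property; the 128 MB values are illustrative):

    import org.apache.hadoop.mapred.JobConf;

    public class FewerMapsExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Files written by this client will use 128 MB blocks.
            conf.setLong("dfs.block.size", 128L * 1024 * 1024);
            // For existing 64 MB-block files, pack at least 128 MB into each
            // input split, roughly halving the number of map tasks.
            conf.setLong("mapred.min.split.size", 128L * 1024 * 1024);
        }
    }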

Sort size
io.sort.mb: buffer size (MB) for sorting map output
Default: 100
Suggestions: for large jobs (jobs in which the map output is very large), this value should be increased, keeping in mind that it increases the memory required by each map task, so the increase must fit within the memory available at the node. The greater the value of io.sort.mb, the fewer the spills to disk, saving disk writes.

Sort factor
io.sort.factor: stream merge factor
Default: 10
Suggestions: for large jobs (jobs in which the map output is very large and the number of maps is also large) with many spills to disk, increase the value of this property. io.sort.factor controls the number of input streams (files) merged at once in the map/reduce tasks; setting it to a sufficiently large value (for example, 100) minimizes disk accesses. Increasing io.sort.factor also benefits the merge at the reducers, since the last batch of streams (up to io.sort.factor of them) is fed to the reduce function without a further merge round, saving merge time.
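
A sketch of the two map-side sort knobs from the last two slides, tuned for a job with heavy map output (the values are illustrative, not prescriptive):

    import org.apache.hadoop.mapred.JobConf;

    public class SortTuningExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // A larger in-memory sort buffer means fewer spill files per map.
            // Remember this comes out of the task's heap (mapred.child.java.opts).
            conf.setInt("io.sort.mb", 200);
            // Merge more spill files per pass, so fewer merge rounds on disk.
            conf.setInt("io.sort.factor", 100);
        }
    }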

JVM reuse
mapred.job.reuse.jvm.num.tasks: number of tasks to run per JVM
Default: 1
Suggestions: the minimum overhead of JVM creation for each task is around 1 second. For tasks that live for only seconds or a few minutes and have a lengthy initialization, this value can be increased to gain performance.

Reduce parallel copies
mapred.reduce.parallel.copies: threads for parallel copying at the reducer
Default: 5
Description: the number of threads used to copy map outputs to the reducer.
Suggestions: for large jobs (jobs in which the map output is very large), the value of this property can be increased, keeping in mind that it increases the total CPU usage.
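
A sketch covering this knob and the JVM-reuse one from the previous slide (the values are assumptions for illustration; -1 means "reuse without limit within a job"):

    import org.apache.hadoop.mapred.JobConf;

    public class ReuseAndCopyExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Run many short-lived tasks per JVM; -1 = no limit within a job.
            conf.setNumTasksToExecutePerJvm(-1);  // mapred.job.reuse.jvm.num.tasks
            // More fetcher threads per reducer for jobs with huge map output.
            conf.setInt("mapred.reduce.parallel.copies", 10);
        }
    }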

The Other Threads
dfs.namenode.handler.count / mapred.job.tracker.handler.count: server threads that handle remote procedure calls (RPCs)
Default: 10
Suggestions: can be increased for larger clusters (50-64).
dfs.datanode.handler.count: server threads on the datanode that handle RPCs
Default: 3
Suggestions: can be increased for a larger number of HDFS clients (6-8).
tasktracker.http.threads: number of worker threads on the HTTP server on each tasktracker (serving map output to reducers)
Default: 40
Suggestions: can be increased for larger clusters (50).

Other hotspots

Revelation - Temporary space
Temporary space allocation: jobs which generate large intermediate data (map output) need enough temporary space, controlled by the property mapred.local.dir. This property specifies the list of directories where MapReduce stores intermediate data for jobs; the data is cleaned up after the job completes. By default, the replication factor for file storage on HDFS is 3, which means that every file has three replicas. As a rule of thumb, at least 25% of the total hard disk should be allocated for intermediate temporary output; the remaining 75% then holds HDFS data at 3x replication, so effectively only about 1/4 of the hard disk space is available for business use. The default value of mapred.local.dir is ${hadoop.tmp.dir}/mapred/local, so if mapred.local.dir is not set, hadoop.tmp.dir must have enough space to hold the job's intermediate data. If a node does not have enough temporary space, the task attempt fails and a new attempt is started, impacting performance.

Java - JVM
JVM tuning: besides normal Java code optimizations, the JVM settings for each child task also affect the processing time. On the slave node, the tasktracker and datanode use 1 GB of RAM each. Effective use of the remaining RAM, as well as choosing the right GC mechanism for each map or reduce task, is very important for maximum utilization of hardware resources. The default maximum RAM for child tasks is 200 MB, which may be insufficient for many production-grade jobs. The JVM settings for child tasks are governed by the mapred.child.java.opts property.
Use a 64-bit JDK 1.6; -XX:+UseCompressedOops is helpful in dealing with OOM errors.
Do remember to raise the Linux open file descriptor limit.
Set java.net.preferIPv4Stack to true, to avoid timeouts in cases where the OS/JVM picks up an IPv6 address and must resolve the hostname.
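
A sketch of a child-JVM option line tying these together (the 512 MB heap is an assumption for illustration; size it per the slot arithmetic earlier):

    import org.apache.hadoop.mapred.JobConf;

    public class ChildJvmOptsExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Bigger heap than the 200 MB default, compressed oops on a
            // 64-bit JVM, and IPv4 to dodge IPv6 hostname-resolution stalls.
            conf.set("mapred.child.java.opts",
                     "-Xmx512m -XX:+UseCompressedOops -Djava.net.preferIPv4Stack=true");
        }
    }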

Logging
A friend to developers, a foe in production.
Default level: INFO
Relevant properties: dfs.namenode.logging.level, hadoop.job.history, hadoop.logfile.size/count

Static Data strategies
Available approaches:
- JobConf.set("key", "value")
- Distributed cache (sketch below)
- HDFS shared file
Suggested approaches if the above are not efficient:
- Memcached
- Tokyo Cabinet/Tokyo Tyrant
- Berkeley DB
- HBase
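
For the common case, a sketch of the distributed cache approach (old org.apache.hadoop.filecache API; the HDFS path and link name are hypothetical):

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class StaticDataExample {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf();
            // Ship a read-only lookup file to every task node once per job;
            // tasks read it from their working directory via the #link name.
            DistributedCache.addCacheFile(new URI("/shared/lookup.dat#lookup"), conf);
            DistributedCache.createSymlink(conf);
        }
    }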

Debugging and profiling (from Arun C Murthy's "Hadoop Map-Reduce: Tuning and Debugging" presentation)
Debugging:
- Log files / UI view
- Local runner, single machine mode
- Set keep.failed.task.files to true and use the IsolationRunner
Profiling:
- Set mapred.task.profile to true
- Use mapred.task.profile.{maps|reduces} to select which tasks to profile
- hprof support is built in
- Use mapred.task.profile.params to set options for the profiler
- Possibly use the DistributedCache for the profiler's agent
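
A sketch of switching hprof profiling on for a handful of tasks (property names as in the slide; the task ranges are illustrative assumptions):

    import org.apache.hadoop.mapred.JobConf;

    public class ProfilingExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            conf.setBoolean("mapred.task.profile", true);
            // Profile only the first three maps and the first reduce.
            conf.set("mapred.task.profile.maps", "0-2");
            conf.set("mapred.task.profile.reduces", "0");
            // Built-in hprof agent; %s is replaced with the profile file name.
            conf.set("mapred.task.profile.params",
                     "-agentlib:hprof=cpu=samples,heap=sites,depth=6,"
                     + "force=n,thread=y,verbose=n,file=%s");
        }
    }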

Tuning (from Arun C Murthy's "Hadoop Map-Reduce: Tuning and Debugging" presentation)
- Tell HDFS and Map-Reduce about your network! Rack locality script: topology.script.file.name
- Number of maps: data locality
- Number of reduces: you don't need a single output file!
- Amount of data processed per map: consider fatter maps and a custom input format
- Combiner: multi-level combiners at both map and reduce; check to ensure the combiner is actually useful! (see the sketch after this list)
- Map-side sort: io.sort.mb, io.sort.factor, io.sort.record.percent, io.sort.spill.percent
- Shuffle: compression for map outputs (mapred.compress.map.output, mapred.map.output.compression.codec, lzo via libhadoop.so), tasktracker.http.threads, mapred.reduce.parallel.copies, mapred.reduce.copy.backoff, mapred.job.shuffle.input.buffer.percent, mapred.job.shuffle.merge.percent, mapred.inmem.merge.threshold, mapred.job.reduce.input.buffer.percent
- Compress the job output
- Miscellaneous: speculative execution, heap size for the child, re-use JVM for maps/reduces, raw comparators
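
As one concrete instance of the combiner point, a sketch wiring a summing reducer in as a combiner (valid only because, as in word count, the reduce function is associative and commutative; the class is a hypothetical illustration):

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class CombinerExample {
        // Summing reducer: safe as a combiner because addition is
        // associative and commutative.
        public static class SumReducer extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values,
                               OutputCollector<Text, IntWritable> out,
                               Reporter reporter) throws IOException {
                int sum = 0;
                while (values.hasNext()) sum += values.next().get();
                out.collect(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) {
            JobConf conf = new JobConf(CombinerExample.class);
            conf.setCombinerClass(SumReducer.class);  // local pre-aggregation before the shuffle
            conf.setReducerClass(SumReducer.class);
        }
    }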

Next steps
Hadoop Vaidya (since 0.20.0)
Job configuration analyzer (work in progress, to be contributed back to Hadoop):
- Part of the Analyze Job web UI
- Analyzes and suggests config parameters from job.xml
- Smart suggestion engine/auto-correction

Conclusion
The performance of Hadoop MapReduce jobs can be improved without increasing hardware costs, by tuning several key configuration parameters to match the cluster specification, the input data size, and the processing complexity.

References
hadoop.apache.org
Hadoop-performance tuning--white paper v1 1.pdf - Arun C Murthy
Intel_White_Paper_Optimizing_Hadoop_Deployments.pdf