Performance Evaluation of a Hadoop-based Distributed Video Transcoding System for Mobile Media Service

Myoungjin Kim1, Yun Cui1, Hyeokju Lee1 and Hanku Lee1,2,*

1 Department of Internet and Multimedia Engineering, Konkuk University, Seoul, Korea
2 Center for Social Media Cloud Computing, Konkuk University, Seoul, Korea
{tough105, ilycy, hjlee09, hlee}@konkuk.ac.kr
* Corresponding author

Abstract. Previously, we proposed a Hadoop-based Distributed Video Transcoding System that transcodes large video data sets into specific video formats depending on user-requested options. To reduce the transcoding time significantly, we apply the Hadoop Distributed File System (HDFS) and the MapReduce framework to our system. Hadoop and MapReduce are designed to process petabyte-scale text data in a parallel and distributed manner, whereas our system processes multimedia data. In this study, we measure the total transcoding time for various values of three MapReduce tuning parameters: the block replication factor, the HDFS block size, and the Java Virtual Machine (JVM) reuse option. From these measurements, we determine the optimal values of the parameters affecting transcoding performance.

Keywords: MapReduce, Hadoop, Video transcoding, Performance evaluation

1 Introduction

To reduce transcoding time significantly, we proposed a Hadoop-based Distributed Video Transcoding System (HDVTS) based on MapReduce [2] running on the Hadoop Distributed File System (HDFS) [2]. The proposed system is able to transcode a variety of video coding formats into the MPEG-4 video format. Improvements in quality and speed are realized by adopting HDFS for storing the large amounts of video data created by numerous users, MapReduce for distributed and parallel processing of the video data, and the open-source Xuggler libraries for transcoding.

MapReduce is widely used for large-scale text data analysis in cloud environments. Several MapReduce tuning parameters need to be set by the users and administrators who operate MapReduce applications. To assist inexperienced administrators, Shivnath Babu presented techniques that automate the process of setting the tuning parameters for MapReduce programs [1]. However, these techniques apply only to MapReduce programs designed for petabyte-scale text data, whereas the MapReduce framework applied to our system handles multimedia data. Hence, tuning parameters optimized for video transcoding on Hadoop must be considered. In this study, the optimal values of the parameters affecting transcoding performance are determined by measuring the total transcoding time for various values of three parameters: dfs.replication for the block replication factor, dfs.block.size for the block size, and mapred.job.reuse.jvm.num.tasks for JVM reuse.
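As an illustration of where these three parameters live, the following minimal sketch sets them per job through the Hadoop 1.x JobConf API. The sketch is ours, not part of the original system: the class name is hypothetical, and the values merely reflect the settings our experiments below found favorable.

```java
import org.apache.hadoop.mapred.JobConf;

public class TranscodeJobConfig {
    // Hypothetical helper; shows only where the three tuning
    // parameters from Section 1 would be set for a job.
    public static JobConf configure() {
        JobConf conf = new JobConf(TranscodeJobConfig.class);

        // Block replication factor (dfs.replication); 3 performed
        // best in the experiments reported below.
        conf.setInt("dfs.replication", 3);

        // HDFS block size in bytes (dfs.block.size); 256 MB, i.e. a
        // value larger than the ~200 MB source files, performed well.
        conf.setLong("dfs.block.size", 256L * 1024 * 1024);

        // JVM reuse (mapred.job.reuse.jvm.num.tasks); -1 lets one JVM
        // run an unlimited number of map tasks for the job.
        conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);

        return conf;
    }
}
```

The same keys can equally be set cluster-wide in hdfs-site.xml and mapred-site.xml, or per run via -D options on the hadoop command line.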

2 Brief Overview of the HDVTS

In this section, we briefly describe the architecture of our proposed system.

1) Our system provides a codec transcoding function as well as functions for converting the display size, codec method, and container format.

2) Our system mainly focuses on batch processing of large numbers of video files collected over a fixed period, rather than on processing small video files collected in real time.

3) HDFS is applied to our system in order to avoid the high communication cost of transferring video files during distributed processing. HDFS is also adopted because its large default chunk size (64 MB) is well suited to processing video files and because it is a user-level distributed file system.

4) Our system relies on the load balancing, fault tolerance, and merging and splitting policies provided by MapReduce for distributed processing.

3 Experiment and Results

The performance evaluation is conducted on a 28-node HDFS cluster consisting of 1 master node and 27 slave (data) nodes. Each node runs Linux (CentOS 5.5) and is equipped with two quad-core Intel Xeon 2.13 GHz processors, 4 GB of registered ECC DDR memory, and a 1 TB SATA-2 disk. All nodes are interconnected by 100 Mbps Ethernet.

To evaluate the performance of encoding very large collections of video files into target files, we create and use six video data sets of different sizes (1, 2, 4, 8, 10, and 50 GB). We measure the total time to transcode the original video data sets (Xvid codec, AVI container, 200 MB per file, 1280×720) into target files (MPEG-4 codec, MP4 container, 60 MB per file, 320×240). The default Hadoop configuration is as follows: (1) the JVM runs in server mode with 1024 MB of heap memory for map tasks, (2) the JVM reuse option is enabled, (3) the HDFS block size is 64 MB, (4) the block replication factor is 3, and (5) the I/O file buffer size is 4 KB.

According to Table 1, our system achieves excellent transcoding times for very large collections of video files. For example, it takes approximately 2000 s (about 33 min) to transcode the 50 GB video data set with the default Hadoop options. The experiments also demonstrate convincingly that our system performs better when the block size is set to 256 MB or 512 MB, the block replication factor is set to 3, or JVM reuse is set to -1, as compared with the other settings.
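For concreteness, the per-file transcoding step (Xvid/AVI into MPEG-4/MP4) can be sketched with the standard Xuggler MediaTool pattern. This is a minimal local-file sketch, not the actual HDVTS map-task code: the file names are illustrative, and the 320×240 resizing step (which requires a video resampler) is omitted.

```java
import com.xuggle.mediatool.IMediaReader;
import com.xuggle.mediatool.IMediaWriter;
import com.xuggle.mediatool.ToolFactory;

public class TranscodeSketch {
    public static void main(String[] args) {
        // Hypothetical local paths; in HDVTS each map task would
        // instead operate on video data retrieved from HDFS.
        IMediaReader reader = ToolFactory.makeReader("input.avi");

        // The output container and codecs are inferred from the ".mp4"
        // extension; the writer copies its stream layout from the reader.
        IMediaWriter writer = ToolFactory.makeWriter("output.mp4", reader);
        reader.addListener(writer);

        // Pump packets until end of file; each decoded frame is
        // re-encoded and written by the listener chain.
        while (reader.readPacket() == null) {
        }
    }
}
```

In the distributed system, each map task would apply this pattern to its assigned video data and write the transcoded result back to HDFS.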

Table 1. Total video transcoding time for the four sets of experiments (s)

                             Video data set size
                         1 GB   2 GB   4 GB   8 GB   10 GB   50 GB
Block size (MB)
  32                      173    226    335    532     666    2837
  64                      124    169    207    311     375    1623
  128                     103    108    120    199     209     820
  256                     102    106    116      –       –     443
  512                     105    111    109      –       –     444
Block replication factor
  1                       144      –    309    467     527    2330
  2                       160    176    239    323     396    1636
  3                         –      –      –      –       –       –
  4                       170    222    212    326     391    1643
  5                       166    224      –    347     400    1683
JVM reuse
  1 (no reuse)            167    219    298    450     572    2461
  -1 (unlimited)          164    213    295    446     556    2429
Resizing
  320×240                   –      –      –      –       –       –
  640×480                 346    358    571    905    1116    5062
  800×480                 401    412    681   1164    1329    5901

4 Conclusion

This study aims to find the optimal values of the tuning parameters in a Hadoop-based distributed video transcoding system by measuring the total transcoding time for various values of three parameters: block size, block replication factor, and JVM reuse factor. Our experiments show that the system performs well for media transcoding when the block size is greater than or nearly equal to the original file size, and when the block replication factor and the JVM reuse factor are set to 3 and -1, respectively.

Acknowledgments. This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2012-H0301-12-3006).

References

1. Babu, S.: Towards Automatic Optimization of MapReduce Programs. In: Proceedings of the 1st ACM Symposium on Cloud Computing, pp. 137-142. ACM Press, New York (2010)
2. Apache Hadoop, Wikipedia, http://en.wikipedia.org/wiki/Apache_Hadoop