Performance Evaluation of a Hadoop-based Distributed Video Transcoding System for Mobile Media Service

Myoungjin Kim1, Yun Cui1, Hyeokju Lee1 and Hanku Lee1,2,*

1 Department of Internet and Multimedia Engineering, Konkuk University, Seoul, Korea
2 Center for Social Media Cloud Computing, Konkuk University, Seoul, Korea
{tough105, ilycy, hjlee09,
* Corresponding author

Abstract. Previously, we proposed a Hadoop-based Distributed Video Transcoding System that transcodes large numbers of video data sets into specific video formats depending on user-requested options. In order to reduce the transcoding time significantly, we apply the Hadoop Distributed File System (HDFS) and the MapReduce framework to our system. Hadoop and MapReduce are designed to process petabyte-scale text data in a parallel and distributed manner, whereas our system processes multimedia data. In this study, we measure the total transcoding time for various values of three MapReduce tuning parameters: the block replication factor, the HDFS block size, and the Java Virtual Machine (JVM) reuse option. From these measurements, we determine the optimal values of the parameters affecting transcoding performance.

Keywords: MapReduce, Hadoop, Video transcoding, Performance evaluation

1 Introduction

In order to reduce transcoding time significantly, we proposed a Hadoop-based Distributed Video Transcoding System (HDVTS) based on MapReduce [2] running on the Hadoop Distributed File System (HDFS) [2]. The proposed system is able to transcode a variety of video coding formats into the MPEG-4 video format. Improvements in quality and speed are realized by adopting HDFS for storing the large amounts of video data created by numerous users, MapReduce for distributed and parallel processing of the video data, and the open-source Xuggler libraries for transcoding.

MapReduce is widely used for large-scale text data analysis in cloud environments. Several MapReduce tuning parameters must be set by the users and administrators who operate MapReduce applications. To assist inexperienced administrators, Shivnath Babu presented techniques that automate the process of setting the tuning parameters for MapReduce programs [1]. However, these techniques apply only to MapReduce programs that process petabyte-scale text data, whereas the MapReduce framework applied to our system handles multimedia data. Optimal tuning parameters for video transcoding in Hadoop must therefore be considered. In this study, the optimal values of the parameters affecting transcoding performance are determined by measuring the total transcoding time for various values of three parameters: dfs.replication for the block replication factor, dfs.block.size for the block size, and mapred.job.reuse.jvm.num.tasks for JVM reuse.
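All three parameters are ordinary Hadoop configuration properties, so they can be set on the job configuration when a transcoding job is submitted. The following is a minimal sketch, not the authors' actual job driver, of how such a job might be configured with the Hadoop 1.x API of that period; the class name and the commented-out mapper and input format are hypothetical placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: submits a map-only transcoding job with the three
// tuning parameters studied in this paper set explicitly on its configuration.
public class TranscodingJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Replication factor for files written with this configuration.
        conf.setInt("dfs.replication", 3);
        // HDFS block size in bytes for newly written files; a 256 MB block
        // keeps a 200 MB source video within a single block.
        conf.setLong("dfs.block.size", 256L * 1024 * 1024);
        // -1 allows each task JVM to be reused for an unlimited number of tasks.
        conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);

        Job job = new Job(conf, "distributed-video-transcoding");
        job.setJarByClass(TranscodingJobDriver.class);
        // Placeholders for the system's own video input format and
        // Xuggler-based transcoding mapper:
        // job.setInputFormatClass(VideoFileInputFormat.class);
        // job.setMapperClass(VideoTranscodingMapper.class);
        job.setNumReduceTasks(0);  // transcoding needs no reduce phase

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```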

2 Brief Overview of the HDVTS

In this section, we briefly describe the architecture of the proposed system.

1) Our system provides a codec transcoding function as well as conversion to a different display size, codec method, and container format.
2) Our system mainly focuses on the batch processing of large numbers of video files collected over a fixed period, rather than on the processing of small video files collected in real time.
3) HDFS is applied to our system in order to avoid the high communication cost of transferring video files during distributed processing. HDFS is also adopted because its large chunk size (64 MB) policy is well suited to video files and because it is a user-level distributed file system.
4) Our system follows the load balancing, fault tolerance, and merging and splitting policies provided by MapReduce for distributed processing.
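The codec and container conversion in point 1) is performed with the Xuggler libraries mentioned in Section 1. The fragment below is a minimal illustrative sketch, not the authors' implementation, of the kind of call a map task could make once its video split has been staged on the local file system; the class name and file paths are placeholders, and additional media-tool listeners would be needed for options such as resizing to 320×240.

```java
import com.xuggle.mediatool.IMediaReader;
import com.xuggle.mediatool.IMediaWriter;
import com.xuggle.mediatool.ToolFactory;

// Hypothetical helper that a map task might call after copying its
// video split from HDFS to local disk.
public class XugglerTranscoder {

    // Re-encodes the source file into the container/codec implied by the
    // output file extension (e.g. .mp4), using Xuggler's MediaTool API.
    public static void transcode(String inputPath, String outputPath) {
        IMediaReader reader = ToolFactory.makeReader(inputPath);
        // The writer copies its stream layout from the reader and infers
        // the output format from the file extension.
        IMediaWriter writer = ToolFactory.makeWriter(outputPath, reader);
        reader.addListener(writer);

        // Pump packets until the reader reports end of stream or an error;
        // decoding and re-encoding happen inside the listener chain.
        while (reader.readPacket() == null) {
        }
    }

    public static void main(String[] args) {
        // Placeholder paths; a real map task would use its assigned split.
        transcode("input_chunk.avi", "output_chunk.mp4");
    }
}
```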

3 Experiment and Results

The performance evaluation is conducted on a 28-node HDFS cluster consisting of 1 master node and 27 slave (data) nodes. Each node runs Linux (CentOS 5.5) and is equipped with two quad-core 2.13 GHz Intel Xeon processors, 4 GB of registered ECC DDR memory, and a 1 TB SATA-2 disk. All nodes are interconnected by a 100 Mbps Ethernet adapter.

To evaluate the performance of encoding very large amounts of video files into target files, we create and use six video data sets of different sizes (1, 2, 4, 8, 10, and 50 GB). The total time to transcode the original video data sets (Xvid, AVI, 200 MB, 1280×720) into target files (MPEG-4, MP4, 60 MB, 320×240) is measured. The Hadoop configuration used is as follows: (1) the JVM runs in server mode with 1024 MB of heap memory for map tasks, (2) the JVM reuse option is enabled, (3) the HDFS block size is 64 MB, (4) the block replication factor is three, and (5) the I/O file buffer size is 4 KB.

According to Table 1, our system provides excellent transcoding times for very large amounts of video files. For example, our system takes approximately 2,000 s (about 33 min) to complete the transcoding of the 50 GB video data set with the default Hadoop options. The experiments demonstrate that our system performs better when the block size is set to 256 MB or 512 MB, the block replication factor is set to 3, or the JVM reuse option is set to -1, as compared with the other settings.

Table 1. Total video transcoding time (s) for four sets of experiments

                                         Video data set size
Parameter                  Setting       1 GB   2 GB   4 GB   8 GB   10 GB   50 GB
Block size (MB)            32             173    226    335    532     666    2837
                           64             124    169    207    311     375    1623
                           128            103    108    120    199     209     820
                           256            102    106    116      -       -     443
                           512            105    111    109      -       -     444
Block replication factor   1              144      -    309    467     527    2330
                           2              160    176    239    323     396    1636
                           3                -      -      -      -       -       -
                           4              170    222    212    326     391    1643
                           5              166      -    224    347     400    1683
JVM reuse                  1 (no reuse)   167    219    298    450     572    2461
                           -1 (reuse)     164    213    295    446     556    2429
Resizing                   320×240          -      -      -      -       -       -
                           640×480        346    358    571    905    1116    5062
                           800×480        401    412    681   1164    1329    5901

4 Conclusion

This study aims to find the optimal values of the tuning parameters in a Hadoop-based distributed video transcoding system by measuring the total transcoding time for various values of three parameters: block size, block replication factor, and JVM reuse factor. Our experiments show that the system exhibits good media transcoding performance when the block size is greater than or approximately equal to the original file size, and when the block replication factor and the JVM reuse factor are configured as 3 and -1, respectively.

Acknowledgments. This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2012-H ).

References

1. Babu, S.: Towards Automatic Optimization of MapReduce Programs. In: Proceedings of the 1st ACM Symposium on Cloud Computing, ACM Press, New York (2010)
2. Wikipedia,

