Distributed Video Transcoding System based on MapReduce for Video Content Delivery

Myoungjin Kim¹, Hanku Lee¹,²,*, Hyeokju Lee¹ and Seungho Han¹
¹ Department of Internet and Multimedia Engineering, Konkuk University, Seoul, Korea
² Center for Social Media Cloud Computing, Konkuk University, Seoul, Korea
{tough105, hlee, hjlee09, …}

Abstract. In this paper, we propose a distributed video transcoding system that transcodes various video codec formats into the MPEG-4 video format. This system makes various kinds of video content available to heterogeneous devices such as smartphones, PCs, TVs, and tablets. We design and implement our system using the MapReduce framework running on a Hadoop Distributed File System (HDFS) platform and the media processing library Xuggler. In this way, we greatly reduce the encoding time required to transcode large amounts of video content. Based on experiments performed on a 28-node cluster, the proposed distributed video transcoding system provides excellent performance in terms of speed and quality.

Keywords: Video Transcoding, MapReduce, Hadoop, Distributed Processing

1 Introduction

Due to the explosive growth of mobile devices and improvements in mobile communication technologies, media streaming services and applications based on video content have gained remarkable popularity and interest from users. Under these circumstances, transcoding techniques [1] are needed in order to deliver large amounts of video data in multiple formats to heterogeneous mobile devices. A transcoding system requires high-performance computing capability as well as intensive use of scalable computing resources. Video transcoding includes three main processing steps: decoding, resizing, and encoding. However, transcoding is extremely slow in most existing systems because they run in a traditional single-PC environment. In order to reduce the transcoding time significantly, we propose a distributed video transcoding system based on MapReduce [2] running on a Hadoop Distributed File System (HDFS) [2]. The proposed system is able to transcode a variety of video coding formats into the MPEG-4 video format. Improvements in quality and speed are realized by adopting HDFS for storing the large amounts of video data created by numerous users, MapReduce for distributed and parallel processing of the video data, and the open-source library Xuggler [3] for transcoding.

* Corresponding author

Our paper is organized as follows: Section 2 introduces Hadoop and MapReduce. In Section 3, we describe our distributed video transcoding system based on MapReduce and its main characteristics. The experiment and its results are presented in Section 4. Section 5 concludes the paper.

2 Hadoop and MapReduce

Hadoop, inspired by Google's MapReduce and the Google File System, is a software framework that supports data-intensive distributed applications handling thousands of nodes and petabytes of data. It can perform scalable and timely analytical processing of large data sets to extract useful information. Hadoop consists of two important frameworks: 1) the Hadoop Distributed File System (HDFS), which, like GFS, is a distributed, scalable, and portable file system written in Java; and 2) MapReduce, a framework first developed by Google for processing large data sets. The MapReduce framework provides a specific programming model and a runtime system for processing and creating large data sets amenable to various real-world tasks. The framework also handles automatic scheduling, communication, and synchronization when processing huge data sets, and it provides fault tolerance.

The MapReduce programming model is executed in two main steps, called mapping and reducing. Mapping and reducing are defined by mapper and reducer functions. Each phase takes a list of key-value pairs as input and produces a list of key-value pairs as output. In the mapping step, MapReduce receives the input data sets and feeds each data element to the mapper in the form of key-value pairs. In the reducing step, all the outputs from the mapper are processed, and the final result is generated by the reducer through a merging process. Figure 1 shows an example of word counting using the MapReduce framework.

Fig. 1. An example of word counting using MapReduce (input lines such as "the quick brown fox" and "how now brown cow" are tokenized in the Map step, grouped by word during Shuffle & Sort, and summed in the Reduce step, yielding counts such as "the, 3" and "brown, 2")
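For illustration, the mapper and reducer of this word-count example can be written against the standard Hadoop MapReduce API (org.apache.hadoop.mapreduce) roughly as follows. This is a minimal sketch, not code from the proposed system; class and variable names are chosen only for this example.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Mapper: emits (word, 1) for every token of an input line.
    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);          // e.g. ("brown", 1)
                }
            }
        }
    }

    // Reducer: sums the counts gathered for each word after shuffle & sort.
    class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));  // e.g. ("brown", 2)
        }
    }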

3 Distributed Video Transcoding System based on MapReduce

In this section, we briefly describe the proposed system architecture.
1) Our system provides a codec transcoding function and can also change the display size, codec method, and container format.
2) Our system mainly focuses on the batch processing of large numbers of video files collected over a fixed period rather than on processing small video files collected in real time.
3) HDFS is applied to our system in order to avoid the high communication cost of transferring video files during distributed processing. HDFS is also chosen because its large chunk size (64 MB by default) is suitable for processing video files and because it is a user-level distributed file system.
4) Our system follows the load balancing, fault tolerance, and merging and splitting policies provided by MapReduce for distributed processing.

The proposed system uses HDFS as storage for distributed parallel processing. The very large amount of collected data is automatically distributed over the data nodes of HDFS. For distributed parallel processing, the proposed system exploits the Hadoop MapReduce framework. In addition, the Xuggler libraries for video resizing and encoding are utilized in the Mapper: the Map function processes each chunked block of video data in a distributed and parallel manner, as sketched below. Figure 2 illustrates the overall architecture of the distributed video transcoding system based on MapReduce, which comprises a Video Data Collection Domain (VDCD) holding the original encoded and transcoded video data sets, an HDFS-based Splitting & Merging Domain (HbSMD), a MapReduce-based Transcoding Domain (MbTD), and a Cloud-based Infrastructure Domain (CbID) providing storage, CPU, and network virtualization.

Fig. 2. Overall system architecture of a distributed video transcoding system
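To make the Mapper's role concrete, the sketch below shows how a map task might fetch one original video from HDFS, re-encode it with Xuggler's mediatool API, and store the MP4 result back in HDFS. The record layout (one HDFS path per input record), the /videos and /tmp paths, and the whole-file granularity are illustrative assumptions, not the authors' exact implementation; resizing with Xuggler's resampling tools is omitted for brevity.

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import com.xuggle.mediatool.IMediaReader;
    import com.xuggle.mediatool.IMediaWriter;
    import com.xuggle.mediatool.ToolFactory;

    // Sketch of a map-only transcoding task: each input record is assumed to
    // name one original video file in HDFS. The mapper copies it to local
    // disk, re-encodes it with Xuggler, and writes the MP4 result back to HDFS.
    public class TranscodeMapper extends Mapper<Object, Text, Text, Text> {

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            FileSystem fs = FileSystem.get(context.getConfiguration());
            Path src = new Path(value.toString());            // e.g. /videos/in/clip.avi (illustrative)
            Path localIn = new Path("/tmp/" + src.getName());
            Path localOut = new Path("/tmp/" + src.getName().replaceAll("\\.\\w+$", "") + ".mp4");

            fs.copyToLocalFile(src, localIn);

            // Xuggler mediatool pipeline: decode the source and re-encode it into
            // an MP4 container whose format is inferred from the output file name.
            IMediaReader reader = ToolFactory.makeReader(localIn.toString());
            IMediaWriter writer = ToolFactory.makeWriter(localOut.toString(), reader);
            reader.addListener(writer);
            while (reader.readPacket() == null) {
                // keep decoding/encoding packets until the input is exhausted
            }

            Path dst = new Path("/videos/out", localOut.getName());
            fs.copyFromLocalFile(localOut, dst);
            context.write(new Text(src.toString()), new Text(dst.toString()));
        }
    }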

4 Experiment and Results

The performance evaluation is conducted on a 28-node HDFS cluster consisting of 1 master node and 27 slave (data) nodes. Each node runs Linux (CentOS 5.5) and is equipped with two quad-core Intel Xeon 2.13 GHz processors, 4 GB of registered ECC DDR memory, and a 1 TB SATA-2 disk. All nodes are interconnected through 100 Mbps Ethernet. We also use Java 1.6.0_23, Hadoop, and Xuggler 3.4. To evaluate the performance of encoding very large amounts of video files into target files, we prepare and use seven video data sets (Table 1) of different total volumes. We measure the total time to transcode each data set under different MapReduce options for block size (default: 64 MB) and block replication (default: 3 EA). Table 2 lists the parameters of the original and target transcoded video files. Table 3 lists the measured transcoding times in seconds for the different block sizes and replication factors.

According to Table 3, our system achieves excellent transcoding times for very large amounts of video files. For example, it takes approximately 2000 sec (about 33 min) to complete the transcoding of the 50 GB video data set with the default Hadoop options. The experiment also demonstrates that our system performs better when the block size is set to 256 MB or 512 MB, or when the block replication factor is set to 3 EA, than under the other configurations.

Table 1. Video data sets for performance evaluation (number of video files per data set of 1 GB, 2 GB, 4 GB, 8 GB, 10 GB, 20 GB, and 50 GB)

Table 2. Parameters for each original and transcoded video file

Parameter     Original video file     Transcoded video file
Codec         XviD                    MPEG-4
Container     AVI                     MP4
Size          200 MB                  60 MB
Duration      3 m 19 s                3 m 19 s
Resolution    1280 x                  x 240
Frame rate    fps                     fps

Table 3. Total transcoding time (sec) for different block sizes (32 MB and larger) and numbers of block replicas (1 EA and more), measured on the 1 GB, 2 GB, 4 GB, 8 GB, 10 GB, and 50 GB data sets
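The block size and replication factor varied in Table 3 can be chosen when the video data sets are loaded into HDFS. The sketch below shows one way to do this with the standard Hadoop FileSystem API; the property names follow the Hadoop 0.20/1.x convention, and the paths and the 256 MB / 3-replica values are illustrative assumptions, not the exact experimental settings.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Loads a local video file into HDFS with an explicit block size and
    // replication factor, mirroring the parameters varied in Table 3.
    public class LoadWithLayout {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Client-side defaults applied to files created through this FileSystem.
            conf.set("dfs.block.size", String.valueOf(256L * 1024 * 1024)); // 256 MB blocks
            conf.set("dfs.replication", "3");                               // 3 replicas

            FileSystem fs = FileSystem.get(conf);
            fs.copyFromLocalFile(new Path("/data/local/clip.avi"),          // illustrative paths
                                 new Path("/videos/in/clip.avi"));
            fs.close();
        }
    }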

5 Conclusion

In this paper, we proposed a distributed video transcoding system based on MapReduce and HDFS that transcodes various video codec formats into the MPEG-4 video format. Our system ensures uniform transcoded video quality and a fast transcoding process. We experimentally verified the effectiveness of our system for video transcoding using distributed processing techniques.

Acknowledgments. This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2012) supervised by the NIPA (National IT Industry Promotion Agency).

References
1. Sandro Moiron, Mohammed Ghanbari, Pedro Assuncao and Sergio Faria: Video Transcoding Technique. Studies in Computational Intelligence, vol. 231. Springer, Heidelberg (2009)
2. Wikipedia, Hadoop