Final Project: Video Transcoding on Cloud Environments Queenie Wong CMPT 880.



Introduction
• Cloud computing technology has matured and is accessible to the public.
• Many complex, computationally intensive operations can be distributed and processed in cloud environments.
• Complex video transcoding operations can be distributed to available nodes for parallel processing.
  - The encoding time of an MJPEG file was reduced from 7.5 hours to 2 minutes by scaling up to 6 nodes.

Problem Statements
• Excess key-frame problem: splitting the original video file at an inappropriate position can create excess key frames in the final output.
• How to perform video transcoding on Hadoop
  - Hadoop has no native video transcoding library.
• Finding the right level of parallelism and split size for map tasks
  - Split size too small: synchronization overhead
  - Split size too large: lack of dynamic load balancing

Proposed Solutions
• Excess key-frame problem
  - Solution: the MKVmerge program
• Video transcoding on Hadoop
  - Solution: the FFmpeg transcoding tool
• The right level of parallelism and split size for map tasks
  - Solutions:
    • A performance indicator to measure the efficiency of the parallelism level
    • A video transcoding performance test to find the optimal split size
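The split, transcode, and merge steps named on this slide can be sketched as three command builders. This is a minimal sketch under stated assumptions, not the project's actual code: the function names, the file names, and the `libx264` target codec are hypothetical choices for illustration. The functions only build argument lists; nothing runs until a list is passed to something like `subprocess.run`.

```python
def split_command(source, output, split_size_mb=64):
    """Build an mkvmerge call that splits `source` into pieces of roughly
    `split_size_mb` megabytes. mkvmerge places the cuts at key frames."""
    return ["mkvmerge", "-o", str(output),
            "--split", f"size:{split_size_mb}M", str(source)]


def transcode_command(chunk, output, video_codec="libx264"):
    """Build an ffmpeg call that re-encodes the video stream of one chunk.
    The codec is an assumption; the slides do not name a target format."""
    return ["ffmpeg", "-i", str(chunk),
            "-c:v", video_codec, "-c:a", "copy", str(output)]


def merge_command(parts, output):
    """Build an mkvmerge call that appends the transcoded parts in order."""
    if not parts:
        raise ValueError("at least one part is required")
    cmd = ["mkvmerge", "-o", str(output), str(parts[0])]
    for part in parts[1:]:
        cmd += ["+", str(part)]  # a standalone "+" appends the next file
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the call order.
    print(split_command("input.mkv", "chunk.mkv"))
    print(transcode_command("chunk-001.mkv", "out-001.mkv"))
    print(merge_command(["out-001.mkv", "out-002.mkv"], "final.mkv"))
```

In a Hadoop setting, each map task would presumably run the transcode step on one chunk, with the split and merge steps happening before and after the job.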

Video Transcoding Performance Test
• Studies suggested:
  - The right level of parallelism for maps seems to be around maps/node
  - The size of each map task is roughly 16 MB to 64 MB
• Performance tests were run against different parallelism levels and split sizes to find the ideal setup for video transcoding.
• Implementation:
  - Apache Hadoop, FFmpeg for video transcoding, and MKVmerge for splitting video at key-frame boundaries
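How split size and node count interact can be seen with a little arithmetic. The sketch below assumes one map task per split and an even spread of tasks across nodes; the function name and the sample file size are hypothetical, not taken from the test setup.

```python
import math


def plan_map_tasks(file_size_mb, split_size_mb, nodes):
    """Return (number of splits, average map tasks per node).

    Assumes one map task per split, so a smaller split size means more
    tasks and more scheduling overhead, while a larger split size means
    fewer tasks available for balancing load across nodes.
    """
    if file_size_mb <= 0 or split_size_mb <= 0 or nodes <= 0:
        raise ValueError("all arguments must be positive")
    splits = math.ceil(file_size_mb / split_size_mb)
    return splits, splits / nodes


if __name__ == "__main__":
    # Hypothetical 1 GB input on 4 nodes, at the two split sizes
    # mentioned on the slide.
    for split_size in (16, 64):
        splits, per_node = plan_map_tasks(1024, split_size, 4)
        print(f"{split_size} MB splits: {splits} tasks, {per_node} per node")
```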

Parallel Nodes Test Results

Threshold for Adding Extra Nodes
• Threshold: 5% of the initial processing time (1 node)
• Reduction: (initial time - current time) / initial time
• Efficiency of parallelism level: (previous time - current time) / initial time
  - Example: ( ) / = 3.7%
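The two formulas above translate directly into code. In this sketch the function names and the sample timings are hypothetical, and treating "efficiency at or above the 5% threshold" as the test for whether another node is worthwhile is an assumed reading of the slide.

```python
def reduction(initial_time, current_time):
    """Total reduction relative to the single-node run."""
    return (initial_time - current_time) / initial_time


def efficiency(initial_time, previous_time, current_time):
    """Gain from the most recently added node, as a fraction of the
    single-node processing time."""
    return (previous_time - current_time) / initial_time


def worth_adding_node(initial_time, previous_time, current_time,
                      threshold=0.05):
    """Assumed rule: the extra node pays off if its efficiency reaches
    the threshold (5% of the initial processing time)."""
    return efficiency(initial_time, previous_time, current_time) >= threshold


if __name__ == "__main__":
    # Hypothetical processing times in seconds for 1, 2, 3, ... nodes.
    times = [1000, 520, 360, 290, 250]
    initial = times[0]
    for nodes in range(2, len(times) + 1):
        previous, current = times[nodes - 2], times[nodes - 1]
        print(nodes,
              round(reduction(initial, current), 3),
              round(efficiency(initial, previous, current), 3),
              worth_adding_node(initial, previous, current))
```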

Split Size Test Results

Challenges
• Tuning programs for files of different sizes
• Getting familiar with FFmpeg for video transcoding operations
• Repositioning to key-frame boundaries
• Measurement variance caused by I/O operations
• Unstable software
• Learning, setting up, using, and debugging Hadoop and MapReduce programs within a short timeframe

Conclusion
• Video transcoding performance diminishes when the system overhead exceeds the benefit of parallel processing.
• An efficiency indicator is proposed to measure the efficiency of the level of parallelism.
• The optimal split size for transcoding is 64 MB for files of common sizes.
• Load balancing should be enforced on every node to maximize the benefit of parallel processing.
