Chavit Denninnart, Mohsen Amini Salehi and Xiangbo Li


Leveraging Computational Reuse for Cost- and QoS-Efficient Task Scheduling in Clouds
Chavit Denninnart, Mohsen Amini Salehi and Xiangbo Li
High Performance Cloud Computing Lab, School of Computing and Informatics, University of Louisiana Lafayette
Presented by Adel Nadjaran Toosi, Faculty of Information Technology, Monash University
13 Nov 2018

Introduction A cloud-based computing system generally contains a back-end scheduler and multiple worker nodes. Resources are not truly limitless (cost limitations, etc.), so the system can become oversubscribed.

Introduction Resources are not truly limitless (cost limitations, etc.), so the system can become oversubscribed. Tasks have deadlines to meet; otherwise, QoE suffers. Identical or similar tasks exist, so merging them can reduce computation. However, carelessly merging or re-ordering tasks can also cause tasks to miss their deadlines.

Motivation We are motivated by video streaming systems that process videos on demand. Not all versions of the same video, in different codecs and settings, are preprocessed. Segments of popular videos in popular settings are preprocessed and ready to use; rarely accessed videos and rarely accessed settings are processed on the fly.

Background: Video Transcoding Transcoding is converting a video file from one format to another. Types of tasks in our scenario: bit-rate adjustment, spatial resolution reduction, temporal resolution reduction (frame rate), and video compression standard conversion.

Background: Video Transcoding A video stream is structured as sequences, segment headers, GOPs (Groups of Pictures), I-frames, P-frames, B-frames, and macroblocks. Each transcoding task operates at the GOP level: a whole video is too coarse-grained, while a single frame is too fine-grained.

Background: CVSE (Cloud-based Video Streaming Engine)

Module Architecture

System Model and Problem Definition We assume a homogeneous computing system. A caching system co-exists, but our system is not aware of it; already-cached video segments do not become transcoding tasks. The system may get oversubscribed during peak hours, without scaling resources up/out to cope with the workload: users may have a limited budget or limited resource scaling available. The question is: "How to reduce oversubscription?"

Contribution of This Work Proposing an efficient way of identifying potentially mergeable tasks. Determining the appropriateness and potential side-effects of merging tasks. Analyzing the impact of the task aggregation mechanism on viewers' QoE and on the time cloud resources (VMs) are deployed.

Similarity Levels Task-level similarity: same GOP, same process, same parameters; the duplicate is deduplicated out, with no extra overhead for the merged task. Operation-level similarity: same GOP, same process. Data-level similarity: same GOP. Operation-level example: ffmpeg -i inputfile -s resolution1 output1 -s resolution2 output2
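As an illustration, one lookup key per similarity level can be derived from a request's signature. This is a minimal Python sketch, not the CVSE implementation; the field names (`gop_id`, `operation`, `params`) are assumptions:

```python
import hashlib

def similarity_keys(gop_id, operation, params):
    """Build one lookup key per similarity level.

    Task level:      same GOP, same operation, same parameters
    Operation level: same GOP, same operation
    Data level:      same GOP
    """
    def h(*parts):
        return hashlib.sha1("|".join(parts).encode()).hexdigest()
    return {
        "task": h(gop_id, operation, params),
        "operation": h(gop_id, operation),
        "data": h(gop_id),
    }
```

Two requests for the same GOP and operation but different parameters then collide at the operation level only, which is exactly the case the ffmpeg example above exploits (one decode, several encodes).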

Steps (1) Find mergeable pairs; (2) Determine merge appropriateness; (3) Task aggregation. We focus on the first two; task aggregation itself is application-specific. We have implemented it for on-demand video streaming.

Mergeable Tasks Detection Maintain three hash tables, one per similarity level, whose keys cover all tasks in the queue; each hash entry points to a task object. Tasks are merged upon arrival: admission control creates three keys from the request signature and checks them against the existing hash entries.

Mergeable Tasks Detection Search for the keys in the hash tables in order from maximum to least reusability: Task -> Operation -> Data. If a hash entry exists, a merge candidate is found in O(1). Hash table maintenance: remove entries when their tasks are executed, and update the tables when tasks are merged.
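The detection procedure above can be sketched with three plain dictionaries, probed from the most to the least reusable level. This is a simplified model of the mechanism, not the CVSE code:

```python
class MergeDetector:
    """Three hash tables, one per similarity level, keyed over all queued tasks."""

    LEVELS = ("task", "operation", "data")  # maximum reusability -> least

    def __init__(self):
        self.tables = {level: {} for level in self.LEVELS}

    def add(self, task, keys):
        # keys: dict mapping each level name to this task's hash key
        for level in self.LEVELS:
            self.tables[level].setdefault(keys[level], task)

    def find_candidate(self, keys):
        # O(1): probe the levels from maximum to least reusability
        for level in self.LEVELS:
            hit = self.tables[level].get(keys[level])
            if hit is not None:
                return level, hit
        return None, None

    def remove(self, keys):
        # called when a task starts executing, or when a merge updates the queue
        for level in self.LEVELS:
            self.tables[level].pop(keys[level], None)
```

With constant-time dictionary lookups, admission control can test every arriving task against the whole queue without scanning it.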

Merge Appropriateness Evaluation Only operation- and data-level similarities need this evaluation. An aggregated task becomes one object for the scheduling system. The aggregated task still takes longer to execute than each individual task, and can therefore cause tasks already in the queue, or tasks behind it, to miss their deadlines.

Merge Appropriateness Evaluation Procedure: two scenarios are checked: one where the candidate tasks are merged, and another where they stay separate (i.e., not aggregated). Estimate the completion time of each task in both scenarios. If merging imposes additional deadline misses over the non-merged scenario, DO NOT MERGE; if it causes no additional deadline misses, MERGE.
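The two-scenario check can be sketched as follows. This is a simplified Python model assuming a single worker executing the queue in order (the real system models multiple VMs); the merged-task placement and deadline rules here are assumptions for illustration:

```python
def deadline_misses(tasks, now=0.0):
    """Count tasks whose estimated completion time exceeds their deadline.
    tasks: list of (exec_time, deadline) pairs, executed in queue order."""
    t, misses = now, 0
    for exec_time, deadline in tasks:
        t += exec_time
        if t > deadline:
            misses += 1
    return misses

def should_merge(queue, i, j, merged_time):
    """Compare deadline misses with and without merging tasks i and j (i < j).
    The merged task takes task i's queue position and the earlier deadline."""
    separate = deadline_misses(queue)
    merged_queue = [t for k, t in enumerate(queue) if k not in (i, j)]
    merged_deadline = min(queue[i][1], queue[j][1])
    merged_queue.insert(i, (merged_time, merged_deadline))
    return deadline_misses(merged_queue) <= separate
```

The decision rule is exactly the slide's: merge only when the merged scenario's estimated miss count does not exceed the separate scenario's.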

Merge Appropriateness Evaluation

Experimental Setup CVSE runs in simulation mode: 8 VMs are simulated with VM scaling turned off. Scheduling is performed for real, but transcoding is not actually executed; transcoding times are sampled using the mean and standard deviation gathered from benchmarking the transcoding operations on each individual GOP on an Amazon g2.2xlarge instance. We compare the system with and without task merging under three scheduling policies: FCFS, EDF, and Max Urgency (MU). Max Urgency sorts tasks by the latest time they can be started and still meet their deadlines.
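The Max Urgency ordering described above can be sketched as a sort on each task's latest feasible start time. A minimal Python sketch; the task fields are assumed, not taken from CVSE:

```python
def max_urgency_order(tasks):
    """Sort tasks by latest feasible start time: deadline - estimated exec time.
    Tasks that must start soonest to meet their deadline come first."""
    return sorted(tasks, key=lambda t: t["deadline"] - t["exec_time"])

queue = [
    {"id": "a", "exec_time": 4.0, "deadline": 10.0},  # must start by 6.0
    {"id": "b", "exec_time": 1.0, "deadline": 3.0},   # must start by 2.0
    {"id": "c", "exec_time": 2.0, "deadline": 9.0},   # must start by 7.0
]
```

Unlike EDF, which looks only at deadlines, this ordering also accounts for how long each task takes, so a long task with a moderate deadline can outrank a short task with an earlier one.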

Experimental Results Makespan in seconds: merging yields a 14% saving, i.e., less VM time for the same amount of work.

Experimental Results DMR = Deadline Miss Rate. The deadline miss rate reflects viewers' QoE violations; a lower deadline miss rate means better average QoE.

Conclusions Reducing cumulative execution time reduces the deadline miss rate. Candidate tasks are detected in O(1). We saved more than 14% of execution time and dramatically reduced the deadline miss rate.

Future Work A tunable merge-aggressiveness factor: aggregate tasks more aggressively to save more overall computing power, or conservatively so as not to cause deadline violations for the tasks involved. Heterogeneous machine systems. Workflow scenarios using Directed Acyclic Graphs to perform additional computational reuse.

Thank you Questions?