Job Management on Azure Batch/Scheduler

Duc-Huy Do and Koukeng Yang (Keng)

Outline
- Azure Batch/Scheduler architecture overview
- Azure job scheduling overview
- Applications suitable for Azure Batch/Scheduler
- Benchmarks of Azure for parallel computing

Speaker notes: Slides 1 to 4: I will start the introduction, go briefly over the outline, and mention slides 3 and 4. Slides 5 to 11: job scheduling, virtual clusters (with a comparison to XenoServer), and the parallel data loading case study. Slides 12 to 15: Azure benchmarks.

Azure Batch/Scheduler Architecture Overview
- Azure Batch: for parallel, HPC-style jobs in the cloud. It runs multiple jobs in parallel on a cluster of Azure VMs.
- Azure Scheduler: for jobs that recur at a specific interval, for example, "run this job every day at 6:00 PM PST."
[Figure: "Centralized job scheduler"]
Notes:
https://azure.microsoft.com/en-us/documentation/articles/scheduler-intro/
http://gauravmantri.com/2013/11/10/windows-azure-scheduler-service-part-i-introduction/
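
Azure Scheduler works out the recurrence for you; the sketch below only illustrates what "every day at 6:00 PM PST" means in terms of next run times. It is not the Azure Scheduler API. The helper names `next_daily_run` and `upcoming_runs` are hypothetical, and "PST" is assumed to be a fixed UTC-8 offset, so daylight saving time is deliberately ignored.

```python
from datetime import datetime, timedelta, timezone
from typing import List

# Fixed UTC-8 offset standing in for "PST". This ignores daylight saving time,
# which a real scheduler would handle with a time-zone database.
PST = timezone(timedelta(hours=-8), "PST")


def next_daily_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Return the next hour:minute occurrence (in now's time zone) strictly after now."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the job runs tomorrow.
        candidate += timedelta(days=1)
    return candidate


def upcoming_runs(now: datetime, hour: int, minute: int, count: int) -> List[datetime]:
    """List the next `count` occurrences of a daily recurring job."""
    first = next_daily_run(now, hour, minute)
    return [first + timedelta(days=i) for i in range(count)]


if __name__ == "__main__":
    now = datetime(2016, 11, 10, 17, 30, tzinfo=PST)
    for run in upcoming_runs(now, hour=18, minute=0, count=3):
        print(run.isoformat())
```

The comparison uses `<=`, so a job whose slot is exactly "now" is scheduled for the following day rather than fired twice.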

Azure Job Scheduling Overview
To run a job, you need to instantiate a few objects:
- Batch pool: create the pool and choose the number of VMs and their specifications.
- Batch job and tasks: a job is created and its tasks are submitted to a pool; the tasks are queued and then assigned to VMs. Frozen or failing tasks are detected and retried automatically.
Key ideas:
- Use multiple compute units.
- A job is decomposed into multiple independent tasks.
- Tasks are processed simultaneously on separate compute nodes.
Steps:
1. Create a job object (it needs a job ID).
2. Assign it to a pool.
3. Create tasks for the job.
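
The pool/job/task model above can be sketched locally to show how the pieces relate. This is a toy, in-process model, not the Azure Batch SDK: `Pool`, `Job`, `Task`, `run_job`, `make_flaky`, and the `max_retries` parameter are all hypothetical names. Worker threads stand in for VMs, the executor's queue stands in for the Batch task queue, and only failing tasks are retried (frozen-task detection via timeouts is not modeled).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Pool:
    pool_id: str
    node_count: int          # number of simulated VMs in the pool
    vm_size: str = "small"   # hypothetical label for the VM specification


@dataclass
class Task:
    task_id: str
    action: Callable[[], Any]    # the independent unit of work
    state: str = "queued"        # queued -> completed | failed
    attempts: int = 0
    result: Any = None
    error: Optional[str] = None


@dataclass
class Job:
    job_id: str
    pool_id: str                 # the pool this job is assigned to
    tasks: List[Task] = field(default_factory=list)


def run_task(task: Task, max_retries: int) -> Task:
    """Run one task, retrying up to max_retries times after the first failure."""
    for attempt in range(1, max_retries + 2):
        task.attempts = attempt
        try:
            task.result = task.action()
            task.state = "completed"
            task.error = None
            return task
        except Exception as exc:  # failure detected; loop retries if attempts remain
            task.error = repr(exc)
    task.state = "failed"
    return task


def run_job(job: Job, pool: Pool, max_retries: int = 2) -> Job:
    """Execute a job's tasks concurrently, one worker thread per simulated VM."""
    if job.pool_id != pool.pool_id:
        raise ValueError("job is not assigned to this pool")
    # The executor's internal work queue plays the role of the task queue.
    with ThreadPoolExecutor(max_workers=pool.node_count) as executor:
        list(executor.map(lambda t: run_task(t, max_retries), job.tasks))
    return job


def make_flaky(failures: int, value: Any) -> Callable[[], Any]:
    """Build an action that raises `failures` times before returning value."""
    calls = {"n": 0}

    def action() -> Any:
        calls["n"] += 1
        if calls["n"] <= failures:
            raise RuntimeError("transient failure")
        return value

    return action


if __name__ == "__main__":
    pool = Pool("pool-1", node_count=3)
    job = Job("job-1", pool_id="pool-1")
    job.tasks = [Task(f"task-{i}", (lambda i=i: i * i)) for i in range(5)]
    job.tasks.append(Task("task-flaky", make_flaky(1, "ok")))
    job.tasks.append(Task("task-broken", make_flaky(10, "never")))
    run_job(job, pool, max_retries=2)
    for t in job.tasks:
        print(t.task_id, t.state, t.attempts, t.result)
```

Because the tasks are independent, the model needs no coordination between workers beyond the shared queue; each task's state is only touched by the thread running it.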

Virtual Cluster: Some Benefits
- Virtual cluster nodes can be either physical machines or VMs, and multiple VMs running different OSes can share the same physical node.
- A VM runs a guest OS, which is often different from the host OS that manages the resources of the physical machine on which the VM runs.
- VMs consolidate multiple functionalities on the same server, which greatly improves server utilization and application flexibility.
- VMs can be replicated across multiple servers to promote distributed parallelism, fault tolerance, and disaster recovery.
- The number of nodes in a virtual cluster can grow or shrink dynamically, much as an overlay network varies in size in a peer-to-peer network.
- The failure of a physical node may disable the VMs installed on it, but a VM failure does not bring down the host system.
Difference from Amazon XenoServer: it has a management VM that may act as a job manager server, and a host OS that manages the resources of the physical machine on which the VM (with its guest OS) runs.
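
The elasticity and failure-isolation points above can be illustrated with a small placement model. This is a hypothetical sketch, not how Azure tracks VMs: `VirtualCluster` maps each VM name to the physical host it runs on, and replicas are assumed to be named `<service>-r<i>`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class VirtualCluster:
    """Hypothetical model of a virtual cluster: VM name -> physical host."""
    placement: Dict[str, str] = field(default_factory=dict)

    def add_vm(self, vm: str, host: str) -> None:
        """Grow the cluster by starting a VM on a physical host."""
        self.placement[vm] = host

    def remove_vm(self, vm: str) -> None:
        """Shrink the cluster or drop a failed VM; its host keeps running."""
        self.placement.pop(vm, None)

    def place_replicas(self, service: str, hosts: List[str]) -> None:
        """Replicate a service's VM on distinct hosts for fault tolerance."""
        if len(set(hosts)) != len(hosts):
            raise ValueError("replicas must be placed on distinct hosts")
        for i, host in enumerate(hosts):
            self.add_vm(f"{service}-r{i}", host)

    def fail_host(self, host: str) -> Set[str]:
        """A physical host failure disables only the VMs placed on that host."""
        lost = {vm for vm, h in self.placement.items() if h == host}
        for vm in lost:
            del self.placement[vm]
        return lost

    def live_replicas(self, service: str) -> List[str]:
        """Replicas of a service that are still running."""
        prefix = f"{service}-r"
        return sorted(vm for vm in self.placement if vm.startswith(prefix))

    def size(self) -> int:
        return len(self.placement)


if __name__ == "__main__":
    cluster = VirtualCluster()
    cluster.add_vm("web-1", "host-a")
    cluster.add_vm("batch-1", "host-b")
    cluster.place_replicas("db", ["host-a", "host-b"])
    print(sorted(cluster.fail_host("host-a")), cluster.live_replicas("db"))
```

Placing replicas on distinct hosts is what lets the "db" service survive the loss of `host-a` in the usage example.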

Use case: Parallel Data Loading with Azure Batch
1. Get the feed format information from metadata.
2. Create the destination tables.
3. Get the list of files to process.
4. Load the parser class to use.
5. For each file to process:
   a. Load the file content from Blob storage.
   b. Parse the file content into a DataTable.
   c. Dump the DataTable content to the destination data warehouse (DW).
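
The loading steps above can be sketched end to end with local stand-ins, assuming a dict of blob name to text plays the role of Blob storage, a list of row tuples plays the role of a DataTable, and an in-memory SQLite database plays the role of the data warehouse. `FEED_METADATA`, `PARSERS`, and `load_feed` are hypothetical names, not part of any Azure library.

```python
import csv
import io
import sqlite3
from typing import Callable, Dict, List, Tuple

# Hypothetical feed metadata: format, destination table, and column layout.
FEED_METADATA: Dict[str, Dict] = {
    "sales": {"table": "sales", "format": "csv", "columns": ["region", "amount"]},
}


def parse_csv(content: str, columns: List[str]) -> List[Tuple[str, ...]]:
    """Parse CSV text into rows, a minimal stand-in for a DataTable."""
    reader = csv.DictReader(io.StringIO(content))
    return [tuple(row[c] for c in columns) for row in reader]


# Registry used to pick the "parser class to use" for a feed's format.
PARSERS: Dict[str, Callable[[str, List[str]], List[Tuple[str, ...]]]] = {"csv": parse_csv}


def load_feed(feed: str, blobs: Dict[str, str], conn: sqlite3.Connection) -> int:
    """Load every file of one feed into the destination; returns rows loaded."""
    meta = FEED_METADATA[feed]                        # 1. format info from metadata
    table, columns = meta["table"], meta["columns"]
    # 2. create the destination table (names come from trusted metadata only)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(columns)})")
    files = sorted(n for n in blobs if n.startswith(f"{feed}/"))  # 3. files to process
    parser = PARSERS[meta["format"]]                  # 4. parser to use
    placeholders = ", ".join("?" for _ in columns)
    loaded = 0
    for name in files:                                # 5. for each file
        content = blobs[name]                         # a. load content from "Blob storage"
        rows = parser(content, columns)               # b. parse into rows
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)  # c. dump
        loaded += len(rows)
    conn.commit()
    return loaded


if __name__ == "__main__":
    blobs = {"sales/2016-01.csv": "region,amount\nwest,10\neast,20\n"}
    with sqlite3.connect(":memory:") as conn:
        print(load_feed("sales", blobs, conn))
```

Values are inserted with `?` placeholders; only the table and column names, which come from the metadata rather than the files, are interpolated into the SQL.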

1. Get the list of feeds to process.
2. Create a job.
3. Create a task for each feed and add the tasks to the job.
4. Submit the job.
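
The fan-out from feeds to tasks can be sketched as building a plain job description before submission. The shape of the dictionary, the `load_feed.py` script, and the 64-character ID limit are assumptions for illustration; actual submission would go through the Batch service client and is not shown.

```python
import re
import shlex
from typing import Dict, List

MAX_ID_LENGTH = 64  # assumed limit on task ID length


def task_id_for(feed: str) -> str:
    """Derive a task ID from a feed name, keeping only letters, digits, '-' and '_'."""
    safe = re.sub(r"[^A-Za-z0-9_-]", "-", feed)
    return f"load-{safe}"[:MAX_ID_LENGTH]


def build_job(job_id: str, pool_id: str, feeds: List[str]) -> Dict:
    """Create a job description with one task per feed, ready to submit."""
    tasks = []
    seen = set()
    for feed in feeds:
        task_id = task_id_for(feed)
        if task_id in seen:
            raise ValueError(f"two feeds map to task id {task_id!r}")
        seen.add(task_id)
        tasks.append({
            "id": task_id,
            # hypothetical loader script assumed to be installed on every pool node
            "command_line": f"python load_feed.py --feed {shlex.quote(feed)}",
        })
    return {"id": job_id, "pool_id": pool_id, "tasks": tasks}


if __name__ == "__main__":
    job = build_job("daily-load", "pool-1", ["sales", "web logs"])
    for task in job["tasks"]:
        print(task["id"], "->", task["command_line"])
```

Checking for duplicate IDs up front matters because sanitizing names can map two different feeds to the same task ID.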

Parallel Applications Suitable for Azure Batch/Scheduler
- Applications that are embarrassingly parallel.
- Applications that do not require massive amounts of message passing and state management (for example, a VM polling Azure Queue storage can retrieve at most 32 messages per request).
- Example suitable parallel applications: image rendering, image analysis, Monte Carlo risk simulations, media transcoding, etc.
"Data Computation"
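
A Monte Carlo risk simulation shows why such workloads fit: each task needs only a seed and a sample count, and the only communication is one final sum. The sketch assumes, purely for illustration, that returns are standard normal and estimates the probability of a loss worse than 1.645 standard deviations (about 5%). The loop runs the chunks locally; in a Batch job each `simulate_chunk` call could be a separate task on its own VM. The function names are hypothetical.

```python
import random
from typing import List


def simulate_chunk(seed: int, n: int, threshold: float) -> int:
    """One independent task: count simulated returns that fall below threshold."""
    rng = random.Random(seed)  # per-task seed keeps tasks independent and reproducible
    return sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) < threshold)


def estimate_tail_probability(num_tasks: int, n_per_task: int, threshold: float,
                              base_seed: int = 0) -> float:
    """Fan out independent chunks, then combine their counts in one final step."""
    counts: List[int] = [simulate_chunk(base_seed + i, n_per_task, threshold)
                         for i in range(num_tasks)]
    return sum(counts) / (num_tasks * n_per_task)


if __name__ == "__main__":
    print(estimate_tail_probability(num_tasks=4, n_per_task=50_000, threshold=-1.645))
```

Adding more tasks reduces the sampling error without adding any message passing between them.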

Azure Parallel Computing Test Benchmarks
Bioinformatic genome sequence assembly test on Azure and a Windows HPC cluster [3]
Windows HPC cluster specs:
- 24 CPU cores in total
- 24 GB of memory in total
Azure cluster specs:
- 8 cores in total, 1.6 GHz each
- 14 GB of RAM in total
Data set sizes:
- Small: 1.4 GB
- Medium: 3.75 GB
- Large: 10.9 GB
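
A quick per-core comparison helps read these specs. The figures below are taken from the slide; the data-per-core numbers assume, as a simplification, that each data set is split evenly across all cores.

```python
# Cluster specs and data set sizes as listed on the benchmark slide.
CLUSTERS = {
    "Windows HPC Cluster": {"cores": 24, "memory_gb": 24.0},
    "Azure Cluster": {"cores": 8, "memory_gb": 14.0},
}
DATA_SETS_GB = {"Small": 1.4, "Medium": 3.75, "Large": 10.9}


def memory_per_core(cluster: str) -> float:
    """Total memory divided by total cores, in GB per core."""
    spec = CLUSTERS[cluster]
    return spec["memory_gb"] / spec["cores"]


def data_per_core(cluster: str, data_set: str) -> float:
    """GB of input per core, assuming the data set is split evenly across cores."""
    return DATA_SETS_GB[data_set] / CLUSTERS[cluster]["cores"]


if __name__ == "__main__":
    for name in CLUSTERS:
        print(f"{name}: {memory_per_core(name):.2f} GB/core, "
              f"Large data set {data_per_core(name, 'Large'):.2f} GB/core")
```

The Azure cluster has more memory per core (1.75 GB versus 1.0 GB), but with one third as many cores each core handles three times as much input.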

Simulation Runner: A Cloud-Based Parallel and Distributed HPC Platform [7]

Improvement with cache (Azure Blob Storage)

References
[1] B. S. Đorđević, S. P. Jovanović, and V. V. Timčenko, “Cloud Computing in Amazon and Microsoft Azure platforms: Performance and service comparison,” in 2014 22nd Telecommunications Forum Telfor (TELFOR), 2014, pp. 931–934.
[2] B. Di Martino, G. Cretella, A. Esposito, and R. G. Sperandeo, “Semantic Representation of Cloud Services: A Case Study for Microsoft Windows Azure,” in 2014 International Conference on Intelligent Networking and Collaborative Systems (INCoS), 2014, pp. 647–652.
[3] G. Zhang, Y. Yao, and C. Zheng, “HPC Environment on Azure Cloud for Hydrological Parameter Estimation,” in 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), 2014, pp. 299–304.
[4] M. Bihis and S. Roychowdhury, “A generalized flow for multi-class and binary classification tasks: An Azure ML approach,” in 2015 IEEE International Conference on Big Data (Big Data), 2015, pp. 1728–1737.
[5] P. Subhashini and S. Nalla, “Data retrieval mechanism using Amazon simple storage service and Windows Azure,” in 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), 2016, pp. 412–414.
[6] V. Persico, P. Marchetta, A. Botta, and A. Pescapè, “On Network Throughput Variability in Microsoft Azure Cloud,” in 2015 IEEE Global Communications Conference (GLOBECOM), 2015, pp. 1–6.
[7] Z. Liu, H. Zou, and W. Ye, “Simulation Runner: A Cloud-Based Parallel and Distributed HPC Platform,” in 2015 IEEE 8th International Conference on Cloud Computing, 2015, pp. 885–892.