1
Job Management on Azure Batch/Scheduler
Duc-Huy Do and Koukeng Yang (Keng)
2
Outline
- Azure Batch/Scheduler Architecture Overview
- Azure Job Scheduling Overview
- Applications Suitable on Azure Batch/Scheduler
- Benchmarks of Azure for Parallel Computing
NOTES:
- 1-4: (I will start the introduction, if that's OK.) I'll go briefly over the outline and mention slides 3 and 4.
- 5:
- 6-11: job scheduling, virtual cluster (+ comparison with Xeno Server), job scheduling case study (parallel loading)
- 12:
- 13:
- 14-15: Azure benchmark
5
Azure Batch/Scheduler Architecture Overview
Azure Batch: for parallel HPC-style jobs in the cloud. Runs multiple jobs in parallel on an Azure cloud VM cluster.
Azure Scheduler: for recurring jobs on a specific interval. For example, run this job every day at 6:00 PM PST.
[Figure: architecture diagram with components numbered 1 to 7, labeled “Centralized job scheduler”]
NOTES: scheduler-service-part-i-introduction/
6
Azure Job Scheduling Overview
To run a job, you need to instantiate a few objects:
- Specify Batch pools (create the pool, pick the number of VMs needed, pick the specs, etc.)
- Specify Batch jobs and tasks (jobs are created and tasks are submitted to a pool; tasks are queued, then assigned to VMs)
- Frozen or failing tasks are detected and retried automatically

- Use multiple compute units
- A job is decomposed into multiple independent tasks
- Tasks are processed on separate compute nodes, simultaneously

- Create a job object (needs a job ID)
- Assign it to a pool
- Create tasks for the job
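The pool / job / task relationship above can be sketched in plain Python. This is a minimal local simulation, not the Azure Batch SDK: the names `Pool`, `Job`, `Task`, and `run_job` are hypothetical, worker threads stand in for VMs, and the retry policy (two retries) is an assumption chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    task_id: str
    run: Callable[[], object]  # the independent unit of work
    attempts: int = 0          # how many times the task was tried


@dataclass
class Job:
    job_id: str                # a job needs an id
    pool_id: str               # and is assigned to a pool
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Pool:
    pool_id: str
    node_count: int            # number of (simulated) VMs


def _run_with_retry(task: Task, max_retries: int):
    """Run one task, retrying on failure; return None if every attempt fails."""
    for attempt in range(1, max_retries + 2):
        task.attempts = attempt
        try:
            return task.run()
        except Exception:
            if attempt > max_retries:
                return None


def run_job(job: Job, pool: Pool, max_retries: int = 2) -> Dict[str, object]:
    """Queue every task of the job on the pool and collect results by task id."""
    if job.pool_id != pool.pool_id:
        raise ValueError("job is not assigned to this pool")
    # Hypothetical stand-in: one worker thread per simulated compute node.
    with ThreadPoolExecutor(max_workers=pool.node_count) as executor:
        futures = {
            t.task_id: executor.submit(_run_with_retry, t, max_retries)
            for t in job.tasks
        }
        return {task_id: f.result() for task_id, f in futures.items()}


# Usage: one healthy task and one task that fails on its first attempt.
flaky_calls = {"count": 0}


def flaky():
    flaky_calls["count"] += 1
    if flaky_calls["count"] < 2:
        raise RuntimeError("simulated node failure")
    return "recovered"


pool = Pool("demo-pool", node_count=3)
job = Job("demo-job", pool.pool_id)
job.tasks.append(Task("square", lambda: 7 * 7))
job.tasks.append(Task("flaky", flaky))
results = run_job(job, pool)
```

In the real service the pool, job, and tasks are created through the Batch API and the retry handling is done by the service; the sketch only mirrors the order of operations listed on the slide.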
7
Virtual Cluster: (some benefits)
- Virtual cluster nodes can be either physical machines or VMs, and multiple VMs running different OSes can share the same physical node.
- A VM runs a guest OS, which is often different from the host OS that manages the resources of the physical machine on which the VM runs.
- VMs consolidate multiple functionalities on the same server, which greatly improves server utilization and application flexibility.
- VMs can be replicated across multiple servers to promote distributed parallelism, fault tolerance, and disaster recovery.
- The number of nodes in a virtual cluster can grow or shrink dynamically, similar to the way an overlay network varies in size in a peer-to-peer network.
- The failure of a physical node may disable the VMs installed on it, but a VM failure won't pull down the host system.
Difference with Amazon Xeno Server: do they have a Management VM that acts as a Job Manager Server? A host OS manages the resources of the physical machine where the VM (with its guest OS) is running.
8
Use case: Parallel Data Loading with Azure Batch
- Get feed format info from metadata
- Create destination tables
- Get list of files to process
- Load parser class to use
- For each file to process:
  - Load file content from Blob storage
  - Parse file content to DataTable
  - Dump DataTable content to destination (DW)
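The per-feed steps above can be sketched as a single function. This is an illustrative sketch only: `METADATA`, `BLOB_STORAGE`, and `WAREHOUSE` are hypothetical in-memory dictionaries standing in for the metadata store, Azure Blob storage, and the data warehouse, and `DelimitedParser` is an assumed parser class, not one from the original project.

```python
import csv
import io

# Hypothetical in-memory stand-ins for the real services.
METADATA = {"sales": {"delimiter": ",", "columns": ["region", "amount"]}}
BLOB_STORAGE = {
    "sales/2016-01.csv": "east,10\nwest,20\n",
    "sales/2016-02.csv": "east,5\n",
}
WAREHOUSE = {}


class DelimitedParser:
    """Assumed parser class: turns delimited text into a list of row dicts."""

    def __init__(self, delimiter, columns):
        self.delimiter = delimiter
        self.columns = columns

    def parse(self, content):
        reader = csv.reader(io.StringIO(content), delimiter=self.delimiter)
        return [dict(zip(self.columns, row)) for row in reader if row]


def load_feed(feed_name):
    """Run the loading steps for one feed and return the destination row count."""
    fmt = METADATA[feed_name]                      # get feed format info
    table = WAREHOUSE.setdefault(feed_name, [])    # create destination table
    files = sorted(                                # get list of files to process
        name for name in BLOB_STORAGE if name.startswith(feed_name + "/")
    )
    parser = DelimitedParser(fmt["delimiter"], fmt["columns"])  # load parser
    for name in files:
        content = BLOB_STORAGE[name]               # load file content
        rows = parser.parse(content)               # parse to a table of rows
        table.extend(rows)                         # dump rows to destination
    return len(table)


loaded = load_feed("sales")
```

Because `load_feed` touches only its own feed's files and table, each call is independent of the others, which is what makes one task per feed a natural unit of parallel work.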
9
Get lists of feeds to process
Create a Job, create a task for each feed, add the tasks to the job, submit the job
12
Parallel Applications Suitable on Azure Batch/Scheduler
- Applications that are embarrassingly parallel
- Applications that don't require massive amounts of message passing and state management (each VM can only handle 32 messages asynchronously at a time)
- Examples of suitable parallel applications: image rendering, image analysis, Monte Carlo risk simulations, media transcoding, etc.
“Data Computation”
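As a small illustration of why Monte Carlo simulation is embarrassingly parallel, the sketch below splits a tail-probability estimate into independent tasks that share no state and exchange no messages; only the final counts are combined. The function names, the standard-normal loss model, and the 1.645 threshold are assumptions made for the example, and local threads stand in for compute nodes.

```python
import random
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 1.645  # assumed loss threshold (about the 95th percentile of N(0, 1))


def simulate_losses(seed, trials):
    """One independent task: count simulated losses above the threshold."""
    rng = random.Random(seed)  # private generator, so tasks share no state
    exceed = 0
    for _ in range(trials):
        if rng.gauss(0.0, 1.0) > THRESHOLD:
            exceed += 1
    return exceed


def estimate_tail_probability(num_tasks, trials_per_task):
    """Run the tasks independently and combine their counts at the end."""
    with ThreadPoolExecutor(max_workers=num_tasks) as executor:
        counts = list(
            executor.map(
                simulate_losses,
                range(num_tasks),               # one seed per task
                [trials_per_task] * num_tasks,
            )
        )
    return sum(counts) / (num_tasks * trials_per_task)


estimate = estimate_tail_probability(num_tasks=4, trials_per_task=20000)
```

The estimate should land close to 0.05. The only communication is the final sum, so the same decomposition maps directly onto one task per compute node.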
13
Azure Parallel Computing Test Benchmarks
Bioinformatic genome sequence assembly test on Azure and Windows HPC Cluster3
Windows HPC Cluster Specs
- total of 24 CPU cores
- total of 24 GB of memory
Azure Cluster Specs
- total of Ghz each
- total of 14 GB of RAM
Data Set Size
- Small (1.4 GB)
- Medium (3.75 GB)
- Large (10.9 GB)
14
Simulation Runner: A cloud-based Parallel and Distributed HPC platform
15
Improvement with cache (Azure Blob Storage)
16
References (need to be changed)
[1] B. S. Đorđević, S. P. Jovanović, and V. V. Timčenko, “Cloud Computing in Amazon and Microsoft Azure platforms: Performance and service comparison,” in Telecommunications Forum Telfor (TELFOR), nd, 2014, pp. 931–934.
[2] B. D. Martino, G. Cretella, A. Esposito, and R. G. Sperandeo, “Semantic Representation of Cloud Services: A Case Study for Microsoft Windows Azure,” in International Conference on Intelligent Networking and Collaborative Systems (INCoS), 2014, pp. 647–652.
[3] G. Zhang, Y. Yao, and C. Zheng, “HPC Environment on Azure Cloud for Hydrological Parameter Estimation,” in Computational Science and Engineering (CSE), 2014 IEEE 17th International Conference on, pp , 2014.
[4] M. Bihis and S. Roychowdhury, “A generalized flow for multi-class and binary classification tasks: An Azure ML approach,” in 2015 IEEE International Conference on Big Data (Big Data), 2015, pp. 1728–1737.
[5] P. Subhashini and S. Nalla, “Data retrieval mechanism using Amazon simple storage service and Windows Azure,” in rd International Conference on Computing for Sustainable Global Development (INDIACom), 2016, pp. 412–414.
[6] V. Persico, P. Marchetta, A. Botta, and A. Pescape, “On Network Throughput Variability in Microsoft Azure Cloud,” in 2015 IEEE Global Communications Conference (GLOBECOM), 2015, pp. 1–6.
[7] Z. Liu, H. Zou, and W. Ye, “Simulation Runner: A Cloud-Based Parallel and Distributed HPC Platform,” in 2015 IEEE 8th International Conference on Cloud Computing, 2015, pp. 885–892.