Job Scheduling and Runtime in DLWorkspace

Cloud Computing and Storage Group
July 7th, 2017
Contact: Hongzhi Li (hongzl@microsoft.com), Jin Li (jinl@microsoft.com)

System Diagram
[Diagram components: Web Portal, RestfulAPI, SQL server, Job Manager, K8s Master API, Cluster]

Web Portal:
- Authentication
- Get job parameters from users and submit the request to RestfulAPI
- Browse and manage the existing jobs
- Monitor the cluster status
- etc.

RestfulAPI:
- Process the requests from the web portal
  - SubmitJob
  - ListJobs
  - KillJob
  - GetJobDetail
  - GetClusterStatus
  - ApproveJob
  - etc.
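The operations listed for the RestfulAPI can be pictured as a small dispatch table over a job store. The following is only a minimal sketch, not the DLWorkspace implementation: the `JobStore` class, the in-memory dictionary standing in for the SQL server, the field names, and the status strings `queued` and `killed` are all assumptions made for illustration.

```python
import uuid


class JobStore:
    """In-memory stand-in for the SQL server job table (assumption)."""

    def __init__(self):
        self.jobs = {}

    def submit_job(self, user, params):
        # Record a new job; the job manager would pick it up later.
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = {
            "jobId": job_id,
            "user": user,
            "params": dict(params),
            "status": "queued",  # hypothetical status name
        }
        return job_id

    def list_jobs(self, user):
        return [job for job in self.jobs.values() if job["user"] == user]

    def get_job_detail(self, job_id):
        return self.jobs.get(job_id)

    def kill_job(self, job_id):
        job = self.jobs.get(job_id)
        if job is None:
            return False
        job["status"] = "killed"  # hypothetical status name
        return True


def handle_request(store, operation, **kwargs):
    """Route an operation name from the slide to a store method."""
    handlers = {
        "SubmitJob": store.submit_job,
        "ListJobs": store.list_jobs,
        "GetJobDetail": store.get_job_detail,
        "KillJob": store.kill_job,
    }
    if operation not in handlers:
        raise ValueError("unsupported operation: " + operation)
    return handlers[operation](**kwargs)
```

A request such as `handle_request(store, "SubmitJob", user="alice", params={"image": "nvidia/cuda"})` returns the new job id, which the other operations then accept.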

Cluster Manager:
- Job manager
  - Get newly submitted jobs from SQL server, generate a k8s pod description file, and submit it to the k8s master API. The pod description file is generated from templates.
  - Query job status from the k8s API and update the job status in SQL server
  - etc.
- Log manager
- Node manager
- User manager
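The two job manager duties above (render a pod description from a template, then map the status reported by k8s back to a job status) might look roughly like the sketch below. It assumes a JSON template rendered with Python's `string.Template`; the template contents, the job field names (`jobId`, `image`, `command`), and the job status names are hypothetical. Only the pod phase names (Pending, Running, Succeeded, Failed, Unknown) come from Kubernetes itself.

```python
import json
from string import Template

# Hypothetical pod template; the real DLWorkspace templates are not shown in the slides.
POD_TEMPLATE = Template("""{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {"name": $job_id},
  "spec": {
    "restartPolicy": "Never",
    "containers": [
      {"name": $job_id, "image": $image, "command": $command}
    ]
  }
}""")

# Kubernetes pod phases mapped to illustrative job statuses (status names are assumptions).
PHASE_TO_STATUS = {
    "Pending": "scheduling",
    "Running": "running",
    "Succeeded": "finished",
    "Failed": "failed",
    "Unknown": "unknown",
}


def render_pod(job):
    """Fill the template with JSON-encoded values and parse the result."""
    text = POD_TEMPLATE.substitute(
        job_id=json.dumps(job["jobId"]),
        image=json.dumps(job["image"]),
        command=json.dumps(job["command"]),
    )
    return json.loads(text)  # fails loudly if the rendered text is not valid JSON


def job_status_from_phase(phase):
    """Translate a pod phase into the status stored for the job."""
    return PHASE_TO_STATUS.get(phase, "unknown")
```

Encoding each value with `json.dumps` before substitution keeps quotes or lists in user-supplied parameters from breaking the rendered document.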

DLWorkspace Job Runtime
- Nvidia driver plugin
- Shared storage
- Special permission
- Special device mapping

DLWorkspace Job Runtime - Nvidia driver plugin
- Install the Nvidia driver on the host machine
  - CoreOS: use privileged Docker to insert the kernel module
  - Ubuntu: apt-get install nvidia-***
- Official Kubernetes:
  - Put the driver libraries in a folder, e.g. /opt/nvidia-driver/
  - Map the driver folder to the container (the Docker image should inherit from nvidia/cuda)
- Our customized Kubernetes:
  - Call nvidia-docker-plugin to create a Docker volume for the Nvidia driver libraries
  - Mount the Docker volume to the container
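For the official Kubernetes path, mapping the host driver folder into a container amounts to adding a `hostPath` volume and a matching `volumeMounts` entry to the pod description. The sketch below shows one way to do that on a pod represented as a Python dictionary. The helper name `add_driver_volume`, the volume name, and the container mount path `/usr/local/nvidia` are assumptions; only the host folder `/opt/nvidia-driver/` comes from the slide.

```python
import copy


def add_driver_volume(pod, host_dir="/opt/nvidia-driver/", mount_path="/usr/local/nvidia"):
    """Return a copy of the pod with the host driver folder mounted in every container.

    mount_path is an assumed location; adjust it to wherever the image looks for driver libraries.
    """
    pod = copy.deepcopy(pod)  # leave the caller's pod description untouched
    spec = pod.setdefault("spec", {})
    spec.setdefault("volumes", []).append(
        {"name": "nvidia-driver", "hostPath": {"path": host_dir}}
    )
    for container in spec.get("containers", []):
        container.setdefault("volumeMounts", []).append(
            {"name": "nvidia-driver", "mountPath": mount_path, "readOnly": True}
        )
    return pod
```

Mounting the folder read-only is a design choice of this sketch: jobs need to load the driver libraries but have no reason to modify them.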

DLWorkspace Job Runtime - Shared Storage
- All shared storage is mounted on the host and then mapped to the container
  - Storage mount point
  - DLWorkspace system folders: storage, work, jobfiles
  - Soft link from the storage mount point to the system folder
- Samba interface that lets users access their home folder (work folder) and data folder from Windows machines (domain machines)
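The soft-link step can be illustrated with a short helper. This sketch assumes that each system folder (storage, work, jobfiles) is a symbolic link placed under a system root and pointing at a same-named directory under the storage mount point; the slide does not spell out the link direction or the paths, so both `mount_point` and `system_root` are hypothetical parameters.

```python
import os

# Folder names taken from the slide.
SYSTEM_FOLDERS = ("storage", "work", "jobfiles")


def link_system_folders(mount_point, system_root):
    """Create system_root/<name> as a symlink to mount_point/<name> for each system folder."""
    os.makedirs(system_root, exist_ok=True)
    links = {}
    for name in SYSTEM_FOLDERS:
        target = os.path.join(mount_point, name)
        os.makedirs(target, exist_ok=True)  # make sure the backing directory exists
        link = os.path.join(system_root, name)
        if not os.path.islink(link):
            os.symlink(target, link)
        links[name] = link
    return links
```

Because containers see only the system folder paths, the storage backend behind the mount point could be changed without touching job definitions, as long as the links are recreated.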

DLWorkspace Job Runtime - Special permission
- E.g. run privileged Docker
- A special approval workflow is supported (ongoing)
- If the cluster is configured to allow special permission, a job may require additional approval from the system admin
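One way to read the approval rule is as a decision made when a job is submitted. The sketch below is an illustration only: the function name, the `privileged` job field, the two configuration flags, and the status strings are all hypothetical, and the slide marks the actual workflow as still in progress.

```python
def initial_status(job, allow_special_permission, require_admin_approval=True):
    """Decide the first status of a job based on whether it asks for special permission.

    All names and status strings here are assumptions made for illustration.
    """
    if not job.get("privileged", False):
        return "queued"  # ordinary jobs go straight to the queue
    if not allow_special_permission:
        return "rejected"  # cluster is not configured to allow special permission
    if require_admin_approval:
        return "pending_approval"  # wait for the system admin (ApproveJob)
    return "queued"
```

Under this reading, the ApproveJob operation of the RestfulAPI would move a job from the pending state into the queue.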