1 Integrating GPUs into Condor Timothy Blattner Marquette University Milwaukee, WI April 22, 2009.

Presentation transcript:

1 Integrating GPUs into Condor Timothy Blattner Marquette University Milwaukee, WI April 22, 2009

2 Outline
- Background and Vision
- Graphics Cards
- Condor Approach
- Problems
- Conclusions and Future Work

3 Graphics cards
- Powerful: NVIDIA Tesla C1060
  - 240 massively parallel processing cores
  - 4 GB GDDR3
  - CUDA capable
  - ~993 gigaflops
  - ~$1,300
- Cheap: NVIDIA 9800 GT
  - 112 massively parallel processing cores
  - 512 MB GDDR3
  - CUDA capable
  - ~$120

4 Vision and Focus
- Pool of computers containing graphics cards, managed by Condor
- Provide users the ability to utilize graphics cards identified by Condor
[Diagram: Central Manager and pool machines labeled "?"]

5 Opportunities
- Resources may already be there
  - Majority of machines have graphics cards in them
  - GPU resources sit idle while Condor runs on the CPU
- Similar work
  - GPUGRID.net
    - Distributed computing project using NVIDIA graphics cards for all-atom molecular simulations of proteins
    - Uses GPU-enabled BOINC client

6 Prototype Implementation
- Linux only
- Script queries operating system and graphics card
- Hawkeye Cron job manager runs script
- Script outputs graphics card information in ClassAd format
- Binary for NVIDIA cards provides more specific information

7 Graphics Card Architecture

8 Graphics card APIs
- Favor general purpose computations
- CUDA (NVIDIA)
- Brook (ATI)
- OpenCL (Khronos Group)

9 CUDA Programming Model
- Kernels are functions run on the device (GPU)
- Host (CPU) code invokes kernels and determines
  - Number of threads
  - Thread block structure for organizing threads
- Kernel invocations are asynchronous
  - Control returns to the CPU immediately
  - CUDA provides synchronization primitives
  - Some CUDA calls (e.g. memory allocation) are synchronous
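
The host-side choice of thread count and block structure comes down to a small piece of arithmetic. The following is a minimal Python sketch of that arithmetic only (it is not CUDA code), assuming a one-dimensional launch with one thread per data item. `launch_shape` is a hypothetical helper, and the default of 512 threads per block is taken from the Gpu0ThreadsPerBlock value shown in the script output on slide 12.

```python
def launch_shape(n_items, threads_per_block=512):
    """Return (blocks, threads_per_block) so that blocks * threads_per_block >= n_items.

    Hypothetical helper: assumes a 1-D launch with one thread per item.
    """
    if n_items < 0 or threads_per_block <= 0:
        raise ValueError("n_items must be >= 0 and threads_per_block must be > 0")
    # Ceiling division: round up so the last, partially filled block is included.
    blocks = (n_items + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block


if __name__ == "__main__":
    blocks, threads = launch_shape(1000)
    print(f"{blocks} blocks of {threads} threads")
```

Because the last block is usually only partly filled, a real kernel would also need a bounds check so that surplus threads do no work.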

10 Hawkeye Cron Job Manager
- Provides mechanism for collecting, storing, and using information about computers
- Periodically executes specified program(s)
  - Program outputs in form of ClassAd
  - Outputs are added to machine's ClassAd

11 Hawkeye Implementation
- Added to local configuration file
- Runs script every minute
- Condor user must be granted graphics card privileges in order to query the card

    STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST), UPDATEGPU
    STARTD_CRON_UPDATEGPU_EXECUTABLE = gpu.sh
    STARTD_CRON_UPDATEGPU_PERIOD = 1m
    STARTD_CRON_UPDATEGPU_MODE = Periodic
    STARTD_CRON_UPDATEGPU_KILL = True

12 Script Output

    HasGpu = True
    NGpu = 1
    Gpu0 = "Quadro FX 3700"
    Gpu0CudaCapable = True
    Gpu0_Major = 1
    Gpu0_Minor = 1
    Gpu0Mem =
    Gpu0Procs = 14
    Gpu0Cores = 112
    Gpu0ShareMem =
    Gpu0ThreadsPerBlock = 512
    Gpu0ClockRate = 1.24
    HasCuda = True
    -
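
Output of this shape could be produced by a small formatter. The sketch below is in Python rather than the shell script (`gpu.sh`) used in the prototype, and it is not the author's script. It assumes the GPU properties have already been collected into dictionaries; the dictionary keys and the function name `format_gpu_classad` are hypothetical, while the attribute names follow the slide. Only a subset of the attributes is emitted, and the closing `-` line mirrors the last line of the slide's output.

```python
def format_gpu_classad(gpus):
    """Format a list of GPU property dicts as ClassAd-style 'Name = value' lines.

    Hypothetical input keys: name, cuda_capable, major, minor, procs, cores.
    """
    lines = [f"HasGpu = {bool(gpus)}", f"NGpu = {len(gpus)}"]
    for index, gpu in enumerate(gpus):
        prefix = f"Gpu{index}"
        name = gpu["name"]
        lines.append(f'{prefix} = "{name}"')  # string values are quoted
        lines.append(f"{prefix}CudaCapable = {bool(gpu['cuda_capable'])}")
        lines.append(f"{prefix}_Major = {gpu['major']}")
        lines.append(f"{prefix}_Minor = {gpu['minor']}")
        lines.append(f"{prefix}Procs = {gpu['procs']}")
        lines.append(f"{prefix}Cores = {gpu['cores']}")
    lines.append(f"HasCuda = {any(g['cuda_capable'] for g in gpus)}")
    lines.append("-")  # same closing line as in the slide's output
    return "\n".join(lines)


if __name__ == "__main__":
    example = [{"name": "Quadro FX 3700", "cuda_capable": True,
                "major": 1, "minor": 1, "procs": 14, "cores": 112}]
    print(format_gpu_classad(example))
```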

13 Job Submission
- Users can submit jobs with GPU requirements into Condor
- Portable across Linux distros

    Universe = vanilla
    Executable = tests/CudaJob
    Initialdir = gpuJobs
    Requirements = (HasGpu == true) && (Gpu0CudaCapable == true)
    Log = gpu_test.log
    Error = gpu_test.stderr
    Output = gpu_test.stdout
    Queue

    condor_submit gpu_job.submit
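
The effect of the Requirements expression can be illustrated with a toy matcher. This Python sketch is not Condor's matchmaker and does not parse ClassAd expressions; it hard-codes the single requirement from the submit file and treats a missing attribute as a non-match. The machine names and the helpers `matches_gpu_requirements` and `select_machines` are hypothetical.

```python
def matches_gpu_requirements(machine_ad):
    """Mimic (HasGpu == true) && (Gpu0CudaCapable == true) for one machine ad.

    A missing attribute is treated as a failed match (an assumption of this sketch).
    """
    return (machine_ad.get("HasGpu") is True
            and machine_ad.get("Gpu0CudaCapable") is True)


def select_machines(machine_ads):
    """Return the names of machines whose ads satisfy the GPU requirement."""
    return [name for name, ad in machine_ads.items()
            if matches_gpu_requirements(ad)]


if __name__ == "__main__":
    pool = {
        "node1": {"HasGpu": True, "Gpu0CudaCapable": True},
        "node2": {"HasGpu": True, "Gpu0CudaCapable": False},
        "node3": {},  # machine that advertises no GPU attributes
    }
    print(select_machines(pool))
```

Only machines whose ads carry both attributes set to true are candidates, which is why the Hawkeye script has to run on every machine that should accept GPU jobs.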

14 Access Control
- /dev/nvidiactl, /dev/nvidia* devices need read/write access by the submitting/running user
- Could be
  - Nobody, open access
  - Controlled by Unix group, containing limited users
  - Integrated more directly with Condor user control, slot users
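
Whichever policy is chosen, the running user's access to the device files can be checked before a job is started. The following is a minimal Python sketch, assuming a Linux host; `inaccessible_devices` is a hypothetical helper that relies only on the standard `glob` and `os.access` calls. The default pattern `/dev/nvidia*` also matches `/dev/nvidiactl`.

```python
import glob
import os


def inaccessible_devices(pattern="/dev/nvidia*"):
    """Return matching paths that the current user cannot both read and write."""
    return [path for path in sorted(glob.glob(pattern))
            if not os.access(path, os.R_OK | os.W_OK)]


if __name__ == "__main__":
    blocked = inaccessible_devices()
    if blocked:
        print("No read/write access to:", ", ".join(blocked))
    else:
        print("All matching devices are readable and writable (or none exist)")
```

An empty result does not prove that a GPU is present, only that no matching device file is blocked for the current user.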

15 Problems
- Preemption
  - Jobs running in GPU kernel cannot be interrupted reliably by Unix signals
- Watchdog timer
  - After 5 seconds, job is killed
  - A solution: use general purpose graphics card as secondary display
- Memory security
  - Malicious users, interrupting a job between GPU kernel calls, have the opportunity to overwrite or copy GPU memory

16 Summary
- Condor-based approach for advertising GPU resources
- Linux-based prototype implementation
- Can access available GPUs
- Works best on dedicated machines, with no need for preemption
- Current limitations
  - Doesn’t report GPU usage
  - Lack of preemption
  - Limited OS and video card support

17 Future Work
- Create benchmark and testing suite
- Handle preemption
  - Investigate how watchdog works
- GPU usage reporting
- Integrate memory protection
- Support more operating systems
  - Windows and Mac OS X
- Support alternative architectures and APIs
  - Brook and OpenCL

18 Questions? Contact: