Douglas Thain Computer Sciences Department University of Wisconsin-Madison (In Bologna for June 2000) Remote I/O in Condor

Outline › Introduction › Using Remote I/O › Implementation › Build Your Own: Bypass › Conclusion

Introduction › Distributed systems provide you with access to a diverse array of machines.  INFN Condor Pool (~200 machines)  UW/CS Condor Pool (~500 machines)  The Grid (10,000s?) › Although you have permission to use these machines, they may be unfriendly to your application.

Introduction (Cont.) › Remote I/O is an adapter which provides a friendly execution environment on an unfriendly machine. › Condor uses remote I/O to homogenize the many machines in a Condor pool. › Can we adapt this to the Grid?

What is Unfriendly? › Programs can technically execute:  Correct CPU and OS, and enough memory. › But some critical items are missing:  No input files.  No space for output files.  No shared filesystem.  No login - run as "nobody"?

Range of Unfriendliness › Anonymous compute node on the Grid:  Run as "nobody", with no access to disk. › Machine at another institution:  Can log in, have some disk, but no shared filesystem. › Machine down the hall:  Can log in, share one NFS mount, but not another.

Why use an unfriendly machine? › After all, homogeneous clusters are the norm:  10s or 100s of identical machines.  Centrally administered.  Shared filesystem. (Diagram: a cluster attached to its file server.)

Because: You need more machines! › Another hundred idle machines could be found across the street or in the next department. (Diagram: two clusters, each with its own file server.)

You need more machines! (Cont.) › But your application may not find the resources it needs there. (Diagram: a job on the second cluster, crying "HELP!")

You need more machines! (Cont.) › The problem is worse when we consider a global data Grid of many resources! (Diagram: many clusters and file servers; the job still cries "HELP!")

Solution: Remote I/O › Condor remote I/O creates a friendly environment on an unfriendly machine. (Diagram: the job on the foreign cluster, saying "Just like home!")

Outline › Introduction › Using Remote I/O › Implementation › Build Your Own: Bypass › Conclusion

Using Remote I/O › Condor provides several "universes":  Vanilla - unmodified UNIX jobs.  Standard - UNIX jobs + remote I/O.  Scheduler, Globus, and PVM/MPI (not described here).

Which Universe? (Diagram: two clusters with file servers; one path labeled VANILLA, the other STANDARD.)

Vanilla Universe › Submit any sort of UNIX program to the Condor system. › Advantages:  No relinking required.  Any program at all, including binaries, shell scripts, interpreted programs (Java, Perl), and multiple processes.

Vanilla Universe (Cont.) › Disadvantages:  No checkpointing.  Very limited remote I/O services: input and output files must be specified explicitly.  Condor will refuse to start a vanilla job on a machine that is unfriendly. (Matching is controlled by the FilesystemDomain and UIDDomain ClassAd attributes.)

Standard Universe › Submit a specially-linked UNIX application to the Condor system. › Advantages:  Checkpointing for fault tolerance.  Remote I/O services: Friendly environment anywhere in the world. Data buffering and staging. I/O performance feedback. User remapping of data sources.

Standard Universe (Cont.) › Disadvantages:  Must statically link with Condor library.  Limited class of applications: Single-process UNIX binaries. Certain system calls prohibited.

System Call Limitations › The standard universe does not allow:  Multiple processes: fork(), exec(), system().  Inter-process communication: semaphores, messages, shared memory.  Complex I/O: mmap(), select(), poll(), non-blocking I/O, …  Kernel-level threads. (User-level threads are OK.)

System Call Limitations (Cont.) › Too restrictive?  Use the vanilla universe.

System Call Features › The standard universe does allow:  Signals - but Condor reserves SIGTSTP and SIGUSR1.  Sockets - keep it brief; network connections, by nature, cannot migrate or checkpoint.

System Call Features (Cont.) › The standard universe does allow:  Complex I/O on sockets select(), poll(), and non-blocking I/O can be used on sockets, but not other sorts of files.  User-level threads

Which Universe? › Vanilla:  Perfect for a Condor pool of identical machines. › Standard:  Needed for heterogeneous Condor pools, flocked pools, and more generally, unfriendly machines on the Grid. › The rest of this talk concerns the standard universe.

Using the Standard Universe › Link with Condor library. › Submit the job. › Get brief I/O feedback while running. › Get complete I/O feedback when done. › If needed, remap files.

Link with Condor Library › Simply use condor_compile in front of your normal link line. › For example, gcc main.o utils.o -o program › Becomes: condor_compile gcc main.o utils.o -o program › Despite the name, only re-linking is required, not re-compiling.

Submit Job › Create a submit file: % vi program.submit

  Universe = standard
  input = program.in
  output = program.out
  executable = program
  queue 3

› Submit the job: % condor_submit program.submit

Brief I/O Summary

  % condor_q -io

  -- Schedd: c01.cs.wisc.edu :
  ID   OWNER   READ      WRITE     SEEK   XPUT      BUFSIZE   BLKSIZE
       joe     KB        KB               KB/s      KB        32.0 KB
       joe     KB        KB               B/s       KB        32.0 KB
       joe     44.7 KB   22.1 KB          B/s       KB        32.0 KB

  3 jobs; 0 idle, 3 running, 0 held

Complete I/O Summary in Email

  Your condor job "/usr/joe/records.remote input output" exited with status 0.

  Total I/O:
  KB/s effective throughput
  5 files opened
  104 reads totaling KB
  316 writes totaling 1.2 MB
  102 seeks

  I/O by File:
  buffered file /usr/joe/output
  opened 2 times
  4 reads totaling 12.4 KB
  4 writes totaling 12.4 KB
  buffered file /usr/joe/input
  opened 2 times
  100 reads totaling KB
  311 writes totaling 1.2 MB
  101 seeks

Complete I/O Summary in Email (Cont.) › The summary helps identify performance problems. Even advanced users don't know exactly how their programs and libraries operate.

Complete I/O Summary in Email (Cont.) › Example:  CMSSIM - a physics analysis program.  "Why is this job so slow?"  Data summary: 250 MB read from a 20 MB file.  Very high SEEK total -> random access.  Solution: increase the buffer to 20 MB.

Buffer Parameters › By default:  buffer_size = 524288 (512 KB)  buffer_block_size = 32768 (32 KB) › Change parameters in the submit file, for example to hold the whole 20 MB file from the CMSSIM example:  buffer_size = 20971520  buffer_block_size = 32768

If Needed, Remap Files › Suppose the program is hard-coded to open datafile, but you want each instance to get a slightly different copy. In the submit file, add: file_remaps = "datafile = /usr/joe/data.$(PROCESS)" › Process one gets /usr/joe/data.1 › Process two gets /usr/joe/data.2 › And so on...

If Needed, Remap Files (Cont.) › The same syntax allows the user to direct the application to other third-party data sources, such as web servers: file_remaps = "datafile = http://server/datafile"

Outline › Introduction › Using Remote I/O › Implementation › Build Your Own: Bypass › Conclusion

The Big Picture

The Machines › The home machine:  Has all of your files, or knows where to find them.  Accepts your identity and credentials. › The foreign machine:  Allows you to run a process, but it might not:  have some of your files.  accept your identity.

General Strategy › Trap all the application's I/O operations.  open(), close(), read(), write(), seek(), … › Route them to the correct service. › Cache both service decisions and actual data.
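
To make the strategy concrete, here is a minimal sketch in C of what an interposed open() might look like. Everything here is illustrative: shadow_lookup(), the decision cache, remote_open(), and http_open() are invented stand-ins, not the real Condor internals.

  /* Sketch of the general strategy: trap open(), ask the shadow
     where the file lives, cache the decision, and route the call. */

  #include <fcntl.h>

  typedef enum { ROUTE_LOCAL, ROUTE_REMOTE, ROUTE_HTTP } route_t;

  typedef struct {
      route_t route;
      char    path[1024];    /* e.g. "/usr/joe/datafile" or a URL */
  } decision_t;

  /* Hypothetical helpers: an RPC to the shadow plus a cache. */
  extern int  shadow_lookup(const char *name, decision_t *d);
  extern int  cache_get(const char *name, decision_t *d);
  extern void cache_put(const char *name, const decision_t *d);
  extern int  remote_open(const char *path, int flags, int mode);
  extern int  http_open(const char *url, int flags);

  int condor_open(const char *name, int flags, int mode)
  {
      decision_t d;

      /* Cache service decisions: contact the shadow only the
         first time a name is seen. */
      if (cache_get(name, &d) != 0) {
          if (shadow_lookup(name, &d) != 0)
              return -1;
          cache_put(name, &d);
      }

      switch (d.route) {
      case ROUTE_LOCAL:  return open(d.path, flags, mode);
      case ROUTE_REMOTE: return remote_open(d.path, flags, mode);
      case ROUTE_HTTP:   return http_open(d.path, flags);
      }
      return -1;
  }

Reads and writes are dispatched the same way, with data caching layered on top, as the following slides show.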

Application › Plain UNIX program. › Unaware that it is part of a distributed system. › Statically linked against Condor library.

Condor Library › Sends system calls to various services via RPC. › Buffers and stages data. › Asks shadow for policy decisions.

Shadow › Makes policy decisions for application. › Executes remote system calls for application.

Opening a File › The sequence between the application, the Condor library, the shadow, and the foreign machine:
  1. Application: open("datafile", O_RDONLY);
  2. Library asks the shadow: where is "datafile"?
  3. Shadow replies: URL: local:/usr/joe/datafile. Buffering: none.
  4. Library executes open("/usr/joe/datafile", O_RDONLY) on the foreign machine.
  5. The foreign machine's open() succeeds.
  6. Library returns success to the application.

Shadow Responses › URL:  remote: Use remote system calls.  local: Use local system calls.  special: Use local system calls, disable checkpointing.  http: Fetch from a web server.  Others in development…

Shadow Responses (Cont.) › Buffering:  None.  Buffer partial data.  Stage whole file to local disk.

Some Fast, Some Slow › Function call (application to Condor library): less than a microsecond? › System call (library to the foreign machine's kernel): 10s or 100s of microseconds. › RPC over the network (library to shadow): several milliseconds, or (much) worse!

Reading Data from a File › Low-latency, random-access data source: read directly.  The library remembers where "datafile" is - no need to communicate with the shadow.  Application: read 1024 bytes from "datafile".  Library: read 1024 bytes from "/usr/joe/datafile"; success.

Reading Data from a File (Cont.) › High-latency, random-access data source: buffer large chunks.  Application: reads 1024 bytes from "otherfile", up to 32 times, each satisfied from the data buffer.  Library: a single remote read of 32768 bytes from "otherfile" fills the buffer.
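
A minimal sketch of that buffering, assuming the default 32 KB block size described on the Buffer Parameters slide. remote_pread() is a hypothetical stand-in for the library's remote system call machinery; a real implementation must also handle short reads and more than one file.

  #include <string.h>
  #include <sys/types.h>

  #define BLOCK 32768           /* buffer_block_size default (32 KB) */

  /* Hypothetical RPC: read n bytes at offset off from the file. */
  extern ssize_t remote_pread(int fd, void *buf, size_t n, long off);

  static char buffer[BLOCK];
  static long buffer_off = -1;  /* file offset held in the buffer */

  /* Serve small reads from the buffer; one 32768-byte remote read
     can satisfy up to 32 application reads of 1024 bytes each. */
  ssize_t buffered_read(int fd, void *out, size_t n, long off)
  {
      if (buffer_off < 0 || off < buffer_off
                         || off + (long)n > buffer_off + BLOCK) {
          if (remote_pread(fd, buffer, BLOCK, off) < 0)
              return -1;
          buffer_off = off;
      }
      memcpy(out, buffer + (off - buffer_off), n);
      return (ssize_t)n;
  }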

Reading Data from a File (Cont.) › High-latency, sequential-access data source: stage the file to local disk.  Application: open("datafile", O_RDONLY).  Library asks the shadow: where do I open "datafile"?  Shadow: URL: ftp://server/datafile. Buffering: stage to disk.  The library copies the file from the FTP server to local disk. › Random-access service can then be provided from the local copy.
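
A sketch of staging under the same assumptions: pay the transfer cost once, and every later operation, including seeks, becomes ordinary local I/O. ftp_fetch() is a stand-in for whatever transfer method the shadow selected.

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  /* Hypothetical: copy a whole remote file to a local path. */
  extern int ftp_fetch(const char *url, const char *local_path);

  /* Stage once; afterwards lseek()/read() hit the local disk. */
  int staged_open(const char *url, int flags)
  {
      char local[64];
      snprintf(local, sizeof(local), "/tmp/stage.%ld", (long)getpid());
      if (ftp_fetch(url, local) != 0)
          return -1;
      return open(local, flags);
  }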

Guiding Principle › Policy in shadow, mechanisms in library.  Shadow makes policy decisions because it knows the system configuration.  Library is closest to the application, so it routes system calls to the destination selected by the shadow.

Policy at Shadow › The shadow combines several inputs:  User override: "I know file x can be quickly loaded from ftp://ftp.cs.wisc.edu/y."  Condor library: "There is plenty of space to stage files over here."  Scheduling system: "The foreign machine is not in your cluster." › And issues a decision: "Direct all requests for x to ftp://ftp.cs.wisc.edu/y."

Policy Decisions › May be different on each foreign machine:  In the same building: "use foreign machine."  In another country: "use home machine." › May change as the job migrates:  same building -> other country. › May change under user control:  "Let's see if NFS is faster than AFS."

Outline › Introduction › Using Remote I/O › Implementation › Build Your Own: Bypass › Conclusion

Build Your Own: Bypass › Generalize remote I/O -> split execution. › Building split execution systems is hard. › Bypass is a tool for building split execution systems.

Build Your Own: Bypass (Cont.) › Unlike Condor, Bypass can be used on any UNIX program without re-linking. › Example: GASS Agent

Generalized Split Execution › A general framework must:  Trap a subset of the available system calls.  Replace them with arbitrary code (the agent).  Allow RPCs to a shadow in the home environment.  Allow arbitrary code at the home machine (the shadow).

Split Execution is Hard › Trapping system calls requires a large body of knowledge particular to each OS and version:  Library entry points: _read, __read, __libc_read  System call entries: socket(), open("/dev/tcp")  Wacky header files: #define stat(a,b) _xstat(VERSION,a,b)
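
For example, one logical trap must answer to every physical entry point. A sketch of the glibc-style case using GCC's alias attribute; the entry-point names vary by OS and version, which is exactly the knowledge Bypass has to accumulate. my_read() is a hypothetical replacement.

  #include <sys/types.h>

  /* The agent's replacement for read(); trap logic omitted. */
  ssize_t my_read(int fd, void *buf, size_t n)
  {
      /* ... route, buffer, or forward the read here ... */
      return -1;
  }

  /* Bind every library entry point to the same replacement. */
  ssize_t _read(int fd, void *buf, size_t n)
      __attribute__((alias("my_read")));
  ssize_t __read(int fd, void *buf, size_t n)
      __attribute__((alias("my_read")));
  ssize_t __libc_read(int fd, void *buf, size_t n)
      __attribute__((alias("my_read")));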

Split Execution is Hard (Cont.) › RPCs must be platform-neutral  Byte sizes and ordering off_t is 8 bytes on Alpha, but 4 bytes on Intel  Structure contents and order struct stat has different members on different platforms  Symbolic values O_CREAT is a source-level symbol, but its actual value is different on every platform.
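
The usual remedy is a fixed wire format with conversion at both ends. A small sketch: an off_t always travels as 8 big-endian bytes, whatever the host's word size or byte order (the function names are illustrative).

  #include <stdint.h>

  /* Encode a 64-bit file offset as 8 big-endian bytes. */
  void put_offset64(unsigned char out[8], int64_t off)
  {
      uint64_t u = (uint64_t)off;
      for (int i = 7; i >= 0; i--) {
          out[i] = (unsigned char)(u & 0xff);
          u >>= 8;
      }
  }

  /* Decode: the mirror image, identical on every platform. */
  int64_t get_offset64(const unsigned char in[8])
  {
      uint64_t u = 0;
      for (int i = 0; i < 8; i++)
          u = (u << 8) | in[i];
      return (int64_t)u;
  }

Symbolic values such as O_CREAT get the same treatment: translate to an agreed-upon wire value before sending, and back to the local value on receipt.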

Split Execution is Hard (Cont.) › The code replacing system calls must be able to execute the original system calls! › Example: Sandboxing  Trap open().  Check for unauthorized file names. Return failure for some. Re-invoke the original open() for others.
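
Here is a minimal sketch of that sandbox in modern terms, using dlsym(RTLD_NEXT, ...) to locate the original open(); Bypass generates the equivalent plumbing for you. The "/etc" policy is an arbitrary example. Compile with -shared -fPIC -ldl and insert via LD_PRELOAD, as shown for the GASS agent below.

  #define _GNU_SOURCE
  #include <dlfcn.h>
  #include <errno.h>
  #include <fcntl.h>
  #include <stdarg.h>
  #include <string.h>

  /* Trap open(): refuse unauthorized names, re-invoke the
     original open() for the rest. */
  int open(const char *name, int flags, ...)
  {
      static int (*real_open)(const char *, int, ...);
      if (!real_open)
          real_open = (int (*)(const char *, int, ...))
                      dlsym(RTLD_NEXT, "open");

      if (strncmp(name, "/etc/", 5) == 0) {
          errno = EACCES;            /* deny this name */
          return -1;
      }

      mode_t mode = 0;
      if (flags & O_CREAT) {         /* mode is present only with O_CREAT */
          va_list ap;
          va_start(ap, flags);
          mode = (mode_t)va_arg(ap, int);
          va_end(ap);
      }
      return real_open(name, flags, mode);
  }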

Bypass Makes it Easy! › You provide: a specification file describing how you want the system to work. › We provide: a knowledge file containing the ugly details of system call trapping. › Bypass combines the two to generate your agent and your shadow.

Example: GASS Agent › Let's create an agent that changes all calls to UNIX open() and close() into their analogues in Globus GASS. This will instrument the application with remote file fetching and staging. (Diagram: the agent turns the application's open("http://…") into globus_gass_open("http://…") calls to the Grid.)

Example: GASS Agent (Cont.) agent_prologue "globus_gass_file.h" }}; int open( const char *name, int flags, [int mode] ) agent_action {{ globus_module_activate( GLOBUS_GASS_FILE_MODULE ); return globus_gass_open( namame, flags, mode ); }}; int close( int fd ) agent_action {{ return globus_gass_close( fd ); }};

Example: GASS Agent (Cont.) › Generate the source code.  bypass -agent gass.bypass › Compile into a shared library.  g++ gass_agent.C (libraries) -shared -o gass.so › Insert the library into your environment.  setenv LD_PRELOAD /path/to/gass.so

Example: GASS Agent (Cont.) › Now, run any plain old UNIX program. The program may be given URLs in place of filenames. Globus GASS will stage and cache the needed files. % cp /tmp/yahoo.html % grep address Academic information

Bypass › Uses ideas from Condor, but is a separate tool. › User specifies design, Bypass provides details.

Bypass (Cont.) › Can be applied to any unmodified, dynamically-linked UNIX program at run time.  Works on Linux, Solaris, IRIX, OSF/1.  Static linking only on HP-UX.

Bypass (Cont.) › The "knowledge file" is certainly not complete!  Our experience: each new OS version has new tricks in the standard library that must be folded into the knowledge file.

Outline › Introduction › Using Remote I/O › Implementation › Build Your Own: Bypass › Conclusion

Future Work › Lots of new plumbing, but still adding faucets  FTP, SRB, GASS, SAM … › Find and use third-party staging grounds?  Turn checkpoint server into general staging ground.

Future Work (Cont.) › Interaction with CPU scheduling:  Release CPU while waiting for slow tape?  Stage data, then allocate CPU?

In Summary… › Harnessing large numbers of CPUs requires that you use unfriendly machines. › Remote I/O is an adapter which provides a friendly execution environment on an unfriendly machine.

In Summary… (Cont.) › Condor uses remote I/O to homogenize the many machines in a Condor pool. › Bypass allows the quick construction of split execution systems, allowing remote I/O techniques to be used outside of Condor.

Need More Info? › Contact Douglas Thain. › Condor Web Page: http://www.cs.wisc.edu/condor › Bypass Web Page: http://www.cs.wisc.edu/condor/bypass › Questions now?