
1 OAK RIDGE NATIONAL LABORATORY, U.S. DEPARTMENT OF ENERGY
The stagesub tool
Sudharshan S. Vazhkudai
Computer Science Research Group, CSMD, Oak Ridge National Laboratory

2 Problem Space
- Users' job workflow (stage, compute, offload) is mired in numerous issues:
  - Staging and offloading are large data jobs and are prone to failure
  - Early staging and late offloading waste scratch space
  - Delayed offloading leaves result data vulnerable to purging
  - Compute time is wasted when staging/offloading runs as part of the job

3 Solution
- Recognition that job input and output data need to be managed more efficiently, specifically in the way their movement is scheduled
- Automatic scheduling of data staging/offloading activities so that they better coincide with computing
- Results:
  - From a center standpoint: our techniques optimize resource usage and increase data and service availability
  - From a user job standpoint: our techniques reduce job turnaround time and optimize the usage of allocated time

4 Coordination between staging, offloading and computation
- Motivation
  - Lack of global coordination between the storage hierarchy and system software
  - Problems with manual and scripted staging
    - Human operational cost, wasted compute time/storage and increased wait time due to resubmissions
- Our approach
  - Explicit specification of I/O activities alongside computation in a job script
  - Zero-charge Data Transfer Queue
  - Planning and Orchestration

5 Specification of I/O Activities
- Instrumented PBS job script:

    #!/bin/bash
    #PBS -N ldrdtest
    #PBS -l nodes=4
    #PBS -l walltime=00:15:00
    #PBS -q batch
    #PBS -A project-account
    #PBS -M vazhkudaiss@ornl.gov
    #STAGEIN [-d HH:MM:SS] hpssdir://hpss.ccs.ornl.gov/home/vazhkuda/inp [-k /spin/sys/.keytab/vazhkuda.kt] [-p 1217] [-l vazhkuda] file:///ccs/home/vazhkuda/SVLDRD/test/scratch
    <your compute job>
    #STAGEOUT file:///ccs/home/vazhkuda/SVLDRD/test/scratch/out hpssdir:///home/vazhkuda/inp

- Multiple stagein directives: stage in from more than one source
- Multiple stageout directives: stage out to more than one destination
- Protocols supported: scp://, hpss://, gsiftp://, file://
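
To make the directive format above concrete, here is a minimal Python sketch of how an instrumented script could be separated into its staging directives and the remaining compute script. It is not the stagesub implementation. The names `Transfer`, `parse_directive` and `split_job_script` are hypothetical, and the grammar is an assumption read off the example slide: every `-x` option takes exactly one value, and the two remaining tokens are the source and destination URLs.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Transfer:
    """One #STAGEIN or #STAGEOUT directive (hypothetical representation)."""
    source: str
    destination: str
    options: dict = field(default_factory=dict)


def parse_directive(line: str, keyword: str) -> Transfer:
    """Parse '<keyword> [-x value]... SOURCE DESTINATION' (assumed grammar)."""
    tokens = shlex.split(line[len(keyword):])
    options, urls = {}, []
    i = 0
    while i < len(tokens):
        if tokens[i].startswith("-"):
            # Assumption: every option is followed by exactly one value.
            if i + 1 >= len(tokens):
                raise ValueError(f"option {tokens[i]} has no value: {line!r}")
            options[tokens[i]] = tokens[i + 1]
            i += 2
        else:
            urls.append(tokens[i])
            i += 1
    if len(urls) != 2:
        raise ValueError(f"expected source and destination in: {line!r}")
    return Transfer(source=urls[0], destination=urls[1], options=options)


def split_job_script(text: str):
    """Return (stagein transfers, compute script text, stageout transfers)."""
    stageins, stageouts, compute_lines = [], [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#STAGEIN"):
            stageins.append(parse_directive(stripped, "#STAGEIN"))
        elif stripped.startswith("#STAGEOUT"):
            stageouts.append(parse_directive(stripped, "#STAGEOUT"))
        else:
            # Everything else, including the #PBS lines, stays with the compute job.
            compute_lines.append(line)
    return stageins, "\n".join(compute_lines) + "\n", stageouts
```

Because the directives start with `#`, the shell and PBS treat them as comments, so an instrumented script remains a valid ordinary job script.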

6 Data Transfer Queue
- Motivation:
  - Queuing up and scheduling data transfers
  - Treats data transfers as "data jobs"
  - Enables efficient management of data transfers
- Specifics:
  - Zero-charge data transfer queue
  - Queue operated with the same scheduling policy as the compute queue
  - Queue is managed using access controls to avoid misuse
- Added benefit:
  - Data transfer jobs can now be logged and charged, if need be!

7 Orchestration
- Parsing the job script into individual stagein, compute and stageout jobs
- Dependency setup and management using resource manager primitives: a wrapper around qsub

[Figure: Job Script, Planner, Head Node, Data Queue, Job Queue, I/O Nodes, Compute Nodes. Jobs: 1. Stage Data, 2. Compute Job, 3. Offload Data. Dependencies: 2 after 1, 3 after 2.]
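
The dependency chain on this slide (2 after 1, 3 after 2) can be sketched with the standard PBS/Torque `qsub -W depend=afterok:<jobid>` option. The following Python is an illustration under stated assumptions, not the actual stagesub wrapper: the queue name `dataxfer` is hypothetical, the choice of `afterok` (rather than, say, `afterany`) is an assumption, and the per-stage script files are assumed to have been written out already.

```python
import subprocess
from typing import Callable, List, Sequence

DATA_QUEUE = "dataxfer"   # hypothetical name for the zero-charge data transfer queue
COMPUTE_QUEUE = "batch"   # compute queue named in the example job script


def qsub_command(script_path: str, queue: str, after: Sequence[str] = ()) -> List[str]:
    """Build a qsub command line, optionally dependent on earlier job ids."""
    cmd = ["qsub", "-q", queue]
    if after:
        # PBS syntax: the job becomes eligible only after all listed jobs exit with status 0.
        cmd += ["-W", "depend=afterok:" + ":".join(after)]
    cmd.append(script_path)
    return cmd


def run_qsub(cmd: List[str]) -> str:
    """Submit with the real qsub and return the job id it prints."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def submit_chain(
    stagein_scripts: Sequence[str],
    compute_script: str,
    stageout_scripts: Sequence[str],
    submit: Callable[[List[str]], str] = run_qsub,
):
    """Submit stage-in jobs, then the compute job, then the offload jobs."""
    # 1. Stage data: independent data jobs on the data queue.
    stagein_ids = [submit(qsub_command(s, DATA_QUEUE)) for s in stagein_scripts]
    # 2. Compute job: waits for every stage-in job.
    compute_id = submit(qsub_command(compute_script, COMPUTE_QUEUE, stagein_ids))
    # 3. Offload data: each offload job waits for the compute job.
    stageout_ids = [
        submit(qsub_command(s, DATA_QUEUE, [compute_id])) for s in stageout_scripts
    ]
    return stagein_ids, compute_id, stageout_ids
```

Passing a stub as `submit` lets the planning logic be exercised without a batch system. With the default, each command is handed to the real `qsub`.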

8 Usage
- module load stagesub
- man stagesub
- stagesub

References:
- Z. Zhang, C. Wang, S. S. Vazhkudai, X. Ma, G. Pike, J. W. Cobb, F. Mueller, "Optimizing Center Performance through Coordinated Data Staging, Scheduling and Recovery", Proceedings of Supercomputing 2007 (SC07): Int'l Conference on High Performance Computing, Networking, Storage and Analysis, Reno, Nevada, November 2007.
- H. Monti, A. R. Butt, S. S. Vazhkudai, "Just-in-time Staging of Large Input Data for Supercomputing Jobs", Proceedings of Petascale Data Storage Workshop, Supercomputing 2008, Austin, Texas, November 2008.

