Managing and Scheduling Data Placement (DaP) Requests · Tevfik Kosar · Computer Sciences Department, University of Wisconsin-Madison
Outline › Motivation › DaP Scheduler › Case Study: DAGMan › Conclusions
Demand for Storage › Applications require access to larger and larger amounts of data: database systems, multimedia applications, scientific applications (e.g. High Energy Physics & Computational Genomics) › Currently terabytes, soon petabytes, of data
Is remote access good enough? › Huge amounts of data (mostly on tapes) › Large number of users › Distance / low bandwidth › Different platforms › Scalability and efficiency concerns => Middleware is required
Two approaches › Move the job/application to the data: less common; insufficient computational power on the storage site; not efficient; does not scale › Move the data to the job/application
Move data to the Job [Diagram: data moves from a huge tape library (terabytes) to a remote staging area, over the WAN to a local storage area (e.g. local disk, NeST server), and over the LAN to the compute cluster.]
Main Issues › 1. Insufficient local storage area › 2. CPU should not wait much for I/O › 3. Crash Recovery › 4. Different Platforms & Protocols › 5. Make it simple
Data Placement Scheduler (DaPS) › Intelligently manages and schedules data placement (DaP) activities/jobs › What Condor is for computational jobs, DaPS is for DaP jobs › Just submit a bunch of DaP jobs and then relax…
DaPS Architecture [Diagram: the DaPS client sends requests to the DaPS server, which accepts, schedules, and executes them. The server keeps a request buffer with local and remote queues, and performs get, put, and third-party transfers against GridFTP, NeST, SRB, and SRM servers as well as the local disk.]
DaPS Client Interface › Command line: dap_submit › API: dapclient_lib.a dapclient_interface.h
DaP jobs › Defined as ClassAds › Currently four types: Reserve Release Transfer Stage
DaP Job ClassAds
[
  Type = Reserve;
  Server = nest://turkey.cs.wisc.edu;
  Size = 100MB;
  reservation_no = 1;
  ……
]
[
  Type = Transfer;
  Src_url = srb://ghidorac.sdsc.edu/kosart.condor/x.dat;
  Dst_url = nest://turkey.cs.wisc.edu/kosart/x.dat;
  reservation_no = 1;
]
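Putting the pieces together, these two ClassAds could be placed in a single submit file and handed to the command-line client. The file name (transfer.dap) is an assumption for illustration; the slides name only the dap_submit command itself:

```
# Hypothetical usage sketch: save both ClassAds above into one
# submit file, then hand it to the DaPS command-line client.
$ dap_submit transfer.dap
```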
Supported Protocols › Currently supported: FTP GridFTP NeST (chirp) SRB (Storage Resource Broker) › Very soon: SRM (Storage Resource Manager) GDMP (Grid Data Management Pilot)
Case Study: DAGMan [Diagram: DAGMan reads a .dag file and submits jobs A, B, C, D to the Condor job queue.]
Current DAG structure › All jobs are assumed to be computational jobs [DAG diagram: Jobs A, B, C, D]
Current DAG structure › If data transfer to/from remote sites is required, this is performed via pre- and post-scripts attached to each job. Job A PRE Job B POST Job C Job D
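In today's DAGMan this wrapping is expressed with SCRIPT PRE/POST lines in the .dag file. A minimal sketch, using standard DAGMan syntax (the submit-file and script names are illustrative, not from the slides):

```
# Current approach: data movement hidden in pre-/post-scripts
JOB A a.submit
JOB B b.submit
SCRIPT PRE  B stage_in.sh     # pull input data from remote storage
SCRIPT POST B stage_out.sh    # push results back
PARENT A CHILD B
```

The scheduler never sees the transfers; they run opaquely inside the scripts, which is exactly what the new structure below changes.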
New DAG structure › Add DaP jobs to the DAG structure: the PRE/POST scripts around Job B are replaced by explicit DaP nodes: Reserve in & out, then Transfer in, then Job B, then Transfer out, then Release in and Release out.
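With DaP jobs promoted to first-class DAG nodes, the same workflow might be written as follows. This is a sketch: the DATA keyword, node names, and .dap file names are assumptions, since the slides show only the node ordering (Reserve, Transfer in, Job B, Transfer out, Release):

```
# Hypothetical new-style DAG: DaP jobs are explicit, schedulable nodes
DATA RESERVE  reserve.dap       # reserve space for input & output
DATA XFER_IN  transfer_in.dap   # stage input to local storage
JOB  B        b.submit
DATA XFER_OUT transfer_out.dap  # stage output back to remote storage
DATA RELEASE  release.dap       # release both reservations
PARENT RESERVE  CHILD XFER_IN
PARENT XFER_IN  CHILD B
PARENT B        CHILD XFER_OUT
PARENT XFER_OUT CHILD RELEASE
```

Because the DaP nodes are visible to the scheduler, DAGMan can route them to the DaPS queue while computational nodes go to Condor, as the architecture slide below shows.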
New DAGMan Architecture [Diagram: DAGMan reads the .dag file and submits computational jobs (A, B, C, D) to the Condor job queue and DaP jobs (X, Y) to the DaPS job queue.]
Conclusion › More intelligent management of remote data transfer & staging: increases local storage utilization and maximizes CPU throughput
Future Work › Enhanced interaction with DAGMan › Data Level Management instead of File Level Management › Possible integration with Kangaroo to keep the network pipeline full
Thank You for Listening & Questions › For more information, drop by my office anytime: Room 3361, Computer Science & Stats. Bldg.