STORK: Making Data Placement a First Class Citizen in the Grid
Tevfik Kosar and Miron Livny
University of Wisconsin-Madison
March 25th, 2004, Tokyo, Japan
Some Remarkable Numbers

[Table: characteristics of four physics experiments targeted by GriPhyN — SDSS, LIGO, and ATLAS/CMS — listing first data, data volume (TB/yr), and user community; ATLAS/CMS first data: 2005.]
Source: GriPhyN Proposal, 2000
Even More Remarkable…

“…the data volume of CMS is expected to subsequently increase rapidly, so that the accumulated data volume will reach 1 Exabyte (1 million Terabytes) by around 2015.”
Source: PPDG Deliverables to CMS
More Data-Intensive Applications

- Genomic information processing applications
- Biomedical Informatics Research Network (BIRN) applications
- Cosmology applications (MADCAP)
- Methods for modeling large molecular systems
- Coupled climate modeling applications
- Real-time observatories, applications, and data-management (ROADNet)
Need for Data Placement

Data placement: locate, move, stage, replicate, and cache data; allocate and de-allocate storage space for data; clean up everything afterwards.
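The operations above can be modeled as a small set of data placement job types. A minimal sketch in Python — the enum name and members are illustrative, not Stork's actual job taxonomy:

```python
from enum import Enum

class DaPType(Enum):
    """Hypothetical enumeration of data placement job types,
    mirroring the operations listed above."""
    LOCATE = "locate"       # find where the data lives
    TRANSFER = "transfer"   # move / stage / replicate / cache data
    ALLOCATE = "allocate"   # reserve storage space
    RELEASE = "release"     # de-allocate storage space
    REMOVE = "remove"       # clean up everything afterwards

# Example: a stage-in is just a transfer-type placement job.
stage_in = DaPType.TRANSFER
print(stage_in.value)  # -> transfer
```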
State of the Art

- FedEx
- Manual handling
- Simple scripts
- RFT, LDR, SRM

Data placement activities are simply regarded as “second class citizens” in the Grid world.
Our Goal

Make data placement activities “first class citizens” in the Grid, just like computational jobs: they will be queued, scheduled, monitored, managed, and even checkpointed.
Outline

- Introduction
- Methodology
- Grid Challenges
- Stork Solutions
- Case Studies
- Conclusions
Methodology

Individual jobs, as traditionally handled:
  Stage-in → Execute the job → Stage-out

Extended with explicit data placement steps:
  Allocate space for input & output data → Stage-in → Execute the job → Stage-out → Release input space → Release output space

These steps split into two classes: data placement jobs (allocate, stage-in, stage-out, release) and computational jobs (execute the job).
Methodology: DAG Executer

DAG specification:
  DaP A A.submit
  DaP B B.submit
  Job C C.submit
  …
  Parent A child B
  Parent B child C
  Parent C child D, E
  …

The DAG executer dispatches data placement (DaP) jobs to the DaP job queue and computational jobs to the compute job queue, honoring the parent/child dependencies.
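The routing described above can be sketched in a few lines of Python. This is an illustrative mini "DAG executer" (the job names follow the specification shown; the queue structures are invented): jobs become ready only when all parents are done, then go to the DaP or compute queue by kind.

```python
from collections import defaultdict

# name -> kind, and parent -> child edges, as in the DAG specification above
jobs = {"A": "DaP", "B": "DaP", "C": "Job"}
edges = [("A", "B"), ("B", "C")]

parents = defaultdict(set)
for p, c in edges:
    parents[c].add(p)

done, dap_queue, compute_queue = set(), [], []
while len(done) < len(jobs):
    # a job is ready when every parent has completed
    ready = [j for j in jobs if j not in done and parents[j] <= done]
    for j in ready:
        (dap_queue if jobs[j] == "DaP" else compute_queue).append(j)
        done.add(j)  # assume the job runs to completion

print(dap_queue)      # ['A', 'B']
print(compute_queue)  # ['C']
```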
Grid Challenges

- Heterogeneous resources
- Different job requirements
- Hiding failures from applications
- Overloading of limited resources
Stork Solutions to Grid Challenges

- Interaction with heterogeneous resources
- Flexible job representation and multilevel policy support
- Run-time adaptation
- Failure recovery and efficient resource utilization
Interaction with Heterogeneous Resources

- Modularity and extensibility
- Plug-in protocol support
- A library of “data placement” modules
- Inter-protocol translations
Direct Protocol Translation
[Figure: data moved in one hop by a module that speaks both protocols.]
Protocol Translation using Disk Cache
[Figure: data staged through a local disk cache when no direct translation module exists.]
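The two translation strategies can be sketched as a plug-in registry. This is an assumed design, not Stork's actual module API: transfer modules register themselves per protocol pair, and when no direct pair exists the transfer is chained through a local disk cache.

```python
# (src_scheme, dst_scheme) -> transfer module (a callable)
transfer_modules = {}

def register(src, dst):
    """Decorator that plugs a protocol-pair module into the registry."""
    def deco(fn):
        transfer_modules[(src, dst)] = fn
        return fn
    return deco

@register("srb", "local")
def srb_get(src, dst):
    return f"srb_get {src} -> {dst}"

@register("local", "nest")
def nest_put(src, dst):
    return f"nest_put {src} -> {dst}"

def transfer(src_scheme, dst_scheme, src, dst, cache="/tmp/cache.dat"):
    direct = transfer_modules.get((src_scheme, dst_scheme))
    if direct:  # direct protocol translation: one hop
        return [direct(src, dst)]
    # otherwise translate via the local disk cache: two hops
    return [transfer_modules[(src_scheme, "local")](src, cache),
            transfer_modules[("local", dst_scheme)](cache, dst)]

steps = transfer("srb", "nest", "srb://host/x.dat", "nest://host/x.dat")
print(len(steps))  # 2: no direct srb->nest module, so go through the cache
```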
Flexible Job Representation and Multilevel Policy Support

- ClassAd job description language
- Flexible and extensible
- Persistent
- Allows global and job-level policies
Sample Stork submit file:

[
  Type       = "Transfer";
  Src_Url    = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
  Dest_Url   = "nest://turkey.cs.wisc.edu/kosart/x.dat";
  ……
  Max_Retry  = 10;
  Restart_in = "2 hours";
]
Run-time Adaptation

Dynamic protocol selection:

[
  dap_type      = "transfer";
  src_url       = "drouter://slic04.sdsc.edu/tmp/foo.dat";
  dest_url      = "drouter://quest2.ncsa.uiuc.edu/tmp/foo.dat";
  alt_protocols = "nest-nest, gsiftp-gsiftp";
]

[
  dap_type = "transfer";
  src_url  = "any://slic04.sdsc.edu/tmp/foo.dat";
  dest_url = "any://quest2.ncsa.uiuc.edu/tmp/foo.dat";
]
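The fallback behavior implied by `alt_protocols` can be sketched as follows. This is an illustrative sketch, not Stork's implementation: the primary protocol pair is derived from the URL schemes, and the alternatives are tried in order; the "DiskRouter is down" failure is made up for the example.

```python
def try_transfer(protocol_pair):
    # Stand-in for a real transfer attempt; pretend DiskRouter is down.
    return protocol_pair != "drouter-drouter"

job = {
    "dap_type": "transfer",
    "src_url": "drouter://slic04.sdsc.edu/tmp/foo.dat",
    "dest_url": "drouter://quest2.ncsa.uiuc.edu/tmp/foo.dat",
    "alt_protocols": "nest-nest, gsiftp-gsiftp",
}

# Primary pair from the URL schemes, then the listed alternatives in order.
primary = (job["src_url"].split("://")[0] + "-"
           + job["dest_url"].split("://")[0])
candidates = [primary] + [p.strip() for p in job["alt_protocols"].split(",")]

used = next((p for p in candidates if try_transfer(p)), None)
print(used)  # nest-nest: the first alternative that works
```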
Failure Recovery and Efficient Resource Utilization

- Hides failures from users: “retry” mechanism, “kill and restart” mechanism
- Controls the number of concurrent transfers from/to any storage system: prevents overloading
- Space allocation and de-allocation: makes sure space is available
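A minimal sketch of the first two points, assuming a semaphore-based concurrency cap and a retry loop (the parameter name `max_retry` is borrowed from the submit-file example; the scheduler internals shown here are invented for illustration):

```python
import threading

MAX_CONCURRENT = 4                      # cap per storage system
slots = threading.Semaphore(MAX_CONCURRENT)

def run_transfer(attempt_results, max_retry=10):
    """Retry a transfer up to max_retry times, hiding failures from
    the user; each attempt holds a concurrency slot while it runs."""
    for attempt, ok in enumerate(attempt_results[:max_retry], start=1):
        with slots:                     # never overload the storage system
            if ok:
                return attempt          # number of attempts it took
    raise RuntimeError("transfer failed after max_retry attempts")

# First two attempts fail, third succeeds:
print(run_transfer([False, False, True]))  # 3
```

A "kill and restart" would simply abandon a hung attempt and count it as a failed iteration of the same loop.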
Case Study I: SRB–UniTree Data Pipeline

- Transfer ~3 TB of DPOSS data (2,611 × 1.1 GB files) from the SRB server at SDSC to the UniTree server at NCSA
- No direct interface between the two systems
- The only way to transfer is to build a pipeline
SRB–UniTree Data Pipeline

[Figure: driven from the Submit Site — SRB Server → (SRB get) → SDSC Cache → (GridFTP / DiskRouter third-party transfer) → NCSA Cache → (UniTree put) → UniTree Server.]
Failure Recovery

Failures handled automatically during the pipeline run:
- UniTree not responding
- DiskRouter reconfigured and restarted
- SDSC cache reboot & UW CS network outage
- Software problem
Case Study II: Dynamic Protocol Selection
Conclusions

- Regard data placement as real jobs.
- Treat computational and data placement jobs differently.
- Introduce a specialized scheduler for data placement.
- Provide end-to-end automation, fault tolerance, run-time adaptation, and multilevel policy support.
Future Work

- Enhanced interaction between Stork and higher-level planners: co-scheduling of CPU and I/O
- Enhanced authentication mechanisms
- More run-time adaptation
You don’t have to FedEx your data anymore… Stork delivers it for you!

For more information: URL: … / to: …