1
STORK: A Scheduler for Data Placement Activities in Grid
Tevfik Kosar University of Wisconsin-Madison
2
Some Remarkable Numbers
Characteristics of four physics experiments targeted by GriPhyN:

Application   First Data   Data Volume (TB/yr)   User Community
SDSS          1999         10                    100s
LIGO          2002         250
ATLAS/CMS     2005         5,000                 1000s

Source: GriPhyN Proposal, 2000
3
Even More Remarkable…
“...the data volume of CMS is expected to subsequently increase rapidly, so that the accumulated data volume will reach 1 Exabyte (1 million Terabytes) by around 2015.”
Source: PPDG Deliverables to CMS
4
Other Data Intensive Applications
Genomic information processing applications
Biomedical Informatics Research Network (BIRN) applications
Cosmology applications (MADCAP)
Methods for modeling large molecular systems
Coupled climate modeling applications
Real-time observatories, applications, and data-management (ROADNet)
5
Need to Deal with Data Placement
Data need to be moved, staged, replicated, cached, and removed; storage space for data needs to be allocated and de-allocated. We call all of these data-related activities in the Grid Data Placement (DaP) activities.
6
State of the Art
Data placement activities in the Grid are performed either manually or by simple scripts.
Data placement activities are regarded as “second class citizens” of the computation-dominated Grid world.
7
Our Goal
Our goal is to make data placement activities “first class citizens” in the Grid, just like computational jobs!
They need to be queued, scheduled, monitored, managed, and even checkpointed.
8
Outline
Introduction
Grid Challenges
Stork Solutions
Case Study: SRB-UniTree Data Pipeline
Conclusions & Future Work
9
Grid Challenges
Heterogeneous Resources
Limited Resources
Network/Server/Software Failures
Different Job Requirements
Scheduling of Data & CPU together
10
Stork
Intelligently and reliably schedules, runs, monitors, and manages Data Placement (DaP) jobs in a heterogeneous Grid environment, and ensures that they complete.
Stork is to DaP jobs what Condor is to computational jobs.
Just submit a bunch of DaP jobs and then relax.
11
Stork Solutions to Grid Challenges
Specialized in Data Management
Modularity & Extendibility
Failure Recovery
Global & Job Level Policies
Interaction with Higher Level Planners/Schedulers
12
Already Supported URLs
file:/         -> Local File
ftp://         -> FTP
gsiftp://      -> GridFTP
nest://        -> NeST (chirp) protocol
srb://         -> SRB (Storage Resource Broker)
srm://         -> SRM (Storage Resource Manager)
unitree://     -> UniTree server
diskrouter://  -> UW DiskRouter
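The scheme-to-protocol table above can be sketched as a simple lookup. This is only an illustration of the idea, not Stork's actual code: the names `SUPPORTED_SCHEMES`, `classify_url`, and `describe_transfer` are hypothetical, and the sample file path is made up.

```python
from urllib.parse import urlsplit

# Scheme -> protocol table copied from the slide.
# The dictionary and function names are hypothetical, not Stork identifiers.
SUPPORTED_SCHEMES = {
    "file": "Local File",
    "ftp": "FTP",
    "gsiftp": "GridFTP",
    "nest": "NeST (chirp) protocol",
    "srb": "SRB (Storage Resource Broker)",
    "srm": "SRM (Storage Resource Manager)",
    "unitree": "UniTree server",
    "diskrouter": "UW DiskRouter",
}


def classify_url(url: str) -> str:
    """Return the protocol description for a URL, or raise if unsupported."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    return SUPPORTED_SCHEMES[scheme]


def describe_transfer(src_url: str, dest_url: str) -> tuple[str, str]:
    """Return the (source, destination) protocols a transfer would involve."""
    return classify_url(src_url), classify_url(dest_url)
```

For instance, `describe_transfer("srb://ghidorac.sdsc.edu/kosart.condor/x.dat", "nest://turkey.cs.wisc.edu/kosart/x.dat")` pairs the SRB entry with the NeST entry, which is the kind of cross-protocol transfer the later sample submit file describes.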
13
[Diagram: Higher Level Planners; DAGMan; Condor-G (compute) with Gate Keeper and StartD; Stork (DaP) with SRB, SRM, NeST, GridFTP, and RFT]
14
Interaction with DAGMan
[Diagram: DAGMan reads the DAG description below and feeds the Condor Job Queue and the Stork Job Queue; node labels shown: A, B, C, D, X, Y]

Job A A.submit
DaP X X.submit
Job C C.submit
Parent A child C, X
Parent X child B
…..
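A minimal sketch of the routing idea on this slide: computational nodes go to the Condor queue, DaP nodes go to the Stork queue, and parent/child lines constrain the order. This is not DAGMan's implementation. The line `Job B B.submit` is an assumption added so the example is complete (the slide elides the rest of the file), and the function names are hypothetical. Real DAGMan submits children as their parents finish, whereas this sketch only computes one valid order up front.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# DAG description from the slide. "Job B B.submit" is an ASSUMPTION:
# the slide references B as a child but elides its declaration.
DAG_TEXT = """
Job A A.submit
DaP X X.submit
Job C C.submit
Job B B.submit
Parent A child C, X
Parent X child B
"""


def parse_dag(text):
    """Return (kinds, parents): node -> 'job'/'dap', node -> set of parents."""
    kinds, parents = {}, {}
    for line in text.strip().splitlines():
        words = line.replace(",", " ").split()
        if not words:
            continue
        keyword = words[0].lower()
        if keyword in ("job", "dap"):
            kinds[words[1]] = keyword
            parents.setdefault(words[1], set())
        elif keyword == "parent":
            split_at = [w.lower() for w in words].index("child")
            for child in words[split_at + 1:]:
                parents.setdefault(child, set()).update(words[1:split_at])
    return kinds, parents


def dispatch(text):
    """Return (node, queue) pairs in an order that respects dependencies."""
    kinds, parents = parse_dag(text)
    queue_for = {"job": "condor", "dap": "stork"}
    order = TopologicalSorter(parents).static_order()
    return [(name, queue_for[kinds[name]]) for name in order]
```

Calling `dispatch(DAG_TEXT)` puts A first, routes X to the Stork queue, and places B (a compute job in this sketch) after the data placement step X that it depends on.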
15
Sample Stork submit file
[
  Type       = "Transfer";
  Src_Url    = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
  Dest_Url   = "nest://turkey.cs.wisc.edu/kosart/x.dat";
  ……
  Max_Retry  = 10;
  Restart_in = "2 hours";
]
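To make the submit file's fields concrete, here is a small sketch that reads the attributes shown on the slide and applies `Max_Retry` in a retry loop. It is not Stork's parser or scheduler. Several things are assumptions: the regular expression only handles quoted strings and integers, `Max_Retry` is taken to mean the first attempt plus up to that many retries, `Restart_in` is parsed but not interpreted, and the `transfer` callable is a hypothetical stand-in for a real protocol module.

```python
import re

# Attributes visible on the slide; the elided lines ("……") are left out.
SUBMIT_TEXT = '''
[
  Type       = "Transfer";
  Src_Url    = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
  Dest_Url   = "nest://turkey.cs.wisc.edu/kosart/x.dat";
  Max_Retry  = 10;
  Restart_in = "2 hours";
]
'''

# Matches `Name = "string";` or `Name = 123;` (a simplifying assumption).
ATTRIBUTE = re.compile(r'(\w+)\s*=\s*("([^"]*)"|\d+)\s*;')


def parse_submit(text):
    """Parse the attribute list into a dict of str/int values."""
    job = {}
    for name, raw, quoted in ATTRIBUTE.findall(text):
        job[name] = quoted if raw.startswith('"') else int(raw)
    return job


def run_with_retries(job, transfer):
    """Call transfer(src, dest) until it succeeds; return attempts used.

    `transfer` is a hypothetical callable that raises OSError on failure.
    Assumed semantics: one initial attempt plus up to Max_Retry retries.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            transfer(job["Src_Url"], job["Dest_Url"])
            return attempts
        except OSError:
            if attempts > job["Max_Retry"]:
                raise
```

A transfer function that fails transiently a couple of times would still succeed under this loop, which is the behavior the failure-recovery claims in the case study rely on.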
16
Case Study: SRB-UniTree Data Pipeline
We have transferred ~3 TB of DPOSS data (2611 x 1.1 GB files) from SRB to UniTree using 3 different pipeline configurations. The pipelines are built using Condor and Stork scheduling technologies. The whole process is managed by DAGMan.
17
[Diagram, configuration 1: Submit Site; SRB Server; NCSA Cache; UniTree Server. Labeled steps: SRB get, UniTree put]
18
[Diagram, configuration 2: Submit Site; SRB Server; SDSC Cache; NCSA Cache; UniTree Server. Labeled steps: SRB get, GridFTP, UniTree put]
19
[Diagram, configuration 3: Submit Site; SRB Server; SDSC Cache; NCSA Cache; UniTree Server. Labeled steps: SRB get, DiskRouter, UniTree put]
20
Outcomes of the Study
1. Stork interacted easily and successfully with different underlying systems: SRB, UniTree, GridFTP and DiskRouter.
21
Outcomes of the Study (2)
2. We had the chance to compare different pipeline topologies and configurations:

Configuration   End-to-end rate (MB/sec)
1               5.0
2               3.2
3               5.95
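To put these rates in perspective, the sketch below estimates how long the case study's data set (2611 files of 1.1 GB each) would take at each end-to-end rate. It is back-of-the-envelope arithmetic, not a result reported in the study: it assumes 1 GB = 1000 MB, a perfectly sustained rate, and no downtime. The constant and function names are hypothetical.

```python
# Figures from the slides: file count, file size, and measured rates.
FILE_COUNT = 2611
FILE_SIZE_GB = 1.1
RATES_MB_PER_SEC = {1: 5.0, 2: 3.2, 3: 5.95}

SECONDS_PER_DAY = 86_400
MB_PER_GB = 1000  # ASSUMPTION: decimal units


def total_megabytes() -> float:
    """Total data volume of the case study in MB."""
    return FILE_COUNT * FILE_SIZE_GB * MB_PER_GB


def transfer_days(rate_mb_per_sec: float) -> float:
    """Idealized transfer time in days at a constant end-to-end rate."""
    return total_megabytes() / rate_mb_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    for config, rate in RATES_MB_PER_SEC.items():
        print(f"configuration {config}: {transfer_days(rate):.1f} days")
```

Under these assumptions the total is about 2.87 TB, and the three configurations come out at roughly 6.6, 10.4, and 5.6 days respectively, so the choice of topology changes the end-to-end time by several days.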
22
Outcomes of the Study (3)
3. Almost all possible network, server, and software failures were recovered automatically.
23
Failure Recovery
[Figure annotations: DiskRouter reconfigured and restarted; UniTree not responding; SDSC cache reboot & UW CS network outage; SRB server maintenance]
24
For more information on the results of this study, please check:
25
Conclusions
Stork makes data placement a “first class citizen”.
Stork is the Condor of the data placement world.
Stork is fault tolerant, easy to use, modular, extendible, and very flexible.
26
Future Work
More intelligent scheduling
Data level management instead of file level management
Checkpointing for transfers
Security
27
You don’t have to FedEx your data anymore. Stork delivers it for you!
For more information
Drop by my office anytime. Room: 3361, Computer Science & Stats. Bldg.
to: