Published by Lillian Banks. Modified over 9 years ago.
1
Reliable and Efficient Grid Data Placement using Stork and DiskRouter. Tevfik Kosar, University of Wisconsin-Madison, kosart@cs.wisc.edu. April 15th, 2004.
2
A Single Project: LHC (Large Hadron Collider). Comes online in 2006; will produce 1 exabyte of data by 2012; accessed by ~2000 physicists at 150 institutions in 30 countries.
3
And Many Others: genomic information processing applications; Biomedical Informatics Research Network (BIRN) applications; cosmology applications (MADCAP); methods for modeling large molecular systems; coupled climate modeling applications; real-time observatories, applications, and data management (ROADNet).
4
The Same Big Problem: the need for data placement. Locate the data; send data to processing sites; share the results with other sites; allocate and de-allocate storage; clean up everything. Do all of these reliably and efficiently.
5
Outline: Introduction, Stork, DiskRouter, Case Studies, Conclusions.
6
Stork: a scheduler for data placement activities in the Grid. What Condor is for computational jobs, Stork is for data placement. Stork comes with a new concept: "Make data placement a first class citizen in the Grid."
7
The Concept (diagram): the sequence stage-in, execute the job, stage-out is expanded into individual jobs: allocate space for input & output data, stage-in, execute the job, stage-out, release input space, release output space.
8
The Concept (diagram): the individual jobs (allocate space for input & output data, stage-in, execute the job, stage-out, release input space, release output space) are divided into data placement jobs and computational jobs.
9
The Concept (diagram): DAGMan reads a DAG specification and feeds two queues, the Condor job queue and the Stork job queue. DAG specification: DaP A A.submit; DaP B B.submit; Job C C.submit; …..; Parent A child B; Parent B child C; Parent C child D, E; …..
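The routing idea in this diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it follows the slide's `DaP`/`Job`/`Parent … child …` notation rather than any real DAGMan file syntax, the three-node pipeline below is a hypothetical example and not the DAG from the slide, and the queue names `stork` and `condor` are plain labels.

```python
from collections import defaultdict, deque

def parse_dag(text):
    """Parse 'DaP X file' / 'Job X file' and 'Parent X child Y, Z' lines."""
    kinds, children = {}, defaultdict(list)
    for line in text.strip().splitlines():
        words = line.replace(",", " ").split()
        if not words:
            continue
        if words[0] in ("DaP", "Job"):
            kinds[words[1]] = words[0]
        elif words[0].lower() == "parent":
            split = [w.lower() for w in words].index("child")
            for parent in words[1:split]:
                children[parent].extend(words[split + 1:])
    return kinds, children

def dispatch_order(kinds, children):
    """Return (node, queue) pairs in an order that respects dependencies."""
    indegree = {node: 0 for node in kinds}
    for kids in children.values():
        for kid in kids:
            if kid not in indegree:
                raise ValueError(f"undeclared node: {kid}")
            indegree[kid] += 1
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        # Data placement jobs go to Stork, computational jobs to Condor.
        order.append((node, "stork" if kinds[node] == "DaP" else "condor"))
        for kid in children.get(node, []):
            indegree[kid] -= 1
            if indegree[kid] == 0:
                ready.append(kid)
    return order

# Hypothetical pipeline: stage-in (A), compute (B), stage-out (C).
SPEC = """
DaP A A.submit
Job B B.submit
DaP C C.submit
Parent A child B
Parent B child C
"""
print(dispatch_order(*parse_dag(SPEC)))
# [('A', 'stork'), ('B', 'condor'), ('C', 'stork')]
```

A real workflow manager submits a node only after its parents have finished; the sketch captures just the ordering and the per-type routing.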
10
Why Stork? Stork understands the characteristics and semantics of data placement jobs, so it can make smart scheduling decisions for reliable and efficient data placement.
11
Failure Recovery and Efficient Resource Utilization. Fault tolerance: just submit a bunch of data placement jobs, and then go away. Control the number of concurrent transfers from/to any storage system: prevents overloading. Space allocation and de-allocation: make sure space is available.
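One way to picture the cap on concurrent transfers per storage system is the following Python sketch. It is an illustration, not Stork's scheduler: the job dictionaries, the `max_per_host` parameter, and the `example.org` hosts are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

def hosts_of(job):
    """Storage systems touched by a transfer job (source and destination)."""
    return {urlparse(job["src_url"]).hostname,
            urlparse(job["dest_url"]).hostname}

def select_runnable(pending, running, max_per_host):
    """Pick pending jobs that keep every host at or below max_per_host."""
    active = Counter(h for job in running for h in hosts_of(job))
    started = []
    for job in pending:
        hosts = hosts_of(job)
        if all(active[h] < max_per_host for h in hosts):
            started.append(job)
            active.update(hosts)
    return started

# Three hypothetical transfers that all read from the same source host.
pending = [
    {"src_url": "gsiftp://store.example.org/a", "dest_url": "nest://n1.example.org/a"},
    {"src_url": "gsiftp://store.example.org/b", "dest_url": "nest://n2.example.org/b"},
    {"src_url": "gsiftp://store.example.org/c", "dest_url": "nest://n3.example.org/c"},
]
runnable = select_runnable(pending, running=[], max_per_host=2)
print(len(runnable))  # 2: the third job waits so the source is not overloaded
```

Jobs that are held back simply stay in the queue and are reconsidered once a running transfer finishes.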
12
Support for Heterogeneity: protocol translation using the Stork memory buffer.
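Translation through a memory buffer amounts to reading with one protocol and writing with another, one buffer-sized chunk at a time. The Python sketch below shows only that loop; the `translate` function and its parameters are hypothetical, and `io.BytesIO` objects stand in for the two protocol endpoints.

```python
import io

def translate(read_chunk, write_chunk, buffer_size=64 * 1024):
    """Move data from a source reader to a destination writer via one buffer."""
    total = 0
    while True:
        chunk = read_chunk(buffer_size)   # pull with the source protocol
        if not chunk:
            break
        write_chunk(chunk)                # push with the destination protocol
        total += len(chunk)
    return total

# Stand-ins for a source and a destination that speak different protocols.
src = io.BytesIO(b"x" * 200_000)
dst = io.BytesIO()
copied = translate(src.read, dst.write)
print(copied)  # 200000
```

Because only one chunk is held at a time, memory use stays bounded by `buffer_size` regardless of file size.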
13
Support for Heterogeneity: protocol translation using the Stork disk cache.
14
Flexible Job Representation and Multilevel Policy Support
[
  Type = "Transfer";
  Src_Url = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
  Dest_Url = "nest://turkey.cs.wisc.edu/kosart/x.dat";
  ……
  Max_Retry = 10;
  Restart_in = "2 hours";
]
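A retry policy such as `Max_Retry` can be pictured with the Python sketch below. It assumes that `Max_Retry` counts retries after the first attempt, which the slide does not spell out, and it does not model `Restart_in`. The `run_with_retry` and `make_flaky` helpers are hypothetical.

```python
def run_with_retry(transfer, max_retry):
    """Call transfer() until it succeeds or max_retry retries are used up."""
    attempts = 0
    last_error = None
    while attempts <= max_retry:          # first attempt plus max_retry retries
        attempts += 1
        try:
            return transfer(), attempts
        except OSError as err:            # treat I/O errors as transient
            last_error = err
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error

def make_flaky(failures):
    """Build a fake transfer that fails a fixed number of times, then succeeds."""
    state = {"left": failures}
    def transfer():
        if state["left"] > 0:
            state["left"] -= 1
            raise OSError("simulated transfer failure")
        return "done"
    return transfer

result = run_with_retry(make_flaky(2), max_retry=10)
print(result)  # ('done', 3)
```

A production scheduler would normally wait between attempts; the delay is left out to keep the sketch short.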
15
Run-time Adaptation: Dynamic Protocol Selection
[
  dap_type = "transfer";
  src_url = "drouter://slic04.sdsc.edu/tmp/test.dat";
  dest_url = "drouter://quest2.ncsa.uiuc.edu/tmp/test.dat";
  alt_protocols = "nest-nest, gsiftp-gsiftp";
]
[
  dap_type = "transfer";
  src_url = "any://slic04.sdsc.edu/tmp/test.dat";
  dest_url = "any://quest2.ncsa.uiuc.edu/tmp/test.dat";
]
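The fallback behavior suggested by `alt_protocols` can be sketched in Python as follows. This is an assumption about the intent, not Stork's implementation: the protocol pair from the URLs is tried first, then each listed alternative in order. The `try_transfer` callback is hypothetical, and the `any://` form is not modeled.

```python
from urllib.parse import urlparse

def candidate_pairs(src_url, dest_url, alt_protocols=""):
    """Protocol pairs to try: the one in the URLs first, then the alternatives."""
    pairs = [(urlparse(src_url).scheme, urlparse(dest_url).scheme)]
    for item in alt_protocols.split(","):
        item = item.strip()
        if item:
            src_proto, dest_proto = item.split("-", 1)
            pairs.append((src_proto, dest_proto))
    return pairs

def transfer_with_fallback(src_url, dest_url, alt_protocols, try_transfer):
    """Return the first protocol pair for which try_transfer reports success."""
    for src_proto, dest_proto in candidate_pairs(src_url, dest_url, alt_protocols):
        if try_transfer(src_proto, dest_proto):
            return src_proto, dest_proto
    return None

pairs = candidate_pairs(
    "drouter://slic04.sdsc.edu/tmp/test.dat",
    "drouter://quest2.ncsa.uiuc.edu/tmp/test.dat",
    "nest-nest, gsiftp-gsiftp",
)
print(pairs)
# [('drouter', 'drouter'), ('nest', 'nest'), ('gsiftp', 'gsiftp')]
```

In a simulated run where only GridFTP works, `transfer_with_fallback(..., try_transfer=lambda s, d: s == "gsiftp")` returns `('gsiftp', 'gsiftp')` after the first two pairs fail.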
16
Run-time Adaptation: Run-time Protocol Auto-tuning
[
  link = "slic04.sdsc.edu – quest2.ncsa.uiuc.edu";
  protocol = "gsiftp";
  bs = 1024KB; // block size
  tcp_bs = 1024KB; // TCP buffer size
  p = 4;
]
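The slides do not say how the tuned values are computed. A common rule of thumb for TCP buffer sizing, shown here only as background, is the bandwidth-delay product (bandwidth times round-trip time), divided across parallel streams. The function name, the 100 Mb/s bandwidth, and the 80 ms round-trip time in this Python sketch are hypothetical.

```python
import math

def tcp_buffer_bytes(bandwidth_mbps, rtt_ms, streams=1):
    """Bandwidth-delay product in bytes, split evenly across parallel streams."""
    bdp_bytes = bandwidth_mbps * 1_000_000 / 8 * rtt_ms / 1000
    return math.ceil(bdp_bytes / streams)

# Hypothetical link: 100 Mb/s with an 80 ms round-trip time.
print(tcp_buffer_bytes(100, 80))             # 1000000
print(tcp_buffer_bytes(100, 80, streams=4))  # 250000
```

The per-stream split is itself a simplification; measured throughput is the real arbiter when tuning.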
17
Outline: Introduction, Stork, DiskRouter, Case Studies, Conclusions.
18
DiskRouter: a mechanism for high-performance, large-scale data transfers. Uses hierarchical buffering to aid in large-scale data transfers; enables an application-level overlay network for maximizing bandwidth; supports application-level multicast.
19
Store and Forward: improves performance when the bandwidth fluctuation between A and B is independent of the bandwidth fluctuation between B and C. (Diagram: path A, B, C, shown with DiskRouter and without DiskRouter.)
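A toy Python simulation shows why buffering at B helps when the two links fluctuate independently. It rests on simplifying assumptions that are not from the slides: time is split into equal intervals, the intermediate buffer is unbounded, data received in an interval can be forwarded in the same interval, and without buffering the end-to-end rate in each interval is the slower of the two links. The bandwidth figures are made up.

```python
def delivered_without_buffer(ab, bc):
    """No buffering at B: each interval moves only what both links can carry."""
    return sum(min(a, b) for a, b in zip(ab, bc))

def delivered_with_buffer(ab, bc):
    """Store and forward: B keeps what arrives and drains it as B-C allows."""
    buffered = delivered = 0
    for a, b in zip(ab, bc):
        buffered += a                 # A-B link fills the buffer at its own rate
        sent = min(buffered, b)       # B-C link drains as much as it can carry
        buffered -= sent
        delivered += sent
    return delivered

# Hypothetical per-interval capacities; the links are fast at different times.
ab = [10, 2, 10, 2]
bc = [2, 10, 2, 10]
print(delivered_without_buffer(ab, bc))  # 8
print(delivered_with_buffer(ab, bc))     # 24
```

When both links slow down at the same time, the two results converge, which matches the slide's condition that the fluctuations be independent.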
20
DiskRouter Overlay Network (diagram): the path from A to B, labeled 90 Mb/s.
21
DiskRouter Overlay Network (diagram): A, B, and a DiskRouter node C, with link labels 90 Mb/s and 400 Mb/s. Add a DiskRouter node C, which is not necessarily on the path from A to B, to enforce use of an alternative path.
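The benefit of routing through C can be checked with a bottleneck comparison: a path is only as fast as its slowest hop. In this Python sketch the 90 and 400 Mb/s figures come from the slide, but assigning 400 Mb/s to both hops through C is an assumption, and the helper functions are hypothetical.

```python
def bottleneck(path, link_mbps):
    """Throughput bound of a path: the capacity of its slowest hop."""
    return min(link_mbps[(a, b)] for a, b in zip(path, path[1:]))

def best_path(paths, link_mbps):
    """Choose the candidate path with the largest bottleneck capacity."""
    return max(paths, key=lambda p: bottleneck(p, link_mbps))

# Direct path at 90 Mb/s; both hops via C assumed to run at 400 Mb/s.
links = {("A", "B"): 90, ("A", "C"): 400, ("C", "B"): 400}
choice = best_path([["A", "B"], ["A", "C", "B"]], links)
print(choice, bottleneck(choice, links))  # ['A', 'C', 'B'] 400
```

If either hop through C were slower than 90 Mb/s, the direct path would win, so the overlay node only pays off when the detour is genuinely faster.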
22
Data Mover/Distributed Cache: the source writes to its closest DiskRouter, and the destination picks the data up from its closest DiskRouter. (Diagram: source, DiskRouter cloud, destination.)
23
Outline: Introduction, Stork, DiskRouter, Case Studies, Conclusions.
24
Case Study I: SRB-UniTree Data Pipeline. Transfer ~3 TB of DPOSS data from SRB @SDSC to UniTree @NCSA, using a data pipeline created with Stork and DiskRouter. (Diagram: submit site, SRB server, SDSC cache, NCSA cache, UniTree server.)
25
Failure Recovery (plot annotations): UniTree not responding; DiskRouter reconfigured and restarted; SDSC cache reboot & UW CS network outage; software problem.
26
Case Study II
27
Dynamic Protocol Selection
28
Runtime Adaptation. Before tuning: parallelism = 1, block_size = 1 MB, tcp_bs = 64 KB. After tuning: parallelism = 4, block_size = 1 MB, tcp_bs = 256 KB.
29
Conclusions: Regard data placement as a first class citizen. Introduce a specialized scheduler for data placement. Introduce a high-performance data transfer tool. Together they provide end-to-end automation, fault tolerance, run-time adaptation, multilevel policy support, and reliable and efficient transfers.
30
Future work: enhanced interaction between Stork, DiskRouter, and higher-level planners (co-scheduling of CPU and I/O); enhanced authentication mechanisms; more run-time adaptation.
31
You don't have to FedEx your data anymore. We deliver it for you! For more information: Stork: Tevfik Kosar, kosart@cs.wisc.edu, http://www.cs.wisc.edu/condor/stork. DiskRouter: George Kola, kola@cs.wisc.edu, http://www.cs.wisc.edu/condor/diskrouter.