Bulk Data Transfer Activities

We regard data transfers as "first class citizens," just like computational jobs. We have transferred ~3 TB of DPOSS data (2611 x 1.1 GB files) from SRB to UniTree using three different pipeline configurations. The pipelines are built using the Condor and Stork scheduling technologies, and the whole process is managed by DAGMan.

Configuration 1: We used the native file transfer mechanisms of each underlying system (SRB, Globus GridFTP, and UniTree). We described each data transfer with a five-stage pipeline, resulting in a 5 x 2611 node workflow (DAG) managed by DAGMan. We obtained an end-to-end throughput (from SRB to UniTree) of 11 files per hour (3.2 MB/sec).

[Figure: data flows from the SRB server (A) to the SDSC cache (B) via "SRB get", from the SDSC cache to the NCSA cache (C) via globus-url-copy, and from the NCSA cache to the UniTree server (D) via "MSS put", all driven from the submit site. For each file X, the DAG file contains the nodes "Move X from A to B", "Remove X from A", "Move X from B to C", "Remove X from B", and "Move X from C to D".]

Configuration 2: We used the experimental DiskRouter tool instead of Globus GridFTP for the cache-to-cache transfers. We obtained an end-to-end throughput (from SRB to UniTree) of 20 files per hour (5.95 MB/sec).

Configuration 3: We skipped the SDSC cache and performed direct SRB transfers from the SRB server to the NCSA cache. We described each data transfer with a three-stage pipeline, resulting in a 3 x 2611 node workflow (DAG). We obtained an end-to-end throughput (from SRB to UniTree) of 17 files per hour (5.00 MB/sec).

[Figure: data flows directly from the SRB server (A) to the NCSA cache (C) via "SRB get" and from there to the UniTree server (D) via "MSS put". For each file X, the DAG file contains the nodes "Move X from A to C", "Move X from C to D", and "Remove X from C".]

[Throughput plots: the transfers ran to completion despite several failures and interruptions, annotated on the plots as "UniTree not responding", "DiskRouter reconfigured and restarted", "SRB server problem", "UniTree maintenance", "PDQ Expedition", "SRB server maintenance", and "SDSC cache reboot & UW CS network outage".]

Condor
Condor is a specialized workload management system for compute-intensive jobs. Condor provides a job queuing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Condor chooses when and where to run jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion. http://www.cs.wisc.edu/condor

Stork
What a batch system is for computational jobs, Stork is for data placement activities (i.e. transfers, replication, reservations, staging) in the Grid: it schedules, runs, and monitors data placement jobs and ensures that they complete. Stork can easily interact with heterogeneous middleware and end-storage systems and recovers from failures. Stork makes data placement a first-class citizen of Grid computing. http://www.cs.wisc.edu/condor/stork

DAGMan
DAGMan (Directed Acyclic Graph Manager) is a meta-scheduler for Condor. It manages dependencies between jobs at a higher level than the Condor scheduler, and it can now also interact with Stork. http://www.cs.wisc.edu/condor/dagman
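To make the per-file pipeline concrete, here is a sketch of how one such five-node block might be written as a DAGMan input file in which every node is a Stork data placement job. This is a hypothetical illustration rather than the actual workflow: the node names, the submit-description file names, and the use of DAGMan's DATA keyword for Stork-managed nodes are our assumptions, and the real DAG repeats a block like this for each of the 2611 files.

    # Hypothetical DAGMan fragment: the five-node pipeline for one file, x0001.
    # Each node is a Stork data placement job described in its own submit file.
    DATA move_A_B_x0001  x0001.move_A_B.stork
    DATA remove_A_x0001  x0001.remove_A.stork
    DATA move_B_C_x0001  x0001.move_B_C.stork
    DATA remove_B_x0001  x0001.remove_B.stork
    DATA move_C_D_x0001  x0001.move_C_D.stork
    # Dependencies: the file must reach the next stage before it is removed
    # from the previous one, and the stages run in pipeline order.
    PARENT move_A_B_x0001  CHILD remove_A_x0001 move_B_C_x0001
    PARENT move_B_C_x0001  CHILD remove_B_x0001 move_C_D_x0001

DAGMan releases a node only after all of its parents have completed successfully, so the 2611 per-file pipelines proceed independently of one another and a failure in one stage holds back only that file.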
DiskRouter: moves large amounts of data (on the order of terabytes) efficiently. It uses disk as a buffer to aid in large data transfers, performs application-level routing, and increases network throughput by using multiple sockets and setting TCP buffer sizes explicitly. http://www.cs.wisc.edu/condor/diskrouter

GridFTP: a high-performance, secure, reliable data transfer protocol from Globus. http://www.globus.org/datagrid/gridftp.html

SRB (Storage Resource Broker): client-server middleware that provides a uniform interface for connecting to heterogeneous data resources. http://www.npaci.edu/DICE/SRB

UniTree: NCSA's high-speed, high-capacity mass storage system. http://www.ncsa.uiuc.edu/Divisions/CC/HPDM/unitree
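Each node of such a DAG points at a Stork submit description. The following is a minimal sketch of what the stage that pulls one file out of SRB into a local cache could look like; the attribute names follow Stork's ClassAd-style submit format, but the host name, paths, file name, and URL schemes are hypothetical.

    [
      // Hypothetical Stork data placement job: fetch one DPOSS file from the
      // SRB server into the local cache.  Stork selects its transfer module
      // from the protocols named in the source and destination URLs.
      dap_type = "transfer";
      src_url  = "srb://srb.sdsc.edu/collections/dposs/x0001";
      dest_url = "file:///tmp/dposs/x0001"
    ]

In this setup Stork queues the request, runs it, retries it if it fails, and reports completion, at which point DAGMan releases the next stage of that file's pipeline.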