STORK: A Scheduler for Data Placement Activities in Grid

STORK: A Scheduler for Data Placement Activities in Grid
Tevfik Kosar
University of Wisconsin-Madison
kosart@cs.wisc.edu

Some Remarkable Numbers
Characteristics of four physics experiments targeted by GriPhyN:

Application   First Data   Data Volume (TB/yr)   User Community
SDSS          1999         10                    100s
LIGO          2002         250
ATLAS/CMS     2005         5,000                 1000s

Source: GriPhyN Proposal, 2000

Even More Remarkable… “…the data volume of CMS is expected to subsequently increase rapidly, so that the accumulated data volume will reach 1 Exabyte (1 million Terabytes) by around 2015.” Source: PPDG Deliverables to CMS

Other Data Intensive Applications
- Genomic information processing applications
- Biomedical Informatics Research Network (BIRN) applications
- Cosmology applications (MADCAP)
- Methods for modeling large molecular systems
- Coupled climate modeling applications
- Real-time observatories, applications, and data-management (ROADNet)

Need to Deal with Data Placement Data need to be moved, staged, replicated, cached, and removed; storage space for the data must be allocated and de-allocated. We call all of these data-related activities in the Grid Data Placement (DaP) activities.

State of the Art Data placement activities in the Grid are performed either manually or by simple scripts. They are simply regarded as “second class citizens” of the computation-dominated Grid world.

Our Goal Our goal is to make data placement activities “first class citizens” in the Grid, just like computational jobs! They need to be queued, scheduled, monitored and managed, and even checkpointed.

Outline
- Introduction
- Grid Challenges
- Stork Solutions
- Case Study: SRB-UniTree Data Pipeline
- Conclusions & Future Work

Grid Challenges
- Heterogeneous Resources
- Limited Resources
- Network/Server/Software Failures
- Different Job Requirements
- Scheduling of Data & CPU together

Stork intelligently and reliably schedules, runs, monitors, and manages Data Placement (DaP) jobs in a heterogeneous Grid environment and ensures that they complete. What Condor is to computational jobs, Stork is to DaP jobs. Just submit a bunch of DaP jobs and then relax.

Stork Solutions to Grid Challenges
- Specialized in Data Management
- Modularity & Extendibility
- Failure Recovery
- Global & Job Level Policies
- Interaction with Higher Level Planners/Schedulers

Already Supported URLs
file:/        -> Local File
ftp://        -> FTP
gsiftp://     -> GridFTP
nest://       -> NeST (chirp) protocol
srb://        -> SRB (Storage Resource Broker)
srm://        -> SRM (Storage Resource Manager)
unitree://    -> UniTree server
diskrouter:// -> UW DiskRouter
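
As an illustration of how these URL schemes are used, here is a minimal transfer sketch in the submit-file format shown a couple of slides later. The source host, path, and destination path are made up for the example, and only attributes that appear on the sample submit file slide are used; the schemes of the two URLs indicate which of the supported systems the data moves between (here, a GridFTP server to a local file).

  [
    Type     = "Transfer";
    Src_Url  = "gsiftp://some.gridftp.host/data/y.dat";
    Dest_Url = "file:/tmp/kosart/y.dat";
  ]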

Higher Level Planners [Architecture diagram: higher-level planners such as DAGMan sit on top; computational jobs go through Condor-G to a remote Gate Keeper and its StartD (the compute path), while DaP jobs go through Stork to storage and transfer services such as SRB, SRM, NeST, GridFTP, and RFT (the data placement path).]

Interaction with DAGMan [Diagram: DAGMan reads a DAG whose nodes are computational jobs (“Job A A.submit”, “Job C C.submit”, …) and data placement jobs (“DaP X X.submit”), with dependencies such as “Parent A child C, X” and “Parent X child B”; it routes the computational nodes (A, C, B, D) to the Condor job queue and the DaP nodes (X, Y) to the Stork job queue.]
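
Read as text, the DAG fragment on this slide corresponds to a DAG description file along the following lines. The sketch mirrors the slide's own notation (“Job” for computational nodes, “DaP” for data placement nodes); the remaining node declarations are elided on the slide, and the exact keyword a given DAGMan version uses for data placement nodes may differ from the slide's spelling.

  Job A A.submit        # computational node, sent to the Condor job queue
  DaP X X.submit        # data placement node, sent to the Stork job queue
  Job C C.submit
  Parent A child C, X   # A must complete before C and X start
  Parent X child B      # X must complete before B starts
  .....                 # declarations of B and the remaining nodes are elided on the slide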

Sample Stork submit file

  [
    Type       = "Transfer";
    Src_Url    = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
    Dest_Url   = "nest://turkey.cs.wisc.edu/kosart/x.dat";
    ......
    Max_Retry  = 10;
    Restart_in = "2 hours";
  ]
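
In practice a description like this is saved to a file and handed to the Stork server with the stork_submit client; Stork then queues the transfer and applies the job-level policy above (retrying up to Max_Retry times, with Restart_in presumably bounding how long a single attempt may run before it is restarted). Companion client tools for querying and removing queued DaP jobs also exist, but their exact names and options vary across Stork releases, so the Stork web page cited near the end of this talk is the authoritative reference.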

Case Study: SRB-UniTree Data Pipeline We transferred ~3 TB of DPOSS data (2611 × 1.1 GB files) from SRB to UniTree using three different pipeline configurations. The pipelines were built using the Condor and Stork scheduling technologies, and the whole process was managed by DAGMan.

Configuration 1 [Diagram: files flow from the SRB server to the NCSA cache via SRB get, then from the NCSA cache to the UniTree server via UniTree put; the pipeline is driven from the submit site.]

Configuration 2 [Diagram: SRB get into the SDSC cache, a GridFTP transfer from the SDSC cache to the NCSA cache, then UniTree put from the NCSA cache to the UniTree server; again driven from the submit site.]

Configuration 3 [Diagram: same as configuration 2, but the SDSC-to-NCSA cache transfer uses DiskRouter instead of GridFTP.]
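
Per file, configurations 2 and 3 therefore decompose into a short chain of DaP nodes (configuration 1 omits the inter-cache step). A sketch of the corresponding DAG fragment for one file is below, using the “DaP” notation from the DAGMan slide; the node and submit-file names are illustrative rather than taken from the actual pipeline, and the middle step is a GridFTP or DiskRouter transfer depending on the configuration.

  DaP get_42  srb_to_sdsc_42.stork      # SRB get into the SDSC cache
  DaP move_42 sdsc_to_ncsa_42.stork     # GridFTP or DiskRouter transfer between the caches
  DaP put_42  ncsa_to_unitree_42.stork  # UniTree put into the UniTree server
  Parent get_42 child move_42
  Parent move_42 child put_42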

Outcomes of the Study 1. Stork interacted easily and successfully with different underlying systems: SRB, UniTree, GridFTP, and DiskRouter.

Outcomes of the Study (2) 2. We had the chance to compare different pipeline topologies and configurations:

Configuration   End-to-end rate (MB/sec)
1               5.0
2               3.2
3               5.95
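
As a back-of-the-envelope check, the ~3 TB data set is 2611 × 1.1 GB ≈ 2,872 GB; assuming decimal units and that the measured rates were sustained end to end, the total transfer times work out roughly as follows.

  Configuration 1: 2,872,100 MB / 5.0  MB/s ≈ 574,000 s ≈ 6.6 days
  Configuration 2: 2,872,100 MB / 3.2  MB/s ≈ 898,000 s ≈ 10.4 days
  Configuration 3: 2,872,100 MB / 5.95 MB/s ≈ 483,000 s ≈ 5.6 days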

Outcomes of the Study (3) 3. Almost all of the network, server, and software failures that occurred were recovered from automatically.

Failure Recovery
Failures handled automatically during the transfers included:
- DiskRouter reconfigured and restarted
- UniTree not responding
- SDSC cache reboot & UW CS network outage
- SRB server maintenance

For more information on the results of this study, please check: http://www.cs.wisc.edu/condor/stork/

Conclusions Stork makes data placement a “first class citizen” in the Grid. Stork is the Condor of the data placement world. It is fault tolerant, easy to use, modular, extendible, and very flexible.

Future Work
- More intelligent scheduling
- Data-level management instead of file-level management
- Checkpointing for transfers
- Security

You don’t have to FedEx your data anymore; Stork delivers it for you! For more information, drop by my office anytime (Room 3361, Computer Science & Stats. Bldg.) or email kosart@cs.wisc.edu.