STORK: Making Data Placement a First Class Citizen in the Grid Tevfik Kosar and Miron Livny University of Wisconsin-Madison March 25 th, 2004 Tokyo, Japan

Stork: Making Data Placement a First Class Citizen in the Grid 2 Some Remarkable Numbers
Characteristics of four physics experiments targeted by GriPhyN (Source: GriPhyN Proposal, 2000):
  Application   First Data   Data Volume (TB/yr)   User Community
  SDSS          1999         10                    100s
  LIGO          2002         250                   100s
  ATLAS/CMS     2005         5,000                 1000s

Stork: Making Data Placement a First Class Citizen in the Grid 3 Even More Remarkable… “..the data volume of CMS is expected to subsequently increase rapidly, so that the accumulated data volume will reach 1 Exabyte (1 million Terabytes) by around 2015.” Source: PPDG Deliverables to CMS

Stork: Making Data Placement a First Class Citizen in the Grid 4 More Data Intensive Applications… Genomic information processing applications Biomedical Informatics Research Network (BIRN) applications Cosmology applications (MADCAP) Methods for modeling large molecular systems Coupled climate modeling applications Real-time observatories, applications, and data management (ROADNet)

Stork: Making Data Placement a First Class Citizen in the Grid 5 Need for Data Placement. Data placement: locate, move, stage, replicate, and cache data; allocate and de-allocate storage space for data; clean up everything afterwards.
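The operations listed above map naturally onto distinct job types. The following is a purely illustrative sketch in Python; the operation names and the job record are assumptions for the example, not Stork's actual interface or keywords.

# Illustrative sketch only -- not Stork's actual data model or keywords.
from dataclasses import dataclass, field
from enum import Enum

class PlacementOp(Enum):
    """Kinds of work a data placement job may represent."""
    LOCATE = "locate"        # find a suitable replica of the data
    TRANSFER = "transfer"    # move / stage / replicate / cache the data
    ALLOCATE = "allocate"    # reserve storage space for the data
    RELEASE = "release"      # free previously allocated space
    REMOVE = "remove"        # clean up everything afterwards

@dataclass
class PlacementJob:
    """One schedulable unit of data placement work."""
    op: PlacementOp
    src_url: str = ""
    dest_url: str = ""
    options: dict = field(default_factory=dict)   # e.g. retry limit, space size

job = PlacementJob(PlacementOp.TRANSFER,
                   src_url="srb://some.host/path/x.dat",
                   dest_url="nest://other.host/path/x.dat")
print(job.op.value, job.src_url, "->", job.dest_url)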

Stork: Making Data Placement a First Class Citizen in the Grid 6 State of the Art: FedEx; manual handling; simple scripts; RFT, LDR, SRM. Data placement activities are simply regarded as “second class citizens” in the Grid world.

Stork: Making Data Placement a First Class Citizen in the Grid 7 Our Goal: Make data placement activities “first class citizens” in the Grid, just like computational jobs! They will be queued, scheduled, monitored, managed, and even checkpointed.
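To make that contrast concrete, here is a minimal sketch of the life cycle such a managed placement job might go through; the state names are illustrative assumptions, not Stork's actual internal states.

# Illustrative life cycle of a managed data placement job; state names assumed.
from enum import Enum, auto

class State(Enum):
    QUEUED = auto()         # accepted and waiting its turn, like any batch job
    RUNNING = auto()        # actively transferring, being monitored
    CHECKPOINTED = auto()   # partial progress recorded so a restart can resume
    FAILED = auto()         # noticed by the scheduler, eligible for retry
    DONE = auto()

# A transfer interrupted mid-way is checkpointed and re-queued, not silently lost:
history = [State.QUEUED, State.RUNNING, State.CHECKPOINTED,
           State.QUEUED, State.RUNNING, State.DONE]
print(" -> ".join(s.name for s in history))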

Stork: Making Data Placement a First Class Citizen in the Grid 8 Outline Introduction Methodology Grid Challenges Stork Solutions Case Studies Conclusions

Stork: Making Data Placement a First Class Citizen in the Grid 9-12 Methodology (Individual Jobs, built up over four slides): Allocate space for input & output data; Stage-in; Execute the job; Stage-out; Release input space; Release output space. The allocation, stage-in, stage-out, and release steps are Data Placement Jobs; executing the job itself is a Computational Job.

Stork: Making Data Placement a First Class Citizen in the Grid 13 Methodology: a DAG executer walks the DAG specification and dispatches data placement (DaP) jobs to the DaP job queue and computational jobs to the compute job queue. DAG specification excerpt: DaP A A.submit; DaP B B.submit; Job C C.submit; … Parent A child B; Parent B child C; Parent C child D, E; …
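A hedged sketch of the executer's role: walk the DAG from the specification excerpt above and release each node into the DaP queue or the compute queue once its parents have finished. The job kinds of D and E and the dispatch loop are assumptions for illustration; in the Stork work this executer role was filled by Condor's DAGMan driving the Stork and Condor queues.

# Illustrative DAG executer dispatching to two queues; not DAGMan's implementation.
from collections import deque

# Edges and node kinds follow the DAG specification excerpt above;
# the kinds of D and E are assumed for this example.
edges = {"A": ["B"], "B": ["C"], "C": ["D", "E"]}
kind = {"A": "DaP", "B": "DaP", "C": "compute", "D": "DaP", "E": "compute"}

parents = {n: set() for n in kind}
for p, children in edges.items():
    for c in children:
        parents[c].add(p)

dap_queue, compute_queue = deque(), deque()
queued, done = set(), set()

def release_ready():
    """Move every node whose parents are all finished into the matching queue."""
    for n in kind:
        if n not in done and n not in queued and parents[n] <= done:
            queued.add(n)
            (dap_queue if kind[n] == "DaP" else compute_queue).append(n)

release_ready()
while dap_queue or compute_queue:
    for queue, label in ((dap_queue, "DaP"), (compute_queue, "compute")):
        if queue:
            node = queue.popleft()
            print(f"submitting {label} job {node}")   # to Stork or Condor, respectively
            done.add(node)                            # pretend it finished immediately
    release_ready()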

Stork: Making Data Placement a First Class Citizen in the Grid 14 Grid Challenges Heterogeneous Resources Different Job Requirements Hiding Failures from Applications Overloading Limited Resources

Stork: Making Data Placement a First Class Citizen in the Grid 15 Stork Solutions to Grid Challenges Interaction with Heterogeneous Resources Flexible Job Representation and Multilevel Policy Support Run-time Adaptation Failure Recovery and Efficient Resource Utilization

Stork: Making Data Placement a First Class Citizen in the Grid 16 Interaction with Heterogeneous Resources: modularity and extensibility; plug-in protocol support; a library of “data placement” modules; inter-protocol translations.
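A hedged sketch of how such a module library might be organized; the registry, the decorator, the function names, and the "file" scheme used for the local cache are all assumptions for illustration, not Stork's actual module interface. Modules register per (source, destination) protocol pair, and a pair with no direct module falls back to a two-hop translation through a disk cache, which is the idea behind the next two slides.

# Illustrative plug-in registry; not Stork's real module interface.
from urllib.parse import urlparse

MODULES = {}  # (src_scheme, dest_scheme) -> transfer function

def module(src, dest):
    """Register a transfer module for one protocol pair."""
    def wrap(fn):
        MODULES[(src, dest)] = fn
        return fn
    return wrap

@module("srb", "file")
def srb_get(src_url, dest_url):
    print(f"SRB get  {src_url} -> {dest_url}")

@module("file", "nest")
def nest_put(src_url, dest_url):
    print(f"NeST put {src_url} -> {dest_url}")

def transfer(src_url, dest_url, cache="file:///tmp/cache.dat"):
    src, dest = urlparse(src_url).scheme, urlparse(dest_url).scheme
    direct = MODULES.get((src, dest))
    if direct:                                   # direct protocol translation
        return direct(src_url, dest_url)
    # otherwise translate through a disk cache: src -> cache, then cache -> dest
    MODULES[(src, "file")](src_url, cache)
    MODULES[("file", dest)](cache, dest_url)

transfer("srb://ghidorac.sdsc.edu/kosart.condor/x.dat",
         "nest://turkey.cs.wisc.edu/kosart/x.dat")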

Stork: Making Data Placement a First Class Citizen in the Grid 17 Direct Protocol Translation

Stork: Making Data Placement a First Class Citizen in the Grid 18 Protocol Translation using Disk Cache

Stork: Making Data Placement a First Class Citizen in the Grid 19 Stork Solutions to Grid Challenges Interaction with Heterogeneous Resources Flexible Job Representation and Multilevel Policy Support Run-time Adaptation Failure Recovery and Efficient Resource Utilization

Stork: Making Data Placement a First Class Citizen in the Grid 20 Flexible Job Representation and Multilevel Policy Support: ClassAd job description language; flexible and extensible; persistent; allows both global and job-level policies.

Stork: Making Data Placement a First Class Citizen in the Grid 21 Sample Stork submit file:
  [
    Type       = "Transfer";
    Src_Url    = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
    Dest_Url   = "nest://turkey.cs.wisc.edu/kosart/x.dat";
    ……
    Max_Retry  = 10;
    Restart_in = "2 hours";
  ]

Stork: Making Data Placement a First Class Citizen in the Grid 22 Stork Solutions to Grid Challenges Interaction with Heterogeneous Resources Flexible Job Representation and Multilevel Policy Support Run-time Adaptation Failure Recovery and Efficient Resource Utilization

Stork: Making Data Placement a First Class Citizen in the Grid 23 Run-time Adaptation: dynamic protocol selection
  [
    dap_type = "transfer";
    src_url  = "drouter://slic04.sdsc.edu/tmp/foo.dat";
    dest_url = "drouter://quest2.ncsa.uiuc.edu/tmp/foo.dat";
    alt_protocols = "nest-nest, gsiftp-gsiftp";
  ]
  [
    dap_type = "transfer";
    src_url  = "any://slic04.sdsc.edu/tmp/foo.dat";
    dest_url = "any://quest2.ncsa.uiuc.edu/tmp/foo.dat";
  ]
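A minimal sketch of how a scheduler might act on these job descriptions at run time: derive the preferred protocol pair from the URLs, then fall through the listed alt_protocols until a transfer succeeds. The try_transfer stub and the parsing details are assumptions, not Stork's actual selection logic.

# Illustrative run-time protocol selection; placeholder transfer function.
def try_transfer(proto_pair, src_path, dest_path):
    """Pretend to transfer; return True on success, False on failure."""
    src_proto, dest_proto = proto_pair
    print(f"trying {src_proto} -> {dest_proto}")
    return src_proto == "gsiftp"          # pretend only GridFTP works today

job = {
    "dap_type": "transfer",
    "src_url": "drouter://slic04.sdsc.edu/tmp/foo.dat",
    "dest_url": "drouter://quest2.ncsa.uiuc.edu/tmp/foo.dat",
    "alt_protocols": "nest-nest, gsiftp-gsiftp",
}

# Preferred pair taken from the URLs, then the listed alternatives, in order.
pairs = [("drouter", "drouter")]
pairs += [tuple(p.strip().split("-")) for p in job["alt_protocols"].split(",")]

src_path = job["src_url"].split("://", 1)[1]
dest_path = job["dest_url"].split("://", 1)[1]

for pair in pairs:
    if try_transfer(pair, src_path, dest_path):
        print("transfer succeeded with", pair)
        break
else:
    print("all protocol pairs failed; job stays in the queue for retry")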

Stork: Making Data Placement a First Class Citizen in the Grid 24 Stork Solutions to Grid Challenges Interaction with Heterogeneous Resources Flexible Job Representation and Multilevel Policy Support Run-time Adaptation Failure Recovery and Efficient Resource Utilization

Stork: Making Data Placement a First Class Citizen in the Grid 25 Failure Recovery and Efficient Resource Utilization: hides failures from users (“retry” and “kill and restart” mechanisms); controls the number of concurrent transfers from/to any storage system (prevents overloading); space allocation and de-allocation (makes sure space is available).
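A sketch of how these three mechanisms might fit together, purely illustrative: a retry loop bounded by a Max_Retry-style limit, a per-attempt timeout standing in for the kill-and-restart mechanism, and a semaphore capping concurrent transfers to one storage system. The fake_transfer stub, the one-second timeout, and the limit of 4 are arbitrary placeholders, not Stork internals.

# Illustrative sketch of retry, kill-and-restart, and throttling; not Stork internals.
import random, threading, time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

MAX_RETRY = 10                                 # like Max_Retry = 10 in the submit file
HANG_LIMIT = 1.0                               # seconds; stands in for Restart_in = "2 hours"
server_slots = threading.BoundedSemaphore(4)   # cap concurrent transfers to one storage system

def fake_transfer():
    """Placeholder transfer: sometimes quick, sometimes 'hangs', sometimes fails."""
    time.sleep(random.choice([0.1, 0.1, 3.0]))   # 3.0 simulates a hung transfer
    return random.random() > 0.3                 # roughly 70% chance of success

def run_job(pool):
    with server_slots:                           # throttle: prevents overloading
        for attempt in range(1, MAX_RETRY + 1):
            future = pool.submit(fake_transfer)
            try:
                if future.result(timeout=HANG_LIMIT):       # wait, but not forever
                    return f"done on attempt {attempt}"
                print(f"attempt {attempt} failed, retrying")        # retry mechanism
            except TimeoutError:
                # kill-and-restart: stop waiting on this attempt (a real scheduler
                # would kill the transfer process) and launch a fresh one
                print(f"attempt {attempt} hung, restarting")
        return "gave up"

with ThreadPoolExecutor(max_workers=8) as pool:
    print(run_job(pool))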

Stork: Making Data Placement a First Class Citizen in the Grid 26 Case Study I: SRB-UniTree Data Pipeline. Transfer ~3 TB of DPOSS data (2611 x 1.1 GB files) from the SRB server at SDSC to the UniTree server at NCSA. There is no direct interface between SRB and UniTree; the only way to transfer the data is to build a pipeline.

Stork: Making Data Placement a First Class Citizen in the Grid 27 SRB-UniTree Data Pipeline (diagram): from the submit site, an SRB get moves each file from the SRB server to the SDSC cache, a third-party GridFTP/DiskRouter transfer moves it from the SDSC cache to the NCSA cache, and a UniTree put moves it from the NCSA cache into the UniTree server.
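The chain of data placement jobs generated for each file might look like the following sketch. Host names, cache paths, and the example file name are assumptions for illustration; only the overall shape (SRB get into the SDSC cache, a third-party GridFTP/DiskRouter hop to the NCSA cache, a UniTree put, then cache cleanup) follows the diagram above.

# Illustrative per-file pipeline for the SRB -> UniTree case study.
# Host names, cache paths, and the file name below are assumptions.
PIPELINE = [
    ("srb_get",      "srb://srb.sdsc.edu/dposs/{f}",  "file://sdsc-cache/tmp/{f}"),
    ("wan_transfer", "file://sdsc-cache/tmp/{f}",     "file://ncsa-cache/tmp/{f}"),   # GridFTP or DiskRouter, third-party
    ("unitree_put",  "file://ncsa-cache/tmp/{f}",     "unitree://unitree.ncsa.uiuc.edu/dposs/{f}"),
    ("remove",       "file://sdsc-cache/tmp/{f}",     None),   # clean up the intermediate copies
    ("remove",       "file://ncsa-cache/tmp/{f}",     None),
]

def submit_pipeline(filename):
    """Emit the chain of data placement jobs for one DPOSS image file."""
    for op, src, dest in PIPELINE:
        target = f" -> {dest.format(f=filename)}" if dest else ""
        print(f"{op:12s} {src.format(f=filename)}{target}")

# Roughly 2,611 files of ~1.1 GB each flowed through this chain.
submit_pipeline("dposs_field_0001.fits")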

Stork: Making Data Placement a First Class Citizen in the Grid 28 Failure Recovery (annotated transfer timeline): the pipeline recovered from UniTree not responding, a DiskRouter reconfiguration and restart, an SDSC cache reboot and UW CS network outage, and a software problem.

Stork: Making Data Placement a First Class Citizen in the Grid 29 Case Study II

Stork: Making Data Placement a First Class Citizen in the Grid 30 Dynamic Protocol Selection

Stork: Making Data Placement a First Class Citizen in the Grid 31 Conclusions Regard data placement as real jobs. Treat computational and data placement jobs differently. Introduce a specialized scheduler for data placement. End-to-end automation, fault tolerance, run-time adaptation, multilevel policy support.

Stork: Making Data Placement a First Class Citizen in the Grid 32 Future work Enhanced interaction between Stork and higher level planners  co-scheduling of CPU and I/O Enhanced authentication mechanisms More run-time adaptation

Stork: Making Data Placement a First Class Citizen in the Grid 33 You don't have to FedEx your data anymore… Stork delivers it for you! For more information: URL:  to: