Tevfik Kosar Computer Sciences Department University of Wisconsin-Madison Managing and Scheduling Data Placement (DaP) Requests

Outline › Motivation › DaP Scheduler › Case Study: DAGMan › Conclusions

Demand for Storage › Applications require access to larger and larger amounts of data • Database systems • Multimedia applications • Scientific applications (e.g. High Energy Physics & Computational Genomics) › Currently terabytes, soon petabytes, of data

Is Remote Access Good Enough? › Huge amounts of data (mostly on tape) › Large number of users › Distance / low bandwidth › Different platforms › Scalability and efficiency concerns => A middleware layer is required

Two approaches › Move the job/application to the data • Less common • Insufficient computational power at the storage site • Not efficient • Does not scale › Move the data to the job/application

Move data to the Job [diagram: data flows from a huge tape library (terabytes) into a remote staging area, across the WAN to a local storage area (e.g. local disk or NeST server), and over the LAN to the compute cluster]

Main Issues › 1. Insufficient local storage area › 2. The CPU should not sit idle waiting for I/O › 3. Crash recovery › 4. Different platforms & protocols › 5. Keep it simple

Data Placement Scheduler (DaPS) › Intelligently manages and schedules data placement (DaP) activities/jobs › What Condor is for computational jobs, DaPS is for DaP jobs › Just submit a bunch of DaP jobs and then relax.

DaPS Architecture [diagram: a DaPS client sends requests to the DaPS server, which accepts, schedules, and executes them from a request queue; on the local side the server moves data (get/put) between the local disk and GridFTP, NeST, and SRB servers, while on the remote side it drives third-party transfers between GridFTP and SRM servers through a request buffer]

DaPS Client Interface › Command line: • dap_submit › API: • dapclient_lib.a • dapclient_interface.h

DaP jobs › Defined as ClassAds › Currently four types: • Reserve • Release • Transfer • Stage

DaP Job ClassAds

[
  Type           = Reserve;
  Server         = nest://turkey.cs.wisc.edu;
  Size           = 100MB;
  reservation_no = 1;
  ……
]

[
  Type           = Transfer;
  Src_url        = srb://ghidorac.sdsc.edu/kosart.condor/x.dat;
  Dst_url        = nest://turkey.cs.wisc.edu/kosart/x.dat;
  reservation_no = 1;
]
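
The slides show ClassAds only for the Reserve and Transfer types. A plausible sketch of the remaining two (Release and Stage) follows, assuming attribute names analogous to the examples above; these exact fields are not shown in the slides.

[
  Type           = Release;      // hypothetical: free the space reserved above
  Server         = nest://turkey.cs.wisc.edu;
  reservation_no = 1;
]

[
  Type           = Stage;        // hypothetical: stage a file from tape into the remote staging area
  Src_url        = srb://ghidorac.sdsc.edu/kosart.condor/x.dat;
]

Presumably such ClassAds are collected in a submit file and handed to dap_submit, though the slides do not show the exact invocation.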

Supported Protocols › Currently supported: • FTP • GridFTP • NeST (chirp) • SRB (Storage Resource Broker) › Very soon: • SRM (Storage Resource Manager) • GDMP (Grid Data Management Pilot)
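
Since the source and destination systems appear to be selected by the URL schemes in the job description, moving data between any two supported protocols is just another Transfer job. A hedged sketch follows; the scheme strings, host name, and paths here are illustrative, not taken from the slides.

[
  Type    = Transfer;
  Src_url = gsiftp://storage.example.edu/data/y.dat;   // illustrative GridFTP source
  Dst_url = file:///scratch/kosart/y.dat;              // illustrative local-disk destination
]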

Case Study: DAGMan [diagram: DAGMan reads a .dag file describing jobs A, B, C, D and their dependencies, and submits ready jobs to the Condor job queue]

Current DAG structure › All jobs are assumed to be computational jobs [diagram: a DAG of Job A, Job B, Job C, Job D]

Current DAG structure › If data transfer to/from remote sites is required, it is performed via PRE and POST scripts attached to each job [diagram: the same DAG, with Job B wrapped in PRE and POST scripts]
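
In DAGMan's .dag syntax, that staging is typically expressed with PRE and POST scripts attached to the node; a minimal sketch, where the submit-file and script names are hypothetical:

# Job B with its data movement hidden inside PRE/POST scripts
# (submit-file and script names are hypothetical)
JOB A a.submit
JOB B b.submit
SCRIPT PRE  B stage_in.sh
SCRIPT POST B stage_out.sh
PARENT A CHILD B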

New DAG structure › Add DaP jobs to the DAG structure [diagram: the PRE → Job B → POST chain is replaced by a chain of DaP nodes: Reserve in & out → Transfer in → Job B → Transfer out → Release in → Release out]
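
The slides do not give concrete syntax for these DaP nodes; one way to picture the intent is a DAG fragment in which the data-placement steps become first-class nodes handed to DaPS instead of PRE/POST scripts. The node keyword and all file names below are assumptions, not taken from the slides (later DAGMan releases used a DATA keyword for Stork jobs, which may or may not match what is meant here).

# Hypothetical sketch: data-placement steps become ordinary nodes around Job B
# (the DATA keyword and every file name here are assumptions)
DATA ReserveB    reserve.dap
DATA TransferIn  transfer_in.dap
JOB  B           b.submit
DATA TransferOut transfer_out.dap
DATA ReleaseB    release.dap
PARENT ReserveB    CHILD TransferIn
PARENT TransferIn  CHILD B
PARENT B           CHILD TransferOut
PARENT TransferOut CHILD ReleaseB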

New DAGMan Architecture [diagram: DAGMan reads the .dag file and submits computational jobs (A, B, C, D) to the Condor job queue and DaP jobs (X, Y) to the DaPS job queue]

Conclusion › More intelligent management of remote data transfer & staging • increases local storage utilization • maximizes CPU throughput

Future Work › Enhanced interaction with DAGMan › Data-level management instead of file-level management › Possible integration with Kangaroo to keep the network pipeline full

Thank You for Listening & Questions › For more information • Drop by my office anytime: Room 3361, Computer Science & Stats. Bldg. • Email to: