Bulk Data Copy Generalization Some DMI/JSDL overlap (this indeed might be out of scope of JSDL) Extensibility options / possibly some new requirements.

Presentation transcript:

Bulk Data Copy Generalization
Some DMI/JSDL overlap (this indeed might be out of scope of JSDL).
Extensibility options / possibly some new requirements for recursive file/dir copying between multiple sources and sinks?

In-Scope
1. Job Submission Description Language (JSDL): an activity description language for generic compute applications.
2. OGSA Data Movement Interface (DMI): low-level schema for defining the transfer of bytes between a single source and sink.
3. JSDL HPC File Staging Profile (HPCFS): designed to address file staging, not bulk copying.
4. OGSA Basic Execution Service (BES): defines a basic framework for defining and interacting with generic compute activities: JSDL + extensible state and information models.
5. Others that I am sure that I have missed! (… ByteIO)
None of these fully captures our requirements (not a criticism; they are designed to address their own use cases, which only partially overlap with the requirements for our bulk data copy activity).
Other
Condor Stork: based on Condor ClassAds.
Not sure if Globus has/intends a similar definition in its new developments (e.g. SaaS); anyone? I believe Ravi was originally supportive of a DMI for data transfers between multiple sources/sinks.

Stork: Condor ClassAds
Example of a Stork job request:
    [
      dest_url = "gsiftp://eric1.loni.org/scratch/user/";
      arguments = "p 4 dbg vb";
      src_url = "file:///home/user/test/";
      dap_type = "transfer";
      verify_checksum = true;
      verify_filesize = true;
      set_permission = "755";
      recursive_copy = true;
      network_check = true;
      checkpoint_transfer = true;
      output = "user.out";
      err = "user.err";
      log = "userjob.log";
    ]
Purportedly the first batch scheduler for data placement and data movement in a heterogeneous environment.
Developed with respect to Condor.
Uses Condor's ClassAd job description language and is designed to understand the semantics and characteristics of data placement tasks.
Recent NSF funding to develop it as a production service.
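As a rough illustration of the request format above, the following Python sketch renders a dictionary as a bracketed, ClassAd-style record. It is not part of Stork or Condor: the helper name `to_classad` and the quoting/escaping rules are assumptions made for this sketch, and only the attribute names are taken from the slide.

```python
def to_classad(attrs):
    """Render a dict as a bracketed ClassAd-style record, one attribute per line.

    Hypothetical helper for illustration; not a Stork or Condor API.
    """
    def fmt(value):
        if isinstance(value, bool):          # bool must be tested before int
            return "true" if value else "false"
        if isinstance(value, (int, float)):
            return str(value)
        # Assumed escaping convention: backslash-escape backslashes and quotes.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    lines = [f"    {name} = {fmt(value)};" for name, value in attrs.items()]
    return "[\n" + "\n".join(lines) + "\n]"


# Attribute names and values copied from the slide's example request.
request = {
    "dap_type": "transfer",
    "src_url": "file:///home/user/test/",
    "dest_url": "gsiftp://eric1.loni.org/scratch/user/",
    "recursive_copy": True,
    "verify_checksum": True,
    "set_permission": "755",
}

if __name__ == "__main__":
    print(to_classad(request))
```

Keeping the request as a plain dictionary until the last moment makes it easy to generate many similar transfer requests from a list of source/destination pairs.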

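JSDL Data Staging 1 and the HPC File Staging Profile
    <jsdl:DataStaging>
      <jsdl:FileName>fileA</jsdl:FileName>
      <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
      <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
      <jsdl:Source>
        <jsdl:URI>gsiftp://griddata1.dl.ac.uk:2811/myhome/fileA</jsdl:URI>
      </jsdl:Source>
      <jsdl:Target>
        <jsdl:URI>ftp://ngs.oerc.ox.ac.uk:2811/myhome/fileA</jsdl:URI>
      </jsdl:Target>
    </jsdl:DataStaging>
    …
Define both the source and target within the same DataStaging element, which is permitted in JSDL.
The HPC File Staging Profile (Wasson et al. 2008) limits the use of credentials to a single credential definition within a data staging element.
Different credentials will be required for the source and the target.
Maybe profile the use of credentials within the JSDL Source and Target?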

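JSDL Data Staging 2
    <jsdl:DataStaging>
      <jsdl:FileName>fileA</jsdl:FileName>
      <jsdl:FilesystemName>MY_SCRATCH_DIR</jsdl:FilesystemName>
      <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
      <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
      <jsdl:Source>
        <jsdl:URI>gsiftp://griddata1.dl.ac.uk:2811/myhome/fileA</jsdl:URI>
      </jsdl:Source>
      <!-- credential, e.g. MyProxyToken -->
    </jsdl:DataStaging>
    <jsdl:DataStaging>
      <jsdl:FileName>fileA</jsdl:FileName>
      <jsdl:FilesystemName>MY_SCRATCH_DIR</jsdl:FilesystemName>
      <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
      <jsdl:Target>
        <jsdl:URI>ftp://ngs.oerc.ox.ac.uk:2811/myhome/fileA</jsdl:URI>
      </jsdl:Target>
      <!-- credential, e.g. wsa:Username/password token -->
    </jsdl:DataStaging>
Coupled staging elements: a source data staging element for fileA and a corresponding target element for staging out the same file.
By specifying that the input file is deleted after the job has executed, this example simulates the effect of a data copy from one location to another through the staging host.
No multiple data locations (alternative sources and sinks; we think this is kinda useful).
Some more (proprietary?) elements required (e.g. DMI transfer requirements, file selectors, URI connection properties).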
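The coupled staging pattern described above (stage in from the source, stage out to the target, delete the local copy afterwards) can be sketched in Python with the standard library's ElementTree. This is an illustrative sketch only: the element names follow the JSDL 1.0 DataStaging schema, but the helper names `data_staging` and `coupled_copy` are hypothetical, and credentials are left out because their placement is exactly the open question raised on the slide.

```python
import xml.etree.ElementTree as ET

JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
ET.register_namespace("jsdl", JSDL_NS)


def _q(tag):
    """Return the namespace-qualified name for a JSDL tag."""
    return f"{{{JSDL_NS}}}{tag}"


def data_staging(file_name, filesystem, uri, direction, delete_on_termination=None):
    """Build one jsdl:DataStaging element holding either a Source or a Target."""
    if direction not in ("Source", "Target"):
        raise ValueError("direction must be 'Source' or 'Target'")
    staging = ET.Element(_q("DataStaging"))
    ET.SubElement(staging, _q("FileName")).text = file_name
    ET.SubElement(staging, _q("FilesystemName")).text = filesystem
    ET.SubElement(staging, _q("CreationFlag")).text = "overwrite"
    if delete_on_termination is not None:
        flag = "true" if delete_on_termination else "false"
        ET.SubElement(staging, _q("DeleteOnTermination")).text = flag
    endpoint = ET.SubElement(staging, _q(direction))
    ET.SubElement(endpoint, _q("URI")).text = uri
    return staging


def coupled_copy(file_name, filesystem, source_uri, target_uri):
    """Return a stage-in/stage-out pair that simulates a copy via the staging host."""
    stage_in = data_staging(file_name, filesystem, source_uri, "Source",
                            delete_on_termination=True)
    stage_out = data_staging(file_name, filesystem, target_uri, "Target")
    return [stage_in, stage_out]


pair = coupled_copy(
    "fileA", "MY_SCRATCH_DIR",
    "gsiftp://griddata1.dl.ac.uk:2811/myhome/fileA",
    "ftp://ngs.oerc.ox.ac.uk:2811/myhome/fileA",
)

if __name__ == "__main__":
    for element in pair:
        print(ET.tostring(element, encoding="unicode"))
```

Each file needs its own pair of elements, which is one reason the approach becomes unwieldy for recursive copies of whole directories.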

OGSA DMI
The OGSA Data Movement Interface (DMI) (Antonioletti et al. 2008) defines a number of elements for describing and interacting with a data transfer activity.
The data source and destination are each described separately with a Data End Point Reference (DEPR), which is a specialized form of the WS-Addressing EndpointReference element (Box et al. 2004).
In contrast to the JSDL data staging model, a DEPR can hold one or more dmi:Data elements. These are used to define alternative locations for the data source and/or sink.
An implementation can select between its supported protocols and retry different source/sink combinations from the available list (this improves resilience and the likelihood of performing a successful copy).
There are some limitations:
DMI is intended to describe only a single data copy operation between one source and one sink. To do several transfers, multiple invocations of a DMI service factory would be required to create multiple DMI service instances.
We require a single (atomic) message packet that wraps multiple transfers (e.g. for routing through a message broker).
Some of the existing constructs require extension / slight modification.
Therefore:
DMI/JSDL discussion at OGF to canvass some possible new extensions.
Maybe build on DMI, and/or closer integration with JSDL data staging, to describe a bulk copy activity.
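The protocol selection and retry behaviour described above can be sketched as follows. This is a minimal Python illustration, not DMI itself: the function names, the use of `OSError` to signal a failed transfer, the sink host `sink.example.org`, and the `flaky_transfer` stand-in are all assumptions made for the example.

```python
from itertools import product
from urllib.parse import urlsplit


def candidate_pairs(sources, sinks, supported_schemes):
    """Pair every usable source with every usable sink, preserving list order."""
    def usable(urls):
        return [u for u in urls if urlsplit(u).scheme in supported_schemes]
    return list(product(usable(sources), usable(sinks)))


def copy_with_alternatives(sources, sinks, transfer, supported_schemes):
    """Try each source/sink combination in turn until one transfer succeeds."""
    failures = []
    for source, sink in candidate_pairs(sources, sinks, supported_schemes):
        try:
            transfer(source, sink)   # caller-supplied transfer function
            return source, sink
        except OSError as exc:       # assumed failure signal for this sketch
            failures.append((source, sink, str(exc)))
    raise RuntimeError(f"no source/sink combination succeeded: {failures}")


def flaky_transfer(source, sink):
    """Stand-in transfer: pretend the GridFTP source is unreachable."""
    if source.startswith("gsiftp://example.org"):
        raise OSError("connection refused")


sources = [
    "gsiftp://example.org/name/of/the/dir/",
    "srm://example.org/name/of/the/dir/",
]
sinks = ["gsiftp://sink.example.org/dir/"]  # hypothetical sink location

chosen = copy_with_alternatives(sources, sinks, flaky_transfer, {"gsiftp", "srm"})

if __name__ == "__main__":
    print(chosen)
```

A bulk copy document would need this loop once per transfer, which is where a single message wrapping multiple transfers differs from repeated DMI factory invocations.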

Bulk DMI Draft: a pseudo-example (some overlap with JSDL data staging)
    <dmi:Data ProtocolUri=" " DataUrl="gsiftp://example.org/name/of/the/dir/">
    <dmi:Data ProtocolUri="urn:my-project:srm" DataUrl="srm://example.org/name/of/the/dir/">
    ...
    Sink Details ...
    ? ? ? ?
The DEPR defines alternative locations for the data source and/or sink, and each nests its own credentials.
Diagram labels: Source (wsa:EndpointReference type); Sink (wsa:EndpointReference type); Transfer Requirements (needs extending).

Bulk Data Copy and JSDL Integration ?
Some (sketchy) integration options?
    /usr/bin/datacopyagent.sh myBulkDataCopyDoc.xml
    ...
    myBulkDataCopyDoc.xm...
    ...
Possible options for integrating the proposed document within JSDL: a) nesting it within a JSDL element, or b) staging-in of the document as input for the named executable? (ideas, advice…)

Or New staging requirements ?
JSDL is intended to be a generic compute activity description language.
Rather than use a separate document to describe a bulk data copy activity, is it better to suggest some JSDL extensions to cater for bulk copying? (ideas, advice…)
Potentially a better route for more widespread adoption (e.g. existing BES implementations).
Other thoughts: orchestration of copy activities / DAG?

BES and DMI sub-state specialisations ?
Profile the OGSA BES state model for DMI sub-state specializations.
Adds optional DMI sub-state specializations.
Client/service may only recognize the main BES states if necessary.
Suspend, resume, cancel.
Add DMI fault types?
State diagram:
BES states: Pending, Running, Finished, Cancelled, Failed.
DMI-based sub-states: Running: Transferring; Running: Suspended; Failed: Clean, Unclean, Unknown.
Operations: Suspend() request, Resume() request, Cancel().
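One way to read the state model above is as a small transition table in which sub-states are written `Main:Sub`, so that a client which only understands the main BES states can ignore everything after the colon. The Python sketch below assumes that naming scheme and assumes the event names `start` and `complete`, which are not on the slide; fault transitions are omitted because the fault types are still an open question.

```python
# Transition table: (current state, event) -> next state.
# "suspend", "resume" and "cancel" come from the slide; "start" and
# "complete" are assumed event names used only for this sketch.
TRANSITIONS = {
    ("Pending", "start"): "Running:Transferring",
    ("Pending", "cancel"): "Cancelled",
    ("Running:Transferring", "suspend"): "Running:Suspended",
    ("Running:Transferring", "complete"): "Finished",
    ("Running:Transferring", "cancel"): "Cancelled",
    ("Running:Suspended", "resume"): "Running:Transferring",
    ("Running:Suspended", "cancel"): "Cancelled",
}

TERMINAL_STATES = {"Finished", "Cancelled", "Failed"}


def main_state(state):
    """Strip any sub-state, leaving the main BES state."""
    return state.split(":", 1)[0]


def next_state(state, event):
    """Return the state reached by applying an event, or raise ValueError."""
    if main_state(state) in TERMINAL_STATES:
        raise ValueError(f"{state!r} is terminal; no further events are accepted")
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state!r}") from None


def run(events, state="Pending"):
    """Apply a sequence of events and return every state visited."""
    visited = [state]
    for event in events:
        state = next_state(state, event)
        visited.append(state)
    return visited


if __name__ == "__main__":
    print(run(["start", "suspend", "resume", "complete"]))
```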

Message Model Requirements
Document Message
Bulk Data Copy Activity description.
Capture all information required to connect to each source URI and sink URI and subsequently enact the data copy activity.
Transfer requirements, e.g. additional URI properties, file selectors (regular expressions), scheduling parameters to define a batch window, retry count, source/sink alternatives, checksums?, sequential ordering?, DAG?
Serialized user credential definitions for each source and sink.
Control Messages
Interact with a state/lifecycle model (e.g. stop, resume, cancel).
Event Messages
Standard fault types and status updates.
Information Model
To advertise the service capabilities / properties / supported protocols.
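To make the document message requirement more concrete, the sketch below models a bulk copy activity as Python dataclasses and serializes it to a single JSON message. Everything here is hypothetical: the class and field names, the use of JSON instead of an XML schema, and the credential reference strings are assumptions for illustration, not a proposed format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Endpoint:
    """A source or sink with alternative URIs and a reference to a credential."""
    uris: List[str]
    credential: Optional[str] = None   # hypothetical credential reference


@dataclass
class Transfer:
    """One copy operation plus its transfer requirements."""
    source: Endpoint
    sink: Endpoint
    file_selector: Optional[str] = None   # regular expression over file names
    retry_count: int = 0


@dataclass
class BulkCopyActivity:
    """A single message that wraps several transfers."""
    transfers: List[Transfer] = field(default_factory=list)
    batch_window: Optional[str] = None    # assumed free-form scheduling hint
    sequential: bool = False

    def to_message(self) -> str:
        """Serialize the whole activity as one JSON document."""
        return json.dumps(asdict(self), indent=2)


activity = BulkCopyActivity(
    transfers=[
        Transfer(
            source=Endpoint(
                uris=["gsiftp://example.org/name/of/the/dir/",
                      "srm://example.org/name/of/the/dir/"],
                credential="source-credential",
            ),
            sink=Endpoint(uris=["gsiftp://sink.example.org/dir/"],
                          credential="sink-credential"),
            file_selector=r".*\.dat$",
            retry_count=3,
        )
    ],
    sequential=True,
)

if __name__ == "__main__":
    print(activity.to_message())
```

Because the whole activity is one document, it can be routed through a message broker as a single packet, which is the property the slides ask for.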