Data Bridge: Solving diverse data access in scientific applications
Zoltán Farkas, Péter Kacsuk, Mark Santcroos, Silvia Olabarriaga, Ákos Balaskó, Krisztián Karóczkai
zoltan.farkas@sztaki.mta.hu
Outline
- Problem statement
- Data Bridge as independent DCI service:
  - Data Bridge concept
  - Use cases
  - Data Bridge architecture
- WS-PGRADE integration:
  - Data browsing portlet
  - gUSE integration
Problem statement
- Scientific applications:
  - Individual jobs or workflows
  - Access data from diverse sources
  - Science Gateways can hide the details, but…
- Data sources:
  - Diverse types: HTTP, FTP, GridFTP, SRM, iRODS, …
  - Thus, different APIs are needed to access them
- One possible solution: a service that gives access to all of these sources through a unified interface
Existing solutions
Name | Supported storages | Access possibilities
OGSA-DAI | Web services, XML databases, file services | Web service
Storage Resource Broker | File systems, relational databases | Web, APIs, command line
iRODS | Disk, tape, database, filesystem with metadata catalog | Web, WebDAV, Java API, command line
jSAGA | FTP, GridFTP, SRM, LFC | Java API
Globus Online | FTP, GridFTP | Web interface
Data Bridge
- A simple service that provides a generic interface on top of the storage services of different DCIs for handling the data stored there
- Depending on the use case, the service lets clients browse, upload and download data; with multiple server instances it also enables inter-DCI data transfer
Use cases
- Use case 1: Browse a single DCI data storage from WS-PGRADE, upload data
- Use case 2: Transfer data files between different DCIs
- Use case 3: Fetch input data on a DCI worker node from another DCI
- Use case 4: Cloud storage usage
Use case 1: Storage browsing and data upload
[Diagram: WS-PGRADE (Storage Browsing Portlet) browses and uploads through the Data Bridge (Adaptor Interface, Storage Adaptor) to the Storage]
Use case 2: Data Transfer – Using multi-level Data Bridge
[Diagram: Client (Storage Browsing Portlet, custom application, …); first Data Bridge (Adaptor Interface, Storage Adaptor1, Data Bridge Adaptor) with Storage1; second Data Bridge (Adaptor Interface, Storage Adaptor2) with Storage2]
Use case 3: Fetch data on a DCI's worker node from a "foreign" DCI's storage
- Data Bridge usage guidelines:
  - First try to fetch the data using native tools
  - Only if this fails, use the Data Bridge
[Diagram: DCI worker node running a wrapper (pre-process, executable, post-process); Data Bridge (Adaptor Interface, Storage Adaptor); Storage]
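The usage guideline above (native tools first, Data Bridge only on failure) could be sketched in a wrapper's pre-process step roughly as follows. This is a minimal sketch, not the actual wrapper code: the `Fetcher` interface, the `FallbackFetcher` class and the use of `IOException` to signal failure are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical abstraction over "something that can fetch a remote file".
interface Fetcher {
    byte[] fetch(String uri) throws IOException;
}

// Tries the native tool first and falls back to the Data Bridge only if
// the native attempt fails, as the usage guideline suggests.
final class FallbackFetcher {
    private final Fetcher nativeTool;
    private final Fetcher dataBridge;

    FallbackFetcher(Fetcher nativeTool, Fetcher dataBridge) {
        this.nativeTool = nativeTool;
        this.dataBridge = dataBridge;
    }

    byte[] fetch(String uri) {
        try {
            return nativeTool.fetch(uri);
        } catch (IOException nativeFailure) {
            try {
                return dataBridge.fetch(uri);
            } catch (IOException bridgeFailure) {
                // Keep the first failure visible for diagnostics.
                bridgeFailure.addSuppressed(nativeFailure);
                throw new UncheckedIOException(bridgeFailure);
            }
        }
    }
}
```

With this shape, a worker node that has a native client for the storage never contacts the Data Bridge, which keeps the service out of the data path whenever possible.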
Use case 3: Get FTP data from a PBS worker node
- Could be other protocols (e.g. SRM) as well
[Diagram: PBS worker node running a wrapper (pre-process, executable, post-process); Data Bridge (Adaptor Interface, FTP Adaptor); FTP Server]
Use case 4: Cloud Storage access from WS-PGRADE/gUSE
- Currently, no S3 support in WS-PGRADE
- An S3 Data Bridge adaptor would fix this
[Diagram: WS-PGRADE/gUSE; DCI worker node (job); Data Bridge; Amazon S3]
Data Bridge Architecture
[Diagram: Public Interface and HTTP servlet; Adaptor Manager; Temporary URL queue (URI, URI, URI); Worker Pool (Thread1, Thread2, …, Threadn); Adaptor Interface; DCI Adaptor1, DCI Adaptor2, DCI Adaptor3, …, DCI Adaptorm; jSAGA]
Data Bridge components
- Interfaces:
  - Public Interface
  - Adaptor Interface
- Adaptor Manager
- Worker Threads
- DCI Adaptors
Data Bridge components - Interfaces
- Public Interface:
  - Provides the public interface for external components (Portlets, gUSE, …)
  - Web Service interface
- Adaptor Interface:
  - A Java interface that hides the details of the different adaptors
Data Bridge Public Interface
- Operations: List, Mkdir, Delete, Get, Put, Copy, Move
- Entities: URI (either a path, a URL or some specific class)
- Error reports: common exceptions
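To make the operation list concrete, here is one way it might look as a Java interface, with a toy in-memory backend so it can be run without any storage. Everything here is illustrative: the method signatures, the plain `String` URIs, the unchecked `DataBridgeException` and the `InMemoryDataBridge` class are assumptions, and the real service contract is whatever its WSDL defines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical Java view of the seven operations named on the slide.
interface DataBridgeOperations {
    List<String> list(String uri);
    void mkdir(String uri);
    void delete(String uri);
    byte[] get(String uri);
    void put(String uri, byte[] data);
    void copy(String fromUri, String toUri);
    void move(String fromUri, String toUri);
}

// Stand-in for the "common exceptions" used for error reports.
class DataBridgeException extends RuntimeException {
    DataBridgeException(String message) { super(message); }
}

// Toy backend: a null value marks a directory, a byte array marks a file.
class InMemoryDataBridge implements DataBridgeOperations {
    private final Map<String, byte[]> entries = new TreeMap<>();

    InMemoryDataBridge() { entries.put("/", null); }

    private static String parentOf(String uri) {
        int slash = uri.lastIndexOf('/');
        return slash <= 0 ? "/" : uri.substring(0, slash);
    }

    private boolean isDirectory(String uri) {
        return entries.containsKey(uri) && entries.get(uri) == null;
    }

    private void requireParentDirectory(String uri) {
        if (!isDirectory(parentOf(uri))) {
            throw new DataBridgeException("No such directory: " + parentOf(uri));
        }
    }

    public List<String> list(String uri) {
        if (!isDirectory(uri)) throw new DataBridgeException("Not a directory: " + uri);
        String prefix = uri.endsWith("/") ? uri : uri + "/";
        List<String> names = new ArrayList<>();
        for (String key : entries.keySet()) {
            // Direct children only: no further '/' after the prefix.
            boolean directChild = key.startsWith(prefix) && key.length() > prefix.length()
                    && key.indexOf('/', prefix.length()) < 0;
            if (directChild) names.add(key.substring(prefix.length()));
        }
        return names;
    }

    public void mkdir(String uri) {
        requireParentDirectory(uri);
        if (entries.containsKey(uri)) throw new DataBridgeException("Already exists: " + uri);
        entries.put(uri, null);
    }

    public void delete(String uri) {
        if (uri.equals("/")) throw new DataBridgeException("Cannot delete the root");
        if (!entries.containsKey(uri)) throw new DataBridgeException("No such entry: " + uri);
        if (isDirectory(uri) && !list(uri).isEmpty()) {
            throw new DataBridgeException("Directory not empty: " + uri);
        }
        entries.remove(uri);
    }

    public byte[] get(String uri) {
        byte[] data = entries.get(uri);
        if (data == null) throw new DataBridgeException("No such file: " + uri);
        return data.clone();
    }

    public void put(String uri, byte[] data) {
        requireParentDirectory(uri);
        if (isDirectory(uri)) throw new DataBridgeException("Is a directory: " + uri);
        entries.put(uri, data.clone());
    }

    public void copy(String fromUri, String toUri) { put(toUri, get(fromUri)); }

    public void move(String fromUri, String toUri) { copy(fromUri, toUri); delete(fromUri); }
}
```

Copy and Move are expressed here through Get, Put and Delete only to keep the sketch short; a real adaptor would presumably use the storage's own copy or rename where available.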
Data Bridge Public Interface - URI
- Represents an element with a given URI (a directory, a file, metadata attributes, …)
- Also needs to carry security credentials (if needed)
- Attributes:
  - Nothing special in the base class
  - For gLite, e.g.:
    - Path: the full path
    - Type: directory or file
    - Size: length of the entity (0 for directories)
    - Attributes: optional, contains information as returned by the Adaptor Interface's Stat function
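The gLite example attributes could be carried by a small immutable value class along these lines. This is only a sketch: the class name `StorageEntry`, the factory methods and the choice of `Map<String, String>` for the optional attributes are assumptions, and credential handling is left out entirely.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical value class mirroring the Path / Type / Size / Attributes fields.
final class StorageEntry {
    enum Type { DIRECTORY, FILE }

    private final String path;
    private final Type type;
    private final long size;
    private final Map<String, String> attributes;

    private StorageEntry(String path, Type type, long size, Map<String, String> attributes) {
        this.path = path;
        this.type = type;
        this.size = size;
        // Defensive copy so later changes to the caller's map do not leak in.
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
    }

    // Directories always report size 0, as stated on the slide.
    static StorageEntry directory(String path) {
        return new StorageEntry(path, Type.DIRECTORY, 0L, Collections.<String, String>emptyMap());
    }

    static StorageEntry file(String path, long size, Map<String, String> attributes) {
        if (size < 0) throw new IllegalArgumentException("size must not be negative");
        return new StorageEntry(path, Type.FILE, size, attributes);
    }

    String getPath() { return path; }
    Type getType() { return type; }
    long getSize() { return size; }
    Map<String, String> getAttributes() { return attributes; }
}
```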
Data Bridge Public Interface – Get and Put
- Two-phase upload and download with the temporary URL queue:
  - First, the web service interface is invoked to register the transfer request
  - Next, a simple HTTP client may use HTTP GET or POST/PUT to download or upload the data
- This way, web service invocation ("heavyweight" SOAP) is separated from data transfer ("lightweight" HTTP)
[Diagram: Data Bridge architecture, as on the "Data Bridge Architecture" slide]
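The two phases can be illustrated with a small registry of pending transfers. This is a sketch under stated assumptions rather than the service's implementation: the class names, the random-token URL scheme, the example base URL and the single-use behaviour of a temporary URL are all hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical description of a transfer registered in phase 1.
final class TransferRequest {
    enum Direction { DOWNLOAD, UPLOAD }

    final String storageUri;
    final Direction direction;

    TransferRequest(String storageUri, Direction direction) {
        this.storageUri = storageUri;
        this.direction = direction;
    }
}

// Hypothetical temporary URL queue shared by the web service and the HTTP servlet.
final class TemporaryUrlQueue {
    private final Map<String, TransferRequest> pending = new ConcurrentHashMap<>();
    private final String baseUrl;

    TemporaryUrlQueue(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Phase 1: called from the web service; returns the URL the client
    // should use for the plain HTTP transfer.
    String register(TransferRequest request) {
        String token = UUID.randomUUID().toString();
        pending.put(token, request);
        return baseUrl + "/" + token;
    }

    // Phase 2: called from the HTTP servlet on GET or POST/PUT. Returns null
    // for unknown or already used URLs (single use is an assumption here).
    TransferRequest claim(String temporaryUrl) {
        String token = temporaryUrl.substring(temporaryUrl.lastIndexOf('/') + 1);
        return pending.remove(token);
    }
}
```

The servlet would then hand the claimed request to a worker thread, which streams the HTTP body to or from the matching adaptor.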
Adaptor Manager and Worker threads
- Provided by the JAX-WS web service API
- Tasks:
  - Manage incoming requests
  - Initialize worker threads to perform the requested operation with the help of the different adaptors
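A worker pool of this kind is commonly built on `java.util.concurrent`. The sketch below assumes a fixed-size pool and a hypothetical `AdaptorManagerSketch` class; the slides do not say how the actual Adaptor Manager sizes or schedules its threads.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical manager that hands requested operations to a pool of workers.
final class AdaptorManagerSketch {
    private final ExecutorService workers;

    AdaptorManagerSketch(int workerCount) {
        // Daemon threads so an idle pool never keeps the JVM alive.
        this.workers = Executors.newFixedThreadPool(workerCount, runnable -> {
            Thread thread = new Thread(runnable, "databridge-worker");
            thread.setDaemon(true);
            return thread;
        });
    }

    // Asynchronous variant: the caller decides when to wait for the result.
    <T> Future<T> submit(Callable<T> operation) {
        return workers.submit(operation);
    }

    // Synchronous variant: runs the operation on a worker and waits for it.
    <T> T execute(Callable<T> operation) {
        try {
            return submit(operation).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a worker", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Operation failed", e.getCause());
        }
    }

    void shutdown() {
        workers.shutdown();
    }
}
```

Each submitted `Callable` would wrap one adaptor call (for example a List or a Copy), so slow storages only block their own worker.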
DCI Adaptors
- Implement: Adaptor Interface
- Tasks:
  - Perform operations requested by the Worker Threads, that is, operations invoked through the web service
- Types:
  - gLite (using jSAGA)
  - GridFTP (using jSAGA)
  - FTP (using jSAGA)
  - …
  - Data Bridge: special adaptor to forward requests to other Data Bridges
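One plausible way to pick an adaptor for a request is by the scheme of the target URI. The slides do not state how the selection is done, so the sketch below is purely an assumption, as are the `StorageAdaptor` and `AdaptorRegistry` names and the single `list` method standing in for the full Adaptor Interface.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, heavily reduced stand-in for the Adaptor Interface.
interface StorageAdaptor {
    List<String> list(URI directory);
}

// Hypothetical registry that maps URI schemes (ftp, gsiftp, srm, ...) to adaptors.
final class AdaptorRegistry {
    private final Map<String, StorageAdaptor> byScheme = new HashMap<>();

    void register(String scheme, StorageAdaptor adaptor) {
        byScheme.put(scheme.toLowerCase(Locale.ROOT), adaptor);
    }

    StorageAdaptor forUri(URI target) {
        String scheme = target.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("URI has no scheme: " + target);
        }
        // URI schemes are case-insensitive, so normalise before the lookup.
        StorageAdaptor adaptor = byScheme.get(scheme.toLowerCase(Locale.ROOT));
        if (adaptor == null) {
            throw new IllegalArgumentException("No adaptor for scheme: " + scheme);
        }
        return adaptor;
    }
}
```

A forwarding "Data Bridge" adaptor would fit the same shape: it would be registered like any other adaptor and relay the call to a remote Data Bridge instead of a storage.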
Data Bridge clients
- Web Service clients:
  - Create your own based on the WSDL (or REST)
- Java API:
  - Provides a convenient tool to use Data Bridge Public Interface functions
  - Data transfer functions should accept InputStream and OutputStream objects as their arguments
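A stream-based client API, as the last point suggests, might look like the sketch below. The `DataBridgeClient` interface, its method names and the in-memory implementation are hypothetical and only show why `InputStream` and `OutputStream` arguments are convenient: callers can pass files, sockets or buffers without the API caring.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side view of the Get and Put operations.
interface DataBridgeClient {
    void put(String uri, InputStream source);
    void get(String uri, OutputStream target);
}

final class StreamTransfer {
    private StreamTransfer() { }

    // Copies everything from in to out and returns the number of bytes moved.
    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[8192];
        long total = 0;
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}

// Toy implementation that keeps the data in memory instead of calling a service.
final class InMemoryClient implements DataBridgeClient {
    private final Map<String, byte[]> store = new HashMap<>();

    public void put(String uri, InputStream source) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        StreamTransfer.copy(source, buffer);
        store.put(uri, buffer.toByteArray());
    }

    public void get(String uri, OutputStream target) {
        byte[] data = store.get(uri);
        if (data == null) {
            throw new IllegalArgumentException("No such entry: " + uri);
        }
        StreamTransfer.copy(new ByteArrayInputStream(data), target);
    }
}
```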
WS-PGRADE integration
- A Data Browsing portlet that eases storage management
WS-PGRADE Workflow I/O configuration
- During a workflow node's I/O configuration, the user should be able to select files from storages
- The interface offered there should be the same as the Storage Browsing portlet of the selected storage (only with one panel)
Current status, future work
- The core Data Bridge (available as a web service) is ready and works with most major protocols (FTP, GridFTP, SRM)
- User interface development has started; the first version will be available as part of WS-PGRADE/gUSE shortly
Thank you for your attention!
Questions?