A Web-Based Data Grid
Chip Watson, Ian Bird, Jie Chen, Ying Chen, Bryan Hess, Andy Kowalski
Thomas Jefferson National Accelerator Facility
November 22, 2018
Outline
- Overview of a prototype JLAB data grid architecture
- Status of the development
- Expected future milestones
- Lessons learned so far
JLAB Prototype Architecture Summary
- The prototype data grid consists of
  - Web services for information management and control
  - File daemons (like ftpd) for bulk data transfer
  - Back-end services used by the web services
- Communication w/ web services is via HTTP and XML (HTTPS w/ X.509 certificate for privileged operations)
- Communication w/ file daemons is via a daemon specific protocol
- Communication w/ back-end services is site specific
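The HTTP/XML convention above can be illustrated with a minimal sketch. The operation names, the `<request>` element layout, and the host `grid.example.org` are all hypothetical; the slides only state that web-service calls carry XML over HTTP and that privileged operations use HTTPS with an X.509 certificate.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of privileged operations; the slides do not list them.
PRIVILEGED = {"put", "delete", "edit_policy"}

def build_request(operation, path):
    """Return (url, xml_body) for one web-service call.

    Privileged operations are routed to HTTPS, everything else to HTTP.
    The URL and the XML vocabulary are placeholders for illustration.
    """
    scheme = "https" if operation in PRIVILEGED else "http"
    url = f"{scheme}://grid.example.org/servlet/dg"
    root = ET.Element("request", {"op": operation})
    ET.SubElement(root, "path").text = path
    return url, ET.tostring(root, encoding="unicode")
```

A browse call such as `build_request("list", "/lqcd")` would go over plain HTTP, while `build_request("put", "/lqcd/new")` would be sent over HTTPS.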
In picture form…
[Diagram: a Client Program and an Agent, each connected to a Replica Catalog and a Data Grid Server; a File Server; hosts labeled "RC Host" and "File Host"]
Web Services
- Replica Catalog
  - Holds global file namespace
  - May itself be replicated for redundancy or performance
  - References (for given file) data grid nodes (but not physical path)
- Data Grid Server (aka Replica Host)
  - Holds and serves files
  - May be a disk cache; may include tertiary storage
  - Translates global name to URL for retrieval (if cache resident) (pull by client)
  - Accepts new files (push by client)
  - Supports queuing of file transfer requests between nodes (3rd party)
  - Supports policy based file movement
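The division of labor between the two services can be sketched as a two-step lookup: the catalog maps a global name to nodes, and only the node maps it to a retrievable URL. The dictionaries, node names, host names, and the `ftp` scheme below are invented for illustration and are not taken from the prototype.

```python
# Replica catalog: global name -> nodes holding a copy (no physical paths).
CATALOG = {"/lqcd/run1/cfg001": ["mit", "jlab"]}

# Per-node view: global name -> local path, only for cache-resident files.
NODE_CACHE = {
    "jlab": {"/lqcd/run1/cfg001": "/cache/a/cfg001"},
    "mit": {},  # this node holds the file, but not on disk right now
}
NODE_HOST = {"jlab": "dg.jlab.example", "mit": "dg.mit.example"}

def node_url(node, global_name):
    """Data grid server step: translate a global name to a URL if cached."""
    local_path = NODE_CACHE[node].get(global_name)
    if local_path is None:
        return None
    return f"ftp://{NODE_HOST[node]}{local_path}"

def resolve(global_name):
    """Client step: ask the catalog for nodes, then ask each node for a URL."""
    for node in CATALOG.get(global_name, []):
        url = node_url(node, global_name)
        if url is not None:
            return url
    return None
```

Keeping physical paths out of the catalog means a node can reorganize its cache without any catalog update.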
Replica Catalog Components
- Relational database
  - Global directory name, file name, owner, size, etc.
  - Set of Data Grid Nodes holding copies of the file, and last reported state of that replica copy (online, offline)
- XML servlet
  - Directory level services per invocation, returning rich info from the database as an XML document
  - Catalog updates
- HTTP servlet
  - Applies style sheet(s) to the XML document, allows easy browsing and simple interactions with just a simple web browser
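As a rough illustration of the database-plus-XML-servlet pairing, the sketch below uses an in-memory SQLite database and returns one directory as an XML document. The table names, column names, and XML element names are assumptions; the slides do not give the actual schema or document format.

```python
import sqlite3
import xml.etree.ElementTree as ET

def make_catalog():
    """Create an empty catalog with a hypothetical two-table schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE file (
            id INTEGER PRIMARY KEY, dname TEXT, fname TEXT,
            owner TEXT, size INTEGER);
        CREATE TABLE replica (
            file_id INTEGER REFERENCES file(id), node TEXT, state TEXT);
    """)
    return db

def directory_xml(db, dname):
    """Return one directory listing, with replica states, as XML text."""
    root = ET.Element("directory", {"name": dname})
    files = db.execute(
        "SELECT id, fname, owner, size FROM file "
        "WHERE dname = ? ORDER BY fname", (dname,)).fetchall()
    for file_id, fname, owner, size in files:
        elem = ET.SubElement(root, "file", {
            "name": fname, "owner": owner or "", "size": str(size or 0)})
        replicas = db.execute(
            "SELECT node, state FROM replica "
            "WHERE file_id = ? ORDER BY node", (file_id,))
        for node, state in replicas:
            ET.SubElement(elem, "replica", {"node": node, "state": state})
    return ET.tostring(root, encoding="unicode")
```

An HTTP servlet in this design would then only need to apply a style sheet to the returned document.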
Current Status of Replica Catalog
- A prototype exists with the following functionality
  - Database populated with ALL files from the Jefferson Lab silo (no owner, group, file size info loaded for now)
  - XML servlet for browsing
  - HTTP servlet for browsing
    http://129.57.41.138/servlet/dg.HttpReplicaCatalog?dname=/
- Missing functionality in this prototype
  - Authentication
    - Easy, already done for another (batch system) prototype
  - Edit catalog
    - In principle easy, just need to finalize scenarios
  - Extensible file properties
    - Moderately easy, just need to add a name-value table to db and expand the XML document for a single file to include this info
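The name-value table proposed for extensible file properties might look like the following sketch. The table name `file_property` and its columns are hypothetical, and SQLite stands in for whatever relational database the prototype uses.

```python
import sqlite3

def open_properties_db():
    """Create an in-memory database holding only the name-value table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE file_property ("
        " file_id INTEGER NOT NULL,"
        " name TEXT NOT NULL,"
        " value TEXT,"
        " PRIMARY KEY (file_id, name))")
    return db

def set_property(db, file_id, name, value):
    """Insert a property, or overwrite it if the name already exists."""
    db.execute(
        "INSERT OR REPLACE INTO file_property (file_id, name, value) "
        "VALUES (?, ?, ?)", (file_id, name, value))

def get_properties(db, file_id):
    """Return all properties of one file as a dict."""
    rows = db.execute(
        "SELECT name, value FROM file_property "
        "WHERE file_id = ? ORDER BY name", (file_id,))
    return dict(rows)
```

New property names need no schema change, which is the point of the name-value layout.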
Status (cont.)
- Observations
  - Web browsing into directories w/ thousands of files is slow (produces an ENORMOUS web page), but works
    - Plan to segment, with “Next Page” link
    - Probably need to allow client to specify number of files to retrieve, and offset for next retrieval
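The count-and-offset idea for segmenting large directories can be sketched as below. The function name, parameter names, and default page size are assumptions made for illustration.

```python
def list_page(names, offset=0, count=100):
    """Return (page, next_offset) for a directory listing.

    next_offset is None when the page reaches the end of the listing,
    so a "Next Page" link is only shown when it is not None.
    """
    if offset < 0 or count <= 0:
        raise ValueError("offset must be >= 0 and count must be > 0")
    page = names[offset:offset + count]
    next_offset = offset + count if offset + count < len(names) else None
    return page, next_offset
```

A client would call `list_page(names, 0, 100)`, then pass the returned offset back for the next retrieval until it receives `None`.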
Data Grid Node Components
- XML (and HTTP) servlets
  - File Catalog Servlet (Replica Host)
    - Translates file I/O requests to specific URL (including protocol negotiation or selection)
    - Provides offline / online status of file
  - Transfer Request Servlet
    - Queues file transfer requests, reports status
    - Edits transfer policy for specified directory
  - Disk Cache Manager Servlet
    - Edits policy of disk cache manager
- File Server(s)
  - ftp, bbftp, gridftp, …
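One plausible form of the protocol selection step is for the servlet to pick the first protocol in its own preference order that the client also supports. The preference order, the reply fields, and the host `dg.example.org` below are assumptions; the slides do not describe the actual negotiation.

```python
# Hypothetical server-side preference order.
SERVER_PROTOCOLS = ("gridftp", "bbftp", "ftp")

def select_protocol(client_protocols, server_protocols=SERVER_PROTOCOLS):
    """Return the server's most preferred protocol the client offers."""
    offered = set(client_protocols)
    for proto in server_protocols:
        if proto in offered:
            return proto
    return None

def file_reply(global_name, client_protocols, cache):
    """Report online/offline status and, if possible, a retrieval URL.

    cache maps global names to local paths for disk-resident files.
    """
    local_path = cache.get(global_name)
    if local_path is None:
        return {"status": "offline", "url": None}
    proto = select_protocol(client_protocols)
    if proto is None:
        return {"status": "online", "url": None}
    return {"status": "online", "url": f"{proto}://dg.example.org{local_path}"}
```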
Data Grid Server Components (Implementation)
- Disk Cache Manager (back end service)
  - Java application
  - Manages disk pool -- NFS mounted read-only to local users
  - SQL database to track cached files, pending transfers
  - Migrates files to / from tape (if requested and if it has a reference to a Tape Manager)
  - Interacts with a Disk Policy Agent (planned)
- Tape Manager (back end service)
  - Separate Java application & db (running on different host)
  - Stages files to or from silo (has own small disk cache)
  - NFS exports stub file system
Data Grid Node Components (Implementation)
- Disk Policy Agent (back end service)
  - Runs in Disk Cache Manager’s VM
  - Keeps replica catalog up to date
  - Advises cache manager as to which files to delete (deleting last globally disk resident copy is expensive)
  - Propagates transfer policy from Replica Catalog
- Grid Transfer Agent (back end service)
  - Operates on queued transfer requests
  - Uses remote File Servers (e.g. is or spawns an xxftp client)
  - Runs (probably) in disk cache manager’s VM
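The advice the Disk Policy Agent gives the cache manager could be expressed as a deletion ordering that puts files with other disk-resident copies first. This is only one possible reading of the slide; the function name, the inputs, and the tie-break on access time are all assumptions.

```python
def deletion_order(cached, online_copies):
    """Order cached files from best to worst deletion candidate.

    cached:        {global_name: last_access_time} for this node
    online_copies: {global_name: number of disk-resident copies grid-wide}

    Files that still have another disk-resident copy come first; within
    each group the least recently accessed file comes first.
    """
    def sort_key(name):
        # Unknown files are treated as the last copy, the cautious choice.
        is_last_copy = online_copies.get(name, 1) <= 1
        return (is_last_copy, cached[name])
    return sorted(cached, key=sort_key)
```

With this ordering, the last globally disk-resident copy of a file is deleted only after every cheaper candidate is gone.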
Current Status of Data Grid Node
- Data Grid Servlets
  - Translation from global name to URL is hard coded
  - Supports browsing of disk cache
  - Newest prototype allows browsing of unmanaged node-local file system, including /home, /data, …, and the copying of files within a single data node (adding authentication soon)
- File Servers
  - bbftp in production use at Jlab; waiting for gridFTP
Back End Status
- Disk Cache Manager
  - Simple LRU policy (pluggable), no user quotas
  - No use of policy agent yet (to sync with replica catalog)
  - Automatic migration of specified files to tape guaranteed before deletion
  - Only 1 node operating in this mode (variant of other disk cache managers at Jlab)
- Tape Manager
  - Fully operational, in production use at Jlab
- File Transfer Agent
  - Just starting development
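The combination of LRU eviction with guaranteed tape migration before deletion can be sketched as follows. The class, its fields, and the `migrate_to_tape` callback are hypothetical stand-ins; the actual Disk Cache Manager is a Java application whose internals the slides do not show.

```python
from collections import OrderedDict

class DiskCache:
    """LRU disk cache that migrates flagged files to tape before eviction."""

    def __init__(self, capacity, migrate_to_tape):
        self.capacity = capacity
        self.used = 0
        self.files = OrderedDict()  # name -> info, least recently used first
        self.migrate_to_tape = migrate_to_tape  # callback taking a file name

    def touch(self, name):
        """Record an access, making the file the most recently used."""
        self.files.move_to_end(name)

    def add(self, name, size, needs_tape=False):
        """Cache a new file, evicting older files if space is needed."""
        self._make_room(size)
        self.files[name] = {"size": size, "needs_tape": needs_tape,
                            "on_tape": False}
        self.used += size

    def _make_room(self, size):
        while self.used + size > self.capacity and self.files:
            victim, info = next(iter(self.files.items()))
            if info["needs_tape"] and not info["on_tape"]:
                self.migrate_to_tape(victim)  # must succeed before deletion
                info["on_tape"] = True
            del self.files[victim]
            self.used -= info["size"]
```

A different eviction policy could be plugged in by replacing the victim selection in `_make_room`.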
Status Summary
- Missing Functionality
  - A lot!
  - Transfer queuing
  - Advanced reservation & quotas
  - Policy based operations
  - Automatic updates of replica catalog
- All of these are planned or in progress…
Data Grid Applications: File Manager
- File Manager Design
  - Uses Replica Catalog (XML)
  - Uses Data Grid Node (XML)
  - GUI to browse files
  - GUI to copy files (and view queues)
- Status
  - XML communications and file GUI done
  - 3rd party transfer operations awaiting additional functionality in the data grid node
  - Currently an application, but plan to make it into an applet
Deployment / Development
- 2Q 01
  - 2 data grid servers running at Jlab & MIT for LQCD
  - grid browsing (replica catalog and data grid server)
  - retrieve file: http, bbftp & gridftp
  - Command line utility and web interface to “publish” a file (insert into grid node from co-located machine / local file system)
- 3Q 01
  - 2nd grid running between Jlab & FSU for CLAS (Hall D prototype)
  - “push” file into a data grid server from offsite
  - 3rd party file transfers on demand (queued)
- 1Q 02
  - Policy based file migration
  - Asynchronous event notification (HTTP based)