A Web-Based Data Grid Chip Watson, Ian Bird, Jie Chen,

A Web-Based Data Grid Chip Watson, Ian Bird, Jie Chen, Ying Chen, Bryan Hess, Andy Kowalski Thomas Jefferson National Accelerator Facility November 22, 2018

Outline
- Overview of a prototype JLAB data grid architecture
- Status of the development
- Expected future milestones
- Lessons learned so far

JLAB Prototype Architecture Summary
- The prototype data grid consists of
  - Web services for information management and control
  - File daemons (like ftpd) for bulk data transfer
  - Back-end services used by the web services
- Communication with web services is via HTTP and XML (HTTPS with an X.509 certificate for privileged operations)
- Communication with file daemons is via a daemon-specific protocol
- Communication with back-end services is site specific
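The HTTP-plus-XML exchange with a web service can be sketched roughly as follows. This is a minimal illustration, not the prototype's actual interface: the XML element and attribute names, the sample response, and the helper names `catalog_url` and `parse_listing` are all assumptions. Only the `dname` query parameter is taken from the catalog URL that appears later in the slides. No network call is made; the sketch only builds the request URL and parses a canned reply.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical XML reply; the real document schema is not given in the slides.
SAMPLE_RESPONSE = """<directory name="/lqcd">
  <file name="run1.dat" size="1024" state="online"/>
  <file name="run2.dat" size="2048" state="offline"/>
</directory>"""


def catalog_url(base, dname):
    """Build a directory-listing request URL for a catalog servlet."""
    return f"{base}?{urlencode({'dname': dname})}"


def parse_listing(xml_text):
    """Turn the (assumed) XML listing into (name, size, state) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (f.get("name"), int(f.get("size")), f.get("state"))
        for f in root.findall("file")
    ]


if __name__ == "__main__":
    print(catalog_url("http://example.org/servlet/catalog", "/lqcd"))
    for entry in parse_listing(SAMPLE_RESPONSE):
        print(entry)
```

A privileged operation would use the same request shape over HTTPS with a client certificate; that part is omitted here.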

In picture form…
[Figure: architecture diagram with boxes labeled Client Program, Replica Catalog, Data Grid Server, Agent, and File Server, placed on hosts labeled "R C Host" and "File Host".]

Web Services
- Replica Catalog
  - Holds global file namespace
  - May itself be replicated for redundancy or performance
  - References (for a given file) data grid nodes (but not physical path)
- Data Grid Server (aka Replica Host)
  - Holds and serves files
  - May be a disk cache; may include tertiary storage
  - Translates global name to URL for retrieval (if cache resident) (pull by client)
  - Accepts new files (push by client)
  - Supports queuing of file transfer requests between nodes (3rd party)
  - Supports policy-based file movement

Replica Catalog Components
- Relational database
  - Global directory name, file name, owner, size, etc.
  - Set of Data Grid Nodes holding copies of the file, and the last reported state of each replica (online, offline)
- XML servlet
  - Directory-level services per invocation, returning rich info from the database as an XML document
  - Catalog updates
- HTTP servlet
  - Applies style sheet(s) to the XML document; allows easy browsing and simple interactions with just a web browser
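One way to picture the relational part of the catalog is a two-table layout: one row per global file, and one row per (file, node) replica with its last reported state. The sketch below uses SQLite from Python purely for illustration; the table names, column names, and helper functions are assumptions, and the slides do not say which database or schema the prototype uses.

```python
import sqlite3


def open_catalog():
    """Create an in-memory catalog with an assumed two-table schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE file (
            id    INTEGER PRIMARY KEY,
            dname TEXT NOT NULL,          -- global directory name
            fname TEXT NOT NULL,          -- file name within the directory
            owner TEXT,
            size  INTEGER,
            UNIQUE (dname, fname)
        );
        CREATE TABLE replica (
            file_id INTEGER NOT NULL REFERENCES file(id),
            node    TEXT NOT NULL,        -- data grid node holding a copy
            state   TEXT NOT NULL CHECK (state IN ('online', 'offline')),
            PRIMARY KEY (file_id, node)
        );
        """
    )
    return db


def add_file(db, dname, fname, owner=None, size=None):
    """Register a file in the global namespace and return its id."""
    cur = db.execute(
        "INSERT INTO file (dname, fname, owner, size) VALUES (?, ?, ?, ?)",
        (dname, fname, owner, size),
    )
    return cur.lastrowid


def report_replica(db, file_id, node, state):
    """Record (or overwrite) the last reported state of one replica."""
    db.execute(
        "INSERT OR REPLACE INTO replica (file_id, node, state) VALUES (?, ?, ?)",
        (file_id, node, state),
    )


def nodes_for(db, dname, fname):
    """Return (node, state) pairs for a global file name, sorted by node."""
    return db.execute(
        "SELECT r.node, r.state FROM replica r "
        "JOIN file f ON f.id = r.file_id "
        "WHERE f.dname = ? AND f.fname = ? ORDER BY r.node",
        (dname, fname),
    ).fetchall()
```

Note that the catalog stores only which nodes hold a copy, not a physical path, which matches the division of labor described for the Replica Catalog and the Data Grid Server.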

Current Status of Replica Catalog
- A prototype exists with the following functionality
  - Database populated with ALL files from the Jefferson Lab silo (no owner, group, or file size info loaded for now)
  - XML servlet for browsing
  - HTTP servlet for browsing: http://129.57.41.138/servlet/dg.HttpReplicaCatalog?dname=/
- Missing functionality in this prototype
  - Authentication: easy, already done for another (batch system) prototype
  - Edit catalog: in principle easy, just need to finalize scenarios
  - Extensible file properties: moderately easy, just need to add a name-value table to the db and expand the XML document for a single file to include this info

Status (cont.)
- Observations
  - Web browsing into directories with thousands of files is slow (produces an ENORMOUS web page), but works
  - Plan to segment, with “Next Page” link
  - Probably need to allow the client to specify the number of files to retrieve, and an offset for the next retrieval
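The count-and-offset idea can be sketched in a few lines. This is an illustration only; the function name `page` and its defaults are assumptions, not part of the prototype.

```python
def page(entries, offset=0, count=100):
    """Return one page of a directory listing plus the offset of the next page.

    The second element is None when there is nothing left to retrieve, which is
    when a "Next Page" link would be omitted.
    """
    if offset < 0 or count <= 0:
        raise ValueError("offset must be >= 0 and count must be > 0")
    chunk = entries[offset:offset + count]
    next_offset = offset + len(chunk)
    return chunk, (next_offset if next_offset < len(entries) else None)
```

A client would call `page(names, 0, 100)`, render the chunk, and pass the returned offset back on the next request until it receives `None`.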

Data Grid Node Components
- XML (and HTTP) servlets
  - File Catalog Servlet (Replica Host)
    - Translates file I/O requests to a specific URL (including protocol negotiation or selection)
    - Provides offline / online status of file
  - Transfer Request Servlet
    - Queues file transfer requests, reports status
    - Edits transfer policy for specified directory
  - Disk Cache Manager Servlet
    - Edits policy of disk cache manager
- File Server(s)
  - ftp, bbftp, gridftp, …
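The File Catalog Servlet's job of turning a global name into a retrieval URL, with protocol selection, might look something like the sketch below. Everything here is hypothetical: the `node` dictionary layout, the status strings, and the rule "pick the first client-preferred protocol the node also offers" are assumptions chosen for illustration, since the slides do not describe the negotiation in detail.

```python
def select_protocol(client_protocols, server_protocols):
    """Pick the first protocol in the client's preference order that the node offers."""
    for proto in client_protocols:
        if proto in server_protocols:
            return proto
    return None


def resolve(global_name, client_protocols, node):
    """Translate a global file name to a retrieval URL on one data grid node.

    `node` is an assumed structure:
      {"host": str, "protocols": [str], "files": {global_name: {"path": str, "online": bool}}}
    """
    entry = node["files"].get(global_name)
    if entry is None:
        return {"status": "unknown"}
    if not entry["online"]:
        # File is known but not cache resident (e.g. only on tape).
        return {"status": "offline"}
    proto = select_protocol(client_protocols, node["protocols"])
    if proto is None:
        return {"status": "no-common-protocol"}
    return {"status": "online", "url": f"{proto}://{node['host']}{entry['path']}"}
```

For example, a client that prefers gridftp but also accepts bbftp would be handed a bbftp URL by a node that only runs bbftp and ftp servers.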

Data Grid Server Components (Implementation)
- Disk Cache Manager (back end service)
  - Java application
  - Manages disk pool (NFS mounted read-only to local users)
  - SQL database to track cached files, pending transfers
  - Migrates files to / from tape (if requested and if it has a reference to a Tape Manager)
  - Interacts with a Disk Policy Agent (planned)
- Tape Manager (back end service)
  - Separate Java application & db (running on a different host)
  - Stages files to or from silo (has own small disk cache)
  - NFS exports stub file system

Data Grid Node Components (Implementation)
- Disk Policy Agent (back end service)
  - Runs in Disk Cache Manager’s VM
  - Keeps replica catalog up to date
  - Advises cache manager as to which files to delete (deleting the last globally disk-resident copy is expensive)
  - Propagates transfer policy from Replica Catalog
- Grid Transfer Agent (back end service)
  - Operates on queued transfer requests
  - Uses remote File Servers (e.g. is or spawns an xxftp client)
  - Runs (probably) in disk cache manager’s VM
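The advice the Disk Policy Agent gives the cache manager could be as simple as an ordering of deletion candidates. The sketch below is one possible reading, not the planned design: it assumes the agent knows how many disk-resident copies of each file exist grid-wide, and it ranks files with other online copies first (oldest access first), leaving last disk-resident copies until the end. The function name and input shapes are hypothetical.

```python
def deletion_candidates(local_files, online_copies):
    """Order local cache files from most to least attractive to delete.

    local_files:   list of (global_name, last_access_time) for this node
    online_copies: dict mapping global_name to the number of disk-resident
                   copies grid-wide, including the local one (assumed to come
                   from the replica catalog); unknown files count as 1.
    """
    def sort_key(item):
        name, last_access = item
        is_last_copy = online_copies.get(name, 1) <= 1
        # False sorts before True, so files with other copies come first;
        # within each group the least recently used file comes first.
        return (is_last_copy, last_access)

    return [name for name, _ in sorted(local_files, key=sort_key)]
```

With this ordering, an old file whose only disk copy is local is deleted later than a newer file that is also online elsewhere.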

Current Status of Data Grid Node
- Data Grid Servlets
  - Translation from global name to URL is hard coded
  - Supports browsing of disk cache
  - Newest prototype allows browsing of unmanaged node-local file system, including /home, /data, …, and the copying of files within a single data node (adding authentication soon)
- File Servers
  - bbftp in production use at Jlab; waiting for gridFTP

Back End Status
- Disk Cache Manager
  - Simple LRU policy (pluggable), no user quotas
  - No use of policy agent yet (to sync with replica catalog)
  - Automatic migration of specified files to tape guaranteed before deletion
  - Only 1 node operating in this mode (variant of other disk cache managers at Jlab)
- Tape Manager
  - Fully operational, in production use at Jlab
- File Transfer Agent
  - Just starting development
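The combination of LRU eviction with migrate-to-tape-before-delete can be sketched as below. This is a toy model under stated assumptions: the class name `DiskCache`, the byte-count capacity, and the `migrate_to_tape` callback are all hypothetical, and the policy is fixed to LRU here rather than pluggable as in the real manager (which is a Java application).

```python
from collections import OrderedDict


class DiskCache:
    """Toy LRU disk cache that migrates flagged files to tape before evicting them."""

    def __init__(self, capacity, migrate_to_tape):
        self.capacity = capacity                # total bytes available
        self.migrate_to_tape = migrate_to_tape  # callback taking a file name
        self.files = OrderedDict()              # name -> (size, must_migrate), oldest first
        self.used = 0

    def touch(self, name):
        """Mark a cached file as most recently used."""
        self.files.move_to_end(name)

    def add(self, name, size, must_migrate=False):
        """Insert a file, evicting least recently used files as needed.

        Returns the list of evicted file names.
        """
        if size > self.capacity:
            raise ValueError("file larger than cache")
        if name in self.files:
            # Replacing an existing entry: release its space first.
            self.used -= self.files.pop(name)[0]
        evicted = []
        while self.used + size > self.capacity:
            victim, (vsize, vmigrate) = self.files.popitem(last=False)
            if vmigrate:
                # Guarantee the tape copy exists before the disk copy goes away.
                self.migrate_to_tape(victim)
            self.used -= vsize
            evicted.append(victim)
        self.files[name] = (size, must_migrate)
        self.used += size
        return evicted
```

In a real system the migration would be asynchronous and could fail, so eviction would have to wait on a confirmed tape copy; the synchronous callback here only illustrates the ordering.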

Status Summary
- Missing functionality: a lot!
  - Transfer queuing
  - Advance reservation & quotas
  - Policy-based operations
  - Automatic updates of replica catalog
- All of these are planned or in progress…

Data Grid Applications: File Manager
- File Manager design
  - Uses Replica Catalog (XML)
  - Uses Data Grid Node (XML)
  - GUI to browse files
  - GUI to copy files (and view queues)
- Status
  - XML communications and file GUI done
  - 3rd party transfer operations awaiting additional functionality in the data grid node
  - Currently an application, but plan to make it into an applet

Deployment / Development
- 2Q 01
  - 2 data grid servers running at Jlab & MIT for LQCD
  - Grid browsing (replica catalog and data grid server)
  - Retrieve file: http, bbftp & gridftp
  - Command line utility and web interface to “publish” a file (insert into grid node from co-located machine / local file system)
- 3Q 01
  - 2nd grid running between Jlab & FSU for CLAS (Hall D prototype)
  - “Push” file into a data grid server from offsite
  - 3rd party file transfers on demand (queued)
- 1Q 02
  - Policy-based file migration
  - Asynchronous event notification (HTTP based)