XML for Data Grid Applications


1 XML for Data Grid Applications
Chip Watson
Thomas Jefferson National Accelerator Facility
December 31, 2018 PPDG Meeting

2 Why XML? -- Industry Trends
Strategy: use web technologies, follow the success of the web...
E-commerce companies (especially B2B) are currently investing heavily in XML technologies...
Example news items:
- [December 11, 2000] "iPlanet Unveils Industry's First Full-Up B2B Commerce Platform… [based upon XML]"
- [December 08, 2000] "Schemantix (formerly Praxis) to Launch Schemantix Development Platform (SxDP) at XML 2000."
- "Microsoft is augmenting its OLE DB for OLAP protocol with new interfaces based on XML… 'The brass tacks on this is we're all going to run our analytical apps over the Internet, and the language these apps will use to communicate with their data sources will be XML,' says Clay Young, VP of marketing at online analytical processing software vendor Knosys Inc." -- InformationWeek, Dec 7, 2000

3 What is XML ? eXtensible Markup Language
Like HTML, but with user-defined tags.
Tags refer to content, not presentation:

    <?xml version='1.0' encoding='ISO '?>
    <directory name="/clas" owner="root" group="other" modified="Aug 22 08:34">
      <file name='97-12'/>
      <file name='98-02'/>
      <file name='98-03'/>
      <directory name='comm97'/>
      <directory name='e1'/>
    </directory>

The attributes of the <directory> element are the properties of the node; the nested <file> and <directory> elements are the node contents.
XML has a tree data model.
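A minimal sketch of reading the slide's directory listing as a tree, using Python's standard `xml.etree.ElementTree`. The XML declaration is left out because its encoding name is truncated in this transcript; the helper name `describe` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# The directory listing from the slide, without the XML declaration
# (its encoding name is truncated in the transcript).
LISTING = """
<directory name="/clas" owner="root" group="other" modified="Aug 22 08:34">
  <file name='97-12'/>
  <file name='98-02'/>
  <file name='98-03'/>
  <directory name='comm97'/>
  <directory name='e1'/>
</directory>
"""

def describe(xml_text):
    """Return (properties, contents) for the root node of a listing."""
    root = ET.fromstring(xml_text)
    # Properties of the node are its XML attributes.
    properties = dict(root.attrib)
    # Contents of the node are its child elements (files and subdirectories).
    contents = [(child.tag, child.get("name")) for child in root]
    return properties, contents

properties, contents = describe(LISTING)
print(properties["owner"])
for tag, name in contents:
    print(tag, name)
```

The same two-step access (attributes for properties, children for contents) applies at every level of the tree.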

4 XML vs CORBA
XML is more verbose:
- data is transported as character strings (~2x for float)
- data is self-describing, with string tags (~2x)
- however, lists are separated by single whitespace, so string lists are carried with little overhead
CORBA is harder to deploy:
- requires an ORB, complex libraries, a name server, etc.
Both are language neutral:
- XML is supported in C/C++, Java, Perl, etc.
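To make the verbosity point concrete, here is a rough sketch comparing the size of a list of floats sent as packed 8-byte doubles with the same list sent as a whitespace-separated character string. The sample values and function names are assumptions; the actual ratio depends on how many digits are printed, so the slide's "~2x" should be read as the author's estimate rather than something this sketch proves.

```python
import struct

def binary_size(values):
    """Bytes needed to send the values as packed 8-byte IEEE doubles."""
    return len(struct.pack(f"<{len(values)}d", *values))

def text_size(values):
    """Bytes needed to send the values as one whitespace-separated string."""
    return len(" ".join(repr(v) for v in values).encode("ascii"))

values = [i / 7 for i in range(1, 101)]   # arbitrary sample data
print(binary_size(values), text_size(values))
```

Note that the separator costs only one byte per item, which is the "little overhead" the slide mentions for lists.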

5 What about SOAP ? Simple Object Access Protocol
SOAP is a protocol specification for invoking methods on servers, services, components and objects (RPC system).
SOAP codifies the existing practice of using XML and HTTP as a method invocation mechanism.
The SOAP specification mandates a small number of HTTP headers that facilitate firewall/proxy filtering.
The SOAP specification also mandates an XML vocabulary that is used for representing method parameters, return values, and exceptions.

6 Simple POST vs SOAP
Simple POST:
- query contains tagged string values
SOAP:
- query contains structured arguments, even user-defined types (example to follow)
In either case, the response is an HTTP response of type XML, with arbitrary (tree-like) structure.
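As an illustration of the "simple POST" style, the sketch below encodes a method name and a few string arguments as tagged form fields and decodes them again. The field names `method`, `arg0`, `arg1` and the helper names are hypothetical; the slides do not specify a wire format.

```python
from urllib.parse import urlencode, parse_qs

def encode_simple_request(method, *args):
    """Encode a method name and string arguments as tagged form fields."""
    fields = [("method", method)]
    fields += [(f"arg{i}", value) for i, value in enumerate(args)]
    return urlencode(fields)

def decode_simple_request(body):
    """Recover (method, args) from a form-encoded request body."""
    fields = parse_qs(body, keep_blank_values=True)
    method = fields["method"][0]
    args = []
    # Arguments are numbered arg0, arg1, ...; stop at the first gap.
    while f"arg{len(args)}" in fields:
        args.append(fields[f"arg{len(args)}"][0])
    return method, args

body = encode_simple_request("AddFile", "/clas/90-03/", "test7.dat")
print(body)
print(decode_simple_request(body))
```

Every value here is a flat string; anything structured (such as a file with nested owner and activity) would need its own ad hoc encoding, which is the gap SOAP fills.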

7 SOAP structure example
    <SOAP-ENV:Envelope xmlns:SOAP-ENV=" SOAP-ENV:encodingStyle="
      <SOAP-ENV:Body>
        <ppdg:AddFile xmlns:ppdg="
          <directory>/clas/90-03/</directory>
          <file>test7.dat
            <owner name="watson"/>
            <activity name="calibration"/>
          </file>
        </ppdg:AddFile>
      </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>
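The following sketch shows how a server might pull the method name and structured arguments out of a request shaped like the one above. The namespace URIs are missing from this transcript, so the code assumes the SOAP 1.1 envelope namespace and a made-up `urn:example:ppdg` namespace for the PPDG vocabulary; the `encodingStyle` attribute is omitted, and `parse_call` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # assumed: SOAP 1.1
PPDG = "urn:example:ppdg"                               # hypothetical namespace

REQUEST = f"""
<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}">
  <SOAP-ENV:Body>
    <ppdg:AddFile xmlns:ppdg="{PPDG}">
      <directory>/clas/90-03/</directory>
      <file>test7.dat
        <owner name="watson"/>
        <activity name="calibration"/>
      </file>
    </ppdg:AddFile>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""

def parse_call(xml_text):
    """Extract the method name and arguments from a SOAP-style request."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]                          # first child of Body is the call
    method = call.tag.split("}", 1)[1]      # strip the "{namespace}" prefix
    file_elem = call.find("file")
    return {
        "method": method,
        "directory": call.findtext("directory"),
        "file": file_elem.text.strip(),     # text before the nested elements
        "owner": file_elem.find("owner").get("name"),
        "activity": file_elem.find("activity").get("name"),
    }

print(parse_call(REQUEST))
```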

8 Analysis: Simple vs SOAP
ReplicaCatalog & ReplicaHost (OO api) need to send a method name & [0-2] string args.
Future catalog queries may need to send many selection criteria, but this could be done as a simple query string (hence 1 argument).
Question: may want to "batch" requests, sending, for example, an array of file names to resolve? [Could be done as many single calls, letting TCP buffer.]
Conclusion: requirements do NOT dictate SOAP.
May still choose SOAP for standardization reasons… although the proposer does not have a good track record here.

9 Prototyping XML at Jlab
Goals:
- Get experience w/ XML
- Get experience w/ using XML in servlets
- Demonstrate feasibility of using XML as web protocol for ReplicaCatalog and ReplicaHost
- Deploy prototype replica system for experimental physics data stored in Jlab silo
  - currently OSM + custom java infrastructure
  - plan to replace OSM, resulting in pure java infrastructure

10 XML & HTML
[Diagram components: sql db, ldap db, corba obj, XML servlet, xml client, HTML servlet, style sheet, html client]
Two types of servlets are used: one generates XML; the other calls the first and uses a library (a few calls) to apply a style sheet to the XML and generate HTML.
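The two-servlet arrangement can be sketched as two functions, one producing XML and one that calls it and formats the result as HTML. This is only an illustration: the talk applies an XSL style sheet through a Java library, whereas this sketch swaps in hand-written formatting with Python's standard library, and both function names and the fixed listing are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

def xml_servlet(path):
    """Stand-in for the XML servlet: returns a fixed listing as XML text."""
    return (f'<directory name="{escape(path)}">'
            '<file name="97-12"/><directory name="e1"/></directory>')

def html_servlet(path):
    """Stand-in for the HTML servlet: fetch the XML, then format it as HTML."""
    root = ET.fromstring(xml_servlet(path))
    items = "".join(
        f"<li>{escape(child.get('name'))} ({child.tag})</li>" for child in root
    )
    return f"<h1>{escape(root.get('name'))}</h1><ul>{items}</ul>"

print(html_servlet("/clas"))
```

Keeping the XML producer separate means XML clients and browser clients share one source of data, with presentation added only in the second step.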

11 Prototype Components
ReplicaCatalog:
- java servlet producing XML
- xsl style sheet to translate this to html for browsers
- servlet to do formatting (via style sheet)
ReplicaHost:
- simple file transfer servers
- currently bbftpd, but soon httpd, gsiftpd

12 Replica Catalog
Implemented as Java servlet (Apache + Tomcat):
- currently uses fork rsh ls /mss … to get a listing of silo contents for demo purposes
- will use mysql via jdbc for persistent store (very soon)
- supports tree data model (maps existing silo system)
Produces XML output:
- for directory:
  - listing of one directory, contents are files + subdirectories
  - includes properties of this directory (owner, etc.)
- for file:
  - properties of the file (owner, etc.)
  - ReplicaHost(s) holding the file
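A small sketch of the two kinds of XML output described above, one for a directory and one for a file. The `directory` and `file` element names follow the earlier slide; the `replicahost` element, its `url` attribute, and the function names are hypothetical, since the talk does not show the file response.

```python
import xml.etree.ElementTree as ET

def directory_listing(name, owner, files, subdirs):
    """XML for one directory: its properties plus files and subdirectories."""
    root = ET.Element("directory", name=name, owner=owner)
    for file_name in files:
        ET.SubElement(root, "file", name=file_name)
    for dir_name in subdirs:
        ET.SubElement(root, "directory", name=dir_name)
    return ET.tostring(root, encoding="unicode")

def file_record(name, owner, hosts):
    """XML for one file: its properties plus the ReplicaHost(s) holding it."""
    root = ET.Element("file", name=name, owner=owner)
    for host in hosts:
        ET.SubElement(root, "replicahost", url=host)   # hypothetical element
    return ET.tostring(root, encoding="unicode")

print(directory_listing("/clas", "root", ["97-12", "98-02"], ["e1"]))
print(file_record("file7.dat", "watson", ["http://host.example/replica"]))
```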

13 Replica Host
Gives access information (disk-resident, offline, etc.)
If disk resident, locally translates file name (virtual path) to URL(s), indicating supported protocols, such as

    bbftp://bbftp.jlab.org/diskcache9/clas/file7.dat
    gsiftp://xxx.jlab.org/diskcache9/clas/file7.dat

Future (within 1-2 months):
- support request to stage to disk
- support request to "pin" a file (advisory only)
- support request to store a file (push and/or pull?)
- manage update to catalog in response to local deletions of files
- web pages to fetch any file via browser
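The translation from virtual path to URLs might look like the sketch below. The host names and the `/diskcache9` prefix come from the slide's example; the table layout, the `DISK_RESIDENT` set, and the function name are assumptions for illustration.

```python
# Hypothetical configuration: protocol -> host, plus the local cache prefix.
TRANSFER_HOSTS = {"bbftp": "bbftp.jlab.org", "gsiftp": "xxx.jlab.org"}
CACHE_PREFIX = "/diskcache9"

# Hypothetical stand-in for the disk cache manager's view of what is on disk.
DISK_RESIDENT = {"/clas/file7.dat"}

def access_info(virtual_path):
    """Return the access status and, if disk resident, one URL per protocol."""
    if virtual_path not in DISK_RESIDENT:
        return {"status": "offline", "urls": []}
    urls = [
        f"{protocol}://{host}{CACHE_PREFIX}{virtual_path}"
        for protocol, host in TRANSFER_HOSTS.items()
    ]
    return {"status": "disk-resident", "urls": urls}

print(access_info("/clas/file7.dat"))
print(access_info("/clas/missing.dat"))
```

Because the mapping is done locally by each ReplicaHost, the catalog only needs to store virtual paths, not site-specific URLs.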

14 Demo
XML test of ReplicaCatalog:
- viewed as xml
- processed with style sheet & viewed as html

15 Note: Directory Model Changed
Recommendation: change the catalog data model to allow file system (tree) semantics in the logical name space.
Hierarchical (apparently) containers. Actual containers may still be flat:
- /a/b/c is one container
- /a/b/c/d/e is a separate container
- /a/b/c appears to contain "d" (even if not implemented that way in storage)
This will probably be more attractive to physicists and other users.
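One way to present flat containers as an apparent hierarchy is to derive a directory's children from the names of all containers that lie beneath it. This is a minimal sketch under that assumption; the function name is hypothetical and the talk does not say how the catalog would implement it.

```python
def apparent_children(containers, path):
    """List the names that appear directly under `path`, given flat containers."""
    prefix = path.rstrip("/") + "/"
    children = set()
    for name in containers:
        if name.startswith(prefix):
            # Keep only the first path component below `path`.
            children.add(name[len(prefix):].split("/", 1)[0])
    return sorted(children)

containers = ["/a/b/c", "/a/b/c/d/e"]
print(apparent_children(containers, "/a/b/c"))   # ['d']
```

Here "d" shows up under /a/b/c even though no container named /a/b/c/d exists in storage.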

16 Future Activities
1. Finish SQL database for ReplicaCatalog
2. Finish integration of ReplicaHost and Jlab silo
3. Create exportable package for ReplicaHost
   - Disk cache manager (java based), mountable by local clients
   - ReplicaHost (java servlet based)
   - File transfer daemons: http, bbftp, gsiftp, gridftp

17 PPDG Sub-project (1)
Protocol standardization:
- choice of simple or SOAP
- standardization of method names and/or arguments for requests
- XML tag name standardization
- response standardization (e.g. one directory listing)

18 PPDG Sub-project (2)
1. Shared ReplicaCatalog servlet implementation
   - standardize java interface to local persistent store
   - implement reference implementations:
     1. above LDAP (compatible w/ or extending Globus solution)
     2. above JDBC (Jlab design, open to revisions of schema)
2. Shared ReplicaHost servlet implementation
   - standardize java interface to local silo, disk managers:
     1. CORBA calls to SRB
     2. RMI calls to Jlab disk & silo managers
     3. other?

19 PPDG Sub-project (2)
3. C/C++ and Java client libraries
   - for Java & C++, implementing an OO api with local browsing of xml data
4. Extend ReplicaHost to support queueing of transfer requests...
   - ...to/from other ReplicaHosts
     - negotiate transfer protocol with other host
     - negotiate push/pull with other host
   - ...to/from remote transfer daemon
     - protocol and direction fixed
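The negotiation between two ReplicaHosts is not specified in the talk. As one possible reading, the sketch below picks the first protocol in the local preference order that the other host also supports, and prefers a pull by the destination when both directions are possible. Both rules and both function names are assumptions.

```python
def negotiate_protocol(local_preferences, remote_supported):
    """Return the first locally preferred protocol the remote host supports."""
    for protocol in local_preferences:
        if protocol in remote_supported:
            return protocol
    return None   # no protocol in common

def negotiate_direction(source_can_push, destination_can_pull):
    """Choose who initiates the transfer; prefers a pull by the destination."""
    if destination_can_pull:
        return "pull"
    if source_can_push:
        return "push"
    return None   # neither side can initiate

print(negotiate_protocol(["gsiftp", "bbftp", "http"], {"http", "bbftp"}))
print(negotiate_direction(source_can_push=True, destination_can_pull=False))
```

For a remote transfer daemon, where the slide says protocol and direction are fixed, neither function would be needed.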

