Published by Franklin Nelson. Modified over 9 years ago.
1
Data Grid Interactions with Firewalls
Michael Wan, Reagan Moore
{mwan,moore}@sdsc.edu
http://www.npaci.edu/DICE/SRB/
SDSC/UCSD/NPACI
2
A Quick Overview of SRB Data Grid Federated server system –Single client signOn –Access to all resources in the federation –Data grid owns all files Context management –MCAT server – Metadata catalog –Use traditional DBMS Four logical name spaces –Logical resource name (operations on sets of resources) –Distinguished user name space –Logical file name space –Metadata attribute name space (state information)
3
Federated Servers and Resources
[Diagram: federated data grids. Labels: Data Grid 1, Data Grid 2, Data Grid 3; catalogs MCAT1, MCAT2, MCAT3; servers Server1.1, Server1.2, Server2.1, Server2.2, Server3.1.]
4
Types of Data Loss Risks
Media corruption
Vendor systemic failure
Operational error
Malicious user
Natural disaster
Solutions: replication, firewalls, federation
5
National Archives Persistent Archive
[Diagram labels: NARA, U Md, SDSC, MCAT]
Principal copy stored at NARA with complete metadata catalog
Replicated copy at U Md for improved access, load balancing, and disaster recovery
Deep archive at SDSC: no user access, but a complete copy
6
BIRN Virtual Data Grid (source: Mark Ellisman)
Defines a Distributed Data Handling System
Integrates Storage Resources in the BIRN network
Integrates Access to Data, to Computational and Visualization Resources
Acts as a Virtual Platform for Knowledge-based Data Integration Activities
Provides a Uniform Interface to Users
7
Worldwide Universities Network
David De Roure, University of Southampton, dder@ecs.soton.ac.uk, http://www.ecs.soton.ac.uk/~dder
Implement data grid linking academic universities
Support collaborative research and education
  – HASTAC: Humanities, Arts, Science and Technology Advanced Collaboratory
  – Geo-referenced social science data collections
  – Earth Science data collections
Provide data grid registry to promote federation of international data grids
8
Foundation of the WUN Grid
[Diagram labels: SDSC, Manchester, Southampton, White Rose, NCSA]
A functioning, general purpose international Grid
A hub for federating other data grids
Manchester-SDSC mirror
9
Authentication
User authenticates to a data grid server
  – GSI or challenge response
  – Access controls map constraints between user distinguished names and logical file names
Data grid server authenticates to remote data grid server
Remote data grid server authenticates to remote storage repository under data grid ID
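The challenge-response option can be pictured with a generic sketch. The slides do not describe SRB's actual exchange, so everything here is an assumption: the HMAC-over-nonce scheme, the shared-secret model, and the function names `make_challenge`, `respond`, and `verify` are illustrative only.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Server side: generate a random, single-use challenge (nonce)."""
    return secrets.token_bytes(32)


def respond(secret: bytes, challenge: bytes) -> bytes:
    """Client side: prove knowledge of the secret without sending it."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def verify(secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Server side: recompute the expected response, compare in constant time."""
    expected = hmac.new(secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

The point of the pattern is that the password never crosses the network; only a value derived from a fresh challenge does, so a captured response cannot be replayed against a later challenge.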
10
Firewall Interactions
Client behind a firewall
Client initiated parallel I/O
Client initiated bulk file load
Server behind a firewall
Paired servers inside and outside the firewall
Server inside the firewall only responds to messages from outside server
Server initiated parallel I/O
Federated data grids
Need to add metadata to forward messages from a paired front-end server to the back-end server
11
Client behind firewall
[Diagram: Sput, steps 1-6, between the client, SRB server1 and SRB server2 (each with an SRB agent), the MCAT, and R. Labels: srbObjCreate, srbObjWrite; MCAT functions: 1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control; peer-to-peer request; server(s) spawning; data transfer.]
12
Client Initiated Parallel I/O
[Diagram: Sput -M, steps 1-8, between the client, SRB server1 and SRB server2 (each with an SRB agent), the MCAT, and R. Labels: srbObjPut; MCAT functions: 1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control; return socket addr., port and cookie; connect to server; data transfer.]
13
Client Initiated - Third Party Data Transfer
[Diagram: Scp, steps 1-6, between the client, SRB server1 and SRB server2 (each with an SRB agent), the MCAT, and R on both servers. Labels: srbObjCopy; dataPut - socket addr., port and cookie; connect to server2; data transfer.]
14
Client Initiated - Bulk Load Operation
[Diagram: Sput -b, steps 1-6, between the client, SRB server1 and SRB server2 (each with an SRB agent), the MCAT, and R. Labels: query resource; return resource location; MCAT functions: 1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control; bulk data transfer thread; 8 Mb buffer; bulk registration threads; bulk register; store data in a temp file; unfold temp file.]
15
Server behind firewall
[Diagram: Sput, steps 1-6, between the client, SRB server1 and SRB server2 (each with an SRB agent), the MCAT, and R. Labels: srbObjCreate, srbObjWrite; MCAT functions: 1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control; peer-to-peer request; server(s) spawning; data transfer.]
16
Server Initiated Parallel I/O
[Diagram: Sput -m, steps 1-6, between the client, SRB server1 and SRB server2 (each with an SRB agent), the MCAT, and R. Labels: srbObjPut + socket addr, port and cookie; MCAT functions: 1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control; peer-to-peer request; connect to client; data transfer.]
17
Federated Data Grids
Automating redirection to a server in front of a firewall
[Diagram labels: Client; Data Grid 1, Data Grid 2, Data Grid 3; MCAT1, MCAT2, MCAT3; Server1.1, Server1.2, Server2.1, Server2.2, Server3.1.]
18
Container - Archival of Small Files
Performance issues with storing/retrieving a large number of small files to/from tape
Container design
  – Physical grouping of small files
  – Implemented with a Logical Resource
      – A pool of Cache Resources for the frontend resource
      – An Archival Resource for the backend resource
  – Read/write I/O is always done on the Cache Resource and synced to the Archival Resource
      – Stage to cache if a cache copy does not exist
      – The entire container is moved between cache and archival and written to tape
Bulk operation with container - faster
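The cache/archive behaviour of a container can be sketched as a toy in-memory model. This is not SRB code: the `Container` class, its method names, and the use of dictionaries for the cache and archival copies are all assumptions made for illustration.

```python
class Container:
    """Toy model: I/O goes through the cache; the archive holds whole containers."""

    def __init__(self, name):
        self.name = name
        self.cache = None    # member files as a dict, or None when purged
        self.archive = None  # last synced copy of the entire container

    def _stage(self):
        # Stage to cache only if no cache copy exists.
        if self.cache is None:
            self.cache = dict(self.archive) if self.archive is not None else {}

    def put(self, file_name, data):
        self._stage()
        self.cache[file_name] = data

    def get(self, file_name):
        self._stage()
        return self.cache[file_name]

    def sync(self, purge_cache=False):
        # The entire container moves to the archive as one unit,
        # which is what avoids one tape operation per small file.
        if self.cache is not None:
            self.archive = dict(self.cache)
        if purge_cache:
            self.cache = None
```

Reading a file after the cache copy has been purged triggers a stage of the whole container back to cache, which mirrors the "stage to cache if a cache copy does not exist" rule.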
19
Examples of using container
Make a container with name “myCont”
  – Smkcont -S cont-sdsc myCont
Put a file into “myCont”
  – Sput -c myCont myLocalSrcFile mySRBTargFile
Bulk load a local directory into “myCont”
  – Sbload -c myCont myLocalSrcDir mySRBTargColl
Sync “myCont” to archival and purge the cache copy
  – Ssyncont -d myCont
Download a file stored in “myCont”
  – Sget mySRBsrcFile myLocalTargFile
Slscont - list existing containers and contents
20
Summary of Data Transfer Modes
Serial - default mode
Parallel - for large files
Bulk - for a large number of small files
Container - archiving small files (to tape)
Container + bulk - faster archival of small files
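The summary reads naturally as a decision rule. The sketch below is one hedged way to write it down; the function name `choose_transfer_mode` and both thresholds (what counts as a "large" file and as a "large number" of files) are assumptions, since the slides give no numeric cut-offs.

```python
def choose_transfer_mode(file_sizes, to_tape=False,
                         large_file_bytes=100 * 1024 * 1024,
                         many_files=100):
    """Return 'serial', 'parallel', 'bulk', 'container', or 'container+bulk'.

    file_sizes: list of file sizes in bytes.
    The two thresholds are illustrative assumptions, not SRB defaults.
    """
    if not file_sizes:
        raise ValueError("no files to transfer")
    all_small = max(file_sizes) < large_file_bytes
    many_small = all_small and len(file_sizes) >= many_files
    if to_tape and many_small:
        return "container+bulk"
    if to_tape and all_small:
        return "container"
    if many_small:
        return "bulk"
    if not all_small:
        return "parallel"
    return "serial"
```

For example, a single small file falls through to the serial default, while several hundred small files bound for tape select the container plus bulk combination.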
21
Types of Data Transfer
Local to SRB: Sput, Srsync
SRB to Local: Sget, Srsync
SRB to SRB: Scp, Sreplicate, Sbkupsrb, Srsync
  – Third party transfer
      – Server to server data transfer, client not involved
      – Parallel I/O
22
Other useful Data Management Scommands
Srsync, Schksum
  – Data synchronization using checksum values
  – Similar to UNIX’s rsync
Sreplicate, Sbkupsrb
  – Generate multiple copies of data using replicas
  – Replica: multiple copies of the same file
      – Same logical path name, e.g., /home/srb.sdsc/foo
      – Replicas on different resources
      – Each replica has a different replNum
      – Most recently modified flag
23
Commands Using Checksum
Registering checksum values into MCAT
  – At the time of upload
      – Sput -k: compute checksum of local source file and register with MCAT
      – Sput -K: checksum verification mode
          – After upload, compute checksum by reading back the uploaded file
          – Compare with the checksum generated locally
  – Existing SRB files
      – Schksum: compute and register checksum if it does not already exist
      – Srsync: if the checksum does not exist
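The verification mode described for Sput -K (read back the uploaded copy and compare checksums) can be sketched as follows. This is a simplified model, not the SRB implementation: the dictionaries standing in for remote storage and the MCAT, the function names, and the choice of SHA-256 are assumptions; the slides do not name the checksum algorithm.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is a stand-in; the slides do not say which algorithm SRB uses.
    return hashlib.sha256(data).hexdigest()


def upload_with_verify(local_data: bytes, store: dict, catalog: dict,
                       name: str) -> bool:
    """Upload, read the stored copy back, and register the checksum if it matches."""
    local_sum = checksum(local_data)
    store[name] = local_data            # stand-in for the actual upload
    remote_sum = checksum(store[name])  # computed from the read-back copy
    if remote_sum != local_sum:
        return False                    # corruption detected; nothing registered
    catalog[name] = remote_sum          # stand-in for registering with MCAT
    return True
```

The read-back step is what distinguishes this from simply registering the local checksum: it catches corruption introduced during transfer or storage, at the cost of reading the file a second time.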
24
Srsync command
Synchronize the data
  – From a local copy to SRB
      Srsync myLocalFile s:mySrbFile
  – From an SRB copy to a local file system
      Srsync s:mySrbFile myLocalFile
  – Between two SRB paths
      Srsync s:mySrbFile1 s:mySrbFile2
Similar to rsync
  – Compare the checksum values of source and target
  – Upload/download source to target if the target does not exist or the checksums differ
  – Save checksum values to MCAT
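The three rsync-like steps (compare checksums, transfer if the target is missing or differs, save checksums) can be sketched in a few lines. This is an illustrative model only: the dictionaries standing in for file storage and the MCAT, the function names, and the use of SHA-256 are assumptions, not SRB's implementation.

```python
import hashlib
from typing import Optional


def file_checksum(data: bytes) -> str:
    # SHA-256 is a stand-in for whatever checksum SRB computes.
    return hashlib.sha256(data).hexdigest()


def needs_transfer(source: bytes, target: Optional[bytes]) -> bool:
    """Transfer when the target is missing or its checksum differs."""
    if target is None:
        return True
    return file_checksum(source) != file_checksum(target)


def sync(source_name: str, target_name: str, files: dict, catalog: dict) -> bool:
    """Copy source to target when needed; return True if data was transferred."""
    source = files[source_name]
    transferred = needs_transfer(source, files.get(target_name))
    if transferred:
        files[target_name] = source
    # Save checksum values so a later sync can reuse them.
    catalog[source_name] = file_checksum(source)
    catalog[target_name] = file_checksum(files[target_name])
    return transferred
```

Running `sync` twice in a row on unchanged data transfers once and then does nothing, which is the property that makes the command cheap to repeat.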
25
Srsync command (cont)
Some Srsync options
  – -r: recursively synchronize a directory/collection
  – -s: use size instead of checksum value for determining synchronization
      – Faster: no checksum computation
      – Less accurate
  – -m, -M: parallel I/O
26
Sreplicate, Sbkupsrb commands
Generate multiple copies of data using replicas
Sreplicate - generates a new replica each time
Sbkupsrb
  – Backs up the SRB data/collection to the specified backupResource with a replica
  – If an up-to-date replica already exists in the backupResource, nothing will be done
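The difference between the two commands is whether an existing replica is taken into account. The sketch below models that with hypothetical `Replica` and `LogicalFile` classes; how Sbkupsrb treats a stale replica on the backup resource is not stated in the slides, so the in-place refresh here is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    repl_num: int
    resource: str
    up_to_date: bool = True


@dataclass
class LogicalFile:
    path: str
    replicas: list = field(default_factory=list)

    def replicate(self, resource: str) -> Replica:
        """Sreplicate-like: always add a new replica with the next replNum."""
        next_num = max((r.repl_num for r in self.replicas), default=-1) + 1
        replica = Replica(next_num, resource)
        self.replicas.append(replica)
        return replica

    def backup(self, backup_resource: str):
        """Sbkupsrb-like: do nothing if an up-to-date replica is already there."""
        for r in self.replicas:
            if r.resource == backup_resource:
                if r.up_to_date:
                    return None
                r.up_to_date = True  # assumption: refresh a stale replica in place
                return r
        return self.replicate(backup_resource)
```

Calling `backup` repeatedly is therefore idempotent, while calling `replicate` repeatedly keeps adding copies under the same logical path.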
27
Data and Resource Virtualisation
Data and Collections Organisation
  – File logical name space: UNIX-like directories (collections) and files (data)
      – Mapping of logical name to physical attributes: host address, physical path
      – UNIX-like API and utilities for making collections (mkdir) and data creation (creat)
Virtualisation of Resources
  – Mapping of a logical resource name to physical attributes: resource location, type
  – Clients use a single logical name to reference a resource
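The two mappings (logical file name to physical path, logical resource name to host) can be pictured as a pair of lookup tables. The sketch below is a toy stand-in for what the metadata catalog provides; the `Catalog` class, its methods, and the physical path `/vault/foo` are hypothetical, while `unix-sdsc`, `srb.sdsc.edu`, and `/home/srb.sdsc/foo` are names that appear elsewhere in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    host: str
    path: str


class Catalog:
    """Toy catalog: resolves logical names to physical attributes."""

    def __init__(self):
        self._resources = {}  # logical resource name -> host address
        self._files = {}      # logical path -> (resource name, physical path)

    def register_resource(self, name: str, host: str) -> None:
        self._resources[name] = host

    def register_file(self, logical_path: str, resource: str,
                      physical_path: str) -> None:
        if resource not in self._resources:
            raise KeyError(f"unknown resource: {resource}")
        self._files[logical_path] = (resource, physical_path)

    def resolve(self, logical_path: str) -> PhysicalLocation:
        resource, physical_path = self._files[logical_path]
        return PhysicalLocation(self._resources[resource], physical_path)


catalog = Catalog()
catalog.register_resource("unix-sdsc", "srb.sdsc.edu")
catalog.register_file("/home/srb.sdsc/foo", "unix-sdsc", "/vault/foo")
location = catalog.resolve("/home/srb.sdsc/foo")
```

Because clients only ever hold the logical names, a resource can move to a different host by updating one catalog entry, with no change on the client side.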
28
Listing Resources
SgetR - list configured resources
  SgetR
  --------------------------- RESULTS ------------------------------
  rsrc_name: unix-sdsc
  netprefix: srb.sdsc.edu:NULL:NULL
  rsrc_typ_name: unix file system
  default_path: /misc/srb/srb/SRBVault/?USER.?DOMAIN/?SPLITPATH/TEST.?PATH?DATANAME.?RANDOM.?TIMESEC
  phy_default_path: /misc/srb/srb/SRBVault/?USER.?DOMAIN/?SPLITPATH/TEST.?PATH?DATANAME.?RANDOM.?TIMESEC
  phy_rsrc_name: unix-sdsc
  rsrc_typ_name: unix file system
  rsrc_class_name: permanent
  user_name: srb
  domain_desc: sdsc
  zone_id: sdscdemo
  -----------------------------------------------------------------
29
Serial Mode Data Transfer
Simple to Implement and Use
  – Unix-like API: srbObjCreate, srbObjWrite
Performance Issue
  – 2 hops data transfer
  – Single data stream
  – One file at a time: overhead relatively high for small files
      – MCAT interaction: query and registration
      – Small buffer transfer
Large files: single hop, multiple data streams
Small files: single hop, multiple files at a time
30
Upload a File to an SRB Resource
Sput -S unix-sdsc localFile srbFile
  – Default data transfer mode: serial
Sls -l srbFile
  srb 0 unix-sdsc 2764364 2004-08-21-18.19 % srbFile
31
Small Files Data Transfer (Bulk Operation)
Upload/download a large number of small files
  – One file at a time: relatively high overhead
      – MCAT interaction, small buffer transfer
      – 1 sec/file for WAN
Bulk Operation
  – Bulk data transfer: transfer multiple files in a single large buffer (8 Mb)
  – Bulk registration: register a large number of files (1,000) in a single call
  – Multiple threads for transfer and registration
  – Single hop
  – 3-10 times speedup
  – All-or-nothing type operation
  – Specify -b in Sput/Sget
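The two batching ideas in the bulk operation (many files per buffer, many registrations per call) can be sketched as below. The function names are hypothetical, the buffer is assumed to be 8 MiB, and the rule that an oversized file gets a batch of its own is an assumption; the slides only give the 8 Mb and 1,000 figures.

```python
def pack_into_buffers(files, buffer_bytes=8 * 1024 * 1024):
    """Group (name, size) pairs into batches that each fit one buffer.

    A file larger than the buffer ends up in a batch of its own
    (an assumption; the slides do not cover this case).
    """
    batches, current, used = [], [], 0
    for name, size in files:
        if current and used + size > buffer_bytes:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches


def registration_calls(names, per_call=1000):
    """Split file names into groups registered with one catalog call each."""
    return [names[i:i + per_call] for i in range(0, len(names), per_call)]
```

Both functions attack the same cost: the per-file round trip. With one transfer per buffer and one catalog call per thousand files, the fixed overhead is amortized over the whole batch.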
32
Parallel Mode Data Transfer
For large file transfer
  – Multiple data streams
  – Single hop data transfer
Two sub-modes
  – Server initiated
  – Client initiated (for clients behind a firewall)
Up to 5 times speedup for WAN
Two simple APIs: srbObjPut and srbObjGet
Use -m (server initiated) or -M (client initiated) options
Available to all Scommands involving data transfer
  – As an option: Sput, Sget, Srsync
  – Automatic: Sreplicate, Scp, Sbkupsrb, SsyncD, Ssyncont
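Multiple data streams imply that each stream carries a distinct byte range of the file. The sketch below shows one simple way to compute such ranges; the function name `split_ranges` and the even-split policy are assumptions, since the slides do not say how SRB divides a file among streams.

```python
def split_ranges(file_size: int, streams: int):
    """Return one (offset, length) pair per stream, covering the file exactly.

    Remainder bytes are spread over the first streams; streams that would
    receive zero bytes are omitted.
    """
    if file_size < 0 or streams < 1:
        raise ValueError("file_size must be >= 0 and streams >= 1")
    base, extra = divmod(file_size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue
        ranges.append((offset, length))
        offset += length
    return ranges
```

Each range can then be sent over its own connection, whether the server connects to the client (-m) or the client connects to the server (-M).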