Data Grid Interactions with Firewalls
Michael Wan, Reagan Moore
SDSC/UCSD/NPACI


A Quick Overview of the SRB Data Grid
Federated server system
–Single client sign-on
–Access to all resources in the federation
–Data grid owns all files
Context management
–MCAT server - metadata catalog
–Uses a traditional DBMS
Four logical name spaces
–Logical resource names (operations on sets of resources)
–Distinguished user name space
–Logical file name space
–Metadata attribute name space (state information)
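A minimal illustration of how the logical name spaces are used from the command line (the collection path, file names, and resource name below are hypothetical; only the Sput -S and Sls -l forms shown later in this deck are assumed):

    # Store a local file under a logical collection path, naming only a logical resource
    Sput -S unix-sdsc myLocalFile.dat /home/user.domain/myColl/myFile
    # Clients see only the logical path and logical resource name;
    # the MCAT maps both to a physical host address and path
    Sls -l /home/user.domain/myColl/myFile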

Federated Servers and Resources
[Diagram: three federated data grids (Data Grid 1, 2, 3), each with its own metadata catalog (MCAT1, MCAT2, MCAT3) and SRB servers (Server1.1, Server1.2, Server2.1, Server2.2, Server3.1).]

Types of Data Loss Risks
–Media corruption
–Vendor systemic failure
–Operational error
–Malicious user
–Natural disaster
Solutions: replication, firewalls, federation

National Archives Persistent Archive
[Diagram: three sites - NARA, U Md, SDSC - with the metadata catalog (MCAT).]
–Principal copy stored at NARA with complete metadata catalog
–Replicated copy at U Md for improved access, load balancing, and disaster recovery
–Deep archive at SDSC: no user access, but a complete copy

BIRN Virtual Data Grid (source: Mark Ellisman)
–Defines a distributed data handling system
–Integrates storage resources in the BIRN network
–Integrates access to data, and to computational and visualization resources
–Acts as a virtual platform for knowledge-based data integration activities
–Provides a uniform interface to users

Worldwide Universities Network (David De Roure, University of Southampton)
Implement data grid linking academic universities
Support collaborative research and education
–HASTAC: Humanities, Arts, Science and Technology Advanced Collaboratory
–Geo-referenced social science data collections
–Earth Science data collections
Provide data grid registry to promote federation of international data grids

Foundation of the WUN Grid
[Diagram: participating sites - SDSC, Manchester, Southampton, White Rose, NCSA.]
–A functioning, general-purpose international grid
–A hub for federating other data grids
–Manchester-SDSC mirror

Authentication
User authenticates to a data grid server
–GSI or challenge-response
–Access controls map constraints between user distinguished names and logical file names
Data grid server authenticates to the remote data grid server
Remote data grid server authenticates to the remote storage repository under the data grid ID

Firewall Interactions
Client behind a firewall
–Client-initiated parallel I/O
–Client-initiated bulk file load
Server behind a firewall
–Paired servers inside and outside the firewall
–Server inside the firewall only responds to messages from the outside server
–Server-initiated parallel I/O
Federated data grids
–Need to add metadata to forward messages from a paired front-end server to the back-end server
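A minimal sketch of the firewall rules the "server behind a firewall" case implies, assuming the SRB control port is 5544 and that the site pins parallel/bulk data connections to ports 20000-20199 (both port choices are assumptions; substitute your installation's actual values):

    # Allow inbound client connections to the SRB server's control port
    iptables -A INPUT -p tcp --dport 5544 -j ACCEPT
    # Allow the data-port range used by server-side parallel and bulk transfers
    iptables -A INPUT -p tcp --dport 20000:20199 -j ACCEPT
    # Clients that cannot accept inbound connections themselves should use
    # client-initiated transfer (-M) rather than server-initiated transfer (-m)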

Client behind firewall
[Diagram: the client issues Sput; SRB server1 spawns an SRB agent, which consults the MCAT (1. logical-to-physical mapping, 2. identification of replicas, 3. access & audit control); peer-to-peer requests spawn an agent on SRB server2, and data is transferred via srbObjCreate/srbObjWrite.]

Client Initiated Parallel I/O
[Diagram: the client issues Sput -M (srbObjPut); the SRB agent consults the MCAT (1. logical-to-physical mapping, 2. identification of replicas, 3. access & audit control) and returns a socket address, port, and cookie; the client connects to the server and the data transfer proceeds in parallel.]

Client Initiated Third-Party Data Transfer
[Diagram: the client issues Scp (srbObjCopy); SRB server1's agent obtains a socket address, port, and cookie via dataPut, connects directly to SRB server2, and the data transfer proceeds server-to-server without passing through the client.]
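A sketch of driving this third-party copy from the client side (the paths are illustrative, and the plain two-argument Scp form is an assumption; the point is that the client only issues the request while the data flows server-to-server):

    # Copy an SRB object into another SRB collection; SRB server1 connects
    # directly to SRB server2, and no data passes through the client
    Scp /home/user.domain/srcColl/myFile /home/user.domain/archiveColl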

Client Initiated Bulk Load Operation
[Diagram: the client issues Sput -b; the SRB agent queries the MCAT for the resource location (1. logical-to-physical mapping, 2. identification of replicas, 3. access & audit control); bulk data transfer threads ship 8 MB buffers to the server, which stores the data in a temp file and then unfolds it, while bulk registration threads register the files in the MCAT.]

Server behind firewall
[Diagram: same message flow as the "Client behind firewall" case - Sput, agent spawning, MCAT checks (1. logical-to-physical mapping, 2. identification of replicas, 3. access & audit control), peer-to-peer requests, and data transfer via srbObjCreate/srbObjWrite - with the responding SRB server paired across the firewall.]

Server Initiated Parallel I/O
[Diagram: the client issues Sput -m, passing a socket address, port, and cookie with srbObjPut; after the MCAT checks (1. logical-to-physical mapping, 2. identification of replicas, 3. access & audit control), the server connects back to the client via a peer-to-peer request and performs the parallel data transfer.]

Federated Data Grids
[Diagram: a client and three federated data grids (Data Grid 1, 2, 3), each with its own MCAT and servers; requests are automatically redirected to a server in front of a firewall.]

Container - Archival of Small Files
Performance issues with storing/retrieving a large number of small files to/from tape
Container design
–Physical grouping of small files
–Implemented with a logical resource
  –A pool of cache resources for the front-end resource
  –An archival resource for the back-end resource
–Read/write I/O is always done on the cache resource and synced to the archival resource
  –Stage to cache if a cache copy does not exist
  –The entire container is moved between cache and archival and written to tape
Bulk operation with containers is faster

Examples of Using Containers
Make a container named "myCont"
–Smkcont -S cont-sdsc myCont
Put a file into "myCont"
–Sput -c myCont myLocalSrcFile mySRBTargFile
Bulk load a local directory into "myCont"
–Sbload -c myCont myLocalSrcDir mySRBTargColl
Sync "myCont" to archival storage and purge the cache copy
–Ssyncont -d myCont
Download a file stored in "myCont"
–Sget mySRBsrcFile myLocalTargFile
List existing containers and their contents
–Slscont

Summary of Data Transfer Modes
–Serial - default mode
–Parallel - for large files
–Bulk - for a large number of small files
–Container - archiving small files (to tape)
–Container + bulk - faster archival of small files
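One illustrative command per mode, using flags documented elsewhere in this deck (file, directory, and container names are hypothetical):

    Sput myFile.dat srbFile                      # serial - default
    Sput -m bigFile.dat srbBigFile               # parallel - large files
    Sput -b mySmallFilesDir mySRBColl            # bulk - many small files
    Sbload -c myCont mySmallFilesDir mySRBColl   # container + bulk - archiving small files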

Types of Data Transfer
Local to SRB - Sput, Srsync
SRB to local - Sget, Srsync
SRB to SRB - Scp, Sreplicate, Sbkupsrb, Srsync
–Third-party transfer
  –Server-to-server data transfer; the client is not involved
  –Parallel I/O
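Illustrative commands for each direction (all names are hypothetical):

    Sput myLocalFile mySrbFile                    # local to SRB
    Sget mySrbFile myLocalCopy                    # SRB to local
    Scp mySrbFile /home/user.domain/otherColl     # SRB to SRB (third-party, server-to-server)
    Srsync myLocalFile s:mySrbFile                # local to SRB, checksum-based synchronization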

Other Useful Data Management Scommands
Srsync, Schksum
–Data synchronization using checksum values
–Similar to UNIX's rsync
Sreplicate, Sbkupsrb
–Generate multiple copies of data using replicas
–Replica - multiple copies of the same file
  –Same logical path name - e.g., /home/srb.sdsc/foo
  –Replicas on different resources
  –Each replica has a different replNum
  –Most recently modified flag

Commands Using Checksums
Registering checksum values into the MCAT at the time of upload
–Sput -k - compute the checksum of the local source file and register it with the MCAT
–Sput -K - checksum verification mode
  –After upload, compute the checksum by reading back the uploaded file
  –Compare with the checksum generated locally
For existing SRB files
–Schksum - compute and register the checksum if one does not already exist
–Srsync - computes a checksum if one does not exist
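Illustrative invocations (file names are hypothetical):

    Sput -k myLocalFile mySrbFile   # compute the local checksum and register it at upload time
    Sput -K myLocalFile mySrbFile   # verification mode: read back the upload and compare checksums
    Schksum mySrbFile               # compute and register a checksum for an existing SRB file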

Srsync Command
Synchronize data
–From a local copy to SRB: Srsync myLocalFile s:mySrbFile
–From an SRB copy to a local file system: Srsync s:mySrbFile myLocalFile
–Between two SRB paths: Srsync s:mySrbFile1 s:mySrbFile2
Similar to rsync
–Compares the checksum values of source and target
–Uploads/downloads source to target if the target does not exist or the checksums differ
–Saves checksum values to the MCAT

Srsync Command (cont.)
Some Srsync options
–-r - recursively synchronize a directory/collection
–-s - use size instead of checksum value to determine synchronization
  –Faster - no checksum computation
  –Less accurate
–-m, -M - parallel I/O
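Illustrative combinations of these options, assuming the same source/target forms as on the previous slide (paths are hypothetical):

    Srsync -r myLocalDir s:/home/user.domain/myColl   # recursively synchronize a directory into a collection
    Srsync -s myLocalFile s:mySrbFile                 # compare by size only - faster, less accurate
    Srsync -M bigLocalFile s:srbBigFile               # client-initiated parallel I/O for a large file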

Sreplicate, Sbkupsrb Commands
Generate multiple copies of data using replicas
Sreplicate - generates a new replica each time
Sbkupsrb
–Backs up the SRB data/collection to the specified backup resource with a replica
–If an up-to-date replica already exists in the backup resource, nothing is done
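Illustrative usage, assuming both commands name the target resource with -S as Sput does (that flag and all names here are assumptions):

    Sreplicate -S backup-sdsc mySrbFile   # create an additional replica on another resource
    Sbkupsrb -S backup-sdsc myColl        # back up a collection; files with an up-to-date replica there are skipped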

Data and Resource Virtualisation
Data and collections organisation
–File logical name space - UNIX-like directories (collections) and files (data)
  –Mapping of logical names to physical attributes - host address, physical path
  –UNIX-like API and utilities for making collections (mkdir) and creating data (creat)
Virtualisation of resources
–Mapping of a logical resource name to physical attributes: resource location, type
–Clients use a single logical name to reference a resource
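A small sketch of the UNIX-like utilities over the logical name space (Smkdir is assumed to be the collection-creation Scommand; paths and the resource name are illustrative):

    Smkdir /home/user.domain/newColl                                 # create a collection (logical directory)
    Sput -S unix-sdsc myLocalFile /home/user.domain/newColl/myFile   # create data on a logical resource
    Sls -l /home/user.domain/newColl                                 # listing shows the logical resource, not the physical path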

Listing Resources
SgetR - list configured resources
SgetR RESULTS
–rsrc_name: unix-sdsc
–netprefix: srb.sdsc.edu:NULL:NULL
–rsrc_typ_name: unix file system
–default_path: /misc/srb/srb/SRBVault/?USER.?DOMAIN/?SPLITPATH/TEST.?PATH?DATANAME.?RANDOM.?TIMESEC
–phy_default_path: /misc/srb/srb/SRBVault/?USER.?DOMAIN/?SPLITPATH/TEST.?PATH?DATANAME.?RANDOM.?TIMESEC
–phy_rsrc_name: unix-sdsc
–rsrc_typ_name: unix file system
–rsrc_class_name: permanent
–user_name: srb
–domain_desc: sdsc
–zone_id: sdscdemo

Serial Mode Data Transfer
Simple to implement and use
–UNIX-like API - srbObjCreate, srbObjWrite
Performance issues
–Two-hop data transfer
–Single data stream
–One file at a time - overhead relatively high for small files
  –MCAT interaction - query and registration
  –Small-buffer transfer
Alternatives
–Large files - single hop, multiple data streams (parallel mode)
–Small files - single hop, multiple files at a time (bulk mode)

Upload a File to an SRB Resource
Sput -S unix-sdsc localFile srbFile
–Default data transfer mode - serial
Sls -l srbFile
–srb 0 unix-sdsc % srbFile

Small-File Data Transfer (Bulk Operation)
Uploading/downloading a large number of small files
–One file at a time - relatively high overhead
  –MCAT interaction, small-buffer transfer
  –~1 sec/file over a WAN
Bulk operation
–Bulk data transfer - transfers multiple files in a single large buffer (8 MB)
–Bulk registration - registers a large number of files (1,000) in a single call
–Multiple threads for transfer and registration
–Single hop
–3-10x speedup
–All-or-nothing type operation
–Specify -b in Sput/Sget
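An illustrative bulk upload (directory and collection names are hypothetical; the exact source arguments accepted with -b may vary by release):

    # Ship many small files in 8 MB buffers and register them in batches
    Sput -b mySmallFilesDir mySRBTargColl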

Parallel Mode Data Transfer
For large-file transfer
–Multiple data streams
–Single-hop data transfer
Two sub-modes
–Server initiated
–Client initiated (for clients behind a firewall)
Up to 5x speedup over a WAN
Two simple APIs - srbObjPut and srbObjGet
Use the -m (server initiated) or -M (client initiated) option
Available to all Scommands involving data transfer
–As an option - Sput, Sget, Srsync
–Automatic - Sreplicate, Scp, Sbkupsrb, SsyncD, Ssyncont
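Illustrative parallel transfers (file names are hypothetical):

    Sput -m hugeFile.dat srbHugeFile   # server-initiated parallel upload
    Sput -M hugeFile.dat srbHugeFile   # client-initiated - for clients behind a firewall
    Sget -m srbHugeFile hugeCopy.dat   # parallel download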