1
Security Requirements for Shared Collections
Storage Resource Broker
Reagan W. Moore
moore@sdsc.edu
http://www.sdsc.edu/srb
2
Abstract
National and international collaborations create shared collections that span multiple administrative domains. Shared collections are built using:
- Data virtualization: the management of collection properties independently of the storage system
- Trust virtualization: the ability to assert authenticity and authorization independently of the underlying storage resources
What security requirements are implied by trust virtualization? How are data integrity challenges met? How do you implement a "deep archive"?
3
State of the Art Technology
- Grid: workflow virtualization. Manage execution of jobs (processes) independently of compute servers
- Data grid: data virtualization. Manage properties of a shared collection independently of storage systems
- Semantic grid: information virtualization. Reason across inferred attributes derived from multiple collections
4
Shared Collections
The purpose of the SRB data grid is to enable the creation of a shared collection between two or more institutions:
- Register digital entities into the shared collection
- Assign owner, access controls
- Assign descriptive, provenance metadata
- Manage audit trails, versions, replicas, backups
- Manage location mapping, containers, checksums
- Manage verification, synchronization
- Manage federation with other shared collections
- Manage interactions with storage systems
- Manage interactions with APIs
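As a concrete illustration of the state a data grid tracks per registered digital entity, here is a minimal Python sketch; the class and field names (DigitalEntity, Replica, acl, ...) are illustrative, not SRB's data structures.

```python
# Minimal sketch of per-entity collection state; names are illustrative.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Replica:
    resource: str        # logical resource holding this copy
    physical_path: str   # location within that resource
    checksum: str        # used for verification and synchronization

@dataclass
class DigitalEntity:
    logical_name: str    # collection-managed file name
    owner: str           # collection-managed user name
    acl: Dict[str, str] = field(default_factory=dict)       # user -> permission
    metadata: Dict[str, str] = field(default_factory=dict)  # descriptive/provenance
    replicas: List[Replica] = field(default_factory=list)
    audit_trail: List[str] = field(default_factory=list)    # access/change events
```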
5
Trust Virtualization
- The collection owns the data that is registered into the data grid
- The data grid is a set of servers, installed at each storage repository
- Servers are installed under an SRB ID created for the shared collection
- All accesses to the data stored under the SRB ID go through the data grid
- Authentication and authorization are controlled by the data grid, independently of the remote storage system
6
[Figure: Federated Server Architecture. An Sget request carrying a logical name or attribute condition goes to an SRB server/agent, which consults the MCAT for (1) logical-to-physical mapping, (2) identification of replicas, and (3) access and audit control; peer-to-peer brokering then spawns server(s) at the replica sites (R1, R2) for data access, including parallel data access.]
7
Authentication / Authorization
Collection-owned data:
- At each remote storage system, an account ID is created under which the data grid stores files
- The user authenticates to the data grid
- The data grid checks access controls
- The data grid server authenticates to a remote data grid server
- The remote data grid server authenticates to the remote storage repository
- SRB servers return the data to the client
8
Authentication Mechanisms
- Grid Security Infrastructure: PKI certificates
- Challenge-response mechanism: no passwords sent over the network
- Ticket: valid for a specified time period or number of accesses
- Generic Security Service API: authentication of server to remote storage
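To make the challenge-response item concrete, here is a minimal Python sketch of a password-free exchange using an HMAC over a server nonce; the shared-secret scheme shown is an assumption for illustration, not SRB's actual wire protocol.

```python
# Sketch: the client proves knowledge of a registered secret without ever
# sending it over the network. Assumes a secret shared with the catalog.
import hashlib
import hmac
import os

def make_challenge() -> bytes:
    return os.urandom(16)                      # server -> client nonce

def respond(secret: bytes, challenge: bytes) -> bytes:
    # Client keys an HMAC with the secret; only the digest travels back.
    return hmac.new(secret, challenge, hashlib.sha256).digest()

def verify(secret: bytes, challenge: bytes, response: bytes) -> bool:
    expected = hmac.new(secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

secret = b"collection-registered-secret"       # hypothetical shared secret
c = make_challenge()
assert verify(secret, c, respond(secret, c))
```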
9
Trust Implementation
- For authentication to work across multiple administrative domains, collection-managed names for users are required
- For authorization to work across multiple administrative domains, collection-managed names for files are required
The result is that access controls remain invariant: they do not change as data is moved to different storage systems under shared collection control.
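A minimal sketch of why collection-managed names keep access controls invariant: the ACL is keyed by the logical name, so migrating a file only updates the logical-to-physical mapping. All names below are hypothetical.

```python
# Sketch: ACL keyed by (user, logical name); migration touches only the map.
acl = {("srbAdmin", "/home/collection/projA/file1"): "write"}
location = {"/home/collection/projA/file1": ("unix-sdsc", "/vault/0/file1")}

def migrate(logical_name: str, new_resource: str, new_path: str) -> None:
    location[logical_name] = (new_resource, new_path)   # only the mapping changes

def can_write(user: str, logical_name: str) -> bool:
    return acl.get((user, logical_name)) == "write"     # ACL untouched by moves

migrate("/home/collection/projA/file1", "hpss-sdsc", "/archive/file1")
assert can_write("srbAdmin", "/home/collection/projA/file1")
```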
10
Communication
- A single port number is used for control messages; the desired control port can be specified
- Parallel I/O streams are received on multiple port numbers; a range of port numbers can be specified
- To deal with firewalls, parallel I/O streams are initiated from within the firewall
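A minimal sketch of confining data streams to an administrator-specified port range, in the spirit of the configurable control and data ports described above; the range and helper function are illustrative, not SRB configuration.

```python
# Sketch: bind parallel I/O listeners only inside a firewall-permitted range.
import socket

def bind_in_range(lo: int, hi: int) -> socket.socket:
    for port in range(lo, hi + 1):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("", port))        # first free port inside the allowed range
            s.listen(1)
            return s
        except OSError:               # port already in use; try the next one
            s.close()
    raise RuntimeError(f"no free port in {lo}-{hi}")

# e.g. open four data-stream listeners inside a hypothetical permitted range
streams = [bind_in_range(20000, 20399) for _ in range(4)]
```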
11
[Figure: Parallel-mode data transfer, server initiated (Sput -m), with the server behind a firewall. The client's srbObjPut call carries its socket address, port, and cookie; the SRB agent consults the MCAT for (1) logical-to-physical mapping, (2) identification of replicas, and (3) access and audit control, then brokers a peer-to-peer request to the second SRB server, which connects out to the client and performs the data transfer.]
12
[Figure: Parallel-mode data transfer, client initiated (Sput -M), with the client behind a firewall. After the srbObjPut call and the MCAT steps (logical-to-physical mapping, identification of replicas, access and audit control), the server returns its socket address, port, and cookie; the client connects to the server and performs the data transfer.]
13
[Figure: Bulk load operation (Sput -b), with the client behind a firewall. After the MCAT steps (logical-to-physical mapping, identification of replicas, access and audit control), a resource query returns the resource location; the client's bulk data transfer thread ships 8 MB buffers, the server stores the data in a temporary file and then unfolds it, and bulk registration threads perform the bulk register.]
14
[Figure: Third-party data transfer (Scp) between two SRB servers. The client's srbObjCopy call triggers the MCAT steps (logical-to-physical mapping, identification of replicas, access and audit control); server1 receives a dataPut with socket address, port, and cookie, connects to server2, and transfers the data directly. This works with server1 behind a firewall, or with server1 and server2 behind the same firewall.]
15
Example International Institutions (2005)
16
KEK Data Grid
A functioning, general-purpose international data grid for high-energy physics spanning Japan, Taiwan, South Korea, Australia, Poland, and the US, with a Manchester-SDSC mirror.
17
BaBar High-Energy Physics
A functioning international data grid for high-energy physics, linking the Stanford Linear Accelerator with Lyon (France), Rome (Italy), San Diego, and RAL (UK), with a Manchester-SDSC mirror. Over 100 TB of data moved.
18
BIRN - Biomedical Informatics Research Network
- Link 17 research hospitals and universities to promote sharing of data
- Install 1-2 terabyte disk caches at each site
- Register all material into a central catalog
- Manage access controls on data, metadata, and resources
- Manage information about each image
- Audit sharing of data by institution
- Extend the system as needed: add disk space dynamically, add new metadata attributes, add new sites
19
NIH BIRN SRB Data Grid (Biomedical Informatics Research Network)
- Access and analyze biomedical image data
- Data resources distributed throughout the country, at medical schools and research centers across the US
- Stable, high-performance grid-based environment
- Coordinate data sharing
- Federate collections
- Support data mining and analysis
20
HIPAA Patient Confidentiality
- Access controls on data
- Access controls on metadata
- Audit trails
- End-to-end encryption (manage keys)
- Localization of data to specific storage
- Access controls do not change when data is moved to another storage system under data grid control
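As an illustration of end-to-end encryption with collection-managed keys, here is a minimal sketch using the third-party Python `cryptography` package (an assumption for illustration; SRB does not bundle this library):

```python
# Sketch: encrypt before upload so storage administrators never see plaintext;
# the key stays with the collection, not with the storage site.
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # managed by the collection (key management)
f = Fernet(key)
ciphertext = f.encrypt(b"patient record ...")   # only ciphertext is uploaded
plaintext = f.decrypt(ciphertext)               # decrypt after authorized access
```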
22
Infrastructure Independence
Data virtualization - management of name spaces independently of the storage repositories:
- Global name spaces
- Persistent identifiers
- Collection-based ownership of files
Support for access operations independently of the storage repositories:
- Separation of access methods from storage protocols
- Support for integrity and authenticity operations
23
Separation of Access Method from Storage Protocols
[Figure: The data grid sits between access methods and storage systems. It maps from the access operations used by the access method to a standard set of storage operations used to interact with the storage system, via the storage protocol.]
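A minimal sketch of this separation: access methods program against one standard operation set, and each storage system supplies a driver for that set. The interface and class names are illustrative, not SRB's driver API.

```python
# Sketch: one standard operation set, one driver per storage protocol.
from abc import ABC, abstractmethod

class StorageDriver(ABC):
    @abstractmethod
    def open(self, path: str, mode: str): ...
    @abstractmethod
    def read(self, handle, size: int) -> bytes: ...
    @abstractmethod
    def write(self, handle, data: bytes) -> int: ...
    @abstractmethod
    def close(self, handle) -> None: ...

class UnixFileDriver(StorageDriver):
    # Local file system driver; use binary modes ("rb"/"wb") for data objects.
    def open(self, path, mode):    return open(path, mode)
    def read(self, handle, size):  return handle.read(size)
    def write(self, handle, data): return handle.write(data)
    def close(self, handle):       handle.close()

# An HPSS or FTP driver would implement the same four operations, so access
# methods never need to know the underlying storage protocol.
```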
24
Data Grid Operations
- File access: open, close, read, write, seek, stat, synch, ...
- Audit, versions, pinning, checksums, synchronize, ...
- Parallel I/O and firewall interactions
- Versions, backups, replicas
- Latency management
- Bulk operations: register, load, unload, delete, ...
- Remote procedures: HDFv5, data filtering, file parsing, replicate, aggregate
- Metadata management: SQL generation, schema extension, XML import and export, browsing, queries
- GGF, "Operations for Access, Management, and Transport at Remote Sites"
25
Examples of Extensibility
The 3 fundamental APIs are the C library, shell commands, and Java; other access mechanisms are ported on top of these interfaces.
API evolution:
- Initial access through C library and Unix shell commands
- Added iNQ Windows browser (C++ library)
- Added mySRB Web browser (C library and shell commands)
- Added Java (Jargon)
- Added Perl/Python load libraries (shell command)
- Added WSDL (Java)
- Added OAI-PMH, OpenDAP, DSpace digital library (Java)
- Added Kepler actors for dataflow access (Java)
- Added GridFTP version 3.3 (C library)
26
Examples of Extensibility
Storage repository driver evolution:
- Initially supported the Unix file system
- Added archival access: UniTree, HPSS
- Added FTP/HTTP
- Added database blob access
- Added database table interface
- Added Windows file system
- Added project archives: Dcache, Castor, ADS
- Added Object Ring Buffer, Datascope
- Added GridFTP version 3.3
Database management evolution: Postgres, DB2, Oracle, Informix, Sybase, mySQL (the most difficult port - no locks, no views, limited SQL)
27
[Figure: Storage Resource Broker 3.3.1 architecture. Applications reach the broker through access interfaces (C library, Java, C++ DLL / Python, Perl, Windows, Linux I/O, Unix shell, NT browser, Kepler actors, OAI, WSDL, (WSRF), HTTP, DSpace, OpenDAP, GridFTP). Core services provide the logical name space, latency management, data transport, metadata transport, federation management, and consistency & metadata management / authorization, authentication, audit. These rest on a database abstraction (DB2, Oracle, Sybase, Postgres, mySQL, Informix) and a storage repository abstraction over file systems (Unix, NT, Mac OSX), archives (Tape, Sam-QFS, DMF, HPSS, ADSM, UniTree, ADS), databases, and ORBs.]
28
Logical Name Spaces
[Figure: Without a data grid, data access methods (C library, Unix, Web browser) interact directly with the storage repository, using the names required by the local repository: storage location, user name, file name, file context (creation date, ...), access constraints.]
29
Logical Name Spaces
The data grid interposes its own name spaces between the data access methods (C library, Unix, Web browser) and the storage repository's names (storage location, user name, file name, file context, access constraints):
- Logical resource name space
- Logical user name space
- Logical file name space
- Logical context (metadata)
- Control/consistency constraints
Data is organized as a shared collection.
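A minimal sketch of the logical name spaces as catalog tables (an MCAT analogue), with logical-to-physical resolution; all keys and values are hypothetical.

```python
# Sketch: four logical name spaces held in a catalog, plus name resolution.
catalog = {
    "users":     {"Z1:D1:U1": {"home_zone": "Z1"}},
    "resources": {"unix-sdsc": {"host": "srb.example.org", "type": "unix fs"}},
    "files":     {"/collection/projA/file1":
                  {"resource": "unix-sdsc", "path": "/vault/3/file1"}},
    "metadata":  {"/collection/projA/file1": {"creation_date": "2005-02-14"}},
}

def resolve(logical_name: str):
    """Map a logical file name to (host, physical path) via the catalog."""
    entry = catalog["files"][logical_name]
    host = catalog["resources"][entry["resource"]]["host"]
    return host, entry["path"]
```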
30
Logical Resource Names
A logical resource name represents multiple physical resources. Writing to a logical resource name can result in:
- Replication: the write completes when each physical resource has a copy
- Load leveling: the write completes when the next physical resource in the list has a copy
- Fault tolerance: the write completes when "k" of "n" resources have a copy
- Single copy: the write is done to the first disk at the same IP address, then a disk anywhere, then tape
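A minimal sketch of these write-completion policies for a logical resource that fans out to several physical resources; `store` stands in for a real storage driver, and the policy names are illustrative.

```python
# Sketch: completion rules when one logical write fans out to many resources.
from typing import Callable, List

def write_logical(store: Callable[[str, bytes], bool],
                  physical: List[str], data: bytes,
                  policy: str = "replication", k: int = 1) -> bool:
    ok = [r for r in physical if store(r, data)]
    if policy == "replication":       # every physical resource must have a copy
        return len(ok) == len(physical)
    if policy == "fault_tolerance":   # k of n copies suffice
        return len(ok) >= k
    raise ValueError(f"unknown policy: {policy}")

def write_load_leveled(store: Callable[[str, bytes], bool],
                       physical: List[str], data: bytes,
                       next_index: int) -> bool:
    # Load leveling: write only to the next resource in the rotation.
    return store(physical[next_index % len(physical)], data)
```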
31
Federation Between Data Grids
[Figure: Two data grids, each with its own logical resource name space, logical user name space, logical file name space, logical context (metadata), and control/consistency constraints, manage Data Collection A and Data Collection B. Data access methods (Web browser, DSpace, OAI-PMH) reach either grid. Access controls and consistency constraints govern cross-registration of digital entities between the grids.]
32
Authentication across Zones
Follows the Shibboleth model:
- A user is assigned a "home" zone
- The user identity is Home-zone:Domain:User-ID
- All authentications are done by the "home" zone
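A minimal sketch of this rule: identities parse as Zone:Domain:User, and whatever zone a user connects to forwards validation to the home zone. The `Zone` class and stored responses are illustrative stand-ins, not SRB code.

```python
# Sketch: every authentication is validated by the user's home zone.
class Zone:
    def __init__(self, name: str, credentials: dict):
        self.name = name
        self.credentials = credentials           # identity -> expected response

    def validate(self, identity: str, response: bytes) -> bool:
        return self.credentials.get(identity) == response

def authenticate(identity: str, response: bytes, zones: dict) -> bool:
    home_zone, domain, user = identity.split(":", 2)
    # The zone the user connected to never checks credentials itself;
    # it forwards the response to the home zone for validation.
    return zones[home_zone].validate(identity, response)

zones = {"Z1": Zone("Z1", {"Z1:D1:U1": b"expected-response"})}
assert authenticate("Z1:D1:U1", b"expected-response", zones)
```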
33
User Authentication Z1:D1:U1 Z2 Z1 Connect Challenge/Response Validation of Challenge Response
34
[Figure: User authentication with forwarding. User Z1:D1:U1 connects to zone Z3, which forwards the challenge/response (through zone Z2) to home zone Z1 for validation of the challenge response.]
35
Deep Archive
[Figure: Data in a remote zone (Z1) is registered by the staging zone (Z2, as user Z2:D2:U2) and pulled into staging; the deep archive zone (Z3, as user Z3:D3:U3) registers and pulls the data from staging with server-initiated I/O through the firewall. Remote zones have no access to the deep archive.]
36
Types of Risk
- Media failure: replicate data onto multiple media
- Vendor-specific systemic errors: replicate data onto multiple vendor products
- Operational error: replicate data onto a second administrative domain
- Natural disaster: replicate data to a geographically remote site
- Malicious user: replicate data to a deep archive
- Bit flips because of the finite bit-error rate: detect with checksums (next slide)
37
Commands Using Checksum
Registering checksum values into the MCAT at the time of upload:
- Sput -k: compute the checksum of the local source file and register it with the MCAT
- Sput -K: checksum verification mode; after upload, compute the checksum by reading back the uploaded file and compare it with the checksum generated locally
Existing SRB files:
- Schksum: compute and register a checksum if one does not already exist
- Srsync: likewise computes the checksum if it does not exist
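A minimal sketch of the -k/-K behaviors (MD5 assumed for illustration; a dict stands in for the MCAT, and the paths are hypothetical):

```python
# Sketch: register a checksum at upload (-k), then verify by reading the
# uploaded copy back and comparing (-K).
import hashlib

def file_checksum(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MB chunks
            h.update(chunk)
    return h.hexdigest()

def upload_and_verify(local: str, uploaded: str, mcat: dict) -> bool:
    mcat[uploaded] = file_checksum(local)       # -k: register at upload time
    return file_checksum(uploaded) == mcat[uploaded]   # -K: read back, compare
```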
38
How Many Replicas
Three sites minimize risk:
- Primary site: supports interactive user access to data
- Secondary site: supports interactive user access when the first site is down; provides a 2nd media copy, located at a remote site, using a different vendor product and independent administrative procedures
- Deep archive: provides a 3rd media copy and a staging environment for data ingestion; no user access
39
NARA Persistent Archive
A federation of three independent data grids (NARA, U Md, SDSC), each with its own MCAT. The original data is at NARA and is replicated to U Md; the replicated copy at U Md provides improved access, load balancing, and disaster recovery; the deep archive at SDSC allows no user access, with data pulled from NARA. The goal is to demonstrate a preservation environment:
- Authenticity
- Integrity
- Management of technology evolution
- Mitigation of risk of data loss: replication of data, federation of catalogs
- Management of preservation metadata
- Scalability: types of data collections, size of data collections
40
[Figure: Zone federation configurations: Free Floating, Occasional Interchange, Replicated Data, User and Data Replica, Resource Interaction, Nomadic, Replicated Catalog, Snow Flake, Master-Slave, Archival, and Hierarchical Zone Organization, grouped into Replication Zones, Peer-to-Peer Zones, and Hierarchical Zones. Each configuration is characterized by its degree of user-ID sharing (partial, complete, or one shared user-ID), resource sharing (none, partial, or complete), metadata synchronization (none, partial, or complete system-controlled synch), replication (system managed), access controls (system set), and zone control (e.g., super administrator zone control, connection from any zone).]
41
For more information on the Storage Resource Broker data grid:
Reagan W. Moore
moore@sdsc.edu
http://www.sdsc.edu/srb/