Wong11/29/2015 1 SHARING DATA USING THE STORAGE RESOURCE BROKER (SRB) Ken Wong The Applied Research Laboratory (ARL) and The Department of Computer Science.

Presentation transcript:

SHARING DATA USING THE STORAGE RESOURCE BROKER (SRB)
Ken Wong
The Applied Research Laboratory (ARL) and The Department of Computer Science
Washington University in St. Louis

OUTLINE OF TALK
• SRB and HPSS Overview
• SRB Concepts and Examples
• Alternatives to SRB
• Other SRB Projects
• Our Experience

WU DATA CACHE AND THE SRB

WU DATA CACHE
• 1.4 TB DEC Storage Works RAID (Level 5)
  – 2-processor Sun Enterprise 450, 1 GB main memory
  – 622 Mbps ATM interface, 10/100 Mbps Ethernet interface
  – 1.7 TB (raw) = 48 x x x 36 GB
• Backups
  – Incremental: Tue, Wed, Thu
  – Full: Mon, Fri, Sat
• Data Volume
  – Used: 560 GB
  – Burn Rate: 7.0 GB/week (This Year); 5.5 GB/week (Lifetime)
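
The usage and burn-rate figures above imply a rough time until the cache fills. The following is a minimal sketch of that arithmetic, assuming 1 TB = 1000 GB and a constant growth rate; the function name `weeks_until_full` is illustrative and not part of the talk.

```python
# Figures are taken from the slide; 1 TB = 1000 GB is an assumption.
USABLE_GB = 1400.0       # 1.4 TB RAID (Level 5)
USED_GB = 560.0          # current data volume
BURN_THIS_YEAR = 7.0     # GB/week
BURN_LIFETIME = 5.5      # GB/week


def weeks_until_full(usable_gb, used_gb, burn_gb_per_week):
    """Weeks of headroom left if growth stays at a constant rate."""
    return (usable_gb - used_gb) / burn_gb_per_week


if __name__ == "__main__":
    for label, rate in (("this year", BURN_THIS_YEAR), ("lifetime", BURN_LIFETIME)):
        weeks = weeks_until_full(USABLE_GB, USED_GB, rate)
        print(f"at the {label} rate: about {weeks:.0f} weeks")
```

At the faster rate this works out to 120 weeks, a little over two years of headroom under these assumptions.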

INSTALLATION HISTORY
• Jun/Jul 98: Sun host and then 432 GB RAID
  – 3 year extended warranty and 3 year maintenance on controllers
• Sep 98: SRB
• Aug 99: 24 x 18.2 GB disks
  – 3 year maintenance upgrade on controllers
• Dec 99: 24 x 36.4 GB disks
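
As a cross-check, the three disk purchases listed above can be summed and compared with the 1.7 TB raw figure quoted for the data cache. This sketch assumes that all three purchases count toward the raw total and that 1 TB = 1000 GB.

```python
# Disk purchases from the installation history (sizes in GB).
purchases = [
    ("Jun/Jul 98: initial RAID", 432.0),
    ("Aug 99: 24 x 18.2 GB disks", 24 * 18.2),
    ("Dec 99: 24 x 36.4 GB disks", 24 * 36.4),
]

# Assumption: the raw capacity is simply the sum of all purchases.
total_gb = sum(size for _, size in purchases)

if __name__ == "__main__":
    for label, size in purchases:
        print(f"{label:30s} {size:8.1f} GB")
    print(f"{'total':30s} {total_gb:8.1f} GB (about {total_gb / 1000:.1f} TB)")
```

The sum is about 1742 GB, which rounds to the 1.7 TB raw capacity stated on the WU DATA CACHE slide.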

BRAINMAP DATA GROWTH

BRAINMAP DISK USAGE

STORAGE RESOURCE BROKER (SRB)

HIGH-PERFORMANCE STORAGE SYSTEM

HIGH-PERFORMANCE STORAGE SYSTEM
• Current Usage
  – 150 TB (terabytes; 1 TB = one trillion bytes)
  – 15 million files
• Current Capacity: 500 TB of data (assuming a compression ratio of 1.5)
• Projected Capacity: 1 PB (10^15 bytes) within a year
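
The usage figures above imply an average file size and a fill level. A small sketch of that arithmetic follows, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the helper names are illustrative.

```python
# Figures from the slide; decimal units are an assumption.
USED_BYTES = 150 * 10**12       # 150 TB in use
FILE_COUNT = 15 * 10**6         # 15 million files
CAPACITY_BYTES = 500 * 10**12   # 500 TB current capacity


def average_file_mb(total_bytes, file_count):
    """Mean file size in megabytes."""
    return total_bytes / file_count / 10**6


def fraction_used(used_bytes, capacity_bytes):
    """Share of the stated capacity that is already occupied."""
    return used_bytes / capacity_bytes


if __name__ == "__main__":
    print(f"average file size: {average_file_mb(USED_BYTES, FILE_COUNT):.0f} MB")
    print(f"capacity in use:   {fraction_used(USED_BYTES, CAPACITY_BYTES):.0%}")
```

Under these assumptions the archive holds files averaging 10 MB and is about 30% full.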

SRB CONCEPTS
• SRB Server: Responds to SRB requests from clients
• MCAT (Metadata Catalogue)
  – Information about data sets and collections (Oracle DB)
• SRB Client
• SRB Resource: A logical storage resource
  – Example: HPSS storage and container cache
• Data Set: A file registered with the SRB
• Collection: Group of registered data sets/collections
• Container: Data sets stored as one physical unit
  – Container cache can be remote from HPSS
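
The relationships among data sets, collections, and containers can be pictured with a small data model. The sketch below is purely illustrative: the class and method names are hypothetical and do not correspond to the SRB C API or to the MCAT schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class DataSet:
    """A file registered with the broker, stored on a logical resource."""
    name: str
    resource: str


@dataclass
class Container:
    """Data sets aggregated and stored as one physical unit."""
    name: str
    data_sets: List[str] = field(default_factory=list)


@dataclass
class Collection:
    """A group of registered data sets and nested collections."""
    name: str
    members: Dict[str, Union["DataSet", "Collection"]] = field(default_factory=dict)

    def add(self, item):
        """Register a data set or sub-collection under this collection."""
        self.members[item.name] = item
        return item

    def walk(self, prefix=""):
        """Yield the logical path of every data set below this collection."""
        path = f"{prefix}/{self.name}"
        for name in sorted(self.members):
            member = self.members[name]
            if isinstance(member, Collection):
                yield from member.walk(path)
            else:
                yield f"{path}/{member.name}"


if __name__ == "__main__":
    home = Collection("home")
    kenw = home.add(Collection("kenw.neurodb"))
    kenw.add(DataSet("brain043", resource="cont-sdsc"))
    print(list(home.walk()))
```

The point of the model is that a logical path names a data set independently of where its bytes live; the resource and container decide the physical placement.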

SRB SYSTEM CAPABILITIES
• Collection-based management of data sets
• Persistent identifiers for data sets
• Management of data sets (copies or replicas)
• Containers for aggregating data sets before archiving
• Support for grid security infrastructure authentication
  – Uses public key certificates
• Support for integrating data set collections across file systems, archives, and databases

SRB INTERFACES
• Scommands (Unix commands)
  – Sinit/Sexit, Sput/Sget, Smkdir/Srmdir, Sls/Srm
  – Smkcont/Ssyncont, Slscont/Srmcont
  – SgetR/SgetU/SgetD
• C-Programming API
• Browser

PUBLISHING A DATA SET
• Define the SRB environment (.srb/.MdasEnv file)
    mdasCollectionHome '/home/kenw.neurodb'
    mdasDomainHome 'neurodb'
    srbUser 'kenw'
    srbHost 'ghidorah.sdsc.edu'
    defaultResource 'cont-sdsc'
• Interact with SRB server
    % Sinit                           # Connect to SRB server
    % Sls                             # See what is in my collection
    % Sput ./mydata brain043          # Copy file to SRB space
    % Schmod r public npaci brain043  # Give read access
    % SgetD -a brain043               # Check access permissions
    % Sexit                           # Disconnect from SRB server
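
The environment file shown above is a list of key and quoted-value pairs. A minimal sketch of reading such a file follows, assuming one pair per line with plain single quotes around the value; the real file syntax may allow more than this, and `parse_mdas_env` is a hypothetical helper, not part of the SRB distribution.

```python
import shlex

# Sample contents, copied from the slide (quotes assumed to be plain ASCII).
SAMPLE_ENV = """\
mdasCollectionHome '/home/kenw.neurodb'
mdasDomainHome 'neurodb'
srbUser 'kenw'
srbHost 'ghidorah.sdsc.edu'
defaultResource 'cont-sdsc'
"""


def parse_mdas_env(text):
    """Return a dict of settings from key 'value' lines; skip anything else."""
    settings = {}
    for line in text.splitlines():
        # shlex handles the quoting and drops '#' comments.
        tokens = shlex.split(line, comments=True)
        if len(tokens) >= 2:
            settings[tokens[0]] = tokens[1]
    return settings


if __name__ == "__main__":
    for key, value in parse_mdas_env(SAMPLE_ENV).items():
        print(f"{key} = {value}")
```

A script that wraps the Scommands could use such a parser to confirm which host and default resource a session will use before copying data.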

GETTING A DATA SET (SCOMMANDS)
    % Sinit
    % Scd /home/colin.neurodb               # Go to Colin's collection
    % Sls -l                                # See what is there
    % Sget colin_avg20_1.0mm_at0.5mm.mnc .  # Copy to this directory
    % Sexit
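
The retrieval session above follows a fixed pattern, so it can be generated for any collection and data set. The sketch below only builds the command lines as strings and does not run them; it uses the Scommands exactly as they appear on the slide, and the function name `sget_session` is hypothetical.

```python
import shlex


def sget_session(collection, data_set, dest="."):
    """Return the Scommand lines that fetch one data set from a collection."""
    return [
        "Sinit",                                             # connect
        f"Scd {shlex.quote(collection)}",                    # enter the collection
        "Sls -l",                                            # list its contents
        f"Sget {shlex.quote(data_set)} {shlex.quote(dest)}", # copy locally
        "Sexit",                                             # disconnect
    ]


if __name__ == "__main__":
    for line in sget_session("/home/colin.neurodb", "colin_avg20_1.0mm_at0.5mm.mnc"):
        print("%", line)
```

Quoting the arguments keeps names with spaces or shell metacharacters from being split if the lines are later pasted into a shell.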

JINGHUA ZHOU'S WORK
• Experiments
  – Test SRB functionality
  – Measure performance of basic SRB functions
• Archiving (Perl Scripts)
  – Archive an arbitrary Unix directory to HPSS
  – Verify files were archived
  – Recover files from archival storage

RETRIEVAL EXPERIMENTS
• Load 100 MB container with 1 MB files
• Measure time required to retrieve N files
• Divide time by N to get average time for each file
• Repeat after container has been moved to tape
• Repeat above steps for 10 MB container (instead of 100 MB)
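
The measurement procedure above (time N retrievals, then divide by N) can be sketched as a small harness. This is an illustration only, not the scripts used in the experiments: `retrieve` stands for whatever call fetches one file, and the injectable `clock` parameter is an assumption added to make the harness easy to check.

```python
import time


def average_retrieval_time(retrieve, names, clock=time.perf_counter):
    """Time the retrieval of every name and return the mean seconds per file."""
    if not names:
        raise ValueError("need at least one file to retrieve")
    start = clock()
    for name in names:
        retrieve(name)
    return (clock() - start) / len(names)


if __name__ == "__main__":
    # Stand-in for a real fetch: sleep briefly instead of contacting a server.
    def fake_retrieve(name):
        time.sleep(0.01)

    files = [f"file{i:03d}" for i in range(5)]
    print(f"{average_retrieval_time(fake_retrieve, files):.3f} s per file")
```

Running the same harness before and after the container moves to tape, and again with a smaller container, gives the comparisons described in the list.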

AVERAGE RETRIEVAL TIME (OLD FILES)

AVERAGE RETRIEVAL TIME (FRESH FILES)

COMMENTS
• SRB Overhead Per Object (File)
  – 5-7 seconds (Early Measurements)
  – 2-4 seconds (Recent Measurements)
• Tape Overhead Per Object (File): 100 seconds
• TCP Connection Needs Tuning
  – Asymmetric routing, bottleneck, ...
  – snoop and tcptrace analysis
  – Max Sget effective bandwidth is 8 Mbps
  – Max Sput effective bandwidth is 4 Mbps
  – Goal is 32 Mbps
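
A fixed per-object overhead matters most for small files. The sketch below estimates the throughput a client would see for one file, under the assumptions that the quoted bandwidth is the transfer rate once data is flowing, that the overhead is paid once per file, and that 1 MB = 8 Mbit; the 3-second value is simply the midpoint of the recent 2-4 second range.

```python
def effective_mbps(file_mb, link_mbps, overhead_s):
    """Throughput for one file, counting transfer time plus fixed overhead."""
    megabits = file_mb * 8.0
    total_seconds = megabits / link_mbps + overhead_s
    return megabits / total_seconds


if __name__ == "__main__":
    for size_mb in (1, 10, 100):
        disk = effective_mbps(size_mb, link_mbps=8.0, overhead_s=3.0)
        tape = effective_mbps(size_mb, link_mbps=8.0, overhead_s=100.0)
        print(f"{size_mb:4d} MB file: {disk:5.2f} Mbps from cache, {tape:5.2f} Mbps from tape")
```

For a 1 MB file the overhead dominates: one second of transfer plus three seconds of overhead gives about 2 Mbps, which is one motivation for aggregating small files into containers.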

ARCHIVING
• Reflect Unix directory structure in SRB collection structure
  – archiver NPACI/Unix account
• Look for inactive files within a directory
• Multiple versions handled by appending modification date to file name
• Log all archival requests
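
Two of the steps above, finding inactive files and naming versions by modification date, can be sketched as follows. The original scripts were written in Perl and are not shown in the talk, so this is an illustration under stated assumptions: "inactive" is taken to mean not modified for a given number of days, and the `YYYYMMDD` suffix format is a guess.

```python
import os
import time


def versioned_name(path, mtime=None):
    """Return the file's base name with its modification date appended."""
    if mtime is None:
        mtime = os.stat(path).st_mtime
    stamp = time.strftime("%Y%m%d", time.gmtime(mtime))  # assumed date format
    return f"{os.path.basename(path)}.{stamp}"


def inactive_files(root, min_idle_days, now=None):
    """Yield paths (relative to root) not modified for at least min_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            full = os.path.join(dirpath, fname)
            if os.stat(full).st_mtime < cutoff:
                # The relative path mirrors the directory layout, so it can
                # be reused as the path inside the target collection.
                yield os.path.relpath(full, root)


if __name__ == "__main__":
    for rel in inactive_files(".", min_idle_days=90):
        print(rel, "->", versioned_name(rel))
```

Keeping the relative path intact is what lets the collection structure reflect the Unix directory structure.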

CURRENT WORK
• TCP Tuning and SRB Performance
• Enhance Archival Scripts
  – Improve usability
  – Resilience to HPSS Blackouts
  – Parallel Archiving

RECENT SRB DEVELOPMENTS
• Data Cutter
• GSI authentication
  – Uses X.509 certificates
• Container redesign
  – To handle multiple archival and cache resources
• Remote proxy (Spcommand)
• Textual annotation stored in MCAT

ALTERNATIVES TO SRB
• Distributed Database
  – Does not deal with file data
      Requires other means of accessing files
  – A heavyweight solution; i.e., expensive (money, expertise)
  – Needs instances running wherever you want to have storage
  – If it is only metadata, then a case can be made, but...
      Tied to a particular vendor at all sites
      Have to cross-link all the databases
• AFS (Andrew File System)
  – Doesn't have a concept of application metadata
      SRB has some metadata facilities now and more to come
      Comments, annotations, user-controlled metadata
  – SRB provides a uniform authentication and authorization system

TOP SRB PROJECTS (SUMMARY)
• 2-Micron All Sky Survey
  – 10 TB of data from Caltech
  – 5 million images sorted into 130,000 containers
• Digital Embryo Project (NLM funded)
  – Digitizing existing slides for storage in HPSS
• Particle Physics Data Grid (DOE funded)
  – Data mining
• Information Power Grid (NASA funded)
• Data Visualization Corridor (DOE funded)
  – Handles terabyte-sized data sets for interactive viewing
• Neuroscience Data Set Federation

TOP SRB PROJECTS
• 2-Micron All Sky Survey (2MASS)
  – 10 TB of data from Caltech (3 TB done)
  – 5 million images sorted into 130,000 containers
  – SRB container technology used to manage the aggregation process on a disk cache
  – Replicate Caltech data
• Digital Embryo Project (NLM funded)
  – Digitizing existing slides for storage in HPSS
  – SRB used to manage data movement, aggregation into containers, and metadata catalog
  – Queries against the collection
• Particle Physics Data Grid (DOE funded)
  – Replicate data sets that are pulled into local disk caches
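
The 2MASS figures above give a feel for how much aggregation the containers provide. The sketch below derives average sizes from the quoted totals, assuming decimal units (1 TB = 10^12 bytes) and an even spread of images across containers, which the talk does not state.

```python
# Totals quoted for the 2-Micron All Sky Survey collection.
IMAGES = 5_000_000
CONTAINERS = 130_000
TOTAL_BYTES = 10 * 10**12   # 10 TB, decimal units assumed

# Averages, assuming images are spread evenly across containers.
images_per_container = IMAGES / CONTAINERS
mb_per_image = TOTAL_BYTES / IMAGES / 10**6
mb_per_container = TOTAL_BYTES / CONTAINERS / 10**6

if __name__ == "__main__":
    print(f"images per container: {images_per_container:.1f}")
    print(f"average image size:   {mb_per_image:.1f} MB")
    print(f"average container:    {mb_per_container:.1f} MB")
```

On these assumptions each container holds roughly 38 images of about 2 MB each, so the archive handles containers of tens of megabytes rather than millions of small files.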

TOP SRB PROJECTS
• Information Power Grid (NASA funded)
  – SRB used to support data mining against a distributed data set collection
  – Data transmission rate: 58 Mbps from SDSC to NASA Ames
  – Put collection management in front of storage archives through use of the MCAT
• Data Visualization Corridor (DOE funded)
  – SRB has been integrated with the Data Cutter system
      For remote manipulation of data sets
  – Handles terabyte-sized data sets for interactive viewing
• Neuroscience Data Set Federation
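
To put the quoted transmission rate in context, the following sketch estimates how long a terabyte-sized data set would take to move at a sustained rate. It assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and ignores protocol and per-object overhead, so real transfers would take longer.

```python
def transfer_hours(size_tb, rate_mbps):
    """Hours needed to move size_tb terabytes at a sustained rate in Mbps."""
    bits = size_tb * 10**12 * 8
    seconds = bits / (rate_mbps * 10**6)
    return seconds / 3600


if __name__ == "__main__":
    for rate in (58, 32, 8):
        print(f"1 TB at {rate:2d} Mbps: {transfer_hours(1, rate):6.1f} hours")
```

At 58 Mbps one terabyte takes a little over 38 hours, while at the 8 Mbps measured for Sget in the experiments it would take more than eleven days.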

CONCLUDING REMARKS
• Documentation
  –
  –
• Software
  – Follow SRB link
  – Get PGP key from SDSC
  – Can install subset (e.g., client only)
• Applications?

WU DATA CACHE (network diagram)
Diagram labels: vBNS; ghidorah (MCAT); hpss; brainmap (1.3 TB); stp, v1 (SUMS); petsun-23 (Scanners); link rates 622 Mbps, 45 Mbps, 155 Mbps ATM; sites sdsc.edu and wustl.edu; users UCSD, UCLA, Johns Hopkins, U. Montana, Caltech (12 Major Users)