Published by Evelyn Adams. Modified over 9 years ago.
1
SHARING DATA USING THE STORAGE RESOURCE BROKER (SRB)
Ken Wong
The Applied Research Laboratory (ARL) and The Department of Computer Science
Washington University in St. Louis
kenw@arl.wustl.edu, http://www.arl.wustl.edu/~kenw
Wong, 11/29/2015
2
OUTLINE OF TALK
• SRB and HPSS Overview
• SRB Concepts and Examples
• Alternatives to SRB
• Other SRB Projects
• Our Experience
3
WU DATA CACHE AND THE SRB
4
WU DATA CACHE
• 1.4 TB DEC StorageWorks RAID (Level 5)
  – 2-processor Sun Enterprise 450, 1 GB main memory
  – 622 Mbps ATM interface, 10/100 Mbps Ethernet interface
  – 1.7 TB (raw) = 48 x 9 + 24 x 18 + 24 x 36 GB
• Backups
  – Incremental: Tue, Wed, Thu
  – Full: Mon, Fri, Sat
• Data Volume
  – Used: 560 GB
  – Burn Rate: 7.0 GB/week (This Year); 5.5 GB/week (Lifetime)
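The capacity figures on this slide can be checked with a few lines of arithmetic. The sketch below is not from the talk: it assumes 1 TB = 1000 GB, takes the 1.4 TB RAID figure as usable space, and assumes the "This Year" burn rate stays constant. The function names are made up for illustration.

```python
# Disk groups as listed on the slide: (number of disks, nominal size in GB).
DISK_GROUPS = [(48, 9), (24, 18), (24, 36)]


def raw_capacity_gb(groups):
    """Sum the nominal capacity of every disk group, in GB."""
    return sum(count * size_gb for count, size_gb in groups)


def weeks_until_full(usable_gb, used_gb, burn_gb_per_week):
    """Weeks of headroom left if the burn rate stays constant (an assumption)."""
    return (usable_gb - used_gb) / burn_gb_per_week


raw_gb = raw_capacity_gb(DISK_GROUPS)        # 432 + 432 + 864 GB
weeks_left = weeks_until_full(1400, 560, 7.0)  # usable 1.4 TB taken as 1400 GB

print(f"raw capacity: {raw_gb} GB")
print(f"headroom at 7.0 GB/week: {weeks_left:.0f} weeks")
```

Under these assumptions the raw total comes to 1728 GB, which matches the slide's rounded 1.7 TB, and the remaining space would last roughly 120 weeks.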
5
INSTALLATION HISTORY
• Jun/Jul 98: Sun host and then 432 GB RAID
  – 3-year extended warranty and 3-year maintenance on controllers
• Sep 98: SRB
• Aug 99: 24 x 18.2 GB disks
  – 3-year maintenance upgrade on controllers
• Dec 99: 24 x 36.4 GB disks
6
BRAINMAP DATA GROWTH
7
BRAINMAP DISK USAGE
8
STORAGE RESOURCE BROKER (SRB)
9
HIGH-PERFORMANCE STORAGE SYSTEM
10
HIGH-PERFORMANCE STORAGE SYSTEM
• Current Usage
  – 150 TB (terabytes; 10^12 bytes)
  – 15 million files
• Current Capacity: 500 TB of data (assuming a compression ratio of 1.5)
• Projected Capacity: 1 PB (10^15 bytes) within a year
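A quick calculation puts the HPSS numbers in perspective. This is a sketch only: it reads the 500 TB figure as capacity after compression, so the uncompressed media capacity is obtained by dividing by 1.5. That reading is an assumption, not something the slide states. Decimal units (1 TB = 10^12 bytes) are used throughout.

```python
TB = 10**12  # bytes, decimal terabyte as on the slide
MB = 10**6   # bytes

used_bytes = 150 * TB
file_count = 15_000_000
capacity_tb = 500          # stated capacity, assumed to include compression
compression_ratio = 1.5

# Mean size of a stored file.
avg_file_mb = used_bytes / file_count / MB

# Media capacity without compression (assumed interpretation).
native_capacity_tb = capacity_tb / compression_ratio

# Share of the stated capacity already in use.
fill_fraction = 150 / capacity_tb

print(f"average file size: {avg_file_mb:.0f} MB")
print(f"uncompressed capacity: {native_capacity_tb:.0f} TB")
print(f"fraction used: {fill_fraction:.0%}")
```

The 10 MB average file size is worth keeping in mind when reading the retrieval experiments later in the talk, which use 1 MB files.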
11
SRB CONCEPTS
• SRB Server: Responds to SRB requests from clients
• MCAT (Metadata Catalogue)
  – Information about data sets and collections (Oracle DB)
• SRB Client
• SRB Resource: A logical storage resource
  – Example: HPSS storage and container cache
• Data Set: A file registered with the SRB
• Collection: Group of registered data sets/collections
• Container: Data sets stored as one physical unit
  – Container cache can be remote from HPSS
12
SRB SYSTEM CAPABILITIES
• Collection-based management of data sets
• Persistent identifiers for data sets
• Management of data sets (copies or replicas)
• Containers for aggregating data sets before archiving
• Support for Grid Security Infrastructure (GSI) authentication
  – Uses public key certificates
• Support for integrating data set collections across file systems, archives, and databases
13
SRB INTERFACES
• Scommands (Unix commands)
  – Sinit/Sexit, Sput/Sget, Smkdir/Srmdir, Sls/Srm
  – Smkcont/Ssyncont, Slscont/Srmcont
  – SgetR/SgetU/SgetD
• C-Programming API
• Browser
14
PUBLISHING A DATA SET
• Define the SRB environment (.srb/.MdasEnv file)
    mdasCollectionHome '/home/kenw.neurodb'
    mdasDomainHome 'neurodb'
    srbUser 'kenw'
    srbHost 'ghidorah.sdsc.edu'
    defaultResource 'cont-sdsc'
• Interact with SRB server
    % Sinit                             # Connect to SRB server
    % Sls                               # See what is in my collection
    % Sput ./mydata brain043            # Copy file to SRB space
    % Schmod r public npaci brain043    # Give read access
    % SgetD -a brain043                 # Check access permissions
    % Sexit                             # Disconnect from SRB server
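Scripts that wrap the Scommands often need the same settings the client reads from .srb/.MdasEnv. The sketch below parses the key and quoted-value layout shown on this slide into a dictionary. It is based only on the five lines above; the real file format may allow more than this, and the function name `parse_mdas_env` is invented for illustration.

```python
import shlex

# Sample text copied from the slide's environment file.
SAMPLE_ENV = """\
mdasCollectionHome '/home/kenw.neurodb'
mdasDomainHome 'neurodb'
srbUser 'kenw'
srbHost 'ghidorah.sdsc.edu'
defaultResource 'cont-sdsc'
"""


def parse_mdas_env(text):
    """Parse lines of the form  key 'value'  into a dict.

    Lines that do not split into exactly two tokens are skipped.
    Treating '#' as a comment marker is an assumption of this sketch.
    """
    settings = {}
    for line in text.splitlines():
        parts = shlex.split(line, comments=True)
        if len(parts) == 2:
            key, value = parts
            settings[key] = value
    return settings


settings = parse_mdas_env(SAMPLE_ENV)
print(settings["srbUser"], "@", settings["srbHost"])
```

`shlex.split` is used because it already understands single-quoted values, so paths containing spaces would still come through as one token.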
15
GETTING A DATA SET (SCOMMANDS)
    % Sinit
    % Scd /home/colin.neurodb                   # Go to Colin's collection
    % Sls -l                                    # See what is there
    % Sget colin_avg20_1.0mm_at0.5mm.mnc .      # Copy to this directory
    % Sexit
16
JINGHUA ZHOU'S WORK
• Experiments
  – Test SRB functionality
  – Measure performance of basic SRB functions
• Archiving (Perl Scripts)
  – Archive an arbitrary Unix directory to HPSS
  – Verify files were archived
  – Recover files from archival storage
17
RETRIEVAL EXPERIMENTS
• Load 100 MB container with 1 MB files
• Measure time required to retrieve N files
• Divide time by N to get average time for each file
• Repeat after container has been moved to tape
• Repeat above steps for 10 MB container (instead of 100 MB)
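The measurement loop described above can be sketched as a small timing harness. This is an illustration, not the scripts used in the experiments: `retrieve` stands for whatever fetches one file (for example a wrapper that runs Sget), and both it and the injectable `clock` parameter are hypothetical names introduced here.

```python
import time


def average_retrieval_time(retrieve, names, clock=time.perf_counter):
    """Retrieve every file in `names` and return the mean seconds per file.

    `retrieve` is any callable taking one file name. `clock` can be
    replaced with a fake timer so the harness itself can be tested.
    """
    if not names:
        raise ValueError("need at least one file name")
    start = clock()
    for name in names:
        retrieve(name)
    elapsed = clock() - start
    return elapsed / len(names)


if __name__ == "__main__":
    # Stand-in for a real retrieval: sleep briefly instead of fetching.
    files = [f"file{i:03d}" for i in range(5)]
    mean_s = average_retrieval_time(lambda name: time.sleep(0.01), files)
    print(f"average time per file: {mean_s:.3f} s")
```

Timing the whole batch and dividing by N, as the slide describes, folds any per-session startup cost into the average, so small N will look slower per file than large N.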
18
AVERAGE RETRIEVAL TIME (OLD FILES)
19
AVERAGE RETRIEVAL TIME (FRESH FILES)
20
COMMENTS
• SRB Overhead Per Object (File)
  – 5-7 seconds (Early Measurements)
  – 2-4 seconds (Recent Measurements)
• Tape Overhead Per Object (File): 100 seconds
• TCP Connection Needs Tuning
  – Asymmetric routing, bottleneck, ...
  – snoop and tcptrace analysis
  – Max Sget effective bandwidth is 8 Mbps
  – Max Sput effective bandwidth is 4 Mbps
  – Goal is 32 Mbps
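To see how much the fixed per-object overhead matters for small files, the figures above can be combined in a simple additive cost model. The model is an assumption of this sketch and does not appear in the talk: total time is taken as SRB overhead, plus tape overhead when the object is on tape, plus size divided by bandwidth, with 1 MB counted as 8 megabits.

```python
def estimated_seconds(size_mb, overhead_s, bandwidth_mbps,
                      on_tape=False, tape_overhead_s=100.0):
    """Estimate retrieval time for one object under an additive model."""
    transfer_s = size_mb * 8.0 / bandwidth_mbps  # megabits / (megabits per second)
    total = overhead_s + transfer_s
    if on_tape:
        total += tape_overhead_s
    return total


def effective_mbps(size_mb, seconds):
    """Throughput actually seen by the user for one object."""
    return size_mb * 8.0 / seconds


# A 1 MB file, 3 s SRB overhead (middle of the recent 2-4 s range),
# fetched at the measured Sget maximum of 8 Mbps.
disk_s = estimated_seconds(1.0, overhead_s=3.0, bandwidth_mbps=8.0)
tape_s = estimated_seconds(1.0, overhead_s=3.0, bandwidth_mbps=8.0, on_tape=True)

print(f"from cache: {disk_s:.1f} s ({effective_mbps(1.0, disk_s):.2f} Mbps effective)")
print(f"from tape:  {tape_s:.1f} s ({effective_mbps(1.0, tape_s):.3f} Mbps effective)")
```

Under this model a 1 MB file spends far more time in overhead than in transfer, which suggests why aggregating small files into containers pays off.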
21
ARCHIVING
• Reflect Unix directory structure in SRB collection structure
  – archiver NPACI/Unix account
• Look for inactive files within a directory
• Multiple versions handled by appending modification date to file name
• Log all archival requests
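Two of the steps above, finding inactive files and naming versions by modification date, can be sketched briefly. The original scripts were written in Perl and are not shown in the talk, so everything here is an assumption: inactivity is judged by modification time, the date suffix uses a `YYYYMMDDHHMMSS` layout, and the function names are invented.

```python
import os
import time
from datetime import datetime


def archive_name(path):
    """Return the file's base name with its modification date appended."""
    mtime = os.path.getmtime(path)
    stamp = datetime.fromtimestamp(mtime).strftime("%Y%m%d%H%M%S")
    return f"{os.path.basename(path)}.{stamp}"


def inactive_files(directory, min_idle_days, now=None):
    """List regular files in `directory` not modified for `min_idle_days`.

    Only the top level of the directory is scanned; results are sorted
    by name. `now` can be supplied to make the result reproducible.
    """
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400
    found = []
    with os.scandir(directory) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_file() and entry.stat().st_mtime < cutoff:
                found.append(entry.path)
    return found


if __name__ == "__main__":
    for path in inactive_files(".", min_idle_days=90):
        print(path, "->", archive_name(path))
```

Because the suffix comes from the modification time, archiving the same unchanged file twice yields the same name, while an edited file gets a new one.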
22
CURRENT WORK
• TCP Tuning and SRB 1.1.7 Performance
• Enhance Archival Scripts
  – Improve usability
  – Resilience to HPSS Blackouts
  – Parallel Archiving
23
RECENT SRB DEVELOPMENTS
• Data Cutter
• GSI authentication
  – Uses X.509 certificates
• Container redesign
  – To handle multiple archival and cache resources
• Remote proxy (Spcommand)
• Textual annotation stored in MCAT
24
ALTERNATIVES TO SRB
• Distributed Database
  – Does not deal with file data
      · Requires other means of accessing files
  – A heavyweight solution; i.e., expense (money, expertise)
  – Need instances running wherever you want to have storage
  – If it is only metadata, then a case can be made but...
      · Tied to a particular vendor at all sites
      · Have to cross-link all the databases
• AFS (Andrew File System)
  – Doesn't have concept of application metadata
      · SRB has some metadata facilities now and more to come
      · Comments, annotations, user-controlled metadata
  – SRB provides a uniform authentication and authorization system
25
TOP SRB PROJECTS (SUMMARY)
• 2-Micron All Sky Survey
  – 10 TB of data from Caltech
  – 5 million images sorted into 130,000 containers
• Digital Embryo Project (NLM funded)
  – Digitizing existing slides for storage in HPSS
• Particle Physics Data Grid (DOE funded)
  – Data mining
• Information Power Grid (NASA funded)
• Data Visualization Corridor (DOE funded)
  – Handles terabyte-sized data sets for interactive viewing
• Neuroscience Data Set Federation
26
TOP SRB PROJECTS
• 2-Micron All Sky Survey (2MASS)
  – 10 TB of data from Caltech (3 TB done)
  – 5 million images sorted into 130,000 containers
  – SRB container technology used to manage the aggregation process on a disk cache
  – Replicate Caltech data
• Digital Embryo Project (NLM funded)
  – Digitizing existing slides for storage in HPSS
  – SRB used to manage data movement, aggregation into containers, and metadata catalog
  – Queries against the collection
• Particle Physics Data Grid (DOE funded)
  – Replicate data sets that are pulled into local disk caches
27
TOP SRB PROJECTS
• Information Power Grid (NASA funded)
  – SRB used to support data mining against a distributed data set collection
  – Data transmission rate: 58 Mbps from SDSC to NASA Ames
  – Put collection management in front of storage archives through use of the MCAT
• Data Visualization Corridor (DOE funded)
  – SRB has been integrated with the Data Cutter system
      · For remote manipulation of data sets
  – Handles terabyte-sized data sets for interactive viewing
• Neuroscience Data Set Federation
28
CONCLUDING REMARKS
• Documentation
  – http://www.sdsc.edu/DICE/SRB/index.html
  – http://www.arl.wustl.edu/kenw/npaci/index.html
• Software
  – Follow SRB link
  – Get PGP key from SDSC
  – Can install subset (e.g., client only)
• Applications?
30
WU DATA CACHE
[Network diagram. Labels: vBNS; ghidorah (MCAT); hpss; brainmap (1.3 TB); stp, v1 (SUMS); petsun-23 (Scanners); link speeds 622 Mbps, 45 Mbps, 155 Mbps ATM; sites sdsc.edu and wustl.edu; users at UCSD, UCLA, Johns Hopkins, U. Montana, Caltech (12 Major Users)]