SAN DIEGO SUPERCOMPUTER CENTER By: Roman Olschanowsky An Introduction to the
Outline
- SDSC and History of SRB
- Example Project
- Introduction to SRB
- Discussion on SRB basics
- SRB Clients
- Overview of a Data Grid
- Infrastructure Topology
Storage and Compute Resources
- Archival Systems: 6 PB
- DataStar (IBM Power4): 10.4 TF
- TeraGrid Linux Cluster (IA64): 4.4 TF
- Storage Area Network Disk: 600 TB
- Sun F15K Disk Server
- Blue Gene/L (due 12/04): 2.8/5.7 TF
- Networking
- Visualization
Human infrastructure: Experienced multidisciplinary staff support a broad spectrum of national science, engineering and technology projects.
Sites Using the SRB
SDSC SRB Projects (60 million, 0.5 PB)
- Digital Libraries
  - UCB, Umich, UCSB, Stanford, CDL
  - NSF NSDL - UCAR / DLESE
- NASA Information Power Grid
- Astronomy
  - National Virtual Observatory
  - 2MASS Project (2 Micron All Sky Survey)
- Particle Physics
  - Particle Physics Data Grid (DOE)
  - GriPhyN
  - SLAC Synchrotron Data Repository
- Medicine
  - Digital Embryo (NLM)
- Earth Systems Sciences
  - ESIPS
  - LTER
- Persistent Archives
  - NARA
  - LOC
- Neuro Science & Molecular Science
  - TeleScience/NCMIR, BIRN
  - SLAC, AfCS, …
The SCEC Project
Southern California Earthquake Center
- 400 people, the best earthquake seismologists in the country (33 states) and several from abroad (9 countries). (Sep SCEC AHM attendees)
- Simulating a magnitude 7.7 earthquake in the L.A. basin
  - 10 year effort
  - 100+ TB of input data (soil conditions, topography, grid coordinates, etc.)
  - 240 procs on SDSC DataStar cluster, 5 days, 1 TB RAM, 2 GB/sec IO
Thanks!
- SDSC scientific applications group: helped with porting the code, parallelizing the calculation and the IO, and generalizing the code for scaling up to a large run. Offered invaluable insights regarding IO management.
- SRB: took care of draining the GPFS cache regularly, moving 43 TB of data safely to archive storage. That task was completed a mere 36 hours after the end of the calculation. The SRB was critical in this achievement.
SDSC & SRB Example
Storage Resource Broker (SRB)
- A distributed file system (Data Grid)
- Client-Server, Server-Server architecture
- Abstracts physical
- SRB provides the ability to transparently share data across remote sites.
- Heterogeneous Resources
- Single sign on
- Single logical file hierarchy
What we are familiar with
What we are not familiar with, yet
How do the file systems differ?
- Logical Abstraction
  - Folders are NOT physical
  - Files do NOT inherit physical location
  - Everything is potentially distributed
- Access Control
  - Permissions are NOT rwxrwxrwx
  - Permissions ARE on an object-by-object basis
  - Groups and permissions ARE more similar to NTFS
- Domains
  - Geographical / logical grouping of users
  - Namespace scalability: Also doubles as groups
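The two ideas on this slide, logical names that are decoupled from physical location and permissions kept per object, can be illustrated with a toy in-memory catalog. This is only a sketch under stated assumptions: the `Catalog` and `DataObject` classes, the access levels, and the physical paths are hypothetical and are not part of the SRB API.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """One logical file: where its copies live and who may use it."""
    replicas: list = field(default_factory=list)  # (resource, physical_path) pairs
    acl: dict = field(default_factory=dict)       # user -> "read" or "write"


class Catalog:
    """Toy metadata catalog (hypothetical, not the SRB API)."""

    def __init__(self):
        self.objects = {}  # logical path -> DataObject

    def register(self, logical_path, resource, physical_path, owner):
        # The logical path stays the same no matter where the bytes are stored.
        obj = self.objects.setdefault(logical_path, DataObject())
        obj.replicas.append((resource, physical_path))
        obj.acl.setdefault(owner, "write")

    def grant(self, logical_path, user, level):
        # Permissions are attached to the object itself, not to a mode string.
        self.objects[logical_path].acl[user] = level

    def can_read(self, logical_path, user):
        return user in self.objects[logical_path].acl

    def locate(self, logical_path):
        return list(self.objects[logical_path].replicas)


catalog = Catalog()
# Physical paths below are made up for illustration.
catalog.register("/home/Demo/Doc.txt", "z-ucsd-ncmir-nas", "/vault/a1/Doc.txt", owner="romanoly")
catalog.register("/home/Demo/Doc.txt", "z-jhu-cis-nas", "/data/77/Doc.txt", owner="romanoly")
catalog.grant("/home/Demo/Doc.txt", "guest", "read")
print(catalog.locate("/home/Demo/Doc.txt"))
```

One logical path resolves to two physical copies at different sites, and access is decided by the object's own list of users rather than by the folder it appears in.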
Interfaces to the Storage Resource Broker
- inQ – Windows Client
- Scommands – UNIX, DOS Command line Client
- Jargon – Java API and GUI components
- mySRB – Web Client
- Matrix – WSDL, Data Grid Workflows
- C, C++ – C and C++ API
- Python – Python API
- Perl – Perl API
Common Scommands (69 total)
- Sinit
- Senv
- Spwd
- Sls
- Scd
- Sget
- Sput
- Ssh
- Scp
- Smv (logical)
- Sphymove (physical)
- Srm
- Smkdir
- Srmdir
- Serror
- Schmod
- Sexit
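As a rough illustration of how a few of these commands fit together, the sketch below builds the command lines for a simple upload, list, and download round trip. It only prints the commands and does not run them. The argument order is an assumption that mirrors the Unix counterparts (`mkdir`, `cd`, `cp`, `ls`), and the local file `/tmp/Doc.txt` is hypothetical; check the Scommand manual pages for the exact usage.

```python
import shlex


def session_commands(local_file, collection):
    """Return Scommand invocations for an upload/list/download round trip.

    Argument order is assumed to mirror the Unix equivalents.
    """
    name = local_file.rsplit("/", 1)[-1]
    stem, dot, ext = name.rpartition(".")
    copy_name = f"{stem}.copy.{ext}" if dot else f"{name}.copy"
    return [
        ["Sinit"],                          # start a session
        ["Smkdir", collection],             # create the target collection
        ["Sput", local_file, collection],   # upload the local file
        ["Scd", collection],                # move into the collection
        ["Sls"],                            # list its contents
        ["Sget", name, copy_name],          # download a copy
        ["Sexit"],                          # end the session
    ]


lines = [shlex.join(argv) for argv in
         session_commands("/tmp/Doc.txt", "/home/Demo/SRB-Tutorial/files-2")]
for line in lines:
    print(line)
```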
mySRB
BIRN Portal (Perl-based)
NEEScentral Portal (PHP-based)
Biomedical Informatics Research Network (BIRN)
- Major collaboration with SDSC; several of the project's Co-Investigators and Co-PIs are at SDSC.
- BIRN's purpose is to give its consortium of neuroscience laboratories the ability to share, compute, and collaborate.
- The Storage Resource Broker provides the ability to transparently share data across remote sites.
The BIRN SRB Data Grid
Doing this “Manually”
The BIRN Data Grid
The grid is in the details
File Replication

Sls
  /home/Demo/SRB-Tutorial/files-2:
    Doc.txt

Sls -l
  /home/Demo/SRB-Tutorial/files-2:
    romanoly 0 z-ucsd-ncmir-nas Doc.txt
    romanoly 1 z-jhu-cis-nas Doc.txt
    romanoly 2 z-stanford-lucas-nas Doc.txt
    romanoly 3 z-umn-cmrr-nas Doc.txt
    romanoly 4 z-uci-bic-nas Doc.txt
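The long listing shows one logical file, Doc.txt, with five replicas on five different resources. A minimal sketch of reading that listing programmatically follows. It assumes the four-column layout exactly as shown on the slide (owner, replica number, resource, name); real `Sls -l` output may carry additional columns, and the `Replica` type and helper functions are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Replica(NamedTuple):
    owner: str
    number: int
    resource: str
    name: str


# Listing copied from the slide.
LISTING = """\
romanoly 0 z-ucsd-ncmir-nas Doc.txt
romanoly 1 z-jhu-cis-nas Doc.txt
romanoly 2 z-stanford-lucas-nas Doc.txt
romanoly 3 z-umn-cmrr-nas Doc.txt
romanoly 4 z-uci-bic-nas Doc.txt
"""


def parse_listing(text):
    """Parse four-column listing lines; skip anything that does not fit."""
    replicas = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4 or not parts[1].isdigit():
            continue
        owner, number, resource, name = parts
        replicas.append(Replica(owner, int(number), resource, name))
    return replicas


def resources_by_file(replicas):
    """Group the resources holding a copy under each logical file name."""
    grouped = defaultdict(list)
    for replica in replicas:
        grouped[replica.name].append(replica.resource)
    return dict(grouped)


replicas = parse_listing(LISTING)
print(resources_by_file(replicas))
```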
SRB “Location” or “Slave Server”
[Diagram labels: SRB “Location” “jhu-cis-nas”; “Physical Resources” z-jhu-cis-nas0, z-jhu-cis-nas1, z-jhu-cis-nas2; “Logical Resource”]
Pooling physical resources
[Diagram: per-resource storage figures of 0.7 TB, 5.2 TB, 0 TB, 1.6 TB, 0.8 TB, 3.2 TB, 0.8 TB, 2.4 TB, 0.8 TB, 2.4 TB, 1.6 TB, 0.8 TB, 5.0 TB, 0.78 TB, 0.08 TB]
Logical / Compound Resources
[Diagram: SRB “My-Resource”]
- “instant replication”
- “fast archival”
- “resource pooling”
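One way to picture a logical resource is as a named group of physical resources with a placement policy. The sketch below shows two of the behaviours named on the slide: "instant replication" (write to every member) and "resource pooling" (write to the member with the most free space). The classes, the policy flag, and the byte capacities are hypothetical and only illustrate the idea; this is not how the SRB is implemented.

```python
class PhysicalResource:
    """Toy storage system with a fixed capacity in bytes."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.files = {}

    @property
    def free(self):
        return self.capacity - sum(len(data) for data in self.files.values())

    def store(self, path, data):
        if len(data) > self.free:
            raise OSError(f"{self.name} is full")
        self.files[path] = data


class LogicalResource:
    """Named group of physical resources with a simple placement policy."""

    def __init__(self, name, members, replicate=True):
        self.name = name
        self.members = members
        self.replicate = replicate

    def put(self, path, data):
        if self.replicate:
            targets = self.members                                # copy everywhere
        else:
            targets = [max(self.members, key=lambda m: m.free)]   # emptiest member
        for target in targets:
            target.store(path, data)
        return [target.name for target in targets]


def make_members():
    # Capacities are made-up values for illustration.
    return [PhysicalResource("z-jhu-cis-nas0", 100),
            PhysicalResource("z-jhu-cis-nas1", 200),
            PhysicalResource("z-jhu-cis-nas2", 50)]


mirror = LogicalResource("My-Resource", make_members(), replicate=True)
pool = LogicalResource("My-Pool", make_members(), replicate=False)
mirrored_to = mirror.put("Doc.txt", b"hello")
pooled_to = pool.put("Doc.txt", b"hello")
print(mirrored_to, pooled_to)
```

The client addresses only the logical resource name in both cases; which physical systems receive the data is decided by the policy behind that name.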
Logical Resources
Thanks!
SRB handles large data and provides the ability to share and collaborate on distributed heterogeneous resources.
Questions?