San Diego Supercomputer Center

Data Intensive Computing
Information Based Computing
Digital Libraries / Metacomputing Services

Reagan W. Moore
San Diego Supercomputer Center
moore@sdsc.edu
http://www.npaci.edu/DICE

Information Based Computing
[Figure: diagram with the labels Data Mining, Distributed Archives, Application, Collection Building, Information Discovery, Digital Library]

Characterizing Supercomputers
  Generators of data - numerically intensive computing
    Usage models for the rate at which supercomputers move data between memory, disk, and archives
    Usage models for the capacity of the data caches (memory size, local disk, and archival storage)
  Analyzers of data - data intensive computing
    Performance models for combining data analysis with data movement (between caches, disks, and archives)

Supercomputer Data Flow Model
[Figure: storage hierarchy, in order: CPU, Memory, Local Disk, Archive Disk, Archive Tape]

HPSS Archival Storage System
[Figure: system configuration diagram. Labels, in the order they appear: 108 GB SSA RAID; High Performance Gateway Node; High Node (Disk Mover, HiPPI driver); Wide Node; 54 GB; Silver Node (Storage / Purge, Bitfile / Migration, Nameservice / PVL, Log Daemon); Tape / disk mover, DCE / FTP / HIS, Log Client; 160 GB; 830 GB MaxStrat RAID; 9490 Robot, Four Drives; 3490 Tape; RS6000 (Tape Mover, PVR (9490)); HiPPI Switch; TrailBlazer3 Switch; Magstar 3590 Tape; 3494 Robot; Eight Tape Drives; Seven]

Archive Data Flow Model (TeraFlops System)
  Compute Engine: 0.5-1 TB memory
    3-10 GB/sec
  Local Disk: 5-20 TB, 1 day cache
    30-100 MB/sec
  Archive Disk: 5-20 TB, 1 week cache
    20-60 MB/sec
  Archive Tape: 0.5-1 PB
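
As a rough illustration of how a transfer rate and a residency time relate to a cache size, the sketch below multiplies the two. It assumes decimal units (1 MB = 10^6 bytes) and a link that runs continuously at the quoted rate, neither of which the slide states. Pairing the 30-100 MB/sec figure with the 1 week cache is also an assumption about the figure's layout. The result is therefore an upper bound and need not match the capacities quoted on the slide. The name accumulated_bytes is hypothetical.

```python
# Upper bound on the data that accumulates in a cache when a link feeds it
# continuously. Units are decimal, which is an assumption.
SECONDS_PER_DAY = 86_400
MB = 1e6
TB = 1e12


def accumulated_bytes(rate_bytes_per_sec: float, residency_sec: float) -> float:
    """Bytes written at a constant rate over the residency period."""
    return rate_bytes_per_sec * residency_sec


# 30-100 MB/sec sustained for one week (pairing assumed, see the text above).
one_week = 7 * SECONDS_PER_DAY
low = accumulated_bytes(30 * MB, one_week)
high = accumulated_bytes(100 * MB, one_week)
print(f"{low / TB:.1f} to {high / TB:.1f} TB")  # prints: 18.1 to 60.5 TB
```

A link that is busy only part of the time fills the cache proportionally more slowly, so the same function can be called with an average rather than a peak rate.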

Data Generation Metrics
  CPU / Memory: 7 Bytes/Flop
  Memory: 1 Byte of storage per Flop
  Memory to Local Disk: 1 Byte/60 Flops; 1/7 of data persists for a day
  Local Disk: hold data for 1 day
  Local Disk to Archive Disk: 1/7 of data sent to archive
  Archive Disk: hold data for 1 week
  Archive Disk to Archive Tape: all data sent to archive
  Archive Tape: hold data forever
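
The three per-Flop ratios above can be turned into a quick sizing calculation. The sketch below is one possible reading, not the author's model. It assumes "per Flop" means per sustained floating-point operation per second, and that the 1 Byte/60 Flops figure is the memory-to-local-disk rate. The names SizingEstimate and size_from_flop_rate are hypothetical. The 1/7 fractions are left out because the slide does not say what they are fractions of.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizingEstimate:
    cpu_memory_bytes_per_sec: float   # from 7 Bytes/Flop
    memory_bytes: float               # from 1 Byte of storage per Flop
    memory_disk_bytes_per_sec: float  # from 1 Byte/60 Flops (assumed link)


def size_from_flop_rate(flops_per_sec: float) -> SizingEstimate:
    """Apply the slide's per-Flop ratios to a sustained Flop rate."""
    return SizingEstimate(
        cpu_memory_bytes_per_sec=7.0 * flops_per_sec,
        memory_bytes=1.0 * flops_per_sec,
        memory_disk_bytes_per_sec=flops_per_sec / 60.0,
    )


# Example: a machine sustaining 1 TeraFlops (10^12 Flops per second).
estimate = size_from_flop_rate(1e12)
print(f"CPU-memory bandwidth: {estimate.cpu_memory_bytes_per_sec / 1e12:.1f} TB/sec")
print(f"Memory size:          {estimate.memory_bytes / 1e12:.1f} TB")
print(f"Memory-disk rate:     {estimate.memory_disk_bytes_per_sec / 1e9:.1f} GB/sec")
```

Passing a lower sustained rate than the peak rating scales all three figures down linearly, since each ratio is a simple multiple of the Flop rate.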