© Copyright 2004 Instrumental, Inc
I/O Types and Usage in DoD
Henry Newman
Instrumental, Inc / DoD HPCMP / DARPA HPCS
May 24, 2004
Evaluation Issues: What are the scaling problems?
Facts About Performance (1)
System feature: older system; current system
- CPU performance: CDC (MFLOPS); Earth Simulator, 40 TFLOPS
- Disk technology: CDC Cyber (RPMs); Seagate Cheetah, 15K RPMs
- Disk density: 80 MB; 146 GB
- Disk transfer rate: 3 MB/sec half duplex; 71.5 MB/sec avg. per disk, 200 MB/sec full duplex RAID
- Disk seek+latency: 24 ms; 6.0 ms write, 5.6 ms read
Facts About Performance (2)
Item: times increase
- CPU: 1.6M
- RPMs: 4.1
- Density: 1814
- Transfer rate, disk: 23.8
- Transfer rate, RAID: 133
- Seek+latency, read: 4.3
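
The "times increase" figures can be recomputed from the values that survive on the two performance slides. The sketch below is only a cross-check under stated assumptions: decimal units (1 GB = 1000 MB), and the RAID figure of 200 MB/sec full duplex counted as 200 MB/sec in each direction. The constant and function names are illustrative. With decimal units the density ratio comes out near 1825 rather than the 1814 on the slide, so the slide presumably used slightly different capacity figures.

```python
# Recompute the "times increase" column from figures on the slides.
OLD_DENSITY_MB = 80.0
NEW_DENSITY_MB = 146.0 * 1000.0   # assumption: 1 GB = 1000 MB
OLD_RATE_MB_S = 3.0               # half duplex
NEW_DISK_RATE_MB_S = 71.5         # average per disk
NEW_RAID_RATE_MB_S = 2 * 200.0    # assumption: 200 MB/sec in each direction
OLD_SEEK_LATENCY_MS = 24.0
NEW_SEEK_LATENCY_READ_MS = 5.6


def improvement_ratios():
    """Return new/old ratios (old/new for latency, where smaller is better)."""
    return {
        "density": NEW_DENSITY_MB / OLD_DENSITY_MB,
        "transfer_rate_disk": NEW_DISK_RATE_MB_S / OLD_RATE_MB_S,
        "transfer_rate_raid": NEW_RAID_RATE_MB_S / OLD_RATE_MB_S,
        "seek_latency_read": OLD_SEEK_LATENCY_MS / NEW_SEEK_LATENCY_READ_MS,
    }


if __name__ == "__main__":
    for item, ratio in improvement_ratios().items():
        print(f"{item}: {ratio:.1f}x")
```

The point of the comparison is the gap between the columns: density grew by three orders of magnitude while seek+latency improved by a factor of about four.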
Device Utilization
Tape Facts
Columns: Vendor; Drive; Media; Year Introduced; Capacity (MB); Peak Transfer Rate (MB/sec, uncompressed); Performance Increase
Drives listed: IBM 3420 (reel-to-reel); IBM 3490E; StorageTek SD; StorageTek T9840A; IBM 3590E; StorageTek T9940A; LTO; Sony GY-8240FC (DTF); StorageTek T9840B; StorageTek T9940B; IBM 3590H; LTO-II
File System Concerns
- Data fragmentation and allocation
- Metadata fragmentation and allocation
- Recovery from crash or metadata loss
- Performance that scales
- Support for >2 TB LUNs
- Failover
Fragmentation
- Fragmentation is becoming a performance problem as file systems grow
- No major technology enhancements have been seen in decades
  - Object Storage Device (OSD, new T10 spec) will change this
- Fragmentation of metadata can have a dramatic impact on performance
  - Recently observed 600x slowdown in access at a site
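
A rough way to see why fragmentation hurts is to charge one seek+latency per extent on top of the raw transfer time. The sketch below is a deliberately simple single-disk model, not a reconstruction of the 600x slowdown observed at the site. The function names are hypothetical, and the default values are the 5.6 ms read seek+latency and 71.5 MB/sec per-disk rate quoted on the performance slide.

```python
def read_time_seconds(file_mb, extents, seek_latency_ms=5.6, rate_mb_s=71.5):
    """Estimated time to read a file stored as `extents` contiguous pieces.

    Model assumption: every extent costs one seek+latency, and the data
    itself streams at the per-disk transfer rate.
    """
    if extents < 1:
        raise ValueError("a file has at least one extent")
    return extents * seek_latency_ms / 1000.0 + file_mb / rate_mb_s


def slowdown(file_mb, extents):
    """Ratio of fragmented read time to the contiguous (one-extent) case."""
    return read_time_seconds(file_mb, extents) / read_time_seconds(file_mb, 1)


if __name__ == "__main__":
    for extents in (1, 10, 100, 1000, 10000):
        print(f"100 MB file, {extents:>5} extents: "
              f"{slowdown(100, extents):.1f}x slower")
```

Under this model a 100 MB file in 1000 extents already reads about five times slower than the contiguous case, because seek time dominates once extents become small.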
USG Types Requirements: What is DoD currently using?
Current Types of Requirements
- Database
  - Used by most sites, big and small, for data reference, especially in the intelligence community
  - Not used by MSRCs much
- Real-time data capture
  - Requirement in intelligence community
- Application
  - Homogeneous shared file system access
Current Types of Requirements
- Archival
  - Used by MSRCs
  - Intelligence community
- Process Flow
  - Used after real-time capture
  - Could be used by MSRCs if a shared file system between HPC and HSM systems were implemented
Database
- 4, 8 or 16 KB I/Os for indexes
  - Random
- 64 KB I/Os for log updates
  - Sequential
- Read and write up to 256 KB
- Just about everyone uses a database somewhere in their HPC systems
  - Although some don’t have performance requirements
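
The two database access patterns above can be sketched as request generators: random, block-aligned index I/Os of 4, 8 or 16 KB, and sequential 64 KB log writes. This is only an illustration of the request shapes; the function names, the seed, and the file size are assumptions, and no real database issues I/O through this code.

```python
import random

KB = 1024


def index_requests(count, file_bytes, io_kb=8, seed=0):
    """Random, block-aligned (offset, size) pairs, shaped like index I/O."""
    if io_kb not in (4, 8, 16):
        raise ValueError("index I/O size on the slide is 4, 8 or 16 KB")
    size = io_kb * KB
    blocks = file_bytes // size
    rng = random.Random(seed)  # fixed seed so the sequence is repeatable
    return [(rng.randrange(blocks) * size, size) for _ in range(count)]


def log_requests(count, start=0):
    """Back-to-back 64 KB (offset, size) pairs, shaped like log updates."""
    size = 64 * KB
    return [(start + i * size, size) for i in range(count)]


if __name__ == "__main__":
    print(index_requests(3, 1 << 30, io_kb=16))
    print(log_requests(3))
```

Feeding the index sequence to a disk costs a seek per request, while the log sequence streams, which is why the two are usually sized and placed differently.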
Real-Time Data Capture
- Large block requirement
  - 4 MB-128 MB I/O requests
- Small block requirement
  - 1 KB-8 KB files with millions of files per day
- Multiple threads to keep the devices busy with either type of I/O
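
The two capture profiles stress a system differently, which a little arithmetic makes visible. The sketch below converts a daily small-file count into files per second and MB/sec, and a large-block bandwidth target into requests per second. The function names and the example figures in the usage lines (8.64 million files per day, a 400 MB/sec target) are assumptions chosen for illustration, not numbers from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def small_file_load(files_per_day, file_kb):
    """Return (files per second, MB per second) for a small-file stream."""
    files_per_second = files_per_day / SECONDS_PER_DAY
    mb_per_second = files_per_second * file_kb / 1024.0
    return files_per_second, mb_per_second


def large_block_requests_per_second(target_mb_s, request_mb):
    """Requests per second needed to sustain a bandwidth target."""
    return target_mb_s / request_mb


if __name__ == "__main__":
    files_s, mb_s = small_file_load(8_640_000, 8)
    print(f"small files: {files_s:.0f} creates/sec, {mb_s:.2f} MB/sec")
    for request_mb in (4, 128):
        rate = large_block_requests_per_second(400, request_mb)
        print(f"{request_mb} MB requests at 400 MB/sec: {rate:.1f} req/sec")
```

In the small-file case the bandwidth is tiny but the file-creation rate is sustained all day, which suggests the load falls on metadata and allocation rather than on transfer rate.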
Real-Time Data Capture
- Generally requires an HSM
  - Usually needs 100s of MB/sec 7x24 to meet the requirements for capture
- Everything must run at rate
  - I/O bus
  - RAID devices
  - Switches
  - Limitations of tape bandwidth are pushed
  - HBAs
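
"Everything must run at rate" amounts to saying the data path sustains only what its slowest component sustains. A minimal sketch of that check follows; the component rates in the usage lines are hypothetical placeholders, not measurements, and the function names are illustrative. The second helper shows what a 7x24 rate means in daily volume, using decimal units.

```python
def sustained_rate(component_rates_mb_s):
    """Return (name, rate) of the slowest component in the data path."""
    name = min(component_rates_mb_s, key=component_rates_mb_s.get)
    return name, component_rates_mb_s[name]


def daily_volume_tb(rate_mb_s):
    """Data captured in 24 hours at a constant rate (1 TB = 1,000,000 MB)."""
    return rate_mb_s * 86400 / 1_000_000


if __name__ == "__main__":
    # Hypothetical rates in MB/sec, for illustration only.
    path = {"io_bus": 500, "hba": 200, "switch": 400, "raid": 300, "tape": 120}
    name, rate = sustained_rate(path)
    print(f"bottleneck: {name} at {rate} MB/sec")
    print(f"daily volume at that rate: {daily_volume_tb(rate):.2f} TB")
```

At 100 MB/sec sustained, a capture system writes 8.64 TB per day, all of which the HSM eventually has to move to tape.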
Application
- Homogeneous shared file system access
  - Must be able to get the data from the nodes to a single file over Fibre Channel
- High-performance I/O from those nodes
  - Depends on the application, but given that GPFS peak is about 400 MB/sec, that seems to be the current requirement
- Support for a few 100,000 files
  - Nowhere near the HSM requirements
Archival
- Large HSM systems
  - MSRCs are a good example
  - High speed networks
  - TCP/IP (ftp) data movement
- Future movement to shared file systems, which will make these look more like real-time capture requirements
Process Flow
- These are applications and processes carried out in an assembly-line fashion
  - Each step uses a machine or machines, sometimes specialized, to move the task along
  - Data communication via a shared file system, with multi-threaded large-block I/O requests from each of the hosts to various data sets
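
The multi-threaded large-block access pattern described above can be sketched with a thread pool in which each worker issues one large read at its own offset. This is a single-host illustration using ordinary file calls; the 4 MB block size, the thread count, and the function names are assumptions, and a real process-flow host would be reading from a shared file system rather than a local file.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4 * 1024 * 1024  # 4 MB per request; illustrative choice


def read_block(path, index):
    """Read one large block at its own offset, using a private file handle."""
    with open(path, "rb") as f:
        f.seek(index * BLOCK)
        return len(f.read(BLOCK))


def threaded_read(path, threads=4):
    """Issue large-block reads from several threads; return total bytes read."""
    blocks = (os.path.getsize(path) + BLOCK - 1) // BLOCK  # round up
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(lambda i: read_block(path, i), range(blocks)))
```

Each thread opens its own handle so that seeks do not interfere with one another; keeping several requests outstanding is what lets the underlying devices stay busy.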
Current MSRC Requirements
- Homogeneous shared file system for applications running on the HPC system
- HSM support and access via TCP/IP
- Process Flow should be supported for visualization
- Support for database but no performance requirement
Conclusion
- The future for HPCS machines and most application environments will be shared file systems
- Shared file systems were pioneered for the real-time capture world
- Large file systems are seeing problems with fragmentation and scaling