SGI Contributions to Supercomputing by 2010
Steve Reinhardt, Director of Engineering
Supercomputing Aspects of SGI
–Data Access: deliver data wherever the users are
  CXFS/WAN demo at SC’02
  Each server reads directly, at channel speeds
  Biggest installed configuration: 0.5 PB
–Visualization (“VAN”): deliver images wherever the users are
  Enable collaboration
–HPC: scalable servers and superclusters
  SGI® Origin® family
  SGI® Altix™ 3000 family
  SGI® NUMAflex™
NOTE: No “enterprise” references
SGI in HPC
–Memory is the unifying theme
  globally addressable, up to O(PB)
  incorporating varied processing types
  latency (-> 500 ns for 10KP)
  bandwidth (local stride-1 B:F -> 2.0+; local gather/scatter B:F; remote bisection BW B:F -> 0.3)
–Sustained performance
  differentiated scaling (latency & bandwidth)
  better memory interface
  new synchronization substrate
–Raise the level of programming abstraction
  UPC/CAF (near-term)
  parallel Matlab (radical)
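The B:F targets above are ratios of memory bandwidth (bytes/s) to peak floating-point rate (flops/s). A minimal sketch of that arithmetic follows, using the figures quoted on the STREAM Triad slide of this deck (6.4 GB/s shared by two 4 GFLOP/s processors). The function name `bytes_per_flop` is an illustrative assumption, not something from the slides.

```python
def bytes_per_flop(bandwidth_gb_per_s: float, peak_gflops: float) -> float:
    """Return the B:F ratio: memory bandwidth divided by peak floating-point rate."""
    if peak_gflops <= 0:
        raise ValueError("peak_gflops must be positive")
    return bandwidth_gb_per_s / peak_gflops


# Figures from the STREAM Triad slide: 6.4 GB/s shared by 2 x 4 GFLOP/s processors.
local_ratio = bytes_per_flop(6.4, 2 * 4.0)
print(f"local B:F = {local_ratio:.2f}")
```

Read this way, the "-> 2.0+" stride-1 goal on the slide would mean raising the local ratio by more than a factor of two relative to that figure.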
SGI Origin® family
–MIPS processors, IRIX OS
–exploit low power consumption, ISA control
SGI Altix™ family
–IPF processors, Linux OS
–exploit SGI interconnect, with industry-standard ISA and rapid OS maturation
Balancing High Innovation and Profitability
[Chart axes: Differentiation (low to high), Profitability (low to high)]
“Death Valley”: enough differentiation to have higher cost, but not enough to have high value
System / Component Differentiation
[Chart labels: System Cost, System Value; components: OS, Interconnect, Memory, Processor]
Ideal Differentiation
[Chart labels: System Cost, System Value; components: OS, Interconnect, Memory, Processor]
SGI Origin series
[Chart labels: System Cost, System Value; components: OS, Interconnect, Memory, Processor]
Quadrics cluster
[Chart labels: System Cost, System Value; components: OS, Interconnect, Memory, Processor]
IBM SP3 system
[Chart labels: System Cost, System Value; components: OS, Interconnect, Memory, Processor]
SGI Altix system
[Chart labels: System Cost, System Value; components: OS, Interconnect, Memory, Processor]
STREAM Triad Results
–World-record result for a µP-based system; fourth overall
–0.8 B:F (6.4 GB/s shared by 2 x 4 GF processors)
–Single kernel; NUMA placement support in Linux
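For readers unfamiliar with the benchmark, the Triad kernel computes a[i] = b[i] + scalar * c[i] and is conventionally credited with three 8-byte words of memory traffic per element (two reads, one write). The sketch below is an illustration in pure Python, not the official benchmark code. The helper names and the array size are assumptions, and interpreter overhead means the number it reports says nothing about real memory bandwidth.

```python
import time


def stream_triad(a, b, c, scalar):
    """Triad kernel: a[i] = b[i] + scalar * c[i] for every element."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bandwidth_mb_per_s(n, scalar=3.0, bytes_per_word=8):
    """Run one Triad pass over n elements; return (result array, nominal MB/s)."""
    b = [1.0] * n
    c = [2.0] * n
    a = [0.0] * n
    start = time.perf_counter()
    stream_triad(a, b, c, scalar)
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    # Nominal traffic: two reads plus one write per element.
    bytes_moved = 3 * n * bytes_per_word
    return a, bytes_moved / elapsed / 1e6


result, mb_per_s = triad_bandwidth_mb_per_s(100_000)
print(f"nominal Triad rate: {mb_per_s:.1f} MB/s (pure Python, illustrative only)")
```

A real measurement would use the compiled benchmark with arrays much larger than the caches and would report the best of several passes.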
Interconnect Scaling
[Chart: MPI bandwidth versus distance (MB/s)]
Coming soon
Altix 3000 Throughput Performance
–Throughput of 4 jobs, each 8P, crash application
–System: Altix 3000, 32P, 64 GB, XVM, TP900
–Individual jobs in the throughput mix are between 0.4% and 1.8% slower than the standalone case
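The slowdown percentages above compare each job's elapsed time in the four-job mix with its elapsed time when run alone. A minimal sketch of that calculation follows. The timings are hypothetical placeholders, chosen only so that the results span the 0.4% to 1.8% range quoted on the slide; they are not measured data from the talk.

```python
def slowdown_percent(standalone_s: float, in_mix_s: float) -> float:
    """Percentage by which a job in the mix runs slower than it does standalone."""
    return (in_mix_s - standalone_s) / standalone_s * 100.0


# Hypothetical elapsed times in seconds (NOT from the slide).
standalone = 1000.0
in_mix = [1004.0, 1010.0, 1018.0, 1007.0]

slowdowns = [slowdown_percent(standalone, t) for t in in_mix]
for job, pct in enumerate(slowdowns, start=1):
    print(f"job {job}: {pct:.1f}% slower than standalone")
```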
Summary: SGI for HPC
Long-term directions
–Memory: globally addressable, high BW, low latency
–Strong delivered performance
  differentiated scaling (latency & bandwidth)
  better memory interface
  new synchronization substrate
–Raise the level of programming abstraction
  UPC/CAF (near-term); parallel Matlab (radical)
Near-term deliverables
–Altix 3000 system
  distinguished performance
  rapidly maturing Open Source software base