1
Some Thoughts on Technology and Strategies for Petaflops
2
Rick Stevens Argonne Chicago
Possible Paths to Petaflops
- Traditional commodity clusters
  - Leverage Moore’s law on GP microprocessors
  - Interconnect and memory bandwidth problems
- Type C machines
  - DARPA HPCS paths (e.g. Cascade)
- Embedded-systems-based clusters
  - QCDOC is one example
  - BG/L is another
3
Beyond Commodity Clusters
- Improved design capability
  - Small groups can design SoCs
  - Small groups can gain access to state-of-the-art fabrication capabilities
  - Design cycles are getting shorter thanks to the increasing availability of off-the-shelf IP (Blue Logic, MIPS, etc.)
- QCDOC example
5
Hardware/Software Co-design
- Application kernels
  - Simple “FORTRAN”-like C code: well-behaved basic blocks with performance requirement annotations
- Compiler builds a performance model for each basic block
- Decision point based on the performance estimate
  - Compile for GPU or synthesize logic/FPGA code
- Generate glue code/runtime
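The decision point on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the flow the talk has in mind: the `BasicBlock` and `Target` types, the roofline-style cost model in `estimate_seconds`, and every number in the usage lines are hypothetical. The slide says only that the compiler estimates performance per basic block and then either compiles for a programmable processor or synthesizes logic.

```python
from dataclasses import dataclass


@dataclass
class BasicBlock:
    """A kernel basic block with a performance-requirement annotation (hypothetical)."""
    name: str
    flops: int               # floating-point operations per invocation
    bytes_moved: int         # memory traffic per invocation, in bytes
    required_seconds: float  # annotation: time budget per invocation


@dataclass
class Target:
    """A candidate execution target described by two peak rates (hypothetical)."""
    name: str
    peak_flops: float     # FLOP/s
    mem_bandwidth: float  # bytes/s


def estimate_seconds(block: BasicBlock, target: Target) -> float:
    """Assumed cost model: the block is bound by compute or by memory,
    whichever takes longer."""
    compute_time = block.flops / target.peak_flops
    memory_time = block.bytes_moved / target.mem_bandwidth
    return max(compute_time, memory_time)


def choose_backend(block: BasicBlock, processor: Target, fpga: Target) -> str:
    """Decision point: prefer ordinary compilation, fall back to logic synthesis."""
    if estimate_seconds(block, processor) <= block.required_seconds:
        return "compile"
    if estimate_seconds(block, fpga) <= block.required_seconds:
        return "synthesize"
    return "unmet"  # neither target satisfies the annotation


# Illustrative numbers only.
processor = Target("programmable processor", peak_flops=1e9, mem_bandwidth=1e9)
fpga = Target("synthesized logic", peak_flops=1e10, mem_bandwidth=1e10)
relaxed = BasicBlock("relaxed", flops=1_000_000, bytes_moved=1_000_000,
                     required_seconds=0.01)
tight = BasicBlock("tight", flops=1_000_000, bytes_moved=1_000_000,
                   required_seconds=0.0005)
print(choose_backend(relaxed, processor, fpga))
print(choose_backend(tight, processor, fpga))
```

A real flow would also have to weigh synthesis time and the cost of the generated glue code, which this sketch ignores.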
6
Special-Purpose SoCs
- Network processing units (NPUs)
  - Core of fast IP switches and routers
  - Many companies are producing 10 Gbps components and moving towards 40 Gbps parts
- DSPs
  - Cell phone base stations
  - Signal processing and array-on-a-chip processors
  - Example: 2 GHz, 175 million transistors, 64-processor DSP array, several hundred dollars a chip in quantities of 1,000
7
Graphics Accelerators
- NVIDIA GeForce4 example
  - > 100 M transistors
  - High-speed (QDR) RAM interface, > 10 GB/s
- Moving towards general-purpose processors
  - Cg programming language (programmable shaders)
- Evolving to become faster than the main CPU on a commodity-based node
  - Pentium or Itanium 2 processor becomes a service processor?
8
Extendable Cores
- Possible target for HPC hardware/software co-design
- Provides a reconfigurable node platform
- Xilinx Virtex-II Pro
  - Multiple PowerPC cores (1-4)
  - Millions of gates of FPGA
  - Clock rates lag high-performance chips
- Other vendors are producing similar things
  - MIPS cores, SPARClite cores, etc.
9
Billion-Transistor Dies by 2005/6
- Design challenges and opportunities
  - Many 32-bit cores available at < 500,000 transistors
  - Several 64-bit cores available at < 2,000,000 transistors
  - Complete SoC libraries becoming available (e.g. Blue Logic)
- Unprecedented opportunity for semi-custom node architectures based on SoC technologies
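To put the transistor counts on this slide in perspective, the short calculation below divides a billion-transistor budget by the per-core upper bounds quoted above. It is a rough sketch: the `logic_fraction` parameter is an assumption added here to show how quickly the count shrinks once memory, interconnect, and I/O take their share of the die, and the 25% value in the last line is purely illustrative.

```python
DIE_TRANSISTORS = 1_000_000_000  # billion-transistor die, from the slide
CORE_32_BIT = 500_000            # upper bound for a 32-bit core, from the slide
CORE_64_BIT = 2_000_000          # upper bound for a 64-bit core, from the slide


def max_cores(die: int, per_core: int, logic_fraction: float = 1.0) -> int:
    """Cores that fit if `logic_fraction` of the die is spent on cores.

    logic_fraction is a hypothetical knob; 1.0 means the whole die is cores,
    which no real design would achieve.
    """
    return int(die * logic_fraction) // per_core


print(max_cores(DIE_TRANSISTORS, CORE_32_BIT))        # 2000 cores, whole die
print(max_cores(DIE_TRANSISTORS, CORE_64_BIT))        # 500 cores, whole die
print(max_cores(DIE_TRANSISTORS, CORE_64_BIT, 0.25))  # 125 cores at 25% of the die
```

Because the slide gives upper bounds on core size, these counts are lower bounds for a given share of the die.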
10
Design Tools Are Improving
- We can start to think in terms similar to desktop publishing from 20 years ago
- Mass customization will become possible, but:
  - What design macros are needed?
  - How to involve algorithm and application developers in the design process?
  - How to connect with systems software (OS, runtime, libraries)?
11
Evolution of Commodity Clusters
[Diagram labels: GPU/Node; Commodity Network; High-Performance Interconnect; SoCs; I/O; GP services, O(1000) nodes; Semi-custom or Reconfigurable, O(100K) nodes]
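As a rough sanity check on the node counts in this diagram, the snippet below computes the sustained rate each node would need for the machine as a whole to reach one petaflop. It assumes that the O(100K) figure refers to the semi-custom or reconfigurable compute nodes and that they carry essentially all of the floating-point work. Both are readings of the diagram labels, not statements from the talk.

```python
PETAFLOP = 10**15        # target aggregate rate, FLOP/s
COMPUTE_NODES = 100_000  # O(100K) nodes, from the diagram labels


def per_node_flops(target_flops: int, nodes: int) -> float:
    """Sustained FLOP/s each node must deliver, assuming perfect scaling."""
    return target_flops / nodes


rate = per_node_flops(PETAFLOP, COMPUTE_NODES)
print(f"{rate / 1e9:.0f} GFLOP/s per node")  # 10 GFLOP/s per node
```

Perfect scaling is the optimistic case, so this is a floor on the per-node requirement.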
12
Systems Software for SoCs
- Embedded processor systems software
  - DSPs: real-time OS/runtime, ~40K on-chip FLASH ROM (shadow RAM), off-chip extensions for future
  - NPUs: real-time runtime support, < 100K typically; some general-purpose co-processors (Linux typically used in Juniper)
  - Graphics processors: on-chip runtime support, upgradeable via device drivers
13
A Few Recommendations
- Comprehensive applications studies
  - To determine feasibility of acceleration via semi-custom SoC/CLoCs
  - To understand what OS functions are actually required for full HPC applications
- Establish some design challenges
  - Pick several core algorithms (besides lattice gauge) and do some paper designs to validate the possible advantages of SoC-based approaches
- An augmented cluster testbed
  - GP Linux cluster with SoC/CLoC-based compute backends