Steve Pawlowski
Intel Senior Fellow
GM, Architecture and Planning
CTO, Digital Enterprise Group
Intel Corporation

HPC: Energy Efficient Computing
April 20, 2009
2 Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.
Intel may make changes to specifications and product descriptions at any time, without notice. All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
This document may contain information on products in the design phase of development. The information here is subject to change without notice. Do not finalize a design with this information.
Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights.
Wireless connectivity and some features may require you to purchase additional software, services or external hardware.
Nehalem, Penryn, Westmere, Sandy Bridge and other code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services, and any such use of Intel's internal code names is at the sole risk of the user.
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.
Intel, Intel Inside, Pentium, Xeon, Core and the Intel logo are trademarks of Intel Corporation in the United States and other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2009 Intel Corporation.
3 Reach Exascale by 2018
From GigaFlops to ExaFlops: Sustained GigaFlop (~1987) → Sustained TeraFlop → Sustained PetaFlop → Sustained ExaFlop (~2018)
"The pursuit of each milestone has led to important breakthroughs in science and engineering."
Source: IDC, "In Pursuit of Petascale Computing: Initiatives Around the World," 2007
Note: Numbers are based on the Linpack benchmark. Dates are approximate.
4 What are the Challenges?
Power is gating every part of computing, and voltage is not scaling as it did in the past.
The challenge of Exascale: an ExaFLOPS machine without power management, assuming 100pJ per FLOP, 1.5nJ per Byte, 10EB of storage, and ~400W per socket, drives the compute, memory, communication, and disk subsystems into the tens to hundreds of megawatts (roughly 200MW, 150MW, 100MW, and 10MW across the four), before counting other miscellaneous power consumers such as power supply losses and cooling.
[Chart: power consumption (kW) growth from MFLOP through GFLOP, TFLOP, and PFLOP systems; EFLOP power = ?]
Source: Intel, for illustration and assumptions, not product representative
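To see why the unmanaged numbers land in the hundreds of megawatts, a minimal back-of-envelope sketch in Python that multiplies the slide's per-operation energies by an ExaFLOPS workload; the 0.1 Byte/FLOP data-movement rate is a hypothetical assumption added for illustration, not a figure from the deck.

```python
# Back-of-envelope power estimate for an unmanaged ExaFLOPS machine, using
# the per-operation energies assumed on this slide (100 pJ/FLOP, 1.5 nJ/Byte).
EXAFLOPS = 1e18                     # sustained FLOP/s target

pj_per_flop = 100e-12               # 100 pJ per FLOP (slide assumption)
nj_per_byte = 1.5e-9                # 1.5 nJ per Byte moved (slide assumption)

compute_power_w = EXAFLOPS * pj_per_flop          # 1e8 W = 100 MW
assumed_bytes_per_s = 1e17                        # hypothetical 0.1 Byte/FLOP of traffic
data_power_w = assumed_bytes_per_s * nj_per_byte  # 1.5e8 W = 150 MW

print(f"Compute:       {compute_power_w / 1e6:.0f} MW")
print(f"Data movement: {data_power_w / 1e6:.0f} MW")
```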
5 HPC Platform Power
Data from P3 Jet Power Calculator, V2.0. Configuration: DP 80W Nehalem; Memory: 48GB (12 x 4GB DIMMs); single power supply, 230Vac.
We need a platform view of power consumption: CPU, memory, VRs, etc.
[Chart: platform power breakdown across CPU, planar & VRs, and memory]
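To make the "platform view" concrete, a minimal roll-up of a dual-socket power budget; only the 2 x 80W CPU TDP comes from the slide's configuration, while the per-DIMM power, planar/VR overhead, and supply efficiency below are hypothetical placeholders, not outputs of the P3 Jet calculator.

```python
# Hypothetical platform power roll-up for a dual-socket node.
cpu_w       = 2 * 80                        # DP 80W Nehalem (slide configuration)
dimm_w      = 12 * 6                        # 12 x 4GB DIMMs at an assumed ~6 W each
planar_vr_w = 0.10 * (cpu_w + dimm_w)       # assumed 10% planar & VR overhead
dc_load_w   = cpu_w + dimm_w + planar_vr_w

psu_efficiency = 0.90                       # assumed efficiency of the single 230Vac supply
wall_power_w   = dc_load_w / psu_efficiency

print(f"DC load: {dc_load_w:.0f} W, at the wall: {wall_power_w:.0f} W")
```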
6 Device Efficiency is Slowing
Unmanaged growth in power will reach the gigawatt level at Exascale.
[Chart: relative performance and power, with GFlops as the base]
Power at a glance (assume 31% of system power is CPU power):
- Today's Peta: … nJ/op
- Today's COTS: 2 nJ/op (assume 100W / 50 GFlops)
- Unmanaged Exa: if 1GW, 0.31 nJ/op
Source: Intel Labs
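A minimal sketch of the energy-per-operation arithmetic behind these bullets, using the slide's 31% CPU-power fraction; it reproduces the 2 nJ/op COTS figure and the 0.31 nJ/op unmanaged-Exa figure.

```python
# Energy-per-operation arithmetic from the slide's assumptions.
cpu_fraction = 0.31                          # assume 31% of system power is CPU power

# Today's COTS part: 100 W delivering 50 GFLOPS.
cots_nj_per_op = 100 / 50e9 / 1e-9           # = 2.0 nJ/op

# Unmanaged Exascale: a 1 GW system sustaining 1 ExaFLOPS, CPU share only.
exa_nj_per_op = (1e9 * cpu_fraction) / 1e18 / 1e-9   # = 0.31 nJ/op

print(cots_nj_per_op, exa_nj_per_op)         # 2.0 0.31
```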
7 To Reach ExaFlops
[Chart: FLOPS per socket, 1E+06 to 1E+14, across the Pentium®, Pentium® II, Pentium® III, Pentium® 4, and Intel® Core™ microarchitectures, with Giga, Tera, and Peta milestones marked; future projection]
What it takes to get to Exa: 40+ TFlops per socket, with a power goal of 200W per socket. To reach Linpack ExaFlops:
- 5 pJ/op per socket x 40 TFlops: 25K sockets peak, or 33K sustained; or
- 10 pJ/op per socket x 20 TFlops: 50K sockets peak (conservative)
Source: Intel future projection. Intel estimates of future trends. Intel estimates are based in part on historical capability of Intel products and projections for capability improvement. Actual capability of Intel products will vary based on actual product configurations.
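The socket counts follow directly from the power goal and the energy per operation; a minimal sketch of that arithmetic (the 200W budget and pJ/op values are the slide's, the helper function is just illustrative).

```python
# Socket-count arithmetic behind the slide's Exascale targets.
def sockets_for_exaflops(pj_per_op, socket_power_w=200, target_flops=1e18):
    """FLOPS one socket can deliver within the power budget, and how many
    such sockets a peak-ExaFLOPS machine needs at that efficiency."""
    flops_per_socket = socket_power_w / (pj_per_op * 1e-12)
    return flops_per_socket, target_flops / flops_per_socket

print(sockets_for_exaflops(5))    # ~40 TFLOPS/socket -> ~25,000 sockets (peak)
print(sockets_for_exaflops(10))   # ~20 TFLOPS/socket -> ~50,000 sockets (peak)
```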
8 Parallelism for Energy Efficient Performance
[Chart: relative performance over time, spanning the Era of Pipelined Architecture; the Era of Instruction Level Parallelism (Super Scalar; Speculative, OOO); and the Era of Thread & Processor Level Parallelism (Multi-Threaded, Multi-Core, Many Core); future projection]
Intel estimates of future trends. Intel estimates are based in part on historical capability of Intel products and projections for capability improvement. Actual capability of Intel products will vary based on actual product configurations.
Source: Intel Labs
9 Parallelism's Challenges
Current models are based on communication between sequential processes (e.g. MPI, SHMEM) and depend on checkpointing for resilience.
[Chart: time vs. scale (Terascale, Petascale, Exascale): mean time between component failures falls while the mean time for a global checkpoint grows]
Communication-based systems break down beyond the crossover point, where a global checkpoint takes longer than the time between failures.
We need new, fault-resilient programming models, so computations make progress even as components fail.
Source: Intel
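A toy sketch of the crossover argument: if component failures are independent, system MTBF is roughly the component MTBF divided by the component count, and once that falls below the global checkpoint time the machine spends its life checkpointing. Every number below is a hypothetical assumption for illustration; only the qualitative trend comes from the slide.

```python
# Toy model of the checkpoint/failure crossover sketched on this slide.
component_mtbf_hours = 1_000_000     # assumed MTBF of a single component

scales = [                           # (label, component count, global checkpoint time in hours)
    ("Tera-class", 10_000,    0.25),
    ("Peta-class", 100_000,   0.50),
    ("Exa-class",  1_000_000, 1.00),
]

for label, n_components, checkpoint_hours in scales:
    system_mtbf_hours = component_mtbf_hours / n_components
    makes_progress = system_mtbf_hours > checkpoint_hours
    print(f"{label:>10}: MTBF {system_mtbf_hours:7.2f} h vs checkpoint {checkpoint_hours} h "
          f"-> {'makes progress' if makes_progress else 'past the crossover'}")
```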
10 Software Scaling Performance Forward
Existing software, the message-passing programming model, resiliency issues, and Exa-scale concurrency require a new hierarchical structure (a toy sketch follows below):
- A concurrency primitives framework: specifying, assigning, executing, migrating, and debugging a hierarchy of units of computation; providing a unified foundation.
- A high-level declarative coordination language: orchestrate billions of tasks written in existing serial languages; manage resiliency; fully utilize hardware capabilities.
[Diagram: application on top of a high-level declarative coordination language, a concurrency primitives framework, and a virtual, abstract machine, shown alongside today's parallel framework]
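The following toy sketch is mine, not Intel's framework: a concurrency-primitives layer that runs units of computation and re-executes them when a component fails, underneath a thin declarative layer that only states which tasks exist and what they depend on, with the task bodies written as ordinary serial functions.

```python
import random

def run_task(name, fn, retries=3):
    """Concurrency-primitive sketch: execute one unit of computation and
    re-assign it (here: simply retry) if the node it ran on 'fails'."""
    for _ in range(retries):
        try:
            if random.random() < 0.2:            # simulated component failure
                raise RuntimeError(f"{name}: node failure")
            return fn()
        except RuntimeError:
            continue                             # migrate / retry elsewhere
    raise RuntimeError(f"{name}: gave up after {retries} attempts")

# Declarative coordination sketch: tasks, dependencies, and ordinary serial code.
graph = {
    "grid":  (lambda: list(range(8)), []),
    "halo":  (lambda: "exchanged",    ["grid"]),
    "solve": (lambda: 42,             ["grid", "halo"]),
}

results = {}
for task in ("grid", "halo", "solve"):           # dependency order
    fn, deps = graph[task]
    assert all(d in results for d in deps)       # all inputs already computed
    results[task] = run_task(task, fn)
print(results)
```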
11 Reduce Memory and Communication Power
Data movement is expensive:
- Core-to-core: ~10 pJ per Byte
- Chip-to-chip: ~16 pJ per Byte
- Chip-to-memory: ~150 pJ per Byte
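To show why data movement dominates the power budget at scale, a minimal sketch that multiplies these per-Byte energies by an assumed machine-wide traffic rate; the 1 EB/s aggregate figure is a hypothetical for illustration only.

```python
# Power needed just to move data, at the slide's per-Byte energies.
pj_per_byte = {"core-to-core": 10, "chip-to-chip": 16, "chip-to-memory": 150}
aggregate_bytes_per_s = 1e18                 # hypothetical machine-wide traffic (1 EB/s)

for link, pj in pj_per_byte.items():
    power_mw = aggregate_bytes_per_s * pj * 1e-12 / 1e6
    print(f"{link:>15}: {power_mw:6.0f} MW at 1 EB/s")
```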
12 Technologies to Increase Bandwidth
[Chart: bandwidth projections, GB/s per socket: HE-WS/HPC and traditional CPU bandwidth demand vs. the bandwidth trend, assuming DDR3, then DDR4 with increasing channels, then eDRAM at a 2X-per-3-years CAGR]
eDRAM: replace the on-package memory controller with very fast flex links to an on-board memory controller (diagram: CPU package connected to an on-board memory controller + buffer, which connects to memory).
Source: Intel forecast. Intel estimates of future trends in bandwidth capability. Intel estimates are based in part on historical bandwidth capability of Intel products and projections for bandwidth capability improvement. Actual bandwidth capability of Intel products will vary based on actual product configurations.
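A small compounding sketch of the eDRAM trend line; the 2X-per-3-years growth rate is the slide's assumption, while the 2009 starting point of 30 GB/s per socket is a hypothetical baseline chosen only to make the curve concrete.

```python
# Compounding sketch for the eDRAM bandwidth trend assumed on this slide.
start_year, start_gb_s = 2009, 30.0      # hypothetical baseline bandwidth per socket
doubling_years = 3                       # eDRAM at 2X / 3 yrs CAGR (slide assumption)

for year in range(start_year, 2019):
    bw = start_gb_s * 2 ** ((year - start_year) / doubling_years)
    print(year, f"{bw:7.1f} GB/s per socket")
```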
13 Power Efficient High I/O Interconnect Bandwidth
HPC interconnect requirement progression, from COTS interconnect to Exa:
- X (COTS interconnect): 50 GB/s; MPI: 30 Mmsgs/s, SHMEM: 300 Mmsgs/s; power target: <20 mW/Gb/s
- 8X: 200 GB/s; MPI: 75 Mmsgs/s, SHMEM: 1 Gmsgs/s; power target: 10 mW/Gb/s
- 40X: 1 TB/s; MPI: 325 Mmsgs/s, SHMEM: 5 Gmsgs/s; power target: 3 mW/Gb/s
- 75X (Exa): 4 TB/s; MPI: 1.25 Gmsgs/s, SHMEM: 20 Gmsgs/s; power target: 1 mW/Gb/s
Copper and/or silicon photonics.
MPI: Message Passing Interface; SHMEM: Shared Memory
Source: Intel. Intel estimates of future trends in bandwidth capability. Intel estimates are based in part on historical bandwidth capability of Intel products and projections for bandwidth capability improvement. Actual bandwidth capability of Intel products will vary based on actual product configurations.
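To gauge what the power targets mean per node, a minimal sketch converting each bandwidth step and its efficiency target into watts; the per-node framing is an assumption for illustration, the numbers are the slide's.

```python
# Interconnect power implied by each bandwidth step and its mW/Gb/s target.
steps = [            # (bandwidth in GB/s, power target in mW per Gb/s)
    (50,   20),      # COTS interconnect (<20 mW/Gb/s)
    (200,  10),
    (1000,  3),
    (4000,  1),      # Exa target
]

for gb_per_s, mw_per_gbps in steps:
    gbps = gb_per_s * 8                        # GB/s -> Gb/s
    watts = gbps * mw_per_gbps / 1000
    print(f"{gb_per_s:>5} GB/s at {mw_per_gbps:>2} mW/Gb/s -> {watts:6.1f} W")
```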
14 Signaling Data Rate and Energy Efficiency
[Chart: data rate (Gb/s) vs. signaling energy efficiency (pJ/bit) for DDR3, GDDR5 (~25), Intel ISSCC and Intel VLSI (~15) demonstrations, a proposed copper near-term target (1.0), and silicon photonics as the longer-term option]
Source: Intel Labs
15 Solid State Drive: Future Performance and Energy Efficiency
Assume SSD capacity grows at a CAGR of about 1.5, with HDD following its historical rate.
[Chart: SSD gigabytes over time, future projection]
Vision for 10 ExaBytes at 2018: 2 million SSDs vs. ½ million HDDs, total ~2MW.
If per-drive performance stays constant at 300 IOPS (HDD) and 10K IOPS (SSD), the SSD-based store delivers ~140X the IOPS.
Innovations to improve I/O: 2X less power with a 140X performance gain.
Source: Intel, calculations based on today's vision
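A minimal sketch of the drive-count and IOPS arithmetic behind this vision. The 5 TB-per-SSD figure is taken from slide 17; the 20 TB-per-HDD figure is back-solved from the ½ million HDD count and should be read as an assumption.

```python
# Arithmetic behind the 2018 storage vision on this slide.
target_bytes = 10e18                         # 10 ExaBytes

ssd_capacity = 5e12                          # 5 TB per SSD (figure used on slide 17)
hdd_capacity = 20e12                         # assumed so that 10 EB needs ~0.5M HDDs

n_ssd = target_bytes / ssd_capacity          # ~2 million SSDs
n_hdd = target_bytes / hdd_capacity          # ~0.5 million HDDs

ssd_fleet_iops = n_ssd * 10_000              # 10K IOPS per SSD (slide)
hdd_fleet_iops = n_hdd * 300                 # 300 IOPS per HDD (slide)
print(n_ssd, n_hdd, ssd_fleet_iops / hdd_fleet_iops)   # ~2e6, ~5e5, ~133 (~140X)
```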
16 Increase Data Center Compute Density
Target: 50% yearly improvement in performance/watt.
[Chart: compute density by year, stacking contributions from silicon process, power management, small form factor, data center innovation, new technology, and more]
Source: Intel, based on Intel year-over-year improvement on the SPECpower benchmark
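For scale, a one-liner showing what a sustained 50%-per-year improvement compounds to by the 2018 Exascale date cited on slide 3; pairing the target with that date is my assumption, not a statement on this slide.

```python
# Compounding a 50% yearly performance/watt improvement from 2009 to 2018.
rate, years = 1.5, 2018 - 2009
print(f"{rate ** years:.0f}x improvement over {years} years")   # ~38x
```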
17 Revised Exascale System Power
ExaFLOPS machine without power management (10EB of disk, 100pJ per FLOP, 1.5nJ per Byte, ~400W per socket): compute, memory, comm, and disk run roughly 200MW, 150MW, 100MW, and 10MW across the four, plus other miscellaneous power consumption (power supply losses, cooling, etc.); total: ? MW.
ExaFLOPS machine, future vision (10EB at 5TB/SSD, 9pJ per FLOP, 150pJ per Byte, 50K sockets): compute, memory, comm, and SSD run roughly 10MW, 9MW, ~9MW, and ~2MW across the four, plus ~10MW of other miscellaneous power consumption (power supply losses, cooling, etc.); total: ~40MW.
Source: Intel, for illustration and assumptions, not product representative
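The headline gap between the two scenarios is set by energy per operation; a minimal comparison of just the compute term under each per-FLOP assumption (the other subsystem terms are not reproduced here).

```python
# Compute-only power for 1 ExaFLOPS under each scenario's pJ/FLOP assumption.
for label, pj_per_flop in [("without power mgmt", 100), ("future vision", 9)]:
    megawatts = 1e18 * pj_per_flop * 1e-12 / 1e6
    print(f"{label:>19}: {megawatts:.0f} MW")   # 100 MW vs 9 MW
```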