The End of CMOS Scaling will be Good for Space Computing Fault Tolerant Spaceborne Computing Employing New Technologies May 29, 2008 Sandia National Laboratories.


The End of CMOS Scaling will be Good for Space Computing
Fault Tolerant Spaceborne Computing Employing New Technologies
May 29, 2008, Sandia National Laboratories
Erik DeBenedictis (Sandia)
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

Overview
[Diagram labels: HPC (Parallel); Embedded (Low Power); COTS/Desktop (Development $, Productivity Tools); Future (Low Power, Parallel, Heterogeneous, Productivity Tools); Space Computing (Rad-Hard, ?)]

Clock Rate Flat Lined
Clock rate flat lined a couple of years ago, as vendors put excess resources into multiple cores
This is a historical fact and evident to everybody, so there is little reason to comment on the cause
However, it has profound architectural consequences (later slide)
[Chart: clock rate versus year; axis marks at 100 MHz, 1 GHz, and 10 GHz, with 2 GHz and 4 GHz labeled]

ITRS Process Integration Spreadsheet
Big Spreadsheet
–Columns are years
–Rows are 100+ transistor parameters
–Manual entry of process parameters by year
–Excel computes operating parameters
–Extra degrees of freedom go to making Moore’s Law smooth – not the best computers

kT Limit Moderates Optimism for Perpetual Exponential Growth
[Chart: energy (log scale) versus year, with 2008 marked; labels: Moore’s Law, Technology created in Government Fab, 100kT, kT]

Industry’s Plans
ITRS 2008 Update – April, Konigswinter, Germany
International Technology Roadmap for Semiconductors
2008 ITRS Update ORTC [Konigswinter Germany ITRS ITWG Plenary]
A. Allan, Rev 2 [notes on IRC/CTSG More Moore, More than Moore, Beyond CMOS 04/04/08]

The Architecture Game
This is my diagram from a paper to illustrate CMOS architecture in light of CMOS scaling limits [Discuss]
[Chart: log(throughput) versus year; power efficiency levels at 100%, 50%, 25%, 12%, 6%, 3%; labels: 100% CPU Efficiency (can’t do better), Commercial Speed Target, Finish]
Next Moves:
–Switch to Vector Arch.
–Switch to SIMD Arch.
–Add Coprocessor
–Scale Linewidth
–Increase Parallelism
–Increase Cache
–More Superscalar
–Raise Vdd and Clk

Special Architectures Go Mainstream
[Chart: performance versus year; curves: Traditional µP with big budget; µP with big budget but clock rate and power handicap; A Better Idea but with a small budget; point B marked]
Conclusions
–Mainstream and embedded technology will become more similar
  Power
  Parallelism
–Architectures will become more special purpose
  General systems may be comprised of multiple special purpose sections

EXOCHI: Architecture and Programming Environment for A Heterogeneous Multi-core Multithreaded System
Perry H. Wang (1), Jamison D. Collins (1), Gautham N. Chinya (1), Hong Jiang (2), Xinmin Tian (3), Milind Girkar (3), Nick Y. Yang (2), Guei-Yuan Lueh (2), and Hong Wang (1)
(1) Microarchitecture Research Lab, Microprocessor Technology Labs, Intel Corporation
(2) Graphics Architecture, Chipset Group, Intel Corporation
(3) Intel Compiler Lab, Software Solutions Group, Intel Corporation

Motivation
Future mainstream microprocessors will likely integrate heterogeneous cores
How will we program them?
[Diagram labels: My App; Process; Thread; Driver Stub; Driver API; Dispatch; OS; Scheduler; My Device Driver; My IA CPU (four IA CPU cores); My Accelerator]
Map computation to driver / abstraction API
Unfamiliar development / debugging flow
OS / driver overheads
Accelerator in distinct memory space
The following 5 Viewgraphs sent by Jamison Collins with permission to post

CHI Programming Environment
Compiler
Modified front-end and OpenMP pragmas
–Fork/join
–Producer/consumer parallelism
Generates fat binary
CHI runtime
Multi-shredding: User-level threading
Extensible to multiple types of heterogeneous cores
–E.g. Intel GMA X3000
–E.g. A data streaming systolic array accelerator for communication
[Diagram labels: #pragma omp _asm { …… }; Intel C++ Compiler; Accelerator-specific assembler and domain-specific plug-ins; .code, .data, .special_section; Linker; CHI runtime library]
#pragma omp parallel target(targetISA) [clause[[,]clause]…]
    structured-block
Where clause can be any of the following:
  firstprivate(variable-list)
  private(variable-list)
  shared(variable-ptr-list)
  descriptor(descriptor-ptr-list)
  num_threads(integer-expression)
  master_nowait

IA Look-n-Feel: Development and Debugging

IA Look-n-Feel: Compilation and Execution

Spaceborne Computing with Emerging Technologies
Motivation
–Greater quantities of data: perform more onboard computing, reduce communications requirements
Vision
–Multiple computing technologies each used to best advantage
  Harness advances in semiconductors and nanotech
–Need hardware interoperability
–Need software tools to support heterogeneous hardware
Workshop
–Target date May 28-30, 2008
–At Sandia, in and out
–Immediate target: Inventory resources and set plans for coordination and standards
–Rad hard processing
[Diagram labels: Fault-Tolerant High-Capability Computational Subsystem; CPU: 1-core, multi-core; FPGA; Accelerator, GPU, SIMD, or ASIC; Interconnect options; Bus/Stream/Message Standards; I/O; Memory: DRAM, Nano; Mass Storage; Inter-subsystem gateway; Spacecraft Control Subsystem: RAD-750, etc.; Archival, Maintainable, Source Code: CPU Part, GPU Part, Verilog/VHDL]