VTU – IISc Workshop Compiler, Architecture and HPC Research in Heterogeneous Multi-Core Era R. Govindarajan CSA & SERC, IISc

VTU-IISc Workshop © 2 Moore’s Law: Transistors

VTU-IISc Workshop © 3 Moore’s Law: Performance
Processor performance doubles every 1.5 years

VTU-IISc Workshop © 4 Moore’s Law: Processor Architecture Roadmap (Pre-2000)
[Figure: processor architecture roadmap: First µP, RISC, VLIW, Superscalar, EPIC]

VTU-IISc Workshop © 5 Progress in Processor Architecture
More transistors → New architecture innovations
– Pipelined Architecture
– Multiple Instruction Issue processors: VLIW, Superscalar, EPIC
– More on-chip caches, multiple levels of cache hierarchy, speculative execution, …
Era of Instruction Level Parallelism

VTU-IISc Workshop © 6 Influence on Compiler Optimization
Pipelined Architecture, VLIW Architecture, Superscalar Processor, EPIC
ILP Compilation Techniques (Instruction Scheduling, Register Allocation, Software Pipelining, …)

VTU-IISc Workshop © 7 Superscalar Architecture
[Figure: superscalar pipeline with stages IF, ID, Issue, Reg. Read, execution units (Int. ALU, Ld/Store Unit, Align/Add/Align) and Write Back]
Multiple instructions are fetched, decoded, issued and executed in each cycle.
Speculation, Cache/Memory hierarchy, Prefetching, Performance, Power Efficiency, …
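The statement that multiple instructions are issued in each cycle can be illustrated with a toy model. The sketch below is not taken from the slides: the function name `schedule`, the five-instruction `program`, and the one-cycle latency for every instruction are all assumptions made for illustration. It greedily issues up to `width` instructions per cycle once their producers have completed.

```python
def schedule(deps, width):
    """Greedily issue up to `width` ready instructions per cycle.

    `deps` maps each instruction name to the set of instructions it
    depends on. Every instruction is assumed to take exactly one cycle
    (a simplification; real pipelines have varied latencies).
    Returns a list of cycles, each a list of instruction names.
    """
    done = set()
    remaining = list(deps)  # dict order is used as program order
    cycles = []
    while remaining:
        # Ready means every producer completed in an earlier cycle.
        ready = [i for i in remaining if deps[i] <= done]
        if not ready:
            raise ValueError("cyclic dependence")
        issued = ready[:width]
        cycles.append(issued)
        done.update(issued)
        remaining = [i for i in remaining if i not in issued]
    return cycles


# Hypothetical program: c needs a and b, e needs c and d.
program = {
    "a": set(),
    "b": set(),
    "c": {"a", "b"},
    "d": set(),
    "e": {"c", "d"},
}
scalar = schedule(program, width=1)       # one instruction per cycle
superscalar = schedule(program, width=2)  # two instructions per cycle
```

In this model the single-issue machine needs one cycle per instruction, while the two-issue machine finishes sooner but is still bounded by the dependence chain a/b, c, e. That remaining chain is what ILP compilation techniques such as instruction scheduling try to work around.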

VTU-IISc Workshop © 8 Progress in Processor Architecture (Post-2000)
More transistors → New architecture innovations
– Multiple Instruction Issue processors
– More on-chip caches
– Multi cores
Era of Multi-Cores

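VTU-IISc Workshop © 9 Multicores: The Right Turn
[Figure: performance chart contrasting single-core frequency scaling (1 GHz 1 Core, 3 GHz 1 Core, 6 GHz 1 Core) with multi-core scaling at fixed frequency (3 GHz 2 Core, 3 GHz 4 Core, 3 GHz 16 Core)]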

VTU-IISc Workshop © 10 Moore’s Law: Processor Architecture Roadmap (Post-2000)
[Figure: processor architecture roadmap: First µP, RISC, VLIW, Superscalar, EPIC, Multi-cores]

VTU-IISc Workshop © 11 Era of Multicores (Post 2000)
Multiple cores in a single die
Early efforts utilized multiple cores for multiple programs
Throughput-oriented rather than speedup-oriented!
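The difference between throughput and speedup can be made concrete with a small calculation. The sketch below is an illustration only: the function `batch_metrics` and the workload of four programs at 10 seconds each are assumed values, not figures from the talk. It models independent single-threaded programs, each occupying one core.

```python
def batch_metrics(num_programs, seconds_per_program, cores):
    """Model independent single-threaded programs, one program per core.

    Returns (makespan, throughput, latency):
      makespan   - seconds until the whole batch has finished
      throughput - programs completed per second
      latency    - seconds one program takes (unchanged by extra cores)
    """
    waves = -(-num_programs // cores)  # ceiling division
    makespan = waves * seconds_per_program
    throughput = num_programs / makespan
    latency = seconds_per_program
    return makespan, throughput, latency


one_core = batch_metrics(4, 10.0, cores=1)
four_cores = batch_metrics(4, 10.0, cores=4)
```

Under these assumptions the four-core machine completes the batch four times faster, yet each individual program takes as long as before. Reducing the run time of a single program requires extracting parallelism from within that program.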

VTU-IISc Workshop © 12 Influence on Compilation Techniques
Multi-Core Processors
Extracting Parallelism, Thread-Level Parallelism, Speculative Multithreading

VTU-IISc Workshop © 13 MultiCore-Based Node
[Figure: node with cores C0 to C7 grouped in pairs (C0/C2, C4/C6, C1/C3, C5/C7); each pair is drawn with L1$ and an L2-Cache, and all connect to Memory]

VTU-IISc Workshop © 14 HPC Cluster using Multi-Core Nodes
[Figure: Node 0, Node 1, Node 2 and Node 3, each with Memory and a NIC, connected through a N/W Switch]

VTU-IISc Workshop © 15 Progress in Processor Architecture
More transistors → New architecture innovations
– Multiple Instruction Issue processors
– More on-chip caches
– Multi cores
– Heterogeneous cores and accelerators: Graphics Processing Units (GPUs), Cell BE, Clearspeed, Larrabee, Reconfigurable accelerators, …
Era of Heterogeneous Accelerators

VTU-IISc Workshop © 16 Moore’s Law: Processor Architecture Roadmap (Post-2000)
[Figure: processor architecture roadmap: First µP, RISC, VLIW, Superscalar, EPIC, Multi-cores, Accelerators]

VTU-IISc Workshop © 17 Accelerators

VTU-IISc Workshop © 18 Why Bother about Accelerators?
Some Top500 Systems (Nov List)
Rank | System | Description | # Procs. | R_max (TFLOPS)
2 | Roadrunner | Opteron + CellBE | | ,105
29 | LANL | Opteron + CellBE | |
 | TSUBAME Grid | Opteron + Xeon + Clearspeed + GPU | |
 | IBM Poughkeepsie | Opteron + CellBE | |

VTU-IISc Workshop © 19 HPC Design Using Accelerators
High level of performance from Accelerators
Variety of general-purpose hardware accelerators
– GPUs: nVidia, ATI
– Accelerators: Clearspeed, Cell BE, …
– Plethora of Instruction Sets even for SIMD
Programmable accelerators, e.g., FPGA-based
HPC Design using Accelerators
– Exploit instruction-level parallelism
– Exploit data-level parallelism on SIMD units
– Exploit thread-level parallelism on multiple units/multi-cores
Challenges
– Portability across different generations and platforms
– Ability to exploit different types of parallelism
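Two of the parallelism types listed above can be seen in a single small kernel. The sketch below is a structural illustration only, with assumed names (`saxpy_chunk`, `parallel_saxpy`) and an assumed kernel (a*x + y) that does not come from the slides. The inner element-wise loop is the data-parallel part, the pattern a SIMD unit would execute several elements at a time; the chunking across workers is the thread-level part. In standard CPython builds the global interpreter lock keeps these threads from running faster, so the code shows the decomposition rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_chunk(a, xs, ys):
    # Data-level parallelism: the same multiply-add is applied
    # independently to every element of the chunk.
    return [a * x + y for x, y in zip(xs, ys)]


def parallel_saxpy(a, xs, ys, workers=4):
    # Thread-level parallelism: split the index range into contiguous
    # chunks and give each chunk to a separate worker.
    n = len(xs)
    step = max(1, -(-n // workers))  # ceiling division, at least 1
    bounds = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of `bounds`.
        parts = pool.map(
            lambda b: saxpy_chunk(a, xs[b[0]:b[1]], ys[b[0]:b[1]]),
            bounds,
        )
        return [v for part in parts for v in part]


result = parallel_saxpy(2.0, list(range(10)), [1.0] * 10, workers=3)
```

The portability challenge named on the slide shows up even here: on a GPU, a Cell BE or a SIMD extension, the same decomposition would have to be re-expressed in a different programming model and instruction set.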

VTU-IISc Workshop © 20 Summary
Multi-cores and Heterogeneous accelerators present tremendous research opportunity in
– Architecture
– High Performance Computing
– Programming Languages & Models
– Compilers
Proebsting’s Law: Compiler Technology Doubles CPU Power Every 18 YEARS!!
Time to Rewrite Proebsting’s Law?
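The gap between the two doubling rates quoted in the talk (processor performance every 1.5 years, compiler-delivered performance every 18 years) is easy to quantify. The sketch below is a back-of-the-envelope calculation; the function name `growth` and the 18-year comparison window are choices made here for illustration.

```python
def growth(years, doubling_period_years):
    """Improvement factor after `years` for a given doubling period."""
    return 2.0 ** (years / doubling_period_years)


hardware = growth(18, 1.5)  # 2**12, twelve doublings in 18 years
compiler = growth(18, 18)   # 2**1, a single doubling in 18 years

# Equivalent compound annual improvement rates.
hardware_per_year = growth(1, 1.5) - 1.0  # roughly 59% per year
compiler_per_year = growth(1, 18) - 1.0   # roughly 4% per year
```

Over the same 18 years, hardware at the quoted rate improves by a factor of 4096 while compiler optimization contributes a factor of 2.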

VTU – IISc Workshop Thank You !!