VTU – IISc Workshop
Compiler, Architecture and HPC Research in Heterogeneous Multi-Core Era
R. Govindarajan
CSA & SERC, IISc

Moore’s Law: Transistors

Moore’s Law: Performance
Processor performance doubles every 1.5 years

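As a rough worked example of what this doubling rate implies, the sketch below computes the cumulative improvement after a given number of years. The function name `growth_factor` and the sample horizons are illustrative assumptions, not part of the talk.

```python
def growth_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """Cumulative improvement after `years` if performance doubles every
    `doubling_period_years` (1.5 years is the figure quoted on the slide)."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Sample horizons chosen only for illustration.
    for years in (1.5, 3, 15):
        print(f"{years} years -> {growth_factor(years):.0f}x")
```

Under this assumption, 15 years corresponds to ten doublings, roughly a thousandfold improvement.
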
Moore’s Law: Processor Architecture Roadmap (Pre-2000)
[Figure: roadmap from the first µP through RISC, VLIW, Superscalar and EPIC]

Progress in Processor Architecture
More transistors
New architecture innovations
–Pipelined Architecture
–Multiple Instruction Issue processors: VLIW, Superscalar, EPIC
–More on-chip caches, multiple levels of cache hierarchy, speculative execution, …
Era of Instruction Level Parallelism

Influence on Compiler Optimization
Pipelined Architecture, VLIW Architecture, Superscalar Processor, EPIC
ILP Compilation Techniques (Instruction Scheduling, Register Allocation, Software Pipelining, …)

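Instruction scheduling, one of the ILP techniques listed above, can be sketched as a greedy list scheduler: in each cycle it issues up to a fixed number of instructions whose operands are ready. This is a minimal sketch under stated assumptions, not the method of any particular compiler. The function `list_schedule`, the instruction names, the latencies and the issue width of 2 are all hypothetical.

```python
from typing import Dict, List


def list_schedule(
    deps: Dict[str, List[str]],
    latency: Dict[str, int],
    issue_width: int = 2,
) -> Dict[str, int]:
    """Greedy list scheduling on an acyclic dependence graph.

    deps[i] lists the instructions whose results i needs.
    latency[i] is the number of cycles (>= 1) until i's result is available.
    Returns the issue cycle chosen for every instruction.
    """
    issue_cycle: Dict[str, int] = {}
    remaining = list(deps)  # program order doubles as the priority order
    cycle = 0
    # Upper bound on schedule length; exceeding it means the graph is cyclic.
    limit = sum(latency.values()) + len(deps)
    while remaining:
        if cycle > limit:
            raise ValueError("cyclic or unsatisfiable dependences")
        issued_now: List[str] = []
        for instr in remaining:
            if len(issued_now) == issue_width:
                break
            # Ready when every producer has issued and its latency has elapsed.
            ready = all(
                p in issue_cycle and issue_cycle[p] + latency[p] <= cycle
                for p in deps[instr]
            )
            if ready:
                issued_now.append(instr)
        for instr in issued_now:
            issue_cycle[instr] = cycle
            remaining.remove(instr)
        cycle += 1
    return issue_cycle


if __name__ == "__main__":
    # Hypothetical straight-line code: store(add(mul(load_a, load_b))), plus
    # an independent load_c that can fill an otherwise idle issue slot.
    deps = {
        "load_a": [], "load_b": [], "mul": ["load_a", "load_b"],
        "add": ["mul"], "load_c": [], "store": ["add"],
    }
    latency = {"load_a": 2, "load_b": 2, "mul": 3,
               "add": 1, "load_c": 2, "store": 1}
    print(list_schedule(deps, latency))
```

In this example the independent `load_c` is issued while `mul` waits for its operands, which is the kind of slot filling that scheduling for pipelined and multiple-issue processors aims for.
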
Superscalar Architecture
[Figure: superscalar pipeline with IF, ID, Issue and Reg. Read stages feeding parallel functional units (Ld/Store Unit, Int. ALU, Align/Add/Align), each followed by Write Back]
Multiple instructions are fetched, decoded, issued and executed in each cycle.
Speculation, Cache/Memory hierarchy, Prefetching, Performance, Power Efficiency, …

Progress in Processor Architecture (Post-2000)
More transistors
New architecture innovations
–Multiple Instruction Issue processors
–More on-chip caches
–Multi cores
Era of Multi-Cores

Multicores: The Right Turn
[Figure: performance trend; single-core frequency scaling (1 GHz, 3 GHz, 6 GHz, 1 core each) versus the turn to multicores at 3 GHz (2, 4 and 16 cores)]

Moore’s Law: Processor Architecture Roadmap (Post-2000)
[Figure: roadmap from the first µP through RISC, VLIW, Superscalar and EPIC to Multi-cores]

Era of Multicores (Post 2000)
Multiple cores in a single die
Early efforts utilized multiple cores for multiple programs
Throughput-oriented rather than speedup-oriented!

Influence on Compilation Techniques
Multi-Core Processors
Extracting Parallelism
Thread-Level Parallelism
Speculative Multithreading

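Extracting thread-level parallelism typically means splitting independent work into chunks that can run on separate cores. The following is a minimal sketch, assuming a simple reduction (a sum of squares) and using Python's standard `concurrent.futures` module; the helper names and the chunking policy are illustrative assumptions. CPython threads generally do not speed up CPU-bound code because of the global interpreter lock, so this shows the decomposition pattern rather than a measured speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def sum_of_squares(chunk: Sequence[int]) -> int:
    """Work done by one thread on its own slice of the data."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data: Sequence[int], num_threads: int = 4) -> int:
    """Split `data` into contiguous chunks, hand each chunk to a worker
    thread, and combine the partial results."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    chunk_size = max(1, -(-len(data) // num_threads))  # ceiling division
    chunks: List[Sequence[int]] = [
        data[i:i + chunk_size] for i in range(0, len(data), chunk_size)
    ]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)  # final reduction on the calling thread


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10)), num_threads=4))
```

The chunks share no state, so the only synchronization point is the final reduction over the partial sums.
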
MultiCore-Based Node
[Figure: eight cores (C0 to C7) with L1 caches (L1$); the core pairs C0/C2, C4/C6, C1/C3 and C5/C7 each sit under an L2 cache, and the L2 caches connect to Memory]

HPC Cluster using Multi-Core Nodes
[Figure: multi-core nodes (Node 0 to Node 3), with Memory and NIC, connected through a N/W Switch]

Progress in Processor Architecture
More transistors
New architecture innovations
–Multiple Instruction Issue processors
–More on-chip caches
–Multi cores
–Heterogeneous cores and accelerators: Graphics Processing Units (GPUs), Cell BE, Clearspeed, Larrabee, reconfigurable accelerators, …
Era of Heterogeneous Accelerators

Moore’s Law: Processor Architecture Roadmap (Post-2000)
[Figure: roadmap from the first µP through RISC, VLIW, Superscalar, EPIC and Multi-cores to Accelerators]

Accelerators

Why Bother about Accelerators?
Some Top500 Systems (Nov List)
Rank | System | Description | # Procs. | R_max (TFLOPS)
2 | Roadrunner | Opteron + CellBE | | ,105
29 | LANL | Opteron + CellBE | |
 | TSUBAME Grid | Opteron + Xeon + Clearspeed + GPU | |
 | IBM Poughkeepsie | Opteron + CellBE | |

HPC Design Using Accelerators
High level of performance from Accelerators
Variety of general-purpose hardware accelerators
–GPUs: nVidia, ATI
–Accelerators: Clearspeed, Cell BE, …
–Plethora of Instruction Sets even for SIMD
Programmable accelerators, e.g., FPGA-based
HPC Design using Accelerators
–Exploit instruction-level parallelism
–Exploit data-level parallelism on SIMD units
–Exploit thread-level parallelism on multiple units/multi-cores
Challenges
–Portability across different generations and platforms
–Ability to exploit different types of parallelism

Summary
Multi-cores and heterogeneous accelerators present tremendous research opportunities in
–Architecture
–High Performance Computing
–Programming Languages & Models
–Compilers
Proebsting’s Law: Compiler Technology Doubles CPU Power Every 18 YEARS!!
Time to Rewrite Proebsting’s Law?

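To put the two doubling periods side by side, the sketch below converts each into an equivalent annual improvement rate and counts how many doublings fit into 18 years. It uses the 1.5-year hardware figure quoted earlier in the talk and the 18-year compiler figure from Proebsting's Law; the function and constant names are illustrative assumptions.

```python
HARDWARE_DOUBLING_YEARS = 1.5   # hardware figure quoted in the talk
COMPILER_DOUBLING_YEARS = 18.0  # Proebsting's Law


def annual_rate(doubling_period_years: float) -> float:
    """Equivalent yearly improvement, as a fraction, for a doubling period."""
    return 2.0 ** (1.0 / doubling_period_years) - 1.0


def doublings_in(years: float, doubling_period_years: float) -> float:
    """Number of doublings that fit into `years`."""
    return years / doubling_period_years


if __name__ == "__main__":
    for label, period in (("hardware", HARDWARE_DOUBLING_YEARS),
                          ("compiler", COMPILER_DOUBLING_YEARS)):
        print(f"{label}: {annual_rate(period):.1%} per year, "
              f"{doublings_in(18, period):.0f} doubling(s) in 18 years")
```

On these figures, hardware improves by a bit under 60% per year against roughly 4% per year for compilers: twelve doublings in the time compiler technology delivers one.
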
VTU – IISc Workshop
Thank You!!