Published by Kathryn Kennedy. Modified over 8 years ago.
1
Computation II pg 1
Parallelization, Compilation and Platforms, or PCP
5LIM0
Quartile 3, year 2015-2016
Introduction - Overview
Henk Corporaal
February 2016
2
Parallelization, Compilers and Platforms
New course 5LIM0, try-out
Lecturers
–Henk Corporaal
–Sander Stuijk
–Roel Jordans
–Martijn Koedam
3
Why this course?
Aren't compilers out of fashion?
–Tools are free
–The dragon book (the “compiler bible”) is from 1986 (so 30 years old!)
–TU/e dropped the course many years ago
We have tried auto-parallelization for over 40 years, and the conclusion is: do it yourself
However:
–New developments in platforms
–We are getting better at program analysis
–Demand from industry
–Time for a rebirth?
5
What can we conclude?
Power / Energy wall drives computing
Single cores hardly improve
Need Multi-Core => Many-Core
–E.g. GPGPUs may contain thousands of Processing Elements
Need Heterogeneous systems:
–Scalable Vdd – Performance Near Vth designs
–Big – Little configurations
–Include DSPs / VLIWs
–Include Accelerators
6
General Goals
In-depth knowledge about Compilers
–Compiler design
–LLVM (Low Level Virtual Machine)
–Intermediate Formats
–Code generation: scheduling, allocation, etc.
–Program analysis, Polyhedral model and tools
–Loop transformations
–Optimizing data accesses and data reuse
7
General Goals
Getting familiar with a few embedded platforms
We take 2 embedded extremes
The smallest
–Arduino board
–8-bit AVR RISC (Atmel / ATmega328?),
–with hardly any memory (2 KB SRAM)
The biggest
–Jetson TK1
–4 + 1 ARM A15 cores +
–192 Nvidia cores
8
General Goals
In-depth knowledge about Parallelization
Vectorization
–Use of SIMD instruction sets
Parallel programming techniques
–OpenCL
–OpenMP, OpenMP4
–MPI
Parallelizing code for the Jetson board
Guest lectures:
–Halide
–Compiler correctness
–Compiler business
9
PCP Material
Books (background material):
–Alfred Aho, Monica Lam, Ravi Sethi, Jeffrey Ullman: Compilers: Principles, Techniques, and Tools. Second edition, Addison-Wesley, 2006.
–Y.N. Srikant, P. Shankar (eds.): The Compiler Design Handbook: Optimizations and Machine Code Generation. CRC Press. Collection of independent chapters.
–Fisher, Faraboschi, Young: Embedded Computing - A VLIW Approach to Architecture, Compilers, and Tools. Morgan Kaufmann, 2005.
Check our website (course 5LIM0) regularly
–for slides
–announcements
–labs, tools, etc.
–material will be uploaded regularly
10
PCP Structure
Lectures + Lab contact hours
–Mondays 3,4 in L10 (Paviljoen)
–Thursdays 7,8 in Aud 15
–Typically second hour for labs
Exam / Credits
–4 points (1 per assignment)
–Written (online) exam
  Compiler / Code generation: 3 points
  Parallelization: 3 points
–2 bonus points
Final online exam in week 14/15
11
PCP Schedule, preliminary
Week | Day | Theory topics | Lab
5 | Feb 1 | Course introduction: compiler overview, passes, linking, AVR architecture | 1a: AVR: installation, assembly code
5 | Feb 4 | LLVM tutorial, part 1: overview LLVM, ELF format | 1b: AVR: optimizing delay function
7 | Feb 15 | LLVM tutorial, part 2: control flow analysis, data dependence analysis | 1c: AVR: adding an instruction (built-in delay)
7 | Feb 18 | IR, Single Assignment | 2a: make an LLVM IR pass, e.g. list BBs
8 | Feb 22 | List scheduling, modulo scheduling: heuristics, ILP example, if-conversion | 2b1: List scheduler, single issue + multi-issue homogeneous
8 | Feb 25 | Register allocation: coloring, heuristics, spilling; scheduling scopes: from trace to region | 2b2: List scheduler, multi-issue heterogeneous
9 | Feb 29 | Loop transformations, part 1; DMM: Data Memory Management | 2c: Bonus: register allocation, extended basic block scheduling
9 | Mar 3 | Multi-processor platforms, Jetson TK1, TX1: architecture, coding, profiling, debugging, etc. | 3a1: Loop transformations for access and locality improvement
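Labs 2b1 and 2b2 revolve around a list scheduler, first single-issue and then multi-issue. As a rough illustration of the idea only (this is not the course's lab code), the sketch below greedily schedules a small dependence graph in Python. The function name `list_schedule`, the operation names, the latencies and the critical-path priority heuristic are all assumptions chosen for the example; it also assumes the dependence graph is acyclic and that all issue slots are homogeneous.

```python
from collections import defaultdict


def list_schedule(ops, deps, latency, issue_width=1):
    """Greedy list scheduling of one basic block (hypothetical helper).

    ops         -- operation names in program order
    deps        -- (producer, consumer) pairs; assumed to form a DAG
    latency     -- cycles until each operation's result is available
    issue_width -- operations that may issue per cycle (homogeneous slots)
    Returns a dict mapping each operation to its issue cycle.
    """
    succs, preds = defaultdict(list), defaultdict(list)
    for producer, consumer in deps:
        succs[producer].append(consumer)
        preds[consumer].append(producer)

    # Priority heuristic: longest latency-weighted path to the end of the block.
    priority = {}

    def height(op):
        if op not in priority:
            tail = max((height(s) for s in succs[op]), default=0)
            priority[op] = latency[op] + tail
        return priority[op]

    for op in ops:
        height(op)

    schedule = {}
    cycle = 0
    while len(schedule) < len(ops):
        # An operation is ready once all its producers have finished.
        ready = [
            op for op in ops
            if op not in schedule
            and all(p in schedule and schedule[p] + latency[p] <= cycle
                    for p in preds[op])
        ]
        # Highest priority first; the stable sort keeps program order on ties.
        ready.sort(key=lambda op: -priority[op])
        for op in ready[:issue_width]:
            schedule[op] = cycle
        cycle += 1
    return schedule


# Illustrative block computing (a * b) + c and storing the result.
ops = ["ld_a", "ld_b", "mul", "ld_c", "add", "st"]
deps = [("ld_a", "mul"), ("ld_b", "mul"), ("mul", "add"),
        ("ld_c", "add"), ("add", "st")]
latency = {"ld_a": 2, "ld_b": 2, "ld_c": 2, "mul": 3, "add": 1, "st": 1}

single = list_schedule(ops, deps, latency, issue_width=1)
dual = list_schedule(ops, deps, latency, issue_width=2)
print(single)
print(dual)
```

With these made-up latencies the store issues in cycle 7 on the single-issue machine and in cycle 6 on the dual-issue one: the extra slot only helps until the multiply's latency becomes the bottleneck. A heterogeneous variant would additionally have to check which slot can execute which operation.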
12
PCP Schedule, preliminary
Week | Day | Theory topics | Lab
10 | Mar 7 | SIMD model, vectorization: Neon, SSE ISA sets | 3a2: continue on loop transformations
10 | Mar 10 | Loop transformations, part 2, including loop analysis: affine, SCoP, etc. | 3b: Use of SIMD instruction set
11 | Mar 14 | Polyhedral model, Polly, autovectorization | 3c: Bonus: auto-vectorization exercise
11 | Mar 17 | Testing compilers: guest Marcel Beemster, SolidSands, and perhaps ACE compiler insights | cont'd auto-vectorization
12 | Mar 21 | Threads, SMT, OpenMP4: barriers, synchronization primitives | 4a: Task parallelization using OpenMP
12 | Mar 24 | OpenMP4 offloading; OpenCL, CUDA | 4b: Using GPU cores. Bonus: CUDA or OpenCL compared to OpenMP4 offloading
13 | Mar 31 | Future: compiler business and parallelization. 1. Halide: guest speaker Sander Vocke. 2. Compiler business: guest speaker Marco Roodzant, ACE | reserved for finishing labs
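Lab 3a is about loop transformations for access and locality improvement. The slides do not say which transformations are used, so the following is only a sketch of one common example, loop tiling, applied to a matrix multiplication. It is written in Python purely for readability; the function names, the tile size and the test matrices are assumptions made for the illustration, and Python will not show the cache effect that makes tiling worthwhile in compiled code. What the sketch does show is that the transformed loop nest visits the same iterations in a different order and produces the same result.

```python
def matmul_naive(a, b, n):
    """Reference i-j-k loop nest for n x n integer matrices."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile=2):
    """Same computation, with each loop split into a tile loop and an
    intra-tile loop so that a tile-sized block of data is reused before
    moving on. The min() bounds handle n not being a multiple of tile."""
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        for k in range(kk, min(kk + tile, n)):
                            c[i][j] += a[i][k] * b[k][j]
    return c


# Small illustrative input; n is deliberately not a multiple of the tile size.
n = 5
a = [[i + j for j in range(n)] for i in range(n)]
b = [[i * j + 1 for j in range(n)] for i in range(n)]

reference = matmul_naive(a, b, n)
tiled = matmul_tiled(a, b, n, tile=2)
print(reference == tiled)
```

Integer inputs are used so the two results can be compared exactly; with floating-point data the changed summation order could introduce small rounding differences.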
13
PCP
Wish you a very nice course!
Questions?
Jetson TX1 development board
–Quad ARM A57
–Maxwell GPU
–1 TFLOP/s (for 16-bit floating point)
–16 GB
–SDK supporting Deep Learning