Computation II pg 1 Parallelization, Compilation and Platforms or PCP 5LIM0 Quartile 3, year 2015-2016 Introduction - Overview Henk Corporaal February.

Presentation transcript:

Computation II pg 1
Parallelization, Compilation and Platforms or PCP
5LIM0
Quartile 3, year 2015-2016
Introduction - Overview
Henk Corporaal
February 2016

Computation II pg 2
Parallelization, Compilers and Platforms
New course 5LIM0, try-out
Lecturers
– Henk Corporaal
– Sander Stuijk
– Roel Jordans
– Martijn Koedam

Computation II pg 3
Why this course?
Are compilers not out?
– Tools are free
– The dragon book, the "compiler bible", is from 1986 (so 30 years old!)
– TU/e skipped the course many years ago
We have tried auto-parallelization for over 40 years, and the conclusion is: do it yourself
However:
– New developments in platforms
– We are getting better at program analysis
– Demand from industry
– Time for a rebirth?

Computation II pg 4

Computation II pg 5
What can we conclude?
Power / Energy wall drives computing
Single cores hardly improve
Need Multi-Core => Many-Core
– E.g. GPGPUs may contain thousands of Processing Elements
Need Heterogeneous systems:
– Scalable Vdd – Performance Near Vth designs
– Big – Little configurations
– Include DSPs / VLIWs
– Include Accelerators

Computation II pg 6
General Goals
In-depth knowledge about Compilers
– Compiler design
– LLVM (Low Level Virtual Machine)
– Intermediate Formats
– Code generation: scheduling, allocation, etc.
– Program analysis, Polyhedral model and tools
– Loop transformations
– Optimizing data accesses and data reuse
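To make the "loop transformations" and "data reuse" goals concrete, here is a minimal sketch of one such transformation, loop tiling, applied to a matrix transpose. It is an illustration only, not course material: the function names, the row-major flat layout, and the tile size `TILE = 8` are assumptions chosen for the example.

```c
#include <stddef.h>

enum { TILE = 8 }; /* illustrative tile size; a real value depends on the cache */

/* Naive transpose of an n x n row-major matrix.
 * dst is written with stride n, so the inner loop touches a new
 * cache line of dst on (almost) every iteration. */
void transpose_naive(size_t n, const int *src, int *dst) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Tiled transpose: the same iteration space, visited in TILE x TILE
 * blocks so the lines of src and dst used by one block can be reused
 * before they are evicted. Partial tiles at the edges are handled
 * by clamping the inner loop bounds to n. */
void transpose_tiled(size_t n, const int *src, int *dst) {
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t jj = 0; jj < n; jj += TILE)
            for (size_t i = ii; i < min_size(ii + TILE, n); i++)
                for (size_t j = jj; j < min_size(jj + TILE, n); j++)
                    dst[j * n + i] = src[i * n + j];
}
```

Both functions compute the same result; only the order in which memory is visited changes, which is the kind of effect that locality-oriented loop transformations aim for.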

Computation II pg 7
General goals
Getting familiar with a few embedded platforms
We take 2 embedded extremes
The smallest
– Arduino board
– 8-bit AVR RISC (Atmel / ATmega328?),
– with hardly any memory (2KB SRAM)
The biggest
– Jetson TK1
– 4 + 1 ARM A15 cores +
– 192 Nvidia cores

Computation II pg 8
General Goals
In-depth knowledge about Parallelization
Vectorization
– Use of SIMD instruction sets
Parallel programming techniques
– OpenCL
– OpenMP, OpenMP4
– MPI
Parallelizing code for the Jetson board
Guest lectures:
– Halide
– Compiler correctness
– Compiler business
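As a small illustration of the vectorization and OpenMP goals listed above, the sketch below marks a dot-product loop for combined thread-level and SIMD execution with an OpenMP 4.0 directive. The function name `dot` and the choice of a dot product are assumptions made for the example; the slides do not prescribe any particular kernel.

```c
/* Dot product of two vectors of length n.
 * With OpenMP enabled, iterations are divided over threads and each
 * thread's chunk may be vectorized; the reduction clause gives every
 * thread a private partial sum that is combined at the end.
 * Without OpenMP the guard removes the pragma and the loop runs
 * serially with the same result (up to floating-point rounding order). */
double dot(const double *x, const double *y, int n) {
    double sum = 0.0;
#ifdef _OPENMP
#pragma omp parallel for simd reduction(+ : sum)
#endif
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

With GCC or Clang the directive takes effect when compiling with `-fopenmp`; otherwise the code builds as plain serial C.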

Computation II pg 9
PCP Material
Books (background material):
– Alfred Aho, Monica Lam, Ravi Sethi, Jeffrey Ullman: Compilers: Principles, Techniques, and Tools. Second edition, Addison-Wesley,
– Y.N. Srikant, P. Shankar (ed.): The compiler design handbook: optimizations and machine code generation, CRC Press, collection of independent chapters
– Fisher, Faraboschi, Young: Embedded Computing - A VLIW Approach to Architecture, Compilers, and Tools. Morgan Kaufmann,
Check our website regularly (course 5LIM0)
– for slides
– announcements
– labs, tools, etc.
– material will be regularly uploaded

Computation II pg 10
PCP Structure
Lectures + Lab contact hours
– Mondays 3,4 in L10 (Paviljoen)
– Thursdays 7,8 in Aud 15
– Typically second hour for labs
Exam / Credits
– 4 points (1 per assignment)
– Written (online) exam
  Compiler / Code generation: 3 points
  Parallelization: 3 points
– 2 bonus points
Final online exam in week 14/15

Computation II pg 11
PCP Schedule, preliminary
Week | Day | Theory topics | Lab
5 | Feb 1 | Course introduction; compiler overview, passes, linking, AVR architecture | 1a: AVR *topic ** ?; installation, assembly code
  | Feb 4 | LLVM tutorial, part 1; overview LLVM, ELF format | 1b: AVR optimizing delay function
7 | Feb 15 | LLVM tutorial, part 2; control flow analysis, data dependence analysis | 1c: AVR adding an instruction: built-in delay
  | Feb 18 | IR, Single Assignment | 2a: make an LLVM IR pass, e.g. list BBs
8 | Feb 22 | List scheduling, modulo scheduling; heuristics, ILP example, if-conversion | 2b1: List scheduler, single issue + multi-issue homogeneous
  | Feb 25 | Register allocation; coloring, heuristics, spilling; scheduling scopes: from trace to region | 2b2: List scheduler, multi-issue heterogeneous
9 | Feb 29 | Loop transformations, part 1; DMM: Data Memory Management | 2c: Bonus: register allocation; extended basic block scheduling
  | Mar 3 | Multi-Proc platforms, Jetson K1, X1; architecture, coding, profiling, debugging, etc. | 3a1: Loop transformations for access and locality improvement

Computation II pg 12
PCP Schedule, preliminary
Week | Day | Theory topics | Lab
10 | Mar 7 | SIMD model, vectorization; Neon, SSX ISA sets | 3a2: continue on loop transformations
   | Mar 10 | Loop transformations, part 2; including loop analysis: affine, scop, etc. | 3b: Use of SIMD instruction set
11 | Mar 14 | Polyhedral model, Polly, autovectorization | 3c: Bonus: auto-vectorization exercise
   | Mar 17 | Testing compilers: guest Marcel Beemster, SolidSands; and perhaps ACE compiler insights | cont'd auto-vectorization
12 | Mar 21 | Threads, SMT, OpenMP4; barriers, synchronization primitives | 4a: Task parallelization using OpenMP
   | Mar 24 | OpenMP4 offloading; OpenCL, CUDA | 4b: Using GPU cores; Bonus: CUDA or OpenCL compared to OpenMP4 offloading
13 | Mar 31 | Future: compiler business and parallelization; 1. Halide: guest speaker Sander Vocke; 2. Compiler business: guest speaker Marco Roodzant, ACE | reserved for finishing labs
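The schedule names list scheduling (theory in week 8, labs 2b1 and 2b2) without describing it. As a rough idea of what such a scheduler does, here is a minimal sketch under simplifying assumptions that are not taken from the course: every operation has unit latency, all issue slots are identical (the "homogeneous" case), operations are numbered in topological order, and the priority heuristic is the longest path to a sink. The names `Dag`, `heights`, and `list_schedule` are hypothetical.

```c
#include <assert.h>

enum { MAX_OPS = 16 }; /* illustrative upper bound on DAG size */

/* dep[i][j] != 0 means op j depends on op i.
 * Assumption: ops are numbered topologically (i < j for every edge). */
typedef struct {
    int n;
    int dep[MAX_OPS][MAX_OPS];
} Dag;

/* Priority heuristic: h[i] = number of ops on the longest path from i to a sink. */
static void heights(const Dag *g, int h[]) {
    for (int i = g->n - 1; i >= 0; i--) {
        h[i] = 1;
        for (int j = i + 1; j < g->n; j++)
            if (g->dep[i][j] && h[j] + 1 > h[i])
                h[i] = h[j] + 1;
    }
}

/* Greedy cycle-by-cycle list scheduling with `width` identical issue slots
 * and unit latency. Fills cycle[i] with the issue cycle of op i and
 * returns the schedule length in cycles. */
int list_schedule(const Dag *g, int width, int cycle[]) {
    assert(g->n <= MAX_OPS && width >= 1);
    int h[MAX_OPS];
    int done = 0, now = 0;
    heights(g, h);
    for (int i = 0; i < g->n; i++) cycle[i] = -1; /* -1 = not yet scheduled */
    while (done < g->n) {
        for (int slot = 0; slot < width; slot++) {
            int best = -1;
            for (int i = 0; i < g->n; i++) {
                if (cycle[i] >= 0) continue;
                int ready = 1; /* ready if all predecessors issued in an earlier cycle */
                for (int p = 0; p < i; p++)
                    if (g->dep[p][i] && (cycle[p] < 0 || cycle[p] >= now))
                        ready = 0;
                if (ready && (best < 0 || h[i] > h[best])) best = i;
            }
            if (best < 0) break; /* nothing else is ready in this cycle */
            cycle[best] = now;
            done++;
        }
        now++;
    }
    return now;
}
```

Calling `list_schedule` with `width = 1` models a single-issue machine and larger widths a multi-issue one. A heterogeneous variant, as in lab 2b2, would additionally have to check that the chosen operation fits the type of the free slot, which this sketch does not model.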

Computation II pg 13
PCP
Wish you a very nice course!
Questions?
Jetson TX1 development board
– Quad ARM A57
– Maxwell GPU
– 1 TFLOP/s (for 16-bit floating point)
– 16 GB
– SDK supporting Deep Learning