Research in Compilers and Introduction to Loop Transformations
Part I: Compiler Research
Tomofumi Yuki
EJCP 2017, June 29, Toulouse

Background
- Defended Ph.D. in C.S. in October 2012, Colorado State University
  - Advisor: Dr. Sanjay Rajopadhye
- Currently Inria Chargé de Recherche
  - Bretagne-Atlantique, Rennes, CAIRN team
- Optimizing compiler + programming language
  - static analysis (polyhedral model)
  - parallel programming models
  - High-Level Synthesis

I will start with a short reminder of my background. I did my PhD work at CSU under the supervision of Dr. Sanjay Rajopadhye. Currently, I am a Chargé de Recherche at Inria Rennes, within the CAIRN team. My past work is in optimizing compilers and programming languages. In particular, I have worked on static analyses, parallel programming models, and high-level synthesis.

EJCP 2017, June 29, Toulouse

What is this Course About?
- Research in compilers
  - a bit about the compiler itself
- Understand compiler research
  - what are the problems?
  - what are the techniques?
  - what are the applications?
  - maybe do research in compilers later on!
- Be able to (partially) understand work by “compiler people” at conferences

Many domains are related to compilation, many more than a typical starting student may think of, and when you meet people at conferences, a basic understanding of the context helps. We will not go into parsing, lexing, and other front-end details.

Compiler Advances
- Old compiler vs recent compiler
  - modern architecture
  - gcc -O3 vs gcc -O0
- How much speedup by the compiler alone after 45 years of research?

Proebsting’s Law
- Compiler advances double computing power every 18 years
  - http://proebsting.cs.arizona.edu/law.html
  - HW gives 60%/year
- Someone actually tried it: On Proebsting’s Law, Kevin Scott, 2001
  - SPEC95, compared against -O0
  - 3.3x for int
  - 8.1x for float
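The two growth rates on this slide are easier to compare on the same scale. The short calculation below converts "doubling every 18 years" into a yearly rate and "60%/year" into a doubling time, then computes what the law would predict after the 45 years mentioned earlier. It is only arithmetic on the slide's figures; the variable names are illustrative.

```python
import math

# Figures taken from the slide.
COMPILER_DOUBLING_YEARS = 18     # Proebsting's Law: compilers double performance every 18 years
HW_GROWTH_PER_YEAR = 0.60        # hardware: +60% per year
YEARS_OF_RESEARCH = 45           # "after 45 years of research"

# Yearly improvement equivalent to doubling every 18 years: 2^(1/18) - 1
compiler_growth_per_year = 2 ** (1 / COMPILER_DOUBLING_YEARS) - 1

# Doubling time implied by 60%/year: ln 2 / ln 1.6
hw_doubling_years = math.log(2) / math.log(1 + HW_GROWTH_PER_YEAR)

# Speedup the law predicts from compilers alone after 45 years: 2^(45/18)
predicted_45_years = 2 ** (YEARS_OF_RESEARCH / COMPILER_DOUBLING_YEARS)

print(f"compiler: {compiler_growth_per_year:.1%} per year")
print(f"hardware doubles every {hw_doubling_years:.2f} years")
print(f"predicted compiler speedup after 45 years: {predicted_45_years:.2f}x")
```

This gives roughly 3.9% per year for compilers against a hardware doubling time of under 1.5 years, and a predicted speedup of about 5.7x, which falls between the 3.3x and 8.1x that Scott measured against -O0 (a loose comparison, since -O0 is not a 45-year-old compiler).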

Compiler Advances
- Old compiler vs recent compiler
  - modern architecture
  - gcc -O3 vs gcc -O0
  - 3x to 8x difference after 45 years
- Not so much?
- “The most remarkable accomplishment by far of the compiler field is the widespread use of high-level languages.”
  - Mary Hall, David Padua, and Keshav Pingali [Compiler Research: The Next 50 Years, 2009]

Earlier Accomplishments
- Getting efficient assembly
  - register allocation
  - instruction scheduling
  - ...
- High-level language features
  - object-orientation
  - dynamic types
  - automated memory management

What is Left?
- Parallelism
  - multi-cores, GPUs, ...
  - language features for parallelism
- Security/Reliability
  - verification
  - certified compilers
- Power/Energy
  - data movement
  - voltage scaling

Agenda for today
- Part I: What is Compiler Research?
- Part II: Compiler Optimizations
- Lab: Introduction to Loop Transformations

What is a Compiler?
- Bridge between “source” and “target”

Compiler vs Assembler
- What are the differences?
[Figure: source is compiled into a target (assembly), which is assembled into object/machine code]

Compiler vs Assembler
- Compiler
  - Many possible targets (semi-portable)
  - Many decisions are taken
- Assembler
  - Specialized output (non-portable)
  - Usually a “translation”

Goals of the Compiler
- Higher abstraction
  - No more writing assembly!
  - enables language features: loops, functions, classes, aspects, ...
- Performance
  - while increasing productivity
  - speed, space, energy, ...
  - compiler optimizations

Productivity vs Performance
- Higher Abstraction ≈ Less Performance
[Figure: Python, Java, C, Fortran, and Assembly placed on Abstraction and Performance axes]

Productivity vs Performance
- How much can you regain?
[Figure: Python, Java, C, Fortran, and Assembly placed on Abstraction and Performance axes]

Productivity vs Performance
- How sloppy can you write code?
[Figure: Python, Java, C, Fortran, and Assembly placed on Abstraction and Performance axes]

Compiler Research
- Branch of Programming Languages
  - Program Analysis, Transformations
  - Formal Semantics
  - Type Theory
  - Runtime Systems
  - Compilers
  - ...

New HW Needs New Compiler
- New architectures
  - IBM Cell, GPU, Xeon-Phi, Kalray MPPA, ...
  - which ones succeeded?
- Good programming model and compiler
  - easier to fully utilize new HW
  - crucial for success
  - HW vendors invest a lot into compilers

Examples
- Two classical compiler optimizations
  - register allocation
  - instruction scheduling

Case 1: Register Allocation
- Classical optimization problem

    C = A + B;
    D = B + C;

naive translation (3 registers, 8 instructions):

    load  %r1, A
    load  %r2, B
    add   %r3, %r1, %r2
    store %r3, C
    load  %r1, B
    load  %r2, C
    add   %r3, %r1, %r2
    store %r3, D

smart compilation (2 registers, 6 instructions):

    load  %r1, A
    load  %r2, B
    add   %r1, %r1, %r2
    store %r1, C
    add   %r1, %r2, %r1
    store %r1, D
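To check that both translations on this slide compute the same values while differing in size and register use, the sketch below interprets them on a toy machine. The three-instruction ISA (load, add, store) and its operand order are assumed from the slide; the `run` helper is hypothetical and does not model any real processor.

```python
def run(program, memory):
    """Execute a toy program; return (final memory, number of registers touched)."""
    mem = dict(memory)
    regs = {}
    touched = set()
    for line in program:
        op, rest = line.split(maxsplit=1)
        args = [a.strip() for a in rest.split(",")]
        if op == "load":            # load %rX, ADDR
            dst, addr = args
            regs[dst] = mem[addr]
            touched.add(dst)
        elif op == "add":           # add %rD, %rS1, %rS2
            dst, s1, s2 = args
            regs[dst] = regs[s1] + regs[s2]
            touched.update((dst, s1, s2))
        elif op == "store":         # store %rX, ADDR
            src, addr = args
            mem[addr] = regs[src]
            touched.add(src)
        else:
            raise ValueError(f"unknown instruction: {line}")
    return mem, len(touched)

naive = [
    "load %r1, A", "load %r2, B", "add %r3, %r1, %r2", "store %r3, C",
    "load %r1, B", "load %r2, C", "add %r3, %r1, %r2", "store %r3, D",
]
smart = [
    "load %r1, A", "load %r2, B", "add %r1, %r1, %r2", "store %r1, C",
    "add %r1, %r2, %r1", "store %r1, D",
]

memory = {"A": 3, "B": 4}
for name, prog in [("naive", naive), ("smart", smart)]:
    mem, nregs = run(prog, memory)
    print(name, len(prog), "instructions,", nregs, "registers, C =", mem["C"], "D =", mem["D"])
```

Both versions leave C = 7 and D = 11 in memory; the smart version saves the reload of B and C by keeping B in %r2 and reusing %r1.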

Register Allocation in 5 min
- Often viewed as graph coloring
  - Live Range: when a value is “in use”
  - Interference: both values are “in use” at the same time, e.g., the two operands of an instruction
  - Coloring: assign conflicting nodes to different registers
- Assume an unbounded number of registers and load memory into “virtual” registers
[Figure: live range analysis of a, b, c, d and the resulting interference graph]

    c = a + b;    add %r1, %r1, %r2
    d = b + c;    add %r1, %r2, %r1
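The three steps named on this slide (live ranges, interference, coloring) can be sketched for straight-line code as follows. Assumptions: statements are written as `(dest, [sources])`, only `d` is needed after the code, and coloring is a simple greedy pass in order of first appearance, not Chaitin-style simplify/spill. All function names are hypothetical.

```python
from itertools import combinations

# Straight-line code as (dest, [sources]); the slide's example:
#   c = a + b;  d = b + c;
stmts = [("c", ["a", "b"]), ("d", ["b", "c"])]

def live_sets(stmts, live_out):
    """Backward liveness: return the set of live values at every program point,
    from before the first statement to after the last one."""
    live = set(live_out)
    points = [set(live)]
    for dest, srcs in reversed(stmts):
        live.discard(dest)   # not live before its definition
        live.update(srcs)    # operands are live just before the statement
        points.append(set(live))
    return points[::-1]

def interference(points):
    """Values interfere when they are live at the same program point."""
    adj = {}
    for live in points:
        for v in live:
            adj.setdefault(v, set())
        for u, v in combinations(live, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def appearance_order(stmts):
    """Values in the order they first appear in the code."""
    order = []
    for dest, srcs in stmts:
        for v in [*srcs, dest]:
            if v not in order:
                order.append(v)
    return order

def greedy_color(order, adj):
    """Give each value the lowest register index not used by a colored neighbor."""
    color = {}
    for v in order:
        taken = {color[u] for u in adj.get(v, ()) if u in color}
        r = 0
        while r in taken:
            r += 1
        color[v] = r
    return color

points = live_sets(stmts, live_out={"d"})
colors = greedy_color(appearance_order(stmts), interference(points))
print({v: f"%r{c + 1}" for v, c in colors.items()})
```

The only interferences are a with b and b with c, so two colors suffice: a, c, and d share %r1 and b gets %r2, which matches the two `add` instructions on the slide.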

Register Allocation in 5 min
- Registers are limited

    c = a + b;
    d = b + c;
    x = c + d;
    y = a + x;

[Figure: live ranges and interference graph of a, b, c, d, x, y]
- Live Range Splitting

    a = load A;
    c = a + b;
    d = b + c;
    x = c + d;
    z = load A;
    y = z + x;

[Figure: live ranges and interference graph of a, b, c, d, x, y, z after splitting]
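One way to see why splitting helps on this slide is to count how many values are live at once (the register pressure) before and after reloading `a` as `z`. The sketch assumes `(dest, [sources])` statements, treats a load as a statement with no register sources, and assumes only `y` is needed afterwards; `max_pressure` is a hypothetical helper. For straight-line code where each value is defined once, this maximum equals the number of registers a good allocation needs.

```python
def max_pressure(stmts, live_out):
    """Largest number of values simultaneously live in straight-line code."""
    live = set(live_out)
    worst = len(live)
    for dest, srcs in reversed(stmts):
        live.discard(dest)        # dest is not live before it is defined
        live.update(srcs)         # sources are live before the statement
        worst = max(worst, len(live))
    return worst

original = [
    ("c", ["a", "b"]),
    ("d", ["b", "c"]),
    ("x", ["c", "d"]),
    ("y", ["a", "x"]),            # a must stay alive across the whole block
]
split = [
    ("a", []),                    # a = load A
    ("c", ["a", "b"]),
    ("d", ["b", "c"]),
    ("x", ["c", "d"]),
    ("z", []),                    # z = load A: reload instead of keeping a alive
    ("y", ["z", "x"]),
]

print(max_pressure(original, {"y"}), max_pressure(split, {"y"}))
```

Without splitting, a, c, and d are live together, so three registers are needed; after splitting, at most two values are ever live, at the cost of one extra load.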

Research in Register Allocation
- How to do a good allocation
  - which variables to split
  - which values to spill
- How to do it fast?
  - Graph coloring is expensive
  - Just-in-Time compilation
- “Solved”

Case 2: Instruction Scheduling
- Another classical problem

    X = A * B * C;
    Y = D * E * F;

naive translation (pipeline stalls if a multiplication takes 2 cycles):

    R = A * B;
    X = R * C;
    S = D * E;
    Y = S * F;

smart compilation:

    R = A * B;
    S = D * E;
    X = R * C;
    Y = S * F;

- Also done in hardware (out-of-order execution)
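The stalls on this slide can be counted with a small pipeline model, and the smart order can be recovered automatically with greedy list scheduling. Assumptions: a single-issue, in-order pipeline, a 2-cycle multiply as on the slide, inputs A to F already available, and no name reuse (so only true dependences matter). `simulate` and `list_schedule` are hypothetical helpers, a sketch rather than a production scheduler.

```python
# Instructions are (dest, [sources]). A..F are assumed to be available already;
# only R, S, X, Y are produced inside the block.
LATENCY = 2  # slide's assumption: a multiplication takes 2 cycles

naive = [("R", ["A", "B"]), ("X", ["R", "C"]),
         ("S", ["D", "E"]), ("Y", ["S", "F"])]

def simulate(schedule, latency=LATENCY):
    """In-order, single-issue pipeline. Returns (cycle at which the last result
    is available, number of stall cycles)."""
    ready = {}        # value -> cycle its result becomes available
    issue = -1
    stalls = 0
    for dest, srcs in schedule:
        earliest = issue + 1
        start = max([earliest] + [ready.get(s, 0) for s in srcs])
        stalls += start - earliest
        issue = start
        ready[dest] = start + latency
    return max(ready.values()), stalls

def list_schedule(block, latency=LATENCY):
    """Greedy list scheduling: every cycle, issue the first pending instruction
    (in program order) whose in-block operands are ready; otherwise stall."""
    produced = {dest for dest, _ in block}
    pending = list(block)
    ready = {}
    order = []
    cycle = 0
    while pending:
        for instr in pending:
            dest, srcs = instr
            if all(s in ready and ready[s] <= cycle
                   for s in srcs if s in produced):
                order.append(instr)
                ready[dest] = cycle + latency
                pending.remove(instr)
                break
        cycle += 1
    return order

smart = list_schedule(naive)
print([dest for dest, _ in smart])
print(simulate(naive), simulate(smart))
```

The scheduler picks R, S, X, Y, the same order as the smart compilation; under this model the naive order takes 7 cycles with 2 stalls and the reordered one takes 5 cycles with none.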

Research in Instruction Scheduling
- Not much anymore for speed/parallelism
  - beaten to death
  - hardware does it for you
- Remains interesting in specific contexts
  - faster methods for JIT
  - energy optimization
  - “predictable” execution
  - in-order cores, VLIW, etc.

Case 1+2: Phase Ordering
- Yet another classical problem
  - practically no solution
- Given optimizations A and B
  - A after B vs A before B: which order is better?
  - can you solve the problem globally?
- Parallelism requires more memory
  - trade-off: register pressure vs parallelism
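The register pressure vs parallelism trade-off on this slide can be made concrete with the two orderings from the instruction-scheduling example (R = A * B; S = D * E; X = R * C; Y = S * F and its naive counterpart). The sketch assumes that A to F are memory operands and that X and Y are stored immediately, so only the temporaries R and S occupy registers; `max_live_temps` is a hypothetical helper.

```python
# Schedules as (dest, [sources]); only R and S are counted as register-resident
# temporaries (assumption: A..F are memory operands, X and Y are stored at once).
naive = [("R", ["A", "B"]), ("X", ["R", "C"]), ("S", ["D", "E"]), ("Y", ["S", "F"])]
smart = [("R", ["A", "B"]), ("S", ["D", "E"]), ("X", ["R", "C"]), ("Y", ["S", "F"])]

def max_live_temps(schedule, temps):
    """Maximum number of temporaries live at once. Point k sits between
    instruction k-1 and instruction k; a temporary is live there if it was
    defined before k and is last used at or after k."""
    define, last_use = {}, {}
    for i, (dest, srcs) in enumerate(schedule):
        if dest in temps:
            define[dest] = i
        for s in srcs:
            if s in temps:
                last_use[s] = i
    return max(
        (sum(1 for t in define if define[t] < k <= last_use.get(t, -1))
         for k in range(1, len(schedule))),
        default=0,
    )

print(max_live_temps(naive, {"R", "S"}), max_live_temps(smart, {"R", "S"}))
```

The stall-free order keeps R and S alive at the same time (two registers), while the naive order never needs more than one, which is why scheduling before register allocation can increase spilling and allocating first can restrict reordering.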

Job Market
- Where do they work?
  - Intel / IBM Research
  - Apple
  - Mathworks
  - Amazon
  - Xilinx
- Many opportunities in France
  - Mathworks @ Grenoble
  - Many start-ups