EECS 583 – Advanced Compilers Course Overview, Introduction to Control Flow Analysis Fall 2011, University of Michigan September 7, 2011.


- 1 - About Me
v Mahlke = mall key
  » But just call me Scott
v 10 years here at Michigan
  » Compiler guy who likes hardware
  » Program optimization and building custom hardware for high performance/low power
v Before this – HP Labs
  » Compiler research for Itanium-like processors
  » PICO – automatic design of NPAs
v Before before – Grad student at UIUC
v Before^3 – Undergrad at UIUC

- 2 - Class Overview
v This class is NOT about:
  » Programming languages
  » Parsing, syntax checking, semantic analysis
  » Handling advanced language features – virtual functions, …
  » Frontend transformations
  » Debugging
  » Simulation
v Compiler backend
  » Mapping applications to processor hardware
  » Retargetability – work for multiple platforms (not hard coded)
  » Work at the assembly-code level (but processor independent)
  » Speed/efficiency
    - How to make the application run fast
    - Use less memory (text, data), execute efficiently
    - Parallelize

- 3 - Background You Should Have
v 1. Programming
  » Good C++ programmer (essential)
  » Linux, gcc, emacs
  » Debugging experience – hard to debug with printf's alone
  » Compiler system not ported to Windows
v 2. Computer architecture
  » EECS 370 is good, 470 is better but not essential
  » Basics – caches, pipelining, function units, registers, virtual memory, branches, multiple cores, assembly code
v 3. Compilers
  » Frontend stuff is not very relevant for this class
  » Basic backend stuff we will go over fast
    - Non-EECS 483 people will have to do some supplemental reading

- 4 - Textbook and Other Classroom Material
v No required text – lecture notes, papers
v LLVM compiler system
  » LLVM webpage
  » Read the documentation!
  » LLVM users group
v Course webpage + course newsgroup
  » Lecture notes – available the night before class
  » Newsgroup – ask/answer questions; GSI and I will try to check regularly but may not be able to do so always

- 5 - What the Class Will be Like
v Class meeting time – 10:30–12:30, MW
  » 2 hrs is hard to handle
  » We'll stop at 12:00 most of the time
v Core backend stuff
  » Textbook material – some overlap with 483
  » 2 homeworks to apply classroom material
v Research papers
  » I'll present research material along the way
  » However, it's not a monologue – you are expected to participate in the discussion
  » Students will be asked to submit summaries/opinions about papers

- 6 - What the Class Will be Like (2)
v Learning compilers
  » No memorizing definitions, terms, formulas, algorithms, etc.
  » Learn by doing – writing code
  » Substantial amount of programming
    - Fair learning curve for the LLVM compiler
  » Reasonable amount of reading
v Classroom
  » Attendance – you should be here
  » Discussion important
    - Work out examples, discuss papers, etc.
  » Essential to stay caught up
  » Extra meetings outside of class to discuss projects

- 7 - Course Grading
v Yes, everyone will get a grade
  » Distribution of grades, scale, etc. - ???
  » Most (hopefully all) will get A's and B's
  » Slackers will be obvious and will suffer
v Components
  » Midterm exam – 25%
  » Project – 45%
  » Homeworks – 10%
  » Paper summaries – 10%
  » Class participation – 10%

- 8 - Homeworks
v 2 of these
  » Small/modest programming assignments
  » Design and implement something we discussed in class
v Goals
  » Learn the important concepts
  » Learn the compiler infrastructure so you can do the project
v Grading
  » 2/1/0 scale: good; weak effort but did something; did nothing
v Working together on the concepts is fine
  » Make sure you understand things or it will come back to bite you
  » Everyone must do and turn in their own assignment

- 9 - Projects – Most Important Part of the Class
v Design and implement an "interesting" compiler technique and demonstrate its usefulness using LLVM
v Topic/scope/work
  » 2-3 people per project (1 person or 4 people allowed in some cases)
  » You will pick the topics (I have to agree)
  » You will have to
    - Read background material
    - Plan and design
    - Implement and debug
v Deliverables
  » Working implementation
  » Project report: ~5-page paper describing what you did/results
  » 15-20 min presentation at the end (demo if you want)
  » Project proposal (late Oct) and status report (late Nov) scheduled with each group during the semester

Types of Projects
v New idea
  » Small research idea
  » Design and implement it, see how it works
v Extend existing idea (most popular)
  » Take an existing paper, implement their technique
  » Then extend it to do something interesting
    - Generalize the strategy, make it more efficient/effective
v Implementation
  » Take an existing idea, create a quality implementation in LLVM
  » Try to get your code released into the main LLVM system
v Using other compilers is possible but needs a good reason

Topic Areas (You are Welcome to Propose Others)
v Automatic parallelization
  » Loop parallelization
  » Vectorization/SIMDization
  » Transactional memories/speculation
  » Breaking dependences
v Memory system performance
  » Instruction prefetching
  » Data prefetching
  » Use of scratchpad memories
  » Data layout
v Reliability
  » Catching transient faults
  » Reducing AVF
  » Application-specific techniques
v Power
  » Instruction scheduling techniques to reduce power
  » Identification of narrow computations
v Streaming/GPUs – StreamIt compiler
  » Stream scheduling
  » Memory management
  » Optimizing CUDA programs
v For the adventurous – dynamic optimization
  » However, the LLVM dynamic system is not stable
  » Run-time parallelization or other optimizations are interesting

Class Participation
v Interaction and discussion is essential in a graduate class
  » Be here
  » Don't just stare at the wall
  » Be prepared to discuss the material
  » Have something useful to contribute
v Opportunities for participation
  » Research paper discussions – thoughts, comments, etc.
  » Saying what you think in project discussions outside of class
  » Solving class problems
  » Asking intelligent questions

GSI
v Daya Khudia
v Office hours
  » Tuesday, Thursday, Friday: 3-5pm
  » Location: 1620 CSE (CAEN Lab)
v LLVM help/questions
v But, you will have to be independent in this class
  » Read the documentation and look at the code
  » Come to him when you are really stuck or confused
  » He cannot and will not debug everyone's code
  » Helping each other is encouraged
  » Use the phorum (Daya and I will monitor this)

Contact Information
v Office: 4633 CSE
v Office hours
  » Mon/Wed right after class in 3150 Dow
  » Wed: 4:30-5:30
  » Or send me an email for an appointment
v Visiting office hrs
  » Mainly help on classroom material, concepts, etc.
  » I am just learning LLVM myself, so likely I cannot answer any non-trivial question
  » See Daya for LLVM details

Tentative Class Schedule

Week  Date       Topic
1     Sept 7     Course intro, Control flow analysis
2     Sept 12    Control flow analysis, HW #1 out
      Sept 14    Control flow – region formation
3     Sept 19    Control flow – if-conversion
      Sept 21    Control flow – hyperblocks, HW #1 due (Fri)
4     Sept 26    Dataflow analysis
      Sept 28    Dataflow analysis, HW #2 out
5     Oct 3      SSA form
      Oct 5      Classical optimization
6     Oct 10     ILP optimization
      Oct 12     Code generation – Acyclic scheduling, HW #2 due (Fri)
7     Oct 17     No class – Fall Break
      Oct 19     Code generation – Superblock scheduling
8     Oct 24     Project proposals
      Oct 26     Project proposals
9     Oct 31     Code generation – Software pipelining
      Nov 2      Code generation – Software pipelining II
10    Nov 7      Code generation – Register allocation
      Nov 9      Research – Automatic parallelization
11    Nov 14     Midterm Exam – in class
      Nov 16     Research – Automatic parallelization
12    Nov 21     Research – Automatic parallelization
      Nov 23     No class
13    Nov 28     Research – Compiling streaming applications
      Nov 30     Research – Compiling streaming applications
14    Dec 5      Research – Topic TBA
      Dec 7      Research – Topic TBA
15    Dec 12     Research – Topic TBA
      Dec 13-16  Project demos

Target Processors: 1) VLIW/EPIC Architectures
v VLIW = Very Long Instruction Word
  » Aka EPIC = Explicitly Parallel Instruction Computing
  » Compiler-managed multi-issue processor
v Desktop
  » IA-64: aka Itanium I and II, Merced, McKinley
v Embedded processors
  » All high-performance DSPs are VLIW
    - Why? Cost/power of superscalar, more scalability
  » TI C6x, Philips Trimedia, StarCore, ST-200

Target Processors: 2) Multicore
v Sequential programs – 1 core busy, 3 sit idle
v How do we speed up sequential applications?
  » Switch from ILP to TLP as the major source of performance
  » Memory dependence analysis becomes critical

Target Processors: 3) SIMD
v Do the same work on different data: GPU, SSE, etc.
v Energy-efficient way to scale performance
v Must find "vector parallelism"

So, let's get started… Compiler Backend IR – Our Input
v Variable home location
  » Frontend – every variable in memory
  » Backend – maximal but safe register promotion
    - All temporaries put into registers
    - All local scalars put into registers, except those accessed via &
    - All globals, local arrays/structs, and unpromotable local scalars put in memory; accessed via load/store
v Backend IR (intermediate representation)
  » Machine-independent assembly code – really resource independent!
  » Aka RTL (register transfer language), 3-address code
  » r1 = r2 + r3, or equivalently add r1, r2, r3
    - Opcode (add, sub, load, …)
    - Operands
      - Virtual registers – infinite number of these
      - Literals – compile-time constants
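As a concrete illustration of 3-address code over virtual registers, here is a minimal Python sketch that lowers an expression tree into RTL-style instructions. The `Insn` class, `vregs` generator, and `lower` helper are my own illustrative names, not part of LLVM or any real compiler:

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class Insn:
    """One RTL-style 3-address instruction: dest = opcode(srcs)."""
    opcode: str
    dest: str
    srcs: tuple

vregs = (f"r{i}" for i in count(1))  # "infinite" supply of virtual registers

def lower(expr, out):
    """Recursively lower an expression tree into 3-address code.

    Leaves are variable names; interior nodes are (opcode, left, right).
    Returns the register (or variable) holding the expression's value.
    """
    if isinstance(expr, str):          # already a variable/register
        return expr
    op, left, right = expr
    l = lower(left, out)
    r = lower(right, out)
    dest = next(vregs)                 # fresh virtual register for the result
    out.append(Insn(op, dest, (l, r)))
    return dest

# Lower a + b * c: the multiply lands in a fresh vreg, which feeds the add.
code = []
lower(("add", "a", ("mul", "b", "c")), code)
for insn in code:
    print(f"{insn.dest} = {insn.opcode} {', '.join(insn.srcs)}")
# → r1 = mul b, c
# → r2 = add a, r1
```

Note how each intermediate value gets its own fresh virtual register, exactly the "infinite number of these" property the slide mentions; register allocation later maps these down to the finite physical set.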

Control Flow
v Control transfer = branch (taken or fall-through)
v Control flow
  » Branching behavior of an application
  » What sequences of instructions can be executed
v Execution → dynamic control flow
  » Direction of a particular instance of a branch
  » Predict, speculate, squash, etc.
v Compiler → static control flow
  » Not executing the program
  » Input not known, so what could happen?
v Control flow analysis
  » Determining properties of the program's branch structure
  » Determining instruction execution properties

Basic Block (BB)
v Group operations into units with equivalent execution conditions
v Defn: Basic block – a sequence of consecutive operations in which flow of control enters at the beginning and leaves at the end without halt or possibility of branching except at the end
  » Straight-line sequence of instructions
  » If one operation in a BB is executed, they all are
v Finding BBs
  » The first operation starts a BB
  » Any operation that is the target of a branch starts a BB
  » Any operation that immediately follows a branch starts a BB

Identifying BBs – Example

L1:  r7 = load(r8)
L2:  r1 = r2 + r3
L3:  beq r1, 0, L10
L4:  r4 = r5 * r6
L5:  r1 = r1 + 1
L6:  beq r1, 100, L3
L7:  beq r2, 100, L10
L8:  r5 = r9 + 1
L9:  jump L2
L10: r9 = load(r3)
L11: store(r9, r1)

??
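The three leader rules from the previous slide can be applied mechanically. Below is a small Python sketch (the `(label, branch_target)` instruction encoding and helper names are my own, for illustration) run on this exact example; note that the unconditional `jump` counts as a branch for both the target and follows-a-branch rules:

```python
def find_leaders(insns):
    """Apply the three leader rules: first op, branch targets, ops after branches.

    Each instruction is a (label, branch_target) pair; branch_target is None
    for non-branch operations.
    """
    leaders = {insns[0][0]}                      # rule 1: first operation
    for i, (_label, target) in enumerate(insns):
        if target is not None:
            leaders.add(target)                  # rule 2: branch target
            if i + 1 < len(insns):
                leaders.add(insns[i + 1][0])     # rule 3: op after a branch
    return leaders

def basic_blocks(insns):
    """Group instructions into maximal straight-line runs starting at each leader."""
    leaders, blocks, cur = find_leaders(insns), [], []
    for label, _target in insns:
        if label in leaders and cur:
            blocks.append(cur)
            cur = []
        cur.append(label)
    blocks.append(cur)
    return blocks

# The slide's example: L3, L6, L7 are conditional branches; L9 is a jump.
insns = [("L1", None), ("L2", None), ("L3", "L10"), ("L4", None),
         ("L5", None), ("L6", "L3"), ("L7", "L10"), ("L8", None),
         ("L9", "L2"), ("L10", None), ("L11", None)]
print(basic_blocks(insns))
# → [['L1'], ['L2'], ['L3'], ['L4', 'L5', 'L6'], ['L7'], ['L8', 'L9'], ['L10', 'L11']]
```

So the leaders are L1, L2, L3, L4, L7, L8, and L10, giving seven basic blocks; even single instructions like L2 and L3 form their own blocks because each is a branch target.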

Control Flow Graph (CFG)
v Defn: Control flow graph – directed graph G = (V, E) where each vertex in V is a basic block and there is an edge in E from v1 (BB1) to v2 (BB2) if BB2 can immediately follow BB1 in some execution sequence
  » A BB has an edge to all blocks it can branch to
  » Standard representation used by many compilers
  » Often have 2 pseudo vertices
    - entry node
    - exit node

(Diagram: example CFG with Entry, BB1–BB7, and Exit nodes)

CFG Example

Source:
x = z - 2;
y = 2 * z;
if (c) {
  x = x + 1;
  y = y + 1;
} else {
  x = x - 1;
  y = y - 1;
}
z = x + y;

CFG:
B1: x = z - 2; y = 2 * z; if (c) goto B2 (then, taken) else fall through to B3
B2: x = x + 1; y = y + 1; goto B4
B3: x = x - 1; y = y - 1; (fall through to B4)
B4: z = x + y
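This CFG is easy to represent with successor lists. The sketch below (class and method names are my own, purely illustrative) wires up the B1–B4 example with the Entry/Exit pseudo nodes from the previous slide:

```python
from collections import defaultdict

class CFG:
    """Minimal CFG: successor adjacency lists over basic-block names."""
    def __init__(self):
        self.succs = defaultdict(list)

    def add_edge(self, src, dst):
        self.succs[src].append(dst)

    def preds(self, node):
        """Predecessors, derived by scanning the successor lists."""
        return [s for s in self.succs if node in self.succs[s]]

cfg = CFG()
for src, dst in [("Entry", "B1"), ("B1", "B2"), ("B1", "B3"),
                 ("B2", "B4"), ("B3", "B4"), ("B4", "Exit")]:
    cfg.add_edge(src, dst)

print(cfg.succs["B1"])   # → ['B2', 'B3']
print(cfg.preds("B4"))   # → ['B2', 'B3']
```

Production compilers typically store predecessor lists explicitly rather than rescanning as `preds` does here, since dataflow analyses query them constantly.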

Weighted CFG
v Profiling – run the application on 1 or more sample inputs, record some behavior
  » Control flow profiling
    - edge profile
    - block profile
  » Path profiling
  » Cache profiling
  » Memory dependence profiling
v Annotate the control flow profile onto a CFG → weighted CFG
v Optimize more effectively with profile info!!
  » Optimize for the common case
  » Make educated guesses

(Diagram: the example CFG annotated with edge weights)
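Edge profiling boils down to counting how often each (src, dst) block pair executes back-to-back. A sketch under made-up data: the three `runs` below are hypothetical block traces over the earlier B1–B4 example, not real profile output:

```python
from collections import Counter

# Hypothetical block traces from three profiled runs of the B1-B4 example.
runs = [
    ["Entry", "B1", "B2", "B4", "Exit"],   # branch taken
    ["Entry", "B1", "B3", "B4", "Exit"],   # fall-through
    ["Entry", "B1", "B2", "B4", "Exit"],   # branch taken
]

# Count each (src, dst) edge traversal; counting per run avoids creating a
# bogus Exit -> Entry edge between runs.
edge_weights = Counter()
for run in runs:
    edge_weights.update(zip(run, run[1:]))

print(edge_weights[("B1", "B2")])  # → 2
print(edge_weights[("B1", "B3")])  # → 1
```

Annotating these counts onto the CFG tells the optimizer that B1's branch is taken 2 out of 3 times, so B1→B2→B4 is the common-case path worth optimizing.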

Dominator (DOM)
v Defn: Dominator – given a CFG (V, E, Entry, Exit), a node x dominates a node y if every path from the Entry block to y contains x
v 3 properties of dominators
  » Each BB dominates itself
  » If x dominates y, and y dominates z, then x dominates z
  » If x dominates z and y dominates z, then either x dominates y or y dominates x
v Intuition
  » Given some BB, which blocks are guaranteed to have executed prior to executing that BB?
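Dominator sets can be computed with the classic iterative dataflow algorithm: initialize every set to all nodes, then repeatedly apply dom(n) = {n} ∪ (the intersection of dom(p) over all predecessors p) until nothing changes. A Python sketch over the earlier B1–B4 example, with the predecessor map written out by hand (names illustrative):

```python
def dominators(nodes, preds, entry):
    """Iterative dominator computation: dom(n) = {n} | intersection of dom(p)."""
    dom = {n: set(nodes) for n in nodes}   # start from "everything dominates n"
    dom[entry] = {entry}                   # only Entry dominates Entry
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

nodes = ["Entry", "B1", "B2", "B3", "B4", "Exit"]
preds = {"B1": ["Entry"], "B2": ["B1"], "B3": ["B1"],
         "B4": ["B2", "B3"], "Exit": ["B4"]}
dom = dominators(nodes, preds, "Entry")
print(sorted(dom["B4"]))  # → ['B1', 'B4', 'Entry']
```

The intersection at B4 is what captures the intuition: B2 and B3 each reach B4 on only one path, so neither dominates it, while Entry and B1 lie on every path and do.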

Dominator Examples

(Diagrams: two example CFGs – one with Entry, BB1–BB4, and Exit; one with Entry, BB1–BB7, and Exit – for working out dominator sets)

If You Want to Get Started …
v Go to the LLVM webpage
v Download and install LLVM on your favorite Linux box
  » Read the installation instructions to help you
  » Will need gcc 4.x
v Try to run it on a simple C program
v This will be the first part of HW 1, which goes out next week
v We will have 3 machines for the class to use
  » Andrew, hugo, wilma – will be available next week