CMPUT 229 - Computer Organization and Architecture I, Fall 2003. Topic A: Flow Analysis. José Nelson Amaral.


Reading Material

The concepts necessary for flow analysis, such as basic blocks, control flow graphs, and data dependence graphs, are presented in several compiler textbooks (most of these books are available in the UofA library). For instance:

Randy Allen, Ken Kennedy, Optimizing Compilers for Modern Architectures: A Dependence-based Approach, Morgan Kaufmann
Andrew W. Appel, Modern Compiler Implementation in C
A. Aho, R. Sethi and J. Ullman, Compilers: Principles, Techniques and Tools (The Dragon Book), Addison Wesley, 1988
M. Wolfe, High Performance Compilers for Parallel Computing, Addison Wesley, 1995
S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann, 1997

Section 6.4 (p. 476) of Patterson-Hennessy has a brief discussion of data dependences.

Analysing Code

Given the code for Panic in the previous slide, what is the best way to analyse it? Compilers use the notion of a basic block.

A basic block is a sequence of instructions with the following property: whenever one instruction of the basic block is executed, all the instructions in the basic block must be executed. That is, only the first instruction of a basic block can be the target of a jump or branch, and only the last instruction in a basic block can be a jump or a branch.

Finding Basic Blocks

The first instruction of a basic block is the leader of the basic block. Given a sequence of assembly code, compilers can find the leaders of basic blocks using a very simple set of rules:

(i) The first instruction in the program is a leader.
(ii) Any instruction that is the target of a branch or jump is a leader (in general these instructions have an associated label).
(iii) Any instruction that immediately follows a branch, jump, or return instruction is a leader.
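The three leader rules can be sketched in Python. This is a minimal illustration, not the method of any particular compiler: the tuple encoding of an instruction as (label, opcode, target), the `TRANSFERS` set, and the function name `find_leaders` are assumptions made for this example. The sample data is the first loop of the Panic routine.

```python
# Hypothetical encoding: (label or None, opcode, branch/jump target or None).
# TRANSFERS lists the opcodes treated as branches, jumps, or returns (an assumption).
TRANSFERS = {"beq", "bne", "bge", "j", "jr"}


def find_leaders(code):
    """Return the sorted indices of the leaders in `code`."""
    label_index = {label: i for i, (label, _, _) in enumerate(code) if label}
    leaders = set()
    if code:
        leaders.add(0)                            # rule (i): first instruction
    for i, (_, opcode, target) in enumerate(code):
        if opcode in TRANSFERS:
            if target in label_index:
                leaders.add(label_index[target])  # rule (ii): target of a branch
            if i + 1 < len(code):
                leaders.add(i + 1)                # rule (iii): follows a branch
    return sorted(leaders)


PANIC_LOOP = [
    ("Panic",  "la",   None),
    ("PRead1", "lb",   None),
    (None,     "beq",  "PRead2"),
    ("PWait1", "lw",   None),
    (None,     "bge",  "PWait1"),
    (None,     "sw",   None),
    (None,     "addi", None),
    (None,     "j",    "PRead1"),
    ("PRead2", "lb",   None),
]

leaders = find_leaders(PANIC_LOOP)
```

For this fragment the leaders are the instructions at indices 0, 1, 3, 5, and 8: Panic, PRead1, PWait1, the sw after the bge, and PRead2.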

Identify the leaders

Exception Handler:

DisplayData   = 0xbfff0008
DisplayStatus = 0xbfff000c
           .kdata
Pmess:     .asciiz "Panic:"
           .ktext
# Panic prints a message and quits
Panic:     la   $a1, Pmess
PRead1:    lb   $a2, ($a1)              # read letter to print
           beq  $a2, $zero, PRead2      # done when we find a null
PWait1:    lw   $a3, DisplayStatus      # Read the status of the display
           bge  $a3, $zero, PWait1      # keep reading until it is ready
           sw   $a2, DisplayData        # output character
           addi $a1, $a1, 1             # advance character
           j    PRead1
PRead2:    lb   $a2, ($a0)              # Print message pointed by $a0
           beq  $a2, $zero, Pcontinue   # done when we find a null
PWait2:    lw   $a3, DisplayStatus      # Read the status of the display
           bge  $a3, $zero, PWait2      # keep reading until it is ready
           sw   $a2, DisplayData        # output character
           addi $a0, $a0, 1             # advance character
           j    PRead2
Pcontinue: li   $v0, 0                  # clear re-entrance flag
           sw   $v0, flag
           li   $v0, 13                 # the quit_now syscall
           syscall


Basic Block Formation Rule

Once we know the leaders, the basic block formation follows a simple rule: a basic block is formed by a leader and all the instructions that come after the leader up to, but not including, the next leader.
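The formation rule amounts to slicing the instruction list at each leader. The sketch below assumes the leader indices are already known and sorted; the function name `form_basic_blocks` and the opcode-only instruction list are hypothetical simplifications.

```python
def form_basic_blocks(instructions, leaders):
    """Split `instructions` into basic blocks, given the sorted leader indices.

    Each block runs from one leader up to, but not including, the next leader.
    """
    bounds = list(leaders) + [len(instructions)]
    return [instructions[start:end] for start, end in zip(bounds, bounds[1:])]


# Opcodes of the first loop of Panic; leaders are at indices 0, 1, 3, and 5.
code = ["la", "lb", "beq", "lw", "bge", "sw", "addi", "j"]
blocks = form_basic_blocks(code, [0, 1, 3, 5])
```

Here `blocks` holds four lists, corresponding to the first four basic blocks of the Panic routine.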

Block B1
Panic:     la   $a1, Pmess

Block B2
PRead1:    lb   $a2, ($a1)
           beq  $a2, $zero, PRead2

Block B3
PWait1:    lw   $a3, DisplayStatus
           bge  $a3, $zero, PWait1

Block B4
           sw   $a2, DisplayData
           addi $a1, $a1, 1
           j    PRead1

Block B5
PRead2:    lb   $a2, ($a0)
           beq  $a2, $zero, Pcontinue

Block B6
PWait2:    lw   $a3, DisplayStatus
           bge  $a3, $zero, PWait2

Block B7
           sw   $a2, DisplayData
           addi $a0, $a0, 1
           j    PRead2

Block B8
Pcontinue: li   $v0, 0
           sw   $v0, flag
           li   $v0, 13
           syscall

Now that we have the basic blocks, we can connect them using two simple rules:

(1) Connect Bi to Bj if there is a branch or jump from the last instruction of Bi to the first instruction of Bj.
(2) Connect Bi to Bj if both:
    (i) Bj immediately follows Bi, and
    (ii) Bi does not end with an unconditional jump.
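The two connection rules can be sketched as follows. The block summary used here, a tuple of (name, leader label, last opcode, target label), is an assumption made for the example, as are the names `build_cfg` and `UNCONDITIONAL`.

```python
# Opcodes treated as unconditional jumps (an assumption for this sketch).
UNCONDITIONAL = {"j"}


def build_cfg(blocks):
    """blocks: list of (name, leader_label, last_opcode, target_label) in program order.

    Returns the set of control flow edges (from_block, to_block).
    """
    by_label = {label: name for name, label, _, _ in blocks if label}
    edges = set()
    for i, (name, _, last_op, target) in enumerate(blocks):
        if target is not None and target in by_label:
            edges.add((name, by_label[target]))    # rule (1): branch or jump
        if last_op not in UNCONDITIONAL and i + 1 < len(blocks):
            edges.add((name, blocks[i + 1][0]))    # rule (2): fall through
    return edges


PANIC_BLOCKS = [
    ("B1", "Panic",     "la",      None),
    ("B2", "PRead1",    "beq",     "PRead2"),
    ("B3", "PWait1",    "bge",     "PWait1"),
    ("B4", None,        "j",       "PRead1"),
    ("B5", "PRead2",    "beq",     "Pcontinue"),
    ("B6", "PWait2",    "bge",     "PWait2"),
    ("B7", None,        "j",       "PRead2"),
    ("B8", "Pcontinue", "syscall", None),
]

cfg = build_cfg(PANIC_BLOCKS)
```

The blocks that end in `j` (B4 and B7) get no fall-through edge, and B8 has no successor because nothing follows it.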

The graph that connects the basic blocks in this way is called the Control Flow Graph for the program.

[Figure: control flow graph with nodes B1 through B8.] Applying the two rules to the basic blocks of Panic gives the following edges:

B1 -> B2
B2 -> B3, B5
B3 -> B3, B4
B4 -> B2
B5 -> B6, B8
B6 -> B6, B7
B7 -> B5
B8 has no successors.

Apply the basic block formation algorithm that you just learned to the matrix multiplication code from Topic 7.

MIPS assembly:

          li     $t1, 32            # $t1 ← 32
          li     $s0, 0             # i ← 0
L1:       li     $s1, 0             # j ← 0
L2:       mtc1   $zero, $f4
          mtc1   $zero, $f5
          li     $s2, 0             # k ← 0
L3:       sll    $t2, $s0, 5        # $t2 ← 32 × i
          addu   $t2, $t2, $s2      # $t2 ← 32 × i + k
          sll    $t2, $t2, 3        # $t2 ← (32 × i + k) × 8
          addu   $t2, $a1, $t2      # $t2 ← Addr(y[i][k])
          l.d    $f16, 0($t2)       # $f16 ← y[i][k]
          sll    $t2, $s2, 5        # $t2 ← 32 × k
          addu   $t2, $t2, $s1      # $t2 ← 32 × k + j
          sll    $t2, $t2, 3        # $t2 ← (32 × k + j) × 8
          addu   $t2, $a2, $t2      # $t2 ← Addr(z[k][j])
          l.d    $f18, 0($t2)       # $f18 ← z[k][j]
          mul.d  $f16, $f18, $f16   # $f16 ← y[i][k] × z[k][j]
          add.d  $f4, $f4, $f16
          addiu  $s2, $s2, 1        # k ← k + 1
          bne    $s2, $t1, L3
          sll    $t2, $s0, 5        # $t2 ← 32 × i
          addu   $t2, $t2, $s1      # $t2 ← 32 × i + j
          sll    $t2, $t2, 3        # $t2 ← (32 × i + j) × 8
          addu   $t2, $a0, $t2      # $t2 ← Addr(x[i][j])
          swc1   $f4, 0($t2)        # x[i][j] ← $f4
          swc1   $f5, 4($t2)
          addiu  $s1, $s1, 1        # j ← j + 1
          bne    $s1, $t1, L2
          addiu  $s0, $s0, 1        # i ← i + 1
          bne    $s0, $t1, L1

Applying the basic block formation algorithm to the matrix multiplication code above gives six basic blocks:

B0: the two li instructions before L1
B1: the li instruction at L1
B2: from L2 up to and including li $s2, 0
B3: from L3 up to and including bne $s2, $t1, L3
B4: from the sll that follows that branch up to and including bne $s1, $t1, L2
B5: the final addiu and bne $s0, $t1, L1

Build a Control Flow Graph for this code.

[Figure: control flow graph with nodes B0 through B5.] With B1, B2, and B3 starting at labels L1, L2, and L3, the connection rules give the following edges:

B0 -> B1
B1 -> B2
B2 -> B3
B3 -> B3, B4
B4 -> B2, B5
B5 -> B1, or falls out of the loop nest when i reaches 32

Superscalar Pipelined Machines

[Figure not reproduced. Copyright 1998 Morgan Kaufmann Publishers, Inc. All rights reserved.]

Data Dependencies

We say that there is a data dependence between two instructions if it is not possible to invert the order of execution of the two instructions without producing wrong results.

          l.d    $f18, 0($t2)       # $f18 ← z[k][j]
          mul.d  $f16, $f18, $f16   # $f16 ← y[i][k] × z[k][j]

When the first instruction computes a value that the second instruction uses, we say that there is a flow dependence from the first to the second instruction. For instance, in the sequence of instructions above, the value of $f18 is computed by the load and used by the multiply instruction. Therefore there is a flow dependence from the load to the multiply.

Data Flow Graphs

Compilers often have to take into consideration the dependences between instructions. In their data flow analysis, compilers typically build a data flow graph, also called a data dependence graph, for each basic block of the program.

A data flow graph is a directed graph with one node for each instruction in the basic block. An edge (vi, vj) in the data flow graph indicates that there is a flow dependence from vi to vj.
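One way to collect the flow-dependence edges of a basic block is to remember, for every register, which instruction wrote it last. The sketch below is an illustration under assumptions: each instruction is summarised as a hypothetical tuple (name, destination register, source registers), memory dependences are ignored, and only dependences inside the block are recorded.

```python
def flow_dependences(block):
    """block: list of (name, dest_register or None, [source registers]).

    Returns the flow-dependence edges (producer, consumer) in discovery order.
    """
    last_writer = {}   # register -> name of the instruction that last wrote it
    edges = []
    for name, dest, sources in block:
        for reg in sources:
            if reg in last_writer:
                edge = (last_writer[reg], name)
                if edge not in edges:
                    edges.append(edge)
        if dest is not None:
            last_writer[dest] = name
    return edges


# The floating-point instructions of the inner loop of the matrix multiplication.
EXAMPLE = [
    ("k", "$f16", ["$t2"]),            # l.d   $f16, 0($t2)
    ("p", "$f18", ["$t2"]),            # l.d   $f18, 0($t2)
    ("q", "$f16", ["$f18", "$f16"]),   # mul.d $f16, $f18, $f16
    ("r", "$f4",  ["$f4", "$f16"]),    # add.d $f4, $f4, $f16
]

ddg_edges = flow_dependences(EXAMPLE)
```

Registers that are read before any instruction in the block writes them ($t2 and $f4 here) produce no edge; they are inputs to the block.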

Build a Data Dependence Graph for Basic Block B3

MIPS assembly, with each instruction identified by a letter:

a            li     $t1, 32            # $t1 ← 32
b            li     $s0, 0             # i ← 0
c    L1:     li     $s1, 0             # j ← 0
d    L2:     mtc1   $zero, $f4
e            mtc1   $zero, $f5
f            li     $s2, 0             # k ← 0
g    L3:     sll    $t2, $s0, 5        # $t2 ← 32 × i
h            addu   $t2, $t2, $s2      # $t2 ← 32 × i + k
i            sll    $t2, $t2, 3        # $t2 ← (32 × i + k) × 8
j            addu   $t2, $a1, $t2      # $t2 ← Addr(y[i][k])
k            l.d    $f16, 0($t2)       # $f16 ← y[i][k]
l            sll    $t2, $s2, 5        # $t2 ← 32 × k
m            addu   $t2, $t2, $s1      # $t2 ← 32 × k + j
n            sll    $t2, $t2, 3        # $t2 ← (32 × k + j) × 8
o            addu   $t2, $a2, $t2      # $t2 ← Addr(z[k][j])
p            l.d    $f18, 0($t2)       # $f18 ← z[k][j]
q            mul.d  $f16, $f18, $f16   # $f16 ← y[i][k] × z[k][j]
r            add.d  $f4, $f4, $f16
s            addiu  $s2, $s2, 1        # k ← k + 1
t            bne    $s2, $t1, L3
u            sll    $t2, $s0, 5        # $t2 ← 32 × i
v            addu   $t2, $t2, $s1      # $t2 ← 32 × i + j
w            sll    $t2, $t2, 3        # $t2 ← (32 × i + j) × 8
x            addu   $t2, $a0, $t2      # $t2 ← Addr(x[i][j])
y            swc1   $f4, 0($t2)        # x[i][j] ← $f4
z            swc1   $f5, 4($t2)
a1           addiu  $s1, $s1, 1        # j ← j + 1
b1           bne    $s1, $t1, L2
c1           addiu  $s0, $s0, 1        # i ← i + 1
c2           bne    $s0, $t1, L1

Basic block B3 consists of instructions g through t.

Build a Data Dependence Graph for Basic Block B3

[Figure: data dependence graph for B3, with one node for each of the instructions g through t and the input registers $s0, $s2, $a1, $s1, $a2, $f4, and $t1.]

Analysing the DDG, it seems that instructions g-h-i-j-k could be executed at the same time as l-m-n-o-p. But in the assembly code, there is a conflict: both sequences use register $t2. How do compilers deal with a situation like this?

Using pseudo-registers.

Compilers rename the registers, generating code with pseudo-registers. For the first code generation, they assume that there is an unlimited number of pseudo-registers. Basic block B3 with renamed registers (the instructions outside B3 are unchanged):

g    L3:     sll    $p2, $s0, 5
h            addu   $p3, $p2, $s2
i            sll    $p4, $p3, 3
j            addu   $p5, $a1, $p4
k            l.d    $f16, 0($p5)
l            sll    $p6, $s2, 5
m            addu   $p7, $p6, $s1
n            sll    $p8, $p7, 3
o            addu   $p9, $a2, $p8
p            l.d    $f18, 0($p9)
q            mul.d  $pf17, $f18, $f16
r            add.d  $f4, $f4, $pf17
s            addiu  $p10, $s2, 1
t            bne    $p10, $t1, L3
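Renaming inside a basic block can be sketched by giving every newly written value a fresh pseudo-register and redirecting later reads to it. This is a simplified illustration, not the algorithm of any particular compiler: it renames every destination, it does not handle values that must remain in their original register at the end of the block, and the tuple encoding (opcode, destination, sources) and the name `rename_registers` are assumptions.

```python
def rename_registers(block, prefix="$p", first=2):
    """block: list of (opcode, dest or None, [source registers]).

    Returns a copy in which each written value gets a fresh pseudo-register.
    """
    current = {}       # original register -> pseudo-register holding its latest value
    counter = first
    renamed = []
    for opcode, dest, sources in block:
        # Reads refer to the most recent pseudo-register, if the value was renamed.
        new_sources = [current.get(reg, reg) for reg in sources]
        new_dest = dest
        if dest is not None:
            new_dest = f"{prefix}{counter}"
            counter += 1
            current[dest] = new_dest
        renamed.append((opcode, new_dest, new_sources))
    return renamed


# Integer address computations from the start of the inner loop (g, h, i, j, l).
ADDRESS_CODE = [
    ("sll",  "$t2", ["$s0"]),
    ("addu", "$t2", ["$t2", "$s2"]),
    ("sll",  "$t2", ["$t2"]),
    ("addu", "$t2", ["$a1", "$t2"]),
    ("sll",  "$t2", ["$s2"]),
]

renamed = rename_registers(ADDRESS_CODE)
```

After renaming, the last sll no longer writes the register that the earlier addu chain reads, so the two address computations no longer conflict over $t2.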

Value of Flow Analysis

Although flow analysis was developed for code analysis during compilation, it is of great value while coding and debugging programs. Often, the analysis of the control flow and data flow in a program will expose subtle bugs that might otherwise be difficult to uncover. Control flow analysis is especially helpful for the analysis of assembly code, in which the control structure of the code is not as evident as in higher level languages.

Domination Relation

Definition: Let G = (N, E, s, f) denote a flowgraph, where:
N: set of vertices
E: set of edges
s: starting node
f: sink node
and let a ∈ N, b ∈ N.

1. a dominates b if every path from s to b contains a.
2. b post-dominates a if every path from a to f contains b.
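A common way to compute dominators is an iterative fixed point: the start node is dominated only by itself, and every other node is dominated by itself plus whatever dominates all of its predecessors. The sketch below uses that standard technique, which these slides do not present; the five-node flowgraph is hypothetical, and every node is assumed to be reachable from the start node.

```python
def dominator_sets(nodes, edges, start):
    """Return {n: set of nodes that dominate n} for a flowgraph."""
    preds = {n: set() for n in nodes}
    for a, b in edges:
        preds[b].add(a)

    # Start from the most pessimistic answer and shrink it until nothing changes.
    dom = {n: set(nodes) for n in nodes}
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == start:
                continue
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Hypothetical flowgraph: a diamond (2 -> 3, 2 -> 4, both -> 5) with a back edge 5 -> 2.
NODES = [1, 2, 3, 4, 5]
EDGES = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5), (5, 2)]

dom = dominator_sets(NODES, EDGES, start=1)
```

In this graph node 5 can be reached through either 3 or 4, so neither of them dominates it; its dominators are 1, 2, and 5 itself.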

An Example

[Figure: example flowgraph, not reproduced.]

Domination relation: { (1, 1), (1, 2), (1, 3), (1, 4), …, (2, 3), (2, 4), …, (2, 10) }

Dominator sets:
DOM(1) = {1}
DOM(2) = {1, 2}
DOM(3) = {1, 2, 3}
DOM(10) = {1, 2, 10}

Dominance Intuition

Imagine a source of light at the start node, and that the edges are optical fibers. To find which nodes are dominated by a given node, place an opaque barrier at that node and observe which nodes become dark.
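The light-and-barrier picture translates directly into two reachability searches: one with no barrier and one that refuses to pass through the barrier node. The nodes lit in the first search but dark in the second are the ones the barrier node dominates. The function name `dominated_by` and the five-node flowgraph below are hypothetical.

```python
def dominated_by(edges, start, barrier):
    """Return the nodes that go dark when an opaque barrier is placed at `barrier`."""
    succs = {}
    for a, b in edges:
        succs.setdefault(a, []).append(b)

    def reachable(blocked):
        # Depth-first search from the start node that never enters `blocked`.
        seen = set()
        stack = [start]
        while stack:
            n = stack.pop()
            if n in seen or n == blocked:
                continue
            seen.add(n)
            stack.extend(succs.get(n, []))
        return seen

    return reachable(None) - reachable(barrier)


# Hypothetical flowgraph: a diamond (2 -> 3, 2 -> 4, both -> 5) with a back edge 5 -> 2.
EDGES = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5), (5, 2)]

dark_at_2 = dominated_by(EDGES, start=1, barrier=2)
dark_at_3 = dominated_by(EDGES, start=1, barrier=3)
```

A barrier at node 2 darkens everything except the start node, while a barrier at node 3 darkens only node 3, because light still reaches node 5 through node 4.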

The start node dominates all nodes in the flowgraph.

Which nodes are dominated by node 3? Node 3 dominates nodes 3, 4, 5, 6, 7, 8, and 9.

Which nodes are dominated by node 7? Node 7 only dominates itself.

Live Values

When allocating registers for a basic block, a compiler needs to compute which values are live at the entrance of the basic block. We say that a value is live at a point of a program if there is a possibility that the value will be used later in the program.

Which registers contain a live value at the entrance of the following basic block?

          lw     $t8, 0($t6)
          addu   $t6, $t6, 12
          sw     $t8, 0($t9)
          lw     $t7, -8($t6)
          addu   $t9, $t9, 12
          sw     $t7, -8($t9)
          lw     $t8, -4($t6)
          sw     $t8, -4($t9)

Registers $t6 and $t9 contain live values: both are read before any instruction in the block writes them. Registers $t7 and $t8 are written before they are read, so the values they hold at the entrance are not used by this block.
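Liveness at the entrance of a block can be computed by walking the block backwards: a write kills a register, and a read makes it live. The sketch below is a minimal illustration; the tuple encoding (destination, sources) and the `live_out` parameter, which stands for registers assumed live after the block, are assumptions made for this example.

```python
def live_at_entry(block, live_out=frozenset()):
    """block: list of (dest_register or None, [source registers]) in program order.

    Returns the set of registers that are live at the entrance of the block.
    """
    live = set(live_out)
    for dest, sources in reversed(block):
        if dest is not None:
            live.discard(dest)     # the old value is overwritten here
        live.update(sources)       # the instruction needs these values
    return live


BLOCK = [
    ("$t8", ["$t6"]),          # lw   $t8, 0($t6)
    ("$t6", ["$t6"]),          # addu $t6, $t6, 12
    (None,  ["$t8", "$t9"]),   # sw   $t8, 0($t9)
    ("$t7", ["$t6"]),          # lw   $t7, -8($t6)
    ("$t9", ["$t9"]),          # addu $t9, $t9, 12
    (None,  ["$t7", "$t9"]),   # sw   $t7, -8($t9)
    ("$t8", ["$t6"]),          # lw   $t8, -4($t6)
    (None,  ["$t8", "$t9"]),   # sw   $t8, -4($t9)
]

live = live_at_entry(BLOCK)
```

With an empty `live_out`, the result contains exactly the registers that the block reads before writing.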