Announcements / MP 3: CS296 (Chase Geigle, geigle1@illinois.edu)

Floating Point Numbers
How can we represent 3.14? What's wrong with a pair (int_part, frac_part)? Then 3.14 and 3.014 have the same representation, (3, 14)! The leading-zeroes problem can be solved if numbers are normalized:
- write the number in the form d.f × 10^e, where d is a single non-zero digit
- normalized(3.14) = 3.14 × 10^0, normalized(0.314) = 3.14 × 10^-1
In binary, the "d" part will always be 1 (zero is a special case), so this implicit 1 can be ignored. An ideal representation scheme has these features:
- it can represent positive and negative, low and high magnitude numbers
- it is easy to compare two numbers
- it is easy to do basic math
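To make normalization concrete, here is a minimal C sketch (my illustration, not from the lecture; the helper name normalize10 is invented) that rewrites x as m × 10^e with a single nonzero digit before the point. Compile with -lm.

    #include <math.h>
    #include <stdio.h>

    /* Decimal normalization sketch: rewrite x as m * 10^e with 1 <= |m| < 10,
       so 3.14 and 0.314 get distinct, unambiguous representations.
       Zero is a special case (as the slide notes) and is not handled here. */
    static void normalize10(double x, double *m, int *e) {
        *e = (int)floor(log10(fabs(x)));   /* exponent of the leading digit */
        *m = x / pow(10.0, *e);            /* single nonzero digit before the point */
    }

    int main(void) {
        double m; int e;
        normalize10(3.14, &m, &e);  printf("3.14  = %g x 10^%d\n", m, e);
        normalize10(0.314, &m, &e); printf("0.314 = %g x 10^%d\n", m, e);
        return 0;
    }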

IEEE 754 standard
Format for single-precision (32-bit) and double-precision (64-bit) reals. The normalized (non-zero) binary number ±1.f × 2^e is stored as:
- single precision (float): 1 sign bit (1 = negative, 0 = positive), 8-bit exponent e in excess-127 notation, 23-bit fraction f
- double precision (double): 1 sign bit (1 = negative, 0 = positive), 11-bit exponent e in excess-1023 notation, 52-bit fraction f
Comparison of floats is almost identical to comparison of ints! MIPS has separate floating point registers and instructions.
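The single-precision layout can be inspected directly. A small C sketch (again my illustration, not the lecture's) that pulls a float apart into its three fields:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* IEEE 754 single precision: bit 31 = sign, bits 30..23 = exponent
       (excess-127), bits 22..0 = fraction. */
    int main(void) {
        float f = -3.14f;
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);        /* reinterpret the 32 bits */

        unsigned sign = bits >> 31;            /* 1 = negative */
        unsigned expo = (bits >> 23) & 0xFF;   /* stored (biased) exponent */
        unsigned frac = bits & 0x7FFFFF;       /* 23-bit fraction */

        printf("sign=%u exponent=%u (unbiased %d) fraction=0x%06X\n",
               sign, expo, (int)expo - 127, frac);
        return 0;
    }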

Instruction Set Architecture (ISA)
The ISA is an abstraction layer between hardware and software:
- Software doesn't need to know how the processor is implemented
- Processors that implement the same ISA appear equivalent
An ISA enables processor innovation without changing software; this is how Intel has made billions of dollars. Before ISAs, software was re-written/re-compiled for each new machine.
[Figure: software sits on top of the ISA, which is implemented by both Proc #1 and Proc #2.]

ISA history: RISC vs. CISC
- 1964: IBM System/360, the first computer family. IBM wanted to sell a range of machines that ran the same software.
- 1960s, 1970s: Complex Instruction Set Computer (CISC) era. Much assembly programming, immature compiler technology; hard to optimize, to guarantee correctness, and to teach.
- 1980s: Reduced Instruction Set Computer (RISC) era. Most programming in high-level languages, mature compilers; simpler, cleaner ISAs facilitated pipelining and high clock frequencies.
- 1990s: Post-RISC era. ISA compatibility outweighs any RISC advantage in general-purpose chips; CISC and RISC chips use the same techniques (pipelining, superscalar, ...). Embedded processors prefer RISC for lower power and cost.
- 2000s: Multi-core era.

Comparing x86 and MIPS
x86 is a typical CISC ISA; MIPS is a typical RISC ISA. Much more is similar than different:
- Both use registers and have byte-addressable memories
- Same basic types of instructions (arithmetic, branches, memory)
A few of the differences in x86:
- Fewer registers: 8 (vs. 32 for MIPS)
- 2-register instruction formats (vs. 3-register formats for MIPS)
- Additional, complex addressing modes
- Variable-length instruction encoding (vs. fixed 32-bit length for MIPS)

Why did Intel win?
x86 won because it was the first 16-bit chip by two years; IBM put it in PCs because there was no competing choice. The rest is inertia and "financial feedback": x86 is the most difficult ISA to implement for high performance, but:
- Because Intel sells the most processors ...
- It has the most money ...
- Which it uses to hire more and better engineers ...
- Which it uses to maintain competitive performance ...
- And given equal performance, compatibility wins ...
- So Intel sells the most processors!

The compilation process
- To produce assembly code: gcc -S test.c produces test.s
- To produce object code: gcc -c test.c produces test.o
- To produce executable code: gcc test.c produces a.out
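For instance, with a throwaway test.c like the sketch below (the file contents are invented; only the gcc invocations come from the slide), you can watch each stage of the pipeline:

    /* test.c -- a minimal program for exercising the gcc pipeline.
     *
     *   gcc -S test.c   # compile only: assembly in test.s
     *   gcc -c test.c   # compile + assemble: object code in test.o
     *   gcc test.c      # compile + assemble + link: executable a.out
     */
    int add(int a, int b) { return a + b; }

    int main(void) { return add(40, 2); }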

The purpose of a linker
The linker is a program that takes one or more object files and combines them into a single executable program. It resolves references to undefined symbols by finding which other object file defines the symbol in question, and replaces each placeholder with that symbol's address.
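A minimal sketch of the idea, with two hypothetical files (square.c and main.c are my names, shown together here for brevity):

    /* square.c -- defines the symbol square
     *   gcc -c square.c        # square.o defines square
     */
    int square(int x) { return x * x; }

    /* main.c -- references square without defining it
     *   gcc -c main.c          # main.o leaves square unresolved
     *   nm main.o              # shows "U square" (undefined)
     *   gcc main.o square.o    # linker patches the placeholder with
     *                          # square's final address
     */
    extern int square(int x);

    int main(void) { return square(7); }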

Loader
Before we can start executing a program, the O/S must load it. Loading involves 5 steps:
1. Allocate memory for the program's execution
2. Copy the text and data segments from the executable into memory
3. Copy the program arguments (command-line arguments) onto the stack
4. Initialize the registers: set $sp to point to the top of the stack, clear the rest
5. Jump to the start routine, which (a) copies main's arguments off of the stack, and (b) jumps to main
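The hand-off in steps 3 and 5 is visible from C: by the time main runs, the arguments the loader copied onto the stack arrive as argc/argv (a small illustration of my own, not from the slides):

    #include <stdio.h>

    /* Print the command-line arguments the loader placed on the stack
       and the start routine handed to main. */
    int main(int argc, char *argv[]) {
        for (int i = 0; i < argc; i++)
            printf("argv[%d] = %s\n", i, argv[i]);
        return 0;
    }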

Compiler
Purpose: convert high-level code into low-level assembly. Four key steps: lexing → parsing → code optimizations → code generation (the front-end phases are covered in CS 421 and CS 426). Code generation involves:
- instruction selection (depends on the ISA; easier for RISC or CISC?)
- instruction scheduling (later in the course, and in CS 433)
- register allocation (today's topic)

Register allocation
The compiler initially produces "intermediate" code that assumes an infinite number of registers $t0, $t1, ... and maps each variable to a unique register (a small sketch of this lowering follows below). To get actual code, variables must share registers. Suppose there are only 3 real registers $t1, $t2, $t3. An easy case: every scope defines at most three variables. But scope is an over-estimate, as in this example:

    // a, b, c defined in this scope
    for (int i = 0; i < a; i += b) {
        // stuff
    }
    c = 0;   // c is in scope, but not live, just before this assignment

Live implies in scope, but the converse is not true.
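To make the "infinite registers" idea concrete, here is a hypothetical lowering of a = b + c * d into three-address intermediate code, with C temporaries standing in for $t0, $t1, ... (the function is invented for illustration):

    /* Each operation gets one fresh "register"; no sharing yet. */
    int lower_example(int b, int c, int d) {
        int t0 = c * d;     /* $t0 = c * d   */
        int t1 = b + t0;    /* $t1 = b + $t0 */
        int a  = t1;        /* a   = $t1     */
        return a;
    }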

Live variable analysis
A variable x is live at a point p in the code if:
- x is defined at some point d ≤ p,
- x is read at some point r ≥ p, and
- x is not redefined between p and r
Intuitively, x holds a value that may be needed in the future. Liveness is computed at compile time and may have to over-estimate liveness (for correctness). If at some point p, number_of_live_variables(p) > number_of_registers, we obviously have to spill some variables to memory. Is the converse true? (A backward-scan sketch of liveness follows below.)
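For straight-line code (no branches), liveness can be computed with a single backward scan: live = (live \ def) ∪ uses. The sketch below is my encoding, not the lecture's; each instruction writes one variable and reads up to two, and it runs the scan on the four-instruction snippet analyzed on the next slide:

    #include <stdio.h>

    /* One instruction: writes 'def', reads up to two variables (0 = none).
       Variables are the letters 'a'..'z'; live sets are bitmasks. */
    typedef struct { char def, use1, use2; } Instr;

    #define BIT(v) (1u << ((v) - 'a'))

    int main(void) {
        /* a = 0; b = a + 1; c = b + 1; a = c + 1; */
        Instr code[] = {
            { 'a',  0,  0 },
            { 'b', 'a', 0 },
            { 'c', 'b', 0 },
            { 'a', 'c', 0 },
        };
        int n = sizeof code / sizeof code[0];

        unsigned live = BIT('a');            /* assume a is needed afterwards */
        for (int i = n - 1; i >= 0; i--) {
            live &= ~BIT(code[i].def);       /* the write kills the old value */
            if (code[i].use1) live |= BIT(code[i].use1);
            if (code[i].use2) live |= BIT(code[i].use2);
            printf("live before instruction %d:", i);
            for (char v = 'a'; v <= 'z'; v++)
                if (live & BIT(v)) printf(" %c", v);
            printf("\n");
        }
        return 0;
    }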

Example
Consider the following code snippet:

    a = 0;
    b = a + 1;
    c = b + 1;
    a = c + 1;

We define a graph G where:
- the vertices are the variables
- there is an edge between two variables if their live ranges overlap
We want to assign variables to registers so that two variables that share an edge are not assigned the same register; this is graph coloring. At most two variables are live at any point here, so if we have two registers $t1 and $t2, we can choose a → $t1, b → $t2, c → $t1.

Graph Coloring
Color the vertices of a graph with k colors so that no two neighboring vertices get the same color.
- A tree is always 2-colorable
- A map is always 4-colorable
- There isn't an efficient way to decide whether a graph is 3-colorable unless "P = NP" (the biggest open problem in CS!)
Fortunately, there are some efficient heuristics that can produce a near-optimal coloring; one is sketched below.
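One such heuristic is greedy coloring: visit the vertices in some order and give each one the lowest color not already used by a colored neighbor. A small C sketch (the vertex order and the hard-coded adjacency matrix for the a/b/c example above are my choices):

    #include <stdio.h>

    #define N 3   /* vertices: 0 = a, 1 = b, 2 = c */

    int main(void) {
        /* Interference edges from the example: a-b and b-c. */
        int adj[N][N] = {
            { 0, 1, 0 },
            { 1, 0, 1 },
            { 0, 1, 0 },
        };
        const char *name[N] = { "a", "b", "c" };
        int color[N];

        for (int v = 0; v < N; v++) {
            unsigned used = 0;                  /* colors taken by earlier neighbors */
            for (int u = 0; u < v; u++)
                if (adj[v][u]) used |= 1u << color[u];
            int c = 0;
            while (used & (1u << c)) c++;       /* lowest free color */
            color[v] = c;
            printf("%s -> $t%d\n", name[v], color[v] + 1);
        }
        return 0;
    }

Greedy coloring is not optimal in general (optimal coloring is NP-hard), but with a good visit order it does well in practice; here it reproduces a → $t1, b → $t2, c → $t1.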

Which variables should be spilled?
The register allocation problem asks two key questions:
1. Are we forced to spill variables? Yes, iff min_colors(graph) > number_of_registers.
2. If so, which variables should we spill? The optimum here is also hard to compute.
Before good heuristics for graph coloring existed, register allocation was hard even for RISC architectures with many registers. Now that we have good graph coloring heuristics, the focus shifts to the second problem, which is much more critical on CISC architectures with few registers.