Chapter 2 SPARC Architecture Chinua Umoja 1.  SPARC is a load/store architecture  Registers used for all arithmetic and logical operations  32 registers.

Slides:



Advertisements
Similar presentations
1 Lecture 3: MIPS Instruction Set Today’s topic:  More MIPS instructions  Procedure call/return Reminder: Assignment 1 is on the class web-page (due.
Advertisements

Lecture 6 Programming the TMS320C6x Family of DSPs.
Data Dependencies Describes the normal situation that the data that instructions use depend upon the data created by other instructions, or data is stored.
Mehmet Can Vuran, Instructor University of Nebraska-Lincoln Acknowledgement: Overheads adapted from those provided by the authors of the textbook.
Machine Instructions Operations 1 ITCS 3181 Logic and Computer Systems 2015 B. Wilkinson Slides4-1.ppt Modification date: March 18, 2015.
SPARC Architecture & Assembly Language
Computer Organization and Architecture
CSC 3210 Computer Organization and Programming Introduction and Overview Dr. Anu Bourgeois.
The Little man computer
The CPU Revision Typical machine code instructions Using op-codes and operands Symbolic addressing. Conditional and unconditional branches.
CSCE 121, Sec 200, 507, 508 Fall 2010 Prof. Jennifer L. Welch.
Lecture 5 Sept 14 Goals: Chapter 2 continued MIPS assembly language instruction formats translating c into MIPS - examples.
CPSC355 Week 3 Hoang Dang. Week 3 Assignment 1 Do/while loop Assignment 2 Binary Bitwise operation.
Sparc Architecture Overview
8051 ASSEMBLY LANGUAGE PROGRAMMING
Data Transfer & Decisions I (1) Fall 2005 Lecture 3: MIPS Assembly language Decisions I.
Group 5 Alain J. Percial Paula A. Ortiz Francis X. Ruiz.
Computer Science 210 Computer Organization The Instruction Execution Cycle.
Chapter 5 The LC-3. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 5-2 Instruction Set Architecture ISA.
CSC 3210 Computer Organization and Programming Chapter 2 SPARC Architecture Dr. Anu Bourgeois 1.
Presented by: Sergio Ospina Qing Gao. Contents ♦ 12.1 Processor Organization ♦ 12.2 Register Organization ♦ 12.3 Instruction Cycle ♦ 12.4 Instruction.
CSC 3210 Computer Organization and Programming Chapter 1 THE COMPUTER D.M. Rasanjalee Himali.
CSC 3210 Computer Organization and Programming Chapter 8 MACHINE INSTRUCTIONS D.M. Rasanjalee Himali.
CSC 3210 Computer Organization and Programming Chapter 2 SPARC Architecture Dr. Anu Bourgeois 1.
Lecture 8. MIPS Instructions #3 – Branch Instructions #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System Education.
Richard P. Paul, SPARC Architecture, Assembly Language Programming, and C Chapter 7 – Subroutines These are lecture notes to accompany the book SPARC Architecture,
Natawut NupairojAssembly Language1 Arithmetic Operations.
Chapter 10 The Assembly Process. What Assemblers Do Translates assembly language into machine code. Assigns addresses to all symbolic labels (variables.
Chapter 5 The LC Instruction Set Architecture ISA = All of the programmer-visible components and operations of the computer memory organization.
Dr. José M. Reyes Álamo 1.  Review: ◦ of Comparisons ◦ of Set on Condition  Statement Labels  Unconditional Jumps  Conditional Jumps.
The LC-3. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 5-2 Instruction Set Architecture ISA = All of the.
Execution of an instruction
Natawut NupairojAssembly Language1 Control Structure.
Richard P. Paul, SPARC Architecture, Assembly Language Programming, and C Chapter 4 –Binary Arithmetic These are lecture notes to accompany the book SPARC.
by Richard P. Paul, 2nd edition, 2000.
CSC 3210 Computer Organization and Programming Chapter 2 SPARC ARCHITECTURE D.M. Rasanjalee Himali.
Computer Organization CS224 Fall 2012 Lessons 7 and 8.
Digital Computer Concept and Practice Copyright ©2012 by Jaejin Lee Control Unit.
1 The Instruction Set Architecture September 27 th, 2007 By: Corbin Johnson CS 146.
Natawut NupairojAssembly Language1 Pipelining Processor.
DR. SIMING LIU SPRING 2016 COMPUTER SCIENCE AND ENGINEERING UNIVERSITY OF NEVADA, RENO Session 11 Conditional Operations.
Computer Organization Instructions Language of The Computer (MIPS) 2.
Digital Computer Concept and Practice Copyright ©2012 by Jaejin Lee Control Unit.
SPARC Programming Model 24 “window” registers 8 global registers Control registers –Multiply step –PSR (status flags, etc.) –Trap Base register –Window.
CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.
The Little man computer
Assembly language.
Control Unit Lecture 6.
William Stallings Computer Organization and Architecture 8th Edition
The University of Adelaide, School of Computer Science
3.Instruction Set of 8085 Consists of 74 operation codes, e.g. MOV
RISC Concepts, MIPS ISA Logic Design Tutorial 8.
Chapter 7 Subroutines Dr. A.P. Preethy
Computer Science 210 Computer Organization
Chapter 5 The LC-3.
Computer Science 210 Computer Organization
CSCE Fall 2013 Prof. Jennifer L. Welch.
CSC 3210 Computer Organization and Programming
ECE232: Hardware Organization and Design
CSCE 121: Simple Computer Model Spring 2015
Chapter 8 Central Processing Unit
by Richard P. Paul, 2nd edition, 2000.
Computer Architecture
CSCE Fall 2012 Prof. Jennifer L. Welch.
COMS 361 Computer Organization
Chapter 6 Programming the basic computer
MIPS assembly.
Computer Architecture
Presentation transcript:

Chapter 2 SPARC Architecture Chinua Umoja 1

 SPARC is a load/store architecture  Registers used for all arithmetic and logical operations  32 registers available at a time  Uses only load and store instructions to access memory 2

 Registers are accessed directly for rapid computation  32 registers – divided into 4 sets -- Global: %g0-%g7-- Out: %o0 - %o7 -- In: %i0 - %i7-- Local: %l0 - %l7  %g0 – always returns 0  %o6, %o7, %i6, %i7 – do not use  Register size = 32 bits each 3

4

 SPARC assembler as : 2-pass assembler  First pass: ◦ Updates location counter without paying attention to undefined labels for operands ◦ Defines label symbol to location counter  Second pass: ◦ Values substituted in for labels ◦ Ignores labels followed by colons 5

 Programs are line based  Use mnemonics which generate machine code upon assembling  Statements may be labeled  Comments: ! or /* … */ /* instructions to add and to subtract the contents of %o0 and %o1 */ start: add %o0, %o1, %l0 !l0=o0+o1 sub %o0, %o1, %l1 !l1=o0-o1 6

 Statements that do not generate machine code ◦ e.g. Data defininitions, statements to provide the assembler information  Generally start with a period a:.word3  Can be labeled.global main main: 7

 C compiler will call as and produce the object files  Object files are the machine code  Next calls the linker to combine.o files with library routines to produce the executable program – a.out 8

%gcc -S program.c : produces the.s assembly language file %gcc expr.s –o expr : assembles the program and produces the executable file NOTE: You will only do this for the 1 st assignment 9

 C compiler expects to start execution at an address main  The label must be at the first statement to execute and declared to be global.global main main: save %sp, -96, %sp  save instruction provides space to save registers for the debugger 10

 If we have macros defined, then the program should be a.m file  We can expand the macros to produce a.s file by running m4 first % m4 expr.m > expr.s % gcc expr.s –o expr 11

 3 operands: 2 source operands and 1 destination operand  Source registers are unchanged  Result stored in destination register  Constants : ≤ c < 4096 opreg rs1, reg rs2, reg rd opreg rs1, imm, reg rd 12

clrreg rd  Clears a register to zero movreg_or_imm, reg rd  Copies content of source to destination addreg rs1, reg_or_imm, reg rd  Adds oper1 + oper2  destination subreg rs1, reg_or_imm, reg rd  Subtracts oper1 - oper2  destination 13

 No instruction available in SPARC  Use function call instead  Must use %o0 and %o1 for sources and %o0 holds result mov b, %o0 mov b, %o0 mov c, %o1 mov c, %o1 call.mul call.div a = b * c a = b ÷ c 14

 Instruction cycle broken into 4 stages: Instruction fetch Fetch & decode instruction, obtain any operands, update PC Execute Execute arithmetic instruction, compute branch target address, compute memory address Memory access Access memory for load or store instruction; fetch instruction at target of branch instruction Store results Write instruction results back to register file 15

 SPARC is a RISC machine – want to complete one instruction per cycle  Overlap stages of different instructions to achieve parallel execution  Can obtain a speedup by a factor of 4  Hardware does not have to run 4 times faster – break h/w into 4 parts to run concurrently 16

 Wash 3 loads sequentially  Wash 3 loads in pipeline fashion 17 WDI W DIWDI WDI W DI WDI Use only one m/c at a time Use multiple m/c at a time

 Sequential: each h/w stage idle 75% of the time time ex = 4 * i  Parallel: each h/w stage working after filling the pipeline time ex = 3 + i 18 Example: for 100 instr: 400 vs 103 cycles

load [%o0], %o1 add %o1, %o2, %o2 19

 Branch target address not available until after execution of branch instruction  Insert branch delay slot instruction 20

 Try to place an instruction after the branch that is useful – can also use nop  The instruction following a branch instruction will always be fetched  Updating the PC determines which instruction to fetch next 21

cmp%l0, %l1 bgnext mov%l2, %l3 sub%l3, 20, %l4 Condition true: branch to next Condition false: continue to sub cmp bg mov ??? bg execute mov fetch 22 FE MW FE MW FE MW FE MW Determine if branch taken Update if true Target  PC Fetch instruction from memory[PC] Update PC PC++ Obtain operands

23

 After running through m4: %m4 expr.m > expr.s  Produce executable: %gcc expr.s – expr  Execute file: %./expr 24

 The call instruction is called a delayed control transfer instruction : changes address from where future instructions will be fetched  The following instruction is called a delayed instruction, and is located in the delay slot  The delayed instruction is executed before the branch/call happens  By using a nop for the delay slot – still wasting a cycle  Instead, we may be able to move the instruction prior to the branch instruction into the delay slot. 25

 Move sub instructions to the delay slots to eliminate nop instructions.global main main: save %sp, -96, %sp mov 9, %l0!initialize x sub %l0, 1, %o0!(x - 1) into %o0 call.mul sub %l0, 7, %o1!(x - 7) into %o1 call.div sub %l0, 11, %o1 !(x - 11) into %o1, the divisor mov %o0, %l1 !store it in y mov 1, %g1 !trap dispatch ta 0 !trap to system 26

 Executing the mov instruction, while fetching the sub instruction.global main main: save %sp, -96, %sp mov 9, %l0!initialize x sub %l0, 1, %o0!(x - 1) into %o0 call.mul sub %l0, 7, %o1!(x - 7) into %o1 call.div sub %l0, 11, %o1 !(x - 11) into %o1, the divisor mov %o0, %l1 !store it in y mov 1, %g1 !trap dispatch ta 0 !trap to system 27 EXECUTE  FETCH 

 Now executing the sub instruction, while fetching the call instruction.global main main: save %sp, -96, %sp mov 9, %l0!initialize x sub %l0, 1, %o0!(x - 1) into %o0 call.mul sub %l0, 7, %o1!(x - 7) into %o1 call.div sub %l0, 11, %o1 !(x - 11) into %o1, the divisor mov %o0, %l1 !store it in y mov 1, %g1 !trap dispatch ta 0 !trap to system 28 EXECUTE  FETCH 

 Now executing the call instruction, while fetching the sub instruction.global main main: save %sp, -96, %sp mov 9, %l0!initialize x sub %l0, 1, %o0!(x - 1) into %o0 call.mul sub %l0, 7, %o1!(x - 7) into %o1 call.div sub %l0, 11, %o1 !(x - 11) into %o1, the divisor mov %o0, %l1 !store it in y mov 1, %g1 !trap dispatch ta 0 !trap to system  Execution of call will update the PC to fetch from mul routine, but since sub was already fetched, it will be executed before any instruction from the mul routine 29 EXECUTE  FETCH 

 Now executing the sub instruction, while fetching from the mul routine.global main main: save %sp, -96, %sp mov 9, %l0!initialize x sub %l0, 1, %o0!(x - 1) into %o0 call.mul sub %l0, 7, %o1!(x - 7) into %o1 call.div sub %l0, 11, %o1 !(x - 11) into %o1, the divisor mov %o0, %l1 !store it in y mov 1, %g1 !trap dispatch ta 0 !trap to system …….mul: save ….. …… 30 EXECUTE  FETCH 

 Now executing the save instruction, while fetching the next instruction from the mul routine.global main main: save %sp, -96, %sp mov 9, %l0!initialize x sub %l0, 1, %o0!(x - 1) into %o0 call.mul sub %l0, 7, %o1!(x - 7) into %o1 call.div sub %l0, 11, %o1 !(x - 11) into %o1, the divisor mov %o0, %l1 !store it in y mov 1, %g1 !trap dispatch ta 0 !trap to system …….mul: save ….. …… 31 EXECUTE  FETCH 

 While executing the last instruction of the mul routine, will come back to main and fetch the call.div instruction.global main main: save %sp, -96, %sp mov 9, %l0!initialize x sub %l0, 1, %o0!(x - 1) into %o0 call.mul sub %l0, 7, %o1!(x - 7) into %o1 call.div sub %l0, 11, %o1 !(x - 11) into %o1, the divisor mov %o0, %l1 !store it in y mov 1, %g1 !trap dispatch ta 0 !trap to system …….mul: save ….. …… 32 EXECUTE  FETCH  At this point %o0 has the result from the multiply routine – this is the first operand for the divide routine The subtract instruction will compute the 2 nd operand before starting execution of the divide routine

 Delayed control transfer instructions  Branch target address not available until after execution of branch instruction  Can be conditional or unconditional  If conditional – test the condition codes (flags)  Operand of instruction is the target label b icc label 33

 Able to test information about a previous result  State of execution saved in 4 integer codes or flags: ◦ Z: whether the result was zero ◦ N: whether the result was negative ◦ V: whether execution of the result caused an overflow ◦ C: whether execution of the result generated a carry out 34

 Certain instructions can be modified to update the condition codes  Append “cc” to the instruction to save state of execution addccreg rs1, reg_or_imm, reg rd subccreg rs1, reg_or_imm, reg rd 35

36 Assembler Mnemonic Unconditional Branches ba Branch always, goto bn Branch never Assembler Mnemonic Conditional Branches (signed arithmetic) bl Branch on less than zero ble Branch on less than or equal to zero be Branch on equal to zero bne Branch on not equal to zero bge Branch on greater than or equal to zero bg Branch on greater than zero

 We can extend our program to compute y for an input range of 0 ≤ x ≤ 10  First consider a C program and then we will translate to assembly  Do-while loop used main () { int x, y; x = 0; do { y = ((x-1)*(x-7))/(x-11); x++; } while (x < 11); } 37

define (x_r, l0) define (y_r, l1).global main main: save%sp, -96, %sp clr%x_r!initialize x loop: sub%x_r, 1, %o0!(x-1) call.mul sub%x_r, 7, %o1!(x-7) call.div sub%x_r, 11, %o1!(x-11) mov%o0, %y_r!store result add%x_r, 1, %x_r!x++ subcc%x_r, 11, %g0!set condition codes blloop nop ret!ends program restore 38 no nop after the call – sub fills delay slot nop fills delay slot after the branch

define (x_r, l0) define (y_r, l1).global main main: save%sp, -96, %sp clr%x_r!initialize x loop: sub%x_r, 1, %o0!(x-1) call.mul sub%x_r, 7, %o1!(x-7) call.div sub%x_r, 11, %o1!(x-11) mov%o0, %y_r!store result add%x_r, 1, %x_r!x++ subcc%x_r, 11, %g0!set condition codes blloop nop ret!ends program restore 39 Can we use subbcc to fill the delay slot?

define (x_r, l0) define (y_r, l1).global main main: save%sp, -96, %sp clr%x_r!initialize x loop: sub%x_r, 1, %o0!(x-1) call.mul sub%x_r, 7, %o1!(x-7) call.div sub%x_r, 11, %o1!(x-11) mov%o0, %y_r!store result add%x_r, 1, %x_r!x++ subcc%x_r, 11, %g0!set condition codes blloop nop ret!ends program restore 40 Can we use add to fill the delay slot?

define (x_r, l0) define (y_r, l1).global main main: save%sp, -96, %sp clr%x_r!initialize x loop: sub%x_r, 1, %o0!(x-1) call.mul sub%x_r, 7, %o1!(x-7) call.div sub%x_r, 11, %o1!(x-11) mov%o0, %y_r!store result add%x_r, 1, %x_r!x++ subcc%x_r, 11, %g0!set condition codes blloop nop ret!ends program restore 41 Can we use mov to fill the delay slot?

define (x_r, l0) define (y_r, l1).global main main: save%sp, -96, %sp clr%x_r!initialize x loop: sub%x_r, 1, %o0!(x-1) call.mul sub%x_r, 7, %o1!(x-7) call.div sub%x_r, 11, %o1!(x-11) add%x_r, 1, %x_r!x++ subcc%x_r, 11, %g0!set condition codes blloop mov%o0, %y_r!store result ret!ends program restore 42 Revised code with nops removed

 Easier for us to read and understand logically  Assembler automatically substitutes synthetic instruction  Similar to the way macros work and are updated 43

1) cmpreg rs1, reg_or_imm same as subccreg rs1, reg_or_imm, %g0 2)increg same as addreg, 1, reg 44

Original code without synthetic instructions Code with synthetic instructions add %x_r, 1, %x_r subcc %x_r, 11, %g0 bl loop mov %o0, %y_r inc %x_r cmp %x_r, 11 bl loop mov %o0, %y_r 45 Note:.m program will include the synthetic instructions, but machine code will correspond to the substitution instructions on left hand side

 When writing assembly programs, always first write pseudo-code, then translate  Consider the following control structures first in C and then map to assembly: -- do while loop-- while loop -- for loop-- if-then / if- then-else 46

test:! target/label to branch back cmp%a_r, 17! check a relative to 17, ! and set condition codes bgdone! reverse logic of test nop!delay slot add%a_r, %b_r, %a_r inc%c_r batest! branch back to test nop! delay slot done:! whatever follows while loop 47 while ( a <= 17) { a = a + b; c++; } while ( a <= 17) { a = a + b; c++; } Now see if we can optimize the code…

 We test at start of loop to see if we should enter the body of the loop  Then we have a ba with nop at end of the loop  This brings up back to start to test again  We do have to test before entering the first time, but then we can just test at end to see if we should get back into the loop  This will take away one branch/nop in body of the loop 48 currently revision

test:!initial test cmp%a_r, 17!check a relative to 17, !and set condition codes bgdone!never enters loop if true nop!delay slot loop: add%a_r, %b_r, %a_r inc%c_r cmp%a_r, 17!check condition again bletest!branch back into loop nop!delay slot done: 49 notice it is ble not bg now

 Rather than using a nop after a conditional branch at the end of a loop, we can fill it with the first instruction of the loop  If annulled, then the delay instruction is only executed if the condition is true – i.e. if going back into the loop  If condition is false, the delay instruction is still fetched, but annulled instead of executed 50

batest !check if we should enter nop !loop the first time loop: inc%c_r test: cmp%a_r, 17 !testing condition ble,aloop !if true go back to loop & pick !up add instr on the way add%a_r, %b_r, %a_r 51

Previous codeUsing annulled branch loop: sub %x_r, 1, %o0 call.mul sub %x_r, 7, %o1 call.div sub %x_r, 11, %o1 add %x_r, 1, %x_r cmp %x_r, 11 bl loop mov %o0, %y_r sub %x_r, 1, %o0 loop: call.mul sub %x_r, 7, %o1 call.div sub %x_r, 11, %o1 mov %o0, %y_r add %x_r, 1, %x_r cmp %x_r, 11 bl,a loop sub %x_r, 1, %o0 52 Steps to modify loop: -Pull 1st instr out of loop -Repeat 1st instr in delay slot -Annul the branch -Change target to 2 nd instr Steps to modify loop: -Pull 1st instr out of loop -Repeat 1st instr in delay slot -Annul the branch -Change target to 2 nd instr

Previous codeUsing annulled branch loop: sub %x_r, 1, %o0 call.mul sub %x_r, 7, %o1 call.div sub %x_r, 11, %o1 add %x_r, 1, %x_r cmp %x_r, 11 bl loop mov %o0, %y_r sub %x_r, 1, %o0 loop: call.mul sub %x_r, 7, %o1 call.div sub %x_r, 11, %o1 mov %o0, %y_r add %x_r, 1, %x_r cmp %x_r, 11 bl,a loop sub %x_r, 1, %o0 53

 Syntax in C: for ( ex1; ex2; ex3 ) st  Define it as follows: ex1; while ( ex2 ) { st ex3; } 54 A for loop is nothing more than a while loop

ba test mov 1, %a_r !a = 1; loop: call.mul mov %c_r, %o1 mov %o0, %c_r add %a_r, 1, %a_r test: cmp %a_r, %b_r !test condition ble,a loop mov %a_r, %o0 !1 st instr of loop 55 for (a=1; a <= b; a++) c *= a; for (a=1; a <= b; a++) c *= a;

ConditionComplement blbge blebg bebne be bgebl bgble  The statement following the test should be branched over if the condition is not true  To do this, complement the branch test 56

C code SPARC code d = a; if ((a + b) > c) { a += b; c++; } a = c + d; mov %a_r, %d_r add %a_r, %b_r, %o0 cmp %o0, %c_r ble next nop add %a_r, %b_r, %a_r inc %c_r next: add %c_r, %d_r, %a_r 57 Only executed if ble is false – which is the “true” condition for the C code

Move instr from before the if If no instr to move, then use annulled branch & change target add %a_r, %b_r, %o0 cmp %o0, %c_r ble next mov %a_r, %d_r add %a_r, %b_r, %a_r inc %c_r next: add %c_r, %d_r, %a_r mov %a_r, %d_r add %a_r, %b_r, %o0 cmp %o0, %c_r ble,a next add %c_r, %d_r, %a_r add %a_r, %b_r, %a_r inc %c_r add %c_r, %d_r, %a_r next: 58

C code SPARC code if ((a + b) >= c) { a += b; c++; } else { a -= b; c--; } c += 10; add %a_r, %b_r, %o0 cmp %o0, %c_r bl else nop add %a_r, %b_r, %a_r inc %c_r ba next nop else: sub %a_r, %b_r, %a_r dec %c_r next: add %c_r, 10, %c_r 59 Now consider how to remove the 1 st nop

 Eliminate 1 st nop by using bl,a and placing 1 st statement of else in delay slot  This way, if we take the branch, we will pick up 1 st instr for the else along the way  If branch is not taken, then we kill the instr in the delay slot add %a_r, %b_r, %o0 cmp %o0, %c_r bl,a else sub %a_r, %b_r, %a_r add %a_r, %b_r, %a_r inc %c_r ba next nop else: dec %c_r next: add %c_r, 10, %c_r 60 Now consider how to remove the next nop

 Eliminate next nop by moving an instr from end of the then section into the delay slot  Doesn’t matter if we execute the increment before we branch or “on the way” to target next add %a_r, %b_r, %o0 cmp %o0, %c_r bl,a else sub %a_r, %b_r, %a_r add %a_r, %b_r, %a_r ba next inc %c_r else: dec %c_r next: add %c_r, 10, %c_r 61

 To eliminate the 2 nd nop we could copy the instr following the if else into the delay slot  We will need to copy it not only into the then section, but also in the else section add %a_r, %b_r, %o0 cmp %o0, %c_r bl,a else sub %a_r, %b_r, %a_r add %a_r, %b_r, %a_r inc %c_r ba next add %c_r, 10, %c_r else: dec %c_r add %c_r, 10, %c_r next: 62

 Use of registers – with add & sub  Call instruction – for mul & div ◦ Have to use registers %o0 & %o1  Pipelining – program flow & delay slots  Gdb – learn from practice  Branching & testing ◦ Control structures-- Annulled branches ◦ Optimizing code 63