Published by Merilyn Boyd. Modified over 9 years ago.
1
CSC 3210 Computer Organization and Programming
Chapter 2: SPARC Architecture
D.M. Rasanjalee Himali
2
Outline
Introduction
Registers
SPARC Assembly Language Programming
Pipelining
The Debugger gdb
Filling Delay Slots
Branching
Control Statements
3
Introduction
The SPARC architecture is a load/store architecture: all arithmetic and logical operations are carried out between operands located in registers, and load and store instructions move data between registers and memory. The machine has 32 registers available to the programmer at any one time.
4
Registers
Registers provide rapid, direct access to operands during computation. The 32 registers are logically divided into four sets:
Global (%g0 - %g7): global register data, that is, data that have meaning to the entire program and are accessible from any function.
In (%i0 - %i7): the arguments passed in by the calling function.
Local (%l0 - %l7): local function variables; we will store our program variables in these registers.
Out (%o0 - %o7): temporaries, arguments passed to called functions, and values returned from those functions.
Special registers: %o6 (the stack pointer), %o7 (the return address of a call), and %g0 (always reads as zero).
Each register stores a signed integer n with $-2^{31} \le n < 2^{31}$, or approximately $|n| < 2 \times 10^{9}$.
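The register range can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `wrap32` and `fits_in_register` are invented here, and the wrap-around behaviour assumes ordinary 32-bit two's-complement arithmetic.

```python
BITS = 32
MIN_INT = -(2 ** (BITS - 1))    # -2147483648
MAX_INT = 2 ** (BITS - 1) - 1   #  2147483647


def fits_in_register(n):
    """True if n can be held in a 32-bit signed register without wrapping."""
    return MIN_INT <= n <= MAX_INT


def wrap32(n):
    """Reduce an arbitrary Python int to the value a 32-bit register would hold."""
    return (n + 2 ** 31) % 2 ** 32 - 2 ** 31
```

For instance, `wrap32(MAX_INT + 1)` gives `MIN_INT`, which is why a result outside the range silently changes sign.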
5
Registers
6
SPARC Assembly Language Programming
The SPARC assembler, as, is in effect a two-pass assembler.
First pass: the assembler updates the location counter as it processes machine statements, without paying attention to undefined labels that may appear as operands. Whenever it sees a label followed by a colon (:), it defines the label symbol to have the current value of the location counter.
Second pass: the program is read a second time. All symbols and labels have now been defined, so whenever a label is encountered as an operand its value is substituted for the symbol. Label definitions (labels followed by a colon) are ignored on this pass.
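The two passes can be sketched in Python on a toy source format. This is not how as is implemented; it only illustrates the idea. The function name `assemble`, the tuple output format, and the assumption that every statement occupies one 4-byte word are all choices made for this sketch.

```python
def assemble(lines, origin=0, word=4):
    """Resolve labels in a toy line-based source using two passes.

    Each line is 'label: op operands' or 'op operands'. Every statement is
    assumed to occupy one 4-byte word (a simplification for this sketch).
    """
    # Pass 1: advance the location counter and record each label's address.
    symbols = {}
    location = origin
    for line in lines:
        label, sep, rest = line.partition(":")
        if sep:
            symbols[label.strip()] = location
            line = rest
        if line.strip():
            location += word

    # Pass 2: read the source again, substituting label values for symbols.
    output = []
    location = origin
    for line in lines:
        label, sep, rest = line.partition(":")
        statement = (rest if sep else line).strip()  # label definitions are ignored
        if not statement:
            continue
        op, _, operands = statement.partition(" ")
        tokens = [tok.strip() for tok in operands.split(",") if tok.strip()]
        resolved = [str(symbols.get(tok, tok)) for tok in tokens]
        output.append((location, op, resolved))
        location += word
    return symbols, output


# 'done' is used before it is defined, which is why one pass is not enough.
program = [
    "main: mov 9, %l0",
    "      ba done",
    "      nop",
    "done: mov 1, %g1",
]
symbols, code = assemble(program)
```

Here `symbols` maps `main` to 0 and `done` to 12, and the `ba done` statement at address 4 is emitted with the operand 12.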
7
SPARC Assembly Language Programming
Assembly language programs are line based: each statement typically specifies a single instruction or data element. Statements may be labeled. A label is an identifier followed by a colon. Labels start at the beginning of a line, with the instruction or data specification one tab stop in.
8
SPARC Assembly Language Programming
Comments: start at about the center of the line and begin with an exclamation point (!). C-style comments may also be used, opening with /* and closing with */. These comments may extend over many lines. Example:
9
SPARC Assembly Language Programming
Pseudo-ops: all machine instructions have mnemonics such as add and sub. Other statements, called pseudo-ops, do not generate machine instructions. Ex: data definitions and statements that provide information to the assembler. Pseudo-ops generally start with a period, and a pseudo-op may be labeled. The .global pseudo-op defines a label to be accessible outside the program in which it is defined. Ex: define the label main to be global:
10
SPARC Assembly Language Programming
We use the C compiler to call the assembler, as, and to load our program. C source files have a “.c” file name extension. The C compiler produces object files (with a “.o” extension), which contain the machine code corresponding to the C source. The compiler then calls the linker to combine all the object files with library routines into an executable program. By default, this executable is stored in a file called “a.out”.
11
SPARC Assembly Language Programming
Compiling a C program is a two-step process. First, the compiler translates the C program into assembly language, placing the code in a file with a “.s” extension to indicate that it is assembly language. The compiler then calls as to assemble this file and produce the “.o” file.
12
SPARC Assembly Language Programming
To see the assembly language for one of your C programs, call the compiler with the “-S” switch; it will then produce only the “.s” assembly language file. To have this assembled and made ready for execution, we would type: This will assemble our program and place the result in a file called expr, ready for execution.
13
SPARC Assembly Language Programming
The C compiler expects execution to start at an address main. Thus the label main: must appear in our program at the first statement we want executed, and it must be declared global by using the .global pseudo-op. The first instruction to be executed should be: The save instruction provides space to save our registers when the debugger is running.
14
SPARC Assembly Language Programming
Macros need to be expanded before we assemble our program. We write our program in a file with a .m extension, indicating that m4 must first be run to produce the .s file:
15
SPARC Assembly Language Programming
Ex: write a program to evaluate the equation from Chapter 1.
16
SPARC Assembly Language Programming
Most SPARC instructions take three operands: two registers and a literal constant, or three registers: The contents of the first source register, reg_rs1, are combined with the literal or with the contents of the second source register, reg_rs2, to produce a result, which is stored in the destination register, reg_rd. The contents of the source registers are unchanged. A literal constant, c, must lie in the range $-4096 \le c < 4096$.
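The literal range corresponds to a 13-bit signed field. A small Python check, assuming that encoding (the names `fits_simm13` and `encode_simm13` are invented for this sketch), might look like:

```python
SIMM13_MIN = -(2 ** 12)    # -4096
SIMM13_MAX = 2 ** 12 - 1   #  4095


def fits_simm13(c):
    """True if the literal c can appear directly as an instruction operand."""
    return SIMM13_MIN <= c <= SIMM13_MAX


def encode_simm13(c):
    """Return the 13-bit two's-complement field for c, or raise if out of range."""
    if not fits_simm13(c):
        raise ValueError(f"literal {c} does not fit in 13 signed bits")
    return c & 0x1FFF
```

Note that the range is asymmetric: -4096 is a legal literal, but 4096 is not.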
17
SPARC Assembly Language Programming
Some other instructions:
clear a register to zero
copy the contents of one register to another
combine the contents of the two source registers, or of a source register and a literal, with the sum or difference going into the destination register
for a subtraction, the second operand is subtracted from the first and the result placed in the destination register
18
SPARC Assembly Language Programming
Multiplication and Division: the SPARC architecture, as presented here, has no multiply or divide instruction. These operations are performed with a call instruction. To multiply: To divide: The result is placed in %o0. The called function may use any of the first six out registers, %o0-%o5, possibly changing their contents. These registers are for temporary results, and their contents are not preserved across function calls.
19
Pipelining
To achieve very fast execution, computers are pipelined: the von Neumann cycle is broken up into its component parts. For a RISC architecture the components are:
20
Pipelining
Sequential (non-pipelined) execution: if each component of the cycle takes one machine cycle, it takes four cycles to execute each instruction, and each component remains idle 75% of the time.
Pipelined execution: each component operates independently and concurrently. Ex: the instruction fetch component fetches the next instruction immediately after it has finished fetching the current one. Once the pipeline is full, the pipelined machine can complete one instruction every machine cycle, four times the rate of the non-pipelined machine.
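The factor of four can be worked out with a short Python sketch. It assumes an ideal four-stage pipeline with no stalls; the function names are invented for illustration.

```python
STAGES = 4  # one machine cycle per component of the instruction cycle


def cycles_sequential(n, stages=STAGES):
    """Cycles to run n instructions when each one finishes before the next starts."""
    return n * stages


def cycles_pipelined(n, stages=STAGES):
    """Cycles to run n instructions in an ideal pipeline with no stalls.

    The first instruction needs `stages` cycles to pass through the pipeline;
    every later instruction completes one cycle after its predecessor.
    """
    return stages + (n - 1) if n > 0 else 0


def speedup(n, stages=STAGES):
    return cycles_sequential(n, stages) / cycles_pipelined(n, stages)
```

For 100 instructions this gives 400 cycles against 103, and the ratio approaches 4 as the instruction count grows.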
21
Pipelining
Problem 1: Load Delay
Ex: when a load instruction that loads into %o1 is executed, the data is not obtained until the end of the M cycle. If the next instruction (add %o1, %o2, %o2) attempts to use this data, it would obtain the prior contents of the register. The machine detects this and waits a cycle to allow the data to arrive. If you can insert an independent instruction between the load and the next instruction that uses the result of the load, no cycle is wasted.
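A rough Python model of this rule counts one wasted cycle whenever an instruction reads the register written by the load immediately before it. The tuple format, the `%o0` address register, and the independent `sub` instruction are all hypothetical, chosen only to illustrate the reordering.

```python
def count_load_stalls(program):
    """Count wait cycles caused by using a loaded register in the very next instruction.

    Each instruction is (mnemonic, destination register, source registers).
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "ld" and prev_dest in cur_sources:
            stalls += 1
    return stalls


naive = [
    ("ld", "%o1", ("%o0",)),          # load into %o1 (address register is hypothetical)
    ("add", "%o2", ("%o1", "%o2")),   # uses %o1 immediately: one wait cycle
    ("sub", "%l1", ("%l0",)),         # hypothetical independent instruction
]

# Moving the independent instruction between the load and its use hides the delay.
scheduled = [naive[0], naive[2], naive[1]]
```

With these lists, `count_load_stalls(naive)` is 1 and `count_load_stalls(scheduled)` is 0, while both orderings compute the same values.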
22
Pipelining
Problem 2: Branch Delay
This occurs when a branch instruction is encountered, because a branch changes the pc. The branch target address is not available until the branch instruction has executed, by which time the following instruction has already been fetched. Once again a cycle would be wasted. In this case, however, the machine does not insert a wait cycle; it expects the programmer to place after the branch an instruction that can usefully be executed. This is called the branch delay slot instruction.
23
Pipelining
It is frequently possible to place an instruction after the branch that can be usefully executed. The machine makes this possible by maintaining two program counters, %pc and %npc, the program counter and the next program counter. The machine executes the instruction to which %pc points while at the same time fetching the instruction to which %npc points. The instruction fetched is generally the one following the instruction being executed. When a branch occurs, the instruction following the branch has already been fetched and will be executed.
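The %pc/%npc mechanism can be imitated in Python. The sketch below is a toy model, not a SPARC simulator: instructions are just names, a control transfer is anything with a target address, returns are not modelled, and the addresses and names in `program` are made up.

```python
def run(program, start=0, max_steps=100):
    """Execute a toy program using SPARC-style %pc/%npc sequencing.

    `program` maps an address to (name, target). `target` is None for an
    ordinary instruction and an address for a call or branch.
    Returns the instruction names in the order they execute.
    """
    pc, npc = start, start + 4
    trace = []
    for _ in range(max_steps):
        if pc not in program:
            break
        name, target = program[pc]
        trace.append(name)
        # A control transfer changes where %npc points next, not %pc, so the
        # instruction already addressed by %npc (the delay slot) still runs.
        next_npc = target if target is not None else npc + 4
        pc, npc = npc, next_npc
    return trace


program = {
    0: ("mov", None),
    4: ("call", 100),
    8: ("delay-slot", None),
    12: ("after-call", None),      # not reached here: no return is modelled
    100: ("callee-first", None),
    104: ("callee-second", None),
}
trace = run(program)
```

The resulting trace is mov, call, delay-slot, callee-first, callee-second: the instruction after the call executes before the first instruction of the callee.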
24
Pipelining
The left half and right half of the diagram execute simultaneously, with time running down the page.
25
Pipelining
Independent of what happens to %npc, the instruction that was fetched before the branch is always executed. When we call a function we are branching to another address in memory, and the instruction following the call instruction will be executed before the first instruction of the called function. The simplest thing to do following any branch instruction is to insert a nop. This is a mnemonic for “no operation”, an instruction that does nothing to change the state of the machine:
26
Pipelining
We can now write our program to compute the expression given in Eq. 1 for x = 9:
27
Pipelining
Trap Instruction: the last two instructions in the program return us to the operating system. The trap instruction ta calls the operating system with the service request encoded in register %g1. A few of the traps are as follows:
28
Pipelining
Save the program as expr.m and run it through m4: With the output redirected into expr.s, the following assembly code would be produced: This could then be assembled and the executable output put into a file expr by: If the program is then executed:
29
The Debugger gdb
A debugger is used to verify correctness and to find bugs. The debugger gdb may also be used to execute a program, to stop execution at any point, and to single-step execution. Having assembled the program and placed the output into expr, as in the example above, gdb may be entered by typing: To run the program in gdb, type “r”:
30
The Debugger gdb
A breakpoint may be set at any address. When the computer is about to execute the instruction at which the breakpoint was set, it stops and returns to gdb, whereupon the program and its state of execution may be examined. Typing “c” tells gdb to continue execution from the breakpoint. To set a breakpoint at a memory address, we need to type: Ex: The command “b” followed by a label sets a breakpoint at the instruction following the labeled instruction; gdb assumes the labeled instruction to be a save instruction:
31
The Debugger gdb
If we then run the program: gdb informs us that we are at breakpoint 1, which should be the first instruction in our program. The pc will hold the address of that instruction, 0x106a8. We can examine memory by typing “x” followed by an address: The “i” format specifier states that the contents of the memory location should be interpreted as a machine instruction.
32
The Debugger gdb
In gdb all machine registers are referred to with a $ in place of the % used in as. Typing a return repeats the last command, with the address incremented by the size of the last data element typed out: We may print the entire program by typing x/12i main, which repeats the examine command 12 times:
33
The Debugger gdb
If we want to see whether the program ran correctly, we can set another breakpoint at the trap instruction located at main+44: We would then tell gdb to continue execution by typing “c” (remember that we are currently stopped at the first location in our program): The program executes and stops at the last breakpoint we set. At this point the result should be stored in register %l1. To print the contents of a register, we use the print command “p”: This tells us that register %l1 contains -8, the correct value.
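The value -8 can be cross-checked in Python. Eq. 1 itself is not reproduced in this chapter, so the formula below is an assumption: y = (x - 1)(x - 7)/(x - 11), which is consistent with the sub instructions shown in the delay-slot slides and with the -8 result for x = 9. The division helper assumes truncation toward zero, as C's integer division does.

```python
def sdiv(a, b):
    """Integer division truncating toward zero (assumed to match the divide routine)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def y(x):
    """Assumed form of the Chapter 1 expression: (x - 1)(x - 7) / (x - 11)."""
    return sdiv((x - 1) * (x - 7), x - 11)
```

Calling `y(9)` evaluates 8 * 2 / -2 and returns -8, matching the register contents printed by gdb. The expression is undefined at x = 11, where the divisor is zero.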
34
The Debugger gdb
What would happen if our program were incorrect and did not compute the correct value? We could single-step the program from the beginning by typing “ni”, for next machine instruction. To do this at this point we would need to run the program again: To see which instruction is being executed, examine the memory location to which %pc is pointing:
35
The Debugger gdb
We have just executed the first instruction. If we execute the second instruction by typing ni, %l0 should contain the value 9: The “display” command prints the pc value every time a command is executed:
36
The Debugger gdb
We are now about to execute the call to .mul: Note that the delay slot instruction is executed before control transfers to .mul. To quit gdb and return to the operating system:
37
Filling Delay Slots
The call instruction is called a delayed control transfer instruction. A delayed control transfer instruction changes the address from which future instructions will be fetched only after the instruction following it has been executed. The instruction following the delayed control transfer instruction is called the delay instruction, and it is located in the delay slot. Whenever a branch or call instruction is executed, it changes the contents of %npc, not %pc. The instruction that follows the branching instruction will therefore be executed before the branch or call takes effect. By filling the delay slot with a nop instruction we have not accomplished very much; the pipelined machine wastes an instruction execution every time it branches. However, we may move the instruction prior to the branch instruction into the delay slot.
38
Filling Delay Slots
In the following version of the program we have moved the sub instructions, which compute the final argument to .mul and .div, into the delay slots, thereby eliminating the nop instructions. The resulting code does not lose any cycles at all.
39
Filling Delay Slots
Assume that we are executing the mov 9, %l0 instruction while at the same time fetching the sub %l0, 1, %o0 instruction. Having fetched an instruction, we will execute it in the next cycle. As the instruction executed was not a branch instruction, the next instruction following will be fetched.
40
Filling Delay Slots
Having fetched the call instruction, we will execute it in the next cycle. As the instruction executed, sub %l0, 1, %o0, was not a branch instruction, the next instruction following will be fetched. The execution of the call instruction will cause the next instruction to be fetched from the first location labeled by .mul. Having fetched the sub %l0, 7, %o1 instruction, we will execute it. Its execution occurs before the first instruction from .mul has even been fetched.
41
Filling Delay Slots
As the instruction executed, sub %l0, 7, %o1, was not a branching instruction, the instruction following the one addressed by %npc will be fetched while the instruction just fetched is executed. Filling the delay slots in this manner makes the program harder to read, but the resulting execution is faster and the program smaller. Care must be taken in filling delay slots to ensure that the algorithm is not changed. In general, when we write assembly language programs we will be expected to fill all possible delay slots.
42
Branching
Branching is used in conjunction with testing. Testing: the state of execution is saved in four variables, the integer condition codes: Z (zero), N (negative), V (overflow), and C (carry).
43
Branching
Moving instructions around could eliminate empty delay slots. This causes a problem when we wish to branch conditionally on the result of a prior instruction, if that instruction is no longer executed immediately before the branch instruction. The SPARC architecture solves this problem by having a duplicate set of computational instructions, such as add and sub, which in addition to performing the arithmetic operation also set the condition codes. These instructions have “cc” appended to the mnemonic (for example, addcc and subcc), which indicates that the instruction is to set the condition codes Z, N, V, and C to save the state of the instruction execution.
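How a subtraction sets the four codes can be sketched in Python for 32-bit operands. The function name `subcc` mirrors the mnemonic, but the dictionary return format is an invention of this sketch; C is modelled as a borrow, that is, set when the first operand is smaller than the second as unsigned values.

```python
MASK = 0xFFFFFFFF


def sign(v):
    """Top bit of a 32-bit value: 1 if negative in two's complement, else 0."""
    return (v & MASK) >> 31


def subcc(a, b):
    """Return (result, codes) for the 32-bit subtraction a - b."""
    a &= MASK
    b &= MASK
    r = (a - b) & MASK
    codes = {
        "Z": int(r == 0),
        "N": sign(r),
        # Signed overflow: the operands differ in sign and the result's sign
        # differs from that of the first operand.
        "V": int(sign(a) != sign(b) and sign(r) != sign(a)),
        # Borrow: the first operand is smaller when both are read as unsigned.
        "C": int(a < b),
    }
    return r, codes
```

For example, `subcc(5, 5)` sets only Z, `subcc(3, 5)` sets N and C, and `subcc(-2**31, 1)` sets V because the true result does not fit in 32 signed bits.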
44
Branching
Like the call instruction, branch instructions transfer control to the specified label, but only if the specified condition is met. Branch instructions are delayed control transfer instructions, so the following instruction is executed before the effect of the branch takes place. The delay slot of a conditional branch instruction may not be filled with another branching instruction. Branch instructions test the condition codes to determine whether the branching condition exists:
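The table of branch conditions is not reproduced in this transcript. As a sketch, the standard signed-comparison tests can be written in Python as follows; the dictionary layout and the `branch_taken` helper are illustrative only, and just six of the branches are covered.

```python
# Each test receives the condition codes as a dict of 0/1 values.
BRANCH_TESTS = {
    "be": lambda c: c["Z"] == 1,                         # equal
    "bne": lambda c: c["Z"] == 0,                        # not equal
    "bl": lambda c: (c["N"] ^ c["V"]) == 1,              # less than
    "ble": lambda c: (c["Z"] | (c["N"] ^ c["V"])) == 1,  # less than or equal
    "bge": lambda c: (c["N"] ^ c["V"]) == 0,             # greater than or equal
    "bg": lambda c: (c["Z"] | (c["N"] ^ c["V"])) == 0,   # greater than
}


def branch_taken(mnemonic, codes):
    """True if the named branch would be taken for the given condition codes."""
    return BRANCH_TESTS[mnemonic](codes)


# Codes left by comparing 3 with 5 (3 - 5 is negative, no overflow).
after_cmp_3_5 = {"Z": 0, "N": 1, "V": 0, "C": 1}
```

With `after_cmp_3_5`, the bl and ble tests succeed while bge and bg fail. The N xor V term is what keeps the signed tests correct when the subtraction overflows.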
45
Branching
Ex: evaluate the expression in Chapter 1 for integer values of x from 0 up to 10. C program: Translated assembly language program: The bl instruction is followed by a nop instruction in the delay slot. We cannot fill the delay slot as we did in the case of the call instruction, simply by moving the instruction immediately before the branch into the slot, because that instruction sets the condition codes to be evaluated by the bl instruction.
46
Branching
It may be possible to rearrange the code before the conditional branch statement: We are now free to move the mov instruction into the delay slot. Modified .s version:
47
Branching
When we execute the program we will need to set a breakpoint at loop to print out the value of y: This works well but involves a lot of typing.
48
Branching
We can program gdb to do this for us with the commands instruction. This instruction specifies a number of commands to be executed when a breakpoint is reached; its argument is the breakpoint at which the commands are to be executed. In our case it is breakpoint 2 (the first breakpoint is set at main).
49
Branching
This informs gdb that when it reaches breakpoint 2, it is to print out the contents of register %l1 and then continue:
50
Control Statements
While: the while loop causes some problems in assembly language. Consider translating the following while statement into assembly language:
51
Control Statements
The most obvious approach is to perform the test, which must happen before the loop body is executed, execute the loop body, and then branch back to the test: The number of instructions executed in initializing a loop is generally small compared to the number executed inside the loop, once that number is multiplied by the number of times the loop runs.
52
Control Statements
In the previous example, the cmp and the two add instructions must be there along with the conditional branch. However, the unconditional branch ba test might be removed, as may the two nop instructions. By repeating the compare and test at the end of the loop we may eliminate the ba instruction: To minimize the number of instructions executed, we should concentrate on the instructions inside the loop. Note that the loop is now two add instructions, the cmp, the conditional branch (all of which must be there), and a nop instruction, an improvement of two instructions.
53
Control Statements
We might also eliminate the code for the initial test by branching unconditionally to the test at the end of the loop:
54
Control Statements
The nop following the conditional branch lies in the loop and is executed on every iteration. We might be tempted to move the first instruction of the loop into the delay slot: However, unlike the do loop, the condition of a while loop is evaluated before the loop is executed and, if the condition is not met, the loop, including its first instruction, must not be executed. The first instruction of the loop, add, which we moved into the delay slot after the conditional branch instruction, would be executed even if the first test failed and the loop were not to be executed at all.
55
Control Statements
All conditional branches may be annulled. If a conditional branch is annulled, the delay instruction is executed when the branch is taken (the usual case) but not executed when the branch is not taken. The delay slot instruction is still fetched.
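The rule for conditional branches fits in a few lines of Python. The function name and its boolean parameters are invented for this sketch, and it covers conditional branches only.

```python
def delay_slot_executes(taken, annul):
    """For a conditional branch, is the delay slot instruction executed?

    Without the annul bit the delay instruction always executes. With it,
    the delay instruction executes only when the branch is taken.
    """
    if not annul:
        return True
    return taken


# All four combinations of (taken, annul) for a conditional branch.
table = {
    (taken, annul): delay_slot_executes(taken, annul)
    for taken in (True, False)
    for annul in (True, False)
}
```

The only combination that suppresses the delay instruction is an annulled branch that is not taken, which is exactly the case a while loop needs when its first test fails.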
56
Control Statements
To specify to the assembler that we want an annulled branch, we follow the branch mnemonic with “,a”. The code segment above is now correct, and the loop contains the minimum number of instructions: two adds, one compare, and a branch.
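The effect of these rewrites on the work done per iteration can be tallied in Python. The per-iteration counts of 7, 5, and 4 follow the slides; the instruction orderings in the comments and the layout names are assumptions made for this sketch, and fixed entry and exit overheads are ignored.

```python
# Instructions executed on each pass through the loop, by layout.
PER_ITERATION = {
    "test at top": 7,               # cmp, branch, nop, add, add, ba, nop
    "test at bottom": 5,            # add, add, cmp, branch, nop
    "test at bottom, annulled": 4,  # add, add, cmp, branch (slot holds an add)
}


def loop_instructions(layout, iterations):
    """Instructions executed inside the loop for the given number of iterations."""
    return PER_ITERATION[layout] * iterations


def saving(layout, iterations, baseline="test at top"):
    """Instructions saved relative to the baseline layout."""
    return loop_instructions(baseline, iterations) - loop_instructions(layout, iterations)
```

For a loop that runs 1000 times, the annulled version executes 4000 instructions inside the loop instead of 7000, which is why attention goes to the loop body rather than to the set-up code.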
57
Control Statements
Do: for the previous example, making use of the annulled branch feature, we repeat the first instruction of the loop in the delay slot, annul the branch, and change the target of the branch to the second instruction in the loop: This approach, although simple to implement, results in a program that is one instruction longer (generally not important) and wastes one machine cycle when the execution of the delay slot instruction is annulled (which happens only when the loop is finally exited).
58
Control Statements
For: the translation of the following segment of C code: would be:
59
Control Statements
If Then: the statement following the relational expression is to be branched over if the condition is not true. To accomplish this we need to logically complement the sense of the branch that follows the relational expression evaluation and precedes the code for the statement. The complements of the branches are as follows:
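The complement table is not reproduced in this transcript. The Python sketch below lists the usual signed-comparison pairs and shows how a C relational operator would map to the branch that skips the statement; the dictionary names and the `branch_over` helper are invented for illustration.

```python
# Each branch paired with the branch that tests the opposite condition.
COMPLEMENT = {
    "bl": "bge", "bge": "bl",
    "ble": "bg", "bg": "ble",
    "be": "bne", "bne": "be",
}

# Branch that is taken when `a OP b` is true, for signed operands.
C_TO_BRANCH = {
    "<": "bl", "<=": "ble",
    "==": "be", "!=": "bne",
    ">": "bg", ">=": "bge",
}


def branch_over(c_operator):
    """Branch used to skip the `then` statement when `a OP b` is false."""
    return COMPLEMENT[C_TO_BRANCH[c_operator]]
```

For example, `branch_over(">")` returns `"ble"`: a test written with `>` in C is translated to a branch on less than or equal that jumps past the statement.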
60
Control Statements
For example, to translate: we would write, complementing the test into a ble:
61
Control Statements
To fill the delay slot here we could move an instruction from before the if into the delay slot, provided the instruction has no effect on the evaluation of the if condition: If there is no such instruction, we could copy the instruction following the if statement into the delay slot, annul the branch, and change the target of the branch to skip over the copied instruction: Once again, the latter method is simpler but adds an instruction to the program and wastes a cycle of execution when the branch is not taken.
62
Control Statements If Else: Assembly program: C program:
63
Control Statements
We may eliminate the first nop instruction by replacing the bl instruction with an annulled bl,a instruction and moving the first instruction of the else part into the delay slot. If the else part is to be executed, the branch is taken and the first instruction of the else part is executed in the delay slot. If the then part is to be executed, the branch is not taken and the first instruction of the else part is annulled:
64
Control Statements
We can then deal with the nop after the unconditional branch instruction by moving one of the instructions from the end of the then part into the delay slot: Or by copying the instruction following the if-else statement into the delay slot following the ba instruction: Once again, we add an instruction to the length of the program but, in this case, do not add an additional cycle to the execution.
65
Control Statements