Computer Architecture I - Class 9

1 Computer Architecture I - Class 9
Today’s class: Microarchitecture
Saturday, October 20, 2007 Computer Architecture I - Class 9

2 Data Path
The part of the CPU containing the ALU, its inputs, and its outputs. This is an example data path for the IJVM microarchitecture developed in your text. IJVM = Integer Java Virtual Machine (a subset of the Java Virtual Machine that only does integer operations). The registers identified here are only accessible at the microarchitecture level (by the microprogram).

3 ALU Signals and Functions
6 control lines:
  F0 and F1 determine the ALU operation
  ENA and ENB enable the inputs
  INVA inverts the left input
  INC forces a carry into the low-order bit, effectively adding 1 to the result
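
A minimal sketch of how the six control lines combine, written as a Python function. The function-select encoding (F0 F1 = 00 for AND, 01 for OR, 10 for NOT B, 11 for add) is not given on the slide; it is assumed here from the textbook's ALU description. The name `alu` and the 32-bit mask are illustrative choices.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit data path


def alu(a, b, f0, f1, ena, enb, inva, inc):
    """Model the ALU output for one setting of the six control lines."""
    # ENA/ENB gate the inputs: a disabled input is forced to 0.
    a_in = (a & MASK) if ena else 0
    b_in = (b & MASK) if enb else 0
    # INVA inverts the (already gated) left input.
    if inva:
        a_in ^= MASK
    # F0 and F1 select the operation (encoding assumed from the textbook).
    if (f0, f1) == (0, 0):
        out = a_in & b_in
    elif (f0, f1) == (0, 1):
        out = a_in | b_in
    elif (f0, f1) == (1, 0):
        out = b_in ^ MASK
    else:
        # INC is a carry into the low-order bit, so it only matters when adding.
        out = a_in + b_in + (1 if inc else 0)
    return out & MASK
```

With this model, enabling both inputs, inverting A, and forcing a carry in add mode yields B - A in two's complement, which is how subtraction can be obtained without a separate subtract function.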

4 Data Path Timing
Subcycles of the data path cycle:
  The control signals are set up (Δw).
  The registers are loaded onto the B bus (Δx).
  The ALU and shifter operate (Δy).
  The results propagate along the C bus back to the registers (Δz).
MPC = MicroProgram Counter
MIR = MicroInstruction Register

5 Memory
The example machine has two different ways to communicate with memory:
  32-bit word-addressable memory port, controlled by two registers: MAR (Memory Address Register) and MDR (Memory Data Register)
  8-bit byte-addressable memory port, controlled by one register: PC, which reads 1 byte into the low-order 8 bits of MBR
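
The two ports can be pictured as two views of the same memory. The sketch below is only an illustration: the 64-byte `memory`, the function names, and the big-endian byte order within a word are assumptions, not something the slide specifies.

```python
memory = bytearray(64)  # hypothetical tiny main memory, indexed by byte


def read_word(mar):
    """32-bit port: MAR holds a word address, so word n covers bytes 4n..4n+3."""
    start = 4 * mar
    return int.from_bytes(memory[start:start + 4], "big")  # byte order assumed


def write_word(mar, mdr):
    """32-bit port: store the value in MDR at the word address in MAR."""
    start = 4 * mar
    memory[start:start + 4] = (mdr & 0xFFFFFFFF).to_bytes(4, "big")


def fetch_byte(pc):
    """8-bit port: PC holds a byte address; the byte goes to the low 8 bits of MBR."""
    return memory[pc]


write_word(2, 0x11223344)  # word address 2 is byte addresses 8..11
```

Reading word 2 back through `read_word` returns the full 32-bit value, while `fetch_byte(8)` returns only the first of its four bytes.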

6 Memory Operation

7 Microinstruction Format
Addr – contains the address of a potential next microinstruction
JAM – determines how the next microinstruction is selected
ALU – ALU and shifter functions
C – selects which registers are written from the C bus
Mem – memory functions
B – selects the B bus source, encoded as shown
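
A rough sketch of packing these six fields into one microinstruction word. The field widths (9, 3, 8, 9, 3, and 4 bits, 36 in total) do not appear on the slide; they are assumed from the textbook's Mic-1 format, and `pack` and `unpack` are hypothetical helper names.

```python
# Field widths in bits, most significant field first (assumed from the textbook).
FIELDS = [("addr", 9), ("jam", 3), ("alu", 8), ("c", 9), ("mem", 3), ("b", 4)]


def pack(values):
    """Combine the six fields into a single integer microinstruction word."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Split a microinstruction word back into its six fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

Keeping the layout in one table means the same description drives both directions, so `unpack(pack(x))` returns `x` for any in-range field values.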

8 Sequencer
Which control signals should be enabled on each cycle? This is determined by the sequencer, which must produce two kinds of information each cycle:
  The state of every control signal in the system
  The address of the microinstruction that is to be executed next

9 Computer Architecture I - Class 9
Detailed block diagram of the complete microarchitecture of our example machine, which is being called the Mic-1. Two parts: the data path on the left and the control section on the right.
Control store – holds the microprogram
High Bit – computes (JAMZ AND Z) OR (JAMN AND N)
MPC takes on either NEXT_ADDRESS (Addr field) or NEXT_ADDRESS with the high-order bit set to 1
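
The High Bit rule can be sketched as a small function. The 9-bit address width (so the high-order bit is 0x100) is an assumption taken from the textbook, the name `next_mpc` is hypothetical, and the textbook's third JAM bit (JMPC) is not modelled here.

```python
HIGH_BIT = 0x100  # assumes a 9-bit microinstruction address


def next_mpc(next_address, jamn, jamz, n, z):
    """Return the next MPC value from NEXT_ADDRESS, the JAM bits, and the ALU flags."""
    # High Bit = (JAMZ AND Z) OR (JAMN AND N)
    high_bit = (jamz and z) or (jamn and n)
    return (next_address | HIGH_BIT) if high_bit else next_address
```

For example, with JAMZ set, a zero ALU result sends control to NEXT_ADDRESS with the high-order bit set, and a nonzero result sends it to NEXT_ADDRESS unchanged, which gives a two-way conditional branch.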

10 Stacks for Local Variable Storage
Procedures need somewhere to store local variables that won’t be interfered with by recursive calls. Suppose procedure A has three local variables. As shown in (a), register LV points to the start of the local variable frame on the stack, and register SP points to the top of the stack. Suppose procedure B has four local variables, and A calls B. The result is as shown in (b). Suppose procedure C has two local variables, and B calls C. The result is shown in (c). Now suppose C, and then B, return, and then A calls procedure D which has five local variables. The situation is as shown in (d).
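
The A, B, C, D sequence can be traced with a simplified model. This sketch assumes the stack grows toward higher addresses, starts at word 0, and holds only local variables (no parameters or return linkage); the class name `FrameStack` is hypothetical.

```python
class FrameStack:
    """Tracks LV and SP as procedures are called and return (locals only)."""

    def __init__(self):
        self.saved = []  # (LV, SP) pairs of the callers
        self.lv = 0
        self.sp = -1     # empty stack: nothing at word 0 yet

    def call(self, n_locals):
        """Enter a procedure with n_locals local variables."""
        self.saved.append((self.lv, self.sp))
        self.lv = self.sp + 1             # new frame starts just above the old top
        self.sp = self.lv + n_locals - 1  # SP points at the last local

    def ret(self):
        """Return to the caller, restoring its LV and SP."""
        self.lv, self.sp = self.saved.pop()
```

Calling `call(3)`, `call(4)`, `call(2)`, then `ret()` twice and `call(5)` walks through situations (a) to (d): after the two returns, D's frame reuses the words that B and C had occupied.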

11 Stacks for Operand Storage
Need someplace to hold operands during an arithmetic operation. Suppose that procedure A has to compute a1=a2+a3 before calling B. It can do this by pushing a2 onto the stack (a), pushing a3 onto the stack (b), doing the actual computation by popping the stack twice, forming the sum, and pushing the result back on the stack (c), and finally popping the stack and assigning the result to the variable (d).
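
Steps (a) to (d) can be written out directly, using a Python list as the operand stack. This is only an illustration of the sequence; the function name `compute_a1` is hypothetical.

```python
def compute_a1(a2, a3):
    """Compute a1 = a2 + a3 using an operand stack, following steps (a) to (d)."""
    stack = []
    stack.append(a2)            # (a) push a2
    stack.append(a3)            # (b) push a3
    right = stack.pop()         # (c) pop twice, form the sum, push the result
    left = stack.pop()
    stack.append(left + right)
    a1 = stack.pop()            # (d) pop the result and assign it to a1
    assert not stack            # the operand stack is back to its starting depth
    return a1
```

The stack ends at the same depth it started with, which is what lets operand storage share the stack with the local variable frames.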

12 The IJVM Memory Model
Address space can be thought of as 4 GB or 1 GW (gigawords), each word 4 bytes.
Constant pool – cannot be written by an IJVM program. Consists of constants, strings, and pointers to other areas of memory that can be referenced.
Method area – contains the program.
CPP, SP, and LV are all word registers, and their contents are word addresses. PC is a byte register, and its content is a byte address.

13 The IJVM Instruction Set
Operands byte, const, and varnum are 1 byte. Operands disp, index, and offset are 2 bytes.

14 Calling a Procedure
Push onto the stack a reference (pointer) to the object to be called (since our architecture is not object-oriented, this will not be used but is retained for consistency)
Push the procedure’s parameters onto the stack
Execute the INVOKEVIRTUAL instruction

15 The Stack and INVOKEVIRTUAL
The first four bytes of the called procedure contain special information:
  The first two bytes are a 16-bit integer indicating the number of parameters for the procedure
  The second two bytes are a 16-bit integer indicating the size of the local variable area for the procedure (needed to allocate space on the stack for them, as in the right-hand picture here)
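
Reading that four-byte header might look like the sketch below. The big-endian byte order (first byte is the high-order byte) is an assumption, as is treating byte 4 as the first opcode; `parse_method_header` is a hypothetical helper name.

```python
def parse_method_header(method_bytes):
    """Return (number of parameters, local variable area size, offset of first opcode)."""
    if len(method_bytes) < 4:
        raise ValueError("a method needs at least a 4-byte header")
    # Two 16-bit integers; big-endian order is assumed here.
    num_params = int.from_bytes(method_bytes[0:2], "big")
    locals_size = int.from_bytes(method_bytes[2:4], "big")
    return num_params, locals_size, 4  # code is assumed to start right after the header
```

For a method whose bytes begin 00 03 00 02, this reports three parameters and a local variable area of size two.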

16 The Stack and IRETURN

17 Compiling Java to IJVM
(a) is a simple Java code fragment.
(b) is the code produced by compiling (a) into IJVM.
(c) is the hex code produced by assembling (b) (we assume i is variable 1, j is variable 2, and k is variable 3).

18 Microprogram for the Mic-1
Consider how the IADD instruction will work. It will add the top two stack elements and put the result back on the stack. The TOS is already available. Need to fetch the next-to-top element of the stack.
iadd1 will initiate the fetch of the next-to-top stack element. Note that the memory fetch will not complete in this clock cycle, so TOS remains unaffected.
iadd2 will execute in the second clock cycle and will store the TOS in H, while the read operation started in the first step completes.
iadd3 will actually do the addition. MDR contains the operand fetched from memory (originally the next-to-top item on the stack).
Note that this is not the complete microprogram for the Mic-1. The rest can be found in your text.
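
The three IADD steps can be traced with a small register-level simulation. The microinstruction text in the comments is recalled from the textbook and should be checked against it; the dictionary-based registers and memory, the upward-growing stack, and the name `run_iadd` are assumptions made for this sketch.

```python
def run_iadd(regs, memory):
    """Apply iadd1..iadd3 to the registers and the word-addressed memory."""
    # iadd1: MAR = SP = SP - 1; rd   (start reading the next-to-top word)
    regs["SP"] -= 1
    regs["MAR"] = regs["SP"]
    pending_read = memory[regs["MAR"]]  # the data is not in MDR yet

    # iadd2: H = TOS                 (the read completes during this cycle)
    regs["H"] = regs["TOS"]
    regs["MDR"] = pending_read

    # iadd3: MDR = TOS = MDR + H; wr (add, update TOS, write the new top of stack)
    regs["MDR"] = regs["TOS"] = (regs["MDR"] + regs["H"]) & 0xFFFFFFFF
    memory[regs["MAR"]] = regs["MDR"]
    return regs
```

Starting with 7 and 5 as the two top stack words, the stack shrinks by one word and both TOS and the new top-of-stack word in memory hold 12.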

19 In-Class Exercise
Give two different IJVM translations for the Java statement i=k+n+5; (refer to the instruction set on page 250 of your text)
How long does a 2.5 GHz Mic-1 take to execute the Java statement i=j+k;? Give your answer in nanoseconds. (Refer to the code on page 254 of your text, and to the microprogram code on page 262.)
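
For the timing question, the conversion from cycles to nanoseconds can be sketched as below. This does not answer the exercise: the cycle count still has to be worked out from the code and microprogram in the text, and the count of 10 in the usage note is purely hypothetical.

```python
def execution_time_ns(cycles, clock_ghz):
    """Time in nanoseconds for `cycles` clock cycles at `clock_ghz` GHz.

    A 1 GHz clock completes one cycle per nanosecond, so the time is
    the cycle count divided by the frequency in GHz.
    """
    return cycles / clock_ghz
```

At 2.5 GHz one cycle lasts 0.4 ns, so a hypothetical 10-cycle sequence would take `execution_time_ns(10, 2.5)`, which is 4.0 ns.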

20 Increasing Speed of Execution
Reduce the number of clock cycles needed to execute an instruction
Simplify the organization so that the clock cycle can be shorter
Overlap the execution of instructions

21 Merging the Interpreter Loop with the Microcode
One way of reducing the number of clock cycles is to find some dead ALU time and take advantage of it. This can often be accomplished by merging the interpreter loop with the microcode. On top here is the original code for the POP instruction. Note the dead cycle while memory is being read. Use it to fetch the next instruction information, as shown in the bottom.

22 A Three Bus Architecture
A second approach to shorten execution path length is to have two full input buses to the ALU. Consider the ILOAD instruction. In the top box, our original design, we see that LV is copied into H in iload1; the only reason for doing so is so that it can be added to MBRU in iload2. There’s no way to add two arbitrary registers, which is what we need to do here. With a three-bus design, we can save this first step, and thus save a cycle (see the lower box).

23 An Instruction Fetch Unit
The ALU is used nearly every cycle for a variety of operations having to do with fetching the instruction and assembling the fields within the instruction, in addition to the real “work” of the instruction.
Need to free the ALU from some of these tasks.
Create an independent unit to fetch and process the instructions.

24 The Mic-2 Datapath
The Mic-2 incorporates the three previous design decisions to speed things up. It has an instruction fetch unit, it has two full buses going into the ALU, and it merges the main loop into each instruction.

25 Microprogram for the Mic-2

26 In-Class Exercise
Reconsider the Java statement i=j+k; How long does it take to execute this statement on a 2.5 GHz Mic-2? (Refer to the microprogram for the Mic-2 on page 282 of your text.)

27 Pipelining – The Mic-3
3 cycles to operate:
  Load A and B
  Perform the operation and load C
  Write the results back to the registers
Each of these pieces is called a microstep.
Three latches (registers) on the buses: A, B, and C. The latches are written on every cycle.
We can speed up the clock because the maximum delay is now shorter.
We can use all parts of the data path during every cycle.

28 The Mic-2 Code for SWAP
We will now look at this code on the Mic-3.

29 The Implementation of SWAP on the Mic-3
Note that even though there are more cycles used in the Mic-3, each cycle on the Mic-3 is shorter than on the Mic-2, so we do get an overall improvement. (The cycle time on the Mic-3 is 1/3 that of the Mic-2.)
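
The trade-off between more cycles and shorter cycles can be checked with a little arithmetic. In this sketch the time unit is arbitrary, the only figure taken from the slide is the 1/3 cycle-time ratio, and the function names are hypothetical.

```python
MIC2_CYCLE = 3.0               # arbitrary time units per Mic-2 cycle
MIC3_CYCLE = MIC2_CYCLE / 3    # slide: the Mic-3 cycle is 1/3 of the Mic-2 cycle


def total_time(cycles, cycle_time):
    """Total execution time for a given number of cycles."""
    return cycles * cycle_time


def mic3_is_faster(mic2_cycles, mic3_cycles):
    """True when the pipelined version finishes sooner despite using more cycles."""
    return total_time(mic3_cycles, MIC3_CYCLE) < total_time(mic2_cycles, MIC2_CYCLE)
```

With illustrative counts (not taken from the slide) of 6 cycles on the Mic-2 and 11 on the Mic-3, the comparison is 11 units against 18, so the Mic-3 wins; it stops winning once it needs three times as many cycles.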

30 Pipeline

31 Cache Memory
A system with three levels of cache.

32 Cache Model
Main memory is divided up into fixed-size blocks called cache lines
A cache line typically consists of 4 to 64 consecutive bytes
Lines are numbered consecutively starting at 0

33 Direct-Mapped Caches
This cache contains 2048 entries; each entry (row) holds one cache line from main memory. Let’s use a 32-byte cache line size, so this cache can hold 64 KB.
The fields in the cache are:
  Valid = one bit to indicate whether there is any valid data in this entry or not
  Tag = a unique 16-bit value identifying the corresponding line of memory from which the data came
  Data = copy of the data in memory; this holds one cache line of 32 bytes
A 32-bit virtual address is broken up into four fields:
  TAG = corresponds to the Tag bits stored in the cache entry
  LINE = indicates which cache entry holds the corresponding data, if they are present
  WORD = which word within a line is referenced
  BYTE = only used if a single byte is requested and indicates which byte within the word is needed; for a cache supplying only 32-bit words this field will always be 0
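
A minimal sketch of the address split and lookup for this cache. The field widths follow from the numbers above (2048 entries give 11 LINE bits, eight 4-byte words per 32-byte line give 3 WORD bits, 2 BYTE bits, and a 16-bit TAG); placing the fields from high-order to low-order in the listed order is an assumption, and the names `split_address` and `DirectMappedCache` are hypothetical.

```python
def split_address(addr):
    """Split a 32-bit address into (TAG, LINE, WORD, BYTE)."""
    byte = addr & 0x3             # 2 bits: byte within the word
    word = (addr >> 2) & 0x7      # 3 bits: word within the 32-byte line
    line = (addr >> 5) & 0x7FF    # 11 bits: one of 2048 cache entries
    tag = (addr >> 16) & 0xFFFF   # 16 bits: identifies the memory line
    return tag, line, word, byte


class DirectMappedCache:
    """Tracks only the Valid and Tag fields, which is enough to decide hit or miss."""

    def __init__(self):
        self.valid = [False] * 2048
        self.tags = [0] * 2048

    def access(self, addr):
        """Return True on a hit; on a miss, load the line into its single possible entry."""
        tag, line, _, _ = split_address(addr)
        hit = self.valid[line] and self.tags[line] == tag
        if not hit:
            self.valid[line] = True
            self.tags[line] = tag
        return hit
```

Two addresses with the same LINE bits but different TAG bits compete for the same entry, so alternating between them misses every time.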

34 Set-Associative Caches
Allow multiple entries for each line number, to minimize collisions. This is a 4-way set-associative cache.
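
A rough sketch of a set-associative lookup. The slide does not say how an entry is chosen for replacement, so least-recently-used replacement is assumed here; the default sizes, the class name `SetAssociativeCache`, and the tag-only bookkeeping are also illustrative choices.

```python
class SetAssociativeCache:
    """Each set holds up to `ways` tags, ordered from least to most recently used."""

    def __init__(self, num_sets=2048, ways=4, line_size=32):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit; on a miss, load the line, evicting the LRU entry if full."""
        line_number = addr // self.line_size
        index = line_number % self.num_sets   # which set the line maps to
        tag = line_number // self.num_sets    # distinguishes lines within the set
        entries = self.sets[index]
        hit = tag in entries
        if hit:
            entries.remove(tag)               # will be re-added as most recent
        elif len(entries) == self.ways:
            entries.pop(0)                    # evict the least recently used tag
        entries.append(tag)
        return hit
```

With four ways, four lines that map to the same set can all stay resident, and only a fifth conflicting line forces an eviction.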

