Published by Maria Bruce. Modified over 9 years ago.
1
TDC 311 The Microarchitecture
2
Introduction
As mentioned earlier in the class, one Java statement generates multiple machine code statements.
Then each machine code statement generates one or more microcode statements.
3
Introduction Continued
For example, in Java:
  counter += 1;
might generate the following machine code:
  load reg1,counter
  inc reg1
  store reg1,counter
4
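Datapath (figure not reproduced; the labels it contained are listed below)
Components: Control Store, MIR, A Bus Decoder, B Bus Decoder, A Bus, B Bus, C Bus, ALU, Memory
Other labels: Addr; machine code instr; Read, write signals; ALU control
A Bus Decoder and B Bus Decoder: assume 31 registers, 0 means no register
C Bus: 32 individual signals
Registers (number, name):
  1   PC
  2   MAR
  3   MDR
  4   Reg A
  5   Reg B
  6   Reg C
  31  Reg BB
ALU control (code, operation):
  0   Add
  1   Multiply
  2   Inc A
  3   Inc B
  4   Dec A
  5   Dec B
  6   AND
  7   OR
  8   Pass A
  9   TwosC A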
5
Clock Subcycles
Subcycle 1: set up signals to drive the data path
Subcycle 2: drive the A and B buses
Subcycle 3: ALU operation
Subcycle 4: drive the C bus
Timing diagram labels: Cycle starts here; Registers loaded from C Bus; Next microinstruction loaded from control store
Requires 2 complete clock cycles to perform a microinstruction.
6
Simple Example
Java statement: counter += 1;
What might the microinstructions look like?
load reg1,counter (assume the address of counter is currently in Register C)
  Rd=1; Wr=0; A=00110 (Reg C); B=00000; C=00010 (MAR); ALU=1000 (Pass A)
  Rd=1; all else 0 (counter should now be sitting in MDR)
  Rd=0; Wr=0; A=00011 (MDR); B=00000; C=00100 (Reg A, used as reg1); ALU=1000 (Pass A)
inc reg1
  Rd=0; Wr=0; A=00100 (Reg A); B=00000; C=00100 (Reg A); ALU=0010 (Inc A)
store reg1,counter (assume the address of counter is still in MAR)
  Rd=0; Wr=1; A=00100 (Reg A); B=00000; C=00011 (MDR); ALU=1000 (Pass A)
  Rd=0; Wr=1; all else 0
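The six microinstructions above can be traced with a small simulation. This is only a sketch under stated assumptions, not the course's simulator: the register numbers and ALU codes come from the datapath slide, while the `step` and `run_counter_increment` helpers, the dictionary used as memory, and the example address 100 are hypothetical. It also assumes that a memory read or write takes effect in the microinstruction where Rd or Wr is set and everything else is 0.

```python
# Register numbers and ALU codes as listed on the datapath slide.
PC, MAR, MDR, REG_A, REG_B, REG_C = 1, 2, 3, 4, 5, 6
INC_A, PASS_A = 0b0010, 0b1000

def alu(op, a):
    """Model only the two ALU operations this example needs."""
    if op == PASS_A:
        return a
    if op == INC_A:
        return a + 1
    raise ValueError(f"ALU op {op} not modelled in this sketch")

def step(regs, memory, rd=0, wr=0, a=0, b=0, c=0, op=None):
    """Execute one microinstruction (hypothetical helper).

    a, b, c are register numbers for the A, B and C buses; 0 means none.
    """
    if c != 0:
        # Datapath transfer: A bus -> ALU -> C bus -> destination register.
        regs[c] = alu(op, regs[a])
    elif rd:
        # Assumed: the read completes here, MDR <- memory[MAR].
        regs[MDR] = memory[regs[MAR]]
    elif wr:
        # Assumed: the write completes here, memory[MAR] <- MDR.
        memory[regs[MAR]] = regs[MDR]

def run_counter_increment(memory, counter_addr):
    regs = {r: 0 for r in range(1, 32)}
    regs[REG_C] = counter_addr  # the slide assumes the address is in Reg C
    program = [
        dict(rd=1, a=REG_C, c=MAR, op=PASS_A),   # load:  MAR <- Reg C
        dict(rd=1),                              #        MDR <- memory[MAR]
        dict(a=MDR, c=REG_A, op=PASS_A),         #        Reg A <- MDR
        dict(a=REG_A, c=REG_A, op=INC_A),        # inc:   Reg A <- Reg A + 1
        dict(wr=1, a=REG_A, c=MDR, op=PASS_A),   # store: MDR <- Reg A
        dict(wr=1),                              #        memory[MAR] <- MDR
    ]
    for micro in program:
        step(regs, memory, **micro)
    return regs

memory = {100: 41}
run_counter_increment(memory, 100)
print(memory[100])  # 42
```

Keeping each microinstruction as a set of named fields mirrors the Rd/Wr/A/B/C/ALU layout on the slide, so the list in `program` can be read side by side with it.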
7
Design Issues
Speed vs. cost
  reduce the number of clock cycles needed to execute an instruction
  simplify the organization so that the clock cycle can be shorter
  overlap the execution of instructions
Any way to improve upon the microarchitecture?
8
Design Issues
Create independent units that fetch and process the instructions? (Double up on other things? Everything?)
Pre-fetch one/two/three instructions?
Perform pipelining?
9
Pipeline Example
10
Pipeline Problems
Pipe stall: when a subsequent instruction must wait before it can proceed
What causes stalls?
  waiting for memory
  waiting for the result of an earlier instruction
  determining the next instruction
What if you encounter a branch instruction?
It also takes time to fill the pipeline
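The cost of filling the pipeline and of stalls can be estimated with simple arithmetic. The sketch below assumes an idealized pipeline in which every stage takes one cycle and each stall cycle delays everything behind it by one cycle; the function names and parameters are illustrative and do not come from the slides.

```python
def pipeline_cycles(num_instructions, num_stages, stall_cycles=0):
    """Cycles to finish all instructions on an idealized pipeline."""
    if num_instructions == 0:
        return 0
    fill = num_stages - 1  # cycles before the first instruction completes
    return fill + num_instructions + stall_cycles

def unpipelined_cycles(num_instructions, num_stages):
    """Cycles when each instruction runs start to finish on its own."""
    return num_instructions * num_stages

print(unpipelined_cycles(10, 4))               # 40
print(pipeline_cycles(10, 4))                  # 13
print(pipeline_cycles(10, 4, stall_cycles=3))  # 16
```

With only a few instructions the fill time dominates; with many, the pipelined count approaches one instruction per cycle plus whatever the stalls add.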
11
Design Issues
Perform branch prediction?
Perform out-of-order execution?
  add two register contents and store in register
  increment counter by 1
  start a write operation
changed to:
  add two register contents and store in register
  start a write operation
  increment counter by 1
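Reordering is only safe when the swapped instructions do not depend on each other. The sketch below checks that for the three instructions on the slide. The `Instr` type, the register names (r1, r2, r3, counter), and the assumption that the write operation stores the result of the add are all hypothetical; the slide does not say what is written.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()

def independent(first, second):
    """True if the two instructions can be swapped without changing results."""
    raw = first.writes & second.reads    # second needs a value first produces
    war = first.reads & second.writes    # second overwrites a value first reads
    waw = first.writes & second.writes   # both write the same location
    return not (raw or war or waw)

# Assumed operands, for illustration only.
add = Instr("add r3 = r1 + r2", frozenset({"r1", "r2"}), frozenset({"r3"}))
inc = Instr("increment counter", frozenset({"counter"}), frozenset({"counter"}))
write = Instr("start write of r3", frozenset({"r3"}), frozenset({"mem"}))

print(independent(inc, write))  # True: the swap shown on the slide is safe
print(independent(add, write))  # False: the write needs the add's result
```

Under these assumptions the slow write can start one instruction earlier, while the increment fills time that would otherwise be spent waiting.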
12
Design Issues
Perform speculative execution?
Re-use registers that are no longer used?
Have a large register set and keep all current values in registers?
Use cache memory?
13
Cache Memory
Memory references tend to cluster near recently used locations (the locality principle)
Program code is usually in one region (if written by a good programmer) and data often in another (but grouped together)
Bring the most recently referenced values into a high-speed cache
How does the CPU know whether something is in the cache or not?
14
Direct-mapped Cache
Most common form of cache memory
Let's consider a cache which has 2048 entries, each entry holding 32 bytes (not bits) of data
2048 entries times 32 bytes per entry equals 64 KB
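The size arithmetic, and the address field widths that follow from it, can be checked with a few lines. This is a sketch that assumes 32-bit addresses and power-of-two sizes; `cache_geometry` is a hypothetical helper name.

```python
def cache_geometry(entries, bytes_per_entry, address_bits=32):
    """Return (capacity_bytes, line_bits, offset_bits, tag_bits).

    Assumes entries and bytes_per_entry are powers of two.
    """
    capacity = entries * bytes_per_entry
    line_bits = entries.bit_length() - 1            # log2(entries)
    offset_bits = bytes_per_entry.bit_length() - 1  # log2(bytes per entry)
    tag_bits = address_bits - line_bits - offset_bits
    return capacity, line_bits, offset_bits, tag_bits

capacity, line_bits, offset_bits, tag_bits = cache_geometry(2048, 32)
print(capacity // 1024, "KB")            # 64 KB
print(line_bits, offset_bits, tag_bits)  # 11 5 16
```

The 5 offset bits select a byte within the 32-byte entry, the 11 line bits select one of the 2048 entries, and the remaining 16 bits form the tag.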
15
Each cache entry holds: V bit, Tag (16 bits), Data (32 bytes)
Entry   Addresses that use this entry
2047    65504-65535, 131040-131071, …
2046
2045
:
2       64-95, 65600-65631, …
1       32-63, 65568-65599, …
0       0-31, 65536-65567, 131072-131103, …
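The address ranges in the table follow from the entry number: every 64 KB, the same entry is reused. A minimal sketch, with `address_ranges` as a hypothetical helper name, reproduces them:

```python
ENTRIES = 2048
LINE_BYTES = 32
CACHE_BYTES = ENTRIES * LINE_BYTES  # 65536 bytes = 64 KB

def address_ranges(entry, count=2):
    """First `count` (start, end) address ranges that map to a cache entry."""
    ranges = []
    for tag in range(count):
        start = tag * CACHE_BYTES + entry * LINE_BYTES
        ranges.append((start, start + LINE_BYTES - 1))
    return ranges

print(address_ranges(0, 3))   # [(0, 31), (65536, 65567), (131072, 131103)]
print(address_ranges(1))      # [(32, 63), (65568, 65599)]
print(address_ranges(2047))   # [(65504, 65535), (131040, 131071)]
```

The loop variable is named `tag` because the multiple of 64 KB is exactly the value stored in the entry's tag field.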
16
Cache Address
When a program generates a 32-bit address, it has the following form:
Tag (16 bits) | Line (11 bits) | Word (3 bits) | Byte (2 bits)
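Splitting an address into these four fields is a matter of shifts and masks. The sketch below assumes the fields are laid out from most significant (Tag) to least significant (Byte), as listed above; `split_address` is a hypothetical helper name.

```python
def split_address(addr):
    """Split a 32-bit address into (tag, line, word, byte)."""
    byte = addr & 0b11             # lowest 2 bits
    word = (addr >> 2) & 0b111     # next 3 bits
    line = (addr >> 5) & 0x7FF     # next 11 bits
    tag = (addr >> 16) & 0xFFFF    # top 16 bits
    return tag, line, word, byte

print(split_address(65600))       # (1, 2, 0, 0)
print(split_address(0xFFFFFFFF))  # (65535, 2047, 7, 3)
```

Address 65600 is 65536 + 64, so it lands in line 2 with tag 1, sharing that line with address 64 (tag 0).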
17
Cache Hit
To see if a data item is in the cache, use the 11-bit LINE portion of the address to point to one of the 2048 cache entries
Then the 16-bit TAG of the address is compared to the 16-bit TAG value in the cache entry
If there is a match, the data is there
18
Cache Hit
If the data is there, use the 3-bit WORD portion of the address to tell you which of the 8 words (32 bytes) in the cache line should be fetched
If necessary, the 2-bit BYTE portion of the address will tell you which one of the four bytes in that word to fetch
19
Cache Memory
Note that since this cache only holds 64 KB, it can hold the data for addresses 0 to 65535.
But the same entries may instead hold data for addresses 65536 to 131071, or for any higher 64 KB range.
That is why you must compare the TAG fields to see if there is a match
20
Cache Miss
If there is no match (of TAG fields), then there is a cache miss
The CPU goes to main memory, fetches the 32-byte block containing the requested address, and stores it in the cache entry (thus wiping out the old block in that entry)
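The hit and miss behaviour can be sketched as a small model of the 2048-entry cache. This is an illustration under assumptions, not a description of real hardware: `DirectMappedCache` and `read_byte` are hypothetical names, main memory is modelled as a `bytearray`, and the V bit from the cache table is used to mark entries that have been filled.

```python
class DirectMappedCache:
    """Toy model: 2048 entries of 32 bytes, 16-bit tags, one V bit each."""

    def __init__(self, memory):
        self.memory = memory            # backing store (assumed bytearray)
        self.valid = [False] * 2048
        self.tags = [0] * 2048
        self.data = [bytes(32)] * 2048

    def read_byte(self, addr):
        """Return (value, hit) for a single byte read."""
        tag = addr >> 16
        line = (addr >> 5) & 0x7FF
        offset = addr & 0x1F            # WORD and BYTE fields together
        hit = self.valid[line] and self.tags[line] == tag
        if not hit:
            # Miss: fetch the 32-byte block containing addr, replacing
            # whatever block the entry held before.
            base = addr & ~0x1F
            self.data[line] = bytes(self.memory[base:base + 32])
            self.tags[line] = tag
            self.valid[line] = True
        return self.data[line][offset], hit

memory = bytearray(200_000)
memory[36] = 0xAA
memory[65536 + 36] = 0xBB
cache = DirectMappedCache(memory)
print(cache.read_byte(36))          # (170, False): first access misses
print(cache.read_byte(37))          # (0, True): same block, now cached
print(cache.read_byte(65536 + 36))  # (187, False): same line, different tag
print(cache.read_byte(36))          # (170, False): block was evicted
```

The last two reads show the cost of direct mapping: two addresses exactly 64 KB apart compete for the same entry and evict each other.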
21
Cache Example
Consider that the CPU wants to fetch data from location 36 (decimal), or 00000024 in hex
Tag = 0000 0000 0000 0000
Line = 0000 0000 001
Word = 001
Byte = 00