
1 COSC 3330/6308 Second Review Session Fall 2012

2-6 Instruction Timings
For each of the following MIPS instructions, check the cycles that each instruction does not skip. (4×5 points for each correct line)

Completed answer:

    Instruction      IF    ID/RR  ALU    MEM    WB
    add r1, r2, r3   X     X      X             X
    slt r1, r2, r3   X     X      X             X
    ld r1, d(r2)     X     X      X      X      X
    st r1, d(r2)     X     X      X      X
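
As a quick cross-check of the table above, here is a small Python sketch (not part of the original slides; the stage lists simply transcribe the X marks) that derives each instruction's cycle count and the stages it skips:

    ALL_STAGES = ["IF", "ID/RR", "ALU", "MEM", "WB"]

    # Stages each instruction goes through, transcribed from the table above.
    stages = {
        "add r1, r2, r3": ["IF", "ID/RR", "ALU", "WB"],
        "slt r1, r2, r3": ["IF", "ID/RR", "ALU", "WB"],
        "ld r1, d(r2)":   ["IF", "ID/RR", "ALU", "MEM", "WB"],
        "st r1, d(r2)":   ["IF", "ID/RR", "ALU", "MEM"],
    }

    for instruction, used in stages.items():
        skipped = [s for s in ALL_STAGES if s not in used]
        print(f"{instruction}: {len(used)} cycles, skips {skipped or 'none'}")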

7 Conditional branch What is missing in the following diagram sketching the datapaths of the non-pipelined version of the conditional branch instruction? (2×5 points)

8-10 Conditional Branch
[Datapath diagrams not preserved in the transcript. The missing elements are the "Shift left 2" unit and the "Add" unit used to compute the branch target address.]

11 Immediate instructions
Remember that the MIPS instruction set has a variety of immediate instructions, such as
    addi r1, r2, im
which stores into r1 the sum of the contents of register r2 and the immediate value im. Show on the following diagram what the datapaths for that instruction would be. (3×5 points)

12-14 addi r1, r2, im
[Datapath diagram not preserved in the transcript. The highlighted elements are the register file, the sign-extended immediate, and the ALU.]
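
To make the addi semantics concrete, here is a small Python sketch (mine, not from the slides; passing register names as strings is purely an illustration device) of what that datapath computes: the 16-bit immediate is sign-extended to 32 bits and added to the contents of r2, and the result is written back into r1.

    def sign_extend16(imm):
        """Sign-extend a 16-bit immediate to 32 bits."""
        imm &= 0xFFFF
        return imm - 0x10000 if imm & 0x8000 else imm

    def addi(registers, rt, rs, imm):
        """registers is a dict of register values; computes rt = rs + sign_extend(imm)."""
        registers[rt] = (registers[rs] + sign_extend16(imm)) & 0xFFFFFFFF

    regs = {"r1": 0, "r2": 100}
    addi(regs, "r1", "r2", -5)   # r1 = 100 + (-5) = 95
    print(regs["r1"])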

15 Pipelining
Consider the following pair of MIPS instructions
    sub r3, r1, r2
    add r4, r3, r6
Show how the second instruction will proceed when bypassing is not implemented. (5 points)

16-17 Pipelining w/o bypassing

    Steps            1      2      3      4      5      6      7
    sub r3, r1, r2   IF     ID/RR  ALU    WB
    add r4, r3, r6          IF                   ID/RR  ALU    WB

The add cannot perform its register-read step before it can read the new value of register r3, which is written back in step 4.

18 Pipelining Show how the second instruction will proceed if bypassing is implemented.

19-20 Pipelining with bypassing

    Steps            1      2      3      4      5      6      7
    sub r3, r1, r2   IF     ID/RR  ALU    WB
    add r4, r3, r6          IF     ID/RR  ALU    WB

The result of the sub is bypassed from the ALU output directly into the ALU input of the add, so the add proceeds without stalling.

21 More pipelining
Consider the following pair of MIPS instructions
    lw r3, d(r1)
    add r4, r3, r6
Show how the second instruction will proceed when bypassing is not implemented. (5 points)

22-23 Without bypassing

    Steps            1      2      3      4      5      6      7
    lw r3, d(r1)     IF     ID/RR  ALU    MEM    WB
    add r4, r3, r6          IF                          ID/RR  ALU

The add cannot perform its register-read step before the new value of register r3 has been written back in step 5; its WB step therefore falls in step 8, beyond the table.

24 More pipelining Show how the second instruction will proceed if bypassing is implemented.

25-26 With bypassing

    Steps            1      2      3      4      5      6      7
    lw r3, d(r1)     IF     ID/RR  ALU    MEM    WB
    add r4, r3, r6          IF     ID/RR         ALU    WB

Even with bypassing, the new value of register r3 only becomes available at the end of the lw's MEM step, so the add's ALU step cannot run before step 5.
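
As a sanity check on the four timing tables above, here is a small Python sketch (my own model, not from the slides) that computes the earliest step in which the dependent add can run the stage that needs r3. The only assumption is that a value can be used in the first step after the step that makes it available.

    def earliest_step(value_available_step, earliest_possible_step):
        """Earliest step in which the stage that needs the value can run."""
        return max(value_available_step + 1, earliest_possible_step)

    # sub r3,r1,r2 then add r4,r3,r6 (the add is fetched in step 2).
    # Without bypassing, the add's ID/RR needs r3 from the register file,
    # which is only updated by the sub's WB in step 4.
    print(earliest_step(4, 3))   # 5: ID/RR in step 5, as in the w/o-bypassing table
    # With bypassing, the add's ALU gets r3 straight from the sub's ALU output (step 3).
    print(earliest_step(3, 4))   # 4: ALU in step 4, as in the with-bypassing table

    # lw r3,d(r1) then add r4,r3,r6 (the add is fetched in step 2).
    # Without bypassing, r3 is only written back by the lw's WB in step 5.
    print(earliest_step(5, 3))   # 6: ID/RR in step 6
    # With bypassing, r3 comes out of the lw's MEM step in step 4.
    print(earliest_step(4, 4))   # 5: ALU in step 5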

27 A last word about data hazards Which single MIPS instruction can cause the worst data hazards? (5 points)

28 A last word about data hazards
Which single MIPS instruction can cause the worst data hazards? (5 points)
lw (load word into register): it goes through all its cycles before updating its register.

29 The comparator
The MIPS architecture we have discussed in class includes a small comparator that checks whether the two register read outputs are equal or not.
- Which MIPS instructions use this comparator? (5 points)
- Why do they use this comparator instead of the ALU? (5 points)
- How is this comparator implemented? (5 points)

30 The comparator
The comparator is used by the beq and bne instructions, so that the branch decision can be made one step earlier.
It XORs the two 32-bit values and then ORs together all the bits of the result: the two values are equal if and only if that OR is 0.
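
A minimal sketch of that logic in Python (the 32-bit width is from the slides; the function name and sample values are mine):

    def registers_equal(a, b, width=32):
        """Model of the comparator: bitwise XOR, then OR all the result bits.
        The inputs are equal exactly when the OR of the XOR bits is 0."""
        xor = (a ^ b) & ((1 << width) - 1)   # bitwise XOR of the two register values
        or_of_bits = 0
        for i in range(width):               # OR the result bits together
            or_of_bits |= (xor >> i) & 1
        return or_of_bits == 0               # equal iff no bit differs

    print(registers_equal(0x1234, 0x1234))   # True
    print(registers_equal(0x1234, 0x1235))   # False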

31 Without special unit

    Steps   1      2      3      4      5      6
    beq     IF     ID/RR  ALU    MEM    WB
    next           IF     ID/RR  ABORT
    next                  IF     ABORT
    dest                         IF     ID/RR  ALU

Must wait until the end of the ALU step of the beq to know whether we will branch or not.

32 With special unit

    Steps   1      2      3      4      5
    beq     IF     ID/RR  ALU    MEM    WB
    next           IF     ABORT
    dest                  IF     ID/RR  ALU

Since the special unit is very fast, we know whether we will branch or not by the end of the ID/RR step.
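
A small Python sketch of the resulting branch penalty (my own restatement; it assumes, as in the two tables above, that one instruction is fetched per step and that every instruction fetched after the branch but before its outcome is known must be aborted):

    def aborted_instructions(branch_fetch_step, outcome_known_step):
        """Instructions fetched after the branch, up to and including the step
        in which its outcome becomes known, must be aborted when it is taken."""
        return outcome_known_step - branch_fetch_step

    # Outcome known at the end of the ALU step (step 3) without the comparator,
    # and at the end of the ID/RR step (step 2) with it; beq is fetched in step 1.
    print(aborted_instructions(1, 3))   # 2 aborted instructions (slide 31)
    print(aborted_instructions(1, 2))   # 1 aborted instruction (slide 32)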

33 Disk reliability What do we mean when we say that disk failure rates follow a bathtub curve? (5 points)

34 Disk reliability
What do we mean when we say that disk failure rates follow a bathtub curve? (5 points)
Disk failure rates are higher
- for new disks (infant mortality), and
- as disks wear out at the end of their useful lifetime.

35 Caching
A small direct-mapping cache has 2,048 entries, with each entry containing four words. The computer memory is byte-addressable and all addresses are 32-bit addresses. (4×5 points)
- What is the cache size (tags excluded) in bytes?

36 The cache
[Cache layout diagram not preserved in the transcript: 2,048 lines, each holding a tag, a bit (presumably the valid bit), and 4 words = 4 × 4 bytes of data.]

37 Caching
A small direct-mapping cache has 2,048 entries, with each entry containing four words. The computer memory is byte-addressable and all addresses are 32-bit addresses. (4×5 points)
- What is the cache size (tags excluded) in bytes?
  2,048 × 4 × 4 = 32K bytes

38 Caching
A small direct-mapping cache has 2,048 entries, with each entry containing four words. The computer memory is byte-addressable and all addresses are 32-bit addresses. (4×5 points)
- What is the tag size?

39 Caching
A small direct-mapping cache has 2,048 entries, with each entry containing four words. The computer memory is byte-addressable and all addresses are 32-bit addresses. (4×5 points)
- What is the tag size?
  32 - 4 - 11 = 17 bits
  - Remove log2(16) = 4 bits because each entry is 16 bytes long.
  - Remove log2(2,048) = 11 bits that are given by the entry's position (index) in the cache.
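
Both results can be checked with a few lines of Python (the variable names are mine, not from the slides):

    from math import log2

    address_bits    = 32
    entries         = 2048
    bytes_per_entry = 4 * 4                      # four 4-byte words

    cache_size  = entries * bytes_per_entry      # data only, tags excluded
    offset_bits = int(log2(bytes_per_entry))     # byte offset within an entry
    index_bits  = int(log2(entries))             # selects the cache line
    tag_bits    = address_bits - offset_bits - index_bits

    print(cache_size)    # 32768 bytes = 32K bytes
    print(offset_bits)   # 4
    print(index_bits)    # 11
    print(tag_bits)      # 17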

40 Caching
A small direct-mapping cache has 2,048 entries, with each entry containing four words. The computer memory is byte-addressable and all addresses are 32-bit addresses. (4×5 points)
- How could we increase the hit ratio of the cache without increasing its size?

41 Caching
A small direct-mapping cache has 2,048 entries, with each entry containing four words. The computer memory is byte-addressable and all addresses are 32-bit addresses. (4×5 points)
- How could we increase the hit ratio of the cache without increasing its size?
  Replacing it with a set-associative cache that could store 1,024 pairs of four-word entries.

42 Caching
A small direct-mapping cache has 2,048 entries, with each entry containing four words. The computer memory is byte-addressable and all addresses are 32-bit addresses. (4×5 points)
- What would be the main disadvantage of your solution?

43 Caching
A small direct-mapping cache has 2,048 entries, with each entry containing four words. The computer memory is byte-addressable and all addresses are 32-bit addresses. (4×5 points)
- What would be the main disadvantage of your solution?
  Set-associative caches are slower than direct-mapping caches.

44 Main memory organization
Assuming that a main memory access takes
- 1 bus clock cycle to send the address,
- 16 bus clock cycles to initiate a read,
- 1 bus clock cycle to send a word of data,
how many clock cycles would it take to transfer 16 bytes to the cache if
- the data are stored in a single bank of memory? (5 points)
- the data are stored in a four-way interleaved memory? (5 points)

45 Single bank memory
Assuming that a main memory access takes
- 1 bus clock cycle to send the address,
- 16 bus clock cycles to initiate a read,
- 1 bus clock cycle to send a word of data,
how many clock cycles would it take to transfer 16 bytes to the cache?
    1 + 4 × (16 + 1) = 69 cycles
All operations are done sequentially.

46 Four-way interleaved memory
Assuming that a main memory access takes
- 1 bus clock cycle to send the address,
- 16 bus clock cycles to initiate a read,
- 1 bus clock cycle to send a word of data,
how many clock cycles would it take to transfer 16 bytes to the cache?
    1 + 16 + 4 × 1 = 21 cycles
The reads, but not the data transfers, are now performed in parallel.
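
The two cycle counts can be reproduced with a short Python sketch (my own restatement of the timing assumptions above):

    address_cycles  = 1         # send the address
    read_cycles     = 16        # initiate a read
    transfer_cycles = 1         # send one word of data
    words           = 16 // 4   # 16 bytes = 4 words

    # Single bank: every word pays the full read latency, one word after another.
    single_bank = address_cycles + words * (read_cycles + transfer_cycles)

    # Four-way interleaved: the four reads overlap; only the transfers are serial.
    interleaved = address_cycles + read_cycles + words * transfer_cycles

    print(single_bank)   # 69 cycles
    print(interleaved)   # 21 cycles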

47 Protecting page tables How can we prevent user programs from modifying their own page tables? (5 points)

48 Protecting page tables
How can we prevent user programs from modifying their own page tables? (5 points)
- We must store page tables in the protected area of the operating system.

49 Caches and virtual memory What would be a reasonable page size for a virtual memory system? Justify your answer in a few words. Would that be a reasonable block size for a cache? Justify your answer in a few words.

50-53 Caches and virtual memory
What would be a reasonable page size for a virtual memory system?
- 4K bytes.
Justify your answer in a few words.
- Because page faults are very costly, the system should try to bring in as much useful data as possible.
Would that be a reasonable block size for a cache?
- No.
Justify your answer in a few words.
- Cache block sizes are much smaller: 64 bytes is a good choice because larger block sizes create too many collisions.

54 Page table size How can we limit the size of page tables to 512KB in a 32-bit virtual system?

55-60 Answer
We do all the computations in reverse:
- Desired page table size: 512 KB
- Number of page table entries: 512 KB / 4 bytes = 128K, since each page table entry occupies four bytes
- Number of bits occupied by the page number: log2(128K) = log2(2^17) = 17 bits
- Number of bits occupied by the byte offset: 32 - 17 = 15 bits
- Page size: 2^15 bytes = 32 KB
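
The whole reverse computation fits in a few lines of Python (the variable names are mine, not from the slides):

    from math import log2

    page_table_size      = 512 * 1024   # desired page table size, in bytes
    entry_size           = 4            # bytes per page table entry
    virtual_address_bits = 32

    entries          = page_table_size // entry_size            # 128K entries
    page_number_bits = int(log2(entries))                       # 17 bits
    offset_bits      = virtual_address_bits - page_number_bits  # 15 bits
    page_size        = 2 ** offset_bits                         # 32 KB

    print(entries, page_number_bits, offset_bits, page_size)
    # 131072 17 15 32768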

61 TLB misses When comparing the hit ratios of two translation look-aside buffers, which question should we ask first?

62 Answer
Are TLB misses handled by the firmware or by the OS?
- If TLB misses are handled by the firmware, the cost of a TLB miss is one extra memory reference.
- If TLB misses are handled by the OS, the cost of a TLB miss is two context switches.

63 The dirty bit What is the purpose of the dirty bit?

64 Answer
The dirty bit tells whether a page has been modified since the last time it was brought into main memory. It is used whenever a page must be expelled from main memory.
- If its dirty bit is ON, the page must be saved to disk before being expelled.
- If its dirty bit is OFF, there already is an exact copy of the page on disk.

65 Page table organization What is the main advantage of hashed page tables?

66 Answer
Hashed page tables only keep track of the pages that are actually in main memory, so their size is proportional to the size of the physical memory instead of the size of the virtual address space.
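
A minimal sketch of the idea (the dictionary-based class and its names are illustrative assumptions, not the slides' design): the table holds one entry per resident page, so its size tracks the amount of physical memory rather than the full virtual address space.

    # Toy hashed page table: one entry per page actually resident in memory,
    # keyed by virtual page number.
    class HashedPageTable:
        def __init__(self):
            self.entries = {}               # vpn -> physical frame number

        def map_page(self, vpn, frame):
            self.entries[vpn] = frame

        def translate(self, vpn):
            return self.entries.get(vpn)    # None means the page is not resident

    pt = HashedPageTable()
    pt.map_page(vpn=0x12345, frame=42)
    print(pt.translate(0x12345))   # 42
    print(pt.translate(0x54321))   # None -> page fault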

67 ALWAYS REMEMBER
    One KILO is 2^10
    One MEGA is 2^20
    One GIGA is 2^30
In binary, 2^n is 1 followed by n zeroes.

