CS 300 – Lecture 21 Intro to Computer Architecture / Assembly Language Virtual Memory
Next Homework
Sorry – it's not ready yet. It will be on the wiki Friday. The first part will be due the Thursday after break. The second part will be due a week later. There will be one more homework after that.
Test Recap
Binary Numbers! Aaaaaargh! Binary numbers WILL BE BACK on the final! I pity the fool that can't think in binary!
Binary
* Convert the following decimal numbers to 8-bit signed binary numbers
* Add the following 8-bit signed binary numbers; indicate any overflows (but still give a result)
* Signed: / * / 4
* IEEE Float: Convert the number -8.5 to IEEE floating point (32 bits)
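The IEEE conversion can be worked by hand and then checked on a machine. 8.5 is 1000.1 in binary, or 1.0001 × 2^3. The sign bit is 1, the stored exponent is 3 + 127 = 130 = 10000010, and the fraction is 0001 followed by 19 zeros, which packs to 0xC1080000. The helper below is a minimal sketch for checking such answers; the name `float_bits` is made up for this example, and it assumes the C compiler's `float` is a 32-bit IEEE single.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float.
   Assumes float is a 32-bit IEEE 754 single-precision value. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copy the bytes, no conversion */
    return bits;
}
```

Calling `float_bits(-8.5f)` should give the pattern derived above.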
Bit Fiddling
Write a MIPS code sequence which takes an IEEE float in $a0 and places the exponent only, converted to an integer between -128 and 127, in register $v0.
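The same bit fiddling can be sketched in C before writing the MIPS: shift right by 23 to drop the fraction, mask with 0xFF to drop the sign, then subtract the bias of 127. The function name `float_exponent` is made up for this example, and the code assumes a 32-bit IEEE `float`. Note that with a bias of 127 the raw result runs from -127 to 128.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the unbiased exponent of a single-precision IEEE 754 value.
   The stored exponent lives in bits 30..23. */
int float_exponent(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);      /* assumes a 32-bit float */
    memcpy(&bits, &f, sizeof bits);       /* reinterpret the bit pattern */
    return (int)((bits >> 23) & 0xFF) - 127;
}
```

Each C operation corresponds to one MIPS instruction (a shift, a mask, and an add of -127), so the MIPS answer should be about three instructions long.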
More MIPS
    *x = *(x+1)+2
Short Answer
* If p points to a 64-bit floating point number, to increment p you add _______
* (T / F) If you divide by 2 using a right shift, the result is rounded up if the number is odd.
* To divide a signed integer by 2 using a right shift, you shift in _______ bits.
* When you use a lw instruction, the memory address referenced must end in _________
* (T / F) A “lw” instruction may access any word in the MIPS memory.
* (T / F) A “j” instruction can jump to any instruction in the MIPS memory.
Short Answer
* If p points to an 8-bit character, a ______ instruction fetches the character from memory.
* (T / F) A function is free to change the value of $t1 without saving it on the stack.
* (T / F) In C, the expression a[i] is the same as *(a+i).
* Fast arithmetic on large integers is important today since computers commonly run ___________________________ software.
* (T / F) Writing large assembly language programs is likely to cause brain damage.
Bugz
f:  add  $a0, $a1, $a0
    lw   $a0, 4($a0)
    jal  put_str
    jr   $ra

g:  lw   $t0, 0($s0)
    add  $s1, $s1, $t0
    addi $s0, $s0, 1
    addi $s2, $s2, -1
    bne  $zero, $s2, g
Bugz
h:  addi $sp, $sp, -4
    sw   $ra, 0($s0)      # Oops - $sp
    jal  f1
    addi $a0, $v1, 1
    jal  f2
    la   $ra, 0($sp)      # Oops - lw
    jr   $ra
Mipsorama
int f(char **a, int *b, int c) {
    int sum = 0;
    int i;
    while (c != 0) {
        i = *b;
        while (i != 0) {
            put_str(*a);
            sum++;
            i--;
        }
        b++;
        a++;
        c--;
    }
    return(sum);
}
Back to Caches …
Things to know:
* A cache is smaller but faster than the system being cached.
* Shape of the cache determines whether addresses conflict: direct mapped, associative, set (partial) associative
* Replacement policies (LRU)
* Multi-level cache systems
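To make "shape determines whether addresses conflict" concrete, here is a minimal sketch of how a direct-mapped cache splits an address into a line index and a tag. The geometry (16-byte lines, 256 lines, so 4 KB total) and the function names are assumptions chosen only for illustration, not the parameters of any particular machine.

```c
#include <stdint.h>

/* Hypothetical geometry: 256 lines of 16 bytes = 4 KB, direct mapped. */
enum { LINE_BYTES = 16, NUM_LINES = 256 };

/* Which cache line an address maps to. */
uint32_t cache_index(uint32_t addr)
{
    return (addr / LINE_BYTES) % NUM_LINES;
}

/* The remaining high-order bits, stored to identify the line's owner. */
uint32_t cache_tag(uint32_t addr)
{
    return addr / (LINE_BYTES * NUM_LINES);
}

/* In a direct-mapped cache, two addresses conflict when they map to
   the same line but carry different tags. */
int addresses_conflict(uint32_t a, uint32_t b)
{
    return cache_index(a) == cache_index(b) && cache_tag(a) != cache_tag(b);
}
```

With this geometry, addresses 0x0000 and 0x1000 are exactly one cache size apart, so they compete for line 0. A set associative cache of the same size would let both stay resident.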
Dual Caches
One possible cache design is to separate instruction caching from data caching. There are major differences in the access patterns for instructions & data (I & D):
* No writes to instructions (simplifies cache design)
* Instructions are more sequential, so pre-loading is a big issue. A less associative design is possible
* A data cache has to worry about regular access patterns (much array code)
Cache Coherence
This is a problem when more than one party is using a cache. If two processors use a common memory, their on-chip caches can lose coherence. How to deal with this?
* Write-through (cache is never out of synch with memory) instead of write-back (avoid writing dirty cache words until replacement).
* Invalidation signals: When processor A writes into memory, it must invalidate the corresponding word in processor B's cache (or update it)
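The difference between the two write policies can be sketched with a toy model of a single cached word. This is an illustration only: the `struct line` type and the function names are made up here, and real hardware works on whole cache lines with tags and valid bits.

```c
#include <stdbool.h>

/* One cached copy of one memory word; a toy model, not real hardware. */
struct line {
    int  value;
    bool dirty;      /* only meaningful for write-back */
};

/* Write-through: update the cache and memory together, so memory
   never holds a stale value. */
void store_write_through(struct line *c, int *memory, int v)
{
    c->value = v;
    *memory  = v;
}

/* Write-back: update only the cache and mark the line dirty. */
void store_write_back(struct line *c, int v)
{
    c->value = v;
    c->dirty = true;
}

/* On replacement, a dirty write-back line must be copied out. */
void evict(struct line *c, int *memory)
{
    if (c->dirty) {
        *memory  = c->value;
        c->dirty = false;
    }
}
```

Between `store_write_back` and `evict`, memory still holds the old value, which is exactly the window in which a second processor reading memory would see stale data.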
Current Cache Design Stuff
* Segregated (I/D) L1 cache: small and very fast
* On-chip large L2 cache (in the MB range)
* Off-chip L3 cache in high end systems
* Set associative designs predominate; 8-way is common.
Overview of Pentium Caching
* Pentium I: 8KB each L1 I and D cache
* Pentium Pro: 256KB L2 cache added
* Pentium II: 16KB L1 I/D cache, 512KB L2 cache
* Pentium 4: up to 1MB L2 cache. Access time to L1 is just 2 clocks, but the L1 cache is smaller. L2 cache access takes about 10 clocks.
* Pentium D: up to 4MB L2 cache
The Three C's
There are three reasons that cache misses occur:
* Compulsory: data was not used previously so can't be in the cache
* Capacity: the word could have been in the cache but it was too full
* Conflict: the cache is big enough but the shape of the cache precludes keeping the data available
Cache and the Programmer
* Most code doesn't care about the cache…
* Some algorithms are "cache friendly" (quicksort)
* Numeric code is a serious problem! Array access patterns can lead to very poor cache behavior.
* Explicit prefetching of data can achieve significant speedup
* Compilers for RISC are explicit cache managers
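A standard illustration of array access patterns is summing a matrix in two different loop orders. The sketch below assumes a square matrix stored in row-major order in a flat array; the function names are made up for this example. Both functions compute the same sum, but the first touches adjacent addresses while the second jumps n elements between accesses.

```c
#include <stddef.h>

/* Walk along rows: consecutive accesses are adjacent in memory, so
   each cache line fetched is fully used before moving on. */
double sum_row_order(const double *m, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            sum += m[i * n + j];
    return sum;
}

/* Walk down columns: consecutive accesses are n doubles apart, so
   for large n nearly every access can land on a different line. */
double sum_column_order(const double *m, size_t n)
{
    double sum = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            sum += m[i * n + j];
    return sum;
}
```

How much slower the column order runs depends on the matrix size and the cache geometry, so it is worth timing both on a matrix much larger than the cache.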