CSE Advanced Computer Architecture, Week 1 (week of Jan 12, 2004), engr.smu.edu/~rewini/8383
Contents
- Course Outline
- Review of Main Concepts in Computer Architecture
- Instruction Set Architecture
- Flynn’s Taxonomy
- Layers of Computer System Development
- Performance
Course Contents
1. Review of Main Concepts
2. Memory System Design
3. Pipeline Design Techniques
4. Multiprocessors
5. Shared Memory Systems
6. Message Passing Systems
7. Network Computing
Course Resources
- Lecture slides on the web
- Student presentations on the web
- Books:
  - Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill.
  - Abd-El-Barr and El-Rewini, Computer Design and Architecture, to be published by John Wiley and Sons (selected chapters will be made available on the web).
Student Work
- Class Participation
- Assignments
- Presentations
- Project
- Midterm
- Final
Review
Memory Locations and Operations
- Memory addressing
- Memory Data Register (MDR)
- Memory Address Register (MAR)
- Three steps for read and write
Addressing: Instruction Format
- Op-code
- Address fields
- Number of address fields:
  - Three (memory locations or registers)
  - Two (memory locations or registers)
  - One-and-a-half (memory location and register)
  - One (accumulator)
  - Zero (stack operations)
Addressing Modes
- Immediate (operand is in the instruction)
- Direct (address is in the instruction)
- Indirect (the instruction holds the address of the address)
- Indexed (a constant is added to an index register)
- Other modes
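A minimal Python sketch of how an operand would be fetched under each of these modes (not from the course material; the memory contents and index-register value are made up):

```python
# Illustrative operand fetch for the addressing modes above (made-up memory contents).
memory = {100: 42, 200: 100, 300: 7}   # address -> contents
index_register = 20

def fetch(mode, field):
    """Return the operand given the instruction's address/constant field."""
    if mode == "immediate":
        return field                            # the operand itself is in the instruction
    if mode == "direct":
        return memory[field]                    # field is the operand's address
    if mode == "indirect":
        return memory[memory[field]]            # field holds the address of the address
    if mode == "indexed":
        return memory[field + index_register]   # effective address = constant + index register
    raise ValueError(mode)

print(fetch("immediate", 42))   # 42
print(fetch("direct", 100))     # 42
print(fetch("indirect", 200))   # 42  (memory[200] = 100, memory[100] = 42)
print(fetch("indexed", 280))    # 7   (280 + 20 = 300)
```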
Instruction Types
- Data Movement
- Arithmetic and Logical
- Sequencing
- Input/Output
Flynn’s Classification
- SISD: single instruction stream over a single data stream
- SIMD: single instruction stream over multiple data streams
- MIMD: multiple instruction streams over multiple data streams
- MISD: multiple instruction streams over a single data stream
SISD (single instruction stream over a single data stream)
[Figure: SISD uniprocessor architecture - the CU issues the instruction stream (IS) to the PU, which exchanges the data stream (DS) with the MU and I/O]
Legend (for this and the following figures): CU = control unit, PU = processing unit, MU = memory unit, IS = instruction stream, DS = data stream, PE = processing element, LM = local memory
SIMD (single instruction stream over multiple data streams)
[Figure: SIMD architecture - the CU broadcasts one instruction stream (IS) to processing elements PE 1 .. PE n, each operating on its own data stream (DS) in local memory LM 1 .. LM n; the program and the data sets are loaded from the host]
MIMD (multiple instruction streams over multiple data streams)
[Figure: MIMD architecture (with shared memory) - control units CU 1 .. CU n each issue an instruction stream (IS) to processing units PU 1 .. PU n, which exchange data streams (DS) with a shared memory and I/O]
MISD (multiple instruction streams over a single data stream)
[Figure: MISD architecture (the systolic array) - a single data stream (DS) from memory (program and data) flows through processing units PU 1 .. PU n, each directed by its own control unit CU 1 .. CU n via a separate instruction stream (IS), with results going to I/O]
Layers for computer system development (from machine independent at the top to machine dependent at the bottom):
- Applications
- Programming Environment
- Languages Supported
- Communication Model
- Addressing Space
- Hardware Architecture
System Attributes to Performance
- Clock rate and CPI (clock cycles per instruction)
- Performance factors: T = Ic x CPI x tau, where tau is the clock cycle time
- System attributes:
  - Instruction-set architecture
  - Compiler technology
  - CPU implementation and control
  - Cache and memory hierarchy
MIPS & Throughput
- f = clock rate (so the clock cycle time is tau = 1/f); C = total number of cycles
- MIPS rate: MIPS = Ic / (T x 10^6) = f / (CPI x 10^6) = (f x Ic) / (C x 10^6)
- Throughput rate: Wp = f / (Ic x CPI) programs per second
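A minimal Python sketch of these relations (not from the course material; the instruction count, CPI, and clock rate below are made-up sample values):

```python
# Sketch of the performance relations above (illustrative values only).
def cpu_time(ic, cpi, f):
    """T = Ic * CPI * tau, with tau = 1/f (clock cycle time)."""
    return ic * cpi / f

def mips_rate(cpi, f):
    """MIPS = Ic / (T * 10^6) = f / (CPI * 10^6)."""
    return f / (cpi * 1e6)

def throughput(ic, cpi, f):
    """Wp = f / (Ic * CPI), programs executed per second."""
    return f / (ic * cpi)

ic = 50_000_000   # assumed instruction count of one program
cpi = 1.8         # assumed average clock cycles per instruction
f = 500e6         # assumed clock rate (500 MHz)

print(f"T    = {cpu_time(ic, cpi, f):.3f} s")         # 0.180 s
print(f"MIPS = {mips_rate(cpi, f):.1f}")              # 277.8
print(f"Wp   = {throughput(ic, cpi, f):.2f} prog/s")  # 5.56
```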
Memory System Design
Contents (Memory)
- Memory Hierarchy
- Cache Memory
- Placement Policies
  - Direct Mapping
  - Fully Associative
  - Set Associative
- Replacement Policies
Memory Hierarchy
- Levels, fastest to slowest: CPU registers, cache, main memory, secondary storage
- Moving down the hierarchy, latency grows while speed, bandwidth, and cost per bit drop
Sequence of events (cache access)
1. The processor makes a request for X
2. X is sought in the cache
3. If it is there: a hit (hit ratio h)
4. Otherwise: a miss (miss ratio m = 1 - h), and X is sought in main memory
5. The scheme generalizes to more levels of the hierarchy
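A small Python sketch of this lookup sequence for a single cache level (not from the course material; the dictionary cache and memory contents are placeholders):

```python
# Illustrative one-level lookup: try the cache first, fall back to main memory on a miss.
cache = {}                                             # block number -> block contents
main_memory = {b: f"block {b}" for b in range(1024)}  # stand-in for main memory
hits = misses = 0

def access(block):
    """Return the contents of `block`, updating hit/miss counts."""
    global hits, misses
    if block in cache:                       # hit (counted toward hit ratio h)
        hits += 1
    else:                                    # miss (miss ratio m = 1 - h)
        misses += 1
        cache[block] = main_memory[block]    # bring the block into the cache
    return cache[block]

for b in [3, 7, 3, 3, 9, 7]:                 # arbitrary reference stream
    access(b)
print(f"h = {hits / (hits + misses):.2f}")   # hit ratio: 0.50
```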
Cache Memory
- The idea is to keep the information expected to be used more frequently in the cache
- Locality of reference:
  - Temporal locality
  - Spatial locality
- Placement policies
- Replacement policies
Placement Policies
- How to map memory blocks (lines) to cache block frames (line frames)
[Figure: memory blocks (lines) on one side, cache block frames (line frames) on the other]
Placement Policies
- Direct Mapping
- Fully Associative
- Set Associative
Direct Mapping
- Simplest
- A memory block is mapped to a fixed cache block frame (many-to-one mapping)
- j = i mod N, where:
  - j = cache block frame number
  - i = memory block number
  - N = number of cache block frames
Address Format (Direct Mapping)
- Memory: M blocks; block size: B words; cache: N block frames
- Address size = log2(M x B) bits
- Fields: Tag (remaining bits = log2(M/N)) | Block frame (log2 N bits) | Word (log2 B bits)
Example
- Memory: 4K blocks; block size: 16 words; cache: 128 block frames
- Address size = log2(4K x 16) = 16 bits
- Fields: Tag = 5 bits | Block frame = 7 bits | Word = 4 bits
Example (cont.)
[Figure: memory blocks mapped to cache block frames; each cache entry stores a 5-bit tag]
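A short Python sketch of this direct-mapped example (not from the course material; the address value is arbitrary), splitting a 16-bit address into the 5-bit tag, 7-bit block frame, and 4-bit word fields and checking j = i mod N:

```python
# Direct mapping for the example above: 5-bit tag | 7-bit block frame | 4-bit word.
WORD_BITS, FRAME_BITS = 4, 7
N = 128                               # number of cache block frames

def split_direct(addr):
    """Split a 16-bit address into (tag, block frame, word)."""
    word  = addr & ((1 << WORD_BITS) - 1)
    frame = (addr >> WORD_BITS) & ((1 << FRAME_BITS) - 1)
    tag   = addr >> (WORD_BITS + FRAME_BITS)
    return tag, frame, word

addr = 0xABCD                         # arbitrary 16-bit address
tag, frame, word = split_direct(addr)
print(f"tag={tag}, frame={frame}, word={word}")   # tag=21, frame=60, word=13

# Same result via block numbers: j = i mod N.
i = addr >> WORD_BITS                 # memory block number
assert frame == i % N
```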
Fully Associative
- Most flexible
- A memory block may be mapped to any available cache block frame (many-to-many mapping)
- Requires an associative search
Address Format (Fully Associative)
- Memory: M blocks; block size: B words; cache: N block frames
- Address size = log2(M x B) bits
- Fields: Tag (remaining bits = log2 M) | Word (log2 B bits)
Example
- Memory: 4K blocks; block size: 16 words; cache: 128 block frames
- Address size = log2(4K x 16) = 16 bits
- Fields: Tag = 12 bits | Word = 4 bits
Example (cont.)
[Figure: any memory block may occupy any cache block frame; each cache entry stores a 12-bit tag]
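A minimal Python sketch of the fully associative example (not from the course material): the 12-bit tag identifies the block, and a lookup must compare against every occupied frame:

```python
# Fully associative example above: 12-bit tag | 4-bit word; a block may occupy any frame.
WORD_BITS = 4
N_FRAMES = 128
frames = [None] * N_FRAMES            # each entry stores a tag (the memory block number)

def lookup(addr):
    """Associative search: compare the address tag against every stored tag."""
    tag = addr >> WORD_BITS           # 12-bit tag
    for f, stored in enumerate(frames):
        if stored == tag:
            return f                  # hit in frame f
    return None                       # miss

def place(addr):
    """Put the block into any free frame (replacement policy not modeled here)."""
    tag = addr >> WORD_BITS
    free = frames.index(None)         # assumes a free frame exists
    frames[free] = tag
    return free

addr = 0x1234
print(lookup(addr))                   # None (miss)
place(addr)
print(lookup(addr))                   # 0 (hit)
```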
Set Associative
- A compromise between the other two
- The cache is divided into sets; each set holds a number of blocks
- A memory block is mapped to any available cache block frame within a specific set
- Associative search within a set only
Address Format (Set Associative)
- Memory: M blocks; block size: B words; cache: N block frames
- Number of sets S = N / (number of blocks per set)
- Address size = log2(M x B) bits
- Fields: Tag (remaining bits = log2(M/S)) | Set (log2 S bits) | Word (log2 B bits)
Example
- Memory: 4K blocks; block size: 16 words; cache: 128 block frames
- Number of blocks per set = 4, so number of sets = 32
- Address size = log2(4K x 16) = 16 bits
- Fields: Tag = 7 bits | Set = 5 bits | Word = 4 bits
Example (cont.)
[Figure: memory blocks mapped to the sets of the cache (Set 0 onward); each cache entry stores a 7-bit tag]
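A short Python sketch of the set-associative example (not from the course material; it uses the 32 sets x 4 frames and the 7/5/4-bit fields above): a block maps to one set and is searched associatively only within it:

```python
# Set-associative example above: 7-bit tag | 5-bit set | 4-bit word, 32 sets of 4 frames.
WORD_BITS, SET_BITS, WAYS = 4, 5, 4
sets = [[None] * WAYS for _ in range(1 << SET_BITS)]   # each entry stores a tag

def split(addr):
    word = addr & ((1 << WORD_BITS) - 1)
    s    = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag  = addr >> (WORD_BITS + SET_BITS)
    return tag, s, word

def lookup(addr):
    """Map the block to one set, then search only that set's 4 frames."""
    tag, s, _ = split(addr)
    return tag in sets[s]

def place(addr):
    """Put the block into a free frame of its set (replacement policy not modeled here)."""
    tag, s, _ = split(addr)
    sets[s][sets[s].index(None)] = tag

addr = 0xBEEF
print(lookup(addr))   # False (miss)
place(addr)
print(lookup(addr))   # True (hit)
```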
Comparison (of placement policies)
- Simplicity
- Associative search
- Cache utilization
- Replacement
Replacement Techniques
- FIFO
- LRU
- MRU
- Random
- Optimal
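A compact Python sketch (not from the course material) contrasting FIFO and LRU on a small reference string; MRU, random, and optimal replacement are omitted for brevity:

```python
from collections import OrderedDict, deque

def fifo_misses(refs, frames):
    """FIFO: evict the block that has been resident the longest."""
    q, resident, misses = deque(), set(), 0
    for b in refs:
        if b not in resident:
            misses += 1
            if len(resident) == frames:
                resident.discard(q.popleft())
            q.append(b)
            resident.add(b)
    return misses

def lru_misses(refs, frames):
    """LRU: evict the least recently used block."""
    cache, misses = OrderedDict(), 0
    for b in refs:
        if b in cache:
            cache.move_to_end(b)           # mark as most recently used
        else:
            misses += 1
            if len(cache) == frames:
                cache.popitem(last=False)  # drop the least recently used
            cache[b] = True
    return misses

refs = [1, 2, 3, 1, 4, 1, 2]               # arbitrary reference string
print(fifo_misses(refs, 3), lru_misses(refs, 3))   # 6 5
```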