CSE 8383 - Advanced Computer Architecture Week-1 Week of Jan 12, 2004 engr.smu.edu/~rewini/8383.

Contents
- Course Outline
- Review of Main Concepts in Computer Architecture
- Instruction Set Architecture
- Flynn’s Taxonomy
- Layers of Computer System Development
- Performance

Course Contents
1. Review of Main Concepts
2. Memory System Design
3. Pipeline Design Techniques
4. Multiprocessors
5. Shared Memory Systems
6. Message Passing Systems
7. Network Computing

Course Resources
- Lecture slides on the web
- Student presentations on the web
- Books:
  - Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill.
  - Abd-El-Barr and El-Rewini, Computer Design and Architecture, to be published by John Wiley and Sons. (Selected chapters will be made available on the web.)

Student Work
- Class Participation
- Assignments
- Presentations
- Project
- Midterm
- Final

Review

Memory Locations and Operations
- Memory addressing
- Memory Data Register (MDR)
- Memory Address Register (MAR)
- Three steps for read and write

Addressing Instruction Format
Op-code and address fields. Number of address fields:
- Three (memory locations or registers)
- Two (memory locations or registers)
- One-and-a-half (memory location and register)
- One (accumulator)
- Zero (stack operations)

Addressing Modes
- Immediate (operands in the instruction)
- Direct (address in the instruction)
- Indirect (address of the address)
- Indexed (a constant is added to an index register)
- Other modes

Instruction Types
- Data Movement
- Arithmetic and Logical
- Sequencing
- Input/Output

Flynn’s Classification
- SISD (single instruction stream over a single data stream)
- SIMD (single instruction stream over multiple data streams)
- MIMD (multiple instruction streams over multiple data streams)
- MISD (multiple instruction streams over a single data stream)

SISD (single instruction stream over a single data stream)
[Figure: SISD uniprocessor architecture — the CU issues the instruction stream (IS) to the PU, which exchanges the data stream (DS) with the MU and I/O]
Legend: CU = control unit, PU = processing unit, MU = memory unit, IS = instruction stream, DS = data stream, PE = processing element, LM = local memory

SIMD (single instruction stream over multiple data streams)
[Figure: SIMD architecture — one CU broadcasts the instruction stream to processing elements PE 1 .. PE n, each with its own local memory LM 1 .. LM n; the program is loaded from the host and the data sets are loaded from the host into the local memories]

MIMD (multiple instruction streams over multiple data streams)
[Figure: MIMD architecture (with shared memory) — control units CU 1 .. CU n each issue an instruction stream to processing units PU 1 .. PU n, which access a shared memory and I/O]

MISD (multiple instruction streams over a single data stream)
[Figure: MISD architecture (the systolic array) — control units CU 1 .. CU n drive processing units PU 1 .. PU n; a single data stream from memory (program and data) passes through the PUs, with I/O at the ends]

Layers for computer system development
From machine independent (top) to machine dependent (bottom):
- Applications
- Programming Environment
- Languages Supported
- Communication Model
- Addressing Space
- Hardware Architecture

System Attributes to Performance
Clock rate and CPI (clock cycles per instruction).
Performance factors: T = Ic x CPI x tau, where T is the CPU time, Ic is the instruction count, and tau is the clock cycle time.
System attributes that determine these factors:
- Instruction-set architecture
- Compiler technology
- CPU implementation and control
- Cache and memory hierarchy

MIPS & Throughput
f = 1/tau (clock rate)
C = total number of cycles = Ic x CPI
MIPS rate: MIPS = Ic / (T x 10^6) = f / (CPI x 10^6) = (f x Ic) / (C x 10^6)
Throughput rate: Wp = f / (Ic x CPI)
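The formulas above can be checked with a small worked example. The numbers below (instruction count, CPI, clock rate) are hypothetical, chosen only to make the arithmetic easy to follow:

```python
# Worked example of T = Ic x CPI x tau and MIPS = f / (CPI x 10^6).
# All values are illustrative, not from the slides.
Ic = 2_000_000      # instruction count
CPI = 2.5           # average clock cycles per instruction
f = 500e6           # clock rate in Hz, so tau = 1/f

C = Ic * CPI        # total number of cycles = 5,000,000
T = C / f           # CPU time: equivalent to Ic * CPI * tau = 0.01 s
mips = f / (CPI * 1e6)          # 500e6 / 2.5e6 = 200 MIPS
mips_alt = Ic / (T * 1e6)       # same result via the first form
```

Both forms of the MIPS formula agree, which is a useful sanity check when plugging in real measurements.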

Memory System Design

Contents (Memory)
- Memory Hierarchy
- Cache Memory
- Placement Policies
  - Direct Mapping
  - Fully Associative
  - Set Associative
- Replacement Policies

Memory Hierarchy
CPU registers → cache → main memory → secondary storage
The levels trade off latency, bandwidth, speed, and cost per bit.

Sequence of events
1. The processor makes a request for X.
2. X is sought in the cache.
3. If it exists: hit (hit ratio h). Otherwise: miss (miss ratio m = 1 - h).
4. On a miss, X is sought in main memory.
This scheme generalizes to more levels.
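The sequence of events above can be sketched as a small simulation. This is a minimal sketch, not a real cache model: the cache and main memory are plain dictionaries, and the function name and access trace are illustrative:

```python
# Sketch of the hit/miss sequence: look in the cache first; on a miss,
# fetch from "main memory" into the cache. Names are illustrative.
def access(x, cache, memory, stats):
    if x in cache:                 # hit
        stats["hits"] += 1
        return cache[x]
    stats["misses"] += 1           # miss: go to main memory
    cache[x] = memory[x]           # fill the cache for next time
    return cache[x]

memory = {i: i * 10 for i in range(8)}
cache, stats = {}, {"hits": 0, "misses": 0}
for x in [1, 2, 1, 3, 2]:          # repeated references produce hits
    access(x, cache, memory, stats)

h = stats["hits"] / (stats["hits"] + stats["misses"])   # hit ratio h = 0.4
```

The second references to 1 and 2 hit because the first references filled the cache, which is exactly the temporal locality the next slide describes.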

Cache Memory
The idea is to keep the information expected to be used more frequently in the cache.
Locality of reference:
- Temporal locality
- Spatial locality
Governed by placement policies and replacement policies.

Placement Policies
How to map memory blocks (lines) to cache block frames (line frames).
Memory is divided into blocks (lines); the cache is divided into block frames (line frames).

Placement Policies
- Direct Mapping
- Fully Associative
- Set Associative

Direct Mapping
The simplest policy: a memory block is mapped to a fixed cache block frame (many-to-one mapping):
J = I mod N
where J = cache block frame number, I = memory block number, N = number of cache block frames.
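The mapping rule J = I mod N is easy to see in action. The block numbers below are illustrative; N = 128 matches the example on the next slides:

```python
# Direct mapping: memory block I maps to cache frame J = I mod N.
N = 128                               # number of cache block frames
blocks = [0, 1, 127, 128, 300]        # illustrative memory block numbers
frames = [I % N for I in blocks]      # J = I mod N for each block
# frames == [0, 1, 127, 0, 44]
```

Blocks 0 and 128 both land in frame 0, which shows the many-to-one nature of the mapping: whenever both are in use, they evict each other.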

Address Format (Direct Mapping)
Memory: M blocks. Block size: B words. Cache: N block frames.
Address size = log2(M x B) bits, split as:
Tag (log2(M/N) bits) | Block frame (log2 N bits) | Word (log2 B bits)

Example
Memory: 4K blocks. Block size: 16 words. Cache: 128 block frames.
Address size = log2(4K x 16) = 16 bits, split as:
Tag (5 bits) | Block frame (7 bits) | Word (4 bits)
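The 16-bit address split in this example can be computed with shifts and masks. This is a minimal sketch; the function name and the sample address are made up for illustration:

```python
# Split a 16-bit address for the direct-mapped example:
# Word = log2(16) = 4 bits, Frame = log2(128) = 7 bits, Tag = 16 - 11 = 5 bits.
def split_direct(addr, word_bits=4, frame_bits=7):
    word = addr & ((1 << word_bits) - 1)                  # low 4 bits
    frame = (addr >> word_bits) & ((1 << frame_bits) - 1) # next 7 bits
    tag = addr >> (word_bits + frame_bits)                # remaining 5 bits
    return tag, frame, word

# Sample address: tag 10110, frame 0000011, word 1010 (binary)
tag, frame, word = split_direct(0b10110_0000011_1010)
# tag == 22, frame == 3, word == 10
```

On a lookup, the frame field selects the cache entry directly and the stored tag is compared against the address tag to detect a hit.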

Example (cont.)
[Figure: the 4K memory blocks mapped onto the 128-frame cache; each cache entry stores a 5-bit tag]

Fully Associative
The most flexible policy: a memory block can be mapped to any available cache block frame (many-to-many mapping).
Requires an associative search over all frames.

Address Format (Fully Associative)
Memory: M blocks. Block size: B words. Cache: N block frames.
Address size = log2(M x B) bits, split as:
Tag (log2 M bits) | Word (log2 B bits)

Example
Memory: 4K blocks. Block size: 16 words. Cache: 128 block frames.
Address size = log2(4K x 16) = 16 bits, split as:
Tag (12 bits) | Word (4 bits)

Example (cont.)
[Figure: fully associative mapping of memory blocks to cache frames; each cache entry stores a 12-bit tag]

Set Associative
A compromise between the other two policies.
The cache is divided into a number of sets; each set holds a fixed number of blocks.
A memory block is mapped to any available cache block frame within a specific set.
Associative search is needed only within a set.

Address Format (Set Associative)
Memory: M blocks. Block size: B words. Cache: N block frames.
Number of sets S = N / (number of blocks per set).
Address size = log2(M x B) bits, split as:
Tag (log2(M/S) bits) | Set (log2 S bits) | Word (log2 B bits)

Example
Memory: 4K blocks. Block size: 16 words. Cache: 128 block frames.
Blocks per set = 4, so number of sets = 32.
Address size = log2(4K x 16) = 16 bits, split as:
Tag (7 bits) | Set (5 bits) | Word (4 bits)
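The same shift-and-mask idea gives the set-associative split. Again a minimal sketch with an illustrative function name and sample address:

```python
# Split a 16-bit address for the 4-way set-associative example:
# Word = log2(16) = 4 bits, Set = log2(32) = 5 bits, Tag = 16 - 9 = 7 bits.
def split_set_assoc(addr, word_bits=4, set_bits=5):
    word = addr & ((1 << word_bits) - 1)                 # low 4 bits
    s = (addr >> word_bits) & ((1 << set_bits) - 1)      # next 5 bits
    tag = addr >> (word_bits + set_bits)                 # remaining 7 bits
    return tag, s, word

# Sample address: tag 0000101, set 00010, word 0110 (binary)
tag, s, word = split_set_assoc(0b0000101_00010_0110)
# tag == 5, set == 2, word == 6
```

The set field selects one of the 32 sets directly; only the 4 tags within that set need to be compared associatively.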

Example (cont.)
[Figure: set-associative mapping of memory blocks into 32 sets (Set 0 .. Set 31); each cache entry stores a 7-bit tag]

Comparison
Criteria for comparing the three policies: simplicity, associative search cost, cache utilization, and replacement complexity.

Replacement Techniques
- FIFO (first-in, first-out)
- LRU (least recently used)
- MRU (most recently used)
- Random
- Optimal