1 Basic Components of a Parallel (or Serial) Computer
[Figure: a grid of nodes, each pairing a CPU with its own memory (MEM), connected by an interconnect]

2 Processor Related Terms
RISC: Reduced Instruction Set Computer
PIPELINE: technique in which multiple instructions are overlapped in execution
SUPERSCALAR: multiple instructions issued per clock period

3 Network Interconnect Related Terms
LATENCY: how long it takes to start sending a "message"; generally measured in microseconds or milliseconds
BANDWIDTH: the data rate that can be sustained once the message is started; measured in bytes/sec, Mbytes/sec, Gbytes/sec, etc.
TOPOLOGY: the actual 'shape' of the interconnect. Are the nodes connected by a 2D mesh? A ring? Something more elaborate?
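A common first-order way to combine these two terms is to model message time as startup latency plus transfer time. A minimal sketch in C (the function name and the parameter values are illustrative, not from the slides):

```c
#include <stdio.h>

/* First-order message-time model:
 * time = latency + bytes / bandwidth.
 * latency in seconds, bandwidth in bytes/sec. */
double message_time(double latency_s, double bandwidth_Bps, double bytes)
{
    return latency_s + bytes / bandwidth_Bps;
}

int main(void)
{
    /* Hypothetical network: 2 us latency, 1 GB/s bandwidth. */
    double t_small = message_time(2e-6, 1e9, 8);    /* one double   */
    double t_large = message_time(2e-6, 1e9, 8e6);  /* 1M doubles   */
    printf("small: %g s, large: %g s\n", t_small, t_large);
    return 0;
}
```

For small messages the latency term dominates; for large messages the bandwidth term does, which is why many parallel codes aggregate small messages into fewer large ones.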

4 Memory/Cache Related Terms
[Figure: CPU connected to main memory through a cache]
CACHE: the level of the memory hierarchy between the CPU and main memory. The cache is much smaller than main memory, so blocks of data from main memory must be mapped into the cache.

5 Memory/Cache Related Terms
ICACHE: instruction cache
DCACHE (L1): data cache closest to the registers
SCACHE (L2): secondary data cache
– Data from the SCACHE must pass through the DCACHE to reach the registers
– The SCACHE is larger than the DCACHE
– Not all processors have an SCACHE
TLB: translation-lookaside buffer; holds the addresses of pages (blocks of memory) in main memory that have been accessed recently

6 Memory/Cache Related Terms (cont.)
[Figure: the memory hierarchy, from the CPU through memory levels such as the L1 cache and L2 cache down to DRAM]

7 Memory/Cache Related Terms (cont.)
The data cache was designed with two key concepts in mind:
– Spatial locality: when an element is referenced, its neighbors are likely to be referenced too. Cache lines are fetched as a unit, so work on consecutive data elements in the same cache line.
– Temporal locality: when an element is referenced, it is likely to be referenced again soon. Arrange code so that data in the cache is reused as often as possible.
A loop-ordering example is sketched below.
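A minimal C sketch of how loop order affects spatial locality (the array size and function names are illustrative). C stores 2-D arrays row by row, so traversing along a row walks through consecutive elements of the same cache line, while traversing down a column strides across lines:

```c
#define N 1024
double a[N][N];

/* Good spatial locality: the inner loop visits consecutive
 * elements, so each fetched cache line is fully used. */
double sum_row_major(void)
{
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Poor spatial locality: the inner loop strides by N doubles,
 * touching a different cache line on nearly every access. */
double sum_col_major(void)
{
    double sum = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}
```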

8 Memory/Cache Related Terms (cont.)
Direct mapped cache: a block from main memory can go in exactly one place in the cache. It is called direct mapped because any block address in memory maps directly to a single location in the cache.
[Figure: each main-memory block mapping to one fixed cache line]
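As a sketch of the address arithmetic (the line size and line count below are assumptions for illustration, not from the slides), the single line an address maps to in a direct-mapped cache is the block number modulo the number of lines:

```c
#include <stdint.h>

#define LINE_SIZE 64u    /* bytes per cache line (assumed) */
#define NUM_LINES 1024u  /* lines in the cache (assumed)   */

/* Direct mapped: block number modulo the number of lines
 * gives the one line the address can occupy. */
unsigned direct_mapped_line(uintptr_t addr)
{
    uintptr_t block = addr / LINE_SIZE;
    return (unsigned)(block % NUM_LINES);
}
```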

9 Memory/Cache Related Terms (cont.)
Fully associative cache: a block from main memory can be placed in any location in the cache. It is called fully associative because a block in main memory may be associated with any entry in the cache.
[Figure: a main-memory block mapping to every line of the cache]

10 Memory/Cache Related Terms (cont.)
Set associative cache: the middle range of designs between direct mapped and fully associative. In an n-way set-associative cache, a block from main memory can go into any of n (n at least 2) locations, which together form one set.
[Figure: a 2-way set-associative cache, with each main-memory block mapping to one set of two lines]
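The address arithmetic from the direct-mapped sketch generalizes: the block number now selects a set rather than a single line, and the block may occupy any of the n ways within that set (sizes again assumed for illustration):

```c
#include <stdint.h>

#define LINE_SIZE 64u   /* bytes per cache line (assumed) */
#define NUM_SETS  256u  /* sets in the cache (assumed)    */
#define WAYS      4u    /* a 4-way set-associative cache  */

/* n-way set associative: the block number selects a set;
 * the block may reside in any of the WAYS lines of that set. */
unsigned set_index(uintptr_t addr)
{
    uintptr_t block = addr / LINE_SIZE;
    return (unsigned)(block % NUM_SETS);
}
```

With NUM_SETS = 1 this degenerates to a fully associative cache; with WAYS = 1 it degenerates to a direct-mapped one.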

11 Memory/Cache Related Terms (cont.)
Least Recently Used (LRU): a cache replacement strategy for set associative caches. The cache block that was used least recently is replaced with the new block.
Random replacement: a cache replacement strategy for set associative caches. A randomly chosen cache block is replaced.
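A minimal sketch of LRU victim selection within one set, using per-line timestamps (the structure and counter scheme are illustrative; real hardware typically uses cheaper approximations such as pseudo-LRU):

```c
#include <stdint.h>
#include <stdbool.h>

#define WAYS 4u

struct line {
    bool      valid;
    uintptr_t tag;
    uint64_t  last_used;  /* timestamp of the most recent access */
};

/* Pick the victim way in a set: an invalid line if one exists,
 * otherwise the line with the oldest (smallest) timestamp. */
unsigned lru_victim(const struct line set[WAYS])
{
    unsigned victim = 0;
    for (unsigned w = 0; w < WAYS; w++) {
        if (!set[w].valid)
            return w;                      /* free slot: use it        */
        if (set[w].last_used < set[victim].last_used)
            victim = w;                    /* older than current pick  */
    }
    return victim;
}
```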

12 Types of Parallel Computers
Until recently, Flynn's taxonomy was commonly used to classify parallel computers into one of four basic types:
– Single instruction, single data (SISD): a single scalar processor
– Single instruction, multiple data (SIMD): e.g., the Thinking Machines CM-2
– Multiple instruction, single data (MISD): various special-purpose machines
– Multiple instruction, multiple data (MIMD): nearly all parallel machines

13 However, since the MIMD model "won," a much more useful way to classify modern parallel computers is by their memory model:
– shared memory
– distributed memory
– (more recently) a hybrid of the two (also called multi-tiered, or CLUMPS)

14 Shared and Distributed Memory
[Figure: processors (P) connected to a shared memory over a bus / crossbar]
Shared memory: a single address space; all processors have access to a pool of shared memory (example: Sun E10000). Methods of memory access:
– bus
– crossbar
Distributed memory: each processor has its own local memory, and processors must use message passing to exchange data (examples: IBM SP, Cray T3E).
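On distributed-memory machines, the message passing is typically done with a library such as MPI. A minimal sketch, assuming an MPI installation and a run with at least two ranks (the tag and buffer contents are arbitrary):

```c
#include <stdio.h>
#include <mpi.h>

/* Minimal point-to-point exchange: rank 0 sends one double
 * to rank 1, which receives and prints it. */
int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        double x = 3.14;
        MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        double x;
        MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %g\n", x);
    }

    MPI_Finalize();
    return 0;
}
```

On a shared-memory machine the same exchange would be a plain read and write of one address space; here the data must cross the interconnect explicitly, which is where the latency and bandwidth terms from slide 3 enter.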