(Superficial!) Review of Uniprocessor Architecture Parallel Architectures and Related concepts CS 433 Laxmikant Kale University of Illinois at Urbana-Champaign.

Presentation transcript:

(Superficial!) Review of Uniprocessor Architecture Parallel Architectures and Related concepts CS 433 Laxmikant Kale University of Illinois at Urbana-Champaign Department of Computer Science

Early machines
We will present a series of idealized and simplified models
  –Read more about the real models in architecture textbooks (official prereq: cs232, cs333)
  –The idea here is to review the concepts and define our vocabulary
[Figure: a processor connected to a memory holding Location 0, Location 1, ..., Location k]

Early machines
Early machines: complex instruction sets and (let's say) no registers
  –Processor can access any memory location equally fast
  –Instructions:
     Operations: Add L1, L2, L3 (add the contents of Location L1 to those of Location L2, and store the result in L3)
     Branching: Branch to L4 (note that some locations store program instructions)
     Conditional branching: If (L1>L2) goto L3
[Figure: a processor connected to a memory holding Location 0, Location 1, ..., Location k]
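
The three instruction kinds on this slide can be sketched as a tiny interpreter. This is only an illustration under stated assumptions: the tuple encoding, the names `run`, `add`, `branch`, and `branch_gt`, and the `max_steps` guard are all hypothetical, and the program is kept in a separate list for simplicity even though the slide notes that instructions live in memory locations too.

```python
# Hypothetical memory-to-memory machine: every operand is a memory location.
def run(program, memory, max_steps=1000):
    """Execute a list of instruction tuples against a list-of-ints memory."""
    pc = 0      # program counter: index of the next instruction
    steps = 0   # guard against accidental infinite loops
    while pc < len(program) and steps < max_steps:
        op, *args = program[pc]
        steps += 1
        if op == "add":          # Add L1, L2, L3: mem[L3] = mem[L1] + mem[L2]
            l1, l2, l3 = args
            memory[l3] = memory[l1] + memory[l2]
            pc += 1
        elif op == "branch":     # Branch to L4
            pc = args[0]
        elif op == "branch_gt":  # If (L1 > L2) goto L3
            l1, l2, target = args
            pc = target if memory[l1] > memory[l2] else pc + 1
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


# Memory layout (assumed): [sum, constant 1, limit, i]
program = [
    ("add", 3, 1, 3),        # i = i + 1
    ("add", 0, 3, 0),        # sum = sum + i
    ("branch_gt", 2, 3, 0),  # if limit > i, go back to instruction 0
]
print(run(program, [0, 1, 4, 0]))
```

The example program adds 1 + 2 + 3 + 4 into location 0 using nothing but memory operands.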

Registers
Processors are faster than memory
  –They can deal with data within the processor much faster
So, create some locations in the processor for storing data
  –Called registers; often with a special register called the accumulator
Now we need new instructions for dealing with data in registers:
  –Data movement instructions: move from register to memory, memory to register, register to register, and memory to memory
  –Computation instructions: in addition to the previous ones, we now add instructions that allow one or more operands to be registers
[Figure: a processor (CPU with registers) connected to memory]

Load-Store architectures (RISC)
Do not allow memory locations to be operands
  –For computations as well as control instructions
The only instructions that reference memory are:
  –Load R, L # move contents of memory location L into register R
  –Store R, L # move contents of register R into memory location L
Notice that the number of instructions is now dramatically reduced
  –Further, allow only relatively simple instructions to do register-to-register operations
  –More complex operations are implemented in software
  –The compiler has a bigger responsibility now
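
To illustrate the compiler's "bigger responsibility", here is a minimal sketch of how a memory-to-memory `Add L1, L2, L3` might be lowered to load-store form. The register names `R1` to `R3`, the tuple encoding, and the helpers `lower_add` and `execute` are assumptions made for illustration, not part of the slides.

```python
def lower_add(l1, l2, l3):
    """Translate memory-to-memory 'Add l1, l2, l3' into load-store form."""
    return [
        ("load", "R1", l1),         # Load R1, l1
        ("load", "R2", l2),         # Load R2, l2
        ("add", "R3", "R1", "R2"),  # register-to-register add
        ("store", "R3", l3),        # Store R3, l3
    ]


def execute(instructions, memory):
    """Run load-store instructions; only load and store touch memory."""
    registers = {}
    for op, *args in instructions:
        if op == "load":
            reg, loc = args
            registers[reg] = memory[loc]
        elif op == "store":
            reg, loc = args
            memory[loc] = registers[reg]
        elif op == "add":
            dst, a, b = args
            registers[dst] = registers[a] + registers[b]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


memory = [2, 3, 0]
print(execute(lower_add(0, 1, 2), memory))
```

One complex instruction becomes four simple ones, three of which reference memory.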

Caches
The processor still has to wait for data from memory
  –I.e., Load and Store instructions are slower
  –Although more often the CPU is executing register-only instructions
  –Load and store latency
     Dictionary meaning: latency is the delay between stimulus and response
     OR: the delay between a data-transfer instruction and the beginning of the data transfer
But faster SRAM memory is available (although expensive)
Idea: just like registers, put some more of the data in faster memory
  –Which data?
  –Principle of locality (empirical observation): data accessed correlates with past accesses, spatially and temporally
     Without this, caches would be worthless (unless most data fits in cache)

Caches
The processor still issues load and store instructions as before, but the cache controller intercepts the requests and, if the location has been cached, serves them from the cache
Data transfer between cache and memory is not seen by the processor
[Figure: processor, cache with cache controller, and memory]

Cache Issues
Level 2 cache
Cache lines
  –Bring a bunch of data “at once”: exploit spatial locality; block transfers are faster
  – byte cache lines typical
  –Trade-off: or why larger and larger cache lines aren’t good either
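
The line-size trade-off can be explored with a small miss-counting sketch. Everything here is an assumption for illustration: the cache is modelled as fully associative with LRU replacement, the total capacity is fixed at 256 bytes, and the line sizes and access patterns are made up. It shows one commonly cited cost of larger lines: for a fixed capacity, larger lines mean fewer blocks.

```python
from collections import OrderedDict


def count_misses(addresses, line_size, num_blocks):
    """Count misses in a fully associative LRU cache holding `num_blocks` lines."""
    cache = OrderedDict()  # line number -> None, least recently used first
    misses = 0
    for addr in addresses:
        line = addr // line_size       # which line this byte address belongs to
        if line in cache:
            cache.move_to_end(line)    # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > num_blocks:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


CAPACITY = 256  # bytes (assumed)
sequential = list(range(1024))                 # good spatial locality
scattered = [i * 64 for i in range(16)] * 10   # 16 far-apart bytes, reused

for line_size in (16, 64):
    blocks = CAPACITY // line_size
    print(line_size,
          count_misses(sequential, line_size, blocks),
          count_misses(scattered, line_size, blocks))
```

In this sketch the sequential scan benefits from the larger line, while the scattered pattern fits in sixteen small blocks but thrashes in four large ones.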

Cache blocks and Cache Lines
A cache block is a physical part of the cache.
A cache line is a section of the address space.
A line is brought into a cache block.
Of course, line size and block size are the same.
[Figure: processor, cache (with cache controller and an L1 block), and memory]

Cache Management
How is the cache managed?
  –Its job: given an address, find whether it is in the cache, and return the contents if so. Also, write data back to memory when needed and bring data in from memory when needed
  –Ideally, a fully associative cache would be good
     Keep cache lines anywhere in the physical cache
     But looking up is hard

Cache management
Alternative scheme:
  –Each cache line (i.e., address) has exactly one place in the cache memory where it can be stored
  –Of course, more than one cache line will have the same area of cache memory as its possible target
     Why?
  –Only one cache line can live inside a cache block at a time
  –If you want to bring in a new one, the old one must be “emptied”
A tradeoff: set-associative caches
  –Have each line map to more than one (say 4) physical locations
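
The placement schemes above can be compared with one parameterized sketch. The assumptions are loud ones: lines are mapped to sets by `line % num_sets`, replacement within a set is LRU, and the function name `simulate` and the access pattern are hypothetical. With `ways=1` each line has exactly one possible block; with `ways=num_blocks` the cache is fully associative.

```python
def simulate(line_numbers, num_blocks, ways):
    """Count misses for a cache of `num_blocks` blocks split into sets of `ways`."""
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]  # each list: resident lines, LRU first
    misses = 0
    for line in line_numbers:
        resident = sets[line % num_sets]  # the only set this line may occupy
        if line in resident:
            resident.remove(line)         # hit: re-appended below as most recent
        else:
            misses += 1
            if len(resident) == ways:     # set is full: "empty" the oldest line
                resident.pop(0)
        resident.append(line)
    return misses


# Two lines that collide in a direct-mapped cache of 8 blocks (0 % 8 == 8 % 8).
pattern = [0, 8] * 4
print(simulate(pattern, num_blocks=8, ways=1))  # direct-mapped
print(simulate(pattern, num_blocks=8, ways=4))  # 4-way set-associative
print(simulate(pattern, num_blocks=8, ways=8))  # fully associative
```

In this pattern the direct-mapped cache misses on every access because the two lines keep evicting each other, while the set-associative and fully associative variants miss only on the first touch of each line.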

Parallel Machines: an abstract introduction
Our main focus will be on three kinds of machines
  –Bus-based shared memory machines
  –Scalable shared memory machines
     Cache coherent
     Hardware support for remote memory access
  –Distributed memory machines

Bus based machines
[Figure: processing elements PE0, PE1, ..., PE N-1 and memories Mem0, Mem1, ..., Memk connected by a bus]

Bus based machines
Any processor can access any memory location
  –Read and write
Bus bandwidth is a limiting factor
Also, how do you deal with 2 processors changing the same data?
  –Locks (more on this later)
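
As a rough illustration of the lock idea, the sketch below lets several workers update the same shared counter. Python threads stand in for processors here, which is an assumption of the sketch rather than anything the slides specify; the names `increment_many` and `run_workers` are hypothetical.

```python
import threading


def increment_many(counter, lock, times):
    """Add 1 to the shared counter `times` times, one locked update at a time."""
    for _ in range(times):
        with lock:                 # only one worker may change the data at once
            counter["value"] += 1  # read-modify-write on shared data


def run_workers(num_workers=4, times=10000):
    counter = {"value": 0}         # the shared memory location
    lock = threading.Lock()
    workers = [
        threading.Thread(target=increment_many, args=(counter, lock, times))
        for _ in range(num_workers)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter["value"]


print(run_workers())
```

Because every update happens while holding the lock, no increment is lost and the final count equals `num_workers * times`.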

Scalable shared memory m/cs
[Figure: PE0, Mem0, and an interconnection network with support for remote memory access]
Not popular, as all data is slow to access

Distributed memory m/cs
[Figure: nodes PE0/Mem0, PE1/Mem1, ..., PEp/Memp connected by an interconnection network]

Introducing caches into the picture!
Now we have more complex problems:
  –Can’t be fixed by locks alone
  –Copies of the same variable in two different caches may contain different values
The cache controller must do more
[Figure: PE0, PE1, ..., PE p-1 with caches, and Mem0, Mem1, ..., Mem p-1]
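
The stale-copy problem, and one thing a cache controller could do about it, can be sketched as follows. This is a simplified illustration and not the protocol the slides have in mind: the `Cache` class, the write-through policy, and the invalidate-on-write step are all assumptions chosen to keep the example short.

```python
class Cache:
    """A private cache attached to a shared bus (hypothetical, write-through)."""

    def __init__(self, memory, bus):
        self.memory = memory  # shared main memory: dict of address -> value
        self.lines = {}       # locally cached copies: address -> value
        self.bus = bus        # list of every cache attached to the bus
        bus.append(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]     # hit: may be stale without coherence

    def write(self, addr, value, invalidate_others):
        if invalidate_others:       # the extra work the controller must do
            for other in self.bus:
                if other is not self:
                    other.lines.pop(addr, None)
        self.lines[addr] = value
        self.memory[addr] = value   # write-through to main memory


def value_seen_by_pe1(invalidate_others):
    memory, bus = {"x": 0}, []
    cache0, cache1 = Cache(memory, bus), Cache(memory, bus)
    cache1.read("x")                         # PE1 caches x = 0
    cache0.write("x", 1, invalidate_others)  # PE0 updates x
    return cache1.read("x")                  # what does PE1 see now?


print(value_seen_by_pe1(False), value_seen_by_pe1(True))
```

Without invalidation PE1 keeps reading its old copy even though memory has been updated; with invalidation its next read misses and fetches the new value.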

Distributed memory m/cs
[Figure: nodes PE0/Mem0/cache, PE1/Mem1/cache, ..., PE p-1/Mem p-1/cache connected by an interconnection network]

Writing parallel programs
Programming model
  –How should a programmer view the parallel machine?
  –Sequential programming: von Neumann model
Parallel programming models:
  –Shared memory (shared address space) model
  –Message passing model
  –Shared Objects model
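
To contrast the first two models, here is a minimal sketch that computes the same sum both ways. Python threads and `queue.Queue` are stand-ins chosen for illustration (an assumption, not the course's tooling), and the function names are hypothetical. The shared objects model is not sketched here.

```python
import queue
import threading


def sum_shared_memory(data, num_workers=2):
    """Workers add partial sums into one shared variable, guarded by a lock."""
    total = {"value": 0}
    lock = threading.Lock()

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            total["value"] += partial  # all workers see the same address space

    threads = [threading.Thread(target=worker, args=(data[i::num_workers],))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def sum_message_passing(data, num_workers=2):
    """Workers share nothing; each sends its partial sum as a message."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))        # send the result instead of sharing it

    threads = [threading.Thread(target=worker, args=(data[i::num_workers],))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    total = sum(mailbox.get() for _ in range(num_workers))  # receive messages
    for t in threads:
        t.join()
    return total


numbers = list(range(100))
print(sum_shared_memory(numbers), sum_message_passing(numbers))
```

Both functions return the same result; the difference is whether workers communicate through a common variable or through explicit messages.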