Parallel Shared Memory

Parallel Shared Memory
Multiple processors work on a series of related jobs, numbered 0, 1, 2, ..., n.
How to coordinate the assignment of work? How to balance the load?

Static assignment
Solution 1: divide the jobs statically among k processors.
  proc 0 does jobs 0, k, 2k, 3k, ...
  proc 1 does jobs 1, k+1, 2k+1, 3k+1, ...
  proc 2 does jobs 2, k+2, 2k+2, 3k+2, ...
  and so on.
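The static scheme can be sketched in C (the function and parameter names are mine, not from the slides):

```c
/* Static assignment: processor p (0 <= p < k) handles jobs
   p, p+k, p+2k, ... among n total jobs. Writes the job numbers
   into out[] and returns how many jobs this processor got. */
int static_jobs(int p, int k, int n, int *out) {
    int count = 0;
    for (int job = p; job < n; job += k)
        out[count++] = job;
    return count;
}
```

With k = 3 processors and n = 10 jobs, for example, processor 1 gets jobs 1, 4, and 7.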

Dynamic assignment
Static assignment can be bad if the jobs have varied execution times: a few processors may carry most of the computational load.
Dynamic assignment:
  Each processor starts with a job whose number matches its own processor number.
  A job counter in a memory location is set to the next job to be done.
  When a processor finishes a job, it gets the job count, increments it, and saves it back to memory, then executes that job (unless the count has reached the end of the work).
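In C11, the get/increment/save step maps onto an atomic fetch-add; a minimal sketch (claim_job and n_jobs are my names, not the slides'):

```c
#include <stdatomic.h>

/* Shared job counter: the next job number to be handed out. */
static atomic_int next_job;

/* Atomically claim the next job; returns its number,
   or -1 once the count has reached the end of the work. */
int claim_job(int n_jobs) {
    int job = atomic_fetch_add(&next_job, 1);  /* get count and increment in one atomic step */
    return (job < n_jobs) ? job : -1;
}
```

The atomicity of the fetch-add is exactly what the racy load/increment/store sequence on the next slide lacks.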

Race condition
The same code runs on different processors. Assume X20 holds the address of the job counter and X19 holds this processor's job number.

  1a, 2a    jobstart: ldur X19, [X20, #0]
  1b, 2b              addi X9, X19, #1
  1c, 2c              stur X9, [X20, #0]
  1d, 2d              <start work>
  ...
  1z, 2z              b jobstart

Consider the execution sequence: 1a, 1b, 2a, 1c, 1d, 2b, 2c, 2d, ..., 1z, ...
Both processors load the same value into X19 and therefore do the same job.
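The bad interleaving can be replayed deterministically by modeling each processor's private registers and executing the three steps by hand (a sketch; the struct and function names are mine):

```c
/* Model each processor's private registers. */
typedef struct { int x19; int x9; } Proc;

static int counter;  /* the shared job counter in memory */

static void do_load(Proc *p)  { p->x19 = counter; }    /* ldur  */
static void do_add(Proc *p)   { p->x9 = p->x19 + 1; }  /* addi  */
static void do_store(Proc *p) { counter = p->x9; }     /* stur  */

/* Replay the interleaving 1a, 1b, 2a, 1c, 2b, 2c and report
   whether both processors ended up claiming the same job. */
int race_demo(void) {
    Proc p1 = {0, 0}, p2 = {0, 0};
    counter = 0;
    do_load(&p1);   /* 1a */
    do_add(&p1);    /* 1b */
    do_load(&p2);   /* 2a: reads the same value p1 just read */
    do_store(&p1);  /* 1c */
    do_add(&p2);    /* 2b */
    do_store(&p2);  /* 2c: overwrites p1's update */
    return p1.x19 == p2.x19;  /* 1 means both would do the same job */
}
```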

Mutual Exclusion
We need a method that allows only one processor at a time to execute a block of code (the load, increment, save sequence).
Several primitive synchronization methods are used in different systems:
  test-and-set
  register exchange
  fetch-and-increment
These must be atomic operations: no intervening instruction can occur.
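C11 exposes test-and-set directly as atomic_flag, and a spinlock built from it can guard the load/increment/save block; a sketch (next_job_locked and the variable names are mine):

```c
#include <stdatomic.h>

static atomic_flag guard = ATOMIC_FLAG_INIT;
static int job_counter;  /* the shared counter from the earlier slides */

/* Only one processor at a time gets past the test-and-set,
   so the load/increment/save sequence runs without interference. */
int next_job_locked(void) {
    while (atomic_flag_test_and_set(&guard))
        ;                           /* flag was already set: spin */
    int job = job_counter;          /* load                */
    job_counter = job + 1;          /* increment and save  */
    atomic_flag_clear(&guard);      /* release the lock    */
    return job;
}
```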

ARMv8 solution: two coordinated instructions

  ldxr rd, [rn, #0]     load exclusive register
  stxr rt, rm, [rn]     store exclusive register

They are always paired: an ldxr followed by an stxr for the same memory location (same rn).
The stxr fails if the memory location has been changed after the ldxr instruction and before the stxr instruction (by another processor or thread).
On stxr success, rm is set to 0; on stxr failure, rm is set to 1.

Race Solution
The same code runs on different processors. Assume X20 holds the address of the job counter.

  1a, 2a    jobstart: ldxr X19, [X20, #0]
  1b, 2b              addi X9, X19, #1
  1c, 2c              stxr X9, X10, [X20]
  1cc, 2cc            cbnz X10, jobstart
  1d, 2d              <start work>
  ...
  1z, 2z              b jobstart

Consider the execution sequence (1d, 1e, 1f, ... are processor 1's work instructions):
  1a, 1b, 2a, 1c, 1cc (the memory value has not been changed, so no branch),
  1d, 2b, 2c, 1e, 2cc (between 2a's ldxr and 2c's stxr, instruction 1c's stxr has changed the memory value, so branch),
  2a, 1f, 1g, 2b, 2c, 2cc (the memory value has not been changed this time, so no branch),
  2d, ..., 1z, ...
The two processors get different job values.

Fetch and Increment
The code sequence used to avoid the previous race condition is an example of a fetch-and-increment operation implemented with ldxr and stxr. It is not a single atomic assembly instruction, but it has the same effect: a processor cannot move past this sequence unless it has executed in a way logically equivalent to being atomic.

  1a, 2a    jobstart: ldxr X19, [X20, #0]
  1b, 2b              addi X9, X19, #1
  1c, 2c              stxr X9, X10, [X20]
  1cc, 2cc            cbnz X10, jobstart
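C has no direct ldxr/stxr, but atomic_compare_exchange_weak (which compilers commonly lower to a load-exclusive/store-exclusive pair on ARM) gives the same retry-loop shape; a sketch:

```c
#include <stdatomic.h>

/* The ldxr / addi / stxr / cbnz loop in C: read the counter, try to
   store old+1, and retry if another processor changed it in between. */
int fetch_and_increment(atomic_int *counter) {
    int old = atomic_load(counter);  /* ldxr role */
    while (!atomic_compare_exchange_weak(counter, &old, old + 1))
        ;  /* store failed; old has been refreshed, so retry (cbnz role) */
    return old;  /* the job number this processor fetched */
}
```

On failure, atomic_compare_exchange_weak reloads the current value into old, so each retry recomputes old + 1 from fresh data, just as branching back to the ldxr does.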

Test-and-Set
How would you use the ARMv8 ldxr and stxr instructions to implement the equivalent of an atomic test-and-set?
Test-and-set atomically checks whether a memory location is set to 1 (rather than 0) and at the same time sets it to 1.
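The required behavior can be expressed in C with the same compare-and-swap retry shape that ldxr/stxr provide (this is the logical contract, not the assembly answer itself):

```c
#include <stdatomic.h>

/* Atomic test-and-set: set *lock to 1 and return what it held before
   (1 if it was already set, 0 if this caller acquired it). */
int test_and_set(atomic_int *lock) {
    int old = atomic_load(lock);  /* the "test" (ldxr role) */
    while (!atomic_compare_exchange_weak(lock, &old, 1))
        ;  /* the "set" failed to commit atomically: retry */
    return old;
}
```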

Register Exchange
Another atomic control instruction is register exchange. For this instruction, the value in a given register is exchanged with the value at the designated memory location.
How would you implement the logical equivalent of an atomic register-exchange instruction using ARMv8 ldxr and stxr?
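As with test-and-set, the contract can be sketched in C with a compare-and-swap retry loop (atomic_swap is my name; C11 also offers atomic_exchange, which does this in one call):

```c
#include <stdatomic.h>

/* Atomic register exchange: put new_val into *loc and return the
   value that was there, as one indivisible step. */
int atomic_swap(atomic_int *loc, int new_val) {
    int old = atomic_load(loc);  /* ldxr role */
    while (!atomic_compare_exchange_weak(loc, &old, new_val))
        ;  /* another write intervened; old is refreshed, retry */
    return old;
}
```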