1 Computer Architecture & Assembly Language Spring 2001 Dr. Richard Spillman Lecture 26 – Alternative Architectures

2 Semester Topics
[Slide diagram: course map with the main units CPU (ALU, Assembly, Microprogramming), Memory (Cache, Virtual), Disk (Structure, Operation), I/O (Network), and Alternatives]

3 Review – Last Lecture
Introduction to Parallel Computing
MIMD Structures

4 Review – Cache Coherency
Informally: any read must return the most recent write. This is too strict and very difficult to implement.
Better:
Any write must eventually be seen by a read.
All writes to a location are seen in the same order ("serialization").
Two rules ensure this:
If P writes x and P1 then reads it, P's write will be seen, provided the read and write are sufficiently far apart in time.
Writes to a single location are serialized: all processors see them in one order, so the latest write is the one observed. Otherwise a processor could see writes in an illogical order (an older value after a newer one).

5 Outline
Cache Coherency
Introduction to Array Architectures
[Cartoon caption: "It's too much, my circuits hurt"]

6 Cache Problem
Consider a simple 3-processor system in which each processor reads shared data from its own cache. [Slide diagram: three processors, each with a private cache, sharing one memory]

7 Consistency & Coherence
Informal coherence model: any read of a data item should return the value that was most recently written. Appealing, but far too simple in the SMP world.
The simple model hides two behavioral aspects:
Coherence defines what value is returned by a read.
Consistency defines when a written value will be returned by a read. This is a problem because a write at processor 1 may have executed but still not have left that processor by the time a read at processor 2 happens. Note that this "when" problem gets more difficult as the physical extent of the multiprocessor system increases.
Both are critical to writing correct shared-memory programs.

8 Coherence
A memory system is coherent if:
Given no other writes, a processor P gets its last written value to X on a read of X. This simply preserves program order and is the normal uniprocessor expectation.
Given no other writes, a read by P1 of X gets the value written by P2 if the read and write are sufficiently separated in time.
Writes to the same location are serialized. This is fundamental to avoiding the concurrent-writer problem.
Implications:
Reordering reads is OK, as in the uniprocessor world.
Writes must finish in program order (write serialization), which is definitely not the same as in the uniprocessor world. This restriction will be relaxed later.
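These properties are easier to see in running code. The sketch below is my own illustration (C11 atomics plus pthreads), not from the lecture: coherence guarantees that the consumer's polling read of flag eventually returns the producer's write, while consistency (the ordering between the two writes) guarantees that data already holds 42 once flag reads as 1.

    /* Minimal sketch: coherence vs. consistency.  Compile with
       cc -std=c11 -pthread.  All names here are illustrative. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    int data = 0;                 /* ordinary shared location X        */
    atomic_int flag = 0;          /* "data is ready" signal            */

    void *producer(void *arg) {
        data = 42;                /* write 1                           */
        atomic_store(&flag, 1);   /* write 2: ordered after write 1    */
        return NULL;
    }

    void *consumer(void *arg) {
        while (atomic_load(&flag) == 0)
            ;                     /* coherence: this read eventually sees 1 */
        printf("data = %d\n", data);  /* prints 42, never a stale 0    */
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }

With plain (non-atomic) variables this code would be exactly the "when" problem of slide 7: the write to data might not yet be visible when flag is.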

9 Enforcing Coherence
Directory based: keep sharing and block status in a directory; the directory may be centralized or distributed.
Snooping: take advantage of the common connection to the bus. Caches monitor transactions on the bus, see writes to shared data, and modify the contents of their copy if they have one.

10 Snooping
Cache controllers monitor the bus for memory operations. A snoop tag keeps track of cache block status; it can take on the values invalid and valid.

11 Snooping Protocols
There are two ways to maintain coherence:
One is to ensure that a processor has exclusive access to a data block before it writes to that block. This is called a write invalidate protocol because all other copies of the block are invalidated on a write.
The other is to update all of the cached copies of a data block whenever a write to that block occurs. This is called write update.
To keep bandwidth requirements down, it is necessary to keep track of which cache blocks are shared and then broadcast invalidates or updates only on writes to shared blocks.
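Here is a toy write-invalidate simulation in C (my own sketch; the write-through simplification is mine, not the lecture's) showing how a write by one processor kills the stale copy in another cache:

    /* Toy write-invalidate protocol for one memory block shared by
       NCACHES caches.  Hypothetical sketch, not the lecture's code. */
    #include <stdio.h>

    #define NCACHES 3

    enum state { INVALID, VALID };

    struct cache { enum state st; int value; };

    struct cache caches[NCACHES];
    int memory = 0;                     /* backing store for the block */

    /* A write gains exclusive access by invalidating all other copies. */
    void write_block(int who, int v) {
        for (int i = 0; i < NCACHES; i++)
            if (i != who) caches[i].st = INVALID;   /* bus invalidate  */
        caches[who].st = VALID;
        caches[who].value = v;
        memory = v;                     /* write-through, for simplicity */
    }

    /* A read misses if the local copy was invalidated. */
    int read_block(int who) {
        if (caches[who].st == INVALID) {            /* miss: refill    */
            caches[who].value = memory;
            caches[who].st = VALID;
        }
        return caches[who].value;
    }

    int main(void) {
        read_block(0); read_block(1);   /* both caches load the block  */
        write_block(0, 7);              /* cache 0 writes: 1's copy dies */
        printf("P1 reads %d\n", read_block(1));     /* prints 7, not 0 */
        return 0;
    }

A write-update variant would replace the invalidation loop with a loop that stores the new value into every valid copy.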

12 Write Invalidate

13 Example Write Update

14 Array Architecture
Used for solving problems which involve vector operations:
air traffic control
signal processing
pattern recognition
matrix operations
[Slide diagram: a control processor with control memory driving PE 1 .. PE n, each with its own memory, joined by an interconnection network and a data & I/O bus]
The issue is: how is this organized?

15 Permutations
A complete interconnection network is the fastest but the most expensive. If we are willing to trade off time, we can reduce cost.
GOAL: all that is necessary is to ensure that data at processor I can get to processor J; this can be represented by a permutation.
Now, there are N! permutations of N processors, while there are N^2 switches in a complete network. With N^2 switches there are 2^M states (M = N^2), which is much greater than N!. So a complete interconnection network gives us more connections than we need.
For N = 10: N! = 3,628,800 while 2^M = 2^100 = 1.27 x 10^30.
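These counts can be checked with a few lines of C (my own, comparing logarithms since 2^100 overflows any native integer type):

    /* Verify the counting argument: a full N x N crossbar has 2^(N*N)
       switch settings, far more than the N! permutations needed. */
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        int n = 10;
        double log10_fact   = lgamma(n + 1.0) / log(10.0); /* log10(n!)      */
        double log10_states = n * n * log10(2.0);          /* log10(2^(n^2)) */
        printf("n!      = 10^%.2f  (3,628,800 for n = 10)\n", log10_fact);
        printf("2^(n^2) = 10^%.2f  (about 1.27e30 for n = 10)\n", log10_states);
        return 0;
    }

Running it prints exponents of roughly 6.56 and 30.10, matching the slide's figures.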

16 Permutation Networks
Start with a basic 2x2 crossbar switch, which can pass its two inputs straight through or exchange them.
A permutation network for n processors then requires about 2 log2(n) - 1 stages of these 2x2 crossbar switches, and each stage has n/2 switches.

17 Example For 6 processors:

18 Cyclic Network
This is the simplest network connection pattern: each processor is directly connected to its neighbors.
It works well for algorithms which involve vector operations of the form: x(i) = k1*x(i-1) + k2*x(i) + k3*x(i+1)
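For example, one sweep of that three-point stencil on a ring of N PEs needs only neighbor-to-neighbor transfers, which is exactly what a cyclic network provides. A minimal C sketch (the coefficients and data are illustrative):

    /* One sweep of x(i) = k1*x(i-1) + k2*x(i) + k3*x(i+1) on a ring. */
    #include <stdio.h>

    #define N 8

    int main(void) {
        double x[N] = {1, 2, 3, 4, 5, 6, 7, 8}, y[N];
        double k1 = 0.25, k2 = 0.5, k3 = 0.25;   /* example coefficients */
        for (int i = 0; i < N; i++) {
            int left  = (i + N - 1) % N;          /* wraparound links     */
            int right = (i + 1) % N;
            y[i] = k1 * x[left] + k2 * x[i] + k3 * x[right];
        }
        for (int i = 0; i < N; i++) printf("%.2f ", y[i]);
        printf("\n");
        return 0;
    }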

19 Cyclic Perfect Shuffle
The cyclic perfect shuffle adds some additional direct communication links. It is effective for matrix operations, FFT, sorting, ...
Note: for PE 0 to communicate with PE 4 in a cyclic architecture requires the transfers: [sequence shown on slide]. In this architecture it requires: [shorter sequence shown on slide].
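In the usual textbook formulation (the lecture's figure is not reproduced here, so this is an assumption), the perfect-shuffle link is a one-bit left rotation of the PE's n-bit binary index. A small C sketch:

    /* Perfect-shuffle mapping as a left rotation of the n-bit index,
       e.g. for n = 3: PE 1 = 001 -> 010 = PE 2, PE 2 -> PE 4. */
    #include <stdio.h>

    unsigned shuffle(unsigned i, unsigned n) {
        unsigned msb = (i >> (n - 1)) & 1;          /* bit that wraps      */
        return ((i << 1) | msb) & ((1u << n) - 1);  /* rotate left by one  */
    }

    int main(void) {
        for (unsigned i = 0; i < 8; i++)
            printf("shuffle(%u) = %u\n", i, shuffle(i, 3));
        return 0;
    }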

20 Barrel Shifter Architecture
A PE in a barrel shifter is connected to the other PEs whose distance is 1, 2, 4, 8, ..., 2^(n-1) from the source PE (the number of PEs is 2^n).
EXAMPLE (n = 3)
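A quick C sketch (my own) that lists the barrel-shifter neighbors of a PE under this rule:

    /* Neighbours of a PE in a barrel shifter with N = 2^n PEs:
       every PE at distance +/- 2^k for k = 0 .. n-1. */
    #include <stdio.h>

    int main(void) {
        int n = 3, N = 1 << n, pe = 0;
        for (int k = 0; k < n; k++) {
            int d = 1 << k;
            printf("PE %d <-> PE %d and PE %d\n",
                   pe, (pe + d) % N, (pe - d + N) % N);
        }
        return 0;
    }

For pe = 0 and n = 3 this prints links to PEs 1, 7, 2, 6, and 4, matching the example.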

21 Barrel Shifter Connections
To transfer data between any two PEs, the upper bound on the number of intermediate PEs is B, where B <= (log2 N)/2. So for N = 8, at most 1 extra PE will be required in a transmission.

22 Ring View

23 Omega Network
An Omega network is sometimes called a shuffle-exchange network; it is much like the perfect shuffle. It can be implemented using 2x2 switch boxes. [Slide diagram: the two switch-box settings, straight or exchange]

24 Implementation Setting the switches produces one of the Omega paths

25 16x16 Network How can you easily find a path in this network?

26 Omega Routing Algorithm
The routing algorithm is simple and based on the destination address only. E.g., going to 7: the address 0111 means top, bottom, bottom, bottom (a 0 bit selects a switch's top output, a 1 bit its bottom output).
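A small C helper (hypothetical, not the lecture's code) that prints the switch settings for any destination, reproducing the example above:

    /* Self-routing in an n-stage Omega network: stage k examines bit k
       of the destination address, MSB first; 0 = top output, 1 = bottom. */
    #include <stdio.h>

    void route(unsigned dest, unsigned n) {
        printf("to %u:", dest);
        for (int k = (int)n - 1; k >= 0; k--)       /* MSB first */
            printf(" %s", ((dest >> k) & 1) ? "bottom" : "top");
        printf("\n");
    }

    int main(void) {
        route(7, 4);   /* 0111 -> top, bottom, bottom, bottom */
        return 0;
    }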

27 Summary
Cache Coherency
Introduction to Array Architectures
[Cartoon caption: "It wasn't so bad after all"]