Published by Wesley Simon. Modified over 8 years ago.
1 Computer Architecture & Assembly Language Spring 2001 Dr. Richard Spillman Lecture 26 – Alternative Architectures
2 Semester Topics
[Diagram labels: PLU 1, Alternatives, CPU, Disk, Memory, I/O, ALU, Assembly, Microprogramming, Alternatives, Cache, Virtual, Structure, Operation, Network]
3 Review – Last Lecture
- Introduction to Parallel Computing
- MIMD Structures
4 Review – Cache Coherency
Informally: any read must return the most recent write.
- Too strict and very difficult to implement.
Better: any write must eventually be seen by a read, and all writes are seen in order (“serialization”).
Two rules to ensure this:
- If P writes x and P1 reads it, P’s write will be seen if the read and write are sufficiently far apart.
- Writes to a single location are serialized: seen in one order.
  - The latest write will be seen.
  - Otherwise a processor could see writes in an illogical order (an older value after a newer value).
5 Outline
- Cache Coherency
- Introduction to Array Architectures
[Image caption: “It’s too much my circuits hurt”]
6 Cache Problem
Consider a simple 3-processor system.
[Figure label: From its own cache]
7 Consistency & Coherence
Informal coherence model
- Any read of a data item should return the value that was most recently written.
- Appealing, but way too simple in the SMP world.
The simple model contains 2 behavioral aspects
- Coherence defines what value is returned by a read.
- Consistency defines when a written value will be returned by a read.
  - This is a problem, since a write at processor 1 may have happened but still not have left the processor by the time a read at processor 2 happens.
  - Note that this "when" problem gets more difficult as the physical extent of the multiprocessor system increases.
- Both are critical to writing correct shared memory programs.
8 Coherence
A memory system is coherent if:
- Given no other writes, a processor P will get its last written value to X on a read of X.
  - This simply preserves program order and is the normal expectation.
- Given no other writes, a read by P1 of X gets the value written by P2 if the read and write are sufficiently separated.
- Writes to the same location are serialized.
  - A fundamental need, to avoid the concurrent writer problem.
Implication
- Reordering reads is OK.
  - Similar to the uniprocessor world.
- Writes must finish in program order (write serialization).
  - Definitely not the same as in the uniprocessor world.
  - This restriction will be relaxed later.
9 Enforcing Coherence
Directory based
- Keep sharing and block status in a directory.
- The directory may be centralized or distributed.
Snooping
- Take advantage of the common connection to the bus.
- Caches monitor transactions on the bus.
  - They see writes to shared data.
  - They modify the contents of their copy if they have one.
10 Snooping
Cache controllers monitor the bus for memory operations:
- A snoop tag keeps track of cache block status.
- It can take on the values: invalid, valid.
11 Snooping Protocols There are two ways to maintain coherence: One is to ensure that a processor has exclusive access to a data block before it writes to that block. This is called a write invalidate protocol because all other copies of the block are invalidated on a write. The other way is to update all of the cached copies of a data block whenever writes to that block occur. This is called a write update. To keep down the bandwidth requirements, it is necessary to keep track of which cache blocks are shared and then broadcast or update only on writes to shared blocks.
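The two snooping protocols can be compared with a small simulation. This is a minimal sketch, not the lecture's design: the `Bus` and `Cache` classes, the write-through memory, and the choice to broadcast every write (rather than only writes to shared blocks) are all simplifying assumptions made for illustration. Each cache line carries the valid/invalid snoop tag described earlier.

```python
class Bus:
    """Hypothetical shared bus joining several caches and one memory."""

    def __init__(self, protocol):
        assert protocol in ("invalidate", "update")
        self.protocol = protocol
        self.memory = {}   # address -> value
        self.caches = []   # every cache that snoops on this bus

    def broadcast_write(self, writer, addr, value):
        # Assumption: write-through, so memory always holds the latest value.
        self.memory[addr] = value
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(addr, value, self.protocol)


class Cache:
    """Hypothetical cache; each line is [valid, value]."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        line = self.lines.get(addr)
        if line is None or not line[0]:
            # Miss (absent or invalidated): fetch the block from memory.
            line = [True, self.bus.memory.get(addr, 0)]
            self.lines[addr] = line
        return line[1]

    def write(self, addr, value):
        self.lines[addr] = [True, value]
        self.bus.broadcast_write(self, addr, value)

    def snoop_write(self, addr, value, protocol):
        line = self.lines.get(addr)
        if line is None:
            return                 # no copy here, nothing to do
        if protocol == "invalidate":
            line[0] = False        # write invalidate: drop our copy
        else:
            line[1] = value        # write update: refresh our copy
```

Under write invalidate, the next read by another processor misses and refetches the block; under write update, the other copies stay valid and already hold the new value.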
12 Write Invalidate
13 Example Write Update
14 Array Architecture
Used for solving problems which involve vector operations:
- air traffic control
- signal processing
- pattern recognition
- matrix operations
[Figure labels: Control Processor, Control Memory, PE 1, Memory 1, PE n, Memory n, Interconnection Network, Data & I/O bus]
The issue is: How is this organized?
15 Permutations
A complete interconnection network is the fastest but the most expensive; if we are willing to trade off time, we can reduce cost.
GOAL: All that is necessary is to ensure that data at processor I can get to processor J.
- This can be represented by a permutation.
Now, there are $N!$ permutations of N processors, while there are $N^2$ switches in a complete network.
- With $N^2$ switches there are $2^M$ states ($M = N^2$), which is much greater than $N!$.
- So a complete interconnection network gives us more connections than we need.
For $N = 10$: $N! = 3{,}628{,}800$ and $2^M \approx 1.27 \times 10^{30}$.
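The numbers on this slide can be checked directly. The sketch below assumes that each of the N x N switches in a complete network is either open or closed, so the network has 2 raised to the power N squared states; the function names are hypothetical.

```python
import math


def permutation_count(n):
    """Number of permutations of n processors: n!"""
    return math.factorial(n)


def complete_network_states(n):
    """States of a complete network, assuming n*n two-state switches."""
    return 2 ** (n * n)


n = 10
print(permutation_count(n))                  # 3628800
print(f"{complete_network_states(n):.2e}")   # 1.27e+30
```

The gap between the two counts is the slack that a cheaper permutation network can give up.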
16 Permutation Networks
Start with a basic 2x2 crossbar switch, which either passes its two inputs straight through (1 to 1, 2 to 2) or crosses them (1 to 2, 2 to 1).
A permutation network for n processors requires:
- about 2ln(n - 1) stages of these 2x2 crossbar switches
- n/2 switches in each stage
17 Example
For 6 processors:
[Figure: permutation network for 6 processors; extracted port labels: 0 1 2 5 2 3 3 0 4 5 4 1]
18 Cyclic Network
This is the simplest network connection pattern.
- Each processor is directly connected to its neighbor.
- It works well for algorithms which involve vector operations of the form: $x(I) = k_1 x(I-1) + k_2 x(I) + k_3 x(I+1)$
[Figure: cyclic network of processors 0, 1, 2, 3]
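The vector operation above needs only each element's two neighbors, which is why it maps well onto a cyclic network. The sketch below is illustrative only: it assumes the ends wrap around (processor 0 is a neighbor of the last processor), and the function name `cyclic_step` is hypothetical.

```python
def cyclic_step(x, k1, k2, k3):
    """Compute k1*x(i-1) + k2*x(i) + k3*x(i+1) for every i.

    Each list element stands for the value held by one processor;
    indices wrap around, as assumed for a cyclic connection.
    """
    n = len(x)
    return [
        k1 * x[(i - 1) % n] + k2 * x[i] + k3 * x[(i + 1) % n]
        for i in range(n)
    ]


print(cyclic_step([1, 2, 3, 4], 1, 1, 1))   # [7, 6, 9, 8]
```

Every output element uses values from directly connected processors only, so one step costs a single neighbor exchange in each direction.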
19 Cyclic Perfect Shuffle
The cyclic perfect shuffle adds some additional direct communication links.
- Effective for matrix operations, FFT, sorting, ...
Note: for 0 to communicate with 4 in a cyclic architecture requires the transfers 0 - 1 - 2 - 3 - 4.
In this architecture it requires 0 - 1 - 4.
[Figure: processors 0 through 7]
20 Barrel Shifter Architecture
A PE in a barrel shifter is connected to other PEs whose distance is $1, 2, 4, 8, \ldots, 2^{n-1}$ from the source PE (the number of PEs is $2^n$).
EXAMPLE (n = 3)
21 Barrel Shifter Connections
To transfer data between any two PEs, the upper bound on the number of intermediate PEs is B, where:
$B \le (\log_2 N)/2$
So for N = 8, at most 1 extra PE will be required in a transmission.
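The bound for N = 8 can be checked by a breadth-first search over the barrel shifter links. This is a sketch under one stated assumption: links run in both directions, so a PE reaches the PEs at distance plus or minus 1, 2, 4, ... (modulo the number of PEs). The function names are hypothetical.

```python
from collections import deque


def barrel_neighbors(pe, n):
    """PEs linked directly to `pe` in a barrel shifter of 2**n PEs.

    Assumption: links are bidirectional, at distances +/- 2**k.
    """
    size = 1 << n
    result = set()
    for k in range(n):
        distance = 1 << k
        result.add((pe + distance) % size)
        result.add((pe - distance) % size)
    result.discard(pe)
    return result


def max_intermediate_pes(n):
    """Worst-case number of intermediate PEs between any two PEs."""
    # The network is symmetric, so searching from PE 0 covers every pair.
    hops = {0: 0}
    queue = deque([0])
    while queue:
        pe = queue.popleft()
        for nxt in barrel_neighbors(pe, n):
            if nxt not in hops:
                hops[nxt] = hops[pe] + 1
                queue.append(nxt)
    return max(hops.values()) - 1   # hops minus one = intermediate PEs


print(max_intermediate_pes(3))   # 1
```

For n = 3 (N = 8) the search reports one intermediate PE in the worst case, which agrees with the bound on the slide.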
22 Ring View
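23 Omega Network
An Omega network is sometimes called a shuffle-exchange network.
- It is much like the perfect shuffle.
- It can be implemented using switch boxes.
[Figure: processors 0 through 7; each 2x2 switch box is set either straight (1 to 1, 2 to 2) or crossed (1 to 2, 2 to 1)]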
24 Implementation Setting the switches produces one of the Omega paths
25 16x16 Network How can you easily find a path in this network?
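26 Omega Routing Algorithm
The routing algorithm is simple: it is based on the destination address only.
- Each stage looks at one bit of the destination address: a 0 selects the top switch output, a 1 selects the bottom output.
- E.g., going to 7 in the 16x16 network: the address is 0111, which means top, bottom, bottom, bottom.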
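Destination-based routing can be sketched in a few lines. The code below is an illustration, not the lecture's implementation: it assumes each stage consists of a perfect shuffle (a one-bit left rotation of the line number) followed by a 2x2 switch, and that destination bits are consumed most significant bit first. The function name `omega_route` is hypothetical.

```python
def omega_route(source, destination, size=16):
    """Return (switch settings, final line) for one path through the network.

    Assumes `size` is a power of two and each stage is a perfect
    shuffle followed by a 2x2 switch.
    """
    stages = size.bit_length() - 1
    assert 1 << stages == size, "size must be a power of two"
    position = source
    settings = []
    for stage in range(stages):
        # Perfect shuffle: rotate the line number left by one bit.
        position = ((position << 1) | (position >> (stages - 1))) & (size - 1)
        # The switch output is chosen by the next destination bit, MSB first.
        bit = (destination >> (stages - 1 - stage)) & 1
        position = (position & ~1) | bit
        settings.append("bottom" if bit else "top")
    return settings, position


print(omega_route(5, 7))   # (['top', 'bottom', 'bottom', 'bottom'], 7)
```

The settings depend only on the destination, so any source reaches line 7 with the same sequence of top, bottom, bottom, bottom.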
27 Summary
- Cache Coherency
- Introduction to Array Architectures
[Image caption: “It wasn’t so bad after all”]