Presentation is loading. Please wait.

Presentation is loading. Please wait.

Multithreaded ASC Kevin Schaffer and Robert A. Walker ASC Processor Group Computer Science Department Kent State University.

Similar presentations


Presentation on theme: "Multithreaded ASC Kevin Schaffer and Robert A. Walker ASC Processor Group Computer Science Department Kent State University."— Presentation transcript:

1 Multithreaded ASC Kevin Schaffer and Robert A. Walker ASC Processor Group Computer Science Department Kent State University

2 Organization of an ASC Computer

3 Broadcast/Reduction Bottleneck Time to perform a broadcast or reduction increases as the number of PEs increases Even for a moderate number of PEs, this time can dominate the machine cycle time Pipelining reduces the cycle time but increases the latency Additional latency causes pipeline hazards

4 Instruction Types Scalar instructions  Execute entirely within the control unit Broadcast/Parallel instructions  Execute within the PE array  Use the broadcast network to transfer instruction and data Reduction instructions  Execute within the PE array  Use the broadcast network to transfer instruction and data  Use the reduction network to combine data from PEs

5 Scalar Pipeline Instruction Fetch (IF) Instruction Decode (ID) Execute (EX) Memory Access (M) Write Back (W)

6 Hazards in a Scalar Pipeline

7 Unified SIMD Pipeline Broadcast (B1...Bn) Reduction (R1...Rn) Number of stages is variable All instructions go through every stage

8 Diversified SIMD Pipeline Separate paths for each instruction type so instructions only go through stages that they use Stalls less often than a unified pipeline organization

9 Hazards

10 Multithreading Pipelining alone cannot eliminate hazards caused by broadcast and reduction latencies Solution: use instructions from multiple threads to keep the pipeline full Instructions from different threads are independent so they cannot generate stalls due to data dependencies As long as there are a sufficient number of threads, it is possible to fill any number of stall cycles

11 Types of Multithreading Coarse-grain multithreading switches to a new thread when the current thread encounters a high latency operation Fine-grain multithreading switches to a new thread every clock cycle Simultaneous multithreading can issue instructions from multiple threads in the same clock cycle For a SIMD processor, fine-grain or simultaneous multithreading is necessary as pipeline stalls are relatively short and occur frequently

12 Multithreaded Control Unit

13 Reduction Hazard with a Single Thread

14 Reduction Hazard with Multiple Threads

15 Execution Time vs. Latency

16 Throughput vs. Latency

17 Multithreaded ASC vs. MASC A multithreaded ASC computer can execute at most one instruction in a cycle A MASC computer with j instruction streams can execute up to j instructions in a cycle In multithread ASC each thread can access every PE In MASC each instruction stream can only access its partition of PEs A multithreaded MASC computer could combine the advantages of both

18 ASC

19 Multithreaded ASC

20 MASC

21 Multithreaded MASC

22 Multithreaded ASC Processor In order to validate simulation results and estimate hardware costs, a prototype processor was developed Targeted for an Altera Cyclone II (EP2C35) FPGA Using an FPGA makes it possible to get detailed measurements of speed and hardware cost

23 Additional Enhancements Flags (logical values) are a first-class data types with their own set of registers and instructions Extra reduction operators  Count Responders  Sum Hardware semaphores for thread synchronization

24 Synthesis Results Targeted for an Altera Cyclone II FPGA (EP2C35) 16 x 16-bit PEs 16 hardware threads Clock speed: 75 MHz


Download ppt "Multithreaded ASC Kevin Schaffer and Robert A. Walker ASC Processor Group Computer Science Department Kent State University."

Similar presentations


Ads by Google