Published by Brice Chambers. Modified over 9 years ago.
Multithreaded Processors
Frank Casilio
Computer Engineering
May 15, 1997
Problems With Multiprocessors
- Memory Latency
- Context-Switching Time
- Communication/Synchronization Latency
- Cache Coherence
- Writes To Memory
- Poor Programming Model
Motivation
- Reduce/Tolerate Memory Latency
- General-Purpose Machine
- Scalability
- Shared Memory
- Simpler Programming Model
Typical Ways To Reduce Latency
- On-Chip Cache (Shortens The Round Trip To Memory)
- Fast Buses & Networks
- Hardware Synchronization
- Prefetching
Multithreading: The Concept
- Support For Multiple Concurrent Hardware Contexts
- Tolerates Latency Instead Of Reducing It
- Swaps Contexts During Latencies
- Experimental Systems Have Existed Since The 1950s
- Only 2 Commercial Systems Ever Produced:
  - HEP
  - Tera MTA
Parameters That Affect Efficiency
- Number Of Contexts Supported
- Switching Overhead
- Run Length (Granularity)
- Average Latency To Be Hidden
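The four parameters above can be combined in a simple deterministic model. This is a common textbook approximation, not something stated on these slides. Let N be the number of contexts, R the run length, C the switching overhead and L the average latency. Utilization then grows linearly as N·R/(R+C+L) until it saturates at R/(R+C). The sketch assumes every context behaves identically, and the function and parameter names are illustrative.

```python
def efficiency(n_contexts: int, run_length: float, switch_cost: float, latency: float) -> float:
    """Processor utilization under a simple deterministic multithreading model (assumed).

    Each context runs for `run_length` cycles, costs `switch_cost` cycles to
    switch away from, and then waits `latency` cycles for a long-latency operation.
    """
    if n_contexts < 1:
        raise ValueError("need at least one context")
    # Linear region: too few contexts to cover the latency, so the processor idles.
    linear = n_contexts * run_length / (run_length + switch_cost + latency)
    # Saturated region: latency is fully hidden; only switching overhead is lost.
    saturated = run_length / (run_length + switch_cost)
    return min(linear, saturated)


if __name__ == "__main__":
    print(round(efficiency(1, 10, 2, 100), 3))   # 0.089
    print(round(efficiency(10, 10, 2, 100), 3))  # 0.833
```

For example, take R = 10, C = 2 and L = 100. One context keeps the processor about 9% busy, while ten contexts reach the saturation limit of about 83%. Adding contexts beyond that point buys nothing, because the switching overhead is then the only loss.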
Switching Theory
- Determines How Often Contexts Switch
- Two Different Types:
  - Fine-Grained
  - Coarse-Grained
- Directly Related To Cost
Fine-Grained Switching
- Switches Contexts Every Cycle
- Tolerates Many Long-Latency Operations
- Requires More Contexts
- Workload Requirements (Enough Threads Must Be Available To Fill The Contexts)
- Can Reduce Overall Processor Complexity
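The following is a cycle-level sketch of fine-grained switching, built on simplifying assumptions of our own. Each cycle, the processor issues one instruction from the next ready context in round-robin order, at no switching cost. Every `run_length`-th instruction of a context is a long-latency operation that blocks that context for `latency` cycles. The function name and parameters are hypothetical.

```python
def fine_grained_utilization(n_threads: int, run_length: int, latency: int, cycles: int) -> float:
    """Fraction of cycles that issue an instruction under cycle-by-cycle switching (sketch)."""
    ready_at = [0] * n_threads       # first cycle at which each context may issue again
    issued_in_run = [0] * n_threads  # instructions issued so far in the current run
    next_ctx = 0
    busy = 0
    for cycle in range(cycles):
        # Scan contexts round-robin, starting after the last one that issued.
        for offset in range(n_threads):
            ctx = (next_ctx + offset) % n_threads
            if ready_at[ctx] <= cycle:
                busy += 1
                issued_in_run[ctx] += 1
                if issued_in_run[ctx] == run_length:
                    # Long-latency operation: this context is blocked for `latency` cycles.
                    issued_in_run[ctx] = 0
                    ready_at[ctx] = cycle + 1 + latency
                next_ctx = (ctx + 1) % n_threads
                break
        # If no context was ready, this cycle is idle.
    return busy / cycles


if __name__ == "__main__":
    print(fine_grained_utilization(1, 1, 3, 100))  # 0.25
    print(fine_grained_utilization(4, 1, 3, 100))  # 1.0
```

Consider a single context with a 3-cycle latency on every instruction: the processor is busy only one cycle in four. With four contexts it issues every cycle. This is why fine-grained designs need many contexts.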
Coarse-Grained Switching
- Switches Contexts Only After A Run Of Several Cycles
- Has Problems With Sporadic Latencies
- Requires Fewer Contexts
- Requires More Complex Processors
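A matching sketch of coarse-grained switching follows, again with assumed parameters. A context runs for `run_length` cycles and ends the run with a long-latency operation. The processor then spends `switch_cost` cycles switching to the next context in round-robin order, and idles if that context is still waiting. Charging cycles for every switch is what makes short, sporadic stalls hard to hide with this scheme. The function name and parameters are hypothetical.

```python
def coarse_grained_utilization(n_threads: int, run_length: int, latency: int,
                               switch_cost: int, cycles: int) -> float:
    """Fraction of cycles doing useful work under switch-on-event multithreading (sketch)."""
    ready_at = [0] * n_threads  # cycle at which each context's pending operation completes
    current = 0
    time = 0
    busy = 0
    while time < cycles:
        if ready_at[current] > time:
            # The chosen context is still stalled: the processor idles until it is ready.
            time = ready_at[current]
            continue
        # Run the current context for one run length (clipped to the simulation horizon).
        run = min(run_length, cycles - time)
        busy += run
        time += run
        # The run ends in a long-latency operation that blocks this context.
        ready_at[current] = time + latency
        # Pay the switching overhead only when there is another context to switch to.
        if n_threads > 1:
            time += switch_cost
        current = (current + 1) % n_threads
    return busy / cycles


if __name__ == "__main__":
    print(coarse_grained_utilization(1, 4, 4, 2, 80))            # 0.5
    print(round(coarse_grained_utilization(2, 4, 4, 2, 60), 3))  # 0.667
```

Take run_length = 4, latency = 4 and switch_cost = 2. One context reaches 50% utilization. Two contexts reach 4/(4+2), or about 67%, which is the ceiling set by the switching overhead. Fewer contexts suffice here than in the fine-grained case, but each switch is more expensive.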
The Tera MTA
- First Commercial Multithreaded Machine Since 1978
- Uniform Shared Memory
- Scalable: Direct Relationship Between The Number Of PEs And Throughput
- Fine-Grained Architecture
The Tera MTA (Cont'd)
- Toroidal Interconnection
- $12 Million Base System
- 16- To 256-Processor Versions
Processor Characteristics
- Support For 128 Threads
- 16 Protection Domains
- 333 MHz Nominal Clock Speed
- Zero Context-Switching Overhead
- 1 GFLOPS Peak Performance
Processor Characteristics (Cont'd)
- Load-Store Architecture
- 3 Addressing Modes
- 31 64-Bit GPRs
- 3 Operations Per Instruction:
  - 1 Memory Reference
  - 1 Arithmetic Operation
  - 1 Control Operation (e.g., Branch)
- 6 kW Of Power Dissipation Per Processor
Interconnection Network
- 3-D Torus Containing p^(3/2) Nodes (p = Number Of Processors)
- Packet Switching
- 3 Cycles Of Latency Per Node
- Messages Are Assigned Random Priorities
- 164-Bit Packets, 64 Bits Of Which Are Data
- 2.67 GB/s Bandwidth In Each Direction
- 2 HIPPI Channels Per Processor For Network Connections
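To put the per-node latency figure in perspective, the sketch below computes the average minimal hop count in a k x k x k torus with wraparound links. It then multiplies that count by 3 cycles per hop. The cubic shape, the example size k = 16, and charging exactly 3 cycles per hop are all assumptions for illustration. Contention and injection costs are not modeled. The helper names are hypothetical.

```python
from itertools import product


def torus_avg_hops(k: int, dims: int = 3) -> float:
    """Average minimal hop count from one node to every node (itself included)
    in a k-ary torus with `dims` dimensions and wraparound links."""
    ring = [min(d, k - d) for d in range(k)]  # shortest distance along one ring
    total = 0
    count = 0
    # By symmetry every node sees the same set of offsets, so one source suffices.
    for offsets in product(range(k), repeat=dims):
        total += sum(ring[d] for d in offsets)
        count += 1
    return total / count


def avg_one_way_latency(k: int, cycles_per_hop: int = 3, dims: int = 3) -> float:
    """Average traversal time in cycles, ignoring contention (assumed model)."""
    return torus_avg_hops(k, dims) * cycles_per_hop


if __name__ == "__main__":
    print(torus_avg_hops(16))       # 12.0
    print(avg_one_way_latency(16))  # 36.0
```

For k = 16 (4096 nodes), the average distance is 12 hops. That is about 36 cycles one way and 72 cycles for a round trip, before any contention. Latencies of this size are what the hardware contexts must cover.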
Memory
- Addressable In 8-, 16-, 32- And 64-Bit Units
- 4 Bits Of Access State Per Word For Synchronization
- Memory Units Equipped With Error-Correcting Code
- Memory Accesses Are Distributed Randomly Across All Banks
- Either 2p Or 4p Units, Interleaved 64 Ways
- 16 MB DRAM Chips
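One common use of per-word access-state bits is full/empty synchronization. A synchronizing load waits until the word is full and leaves it empty; a synchronizing store waits until it is empty and leaves it full. The Tera MTA is generally described as including a full/empty bit among these state bits. The slide does not break the four bits down, however, so treat this as an illustration only. The sketch models one such word in software with a condition variable. `SyncWord`, `sync_load` and `sync_store` are hypothetical names, and real hardware handles the waiting in the memory system rather than with an OS-level lock.

```python
import threading


class SyncWord:
    """Software model of a memory word with a full/empty bit (hypothetical)."""

    def __init__(self) -> None:
        self._value = None
        self._full = False
        self._cond = threading.Condition()

    def sync_store(self, value) -> None:
        """Wait until the word is empty, write it, and mark it full."""
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def sync_load(self):
        """Wait until the word is full, read it, and mark it empty."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value


def producer_consumer(n: int) -> list:
    """Pass n values through a single SyncWord from one thread to another."""
    word = SyncWord()
    received = []

    def consumer() -> None:
        for _ in range(n):
            received.append(word.sync_load())

    t = threading.Thread(target=consumer)
    t.start()
    for i in range(n):
        word.sync_store(i)
    t.join()
    return received


if __name__ == "__main__":
    print(producer_consumer(5))  # [0, 1, 2, 3, 4]
```

Each store must wait for the previous value to be consumed. As a result, the single word acts as a one-slot channel between the two threads, with no separate lock word in memory.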
Input / Output
- Maximum Strategy Gen5 XL RAID
- Sustained Bandwidth Of 130 MB/s
- At Least p/16 Disk Arrays Are Required
- System Capacity Of 300p GB
- 20p MB/s In Each Direction
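The I/O figures scale with the processor count p, so it can help to evaluate them for a concrete configuration. The sketch makes two readings of the slide that it does not state outright. It treats the 130 MB/s figure as per RAID array, and it rounds p/16 up to a whole number of arrays. `io_sizing` is a hypothetical helper.

```python
import math


def io_sizing(p: int) -> dict:
    """Evaluate the slide's I/O scaling rules for p processors (assumed reading)."""
    arrays = math.ceil(p / 16)  # "at least p/16 disk arrays", rounded up
    return {
        "disk_arrays": arrays,
        "disk_bandwidth_mb_s": arrays * 130,  # assumes 130 MB/s sustained per array
        "capacity_gb": 300 * p,               # 300p GB system capacity
        "bandwidth_each_way_mb_s": 20 * p,    # 20p MB/s in each direction
    }


if __name__ == "__main__":
    print(io_sizing(256))
```

For p = 256 this gives 16 arrays and about 2080 MB/s of sustained disk bandwidth. It also gives 76,800 GB of capacity and 5120 MB/s in each direction.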
Operating System
- Distributed, Parallel Version Of Unix
- Highly Concurrent Version Of Berkeley Unix
- Allows Systems To Run p Tasks Truly In Parallel
- Streams Are Dynamically Created Without OS Intervention
- Processes Are Broken Up Into Tasks By The OS
- Two-Tier Scheduler Provides Better Resource Allocation:
  - PL Scheduler
  - PB Scheduler
Software / Languages
- Both Implicit And Explicit Parallelism Are Supported
- Automatic Parallelization Of C, C++ & Fortran By The Compiler
- High Degree Of Cray Compatibility
- Easy To Program Because Of The Architecture
System Performance
- 3.84-12.8 Times The Performance Of A Cray T90/32
- 1K x 1K Matrix Multiply In 50 ms
- Integer Sort Of 100M Keys In 36 ms
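The two benchmark figures can be converted into rates. The sketch assumes "1K" means 1024. It also uses the conventional 2n³ operation count for a dense matrix multiply (n³ multiplies plus n³ adds). Neither assumption is stated on the slide, and the helper names are hypothetical.

```python
def matmul_gflops(n: int, seconds: float) -> float:
    """Sustained rate of an n x n dense matrix multiply, counting 2*n**3 flops."""
    return 2 * n**3 / seconds / 1e9


def keys_per_second(keys: float, seconds: float) -> float:
    """Sorting throughput in keys per second."""
    return keys / seconds


if __name__ == "__main__":
    print(round(matmul_gflops(1024, 0.050), 2))    # 42.95
    print(f"{keys_per_second(100e6, 0.036):.3e}")  # 2.778e+09
```

A sustained rate of about 43 GFLOPS exceeds the 1 GFLOPS peak of a single processor by more than 40 times. The matrix-multiply result therefore implies at least 43 processors, even if each ran at its peak.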
Conclusion
- Proven Effectiveness
- Logical Step For Multiprocessor Computers
- Still Very Pricey
- Allows General-Purpose Workloads
- Scalable Shared Memory
Questions?
Instruction Pipeline
Breakdown Of A Task (figure labels: Task, Team, VP)
Deciding The Number Of Contexts
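A simple deterministic model is often used to reason about this choice; it is an assumption, not taken from the slides. With N contexts, run length R, switching overhead C and latency L, utilization N·R/(R+C+L) stops growing once it reaches R/(R+C). That happens when N ≥ (R+C+L)/(R+C). The sketch returns the smallest such N, and the function name is hypothetical.

```python
import math


def contexts_to_saturate(run_length: float, switch_cost: float, latency: float) -> int:
    """Smallest number of contexts N with N*R/(R+C+L) >= R/(R+C) (assumed model)."""
    if run_length + switch_cost <= 0:
        raise ValueError("run_length + switch_cost must be positive")
    return math.ceil((run_length + switch_cost + latency) / (run_length + switch_cost))


if __name__ == "__main__":
    print(contexts_to_saturate(10, 2, 100))  # 10
    print(contexts_to_saturate(1, 0, 127))   # 128
```

Take R = 1 and C = 0, as in a machine that switches every cycle at no cost. Hiding an L-cycle latency then takes L + 1 contexts. Longer runs (larger R) cut the count sharply, which reflects the fine- versus coarse-grained trade-off from the switching slides.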