1
CS 284a, 7 October 97. Copyright (c) 1997-98, John Thornley.
CS 284a Lecture
Tuesday, 7 October 1997
2
Multiprocessors and Multiprocessing
Hardware: Multiprocessor computers have become commodity products, e.g., quad-processor Pentium Pros, SGI and Sun workstations.
Programming: Multithreaded programming is supported by commodity operating systems, e.g., Windows NT, UNIX/Pthreads.
Applications: Traditionally science and engineering. Now also business and home computing.
Problem: Difficulty of multithreaded programming compared to sequential programming.
3
Why Buy a Multiprocessor?
Multiple users.
Multiple applications.
Multitasking within an application.
Responsiveness and/or throughput.
4
Multiprocessor Architectures
Message-Passing Architectures
  – Separate address space for each processor.
  – Processors communicate via message passing.
Shared-Memory Architectures
  – Single address space shared by all processors.
  – Processors communicate by memory read/write.
  – SMP or NUMA.
  – Cache coherence is an important issue.
Lots of middle ground and hybrids.
No clear consensus on terminology.
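The two communication styles can be contrasted with a minimal sketch. It is an illustration only and not part of the lecture: the function names are hypothetical, and because Python threads always share one address space, the message-passing version only imitates the style (private data, explicit send and receive) rather than real separate address spaces.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """Workers communicate by reading and writing one shared variable."""
    total = {"value": 0}              # visible to every thread
    lock = threading.Lock()           # protects the shared variable

    def worker(chunk):
        partial = sum(chunk)
        with lock:                    # communication = memory read/write
            total["value"] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def message_passing_sum(chunks):
    """Workers keep results private and send them as explicit messages."""
    mailbox = queue.Queue()           # stands in for the interconnection network

    def worker(chunk):
        mailbox.put(sum(chunk))       # communication = explicit send

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # One explicit receive per worker.
    return sum(mailbox.get() for _ in chunks)


if __name__ == "__main__":
    data = [[1, 2], [3, 4], [5]]
    print(shared_memory_sum(data), message_passing_sum(data))
```

In the shared-memory version the only coordination is the lock around the update; in the message-passing version no worker touches another worker's data, and all transfer is explicit.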
5
Message-Passing Architecture
[Diagram: several nodes, each with its own processor, cache, and memory, connected by an interconnection network.]
6
Shared-Memory Architecture
[Diagram: processors 1 to N, each with its own cache, connected by an interconnection network to memories 1 to M.]
7
Shared-Memory Architecture: SMP and NUMA
SMP = Symmetric Multiprocessor
  – All memory is equally close to all processors.
  – Typical interconnection network is a shared bus.
  – Easier to program, but doesn’t scale to many processors.
NUMA = Non-Uniform Memory Access
  – Each memory is closer to some processors than others.
  – a.k.a. “Distributed Shared Memory”.
  – Typical interconnection network is a grid or hypercube.
  – Harder to program, but scales to more processors.
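To make "closer to some processors than others" concrete for a hypercube interconnection, the following sketch counts links between nodes. It assumes the standard binary hypercube numbering, in which neighbouring nodes differ in exactly one address bit, and a cost of one hop per link; the function names are hypothetical and no real machine's latencies are modelled.

```python
def hypercube_hops(src: int, dst: int) -> int:
    """Minimum number of links between two nodes of a binary hypercube.

    With neighbours differing in exactly one bit, the distance is the
    number of bit positions in which the two node numbers differ.
    """
    return bin(src ^ dst).count("1")


def average_hops(dimension: int) -> float:
    """Mean hop count from node 0 to every other node (assumes dimension >= 1).

    By symmetry the same mean holds for any starting node.
    """
    nodes = 1 << dimension
    total = sum(hypercube_hops(0, other) for other in range(1, nodes))
    return total / (nodes - 1)


if __name__ == "__main__":
    print(hypercube_hops(0b000, 0b001))   # direct neighbours
    print(hypercube_hops(0b000, 0b111))   # opposite corners of a 3-cube
    print(average_hops(3))
```

On a shared bus every processor is the same distance from every memory, whereas here the distance depends on which node holds the data, which is the non-uniformity that NUMA programs have to account for.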
8
Shared-Memory Architecture: Cache Coherence
Effective caching reduces memory contention.
Processors must see a single consistent memory.
Many different consistency models.
Weak consistency is sufficient.
Snoopy cache coherence for bus-based SMPs.
Distributed directories for NUMA.
Many implementation issues: multiple levels, I-D separation, cache line size, update policy, etc.
Usually don’t need to know all the details.
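The idea behind snoopy coherence on a bus can be sketched with a toy simulation. The slides do not name a protocol, so this uses a textbook write-invalidate MSI scheme (Modified, Shared, Invalid) as a stand-in; the class names are hypothetical, each cache line holds a single word, and real hardware is considerably more involved.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


class Bus:
    """Shared bus plus main memory; every cache snoops every transaction."""

    def __init__(self):
        self.memory = {}              # address -> value
        self.caches = []

    def read_miss(self, requester, addr):
        # A cache holding a modified copy writes it back and keeps it shared.
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_read(addr)
        return self.memory.get(addr, 0)

    def invalidate(self, requester, addr):
        # Before a write, every other copy is invalidated.
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_write(addr)


class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}               # address -> (State, value)
        bus.caches.append(self)

    def _line(self, addr):
        return self.lines.get(addr, (State.INVALID, None))

    def state(self, addr):
        return self._line(addr)[0]

    def read(self, addr):
        state, value = self._line(addr)
        if state is State.INVALID:    # miss: fetch over the bus
            value = self.bus.read_miss(self, addr)
            self.lines[addr] = (State.SHARED, value)
        return value

    def write(self, addr, value):
        if self.state(addr) is not State.MODIFIED:
            self.bus.invalidate(self, addr)
        self.lines[addr] = (State.MODIFIED, value)

    def snoop_read(self, addr):
        state, value = self._line(addr)
        if state is State.MODIFIED:
            self.bus.memory[addr] = value            # write back dirty data
            self.lines[addr] = (State.SHARED, value)

    def snoop_write(self, addr):
        state, value = self._line(addr)
        if state is State.MODIFIED:
            self.bus.memory[addr] = value            # write back before dropping
        if state is not State.INVALID:
            self.lines[addr] = (State.INVALID, None)
```

A short usage example: if cache `a` writes 7 to an address and cache `b` then reads it, `b` sees 7 and both copies end up shared; if `b` then writes 9, the copy in `a` is invalidated and its next read fetches 9.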
9
Example: Quad-Processor Pentium Pro
SMP, bus interconnection.
4 x 200 MHz Intel Pentium Pro processors.
8 + 8 KB L1 cache per processor.
512 KB L2 cache per processor.
Snoopy cache coherence.
Compaq, HP, IBM, NetPower.
Windows NT, Solaris, Linux, etc.
10
Example: SGI Origin 2000
NUMA, hypercube interconnection.
Up to 128 (64 x 2) MIPS R10000 processors.
32 + 32 KB L1 cache per processor.
4 MB L2 cache per processor.
Distributed directory-based cache coherence.
Automatic page migration/replication.
SGI IRIX with Pthreads.
11
Message-Passing versus Shared-Memory Architectures
Shared-memory programming model is easier because data transfer is handled automatically.
Proof: message passing can be efficiently implemented on shared memory, but not vice versa.
How much of shared-memory programming model should be implemented in hardware?
How efficient is shared-memory programming model?
How well does shared-memory scale?
Does scalability really matter?
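One direction of the claim above, message passing implemented on top of shared memory, can be sketched as a channel made from a shared buffer, a lock, and a condition variable. The `Channel` class and `demo` function are hypothetical names chosen for illustration; this is one simple way to do it, not the construction the lecture had in mind.

```python
import threading
from collections import deque


class Channel:
    """Unbounded FIFO channel built entirely from shared-memory primitives."""

    def __init__(self):
        self._buffer = deque()                  # the shared memory region
        self._ready = threading.Condition()     # lock plus wait/notify

    def send(self, message):
        with self._ready:
            self._buffer.append(message)        # an ordinary memory write
            self._ready.notify()                # wake one waiting receiver

    def receive(self):
        with self._ready:
            while not self._buffer:             # loop guards against spurious wakeups
                self._ready.wait()
            return self._buffer.popleft()       # an ordinary memory read


def demo():
    """Send three messages from the main thread to one consumer thread."""
    channel = Channel()
    received = []

    def consumer():
        for _ in range(3):
            received.append(channel.receive())

    thread = threading.Thread(target=consumer)
    thread.start()
    for message in ("a", "b", "c"):
        channel.send(message)
    thread.join()
    return received


if __name__ == "__main__":
    print(demo())
```

The send and receive operations reduce to a handful of memory accesses plus synchronization. The reverse construction would need every remote read or write to become a message exchange, which is the asymmetry the slide points to.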