Slide 1: Trends in Cluster Architecture
Steve Lumetta and David Culler
University of California at Berkeley, Computer Science Division
Slide 2: Lessons from the NOW Project
How to build a system:
- uniprocessors and fast networks
- parallel and sequential jobs simultaneously
- no operating system changes
Questions for the future:
- “killer” applications?
- requirements for hardware?
- the next step?
Slide 3: Infrequently Cited Quotations
Bob Lucky said (to our graduating class), “Technology is running away from us… that’s Moore’s Law.”
Steve Lumetta says (to his key application vendor), “If all you can give me is Moore’s Law, you’re history!”
Slide 4: Applications of Parallelism
Enterprise computing:
- growing market
- optimized parallel versions
Important applications:
- databases (DB2 on the SP-2)
- internet services (Inktomi and TranSend on NOW)
- collaborative environments
- others?
Hardware requirements:
- efficient inter-process communication
- reasonable per-processor I/O bandwidth
Slide 5: Outline
- motivation
- clusters of SMP’s
- communication abstraction
- model of shared resources
- conclusions
Slide 6: Hardware
[Diagram: SMP nodes, each with memory, a memory interconnect, and network cards, connected to a network cloud]
Memory trends:
- larger, slower memory
- affinity increasingly important
SMP’s minimize penalties:
- lower latency
- higher throughput
Slide 7: Cluster Software
[Diagram: SMP nodes with memory, memory interconnect, and network cards connected to a network cloud]
Explicit control of locality:
- operating system
- compiler/runtime
- programmer
High availability:
- multiple peer operating systems
- dynamic resource partitions
Slide 8: An Important Component: Message-Passing
Within an address space:
- synchronize data transfer
- ship control to a hot cache
- serialize access to complex data structures
- optimize DSM protocols (SMP-Shasta)
Between address spaces:
- support DSM (Cashmere-2L, Shasta)
- communicate between operating systems
Slide 9: A Uniform Communication Interface
[Diagram: the communication layer sends and polls for messages over both shared memory and the network]
Hierarchical hardware:
- single interface for message-passing (a sketch follows below)
- hides multi-protocol complexity
- allows for optimization
Design issues:
- shared data layout
- queue algorithm
- polling strategy
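To make the single-interface idea concrete, here is a minimal sketch in C of how one send and one poll entry point might dispatch between a shared-memory path and a network path. All names (endpoint_t, shmem_send, net_send, and so on) are hypothetical illustrations, not the NOW communication layer's actual API.

```c
/* Hypothetical sketch of a uniform message-passing interface that hides
 * whether a peer is reached via shared memory or the network.  Names and
 * structure are illustrative, not the Berkeley NOW implementation. */
#include <stddef.h>

typedef enum { ROUTE_SHMEM, ROUTE_NETWORK } route_t;

typedef struct {
    route_t route;          /* chosen once, when the endpoint is opened */
    int     node, proc;     /* destination node and processor           */
} endpoint_t;

/* protocol-specific back ends (assumed to exist elsewhere) */
int shmem_send(endpoint_t *ep, const void *buf, size_t len);
int net_send(endpoint_t *ep, const void *buf, size_t len);
int shmem_poll(void);
int net_poll(void);

/* One send call for every destination: the layer picks the fast path when
 * the destination shares an address space, the network otherwise. */
int msg_send(endpoint_t *ep, const void *buf, size_t len)
{
    return (ep->route == ROUTE_SHMEM) ? shmem_send(ep, buf, len)
                                      : net_send(ep, buf, len);
}

/* One poll call drains both sources; returns number of messages handled. */
int msg_poll(void)
{
    return shmem_poll() + net_poll();
}
```

Hiding the two protocols behind one call pair is what allows the layer to optimize the data layout, queue algorithm, and polling strategy discussed on the following slides without changing application code.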
Slide 10: Shared Memory Protocol Design
[Diagram: multiple senders share one concurrent message queue per receiver]
One queue per receiver:
- less memory than 1-to-1 queues
- longer queues reduce the impact of overflow
Reduce coherence traffic (50-80 cycles each):
- avoid false sharing
- use cache-aligned data
Requires atomic queue operations (a layout sketch follows below).
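The following is a sketch of what a cache-aligned, per-receiver queue layout could look like, assuming a 64-byte cache line; the field names, sizes, and packet states are illustrative guesses rather than the layout used in the talk.

```c
/* Hypothetical layout for a per-receiver shared-memory queue, padded so that
 * each packet occupies its own cache line and the head/tail counters do not
 * falsely share a line with packet data.  Line size and names are assumed. */
#include <stdint.h>

#define CACHE_LINE   64
#define Q_LENGTH     128
#define PAYLOAD_SIZE (CACHE_LINE - 2 * sizeof(uint32_t))

enum { FREE = 0, CLAIMED = 1, READY = 2 };   /* packet lifecycle states */

typedef struct {
    volatile uint32_t type;                  /* FREE / CLAIMED / READY  */
    uint32_t          length;                /* valid bytes in data[]   */
    uint8_t           data[PAYLOAD_SIZE];    /* small-message payload   */
} __attribute__((aligned(CACHE_LINE))) packet_t;

typedef struct {
    volatile uint32_t tail __attribute__((aligned(CACHE_LINE)));  /* senders  */
    volatile uint32_t head __attribute__((aligned(CACHE_LINE)));  /* receiver */
    packet_t          packet[Q_LENGTH];      /* one cache line per slot */
} queue_t;
```

Keeping each slot and each counter on its own cache line avoids false sharing, so the only coherence traffic is the transfer of lines that genuinely carry new data.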
Slide 11: Lock-Free Queue Algorithm
    index ← Fetch&Increment(q^.tail) mod Q_LENGTH
    while TRUE
        if Compare&Swap(q^.packet[index].type, FREE, CLAIMED)
            return index
        (back off exponentially and poll)
[Diagram: circular packet array with head and tail pointers advancing in one direction]
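Rendering the slide's pseudocode as C, using GCC's __sync builtins in place of the hardware Fetch&Increment and Compare&Swap primitives, and reusing the hypothetical queue_t layout sketched after Slide 10, gives roughly the following; the back-off constants and helper routines are assumptions.

```c
/* A minimal C rendering of the slide's lock-free claim operation.
 * Reuses the hypothetical queue_t / Q_LENGTH / FREE / CLAIMED definitions
 * sketched above; back-off constants and helpers are illustrative. */
#include <stdint.h>

extern void msg_poll_once(void);      /* assumed: drain one's own queue  */
extern void short_delay(unsigned n);  /* assumed: spin for roughly n cycles */

/* Claim a slot in the receiver's queue; returns the slot index. */
static uint32_t claim_slot(queue_t *q)
{
    /* Each sender atomically takes the next slot position... */
    uint32_t index = __sync_fetch_and_add(&q->tail, 1) % Q_LENGTH;
    unsigned backoff = 1;

    for (;;) {
        /* ...then waits for that slot to become FREE before claiming it. */
        if (__sync_bool_compare_and_swap(&q->packet[index].type,
                                         FREE, CLAIMED))
            return index;

        /* Slot still occupied (queue full here): back off exponentially
         * and keep polling so we never deadlock on our own full queue. */
        short_delay(backoff);
        if (backoff < 1024)
            backoff <<= 1;
        msg_poll_once();
    }
}
```

No lock protects the queue; the only blocking occurs when the claimed slot has not yet been freed by the receiver, which is exactly the "rarely blocks (except when queue is full)" property noted on the next slide.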
Slide 12: Advantages of the Lock-Free Algorithm
- very simple; tightly coupled to the data structure
Versus a simple spin lock:
- slightly higher overhead
- less vulnerable to contention
- effective for multiprogramming
- avoids mutual exclusion
- rarely blocks (except when the queue is full)
Slide 13: Polling Strategy
[Diagram: the communication layer sends and polls for messages over both shared memory and the network]
- poll costs differ by an order of magnitude
- simple polling adversely impacts the fast protocol
Use an adaptive polling strategy (sketched below):
- monitor incoming traffic
- recent history determines polling frequency
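One plausible reading of "recent history determines polling frequency" is an exponential back-off on the expensive network poll, sketched here in C; the thresholds and helper names are assumptions, not the published policy.

```c
/* Hypothetical adaptive polling loop: the expensive network poll runs less
 * often when recent polls have found nothing, and more often when traffic
 * is arriving.  Constants and names are illustrative assumptions. */
extern int shmem_poll(void);   /* cheap poll: shared-memory queues   */
extern int net_poll(void);     /* expensive poll: network interface  */

void comm_poll(void)
{
    static unsigned interval  = 1;  /* poll the network every `interval` calls */
    static unsigned countdown = 1;

    shmem_poll();                   /* cheap path: always check */

    if (--countdown == 0) {
        if (net_poll() > 0)
            interval = 1;           /* traffic seen: poll aggressively */
        else if (interval < 64)
            interval *= 2;          /* quiet: back off exponentially   */
        countdown = interval;
    }
}
```

The effect is that a mostly shared-memory workload pays the network poll cost only occasionally, while a network-heavy workload quickly ramps back up to polling on every call.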
Slide 14: Send Overhead via Shared Memory
Sun Enterprise 5000 server with 167 MHz UltraSPARC processors
- bus transactions: 32% of total time
- more expensive on the Enterprise 10000
- will increase in the future
- need control over the coherence policy
Slide 15: Shared Resource Model
Processors alternate between two queues:
- a private idle queue
- a shared communication queue
Communication queue:
- single server
- server-sharing discipline
Processor characterization:
- utilization u (from 0 to 1)
- duty cycle when P = 1
[Diagram: P processors, each with a private idle queue, feeding one shared communication queue]
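As a rough back-of-the-envelope illustration (my own simplification, not the queueing model presented in the talk): if each of P processors needs the single communication server for a fraction u of its uncontended execution time, the total demand placed on that server bounds how much contention can slow the application down.

```latex
% Illustrative utilization bound (an assumed simplification, not the talk's model).
\[
  \text{offered load} = P\,u,
  \qquad
  \text{slowdown} \;\ge\; \max(1,\; P\,u)
\]
% Each processor alone takes time $T$, of which $uT$ is spent at the shared
% server; with $P$ processors the server must serialize $P\,u\,T$ of work,
% so completion time is at least $\max(T,\,P\,u\,T)$.
```

How close a run comes to this bound depends on whether communication phases overlap, which is the distinction the correlated / independent / scheduled regimes on Slide 17 draw.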
Slide 16: Communication Queue Scaling
[Diagram: many small resources (separate communication queues, one per group of processors) versus one large resource (a single communication queue shared by all P processors), each fed by the processors' private idle queues]
Slide 17: Application Slowdown Metric
Three regimes:
- correlated: worst case
- independent: speedup at low utilization
- scheduled: maximum benefit
[Plot: application slowdown for the correlated, scheduled, and independent regimes]
Slide 18: The Effect of Resource Scaling
Slide 19: Conclusions: The Future of Clusters
Hardware:
- clusters of SMP’s (Clumps)
- scalable I/O capability
- cache coherence control
Software:
- dynamic resource partitions
- focus on data affinity
- efficient message-passing
Communication abstraction:
- uniform interface
- lock-free algorithm
- adaptive polling strategy
Slide 20: Trend: research era, introduction to industry, use by industry
- SMP’s: early 80’s, etc.
- Clusters: the last 5 years have been the culmination of the research era
Viewed over time, approaches to system design usually divide into three eras. The first is an era of research and prototypes: a few machines are produced, and a few may be sold, but no real market is created. Why does parallelism matter?