TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems
Pete Keleher, Alan L. Cox, Sandhya Dwarkadas and Willy Zwaenepoel
Department of Computer Science, Rice University
Winter Usenix Conference, 1994
Presented by Taemin Lee, Eunhyeok Park

Distributed Shared Memory: Big Picture
(figure: workstations sharing memory over an ATM/Ethernet network)
- Benchmark speedups: Jacobi (x7.4), TSP (x7.2), Quicksort (x6.3), ILINK (x5.7), Water (x4.0)

Contents
- Overview
- Preliminaries: why
  - Clusters of Workstations vs. Supercomputers
  - Distributed Shared Memory vs. Others
  - Release Consistency vs. Others
  - Portability & Networking
  - Benchmark
- Implementations: how
- Experiments

Cluster of Workstations vs. Supercomputers
- Low-cost entry into the parallel computing arena
- No special hardware is required
- [Kleinrock85] L. Kleinrock, "Distributed Systems," Communications of the ACM (CACM), Vol. 28, No. 11, November 1985, pp. 1200-1213

Distributed Shared Memory vs. Others
- Interprocess communication
  - Message passing: RPC, RMI, …
  - Indirect communication: DSM, tuple spaces, …
- Java UDP example (message passing):

    // m is a byte[] holding the message; aHost and serverPort identify the receiver
    DatagramSocket aSocket = new DatagramSocket();
    DatagramPacket request = new DatagramPacket(m, m.length, aHost, serverPort);
    aSocket.send(request);
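For contrast with the message-passing snippet, a sketch of the same exchange in a shared-memory style; the tmk_* primitives, LOCK_ID, and the mailbox layout are hypothetical placeholders loosely inspired by a TreadMarks-like interface, not its actual API.

    /* Sketch only: communication expressed as shared-memory updates under a
     * page-based DSM.  All names here are illustrative assumptions. */
    #include <string.h>

    #define LOCK_ID 0

    struct mailbox {                    /* lives in DSM-managed shared memory */
        int  ready;
        char msg[256];
    };

    extern struct mailbox *box;         /* allocated from the shared heap at startup */
    extern void tmk_lock_acquire(int);  /* placeholder synchronization primitives */
    extern void tmk_lock_release(int);

    /* Sender: write to shared memory inside a critical section; the DSM
     * runtime propagates the update to other processes at synchronization
     * points, so the application never sends an explicit message. */
    void send_shared(const char *text) {
        tmk_lock_acquire(LOCK_ID);
        strncpy(box->msg, text, sizeof(box->msg) - 1);
        box->ready = 1;
        tmk_lock_release(LOCK_ID);
    }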

Distributed Shared Memory Implementation Issues
- Structure
- Synchronization model
- Consistency model
- Update options
- Granularity
- Thrashing

Release Consistency vs. Others
- Consistency models: atomic consistency (linearizability), sequential consistency, coherence, release consistency (a weak consistency model tied to synchronization such as locks and semaphores)
- Update options: write-invalidate protocols vs. write-update protocols
- Problems to address: false sharing, thrashing

Lazy vs. Eager Release Consistency
- Eager RC + write update: broadcast diffs to all processors at release
- Lazy RC + write invalidate: move diffs only at acquire
- Ordering is tracked with vector timestamps and the "happened before" relation
(figure: timeline of Processor A and Processor B illustrating "happened before")
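As a concrete illustration of the vector-timestamp test, a minimal C sketch follows (not TreadMarks source; NPROCS, the vtime type, and the function names are assumptions for exposition):

    #include <stdbool.h>

    #define NPROCS 4

    typedef struct {
        unsigned int t[NPROCS];   /* one logical clock entry per processor */
    } vtime;

    /* true if every component of a is <= the corresponding component of b,
     * i.e. interval a happened before (or equals) the state described by b */
    static bool vtime_covered(const vtime *a, const vtime *b) {
        for (int i = 0; i < NPROCS; i++)
            if (a->t[i] > b->t[i])
                return false;
        return true;
    }

    /* At an acquire, the acquirer only needs write notices for intervals
     * whose timestamps are NOT yet covered by its own current vector time. */
    static bool need_write_notices(const vtime *interval_ts, const vtime *my_ts) {
        return !vtime_covered(interval_ts, my_ts);
    }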

Multiple-Writer Protocols
- Mitigate the false sharing problem: many processors may modify the same page at the same time
- Each writer modifies a local copy; the modifications are merged later
- Lazy diff creation (see the sketch below)
  - A second copy of the page (the twin) is kept
  - Diffs are created only at acquire time (lazily), by comparing the page against its twin
  - Diffs are run-length encoded
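A minimal C sketch of twin-based diff creation with run-length encoding (illustrative only; PAGE_SIZE, diff_run, and the helper names are assumptions, not TreadMarks code):

    #include <stddef.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    typedef struct {
        size_t offset;                    /* byte offset of the modified run */
        size_t length;                    /* length of the run in bytes      */
        unsigned char data[PAGE_SIZE];    /* new contents (over-sized for simplicity) */
    } diff_run;

    /* Encode the differences between 'page' and its pristine 'twin' into
     * 'runs'; returns the number of runs produced. */
    static size_t make_diff(const unsigned char *page, const unsigned char *twin,
                            diff_run *runs, size_t max_runs) {
        size_t nruns = 0;
        size_t i = 0;
        while (i < PAGE_SIZE && nruns < max_runs) {
            if (page[i] == twin[i]) { i++; continue; }
            size_t start = i;
            while (i < PAGE_SIZE && page[i] != twin[i]) i++;  /* extend the run */
            runs[nruns].offset = start;
            runs[nruns].length = i - start;
            memcpy(runs[nruns].data, page + start, i - start);
            nruns++;
        }
        return nruns;
    }

    /* Applying a diff at the requester is the inverse: copy each run back in. */
    static void apply_diff(unsigned char *page, const diff_run *runs, size_t nruns) {
        for (size_t i = 0; i < nruns; i++)
            memcpy(page + runs[i].offset, runs[i].data, runs[i].length);
    }

Run-length encoding keeps a diff proportional to the number of modified runs rather than to the page size, which is what makes concurrent writers to the same page cheap to merge.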

Portability
- Commercial, off-the-shelf products
- 8 x DECstation-5000/240
- Standard Unix (SunOS/Ultrix)
- User-level implementation

Networking: Ethernet vs. ATM LAN
- Addressing vs. virtual circuits (frame relay)
- Reliability / flow control
- 10 Mb/s vs. 100 Mb/s
- TCP/IP vs. AAL3/4

Benchmarks
- Jacobi, TSP, Quicksort, ILINK, Water
- ILINK comes from the genetic LINKAGE package [16] ("Strategies for multilocus linkage analysis in humans")
- Water comes from the SPLASH suite (HPC, scientific studies of parallel machines, shared memory); cf. PARSEC ("PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites on Chip-Multiprocessors")

Quick Summary
- Clusters of workstations (cf. supercomputers)
- Distributed shared memory (cf. tuple spaces, message passing)
- Portability
  - Standard Unix systems (cf. in-house research platforms)
  - Eight DECstation-5000/240s running Unix (SunOS/Ultrix)
  - Ethernet/ATM network up to 100 Mb/s
  - User-level implementation (cf. kernel modifications)
- Efficiency
  - Lazy release consistency (cf. eager RC, sequential consistency)
  - Write invalidation (cf. write update)
  - Multiple-writer protocols (cf. in-place write)
  - Lazy diff creation to cope with false sharing
- Benchmarks: Jacobi, TSP, Quicksort, ILINK, Water (from the LINKAGE and SPLASH suites)

Contents
- Overview
- Preliminaries: why
- Implementations: how
  - Data structures
  - Interval and diff creation
  - Locks and barriers
  - Access misses, garbage collection, and Unix aspects
- Experiments

Data Structure Overview

- Page array: one entry for each shared page
  - Current status: no access, read-only, or read-write
  - Approximate copyset: the set of processors believed to currently cache this page
  - An array indexed by processor, with head and tail pointers to a linked list of write notice records
- Proc array: one entry for each processor
  - Pointers to the head and tail of a doubly linked list of interval records
  - Each interval record points to a list of write notice records for that interval
- Diff pool: stores the diffs for the corresponding write notice records
  - Each write notice record holds a pointer to its diff in the diff pool
(a C sketch of these structures follows)
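A compact C sketch of these bookkeeping structures (field names, types, and sizes are illustrative assumptions, not the actual TreadMarks declarations):

    #define NPROCS 4

    typedef struct { unsigned int t[NPROCS]; } vtime;   /* vector timestamp */

    struct write_notice {                /* "page X was modified in interval Y" */
        int                   page;
        struct interval      *interval;  /* interval in which the write occurred */
        struct diff          *diff;      /* pointer into the diff pool (may stay
                                            NULL until the diff is lazily created) */
        struct write_notice  *next;
    };

    struct interval {                    /* one per (processor, interval index) */
        int                   proc;
        vtime                 ts;        /* vector timestamp of this interval */
        struct write_notice  *notices;   /* write notices created in it       */
        struct interval      *prev, *next;
    };

    struct page_entry {                  /* one per shared page */
        enum { NO_ACCESS, READ_ONLY, READ_WRITE } status;
        unsigned int          copyset;   /* approximate caching processors (bitmask) */
        struct write_notice  *wn_head[NPROCS], *wn_tail[NPROCS];  /* per-processor lists */
    };

    struct proc_entry {                  /* one per processor */
        struct interval *head, *tail;    /* doubly linked list of interval records */
    };

    extern struct page_entry page_array[/* number of shared pages */];
    extern struct proc_entry proc_array[NPROCS];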

Example: Initial State
(figure: four processes, each with its page data, page array, proc array, interval records, and vector timestamps; all timestamps start at <0,0,0,0>)

Example: Proc 1 reads Page 0
- Assumption: initially the pages are stored at process 0
- Proc 1 fetches the data from process 0
- The page status is changed to readable
- The TreadMarks data structures are not modified
(figure: process 1's entry for page 0 becomes read-only; all timestamps remain <0,0,0,0>)

Example (cont.): Proc 0 writes Page 0
- A copy (twin) of page 0 is created and the update is applied locally
- The processors have not yet synchronized, so the update is not propagated to the other processors
(figure: process 0's entry for page 0 becomes read-write with a twin; timestamps unchanged)

Example (cont.): Proc 1 writes Page 0
- The multiple-writer protocol allows concurrent updates to the same page
- Different variables in the same page: handled by concurrent multiple-writer updates
- The same variable in the same page: requires an explicit barrier or lock
(figure: processes 0 and 1 now both hold writable copies of page 0 with twins)

Example (cont.): Proc 2 reads Page 0
- Proc 2 fetches the data from process 0
- The write notice record for page 0 has not been propagated, so the updates are not visible to proc 2
(figure: process 2's entry for page 0 becomes read-only; timestamps unchanged)

Example (cont.): Proc 2 reads Page 0, continued
- The modification of page 0 is already recorded locally
- Lazy diff creation lets process 0 keep writing to page 0 without further changes to the TreadMarks data structures
(figure: same state as the previous step)

Example (cont.): Barrier
- A new interval is created and the vector timestamps are updated
- Synchronization data (intervals, write notices) is propagated
- Pages modified elsewhere are marked invalid
- The actual diffs are created, but the written data itself is not propagated yet
(figure: every process now records write notices for page 0 and the vector timestamps advance to <1,1,1,1>)

Example (cont.): Proc 3 reads Page 0
- The written data (diffs) is transferred only when the data is actually required
- The necessary diffs can be located using the TreadMarks data structures (intervals and write notices)
(figure: process 3 fetches the page plus diffs and its entry for page 0 becomes read-only)

Example (cont.): Garbage Collection
- The garbage collector is triggered when available memory runs low
- The garbage collector removes data that is no longer needed
(figure: the write notices for page 0 have been cleaned up; vector timestamps remain <1,1,1,1>)

Implementation Details: Intervals and Diffs
- Interval creation
  - Logically, a new interval begins at each release and acquire
  - In practice, interval creation is postponed until communication occurs
- Diff creation (lazy)
  - A page remains writable until a diff request or a write notice arrives for it
  - At that time the actual diff is created, the page is read protected, and the twin is discarded
(a sketch of this protection flow follows below)
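A minimal sketch of the protection flow using standard Unix mprotect(); it assumes the runtime has installed a SIGSEGV handler that ends up calling these routines (not shown), that "read protected" means dropping write permission so a later write faults again, and that page_addr, record_twin, and create_and_store_diff are hypothetical helpers:

    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    #define PAGE_SIZE 4096

    extern void *page_addr(int page);              /* hypothetical: address of shared page */
    extern void  record_twin(int page, void *twin);
    extern void  create_and_store_diff(int page);  /* compares page to twin, frees twin */

    /* First write to a read-only shared page: save a twin, then allow writes. */
    static void on_write_fault(int page) {
        void *twin = malloc(PAGE_SIZE);
        memcpy(twin, page_addr(page), PAGE_SIZE);           /* twin = pristine copy */
        record_twin(page, twin);
        mprotect(page_addr(page), PAGE_SIZE, PROT_READ | PROT_WRITE);
    }

    /* Diff request or incoming write notice for a locally dirty page:
     * encode the diff now, re-protect the page, and drop the twin. */
    static void on_diff_needed(int page) {
        create_and_store_diff(page);                        /* run-length encode vs. twin */
        mprotect(page_addr(page), PAGE_SIZE, PROT_READ);    /* further writes fault again */
    }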

Implementation Details: Locks and Barriers
- Locks
  - Each lock has a statically assigned manager; managers are assigned in round-robin fashion
  - The releaser passes its synchronization information to the acquirer: processor id, vector timestamp, and all write notices
  - The acquirer incorporates this information into its own data structures: it appends interval records, appends write notices, and updates pointers
- Barriers
  - A centralized barrier manager
  - Each client sends the manager its vector timestamp and the intervals between the manager's last known timestamp and its own
  - After gathering this information from all clients, the manager sends each client all intervals between that client's timestamp and the manager's current timestamp
  - The clients incorporate this information into their own data structures
(a barrier sketch follows below)
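A sketch of the centralized barrier exchange (illustrative only; send_to/recv_from, interval_set, intervals_between, and merge_intervals are hypothetical names):

    #define NPROCS  4
    #define MANAGER 0

    typedef struct { unsigned int t[NPROCS]; } vtime;
    typedef struct interval_set interval_set;      /* opaque list of interval records */

    extern int            my_id;
    extern vtime          my_vtime;
    extern vtime          last_known[NPROCS];      /* last timestamp seen from each proc */
    extern interval_set  *intervals_between(const vtime *from, const vtime *to);
    extern void           merge_intervals(interval_set *s);  /* append intervals, write
                                                                 notices, update pointers */
    extern void send_to  (int proc, const vtime *vt, interval_set *s);
    extern void recv_from(int proc, vtime *vt, interval_set **s);

    void barrier(void) {
        if (my_id != MANAGER) {
            /* Client: send the manager everything created since its last known
             * timestamp, then wait for the intervals this client is missing. */
            send_to(MANAGER, &my_vtime,
                    intervals_between(&last_known[MANAGER], &my_vtime));
            vtime mgr_now;
            interval_set *missing;
            recv_from(MANAGER, &mgr_now, &missing);
            merge_intervals(missing);
            last_known[MANAGER] = mgr_now;
        } else {
            /* Manager: gather intervals from every client, then send each client
             * the intervals between that client's timestamp and the manager's. */
            for (int p = 0; p < NPROCS; p++) {
                if (p == MANAGER) continue;
                interval_set *s;
                recv_from(p, &last_known[p], &s);
                merge_intervals(s);
            }
            for (int p = 0; p < NPROCS; p++) {
                if (p == MANAGER) continue;
                send_to(p, &my_vtime, intervals_between(&last_known[p], &my_vtime));
            }
        }
    }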

Implementation Details: Access Misses, Garbage Collection, Unix Aspects
- Access misses
  - Pages are initially assumed to reside at process 0
  - If there are write notices for the page, gather those with larger timestamps than the local copy
  - Request the corresponding diffs in parallel
  - Merge the returned diffs into the page
- Garbage collection
  - Discards data that is no longer needed
  - Triggered when available storage drops below a threshold
- TreadMarks relies only on standard Unix facilities
(a short access-miss sketch follows below)
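A short sketch of access-miss handling under these rules (helper names are hypothetical; assume pending_notices returns the relevant write notices in increasing timestamp order):

    #include <stddef.h>

    typedef struct write_notice write_notice;      /* as in the data-structure sketch */

    extern size_t pending_notices(int page, write_notice **out, size_t max);
    extern void   request_diff_async(write_notice *wn);   /* one message per writer */
    extern void   wait_for_diff(write_notice *wn);
    extern void   apply_diff_to_page(int page, write_notice *wn);
    extern void   fetch_page_from(int proc, int page);    /* whole-page copy */

    void on_access_miss(int page) {
        write_notice *wn[64];
        size_t n = pending_notices(page, wn, 64);

        if (n == 0) {
            fetch_page_from(0, page);       /* no notices: page assumed at process 0 */
            return;
        }
        for (size_t i = 0; i < n; i++)      /* ask all writers for their diffs in parallel */
            request_diff_async(wn[i]);
        for (size_t i = 0; i < n; i++) {    /* apply diffs in timestamp (interval) order */
            wait_for_diff(wn[i]);
            apply_diff_to_page(page, wn[i]);
        }
    }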

Contents
- Overview
- Preliminaries: why
- Implementations: how
- Experiments
  - Environment
  - Basic operation costs
  - Applications
  - Results

Results: Environment and Base Operation Costs
- Environment
  - 8 DECstation-5000/240s running Ultrix V4.3
  - 100 Mbps ATM LAN + 10 Mbps Ethernet
- Base operation costs
  - Minimum roundtrip time: 500 μs; using a signal handler increases it to 670 μs
  - Remotely acquiring a free lock: 827 μs if the manager held the lock last, otherwise 1149 μs
  - Minimum time for an 8-processor barrier: 2186 μs

Experiments: Applications
- Water: molecular dynamics simulation, 343 molecules for 5 steps
- Jacobi: Successive Over-Relaxation on a grid of 2000 by 1000 elements
- TSP: branch-and-bound algorithm solving the traveling salesman problem for 19 cities
- Quicksort: sorts an array of 256K integers, using bubble sort for subarrays of fewer than 1K elements
- ILINK: genetic linkage analysis

Results: Execution Statistics for an 8-Processor Run on TreadMarks
- Jacobi & TSP: high computation per communication, small number of messages
- Quicksort: about twice the messages and data per second of Jacobi
- ILINK: inherent load balancing problem
- Water: high communication and synchronization rate (582 locks/sec)
(figure: speedups obtained on TreadMarks)

Results
(figures: TreadMarks execution time breakdown; Unix overhead breakdown; TreadMarks overhead breakdown)

Results
(figure: execution time for Water)

Results: TSP (Traveling Salesperson Problem)
- Definition 3.2 [Distributed Shared Memory: Concepts and Systems, Jelica Protic, Milo Tomasevic, Veljko Milutinovic]: a program is properly labeled (PL) if shared_access ⊆ shared_L, competing ⊆ special_L, and enough special_L ⊆ acq_L or rel_L
- TSP uses branch and bound, a pruned version of DFS
- Branch and bound is not a properly labeled program
(figure: lazy vs. eager comparison for TSP — "Lazy >> Eager" on one metric, "Lazy < Eager" on another)

Conclusion
- DSM as a platform for parallel computation
- Requirements: implementation efficiency, user-level implementations, commercial systems
- Implementation: eight DECstation-5000/240s connected by an ATM/Ethernet LAN
- Lazy release consistency, multiple-writer protocol, lazy diff creation
- Good speedups for Jacobi, TSP, Quicksort, and ILINK; moderate speedup for Water