CS258 Spring 2002, Mark Whitney and Yitao Duan. Memory Consistency Models in Wide-Area Storage Systems: Or, What Do They Mean?
Motivations: Global-scale computing is approaching, and wide-area storage is becoming a reality. The demand for processing power calls for marrying the two. The traditional approach to large-scale data processing is a hierarchy. What if new algorithms need to touch more data? Scale up SMPs? We use OceanStore as a testbed.
Data- and Computation-Hungry Applications: Quantum Chromodynamics, Biomolecular Dynamics, Weather Forecasting, Cosmological Dark Matter, Biomolecular Electrostatics, Electric and Magnetic Molecular Properties.
Data Grid for High Energy Physics. (Figure: a tiered data grid. Components shown: the online system; the CERN Computer Centre (Tier 0) with an offline processor farm (~20 TIPS); Tier 1 regional centres (FermiLab ~4 TIPS; France, Italy, and Germany regional centres); Tier 2 centres such as Caltech (~1 TIPS); institute servers (~0.25 TIPS) with a physics data cache; Tier 4 physicist workstations (Pentium II 300 MHz); HPSS mass storage. Link rates range from ~PBytes/sec through ~100 MBytes/sec and ~622 Mbits/sec (or air freight, deprecated) down to ~1 MBytes/sec.) There is a "bunch crossing" every 25 ns, and there are 100 "triggers" per second; each triggered event is ~1 MByte in size. Physicists work on analysis "channels". Each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server. 1 TIPS is approximately 25,000 SpecInt95 equivalents.
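The rates on this slide can be checked with a little arithmetic. The sketch below uses only the slide's figures (25 ns between bunch crossings, 100 triggers per second, ~1 MByte per triggered event, 1 TIPS ≈ 25,000 SpecInt95) and assumes decimal units (1 MByte = 10^6 bytes) and a full 86,400-second day of continuous running; the duty cycle is an illustrative assumption, not something the slide states.

```python
# Back-of-envelope rates from the data-grid slide.
# Assumptions (not from the slide): decimal units and 24 h of continuous running.

BUNCH_CROSSING_INTERVAL_S = 25e-9   # one bunch crossing every 25 ns
TRIGGERS_PER_S = 100                # triggered events kept per second
EVENT_SIZE_BYTES = 1_000_000        # ~1 MByte per triggered event
SPECINT95_PER_TIPS = 25_000         # 1 TIPS ~ 25,000 SpecInt95

crossings_per_s = 1 / BUNCH_CROSSING_INTERVAL_S
recorded_bytes_per_s = TRIGGERS_PER_S * EVENT_SIZE_BYTES
bytes_per_day = recorded_bytes_per_s * 86_400
farm_specint95 = 20 * SPECINT95_PER_TIPS   # ~20 TIPS offline processor farm

print(f"bunch crossings per second: {crossings_per_s:.0f}")
print(f"recorded rate: {recorded_bytes_per_s / 1e6:.0f} MBytes/sec")
print(f"per day of running: {bytes_per_day / 1e12:.2f} TBytes")
print(f"offline farm: {farm_specint95} SpecInt95")
```

The recorded rate of 100 triggers/sec times ~1 MByte agrees with the ~100 MBytes/sec rate that appears in the figure.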
Background. What is OceanStore? A global persistent data store scalable to billions of users; high availability, fault tolerance, and security; caching to reduce network congestion and to guarantee availability and performance; flexible consistency semantics. Observations and questions: OceanStore bears a remarkable resemblance to a multiprocessor (MP) memory system (replica = cache, client = processor, data object = memory item). OceanStore's consistency semantics are typically those of a file system. What do they mean to a program?
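The analogy (replica = cache, client = processor, data object = memory item) can be made concrete with a toy model. This is a hypothetical sketch: `PrimaryStore`, `Client`, and `refresh` are illustrative names, not OceanStore APIs. The replica is deliberately never invalidated, to mimic file-system-style semantics in which a client sees other clients' updates only when it re-fetches (for example, on open). It shows one thing such semantics can mean to a program: a read may return a stale value.

```python
# Hypothetical model of the analogy: the primary store plays the role of
# memory, each client's replica plays the role of a processor cache.
# This is NOT the OceanStore API; names are illustrative only.

class PrimaryStore:
    """Authoritative copy of every data object ("memory")."""
    def __init__(self):
        self.objects = {}

    def read(self, oid):
        return self.objects.get(oid, 0)

    def write(self, oid, value):
        self.objects[oid] = value


class Client:
    """A client ("processor") that caches objects in a local replica."""
    def __init__(self, store):
        self.store = store
        self.replica = {}   # local cached copies; no invalidation protocol

    def read(self, oid):
        if oid not in self.replica:          # miss: fetch from the primary
            self.replica[oid] = self.store.read(oid)
        return self.replica[oid]             # hit: may be stale

    def write(self, oid, value):
        self.replica[oid] = value
        self.store.write(oid, value)         # write-through to the primary

    def refresh(self, oid):
        """Drop the cached copy (assumed to happen, e.g., on open())."""
        self.replica.pop(oid, None)


store = PrimaryStore()
a, b = Client(store), Client(store)
b.read("flag")            # b caches flag = 0
a.write("flag", 1)        # a's write reaches the primary copy
stale = b.read("flag")    # still 0: b's replica was never invalidated
b.refresh("flag")
fresh = b.read("flag")    # 1 after re-fetching from the primary
print(stale, fresh)
```

A shared-memory program that spins on `flag` through client `b` would never see the update unless something triggers a refresh, which is exactly the gap between file-system semantics and a processor memory model.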
Running Parallel Applications on OceanStore. Why try this? Distributed computing, the Grid, world-wide computing. A new programming paradigm? OceanStore is a new phenomenon; will it bring out new applications? Where will computing infrastructure go, given advances in networking, storage, and parallel processing?
Shared Virtual Memory Space (figure; components shown: ParaApp, OS Kernel, OClient).
Running SMP Apps on OceanStore: OceanStore data objects are globally identified. Virtual addresses in the application's address space are mapped to OceanStore object IDs, so shared-memory accesses are turned into OceanStore requests.
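The mapping described on this slide can be sketched as follows. Everything here is an assumption for illustration: one object per 4 KB virtual page (`PAGE_SIZE`), an object ID made by hashing the application name and page number (`object_id`), and an in-memory `ObjectStore` standing in for OceanStore. The point is only the translation step: each load or store becomes a read or write request on the object that backs its page.

```python
# Hypothetical sketch: translate shared-memory loads/stores into object
# requests. PAGE_SIZE, the object-ID scheme and ObjectStore are assumptions
# for illustration, not the OceanStore API.
import hashlib

PAGE_SIZE = 4096  # assumed sharing granularity: one object per page

def object_id(app_id: str, vaddr: int) -> str:
    """Derive a global object ID from the application and the virtual page."""
    page = vaddr // PAGE_SIZE
    return hashlib.sha1(f"{app_id}:{page}".encode()).hexdigest()

class ObjectStore:
    """Stand-in for a global store: object ID -> one page of bytes."""
    def __init__(self):
        self.objects = {}
        self.requests = []  # log of (operation, object ID) pairs

    def read(self, oid, offset):
        self.requests.append(("read", oid))
        return self.objects.setdefault(oid, bytearray(PAGE_SIZE))[offset]

    def write(self, oid, offset, value):
        self.requests.append(("write", oid))
        self.objects.setdefault(oid, bytearray(PAGE_SIZE))[offset] = value

class SharedMemory:
    """Byte-addressed shared memory backed by the object store."""
    def __init__(self, app_id, store):
        self.app_id, self.store = app_id, store

    def load(self, vaddr):
        return self.store.read(object_id(self.app_id, vaddr), vaddr % PAGE_SIZE)

    def store_byte(self, vaddr, value):
        self.store.write(object_id(self.app_id, vaddr), vaddr % PAGE_SIZE, value)

store = ObjectStore()
mem = SharedMemory("lu-app", store)
mem.store_byte(0x1004, 42)          # becomes a write request on page 1's object
value = mem.load(0x1004)            # becomes a read request on the same object
same_page = object_id("lu-app", 0x1000) == object_id("lu-app", 0x1FFF)
print(value, same_page, len(store.requests))
```

The page size fixes the sharing granularity: a larger object means fewer IDs and requests but more false sharing between processors that touch different bytes of the same page.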
Consistency Models
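As a generic illustration of why the consistency model matters to a program, the classic store-buffering litmus test can be enumerated exhaustively. Thread 1 runs `x = 1; r1 = y` and thread 2 runs `y = 1; r2 = x`, with `x` and `y` initially 0. Under sequential consistency (some interleaving that preserves each thread's program order), the outcome `r1 == r2 == 0` is impossible. The "relaxed" case below is an assumed model in which each thread's load may take effect before its own store, as with a store buffer, or loosely, a replica that propagates a client's write only later.

```python
# Enumerate the outcomes of the store-buffering litmus test.
# Sequential consistency: interleave the two threads in program order.
# Relaxed (assumed model): each thread may also swap its store and load.

def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each one's order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def run(schedule):
    """Execute one schedule against memory initialised to x = y = 0."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, loc, reg in schedule:
        if op == "st":
            mem[loc] = 1
        else:
            regs[reg] = mem[loc]
    return regs["r1"], regs["r2"]

T1 = [("st", "x", None), ("ld", "y", "r1")]   # x = 1; r1 = y
T2 = [("st", "y", None), ("ld", "x", "r2")]   # y = 1; r2 = x

def outcomes(t1_orders, t2_orders):
    results = set()
    for a in t1_orders:
        for b in t2_orders:
            for s in interleavings(a, b):
                results.add(run(s))
    return results

sc = outcomes([T1], [T2])
relaxed = outcomes([T1, T1[::-1]], [T2, T2[::-1]])
print(sorted(sc))
print(sorted(relaxed))
```

A program that relies on "at least one thread sees the other's write" (for example, a flag-based mutual exclusion handshake) is correct under the first model and broken under the second.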
Performance Evaluation: OceanStore …(# of inner rings, …); Nachos++! (MIPS R3000 processor with floating point); Stanford SPLASH-2 benchmark suite; 4 x 4 matrix LU decomposition.
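For reference, the computation being benchmarked can be sketched as a plain, unblocked Doolittle LU factorization without pivoting. This is a sequential illustration only, not the SPLASH-2 LU kernel, which is a blocked, parallel code; the 4 x 4 test matrix is an arbitrary diagonally dominant example chosen so that no pivoting is needed.

```python
# Minimal unblocked Doolittle LU factorization without pivoting.
# Illustrative only: the SPLASH-2 LU kernel is a blocked, parallel code.

def lu_decompose(a):
    """Return (L, U) with L unit lower triangular, U upper triangular, L @ U == A."""
    n = len(a)
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    u = [[float(v) for v in row] for row in a]
    for k in range(n):
        if u[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        for i in range(k + 1, n):
            factor = u[i][k] / u[k][k]   # multiplier that eliminates u[i][k]
            l[i][k] = factor
            u[i][k] = 0.0
            for j in range(k + 1, n):
                u[i][j] -= factor * u[k][j]
    return l, u

def matmul(x, y):
    """Multiply two square matrices given as lists of lists."""
    n = len(x)
    return [[sum(x[i][t] * y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

A = [[4, 1, 0, 0],
     [1, 4, 1, 0],
     [0, 1, 4, 1],
     [0, 0, 1, 4]]
L, U = lu_decompose(A)
product = matmul(L, U)
```

If the matrix lived in the shared address space, every `u[i][j]` read and write in the inner loop would become an OceanStore request unless served from a cached replica, which is why caching and sharing granularity dominate the cost even for a 4 x 4 problem.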
Computation Time (figure: computation time, measured in number of cycles).
Network Latency (figure: network latency, in milliseconds).
Open Questions: programming model, cache policy, consistency models, sharing granularity.
Conclusion and Future Work: Matrix decomposition runs on OceanStore! Wide-area distributed synchronization is expensive (not surprising). A better memory model is needed to run shared-memory applications. Message passing? It seems to be a better match (using the explicit OceanStore APIs). A new programming model?
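Why wide-area synchronization is expensive follows from simple arithmetic: every network round trip a processor waits on is multiplied by its clock rate. The sketch below is a cost model with hypothetical parameters (a 50 ms round trip, a 100 MHz clock, and two round trips for a lock acquire/release pair); it does not reproduce the measured latencies or cycle counts from the evaluation.

```python
# Back-of-envelope cost of wide-area synchronization, in processor cycles.
# All parameter values below are hypothetical, not measurements from the talk.

def stall_cycles(rtt_ms, clock_hz, round_trips):
    """Cycles a processor idles while waiting on `round_trips` network round trips."""
    return rtt_ms / 1000 * clock_hz * round_trips

# Hypothetical: lock acquire + release, one round trip each to a remote
# owner, over a 50 ms wide-area link, on a 100 MHz processor.
lock_pair = stall_cycles(rtt_ms=50, clock_hz=100e6, round_trips=2)
print(f"{lock_pair:,.0f} cycles per lock acquire/release")
```

Under these assumptions a single lock pair costs on the order of ten million cycles, far more than the arithmetic in a small kernel; batching data into a few explicit messages amortizes that latency, which is the intuition behind message passing being a better match.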