1
Extended Memory Semantics for Thread Synchronization
Sheng Li, Ying Zhou
Operating System Progress Report
Nov 1st, 2007
2
Problems
- Hardware multithreading is no longer exclusive to supercomputing; it is already part of mainstream microprocessors. For example, Sun Niagara 2 has 64 threads per chip and 256 threads per server.
- Concurrency management is one of the biggest challenges in multithreaded systems.
- Key requirement: low-overhead, scalable thread synchronization.
- Synchronization mechanisms:
  - Atomic primitives (Test-and-Set, Compare-and-Swap, LL-SC): software routines built on them have poor performance and scalability.
  - Empty/Full bits: an extension bit on each memory location denotes its empty/full state. Better performance [1], but still not enough.
3
Our Goal
- Solve the synchronization bottleneck by using Extended Memory Semantics (EMS), for better performance and scalability.
- Quantify the performance gain of EMS compared with other synchronization mechanisms (e.g., Empty/Full bits).
4
Extended Memory Semantics
- Memory instructions are characterized by their synchronization behavior.
- Instructions: Load.ff, Load.fe, Store.xf, Store.ef, Store.xe (f = full, e = empty, x = don't care).
- Figure labels: 64 bits of data/metadata; extension bit.
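The instruction names above can be emulated in software to make the intended behavior concrete. The sketch below is not the simulated hardware; it assumes that the first letter of each suffix names the state the word must be in before the access and the second letter names the state it is left in afterwards. The class name `EMSWord` and its methods are hypothetical.

```python
import threading


class EMSWord:
    """Software emulation of one memory word with a full/empty extension bit.

    Assumption: in a suffix such as "fe", the first letter is the state
    required before the access and the second is the state left afterwards.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False  # the extension bit
        self._value = 0     # the data word

    def is_full(self):
        with self._cond:
            return self._full

    def _load(self, leave_full):
        with self._cond:
            # Block until the word is full, then read it.
            self._cond.wait_for(lambda: self._full)
            value = self._value
            self._full = leave_full
            self._cond.notify_all()
            return value

    def _store(self, value, wait_empty, leave_full):
        with self._cond:
            if wait_empty:
                # Block until the word is empty before writing.
                self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = leave_full
            self._cond.notify_all()

    def load_ff(self):
        return self._load(leave_full=True)

    def load_fe(self):
        return self._load(leave_full=False)

    def store_xf(self, value):
        self._store(value, wait_empty=False, leave_full=True)

    def store_ef(self, value):
        self._store(value, wait_empty=True, leave_full=True)

    def store_xe(self, value):
        self._store(value, wait_empty=False, leave_full=False)
```

Under this reading, a producer that calls `store_ef` and a consumer that calls `load_fe` hand a value back and forth without a separate lock.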
5
EMS handler
- There is no free lunch: the EMS handler has overhead.
  - Creating the handler threads.
  - Queuing up memory requests and building the data structure.
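The slide does not describe the handler's data structure, so the following is only an illustrative guess at where the queuing overhead comes from. It assumes a per-word FIFO of pending loads that is drained when a store fills the word. `QueuedWord` and the callback interface are hypothetical.

```python
from collections import deque


class QueuedWord:
    """Single-threaded model of a word whose unsatisfied loads are queued.

    Assumption: the handler keeps one FIFO of waiting consumers per word.
    """

    def __init__(self):
        self.full = False
        self.value = None
        self.pending_loads = deque()  # callbacks waiting for the word to fill

    def load_fe(self, on_done):
        """Consume the value if the word is full; otherwise queue the request."""
        if self.full:
            self.full = False
            on_done(self.value)
        else:
            self.pending_loads.append(on_done)

    def store_xf(self, value):
        """Fill the word, then let the oldest queued load (if any) consume it."""
        self.value = value
        self.full = True
        if self.pending_loads:
            self.full = False
            self.pending_loads.popleft()(value)
```

Every blocked request costs an allocation and a queue operation in this model, which is the kind of bookkeeping the slide attributes to the handler.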
6
What we have done so far
- Built the EMS model, covering both architecture and OS aspects, in the Structural Simulation Toolkit (SST).
  - SST is a simulation environment for massively lightweight multithreading, developed at Notre Dame and Sandia National Laboratories.
- Modified glibc to use EMS, especially the pthread library.
- Designed benchmarks for different categories.
- Ran simulations to evaluate EMS performance.
7
Tightly Coupled Parallel
- Each thread competes with all the others for a single lock before updating the counter.
- Very high contention: the worst case.
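The benchmark source is not part of the slides. A minimal sketch of the access pattern described here, using Python's `threading` module instead of the modified pthread library, might look like the following. The function name and parameters are assumptions.

```python
import threading


def run_tightly_coupled(num_threads=4, updates_per_thread=1000):
    """All threads contend for one lock that guards one shared counter."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(updates_per_thread):
            with lock:  # every update serializes on the same lock
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```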
8
Loosely Coupled Parallel
- Each thread competes with the others for locks before updating the counters.
- Mild contention.
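One way to picture this pattern is several counters, each guarded by its own lock, so that threads collide only occasionally. The slide does not say how threads choose a counter; the round-robin choice below is an assumption, as are the function name and parameters.

```python
import threading


def run_loosely_coupled(num_threads=4, num_counters=8, updates_per_thread=1000):
    """Threads update several counters, each protected by a separate lock."""
    counters = [0] * num_counters
    locks = [threading.Lock() for _ in range(num_counters)]

    def worker(tid):
        for i in range(updates_per_thread):
            k = (tid + i) % num_counters  # hypothetical selection rule
            with locks[k]:  # contention only with threads using counter k
                counters[k] += 1

    threads = [threading.Thread(target=worker, args=(tid,))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counters
```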
9
Embarrassingly Parallel
- No contention, no locks.
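For contrast with the lock-based cases, a lock-free version of the counter workload can be sketched by giving each thread a private counter. This is an illustration of the category, not the authors' benchmark; the function name and parameters are assumptions.

```python
import threading


def run_embarrassingly_parallel(num_threads=4, updates_per_thread=1000):
    """Each thread updates only its own counter, so no lock is needed."""
    counters = [0] * num_threads

    def worker(tid):
        local = 0
        for _ in range(updates_per_thread):
            local += 1
        counters[tid] = local  # each thread writes only its own slot

    threads = [threading.Thread(target=worker, args=(tid,))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counters
```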
10
Embarrassingly parallel and loosely coupled parallel
- Low synchronization overhead, guaranteed by EMS.
- EMS shows very good scalability.
- Synchronization distribution
11
Tightly Coupled Parallel
- EMS performs badly in the worst case.
- Most threads are used for synchronization rather than for useful work.
12
The Road Ahead
- Build or complete other synchronization mechanisms (e.g., Empty/Full bits) in SST.
- Modify glibc to support the other synchronization mechanisms.
- Compare performance between EMS and the other synchronization mechanisms.
13
Thank you! Questions?
14
Bibliography
[1] Larry Carter, John Feo, Allan Snavely. Performance and Programming Experience on the Tera MTA. PPSC, 1999.
15
Backup Slides
16
Lightweight Threads
- A thread context (frame) is 32 double words (256 bytes).
- Two double words are reserved for the thread status; the remaining 30 are general-purpose registers.
- There is no other per-thread state, which makes multithreading easy.
- Frames are stored in memory (no register file).
- Registers are aliases for memory locations.
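The frame arithmetic can be checked with a small layout sketch. It takes a double word to be 8 bytes (256 bytes / 32 double words) and assumes the two status words sit at the start of the frame, which the slide does not state. `ThreadFrame` and `register_address` are hypothetical names.

```python
import ctypes

DOUBLE_WORD_BYTES = 8  # 256 bytes / 32 double words
STATUS_WORDS = 2
NUM_REGISTERS = 30


class ThreadFrame(ctypes.Structure):
    """One thread context: 2 status double words plus 30 registers."""
    _fields_ = [
        # Assumption: status words come first; the slide gives no ordering.
        ("status", ctypes.c_uint64 * STATUS_WORDS),
        ("registers", ctypes.c_uint64 * NUM_REGISTERS),
    ]


def register_address(frame_base, reg):
    """Memory address that register `reg` aliases for a frame at `frame_base`."""
    if not 0 <= reg < NUM_REGISTERS:
        raise ValueError("register index out of range")
    return frame_base + ThreadFrame.registers.offset + reg * DOUBLE_WORD_BYTES
```

Because a register is just an offset into the frame, switching threads in this model amounts to changing `frame_base`; no register file has to be saved or restored.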