Reactive Spin-locks: A Self-tuning Approach
Phuong Hoai Ha, Marina Papatriantafilou, Philippas Tsigas
I-SPAN ’05, Las Vegas, Dec. 7th–9th, 2005

Outline
Mutual exclusion
– Overhead
– Available reactive spin-locks
New reactive spin-lock
– Model
– Algorithm
– Evaluation
Conclusions

Mutual exclusion
Performance goals:
– Low latency
– Low contention
– …
– …
Sections of a process: Entry section, Critical section, Exit section, Noncritical section.
[Figure labels: Lock released, Requests issued, Arbitration, Lock sent to winner]

Spin-lock categories
Arbitrating locks:
– Determine who is the next lock-holder in advance, e.g. ticket locks, queue locks.
– Advantages: prevent processors from causing bursts in network traffic and high contention on the lock.
Non-arbitrating locks:
– E.g. test-and-set locks.
– Advantages: exploit locality/cache; tolerate failures in the Entry section.

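The two categories can be contrasted with a minimal Python sketch. Python has no hardware fetch-and-add or test-and-set, so the hypothetical `AtomicInt` class below emulates them with a mutex; the class names, the `run_counter` helper, and the use of `time.sleep(0)` as a yield are all assumptions made for illustration, not part of the talk.

```python
import threading
import time


class AtomicInt:
    """Integer with atomic read-modify-write operations.

    Assumption: a mutex stands in for hardware fetch-and-add / test-and-set.
    """

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def store(self, value):
        with self._guard:
            self._value = value

    def fetch_and_add(self, delta):
        with self._guard:
            old = self._value
            self._value += delta
            return old

    def test_and_set(self):
        with self._guard:
            old = self._value
            self._value = 1
            return old


class TASLock:
    """Non-arbitrating: whoever wins the race on the flag gets the lock."""

    def __init__(self):
        self._flag = AtomicInt(0)

    def acquire(self):
        while self._flag.test_and_set() == 1:
            time.sleep(0)  # yield to other threads instead of busy-spinning

    def release(self):
        self._flag.store(0)


class TicketLock:
    """Arbitrating: tickets fix the order of lock-holders in advance (FIFO)."""

    def __init__(self):
        self._next_ticket = AtomicInt(0)
        self._now_serving = AtomicInt(0)

    def acquire(self):
        my_ticket = self._next_ticket.fetch_and_add(1)
        while self._now_serving.load() != my_ticket:
            time.sleep(0)  # yield until our ticket is served

    def release(self):
        self._now_serving.fetch_and_add(1)


def run_counter(lock, n_threads=4, n_iters=200):
    """Increment a shared counter under `lock`; return the final count."""
    total = [0]

    def worker():
        for _ in range(n_iters):
            lock.acquire()
            total[0] += 1
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

In the ticket lock every waiter polls a shared `now_serving` word, but the next holder is already decided; in the test-and-set lock all waiters race on the same flag, which is the source of the contention bursts mentioned above.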
Arbitrating vs. non-arbitrating locks
[Figure: diagrams labeled "Interconnection Network"]

Available reactive spin-lock algorithms
Drawbacks: their reactive schemes rely on either
Fixed experimental thresholds
– The thresholds frequently become inappropriate in variable and unpredictable environments like multiprogramming systems.
– E.g. ticket locks with proportional backoff, test-and-test-and-set locks with exponential backoff.
Known probability distributions of some inputs
– The assumption is not usually feasible.

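To make "fixed experimental thresholds" concrete, here is a small sketch of the two backoff rules named above. The function names and the parameters `base`, `cap` and `per_holder` are hypothetical; they stand for the hand-tuned constants that such schemes depend on.

```python
def exponential_backoff_delays(base, cap, attempts):
    """Delays used by successive failed attempts on a test-and-test-and-set lock.

    The delay doubles after each failure and is clamped at `cap`.
    `base` and `cap` are the experimentally tuned constants.
    """
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays


def proportional_backoff_delay(my_ticket, now_serving, per_holder):
    """Delay for a ticket-lock waiter, proportional to its queue position.

    `per_holder` is the tuned estimate of how long one holder keeps the lock.
    """
    return (my_ticket - now_serving) * per_holder
```

For example, `exponential_backoff_delays(1, 8, 6)` gives `[1, 2, 4, 8, 8, 8]`. A `cap` or `per_holder` tuned for one workload can be far off when the critical-section length or the number of runnable threads changes, which is the drawback the slide points at.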
New reactive spin-lock algorithm
Ideas
– A non-arbitrating lock with an adaptive, sensible backoff delay.
Advantages
– Its reactive scheme is self-tuning: neither experimentally tuned thresholds nor probability distributions of inputs are needed.
– It combines advantages of both arbitrating and non-arbitrating spin-lock categories: it can exploit locality as well as reduce contention on the lock.

Find sensible backoff delay
Need to optimize the trade-off between:
– Latency: the interval between a lock release and the following lock acquisition.
– Contention on the lock.
This is an online problem.
[Figure: load on the lock over time; delay = ?]

Reactive scheme
– Increase the delay only when the load on the lock is the highest so far.
– When increasing the delay, increase it just enough to keep the competitive ratio $c = P - \frac{P-1}{P^{1/(P-1)}}$.
Bounds for loads on the lock: $1 \le l_t \le P$.
During a load-rising phase: [formula not recovered]. Similar for a load-dropping phase.
In each load-rising/load-dropping phase, the reactive scheme is competitive with competitive ratio $c = \Theta(\ln P)$.

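The claim that the ratio grows like ln(P) can be checked numerically. The sketch below assumes the ratio reads c = P - (P-1) / P^(1/(P-1)), which is one reading of the flattened slide formula; the function name is hypothetical.

```python
import math


def competitive_ratio(P):
    """Competitive ratio c = P - (P-1) / P**(1/(P-1)) for P >= 2 processors.

    Assumption: the exponent 1/(P-1) applies to P in the denominator.
    """
    if P < 2:
        raise ValueError("P must be at least 2")
    return P - (P - 1) / P ** (1.0 / (P - 1))


if __name__ == "__main__":
    # Compare c with 1 + ln(P) for a few machine sizes.
    for P in (2, 4, 16, 64, 1024):
        print(P, round(competitive_ratio(P), 4), round(1 + math.log(P), 4))
```

Under this reading, c stays below 1 + ln(P) for every P >= 2 and approaches it as P grows, which is consistent with a logarithmic competitive ratio.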
Algorithm
[Figure: diagrams labeled "Interconnection Network"]
The algorithm guarantees mutual exclusion and non-livelock. Its space complexity is log(P).

Evaluation
Benchmarks
– Spark98 kernel: lmv
– SPLASH-2 suite: Volrend and Radiosity
Representatives:
– Arbitrating: ticket lock with (tuned) proportional backoff
– Non-arbitrating: test-and-test-and-set lock with (tuned) exponential backoff
System
– A ccNUMA SGI Origin2000 with [...] MHz MIPS R10000 processors.

Experimental results
Experimental results (2)
Experimental results (3)

Conclusions
We have designed and implemented a new reactive spin-lock:
– It is self-tuning.
– It combines advantages of both arbitrating and non-arbitrating locks.
– Its reactive scheme is competitive with $c = \Theta(\ln P)$.
The lock automatically adjusts its backoff delay to the load on the lock as well as to the application.

Thanks for your attention!

Estimate delay bases
Fairness
– A fair lock helps a parallel application gain performance, since the application threads can execute their non-critical sections in parallel.
– Definition: [formula not recovered], where n_i is the number of lock acquisitions of a processor in t and N is the number of processors.
Heuristic to estimate base_l: [formula not recovered], where a, b are system-documented constants and DoCS is the delay outside the critical section (CS).

NUMA
Another parameter that makes the problem harder is NUMA:
– Latency differs greatly between local and remote memory accesses.
– E.g. ccNUMA SGI Origin2000

Model: An online problem
A sequence of loads on the lock unfolds on the fly. On observing a load, the algorithm must decide by how much to lengthen its current backoff delay.
– If it increases the delay too soon, it wastes time in a long delay when the lock becomes available.
– If it does not increase the delay in time, it causes high contention on the lock.
So it must increase the delay reasonably at high loads.
Goal: maximize $\sum_t \Delta delay_t \cdot load_t$, where $\sum_t \Delta delay_t \le P$.

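A small sketch of how an online choice of delay increments compares with hindsight. It assumes the objective is the sum over time of (delay increment × load) under a total delay budget, which is one reading of the garbled goal on this slide; the function names and the equal-split policy are hypothetical and are not the reactive scheme of the talk.

```python
def online_gain(loads, increments):
    """Objective value: sum of delay increment times load, step by step."""
    if len(loads) != len(increments):
        raise ValueError("one increment per observed load is required")
    return sum(d * l for d, l in zip(increments, loads))


def offline_optimum(loads, budget):
    """Best value with hindsight: spend the whole budget at the peak load."""
    return budget * max(loads)


def equal_split(n_steps, budget):
    """Naive online policy (illustration only): spread the budget evenly."""
    return [budget / n_steps] * n_steps


if __name__ == "__main__":
    loads = [1, 2, 4, 3]   # loads observed one at a time
    budget = 4             # total delay that may be added
    increments = equal_split(len(loads), budget)
    ratio = offline_optimum(loads, budget) / online_gain(loads, increments)
    print(ratio)
```

The ratio between the offline optimum and the online value is what a competitive analysis bounds: an online policy cannot know whether the current load is the peak, so it has to hedge between spending the budget now and saving it for a higher load later.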
Algorithm
The algorithm guarantees mutual exclusion and non-livelock. Its space complexity is log(P).
LockType: [definition not recovered]
Initial delay = L.counter × base_l

Acquire(Lock pL)
    L = FAA(pL.L, …)
    if L.lock then
        delay = ComputeDelay(L)
        cond = …
        do
            sleep(delay)
            L = pL.L
            if L.lock then
                delay = ComputeDelay(L)
                continue
            cond = FAA(pL.L, …)
        while cond.lock

Release(Lock pL)
    do
        L = pL.L
    while not CAS(pL.L, L, …)
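
The overall shape of such a lock (a lock word holding a lock flag and a counter, with a sleep whose length scales with the counter) can be sketched in Python. This is a simplified illustration, not the algorithm of the talk: the arguments of FAA and CAS are missing from the pseudocode, so the update rule here is an assumption, atomicity is emulated with a mutex, the delay is simply `counter × base`, and the class, method and helper names are hypothetical.

```python
import threading
import time


class CountingBackoffLock:
    """Non-arbitrating lock whose backoff delay scales with a contender count.

    Assumptions: the lock word is (locked, counter); a mutex emulates the
    atomic FAA/CAS on that word; delay = counter * base.
    """

    def __init__(self, base=1e-5):
        self._guard = threading.Lock()  # emulates atomic access to the lock word
        self._locked = False
        self._counter = 0               # threads holding or waiting for the lock
        self._base = base               # delay per contender, in seconds

    def _try_lock(self, first_attempt):
        """Register as a contender on the first attempt, then try to lock.

        Returns (acquired, counter_snapshot).
        """
        with self._guard:
            if first_attempt:
                self._counter += 1
            if not self._locked:
                self._locked = True
                return True, self._counter
            return False, self._counter

    def acquire(self):
        acquired, load = self._try_lock(first_attempt=True)
        while not acquired:
            time.sleep(load * self._base)  # longer delay under higher load
            acquired, load = self._try_lock(first_attempt=False)

    def release(self):
        with self._guard:
            self._locked = False
            self._counter -= 1

    def contenders(self):
        with self._guard:
            return self._counter


def hammer(lock, n_threads=4, n_iters=100):
    """Increment a shared counter under `lock`; return the final count."""
    total = [0]

    def worker():
        for _ in range(n_iters):
            lock.acquire()
            total[0] += 1
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Unlike the reactive scheme described earlier, this sketch recomputes the delay from the current counter on every retry rather than raising it only at a new peak load; it is meant only to show where the counter and the sleep fit into the acquire loop.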