1 Hierarchical Back-Off (HBO) Locks for Non-Uniform Communication Architectures
Zoran Radovic and Erik Hagersten, {zoran.radovic, erik.hagersten}@it.uu.se
Uppsala Architecture Research Team (UART), Department of Information Technology, Uppsala University
HPCA-9: Ninth International Symposium on High Performance Computer Architecture, Anaheim, California, February 8-12, 2003

2 Synchronization Basics
  - Locks are used to protect shared critical-section data
  - Common software-based solutions:
    - Simple spin locks: TATAS ('84), TATAS_EXP ('90)
    - Queue-based locks: MCS ('91), CLH ('93)
  - Example: A:=0; BARRIER; then two critical sections guarded by lock L:
      LOCK(L) A:=A+1 UNLOCK(L)
      LOCK(L) B:=A+5 UNLOCK(L)
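To make the spin-lock family above concrete, here is a minimal sketch of a test-and-test&set lock with exponential backoff (TATAS_EXP) using C11 atomics. The type and function names and the backoff constants are illustrative assumptions, not taken from the talk; real backoff bounds are tuned per machine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical backoff bounds, in busy-wait iterations. */
enum { TATAS_BACKOFF_BASE = 4, TATAS_BACKOFF_CAP = 1024 };

typedef struct { atomic_int held; } tatas_lock_t;   /* 0 = free, 1 = held */

/* Crude busy-wait used as the backoff delay. */
void tatas_delay(unsigned n) {
    for (volatile unsigned i = 0; i < n; i++) { }
}

/* One attempt: "test" with a plain read first, and issue the
 * test&set (atomic exchange) only if the lock looked free. */
bool tatas_try(tatas_lock_t *l) {
    assert(l != NULL);
    if (atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
        return false;
    return atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0;
}

/* Retry until acquired, doubling the delay after every failed attempt. */
void tatas_exp_acquire(tatas_lock_t *l) {
    unsigned backoff = TATAS_BACKOFF_BASE;
    while (!tatas_try(l)) {
        tatas_delay(backoff);
        if (backoff < TATAS_BACKOFF_CAP)
            backoff *= 2;
    }
}

void tatas_release(tatas_lock_t *l) {
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

With these helpers, LOCK(L) and UNLOCK(L) in the example correspond to tatas_exp_acquire(&L) and tatas_release(&L).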

3 Raytrace Speedup
[figure: Raytrace speedup on Sun WildFire (WF)]

4 Vasaloppet: “Contention Problem in Sweden”
  - Traditional cross-country ski race, 55 miles
  - [photo caption: “… 51.6533 miles to go…”; label: CS]

5 Spin Locks Under Contention
[figure: critical section (CS) cost vs. amount of contention, for spin locks w/ backoff]
IF (more contention) THEN less efficient CS … “The more important the slower it runs…”

6 Queue-based Locks
[figure: CS cost vs. amount of contention, for spin locks w/ backoff and queue-based locks]
IF (more contention) THEN constant CS cost …

7 This Talk
[figure: CS cost vs. amount of contention, for queue-based locks, spin locks w/ backoff, and HBO locks]
IF (more contention) THEN more efficient CS … “The more important the faster it runs…”

8 Raytrace Speedup
[figure: Raytrace speedup on Sun WildFire (WF), with HBO locks]

9 Outline
  - Background & Motivation
  - NUMA vs. NUCA Architectures
  - Hierarchical Back-Off (HBO) Locks
    - HBO
    - HBO_GT
    - HBO_GT with starvation detection/avoidance
  - Performance Results
  - Conclusions

10 Non-Uniform Memory Architecture (NUMA)
  - Many NUMA optimizations have been proposed
    - Page migration: speeds up accesses to “private” data
    - Page replication: speeds up reads of “shared” data
  - These do not help communication, e.g., synchronization
  [figure: two nodes, each with processors P1..Pn, caches ($), and memory, joined by a switch; access time ratio: 1 within a node, 2 to 10 across the switch]

11 A “new” property of NUMAs: Non-Uniform Communication Architecture (NUCA)
  - NUCA examples (NUCA ratios):
    - 1992: Stanford DASH (~ 4.5)
    - 1996: Sequent NUMA-Q (~ 10)
    - 1999: Sun WildFire (~ 6)
    - 2000: Compaq DS-320 (~ 3.5)
    - Future: CMP, SMT (~ 10)
  - NUCA optimizations are becoming important for future architectures!
  [figure: two nodes, each with processors P1..Pn, caches ($), and memory, joined by a switch; NUCA ratio: 1 within a node, 2 to 10 across the switch]

12 Our Goals
  - Design scalable spin locks that exploit NUCAs
  - Create communication affinity
    - Keep the lock in the neighborhood [Mr. Rogers, 1968]
    - Speeds up lock handover
    - Lowers the access cost to critical section (CS) data
  - Reduce remote “probing” traffic
  - Portable and scalable to many NUCA nodes

13 The HBO Lock (the simplest HBO)
  - What do we need?
    - node_id
    - Compare&swap (CAS) atomic operation: CAS(Lock_address, FREE, node_id)
  - lock-acquire:
    - If the lock value is FREE: the node_id is CAS-ed into the lock location
    - Else, 2 cases (for 2 levels of non-uniformity):
      - The lock is “local”: TATAS_EXP with small backoff
      - The lock is “remote”: TATAS_EXP with large backoff
  - Simple but fairly effective: creates communication affinity
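The lock-acquire steps on this slide can be sketched in C11 as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the FREE encoding (-1), the backoff bounds, and all hbo_* names are hypothetical, and how a thread learns its node_id is left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>

#define HBO_FREE (-1)   /* assumed encoding of the FREE state */

/* Hypothetical backoff bounds: small for a local holder, large for a remote one. */
enum { HBO_LOCAL_BASE = 4,   HBO_LOCAL_CAP = 64,
       HBO_REMOTE_BASE = 64, HBO_REMOTE_CAP = 4096 };

typedef atomic_int hbo_lock_t;   /* holds HBO_FREE or the holder's node id */

void hbo_delay(unsigned n) {
    for (volatile unsigned i = 0; i < n; i++) { }
}

/* CAS(lock, FREE, node_id). Returns the value observed in the lock word:
 * HBO_FREE means the CAS succeeded, anything else is the holder's node id. */
int hbo_cas(hbo_lock_t *lock, int node_id) {
    int seen = HBO_FREE;
    atomic_compare_exchange_strong_explicit(lock, &seen, node_id,
        memory_order_acquire, memory_order_relaxed);
    return seen;
}

void hbo_acquire(hbo_lock_t *lock, int node_id) {
    assert(node_id != HBO_FREE);
    unsigned local_b = HBO_LOCAL_BASE, remote_b = HBO_REMOTE_BASE;
    for (;;) {
        /* Test with a plain read; only CAS when the lock looks free. */
        int seen = atomic_load_explicit(lock, memory_order_relaxed);
        if (seen == HBO_FREE)
            seen = hbo_cas(lock, node_id);
        if (seen == HBO_FREE)
            return;                        /* our node_id is now in the lock */
        if (seen == node_id) {             /* "local": holder is in our node */
            hbo_delay(local_b);
            if (local_b < HBO_LOCAL_CAP) local_b *= 2;
        } else {                           /* "remote": holder is in another node */
            hbo_delay(remote_b);
            if (remote_b < HBO_REMOTE_CAP) remote_b *= 2;
        }
    }
}

void hbo_release(hbo_lock_t *lock) {
    atomic_store_explicit(lock, HBO_FREE, memory_order_release);
}
```

Because waiters in the holder's node retry sooner than waiters elsewhere, a released lock tends to be picked up by a neighbor, which is the communication affinity the slide refers to.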

14 The HBO_GT Lock (GT = Global Throttling)
[figure: Node 2 and Node 5, each with four processors (P), caches ($), and memory; Lock1, Lock2, and Lock3 are FREE; each node has a my_is_spinning flag holding 0x00000000; labels: “Local spinning”, “Remote spinning (w/ exp. backoff)”, “Probing... (with CAS)”, “addr(Lock1)”, “Read a node-local flag...”, “FREE, CS, 2 (remote_node_id)”]

15 The HBO_GT Lock (GT = Global Throttling)
A couple of nanoseconds later …

16 The HBO_GT Lock (GT = Global Throttling)
[figure: same two-node layout as before; the lock word now holds 5; Lock2 and Lock3 are FREE; each node has a my_is_spinning flag holding 0x00000000; labels: “Local spinning”, “Remote spinning (w/ exp. backoff)”, “Probing... (with CAS)”, “addr(Lock1)”, “Read a node-local flag...”, “FREE, CS, 5 (remote_node_id)”]
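The diagram labels suggest that, under global throttling, a thread that finds the lock held by another node publishes the lock's address in a node-local is_spinning flag, and its neighbors read that flag and wait locally instead of also probing the remote lock. The sketch below is one plausible reading of that idea, not the authors' code: the gt_* names, the fixed node count, the FREE encoding, and the backoff bounds are all assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define GT_FREE (-1)   /* assumed encoding of the FREE state */
enum { GT_MAX_NODES = 8, GT_LOCAL_CAP = 64, GT_REMOTE_CAP = 4096 };  /* hypothetical */

typedef atomic_int gt_lock_t;   /* holds GT_FREE or the holder's node id */

/* One flag per node: address of the lock that a thread of this node is
 * currently probing remotely, or 0 if none. */
atomic_uintptr_t gt_is_spinning[GT_MAX_NODES];

void gt_delay(unsigned n) { for (volatile unsigned i = 0; i < n; i++) { } }

/* True if a neighbor already acts as this node's remote spinner for `lock`. */
bool gt_throttled(int node_id, gt_lock_t *lock) {
    return atomic_load_explicit(&gt_is_spinning[node_id],
                                memory_order_relaxed) == (uintptr_t)lock;
}

/* CAS(lock, FREE, node_id); returns the observed value (GT_FREE on success). */
int gt_probe(gt_lock_t *lock, int node_id) {
    int seen = GT_FREE;
    atomic_compare_exchange_strong_explicit(lock, &seen, node_id,
        memory_order_acquire, memory_order_relaxed);
    return seen;
}

void gt_acquire(gt_lock_t *lock, int node_id) {
    assert(node_id >= 0 && node_id < GT_MAX_NODES);
    unsigned local_b = 4, remote_b = 64;
    for (;;) {
        while (gt_throttled(node_id, lock))   /* read only the node-local flag */
            gt_delay(4);
        int seen = gt_probe(lock, node_id);
        if (seen == GT_FREE) return;
        if (seen == node_id) {                /* local holder: small backoff */
            gt_delay(local_b);
            if (local_b < GT_LOCAL_CAP) local_b *= 2;
            continue;
        }
        /* Remote holder: announce it, then probe remotely with large backoff. */
        atomic_store_explicit(&gt_is_spinning[node_id], (uintptr_t)lock,
                              memory_order_relaxed);
        do {
            gt_delay(remote_b);
            if (remote_b < GT_REMOTE_CAP) remote_b *= 2;
            seen = gt_probe(lock, node_id);
        } while (seen != GT_FREE && seen != node_id);
        /* Lock acquired or held by a neighbor: stop throttling our node. */
        atomic_store_explicit(&gt_is_spinning[node_id], (uintptr_t)0,
                              memory_order_relaxed);
        if (seen == GT_FREE) return;
    }
}

void gt_release(gt_lock_t *lock) {
    atomic_store_explicit(lock, GT_FREE, memory_order_release);
}
```

The throttling here is best-effort: two neighbors can both start probing before either sets the flag, which costs some extra traffic but does not affect mutual exclusion. The sketch does not include the starvation detection/avoidance variant.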

17 Our NUCA: Sun WildFire
[figure: two nodes, each with 14 processors (P1..P14), caches ($), and memory, joined by a switch; NUCA ratio: 1 within a node, 6 across the switch]

18 Traditional Microbenchmark
  - For each thread:
      for (i = 0; i < iterations; i++) {
          LOCK(L);
          /* null/small Critical Section */
          UNLOCK(L);
      }

19 NUCA-performance
[figure: results of the traditional microbenchmark on Sun WildFire (WF)]

20 New Microbenchmark
      for (i = 0; i < iterations; i++) {
          LOCK(L);
          delay(critical_work);   // CS
          UNLOCK(L);
          static_delay();
          random_delay();
      }
  - More realistic node handoffs for queue-locks
  - Constant number of processors
  - Control the “amount of contention” (via critical_work)
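A self-contained version of the per-thread loop might look like the sketch below. Everything beyond the loop structure is an assumption made for illustration: the stand-in test&set lock, the busy-wait delay, the parameter struct, and the small linear congruential generator used for the random delay. Thread creation and timing are omitted.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in test&set lock so the sketch is self-contained; any lock
 * with the same acquire/release shape could be substituted. */
typedef struct { atomic_flag held; } mb_lock_t;

void mb_lock(mb_lock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) { }
}

void mb_unlock(mb_lock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Busy-wait for roughly n loop iterations. */
void mb_delay(unsigned n) { for (volatile unsigned i = 0; i < n; i++) { } }

/* Hypothetical benchmark parameters, all in busy-wait iterations. */
typedef struct {
    unsigned iterations;     /* lock/unlock pairs per thread */
    unsigned critical_work;  /* time spent inside the critical section */
    unsigned static_work;    /* fixed delay outside the critical section */
    unsigned random_max;     /* random delay is drawn from [0, random_max) */
    unsigned seed;           /* per-thread seed for the random delay */
} mb_params_t;

/* Loop body run by each thread. Increments *shared once per critical
 * section and returns the number of critical sections completed. */
unsigned long mb_worker(mb_lock_t *l, long *shared, mb_params_t p) {
    assert(p.random_max > 0);
    unsigned long done = 0;
    for (unsigned i = 0; i < p.iterations; i++) {
        mb_lock(l);
        mb_delay(p.critical_work);              /* CS */
        (*shared)++;
        mb_unlock(l);
        mb_delay(p.static_work);                /* static_delay() */
        p.seed = p.seed * 1103515245u + 12345u; /* simple LCG step */
        mb_delay((p.seed >> 16) % p.random_max);/* random_delay() */
        done++;
    }
    return done;
}
```

Raising critical_work relative to the two outside delays increases the fraction of time each thread wants the lock, which is how the amount of contention is controlled while the number of processors stays fixed.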

21 Performance Results
New microbenchmark, 2-node Sun WildFire, 28 CPUs
[figure: results plot, with the callout “Fairness?”]

22 Fairness Study
New microbenchmark, 2-node Sun WildFire, 28 CPUs
[figure: fairness plot]

23 Application Performance
[figure: Raytrace speedup on Sun WildFire (WF)]

24 Application Performance
[figure: Raytrace speedup on Sun WildFire (WF)]

25 HBO Locks Under Contention
[figure: CS cost vs. amount of contention, for queue-based locks, spin locks w/ backoff, and HBO locks]

26 Total Traffic: Raytrace
[figure: total traffic for Raytrace; annotated values: 1.11x and 1.45x]

27 Application Performance
[figure: 28-processor runs]

28 Conclusions
  - First-come, first-served is not desirable for NUCAs
  - The HBO lock exploits NUCAs by
    - creating locality through CS affinity (stable lock)
    - reducing traffic compared with the test&set locks
  - HBO performs better under contention
  - Traffic is significantly reduced
  - Applications with contended locks scale better with HBO locks on NUCAs
  - Starvation detection/avoidance is covered in the paper…

29 UART’s Home Page
http://www.it.uu.se/research/group/uart
Supported by Sun Microsystems, Inc., and the Parallel and Scientific Computing Institute (PSCI)

