3-1 JMH Associates © 2004, All rights reserved Windows Application Development Chapter 9 Synchronization Performance Impact and Guidelines.

OBJECTIVES

Upon completion of this session, you will be able to:
- Avoid unnecessary synchronization
- Avoid SMP performance pitfalls
- Describe performance impact and factors on single and multiple processor systems
- Improve multithreaded and SMP performance

Contents

1. CRITICAL_SECTION – Mutex Tradeoffs
2. SMP Impact
3. Semaphores to Reduce Thread Contention
4. Processor Affinity
5. Tuning with CS Spin Counts
6. Performance Guidelines and Pitfalls
7. Lab Exercise 9-1

1. CRITICAL_SECTION – Mutex Tradeoffs

Conventional wisdom:
- Critical sections (CSs) are much faster than mutexes
- They operate in user space, not kernel space

Non-conventional wisdom:
- With 5 or more contending threads, a mutex may perform better
- Mutex performance scales more linearly as the number of contending threads grows

CS-Mutex Performance Comparison

TimedMutualExclusion.exe
- See the code comments for more explanation

Four additional simple programs to compare performance:
- StatsNS.c: No synchronization. Wrong, but runs as fast as possible
- StatsCS.c: Uses CRITICAL_SECTION
- StatsIN.c: Uses interlocked functions
- StatsMX.c: Uses mutex

2. SMP Impact

- SMP allows transparent use of multiple processors
  - The kernel scheduler assigns ready threads to processors
- Intel Xeon® has multiprocessing on a single processor
  - “Hyper-Threading” (also on Pentium 4®)
- Note: A thread may run on several processors in its lifetime
- Result:
  - Performance gain (sometimes). Example: sortMT
  - Dramatic performance loss (sometimes)

3. Semaphores to Reduce Thread Contention

Scenario:
- N worker threads contend for a shared resource
  - Using a CS or mutex
  - Performance degradation is severe
- Distinct worker threads provide a “natural” solution
  - Simple conceptually and simple to implement

Problem:
- Improve performance
- Retain the simplicity

One solution: a “semaphore throttle”
- Use a semaphore whose maximum count limits the number of running threads

A Semaphore Throttle

- Boss thread creates a semaphore
  - Max value/initial value set to a “small number” (such as 4)
  - Number of processors, or a tunable value
- Worker threads get a semaphore unit before working
  - Wait on the semaphore, not the mutex (or CS)

    while (TRUE) {  // Worker loop
        WaitForSingleObject (hSem, INFINITE);    // Get a throttle unit first
        WaitForSingleObject (hMutex, INFINITE);  // Then get the lock
        // Get work unit, etc. ...
        ReleaseMutex (hMutex);
        // ...
        ReleaseSemaphore (hSem, 1, NULL);        // Return the throttle unit
    }  // End of worker loop

Semaphore Throttle Variations

- Some workers may acquire multiple units
  - Concept: These workers use more resources
  - Caution: Deadlock risk
- Boss thread tunes dynamically
  - Decreases/increases the number of active workers
  - By waiting on or releasing semaphore units
  - Note: Max value is set once at initialization
- If max count is 1, the mutex is redundant
  - Often the best SMP solution
- TimedMutualExclusion: The sixth parameter is the max # of active workers

4. Processor Affinity

- Process-specific “process affinity mask” (PrAM)
- Thread-specific “thread processor affinity mask” (ThAM)
  - Threads can only run on permitted processor(s)
  - ThAM <= PrAM, meaning the thread mask must be a subset of the process mask
- System affinity mask: Each bit represents a configured processor
  - Process AM <= System AM

    BOOL GetProcessAffinityMask(
        HANDLE hProcess,
        LPDWORD lpProcessAffinityMask,
        LPDWORD lpSystemAffinityMask );

Note: Current SDK headers declare the two mask parameters as PDWORD_PTR, so the masks are pointer-sized on 64-bit Windows.
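The "<=" relation between masks is a bitwise subset test, not a numeric comparison. A small portable sketch of that arithmetic follows; the helper names (`mask_is_subset`, `mask_processor_count`, `clamp_thread_mask`) are hypothetical, and `uintptr_t` is used here only as a stand-in for the pointer-sized Windows mask type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when every processor permitted by 'inner' is also permitted by
   'outer' (the "inner <= outer" relation between affinity masks). */
bool mask_is_subset(uintptr_t inner, uintptr_t outer)
{
    return (inner & ~outer) == 0;
}

/* Number of processors a mask permits (count of set bits). */
unsigned mask_processor_count(uintptr_t mask)
{
    unsigned n = 0;
    while (mask != 0) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Restrict a requested thread mask to the process mask.
   A result of 0 means no permitted processor remains. */
uintptr_t clamp_thread_mask(uintptr_t requested, uintptr_t process_mask)
{
    uintptr_t result = requested & process_mask;
    assert(mask_is_subset(result, process_mask));
    return result;
}
```

For example, the mask 0x3 (processors 0 and 1) is numerically smaller than 0x4 (processor 2 only), yet it is not a subset of it, so a thread mask of 0x3 would not be valid under a process mask of 0x4.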

Setting Thread Processor Affinity

    DWORD SetThreadAffinityMask(
        HANDLE hThread,
        DWORD dwThreadAffinityMask );

- Returns the thread's previous affinity mask, or 0 on failure
- ThAM <= PrAM
- Only affects future thread scheduling
  - Target thread could already be running on a prohibited processor
- Question (exercise for the reader)
  - How are Intel Xeon processors represented?
  - Hyper-Threading runs multiple threads concurrently
- Distinct threads can have reserved processor(s)
- Also see SetThreadIdealProcessor

5. Tuning with CS Spin Counts

- CS operations run in user, not kernel, space
- EnterCriticalSection() uses a “spin lock”
  - InterlockedCompareExchange() sets the lock only if it is reset
  - If previously locked:
    - Single processor: Wait in the kernel until unlocked (SC == 0)
    - SMP: Try again; kernel wait only after “spin count” attempts
- Single processor advantage: Fast if no wait required
- SMP advantage: Avoid contention between processors
- Guideline: Use a spin count for short-duration, high-contention locks
  - Example value: 4000
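The spin-then-wait idea can be illustrated with a portable C11 sketch. This is not the Windows implementation of EnterCriticalSection; it is a model built on assumptions: a compare-and-exchange from `<stdatomic.h>` stands in for InterlockedCompareExchange(), the type and function names are hypothetical, and the kernel wait is represented only by a `false` return value at the point where a real lock would block.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    atomic_int locked;      /* 0 = reset (free), 1 = set (owned)           */
    unsigned   spin_count;  /* retries before giving up; 0 models the
                               single-processor case                       */
} spin_lock_sketch;

void sketch_init(spin_lock_sketch *l, unsigned spin_count)
{
    assert(l != NULL);
    atomic_init(&l->locked, 0);
    l->spin_count = spin_count;
}

/* One attempt: set the lock only if it is currently reset. */
static bool try_once(spin_lock_sketch *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->locked, &expected, 1);
}

/* Makes one attempt plus up to spin_count retries.
   Returns true if the lock was acquired in user space.
   Returns false at the point where a real lock would wait in the kernel.
   *attempts_out (if non-NULL) receives the number of attempts made. */
bool sketch_acquire(spin_lock_sketch *l, unsigned *attempts_out)
{
    unsigned attempts = 1;
    bool got = try_once(l);
    while (!got && attempts <= l->spin_count) {
        got = try_once(l);
        attempts++;
    }
    if (attempts_out != NULL) *attempts_out = attempts;
    return got;
}

void sketch_release(spin_lock_sketch *l)
{
    atomic_store(&l->locked, 0);
}
```

With a spin count of 0, a failed first attempt goes straight to the "kernel wait" outcome, which matches the single-processor behavior described above. With a nonzero spin count, a lock that is released quickly by another processor can be acquired during one of the retries without any kernel transition.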

Setting the Spin Count

- Value is ignored on a single processor system
- Initial value: Replace the InitializeCriticalSection (ICS) call with:

    BOOL InitializeCriticalSectionAndSpinCount(
        LPCRITICAL_SECTION lpCriticalSection,
        DWORD dwSpinCount );

- Dynamic spin count adjustment (returns the previous spin count):

    DWORD SetCriticalSectionSpinCount(
        LPCRITICAL_SECTION lpCriticalSection,
        DWORD dwSpinCount );

6. Performance Guidelines and Pitfalls

Avoid performance problems:
- Beware of conjecture about performance
- Locking is expensive; use it only as required
- Hold a mutex as long as needed, but no longer
- High contention hinders performance
- Beware of global locks
- Synchronization will impact program performance
  - Be especially careful when running on SMP systems

Reduce thread contention:
- Avoid too many active threads
- Use a semaphore to limit worker or server threads

7. Lab Exercise 9-1

Use TimedMutualExclusion.

1. Obtain your own results
   - CS vs. mutex
   - Single processor vs. SMP vs. Xeon (if available)
2. Extend to add spin count tuning
   - TimedMutualExclusionSC
   - Add a parameter for the initial SC
   - Tune performance on an SMP system

Alternative: Gather data from statsCS, etc. to assess synchronization performance impact
