3-1 JMH Associates © 2004, All rights reserved Windows Application Development Chapter 9 Synchronization Performance Impact and Guidelines.

Presentation transcript:


3-2 Objectives
Upon completion of this session, you will be able to:
• Avoid unnecessary synchronization
• Avoid SMP performance pitfalls
• Describe performance impact and factors on single and multiple processor systems
• Improve multithreaded and SMP performance

3-3 Contents
1. CRITICAL_SECTION – Mutex Tradeoffs
2. SMP Impact
3. Semaphores to Reduce Thread Contention
4. Processor Affinity
5. Tuning with CS Spin Counts
6. Performance Guidelines and Pitfalls
7. Lab Exercise 9-1

3-4 1. CRITICAL_SECTION – Mutex Tradeoffs
Conventional wisdom:
• Critical sections (CSs) are much faster than mutexes
• They operate in user, not kernel, space
Non-conventional wisdom:
• With 5 or more contending threads, a mutex may perform better
• Mutexes have more linear behavior as the number of threads grows

3-5 CS-Mutex Performance Comparison
TimedMutualExclusion.exe
• See the code comments for more explanation
Four additional simple programs to compare performance:
• StatsNS.c: No synchronization. Wrong, but runs as fast as possible
• StatsCS.c: Uses CRITICAL_SECTION
• StatsIN.c: Uses interlocked functions
• StatsMX.c: Uses mutex

3-6 Experimental Results
[Chart: critical sections with multiple threads, one processor (1P). Results vary.]

3-7 More Experimental Results
[Chart: critical sections vs. mutexes, multiple threads, one processor (1P).]

3-8 2. SMP Impact
Symmetric multiprocessing (SMP) allows transparent use of multiple processors
• The kernel scheduler assigns ready threads to processors
• Intel Xeon® provides multiprocessing on a single processor through “Hyper-Threading” (also on the Pentium 4®)
• Note: A thread may run on several processors in its lifetime
Result:
• Performance gain (sometimes). Example: sortMT
• Dramatic performance loss (sometimes)

3-9 SMP Potential Negative Impact
[Charts: 1, 2, and 4 processors; one panel for CSs and one for mutexes.]

3-10 3. Semaphores to Reduce Thread Contention
Scenario:
• N worker threads contend for a shared resource guarded by a CS or mutex
• Performance degradation is severe
• Distinct worker threads provide a “natural” solution that is simple both conceptually and to implement
Problem:
• Improve performance
• Retain the simplicity
One solution: a “semaphore throttle”
• Use a semaphore whose max count limits the number of running threads

3-11 A Semaphore Throttle
Boss thread creates a semaphore
• Max value and initial value are set to a “small number” (such as 4)
• Use the number of processors, or a tunable value
Worker threads get a semaphore unit before working
• Wait on the semaphore first, and only then on the mutex (or CS)

    while (TRUE) { // Worker loop
        WaitForSingleObject (hSem, INFINITE);   // Get a throttle unit
        WaitForSingleObject (hMutex, INFINITE); // Then get the mutex
        // Get work unit, etc.
        ...
        ReleaseMutex (hMutex);
        ...
        ReleaseSemaphore (hSem, 1, NULL);       // Return the throttle unit
    } // End of worker loop

3-12 Semaphore Throttle Variations
Some workers may acquire multiple units
• Concept: These workers use more resources
• Caution: Deadlock risk
Boss thread tunes dynamically
• Decreases or increases the number of active workers
• By waiting on or releasing semaphore units
• Note: Max value is set once at initialization
If max count is 1, the mutex is redundant
• Often the best SMP solution
TimedMutualExclusion: the sixth parameter is the maximum number of active workers
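The deadlock caution can be made concrete with a small check. The sketch below is an illustration only and is not part of the course code. It assumes that each worker acquires its units one wait at a time and releases them only once it holds all of them. Under that assumption every worker can end up blocked while holding one unit fewer than it needs, so deadlock becomes possible once the sum of (units needed - 1) reaches the semaphore's max count. The function name and parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from the slides). units_needed[i] is the number of
 * semaphore units worker i acquires, one wait at a time, before it starts
 * working. Returns true if the workers can block forever on a semaphore whose
 * maximum count is max_count. */
static bool throttle_can_deadlock(const unsigned *units_needed, size_t workers,
                                  unsigned max_count)
{
    unsigned long held_while_blocked = 0;

    for (size_t i = 0; i < workers; i++) {
        if (units_needed[i] > max_count)
            return true;   /* this worker can never obtain all its units */
        if (units_needed[i] > 1)
            held_while_blocked += units_needed[i] - 1;
    }
    /* Deadlock needs at least one multi-unit worker, and enough partially
     * satisfied workers to drain the semaphore completely. */
    return held_while_blocked > 0 && held_while_blocked >= max_count;
}
```

For example, two workers that each need 2 units can deadlock on a semaphore with a max count of 2, but not on one with a max count of 3. Workers that take a single unit never contribute to the risk.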

3-13 Semaphore Throttle Results
[Chart: one processor.]

3-14 4. Processor Affinity
Process-specific “process affinity mask” (PrAM)
Thread-specific “thread processor affinity mask” (ThAM)
• Threads can only run on permitted processor(s)
• ThAM <= PrAM: the thread mask must be a subset of the process mask
System affinity mask: Each bit represents a configured processor
• Process AM <= System AM

    BOOL GetProcessAffinityMask(
        HANDLE hProcess,
        LPDWORD lpProcessAffinityMask,
        LPDWORD lpSystemAffinityMask );
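The “<=” between affinity masks is best read as “is a subset of”: every bit set in the thread mask must also be set in the process mask. A portable sketch of that check follows. The helper names are hypothetical, and a 64-bit integer stands in for the mask type so that the example does not depend on the Windows headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; bit i of a mask stands for processor i. */

/* True when every processor allowed by `inner` is also allowed by `outer`. */
static bool affinity_is_subset(uint64_t inner, uint64_t outer)
{
    return (inner & ~outer) == 0;
}

/* Number of processors that a mask permits. */
static unsigned affinity_count(uint64_t mask)
{
    unsigned n = 0;
    while (mask != 0) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

A numeric comparison is not enough. A thread mask of 0x4 (processor 2) is numerically smaller than a process mask of 0x9 (processors 0 and 3), yet it is not a subset, so `affinity_is_subset(0x4, 0x9)` is false.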

3-15 Setting Thread Processor Affinity

    DWORD SetThreadAffinityMask (
        HANDLE hThread,
        DWORD dwThreadAffinityMask );

ThAM <= PrAM
Only affects future thread scheduling
• The target thread could already be running on a prohibited processor
Question (exercise for the reader)
• How are Intel Xeon processors represented?
• Hyper-Threading runs multiple threads concurrently
Distinct threads can have reserved processor(s)
Also see SetThreadIdealProcessor

3-16 5. Tuning with CS Spin Counts
CS operations run in user, not kernel, space
EnterCriticalSection() uses a “spin lock”
• InterlockedCompareExchange() sets the lock only if it is reset
• If previously locked:
  - Single processor: Wait in the kernel until unlocked (spin count, SC == 0)
  - SMP: Try again, and wait in the kernel only after “spin count” attempts
Single processor advantage: Fast if no wait is required
SMP advantage: Avoids contention between processors
Guideline: Use a spin count for short-duration, high-contention locks
• Example value: 4000
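The spin-then-wait idea can be sketched in portable C. This is not the actual EnterCriticalSection() implementation, which the slides do not show: C11 `atomic_compare_exchange_strong` stands in for InterlockedCompareExchange(), the type and function names are hypothetical, and the kernel wait is left to the caller (the function simply reports that spinning gave up).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: state is 0 when reset (free), 1 when set (held). */
typedef struct {
    atomic_int state;
} spin_lock_t;

static void spin_lock_init(spin_lock_t *lock)
{
    atomic_init(&lock->state, 0);
}

/* Try to set the lock, retrying up to spin_count extra times.
 * Returns true if the lock was acquired. Returns false when the caller
 * should fall back to a blocking (kernel) wait. A spin_count of 0 makes a
 * single attempt, mirroring the single-processor case. */
static bool spin_lock_try_enter(spin_lock_t *lock, unsigned spin_count)
{
    unsigned spins = 0;
    for (;;) {
        int expected = 0;   /* succeed only if the lock is currently reset */
        if (atomic_compare_exchange_strong(&lock->state, &expected, 1))
            return true;
        if (spins++ >= spin_count)
            return false;
    }
}

static void spin_lock_leave(spin_lock_t *lock)
{
    atomic_store(&lock->state, 0);
}
```

Spinning pays off only when the holder runs on another processor and releases the lock soon. On a single processor the holder cannot make progress while the waiter spins, which is why the spin count is treated as zero there.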

3-17 Setting the Spin Count
The value is ignored on a single processor system
Initial value: Replace the InitializeCriticalSection (ICS) call with:

    BOOL InitializeCriticalSectionAndSpinCount(
        LPCRITICAL_SECTION lpCriticalSection,
        DWORD dwSpinCount );

Dynamic spin count adjustment:

    DWORD SetCriticalSectionSpinCount(
        LPCRITICAL_SECTION lpCriticalSection,
        DWORD dwSpinCount );

3-18 6. Performance Guidelines and Pitfalls
Avoid performance problems
• Beware of conjecture about performance
• Locking is expensive; use it only as required
• Hold a mutex as long as needed, but no longer
• High contention hinders performance
• Beware of global locks
• Synchronization will impact program performance
  - Be especially careful when running on SMP systems
Reduce thread contention
• Avoid too many active threads
• Use a semaphore to limit worker or server threads

3-19 7. Lab Exercise 9-1
Use TimedMutualExclusion
1. Obtain your own results
• CS vs. mutex
• Single processor vs. SMP vs. Xeon (if available)
2. Extend to add spin count tuning
• TimedMutualExclusionSC
• Add a parameter for the initial SC
• Tune performance on an SMP system
Alternative: Gather data from statsCS, etc. to assess synchronization performance impact
