Download presentation
Presentation is loading. Please wait.
Published byGervais Conley Modified over 8 years ago
1
@gfraiteur a deep investigation on how.NET multithreading primitives map to hardware and Windows Kernel You Thought You Understood Multithreading Gaël Fraiteur PostSharp Technologies Founder & Principal Engineer
2
@gfraiteur Hello! My name is GAEL. No, I don’t think my accent is funny. my twitter
3
@gfraiteur Agenda 1. Hardware 2. Windows Kernel 3. Application
4
@gfraiteur Microarchitecture Hardware
5
@gfraiteur There was no RAM in Colossus, the world’s first electronic programmable computing device (1944). http://en.wikipedia.org/wiki/File:Colossus.jpg
6
@gfraiteur Hardware Processor microarchitecture Intel Core i7 980x
7
@gfraiteur Hardware Hardware latency Measured on Intel® Core™ i7-970 Processor (12M Cache, 3.20 GHz, 4.80 GT/s Intel® QPI) with SiSoftware Sandra
8
@gfraiteur Hardware Cache Hierarchy Distributed into caches L1 (per core) L2 (per core) L3 (per die) Die 1 Core 4 Processor Core 5 Processor Core 7 Processor L2 Cache Core 6 Processor L2 Cache Memory Store (DRAM) Die 0 Core 1 Processor Core 0 Processor Core 2 Processor L2 Cache Core 3 Processor L2 Cache L3 Cache L2 Cache L3 Cache L2 Cache Data Transfer Core Interconnec t Processor Interconnect Synchronization Synchronized through an asynchronous “bus”: Request/response Message queues Buffers
9
@gfraiteur Hardware Memory Ordering Issues due to implementation details: Asynchronous bus Caching & Buffering Out-of-order execution
10
@gfraiteur Hardware Memory Ordering Processor 0Processor 1 Store A := 1Load A Store B := 1Load B Possible values of (A, B) for loads? Processor 0Processor 1 Load ALoad B Store B := 1Store A := 1 Possible values of (A, B) for loads? Processor 0Processor 1 Store A := 1Store B := 1 Load BLoad A Possible values of (A, B) for loads?
11
@gfraiteur Hardware Memory Models: Guarantees http://en.wikipedia.org/wiki/Memory_ordering X84, AMD64: strong orderingARM: weak ordering
12
@gfraiteur Hardware Memory Models Compared Processor 0Processor 1 Store A := 1Load A Store B := 1Load B Processor 0Processor 1 Load ALoad B Store B := 1Store A := 1
13
@gfraiteur Hardware Compiler Memory Ordering The compiler (CLR) can: Cache (memory into register), Reorder Coalesce writes volatile keyword disables compiler optimizations. And that’s all!
14
@gfraiteur Hardware Thread.MemoryBarrier() Processor 0Processor 1 Store A := 1Store B := 1 Thread.MemoryBarrier() Load BLoad A Barriers serialize access to memory.
15
@gfraiteur Hardware Atomicity Processor 0Processor 1 Store A := (long) 0xFFFFFFFFLoad A Up to native word size (IntPtr) Properly aligned fields (attention to struct fields!)
16
@gfraiteur Hardware Interlocked access The only locking mechanism at hardware level. System.Threading.Interlocked Add Add(ref x, int a ) { x += a; return x; } Increment Inc(ref x ) { x += 1; return x; } Exchange Exc( ref x, int v ) { int t = x; x = a; return t; } CompareExchange CAS( ref x, int v, int c ) { int t = x; if ( t == c ) x = a; return t; } Read(ref long) Implemented by L3 cache and/or interconnect messages. Implicit memory barrier
17
@gfraiteur Hardware Cost of interlocked Measured on Intel® Core™ i7-970 Processor (12M Cache, 3.20 GHz, 4.80 GT/s Intel® QPI) with C# code. Cache latencies measured with SiSoftware Sandra.
18
@gfraiteur Hardware Cost of cache coherency
19
@gfraiteur Multitasking No thread, no process at hardware level There no such thing as “wait” One core never does more than 1 “thing” at the same time (except HyperThreading) Task-State Segment: CPU State Permissions (I/O, IRQ) Memory Mapping A task runs until interrupted by hardware or OS
20
@gfraiteur Lab: Non-Blocking Algorithms Non-blocking stack (4.5 MT/s) Ring buffer (13.2 MT/s)
21
@gfraiteur Windows Kernel Operating System
22
@gfraiteur SideKick (1988). http://en.wikipedia.org/wiki/SideKick
23
@gfraiteur Program (Sequence of Instructions) CPU State Wait Dependencies (other things) Virtual Address Space Resources At least 1 thread (other things) Processes and Threads ProcessThread Windows Kernel
24
@gfraiteur Windows Kernel Processed and Threads 8 Logical Processors (4 hyper-threaded cores) 2384 threads 155 processes
25
@gfraiteur Windows Kernel Dispatching threads to CPU Executing CPU 0 CPU 1 CPU 2 CPU 3 Thread H Thread G Thread I Thread J Mutex Semaphore Event Timer Thread A Thread C Thread B Wait Thread D Thread E Thread F Ready
26
@gfraiteur Live in the kernel Sharable among processes Expensive! Event Mutex Semaphore Timer Thread Dispatcher Objects (WaitHandle) Kinds of dispatcher objectsCharacteristics Windows Kernel
27
@gfraiteur Windows Kernel Cost of kernel dispatching Measured on Intel® Core™ i7-970 Processor (12M Cache, 3.20 GHz, 4.80 GT/s Intel® QPI)
28
@gfraiteur Windows Kernel Wait Methods Thread.Sleep Thread.Yield Thread.Join WaitHandle.Wait (Thread.SpinWait)
29
@gfraiteur Windows Kernel SpinWait: smart busy loops 1. Thread.SpinWait (Hardware awareness, e.g. Intel HyperThreading) 2. Thread.Sleep(0) (give up to threads of same priority) 3. Thread.Yield() (give up to threads of any priority) 4. Thread.Sleep(1) (sleeps for 1 ms) Processor 0Processor 1 A = 1; B = 1; SpinWait w = new SpinWait(); int localB; if ( A == 1 ) { while ( ((localB = B) == 0 ) w.SpinOnce(); }
30
@gfraiteur.NET Framework Application Programming
31
@gfraiteur Application Programming Design Objectives Ease of use High performance from applications!
32
@gfraiteur Application Programming Instruments from the platform Hardware Interlocked SpinWait Volatile Operating System Thread WaitHandle (mutex, event, semaphore, …) Timer
33
@gfraiteur Application Programming New Instruments Concurrency & Synchronization Monitor (Lock) , SpinLock ReaderWriterLock Lazy, LazyInitializer Barrier, CountdownEvent Data Structures BlockingCollection Concurrent(Queue |Stack |Bag |Dictionary ) ThreadLocal, ThreadStatic Thread Dispatching ThreadPool Dispatcher , ISynchronizeInvoke BackgroundWorker Frameworks PLINQ Tasks Async/Await
34
@gfraiteur Concurrency & Synchronization Application Programming
35
@gfraiteur Concurrency & Synchronization Monitor: the lock keyword 1. Start with interlocked operations (no contention) 2. Continue with “spin wait”. 3. Create kernel event and wait Good performance if low contention
36
@gfraiteur Concurrency & Synchronization ReaderWriterLock Concurrent reads allowed Writes require exclusive access
37
@gfraiteur Concurrency & Synchronization Lab: ReaderWriterLock
38
@gfraiteur Data Structures Application Programming
39
@gfraiteur Data Structures BlockingCollection Use case: pass messages between threads TryTake: wait for an item and take it Add: wait for free capacity (if limited) and add item to queue CompleteAdding: no more item
40
@gfraiteur Data Structures Lab: BlockingCollection
41
@gfraiteur Data Structures ConcurrentStack node top Producer A Producer B Producer C Consumer A Consumer B Consumer C Consumers and producers compete for the top of the stack node
42
@gfraiteur Data Structures Lab: Concurrent Stack
43
@gfraiteur Data Structures ConcurrentQueue node tail Producer A Producer B Producer C Consumer A Consumer B Consumer C head Consumers compete for the head. Producers compete for the tail. Both compete for the same node if the queue is empty.
44
@gfraiteur Data Structures ConcurrentBag Thread A Bag Thread Local Storage AThread Local Storage B node Producer AConsumer A Thread B Producer BConsumer B Threads preferentially consume nodes they produced themselves. No ordering
45
@gfraiteur Data Structures ConcurrentDictionary TryAdd TryUpdate (CompareExchange) TryRemove AddOrUpdate GetOrAdd
46
@gfraiteur Thread Dispatching Application Programming
47
@gfraiteur Thread Dispatching ThreadPool Problems solved: Execute a task asynchronously (QueueUserWorkItem) Waiting for WaitHandle (RegisterWaitForSingleObject) Creating a thread is expensive Good for I/O intensive operations Bad for CPU intensive operations
48
@gfraiteur Thread Dispatching Dispatcher Execute a task in the UI thread Synchronously: Dispatcher.Invoke Asynchonously: Dispatcher.BeginInvoke Under the hood, uses the HWND message loop
49
@gfraiteur Thread Dispatching
50
@gfraiteur Q&A Gael Fraiteur gael@postsharp.net @gfraiteur
51
Summary
52
@gfraiteur http://www.postsharp.net/ gael@postsharp.net
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.