Advanced .NET Programming I 11th Lecture


Advanced .NET Programming I 11th Lecture Pavel Ježek pavel.jezek@d3s.mff.cuni.cz Some of the slides are based on University of Linz .NET presentations. © University of Linz, Institute for System Software, 2004 published under the Microsoft Curriculum License (http://www.msdnaa.net/curriculum/license_curriculum.aspx)

Locks
Allow complex operations to execute "atomically" (if used correctly).
Are slow if locking (Monitor.Enter) blocks (implies a processor yield) – a problem for short critical sections – consider spinlocks (.NET struct System.Threading.SpinLock).
Are slow even if locking (Monitor.Enter) does not block (implies allocation of a new, otherwise unused syncblock ["lock"] + the locking itself) – again a problem for short critical sections – consider lock-free/wait-free algorithms and data structures.
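The spinlock option mentioned above can be sketched as follows (a minimal illustration, not from the slides; the counter and iteration counts are made up). Note that SpinLock is a struct: never copy it and never store it in a readonly field.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class SpinLockDemo {
    static SpinLock spinLock = new SpinLock();
    static int counter;                        // shared state guarded by spinLock

    static void Add(int times) {
        for (int i = 0; i < times; i++) {
            bool lockTaken = false;            // must be false before Enter
            try {
                spinLock.Enter(ref lockTaken);
                counter++;                     // very short critical section
            } finally {
                if (lockTaken) spinLock.Exit();
            }
        }
    }

    static void Main() {
        Task t1 = Task.Run(() => Add(100_000));
        Task t2 = Task.Run(() => Add(100_000));
        Task.WaitAll(t1, t2);
        Console.WriteLine(counter);            // prints 200000
    }
}
```

The try/finally with the `lockTaken` flag is the documented SpinLock usage pattern – it guarantees Exit is called exactly when the lock was actually acquired.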

Journey to the Lock-free/Wait-free World What is C#/.NET's memory model? Are there any guarantees about a thread's behavior (operation atomicity and ordering) from the point of view of other threads?

Atomicity in C#
Reads and writes of the following data types are atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types (reads and writes of the reference itself). [OK]
Reads and writes of other types, including long, ulong, double, decimal, and user-defined types, are not guaranteed to be atomic. [NO!]
There is no guarantee of an atomic read-write (e.g. int a = b; is not atomic). [NO!]
There is definitely no guarantee of an atomic read-modify-write (e.g. a++;). [NO!]

Interlocked Static Class
.NET provides explicit atomicity for common read-modify-write scenarios via "methods" of the Interlocked class:

Method                    | Available for types
Read                      | long
Add/Increment/Decrement   | int, long
Exchange/CompareExchange  | int, long, single, double, and generic for T where T : class

All Interlocked methods are wait-free!
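A minimal sketch of the table above in action (my own example, not from the slides): an atomic counter via Increment/Read, plus a CompareExchange-based "set once" flag.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class InterlockedDemo {
    static long counter;
    static int initialized;   // 0 = no, 1 = yes

    static void Main() {
        // Read-modify-write done atomically, without any lock.
        Task t1 = Task.Run(() => { for (int i = 0; i < 100_000; i++) Interlocked.Increment(ref counter); });
        Task t2 = Task.Run(() => { for (int i = 0; i < 100_000; i++) Interlocked.Increment(ref counter); });
        Task.WaitAll(t1, t2);
        Console.WriteLine(Interlocked.Read(ref counter));   // atomic 64-bit read; prints 200000

        // CompareExchange: only the first caller sees 0 and swaps in 1.
        bool first = Interlocked.CompareExchange(ref initialized, 1, 0) == 0;
        bool second = Interlocked.CompareExchange(ref initialized, 1, 0) == 0;
        Console.WriteLine(first);    // True
        Console.WriteLine(second);   // False
    }
}
```

Interlocked.Read matters on 32-bit platforms, where a plain read of a long is not atomic (see the previous slide).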

2 Threads Executing. Expected Output?

int a = 0;
int b = 0;

void t1() {
    a = 1;
    Console.Write(b);
}

void t2() {
    b = 1;
    Console.Write(a);
}

Option | Result
A | 0 0
B | 0 1
C | 1 0
D | 1 1

First of all: the compiler can do almost "anything" with this code (e.g. reorder a = 1 after Console.Write(b))! So, let's suppose we disabled all compiler optimizations.

D (1 1) happens when both threads run simultaneously.

C (1 0) can happen due to preemption – Console.Write(b) is not atomic! It actually expands to a read followed by a call:

t1: a = 1
t1: temp1 = b
t1: Console.Write(temp1)

so the following interleaving prints 1 0:

t1: a = 1
t1: temp1 = b (== 0)
t2: b = 1
t2: temp2 = a (== 1)
t2: Console.Write(temp2) (prints 1)
t1: Console.Write(temp1) (prints 0)

Even A (0 0) can happen, due to optimizations in current processors – memory access reordering:

t1: a = 1 (stored in CPU1 cache)
t2: b = 1 (stored in CPU2 cache)
t1: temp1 = b (== 0 in CPU1 cache)
t2: temp2 = a (== 0 in CPU2 cache)
CPU1: writes back a (== 1)
CPU2: sees a == 1
CPU2: writes back b (== 1)
CPU1: sees b == 1
t1: Console.Write(temp1) (prints 0)
t2: Console.Write(temp2) (prints 0)

Concurrent Access

using System;
using System.Threading;

class Test {
    public static int result;
    public static bool finished;

    static void Thread2() {
        result = 123;
        finished = true;
    }

    static void Main() {
        finished = false;
        new Thread(Thread2).Start();
        for (;;) {
            if (finished) {
                Console.WriteLine("result = {0}", result);
                return;
            }
        }
    }
}

What can this print? It may print result = 123, but memory access reordering also allows result = 0 (finished = true becoming visible before result = 123). Can it be more wrong? Oh, YES! – the loop may never terminate at all. Compiler optimizations rule them all: the compiler is free to read finished once, cache it in a register, and spin forever.

Concurrent Access – Solution with Locks

using System;
using System.Threading;

class Test {
    public static int result;
    public static bool finished;

    static void Thread2() {
        lock (???) {
            result = 123;
            finished = true;
        }
    }

    static void Main() {
        finished = false;
        new Thread(Thread2).Start();
        for (;;) {
            if (finished) {
                Console.WriteLine("result = {0}", result);
                return;
            }
        }
    }
}

Concurrent Access – Wrong Solution with Locks

using System;
using System.Threading;

class Test {
    public static int result;
    public static bool finished;

    static void Thread2() {
        lock (typeof(Test)) {
            result = 123;
            finished = true;
        }
    }

    static void Main() {
        finished = false;
        new Thread(Thread2).Start();
        for (;;) {
            if (finished) {
                Console.WriteLine("result = {0}", result);
                return;
            }
        }
    }
}

Concurrent Access – Still Wrong Solution with Locks?

class Test {
    public int result;
    public bool finished;

    void Thread2() {
        lock (this) {
            result = 123;
            finished = true;
        }
    }

    void Thread1() {
        finished = false;
        new Thread(Thread2).Start();
        for (;;) {
            if (finished) {
                Console.WriteLine("result = {0}", result);
                return;
            }
        }
    }

    static void Main() {
        new Test().Thread1();
    }
}

Concurrent Access – Correct Solution with Locks

class Test {
    public int result;
    public bool finished;
    private object resultLock = new object();

    void Thread2() {
        lock (resultLock) {
            result = 123;
            finished = true;
        }
    }

    void Thread1() {
        finished = false;
        new Thread(Thread2).Start();
        for (;;) {
            if (finished) {
                Console.WriteLine("result = {0}", result);
                return;
            }
        }
    }

    static void Main() {
        new Test().Thread1();
    }
}
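The lock-based variants above take the lock only in the writer. A defensive sketch (my restructuring, not from the slides) takes the same lock in the reader too, so the reader can never observe finished == true without also observing result == 123:

```csharp
using System;
using System.Threading;

class Test {
    static int result;
    static bool finished;
    static readonly object resultLock = new object();

    static void Thread2() {
        lock (resultLock) {
            result = 123;
            finished = true;   // both writes published together on lock release
        }
    }

    static void Main() {
        new Thread(Thread2).Start();
        for (;;) {
            lock (resultLock) {            // read under the same lock
                if (finished) {
                    Console.WriteLine("result = {0}", result);
                    return;
                }
            }
        }
    }
}
```

With the lock on both sides there is no data race at all, so no reasoning about memory models is needed – at the price of acquiring the lock on every loop iteration.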


Concurrent Access – Volatile Magic!

using System;
using System.Threading;

class Test {
    public static int result;
    public static volatile bool finished;

    static void Thread2() {
        result = 123;
        finished = true;
    }

    static void Main() {
        finished = false;
        new Thread(Thread2).Start();
        for (;;) {
            if (finished) {
                Console.WriteLine("result = {0}", result);
                return;
            }
        }
    }
}

Volatile Access – Part I ECMA: An optimizing compiler that converts CIL to native code shall not remove any volatile operation, nor shall it coalesce multiple volatile operations into a single operation.

volatile → Limited Compiler Optimizations

using System;
using System.Threading;

class Test {
    public static int result;
    public static volatile bool finished;

    static void Thread2() {
        result = 123;
        finished = true;
    }

    static void Main() {
        finished = false;
        new Thread(Thread2).Start();
        for (;;) {
            if (finished) {
                Console.WriteLine("result = {0}", result);
                return;
            }
        }
    }
}

Volatile Access – Part II ECMA/C# Spec: A read of a volatile field is called a volatile read. A volatile read has "acquire semantics"; that is, it is guaranteed to occur prior to any references to memory that occur after it in the instruction sequence. ECMA/C# Spec: A write of a volatile field is called a volatile write. A volatile write has "release semantics"; that is, it is guaranteed to happen after any memory references prior to the write instruction in the instruction sequence. Both constraints are visible to and obeyed by the C# compiler and the CLR/JIT!

Concurrent Access – Volatile Access

using System;
using System.Threading;

class Test {
    public static int result;
    public static volatile bool finished;

    static void Thread2() {
        result = 123;
        finished = true;
    }

    static void Main() {
        finished = false;
        new Thread(Thread2).Start();
        for (;;) {
            if (finished) {
                Console.WriteLine("result = {0}", result);
                return;
            }
        }
    }
}

Volatile Access – Part III
System.Threading.Thread.VolatileRead/VolatileWrite from/to any field ↔ any read/write from/to a volatile field

ECMA: Thread.VolatileRead: Performs a volatile read from the specified address. The value at the given address is atomically loaded with acquire semantics, meaning that the read is guaranteed to occur prior to any references to memory that occur after the execution of this method in the current thread. It is recommended that Thread.VolatileRead and Thread.VolatileWrite be used in conjunction. Calling this method affects only this single access; other accesses to the same location are required to also be made using this method or Thread.VolatileWrite if the volatile semantics are to be preserved. This method has exactly the same semantics as using the volatile prefix on the load CIL instruction, except that atomicity is provided for all types, not just those 32 bits or smaller in size. [CORRECT]

MSDN: Thread.VolatileRead: Reads the value of a field. The value is the latest written by any processor in a computer, regardless of the number of processors or the state of processor cache. [NOT TRUE!]

Volatile Access – Part IV
System.Threading.Thread.VolatileRead/VolatileWrite from/to any field ↔ read/write from/to a volatile field
A volatile read/write is slower than a normal read/write → excessive use of volatile fields can degrade performance!
System.Threading.Thread.VolatileRead/VolatileWrite allow volatile reads/writes to be done only on a need-to-do basis – i.e. only in the parts of an algorithm with data races – or allow volatile writes without volatile reads, etc.
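In current .NET the same per-access semantics are exposed by the System.Threading.Volatile class (the successor of Thread.VolatileRead/VolatileWrite). A minimal sketch of the need-to-do-basis idea – only the racy flag accesses pay the volatile cost, while the field itself stays non-volatile:

```csharp
using System;
using System.Threading;

class Test {
    static int result;
    static bool finished;   // deliberately NOT declared volatile

    static void Thread2() {
        result = 123;                        // ordinary write
        Volatile.Write(ref finished, true);  // release: result is published first
    }

    static void Main() {
        new Thread(Thread2).Start();
        while (!Volatile.Read(ref finished)) { }  // acquire on every iteration
        Console.WriteLine("result = {0}", result);
    }
}
```

Because the write is a release and the read an acquire, a reader that sees finished == true is guaranteed to also see result == 123.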

2 Threads Executing. Expected Output?

volatile int a = 0;
volatile int b = 0;

void t1() {
    a = 1;
    Console.Write(b);
}

void t2() {
    b = 1;
    Console.Write(a);
}

Option | Result
A | 0 0 – still possible due to volatile read's acquire semantics!
B | 0 1
C | 1 0
D | 1 1

Thread.MemoryBarrier() ECMA: Guarantees that all subsequent loads or stores from the current thread will not access memory until after all previous loads and stores from the current thread have completed, as observed from this or other threads. MSDN: The processor executing the current thread cannot reorder instructions in such a way that memory accesses prior to the call to MemoryBarrier execute after memory accesses that follow the call to MemoryBarrier.

2 Threads Executing. Expected Output?

int a = 0;
int b = 0;

void t1() {
    a = 1;
    Thread.MemoryBarrier();
    Console.Write(b);
}

void t2() {
    b = 1;
    Console.Write(a);
}

Option | Result
A | 0 0 (Finally OK – cannot happen here!)
B | 0 1
C | 1 0 (Still possible due to preemption)
D | 1 1 (when running simultaneously)

2 Threads Executing. Expected Output?
Warning: volatile should still be considered in most situations – to avoid C#/JIT compiler optimizations.

volatile int a = 0;
volatile int b = 0;

void t1() {
    a = 1;
    Thread.MemoryBarrier();
    Console.Write(b);
}

void t2() {
    b = 1;
    Console.Write(a);
}

Option | Result
A | 0 0 (Finally OK – cannot happen here!)
B | 0 1
C | 1 0 (Still possible due to preemption)
D | 1 1 (when running simultaneously)
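A runnable harness for this experiment (my own sketch, not from the slides). It fences both threads, which by the classic Dekker argument rules out the 0 0 outcome on any compliant runtime; the iteration count is arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class BarrierDemo {
    static int a, b, r1, r2;

    static void Main() {
        var seen = new HashSet<string>();
        for (int i = 0; i < 20_000; i++) {
            a = 0; b = 0;
            // Full fence between the store and the load in BOTH threads.
            Task t1 = Task.Run(() => { a = 1; Thread.MemoryBarrier(); r1 = b; });
            Task t2 = Task.Run(() => { b = 1; Thread.MemoryBarrier(); r2 = a; });
            Task.WaitAll(t1, t2);              // WaitAll also synchronizes r1/r2
            seen.Add($"{r1} {r2}");
        }
        Console.WriteLine(seen.Contains("0 0"));   // False
    }
}
```

The argument: if t1 reads b == 0, its fenced store a = 1 is globally ordered before t2's store b = 1, so t2's fenced load of a must return 1 – at least one thread observes the other's write.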

Concurrent Access – Solution with Locks

using System;
using System.Threading;

class Test {
    public static int result;
    public static bool finished;
    private static object resultLock = new object();

    static void Thread2() {
        lock (resultLock) {
            result = 123;
            finished = true;
        }
    }

    static void Main() {
        finished = false;
        new Thread(Thread2).Start();
        for (;;) {
            if (finished) {
                Console.WriteLine("result = {0}", result);
                return;
            }
        }
    }
}

Does it really work? If yes, then why?

Implicit Memory Barriers
Many threading APIs include an implicit memory barrier (aka memory fence), e.g.:
Monitor.Enter/Monitor.Exit
Interlocked.*
Thread.Start

System.Collections.Concurrent
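A minimal sketch of this namespace in use (my own example; the slide itself gives only the namespace name): ConcurrentQueue and ConcurrentDictionary handle concurrent mutation internally, so no explicit lock is needed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ConcurrentDemo {
    static void Main() {
        var queue = new ConcurrentQueue<int>();
        var counts = new ConcurrentDictionary<string, int>();

        // Many threads mutate both collections at once, safely.
        Parallel.For(0, 1000, i => {
            queue.Enqueue(i);
            counts.AddOrUpdate("items", 1, (_, old) => old + 1);
        });

        Console.WriteLine(queue.Count);        // 1000
        Console.WriteLine(counts["items"]);    // 1000
        if (queue.TryDequeue(out int first))   // Try* pattern instead of exceptions
            Console.WriteLine(first >= 0);     // True
    }
}
```

Note the Try* pattern (TryDequeue, TryRemove, …): in a concurrent collection a check-then-act sequence like "if not empty, dequeue" would race, so the check and the act are merged into one atomic call.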

System.Collections.Immutable
ImmutableArray<T>
ImmutableDictionary<TKey, TValue>
ImmutableHashSet<T>
ImmutableList<T>
ImmutableQueue<T>
ImmutableSortedDictionary<TKey, TValue>
ImmutableSortedSet<T>
ImmutableStack<T>

System.Collections.Immutable

var list = ImmutableList.Create<string>();
list = list.Add("first");
list = list.Add("second");
list = list.Add("third");

Each Add returns a new list; for many additions, a builder avoids the intermediate allocations:

var builder = ImmutableList.CreateBuilder<string>();
builder.Add("first");
builder.Add("second");
builder.Add("third");
var list = builder.ToImmutable();
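The point of the Add-returns-a-new-list pattern above is persistence: earlier versions remain valid, so a thread can safely keep reading its snapshot while others "modify" the list. A minimal sketch (assumes the System.Collections.Immutable package is available):

```csharp
using System;
using System.Collections.Immutable;

class ImmutableDemo {
    static void Main() {
        var empty = ImmutableList.Create<string>();
        var one = empty.Add("first");     // new list; empty is untouched
        var two = one.Add("second");      // new list; one is untouched

        Console.WriteLine(empty.Count);   // 0
        Console.WriteLine(one.Count);     // 1
        Console.WriteLine(two.Count);     // 2
    }
}
```

Because no version is ever mutated, sharing these collections across threads needs no locks at all – readers just hold a reference to one immutable version.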