Bart J.F. De Smet Software Development Engineer Microsoft Corporation Session Code: DTL206 Wishful thinking?

Agenda: The concurrency landscape; Language headaches; .NET 4.0 facilities (Task Parallel Library, PLINQ, Coordination Data Structures, Asynchronous programming); Incubation projects; Summary

Moore’s law The number of transistors incorporated in a chip will approximately double every 24 months. Gordon Moore – Intel – 1965 Let’s sell processors

Moore’s law today It can't continue forever. The nature of exponentials is that you push them out and eventually disaster happens. Gordon Moore – Intel – 2005 Let’s sell even more processors

Hardware Paradigm Shift “… we see a very significant shift in what architectures will look like in the future... fundamentally the way we've begun to look at doing that is to move from instruction level concurrency to … multiple cores per die. But we're going to continue to go beyond there. And that just won't be in our server lines in the future; this will permeate every architecture that we build. All will have massively multicore implementations.” Pat Gelsinger, Chief Technology Officer, Senior Vice President, Intel Corporation – Intel Developer Forum, Spring 2004
[Chart: power density (W/cm2) of Pentium-class processors from the ’70s through the ’10s, climbing toward “hot plate”, “nuclear reactor”, “rocket nozzle”, and “sun’s surface” levels – today’s architecture: heat becoming an unmanageable problem]
[Chart: many-core peak parallel GOPS vs. single-threaded performance growing ~10% per year – an ~80x parallelism opportunity – to grow, to keep up, we must embrace parallel computing]

Problem statement
Shared mutable state needs synchronization primitives. Locks are problematic: risk for contention, poor discoverability (SyncRoot anyone?), not composable, difficult to get right (deadlocks, etc.). Coarse-grained concurrency: threads are well-suited for large units of work, but context switching is expensive. Asynchronous programming.

What can go wrong? Races Deadlocks Livelocks Lock convoys Cache coherency Overheads Lost event notifications Broken serializability Priority inversion

Microsoft Parallel Computing Initiative
A layered stack: applications and domain libraries on top; programming models & languages (VB, C#, F#) plus developer tooling for constructing parallel applications; runtime, platform, OS, and hypervisor for executing fine-grained parallel applications and coordinating system resources/services; hardware at the bottom.

Agenda: The concurrency landscape; Language headaches; .NET 4.0 facilities (Task Parallel Library, PLINQ, Coordination Data Structures, Asynchronous programming); Incubation projects; Summary

Languages: two extremes. At one end, the LISP heritage (Haskell, ML): no mutable state, “fundamentalist” functional programming. At the other, the Fortran heritage (C, C++, C#, VB): mutable state. F# sits in between.

Mutability
Mutable by default (C# et al.) – synchronization required:
    int x = 5; // Share out x
    x++;
Immutable by default (F# et al.) – no locking required:
    let x = 5 // Share out x
    // Can’t mutate x
Explicit opt-in to mutation:
    let mutable x = 5 // Share out x
    x <- x + 1

Side-effects will kill you. Elimination of common sub-expressions? Can (DateTime.Now, DateTime.Now) be rewritten as let now = DateTime.Now in (now, now)? The signature static DateTime Now { get; } doesn’t reveal the side-effect, so the runtime is out of control and can’t optimize the code. Types don’t reveal side-effects – hence the Haskell concept of the IO monad. Did you know? LINQ is a monad!

Monads for dummies – Promote (Return): lift a plain value T into the monad, T -> IO<T>.

Monads for dummies – Combine (Bind): combine a computation IO<T> with a function T -> IO<R> into an IO<R>. The LINQ analogue is SelectMany: IEnumerable<R> SelectMany<T, R>(IEnumerable<T> source, Func<T, IEnumerable<R>> selector).
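To make “LINQ is a monad” concrete, here is a minimal C# sketch (not from the original slides; the array sources are illustrative) showing that query syntax with two from clauses desugars into SelectMany, the bind operator for IEnumerable<T>:

    using System;
    using System.Linq;

    class MonadDemo
    {
        static void Main()
        {
            var xs = new[] { 1, 2, 3 };
            var ys = new[] { 10, 20 };

            // Query syntax with two 'from' clauses...
            var q1 = from x in xs
                     from y in ys
                     select x + y;

            // ...is compiled into a call to the monadic bind, SelectMany.
            var q2 = xs.SelectMany(x => ys, (x, y) => x + y);

            Console.WriteLine(string.Join(",", q1)); // 11,21,12,22,13,23
            Console.WriteLine(string.Join(",", q2)); // same result
        }
    }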

Languages: two roadmaps? Making C# better: add safety nets? Immutability, purity constructs, linear types, Software Transactional Memory for the kamikaze style of concurrency, simplify common patterns. Making Haskell mainstream: just right? Too academic? Not a smooth upgrade path? C# and Haskell both heading toward “nirvana”.

Taming side-effects in F# Bart J.F. De Smet Software Development Engineer Microsoft Corporation

Agenda: The concurrency landscape; Language headaches; .NET 4.0 facilities (Task Parallel Library, PLINQ, Coordination Data Structures, Asynchronous programming); Incubation projects; Summary

Parallel Extensions Architecture
A .NET program (compiled to IL by the C#, VB, C++, F#, or any other .NET compiler) uses:
PLINQ Execution Engine – declarative queries and query analysis; data partitioning (chunk, range, hash, striped, repartitioning); operator types (map, scan, build, search, reduction); merging (async/pipeline, synchronous, order preserving, sorting, ForAll). PLINQ targets TPL or CDS.
Task Parallel Library (TPL) – task APIs, task parallelism, futures, scheduling; parallel algorithms.
Coordination Data Structures (CDS) – thread-safe collections, synchronization types, coordination types.
Below that: OS scheduling primitives (also UMS in Windows 7 and up), across processors 1 through p.

Task Parallel Library – Tasks (System.Threading.Tasks)
Task – parent-child relationships, explicit grouping, waiting and cancelation.
Task<TResult> – tasks that produce values, also known as futures.
[Diagram: a Parallel construct fanning out into Task 1, Task 2, …, Task N]
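A short C# sketch (method and variable names are mine, not from the deck) of a Task<int> used as a future and a parent task with attached children:

    using System;
    using System.Threading.Tasks;

    class TaskDemo
    {
        static void Main()
        {
            // A Task<int> acts as a future: it produces a value we can ask for later.
            Task<int> sum = Task.Factory.StartNew(() =>
            {
                int total = 0;
                for (int i = 1; i <= 100; i++) total += i;
                return total;
            });

            // A parent task with attached children; waiting on the parent
            // waits for the whole group.
            Task parent = Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < 3; i++)
                {
                    int id = i;
                    Task.Factory.StartNew(
                        () => Console.WriteLine("child {0}", id),
                        TaskCreationOptions.AttachedToParent);
                }
            });

            parent.Wait();                 // waits for parent + attached children
            Console.WriteLine(sum.Result); // blocks until the future's value is ready: 5050
        }
    }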

Work Stealing
Internally, the runtime uses work-stealing techniques and lock-free concurrent task queues. Work stealing has provably good locality and work-distribution properties.
[Diagram: per-processor task queues p1, p2, …, p]

Example code to parallelize

    void MultiplyMatrices(int size, double[,] m1, double[,] m2, double[,] result)
    {
        for (int i = 0; i < size; i++)
        {
            for (int j = 0; j < size; j++)
            {
                result[i, j] = 0;
                for (int k = 0; k < size; k++)
                {
                    result[i, j] += m1[i, k] * m2[k, j];
                }
            }
        }
    }

Solution today

    int N = size;
    int P = 2 * Environment.ProcessorCount;
    int Chunk = N / P;                                     // size of a work chunk
    ManualResetEvent signal = new ManualResetEvent(false);
    int counter = P;                                       // counter limits kernel transitions
    for (int c = 0; c < P; c++) {                          // for each chunk
        ThreadPool.QueueUserWorkItem(o => {
            int lc = (int)o;
            for (int i = lc * Chunk;                       // process one chunk
                 i < (lc + 1 == P ? N : (lc + 1) * Chunk); // respect upper bound
                 i++) {
                // original loop body
                for (int j = 0; j < size; j++) {
                    result[i, j] = 0;
                    for (int k = 0; k < size; k++) {
                        result[i, j] += m1[i, k] * m2[k, j];
                    }
                }
            }
            if (Interlocked.Decrement(ref counter) == 0) { // efficient interlocked ops
                signal.Set();                              // and kernel transition only when done
            }
        }, c);
    }
    signal.WaitOne();

Callouts: error prone; high-overhead tricks; static work distribution; knowledge of synchronization primitives required; heavy synchronization; lack of thread reuse.

Solution with Parallel Extensions – structured parallelism

    void MultiplyMatrices(int size, double[,] m1, double[,] m2, double[,] result)
    {
        Parallel.For(0, size, i =>
        {
            for (int j = 0; j < size; j++)
            {
                result[i, j] = 0;
                for (int k = 0; k < size; k++)
                {
                    result[i, j] += m1[i, k] * m2[k, j];
                }
            }
        });
    }

Task Parallel Library – Loops
Loops are a common source of work in programs. The System.Threading.Parallel class provides parallelism when iterations are independent: the body doesn’t depend on mutable state (e.g. static variables, or writing to local variables used in subsequent iterations) – which is why immutability gains attention. The calls are synchronous: all iterations finish, regularly or exceptionally.

    for (int i = 0; i < n; i++) work(i);   =>   Parallel.For(0, n, i => work(i));
    foreach (T e in data) work(e);         =>   Parallel.ForEach(data, e => work(e));

Task Parallel Library Bart J.F. De Smet Software Development Engineer Microsoft Corporation

Amdahl’s law
Maximum speedup: S = 1 / (sum over k of Pk / Sk), where Sk is the speed-up factor for portion k and Pk is the percentage of instructions in part k that can be parallelized.
Simplified: S = 1 / ((1 − P) + P / N), where P is the percentage of instructions that can be parallelized and N is the number of processors. The sky is not the limit.

Amdahl’s law by example: the theoretical maximum speedup is determined by the amount of linear (sequential) code. For instance, with P = 90% parallelizable, eight processors give at most 1 / (0.1 + 0.9 / 8) ≈ 4.7x, and even infinitely many processors give at most 10x.

Performance Tips: Target compute-intensive work and/or large data sets; work done should be at least 1,000s of cycles. Do not be gratuitous in task creation – tasks are lightweight, but still require object allocation, etc. Parallelize only outer loops where possible, unless N is insufficiently large to offer enough parallelism. Prefer isolation & immutability over synchronization (synchronization == !scalable); try to avoid shared data. Have realistic expectations: Amdahl’s Law says speedup will be fundamentally limited by the amount of sequential computation; Gustafson’s Law asks what if you add more data, thus increasing the parallelizable percentage of the application.

Parallel LINQ (PLINQ)
Enables LINQ developers to leverage parallel hardware. Fully supports all .NET Standard Query Operators and abstracts away the hard work of using parallelism: it partitions and merges data intelligently (classic data parallelism). Minimal impact to the existing LINQ programming model – the AsParallel extension method opts in, with optional preservation of input ordering (AsOrdered). Query syntax enables the runtime to auto-parallelize: an automatic way to generate more tasks, like Parallel; graph analysis determines how to do it, with very little synchronization internally – highly efficient.

    var q = from p in people
            where p.Name == queryInfo.Name && p.State == queryInfo.State &&
                  p.Year >= yearStart && p.Year <= yearEnd
            orderby p.Year ascending
            select p;

[Diagram: the query fans out into Task 1 … Task N]
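A minimal sketch (the data source here is made up, not from the slide) of opting a query into PLINQ with AsParallel, preserving order with AsOrdered:

    using System;
    using System.Linq;

    class PlinqDemo
    {
        static void Main()
        {
            int[] numbers = Enumerable.Range(1, 1000000).ToArray();

            // AsParallel opts the query into PLINQ; AsOrdered preserves the input
            // order in the results, at some extra cost.
            var evenSquares = from n in numbers.AsParallel().AsOrdered()
                              where n % 2 == 0
                              select (long)n * n;

            Console.WriteLine(evenSquares.First());  // 4
            Console.WriteLine(evenSquares.Count());  // 500000
        }
    }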

PLINQ Bart J.F. De Smet Software Development Engineer Microsoft Corporation

Coordination Data Structures – new synchronization primitives (System.Threading)
Barrier – multi-phased algorithms; tasks signal and wait for phases.
CountdownEvent – has an initial counter value; gets signaled when the count reaches zero.
LazyInitializer – lazy initialization routines; a reference-type variable gets initialized lazily.
SemaphoreSlim – slim brother of Semaphore (which goes kernel mode).
SpinLock, SpinWait – loop-based wait (“spinning”); avoids a context switch or kernel-mode transition.
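A small C# sketch (participant and work-item counts are arbitrary, not from the deck) of Barrier phases and a CountdownEvent:

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class CdsDemo
    {
        static void Main()
        {
            // Barrier: 3 participants run two phases in lockstep.
            using (var barrier = new Barrier(3, b => Console.WriteLine("phase {0} done", b.CurrentPhaseNumber)))
            {
                var tasks = new Task[3];
                for (int i = 0; i < 3; i++)
                {
                    int id = i;
                    tasks[i] = Task.Factory.StartNew(() =>
                    {
                        Console.WriteLine("task {0}: phase 0", id);
                        barrier.SignalAndWait();   // everyone reaches the phase boundary
                        Console.WriteLine("task {0}: phase 1", id);
                        barrier.SignalAndWait();
                    });
                }
                Task.WaitAll(tasks);
            }

            // CountdownEvent: the main thread waits until 5 work items have signaled.
            using (var done = new CountdownEvent(5))
            {
                for (int i = 0; i < 5; i++)
                    Task.Factory.StartNew(() => done.Signal());
                done.Wait();
                Console.WriteLine("all work items finished");
            }
        }
    }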

Coordination Data Structures – concurrent collections (System.Collections.Concurrent)
BlockingCollection<T> – producer/consumer scenarios; blocks when no data is available (consumer) and when no space is available (producer).
ConcurrentBag<T>, ConcurrentDictionary<TKey, TValue>, ConcurrentQueue<T>, ConcurrentStack<T> – thread-safe and scalable collections, as lock-free as possible.
Partitioner – facilities to partition data in chunks, e.g. for PLINQ partitioning problems.
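A minimal producer/consumer sketch with BlockingCollection<T> (the bounded capacity of 4 and the item count are illustrative):

    using System;
    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    class ProducerConsumerDemo
    {
        static void Main()
        {
            // Bounded to 4 items: the producer blocks when the buffer is full,
            // the consumer blocks when it is empty.
            using (var buffer = new BlockingCollection<int>(boundedCapacity: 4))
            {
                var producer = Task.Factory.StartNew(() =>
                {
                    for (int i = 0; i < 10; i++)
                        buffer.Add(i);
                    buffer.CompleteAdding();       // signal "no more items"
                });

                var consumer = Task.Factory.StartNew(() =>
                {
                    // GetConsumingEnumerable blocks until items arrive and ends
                    // once CompleteAdding has been called and the buffer drains.
                    foreach (int item in buffer.GetConsumingEnumerable())
                        Console.WriteLine("consumed {0}", item);
                });

                Task.WaitAll(producer, consumer);
            }
        }
    }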

Coordination Data Structures Bart J.F. De Smet Software Development Engineer Microsoft Corporation

Asynchronous workflows in F#
A language feature unique to F#, based on the theory of monads (but much more exhaustive compared to LINQ…): overloadable meaning for specific keywords. Continuation passing style: not 'a -> 'b but 'a -> ('b -> unit) -> unit – in C# style, Action<T, Action<R>> – the function takes a continuation that receives the computation result. Core concept: async { /* code */ }, syntactic sugar for the keywords inside the block, e.g. let!, do!, use!.

Asynchronous workflows in F#

    let processAsync i = async {
        use stream = File.OpenRead(sprintf "Image%d.tmp" i)
        let! pixels = stream.AsyncRead(numPixels)
        let pixels' = transform pixels i
        use out = File.OpenWrite(sprintf "Image%d.done" i)
        do! out.AsyncWrite(pixels') }

    let processAsyncDemo =
        printfn "async demo..."
        let tasks = [ for i in 1 .. numImages -> processAsync i ]
        Async.RunSynchronously (Async.Parallel tasks) |> ignore    // run tasks in parallel
        printfn "Done!"

Callout: let! is sugar for continuation passing, roughly
    stream.Read(numPixels, fun pixels ->
        let pixels' = transform pixels i
        use out = File.OpenWrite(sprintf "Image%d.done" i)
        do! out.AsyncWrite(pixels'))

Asynchronous workflows in F# Bart J.F. De Smet Software Development Engineer Microsoft Corporation

Reactive Fx
First-class events in .NET. IObservable<T> is the dual of the IEnumerable<T> interface: pull versus push. Pull (active): IEnumerable<T> and foreach. Push (passive): raise events and event handlers. Events based on functions – composition at its best, with operators defined as “LINQ to Events”: a realization of the continuation monad.

IObservable<T> and IObserver<T>

    // Dual of IEnumerable<T>
    public interface IObservable<out T>
    {
        // Return value: a way to unsubscribe
        IDisposable Subscribe(IObserver<T> observer);
    }

    // Dual of IEnumerator<T>
    public interface IObserver<in T>
    {
        // IEnumerator.MoveNext return value: signals the last event
        void OnCompleted();
        // IEnumerator.MoveNext exceptional return
        void OnError(Exception error);
        // IEnumerator<T>.Current property
        void OnNext(T value);
    }

Callouts: virtually two return types (OnCompleted/OnError); IObservable<T> is covariant (out T), IObserver<T> is contravariant (in T).
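A tiny self-contained C# sketch (hypothetical NumberSource/PrintingObserver types, not part of any Rx library) of the push model using just these two interfaces:

    using System;

    // A toy observable that pushes a fixed sequence of values to subscribers.
    class NumberSource : IObservable<int>
    {
        private readonly int[] values = { 1, 2, 3 };

        public IDisposable Subscribe(IObserver<int> observer)
        {
            foreach (int v in values)
                observer.OnNext(v);    // push each value
            observer.OnCompleted();    // signal the last event
            return new Unsubscriber();
        }

        private class Unsubscriber : IDisposable
        {
            public void Dispose() { /* nothing to tear down in this toy source */ }
        }
    }

    class PrintingObserver : IObserver<int>
    {
        public void OnNext(int value) { Console.WriteLine("got {0}", value); }
        public void OnError(Exception error) { Console.WriteLine("error: {0}", error.Message); }
        public void OnCompleted() { Console.WriteLine("done"); }
    }

    class RxDemo
    {
        static void Main()
        {
            using (new NumberSource().Subscribe(new PrintingObserver()))
            {
                // In a real push-based source, values would keep arriving
                // asynchronously until the subscription is disposed.
            }
        }
    }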

ReactiveFx Bart J.F. De Smet Software Development Engineer Microsoft Corporation Visit channel9.msdn.com for info

Agenda: The concurrency landscape; Language headaches; .NET 4.0 facilities (Task Parallel Library, PLINQ, Coordination Data Structures, Asynchronous programming); Incubation projects; Summary

Axum – a DevLabs project (previously “Maestro”). Coordination between components through “disciplined sharing” and the actor model: agents communicate via messages, with channels to exchange data via ports. Language features (based on C#): declarative data pipelines and protocols, side-effect-free functions, asynchronous methods, isolated methods. Also suitable in a distributed setting.

Channels for message exchange

    agent Program : channel Microsoft.Axum.Application
    {
        public Program()
        {
            string[] args = receive(PrimaryChannel::CommandLine);
            PrimaryChannel::ExitCode <-- 0;
        }
    }

Agents and channels – send/receive primitives

    channel Adder
    {
        input int Num1;
        input int Num2;
        output int Sum;
    }

    agent AdderAgent : channel Adder
    {
        public AdderAgent()
        {
            int result = receive(PrimaryChannel::Num1) +
                         receive(PrimaryChannel::Num2);
            PrimaryChannel::Sum <-- result;
        }
    }

Protocols – a state transition diagram on the channel

    channel Adder
    {
        input int Num1;
        input int Num2;
        output int Sum;

        Start:   { Num1 -> GotNum1; }
        GotNum1: { Num2 -> GotNum2; }
        GotNum2: { Sum  -> End; }
    }

Use of pipelines – a declarative description of data flow

    agent MainAgent : channel Microsoft.Axum.Application
    {
        // Mathematical (side-effect-free) function
        function int Fibonacci(int n)
        {
            if (n <= 1) return n;
            return Fibonacci(n - 1) + Fibonacci(n - 2);
        }

        int c = 10;

        void ProcessResult(int n)
        {
            Console.WriteLine(n);
            if (--c == 0) PrimaryChannel::ExitCode <-- 0;
        }

        public MainAgent()
        {
            var nums = new OrderedInteractionPoint<int>();
            nums ==> Fibonacci ==> ProcessResult;
            for (int i = 0; i < c; i++)
                nums <-- i;
        }
    }

Domains – the unit of sharing between agents

    domain Chatroom
    {
        private string m_Topic;
        private int m_UserCount;

        reader agent User : channel UserCommunication
        {
            //...
        }

        writer agent Administrator : channel AdminCommunication
        {
            //...
        }
    }

Asynchronous methods – blocking operations inside

    private asynchronous void ReadFile(string path)
    {
        Stream stream = new Stream(...);
        int numRead = stream.Read(...);
        while (numRead > 0)
        {
            ...
            numRead = stream.Read(...);
        }
    }

Axum in a nutshell Bart J.F. De Smet Software Development Engineer Microsoft Corporation

STM.NET – another DevLabs project. Cutting edge, released 7/28: a specialized fork of .NET 4.0 Beta 1 (CLR modifications required). First-class transactions on memory as an alternative to locking: an “optimistic” concurrency methodology – make modifications, roll back changes on conflict. Core concept: atomic { /* code */ }

Transactional memory – a subtle difference
Problems with locks: potential for deadlocks… …and more ugliness; granularity matters a lot; they don’t compose well.

    atomic {
        m_x++;
        m_y--;
        throw new MyException()
    }

    lock (GlobalStmLock) {
        m_x++;
        m_y--;
        throw new MyException()
    }

Bank account sample

    public static void Transfer(BankAccount from, BankAccount backup, BankAccount to, int amount)
    {
        Atomic.Do(() =>
        {
            // Be optimistic, credit the beneficiary first
            to.ModifyBalance(amount);

            // Find the appropriate funds in source accounts
            try
            {
                from.ModifyBalance(-amount);
            }
            catch (OverdraftException)
            {
                backup.ModifyBalance(-amount);
            }
        });
    }

Atomic cell update

    public class SingleCellQueue<T> where T : class
    {
        T m_item;

        public T Get()
        {
            atomic
            {
                T temp = m_item;
                if (temp == null) retry;
                m_item = null;
                return temp;
            }
        }

        public void Put(T item)
        {
            atomic
            {
                if (m_item != null) retry;
                m_item = item;
            }
        }
    }

(Callout: “Don’t forget”)

The hard truth about STM
Great features: ACID, optimistic concurrency, transparent rollback and re-execute, System.Transactions (LTM) and DTC support. Implementation: instrumentation of shared-state access, JIT compiler modification, no hardware support currently. Result: 2x to 7x serial slowdown (in the alpha prototype), but improved parallel scalability.

STM.NET Bart J.F. De Smet Software Development Engineer Microsoft Corporation Visit msdn.microsoft.com/devlabs

DryadLINQ
Dryad: infrastructure for cluster computation, built around the concept of a job. DryadLINQ: LINQ over Dryad – decomposition of the query and distribution over computation nodes, roughly similar to PLINQ, à la “map-reduce”. The declarative approach works.

DryadLINQ = LINQ + Dryad
From C# to vertex code, a query plan (Dryad job), and data collection results:

    Collection<T> collection;
    bool IsLegal(Key k);
    string Hash(Key);

    var results = from c in collection
                  where IsLegal(c.key)
                  select new { Hash(c.key), c.value };

DryadLINQ Bart J.F. De Smet Software Development Engineer Microsoft Corporation Visit research.microsoft.com/dryad

Agenda: The concurrency landscape; Language headaches; .NET 4.0 facilities (Task Parallel Library, PLINQ, Coordination Data Structures, Asynchronous programming); Incubation projects; Summary

Summary: parallel programming requires thinking – avoid side-effects, prefer immutability. Act 1 = the library approach in .NET 4.0: Task Parallel Library, Parallel LINQ, Coordination Data Structures, asynchronous patterns (+ a bit of language sugar). Act 2 = different approaches are lurking: Software Transactional Memory, purification of languages.

© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.