Bartosz Milewski

- Cilk (MIT, Cilk Arts, Intel Cilk+)
- Parallel Haskell
- Microsoft UMS Threads (User-Mode Scheduling; 64-bit and Server only)
- Microsoft PPL (Parallel Patterns Library)
- Intel TBB (Threading Building Blocks)
- JVM/.NET-based languages and libraries

- Programmer: what can be run in parallel
  - Identify tasks
- System: what will be run in parallel
  - Assign tasks to threads/processors
- System: load balancing, work stealing
- System: dealing with blocked tasks
  - Take them off the thread, reuse the thread
  - Create a new UMS thread

[Figure: Tasks 1-6 assigned to Cores 1-4; some cores are oversubscribed while others are idle.]

[Figure: Tasks 1-6 assigned to Cores 1-4; all cores busy.]

[Figure: Cores 1-4 with Tasks 2 and 4; some cores busy, others idle.]

- Abstraction level
  - Threads: low
  - Tasks: high
- Resource usage
  - Threads: heavyweight
  - Tasks: lightweight
- Problem solving
  - Threads: improve latency
  - Tasks: improve throughput
- Level of parallelism
  - Threads: large grain
  - Tasks: fine grain

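    #include <future>
    #include <iostream>
    #include <string>
    #include <thread>

    // Thread version: the result travels through an explicit promise/future pair.
    void thFun(std::promise<std::string>& prms)
    {
        std::string str("Hello from future!");
        prms.set_value(str);
    }

    void test()
    {
        std::promise<std::string> prms;
        std::future<std::string> ftr = prms.get_future();
        std::thread th(&thFun, std::ref(prms));
        std::cout << "Hello from main!\n";
        std::string str = ftr.get();  // blocks until thFun calls set_value
        th.join();
        std::cout << str << std::endl;
    }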

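    #include <future>
    #include <iostream>
    #include <string>

    // Task version: the return value becomes the result of the future.
    std::string taskFun()
    {
        std::string str("Hello from task!");
        return str;
    }

    void test()
    {
        std::future<std::string> ftr = std::async(&taskFun);
        std::cout << "Hello from main!\n";
        std::string str = ftr.get();  // forces the task if it was deferred
        std::cout << str << std::endl;
    }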

(30.6.8) The template function async provides a mechanism to launch a function potentially in a new thread and provides the result of the function in a future object with which it shares a shared state.

- Launch policies
  - launch::async
  - launch::deferred
  - launch::any (default; spelled launch::async | launch::deferred in the published C++11 standard)
- Deferred tasks execute in the context of the forcing thread
- What if the future is not forced?

- Global, file-static, class-static, function-static
- Each thread initializes them separately
- Each thread destroys them at the end
- Tasks with launch::async: thread_locals must be destroyed
  - Before future::get (or future::wait) returns, or
  - Before the future is destructed without forcing
- Tasks with launch::deferred: thread_locals follow the lifetime of the forcing thread

- thread_local “as if” task_local
  - At task completion, destroy thread_locals
  - Starting a new task on an old thread: re-initialize all thread_locals
- Problems
  - Library/runtime hooks for thread_local management
  - TlsAlloc
  - DLL_THREAD_ATTACH/DETACH calls to DllMain

- Take a blocked task off a thread
  - Detect blocked tasks
  - Save task state, including thread_locals
- Restore a task to a thread
  - Detect unblocked tasks
  - Restore task state, including thread_locals
- Problems
  - Mutexes are thread-aware
  - Potential for spurious deadlocks (not really)

- C++ tasks are ill-fitted for task-based parallelism
- When designing a task-parallel library, consider:
  - thread_local vs. task-local
  - Mutex (im-)mobility
    - See the description of Lockable in the Standard
  - Blocking tasks

- Blog: tasks-in-c11-not-quite-there-yet/
- C++11 tutorial: ncurrencyTutorialPartFive.aspx
- Proposals:
  - std.org/jtc1/sc22/wg21/docs/papers/2009/n2880.html
  - std.org/jtc1/sc22/wg21/docs/papers/2010/n3038.html