Data Parallelism: Task Parallel Library (TPL), the use of lambdas, and the Map-Reduce Pattern. FEN 2014, UCN Teknologi/act2learn.

Presentation transcript:

Data Parallelism: Task Parallel Library (TPL), the use of lambdas, and the Map-Reduce Pattern. FEN 2014, UCN Teknologi/act2learn.

Multi Cores and Multithreading Today it seems that Moore's law no longer translates into higher processor speed: we are not getting faster CPUs any more, we are getting more cores instead. Applications should take advantage of this by being multithreaded. Demo: demos\parallel1

Parallel For-loop Note: Parallel.For(...) is actually a higher order function taking a lambda (the loop body) as argument.
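A minimal sketch of such a loop (the array and the loop body are illustrative assumptions, not the code from demos\parallel1):

    using System;
    using System.Threading.Tasks;

    class ParallelForDemo
    {
        static void Main()
        {
            double[] data = new double[1000000];     // hypothetical input array

            // Parallel.For is a higher order function: the last argument is a lambda, the loop body.
            Parallel.For(0, data.Length, i =>
            {
                data[i] = Math.Sqrt(i);              // each iteration writes only its own slot
            });

            Console.WriteLine(data[data.Length - 1]);
        }
    }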

How much can we gain from many cores? Amdahl's law shows the maximum speedup factor relative to the number of processors. The "parallel portion" is the percentage of the code that can be parallelized. How many cores do you have? (Remember to count your graphics card's processors.)
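The law itself, with P the parallel portion and N the number of processors:

    Speedup(N) = 1 / ((1 - P) + P/N)

For example, with P = 0.90 and N = 8 the speedup is about 4.7, and even with an unlimited number of cores it can never exceed 1/(1 - P) = 10.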

From Gene Amdahl, founder of Amdahl Corporation, now a part of Fujitsu.

Number of cores vs. number of threads? Watch the task manager while you experiment with noOfThreads. Too few or too many threads? Demo: demos\parallel2
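One way to run the experiment yourself (ParallelOptions.MaxDegreeOfParallelism is the standard TPL knob; the noOfThreads name is from the demo, the rest of the code is an illustrative sketch):

    using System;
    using System.Threading.Tasks;

    class ThreadCountDemo
    {
        static void Main()
        {
            int noOfThreads = 4;   // try values below and above Environment.ProcessorCount
            var options = new ParallelOptions { MaxDegreeOfParallelism = noOfThreads };

            Parallel.For(0, 100, options, i =>
            {
                // deliberately CPU-heavy body so the load shows up in the task manager
                double x = 0;
                for (int k = 0; k < 10000000; k++)
                    x += Math.Sqrt(k);
            });

            Console.WriteLine("Cores reported by .NET: " + Environment.ProcessorCount);
        }
    }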

Parallelism and higher order functions (lambdas) The constructor used in the demo is actually a higher order function: it receives a function as its argument.
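The code from the slide is not reproduced in the transcript; the point is the same as in this sketch, where the Thread constructor receives a function (or a lambda) as its argument:

    using System;
    using System.Threading;

    class ThreadCtorDemo
    {
        static void Work()                      // the function handed to the constructor
        {
            Console.WriteLine("Running on a separate thread");
        }

        static void Main()
        {
            // The Thread constructor is a higher order function: it takes a function as argument...
            Thread t1 = new Thread(Work);
            // ...and a lambda works just as well.
            Thread t2 = new Thread(() => Console.WriteLine("Lambda body on its own thread"));

            t1.Start();
            t2.Start();
            t1.Join();
            t2.Join();
        }
    }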

Another example where Parallel.For doesn't work Summing an array: Parallel.For does it fast, but gets it wrong. What's the problem? Demo: demos\CS-code\SummingParallel_2
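The problem is a data race: every iteration does a read-modify-write on the same shared sum variable, so updates are lost and the result is wrong. A sketch of the broken approach (the demo's own code is not in the transcript):

    using System;
    using System.Linq;
    using System.Threading.Tasks;

    class BrokenParallelSum
    {
        static void Main()
        {
            int[] a = Enumerable.Repeat(1, 1000000).ToArray();   // correct sum is 1,000,000

            int sum = 0;
            Parallel.For(0, a.Length, i =>
            {
                sum += a[i];        // RACE: "read sum, add, write sum" interleaves between threads
            });

            Console.WriteLine(sum); // typically prints LESS than 1,000,000
        }
    }

Wrapping the addition in a lock would make the result correct, but it serializes the hot loop and throws away most of the parallel speedup, which is why the fix on the next slide uses thread-local partial sums instead.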

It can be fixed with Parallel.ForEach and thread-local partial sums (demo: demos\SummingParallelV3). But this is no longer a simple higher order function!

    int sum = 0;
    object lockObj = new object();

    Parallel.ForEach(
        a,                                  // the array to sum
        () => 0,                            // localInit: each worker thread starts its own partial sum at 0
        (x, loopState, partialSum) =>       // body: add the element to this thread's partial sum
        {
            SlowDown(slowFactor);           // from the demo: artificial work to make timing visible
            return x + partialSum;
        },
        (localPartialSum) =>                // localFinally: merge the thread's partial sum into the shared total
        {
            lock (lockObj)
            {
                sum = sum + localPartialSum;
            }
        });

Parallel Invoke It is easy to start many concurrent actions using Parallel.Invoke. Invoke is a higher order function taking an array of Action delegates as argument.
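A minimal sketch (the action bodies are illustrative):

    using System;
    using System.Threading.Tasks;

    class InvokeDemo
    {
        static void Main()
        {
            // Parallel.Invoke takes a params array of Action delegates and runs them concurrently.
            Parallel.Invoke(
                () => Console.WriteLine("Action 1 on thread " + Environment.CurrentManagedThreadId),
                () => Console.WriteLine("Action 2 on thread " + Environment.CurrentManagedThreadId),
                () => Console.WriteLine("Action 3 on thread " + Environment.CurrentManagedThreadId));
            // Invoke returns only when all the actions have completed.
        }
    }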

Map Reduce Pattern The Map-Reduce pattern from functional programming often makes handling parallelism easier. Among other things, it is used for processing large datasets in distributed and parallel environments (at Google, for instance).

Map Reduce Pattern The idea is to make it easier to parallelize an algorithm (an aggregation, for instance) by splitting it into a map phase and a reduce phase. The map phase doesn't change the existing data structure; it creates a new one, prepared for the reduce phase. Since the map phase doesn't change shared state, it is easier to parallelize.

Map Reduce Pattern Let's assume that we have some huge collection of objects and want to find the number of objects with some given property:
– Map phase: create a new collection of 0s and 1s; iterate through the original collection and add 1 if the current object has the property, 0 otherwise. Since the original container isn't changed and the updates in the new container happen at different positions, this phase is easily parallelized.
– Reduce phase: count the number of 1s. In this phase synchronization may be necessary.
Illustration: lots of objects → Map → … → Reduce → Result (aggregation).
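A sequential sketch of this counting example (the data and the property "is even" are illustrative assumptions):

    using System;

    class CountWithMapReduce
    {
        static void Main()
        {
            int[] objects = { 3, 8, 15, 4, 42, 7 };          // stand-in for the huge collection

            // Map phase: build a NEW collection of 0s and 1s; the original data is untouched.
            int[] flags = new int[objects.Length];
            for (int i = 0; i < objects.Length; i++)
                flags[i] = (objects[i] % 2 == 0) ? 1 : 0;

            // Reduce phase: count the 1s.
            int count = 0;
            for (int i = 0; i < flags.Length; i++)
                count += flags[i];

            Console.WriteLine(count);                        // 3
        }
    }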

Map Reduce Pattern In the container, Map returns a new Container; the mapping function f and the combining operation op are passed in as Func delegates.
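The container code from the slide is not in the transcript; a sketch of what such a class could look like (the names and signatures are assumptions):

    using System;

    // Hypothetical container; the course demo's actual class is not shown in the transcript.
    class Container<T>
    {
        private readonly T[] items;
        public Container(T[] items) { this.items = items; }
        public int Count { get { return items.Length; } }
        public T Get(int i) { return items[i]; }

        // Map is a higher order function: it takes a Func<T, R> and returns a NEW container.
        public Container<R> Map<R>(Func<T, R> f)
        {
            R[] result = new R[items.Length];
            for (int i = 0; i < items.Length; i++)
                result[i] = f(items[i]);
            return new Container<R>(result);
        }

        // Reduce folds the container into a single value using a binary operation op.
        public R Reduce<R>(R seed, Func<R, T, R> op)
        {
            R acc = seed;
            for (int i = 0; i < items.Length; i++)
                acc = op(acc, items[i]);
            return acc;
        }
    }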

Map Reduce Pattern Call with lambdas (demo: demos\ParallelMapReduce). The combining operation (the Func for op) takes two input parameters. Recall: a lambda body can be a block of code ('{ ... }').
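Assuming a container like the sketch above, a call with lambdas could look like this; note the block-bodied lambda and the two input parameters of the reduce operation:

    Container<int> data = new Container<int>(new[] { 3, 8, 15, 4, 42, 7 });

    // Map with an expression lambda: 1 if the element is even, 0 otherwise.
    Container<int> flags = data.Map(x => x % 2 == 0 ? 1 : 0);

    // Reduce with a block-bodied lambda ('{ ... }') taking two parameters: accumulator and element.
    int count = flags.Reduce(0, (acc, x) =>
    {
        return acc + x;
    });

    Console.WriteLine(count);   // 3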

Map Reduce Pattern Map can be parallelized easily: just use Parallel.For(...). Demo: demos\CS-code\ParallelMapReduce
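A parallel variant of the Map from the container sketch above, using Parallel.For (an illustrative sketch; requires using System.Threading.Tasks):

    // Added to the hypothetical Container<T> class sketched earlier.
    public Container<R> ParallelMap<R>(Func<T, R> f)
    {
        R[] result = new R[items.Length];
        // Safe to parallelize: each iteration reads its own element and writes its own slot,
        // and the original container is never modified.
        Parallel.For(0, items.Length, i =>
        {
            result[i] = f(items[i]);
        });
        return new Container<R>(result);
    }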

Many cores cannot beat a good algorithm! Speedup comparison of the demos \ParallelSort, \MergeSort and \ParallelSort2 (SeqBubble).

Sorting (n: number of elements, c: number of cores)
– Sequential bubble sort: O(n²)
– Parallel bubble sort: O((n/c)²)
– If the problem is O(n²), then the speedup is at most O(c²)
– Mergesort: O(n · log₂ n)
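A quick worked example with the slide's cost model: for n = 1,000,000 elements and c = 8 cores, parallel bubble sort still costs on the order of (n/c)² ≈ 1.6 · 10¹⁰ operations, while sequential mergesort costs about n · log₂ n ≈ 2 · 10⁷, roughly a factor 800 less without using a single extra core.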

Many cores cannot beat a good algorithm! …and the gap gets worse as n increases: compare the demos \ParallelSort and \MergeSort.

…and it gets worse and worse as n increases. And we save a lot of power and help to save the Earth's climate!

Final remarks In the future we, as programmers, won't be saved by Moore's law. We need to use these many cores in our applications, and to do it intelligently.