CS61A Lecture 29 Parallel Computing Jom Magrotker UC Berkeley EECS August 7, 2012.

2 Computer Science in the News

3 Today
Parallel Computing
– Problems that can occur
– Ways of avoiding problems

4 So Far
Functions, data structures, objects, abstraction, interpretation, evaluation.
One Program, One Computer.

5 Yesterday & Today
Multiple Programs!
– On Multiple Computers (Networked and Distributed Computing)
– On One Computer (Concurrency and Parallelism)

6 Yesterday: Distributed Computing
Programs on different computers working together towards a goal.
– Information Sharing & Communication. Ex. Skype, the World Wide Web, cell networks.
– Large-Scale Computing. Ex. "cloud computing" and MapReduce.

7 Review: Distributed Computing
Architecture
– Client-Server
– Peer-to-Peer
Message Passing
Design Principles
– Modularity
– Interfaces

8 Today: Parallel Computation
Simple idea: have more than one piece of code running at the same time. The question is: why bother?

9 Parallel Computation: Why?
Primarily: speed. The majority of computers have multiple processors, meaning that we can actually have two (or more) pieces of code running at the same time!

10 The Load-Compute-Store Model
For today, we will use the following model of computation for an assignment like x = x * 2:
1. Load the variable(s).
2. Compute the right-hand side.
3. Store the result into the variable.

11 The Load-Compute-Store Model
Memory before: x: 5
For x = x * 2:
1. Load x: x -> 5 ("x is 5")
2. Compute x * 2: 10 ("x * 2 is 10")
3. Store the new value: x <- 10
Memory after: x: 10
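The three steps of the model can be written out explicitly in plain Python (a sketch; the temporary name tmp stands in for the processor's working value and is not part of the lecture code):

```python
x = 5

# x = x * 2, decomposed into the three steps of the model:
tmp = x        # 1. Load the variable:           "x is 5"
tmp = tmp * 2  # 2. Compute the right-hand side: "x * 2 is 10"
x = tmp        # 3. Store the result back:       x <- 10

print(x)  # 10
```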

12 Announcements
Project 4 due today.
– Partnered project, in two parts.
– Twelve questions, so please start early!
– Two extra credit questions.
Homework 14 due today.
– Now includes contest voting.
– Assignment is short.
Tomorrow at the end of lecture there is a course survey; please attend!

13 Announcements: Final
The final is Thursday, August 9.
– Where? 1 Pimentel.
– When? 6pm to 9pm.
– How much? All of the material in the course, from June 18 to August 8, will be tested.
Closed book and closed electronic devices. One 8.5" x 11" 'cheat sheet' allowed. No group portion.
We have emailed you if you have conflicts and have told us. If you haven't told us yet, please let us know immediately. We'll email you the room later today.
Final review session 2 is tonight, from 8pm to 10pm in the HP Auditorium (306 Soda).

Parallel Example
Memory before: x: 10, y: 3

Thread 1: x = x * 2
    L1: Load x
    C1: Compute (x * 2)
    S1: Store (x * 2) -> x

Thread 2: y = y + 1
    L2: Load y
    C2: Compute (y + 1)
    S2: Store (y + 1) -> y

The idea is that we perform the steps of each thread together, interleaving them in any way we want.
Memory after: x: 20, y: 4

Parallel Example
Memory before: x: 10, y: 3
Thread 1 (x = x * 2): L1, C1, S1
Thread 2 (y = y + 1): L2, C2, S2
We perform the steps of each thread together, interleaving them in any way we want:
    L1, C1, S1, L2, C2, S2
    OR L1, L2, C1, S1, C2, S2
    OR L1, L2, C1, C2, S1, S2
    OR ...
Memory after, in every case: x: 20, y: 4

16 So What?
Okay, what's the point? IDEALLY: it shouldn't matter which "shuffling" of the steps we perform; the end result should be the same. BUT, what if two threads are using the same data?

Parallel Example
Memory before: x: 10

Thread 1: x = x * 2
    L1: Load x
    C1: Compute (x * 2)
    S1: Store (x * 2) -> x

Thread 2: x = x + 1
    L2: Load x
    C2: Compute (x + 1)
    S2: Store (x + 1) -> x

We perform the steps of each thread together, interleaving them in any way we want. If the threads ran one after the other, we would expect x: 21 or x: 22.

Parallel Example
Memory before: x: 10
Thread 1 is x = x * 2; thread 2 is x = x + 1. Now consider the interleaving L1, C1, L2, C2, S2, S1:
    L1: "x is 10"
    C1: "x * 2 is 20"
    L2: "x is 10"
    C2: "x + 1 is 11"
    S2: Store 11 -> x
    S1: Store 20 -> x
Memory after: x: 20
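The lost update can be replayed deterministically by writing out the interleaving L1, C1, L2, C2, S2, S1 step by step (a sketch; the names t1, r1, t2, r2 stand in for each thread's loaded and computed values and are not from the lecture):

```python
x = 10

t1 = x       # L1: thread 1 loads x        (sees 10)
r1 = t1 * 2  # C1: thread 1 computes x * 2 (20)
t2 = x       # L2: thread 2 loads x        (still sees 10!)
r2 = t2 + 1  # C2: thread 2 computes x + 1 (11)
x = r2       # S2: thread 2 stores 11
x = r1       # S1: thread 1 stores 20, clobbering thread 2's update

print(x)  # 20, not the 21 or 22 we expected
```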

19 Oh Noes!!1!one
We got a wrong answer! This is one of the dangers of using parallelism in a program! What happened here is that we ran into the problem of non-atomic (multi-step) operations: a thread can be interrupted by another, resulting in incorrect behavior. So, smarty pants, how do we fix it?

20 Practice: Parallelism
Suppose we initialize the value z to 3. Given the two threads below, which are run at the same time, what are all the possible values of z after they run?

Thread 1:
    z = z * 33

Thread 2:
    y = z + 0
    z = z + y

21 Practice: Parallelism
Suppose we initialize the value z to 3. Given the two threads below, which are run at the same time, what are all the possible values of z after they run?

Thread 1:
    z = z * 33

Thread 2:
    y = z + 0
    z = z + y

Answer: 99, 6, 198, 102
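One way to double-check the answer is to brute-force every interleaving of the two threads' load/compute/store steps (a sketch, not lecture code: the interleavings helper and the encoding of each step as a function over shared memory m and per-thread registers r are made up for the demo):

```python
def interleavings(a, b):
    """Yield every order-preserving interleaving of step lists a and b."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

# Thread 1: z = z * 33
t1 = [lambda m, r: r.update(a=m["z"]),       # load z
      lambda m, r: r.update(a=r["a"] * 33),  # compute z * 33
      lambda m, r: m.update(z=r["a"])]       # store into z

# Thread 2: y = z + 0, then z = z + y
t2 = [lambda m, r: r.update(b=m["z"]),           # load z
      lambda m, r: r.update(b=r["b"] + 0),       # compute z + 0
      lambda m, r: m.update(y=r["b"]),           # store into y
      lambda m, r: r.update(c=m["z"] + m["y"]),  # load z, y; compute z + y
      lambda m, r: m.update(z=r["c"])]           # store into z

results = set()
for order in interleavings(t1, t2):
    mem, regs = {"z": 3, "y": 0}, {}
    for step in order:
        step(mem, regs)
    results.add(mem["z"])

print(sorted(results))  # [6, 99, 102, 198]
```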

22 Break

23 Locks
We need a way to make sure that only one thread messes with a piece of data at a time!

from threading import Lock
x_lock = Lock()

Thread 1:
    x_lock.acquire()
    x = x * 2
    x_lock.release()

Thread 2:
    x_lock.acquire()
    x = x + 1
    x_lock.release()

acquire() tries to get exclusive rights to the lock. If it succeeds, keep working; otherwise, wait for the lock to be released. release() gives up your exclusive rights so someone else can take their turn. Lock methods are atomic, meaning that we don't need to worry about someone interrupting us in the middle of an acquire or release.
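A runnable sketch of the same idea (the worker function and the counts are made up for the demo): several threads hammer on one shared counter, and wrapping each non-atomic update in the lock makes the final result deterministic. Python's `with lock:` is shorthand for acquire() followed by release().

```python
from threading import Thread, Lock

counter = 0
counter_lock = Lock()

def worker(n):
    global counter
    for _ in range(n):
        with counter_lock:  # acquire(); ... ; release()
            counter += 1    # a non-atomic load-compute-store

threads = [Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # always 40000 with the lock in place
```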

24 Sounds Great!
"With great power comes great responsibility!" Now only one thread will manipulate the value x at a time, as long as we always wrap code touching x in a lock acquire and release. So our code works as intended! BUT...

25 Misusing Our Power
from threading import Lock
x_lock1 = Lock()
x_lock2 = Lock()

Thread 1:
    x_lock1.acquire()
    x = x * 2
    x_lock1.release()

Thread 2:
    x_lock2.acquire()
    x = x + 1
    x_lock2.release()

This won't work! The threads aren't being locked out using the same lock; we just went back to square one.

26 Misusing Our Power
from threading import Lock
x_lock = Lock()

Thread 1:
    x_lock.acquire()
    LONG_COMPUTATION()
    x = x * 2
    x_lock.release()

Thread 2:
    x_lock.acquire()
    x = x + 1
    x_lock.release()

If LONG_COMPUTATION doesn't need x, we just caused thread 2 to wait a LONG time for a bunch of work that could have happened in parallel. This is inefficient!

27 Misusing Our Power
from time import sleep
from threading import Lock
x_lock = Lock()

def start_turn(num):
    sleep(num)
    x_lock.acquire()

Thread 1:
    start_turn(THREAD_NUM)
    x = x * 2
    x_lock.release()

Thread 2:
    start_turn(THREAD_NUM)
    x = x + 1
    x_lock.release()

Unless we actually wanted thread 2 to be less likely to get access to x first each time, this is an unfair solution that favors the lower-numbered threads. This example is a bit contrived, but unfairness does happen! It usually takes a bit more code for it to pop up, however.

28 Misusing Our Power
from threading import Lock
x_lock = Lock()
y_lock = Lock()

Thread 1:
    x_lock.acquire()
    y_lock.acquire()
    x, y = 2 * y, 22
    y_lock.release()
    x_lock.release()

Thread 2:
    y_lock.acquire()
    x_lock.acquire()
    y, x = 2 * x, 22
    x_lock.release()
    y_lock.release()

What if thread 1 acquires the x_lock and thread 2 acquires the y_lock? Now nobody can do work! This is called deadlock.
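A standard way out of this particular trap (not shown on the slide, but a common convention) is to make every thread acquire the locks in the same global order, so no circular wait can form. A minimal sketch of the slide's two threads with the acquisition order fixed:

```python
from threading import Thread, Lock

x_lock, y_lock = Lock(), Lock()
x, y = 10, 3

def thread1():
    global x, y
    with x_lock:      # both threads take x_lock first...
        with y_lock:  # ...then y_lock: no circular wait, no deadlock
            x, y = 2 * y, 22

def thread2():
    global x, y
    with x_lock:      # same order as thread1, unlike the slide
        with y_lock:
            y, x = 2 * x, 22

t1, t2 = Thread(target=thread1), Thread(target=thread2)
t1.start(); t2.start()
t1.join(); t2.join()
print("both threads finished")
```

Whichever thread wins x_lock runs to completion first, so the result depends on scheduling order, but both threads always finish.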

29 The 4 Types of Problems
In general, you can classify the types of problems you see in parallel code into 4 groups:
– Incorrect results
– Inefficiency
– Unfairness
– Deadlock

30 How Do We Avoid Issues?
Honestly, there isn't a one-size-fits-all solution. You have to be careful and think hard about what you're doing with your code. In later courses (CS162), you will learn common conventions that help avoid these issues, but there's still the possibility that problems will occur!

31 Another Tool: Semaphores
Locks are just a really basic tool available for managing parallel code. There's a LARGE variety of tools out there that you might encounter. For now, we're going to quickly look at one more classic: semaphores.

32 Another Tool: Semaphores
What if we want to allow only up to a certain number of threads to manipulate the same piece of data at the same time?

fruit_bowl = ["banana", "banana", "banana"]

def eat_thread():
    fruit_bowl.pop()
    print("ate a banana!")

def buy_thread():
    fruit_bowl.append("banana")
    print("bought a banana")

A semaphore is the classic tool for this situation!

33 Another Tool: Semaphores
from threading import Semaphore

fruit_bowl = ["banana", "banana", "banana"]
fruit_sem = Semaphore(len(fruit_bowl))  # 3

def eat_thread():
    fruit_sem.acquire()
    fruit_bowl.pop()
    print("ate a banana!")

def buy_thread():
    fruit_bowl.append("banana")
    print("bought a banana")
    fruit_sem.release()

acquire() decrements the counter; if the counter is 0, it waits until someone increments it before subtracting 1 and moving on. release() increments the counter.
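Calling the two operations sequentially shows the counting behavior (a sketch; the shortened names eat and buy and the call sequence are made up for the demo, following the slide's bowl example):

```python
from threading import Semaphore

fruit_bowl = ["banana", "banana", "banana"]
fruit_sem = Semaphore(len(fruit_bowl))  # counter starts at 3

def eat():
    fruit_sem.acquire()  # blocks while the counter is 0 (empty bowl)
    return fruit_bowl.pop()

def buy():
    fruit_bowl.append("banana")
    fruit_sem.release()  # bump the counter: one more banana to eat

eat(); eat(); eat()  # counter 3 -> 2 -> 1 -> 0; a fourth eat() would block
buy()                # restock one banana, counter 0 -> 1...
eat()                # ...so this eat() goes through immediately
print(len(fruit_bowl))  # 0
```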

34 Practice: What Could Go Wrong?
What, if anything, is wrong with the code below?

from threading import Lock
x_lock = Lock()
y_lock = Lock()

def thread1():
    global x, y
    x_lock.acquire()
    if x % 2 == 0:
        y_lock.acquire()
        x = x + y
        y_lock.release()
    x_lock.release()

def thread2():
    global x, y
    y_lock.acquire()
    for i in range(50):
        y = y + i
    x_lock.acquire()
    print(x + y)
    y_lock.release()
    x_lock.release()

35 Practice: What Could Go Wrong?
What, if anything, is wrong with the code below?

from threading import Lock
the_lock = Lock()

def thread1():
    global x
    the_lock.acquire()
    x = fib(5000)
    the_lock.release()

def thread2():
    global x
    the_lock.acquire()
    for i in range(50):
        x = x + i
        print(i)
    the_lock.release()

36 Conclusion
Parallelism is important for making efficient code! Problems arise when we start manipulating data shared by multiple threads. We can use locks or semaphores to avoid incorrect results. There are 4 main problems that can occur when writing parallel code: incorrect results, inefficiency, unfairness, and deadlock.
Preview: a powerful pattern for distributed computing, MapReduce.

37 Extras: Serializers
Honestly, this is just a cool way to use locks with functions.

def make_serializer():
    serializer_lock = Lock()
    def serialize(fn):
        def serialized(*args):
            serializer_lock.acquire()
            result = fn(*args)
            serializer_lock.release()
            return result
        return serialized
    return serialize

x_serializer = make_serializer()
x = 10

@x_serializer
def thread1():
    global x
    x = x * 2

@x_serializer
def thread2():
    global x
    x = x + 1
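The serializer from the slide can be exercised directly. The sketch below (the bump function, the run helper, and the thread counts are made up for the demo) decorates one shared-counter update with a serializer, so the non-atomic increment behaves atomically across threads:

```python
from threading import Lock, Thread

def make_serializer():
    serializer_lock = Lock()
    def serialize(fn):
        def serialized(*args):
            serializer_lock.acquire()
            result = fn(*args)
            serializer_lock.release()
            return result
        return serialized
    return serialize

protect = make_serializer()
counter = 0

@protect                 # same as: bump = protect(bump)
def bump():
    global counter
    counter += 1         # non-atomic, but only one caller runs at a time

def run(n):
    for _ in range(n):
        bump()

threads = [Thread(target=run, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```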