CS10 The Beauty and Joy of Computing, Lecture #8: Concurrency (2010-09-29)


CS10 The Beauty and Joy of Computing, Lecture #8: Concurrency. Nvidia CEO Jen-Hsun Huang said: "Whatever capability you think you have today, it's nothing compared to what you're going to have in a couple of years ... due to supercomputers in the cloud." Today, cloud computing does what you could do on your PC. Imagine 40,000 times that! UC Berkeley EECS Lecturer SOE Dan Garcia

UC Berkeley CS10 “The Beauty and Joy of Computing” : Concurrency (2) Garcia, Fall 2010 Knowledge is recognizing what you know and what you don't. I hear and I forget. I see and I remember. I do and I understand. Happy Confucius Day!

Concurrency & Parallelism, 10 miles up...
Intra-computer (today's lecture):
- Multiple computing "helpers" are cores within one machine
- Aka "multi-core"
- GPU parallelism is also "intra-computer"
Inter-computer (week 12's lectures):
- Multiple computing "helpers" are different machines
- Aka "distributed computing"
- Grid & cluster computing

Anatomy: the 5 components of any computer (John von Neumann invented this architecture):
- Processor
  - Control (the "brain")
  - Datapath (the "brawn")
- Memory
- Input devices
- Output devices
Question: What causes the most headaches for SW and HW designers with multi-core computing?
a) Control  b) Datapath  c) Memory  d) Input  e) Output

But what is INSIDE a Processor? (Recall: a processor is Control, the "brain", plus Datapath, the "brawn".)

But what is INSIDE a Processor?
- Primarily crystalline silicon, 1 mm to 25 mm on a side
- 2009 "feature size" (aka process): ~45 nm = 45 x 10^-9 m (then 32, 22, and 16 nm [by year 2013])
- Hundreds of millions of transistors, wired together by conductive layers
- "CMOS" (complementary metal oxide semiconductor) is the most common technology
- The package (ceramic or plastic, with gold wires) spreads the chip-level signal paths out to board level and dissipates heat
(Figures: bare processor die; chip in package)

Moore's Law predicts: 2X transistors per chip every 2 years. (Gordon Moore, Intel cofounder, B.S. Cal 1950!)
(Figure: number of transistors on an integrated circuit (IC) vs. year.) en.wikipedia.org/wiki/Moore's_law
Question: What is this "curve"?
a) Constant  b) Linear  c) Quadratic  d) Cubic  e) Exponential
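Doubling every 2 years is exponential growth, which the clicker question is driving at. A minimal Python sketch makes that concrete; the 1971 starting point (the Intel 4004's roughly 2,300 transistors) is only an illustrative anchor, not industry-wide data:

```python
def transistors(year, base_year=1971, base_count=2300):
    """Project transistor counts under Moore's law: 2x every 2 years.

    base_count is the Intel 4004's ~2300 transistors (1971),
    used here purely as an illustrative starting point.
    """
    doublings = (year - base_year) / 2
    return base_count * 2 ** doublings

# Ten doublings over 20 years is a 1024x increase -- exponential, not linear.
print(transistors(1991) / transistors(1971))  # → 1024.0
```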

Moore's Law and related curves (figures)

Power Density Prediction circa 2000 (figure; the Core 2 is marked on the prediction)

Going Multi-core Helps Energy Efficiency
- Power of a typical integrated circuit ~ C V^2 f
  - C = capacitance, how well the circuit "stores" a charge
  - V = voltage
  - f = frequency, i.e., how fast the clock is (e.g., 3 GHz)
(William Holt, HOT Chips 2005)
Activity Monitor (on the lab Macs) shows how active your cores are.
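The P ~ C V^2 f relationship above is why multi-core wins on energy: halving the clock lets you also lower the voltage, and voltage counts twice. A sketch in Python, where the 0.8x voltage figure is an illustrative assumption rather than measured data:

```python
def dynamic_power(c, v, f):
    # Dynamic power of a CMOS circuit scales as P ~ C * V^2 * f
    return c * v ** 2 * f

# One core at full speed vs. two cores at half speed and (say) 0.8x voltage.
single = dynamic_power(c=1.0, v=1.0, f=1.0)
dual = 2 * dynamic_power(c=1.0, v=0.8, f=0.5)

print(dual / single)  # → 0.64: same nominal throughput at ~2/3 the power
```

The quadratic V term is doing the work: frequency scaling alone (two cores at half speed, same voltage) would break even, but the voltage reduction it enables is a net win.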

Energy & Power Considerations (figure, courtesy of Chris Batten)

Parallelism again? What's different this time?
"This shift toward increasing parallelism is not a triumphant stride forward based on breakthroughs in novel software and architectures for parallelism; instead, this plunge into parallelism is actually a retreat from even greater challenges that thwart efficient silicon implementation of traditional uniprocessor architectures." (Berkeley View, December 2006)
The HW/SW industry has bet its future that breakthroughs will appear before it's too late. view.eecs.berkeley.edu

Background: Threads
- A thread ("thread of execution") is a single stream of instructions.
- A program / process can split, or fork, itself into separate threads, which can (in theory) execute simultaneously.
- Threads are an easy way to describe and think about parallelism.
- A single CPU can execute many threads by time-division multiplexing.
- Multithreading is running multiple threads through the same hardware.
(Diagram: CPU time sliced among Thread 0, Thread 1, Thread 2)
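The fork-then-wait pattern described above can be sketched with Python's standard `threading` module; the `worker` function and its busywork sum are illustrative stand-ins for real per-thread work:

```python
import threading

def worker(tid, results):
    # Each thread runs this function as its own stream of instructions.
    results[tid] = sum(range(1000))

results = {}
threads = [threading.Thread(target=worker, args=(i, results)) for i in range(3)]
for t in threads:
    t.start()  # fork: the thread begins executing worker()
for t in threads:
    t.join()   # wait for every thread to finish before proceeding

print(sorted(results.items()))  # → [(0, 499500), (1, 499500), (2, 499500)]
```

On a single CPU these three threads are time-division multiplexed; on a multi-core machine the OS is free to run them on different cores.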

Speedup Issues: Amdahl's Law
Applications can almost never be completely parallelized; some serial code remains. Let s be the serial fraction of the program and P the number of cores (was: processors). Amdahl's law:
Speedup(P) = Time(1) / Time(P) <= 1 / (s + (1 - s) / P)
and as P -> infinity, Speedup <= 1 / s.
Even if the parallel portion of your application speeds up perfectly, your performance may be limited by the sequential portion.
(Figure: execution time vs. number of cores, split into parallel and serial portions.) en.wikipedia.org/wiki/Amdahl's_law
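Amdahl's bound is a one-line function; the example fractions below are illustrative, chosen to show how quickly the serial portion dominates:

```python
def amdahl_speedup(s, p):
    """Upper bound on speedup: serial fraction s, p cores."""
    return 1 / (s + (1 - s) / p)

# Even a 5% serial fraction caps the speedup at 1/s = 20x,
# no matter how many cores you throw at the problem:
print(round(amdahl_speedup(0.05, 100), 2))      # → 16.81
print(round(amdahl_speedup(0.05, 10 ** 9), 2))  # → 20.0
```

Note how 100 cores already get you most of the way to the asymptote: adding cores past that point buys almost nothing.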

Speedup Issues: Overhead
Even assuming no sequential portion, there's...
- Time to think about how to divide the problem up
- Time to hand out small "work units" to workers
- All workers may not work equally fast
- Some workers may fail
- There may be contention for shared resources
- Workers could overwrite each other's answers
- You may have to wait until the last worker returns to proceed (the slowest / weakest link problem)
- Time to put the data back together in a way that looks as if it were done by one

Life in a multi-core world...
This "sea change" to multi-core parallelism means that the computing community has to rethink:
a) Languages
b) Architectures
c) Algorithms
d) Data Structures
e) All of the above

But parallel programming is hard!
- What if two people were calling withdraw at the same time?
- E.g., balance = 100 and two withdraw 75 each
- Can anyone see what the problem could be? This is a race condition.
- In most languages, this is a problem. In Scratch, the system doesn't let two of these run at once.
en.wikipedia.org/wiki/Concurrent_computing
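The balance = 100, two-withdrawals-of-75 example can be acted out in Python. Real thread timing is nondeterministic, so this sketch scripts the unlucky interleaving step by step rather than relying on the scheduler:

```python
balance = 100

# Both withdrawals read the balance before either writes it back:
a_reads = balance       # thread A sees 100; 100 >= 75, so it proceeds
b_reads = balance       # thread B also sees 100, so it also proceeds
balance = a_reads - 75  # thread A writes back 25
balance = b_reads - 75  # thread B overwrites with 25

print(balance)  # → 25: the bank paid out 150 but the account only dropped 75
```

The check-then-act sequence is not atomic; the fix in most languages is to protect it with a lock so only one withdrawal runs at a time.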

Another concurrency problem ... deadlock!
- Two people need to draw a graph, but there is only one pencil and one ruler.
  - One grabs the pencil
  - One grabs the ruler
  - Neither releases what they hold, waiting for the other to release
- Livelock is also possible: movement, but no progress
- (Dan and Luke demo)
en.wikipedia.org/wiki/Deadlock
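The pencil-and-ruler stalemate maps directly onto two locks. A truly deadlocked program would hang forever, so this sketch re-creates the stuck state in one thread and uses non-blocking acquires to observe it safely (the "worker" framing is illustrative):

```python
import threading

pencil, ruler = threading.Lock(), threading.Lock()

# Each worker has grabbed one tool and now wants the other:
pencil.acquire()  # worker 1 takes the pencil
ruler.acquire()   # worker 2 takes the ruler

# A blocking acquire here would wait forever. acquire(blocking=False)
# returns immediately, letting us see the stalemate without hanging:
worker1_gets_ruler = ruler.acquire(blocking=False)
worker2_gets_pencil = pencil.acquire(blocking=False)

print(worker1_gets_ruler, worker2_gets_pencil)  # → False False
```

The standard cure is a global lock ordering: if everyone agrees to grab the pencil before the ruler, the circular wait can never form.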

Summary
- The "sea change" of computing, driven by our inability to cool ever-faster CPUs, means we're now in a multi-core world.
- This brave new world offers lots of potential for innovation by computing professionals, but challenges persist.