Multicore: Panic or Panacea? Mikko H. Lipasti Associate Professor Electrical and Computer Engineering University of Wisconsin – Madison

Sep 18, 2007 | Mikko Lipasti, University of Wisconsin

Multicore Mania
- First, servers: IBM Power4, 2001
- Then desktops: AMD Athlon X2, 2005
- Then laptops: Intel Core Duo, 2006
- Soon, your cellphone: ARM MPCore, in prototypes for a while now

What is behind this trend?
- Moore's Law
- Chip power consumption
- Single-thread performance trend [source: Intel]

Dynamic Power
Static CMOS: current flows when circuits are active
- Combinational logic evaluates new inputs
- Flip-flops and latches capture new values on the clock edge
Dynamic power is P = C × V² × A × f, where:
- C: capacitance of the circuit (wire length, number and size of transistors)
- V: supply voltage
- A: activity factor
- f: clock frequency
The future is fundamentally power-constrained.
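The terms above combine into the standard CMOS dynamic-power equation, which a short sketch makes concrete (the chip-level values below are illustrative, not from the slides):

```python
def dynamic_power(c_farads, v_volts, activity, f_hz):
    """Standard CMOS dynamic power: P = C * V^2 * A * f."""
    return c_farads * v_volts**2 * activity * f_hz

# Illustrative (hypothetical) values for a chip:
base = dynamic_power(c_farads=1e-9, v_volts=1.2, activity=0.2, f_hz=3e9)

# Voltage is the big lever: because V enters squared, dropping V from
# 1.2 to 1.0 cuts dynamic power by about 31% before any frequency change.
scaled = dynamic_power(c_farads=1e-9, v_volts=1.0, activity=0.2, f_hz=3e9)

print(round(scaled / base, 3))  # equals (1.0/1.2)**2, about 0.694
```

This quadratic dependence on supply voltage is why multicore designs favor more cores at lower voltage and frequency over one fast, hot core.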

Easy answer: Multicore

                   Single Core   Dual Core   Quad Core
Core area          A             ~A/2        ~A/4
Core power         W             ~W/2        ~W/4
Chip power         W + O         W + O'      W + O''
Core performance   P             0.9P        0.8P
Chip performance   P             1.8P        3.2P
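A quick check of the table's arithmetic: each core slows down as cores are added, but total chip performance still grows, assuming perfectly parallel work (chip performance = number of cores × per-core performance):

```python
# Per-core relative performance figures are taken from the table above.
per_core_perf = {1: 1.0, 2: 0.9, 4: 0.8}   # relative to single-core P

# Chip performance under the perfectly-parallel assumption.
chip_perf = {n: n * p for n, p in per_core_perf.items()}
print(chip_perf)   # {1: 1.0, 2: 1.8, 4: 3.2}
```

The "perfectly parallel" assumption is exactly what Amdahl's Law, on the next slide, calls into question.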

f Amdahl’s Law f – fraction that can run in parallel 1-f – fraction that must run serially Sep 18, 2007Mikko Lipasti-University of Wisconsin Time # CPUs 1 1-f f n

Fixed Chip Power Budget
Amdahl's Law ignores the (power) cost of n cores.
Revised Amdahl's Law:
- More cores mean each core is slower
- Parallel speedup < n
- The serial portion (1-f) takes longer
- Also: interconnect and scaling overhead
[Figure: the same time vs. number-of-CPUs chart, with the serial portion 1-f growing as the cores slow down]
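One way to sketch the revision: under a fixed power budget each of n cores gets a 1/n power share, and per-core performance falls with that share. The square-root scaling used here is an assumption (in the spirit of Pollack's rule), not a figure from the slides:

```python
def revised_speedup(f, n, alpha=0.5):
    """Amdahl's Law when a fixed power budget slows every core.

    Each core gets 1/n of the chip power; per-core performance is
    assumed to scale as (power share)**alpha, so the serial phase,
    run on one slow core, takes longer as n grows.
    """
    per_core = (1.0 / n) ** alpha          # relative per-core performance
    serial_time = (1.0 - f) / per_core     # serial phase on one slow core
    parallel_time = f / (n * per_core)     # parallel phase on n slow cores
    return 1.0 / (serial_time + parallel_time)

# With f = 0.9, speedup peaks at a modest core count, then falls:
for n in (1, 2, 4, 8, 16):
    print(n, round(revised_speedup(0.9, n), 2))
```

Under these assumptions the speedup curve turns over around 8 cores, which is the quantitative version of "serial code quickly dominates" on the next slide.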

Fixed Power Scaling
- A fixed power budget forces slow cores
- Serial code quickly dominates

Predictions and Challenges
- Parallel scaling limits many-core
  - >4 cores pay off only for well-behaved programs
  - Optimistic about new applications
  - Interconnect overhead
- Single-thread performance
  - Will degrade unless we innovate
- Parallel programming
  - Express/extract parallelism in new ways
  - Retrain the programming workforce

Research Agenda
- Programming for parallelism
  - Sources of parallelism
  - New applications, tools, and approaches
- Single-thread performance and power
  - Most attractive to the programmer/user
- Chip multiprocessor overheads
  - Interconnect, caches, coherence, fairness

Finding Parallelism
1. Functional parallelism
   - Car: {engine, brakes, entertainment, nav, ...}
   - Game: {physics, logic, UI, render, ...}
2. Automatic extraction [UW Multiscalar]
   - Decompose serial programs
3. Data parallelism
   - Vectors, matrices, database tables, pixels, ...
4. Request parallelism
   - Web, shared databases, telephony, ...
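Of the four sources, data parallelism is often the easiest to exploit: the same independent operation is applied to every element. A minimal sketch (the `brighten` pixel operation is a hypothetical stand-in):

```python
from concurrent.futures import ProcessPoolExecutor

def brighten(pixel):
    """The same independent operation, applied to every data element."""
    return min(pixel + 40, 255)

if __name__ == "__main__":
    pixels = [10, 200, 250, 128]

    # Serial and data-parallel versions compute identical results;
    # because each element is independent, the map parallelizes cleanly.
    serial = [brighten(p) for p in pixels]
    with ProcessPoolExecutor() as pool:
        parallel = list(pool.map(brighten, pixels))

    assert serial == parallel == [50, 240, 255, 168]
```

Functional and automatically extracted parallelism lack this regularity, which is why the next slide singles them out as the hard cases.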

Balancing Work
- Amdahl's parallel phase f assumes all cores are busy
- If work is not perfectly balanced, the (1-f) term grows (f is not fully parallel) and performance scaling suffers
- Manageable for data- and request-parallel apps
- A very difficult problem for the other two:
  - Functional parallelism
  - Automatically extracted parallelism; scale power to the mismatch [Multiscalar]
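The cost of imbalance can be seen directly: a parallel phase finishes only when its most-loaded core does, so skew inflates the effective serial term. A small sketch (the work distributions below are made up for illustration):

```python
def phase_speedup(per_core_work):
    """Relative speedup of a parallel phase across len(per_core_work) cores.

    Total work is fixed; the phase ends when the busiest core finishes,
    so speedup = total work / max per-core work.
    """
    return sum(per_core_work) / max(per_core_work)

balanced   = [25, 25, 25, 25]   # perfect balance: full 4x on 4 cores
imbalanced = [55, 15, 15, 15]   # same total work, one overloaded core

print(phase_speedup(balanced))    # 4.0
print(phase_speedup(imbalanced))  # ~1.8: three cores mostly sit idle
```

With the same 100 units of work, one overloaded core cuts the phase's speedup from 4x to under 2x.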

Coordinating Work
- Synchronization: some data somewhere is shared
  - Coordinate/order updates and reads; otherwise, chaos
- Traditionally: locks and mutual exclusion
  - Hard to get right, even harder to tune for performance
- Research: transactional memory [UW Multifacet]
  - Programmer: declare potential conflicts
  - Hardware and/or software: speculate and check
  - Commit, or roll back and retry
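A minimal Python sketch of the traditional lock-and-mutual-exclusion approach (the shared counter stands in for any shared data):

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        # Without the lock, the read-modify-write below could interleave
        # between threads and lose updates: the "chaos" above.
        with lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(50_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert counter == 200_000
```

A transactional-memory version would instead mark the increment as an atomic region and let hardware or software run it speculatively, rolling back and retrying only when an actual conflict is detected.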

Single-thread Performance
- Still the most attractive source of performance
  - Speeds up both parallel and serial phases
  - Can be used to buy back power
- Must focus on power consumption
  - Performance benefit ≥ power cost

Single-thread Performance
- Hardware accelerators and circuits
  - Domain-specific [UW MESA]
  - Reconfigurable [UW Compton]
  - VLSI and design automation [UW WISCAD, Kursun]
- Increasing frequency
  - Seems prohibitive: clock power
  - Clever clocking schemes can help [UW PHARM]
- Increasing instruction-level parallelism [UW Multiscalar, PHARM, Smith]
  - Without blowing the power budget
  - Alternatively, reduce power at the same performance

Chip Multiprocessor Overheads
- Core interconnect [UW PHARM]
  - 80% of chip power [Borkar, ISLPED '07 panel]
  - Need a fundamentally different approach; revisit circuit switching
- Cache coherence [UW Multifacet, PHARM]
  - Match workload behavior
  - Optimize for on-chip communication

Chip Multiprocessor Overheads
- Shared caches [UW Multifacet, Multiscalar, Smith]
  - On-chip memory can be shared
  - Optimize replacement and replication
- Fairness [UW Smith]
  - Maintain performance isolation
  - Share resources (memory, caches) fairly

Research at UW

Group         Faculty                 URL
Compton       Kati Compton
Kursun        Volkan Kursun
MESA          Mike Schulte            mesa.ece.wisc.edu
Multifacet    Mark Hill, David Wood
Multiscalar   Guri Sohi
PHARM         Mikko Lipasti
Smith         James Smith
Vertical      Karu Sankaralingam
WISCAD        Azadeh Davoodi

Conclusion
Forecast:
- Limited multicore (≤4 cores) is here to stay
- Manycore (>4 cores) will find its place
Hardware challenges:
- Single-thread performance and power
- Multicore overhead
Software challenges:
- Finding application parallelism
- Creating correct parallel programs
- Creating scalable parallel programs

Questions?