"Amdahl's Law in the Multicore Era", Mark Hill and Mike Marty, University of Wisconsin. IEEE Computer, July 2008. Presented by Dan Sorin.


Slide 2: Introduction
Multicore is here → architects need to cope. Time to revisit Amdahl's Law:
Speedup = 1 / [(1 - f) + f/s]
– f = fraction of the computation that is parallel
– s = speedup on the parallel fraction
The goal of the paper is to gain insights.
– Not actually a "research paper" per se
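
To build intuition, it helps to plug numbers into the formula; here is a minimal Python sketch (the function name and sample values are my own):

    def amdahl_speedup(f, s):
        """Classic Amdahl's Law: f = parallel fraction, s = speedup achieved on it."""
        return 1.0 / ((1.0 - f) + f / s)

    # Even with a 256x speedup on the parallel part, the serial
    # fraction dominates: f=0.9 caps the whole program near 10x.
    for f in (0.5, 0.9, 0.99):
        print(f"f={f}: {amdahl_speedup(f, 256):.1f}x")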

Slide 3: System Model & Assumptions
The chip contains a fixed number, N, of "base core equivalents" (BCEs). More powerful cores can be constructed by fusing BCEs:
– Performance of a core is a function of the number R of BCEs it uses
– Perf(1) < Perf(R) < R
– The paper assumes Perf(R) = sqrt(R)
– Why doesn't Perf(R) = R? (Roughly Pollack's Rule: core performance grows much more slowly than the resources poured into it.)
Homogeneous vs. heterogeneous cores:
– Homogeneous: N/R cores per chip
– Heterogeneous: 1 + (N - R) cores per chip (one R-BCE core plus N - R base cores)
The rest of the paper ignores or abstracts away many issues: shared caches (L2 and beyond), the interconnection network.
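
The model is easy to tabulate; a small sketch of the paper's assumptions (N and the R values are arbitrary example inputs):

    from math import sqrt

    def perf(r):
        """Paper's assumption: a core built from r BCEs runs
        sequential code sqrt(r) times faster than one BCE."""
        return sqrt(r)

    N = 64  # total BCE budget on the chip (example value)
    for R in (1, 4, 16, 64):
        print(f"R={R:3}: perf={perf(R):4.1f}, "
              f"homogeneous cores={N // R:3}, heterogeneous cores={1 + (N - R):3}")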

Slide 4: Homogeneous Cores
Reminder: N/R cores per chip. The data in Figures 2a & 2b show:
– Speedups are often depressingly low, especially for large R
– Even for large values of f, speedups are low
What's the intuition behind the results?
– For small R, the chip performs poorly on sequential code
– For large R, the chip performs poorly on parallel code
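
These results follow from the paper's symmetric-chip equation: the sequential phase runs on one core of performance perf(R), and the parallel phase runs on all N/R cores. A sketch (the parameter choices are mine):

    from math import sqrt

    def speedup_homogeneous(f, n, r):
        """Hill/Marty symmetric chip: n/r cores, each built from r BCEs."""
        perf = sqrt(r)  # sequential performance of an r-BCE core
        return 1.0 / ((1.0 - f) / perf + f * r / (perf * n))

    # N=256 BCEs, f=0.99: speedup peaks at a modest R (R=4 here)
    # and collapses as R approaches N.
    for r in (1, 4, 16, 64, 256):
        print(f"R={r:3}: {speedup_homogeneous(0.99, 256, r):6.1f}x")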

Slide 5: Heterogeneous Cores
Reminder: 1 big core + (N - R) minimal cores per chip. The data in Figures 2c & 2d show:
– Speedups are much better than for homogeneous cores
– But parallel code still doesn't do great
What's the intuition behind the results?
– For large f, the big core can't be put to good use
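
The paper's asymmetric equation differs only in the parallel term: the sequential phase still runs on the big core alone, but the parallel phase uses the big core plus all N - R base cores. Continuing the sketch (same example parameters):

    from math import sqrt

    def speedup_asymmetric(f, n, r):
        """Hill/Marty asymmetric chip: one r-BCE core plus (n - r) base cores."""
        perf = sqrt(r)
        return 1.0 / ((1.0 - f) / perf + f / (perf + n - r))

    # Same budget (N=256, f=0.99): far better than the symmetric chip
    # at moderate R, but spending every BCE on one core still collapses.
    for r in (1, 16, 64, 256):
        print(f"R={r:3}: {speedup_asymmetric(0.99, 256, r):6.1f}x")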

Slide 6: Somewhat Obvious Next Step
If homogeneous isn't great and heterogeneous isn't always great, can we dynamically adjust to the workload?
– Assign more BCEs to the big core when running sequential code
– When running parallel code, there is no need for a big core
The data in Figures 2e and 2f show:
– Yup, this was a good idea (best of both worlds)
Is this realistic, though?
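
In the paper's dynamic model the chip fuses R BCEs into one big core for sequential code, then splits back into N base cores for parallel code, so the parallel term sees all N BCEs. A sketch (example parameters mine):

    from math import sqrt

    def speedup_dynamic(f, n, r):
        """Hill/Marty dynamic chip: fuse r BCEs for the serial phase,
        run n independent BCEs in the parallel phase."""
        return 1.0 / ((1.0 - f) / sqrt(r) + f / n)

    # Fusing the whole chip for serial code (R = N = 256, f = 0.99)
    # beats both fixed designs, at roughly 223x.
    print(f"{speedup_dynamic(0.99, 256, 256):.1f}x")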

Slide 7: Conclusions
Just because the world is now multicore, we can't forget about single-core performance.
– Aside: an interesting observation, coming from a traditionally multiprocessor-focused group
Cost-effectiveness matters:
– Sqrt(R) may seem bad, but may actually be fine
Amdahl is still correct: we're limited by f.
Dynamic provisioning of resources, if possible, is important.

Slide 8: Questions/Concerns
Is this model too simplistic to be insightful?
– Abstractions can be good, but they can also be misleading
– For example, the paper focuses on cores, when the real action is in the memory system and interconnection network
– Concrete example: more cores require more off-chip memory bandwidth → having more cores than you can feed isn't going to help you
Are the overheads of dynamic reconfiguration going to outweigh its benefits?
– The CoreFusion paper does this, but it ain't cheap or easy
What if a breakthrough in technology (e.g., from Prof. Dwyer's research) removes the power wall?
– Do we go back to big uniprocessors?