Amdahl’s Law in the Multicore Era
Mark D. Hill & Michael R. Marty (2008)
ECE 259 / CPS 221 Advanced Computer Architecture II
Presenter: Tae Jun Ham, 2012

Outline
• Summary
  - Amdahl’s law in the multicore era
  - Symmetric multicore case
  - Asymmetric multicore case
  - Dynamic multicore case
• Review
  - Strong points
  - Negative points
  - Possible questions

Problem
• Multicore chip design has additional degrees of freedom:
  - Total number of cores
  - Complexity of each individual core
  - Multicore chip design style (symmetric / asymmetric / dynamic)
• Goal of this paper: to explore the design space of multicore chips and derive useful implications for computer architects

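Amdahl’s Law
• Original: if a fraction f of a computation is sped up by a factor S,
  $$\text{Speedup}(f, S) = \frac{1}{(1 - f) + \frac{f}{S}}$$
• Multicore: if the parallel fraction f runs on n cores,
  $$\text{Speedup}(f, n) = \frac{1}{(1 - f) + \frac{f}{n}}$$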
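A minimal Python sketch of Amdahl's law, Speedup(f, n) = 1 / ((1 - f) + f/n). Here f is the fraction of work that benefits from the enhancement. n is the speedup of that fraction, which in the multicore reading is the number of cores. The function name `amdahl_speedup` is our own. The loop illustrates that the serial fraction caps speedup at 1/(1 - f) regardless of core count.

```python
def amdahl_speedup(f: float, n: float) -> float:
    """Amdahl's law: speedup when a fraction f of the work is sped up n times."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 / ((1.0 - f) + f / n)


for f in (0.5, 0.9, 0.99):
    # Even with 256 cores, the serial part (1 - f) bounds speedup by 1 / (1 - f).
    print(f"f={f}: 256 cores -> {amdahl_speedup(f, 256):.1f}x "
          f"(upper bound {1.0 / (1.0 - f):.0f}x)")
```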

Basic Assumptions
• Limited resource: chip area
• Resource unit: BCE (Base Core Equivalent)
• Simple core: consumes 1 BCE, performance 1
• Complex core: consumes r BCEs, performance perf(r) = sqrt(r)
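The sketch below makes the diminishing return of the perf(r) = sqrt(r) assumption concrete. A bigger core is faster, but it delivers less performance per BCE of area. The helper name `perf` is illustrative.

```python
import math


def perf(r: float) -> float:
    """Sequential performance of one core built from r BCEs (slide assumption: sqrt(r))."""
    return math.sqrt(r)


for r in (1, 4, 16, 64):
    # Performance grows with r, but performance per BCE (per unit area) shrinks as 1/sqrt(r).
    print(f"r={r:>2} BCEs: perf={perf(r):.0f}, perf per BCE={perf(r) / r:.3f}")
```

Quadrupling the area of a core only doubles its performance. For example, a 16-BCE core is 4 times faster than a 1-BCE core but delivers 0.25 of its performance per BCE.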

Symmetric Multicore Model
• Resource: n BCEs
• Each core consumes r BCEs
• Total number of cores: n/r
• Serial performance: perf(r)
• Parallel performance: perf(r) * (n/r)
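The symmetric speedup follows from Amdahl's law. Serial code runs at perf(r), and parallel code runs at perf(r) * (n/r):

$$\text{Speedup}_{\text{sym}}(f, n, r) = \frac{1}{\frac{1 - f}{\text{perf}(r)} + \frac{f \cdot r}{\text{perf}(r) \cdot n}}$$

A minimal Python sketch of this formula, assuming perf(r) = sqrt(r) and that r divides n so the chip holds n/r whole cores. The function names are illustrative.

```python
import math


def perf(r: float) -> float:
    # Slide assumption: a core built from r BCEs runs sequential code sqrt(r) times faster.
    return math.sqrt(r)


def speedup_symmetric(f: float, n: int, r: int) -> float:
    """Speedup of n/r identical r-BCE cores on a program with parallel fraction f."""
    serial_time = (1.0 - f) / perf(r)          # serial code runs on one r-BCE core
    parallel_time = f * r / (perf(r) * n)      # parallel code runs on all n/r cores
    return 1.0 / (serial_time + parallel_time)


print(speedup_symmetric(0.9, 16, 1))   # 16 simple cores
print(speedup_symmetric(0.9, 16, 4))   # 4 cores of 4 BCEs each
print(speedup_symmetric(0.9, 16, 16))  # 1 core using all 16 BCEs
```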

Symmetric Multicore Analysis
• Parallelization is important: speedup depends heavily on the parallel fraction f
• Cores of r > 1 BCEs can be optimal: complex cores still matter, even though performance per area diminishes as r grows
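Both observations can be checked numerically. The sketch below sweeps the core size r over powers of two for a 256-BCE chip and keeps the best one. It assumes perf(r) = sqrt(r), and the powers-of-two grid is our simplification so that r always divides n. Under these assumptions, the best r moves from 256 at f = 0.5 to 8 at f = 0.975 and to 1 at f = 0.999.

```python
import math


def speedup_symmetric(f: float, n: int, r: int) -> float:
    """Symmetric multicore speedup, assuming perf(r) = sqrt(r)."""
    perf = math.sqrt(r)
    return 1.0 / ((1.0 - f) / perf + f * r / (perf * n))


def best_symmetric(f: float, n: int) -> tuple[int, float]:
    """Try r = 1, 2, 4, ... up to n and return the (r, speedup) pair with the highest speedup."""
    candidates = [2 ** k for k in range(n.bit_length())]  # powers of two not exceeding n
    return max(((r, speedup_symmetric(f, n, r)) for r in candidates),
               key=lambda pair: pair[1])


for f in (0.5, 0.975, 0.999):
    r, s = best_symmetric(f, 256)
    print(f"f={f}: best r={r} BCEs per core, speedup={s:.1f}")
```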

Asymmetric Multicore Model
• Resource: n BCEs
• One complex core consumes r BCEs
• Each remaining core consumes 1 BCE
• Total number of cores: n - r + 1
• Serial performance: perf(r)
• Parallel performance: perf(1) * (n - r) + perf(r) = (n - r) + perf(r)
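Substituting the serial performance perf(r) and the parallel performance perf(r) + n - r into Amdahl's law gives

$$\text{Speedup}_{\text{asym}}(f, n, r) = \frac{1}{\frac{1 - f}{\text{perf}(r)} + \frac{f}{\text{perf}(r) + n - r}}$$

A minimal Python sketch, again assuming perf(r) = sqrt(r). The function name is illustrative.

```python
import math


def speedup_asymmetric(f: float, n: int, r: int) -> float:
    """One r-BCE core plus (n - r) 1-BCE cores; perf(r) = sqrt(r) is assumed."""
    perf_r = math.sqrt(r)
    serial_time = (1.0 - f) / perf_r          # serial code runs on the big core only
    parallel_time = f / (perf_r + (n - r))    # big core and all small cores share parallel work
    return 1.0 / (serial_time + parallel_time)


print(speedup_asymmetric(0.9, 16, 4))  # one 4-BCE core + twelve 1-BCE cores
```

With r = 1, this reduces to the multicore form of Amdahl's law, because perf(1) + n - 1 = n.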

Asymmetric Multicore Analysis
• Asymmetric multicores allow better speedups than symmetric ones
• For an asymmetric multicore, having a powerful complex core is crucial
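The sketch below compares the two designs on a 256-BCE chip, taking the best power-of-two r for each. It assumes perf(r) = sqrt(r), and the function names are ours. Under these assumptions, at f = 0.975 the asymmetric chip peaks at r = 64 with a speedup of about 125. The best symmetric chip at the same f reaches only about 51.

```python
import math


def speedup_symmetric(f, n, r):
    perf = math.sqrt(r)
    return 1.0 / ((1.0 - f) / perf + f * r / (perf * n))


def speedup_asymmetric(f, n, r):
    perf = math.sqrt(r)
    return 1.0 / ((1.0 - f) / perf + f / (perf + n - r))


def best(speedup, f, n):
    """Return (r, speedup) maximizing speedup(f, n, r) over r = 1, 2, 4, ..., n."""
    return max(((r, speedup(f, n, r)) for r in (2 ** k for k in range(n.bit_length()))),
               key=lambda pair: pair[1])


for f in (0.9, 0.975, 0.99):
    r_sym, s_sym = best(speedup_symmetric, f, 256)
    r_asym, s_asym = best(speedup_asymmetric, f, 256)
    print(f"f={f}: symmetric {s_sym:.1f} (r={r_sym}), asymmetric {s_asym:.1f} (r={r_asym})")
```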

Dynamic Multicore Model
• Resource: n BCEs
• For sequential execution, r BCEs are combined to form one complex core
• Every other BCE acts as a 1-BCE core
• Total number of cores: n (parallel mode) / n - r + 1 (serial mode)
• Serial performance: perf(r)
• Parallel performance: n * perf(1) = n
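For the dynamic chip, serial code runs at perf(r) and parallel code runs at n, so

$$\text{Speedup}_{\text{dyn}}(f, n, r) = \frac{1}{\frac{1 - f}{\text{perf}(r)} + \frac{f}{n}}$$

A minimal Python sketch, assuming perf(r) = sqrt(r). How the hardware fuses BCEs is outside the model, and the function name is illustrative.

```python
import math


def speedup_dynamic(f: float, n: int, r: int) -> float:
    """Serial code runs on r fused BCEs; parallel code runs on all n BCEs as 1-BCE cores."""
    serial_time = (1.0 - f) / math.sqrt(r)   # perf(r) = sqrt(r) assumed
    parallel_time = f / n                    # n * perf(1) = n
    return 1.0 / (serial_time + parallel_time)


print(speedup_dynamic(0.9, 16, 16))  # fuse all 16 BCEs for serial code
```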

Dynamic Multicore Analysis
• Dynamic multicores provide better speedups than both symmetric and asymmetric multicores
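The sketch below puts the three models side by side for n = 256. Each model uses its best power-of-two r, and perf(r) = sqrt(r) is assumed. The ranking dynamic ≥ asymmetric ≥ symmetric holds for every r in 1..n. The dynamic chip's parallel term f/n is never larger than f/(perf(r) + n - r). That term in turn is never larger than f·r/(perf(r)·n), because (sqrt(r) - 1)(n - r) ≥ 0.

```python
import math


def perf(r):
    return math.sqrt(r)  # slide assumption


def symmetric(f, n, r):
    return 1.0 / ((1.0 - f) / perf(r) + f * r / (perf(r) * n))


def asymmetric(f, n, r):
    return 1.0 / ((1.0 - f) / perf(r) + f / (perf(r) + n - r))


def dynamic(f, n, r):
    return 1.0 / ((1.0 - f) / perf(r) + f / n)


def best_speedup(model, f, n):
    """Highest speedup of `model` over r = 1, 2, 4, ..., n."""
    return max(model(f, n, 2 ** k) for k in range(n.bit_length()))


n = 256
for f in (0.5, 0.9, 0.975, 0.99, 0.999):
    row = ", ".join(f"{m.__name__} {best_speedup(m, f, n):6.1f}"
                    for m in (symmetric, asymmetric, dynamic))
    print(f"f={f}: {row}")
```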

Strengths
• Identified future research directions:
  1. Increase parallelism
  2. Increase core performance
  3. Design better asymmetric and dynamic multicores
• Derived corollaries of Amdahl’s law for the multicore cases

Limitations
• The model is not very accurate:
  1. The limited resource is really a combination of power, area, and cost, not area alone
  2. The performance model can differ from sqrt(r)
  3. Partially parallel portions need to be considered
• Skepticism:
  1. Can Moore’s law continue until chips hold 256 cores?
  2. Can we really achieve 99.9% parallelization?
• The optimal design point depends heavily on the parallel fraction. Since the parallel fraction differs among applications, it is hard to determine the best hardware design.
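To gauge how sensitive the optimum is to the performance model, one can replace sqrt(r) with r ** alpha for a hypothetical exponent alpha. Setting alpha = 0.5 recovers the slide's model. The exponents below are illustrative assumptions, not measured values. The sketch reports the best symmetric core size for n = 256 and f = 0.975. Under these assumptions, it moves from r = 2 at alpha = 0.25 to r = 8 at alpha = 0.5 and r = 16 at alpha = 0.7.

```python
def best_symmetric_r(f: float, n: int, alpha: float) -> int:
    """Best r (power of two, at most n) for a symmetric chip when perf(r) = r ** alpha."""
    def speedup(r: int) -> float:
        perf = r ** alpha  # hypothetical performance model; alpha = 0.5 is sqrt(r)
        return 1.0 / ((1.0 - f) / perf + f * r / (perf * n))
    return max((2 ** k for k in range(n.bit_length())), key=speedup)


for alpha in (0.25, 0.5, 0.7):
    print(f"alpha={alpha}: best r = {best_symmetric_r(0.975, 256, alpha)}")
```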

Future Work / Discussion
• What would be appropriate ways to implement a dynamic multicore design in hardware?
• How can we develop a better analytical model of multicore performance?
• What software challenges do asymmetric or dynamic multicores pose?
• Which of the three designs presented would be the most power-efficient?