Multi-core, Mega-nonsense. Will multicore cure cancer? Given that multicore is a reality –…and we have quickly jumped from one core to 2 to 4 to 8 –It.

Slides:



Advertisements
Similar presentations
The Implications of Multi-core. What I want to do today Given that everyone is heralding Multi-core –Is it really the Holy Grail? –Will it cure cancer?
Advertisements

4. Shared Memory Parallel Architectures 4.4. Multicore Architectures
Multiprocessors— Large vs. Small Scale Multiprocessors— Large vs. Small Scale.
Lecture 6: Multicore Systems
Microprocessor Performance, Phase II Yale Patt The University of Texas at Austin STAMATIS Symposium TU-Delft September 28, 2007.
Microarchitecture is Dead AND We must have a multicore interface that does not require programmers to understand what is going on underneath.
Microprocessor Microarchitecture Multithreading Lynn Choi School of Electrical Engineering.
Yale Patt The University of Texas at Austin World University Presidents’ Symposium University of Belgrade April 4, 2009 Future Microprocessors: What must.
Yale Patt The University of Texas at Austin Universidade de Brasilia Brasilia, DF -- Brazil August 12, 2009 Future Microprocessors: Multi-core, Multi-nonsense,
Instructor: Sazid Zaman Khan Lecturer, Department of Computer Science and Engineering, IIUC.
1 xkcd/619. Multicore & Parallel Processing Guest Lecture: Kevin Walsh CS 3410, Spring 2011 Computer Science Cornell University.
1 Microprocessor-based Systems Course 4 - Microprocessors.
1 Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 (and Appendix B) Memory Hierarchy Design Computer Architecture A Quantitative Approach,
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Multicore & Parallel Processing P&H Chapter ,
CPU Processor Speed Timeline Speed =.02 Mhz Year= 1972 Transistors= 3500 It takes 66, CPU’s to equal 1 i7.
CSCE101 – 4.2, 4.3 October 17, Power Supply Surge Protector –protects from power spikes which ruin hardware. Voltage Regulator – protects from insufficient.
Central Processing Unit IS – 221 PC Admin 2/26/2007 by Len Krygsman IV.
1 Lecture 26: Case Studies Topics: processor case studies, Flash memory Final exam stats:  Highest 83, median 67  70+: 16 students, 60-69: 20 students.
Parallel Algorithms - Introduction Advanced Algorithms & Data Structures Lecture Theme 11 Prof. Dr. Th. Ottmann Summer Semester 2006.

Single-Chip Multi-Processors (CMP) PRADEEP DANDAMUDI 1 ELEC , Fall 08.
Multi-core Processing The Past and The Future Amir Moghimi, ASIC Course, UT ECE.
Joram Benham April 2,  Introduction  Motivation  Multicore Processors  Overview, CELL  Advantages of CMPs  Throughput, Latency  Challenges.
Semiconductor Memory 1970 Fairchild Size of a single core –i.e. 1 bit of magnetic core storage Holds 256 bits Non-destructive read Much faster than core.
CuMAPz: A Tool to Analyze Memory Access Patterns in CUDA
1 Thread level parallelism: It’s time now ! André Seznec IRISA/INRIA CAPS team.
Introduction CSE 410, Spring 2008 Computer Systems
Last Time Performance Analysis It’s all relative
Parallel and Distributed Systems Instructor: Xin Yuan Department of Computer Science Florida State University.
Lynn Choi School of Electrical Engineering Microprocessor Microarchitecture The Past, Present, and Future of CPU Architecture.
Lecture 1: Performance EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2013, Dr. Rozier.
INTRODUCTION Crusoe processor is 128 bit microprocessor which is build for mobile computing devices where low power consumption is required. Crusoe processor.
1 Thread level parallelism: It’s time now ! André Seznec IRISA/INRIA CAPS team.
My major mantra: We Must Break the Layers. Algorithm Program ISA (Instruction Set Arch)‏ Microarchitecture Circuits Problem Electrons.
Yale Patt The University of Texas at Austin University of California, Irvine March 2, 2012 High Performance in the Multi-core Era: The role of the Transformation.
Hardware Trends. Contents Memory Hard Disks Processors Network Accessories Future.
Parallel Processing - introduction  Traditionally, the computer has been viewed as a sequential machine. This view of the computer has never been entirely.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
10/27: Lecture Topics Survey results Current Architectural Trends Operating Systems Intro –What is an OS? –Issues in operating systems.
Parallelism: A Serious Goal or a Silly Mantra (some half-thought-out ideas)
Outline  Over view  Design  Performance  Advantages and disadvantages  Examples  Conclusion  Bibliography.
Chapter 1 Performance & Technology Trends Read Sections 1.5, 1.6, and 1.8.
Multi-Core Development Kyle Anderson. Overview History Pollack’s Law Moore’s Law CPU GPU OpenCL CUDA Parallelism.
A few issues on the design of future multicores André Seznec IRISA/INRIA.
Shashwat Shriparv InfinitySoft.
Microprocessors BY Sandy G.
Multi-core processors. 2 Processor development till 2004 Out-of-order Instruction scheduling Out-of-order Instruction scheduling.
Stored Programs In today’s lesson, we will look at: what we mean by a stored program computer how computers store and run programs what we mean by the.
THE BRIEF HISTORY OF 8085 MICROPROCESSOR & THEIR APPLICATIONS
Processor Level Parallelism. Improving the Pipeline Pipelined processor – Ideal speedup = num stages – Branches / conflicts mean limited returns after.
Succeeding with Technology Chapter 2 Hardware Designed to Meet the Need The Digital Revolution Integrated Circuits and Processing Storage Input, Output,
Computer Organization CS224 Fall 2012 Lesson 52. Introduction  Goal: connecting multiple computers to get higher performance l Multiprocessors l Scalability,
DR. SIMING LIU SPRING 2016 COMPUTER SCIENCE AND ENGINEERING UNIVERSITY OF NEVADA, RENO Session 3 Computer Evolution.
1 A simple parallel algorithm Adding n numbers in parallel.
PipeliningPipelining Computer Architecture (Fall 2006)
Samira Khan University of Virginia Feb 25, 2016 COMPUTER ARCHITECTURE CS 6354 Asymmetric Multi-Cores The content and concept of this course are adapted.
Introduction CSE 410, Spring 2005 Computer Systems
Fall 2012 Parallel Computer Architecture Lecture 4: Multi-Core Processors Prof. Onur Mutlu Carnegie Mellon University 9/14/2012.
Pentium 4 Deeply pipelined processor supporting multiple issue with speculation and multi-threading 2004 version: 31 clock cycles from fetch to retire,
Lynn Choi School of Electrical Engineering
مقدمة في علوم الحاسب Lecture 5
CSE 410, Spring 2006 Computer Systems
Parallel Processing - introduction
Lynn Choi School of Electrical Engineering
Hardware September 19, 2017.
Computer Architecture Lecture 4 17th May, 2006
Computer Evolution and Performance
CS 286 Computer Organization and Architecture
EE 155 / Comp 122 Parallel Computing
Presentation transcript:

Multi-core, Mega-nonsense

Will multicore cure cancer? Given that multicore is a reality –…and we have quickly jumped from one core to 2 to 4 to 8 –It is easy to let one’s imagination run wild – a million cores! A lot of misinformation has surfaced What multi-core is and what it is not And where we go from here

To whet the appetite Can multi-core save power via the freq cube law? Is ILP dead? Should sample benchmarks drive future designs? Is hardware really sequential? Should multi-core structures be simple? Does productivity demand we ignore what’s below?

Mega-nonsense Multi-core was a solution to a performance problem Hardware works sequentially Make the hardware simple – thousands of cores Do in parallel at a slower clock and save power ILP is dead Examine what is (rather than what can be) Communication: off-chip hard, on-chip easy Abstraction is a pure good Programmers are all dumb and need to be protected Thinking in parallel is hard

Mega-nonsense Multi-core was a solution to a performance problem Hardware works sequentially Make the hardware simple – thousands of cores Do in parallel at a slower clock and save power ILP is dead Examine what is (rather than what can be) Communication: off-chip hard, on-chip easy Abstraction is a pure good Programmers are all dumb and need to be protected Thinking in parallel is hard

How we got here (Moore’s Law) The first microprocessor (Intel 4004), 1971 –2300 transistors –106 KHz The Pentium chip, 1992 –3.1 million transistors –66 MHz Today –more than one billion transistors –Frequencies in excess of 5 GHz Tomorrow ?

How have we used the available transistors?

Intel Pentium M

Intel Core 2 Duo Penryn, nm, 3MB L2

Why Multi-core chips? In the beginning: a better and better uniprocessor –improving performance on the hard problems –…until it just got too hard Followed by: a uniprocessor with a bigger L2 cache –forsaking further improvement on the “hard” problems –poorly utilizing the chip area –and blaming the processor for not delivering performance Today: dual core, quad core, octo core Tomorrow: ???

Why Multi-core chips? It is easier than designing a much better uni-core …and cheaper! It was embarrassing to continue making L2 bigger It was the next obvious step

So, What’s the Point Yes, Multi-core is a reality No, it wasn’t a technological solution to performance improvement Ergo, we do not have to accept it as is i.e., we can get it right the second time, and that means: What goes on the chip What are the interfaces

Mega-nonsense Multi-core was a solution to a performance problem Hardware works sequentially Make the hardware simple – thousands of cores Do in parallel at a slower clock and save power ILP is dead Examine what is (rather than what can be) Communication: off-chip hard, on-chip easy Abstraction is a pure good Programmers are all dumb and need to be protected Thinking in parallel is hard

Hardware is the ultimate in parallelism! It is NOT about cycle by cycle, It is about what goes on in EACH cycle

Mega-nonsense Multi-core was a solution to a performance problem Hardware works sequentially Make the hardware simple – thousands of cores Do in parallel at a slower clock and save power ILP is dead Examine what is (rather than what can be) Communication: off-chip hard, on-chip easy Abstraction is a pure good Programmers are all dumb and need to be protected Thinking in parallel is hard

The Asymmetric Chip Multiprocessor (ACMP) Niagara -like core Large core ACMP Approach Niagara -like core “Niagara” Approach Large core Large core Large core “Tile-Large” Approach

Large core vs. Small Core Out-of-order Wide fetch e.g. 4-wide Deeper pipeline Aggressive branch predictor (e.g. hybrid) Many functional units Trace cache Memory dependence speculation In-order Narrow Fetch e.g. 2-wide Shallow pipeline Simple branch predictor (e.g. Gshare) Few functional units Large Core Small Core

Throughput vs. Serial Performance

Mega-nonsense Multi-core was a solution to a performance problem Hardware works sequentially Make the hardware simple – thousands of cores Do in parallel at a slower clock and save power ILP is dead Examine what is (rather than what can be) Communication: off-chip hard, on-chip easy Abstraction is a pure good Programmers are all dumb and need to be protected Thinking in parallel is hard

Huh?

Mega-nonsense Multi-core was a solution to a performance problem Hardware works sequentially Make the hardware simple – thousands of cores Do in parallel at a slower clock and save power ILP is dead Examine what is (rather than what can be) Communication: off-chip hard, on-chip easy Abstraction is a pure good Programmers are all dumb and need to be protected Thinking in parallel is hard

ILP is dead We double the number of transistors on the chip –Pentium M: 77 Million transistors (50M for the L2 cache) –2 nd Generation: 140 Million (110M for the L2 cache) We see 5% improvement in IPC Ergo: ILP is dead! Perhaps we have blamed the wrong culprit. The EV4,5,6,7,8 data: from EV4 to EV8: –Performance improvement: 55X –Performance from frequency: 7X –Ergo: 55/7 > 7 -- more than half due to microarchitecture

Moore’s Law A law of physics A law of process technology A law of microarchitecture A law of psychology

Mega-nonsense Multi-core was a solution to a performance problem Hardware works sequentially Make the hardware simple – thousands of cores Do in parallel at a slower clock and save power ILP is dead Examine what is (rather than what can be) Communication: off-chip hard, on-chip easy Abstraction is a pure good Programmers are all dumb and need to be protected Thinking in parallel is hard

Examine what is (rather than what can be) Should sample benchmarks drive future designs? Another bridge over the East River?

Mega-nonsense Multi-core was a solution to a performance problem Hardware works sequentially Make the hardware simple – thousands of cores Do in parallel at a slower clock and save power ILP is dead Examine what is (rather than what can be) Communication: off-chip hard, on-chip easy Abstraction is a pure good Programmers are all dumb and need to be protected Thinking in parallel is hard

Mega-nonsense Multi-core was a solution to a performance problem Hardware works sequentially Make the hardware simple – thousands of cores Do in parallel at a slower clock and save power ILP is dead Examine what is (rather than what can be) Communication: off-chip hard, on-chip easy Abstraction is a pure good Programmers are all dumb and need to be protected Thinking in parallel is hard

“Abstraction” is Misunderstood Taxi to the airport The Scheme Chip (Deeper understanding) Sorting (choices) Microsoft developers (Deeper understanding)

Mega-nonsense Multi-core was a solution to a performance problem Hardware works sequentially Make the hardware simple – thousands of cores Do in parallel at a slower clock and save power ILP is dead Examine what is (rather than what can be) Communication: off-chip hard, on-chip easy Abstraction is a pure good Programmers are all dumb and need to be protected Thinking in parallel is hard

Not all programmers are created equal Some want to just get their work done –Performance be damned –They could care less about how computers work Some want performance above all else –They understand how computers work –They can program at the lowest level Ergo: At least two interfaces

Mega-nonsense Multi-core was a solution to a performance problem Hardware works sequentially Make the hardware simple – thousands of cores Do in parallel at a slower clock and save power ILP is dead Examine what is (rather than what can be) Communication: off-chip hard, on-chip easy Abstraction is a pure good Programmers are all dumb and need to be protected Thinking in parallel is hard

Thinking in Parallel is Hard Perhaps: Thinking is Hard How do we get people to believe: Thinking in parallel is natural

Parallel Programming is Hard? What if we start teaching parallel thinking in the first course to freshmen For example: –Factorial –Parallel search –Streaming

Mega-nonsense Multi-core was a solution to a performance problem Hardware works sequentially Make the hardware simple – thousands of cores Do in parallel at a slower clock and save power ILP is dead Examine what is (rather than what can be) Communication: off-chip hard, on-chip easy Abstraction is a pure good Programmers are all dumb and need to be protected Thinking in parallel is hard

!