
Yale Patt The University of Texas at Austin World University Presidents’ Symposium University of Belgrade April 4, 2009 Future Microprocessors: What must we do differently to effectively utilize multi-core and many-core chips? …and what are the implications re: education?

What I want to try to do today: First, explain multi-core chips –How we got to where we are –What we have to do differently moving forward Second, discuss what that means with respect to education

Problem Algorithm Program ISA (Instruction Set Arch) Microarchitecture Circuits Electronic Devices

How many electronic devices? The first microprocessor (Intel 4004), 1971 –2300 transistors –106 kHz The Pentium chip, 1992 –3.1 million transistors –66 MHz Today –more than one billion transistors –Frequencies in excess of 5 GHz

In the next few years: Process technology: 50 billion transistors –Gelsinger says we can go down to 10 nanometers (I like to say 100 angstroms just to keep us focused) Dreamers will use whatever we come up with How will we design the microarchitecture? –How will we harness 50 billion transistors?

How will we use 50 billion transistors? How have we used the transistors up to now?

How have we used the available transistors?

Intel Pentium M

Intel Core 2 Duo Penryn, 45 nm, 3MB L2

Why Multi-core chips? It is easier than designing a much better uni-core It is embarrassing to continue making L2 bigger It is the next obvious step It is not the Holy Grail

that is… In the beginning: a better and better uniprocessor –improving performance on the hard problems More recently: a uniprocessor with a bigger L2 cache –forsaking further improvement on the “hard” problems –poorly utilizing the chip area –and blaming the processor for not delivering performance Today: dual core, quad core Tomorrow: ???

The Good News: Lots of cores on the chip The Bad News: Not much benefit.
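The gap between core count and delivered benefit can be made concrete with Amdahl's law: speedup is capped by the serial fraction of the program, no matter how many cores the chip provides. A minimal sketch (the 90% parallel fraction is an illustrative assumption, not a figure from the talk):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup from Amdahl's law: the serial fraction limits the benefit
    of adding cores, no matter how many are on the chip."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Even a program that is 90% parallel gains little past a handful of cores.
print(amdahl_speedup(0.90, 2))     # ~1.8x
print(amdahl_speedup(0.90, 8))     # ~4.7x
print(amdahl_speedup(0.90, 1000))  # ~9.9x -- capped below 10x forever
```

The third call is the bad news in one number: a thousand cores cannot push a 90%-parallel program past a 10x speedup, which is why the talk turns to rethinking the software and the layers rather than just adding cores.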

In my opinion the reason is: Our inability to effectively exploit: -- The transformation hierarchy -- Parallel programming

Problem Algorithm Program ISA (Instruction Set Arch) Microarchitecture Circuits Electrons

Up to now Maintain the artificial walls between the layers Keep the abstraction layers secure –Makes for a better comfort zone (Mostly) Improving the Microarchitecture –Pipelining, Branch Prediction, Speculative Execution, Out-of-order Execution, Caches, Trace Cache Today, we have too many transistors –BANDWIDTH and POWER are blocking improvement –We MUST change the paradigm
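One of the techniques listed, branch prediction, gives a feel for what the microarchitecture layer has been doing all these years. Below is a sketch of the textbook 2-bit saturating-counter scheme (a classic design, not the predictor of any particular processor; the PC value and branch history are made-up illustrations):

```python
class TwoBitPredictor:
    """Per-branch 2-bit saturating counter: states 0-1 predict not-taken,
    states 2-3 predict taken. Two wrong outcomes are needed to flip."""
    def __init__(self):
        self.counters = {}  # branch PC -> counter state, default weakly not-taken

    def predict(self, pc):
        return self.counters.get(pc, 1) >= 2

    def update(self, pc, taken):
        c = self.counters.get(pc, 1)
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)

pred = TwoBitPredictor()
history = [True, True, False, True, True]  # a mostly-taken branch
hits = 0
for outcome in history:
    hits += (pred.predict(0x400) == outcome)
    pred.update(0x400, outcome)
print(hits)  # 3 of 5 correct; note the lone not-taken outcome
             # does not flip the prediction -- that hysteresis is the point
```

The 2-bit hysteresis is exactly the kind of trick that made deeper pipelines and speculative execution pay off without programmers ever knowing it was there.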

…and that means (I think) breaking the layers: Compiler, Microarchitecture –Multiple levels of cache –Block-structured ISA –Part by compiler, part by uarch –Fast track, slow track Algorithm, Compiler, Microarchitecture –X + superscalar – the Refrigerator –Niagara X / Pentium Y Microarchitecture, Circuits –Verification Hooks –Internal fault tolerance

IF we break the layers: 50 billion transistors naturally leads to –A large number of simple processors, AND –A few very heavyweight processors, AND –Enough “accelerators” for handling lots of special tasks Heavyweight processors mean we can exploit ILP –i.e., Improve performance on hard, sequential problems –The death of ILP has been greatly exaggerated –There is still plenty of headroom yet to be pursued We need software that can utilize both We need multiple interfaces

that is: IF we are willing to continue to pursue ILP IF we are willing to break the layers IF we are willing to embrace parallel programming

The Microprocessor of 2019 It WILL BE a Multi-core chip –But it will be PentiumX / Niagara Y –It will tackle off-chip bandwidth (better L2 miss handling) –It will tackle power consumption (ON/OFF switches) –It will tackle soft errors (internal fault tolerance) –It will tackle security And it WILL CONTAIN a heavyweight ILP processor –With the levels of transformation integrated

The Heavyweight ILP Processor: Compiler/Microarchitecture Symbiosis –Multiple levels of cache –Fast track / Slow track –Part by compiler, part by microarchitecture –Block-structured ISA Better Branch Prediction (e.g., indirect jumps) Ample sprinkling of Refrigerators SSMT (Also known as helper threads) Power Awareness (more than ON/OFF switches) Verification hooks (CAD a first class citizen) Internal Fault tolerance (for soft errors) Better security

Unfortunately: We train computer people to work within their layer Too few understand anything outside their layer and, as to multiple cores: People think sequentially

At least two problems

Conventional Wisdom Problem 1: “Abstraction” is Misunderstood Taxi to the airport The Scheme Chip (Deeper understanding) Sorting (choices) Microsoft developers (Deeper understanding) Wireless networks (Layers revisited)

Conventional Wisdom Problem 2: Thinking in Parallel is Hard Perhaps: Thinking is Hard How do we get people to believe: Thinking in parallel is natural

Too many computer professionals don’t get it. We can exploit all these transistors –IF we can understand each other’s layer Thousands of cores, hundreds of accelerators –Ability to power on/off under program control Algorithms, Compiler, Microarchitecture, Circuits all talking to each other … Harnessing 50 billion transistor chips We have an Education Problem We have an Education Opportunity

What does this mean for most countries? One does NOT have to produce chips to be a major force in the multi-core, many-core era One DOES have to understand multiple layers –which does not require huge capital investment –BUT does require a very large teacher investment One DOES have to embrace parallel thinking

How does one do it? FIRST, Do not accept the premise: Parallel programming is hard SECOND, Do not accept the premise: It is okay to know only one layer

Parallel Programming is Hard? What if we start teaching parallel thinking in the first course to freshmen? For example: –Factorial –Parallel search –Streaming
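The factorial example above really can be framed in parallel terms on day one: 1·2·…·n splits into independent chunks whose partial products are combined at the end. A sketch of that decomposition (the thread pool is used only to show the structure; the chunk sizing is an arbitrary choice, and CPython threads will not actually speed up this CPU-bound multiply):

```python
from concurrent.futures import ThreadPoolExecutor
from math import prod

def range_product(lo, hi):
    """Product of the integers in [lo, hi] -- one worker's independent chunk."""
    return prod(range(lo, hi + 1))

def parallel_factorial(n, workers=4):
    """n! computed as a combination of independently computed partial products."""
    step = max(1, n // workers)
    chunks = [(lo, min(lo + step - 1, n)) for lo in range(1, n + 1, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda c: range_product(*c), chunks)
        return prod(partials)

print(parallel_factorial(10))  # 3628800, same as math.factorial(10)
```

A freshman who sees this first meets multiplication's associativity as the license to parallelize, which is precisely the "thinking in parallel is natural" habit the talk argues for.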

How does one do it? FIRST, Do not accept the premise: Parallel programming is hard SECOND, Do not accept the premise: It is okay to know only one layer

Students can understand more than one layer What if we get rid of “top-down” FIRST –Students do not get it – they have no underpinnings –Objects are too high a level of abstraction –So, students end up memorizing –Memorizing isn’t learning (and certainly not understanding) What if we START with “motivated” bottom up –Students build on what they already know –Memorizing is replaced with real learning –Continually raise the level of abstraction The student sees the layers from the start –The student makes the connections –The student understands what is going on

If students understand, they can fix their own bugs …and, there is no substitute for Design it wrong, Debug it yourself, Fix it yourself, AND see the working result.

And, while I am at it: Students in every country are smart –No country has an advantage in that domain Don’t be afraid to work them very, very hard –Students can master very difficult material –And, students will not complain if they are learning –But they will complain if they are being fed “tedium.” That means teachers who know the difference

Again: what does this mean for most countries? One does NOT have to produce chips to be a major force in the multi-core, many-core era One DOES have to understand multiple layers –which does not require huge capital investment –BUT does require a very large teacher investment One DOES have to embrace parallel thinking

The answer to both is education.

Thank you!