CS 15-447: Computer Architecture, Fall 2008
Lecture 26: Emerging Architectures
Nael Abu-Ghazaleh, November 19, 2007

Slide 2: Last Time: Buses and I/O
Buses: a bunch of wires forming a shared interconnect: multiple "devices" connect to the same bus
Versatile: new devices can connect (even ones we didn't know existed when the bus was designed)
Can become a bottleneck
– Shorter -> faster; fewer devices -> faster
Have to:
– Define the protocol to make devices communicate
– Come up with an arbitration mechanism
(Diagram: data lines and control lines)

Slide 3: Types of Buses
System bus
– Connects processor and memory
– Short, fast, synchronous, design specific
I/O bus
– Usually lengthy and slower; industry standard
– Needs to match a wide range of I/O devices
– Connects to the processor-memory bus or backplane bus
(Diagram: processor and memory on the processor-memory bus, joined through bus adaptors to a backplane bus and I/O buses)

Slide 4: Bus "Mechanics"
Master and slave roles
Have to define how we handshake
– Depends on whether the bus is synchronous or not
Bus arbitration protocol
– Contention vs. reservation; centralized vs. distributed
I/O model
– Programmed I/O; interrupt-driven I/O; DMA (a programmed-I/O sketch follows below)
Increasing performance (mainly bandwidth)
– Shorter; closer; wider
– Block transfers (instead of byte transfers)
– Split-transaction buses
– ...
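To make the programmed-I/O model concrete, here is a minimal C sketch that busy-waits on a memory-mapped status register before writing a data register. The register addresses and the ready bit are invented for illustration; a real device's datasheet would define them.

```c
#include <stdint.h>

/* Hypothetical memory-mapped device registers (addresses invented
   for illustration only). */
#define DEV_STATUS ((volatile uint32_t *)0x40001000u)
#define DEV_DATA   ((volatile uint32_t *)0x40001004u)
#define STATUS_READY 0x1u  /* assumed "ready for next word" bit */

/* Programmed I/O: the CPU polls the status register, then writes
   one word at a time. Simple, but it burns CPU cycles spinning;
   interrupt-driven I/O or DMA exists to avoid this loop. */
static void pio_write(const uint32_t *buf, int n) {
    for (int i = 0; i < n; i++) {
        while ((*DEV_STATUS & STATUS_READY) == 0)
            ;                    /* spin until the device is ready */
        *DEV_DATA = buf[i];      /* one bus transaction per word */
    }
}
```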

Slide 5: Today: Emerging Architectures
We are at an interesting point in the evolution of computer architecture
What is emerging, and why is it emerging?

Slide 6: Uniprocessor Performance (SPECint)
– VAX: 25%/year, 1978 to 1986
– RISC + x86: 52%/year, 1986 to 2002
– RISC + x86: ??%/year, 2002 to present
Sea change in chip design: what is emerging?
(Chart from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, Sept. 15, 2006)

Slide 7: How Did We Get There?
First, what allowed the ridiculous 52% improvement per year to continue for around 20 years? (The compounding is worked out below.)
– If cars had improved as much, we would have 1 million km/hr cars!
Is it just the number of transistors / the clock rate? No! It's also all the stuff that we've been learning about!
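To see how a steady 52%/year compounds, here is a small C program that multiplies out the annual growth rates from the Hennessy and Patterson data on the previous slide; the endpoints are the slide's, the arithmetic is just illustrative.

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    /* 25%/year from 1978 to 1986, then 52%/year from 1986 to 2002 */
    double vax_era  = pow(1.25, 1986 - 1978);  /* ~6x over 8 years   */
    double risc_era = pow(1.52, 2002 - 1986);  /* ~800x over 16 years */
    printf("1978-1986: %.0fx\n", vax_era);
    printf("1986-2002: %.0fx\n", risc_era);
    printf("combined:  %.0fx\n", vax_era * risc_era);  /* ~4800x total */
    return 0;
}
```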

Slide 8: Walk Down Memory Lane
What was the first processor organization we looked at?
– Single-cycle processors
How did multi-cycle processors improve on those?
What did we do after that to improve performance?
– Pipelining; why does that help? What are the limitations? (See the sketch below.)
From there we discussed superscalar architectures
– Out-of-order execution; multiple ALUs
– This is basically the state of the art in uniprocessors
– What gave us problems there?
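As a reminder of why pipelining helps and where it stops helping, this C sketch computes the ideal speedup of a perfectly balanced k-stage pipeline: the cycle time shrinks toward the per-stage latch overhead, so returns diminish as depth grows. The 10 ns datapath delay and 0.2 ns latch overhead are assumed numbers, not from the lecture.

```c
#include <stdio.h>

int main(void) {
    double total_logic_ns = 10.0;  /* single-cycle datapath delay (assumed) */
    double latch_ns = 0.2;         /* per-stage register overhead (assumed) */
    for (int k = 1; k <= 16; k *= 2) {
        /* cycle time = balanced slice of the logic + latch overhead */
        double cycle_ns = total_logic_ns / k + latch_ns;
        /* throughput speedup vs. the unpipelined machine */
        printf("%2d stages: cycle %.2f ns, speedup %.2fx\n",
               k, cycle_ns, (total_logic_ns + latch_ns) / cycle_ns);
    }
    return 0;
}
```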

Slide 9: Detour: A Couple of Other Design Points
Very Long Instruction Word (VLIW) architectures: let the compiler do the work
– Great for energy efficiency: less hardware spent extracting instruction-level parallelism
– Not binary compatible? Transmeta Crusoe processor

Slide 10: SIMD ISA Extensions: Parallelism from the Data?
The same instruction applied to multiple data items at the same time
– How can this help? (See the sketch below.)
MMX (Intel) and 3DNow! (AMD) ISA extensions
Great for graphics; originally invented for scientific codes (vector processors)
– Not a general solution
End of detour!
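To make the idea concrete, here is a minimal C sketch using Intel's SSE2 intrinsics (a successor to the MMX extensions named on the slide) to add four 32-bit integers with a single instruction. The array values are arbitrary.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdio.h>

int main(void) {
    int a[4] = {1, 2, 3, 4};
    int b[4] = {10, 20, 30, 40};
    int c[4];

    /* One SIMD instruction adds all four lanes at once:
       the "same instruction, multiple data" idea. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)c, _mm_add_epi32(va, vb));

    printf("%d %d %d %d\n", c[0], c[1], c[2], c[3]);  /* 11 22 33 44 */
    return 0;
}
```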

Slide 11: Back to Moore's Law
Why are the "good times" over? Three walls.
1. The "instruction-level parallelism" (ILP) wall
– Less parallelism available in programs (2 -> 4 -> 8 -> 16)
– Tremendous increase in complexity to get more
– Does VLIW help? What can help? (A dependence example follows below.)
– Conclusion: standard architectures cannot continue to do their part in sustaining Moore's law
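As a small illustration of why ILP is limited, consider this hypothetical C example: the first loop carries a dependence from one iteration to the next, so its additions run serially no matter how many ALUs the core has, while the second loop's iterations are independent and can be overlapped.

```c
#include <stdio.h>

/* Loop-carried dependence: each iteration needs the previous sum,
   so extra ALUs cannot overlap the additions. */
static double prefix_sum(const double *x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];              /* serial chain: s depends on s */
    return s;
}

/* Independent iterations: every element can be computed in parallel,
   so wide issue (or SIMD) actually pays off here. */
static void scale(double *y, const double *x, int n) {
    for (int i = 0; i < n; i++)
        y[i] = 2.0 * x[i];      /* no cross-iteration dependence */
}

int main(void) {
    double x[4] = {1, 2, 3, 4}, y[4];
    scale(y, x, 4);
    printf("sum=%.0f, y[3]=%.0f\n", prefix_sum(x, 4), y[3]);  /* 10 and 8 */
    return 0;
}
```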

Slide 12: Wall 2: The Memory Wall
What did we do to help with this?
– Still very, very expensive to access memory
How do we see the impact in practice? (An AMAT sketch follows below.) Very different from when I learned architecture!
(Chart: processor performance improves 52%/yr (2x every 1.5 years) while DRAM improves 9%/yr (2x every 10 years), so the processor-memory performance gap has grown about 50% per year since 1982; "Moore's Law")
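One way to see the impact in practice is average memory access time, AMAT = hit time + miss rate x miss penalty. This C sketch plugs in invented but plausible numbers to show how a few hundred cycles of DRAM latency dominate even at low miss rates.

```c
#include <stdio.h>

int main(void) {
    double hit_cycles = 1.0;      /* L1 hit time (assumed) */
    double penalty = 200.0;       /* DRAM miss penalty in cycles (assumed) */
    double rates[] = {0.01, 0.02, 0.05, 0.10};

    for (int i = 0; i < 4; i++) {
        /* AMAT = hit time + miss rate * miss penalty */
        double amat = hit_cycles + rates[i] * penalty;
        printf("miss rate %4.0f%% -> AMAT %5.1f cycles\n",
               rates[i] * 100.0, amat);
    }
    return 0;  /* even a 2% miss rate makes memory ~5x slower than L1 */
}
```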

Slide 13: Ways Out? Multithreaded Processors
Can we switch to other threads if we need to access memory?
– When do we need to access memory? What support is needed?
Can I use it to help with the ILP wall as well?

Slide 14: Simultaneous Multithreaded (SMT) Processors
How do I switch between threads? Hardware support for that
How does this help? (A software-threading sketch follows below.)
But: increased contention for everything (bandwidth, TLB, caches, ...)
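Multithreaded hardware needs software threads to feed it. Here is a minimal POSIX threads sketch in C: two threads sum halves of an array, giving the processor independent instruction streams to interleave while one of them stalls on memory. The array size is arbitrary.

```c
#include <pthread.h>
#include <stdio.h>

#define N 1000000
static double data[N];

struct job { int lo, hi; double sum; };

static void *partial_sum(void *arg) {
    struct job *j = arg;
    for (int i = j->lo; i < j->hi; i++)
        j->sum += data[i];  /* misses here can be hidden by the other thread */
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = 1.0;

    struct job jobs[2] = { {0, N / 2, 0.0}, {N / 2, N, 0.0} };
    pthread_t t[2];
    for (int k = 0; k < 2; k++)
        pthread_create(&t[k], NULL, partial_sum, &jobs[k]);
    for (int k = 0; k < 2; k++)
        pthread_join(t[k], NULL);

    printf("total = %.0f\n", jobs[0].sum + jobs[1].sum);  /* 1000000 */
    return 0;
}
```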

Slide 15: Third Wall: The Physics/Power Wall
We're down to the level of playing with a few atoms
– More error prone; lower yield
But also soft errors and wear-out
– Logic that only sometimes works!
– Can we do something in architecture to recover? (One classic answer is sketched below.)
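One classic architectural answer to logic that only sometimes works is redundancy with voting. This C sketch of triple modular redundancy (TMR) runs a computation three times and takes a bitwise majority, masking a single transient fault; the computation being protected is invented for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Majority vote over three redundant results: masks one faulty copy. */
static uint32_t vote(uint32_t a, uint32_t b, uint32_t c) {
    return (a & b) | (a & c) | (b & c);  /* bitwise 2-of-3 majority */
}

/* Stand-in for some computation whose logic might glitch. */
static uint32_t compute(uint32_t x) { return x * x + 1; }

int main(void) {
    uint32_t r1 = compute(7);
    uint32_t r2 = compute(7) ^ 0x4;  /* simulate a soft error flipping a bit */
    uint32_t r3 = compute(7);
    printf("voted result: %u\n", vote(r1, r2, r3));  /* 50, fault masked */
    return 0;
}
```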

Slide 16: Power! Our topic next class.

Slide 17: So, What Is Our Way Out? Any Ideas?
Maybe architecture becomes a commodity; this is the best we can do
– This happens to a lot of technologies: why don't we have the million-km/hr car?
Do we actually need more processing power?
– 8-bit embedded processors are good enough for calculators; 4-bit ones are probably good enough for elevators
– Is there any sense in continuing to invest so much time and energy in this stuff?
Power Wall + Memory Wall + ILP Wall = Brick Wall

Slide 18: A Lifeline? Multi-core Architectures
How does this help? Think of the three walls
The new Moore's law:
– The number of cores will double every 3 years!
– Many-core architectures

Slide 19: Overcoming the Three Walls
ILP wall?
– We don't need to restrict ourselves to a single thread
– Natural parallelism is available across threads/programs
Memory wall?
– Hmm, that is a tough one; on the surface, it seems like we made it worse
– Maybe help is coming from industry
Physics/power wall?
– Use less aggressive core technology: simpler processors, shallower pipelines, but more processors (see the power sketch below)
– Throw-away cores to improve yield
Do you buy it?
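The power argument rests on dynamic power scaling roughly as P = C * V^2 * f, with supply voltage falling when frequency falls. This C sketch uses invented scaling factors (the 0.8 voltage and 0.6 frequency are assumptions, not measured values) to compare one fast core against two slower ones.

```c
#include <stdio.h>

/* Dynamic power model: P = C * V^2 * f, with capacitance folded
   into the normalization. */
static double power(double v, double f) { return v * v * f; }

int main(void) {
    /* Baseline: one core at voltage 1.0 and frequency 1.0 (normalized). */
    double p_one = power(1.0, 1.0);

    /* Two cores at 60% frequency: ~1.2x the throughput if the work
       parallelizes, and the lower frequency permits a lower voltage
       (0.8 is an assumed, illustrative scaling). */
    double p_two = 2.0 * power(0.8, 0.6);

    printf("one fast core:  %.2f\n", p_one);  /* 1.00 */
    printf("two slow cores: %.2f\n", p_two);  /* 0.77: more work, less power */
    return 0;
}
```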

Slide 20: Seven Questions for Parallelism
Applications:
1. What are the apps?
2. What are the kernels of the apps?
Hardware:
3. What are the HW building blocks?
4. How do we connect them?
Programming models:
5. How do we describe apps and kernels?
6. How do we program the HW?
Evaluation:
7. How do we measure success?
(Inspired by a view of the Golden Gate Bridge from Berkeley)

Slide 21: Sea Change in Chip Design
Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz, 10-micron PMOS, 11 mm² chip
RISC II (1983): 32-bit, 5-stage pipeline, 40,760 transistors, 3 MHz, 3-micron NMOS, 60 mm² chip
– RISC II shrinks to about 0.02 mm² at 65 nm (the arithmetic is checked below)
– A 125 mm² chip in 65 nm (0.065-micron) CMOS = 2312 copies of RISC II + FPU + Icache + Dcache
The processor is the new transistor!
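A quick sanity check on the shrink: linear dimensions scale with feature size, so area scales with its square. This C snippet redoes the slide's arithmetic; note the slide's 2312 copies include an FPU and caches, so each copy is larger than bare RISC II.

```c
#include <stdio.h>

int main(void) {
    double old_feature_um = 3.0;    /* RISC II: 3-micron NMOS */
    double new_feature_um = 0.065;  /* 65 nm process          */
    double old_area_mm2 = 60.0;     /* RISC II die area       */

    /* Area scales as the square of the linear shrink factor. */
    double shrink = old_feature_um / new_feature_um;     /* ~46x linear   */
    double new_area = old_area_mm2 / (shrink * shrink);  /* ~0.03 mm^2    */

    printf("linear shrink: %.0fx, new area: %.3f mm^2\n", shrink, new_area);
    printf("bare RISC II copies on a 125 mm^2 die: %.0f\n", 125.0 / new_area);
    return 0;
}
```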

Slide 22: The Architecture Design Space
What should each core look like? Should all cores look the same?
How should the on-chip interconnect between them look?
What level of the cache should they share?
– And what are the implications of that?
Are there new security issues?
– Side-channel attacks; denial-of-service attacks
Many other questions...
A brand new playground; an exciting time to do architecture research

Slide 23: Hardware Building Blocks: Small Is Beautiful
Given the difficulty of designing and validating large designs...
Given power limits on what we can build, parallelism is the energy-efficient way to achieve performance
– A lower threshold voltage means much lower power
Given redundant processors, we can improve chip yield
– Cisco Metro: 188 processors + 4 spares
– Sun Niagara: sold in 6- or 8-processor versions
Expect modestly pipelined (5- to 9-stage) CPUs, FPUs, vector and SIMD PEs
One size fits all?
– Amdahl's Law suggests a few fast cores + many small cores (see the sketch below)
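Amdahl's Law (speedup = 1 / ((1 - p) + p/n) for parallel fraction p on n cores) is why a few fast cores still matter: the serial fraction runs on one core. This C sketch compares configurations with illustrative numbers; the 95% parallel fraction and the "2x fast core halves the serial term" model are assumptions for the sake of the example.

```c
#include <stdio.h>

/* Amdahl's Law: speedup on n cores for parallel fraction p. */
static double amdahl(double p, double n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    double p = 0.95;  /* assume 95% of the work parallelizes */

    printf("16 small cores:         %.2fx\n", amdahl(p, 16));  /* ~9.1x  */
    printf("64 small cores:         %.2fx\n", amdahl(p, 64));  /* ~15.4x */

    /* Asymmetric chip: one core 2x as fast runs the serial 5%,
       halving the serial term (a simplified model). */
    double asym = 1.0 / ((1.0 - p) / 2.0 + p / 64.0);
    printf("64 cores + 1 fast core: %.2fx\n", asym);           /* ~25.1x */
    return 0;
}
```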

Slide 24: The Elephant in the Room
We tried this parallel-processing thing before
– Very difficult; it pretty much failed
– A lot of academic progress and neat algorithms, but little commercial impact
We actually have to do new programming
– A lot of effort to develop; error prone; etc.
– The La-Z-Boy programming era is over
– We need new programming models
Amdahl's law
Applications: what will you use 1024 cores for?
These concerns are being voiced by a substantial segment of academia/industry
– What do you think?
– It's coming, no matter what