15-447 Computer Architecture, Fall 2008 © November 19, 2007 Nael Abu-Ghazaleh. Lecture 26: Emerging Architectures


1 Lecture 26: Emerging Architectures
CS 15-447: Computer Architecture, Fall 2008
© November 19, 2007
Nael Abu-Ghazaleh, naelag@cmu.edu
http://www.qatar.cmu.edu/~msakr/15447-f08

2 Last Time: Buses and I/O
Buses: a bunch of wires (data lines and control lines)
Shared interconnect: multiple “devices” connect to the same bus
Versatile: new devices can connect (even ones we didn’t know existed when the bus was designed)
Can become a bottleneck
–Shorter -> faster; fewer devices -> faster
Have to:
–Define the protocol that lets devices communicate
–Come up with an arbitration mechanism

3 Types of Buses
System bus
–Connects processor and memory
–Short, fast, synchronous, design specific
I/O bus
–Usually long and slower; industry standard
–Needs to match a wide range of I/O devices
–Connects to the processor-memory bus or backplane bus
[Figure: processor and memory on a processor-memory bus; bus adaptors connect it to I/O buses and a backplane bus]

4 Bus “Mechanics”
Master and slave
Have to define how we handshake
–Depends on whether it’s synchronous or not
Bus arbitration protocol
–Contention vs. reservation; centralized vs. distributed
I/O model
–Programmed I/O; interrupt-driven I/O; DMA
Increasing performance (mainly bandwidth)
–Shorter; closer; wider
–Block transfers (instead of byte transfers)
–Split-transaction buses
–…

5 Today: Emerging Architectures
We are at an interesting point in computer architecture evolution
What is emerging and why is it emerging?

6 Uniprocessor Performance (SPECint)
VAX: 25%/year, 1978 to 1986
RISC + x86: 52%/year, 1986 to 2002
RISC + x86: ??%/year, 2002 to present
From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, Sept. 15, 2006
⇒ Sea change in chip design: what is emerging?
[Figure: performance plot, annotated “3X” and “??%/year”]

7 How did we get there?
First, what allowed the ridiculous 52% improvement per year to continue for around 20 years?
–If cars improved as much we would have 1 million km/hr cars!
Is it just the number of transistors/clock rate?
No! It’s also all the stuff that we’ve been learning about!
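
The car comparison on this slide is just compound growth, and a short sketch makes the scale concrete. The function name and the 200 km/hr starting speed are assumptions chosen for illustration; only the 52%/year rate and the roughly 20-year span come from the slide.

```python
def compound_growth(rate_per_year: float, years: int) -> float:
    """Total improvement factor after compounding rate_per_year for `years` years."""
    return (1.0 + rate_per_year) ** years


# 52% per year sustained for about 20 years (figures from the slide).
factor = compound_growth(0.52, 20)

# Assumed starting point: an ordinary car with a 200 km/hr top speed.
base_speed_kmh = 200
print(f"improvement factor: {factor:,.0f}x")
print(f"equivalent car speed: {base_speed_kmh * factor:,.0f} km/hr")
```

Under these assumptions the factor comes out at a few thousand, which puts the hypothetical car in the neighborhood of the slide's 1 million km/hr.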

8 Walk down memory lane
What was the first processor organization we looked at?
–Single-cycle processors
How did multi-cycle processors improve those?
What did we do after that to improve performance?
–Pipelining; why does that help? What are the limitations?
From there we discussed superscalar architectures
–Out-of-order execution; multiple ALUs
–This is basically state of the art in uniprocessors
–What gave us problems there?

9 Detour: a couple of other design points
Very Long Instruction Word (VLIW) architectures; let the compiler do the work
Great for energy efficiency; less instruction-level parallelism
Not binary compatible?
Transmeta Crusoe processor

10 SIMD ISA Extensions: Parallelism from the Data?
Same instruction applied to multiple data at the same time
–How can this help?
MMX (Intel) and 3DNow! (AMD) ISA extensions
Great for graphics; originally invented for scientific codes (vector processors)
–Not a general solution
End of detour!
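
To make "same instruction, multiple data" concrete, here is a minimal software model of a packed add in the style of an MMX 16-bit packed add: four independent 16-bit lanes stored in one 64-bit word. The helper names (`pack`, `unpack`, `packed_add`) are hypothetical, and the Python loop only models the semantics; real SIMD hardware adds all lanes in a single instruction.

```python
LANES = 4        # four lanes per 64-bit word
LANE_BITS = 16   # each lane is a 16-bit integer
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack four 16-bit values into one 64-bit word (lane 0 in the low bits)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add(a, b):
    """Lane-wise wraparound add; a carry never crosses into the next lane."""
    result = 0
    for i in range(LANES):
        shift = i * LANE_BITS
        lane_sum = ((a >> shift) + (b >> shift)) & LANE_MASK
        result |= lane_sum << shift
    return result


x = pack([1, 2, 3, 4])
y = pack([10, 20, 30, 40])
print(unpack(packed_add(x, y)))
```

The model also shows why this is not a general solution: it only pays off when the same operation applies to every element, as with pixels or vector elements.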

11 Back to Moore’s law
Why are the “good times” over?
–Three walls
1. “Instruction Level Parallelism” (ILP) Wall
–Less parallelism available in programs (2->4->8->16)
–Tremendous increase in complexity to get more
–Does VLIW help?
–What can help?
–Conclusion: standard architectures cannot continue to do their part of sustaining Moore’s law

12 Wall 2: Memory Wall
What did we do to help this?
–Still very, very expensive to access memory
How do we see the impact in practice?
Very different from when I learned architecture!
[Figure: Processor-Memory Performance Gap, 1980 to 2000; performance on a log scale from 1 to 1000. CPU (“Moore’s Law”): µProc 52%/yr (2X/1.5 yr). DRAM: 9%/yr (2X/10 yrs). The gap grows 50%/year.]
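
The growth rates in the figure can be turned into doubling times and a compounded gap with a few lines of arithmetic. This is a sketch with hypothetical function names; the 52%/yr and 9%/yr defaults are the rates quoted on the slide. With those two rates the ratio works out to roughly 39% per year, a little below the 50% figure in the plot title, so the numbers are best read as approximate.

```python
import math


def doubling_time(rate_per_year: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1.0 + rate_per_year)


def gap_after(years: float, cpu_rate: float = 0.52, dram_rate: float = 0.09) -> float:
    """Ratio of processor to DRAM performance after `years`, starting from parity."""
    return ((1.0 + cpu_rate) / (1.0 + dram_rate)) ** years


print(f"CPU doubling time:  {doubling_time(0.52):.2f} years")
print(f"DRAM doubling time: {doubling_time(0.09):.2f} years")
print(f"gap growth per year: {(gap_after(1) - 1) * 100:.0f}%")
print(f"gap after 20 years:  {gap_after(20):,.0f}x")
```

Whatever the exact rates, the point of the slide survives: two exponentials with different bases diverge by orders of magnitude within two decades.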

13 Ways out? Multithreaded Processors
Can we switch to other threads if we need to access memory?
–When do we need to access memory?
What support is needed?
Can I use it to help with the ILP wall as well?

14 Symmetric Multithreaded Processors
How do I switch between threads?
Hardware support for that
How does this help?
But, increased contention for everything (BW, TLB, caches…)
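
One way to approach "How does this help?" is a simple latency-hiding model. It is a sketch under stated assumptions, not a description of any real processor: each thread computes for `run_cycles`, then stalls for `miss_latency` cycles on a memory access, switching threads costs nothing, and contention for caches and bandwidth is ignored. The function name and the 20-cycle and 200-cycle figures are hypothetical.

```python
def utilization(threads: int, run_cycles: int, miss_latency: int) -> float:
    """Fraction of cycles the core does useful work with `threads` hardware threads.

    While one thread waits on memory, the others can run. The core saturates
    once there are enough threads to cover the whole miss latency.
    """
    return min(1.0, threads * run_cycles / (run_cycles + miss_latency))


# Assumed workload: 20 cycles of work between misses, 200-cycle miss latency.
for n in (1, 2, 4, 8, 11, 16):
    print(f"{n:2d} threads -> {utilization(n, 20, 200):.0%} busy")
```

In this idealized model a single thread leaves the core idle most of the time, and utilization grows linearly with thread count until the latency is fully covered. The contention noted on the slide is exactly what the model leaves out, so real gains would be smaller.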

15 Third Wall: Physics/Power wall
We’re down to the level of playing with a few atoms
More error prone; lower yield
But also soft errors and wear-out
–Logic that sometimes works!
–Can we do something in architecture to recover?

16 Power!
Our topic next class

17 So, what is our way out? Any ideas?
Maybe architecture becomes commodity; this is the best we can do
–This happens to a lot of technologies: why don’t we have the million km/hr car?
Do we actually need more processing power?
–8-bit embedded processors good enough for calculators; 4-bit ones probably good enough for elevators
–Is there any sense in continuing to invest so much time and energy into this stuff?
Power Wall + Memory Wall + ILP Wall = Brick Wall

18 A lifeline? Multi-core architectures
How does this help? Think of the three walls
The new Moore’s law:
–The number of cores will double every 3 years!
–Many-core architectures

19 Overcoming the three walls
ILP Wall?
–Don’t need to restrict myself to a single thread
–Natural parallelism available across threads/programs
Memory wall?
–Hmm, that is a tough one; on the surface, seems like we made it worse
–Maybe help coming from industry
Physics/power wall?
–Use less aggressive core technology: simpler processors, shallower pipelines, but more processors
–Throw-away cores to improve yield
Do you buy it?

20 7 Questions for Parallelism
Applications:
1. What are the apps?
2. What are kernels of apps?
Hardware:
3. What are the HW building blocks?
4. How to connect them?
Programming Models:
5. How to describe apps and kernels?
6. How to program the HW?
Evaluation:
7. How to measure success?
(Inspired by a view of the Golden Gate Bridge from Berkeley)

21 Sea Change in Chip Design
Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz, 10 micron PMOS, 11 mm² chip
RISC II (1983): 32-bit, 5-stage pipeline, 40,760 transistors, 3 MHz, 3 micron NMOS, 60 mm² chip
125 mm² chip, 0.065 micron CMOS = 2312 RISC II + FPU + Icache + Dcache
–RISC II shrinks to ≈ 0.02 mm² at 65 nm
Processor is the new transistor!
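
The shrink quoted on this slide can be checked with first-order scaling, assuming die area scales with the square of the feature size. That assumption ignores wiring, pads, and analog blocks, and the function name is hypothetical; the die areas, feature sizes, and the count of 2312 come from the slide.

```python
def scaled_area(area_mm2: float, old_feature_um: float, new_feature_um: float) -> float:
    """Ideal die area after a process shrink: area scales with feature size squared."""
    return area_mm2 * (new_feature_um / old_feature_um) ** 2


# RISC II: 60 mm^2 at 3 micron, shrunk to 0.065 micron (65 nm).
risc2_at_65nm = scaled_area(60.0, 3.0, 0.065)
print(f"RISC II core at 65 nm: about {risc2_at_65nm:.3f} mm^2")

# Area budget per tile if a 125 mm^2 chip holds 2312 copies of core + FPU + caches.
tile_budget = 125.0 / 2312
print(f"area per tile: about {tile_budget:.3f} mm^2")
```

The ideal shrink lands in the same range as the slide's ≈ 0.02 mm², and the per-tile budget is about twice the bare core, which leaves room for the FPU and caches. Note that 2312 is also the transistor count of the 4004, which is the sense in which the processor is the new transistor.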

22 Architecture Design space
What should each core look like?
Should all cores look the same?
How should the chip interconnect between them look?
What level of the cache should they share?
–And what are the implications of that?
Are there new security issues?
–Side-channel attacks; denial-of-service attacks
Many other questions…
Brand new playground; exciting time to do architecture research

23 Hardware Building Blocks: Small is Beautiful
Given difficulty of design/validation of large designs
Given power limits what we can build, parallel is an energy-efficient way to achieve performance
–Lower threshold voltage means much lower power
Given redundant processors can improve chip yield
–Cisco Metro: 188 processors + 4 spares
–Sun Niagara sells 6- or 8-processor versions
Expect modestly pipelined (5- to 9-stage) CPUs, FPUs, vector, SIMD PEs
One size fits all?
–Amdahl’s Law ⇒ a few fast cores + many small cores
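
The claim that redundant processors improve yield can be illustrated with a binomial model. This is a sketch under strong assumptions: defects hit cores independently with the same probability, and the rest of the chip is defect-free. The function name and the 1% per-core defect probability are hypothetical; the 188 + 4 configuration is the Cisco Metro example from the slide.

```python
from math import comb


def chip_yield(total_cores: int, spares: int, core_defect_prob: float) -> float:
    """Probability that at most `spares` of `total_cores` cores are defective."""
    p = core_defect_prob
    return sum(
        comb(total_cores, k) * p**k * (1.0 - p) ** (total_cores - k)
        for k in range(spares + 1)
    )


p_defect = 0.01  # assumed per-core defect probability
print(f"188 cores, no spares: {chip_yield(188, 0, p_defect):.1%}")
print(f"188 cores + 4 spares: {chip_yield(192, 4, p_defect):.1%}")
```

With these assumed numbers, requiring all 188 cores to work gives a yield of roughly 15%, while tolerating up to four bad cores out of 192 raises it above 90%.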

24 Elephant in the room
We tried this parallel processing thing before
–Very difficult
It failed, pretty much
–A lot of academic progress and neat algorithms, but little impact commercially
We actually have to do new programming
–A lot of effort to develop; error prone; etc.
–La-Z-Boy programming era is over
–Need new programming models
Amdahl’s law
Applications: What will you use 1024 cores for?
These concerns are being voiced by a substantial segment of academia/industry
–What do you think?
–It’s coming, no matter what
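
Amdahl's law is the quantitative form of the 1024-core question: if a fraction f of the work can be parallelized across n cores, the speedup is 1 / ((1 - f) + f / n). The sketch below uses a hypothetical function name and assumed parallel fractions.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed parallel fractions, evaluated at the slide's 1024 cores.
for f in (0.50, 0.90, 0.95, 0.99):
    print(f"{f:.0%} parallel on 1024 cores -> {amdahl_speedup(f, 1024):.1f}x")
```

Even a program that is 95% parallel gets less than a 20x speedup on 1024 cores, because the serial 5% dominates. That is the argument behind pairing a few fast cores for the serial part with many small cores for the parallel part.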

