1 Computer Architecture Research Overview Rajeev Balasubramonian School of Computing, University of Utah

Presentation transcript:

1 Computer Architecture Research Overview Rajeev Balasubramonian School of Computing, University of Utah

2 What is Computer Architecture?

3 If the Intel Pentium4 has a faster clock speed than the IBM Power4, does it execute your programs faster?

4 What is Computer Architecture?
If the Intel Pentium4 has a faster clock speed than the IBM Power4, does it execute your programs faster?
[Figure: timelines for Case 1 and Case 2, marking clock ticks and completing instructions over time]

5 What is Computer Architecture?
To a large extent, computer architecture determines:
- the number of instructions used to execute a program
- the time each instruction takes to execute
- the idle cycles when no work gets done
- the number of instructions that can execute in parallel
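
The clock-speed question from the earlier slides can be made concrete with the standard execution-time relation: time = instructions x cycles per instruction / clock frequency. The following is a minimal Python sketch; the instruction counts, CPI values, and clock rates are made-up illustrative numbers, not measurements of the Pentium4 or Power4.

```python
def execution_time(instructions, cpi, clock_hz):
    """Seconds to run a program: instructions * cycles-per-instruction / clock rate."""
    return instructions * cpi / clock_hz

# Two hypothetical machines running the same program (illustrative numbers only).
# Machine A has the faster clock but wastes more cycles per instruction.
time_a = execution_time(instructions=1_000_000_000, cpi=2.0, clock_hz=3.0e9)
time_b = execution_time(instructions=1_000_000_000, cpi=1.0, clock_hz=2.0e9)

print(f"Machine A (3.0 GHz, CPI 2.0): {time_a:.3f} s")
print(f"Machine B (2.0 GHz, CPI 1.0): {time_b:.3f} s")
print("Faster clock wins" if time_a < time_b else "Slower clock wins")
```

With these assumed numbers the machine with the slower clock finishes first, because idle cycles and per-instruction latency outweigh the clock advantage.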

6 A Typical Microprocessor
[Figure: block diagram with Branch Predictor, Decode & Rename, Issue Logic, ALU, L2 Cache, L1 Instr Cache, L1 Data Cache, Register File]

7 Architecture Trends in the 90s
- Performance was the ultimate metric
- Transistors were a limiting factor
- As on-chip transistors became available in the 90s, more functionality and complex circuitry were added to boost performance; most of the low-hanging fruit has now been picked

8 Hitting the Wall
We have now hit the following walls:
- Single core performance
- Memory
- Complexity
- Power, temperature

9 Hitting the Power Wall
Power is as important a metric today as performance
(From Shekhar Borkar, MICRO’99)

10 The Advent of Multi-Core Chips
- In the past, performance magically increased by 50% every year
- In the future, this improvement will be only ~20% every year … unless … the application is multi-threaded!
[Figure: multi-core chip made of cores and cache banks]
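
To see how much the drop from 50% to ~20% annual improvement matters, the two rates can be compounded over several years. This is a small arithmetic sketch in Python; the function name and the choice of 1, 5, and 10 years are illustrative assumptions, and the rates are the ones quoted on the slide.

```python
def speedup_after(years, annual_improvement):
    """Cumulative speedup after compounding a fixed yearly improvement."""
    return (1.0 + annual_improvement) ** years

for years in (1, 5, 10):
    past = speedup_after(years, 0.50)    # 50% per year (the past)
    future = speedup_after(years, 0.20)  # ~20% per year (single-threaded future)
    print(f"{years:2d} years: {past:6.1f}x at 50%/yr vs {future:4.1f}x at 20%/yr")
```

Over a decade the gap is roughly 58x versus 6x, which is why single-threaded applications lose so much if they cannot use multiple cores.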

11 Upcoming Architecture Challenges
- Improving single core performance
- Functionalities in multi-core chips
- Simplifying the programmer’s task
- Efficient interconnects
- Power and temperature-efficient designs
- Designs tolerant of errors
For publications, see

12 Interconnects as a Bottleneck
- In the past, on-chip data transmission on wires cost almost nothing
- Interconnect speed and power have been improving, but not at the same rate as transistor speeds
- Hence, relative to computation, communication is much more expensive
- In the near future, it will take 100 cycles to travel across the chip
- 50% of chip power can be attributed to interconnects

13 Interconnects in Multi-Core Chips
[Figure: chip layout showing CPU 1, CPU 2, CPU 3, L1, L2 cache, and L2 control]

14 Not all Wires are Created Equal
                       B-Wires   L-Wires   W-Wires   PW-Wires
Relative latency       1x        0.5x      1.6x      3.2x
Relative area          1x        4x        0.5x      0.5x
Dynamic power (W/m)    2.65α     1.46α     2.9α      0.87α
Static power (W/m)

15 Data Transfers have Varying Needs
Example of a cache coherence transaction: Read exclusive request for a shared block
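
One way to picture "varying needs" is a policy that maps each message to a wire class using the latency, area, and dynamic power figures from the wire comparison slide. The Python sketch below is hypothetical: the `choose_wire` function, the 24-bit narrow-message threshold, and the example messages are assumptions for illustration, not the policy used in the talk.

```python
# Per-wire figures taken from the wire comparison slide.
# dyn_power is the numeric coefficient of the slide's dynamic power entry (W/m).
WIRES = {
    "B":  {"latency": 1.0, "area": 1.0, "dyn_power": 2.65},
    "L":  {"latency": 0.5, "area": 4.0, "dyn_power": 1.46},
    "W":  {"latency": 1.6, "area": 0.5, "dyn_power": 2.9},
    "PW": {"latency": 3.2, "area": 0.5, "dyn_power": 0.87},
}

def choose_wire(latency_critical, bits, narrow_limit=24):
    """Hypothetical policy: pick a wire class from a message's needs.

    narrow_limit is an assumed cutoff; L-wires cost 4x the area, so only
    narrow messages are worth sending on them.
    """
    if latency_critical and bits <= narrow_limit:
        return "L"   # fastest wires, reserved for small urgent messages
    if not latency_critical:
        return "PW"  # slowest wires with the lowest dynamic power
    return "B"       # baseline wires for wide, urgent transfers

# Illustrative messages (names and sizes are assumptions).
messages = [
    ("acknowledgment", True, 8),
    ("data block on critical path", True, 512),
    ("writeback data", False, 512),
]
for name, critical, bits in messages:
    wire = choose_wire(critical, bits)
    print(f"{name}: {wire}-Wires, relative latency {WIRES[wire]['latency']}x")
```

The point of the sketch is only that a single coherence transaction produces messages with different urgency and width, so a mix of wire types can serve them better than one uniform wire.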

16 Other Interconnect Choices
- Optical interconnects: speed of light, cost in converting between optical and electrical domains
- 3D chips: reduce communication distances, low cost for vertical signal transmission, increase in power density

17 3D Layouts
[Figure: three two-die layouts (Die 0, Die 1) of clusters and cache banks: (a) Arch-1 (cache-on-cluster), (b) Arch-2 (cluster on cluster), (c) Arch-3 (staggered). Legend: cluster, cache bank, intra-die horizontal wire, inter-die vertical wire]

18 Upcoming Architecture Challenges
- Improving single core performance
- Functionalities in multi-core chips
- Simplifying the programmer’s task
- Efficient interconnects
- Power and temperature-efficient designs
- Designs tolerant of errors
Clustered architectures:
- relatively low complexity
- scalable solution
- easily handles multiple threads

19 Upcoming Architecture Challenges
- Improving single core performance
- Functionalities in multi-core chips
- Simplifying the programmer’s task
- Efficient interconnects
- Power and temperature-efficient designs
- Designs tolerant of errors
Functionalities:
- Heterogeneous perf/power
- Cores that execute the OS
- Cores that verify results

20 Upcoming Architecture Challenges
- Improving single core performance
- Functionalities in multi-core chips
- Simplifying the programmer’s task
- Efficient interconnects
- Power and temperature-efficient designs
- Designs tolerant of errors
Hardware to support transactional memory

21 Upcoming Architecture Challenges
- Improving single core performance
- Functionalities in multi-core chips
- Simplifying the programmer’s task
- Efficient interconnects
- Power and temperature-efficient designs
- Designs tolerant of errors
Sources of errors:
- Faults are caused by high energy particles that deposit enough charge to toggle bits
- Variations in conditions may cause a circuit to not produce its result in time
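
A toggled bit of the kind described above can be illustrated with a single parity bit, a standard detection technique that the slides do not name; it is used here only as a simple stand-in for error-tolerant design. In this Python sketch the stored value and the flipped bit position are arbitrary assumptions.

```python
def parity(word):
    """Even/odd parity of an integer: 1 if it has an odd number of 1 bits."""
    return bin(word).count("1") % 2

def flip_bit(word, position):
    """Model a particle strike by toggling one bit."""
    return word ^ (1 << position)

stored = 0b1011_0010            # arbitrary example value
stored_parity = parity(stored)  # parity bit saved alongside the data

corrupted = flip_bit(stored, 5)                # single-bit fault
fault_detected = parity(corrupted) != stored_parity

double_fault = flip_bit(corrupted, 0)          # a second flipped bit
double_detected = parity(double_fault) != stored_parity

print("single flip detected:", fault_detected)
print("double flip detected:", double_detected)
```

A single parity bit catches any odd number of flipped bits but misses an even number, which is one reason stronger schemes, or redundant cores that verify results, are of interest.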

22 Research Methodologies
It’s all about the simulators!
- Simplescalar & Wattch & Hotspot: about 10,000 lines of C code that models the flow of instructions through a modern processor
- Inputs: configuration file that specifies processor parameters, benchmark program (say, gzip)
- Outputs: how long the program runs on the simulated processor (Simplescalar), how much power is consumed (Wattch), what is the peak temperature (Hotspot)
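
The input/output shape of such a simulator can be shown with a toy model. The Python sketch below is not Simplescalar and shares no code or options with it; the `simulate` function, the configuration keys, the latencies, and the five-instruction trace are all hypothetical, and the core it models executes one instruction at a time with no overlap.

```python
def simulate(config, trace):
    """Toy in-order model: sum the configured latency of each instruction."""
    cycles = 0
    for op in trace:
        cycles += config["latency"][op]
    return cycles

# Hypothetical processor parameters (stand-in for a configuration file).
config = {"latency": {"alu": 1, "load_hit": 2, "load_miss": 100, "branch": 1}}

# Hypothetical instruction trace (stand-in for a benchmark such as gzip).
trace = ["alu", "load_hit", "alu", "load_miss", "branch"]

total_cycles = simulate(config, trace)
print("cycles:", total_cycles)
print("instructions per cycle:", len(trace) / total_cycles)

# Evaluating an idea means changing a parameter and rerunning.
faster_memory = {"latency": dict(config["latency"], load_miss=50)}
print("cycles with faster memory:", simulate(faster_memory, trace))
```

Real simulators model pipelines, caches, and branch prediction in far more detail, but the workflow is the same: fix a configuration, run a benchmark, and compare the reported cycles across design choices.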

23 Evaluating a New Idea
- Lots of reading (it’s better than waiting for divine inspiration)
- Identify bottlenecks, identify problems, develop an idea, repeatedly question that idea
- Understand simulator
- Engineer a solution, modify simulator code (perhaps, write fewer than 1000 lines of C code)
- Analyze data (things never work the first time), engineer/optimize/debug your solution
- Write papers
- Implement in silicon?

24 To Learn More…
- CS/EE 3810: Computer Organization
- CS/EE 6810: Computer Architecture
- CS/EE 7810: Advanced Computer Architecture
- CS/EE 7820: Parallel Computer Architecture
- CS 7937 / 7940: Architecture Reading Seminar
