Lecture 3: Evolution of the Uniprocessor

Slides:



Advertisements
Similar presentations
Computer architecture
Advertisements

Computer Architecture Instruction-Level Parallel Processors
Instruction-Level Parallel Processors {Objective: executing two or more instructions in parallel} 4.1 Evolution and overview of ILP-processors 4.2 Dependencies.
IMPACT Second Generation EPIC Architecture Wen-mei Hwu IMPACT Second Generation EPIC Architecture Wen-mei Hwu Department of Electrical and Computer Engineering.
Computer Structure 2014 – Out-Of-Order Execution 1 Computer Structure Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.
Dr. Kalpakis CMSC 621, Advanced Operating Systems. Fall 2003 URL: Distributed System Architectures.
Is SC + ILP = RC? Presented by Vamshi Kadaru Chris Gniady, Babak Falsafi, and T. N. VijayKumar - Purdue University Spring 2005: CS 7968 Parallel Computer.
Multiprocessors ELEC 6200: Computer Architecture and Design Instructor : Agrawal Name: Nam.
CS 300 – Lecture 23 Intro to Computer Architecture / Assembly Language Virtual Memory Pipelining.
The Processor Data Path & Control Chapter 5 Part 4 – Exception Handling N. Guydosh 3/1/04+
Mechanisms: Run-time and Compile-time. Outline Agents of Evolution Run-time Branch prediction Trace Cache SMT, SSMT The memory problem (L2 misses) Compile-time.
Very Long Instruction Word (VLIW) Architecture. VLIW Machine It consists of many functional units connected to a large central register file Each functional.
CHAPTER 2: COMPUTER-SYSTEM STRUCTURES Computer system operation Computer system operation I/O structure I/O structure Storage structure Storage structure.
Chapter 2: Computer-System Structures
CHAPTER 3 TOP LEVEL VIEW OF COMPUTER FUNCTION AND INTERCONNECTION
Chapter 2: Computer-System Structures 2.1 Computer System Operation 2.5 Hardware Protection 2.6 Network Structure.
Performance of mathematical software Agner Fog Technical University of Denmark
Rethinking Computer Architecture Wen-mei Hwu University of Illinois, Urbana-Champaign Celebrating September 19, 2014.
Super computers Parallel Processing By Lecturer: Aisha Dawood.
Lecture 14 Today’s topics MARIE Architecture Registers Buses
Processor Architecture
Computer Architecture: Out-of-Order Execution II
Lecture 3: Computer Architectures
Silberschatz, Galvin and Gagne  Operating System Concepts Chapter 2: Computer-System Structures Computer System Operation I/O Structure Storage.
SYNAR Systems Networking and Architecture Group CMPT 886: Computer Architecture Primer Dr. Alexandra Fedorova School of Computing Science SFU.
Samira Khan University of Virginia Feb 9, 2016 COMPUTER ARCHITECTURE CS 6354 Precise Exception The content and concept of this course are adapted from.
Computer Architecture: Dataflow (Part I) Prof. Onur Mutlu Carnegie Mellon University.
Lecture 13 Parallel Processing. 2 What is Parallel Computing? Traditionally software has been written for serial computation. Parallel computing is the.
Introduction to Parallel Processing
Flynn’s Taxonomy Many attempts have been made to come up with a way to categorize computer architectures. Flynn’s Taxonomy has been the most enduring of.
Chapter 2: Computer-System Structures
15-740/ Computer Architecture Lecture 3: Performance
CS501 Advanced Computer Architecture
Basic Processor Structure/design
Precise Exceptions and Out-of-Order Execution
Distributed Processors
buses, crossing switch, multistage network.
Prof. Onur Mutlu Carnegie Mellon University
UNIT – Microcontroller.
Fall 2012 Parallel Computer Architecture Lecture 22: Dataflow I
Embedded Systems Design
Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 2/16/2015
CIS-550 Advanced Computer Architecture Lecture 10: Precise Exceptions
ECE 353 Lab 3 Pipeline Simulator
Prof. Onur Mutlu Carnegie Mellon University Spring 2014, 2/21/2014
Vector Processing => Multimedia
Instruction cycle Instruction: A command given to the microprocessor to perform an operation Program : A set of instructions given in a sequential.
Single Thread Parallelism
Single Thread Parallelism
Computer Architecture Dataflow (Part II) and Systolic Arrays
Lesson Objectives Aims Understand how machine code is generated
Control Unit Introduction Types Comparison Control Memory
15-740/ Computer Architecture Lecture 5: Precise Exceptions
Coe818 Advanced Computer Architecture
CSE 370 – Winter Sequential Logic - 1
buses, crossing switch, multistage network.
The University of Texas at Austin
Samira Khan University of Virginia Sep 5, 2018
Mattan Erez The University of Texas at Austin
AN INTRODUCTION ON PARALLEL PROCESSING
Part 2: Parallel Models (I)
Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/30/2011
15-740/ Computer Architecture Lecture 10: Out-of-Order Execution
Mattan Erez The University of Texas at Austin
Prof. Onur Mutlu Carnegie Mellon University
Chapter 2: Computer-System Structures
Chapter 2: Computer-System Structures
Prof. Onur Mutlu Carnegie Mellon University
Prof. Onur Mutlu Carnegie Mellon University
Chapter 13: I/O Systems “The two main jobs of a computer are I/O and [CPU] processing. In many cases, the main job is I/O, and the [CPU] processing is.
Presentation transcript:

Lecture 3: Evolution of the Uniprocessor SIMD VLIW DAE HPS Data Flow

The HPS Paradigm Incorporated the following: Aggressive branch prediction Speculative execution Wide issue Out-of-order execution In-order retirement First published in Micro-18 (1985) Patt, Hwu, Shebanow: Introduction to HPS Patt, Melvin, Hwu, Shebanow: Critical issues

The final flexibility: Data Flow

Characteristics of Data Flow Data Driven execution of inst-level Graphical code Nodes are operators Arcs are I/O Only REAL dependencies constrain processing Anti-dependencies don’t (write-after-read) Output dependencies don’t (write-after-write) NO sequential I-stream (No program counter) Operations execute ASYNCHRONOUSLY Instructions do not reference memory (at least memory as we understand it) Execution is triggered by presence of data

Why Data Flow? Why NOT Data Flow? Irregular Parallelism Only real dependencies constrain Anti-dependencies don’t Sequential I-stream doesn’t (No Program Counter) Asynchronous (NOT in the sense of un-clocked) Why NOT Data Flow? In one word: too much architectural state

Specific Problems with Data Flow Code bloat (everything is explicit) Code debug (asynchronous, non-deterministic flow) Interrupt handling (providing a consistent state) Machine diagnosis (trouble-shooting design errors) Recursion Large data structures Node table overflow (cost of CAM is high) Performance caveats Array copy operations Long latencies Network (bypass paths) congestion