Lecture 3: Evolution of the Uniprocessor

Topics: SIMD, VLIW, DAE, HPS, Data Flow

The HPS Paradigm
Incorporated the following:
- Aggressive branch prediction
- Speculative execution
- Wide issue
- Out-of-order execution
- In-order retirement
First published in Micro-18 (1985):
- Patt, Hwu, Shebanow: Introduction to HPS
- Patt, Melvin, Hwu, Shebanow: Critical issues
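
The pairing of out-of-order execution with in-order retirement can be illustrated with a toy model. The sketch below is not the HPS design itself. It assumes every instruction is independent, issues in cycle 0, and finishes after a fixed latency; the function name `simulate` and the latency values are hypothetical.

```python
from collections import deque


def simulate(latencies):
    """Toy model: out-of-order completion, in-order retirement.

    latencies[i] is the (assumed) number of cycles instruction i needs.
    All instructions are assumed independent and issued in cycle 0.
    Returns (completion_order, retirement_order) as lists of indices.
    """
    if any(lat < 1 for lat in latencies):
        raise ValueError("latencies must be positive integers")

    reorder_buffer = deque(range(len(latencies)))  # kept in program order
    finished = set()
    completion_order, retirement_order = [], []
    cycle = 0

    while reorder_buffer:
        cycle += 1
        # Execution: an instruction completes whenever its latency elapses,
        # regardless of its position in program order.
        for index, latency in enumerate(latencies):
            if latency == cycle:
                finished.add(index)
                completion_order.append(index)
        # Retirement: only the oldest instruction may leave the buffer,
        # so architectural state is updated in program order.
        while reorder_buffer and reorder_buffer[0] in finished:
            retirement_order.append(reorder_buffer.popleft())

    return completion_order, retirement_order


completed, retired = simulate([3, 1, 2])
print(completed)  # [1, 2, 0]
print(retired)    # [0, 1, 2]
```

Instruction 1 finishes first, but nothing retires until instruction 0, the oldest, is done. That ordering is what lets a machine of this kind present a consistent state to an interrupt handler.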

The final flexibility: Data Flow

Characteristics of Data Flow
- Data-driven execution of instruction-level graphical code
  - Nodes are operators
  - Arcs are I/O
- Only REAL dependencies constrain processing
  - Anti-dependencies (write-after-read) don't
  - Output dependencies (write-after-write) don't
- NO sequential I-stream (no program counter)
- Operations execute ASYNCHRONOUSLY
- Instructions do not reference memory (at least memory as we understand it)
- Execution is triggered by presence of data
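
The firing rule (a node executes as soon as all of its operands are present) can be sketched as a small interpreter. This is a minimal illustration under assumed conventions, not a model of any real data flow machine: the graph encoding, the function name `run_dataflow`, and the example graph for (a + b) * (c - d) are all hypothetical.

```python
import operator


def run_dataflow(nodes, initial_tokens):
    """Fire each node once all of its operand ports hold a token.

    nodes: name -> (operation, arity, [(dest_node, dest_port), ...])
    initial_tokens: list of (node, port, value) tuples
    Returns a dict mapping each fired node to its result.
    Assumes exactly one token arrives per port (no loops or streams).
    """
    operands = {name: {} for name in nodes}
    pending = list(initial_tokens)
    results = {}

    while pending:
        # There is no program counter: tokens are taken in arbitrary order.
        node, port, value = pending.pop()
        operands[node][port] = value
        operation, arity, destinations = nodes[node]
        if len(operands[node]) == arity:  # firing rule: all operands present
            args = [operands[node][p] for p in range(arity)]
            result = operation(*args)
            results[node] = result
            # The result travels along the outgoing arcs as new tokens.
            for dest_node, dest_port in destinations:
                pending.append((dest_node, dest_port, result))

    return results


# Hypothetical graph for (a + b) * (c - d)
graph = {
    "add": (operator.add, 2, [("mul", 0)]),
    "sub": (operator.sub, 2, [("mul", 1)]),
    "mul": (operator.mul, 2, []),
}
tokens = [("add", 0, 2), ("add", 1, 3), ("sub", 0, 7), ("sub", 1, 4)]
print(run_dataflow(graph, tokens)["mul"])  # 15
```

Reordering `tokens` changes the order in which `add` and `sub` fire but not the final value, because only the true data dependencies into `mul` constrain execution.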

Why Data Flow?
- Irregular parallelism
- Only real dependencies constrain
- Anti-dependencies don't
- Sequential I-stream doesn't (no program counter)
- Asynchronous (NOT in the sense of un-clocked)

Why NOT Data Flow?
- In one word: too much architectural state
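
The three kinds of dependency named on these slides can be told apart mechanically. The sketch below assumes a simplified instruction encoding of `(destination_register, [source_registers])`; the function name `classify` and the register names are hypothetical.

```python
def classify(first, second):
    """Classify dependencies of `second` on `first` (first is earlier in program order).

    Each instruction is (destination_register, [source_registers]).
    Returns a set drawn from {"RAW", "WAR", "WAW"}.
    """
    first_dest, first_sources = first
    second_dest, second_sources = second
    dependencies = set()
    if first_dest in second_sources:
        dependencies.add("RAW")  # real (flow) dependency: read-after-write
    if second_dest in first_sources:
        dependencies.add("WAR")  # anti-dependency: write-after-read
    if first_dest == second_dest:
        dependencies.add("WAW")  # output dependency: write-after-write
    return dependencies


producer = ("r1", ["r2", "r3"])                # r1 = r2 op r3
print(classify(producer, ("r7", ["r1"])))      # {'RAW'}
print(classify(producer, ("r2", ["r4"])))      # {'WAR'}
print(classify(producer, ("r1", ["r6"])))      # {'WAW'}
```

Only the RAW case reflects a value actually flowing from one operation to another. The WAR and WAW cases arise from reusing a register name, which is why a data flow graph, having no shared names to reuse, is not constrained by them.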

Specific Problems with Data Flow
- Code bloat (everything is explicit)
- Code debug (asynchronous, non-deterministic flow)
- Interrupt handling (providing a consistent state)
- Machine diagnosis (trouble-shooting design errors)
- Recursion
- Large data structures
- Node table overflow (cost of CAM is high)
- Performance caveats
  - Array copy operations
  - Long latencies
  - Network (bypass paths) congestion