Part 2. Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education, Inc. All rights reserved. 0-13-148521-0 A five-level memory.


Part 2

A five-level memory hierarchy. Note cost vs. size.

1. All instructions are directly executed by hardware.
2. Maximize the rate at which instructions are issued.
3. Instructions should be easy to decode.
4. Only loads and stores should reference memory.
5. Provide many registers.

1. All instructions are directly executed by hardware.
- Eliminate the microcode interpreter.

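2. Maximize the rate at which instructions are issued.
- If you can issue 500 million instructions per second, you have a 500-MIPS machine, no matter how long each instruction takes to complete.
- This calls for parallelism.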

3. Instructions should be easy to decode.
- Made possible by regular, fixed-length instructions with a small number of fields.
- Fewer instructions are better.
- Fewer instruction formats are better.

4. Only loads and stores should reference memory.
- Memory access takes a long time.
- Most instructions should use registers.
- Use separate operations for load and store; these can then be done in parallel with other instructions.

5. Provide many registers.
- At least 32!
- Saving registers temporarily and reloading them later is time consuming.

Ways to increase speed:
a. Increase the clock speed.
b. Use parallelism. Types:
   1. processor/core level
   2. instruction level

- Fetching instructions from memory is slow.
- So use a prefetch buffer: a set of registers (memory) containing instructions to be executed.
- Fetch and execution can now be done in parallel!

- A five-stage pipeline.
- The state of each stage as a function of time. Nine clock cycles are illustrated.
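The timing diagram described above can be regenerated with a short sketch. It assumes an ideal pipeline (one new instruction enters stage S1 every clock cycle, no stalls); the function names `pipeline_occupancy` and `render` are made up for this illustration.

```python
def pipeline_occupancy(num_stages=5, num_cycles=9):
    """For each stage, list which instruction number occupies it in each
    clock cycle (None while the stage is still empty).

    Assumption: ideal pipeline, one instruction enters S1 per cycle, no stalls.
    """
    table = []
    for stage in range(1, num_stages + 1):
        row = []
        for cycle in range(1, num_cycles + 1):
            # Instruction i reaches stage s in cycle i + s - 1,
            # so the instruction in stage s at cycle c is c - s + 1.
            instr = cycle - stage + 1
            row.append(instr if instr >= 1 else None)
        table.append(row)
    return table


def render(table):
    """Format the table as one text line per stage; '.' marks an idle stage."""
    lines = []
    for stage, row in enumerate(table, start=1):
        cells = " ".join("." if i is None else str(i) for i in row)
        lines.append(f"S{stage}: {cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(pipeline_occupancy()))
```

Reading a column top to bottom shows which instructions are in flight during that cycle; from cycle 5 onward all five stages are busy.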

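- Latency = time to execute one instruction from start to finish
- Bandwidth = number of instructions completed per second, typically given in MIPS (millions of instructions per second)
- Cycle time = time to move through 1 stage of the pipeline = one clock cycle (the clock period, i.e., the reciprocal of the clock rate)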
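These three quantities are related in a simple way. As a sketch, assume a pipeline with $n$ stages and a cycle time of $T$ nsec that stays full with no stalls (the symbols $n$ and $T$ are introduced here for illustration and do not appear on the slide):

```latex
\text{latency} = n \, T \ \text{nsec},
\qquad
\text{bandwidth} = \frac{1\ \text{inst}}{T \times 10^{-9}\ \text{sec}} = \frac{1000}{T}\ \text{MIPS}
```

Pipelining therefore improves bandwidth, not the latency of an individual instruction.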

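Problem: Let the cycle time be 3 nsec/stage, and let the execution of each instruction require 6 stages (steps).
a. What is the bandwidth in MIPS for a machine without any pipeline (i.e., without any instruction-level parallelism)?
   $6\ \text{stages/inst} \times 3 \times 10^{-9}\ \text{sec/stage} = 18 \times 10^{-9}\ \text{sec/inst}$
   $1\ \text{inst} / (18 \times 10^{-9}\ \text{sec}) \approx 56 \times 10^{6}\ \text{inst/sec} = 56\ \text{MIPS}$
b. What is the bandwidth in MIPS for a machine with a pipeline?
   Once the pipeline is full, one instruction completes every cycle: $3 \times 10^{-9}\ \text{sec/inst}$
   $1\ \text{inst} / (3 \times 10^{-9}\ \text{sec}) \approx 333 \times 10^{6}\ \text{inst/sec} = 333\ \text{MIPS}$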
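The arithmetic in this problem can be checked with a few lines of code. This is a minimal sketch: the function name `bandwidth_mips` and its parameters are invented for the example, and the pipelined case assumes a full pipeline with no stalls.

```python
def bandwidth_mips(cycle_time_ns, stages, pipelined):
    """Return instruction bandwidth in MIPS.

    Assumptions: every stage takes exactly one cycle; a pipelined machine
    completes one instruction per cycle (steady state, no stalls); a
    non-pipelined machine finishes all stages before starting the next
    instruction.
    """
    if pipelined:
        ns_per_instruction = cycle_time_ns
    else:
        ns_per_instruction = cycle_time_ns * stages
    # 1 instruction per nsec = 10^9 instructions/sec = 1000 MIPS
    return 1000.0 / ns_per_instruction


if __name__ == "__main__":
    print(round(bandwidth_mips(3, 6, pipelined=False)))  # about 56 MIPS
    print(round(bandwidth_mips(3, 6, pipelined=True)))   # about 333 MIPS
```

The ratio of the two results equals the number of stages (6), which is the best case speedup a pipeline of that depth can offer.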

Dual five-stage pipelines with a common instruction fetch unit. The fetch unit fetches pairs of instructions.

Note: Since two instructions can be executed at the same time (in stage S4), they must not conflict over resource usage (e.g., registers), and neither may depend on the result of the other. How can we ensure this? (1) In hardware, or (2) in the compiler.
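Whether it is done by hardware or by the compiler, the check itself amounts to comparing the registers the two instructions read and write. The sketch below is a deliberately conservative illustration, not the rule any particular processor uses: the `Instr` type, the register names, and `can_issue_together` are all hypothetical, and only register conflicts are modeled.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction model: one destination register, some sources."""
    dest: str
    srcs: Tuple[str, ...]


def can_issue_together(first: Instr, second: Instr) -> bool:
    """Conservatively decide whether two instructions may run side by side.

    Any overlap that involves a register write is treated as a conflict.
    Real hardware may allow some of these cases.
    """
    reads_result = first.dest in second.srcs     # second depends on first's result
    same_target = first.dest == second.dest      # both write the same register
    overwrites_src = second.dest in first.srcs   # second overwrites a source of first
    return not (reads_result or same_target or overwrites_src)


if __name__ == "__main__":
    add = Instr(dest="r1", srcs=("r2", "r3"))
    sub_dependent = Instr(dest="r4", srcs=("r1", "r5"))    # reads r1 written by add
    sub_independent = Instr(dest="r4", srcs=("r5", "r6"))
    print(can_issue_together(add, sub_dependent))     # False
    print(can_issue_together(add, sub_independent))   # True
```

A compiler would apply a test like this when ordering instructions into pairs; hardware would apply it at issue time and fall back to issuing one instruction when the pair conflicts.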

- 386 – no pipeline
- 486 – one pipeline
- First-generation Pentium – two 5-stage pipelines:
  1. u pipeline – can execute any instruction
  2. v pipeline – limited; only integer instructions or FXCH
- P4 – 20 stages
- “The later "Prescott" and "Cedar Mill" Pentium 4 cores (and their Pentium D derivatives) had a 31-stage pipeline, the longest in mainstream consumer computing.”
- Nehalem (16 pipeline stages), Enhanced Core, and Sandy Bridge microarchitectures (next few slides; see architectures-optimization-manual.pdf)

A superscalar processor with five functional units. Stage S3 issues an instruction every clock cycle; stage S4 may require more than one clock cycle.
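To see why several functional units are needed when S4 is slower than S3, consider a small simulation. It is only a sketch under stated assumptions: instructions are issued in order, at most one per cycle, and an instruction waits if no unit of its kind is free. The unit names, unit counts, and latencies are made up for the example, as is the function `issue_schedule`.

```python
def issue_schedule(instrs, units):
    """Return the clock cycle (1-based) in which each instruction is issued.

    instrs: list of (unit_kind, latency_in_cycles), in program order.
    units:  dict mapping unit_kind to the number of copies of that unit.
    Assumptions: in-order issue, at most one issue per cycle, and a unit is
    busy for the full latency of the instruction it is executing.
    """
    # For every copy of every unit, the first cycle in which it is free.
    free_at = {kind: [1] * count for kind, count in units.items()}
    next_issue_cycle = 1
    issued = []
    for kind, latency in instrs:
        copies = free_at[kind]
        # Pick the copy of the required unit that becomes free earliest.
        idx = min(range(len(copies)), key=lambda i: copies[i])
        start = max(next_issue_cycle, copies[idx])
        copies[idx] = start + latency
        issued.append(start)
        next_issue_cycle = start + 1
    return issued


if __name__ == "__main__":
    program = [("alu", 2), ("alu", 2), ("alu", 2), ("load", 2)]
    five_units = {"alu": 2, "load": 1, "store": 1, "fp": 1}
    one_alu = {"alu": 1, "load": 1, "store": 1, "fp": 1}
    print(issue_schedule(program, five_units))  # [1, 2, 3, 4]
    print(issue_schedule(program, one_alu))     # [1, 3, 5, 6]
```

With two ALUs, S3 keeps issuing every cycle even though each ALU operation takes two cycles; with a single ALU the issue stage stalls every other cycle.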