New-School Machine Structures (6/9/2016)
Software and hardware harness parallelism to achieve high performance at every level: warehouse-scale computer, computer, core, instruction unit(s)/functional unit(s), memory (cache), input/output, logic gates.
– Parallel Requests: assigned to computer, e.g., search "Katz" (warehouse-scale computer)
– Parallel Threads: assigned to core, e.g., lookup, ads
– Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions (today's lecture)
– Parallel Data: >1 data item @ one time, e.g., add of 4 pairs of words (A3+B3, A2+B2, A1+B1, A0+B0)
– Hardware descriptions: all gates @ one time
– Programming Languages
Levels of Representation/Interpretation

High Level Language Program (e.g., C):
  temp = v[k]; v[k] = v[k+1]; v[k+1] = temp;
    ↓ Compiler
Assembly Language Program (e.g., MIPS):
  lw $t0, 0($2)
  lw $t1, 4($2)
  sw $t1, 0($2)
  sw $t0, 4($2)
    ↓ Assembler
Machine Language Program (MIPS)
    ↓ Machine Interpretation
Hardware Architecture Description (e.g., block diagrams)
    ↓ Architecture Implementation
Logic Circuit Description (Circuit Schematic Diagrams)

Anything can be represented as a number, i.e., data or instructions.
Five Components of a Computer: Control, Datapath, Memory, Input, Output
Reality Check: Typical MIPS Chip Die Photograph
– Integer Control and Datapath
– Performance-Enhancing On-Chip Memory (iCache + dCache)
– Floating Point Control and Datapath
– Protection-Oriented Virtual Memory Support
Computer Eras: Mainframe 1950s–60s
"Big Iron": IBM, UNIVAC, … build $1M computers for businesses => COBOL, Fortran, timesharing OS
(Components: Processor (CPU), I/O, Memory)
Example MIPS Block Diagram
A MIPS Family (Toshiba)
The Processor
Processor (CPU): the active part of the computer, which does all the work (data manipulation and decision-making)
– Datapath: the portion of the processor that contains the hardware necessary to perform the operations required by the processor ("the brawn")
– Control: the portion of the processor (also in hardware) that tells the datapath what needs to be done ("the brain")
Stages of the Datapath: Overview
Problem: a single, atomic block that "executes an instruction" (performs all necessary operations beginning with fetching the instruction) would be too bulky and inefficient
Solution: break up the process of "executing an instruction" into stages or phases, then connect the phases to create the whole datapath
– Smaller phases are easier to design
– Easy to optimize (change) one phase without touching the others
Instruction Level Parallelism

Pipeline diagram (time runs across in periods P1–P12; each instruction flows through the five stages IF, ID, ALU, MEM, WR, and a new instruction enters the pipeline every period):

  Instr 1:  IF  ID  ALU MEM WR                           (P1–P5)
  Instr 2:      IF  ID  ALU MEM WR                       (P2–P6)
  Instr 3:          IF  ID  ALU MEM WR                   (P3–P7)
  Instr 4:              IF  ID  ALU MEM WR               (P4–P8)
  Instr 5:                  IF  ID  ALU MEM WR           (P5–P9)
  Instr 6:                      IF  ID  ALU MEM WR       (P6–P10)
  Instr 7:                          IF  ID  ALU MEM WR   (P7–P11)
  Instr 8:                              IF  ID  ALU MEM WR  (P8–P12)
Phases of the Datapath (1/5)
There is a wide variety of MIPS instructions: so what general steps do they have in common?
Phase 1: Instruction Fetch
– No matter what the instruction, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy)
– Also, this is where we increment the PC (that is, PC = PC + 4, to point to the next instruction: byte addressing, so +4)
Simulator: Instruction = Memory[PC]; PC += 4;
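The simulator line above can be fleshed out as a small C sketch. The word-indexed `imem` array and its size are assumptions for illustration; a real simulator would model the cache-memory hierarchy here:

```c
#include <stdint.h>

/* Hypothetical instruction memory: 32-bit words, indexed by
   byte address / 4 (MIPS instructions are word-aligned). */
static uint32_t imem[1024];
static uint32_t pc = 0;

uint32_t fetch(void) {
    uint32_t instruction = imem[pc / 4];  /* Instruction = Memory[PC] */
    pc += 4;                              /* byte addressing, so +4   */
    return instruction;
}
```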
Phases of the Datapath (2/5)
Phase 2: Instruction Decode
– Upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data)
– First, read the opcode to determine instruction type and field lengths
– Second, read in data from all necessary registers
  • For add, read two registers
  • For addi, read one register
  • For jal, no reads necessary
Simulator for Decode Phase
  Register1 = Register[rsfield];
  Register2 = Register[rtfield];
  if (opcode == 0) …
  else if (opcode > 5 && opcode < 10) …
  else if (opcode …) …
Better C statement for chained if statements?
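One common answer to the slide's question is C's switch statement, which dispatches on the opcode in a single construct. A sketch (the opcode values are real MIPS encodings; the handler strings are placeholders standing in for the elided "…" bodies):

```c
/* Decode dispatch with a switch instead of chained if/else. */
const char *decode(unsigned opcode) {
    switch (opcode) {
    case 0x00: return "R-format (add, sub, slt, ...)";
    case 0x03: return "jal";
    case 0x08: return "addi";
    case 0x23: return "lw";
    case 0x2B: return "sw";
    default:   return "unknown";
    }
}
```

A switch on a dense set of case values also lets the compiler emit a jump table, which is how hardware decoders behave: one lookup, not a chain of comparisons.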
Phases of the Datapath (3/5)
Phase 3: ALU (Arithmetic-Logic Unit)
– Real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt)
– What about loads and stores? lw $t0, 40($t1): the address we are accessing in memory = the value in $t1 PLUS the value 40, so we do this addition in this stage
Simulator: Result = Register1 op Register2; or Address = Register1 + Addressfield;
Phases of the Datapath (4/5)
Phase 4: Memory Access
– Actually, only the load and store instructions do anything during this phase; the others remain idle during this phase or skip it altogether
– Since these instructions have a unique step, we need this extra phase to account for them
– (As a result of the cache system, this phase is expected to be fast: more about this next week)
Simulator: Register[rtfield] = Memory[Address]; or Memory[Address] = Register[rtfield];
Phases of the Datapath (5/5)
Phase 5: Register Write
– Most instructions write the result of some computation into a register
– E.g.: arithmetic, logical, shifts, loads, slt
– What about stores, branches, jumps? They don't write anything into a register at the end; these remain idle during this fifth phase or skip it altogether
Simulator: Register[rdfield] = Result;
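Putting the five simulator fragments together gives a minimal single-cycle interpreter sketch. This is a simplification for illustration, not the lecture's implementation: only add, addi, lw, and sw are handled, the funct field of R-format instructions is ignored, and the memory sizes are arbitrary.

```c
#include <stdint.h>

static uint32_t Register[32];
static uint32_t Memory[256];  /* data memory, word-indexed      */
static uint32_t imem[64];     /* instruction memory             */
static uint32_t PC = 0;

void step(void) {
    /* 1. Instruction Fetch */
    uint32_t inst = imem[PC / 4];
    PC += 4;

    /* 2. Instruction Decode: pull out the fields */
    uint32_t opcode = inst >> 26;
    uint32_t rs = (inst >> 21) & 0x1F;
    uint32_t rt = (inst >> 16) & 0x1F;
    uint32_t rd = (inst >> 11) & 0x1F;
    int32_t  imm = (int16_t)(inst & 0xFFFF);  /* sign-extended */

    /* 3. ALU, 4. Memory Access, 5. Register Write */
    switch (opcode) {
    case 0x00:  /* add (funct field ignored in this sketch) */
        Register[rd] = Register[rs] + Register[rt];
        break;
    case 0x08:  /* addi */
        Register[rt] = Register[rs] + imm;
        break;
    case 0x23:  /* lw: ALU computes address, MEM reads, WR writes rt */
        Register[rt] = Memory[(Register[rs] + imm) / 4];
        break;
    case 0x2B:  /* sw: ALU computes address, MEM writes; no register write */
        Memory[(Register[rs] + imm) / 4] = Register[rt];
        break;
    }
    Register[0] = 0;  /* $zero is hard-wired to 0 */
}
```

Note how the phases show up in the code: fetch and decode are common to every instruction, while the switch decides which of the ALU/memory/write-back steps each instruction actually uses.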
Laptop Innards
Server Internals
Server Internals: Google Server
The ARM Inside the iPhone
ARM Architecture
iPhone Innards
– ARM Cortex-A8 processor
– You will learn about multiple processors, data-level parallelism, and caches in 61C
– Components: Processor, Memory, I/O
Review
Key Technology Trends and Limitations
– Transistor counts keep doubling, BUT power constraints and latency considerations limit performance improvement
– (Single-processor) computers are about as fast as they are likely to get; exploit parallelism to go faster
Five Components of a Computer
– Processor: Control + Datapath
– Memory
– Input/Output: human interface (keyboard + mouse, display, storage) … evolving to speech, audio, video
Architectural Family: One Instruction Set, Many Implementations