CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture Lecture 16 – RISC-V Vectors Krste Asanovic Electrical Engineering and.

Presentation transcript:

CS 152 Computer Architecture and Engineering
CS252 Graduate Computer Architecture
Lecture 16 – RISC-V Vectors
Krste Asanovic
Electrical Engineering and Computer Sciences, University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152
CS252 S05

Last Time in Lecture 16
GPU architecture
  Evolved from graphics-only to more general-purpose computing
  GPUs programmed as attached accelerators, with software required to separate GPU code from CPU code and to move memory
  Many cores, each with many lanes
    Thousands of lanes on current high-end GPUs
SIMT model has hardware management of conditional execution
  Code written as scalar code with branches, executed as vector code with predication

New RISC-V “V” Vector Extension
Being added as a standard extension to the RISC-V ISA
  An updated form of Cray-style vectors for modern microprocessors
Today, a short tutorial on the current draft standard, v0.7
  v0.7 is intended to be close to the final version of the RISC-V vector extension
  Still a work in progress, so details might change before standardization
  https://github.com/riscv/riscv-v-spec
WARNING: Lab 4 uses an older version of the vector ISA, since new tools are not available yet
  Most concepts carry over, if not programming details

RISC-V Scalar State
Program counter (pc)
32 integer registers (x0-x31), each 32 or 64 bits wide
  x0 always contains a 0
Floating-point (FP) adds 32 registers (f0-f31)
  Each can contain a single- or double-precision FP value (32-bit or 64-bit IEEE FP)
FP status register (fcsr), used for FP rounding mode & exception reporting
ISA string options:
  RV32I  (XLEN=32, no FP)
  RV32IF (XLEN=32, FLEN=32)
  RV32ID (XLEN=32, FLEN=64)
  RV64I  (XLEN=64, no FP)
  RV64IF (XLEN=64, FLEN=32)
  RV64ID (XLEN=64, FLEN=64)

Vector Extension Additional State
32 vector data registers, v0-v31, each VLEN bits long
Vector length register vl
Vector type register vtype
Other control registers:
  vstart
    For trap handling
  vrm/vxsat
    Fixed-point rounding mode/saturation
    Also appear in fcsr
[Figure: vector data registers v0 to v31, VLEN bits per vector register (implementation-dependent); vector length register vl; vector type register vtype]

Vector Type Register
vsew[2:0] field encodes standard element width (SEW) in bits of elements in vector register (SEW = 8 * 2^vsew)
vlmul[1:0] encodes vector register length multiplier (LMUL = 2^vlmul = 1-8)
vediv[1:0] encodes how vector elements are divided into equal sub-elements (EDIV = 2^vediv = 1-8)
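The field encodings on this slide can be sketched as a small decoder. This is an illustrative model only: the function name `decode_vtype_fields` is hypothetical, and the fields are passed in as separate integers because the slide text does not give their bit positions within vtype.

```python
def decode_vtype_fields(vsew: int, vlmul: int, vediv: int) -> dict:
    """Map raw vtype field values to SEW, LMUL and EDIV.

    Follows the formulas on the slide:
      SEW  = 8 * 2**vsew   (vsew is a 3-bit field)
      LMUL = 2**vlmul      (vlmul is a 2-bit field, so LMUL is 1, 2, 4 or 8)
      EDIV = 2**vediv      (vediv is a 2-bit field, so EDIV is 1, 2, 4 or 8)
    """
    if not 0 <= vsew < 8:
        raise ValueError("vsew must fit in 3 bits")
    if not 0 <= vlmul < 4:
        raise ValueError("vlmul must fit in 2 bits")
    if not 0 <= vediv < 4:
        raise ValueError("vediv must fit in 2 bits")
    return {"SEW": 8 << vsew, "LMUL": 1 << vlmul, "EDIV": 1 << vediv}


# Example: vsew=2 selects 32-bit elements, vlmul=1 selects LMUL=2.
config = decode_vtype_fields(vsew=2, vlmul=1, vediv=0)
```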

Example Vector Register Data Layouts (LMUL=1)

Setting vector configuration, vsetvli/vsetvl
The vsetvl{i} configuration instructions set the vtype register, and also set the vl register, returning the vl value in a scalar register

    vsetvli rd, rs1, e8    # Set SEW=8, vl=min(VLEN/SEW,rs1), rd=vl

  rd: resulting machine vector length setting
  rs1: requested application vector length
  e8: vtype parameters (SEW, LMUL, EDIV) encoded as immediate in instruction
Usually use the immediate form, vsetvli, to set vtype parameters. The register version, vsetvl, is usually used for context save/restore
[Figure: instruction encoding]

vsetvl{i} operation
The first scalar register argument, rs1, is the requested application vector length (AVL)
The type argument (either immediate or second register) indicates how the vector registers should be configured
  Configuration includes size of each element
The vector length is set to the minimum of requested AVL and the maximum supported vector length (VLMAX) in the new configuration
  VLMAX = LMUL*VLEN/SEW
  vl = min(AVL, VLMAX)
The value placed in vl is also written to the scalar destination register rd
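As a worked illustration of the two formulas on this slide, here is a minimal Python model of the vl computation. It is a sketch rather than a description of any real implementation: the function name `vsetvl_model` and its parameter names are hypothetical, and it models only the min(AVL, VLMAX) rule stated here.

```python
def vsetvl_model(avl: int, vlen: int, sew: int, lmul: int = 1) -> int:
    """Return the vl value that would be written to both vl and rd.

    avl  : requested application vector length (value of rs1)
    vlen : bits per vector register (implementation-dependent)
    sew  : standard element width in bits
    lmul : vector register length multiplier (1, 2, 4 or 8)
    """
    vlmax = lmul * vlen // sew   # VLMAX = LMUL*VLEN/SEW
    return min(avl, vlmax)       # vl = min(AVL, VLMAX)


# With VLEN=128 and SEW=8, VLMAX is 16 elements.
vl_large_request = vsetvl_model(avl=100, vlen=128, sew=8)  # capped at VLMAX
vl_small_request = vsetvl_model(avl=5, vlen=128, sew=8)    # request fits
```

A request larger than VLMAX is capped, and a smaller one is granted in full, which is what lets a loop consume its remaining element count one strip at a time.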

Simple stripmined vector memcpy example
[Code listing; annotations on the slide:]
  Set configuration, calculate vector strip length
  Unit-stride vector load bytes
  Unit-stride vector store bytes
Binary machine code can run on machines with any VLEN!
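The three annotated steps can be modeled in Python to show why the same loop works for any VLEN. This is a behavioral sketch, not the assembly from the slide: `vsetvl_model` and `stripmined_memcpy` are hypothetical names, SEW is fixed at 8 bits with LMUL=1, and a Python slice stands in for a vector register.

```python
def vsetvl_model(avl: int, vlen: int, sew: int = 8, lmul: int = 1) -> int:
    """vl = min(AVL, VLMAX), with VLMAX = LMUL*VLEN/SEW."""
    return min(avl, lmul * vlen // sew)


def stripmined_memcpy(dst: bytearray, src: bytes, n: int, vlen: int) -> int:
    """Copy n bytes from src to dst in strips; return the number of strips."""
    offset = 0
    strips = 0
    while n > 0:
        vl = vsetvl_model(n, vlen)        # set configuration, get strip length
        v0 = src[offset:offset + vl]      # unit-stride vector load of vl bytes
        dst[offset:offset + vl] = v0      # unit-stride vector store of vl bytes
        offset += vl                      # bump both pointers by vl
        n -= vl                           # decrement remaining count
        strips += 1
    return strips


source = bytes(range(100))
results = {}
for vlen in (64, 128, 512):               # same loop, different hardware VLEN
    dest = bytearray(len(source))
    results[vlen] = stripmined_memcpy(dest, source, len(source), vlen)
    assert bytes(dest) == source
```

Only the strip count changes with VLEN; the loop body is identical, which is the portability point the slide makes.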

Vector Load Instructions
[Figure: load instruction formats; operand annotations:]
  Vector destination
  Scalar base address
  Scalar stride (bytes)
  Vector of offsets (bytes)

Vector Store Instructions
[Figure: store instruction formats; operand annotation:]
  Vector store data

Vector Unit-Stride Loads/Stores

Vector Strided Load/Store Instructions

Vector Indexed Loads/Stores
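The three addressing forms differ only in how each element's byte address is derived from the scalar base. The sketch below computes those addresses under simple assumptions: all function and parameter names are hypothetical, unit-stride elements are assumed to be packed contiguously at `elem_bytes` each, and strides and offsets are in bytes as annotated on the load slide.

```python
def unit_stride_addrs(base: int, vl: int, elem_bytes: int) -> list:
    """Element i is at base + i * elem_bytes (contiguous elements)."""
    return [base + i * elem_bytes for i in range(vl)]


def strided_addrs(base: int, stride_bytes: int, vl: int) -> list:
    """Element i is at base + i * stride, with the stride in a scalar register."""
    return [base + i * stride_bytes for i in range(vl)]


def indexed_addrs(base: int, offsets_bytes: list, vl: int) -> list:
    """Element i is at base + offsets[i], with offsets held in a vector register."""
    return [base + offsets_bytes[i] for i in range(vl)]


unit = unit_stride_addrs(0x1000, vl=4, elem_bytes=4)
strided = strided_addrs(0x1000, stride_bytes=24, vl=3)   # e.g. one field per 24-byte record
gathered = indexed_addrs(0x1000, [8, 0, 40], vl=3)       # arbitrary order (gather)
```

Stores use the same address patterns, with the vector register supplying the data instead of receiving it.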

Vector Length Multiplier, LMUL
Gives fewer but longer vector registers
Set by the vlmul[1:0] field in vtype during vsetvli
[Figure: register grouping for LMUL=2 and LMUL=4]

LMUL=8 stripmined vector memcpy example
[Code listing; annotations on the slide:]
  Combine eight vector registers into group (v0,v1,…,v7)
  Set configuration, calculate vector strip length
  Unit-stride vector load bytes
  Unit-stride vector store bytes
Binary machine code can run on machines with any VLEN!
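To make "fewer but longer" concrete, the sketch below enumerates the register groups for a given LMUL and the resulting VLMAX. The names `register_groups` and `vlmax` are hypothetical, and the code assumes that groups start at register numbers that are multiples of LMUL, which is consistent with the (v0,v1,…,v7) group in this example.

```python
def register_groups(lmul: int) -> list:
    """List the vector register groups available for a given LMUL.

    Assumes each group starts at a register number that is a multiple of LMUL.
    """
    if lmul not in (1, 2, 4, 8):
        raise ValueError("LMUL must be 1, 2, 4 or 8")
    return [[f"v{base + i}" for i in range(lmul)] for base in range(0, 32, lmul)]


def vlmax(vlen: int, sew: int, lmul: int) -> int:
    """VLMAX = LMUL*VLEN/SEW."""
    return lmul * vlen // sew


groups_lmul8 = register_groups(8)           # 4 groups of 8 registers each
bytes_per_strip = vlmax(128, sew=8, lmul=8) # 8x the elements of LMUL=1 per strip
```

For a byte memcpy on a VLEN=128 machine, each strip then moves 128 bytes instead of 16, so the loop runs fewer iterations for the same copy.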

Mixed-Width Loops
Have different element widths in one loop, even in one instruction
Want same number of elements in each vector register, even if different bits/element
Solution: Keep SEW/LMUL constant
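A short check of why a constant SEW/LMUL ratio works: since VLMAX = LMUL*VLEN/SEW, any two configurations with the same ratio hold the same number of elements. The function name `vlmax` and the chosen VLEN values are illustrative assumptions.

```python
def vlmax(vlen: int, sew: int, lmul: int) -> int:
    """VLMAX = LMUL*VLEN/SEW."""
    return lmul * vlen // sew


# Four configurations with SEW/LMUL = 8 in every case.
same_ratio_configs = [(8, 1), (16, 2), (32, 4), (64, 8)]  # (SEW, LMUL) pairs

elements_per_config = {
    vlen: [vlmax(vlen, sew, lmul) for sew, lmul in same_ratio_configs]
    for vlen in (128, 256, 1024)
}
```

For each VLEN, every entry in the list is identical, so a loop that mixes, say, 8-bit and 32-bit data can use SEW=8 with LMUL=1 for the narrow operands and SEW=32 with LMUL=4 for the wide ones and still process the same number of elements per strip.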

CS152 Administrivia
PS 4 due Friday April 5 in Section
Lab 4 out on Friday
Lab 3 due Monday April 8

CS252 Administrivia
Next week readings: Cray-1, VLIW & Trace Scheduling