IClass – A Many-core processor based on RISC-V


IClass – A Many-core processor based on RISC-V. RISE Lab, IIT Madras

Objective. To build an out-of-order core that can compete with present-day desktop and mobile cores. To develop interconnects with cache-coherence support. To create a many-core processor using hybrid interconnects with a uniform interface across the interconnects.

Features of Out-of-Order core. Supports the RV64IMAFD ISA as defined by RISC-V spec version 2.1. Supports all modes of the RISC-V privilege spec 1.9.1. 8-stage core with out-of-order execution through an explicit register renaming approach. Dual issue. Parameterized set-associative I-Cache and D-Cache. VIPT caches and non-blocking AXI bus support with multiple masters. Parameterized tournament branch prediction unit. 2 ALU units and 1 FPU unit. Age-based priority when selecting instructions from the issue queue. MMU support modeling the Power 3-level PTW. CAM-based speculative load store unit. Single- and double-precision pipelined floating point unit optimized for maximum performance.

Overview

Bypass Network. Dependent instructions have a 3-cycle bubble between them when the consumer is woken up only after the producer's broadcast. [Figure: Producer: Select, Drive, Execute, Broadcast; Consumer: Wakeup, Select, Drive, Execute.] In the bypass network, instructions are predicted to finish in a certain number of cycles, and dependent instructions are woken up accordingly. [Figure: Producer: Select, Drive, Execute, Broadcast; Consumer: Wakeup, Select, Drive, Execute, with the consumer woken up earlier.]

Implementation of Bypass Network. Instead of having operand-ready registers, every instruction is assigned a delay register. The contents of the delay register are moved into a shift register at the time of broadcast. The contents of the shift register are right-shifted every cycle. When the rightmost bit of the shift register is set, the corresponding instruction is released for execution.
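The delay-register and shift-register wakeup described above can be sketched in a few lines of Python. This is a minimal software model, not the RTL: the class name `WakeupEntry`, the one-hot encoding of the delay register, and the helper `cycles_until_release` are assumptions made for illustration.

```python
class WakeupEntry:
    """Software model of one issue-queue entry (hypothetical names)."""

    def __init__(self, predicted_delay):
        # Assumption: the delay register holds a one-hot pattern whose set
        # bit sits `predicted_delay` positions left of the rightmost bit.
        self.delay_reg = 1 << predicted_delay
        self.shift_reg = 0  # empty until the producer broadcasts

    def on_broadcast(self):
        # Delay register contents are moved into the shift register.
        self.shift_reg = self.delay_reg

    def tick(self):
        # The shift register is right-shifted once per cycle.
        self.shift_reg >>= 1

    def ready(self):
        # The instruction is released when the rightmost bit is set.
        return (self.shift_reg & 1) == 1


def cycles_until_release(predicted_delay):
    """Count cycles between the broadcast and the release of the entry."""
    entry = WakeupEntry(predicted_delay)
    entry.on_broadcast()
    cycles = 0
    while not entry.ready():
        entry.tick()
        cycles += 1
    return cycles
```

In this model an entry with a predicted delay of 0 is released in the broadcast cycle itself, and a larger predicted delay postpones the release by that many cycles.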

CAM-based Load Store Unit. Each memory access instruction is allotted an entry in one of the load/store (LS) queues. On an address match, the value from the store is forwarded to the load. On a wrong speculation, the alias bit is set and the pipeline is flushed at the time of commit. [Figure: EAC, Store Queue and Load Queue with CAM search, Memory Access, Cache, Store Commit, Load Commit, Flush Wire, Broadcast Load result.]
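A rough Python sketch of the forwarding and alias-bit behaviour follows. It is an illustration under stated assumptions rather than the actual design: the names `StoreEntry`, `LoadEntry`, `execute_load`, `resolve_store` and `must_flush_at_commit` are hypothetical, a store whose address is not yet computed is modelled with `addr=None`, and every load in `load_queue` is assumed to be younger than the store being resolved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    addr: Optional[int]  # None until the effective address is computed
    value: int


@dataclass
class LoadEntry:
    addr: int
    value: int
    alias: bool = False  # set when the load was speculated wrongly


def execute_load(addr, store_queue, memory):
    """Search older stores (youngest first) and forward on an address match."""
    for store in reversed(store_queue):
        if store.addr == addr:
            return LoadEntry(addr, store.value)
    # No matching store with a known address: read memory speculatively.
    return LoadEntry(addr, memory.get(addr, 0))


def resolve_store(store, addr, load_queue):
    """When a store address resolves, mark already-executed matching loads."""
    store.addr = addr
    for load in load_queue:  # assumed to hold only younger loads
        if load.addr == addr:
            load.alias = True


def must_flush_at_commit(load):
    """The pipeline is flushed when an aliased load reaches commit."""
    return load.alias
```

For example, a load from an address that a later-resolved older store also writes first reads the stale memory value, then gets its alias bit set by `resolve_store`, and triggers a flush when it commits.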

Verification Environment. The verification environment uses Spike as the golden reference. Each test case generated by AAPG, which is written in Python, consists of 20,000-odd instructions. An in-house Instruction Set Simulator (ISS) dumps the state of the processor by writing out all register and memory values for each instruction executed. Tests performed: RISC-V tests, AAPG, RISC-V Torture test cases, and CSMITH tests. [Figure: an AAPG test runs on both the ISS and the RTL, and the two dumps are compared for a match (YES: done; NO).]
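The dump comparison in this flow can be illustrated with a short Python sketch. The dump format here (one dictionary of state per executed instruction) and the function name `first_mismatch` are assumptions for illustration; the actual dump files may be laid out differently.

```python
def first_mismatch(iss_dump, rtl_dump):
    """Return the index of the first differing instruction, or None on a match.

    Each dump is assumed to be a list with one state record per executed
    instruction, e.g. {"pc": 0x80000000, "x1": 5}.
    """
    for index, (reference, design) in enumerate(zip(iss_dump, rtl_dump)):
        if reference != design:
            return index
    if len(iss_dump) != len(rtl_dump):
        # One side executed more instructions than the other.
        return min(len(iss_dump), len(rtl_dump))
    return None


# Usage: a None result corresponds to the "MATCH: YES" branch of the flow.
golden = [{"pc": 0, "x1": 1}, {"pc": 4, "x1": 2}]
assert first_mismatch(golden, list(golden)) is None
```

Reporting the first diverging instruction, rather than only pass or fail, narrows the debugging window to a single point in the test.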

IClass Performance Results. Benchmarks: CoreMark: 3.6 CoreMark/MHz. Dhrystone: 2.6 DMIPS/MHz. Synthesis results (FPGA): LUT count: 110K. Frequency: 100 MHz.

Manager-Client Pairing. Acquire: request from Client to Manager. Probe: request from Manager to Client. Release: response from Client to Manager. Grant: response from Manager to Client. Finish: acknowledgement by Client. Jesse G. Beu et al. "Manager-Client Pairing: A Framework for Implementing Coherence Hierarchies." '11
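The five message types can be written down as a small Python table. This is only a sketch of the slide's message list: the `Msg` enum and its properties are hypothetical names, the receiver of Finish is assumed to be the Manager, and the ordering in `EXAMPLE_TRANSACTION` is an assumed example of a request that requires revoking another client's copy, not a sequence stated in the slides.

```python
from enum import Enum


class Msg(Enum):
    # (sender, receiver, kind) as listed on the slide
    ACQUIRE = ("client", "manager", "request")
    PROBE = ("manager", "client", "request")
    RELEASE = ("client", "manager", "response")
    GRANT = ("manager", "client", "response")
    # Assumption: the acknowledgement is addressed to the manager.
    FINISH = ("client", "manager", "acknowledgement")

    @property
    def sender(self):
        return self.value[0]

    @property
    def receiver(self):
        return self.value[1]

    @property
    def kind(self):
        return self.value[2]


# Assumed example ordering for one coherence transaction.
EXAMPLE_TRANSACTION = [Msg.ACQUIRE, Msg.PROBE, Msg.RELEASE, Msg.GRANT, Msg.FINISH]

for message in EXAMPLE_TRANSACTION:
    print(f"{message.name}: {message.sender} -> {message.receiver} ({message.kind})")
```

Keeping direction and kind in one table makes it easy to check that every request type has a matching response travelling in the opposite direction.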

Bi-directional Ring Bus with MCP. [Figure: ring bus with Manager and Client nodes.]

Mesh with Notification Network. Number of hops on an M x N mesh network: total hops M + N. Daya, Bhavya K., et al. "SCORPIO: a 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering." '14
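As a worked illustration of the hop count, the sketch below assumes minimal dimension-ordered (XY) routing on an M x N mesh, which the slides do not specify. Under that assumption the hop count between two nodes is their Manhattan distance, and the exact corner-to-corner worst case is (M - 1) + (N - 1), which stays within the M + N figure given on the slide. The function names are hypothetical.

```python
def mesh_hops(src, dst):
    """Hops between (x, y) nodes under minimal XY routing (assumed)."""
    (src_x, src_y), (dst_x, dst_y) = src, dst
    return abs(src_x - dst_x) + abs(src_y - dst_y)


def worst_case_hops(m, n):
    """Corner-to-corner hop count on an m x n mesh."""
    return (m - 1) + (n - 1)


# A 6 x 6 mesh has 36 nodes, the core count of the cited SCORPIO chip.
assert mesh_hops((0, 0), (5, 5)) == worst_case_hops(6, 6) == 10
assert worst_case_hops(6, 6) <= 6 + 6
```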

Hybrid Interconnects. [Figure: interconnect topology with Manager and Client nodes.]

Source Code All our code is open-sourced. You can find it at https://bitbucket.org/casl/shakti_public. Contact us for further discussions and collaborations.