TRIPS – An EDGE Instruction Set Architecture Chirag Shah April 24, 2008.



What is an Instruction Set Architecture (ISA)?
- Attributes of a computer as seen by a machine language programmer
- Native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O
- Native machine language commands (opcodes)
- CISC (’60s and ’70s)
- RISC (’80s, ’90s, and early ’00s)

CISC vs RISC
CISC (Complex Instruction Set Computer) | RISC (Reduced Instruction Set Computer)
Emphasis on hardware | Emphasis on software
Multi-clock, complex instructions | Single-clock, reduced instructions
“LOAD” and “STORE” incorporated in instructions | “LOAD” and “STORE” are independent instructions
Small code sizes, high cycles per instruction | Large code sizes, low cycles per instruction
Transistors used for storing complex instructions | Spends more transistors on memory registers

Generic Computer
- Data resides in main memory
- Execution unit carries out computations
- Execution unit can only operate on data loaded into registers

Multiply Two Numbers
- One number “A” stored in memory location 2:3
- Other number “B” stored in memory location 5:2
- Store the product back in location 2:3

CISC Approach
- Complex instructions built into hardware (e.g. MULT)
- Entire task in one line of assembly: MULT 2:3, 5:2
- Closely resembles the high-level language statement A = A * B
- Compiler translates high-level language into assembly
- Smaller program size and fewer calls to memory -> savings on cost of memory and storage

RISC Approach
- Only simple instructions: 4 lines of assembly
    LOAD A, 2:3
    LOAD B, 5:2
    PROD A, B
    STORE 2:3, A
- Fewer transistors (less hardware space) needed to implement the instructions
- All instructions execute in uniform time (one clock cycle), which enables pipelining
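The two approaches can be compared with a toy model. The sketch below is not a real instruction set: the dictionary-based memory, the values 6 and 7, and the helper names `run_cisc` and `run_risc` are assumptions made for illustration. It only shows that both programs leave the same product in location 2:3 while differing in instruction count.

```python
# Toy machine: memory is keyed by the "row:column" locations used on the slides.
# The stored values 6 and 7 are made up for this example.
memory = {"2:3": 6, "5:2": 7}


def run_cisc(mem):
    """MULT 2:3, 5:2 -- one complex instruction loads, multiplies and stores."""
    mem = dict(mem)  # work on a copy so both runs start from the same state
    mem["2:3"] = mem["2:3"] * mem["5:2"]
    return mem, 1  # one line of assembly


def run_risc(mem):
    """LOAD / LOAD / PROD / STORE -- four simple register-based instructions."""
    mem = dict(mem)
    regs = {}
    program = [
        ("LOAD", "A", "2:3"),
        ("LOAD", "B", "5:2"),
        ("PROD", "A", "B"),
        ("STORE", "2:3", "A"),
    ]
    for op, dst, src in program:
        if op == "LOAD":
            regs[dst] = mem[src]           # memory -> register
        elif op == "PROD":
            regs[dst] = regs[dst] * regs[src]  # register * register
        elif op == "STORE":
            mem[dst] = regs[src]           # register -> memory
    return mem, len(program)


cisc_mem, cisc_lines = run_cisc(memory)
risc_mem, risc_lines = run_risc(memory)
print(cisc_mem["2:3"], cisc_lines)
print(risc_mem["2:3"], risc_lines)
```

In this model the RISC version keeps A and B in registers after the multiply, so a later instruction could reuse them without another memory access, whereas the CISC version would have to reload them.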

What is Pipelining?
Figure: before pipelining

Figure: after pipelining
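The before/after contrast can be put in numbers with the usual idealized model. This is a sketch under stated assumptions: every stage takes one clock cycle, there are no stalls, and the four-stage pipeline in the example is hypothetical rather than taken from the slides.

```python
def cycles_unpipelined(n_instructions, n_stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages):
    """The first instruction fills the pipeline; after that one finishes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


# Hypothetical example: 4 instructions on a 4-stage pipeline.
print(cycles_unpipelined(4, 4))
print(cycles_pipelined(4, 4))
```

For long instruction streams the ideal speedup approaches the number of stages, which is why uniform one-cycle instructions matter: a single slow stage or instruction would hold up everything behind it.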

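Why do we need a new ISA?
- For 20 years, RISC CPU performance gains came from deeper pipelines
- Pipelines suffer from data dependencies between instructions
- The penalty gets worse as pipelines get longer
- Pipeline scaling is nearly exhausted
- Need to move beyond pipeline-centric ISAs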

Steve Keckler and Doug Burger
- Associate professors at the University of Texas at Austin
- Predicted the beginning of the end for conventional microprocessor architectures
- Remarkable leaps in speed over the last decade are tailing off
- Higher performance -> greater complexity
- Designs consumed too much power and produced too much heat
- Industry is at an inflection point: the old ways have stopped working
- Industry is shifting to multicore to buy time, but this is not a long-range solution

EDGE Architecture
- EDGE (Explicit Data Graph Execution)
- Conventional architectures process one instruction at a time; EDGE processes blocks of instructions all at once and more efficiently
- Current multicore technologies increase speed by adding more processors
- This shifts the burden to software programmers, who must rewrite their code
- EDGE technology is an alternative approach for when the race to multicore runs out of steam

EDGE Architecture (contd.)
- Provides a richer interface between compiler and microarchitecture: directly expresses the dataflow graph that the compiler generates
- CISC and RISC require hardware to rediscover data dependences dynamically at runtime
- Therefore CISC and RISC require many power-hungry structures and EDGE does not
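The difference between rediscovering dependences and encoding them can be sketched as follows. Both encodings are simplified assumptions for illustration (the tuple format, register names, and `discover_dependences` helper are hypothetical and not the TRIPS instruction format): a conventional program names registers, so the producer of each operand has to be found by tracking the last writer, while an EDGE-style block lists each instruction's consumers directly.

```python
def discover_dependences(program):
    """Recover producer -> consumer edges from register names.

    program: list of (dest_register, [source_registers]) in program order.
    This models the work conventional hardware repeats at runtime.
    """
    last_writer = {}  # register name -> index of the instruction that last wrote it
    edges = []
    for index, (dest, sources) in enumerate(program):
        for reg in sources:
            if reg in last_writer:
                edges.append((last_writer[reg], index))
        last_writer[dest] = index
    return edges


# Conventional encoding: operands are register names.
program = [
    ("r1", ["r10", "r11"]),  # 0: r1 = r10 op r11
    ("r2", ["r12", "r13"]),  # 1: r2 = r12 op r13
    ("r3", ["r1", "r2"]),    # 2: r3 = r1 op r2
]
edges = discover_dependences(program)

# EDGE-style encoding: each instruction names the instructions it feeds,
# so the same graph can be read off without any tracking.
edge_block = {0: [2], 1: [2], 2: []}
explicit_edges = [(p, c) for p, targets in edge_block.items() for c in targets]

print(edges)
print(explicit_edges)
```

Both forms describe the same graph; the point of the slide is that the second form lets the compiler do this analysis once instead of the hardware doing it on every execution.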

TRIPS
- Tera-op Reliable Intelligently Adaptive Processing System: the first EDGE processor prototype
- Funded by the Defense Advanced Research Projects Agency: $15.4 million
- Goal of one trillion instructions per second by 2012

Technology Characteristics for Future Architectures
1. New concurrency mechanisms
2. Power-efficient performance
3. On-chip communication-dominated execution
4. Polymorphism: use its execution and memory units in different ways to run diverse applications

TRIPS – Addresses Four Technology Characteristics
1. Increased concurrency: array of concurrently executing arithmetic logic units (ALUs)
2. Power-efficient performance: spreads out the overheads of sequential, von Neumann semantics over 128-instruction blocks
3. Compile-time instruction placement to mitigate communication delays
4. Increased flexibility: dataflow execution model does not presuppose a given application computation pattern

Two Key Features
- Block-atomic execution: The compiler sends executable code to the hardware in blocks of 128 instructions. The processor sees and executes a block all at once, as if it were a single instruction; this greatly decreases the overhead associated with instruction handling and scheduling.
- Direct instruction communication: The hardware delivers a producer instruction’s output directly as an input to a consumer instruction, rather than writing it to the register file. Instructions execute in dataflow fashion; each instruction executes as soon as its inputs arrive.
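The dataflow firing rule described above can be sketched in a few lines. This is a software model only, under stated assumptions: the `run_block` helper, the dictionary encoding of a block, and the instruction names t1..t3 are hypothetical, and instruction names are assumed not to collide with block input names. It does not model the TRIPS hardware or its block format.

```python
from collections import deque
from operator import add, mul


def run_block(instructions, block_inputs):
    """Execute a block in dataflow order.

    instructions: name -> (function, [names of producers or block inputs])
    An instruction fires as soon as all of its inputs have arrived.
    """
    values = dict(block_inputs)
    # Inputs each instruction is still waiting for.
    waiting = {name: set(srcs) - set(values)
               for name, (_, srcs) in instructions.items()}
    # Producer -> consumers, so a result is delivered straight to its users.
    consumers = {}
    for name, (_, srcs) in instructions.items():
        for src in set(srcs):
            consumers.setdefault(src, []).append(name)

    ready = deque(name for name, pending in waiting.items() if not pending)
    fired = []
    while ready:
        name = ready.popleft()
        fn, srcs = instructions[name]
        values[name] = fn(*(values[s] for s in srcs))
        fired.append(name)
        for consumer in consumers.get(name, []):
            waiting[consumer].discard(name)
            if not waiting[consumer]:
                ready.append(consumer)
    return values, fired


# t3 is listed first, yet it fires last because its inputs arrive last.
block = {
    "t3": (add, ["t1", "t2"]),
    "t1": (add, ["a", "b"]),
    "t2": (mul, ["c", "d"]),
}
values, fired = run_block(block, {"a": 1, "b": 2, "c": 3, "d": 4})
print(values["t3"], fired)
```

Execution order here is determined by data availability rather than by the order in which instructions are written, which is the property the slide attributes to direct instruction communication.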

Code Example – Vector Addition
- Add and accumulate for fixed-size vectors
- Initial control flow graph

Loop is unrolled
- Reduces the overhead per loop iteration
- Reduces the number of conditional branches that must be executed
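The effect of unrolling can be illustrated with a small sketch. The slide's original source code is not reproduced in the transcript, so the loop body below (c[i] = a[i] + b[i], then accumulate c[i]) and the unroll factor of 4 are assumptions. The iteration counter stands in for the number of loop-condition branches executed.

```python
def add_accumulate_rolled(a, b):
    """One element per iteration: one loop branch per element."""
    c, total, iterations = [], 0, 0
    for i in range(len(a)):
        c.append(a[i] + b[i])
        total += c[i]
        iterations += 1
    return c, total, iterations


def add_accumulate_unrolled(a, b):
    """Four elements per iteration: one loop branch per four elements."""
    if len(a) % 4 != 0:
        raise ValueError("this sketch assumes a length that is a multiple of 4")
    c, total, iterations = [0] * len(a), 0, 0
    for i in range(0, len(a), 4):
        c[i] = a[i] + b[i]
        c[i + 1] = a[i + 1] + b[i + 1]
        c[i + 2] = a[i + 2] + b[i + 2]
        c[i + 3] = a[i + 3] + b[i + 3]
        total += c[i] + c[i + 1] + c[i + 2] + c[i + 3]
        iterations += 1
    return c, total, iterations


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10] * 8
print(add_accumulate_rolled(a, b)[1:])
print(add_accumulate_unrolled(a, b)[1:])
```

The unrolled body also exposes four independent additions per iteration, which gives a block-oriented compiler more instructions to pack into a single block.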

Compiler produces TRIPS Intermediate Language (TIL) files
- Syntax of (name, target, sources)

Block Dataflow Graph

Scheduler
- Analyzes each block dataflow graph
- Places instructions within the block
- Produces assembly language files

Block-level execution, up to 8 blocks concurrently

TRIPS prototype chip
- nm ASIC process; 500 MHz
- Two processing cores; each can issue 16 operations per cycle with up to 1,024 instructions in flight simultaneously
- Current high-performance processors: maximum execution rate of 4 operations per cycle
- 2 MB L2 cache in 32 banks
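The figures quoted in the deck can be cross-checked with simple arithmetic. The sketch below uses only numbers stated on the slides (500 MHz, two cores, 16 operations per cycle, 128-instruction blocks, up to 8 blocks in flight, and the one-trillion goal). Treating issue width times clock rate as a peak rate, and comparing "operations" with "instructions", are simplifying assumptions.

```python
CLOCK_HZ = 500_000_000          # 500 MHz prototype clock
CORES = 2                       # two processing cores
OPS_PER_CYCLE_PER_CORE = 16     # peak issue width per core
BLOCK_SIZE = 128                # instructions per block
BLOCKS_IN_FLIGHT = 8            # blocks executing concurrently
GOAL_OPS_PER_SECOND = 1_000_000_000_000  # one trillion per second

# Peak rate if every issue slot were filled on every cycle.
peak_ops_per_second = CLOCK_HZ * CORES * OPS_PER_CYCLE_PER_CORE

# Instructions in flight per core follows from block size and block count.
instructions_in_flight = BLOCK_SIZE * BLOCKS_IN_FLIGHT

# How far the prototype's peak is from the stated goal.
gap_to_goal = GOAL_OPS_PER_SECOND / peak_ops_per_second

print(peak_ops_per_second)
print(instructions_in_flight)
print(gap_to_goal)
```

The 1,024 instructions in flight matches 8 blocks of 128 instructions, and the prototype's peak of 16 billion operations per second is a factor of 62.5 below the tera-op goal, which would have to come from higher clock rates, more cores, or both.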

Execution node
- Fully functional ALU and 64 instruction buffers
- Dataflow techniques work well with the three kinds of concurrency found in software: instruction-level, thread-level, and data-level parallelism

Architecture Generations Driven by Technology