Advanced Processor Architectures for Embedded Systems Witawas Srisa-an CSCE 496: Embedded Systems Design and Implementation.



(R)evolution of Processors
–Rock Hard: hardwired, GPP. Performs well in most conditions but not in extreme conditions
–Ice Hard: GPP with FPGAs. Custom designs perform well in some extreme conditions. Requires extensive knowledge of hardware design
–Play-dough Hard: GPP with embedded programmable logic. Reconfiguration triggered by software

(R)evolution of Processors: Ice Hard
–Contains ASIC (Application-Specific IC) designs
–Increases time-to-market
–Takes time to reconfigure

Software Hotspots
In DSP
–80% of the processing load is spent on 20% of the code
 Hand-tuned assembly that can take thousands of cycles to execute. Less portable
–The remaining 80% of the code has complex system functions
 Runs well on most GPPs
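The 80/20 split above can be turned into a rough estimate of what accelerating the hotspot buys overall. The sketch below applies Amdahl's law, which the slides do not name; the function name `overall_speedup` and the speedup factors in the usage note are illustrative assumptions, not figures from the slides.

```c
#include <assert.h>

/* Amdahl's law (an assumption brought in for illustration):
 * overall speedup when hot_fraction of the run time (0.0 to 1.0)
 * is made hot_speedup times faster and the rest is unchanged. */
double overall_speedup(double hot_fraction, double hot_speedup)
{
    assert(hot_fraction >= 0.0 && hot_fraction <= 1.0);
    assert(hot_speedup > 0.0);
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup);
}
```

With 80% of the load in the hotspot, `overall_speedup(0.8, 10.0)` is about 3.57, and no hotspot speedup can push the total past 5x, because the remaining 20% of the load still runs at its original speed.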

Software Hotspots
Example: a 16-QAM modem (19.2 Kbaud) implemented entirely in software
–Takes 177,000 instruction cycles to execute on a TI C6711
–FPGA co-processor: a few cycles

Solving Hotspots
[Diagram: four approaches. Processor + FPGA; multiple DSPs; DSP-enabled processors; RISC processor with programmable logic]

An Example of Configurable Processor (Stretch S5000)
[Block diagram: S5 engine (ALU, FPU, 32-bit RF, control, 128-bit WRF, ISEF), I/O + DMA, data RAM 32KB, SRAM 256KB, D-cache 32KB, I-cache 32KB, MMU]
–S5 engine common to all S5000 processors
–300 MHz Xtensa-V 32-bit RISC processor
–I/O subsystem tailored to markets and applications
–ISEF (Instruction-Set Extension Fabric): programmable logic data path inside the RISC processor
–32 x 128b wide registers + flexible wide load/store instructions

Programmable Logic Architecture
[Diagram: RISC DP, Instruction Set Extension Fabric (ISEF), WR and AR register files, memory]

ISEF Resources
An ISEF includes:
–Computation resources
–Routing resources
–Pipeline resources
–State register resources
2 types of computation resources:
–4096 arithmetic units (AUs) for arithmetic and logic operations
–8192 multiplier units (MUs) for multiply and shift operations
Example: a single ISEF may implement
–32 16*16 multipliers
– bit ALUs

Wide Register
The wide register file is used for holding WR data
–32 WR registers (128 bits each)
–Divided into 2 banks of 16 registers (WRA and WRB)
The WRA/WRB types associate a variable with WR bank A/B
–WRA v1, v2, v3;
–WRB w1, w2, w3;
The WR type defaults to WRA
–Use WRA/WRB to avoid unnecessary register moves between the two WR banks

Extension Instructions (EIs)
The power of the Software Configurable Processor (SCP) architecture is derived from the ability to define new and complex instructions that operate on very wide data
An Extension Instruction takes 3 steps
1. EI Definition: write a Stretch-C function
2. EI Compilation: compile the Stretch-C function
3. EI Use: call an EI through its intrinsic in the application code (C/C++)

Extension Instructions
1. Define an Extension Instruction (writing Stretch-C), in vector.xc
    #include <stretch.h>
    SE_FUNC void V_AND8(WR v1, WR vMask, WR *vOut)
    {
        *vOut = v1 & vMask;
    }
2. Compile and link the EI (Stretch-C source file: *.xc)
3. Use the EI in C/C++ application code (calling intrinsics)
    #include "vector.h"
    WR v1, vMask, vOut;
    …
    WRL128I(&v1, (WR*) memSrc1Ptr, 0);
    V_AND8(v1, vMask, &vOut);
    WRS128I(vOut, (WR*) memDstPtr, 0);
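For readers without the Stretch toolchain, the load, AND, store sequence above can be modeled in portable C. This is only a behavioral sketch: the type `wr128_t` and the functions `wr_load`, `wr_store`, `v_and8_model`, and `and16_bytes` are hypothetical names invented here, and nothing below reflects how the ISEF actually executes the instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 128-bit wide register (WR). */
typedef struct { uint32_t w[4]; } wr128_t;

/* Copy 16 bytes from memory into a register value (models the wide load). */
static wr128_t wr_load(const void *src)
{
    wr128_t r;
    memcpy(r.w, src, sizeof r.w);
    return r;
}

/* Copy a register value back to memory (models the wide store). */
static void wr_store(wr128_t v, void *dst)
{
    memcpy(dst, v.w, sizeof v.w);
}

/* Bitwise AND across all 128 bits, the operation V_AND8 describes. */
static void v_and8_model(wr128_t v1, wr128_t mask, wr128_t *out)
{
    for (int i = 0; i < 4; i++)
        out->w[i] = v1.w[i] & mask.w[i];
}

/* Mask 16 bytes of src with 16 bytes of mask and write them to dst. */
void and16_bytes(const uint8_t *src, const uint8_t *mask, uint8_t *dst)
{
    assert(src != NULL && mask != NULL && dst != NULL);
    wr128_t out;
    v_and8_model(wr_load(src), wr_load(mask), &out);
    wr_store(out, dst);
}
```

The model processes four 32-bit words in a loop, whereas the point of the extension instruction is that all 128 bits are handled by a single instruction.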

Extension Instructions
–Are issued by the Xtensa
–Read source operands from the 128-bit WR and/or 32-bit AR register files
–Execute out of the ISEF
–Write destination operands to WR
Once the ISEF is configured with the new instruction, it may be
–Called as an intrinsic from application C code
–Used as an assembly instruction in an assembly source file

Writing Stretch-C Functions
    #include <stretch.h>
    SE_FUNC void V_AND128(WR v1, WR v2, WR *vOut)
    {
        *vOut = v1 & v2;
    }
–#include the stretch.h header file
–Stretch-C functions are identified by the keywords SE_FUNC void
–EI names are identified by the Stretch-C function name (for single-instruction functions)
–EI source and destination operands are defined by the Stretch-C function parameters
–EI operation is defined by the Stretch-C function instructions

Extension Instruction Parameters 1
Extension Instructions are user-defined assembly instructions that use input and output operands
An Extension Instruction can specify up to 3 parameters
–0, 1, 2, or 3 inputs
–0, 1, or 2 outputs
Input and output parameters reside in register files
–Inputs come from the WR or AR register files
–Outputs may only be written to the WR register file
Assembly
    # result = a + b
    ADD result, a, b
Stretch-C
    // RESULT = A + B
    V_ADD4(A, B, &RESULT);

Extension Instruction Parameters 2
EI source operands (inputs) may include
–Up to 3 WR inputs (use WR, WRA or WRB)
–Up to 2 AR inputs (use int, short, etc.)
    SE_FUNC void FOO(int c1, WR v1, WRB *vOut){ }
EI destination operands (outputs) may include
–Up to 2 WR outputs, each writing a separate WR bank
–Use the C pointer notation for outputs
    SE_FUNC void FOO(WR v1, WRA *vOut1, WRB *vOut2){ }
A single WR parameter may be used as both an input and output operand
    SE_FUNC void FOO(WR v1, WRA *vInOut1, WRB *vOut2){ }

Example of Stretch-C: RGB2YCrCb
Y, Cb, and Cr are each a weighted sum of R, G, and B. With integer weights scaled by 256:
Y = (77R + 150G + 29B) >> 8
Cb = (-43R - 85G + 128B) >> 8
Cr = (128R - 107G - 21B) >> 8

RGB2YCC
    SE_FUNC void rgb2ycc(WR A, WR *B)
    {
        se_sint r[5], g[5], b[5];
        se_sint y[5], cb[5], cr[5];
        int i, j;

        /* unpack A to RGB data, does not use any ISEF logic */
        for (i = 0; i < 5; i++) {
            j = i * 3 * 8;
            r[i] = A(j+7, j);
            g[i] = A(j+15, j+8);
            b[i] = A(j+23, j+16);
        }

        /* converting 5 pixels */
        for (i = 0; i < 5; i++) {
            y[i]  = ( 77*r[i] + 150*g[i] +  29*b[i]) >> 8;
            cb[i] = (-43*r[i] -  85*g[i] + 128*b[i]) >> 8;
            cr[i] = (128*r[i] - 107*g[i] -  21*b[i]) >> 8;
        }

        /* pack YCbCr to B */
        *B = (cr[4],cb[4],y[4],cr[3],cb[3],y[3],cr[2],cb[2],y[2],cr[1],cb[1],y[1],cr[0],cb[0],y[0]);
    }
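A plain C reference model is useful for checking the extension instruction's results on a host machine. The sketch below reuses the integer coefficients from the Stretch-C function; the names `rgb2ycc_model`, `floor_shift8`, and `PIXELS`, the R, G, B byte order of the input, and the use of 16-bit outputs are assumptions made for this example rather than details taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

#define PIXELS 5  /* five 24-bit pixels fit in one 128-bit WR */

/* Floor division by 256, written so it does not depend on how the
 * compiler right-shifts a negative int. */
static int floor_shift8(int v)
{
    return (v >= 0) ? (v >> 8) : -((-v + 255) >> 8);
}

/* Convert PIXELS packed RGB triples (assumed byte order R, G, B) into
 * Y, Cb, Cr triples. As in the Stretch-C code, no offset is added to
 * the chroma values, so Cb and Cr may be negative. */
void rgb2ycc_model(const uint8_t rgb[PIXELS * 3], int16_t ycc[PIXELS * 3])
{
    assert(rgb != 0 && ycc != 0);
    for (int i = 0; i < PIXELS; i++) {
        int r = rgb[3 * i];
        int g = rgb[3 * i + 1];
        int b = rgb[3 * i + 2];
        ycc[3 * i]     = (int16_t)floor_shift8( 77 * r + 150 * g +  29 * b);
        ycc[3 * i + 1] = (int16_t)floor_shift8(-43 * r -  85 * g + 128 * b);
        ycc[3 * i + 2] = (int16_t)floor_shift8(128 * r - 107 * g -  21 * b);
    }
}
```

Each row of coefficients sums to 256 for Y and to 0 for Cb and Cr, so a gray pixel (R = G = B) maps to Y equal to the gray level with both chroma values at 0, which makes a convenient sanity check.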

Stretch Compiler
[Flow diagram: scc Stretch-compiles rgb2ycc.xc into libei.h and libei.a; scc compiles rgb2ycc.c into rgb2ycc.o; scc links them into rgb2ycc.exe, which runs on the target]

Compiler Option S5000

Summary
Software Configurable Processor
–Describe hardware using C/C++
 But not trivial: a basic understanding of the architecture is needed
–Reconfiguration can take place in 150 microseconds
2 ISEFs per chip
–Can ping-pong
Configuration files stored in SDRAM
–Use DMA to preload information
The ISEF is proprietary and is NOT an FPGA