Experiments with the Peripheral Virtual Component Interface Roman L. Lysecky, Frank Vahid*, Tony D. Givargis Dept. of Computer Science & Engineering University.

Experiments with the Peripheral Virtual Component Interface
Roman L. Lysecky, Frank Vahid*, Tony D. Givargis
Dept. of Computer Science & Engineering, University of California, Riverside
*also with the Center for Embedded Computer Systems, UC Irvine
This work was supported by the National Science Foundation under grant # CCR , and by a Design Automation Conference graduate scholarship.

Introduction
[Figure: a system-on-a-chip in which a microprocessor and memory sit on an on-chip system bus, a bridge links it to an on-chip peripheral bus, and peripheral cores taken from a core library attach to the peripheral bus, some connecting to other systems]
Advent of systems-on-a-chip (SOCs) and cores
Peripheral cores
  Microprocessor support components
  UARTs, DMA controllers, CODECs, off-chip bus interfaces, etc.
Problem: how to integrate cores into different SOCs that have different on-chip peripheral buses?

Introduction: The Core Integration Problem
Solution 1: User modifies core for specific bus
  Could accidentally change the core's functionality
Solution 2: Different core version per bus
  Can't consider all buses
Solution 3: Standard bus
  Not likely [VSIA]
Solution 4: Bus wrappers
  Promising, but how much overhead?
[Figure: four panels, one per solution: a core-library peripheral core modified for peripheral bus X; separate peripheral cores for buses X, Y and Z; a peripheral core for a standard bus (std); and a peripheral core attached to peripheral bus X through a bus wrapper for X]

Introduction
[Figure: the same system-on-a-chip, with each peripheral core split into a bus wrapper and peripheral core internals joined by PVCI]
Bus wrapper approach
  Proposed by the Virtual Socket Interface Alliance
  Separate core into internals and bus wrapper
PVCI: Peripheral Virtual Component Interface, a standard between wrapper and internals
  Eases integration
  Only the bus wrapper need be modified for different buses
What overhead comes with a bus-wrapper solution?

Setup for evaluating PVCI overhead
Digital camera example
  Synthesizable RTL VHDL
  Synopsys synthesis, simulation and power analysis
  About 100,000 cells
3 versions of the CCD and CODEC peripherals
  Integrated
  Non-PVCI wrapper (bi-directional), designed before PVCI
  PVCI wrapper (uni-directional)
2 peripheral buses
  ISA
  Custom
[Figure: digital camera block diagram with MIPS, MEM. and BIOS on the system bus, and a BRIDGE to the on-chip peripheral bus carrying the CCD and CODEC]

PVCI general structure
Two uni-directional buses
Handshake control
Synchronous
[Figure: a peripheral core on the on-chip peripheral bus, split into bus wrapper and peripheral core internals; the PVCI signals between them are val, wdata, ack, rdata, address, read and clock]
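The handshake between bus wrapper and core internals can be sketched as a small cycle-level model. This is only an illustration under stated assumptions: the slide lists the signal names but not their exact semantics, so the rule used here (a transfer completes on the clock edge where val and ack are both high) and the names Core, wait and pvci_transfer are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical core internals: registers that respond after `wait` cycles."""
    wait: int
    regs: dict = field(default_factory=dict)


def pvci_transfer(core, address, read, wdata=None):
    """Simulate one transfer and return (rdata, cycles_used).

    Assumed protocol: the wrapper holds val high with address/read (and wdata
    on writes); the core raises ack when ready; the transfer completes on the
    clock edge where both val and ack are high.
    """
    cycles = 0
    remaining = core.wait
    while True:
        cycles += 1                 # one clock edge (the interface is synchronous)
        val = True                  # wrapper keeps the request valid
        ack = remaining == 0        # core acknowledges once its wait has elapsed
        if val and ack:
            if read:
                return core.regs.get(address, 0), cycles   # data returns on rdata
            core.regs[address] = wdata                      # data arrives on wdata
            return None, cycles
        remaining -= 1


core = Core(wait=1, regs={0x10: 0xAB})
print(pvci_transfer(core, 0x10, read=True))
```

Separate rdata and wdata arguments mirror the two uni-directional buses: no value ever travels in both directions on the same path.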

Experiments with the ISA bus
23-bit address bus
32-bit bi-directional data bus
4 cycles per access minimum
Slower peripherals can extend access time using the iochrdy signal
[Figure: bus master and peripheral (bus slave) connected by isa_ale, isa_addr, isa_iochrdy, ack_data, isa_data, isa_ior and isa_iow; timing diagram of clock, isa_addr, isa_ale, isa_data, isa_ior, isa_iow and isa_iochrdy, annotated "start transfer" and "data ready"]

Experiments with the ISA bus
PVCI vs. Integrated
Size overhead of about 1000 gates per peripheral
Power overhead of about 0.05 milliwatts (<1%)
No performance overhead
  Since ISA has a 4-cycle minimum access delay
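Why the ISA minimum hides the wrapper's delay can be shown with a one-line model. This is a sketch, assuming that an access costs whichever is larger, the 4-cycle bus minimum or the peripheral's own latency (iochrdy stretching the access when the peripheral is slower). The 2-cycle and 4-cycle read latencies are borrowed from the custom-bus results in this talk and are assumed here to carry over to ISA; isa_access_cycles is a hypothetical name.

```python
ISA_MIN_CYCLES = 4  # from the slides: 4 cycles per access minimum


def isa_access_cycles(core_latency):
    """Cycles for one ISA access: the bus minimum, or longer if the
    peripheral extends the access (modelled as its own latency)."""
    return max(ISA_MIN_CYCLES, core_latency)


integrated = isa_access_cycles(2)   # assumed integrated-core read latency
wrapped = isa_access_cycles(4)      # assumed PVCI-wrapped read latency
print(integrated, wrapped)          # both equal the bus minimum
```

Under these assumptions, any wrapper latency up to 4 cycles is absorbed by the bus minimum, which matches the reported absence of a performance overhead.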

Experiments with a custom peripheral bus
Similar to ISA, but...
  No 4-cycle minimum
  Handshake
Performance overhead on reads can occur
[Figure: two read timing diagrams. Integrated version: clock, bus_addr, bus_data, bus_ior and bus_rdy, with the ready signal asserted by the core when data is ready. Wrapper version: the same bus signals plus wrp_addr, wrp_data, wrp_read and wrp_ack; annotations mark the signal asserted by the core internals, the signal asserted by the bus wrapper, the data-ready point and the resulting performance overhead]

Experiments with a custom peripheral bus
PVCI vs. Integrated
Size overhead of about 1000 gates per peripheral
Power overhead of about 0.05 milliwatts (<1%)
Performance overhead of about 5% in this example

Experiments
1000 gates per core overhead is fairly small
  Typical peripheral core may have from gates [Inventra library]
0.05 milliwatts per core overhead is also small
No performance overhead with ISA bus
Performance overhead of 5% on reads with faster bus
  Essentially due to reads taking 4 cycles instead of 2 cycles
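The link between the 2-cycle versus 4-cycle reads and an overall figure of about 5% can be illustrated with a short calculation. It is a simplified model, not the authors' measurement method: it assumes only peripheral reads slow down, and read_fraction (the share of baseline cycles spent in peripheral reads) is a hypothetical parameter that the slides do not report.

```python
def slowdown(read_fraction, integrated_read=2, wrapped_read=4):
    """Relative increase in total cycles when only peripheral reads get slower.

    read_fraction: share of baseline cycles spent in peripheral reads
    (hypothetical; not stated in the slides).
    """
    per_read_growth = (wrapped_read - integrated_read) / integrated_read
    return read_fraction * per_read_growth


for fraction in (0.01, 0.05, 0.10):
    print(f"read fraction {fraction:.0%}: slowdown {slowdown(fraction):.1%}")
```

Because reads double in length, the overall slowdown in this model equals the read fraction, so a 5% overhead would correspond to roughly 5% of baseline cycles being peripheral reads.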

Conclusions
Overheads in size, power and performance of PVCI vs. integrated core were small
  Only significant overhead was performance in a certain case
  Our earlier work on pre-fetching can reduce or eliminate this overhead [ISSS'99, DATE'00]
  Remerging the bus wrapper with core internals can also reduce this overhead
PVCI and non-PVCI cores were competitive
Integration advantages of the bus-wrapper approach seem to come with acceptable overhead