Architecture Tuning in Embedded Systems Greg Stitt, Frank Vahid, Tony Givargis Dept. of Computer Science & Engineering University of California, Riverside.

Presentation transcript:

Architecture Tuning in Embedded Systems Greg Stitt, Frank Vahid, Tony Givargis Dept. of Computer Science & Engineering University of California, Riverside Roman Lysecky Department of IP Management Conexant Newport Beach This work was supported by the National Science Foundation under grants CCR and CCR , and by a Design Automation Conference graduate scholarship. This work is being presented at CASES’00 (Compilers, Architectures and Synthesis for Embedded Systems), November 18-19, 2000, San Jose, CA.

A “short list” of embedded systems
Anti-lock brakes, auto-focus cameras, automatic teller machines, automatic toll systems, automatic transmission, avionic systems, battery chargers, camcorders, cell phones, cell-phone base stations, cordless phones, cruise control, curbside check-in systems, digital cameras, disk drives, electronic card readers, electronic instruments, electronic toys/games, factory control, fax machines, fingerprint identifiers, home security systems, life-support systems, medical testing systems, modems, MPEG decoders, network cards, network switches/routers, on-board navigation, pagers, photocopiers, point-of-sale systems, portable video games, printers, satellite phones, scanners, smart ovens/dishwashers, speech recognizers, stereo systems, teleconferencing systems, televisions, temperature controllers, theft tracking systems, TV set-top boxes, VCRs, DVD players, video game consoles, video phones, washers and dryers.
And the list goes on and on.

Introduction: Traditional microprocessor use in embedded systems
Tasks (not necessarily in the given order): (1) Buy a microprocessor IC (integrated circuit). (2) Integrate it with other ICs onto a board and insert it into an embedded system. (3) Download a software program. Notice that the processor IC is designed independently of the software. Different microprocessor variations thus exist, such as low-power or high-performance ICs.
[Figure labels: Processor, Software, Board; tasks 1, 2, 3.]

Introduction: Modern core-based approach
Tasks: (1) Buy a microprocessor CORE. Hard: layout; firm: structural HDL; soft: synthesizable HDL. You are buying intellectual property, like a file that may come on a floppy, on CD-ROM, over the web, etc. You are NOT buying hardware. (2) Design a system-on-a-chip (SOC) from this and other cores. (3) Fabricate an SOC IC. (4) Insert the IC into an embedded system. (5) Download a software program.
[Figure labels: Processor HDL, Software; tasks 1 through 5.]

Introduction: the unique fixed-program feature of embedded systems
An SOC implementing an embedded system has a unique feature: it implements a particular application. Thus, the processor may execute a single fixed program that never changes, unlike desktop systems, which execute a variety of programs. Examples: digital camera, automobile cruise controller. We can exploit this fixed-program feature, for example by using mask-programmed ROM, but much more can be done.
[Figure callout: The software in here never changes after production.]

Introduction: Proposed core-based approach with architecture tuning
Tasks: (1) Buy a microprocessor core. (2) Design a system-on-a-chip (SOC) from this and other cores. (3) TUNE the SOC architecture to a software program. (4) Fabricate an SOC IC. (5) Insert the IC into an embedded system. (6) Download the software program.
[Figure labels: Processor HDL, Software; tasks 1 through 6.]

Introduction: architecture tuning
Architecture tuning is a way to exploit the fixed-program feature of embedded systems. First, do architecture design for the particular application. Then "tune" the core-based system architecture to the particular application program, before IC fabrication. Goals: better performance, power, and size.
[Figure labels: Core library (PeripheralA, PeripheralB, ProcessorX), Architecture design, Architecture tuning, Fixed program, Tuned cores, HDL, Fabrication, IC.]

Introduction: architecture tuning
Examples of tuning optimizations: memory hierarchy (no cache, L1 cache, L1+L2 cache); cache organization (size, associativity, write policies); bus structure and data/address encoding; DMA block sizes. Microprocessor optimizations: internal small-loop table, controller partitioning, datapath shortcuts, register file copies.

Introduction: Tuning is a special case of Y-Chart iteration
Philips/TriMedia approach of simultaneously developing an architecture and its applications.
[Figure: Y-chart. Labels: Architecture, Applications, Mapping, Analysis, Numbers, Our focus.]

Problem description
Focus of this work: tuning a microcontroller to its program. The goal is reduced power without performance loss. We restrict tuning to maintain exact instruction-set compatibility: no instructions may be added or deleted. Thus, there is no modification to the software development environment, and there are no problems with porting software to or from other versions of the microcontroller. Instruction-set incompatibility can be a show stopper: maintenance, upgrades, and re-porting of binaries over the lifetime of a product and across product variations is a key issue. Likewise, a stable software development environment is needed.

Previous work
Application-specific instruction-set processors [Fisher99]: customize a microprocessor to its application(s) by deleting unnecessary instructions and adding new ones along with accompanying datapath extensions, e.g., Tensilica. A customized instruction set requires customized development tools (e.g., compiler, debugger). Tuning the compiler to the architecture [Tiwari et al 94]. Architectural description languages to inform the compiler of architecture features [Halambi et al 99]. Tuning cache and cache/bus organization to the application [Givargis et al 99].

Tuning environment
Currently for the 8051 microcontroller. Starts from a synthesizable VHDL model of the 8051 (soft core). Uses Synopsys synthesis, simulation, and power analysis. Uses an 8051 instruction-set simulator. Uses numerous scripts. Goal of the environment: understand how power is being consumed for a particular application, so that modifications to the architecture (or application) can be made to minimize that power. Three main tools: architectural view, instruction-set view, program/data memory view.

Tuning environment: architectural view tool
[Figure: tool flow. Labels: Microprocessor soft core, RT-synthesizer, Microprocessor structure, Program binary, ROM generator, ROM entity, Simulator and power analyzer, “Flat” power data, Structural hierarchical power data translator and xdu display.]
Component   Power (mW)
ROM         1.04
ALU         1.62
RAM         1.42
CTRL        2.69
DECODER     0.07
Total       7.66
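The translation from "flat" power data to hierarchical power data can be sketched roughly as follows. This is a minimal Python illustration, not the authors' tool: the function name `roll_up_power`, the slash-separated path format, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict

def roll_up_power(flat_power_mw):
    """Aggregate flat per-cell power (mW) into a total for each hierarchy node."""
    totals = defaultdict(float)
    for path, power in flat_power_mw.items():
        parts = path.split("/")
        # Credit the cell's power to every enclosing level of the hierarchy.
        for depth in range(1, len(parts)):
            totals["/".join(parts[:depth])] += power
    return dict(totals)

# Hypothetical flat data: "top/component/cell" -> mW (values are made up).
flat = {
    "i8051/ALU/adder": 1.00,
    "i8051/ALU/logic": 0.50,
    "i8051/RAM/array": 1.25,
}
hier = roll_up_power(flat)
for node in sorted(hier):
    print(f"{node:12s} {hier[node]:.2f} mW")
```

Rolling up by path prefix gives one number per component, which is the form shown in the component table above.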

Tuning environment: instruction-set view tool
[Figure: tool flow. Labels: Microprocessor structure; Binaries to exercise instruction 1, 2, 3; ROM generator; ROM entity; Simulator and power analyzer; Flat power data for instruction 1, 2, 3; Power data collector, structural power data translator, and xdu display.]
[Table: Instruction / Power (mW). Rows: ADDC_, ADD_, ANL_, CLR_, CPL_, DA, DEC_, DIV, INC_, MOVC_, MOVC_, MOV_, MOV_, MUL, NOP, ORL_, POP, PUSH (8.7116).]

Tuning environment: program/data memory view tool
[Figure: tool flow. Labels: Program binary, Instruction-set simulator, Per-instruction power data (from previous tool), Program hierarchy power translator and xdu display, Program/data memory access frequencies and power.]
[Table: Addr / Ins / Freq / Pwr / Freq*Pwr; entries as extracted: 00000LJMP MOV_ MOV_ MOV_ MOV_ RET MOV_ MOV_ MOV_ MOV_ MOV_ LCALL2700]
[Table: Addr / Purpose / Accesses; entries as extracted: 00128P SP DPL DPH P PSW ACC B2598]
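The Freq*Pwr column suggests weighting each instruction's power by how often it executes. A minimal Python sketch of that ranking follows; the function name, the data layout, and every number below are hypothetical, chosen only to illustrate the calculation.

```python
def rank_by_weighted_power(exec_counts, instr_power_mw):
    """Return rows (addr, mnemonic, freq, pwr, freq*pwr), largest product first."""
    rows = []
    for (addr, mnemonic), freq in exec_counts.items():
        pwr = instr_power_mw[mnemonic]
        rows.append((addr, mnemonic, freq, pwr, freq * pwr))
    rows.sort(key=lambda row: row[4], reverse=True)
    return rows

# Hypothetical execution counts from an instruction-set simulator run.
exec_counts = {(0x0000, "LJMP"): 1, (0x0030, "MOV"): 2700, (0x0032, "RET"): 100}
# Hypothetical per-instruction power in mW (as the instruction-set view would supply).
instr_power_mw = {"LJMP": 8.0, "MOV": 7.5, "RET": 9.0}

ranking = rank_by_weighted_power(exec_counts, instr_power_mw)
for addr, mnemonic, freq, pwr, weighted in ranking:
    print(f"{addr:04X} {mnemonic:5s} {freq:6d} {pwr:5.2f} {weighted:10.1f}")
```

Sorting by the product rather than by power alone puts frequently executed instructions at the top even when their individual power is modest.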

Tuning environment
Inputs: program binary, microprocessor core. The program/data memory view tool (seconds) produces program power data. The architectural view tool (1 hour) produces architecture power data. The instruction-set power view tool (1 day) produces instruction-set power data.

Design flow using the tuning environment
Run the program/data memory view tool, the architecture view tool, and the instruction-set view tool. Satisfied? Yes: done. No: change the application or change the architecture, then run the tools again.
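The flow above is a measure, check, change loop. The Python sketch below is only a toy model of that loop: `tune`, `run_views`, `satisfied`, and `propose_change` are hypothetical names, and the "design" is reduced to a dictionary of made-up component powers.

```python
def tune(design, run_views, satisfied, propose_change, max_iters=10):
    """Iterate: measure, check, change. Stop when satisfied or out of budget."""
    report = run_views(design)
    for _ in range(max_iters):
        if satisfied(report):
            break
        design = propose_change(design, report)
        report = run_views(design)
    return design, report

# Toy stand-ins (all hypothetical): a design is a dict of component power in mW.
def run_views(design):
    return {"total_mw": sum(design.values()), "worst": max(design, key=design.get)}

def satisfied(report):
    return report["total_mw"] <= 6.0

def propose_change(design, report):
    # Pretend each change trims 20% from the currently worst component.
    changed = dict(design)
    changed[report["worst"]] *= 0.8
    return changed

final_design, final_report = tune(
    {"CTRL": 3.0, "ALU": 2.0, "RAM": 2.0}, run_views, satisfied, propose_change
)
print(final_design, final_report["total_mw"])
```

In the real environment each call to the view tools costs between seconds and a day, so the iteration budget matters far more than it does in this toy.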

Experiments
Started with an 8051 soft core in VHDL. The tuning environment was used to examine where power consumption was occurring for a given application and to quickly evaluate the impact of tuning optimizations. These are early results; much more work remains.

Power consumption of the initial 8051 model
Power consumption is mainly due to switching wires: any wire whose value changes (from 0 to 1) consumes power, so we want to minimize switching. The 8051 power consumption is spread over 5 main components. The controller, RAM, and ALU are the most expensive components, and they have potential for general optimizations.
Total Gates Average power: mW
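To make the switching cost concrete, the sketch below counts 0-to-1 transitions on a single wire and applies the standard CMOS estimate that each such transition draws roughly C * Vdd^2 from the supply. The trace, capacitance, and supply voltage are assumed values for illustration, not figures from this work.

```python
def rising_transitions(trace):
    """Count 0 -> 1 changes in a sequence of sampled wire values."""
    return sum(1 for prev, cur in zip(trace, trace[1:]) if prev == 0 and cur == 1)

def switching_energy_joules(trace, capacitance_farads, vdd_volts):
    # Each 0 -> 1 transition draws about C * Vdd^2 from the supply.
    return rising_transitions(trace) * capacitance_farads * vdd_volts ** 2

# Hypothetical wire trace, load capacitance (1 pF), and supply voltage (3.3 V).
trace = [0, 1, 1, 0, 1, 0, 0, 1]
count = rising_transitions(trace)
energy = switching_energy_joules(trace, 1e-12, 3.3)
print(count, energy)
```

Because the energy scales linearly with the transition count, removing needless transitions on high-capacitance wires is the most direct lever.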

General optimizations made to the 8051
Prevent unnecessary switching on wires connecting to memories: wires connecting the processor to memories are high capacitance, and they were switching even when not being used, so we inserted latches to hold the previous value, a standard power-saving technique. Prevent unnecessary switching in the decoder and ALU, again by latching the inputs coming from the controller. Fetch instruction bytes only when needed. Hold the ROM output when it is not being read.
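The effect of holding the previous value can be illustrated with a small simulation. This Python sketch is an assumption-laden toy, not the VHDL change itself: the bus values and the `in_use` flags are invented, and it counts bit flips in both directions as a simple proxy for switching activity.

```python
def toggles(values, width=8):
    """Count bit flips between consecutive words on a bus of the given width."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(values, values[1:]))

def hold_when_idle(driver_values, in_use, initial=0):
    """Model a latch: pass the driver value when in use, else keep the old one."""
    held, out = initial, []
    for value, used in zip(driver_values, in_use):
        if used:
            held = value
        out.append(held)
    return out

# Hypothetical per-cycle bus values and a flag saying whether the memory needs them.
driver = [0x00, 0xFF, 0x0F, 0xF0, 0x0F]
in_use = [True, False, True, False, True]

latched = hold_when_idle(driver, in_use)
print(toggles(driver), toggles(latched))
```

In this toy trace the unlatched bus flips 28 bits while the latched one flips 4, because the values driven during idle cycles never reach the high-capacitance wires.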

Power after general optimizations
Overall power reduction from 37.2 to 11.6 mW.
Component   % improvement
ROM         82.9%
RAM         70.5%
ALU         60.0%
CTR         19.9%
Total gates Average power: mW
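The overall figures can be turned into a percentage in the same way as the per-component improvements. The short Python check below uses only the two totals quoted above; the helper name `percent_reduction` is ours.

```python
def percent_reduction(before_mw, after_mw):
    """Relative reduction, as a percentage of the starting power."""
    return 100.0 * (before_mw - after_mw) / before_mw

# Totals quoted on the slide: 37.2 mW before, 11.6 mW after.
overall = percent_reduction(37.2, 11.6)
ratio = 37.2 / 11.6
print(f"{overall:.1f}% lower, {ratio:.1f}x")
```

That works out to roughly a 69% overall reduction, or a bit more than a threefold drop in average power.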

Tuning optimizations
Sought to tune the microprocessor to a particular application: GCD (greatest common divisor) computation. Tuning optimizations invoked: (1) Replace frequently accessed RAM locations by internal registers. (2) Create datapath shortcuts for the most common instructions. (3) Partition the controller into a big controller and a small controller, with the small one handling the most frequently executed GCD instructions.

Sample tuning optimization
Observation: RAM consumes much power, and address 224 is accessed frequently. Possible tuning optimization: replace this RAM location by a register. Steps: modify the VHDL model, then run all three view tools. Results: power reduction from 7.67 to 7.27 mW; RAM reduced from 1.42 to 0.8 mW, while CTRL increased slightly.
Component   Power (mW)
ROM         1.04
ALU         1.62
RAM         1.42
CTRL        2.69
DECODER     0.07
Total       7.66
[Table: Addr / Purpose / Accesses; entries as extracted: 00128P SP DPL DPH P PSW ACC B2598]

Replacing certain RAM locations by registers
The PSW and accumulator are separated from the RAM entity and placed in internal registers.
Component   % improvement
RAM         46.1%
Overall     15.8%
Total gates Average Power: mW

Optimized datapath
MOV from reg7 to ACC is very common. Add a “shortcut” signal to the register file, which avoids having data go through the ALU. Power reduced by 0.32 mW (2.7%).
Total Gates Average power: mW
[Table: Addr / Ins / Freq / Pwr / Freq*Pwr; entries as extracted: LJMP MOV_ MOV_ MOV_ MOV_ RET MOV_ MOV_ MOV_ MOV_ MOV_ LCALL2700]

Controller Partitioning
Motivation: in many applications, 90% of the time is spent in 10% of the code (or some similar ratio). So let’s partition the controller into two, with one handling the 10% of frequently executed code. This smaller controller should consume less power. Results: average power reduced from 11.6 mW to 11.3 mW (2.6%).
Total gates
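One way to pick what the small controller should handle is to take the most-executed instructions until they cover a target share of all executions. The Python sketch below shows that selection under stated assumptions: the function name `hot_set`, the 90% target, and the execution counts are illustrative and do not come from the GCD experiment.

```python
def hot_set(exec_counts, coverage=0.9):
    """Pick the most-executed instructions until `coverage` of executions is reached."""
    total = sum(exec_counts.values())
    chosen, covered = [], 0
    ranked = sorted(exec_counts.items(), key=lambda kv: kv[1], reverse=True)
    for name, count in ranked:
        if covered >= coverage * total:
            break
        chosen.append(name)
        covered += count
    return chosen, covered / total

# Hypothetical dynamic execution counts per instruction type.
counts = {"MOV": 600, "SUBB": 200, "JC": 120, "LJMP": 50, "LCALL": 20, "RET": 10}
hot, fraction = hot_set(counts)
print(hot, fraction)
```

With these made-up counts, three of the six instruction types account for 92% of executions, which is the kind of skew that makes a small dedicated controller attractive.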

Conclusions
Described an environment for tuning a microprocessor to its application for low power, with full instruction-set compatibility. Multiple views help find power hogs. Fully automated. Focus is now on developing tuning optimizations: controller partitioning, small-loop table, datapath shortcuts, register-file copies, etc. Investigate the possibility of automating tuning optimizations, and develop a more general tuning methodology. Environment for the 8051 is available on the web: