Embedded System Hardware

Slides:



Advertisements
Similar presentations
DSPs Vs General Purpose Microprocessors
Advertisements

Lecture 4 Introduction to Digital Signal Processors (DSPs) Dr. Konstantinos Tatas.
Computer Organization and Architecture
Fall 2012SYSC 5704: Elements of Computer Systems 1 MicroArchitecture Murdocca, Chapter 5 (selected parts) How to read Chapter 5.
Khaled A. Al-Utaibi  Computers are Every Where  What is Computer Engineering?  Design Levels  Computer Engineering Fields  What.
PART 4: (2/2) Central Processing Unit (CPU) Basics CHAPTER 13: REDUCED INSTRUCTION SET COMPUTERS (RISC) 1.
RISC. Rational Behind RISC Few of the complex instructions were used –data movement – 45% –ALU ops – 25% –branching – 30% Cheaper memory VLSI technology.
Computer Organization and Assembly language
Reduced Instruction Set Computers (RISC) Computer Organization and Architecture.
Cisc Complex Instruction Set Computing By Christopher Wong 1.
Processor Organization and Architecture
Part 1.  Intel x86/Pentium family  32-bit CISC processor  SUN SPARC and UltraSPARC  32- and 64-bit RISC processors  Java  C  C++  Java  Why Java?
Embedded Systems Design ICT Embedded System What is an embedded System??? Any IDEA???
Real time DSP Professors: Eng. Julian Bruno Eng. Mariano Llamedo Soria.
Computer Organization and Architecture Reduced Instruction Set Computers (RISC) Chapter 13.
CH13 Reduced Instruction Set Computers {Make hardware Simpler, but quicker} Key features  Large number of general purpose registers  Use of compiler.
Computers organization & Assembly Language Chapter 0 INTRODUCTION TO COMPUTING Basic Concepts.
Chapter 1 An Introduction to Processor Design 부산대학교 컴퓨터공학과.
Introduction to Computer Organization and Architecture Micro Program ภาษาเครื่อง ไมโครโปรแกรม.
RISC By Ryan Aldana. Agenda Brief Overview of RISC and CISC Features of RISC Instruction Pipeline Register Windowing and renaming Data Conflicts Branch.
Part 1.  Intel x86/Pentium family  32-bit CISC processor  SUN SPARC and UltraSPARC  32- and 64-bit RISC processors  Java  C  C++  Java  Why Java?
Chapter 8 CPU and Memory: Design, Implementation, and Enhancement The Architecture of Computer Hardware and Systems Software: An Information Technology.
M. Mateen Yaqoob The University of Lahore Spring 2014.
ECEG-3202 Computer Architecture and Organization Chapter 7 Reduced Instruction Set Computers.
Reduced Instruction Set Computers. Major Advances in Computers(1) The family concept —IBM System/ —DEC PDP-8 —Separates architecture from implementation.
DIGITAL SIGNAL PROCESSORS. Von Neumann Architecture Computers to be programmed by codes residing in memory. Single Memory to store data and program.
Hardware/Software Codesign of Embedded Systems Power/Voltage Management Voicu Groza School of Information Technology and Engineering
Computer Operation. Binary Codes CPU operates in binary codes Representation of values in binary codes Instructions to CPU in binary codes Addresses in.
Chapter Overview General Concepts IA-32 Processor Architecture
Overview Motivation (Kevin) Thermal issues (Kevin)
Topics to be covered Instruction Execution Characteristics
ARM.
Low-power Digital Signal Processing for Mobile Phone chipsets
Visit for more Learning Resources
Evaluating Register File Size
A Closer Look at Instruction Set Architectures
William Stallings Computer Organization and Architecture 8th Edition
William Stallings Computer Organization and Architecture 8th Edition
CS161 – Design and Architecture of Computer Systems
Embedded Systems Design
Chapter 9 a Instruction Level Parallelism and Superscalar Processors
Advanced Topic: Alternative Architectures Chapter 9 Objectives
Chapter 14 Instruction Level Parallelism and Superscalar Processors
CGRA Express: Accelerating Execution using Dynamic Operation Fusion
Chapter 3 Top Level View of Computer Function and Interconnection
Digital Signal Processors
Central Processing Unit CPU
Instruction Level Parallelism and Superscalar Processors
Introduction to Digital Signal Processors (DSPs)
Central Processing Unit
CISC AND RISC SYSTEM Based on instruction set, we broadly classify Computer/microprocessor/microcontroller into CISC and RISC. CISC SYSTEM: COMPLEX INSTRUCTION.
EE 445S Real-Time Digital Signal Processing Lab Spring 2014
Overheads for Computers as Components 2nd ed.
A High Performance SoC: PkunityTM
Nov. 12, 1997 Bob Brodersen ( CS 152 Computer Architecture and Engineering Introduction to Architectures for Digital.
Embedded System Hardware - Processing -
Computer Architecture
ARM.
What is Computer Architecture?
What is Computer Architecture?
What is Computer Architecture?
Chapter 12 Pipelining and RISC
Created by Vivi Sahfitri
Course Outline for Computer Architecture
CSE378 Introduction to Machine Organization
Presentation transcript:

Embedded System Hardware Embedded system hardware is frequently used in a loop („hardware in a loop“): actuators

Processing units Need for efficiency (power + energy): Why worry about energy and power? „Power is considered as the most important constraint in embedded systems“ [in: L. Eggermont (ed): Embedded Systems Roadmap 2002, STW] Current UMTS phones can hardly be operated for more than an hour, if data is being transmitted. [from a report of the Financial Times, Germany, on an analysis by Credit Suisse First Boston; http://www.ftd.de/tm/tk/9580232.html?nv=se]

The energy/flexibility conflict - Intrinsic Power Efficiency - Operations/Watt [MOPS/mW] Ambient Intelligence 10 DSP-ASIPs hardwired (ASIC) 1 Reconfigurable Computing Processors µPs poor design generation techniques 0.1 0.01 Technology 1.0µ 0.5µ 0.25µ 0.13µ 0.07µ Necessary to optimize! [H. de Man, Keynote, DATE‘02; T. Claasen, ISSCC99]

Power and energy are related to each other In many cases, faster execution also means less energy, but the opposite may be true if power has to be increased to allow faster execution.

Low Power vs. Low Energy Consumption Minimizing the power consumption is important for the design of the power supply the design of voltage regulators the dimensioning of interconnect short term cooling Minimizing the energy consumption is important because of restricted availability of energy (mobile systems) limited battery capacities (only slowly improving) very high costs of energy (solar panels, in space) cooling high costs limited space dependability long lifetimes, low temperatures

Application Specific Circuits (ASICS) or Full Custom Circuits Custom-designed circuits necessary if ultimate speed or energy efficiency is the goal and large numbers can be sold. Approach suffers from long design times and high costs (e.g. Mill. $ mask costs). ASIC synthesis not covered in this course.

Processors At the chip level, embedded chips include micro-controllers and microprocessors. Micro-controllers are the true workhorses of the embedded family. They are the original ’embedded chips’ and include those first employed as controllers in elevators and thermostats [Ryan, 1995]. Key requirements: Energy-efficiency Code-size efficiency: Memory is a scarce resource in embedded systems, in particular for „systems-on-a-chip'' (SoC). Run-time efficiency

New ideas can actually reduce energy consumption Pentium Crusoe Running the same multimedia application. As published by Transmeta [www.transmeta.com]

Dynamic power management (DPM) Example: STRONGARM SA1100 400mW RUN: operational IDLE: a sw routine may stop the CPU when not in use, while monitoring interrupts SLEEP: Shutdown of on-chip activity RUN 10µs 90µs 160ms 10µs IDLE 90µs SLEEP 50mW 160µW

Fundamentals of dynamic voltage scaling (DVS) Power consumption of CMOS circuits (ignoring leakage): Delay for CMOS circuits:  Decreasing Vdd reduces P quadratically, while the run-time of algorithms is only linearly increased (ignoring the effects of the memory system).

Voltage scaling: Example Exploitation discussed in codesign chapter Vdd [Courtesy, Yasuura, 2000]

Variable-voltage/frequency example: INTEL Xscale OS should schedule distribution of the energy budget. From Intel’s Web Site

Code-size efficiency CISC machines: RISC machines designed for run-time-, not for code-size-efficiency Compression techniques: key idea

Code-size efficiency Compression techniques (continued): 2nd instruction set, z.B. ARM Thumb instruction set: Reduction to 65-70 % of original code size 130% of ARM performance with 8/16 bit memory 85% of ARM performance with 32-bit memory 1110 001 01001 0 Rd 0 Rd 0000 Constant 16-bit Thumb instr. ADD Rd #constant 001 10 Rd Constant zero extended major opcode minor opcode source= destination [ARM, R. Gupta]

Two-level control store concept (indirect addressing of instructions) For each instruction address, S contains table address of instruction. Each instruction pattern is stored only once, and not repeatedly stored for each instruction address for which it is needed. Similar to concept of colour lookup table. Can be extended to include subroutines in lookup table. Called nanoprogramming in the Motorola 68000. instruction address S << 32 bit table of used instructions 32 bit CPU

Run-time optimization: Domain-oriented architectures Application: y[j] = i=0 x[j-i]*a[i] i: 0i  n-1: yi[j] = yi-1[j] + x[j-i]*a[i] Architecture: Example: Data path ADSP210x MR MF MX MY * +,- AR AF AX AY +,-,.. D P yi-1[j] x[j-i] x[j-i]*a[i] a[i] Address generation unit (AGU) Address- registers A0, A1, A2 .. i+1, j-i+1 a x - Parallelism - Dedicated registers MR:=0; A1:=1; A2:=n-2; MX:=x[n-1]; MY:=a[0]; for ( j:=1 to n) {MR:=MR+MX*MY; MY:=a[A1]; MX:=x[A2]; A1++; A2--}

Digital Signal Processing (DSP) Processors - Features (1) - Multiply/accumulate (MAC) and zero-overhead loop (ZOL) instructions (as shown) Heterogeneous registers (as shown) Separate address generation units (AGUs) (as in ADSP 210x)

Digital Signal Processing (DSP) Processors - Features (2) - sliding window Modulo addressing: Am++  Am:=(Am+1) mod n (implements ring or circular buffer in memory) x t2 t1 t .. x[n-2] x[n-1] x[0] x[1] .. .. x[n-3] x[n-2] x[n-1] x[n] x[1] Memory, t=t1 Memory, t2=t1+1

Saturating arithmetic Saturating arithmetic: returns largest/smallest number in case of over/underflows Example: a 0111 b + 1001 standard wrap around arithmetic (1)0000 saturating arithmetic 1111 (a+b)/2: correct 1000 wrap around arithmetic 0000 saturating arithmetic + shifted 0111 „almost correct“

Fixed-point arithmetic Shifting required after multiplications and divisions in order to maintain binary point.

Real-time capability Timing behavior has to be predictable Features that cause problems: Caches, in particular with difficult to predict replacement strategies Unified caches (conflicts between instructions and data) Pipelines with difficult to predict stall cycles ("bubbles") Interrupts that are possible any time Memory refreshes that are possible any time Instructions that have data-dependent execution times [Dagstuhl workshop on predictability, Nov. 17-19, 2003]

Multiple memory banks or memories MR MF MX MY * +,- AR AF AX AY +,-,.. D P Address generation unit (AGU) Address- registers A0, A1, A2 .. Simplifies parallel fetches