Laboratory for Computer Science Massachusetts Institute of Technology

Slides:



Advertisements
Similar presentations
Asanovic/Devadas Spring VLIW/EPIC: Statically Scheduled ILP Krste Asanovic Laboratory for Computer Science Massachusetts Institute of Technology.
Advertisements

Micro controllers introduction. Areas of use You are used to chips like the Pentium and the Athlon, but in terms of installed machines these are a small.
Embedded Systems Design: A Unified Hardware/Software Introduction 1 Chapter 10: IC Technology.
CMOS Logic Circuits.
DSPs Vs General Purpose Microprocessors
Computer Abstractions and Technology
1 Power Management for High- speed Digital Systems Tao Zhao Electrical and Computing Engineering University of Idaho.
Power Reduction Techniques For Microprocessor Systems
Synchronous Digital Design Methodology and Guidelines
CS 252 Graduate Computer Architecture Lecture 14: Embedded Computing
S. Reda EN160 SP’08 Design and Implementation of VLSI Systems (EN1600) Lecture 14: Power Dissipation Prof. Sherief Reda Division of Engineering, Brown.
8/18/05ELEC / Lecture 11 ELEC / (Fall 2005) Special Topics in Electrical Engineering Low-Power Design of Electronic Circuits.
Institute of Digital and Computer Systems 1 Fabio Garzia / Finding Peak Performance in a Process23/06/2015 Chapter 5 Finding Peak Performance in a Process.
S. Reda EN160 SP’07 Design and Implementation of VLSI Systems (EN0160) Lecture 13: Power Dissipation Prof. Sherief Reda Division of Engineering, Brown.
Spring 2002EECS150 - Lec4-power Page 1 EECS150 - Digital Design Lecture 4 - Power January 31, 2002 John Wawrzynek Thanks to Simon Segars VP Engineering,
Lecture 5 – Power Prof. Luke Theogarajan
Lecture 7: Power.
Power-Aware Computing 101 CS 771 – Optimizing Compilers Fall 2005 – Lecture 22.
Mahapatra-Texas A&M-Spring'021 Power Issues with Embedded Systems Rabi Mahapatra Computer Science.
Low Power Design of Integrated Systems Assoc. Prof. Dimitrios Soudris
6.375 Complex Digital Systems Krste Asanovic March 7, 2007
Engineering 1040: Mechanisms & Electric Circuits Fall 2011 Introduction to Embedded Systems.
CS 423 – Operating Systems Design Lecture 22 – Power Management Klara Nahrstedt and Raoul Rivas Spring 2013 CS Spring 2013.
Computer performance.
L29:Lower Power Embedded Architecture Design 성균관대학교 조 준 동 교수,
6.893: Advanced VLSI Computer Architecture, September 28, 2000, Lecture 4, Slide 1. © Krste Asanovic Krste Asanovic
EE466: VLSI Design Power Dissipation. Outline Motivation to estimate power dissipation Sources of power dissipation Dynamic power dissipation Static power.
Computing Hardware Starter.
17 Sep 2002Embedded Seminar2 Outline The Big Picture Who’s got the Power? What’s in the bag of tricks?
Lecture#14. Last Lecture Summary Memory Address, size What memory stores OS, Application programs, Data, Instructions Types of Memory Non Volatile and.
Ronny Krashinsky Seongmoo Heo Michael Zhang Krste Asanovic MIT Laboratory for Computer Science SyCHOSys Synchronous.
Lecture 03: Fundamentals of Computer Design - Trends and Performance Kai Bu
Low-Power Wireless Sensor Networks
Last Time Performance Analysis It’s all relative
Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.
1 EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
Power.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
Washington State University
Why Low Power Testing? 台大電子所 李建模.
Power-Aware Compilation CS 671 April 22, CS 671 – Spring Why Worry about Power Dissipation? Environment Thermal issues: affect cooling, packaging,
1 Interconnect/Via. 2 Delay of Devices and Interconnect.
Basics of Energy & Power Dissipation
© GCSE Computing Computing Hardware Starter. Creating a spreadsheet to demonstrate the size of memory. 1 byte = 1 character or about 1 pixel of information.
Chapter 5: Computer Systems Design and Organization Dr Mohamed Menacer Taibah University
FPGA-Based System Design: Chapter 6 Copyright  2004 Prentice Hall PTR Topics n Low power design. n Pipelining.
Different Microprocessors Tamanna Haque Nipa Lecturer Dept. of Computer Science Stamford University Bangladesh.
Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 6.1 EE4800 CMOS Digital IC Design & Analysis Lecture 6 Power Zhuo Feng.
Seok-jae, Lee VLSI Signal Processing Lab. Korea University
Slides created by: Professor Ian G. Harris Embedded Systems  Embedded systems are computer-based systems which are embedded inside another device (car,
Penn ESE534 Spring DeHon 1 ESE534 Computer Organization Day 19: March 28, 2012 Minimizing Energy.
CS203 – Advanced Computer Architecture
LOW POWER DESIGN METHODS
CS203 – Advanced Computer Architecture
EEE2135 Digital Logic Design Chapter 2. Embedded Computing/ Systems
Temperature and Power Management
ECE354 Embedded Systems Introduction C Andras Moritz.
Low-power Digital Signal Processing for Mobile Phone chipsets
LOW POWER DESIGN METHODS V.ANANDI ASST.PROF,E&C MSRIT,BANGALORE.
Lynn Choi School of Electrical Engineering
Morgan Kaufmann Publishers
Architecture & Organization 1
Reading: Hambley Ch. 7; Rabaey et al. Sec. 5.2
Architecture & Organization 1
Chapter 1 Introduction.
2.C Memory GCSE Computing Langley Park School for Boys.
Computer Evolution and Performance
The University of Adelaide, School of Computer Science
The University of Adelaide, School of Computer Science
Utsunomiya University
Presentation transcript:

Laboratory for Computer Science Massachusetts Institute of Technology Embedded Computing Krste Asanovic Laboratory for Computer Science Massachusetts Institute of Technology

How many microprocessors do you own? Average individual in developed country owns around 100 microprocessors Almost all are embedded Maybe 10,000 processors/person by 2012! (according to Moore’s Law)

Future Computing Infrastructure μWatt Wireless Sensor Networks Base Stations Wireless Networks The Internet PDAs, Cameras, Cellphones, Laptops, GPS, Set-tops, 0.1-10 Watt Clients Routers MegaWatt Server Farms

What is an Embedded Computer? A computer not used to run general-purpose programs, but instead used as a component of a system. Usually, user cannot change the computer program (except for minor upgrades). Example applications: Toasters Cellphone Digital camera (some have several processors) Games machines Set-top boxes (DVD players, personal video recorders, ...) Televisions Dishwashers Car (some have dozens of processors Router Cellphone basestation .... many more

Early Embedded Computing Examples • MIT Whirlwind, 1946-51 – developed for real-time flight simulator • Intel 4004, 1971 – developed for Busicom 141-PF printing calculator

Important Parameters for Embedded Computers Real-time performance hard real-time: if deadline missed system has failed (car brakes!) soft real-time: missing deadline degrades performance (skipping frames on DVD playback) Real-world I/O performance sensor and actuators require continuous I/O (can’t batch process) Cost includes cost of supporting structures, particularly memory static code size very important (cost of ROM/RAM) often ship millions of copies (worth engineer time to optimize cost down) Power expensive package and cooling affects cost, system size, weight

What is Performance? Latency (or response time or execution time) – time to complete one task Bandwidth (or throughput) – tasks completed per unit time

Performance Measurement Worst Case Rates Average Rates Inputs Execution Rate Average Rate: A > B > C Worst-case Rate: A < B < C Which is best for desktop performance? _______ Which is best for hard real-time task? _______

Future Computing Infrastructure μWatt Wireless Sensor Networks Processors defined by Watts not MIPS! Base Stations Wireless Networks The Internet PDAs, Cameras, Cellphones, Laptops, GPS, Set-tops, 0.1-10 Watt Clients Routers MegaWatt Server Farms

Physics Review Energy measured in Joules Power is rate of energy consumption measured in Watts (Joules/second) Instantaneous power is Vdd * Idd Battery Capacity Measured in Joules 720 Joules/gram for Lithium-Ion batteries 1 instruction on Intel XScale takes ~1nJ

Power versus Energy Integrate power curve to get Power energy Time Peak A Integrate power curve to get energy Power Peak B Time System A has higher peak power, but lower total energy System B has lower peak power, but higher total energy

Impacts on Computer System Energy consumed per task determines battery life Second order effect is that higher current draws decrease effective battery energy capacity (higher power also lowers battery life) Current draw causes IR drops in power supply voltage Requires more power/ground pins to reduce resistance R Requires thick&wide on-chip metal wires or dedicated metal layers Switching current (dI/dt) causes inductive power supply voltage bounce ∝ LdI/dt Requires more pins/shorter pins to reduce inductance L Requires on-chip/on-package decoupling capacitance to help bypass pins during switching transients Power dissipated as heat, higher temps reduce speed and reliability Requires more expensive packaging and cooling systems Fan noise Laptop temperature

Power Dissipation in CMOS Short-Circuit Current Diode Leakage Current Capacitor Charging Current Subthreshold Leakage Current Primary Components: Capacitor Charging (85-90% of active power) Energy is ½ CV2 per transition Short-Circuit Current (10-15% of active power) When both p and n transistors turn on during signal transition Subthreshold Leakage (dominates when inactive) Transistors don’t turn off completely Becoming more significant part of active power with scaling Diode Leakage (negligible) Parasitic source and drain diodes leak to substrate

Reducing Switching Power Power ∝ activity * ½ CV2 * frequency Reduce activity Reduce switched capacitance C Reduce supply voltage V Reduce frequency

Reducing Activity Clock Gating Bus Encodings Remove Glitches Enable Global Clock Clock Gating – don’t clock flip-flop if not needed – avoids transitioning downstream logic – Pentium-4 has hundreds of gated clocks Bus Encodings – choose encodings that minimize transitions on average (e.g., Gray code for address bus) – compression schemes (move fewer bits) Remove Glitches – balance logic paths to avoid glitches during settling – use monotonic logic (domino) Latch (transparent on clock low) Gated Local Clock

Reducing Switched Capacitance Reduce switched capacitance C Different logic styles (logic, pass transistor, dynamic) Careful transistor sizing Tighter layout Segmented structures Shared bus driven by A or B when sending values to C Bus Insert switch to isolate bus segment when B sending to C

Reducing Supply Voltage Quadratic savings in energy per transition – BIG effect • Circuit speed is reduced • Must lower clock frequency to maintain correctness

Reducing Frequency • Doesn’t save energy, just reduces rate at which it is consumed – Some saving in battery life from reduction in rate of discharge

Voltage Scaling for Reduced Energy Reducing supply voltage by 0.5 improves energy per transition by 0.25 Performance is reduced – need to use slower clock Can regain performance with parallel architecture Alternatively, can trade surplus performance for lower energy by reducing supply voltage until “just enough” performance Dynamic Voltage Scaling

Parallel Architectures Reduce Energy at Constant Throughput 8-bit adder/comparator 40MHz at 5V, area = 530 kμ2 Base power Pref Two parallel interleaved adder/compare units 20MHz at 2.9V, area = 1,800 kμ2 (3.4x) Power = 0.36 Pref One pipelined adder/compare unit 40MHz at 2.9V, area = 690 kμ2 (1.3x) Power = 0.39 Pref Pipelined and parallel 20MHz at 2.0V, area = 1,961 kμ2 (3.7x) Power = 0.2 Pref Chandrakasan et. al. “Low-Power CMOS Digital Design”, IEEE JSSC 27(4), April 1992

“Just Enough” Performance Run fast then stop Frequency Run slower and just meet deadline Time t=0 t=deadline Save energy by reducing frequency and voltage to minimum necessary (usually done in O.S.)

Voltage Scaling on Transmeta Crusoe TM5400 Frequency (MHz) Relative Performance (%) Voltage (V) Energy Power 700 100.0 1.65 600 85.7 1.60 94.0 80.6 500 71.4 1.50 82.6 59.0 400 57.1 1.40 72.0 41.4 300 42.9 1.25 57.4 24.6 200 28.6 1.10 44.4 12.7

Types of Embedded Computer General Purpose Processors often too expensive, too hot, too unpredictable, and require too much support logic for embedded applications Microcontroller emphasizes bit-level operations and control-flow intensive operations (a programmable state machine) usually includes on-chip memories and I/O devices DSP (Digital Signal Processor) organized around a multiply-accumulate engine for digital signal processing applications FPGA (Field Programmable Gate Array) reconfigurable logic can replace processors/DSPs for some applications

New Forms of Domain-Specific Processor Network processor arrays of 8-16 simple multithreaded processor cores on a single chip used to process Internet packets used in high-end routers Media processor conventional RISC or VLIW engine extended with media processing instructions (SIMD or Vector) used in set-top boxes, DVD players, digital cameras

DSP Processors Single 32-bit DSP instruction: AccA += (AR1++)*(AR2++) AReg 7 AReg 1 AReg 0 Off-chip memory X Mem Y Mem Addr X Multiply Addr Y ALU Single 32-bit DSP instruction: Acc. A Acc. B AccA += (AR1++)*(AR2++) Equivalent to one multiply, three adds, and two loads in RISC ISA!

Network Processors RISC Control Processor MicroEngine 15 MicroEngine 1 10Gb/s Microcode RAM Buffer RAM DRAM0 PC0 PC1 PC7 Buffer RAM Register File DRAM1 Buffer RAM DRAM2 Buffer RAM Eight threads per microengine ALU DRAM0 Buffer RAM Buffer RAM DRAM1 Scratchpad Data RAM Buffer RAM DRAM2 DRAM3

Programming Embedded Computers Microcontrollers, DSPs, network processors, media processors usually have complex, non- orthogonal instruction sets with specialized instructions and special memory structures poor compiled code quality (% peak with compiled code) high static code efficiency high MIPS/$ and MIPS/W usually assembly-coded in critical loops Worth one engineer year in code development to save $1 on system that will ship 1,000,000 units Assembly coding easier than ASIC chip design But room for improvement…