A 16-Bit Low-Power Microcontroller with Monolithic MEMS-LC Clocking

Presentation transcript:

A 16-Bit Low-Power Microcontroller with Monolithic MEMS-LC Clocking
Eric D. Marsman (1), Robert M. Senger (1), Michael S. McCorquodale (2), Matthew R. Guthaus (1), Rajiv A. Ravindran (1), Ganesh S. Dasika (1), Scott A. Mahlke (1), Richard B. Brown (3)
(1) University of Michigan, (2) Mobius Microsystems, (3) University of Utah
IEEE International Symposium on Circuits and Systems, May 23-26, 2005, Kobe, Japan

Overview
- Motivation
- Microsystem Architecture
- Microcontroller
- Clock Generation
- Dynamic Frequency Scaling (DFS)
- Microsystem Measured Results
- Compiler Utilization
- Instruction Level Power Modeling
- DFS
- Future Directions
- Conclusion

Motivation
- Wireless Integrated Microsystems (WIMS)
- Environmental Sensors
  - Heavy Metals
  - µ Gas Chromatograph
- Biomedical Implants
  - Cochlear Implant
  - Deep Brain Implants

Motivation (cont)
- Power minimization
  - Frequency scaling
  - Voltage scaling
  - Memory architecture
  - Process technology
  - Leakage current mitigation

Commercially available cores
Core               Process   Frequency   No. Bits   Core Power
ARM7TDMI           0.18µm    88MHz       32         22mW
Tensilica Xtensa             200MHz                 80mW
MIPS32 M4K         0.13µm    300MHz                 84mW
Infineon C166S               80MHz       16         160mW

Microsystem Architecture
- 16-bit, 3-stage pipeline
- Software-controlled register interface to clock generator
- Peripheral communication interfaces for flexibility

Microcontroller Architecture
- Primarily a load-store architecture
- 77 instructions, 8 addressing modes
- Data and address registers split into two windows
- Hardware support for one level of interrupts and subroutines
- Banked memory architecture with additional external memory interface
  - Energy/area tradeoffs compared to single 64kB bank: 15.9% more area, 69.2% less power
- Low-power loop cache for commonly executed instructions

Monolithic Clock Generation
- Complementary, cross-coupled, negative-transconductance tank
- Frequency trimming via modulation of tail current with vtrim
- CMOS compatible
- 1.056GHz oscillation frequency
- Buffer amplifier removes amplitude variation

Dynamic Frequency Scaling
- Fully synthesized logic, no custom design
- Synchronization chain ensures glitch-free output
- Optional external clock input

Dynamic Frequency Scaling (cont)
[Figure: Glitch suppression example]

Microsystem Measured Results
- TSMC 0.18µm MM/RF bulk CMOS
- 3.5 million transistors
- Operates up to 92MHz
- 33.9mW core power consumption @ 92MHz & 1.8V
- 1.4mW core power consumption @ 10MHz & 1.1V
- 17.28mW MEMS clock source power consumption @ 1.8V
- 740µW sleep power consumption @ 1.1V
[Figure: die photo; dimension label 3.54mm]
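As a rough check on the two measured core operating points, energy per clock cycle is power divided by frequency. The sketch below compares the measured ratio with the ratio predicted by the standard CMOS dynamic-power relation (energy per cycle proportional to Vdd squared). That relation is an assumption brought in here, not something stated on the slide, and the function and variable names are illustrative.

```python
def energy_per_cycle_nj(power_mw: float, freq_mhz: float) -> float:
    """Energy per clock cycle in nJ (mW divided by MHz gives nJ)."""
    return power_mw / freq_mhz


# Measured core operating points quoted on the slide.
fast_nj = energy_per_cycle_nj(33.9, 92.0)  # 92 MHz at 1.8 V
slow_nj = energy_per_cycle_nj(1.4, 10.0)   # 10 MHz at 1.1 V

# Assumption: dynamic energy per cycle scales with Vdd squared.
measured_ratio = slow_nj / fast_nj
predicted_ratio = (1.1 / 1.8) ** 2

print(f"energy/cycle at 92 MHz, 1.8 V: {fast_nj:.3f} nJ")
print(f"energy/cycle at 10 MHz, 1.1 V: {slow_nj:.3f} nJ")
print(f"measured ratio:  {measured_ratio:.3f}")
print(f"predicted ratio: {predicted_ratio:.3f}")
```

Under that assumption the two ratios land close together (both near 0.37 to 0.38), which suggests most of the per-cycle saving at the low operating point comes from the lower supply voltage rather than from the lower frequency.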

Microcontroller Measured Results
- Static loop cache utilization provides 4 to 20% energy savings
- Vdd scaling across different frequencies allows for adjustment to program workload requirements
[Figures: Loop cache energy savings; Power vs. Vdd across frequency ranges]

WIMS C Compiler
- Windowed versus non-windowed machine
  - 19% reduction in power consumption
  - 30% performance improvement
- Dynamic instruction placement in 512B loop cache achieves 43% energy savings over static placement
[Figure: Energy savings in 64B loop cache]
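The windowed-machine figures are given as a power reduction and a performance improvement; the combined effect on energy follows from energy = power x time. The slide does not say how "30% performance improvement" is defined, so this sketch works out both common readings. The names are illustrative and neither result is a figure reported by the authors.

```python
def energy_ratio(power_ratio: float, time_ratio: float) -> float:
    """Relative energy for a fixed task: (relative power) * (relative run time)."""
    return power_ratio * time_ratio


power_ratio = 1.0 - 0.19  # 19% reduction in power consumption

# Reading A (assumption): 30% higher throughput, so run time shrinks by 1/1.30.
energy_if_speedup = energy_ratio(power_ratio, 1.0 / 1.30)

# Reading B (assumption): 30% shorter run time.
energy_if_shorter = energy_ratio(power_ratio, 1.0 - 0.30)

print(f"reading A: {1.0 - energy_if_speedup:.1%} less energy")
print(f"reading B: {1.0 - energy_if_shorter:.1%} less energy")
```

Either way, the energy saving for a fixed workload is larger than the 19% power saving alone, because the task also finishes sooner.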

Instruction Level Power Modeling
- Divide ISA into groups of similar instructions
- noops model inter-instruction pipeline switching
- Account for memory access energy separately

Energy per instruction group
Instruction Group   Energy (nJ)
add-sub             0.2403
shift               0.1950
boolean             0.2127
compare             0.2082
multiply            2.7702
divide              2.7160
copy
bit                 0.6137
load abs            0.5249
load rel            0.3661
store abs           0.4427
store rel           0.3070
win swap            0.1832
load imm            0.1961
branch-nt           0.1720
branch-t            0.5741
jmp abs             0.5372
jmp rel             0.4020
jmp abs sub         0.5658
jmp rel sub         0.3527
return              0.3700
swi                 0.5585
noop                0.1931

Memory access energy
Columns: Ext Mem (nJ) [1], Loop (nJ), MMR (nJ), Boot ROM (nJ). Rows with fewer than four values are incomplete in this transcript.
inst fetch: -0.0554, -0.0507, -, -0.0420
bit [2]: -0.1643, -0.1615, -0.1909
load abs [2]: -0.0976, -0.1016, -0.0877
load rel [2]: -0.1039, -0.1091
store abs [2]: -0.0411, -0.0461, -0.0427
store rel [2]: -0.0525, -0.0633, -0.0575

[1] Excludes memory access energy as this is memory dependent
[2] Fetch energy counted separately
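A model of this kind is typically applied by classifying each executed instruction into its group and summing the per-group energies. The following is a minimal sketch of that idea using a subset of the per-group values from the slide. The trace, the function name, and the decision to leave out memory access energy are assumptions made for illustration, not the authors' tool.

```python
# Per-group base energies in nJ, copied from the slide's table (subset).
GROUP_ENERGY_NJ = {
    "add-sub": 0.2403,
    "shift": 0.1950,
    "boolean": 0.2127,
    "compare": 0.2082,
    "multiply": 2.7702,
    "divide": 2.7160,
    "load imm": 0.1961,
    "branch-nt": 0.1720,
    "branch-t": 0.5741,
    "noop": 0.1931,
}


def trace_energy_nj(trace):
    """Sum base energy over a sequence of instruction-group names.

    Memory access energy is deliberately excluded here; the slide says it
    is accounted for separately.
    """
    return sum(GROUP_ENERGY_NJ[group] for group in trace)


# Hypothetical loop body: load a constant, add, compare, branch taken.
loop_body = ["load imm", "add-sub", "compare", "branch-t"]
iterations = 100

per_iteration = trace_energy_nj(loop_body)
print(f"per iteration: {per_iteration:.4f} nJ")
print(f"{iterations} iterations: {per_iteration * iterations:.2f} nJ")
```

The table also makes the cost structure visible: a single multiply or divide costs more than ten add-sub instructions, so an estimate like this is dominated by how often those groups appear in the trace.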

Clock Generation Results
- No external reference
- No PLL/DLL
- High frequency accuracy
- Low start-up latency
- Low temperature coefficient
- Broad operating temperature range
- Low jitter
- Minimal area overhead (3% of die)
- Low power
- All-Si technology

Metric/Parameter                    LC Clock
Reference frequency                 1056MHz
Output frequencies                  0.002 to 66MHz
Frequency accuracy across lot       ±0.75%
Frequency precision (no trim)       ±2%
Trimmed frequency accuracy          100ppm
Worst-case duty cycle               48/52
Worst-case RMS period jitter        <300ppm
Temperature stability               ±0.9% (-40 to 100C)
Max. operation temperature          150C
Power supply                        1.8V
Bias current                        9.6mA
Power dissipation                   17.28mW
Min. operating power                7.2mW
Start-up latency (25C/125C)         18ns/28ns
Si footprint                        0.3mm²
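Several entries in the table are related by simple arithmetic, which the short sketch below works through. The relations used (power = supply x bias current, and the top output frequency being an integer division of the reference) are consistent with the listed numbers but are not stated on the slide. The implied minimum bias current is an inference that assumes the same 1.8V supply.

```python
SUPPLY_V = 1.8
BIAS_MA = 9.6
REFERENCE_MHZ = 1056.0
MAX_OUTPUT_MHZ = 66.0
MIN_POWER_MW = 7.2
TRIMMED_ACCURACY_PPM = 100.0

# V * mA = mW; matches the listed 17.28 mW power dissipation.
power_mw = SUPPLY_V * BIAS_MA

# Ratio between the reference and the highest listed output frequency.
divide_ratio = REFERENCE_MHZ / MAX_OUTPUT_MHZ

# Assumption: the minimum operating power is also drawn from 1.8 V.
implied_min_bias_ma = MIN_POWER_MW / SUPPLY_V

# MHz * ppm = Hz, so this is the frequency error at the top output frequency.
trimmed_error_hz = MAX_OUTPUT_MHZ * TRIMMED_ACCURACY_PPM

print(f"power dissipation: {power_mw:.2f} mW")
print(f"reference / max output: {divide_ratio:.0f}")
print(f"implied minimum bias current: {implied_min_bias_ma:.1f} mA")
print(f"trimmed error at 66 MHz: {trimmed_error_hz:.0f} Hz")
```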

MEMS Fabrication
- Post-processing etch using PAD cut
- Suspended inductor
- Varactor etch unsuccessful
  - No etch chemistry for MiM oxy-nitride dielectric
  - Use transconductance modulation instead

Speaker note: With the varactors, we were trying to make very small-gap varactors out of the MiM structure in order to reduce microphonic sensitivity. However, the dielectric for the MiM is an oxy-nitride, and although we had a chemistry for etching oxide in the presence of Al, we could not determine one for oxy-nitride in the presence of Al.

DFS Results
- Glitch-free switching
- Switching latency is 5/(2·f0), or 37.45ns for this implementation
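The latency expression can be checked numerically. The slide does not define f0; the sketch below assumes it is the highest DFS output frequency (nominally 66MHz, per the clock table) and also back-computes the f0 implied by the quoted 37.45ns. Function names are illustrative.

```python
def switching_latency_ns(f0_mhz: float) -> float:
    """Latency of 5/(2*f0), i.e. 2.5 periods of f0, in nanoseconds."""
    return 5.0 / (2.0 * f0_mhz) * 1e3  # 1/MHz is microseconds; scale to ns


def implied_f0_mhz(latency_ns: float) -> float:
    """Invert the same expression to recover f0 from a latency."""
    return 5.0 / (2.0 * latency_ns) * 1e3


nominal_ns = switching_latency_ns(66.0)   # assumption: f0 = 66 MHz nominal
implied_mhz = implied_f0_mhz(37.45)       # f0 implied by the slide's figure

print(f"latency at 66 MHz: {nominal_ns:.2f} ns")
print(f"f0 implied by 37.45 ns: {implied_mhz:.2f} MHz")
```

Under this reading the implied f0 is a little above 66MHz, within the ±2% untrimmed frequency precision listed for the clock, so the quoted latency is plausibly based on a measured rather than nominal frequency.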

Future Directions
- Add DSP for cochlear implants and other bio-medical devices
- Include ring oscillator for a lower power alternative
- ISA improvements to reduce compiler bottlenecks
  - Address register support
  - Separate data and address register windows
  - DMA instructions
- Decrease sleep mode power
- Explore Microsystem design in advanced technologies
[Figure: Preliminary next generation system; dimension label 3.0mm]

Conclusion
- Described a highly functional, low-power Microsystem ideally suited for remote and bio-medical applications
- DFS allows on-the-fly, low-latency adaptation to workload requirements, from 33.9mW @ 92MHz to 1.4mW @ 10MHz, or sleep mode at 740µW
- Monolithic clock reference decreases system size, cost, and power consumption compared to other techniques
- Power-aware compiler takes advantage of low-power architectural features to achieve maximum power reduction

Acknowledgements
- NSF ERC for WIMS
- MOSIS Educational Program
- Artisan Components
- TSMC
- Cadence
- Synopsys
- Mentor Graphics
- Coventor