Ultra-low Power for Always-on with Minima Dynamic Margining

Lauri Koskinen, CTO, Minima Processor

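Lowering Voltage and Energy in Real Time 101
Run to completion: the task executes at a higher voltage and then waits for its deadline (DL). Higher voltage means higher energy.
Just-in-time DVFS: the voltage is scaled down so the task finishes just at its deadline, giving quadratic energy savings.
Embedded systems run a single task many times, so voltage scaling is required, down to near-threshold operation.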
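A minimal sketch of the run-to-completion versus just-in-time comparison, assuming dynamic energy only (E = N·C·V², with leakage and idle energy ignored) and a hypothetical table of operating points. The names `OPPS`, `C_EFF`, `run_to_completion` and `just_in_time` are illustrative, not part of any Minima API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OperatingPoint:
    vdd: float      # supply voltage in volts
    freq_hz: float  # maximum clock frequency at this voltage

# Hypothetical operating points for an illustrative MCU (not measured data).
OPPS = [
    OperatingPoint(1.1, 200e6),
    OperatingPoint(0.9, 120e6),
    OperatingPoint(0.7, 60e6),
    OperatingPoint(0.5, 20e6),
]

C_EFF = 10e-12  # assumed effective switched capacitance per cycle, in farads

def dynamic_energy(cycles: int, vdd: float) -> float:
    """Dynamic switching energy E = N * C * V^2 (leakage ignored)."""
    return cycles * C_EFF * vdd ** 2

def run_to_completion(cycles: int) -> float:
    """Run at the fastest point, then idle (idle energy ignored)."""
    fastest = max(OPPS, key=lambda p: p.freq_hz)
    return dynamic_energy(cycles, fastest.vdd)

def just_in_time(cycles: int, deadline_s: float) -> float:
    """Pick the lowest-voltage point that still finishes before the deadline."""
    feasible = [p for p in OPPS if cycles / p.freq_hz <= deadline_s]
    if not feasible:
        raise ValueError("deadline cannot be met at any operating point")
    slowest = min(feasible, key=lambda p: p.vdd)
    return dynamic_energy(cycles, slowest.vdd)

cycles = 1_000_000
deadline = 0.1  # 100 ms
e_rtc = run_to_completion(cycles)
e_jit = just_in_time(cycles, deadline)
print(f"run-to-completion: {e_rtc * 1e6:.2f} uJ, just-in-time: {e_jit * 1e6:.2f} uJ, "
      f"saving {e_rtc / e_jit:.2f}x")
```

Under this model the saving is simply (V_high / V_low)², which is why a deadline with slack translates directly into quadratic savings.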

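Ultra-Wide DVFS Required (up to 15x)
Legacy microcontrollers offer only one or two supply voltages (VDDs) plus clock scaling, so workload phases with very different performance needs run at a higher voltage, and higher energy, than they need; the difference is energy loss.
Example workload phases: AI, audio with and without detection, BLE link layer and BLE L2CAP.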
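The cost of having only one or two VDDs can be sketched by quantizing each task's voltage to the levels available. This assumes a hypothetical voltage/frequency ladder, a hypothetical task mix and dynamic energy only (cycles × V², capacitance normalized to 1); none of these numbers come from the slides.

```python
# Hypothetical voltage/frequency curve (not measured data): fmax at each VDD.
LADDER = {0.4: 2e6, 0.5: 20e6, 0.6: 40e6, 0.7: 64e6, 0.8: 100e6, 0.9: 150e6, 1.1: 200e6}
LEGACY_VDDS = [0.7, 1.1]  # a legacy MCU with two supply levels plus clock scaling

def min_vdd(required_hz: float, vdds) -> float:
    """Lowest available VDD whose fmax covers the required clock."""
    ok = [v for v in vdds if LADDER[v] >= required_hz]
    if not ok:
        raise ValueError("no supply level is fast enough")
    return min(ok)

# Hypothetical task mix: (name, cycles per second, required clock in Hz).
TASKS = [
    ("audio, no detection", 2e6, 2e6),
    ("audio, detection", 20e6, 20e6),
    ("BLE link", 30e6, 64e6),
    ("AI inference", 100e6, 150e6),
]

def energy_per_second(vdds) -> float:
    """Relative dynamic energy: sum of cycles * V^2 with capacitance normalized to 1."""
    return sum(cyc * min_vdd(f, vdds) ** 2 for _, cyc, f in TASKS)

legacy = energy_per_second(LEGACY_VDDS)
wide = energy_per_second(list(LADDER))
print(f"legacy / ultra-wide energy ratio: {legacy / wide:.2f}x")
```

With a coarse supply, low-activity phases such as audio without detection are stuck at the lowest legacy level; a finer ladder lets each phase settle at its own minimum voltage.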

With E = CV², Much Is Left on the Table
Figure: energy savings versus conventional technology (axis 1x to 10x, up to 15x) for audio, BTLE, crypto, video, Wi-Fi, GPS and high-performance workloads.
Speaker note: It is important to go through the law of diminishing returns here.
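The law of diminishing returns can be illustrated with a toy model. Assuming alpha-power-law timing (only a rough approximation near threshold) and a constant leakage current, dynamic energy per cycle falls as C·V² while leakage energy per cycle (leakage power times an ever longer clock period) grows, so total energy per cycle has a minimum. All constants (`VTH`, `ALPHA`, `LEAK`) are assumptions, not Minima data.

```python
# Toy model (assumed, not from the slides): alpha-power-law delay plus leakage.
VTH = 0.3      # assumed threshold voltage, V
ALPHA = 1.5    # assumed velocity-saturation exponent
C = 1.0        # normalized switched capacitance
LEAK = 0.05    # assumed leakage-current / drive-strength ratio (normalized)

def fmax(v: float) -> float:
    """Normalized max frequency from the alpha-power law, valid only for v > VTH."""
    return (v - VTH) ** ALPHA / v

def energy_per_cycle(v: float) -> float:
    dynamic = C * v ** 2             # E = C V^2, shrinks quadratically
    leakage = LEAK * v / fmax(v)     # P_leak * T_cycle grows as the clock slows
    return dynamic + leakage

voltages = [round(0.32 + 0.01 * i, 2) for i in range(89)]  # 0.32 V .. 1.20 V
energies = [energy_per_cycle(v) for v in voltages]
best = min(range(len(voltages)), key=energies.__getitem__)
print(f"minimum-energy point ~{voltages[best]:.2f} V, "
      f"{energies[-1] / energies[best]:.1f}x below the 1.2 V energy")
```

Each further step down in voltage buys less, and below the minimum-energy point leakage makes every cycle more expensive again.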

Real-World Measurement Examples
SHA256, commercial DSP IP.
Keyword spotting: 2.6x energy savings. The ARM KWS reference software is split into two tasks: speech-band energy detection (20 MHz, ARM M3) and a CNN (200 MHz, ARM M3). Energy savings are higher with low-activity data (a silent room).
Operating-point change overhead: 250 cycles (SW) + 550 cycles (DC-DC, 15 mV/µs).
Figure label: 3.5x energy savings.
Speaker note: KWS takes about 10 Mcycles and the band-pass stage about 0.8 Mcycles, and both run 10 times per second. The DC-DC delay follows from the slew rate: delay = 20 cycles/µs × (900 mV − 500 mV) / (15 mV/µs) ≈ 533 cycles, consistent with the ~550 cycles above.
Image recognition / vehicle classifier: Nx. Five-layer CNN (X MHz, ARM M3), split X and Y.
Bluetooth Low Energy stack: Nx.
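The speaker-note arithmetic can be checked with a short sketch. It assumes, as the note does, that every operating-point switch is counted in 20 MHz cycles and that each keyword-spotting iteration costs two switches; the function name `dcdc_settle_cycles` is illustrative.

```python
def dcdc_settle_cycles(clock_hz: float, v_from: float, v_to: float,
                       slew_v_per_s: float) -> float:
    """Clock cycles that elapse while the DC-DC converter slews between voltages."""
    settle_s = abs(v_from - v_to) / slew_v_per_s
    return clock_hz * settle_s

SLEW = 15e-3 / 1e-6   # 15 mV/us expressed in V/s
SW_OVERHEAD = 250     # software cycles per operating-point change (from the slide)

dcdc = dcdc_settle_cycles(20e6, 0.9, 0.5, SLEW)
per_switch = SW_OVERHEAD + dcdc

# Keyword-spotting loop from the speaker note: ~0.8 Mcycles band-pass energy
# detection + ~10 Mcycles CNN, repeated 10 times per second, two switches each.
work_per_second = 10 * (0.8e6 + 10e6)
overhead_per_second = 10 * 2 * per_switch
print(f"DC-DC settle: {dcdc:.0f} cycles, per switch: {per_switch:.0f} cycles, "
      f"overhead: {100 * overhead_per_second / work_per_second:.3f}% of the work")
```

At millions of cycles per task, a switch costing under a thousand cycles is negligible, which is what makes per-task voltage changes worthwhile.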

The Challenge: Margins

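Solution: Margin Dynamically
Conventional design: fixed margins for process, voltage and temperature (PVT) variation are added to the clock period (CLK).
Minima margining, pre-silicon: the remaining margin covers feedback-loop delay, gate functionality, etc.
Minima margining, working silicon: the clock is margined dynamically on the actual chip.
With DVFS, the intermittent large margin averages out to approximately zero.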

Create a Feedback Loop with In-Situ Monitors
Detections from the in-situ monitors drive clock (CLK) control.
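A minimal sketch of such a loop, assuming a hypothetical monitor that flags a near miss whenever the critical path settles within a guard band of the clock edge, and a simple controller that stretches the clock period on a detection and slowly reclaims margin otherwise. This is a generic adaptive-clocking illustration; `monitor_detects`, `clock_loop` and all timing constants are hypothetical, not Minima's implementation.

```python
import random

def monitor_detects(period_ns: float, path_delay_ns: float,
                    guard_ns: float = 0.5) -> bool:
    """Hypothetical in-situ monitor: flags a near miss when the critical path
    settles within guard_ns of the clock edge (before an actual timing failure)."""
    return path_delay_ns > period_ns - guard_ns

def clock_loop(delays_ns, period_ns=50.0, step_ns=0.25,
               min_ns=10.0, max_ns=100.0):
    """Shorten the clock period while the monitor is quiet; stretch it on a detection.
    Returns the period chosen after observing each cycle's path delay."""
    periods = []
    for delay in delays_ns:
        if monitor_detects(period_ns, delay):
            period_ns = min(max_ns, period_ns + 4 * step_ns)  # back off quickly
        else:
            period_ns = max(min_ns, period_ns - step_ns)      # reclaim margin slowly
        periods.append(period_ns)
    return periods

random.seed(1)
# Simulated critical-path delay: slow drift from 30 ns to 33 ns (e.g. heating)
# plus small cycle-to-cycle noise.
delays = [30.0 + 3.0 * i / 1000 + random.uniform(-0.1, 0.1) for i in range(1000)]
periods = clock_loop(delays)
static_period = max(delays) + 2.0   # fixed worst-case design with an assumed 2 ns margin
print(f"average adaptive period {sum(periods) / len(periods):.2f} ns "
      f"vs static {static_period:.2f} ns")
```

The asymmetric steps (fast back-off, slow recovery) keep the period ahead of the drifting path delay, so only a small guard band is carried instead of a worst-case PVT margin.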

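Several Feedback Loops Required
SW (OS governors, power profiles, Minima drivers): sets the power profile; responds via interrupt.
HW (Minima PM, power management): sets VDD and frequency; responds within tens of ns.
HW (Minima clock architecture): adjusts the clock phase; responds within a single cycle.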

Loops in Dynamic Margining
Measurements on a slow-slow (SS) corner chip: BTLE stack at 0.7 V, 64 MHz; housekeeping at 0.4 V, 2 MHz; housekeeping at 0.35 V, 2 MHz.
Minima HW-SW interface.

1st Feedback Loop Measurements

Second-Order Benefits: Power Distribution
Power spread across process corners: up to 2x. The slow-slow (SS) corner limits speed, while the fast-fast (FF) corner sets the high power point (and leakage). With ultra-wide DVFS (UW-DVFS), power variation is reduced.
Speaker note: mention temperature inversion: "With NT (near-threshold operation), temperature inversion complicates things even more. I'll let you figure out what the most power-hungry point is at worst temp."

Minima: The Product
Application code: the Minima API analyzes your code for optimal energy states.
RTOS DVFS drivers: an easily integrable driver offers constant performance or energy minimization per application or per data.
Ultra-wide DVFS hardware, enabled by Minima soft IP.
Minima technology can be integrated into any IP: ARM, RISC-V, DSP.

Final Thoughts
Radio energy: 100 nJ/bit (Zigbee), 50 nJ/bit (BLE), 3.7 nJ/bit (Wi-Fi).
Memory energy: ~100 pJ (local memory accesses, DRAM accesses); flash much more.
Processor energy today: ≥ ~50 pJ (CMOS). Processor minimum energy: ~5 pJ.
Memory compression becomes cheaper in terms of energy.
Edge computing (contextual computing, fog computing, etc.) becomes cheaper in terms of energy.
Speaker note: The company must offer hardware memory compression and edge code!
Acknowledgements: Ali M. Niknejad
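Why cheaper computation makes compression and edge processing attractive can be sketched as an energy budget. The radio figures are the slide's; reading the processor figure as energy per operation, and the names `worth_compressing` and `break_even_ops_per_saved_bit`, are assumptions for illustration.

```python
# Radio energy per bit from the slide; the per-operation compute energy is an
# assumption (the slide's ">= ~50 pJ" processor figure read as energy per operation).
RADIO_NJ_PER_BIT = {"Zigbee": 100.0, "BLE": 50.0, "WiFi": 3.7}

def worth_compressing(raw_bits: int, compressed_bits: int, ops: int,
                      radio: str, pj_per_op: float = 50.0) -> bool:
    """True if the compute spent compressing costs less than the radio energy saved."""
    saved_nj = (raw_bits - compressed_bits) * RADIO_NJ_PER_BIT[radio]
    compute_nj = ops * pj_per_op / 1000.0   # pJ -> nJ
    return compute_nj < saved_nj

def break_even_ops_per_saved_bit(radio: str, pj_per_op: float) -> float:
    """Operations that may be spent per bit removed before compression stops paying off."""
    return RADIO_NJ_PER_BIT[radio] * 1000.0 / pj_per_op

for radio in RADIO_NJ_PER_BIT:
    today = break_even_ops_per_saved_bit(radio, 50.0)
    at_minimum = break_even_ops_per_saved_bit(radio, 5.0)
    print(f"{radio}: {today:.0f} ops/bit at 50 pJ, {at_minimum:.0f} ops/bit at 5 pJ")
```

Moving from ~50 pJ toward the ~5 pJ minimum multiplies the compute budget per saved bit by ten, so heavier compression or local inference becomes worthwhile even on low-energy radios.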