Proximity Optimization for Adaptive Circuit Design
Ang Lu, Hao He, and Jiang Hu

Overview
Introduction
Proposed Techniques
Experiment Result
Conclusion

Design Challenges: process variations, device aging, power

Adaptive Circuit
Apply power according to the actual variations of each chip.
More energy-efficient than worst-case design.

Sensors and Tuning Knobs in Adaptive Circuit
Delay variation sensors:
– Critical path replica, used in the IBM Power5 processor
– Canary flip-flop, as in Razor
Tuning knobs:
– Adaptive body bias
– Adaptive supply voltage (voltage interpolation)
Liang, et al., Micro 2009
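To make the sensor/knob loop concrete, here is a minimal sketch of closed-loop tuning. It assumes a hypothetical linear model in which each forward-body-bias step shortens a cluster's critical-path delay by a fixed amount, and it models the canary flip-flop simply as a check that slack stays above a fixed guard margin. The names `Cluster`, `canary_warns`, `tune` and all numbers are illustrative only, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    # Hypothetical linear delay model (not from the slides).
    nominal_delay: float   # critical-path delay with no body bias, in ns
    delay_per_step: float  # delay reduction per forward-body-bias step, in ns

    def delay(self, fbb_step: int) -> float:
        return self.nominal_delay - self.delay_per_step * fbb_step


def canary_warns(delay: float, clock_period: float, canary_margin: float) -> bool:
    # A canary flip-flop samples a copy of the signal delayed by canary_margin,
    # so it flags an error as soon as the real slack drops below that margin.
    return delay + canary_margin > clock_period


def tune(cluster: Cluster, clock_period: float,
         canary_margin: float = 0.05, max_step: int = 8) -> int:
    """Raise the body-bias step until the canary stops warning."""
    step = 0
    while canary_warns(cluster.delay(step), clock_period, canary_margin):
        if step == max_step:
            raise RuntimeError("timing not met within the body-bias range")
        step += 1
    return step


# A slow die needs 4 steps; a fast die needs none.
print(tune(Cluster(1.10, 0.04), clock_period=1.0))  # 4
print(tune(Cluster(0.90, 0.04), clock_period=1.0))  # 0
```

The loop stops at the smallest knob setting that satisfies the sensor. That is the point of adaptivity: a fast die is left untuned and spends no extra power.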

Pros and Cons of Adaptive Circuit
Pros: more energy-efficient than worst-case designs
Cons: potentially large area overhead; higher design complexity

Clustering for Adaptive Circuits
Two extreme cases:
– Tune each cell individually? Too much overhead.
– Tune the entire circuit collectively? Less energy savings.
To achieve the desired trade-off: clustering
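The trade-off between the two extremes can be illustrated with a toy cost model. The sketch assumes two things that are hypothetical and not from the slides: tuning power grows linearly with the knob level applied to each cell, and each cluster costs a fixed amount of tuning hardware. Because all cells in a cluster share one knob, the whole cluster runs at the highest level that any member needs.

```python
def cluster_cost(required_levels, clusters, power_per_level=1.0, overhead_per_cluster=5.0):
    """Return (tuning power, hardware overhead) for one clustering.

    required_levels[c] is the tuning level cell c needs on this die;
    all cells in a cluster share one knob, so the whole cluster runs
    at the highest level any of its members needs.
    """
    power = 0.0
    for members in clusters:
        level = max(required_levels[c] for c in members)
        power += power_per_level * level * len(members)
    overhead = overhead_per_cluster * len(clusters)
    return power, overhead


levels = [0, 0, 1, 3, 3, 0]
individual = [[c] for c in range(len(levels))]
collective = [list(range(len(levels)))]
grouped = [[0, 1, 5], [2], [3, 4]]
for name, clusters in [("individual", individual), ("collective", collective), ("grouped", grouped)]:
    print(name, cluster_cost(levels, clusters))
# individual (7.0, 30.0)
# collective (18.0, 5.0)
# grouped (7.0, 15.0)
```

In this toy case a good grouping matches the power of per-cell tuning at half the overhead. It works because cells that need similar levels share a cluster.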

Overview
Introduction
Proposed Techniques
– Overall Flow
– Time and Location Aware Cell Clustering
– Clustering Driven Incremental Placement
Experiment Result
Conclusion

Proposed Flow

Existing Clustering Methods
Location: cluster cells based on spatial proximity
– Manual partition for regular datapath
Timing: cluster cells based on their timing characteristics
– Kulkarni, et al., TCAD 2008
– Monte Carlo (MC) simulation
– Optimize for each MC run

Need to Consider Both Timing & Spatial Proximity
Both paths A & B are critical.
Bad cluster: A1 & B2 (similar timing characteristics)
Good cluster: A2 (A1) & B2 (B1)

Clustering Algorithm
Distance definition: location and timing
Clustering algorithm
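The slides name the ingredients of the distance (location and timing) but do not give its formula. The sketch below is a hypothetical stand-in that combines spatial distance, slack difference and sensitivity difference in a weighted sum. The weights `alpha`, `beta` and `gamma` are assumptions: the deck does mention weighting factors α, β, γ, but how they map onto the terms is not shown. Average-linkage agglomerative clustering also stands in for the authors' algorithm, which may differ.

```python
import math
from itertools import combinations


def cell_distance(a, b, alpha=1.0, beta=1.0, gamma=1.0):
    # Hypothetical weighted distance: location + timing slack + timing sensitivity.
    spatial = math.hypot(a["x"] - b["x"], a["y"] - b["y"])
    return (alpha * spatial
            + beta * abs(a["slack"] - b["slack"])
            + gamma * abs(a["sens"] - b["sens"]))


def agglomerate(cells, k, **weights):
    """Average-linkage agglomerative clustering down to k clusters."""
    dist = {(i, j): cell_distance(cells[i], cells[j], **weights)
            for i, j in combinations(range(len(cells)), 2)}

    def d(i, j):
        return dist[(i, j)] if i < j else dist[(j, i)]

    def linkage(p, q):
        return sum(d(i, j) for i in p for j in q) / (len(p) * len(q))

    clusters = [[i] for i in range(len(cells))]
    while len(clusters) > k:
        # combinations yields a < b, so deleting index b leaves index a valid.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters


cells = [
    {"x": 0, "y": 0, "slack": -0.2, "sens": 1.0},
    {"x": 1, "y": 0, "slack": -0.2, "sens": 1.0},
    {"x": 10, "y": 0, "slack": 0.5, "sens": 0.2},
    {"x": 11, "y": 0, "slack": 0.5, "sens": 0.2},
]
print(agglomerate(cells, k=2))  # [[0, 1], [2, 3]]
```

Raising `alpha` relative to `beta` and `gamma` favors compact clusters. Raising the timing weights groups cells with similar timing behavior even when they sit far apart.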

Timing Slack
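The figure on this slide did not survive. As background, static timing analysis defines slack as required time minus arrival time. The minimal sketch below computes slack on a gate-level DAG, assuming each gate has a fixed delay and every endpoint must arrive by the clock period. How the authors turn slack (possibly from Monte Carlo runs) into a clustering feature is not shown here.

```python
from graphlib import TopologicalSorter


def slacks(delay, fanin, clock_period):
    """Per-gate slack = required time - arrival time on a combinational DAG.

    delay: {gate: delay}; fanin: {gate: [driving gates]}.
    Gates without fanout are treated as endpoints required at clock_period.
    """
    order = list(TopologicalSorter({g: fanin.get(g, []) for g in delay}).static_order())

    # Forward pass: latest arrival time at each gate's output.
    arrival = {}
    for g in order:
        arrival[g] = delay[g] + max((arrival[u] for u in fanin.get(g, [])), default=0.0)

    fanout = {g: [] for g in delay}
    for g, drivers in fanin.items():
        for u in drivers:
            fanout[u].append(g)

    # Backward pass: earliest required time at each gate's output.
    required = {}
    for g in reversed(order):
        required[g] = min((required[v] - delay[v] for v in fanout[g]), default=clock_period)

    return {g: required[g] - arrival[g] for g in delay}


delay = {"a": 1.0, "b": 0.5, "c": 2.0, "d": 1.0}
fanin = {"c": ["a", "b"], "d": ["c"]}
print(slacks(delay, fanin, clock_period=5.0))
# {'a': 1.0, 'b': 1.5, 'c': 1.0, 'd': 1.0}
```

Gates on the critical path a → c → d share the smallest slack. Gate b, which drives c through a shorter branch, has extra margin.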

Timing Sensitivity

Correlations?
Highly correlated cells are clustered together.
Spatial correlation is partially addressed by spatial proximity.
Structural correlation:
– Not every cell on a critical path needs to be tuned
– Structurally correlated cells are rarely too far apart

Overview
Introduction
Proposed Techniques
– Overall Flow
– Time and Location Aware Cell Clustering
– Clustering Driven Incremental Placement
Experiment Result
Conclusion

Cluster-Driven Incremental Placement

Min-Cost Network Flow Formulation: source nodes and sink nodes
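The slide shows only source and sink nodes, so the network below is an assumed instance of such a formulation. Each cell supplies one unit of flow, each candidate region is a sink with a capacity, and the cost of a cell-to-region edge is the Manhattan move distance. It is solved with successive shortest paths using Bellman-Ford, a standard min-cost flow method. The names `min_cost_flow` and `assign_cells` are hypothetical, and the authors' exact network may differ.

```python
INF = float("inf")


def min_cost_flow(n, edges, s, t, demand):
    """Successive shortest paths; Bellman-Ford tolerates negative residual costs.

    edges: list of (u, v, capacity, cost). Returns (flow, cost, residual graph).
    """
    graph = [[] for _ in range(n)]  # entries: [to, residual capacity, cost, reverse index]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    flow = total_cost = 0
    while flow < demand:
        dist, prev = [INF] * n, [None] * n
        dist[s] = 0
        for _ in range(n - 1):  # Bellman-Ford relaxation rounds
            changed = False
            for u in range(n):
                if dist[u] == INF:
                    continue
                for i, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev[v], changed = dist[u] + cost, (u, i), True
            if not changed:
                break
        if dist[t] == INF:
            break  # no augmenting path left
        push, v = demand - flow, t
        while v != s:  # bottleneck capacity along the shortest path
            u, i = prev[v]
            push, v = min(push, graph[u][i][1]), u
        v = t
        while v != s:  # augment and update reverse edges
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        flow += push
        total_cost += push * dist[t]
    return flow, total_cost, graph


def assign_cells(cells, regions):
    """cells: [(x, y)]; regions: [(x, y, capacity)]. Cost = Manhattan move distance."""
    n_c, n_r = len(cells), len(regions)
    s, t = 0, n_c + n_r + 1
    edges = [(s, 1 + i, 1, 0) for i in range(n_c)]  # each cell supplies one unit
    for i, (cx, cy) in enumerate(cells):
        for j, (rx, ry, _) in enumerate(regions):
            edges.append((1 + i, 1 + n_c + j, 1, abs(cx - rx) + abs(cy - ry)))
    edges += [(1 + n_c + j, t, cap, 0) for j, (_, _, cap) in enumerate(regions)]
    _, cost, graph = min_cost_flow(t + 1, edges, s, t, n_c)
    assignment = {}
    for i in range(n_c):
        for v, cap, _, _ in graph[1 + i]:
            if 1 + n_c <= v < t and cap == 0:  # saturated cell -> region edge
                assignment[i] = v - 1 - n_c
    return assignment, cost


print(assign_cells([(0, 0), (1, 0), (9, 0)], [(0, 0, 1), (10, 0, 2)]))
# ({0: 0, 1: 1, 2: 1}, 10)
```

With room for only one cell in region 0, cell 1 has to move the long way. The flow solution picks the cheapest overall arrangement rather than the locally nearest region for each cell.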

Implementation
The min-cost network flow problem is solved by the Edmonds-Karp algorithm.
For fractional flow solutions, cells are moved heuristically.
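The slide does not describe the heuristic used for fractional flows. The sketch below is a hypothetical greedy rounding, not the authors' method. It assumes flow is measured in cell area, so one cell can be split across regions. Each cell goes to the region that received the largest share of its area, as long as that region still has room. Leftover cells fall back to the region with the most free area.

```python
def round_fractional(flow, area, capacity):
    """Greedy rounding of a fractional cell-to-region flow (hypothetical heuristic).

    flow: {(cell, region): amount of the cell's area sent to the region}
    area: {cell: area}; capacity: {region: area capacity}
    """
    remaining = dict(capacity)
    assignment = {}
    # Visit (cell, region) pairs by the fraction of the cell routed there, largest first.
    ranked = sorted(flow.items(), key=lambda kv: kv[1] / area[kv[0][0]], reverse=True)
    for (cell, region), _ in ranked:
        if cell not in assignment and remaining[region] >= area[cell]:
            assignment[cell] = region
            remaining[region] -= area[cell]
    # Fallback: any cell left over goes to the region with the most free area
    # (this may overfill it, so a legalization step would still be needed).
    for cell in area:
        if cell not in assignment:
            region = max(remaining, key=remaining.get)
            assignment[cell] = region
            remaining[region] -= area[cell]
    return assignment


area = {"a": 2, "b": 2, "c": 1}
capacity = {"R1": 3, "R2": 3}
flow = {("a", "R1"): 1.5, ("a", "R2"): 0.5, ("b", "R1"): 1.0,
        ("b", "R2"): 1.0, ("c", "R1"): 0.5, ("c", "R2"): 0.5}
print(round_fractional(flow, area, capacity))  # {'a': 'R1', 'b': 'R2', 'c': 'R1'}
```

Sorting by fraction rather than by raw amount keeps large cells from crowding out the decisions of small cells. Python's sort is stable even with `reverse=True`, so ties keep their input order.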

Wirelength Overhead Control
After incremental placement, the wirelength increase is estimated.
If the increase exceeds a threshold, clustering is rerun with an increased weight on spatial proximity.
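The control loop can be sketched as follows. The sketch assumes half-perimeter wirelength (HPWL) as the estimate and a relative-increase threshold of 1%, which mirrors the result reported in the Conclusion. It also assumes the spatial weight is doubled on each retry. `cluster_and_place` is a hypothetical callback standing in for the clustering plus incremental-placement steps.

```python
def hpwl(nets, pos):
    """Half-perimeter wirelength: a common quick wirelength estimate."""
    total = 0.0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def control_wirelength(nets, initial_pos, cluster_and_place, threshold=0.01,
                       spatial_weight=1.0, weight_step=2.0, max_iters=10):
    """Re-run clustering with a larger spatial weight until the relative
    wirelength increase is within threshold.

    cluster_and_place(spatial_weight) is a hypothetical callback that clusters,
    runs incremental placement, and returns new cell positions.
    """
    base = hpwl(nets, initial_pos)  # assumed non-zero
    for _ in range(max_iters):
        pos = cluster_and_place(spatial_weight)
        increase = (hpwl(nets, pos) - base) / base
        if increase <= threshold:
            return pos, spatial_weight, increase
        spatial_weight *= weight_step
    raise RuntimeError("wirelength threshold not met within max_iters")


# Toy stand-in: a larger spatial weight pulls cell "b" back toward its original spot.
def toy_cluster_and_place(w):
    return {"a": (0.0, 0.0), "b": (10.0 + 4.0 / w, 0.0)}


nets = [["a", "b"]]
initial = {"a": (0.0, 0.0), "b": (10.0, 0.0)}
_, weight, increase = control_wirelength(nets, initial, toy_cluster_and_place)
print(weight, increase)  # 64.0 0.00625
```

The `max_iters` cap guards against a threshold that no spatial weight can meet, which the slide leaves unspecified.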

Overview
Introduction
Proposed Techniques
Experiment Result
Conclusion

Experiment Setup
Benchmark:
– ICCAD 2014 Incremental Timing-Driven Placement Contest benchmark suites
– 7 circuits, from 130K to 960K cells
Adaptive body bias is employed as the platform for adaptive circuit design.
The number of clusters is chosen empirically in the range from 10 to 25.

Comparison and Methodology
Methods compared:
1. Over-design
2. Location-driven clustering
3. Timing-driven clustering
4. Location- and timing-driven clustering (ours)
Methodology: for each method, simulate multiple times with varying parameters and report the average results.

Testcases and Placement Perturbations
Table columns: Circuit, # gates, Cells moved (%), Avg. cell move distance
Circuits: edit_dist, matrix_mult, vga_lcd, b, leon3mp, leon, netcard

Results from Forward Body Bias Only
Our method achieves 99% timing yield, like the other methods, with:
– 1/4 less power than over-design
– 1/3 less area overhead
– substantially less wire overhead

Results from Forward and Reverse Body Bias
Our method achieves 99% timing yield with:
– similar power
– much less area overhead than location-only clustering
– much less wire overhead than timing-only clustering

Power/Area – Timing Tradeoff (circuit “mgc_matrix_mult”)

Impact of Weighting Factors (circuit “mgc_matrix_mult”)
Table columns: α, β, γ, # clusters, Adapt Power (%), ∆ Area (%), ∆ wire (%)

Entire Flow Runtime

Conclusion
– Clustering and cluster-driven placement are proposed for adaptive circuit designs.
– They reduce the area and power overhead of adaptive circuits, outperforming previous methods.
– They ensure that gates of the same cluster are located in a contiguous region.
– Wirelength increase is below 1%.

Thank you!