Topology-Aware Buffer Insertion and GPU-Based Massively Parallel Rerouting for ECO Timing Optimization Yen-Hung Lin, Yun-Jian Lo, Hian-Syun Tong, Wen-Hao.

Presentation transcript:

Topology-Aware Buffer Insertion and GPU-Based Massively Parallel Rerouting for ECO Timing Optimization Yen-Hung Lin, Yun-Jian Lo, Hian-Syun Tong, Wen-Hao Liu, Yih-Lang Li Department of Computer Science, NCTU ASPDAC 2012

Outline
Introduction
Preliminaries
Problem formulation
Proposed algorithms
Experimental results
Conclusions

Introduction
Precise timing information for critical paths/sinks with delay violations is only available after the P&R stage.
– Re-design is time-consuming.
Engineering change orders (ECO) can be used to fix timing violations after P&R.
– Using spare cells with re-routing.

Introduction (cont.)
Conventional timing ECO algorithms focus on improving the delay of one timing path at a time.
– [3] considered one two-pin net in the timing path but neglected the multi-pin net topology when selecting inserted buffers.
– [4] considered the positions of multiple pins of a net but did not consider the net topology of detailed routing paths.
Optimizing only the delay of the critical sink, by treating a multi-pin net as a two-pin net, may degrade the delays of other sinks of the same net.
– Consequently worsening other paths with timing violations.

The effect of topology

Introduction (cont.)
Besides, detailed routing is time-consuming.
– Greedily choosing the inserted buffer and its connections may lead to a suboptimal solution.
– Sequentially investigating each reconnection to the newly inserted buffer requires unacceptable detailed rerouting runtime.
Parallel routing could reduce the runtime.
– GPUs provide high computing power at low cost.

Preliminaries

Problem formulation
Given
– A routed design (D), a buffer set (B), a routed net set (N_ALL), a routed net (N) belonging to N_ALL with an edge set (E), a pin set (P), and a violation pin set (VP).
Objective
– Insert one buffer from B into N such that the topology of N is changed and the arrival times of the sinks in VP are minimized without adding any violated sinks.
Topology-Aware Buffer Insertion (BI) & Topology Restructuring
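The acceptance condition in the objective can be sketched as a small check. This is only an illustration under stated assumptions, not the authors' implementation: the function name `evaluate_candidate`, the dictionaries of arrival and required times, and the choice to score a candidate by the total arrival-time reduction over VP are all hypothetical.

```python
def evaluate_candidate(arrival_before, arrival_after, required, vp):
    """Score one buffer-insertion candidate for a net.

    All inputs are hypothetical: each dict maps a sink name to a time,
    and vp is the set of sinks that violate timing before the change.
    Returns the total arrival-time reduction over the sinks in vp, or
    None if the candidate turns a previously clean sink into a violation.
    """
    for sink, req in required.items():
        was_violated = arrival_before[sink] > req
        is_violated = arrival_after[sink] > req
        if is_violated and not was_violated:
            return None  # a new violated sink is added, so reject
    return sum(arrival_before[s] - arrival_after[s] for s in vp)
```

A candidate that returns None is discarded; among the remaining candidates, a larger score means a larger improvement on the violating sinks.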

Proposed algorithms
Buffering Pair scoring (BP).
Edge breaking and Buffer connection (EB).
Topology Restructuring (TR).
Node Computing-based Massively Parallel Maze Routing (NCMPMR).

Buffering pair scoring
Buffering pairs (BPs) that may worsen the delay of some sinks are disregarded.
– In other words, invalid BPs are ignored.
If a BP is valid, the Elmore delay model is used to compute the delay difference for all sinks in VP.
The wire length is estimated by the Manhattan distance.
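The two ingredients named above, the Elmore delay model and the Manhattan wire-length estimate, can be combined as in the following sketch. It is deliberately simplified to a single source-to-sink connection, whereas the scoring described here evaluates all sinks in VP on the multi-pin topology. The function names, the buffer parameters (`c_in`, `r_out`, `t_intrinsic`), and the per-unit wire resistance and capacitance are assumptions introduced for illustration.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def wire_elmore_delay(r_drv, length, c_load, r_unit, c_unit):
    """Elmore delay of a driver (output resistance r_drv) driving a
    uniform wire of the given length that ends in a load c_load."""
    r_wire = r_unit * length
    c_wire = c_unit * length
    return r_drv * (c_wire + c_load) + r_wire * (c_wire / 2 + c_load)


def delay_gain(src, buf, sink, r_drv, c_sink, buffer, r_unit, c_unit):
    """Unbuffered delay minus buffered delay for one connection.

    buffer is a hypothetical dict with input capacitance 'c_in',
    output resistance 'r_out', and intrinsic delay 't_intrinsic'.
    A positive result means the buffer at position buf helps.
    """
    direct = wire_elmore_delay(r_drv, manhattan(src, sink), c_sink,
                               r_unit, c_unit)
    to_buf = wire_elmore_delay(r_drv, manhattan(src, buf), buffer["c_in"],
                               r_unit, c_unit)
    from_buf = wire_elmore_delay(buffer["r_out"], manhattan(buf, sink),
                                 c_sink, r_unit, c_unit)
    return direct - (to_buf + buffer["t_intrinsic"] + from_buf)
```

With unit parameters, a 10-unit wire split in the middle by a buffer of intrinsic delay 2 drops from a delay of 71 to 49, because the quadratic wire term is halved.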

Buffering pair scoring (cont.)

Edge breaking and buffer connection

Edge breaking and buffer connection (cont.)

Topology restructuring

The overall flow

Node computing-based massively parallel maze routing
Routing cost w[node, iteration]

Node   Iteration 0   Iteration 1   Iteration 2   Iteration 3
A      0             0             0             0
B      Inf           1             1             1
C      Inf           5             5             4
D      Inf           Inf           3             3
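The cost updates in this example can be reproduced with a synchronous per-node relaxation, where every node takes the minimum over its neighbors' costs from the previous iteration plus the edge weight. The sketch below runs on the CPU and is not the authors' GPU kernel. The slide's graph is not in the transcript, so the edge weights in `EDGES` are hypothetical values chosen only because they are consistent with the costs listed for nodes A to D.

```python
INF = float("inf")

# Hypothetical undirected edges (node, node, weight), chosen to be
# consistent with the routing costs listed for iterations 0 to 3.
EDGES = [("A", "B", 1), ("A", "C", 5), ("B", "D", 2), ("D", "C", 1)]


def build_graph(edges):
    """Adjacency list: node -> list of (neighbor, weight)."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


def relax_all_nodes(cost, graph):
    """One synchronous iteration. Each node reads only the previous
    iteration's costs and writes to a new dict, so node updates are
    independent of each other."""
    new_cost = {}
    for node, neighbors in graph.items():
        best = cost[node]
        for other, weight in neighbors:
            best = min(best, cost[other] + weight)
        new_cost[node] = best
    return new_cost


def route_costs(graph, source):
    """Iterate until no cost changes; return the cost of every iteration."""
    cost = {node: INF for node in graph}
    cost[source] = 0
    history = [cost]
    while True:
        new_cost = relax_all_nodes(cost, graph)
        if new_cost == cost:
            return history
        history.append(new_cost)
        cost = new_cost
```

Calling `route_costs(build_graph(EDGES), "A")` yields four cost snapshots, one per iteration 0 to 3. Because each node's update depends only on the previous snapshot, the inner loop over nodes is the part that could map to one thread per node.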

NCMPMR flow

Speedup and preventing race conditions
Partition the routing graph into blocks for performance and scalability.
Stagger adjacent blocks for better performance.
– 2.25x faster.

Experimental results
Environment
– AMD Opteron 2.6GHz workstation with 16GB memory.
– Intel Xeon E GHz with 8GB memory and a single NVIDIA Tesla C1060 GPU.
Implemented in C++.
s35932 in the IWLS benchmark with 300 additional spare cells.
Five nets, N1-N5, in s35932 with various pin counts are selected for demonstration.

Critical sink delay improvement

Analysis The following results are on platform 2.

Conclusions
This work develops a topology-aware ECO timing optimization algorithm flow.
– BP, EB, TR.
– GPU-based re-routing.
The WNS and TNS improve significantly, with a 7.72x average runtime speedup compared to conventional 2-pin net-based buffer insertion.
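For reference, the two metrics named in the conclusions are commonly computed from per-endpoint slacks: WNS is the worst (most negative) slack and TNS is the sum of all negative slacks. The helper below is a generic sketch with a hypothetical name. Reporting WNS as 0 when no endpoint violates timing is one convention assumed here; some tools report the minimum slack even when it is positive.

```python
def wns_tns(slacks):
    """Return (WNS, TNS) for a list of endpoint slacks.

    Slack is required time minus arrival time, so a negative slack
    is a timing violation.
    """
    negative = [s for s in slacks if s < 0]
    wns = min(negative) if negative else 0.0
    tns = sum(negative)
    return wns, tns
```

For slacks of -0.5, 0.25, and -1.5, this gives a WNS of -1.5 and a TNS of -2.0.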