Parallel Routing for FPGAs based on the operator formulation

Presentation transcript:

Parallel Routing for FPGAs based on the operator formulation
Speculation-based approach for parallelizing an FPGA router
Yehdhih Ould Mohamed Moctar & Philip Brisk
Department of Computer Science & Engineering, University of California, Riverside
Design Automation Conference (DAC 2014), San Francisco, CA, USA, June 1-5, 2014

Motivation
FPGA adoption is slowed by CAD tools: placement and routing (P&R) dominate the runtime of CAD for FPGAs.
Maze expansion consumes over 65% of router runtime; the router spends nearly two thirds of its time exploring nodes of the routing resource graph (RRG).
A large number of non-conflicting operations are executed at each iteration. If we can speculate about the nature of the operation being performed on RRG nodes, we can identify many of these non-conflicting operations.

Contribution
Application of speculative parallelism to FPGA routing.
Use of non-blocking priority queues for the maze expansion.
Implementation of the parallel router in VPR.

FPGA Routing
Find a physical path for every signal in the circuit: configure the programmable switches of the routing fabric to connect the logic blocks and the I/O pads of the circuit.
This is a technology variant of the well-known disjoint-paths problem in graph theory, which is NP-complete (one of Karp's 21).
Pathfinder is a negotiation-based algorithm: it allows negotiation among signals that share the routing resources.

Routing Resource Graph (RRG)
The RRG is a large data structure representing the routing resources of the FPGA: pins and wires become nodes, and switches become edges.
[Figure: a 2-LUT with pins in1, in2 and out, wires wire1 to wire4, and a sink, shown next to the corresponding RRG]
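
As a rough illustration of the mapping described on this slide, the sketch below stores an RRG as a directed adjacency list in Python. The node names come from the slide's figure, but the class name, method names and the switch connectivity are assumptions made for illustration; the slide does not give the actual edges, and this is not VPR's data structure.

```python
from collections import defaultdict


class RoutingResourceGraph:
    """Directed graph: pins and wires are nodes, programmable switches are edges."""

    def __init__(self):
        self.kind = {}                 # node name -> "pin" or "wire"
        self.adj = defaultdict(list)   # node name -> nodes reachable through one switch

    def add_node(self, name, kind):
        self.kind[name] = kind

    def add_switch(self, src, dst):
        # A switch lets the signal on `src` drive `dst`.
        self.adj[src].append(dst)

    def neighbors(self, name):
        return list(self.adj[name])


def build_example():
    rrg = RoutingResourceGraph()
    for pin in ("in1", "in2", "out"):
        rrg.add_node(pin, "pin")
    for wire in ("wire1", "wire2", "wire3", "wire4"):
        rrg.add_node(wire, "wire")
    # Hypothetical connectivity, chosen only to make the example concrete.
    for src, dst in [("out", "wire1"), ("out", "wire2"),
                     ("wire1", "wire3"), ("wire2", "wire4"),
                     ("wire3", "in1"), ("wire4", "in2")]:
        rrg.add_switch(src, dst)
    return rrg


rrg = build_example()
print(rrg.neighbors("out"))   # ['wire1', 'wire2']
```

An adjacency list is a natural fit here because the router only ever asks which nodes are reachable from the node it is currently expanding.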

Serial Pathfinder
Pathfinder is a triple nested loop: the global router establishes the negotiation criteria, the signal router controls negotiation among signals, and the maze router finds routes for individual nets.
The maze expansion is a priority-queue-driven directed BFS on the RRG.
We parallelize the maze expansion.
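
The triple nesting can be sketched in a few lines of Python. This is a simplified serial illustration, not VPR's router: nets are reduced to a single source/sink pair, every node has capacity 1, and the cost form (base cost plus history, scaled by a present-congestion factor) together with the constants `pres_fac` and `hist_fac` are assumptions made for the example.

```python
import heapq


def maze_route(adj, node_cost, source, sink):
    """Innermost loop: priority-queue-driven expansion from source to sink."""
    pq = [(0.0, source)]
    best = {source: 0.0}
    prev = {}
    while pq:
        dist, u = heapq.heappop(pq)
        if dist > best[u]:
            continue                      # stale queue entry
        if u == sink:
            path = [u]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return path[::-1]
        for v in adj.get(u, ()):
            nd = dist + node_cost(v)
            if nd < best.get(v, float("inf")):
                best[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    raise ValueError(f"no path from {source} to {sink}")


def pathfinder(adj, nets, capacity=1, pres_fac=0.5, hist_fac=1.0, max_iters=20):
    occupancy, history, routes = {}, {}, {}

    def node_cost(n):
        # Penalize nodes that would be overused now or were overused before.
        over = max(0, occupancy.get(n, 0) + 1 - capacity)
        return (1.0 + history.get(n, 0.0)) * (1.0 + pres_fac * over)

    for _ in range(max_iters):                      # global router
        for net, (source, sink) in nets.items():    # signal router
            for n in routes.get(net, ()):           # rip up the old route
                occupancy[n] -= 1
            routes[net] = maze_route(adj, node_cost, source, sink)  # maze router
            for n in routes[net]:
                occupancy[n] = occupancy.get(n, 0) + 1
        overused = [n for n, occ in occupancy.items() if occ > capacity]
        if not overused:
            return routes                           # all nets routed legally
        for n in overused:
            history[n] = history.get(n, 0.0) + hist_fac * (occupancy[n] - capacity)
    return None                                     # negotiation did not converge


adj = {"s1": ["m"], "s2": ["m", "n1"], "n1": ["n2"],
       "m": ["t1", "t2"], "n2": ["t2"]}
nets = {"net1": ("s1", "t1"), "net2": ("s2", "t2")}
print(pathfinder(adj, nets))
```

In this toy graph both nets initially route through the shared node `m`. The accumulated history cost then makes `m` expensive enough that `net2` detours through `n1` and `n2` in the second iteration, which is the negotiation behavior the slide refers to.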

Maze Expansion Operation
The priority queue (PQ) contains nodes that have not been fully explored.

Galois
Software framework for parallelizing irregular algorithms.
Employs a speculation-based approach to parallelism.
Operator formulation of algorithms: a data-centric view of parallelism (amorphous data-parallelism).
Works better on algorithms whose behavior is not known until runtime (e.g., on sparse graphs).
Parallel program = Operator + Schedule + Parallel data structure

Operator formulation of algorithms
Operator: the computation at an active element.
Activity: the application of the operator to an active element.
Amorphous data-parallelism: multiple active nodes can be processed in parallel, subject to neighborhood and ordering constraints.
[Figure: graph with active nodes and their neighborhoods highlighted]
Parallel program = Operator + Schedule + Parallel data structure
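
To make the neighborhood constraint concrete, the Python sketch below picks, from a list of active nodes, a batch whose neighborhoods do not overlap and could therefore be processed in parallel. It is a conservative, up-front selection written for illustration, not how Galois schedules work (Galois detects conflicts speculatively at runtime). The function names and the choice of "the node plus its direct successors" as the neighborhood are assumptions.

```python
def neighborhood(adj, node):
    """Assumed neighborhood: the active node plus the nodes its operator touches."""
    return {node, *adj.get(node, ())}


def independent_batch(adj, active):
    """Greedily select active nodes whose neighborhoods are pairwise disjoint."""
    claimed = set()
    batch = []
    for node in active:               # iteration order stands in for the schedule
        hood = neighborhood(adj, node)
        if claimed.isdisjoint(hood):
            batch.append(node)
            claimed |= hood
    return batch


adj = {"a": ["b"], "c": ["d"], "e": ["b"]}
print(independent_batch(adj, ["a", "c", "e"]))   # ['a', 'c']
```

Here `e` is left for a later round because its neighborhood shares node `b` with the neighborhood of `a`.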

Maze Expansion in Galois
Each thread speculatively explores potential candidate nodes in the RRG.
Each thread has a local priority queue.
Galois offers low mis-speculation/rollback cost.
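
The slides do not show how Galois detects or undoes a mis-speculation, so the following Python sketch uses a plain try-lock scheme as a stand-in: an activity tries to lock its whole neighborhood before modifying anything, and aborts if any lock is already held. Because nothing has been written at that point, an abort only has to release locks, which is one way a rollback can stay cheap. `try_apply`, the per-node `locks` dictionary and the neighborhood definition are all hypothetical.

```python
import threading


def try_apply(node, adj, locks, operator):
    """Run `operator(node)` only if the node's whole neighborhood can be locked.

    Returns True if the operator ran, False if the activity aborted on a conflict.
    """
    hood = sorted({node, *adj.get(node, ())})
    held = []
    try:
        for n in hood:
            if not locks[n].acquire(blocking=False):
                return False          # conflict with another activity: abort
            held.append(n)
        operator(node)                # safe: the neighborhood is exclusively ours
        return True
    finally:
        for n in held:                # release on both commit and abort
            locks[n].release()


adj = {"a": ["b"], "c": ["b"]}
locks = {n: threading.Lock() for n in "abc"}
explored = []

locks["b"].acquire()                  # pretend another thread owns node b
print(try_apply("a", adj, locks, explored.append))   # False
locks["b"].release()
print(try_apply("a", adj, locks, explored.append))   # True
```

In a threaded router, an aborted node would simply be pushed back onto the thread's local priority queue and retried later.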

Benchmarks
We selected 10 of the largest IWLS benchmarks.
We target 65nm CMOS (BPTM).

Maze Router Speedup
Achieved up to 5.5x speedup using 8 threads.
Steady scalability up to 8 threads.
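
Two quick calculations help put the 5.5x figure in context: the parallel efficiency at 8 threads, and an Amdahl's-law estimate of whole-router speedup. The second calculation rests on assumptions not stated in the slides, namely that the 5.5x applies only to maze expansion and that maze expansion is exactly 65% of serial router runtime. The function names are illustrative.

```python
def parallel_efficiency(speedup, threads):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / threads


def amdahl_speedup(fraction, section_speedup):
    """Overall speedup when only `fraction` of the runtime is accelerated."""
    return 1.0 / ((1.0 - fraction) + fraction / section_speedup)


efficiency = parallel_efficiency(5.5, 8)     # 0.6875, i.e. about 69%
overall = amdahl_speedup(0.65, 5.5)          # roughly 2.1x under the stated assumptions
print(f"{efficiency:.2%}, {overall:.2f}x")
```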

Maze Router – Configuration Options (Normalized Speedup)
STM PQ + Iteration Coalescing achieved 5.46x speedup.

Maze Router - Critical Path Delay (CPD)
The number of threads has no impact on critical path delay.
The parallel implementation achieved better CPD than VPR.

Conclusion & Future Work
Speculative parallelism can be a good choice for parallel CAD algorithms.
Achieved near-linear speedup (up to 5.5x) over the serial FPGA router.
Future work includes applying this speculative model to parallelize placement.