Published by Finn Jespersen. Modified over 5 years ago.
1
VPR 5.0: FPGA CAD and Architecture Exploration Tools with Single-Driver Routing, Heterogeneity and Process Scaling
Jason Luu, Ian Kuon, Peter Jamieson, Ted Campbell, Andy Ye, Wei Mark Fang and Jonathan Rose
The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto
2
The Architecture Question
Number of transistors on chip grows exponentially
Ever harder to create custom chips
General-purpose chips are the way to go
The key question: how to most effectively deploy transistors on a chip?
  What is the architecture?
  What is the function and interconnect?
Similar questions for CPU, GPU, and FPGA architects
Note (Jluu): Reduced number of words; used more formal language
3
To Answer: Need Exploration Infrastructure
[Figure labels: Circuits/Applications, FPGA Architecture, Synthesis CAD Flow, Quality of Architecture]
4
FPGA Infrastructure Needs Advancing
Goals:
Deal with modern FPGA architecture constructs
Enable use of much larger circuits!
  Current benchmarks are stuck at mid-1990s sizes because we cannot handle the basic constructs in larger circuits
Move to new process technologies more easily
Provide a pathway to future hybrid architectures
Punting on the ease-of-programming issue for now
5
The VPACK/VPR Exploration Toolset
VPACK and VPR are two CAD tools widely used in the exploration of FPGA architecture and CAD
The goals of VPR:
  Flexible description of FPGA architecture, to enable architecture exploration
  Platform for CAD tool algorithm exploration and development
Also formed the basis of the Right Track CAD startup and thence Altera's tools
6
Basic Features of Previous VPR (v4.30)
Models a homogeneous array of soft logic
Extensive flexibility and generation of different routing architectures
Global router
Nice graphics
First-in-class modeling of delay and area
Robust
[Figure: Circuits and Architecture feed Packing, Placement, Routing, then Timing Analysis & Area Estimation]
7
Four New Features of VPR 5.0
Single-Driver Routing Architecture
  Unidirectional/direct drive; dominates [Virtex98] [Lewis03] [Lemieux04]
Heterogeneity
  Can model different block types
Wide Selection of Architecture Files
  Transistor-level design optimized; different area/speed trade-offs
  IC process down to 22 nm
Regression Test Suite
  To maintain robustness
Speaker note: Saying this because it is a struggle to release one of high quality, and if I talk about it here, it will be more embarrassing to fail
8
Single Driver Routing Architecture
Note (Jluu): Deleted slides on switch box pattern; unanswered questions should be put at the beginning or end, and I believe it is too much detail to include here
9
Single Driver vs. Multiple Driver Routing
Multi-Driver Routing
  Tri-state buffers and pass transistors
Single-Driver Routing
  One driver, fan-in to multiplexer
  More local connectivity
  Directional wires
10
Single Driver Routing Architecture
Single driver dominates multi-driver
  Lewis et al. 2003: single-driver dominates multi-driver
  Lemieux et al. 2004: 25% area improvement, 9% delay improvement vs. multi-driver
Used in industry for years
Important that the whole research community uses it!
11
Switch Pattern Generation
Problem: achieve the best routability with a given number of switches while meeting the architecture specification
Was an issue in the past; now there are more restrictions with single drivers
12
Simple Experiment: Single vs. Multi Driver
Repeat of previous experiments: compared two FPGAs with Fs = 3, Fcin = 0.25, Fcout = full
  All multi-driver, length-4 tracks
  All single-driver, length-4 tracks
Measured the minimum channel width needed to route a set of 20 circuits (the old, small ones!)
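A minimal sketch of how a minimum channel width could be measured, assuming routability is monotonic in the channel width W (a circuit that routes at W also routes at any larger W). The callable `routes_at` is a hypothetical stand-in for a full routing attempt and is not part of VPR's interface; `percent_increase` is likewise only an illustrative helper.

```python
from typing import Callable


def min_channel_width(routes_at: Callable[[int], bool], upper: int = 512) -> int:
    """Return the smallest W in [1, upper] for which routes_at(W) is True.

    Assumption: routability is monotonic in W, so binary search applies.
    routes_at is a hypothetical callback standing in for a routing run.
    """
    if not routes_at(upper):
        raise ValueError(f"circuit does not route even at W={upper}")
    lo, hi = 1, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if routes_at(mid):
            hi = mid          # mid works, so the answer is mid or smaller
        else:
            lo = mid + 1      # mid fails, so the answer is larger
    return lo


def percent_increase(baseline: float, candidate: float) -> float:
    """Percentage change from baseline to candidate."""
    return 100.0 * (candidate - baseline) / baseline
```

Running the search once per circuit on each of the two architectures and averaging the per-circuit change is one way to arrive at a single comparison figure; whether to average per-circuit ratios or compare totals is a reporting choice.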
13
Simple Experiment: Single vs. Multi Driver
Average 14% increase in minimum channel width! (Lewis et al.: 10%; Lemieux: 0%)
14
Heterogeneous Logic Blocks
Jluu: Deleted slide describing single-driver routing as it does not belong in the heterogeneous block section
15
Can Specify Heterogeneous Logic Blocks
Column based
Each block has parameterized (multi-row) height
Transparent routing
Can specify all input-output timing paths of a block
  Combinational or registered outputs
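To make the column-based layout concrete, here is a small sketch, assuming each column holds a single block type and a block of height h occupies h consecutive rows. The names `BlockType`, `build_column` and `build_grid` are hypothetical, and marking leftover rows as empty is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockType:
    name: str
    height: int  # number of grid rows one instance occupies


def build_column(block: BlockType, rows: int) -> list[tuple[str, int]]:
    """Fill one column with instances of a single block type.

    Each row gets (type name, row offset inside its instance). Rows left
    over after the last full instance are marked empty (an assumption).
    """
    usable = (rows // block.height) * block.height
    column = []
    for row in range(rows):
        if row < usable:
            column.append((block.name, row % block.height))
        else:
            column.append(("empty", 0))
    return column


def build_grid(column_types: list[BlockType], rows: int) -> list[list[tuple[str, int]]]:
    """Build the grid column by column; grid[x][y] describes location (x, y)."""
    return [build_column(block, rows) for block in column_types]
```

For example, `build_grid([BlockType("clb", 1), BlockType(".mult", 2)], 5)` gives a logic column of five single-row blocks next to a multiplier column holding two two-row blocks and one unused row.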
16
New FPGA Architecture Input Format
Key to architecture exploration: a language to describe FPGA architecture
Uses XML to leverage its inherent hierarchy
  Parsers are easy to get; the old VPR parser was kinda rough
  Easy to extend the language
17
Sample: Heterogeneous Block
<type name=".mult" height="2">
  <subblocks max_subblocks="1" max_subblock_inputs="8" max_subblock_outputs="8">
    <timing> [Timing Matrix] </timing>
  <fc_in type="frac">0.25</fc_in>
  <fc_out type="full" />
  <pinclasses>
    <class type="in">[pin numbers]</class>
    <class type="out">[pin numbers]</class>
    <class type="global">[pin numbers]</class>
  </pinclasses>
  …
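As an illustration of why an XML format makes parsing easy, the sketch below reads a block description with Python's standard `xml.etree.ElementTree`. The `SAMPLE` string is a hypothetical, completed version of the fragment above: the pin numbers are placeholders, the timing matrix is left out because the slide elides it, and the closing tags are assumed. It is not taken from a real VPR architecture file.

```python
import xml.etree.ElementTree as ET

# Hypothetical completion of the slide's fragment. Pin numbers are
# placeholders and the <timing> element is omitted because it is elided.
SAMPLE = """
<type name=".mult" height="2">
  <subblocks max_subblocks="1" max_subblock_inputs="8" max_subblock_outputs="8"/>
  <fc_in type="frac">0.25</fc_in>
  <fc_out type="full"/>
  <pinclasses>
    <class type="in">0 1 2 3 4 5 6 7</class>
    <class type="out">8 9 10 11 12 13 14 15</class>
    <class type="global">16</class>
  </pinclasses>
</type>
"""


def summarize_block(xml_text: str) -> dict:
    """Extract name, height, Fc settings and pin classes from one <type>."""
    root = ET.fromstring(xml_text.strip())
    fc_in = root.find("fc_in")
    pins: dict[str, list[int]] = {}
    for cls in root.iter("class"):
        # Group pin numbers by class type ("in", "out", "global").
        pins.setdefault(cls.get("type"), []).extend(int(p) for p in cls.text.split())
    return {
        "name": root.get("name"),
        "height": int(root.get("height")),
        "fc_in": (fc_in.get("type"), float(fc_in.text)),
        "fc_out": root.find("fc_out").get("type"),
        "pins": pins,
    }
```

Because the hierarchy is carried by the XML itself, adding a new element to the language only requires another `find` or `iter` call in the reader.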
18
Electrically Optimized Architecture Files
Deleted slides with pictures of architecture listing
19
Architecture Files
For any given logical architecture (L, Fs, N, I, etc.), the best transistor-level design will be different!
There can also be significantly different designs depending on the optimization goal: area vs. speed
[Figure labels: Architecture File, FPGA Parameters, Circuit, Timing & Area, VPR]
Original notes:
Goal
  Publishable (previously not)
  Previously just one area-delay point (Vaughn's area-delay minimum, based on 0.35 um)
Capability
  Automated transistor sizing
Output
  Vast array of architecture files
  Different L, K, Fc
  Each one could have several different area-delay points
  Different technologies!
20
Optimized Timing and Area Models
Betz took about 2 months to create "the" area-delay optimized architecture file in 350 nm; so did Ahmed
Previously, NDAs prevented release of accurate timing and area models
Goal: provide publishable, optimized timing and area models for a large number of logical FPGA architectures
21
Timing and Area Models from PTM & Kuon
Used Predictive Technology Models from Cao/Arizona
  180 nm to 22 nm CMOS
  Not as accurate as foundry models, but publishable!
Used custom automatic transistor sizing tool [Kuon08]
  Easily create optimized designs for a range of architectures
  Adds a new dimension for exploration
Circuit design objective: Area, Delay, or Area^a x Delay^b
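The design objective Area^a x Delay^b can be sketched as a simple cost function. This is only an illustration of how the exponents shift the choice between candidate designs; `design_cost` and `pick_best` are hypothetical helpers, the candidate records are an assumed data shape, and nothing here reflects the actual sizing tool from [Kuon08].

```python
def design_cost(area: float, delay: float, a: float = 1.0, b: float = 1.0) -> float:
    """Cost Area^a * Delay^b: a=1, b=0 is pure area; a=0, b=1 is pure delay."""
    return (area ** a) * (delay ** b)


def pick_best(candidates: list[dict], a: float, b: float) -> dict:
    """Return the candidate with the lowest cost.

    Each candidate is a dict with 'area' and 'delay' keys, in any
    consistent units (a hypothetical shape chosen for this sketch).
    """
    return min(candidates, key=lambda c: design_cost(c["area"], c["delay"], a, b))
```

Sweeping `a` and `b` over a set of sized designs selects different points along the area-delay trade-off, which is what makes the objective a separate exploration dimension.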
22
Architecture Repository
Selected FPGA architectures/designs with varied:
  architecture parameters (K, N, L, W, Fs, Fc, …)
  technology (180 nm CMOS to 22 nm CMOS)
  design objective (area-delay, delay, …)
23
Architecture Files Listing Part 1
24
Architecture Files Listing Part 2
25
Robustness of Software
26
Robustness
Goal: make it easier to maintain the quality of the software in the face of continuing development
Now have regression test infrastructure for VPR
  Scripts to run a suite of tests on VPR
  Each test runs a test script on a list of circuits and architectures
27
Several Different Regression Tests:
Check-in regression: quick tests, varying coverage
  For quick test of new code changes
Functionality tests:
  All architecture combinations of K = 2..7, N = 1..12
  Randomly generated architectures
QoR: quality of results over the 20 largest MCNC circuits
  Compare against golden, known results within a threshold
Options sweep: test all VPR CAD options
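Two of the ideas above lend themselves to a short sketch: enumerating the K/N functionality sweep and comparing measured results against golden values within a threshold. The function names, the result dictionary shape and the 5% default tolerance are assumptions made for illustration; they are not the actual VPR regression scripts.

```python
from itertools import product


def functionality_sweep() -> list[tuple[int, int]]:
    """All (K, N) pairs with K = 2..7 and N = 1..12."""
    return list(product(range(2, 8), range(1, 13)))


def within_threshold(measured: float, golden: float, rel_tol: float = 0.05) -> bool:
    """True if measured is within rel_tol (relative) of the golden value."""
    return abs(measured - golden) <= rel_tol * abs(golden)


def qor_failures(results: dict, golden: dict, rel_tol: float = 0.05) -> list[tuple[str, str]]:
    """List (circuit, metric) pairs missing from results or outside tolerance.

    Both arguments map circuit name -> {metric name -> value}; this shape
    is a hypothetical choice for the sketch.
    """
    failures = []
    for circuit, metrics in golden.items():
        for metric, expected in metrics.items():
            measured = results.get(circuit, {}).get(metric)
            if measured is None or not within_threshold(measured, expected, rel_tol):
                failures.append((circuit, metric))
    return failures
```

A regression run would pass when `qor_failures` returns an empty list; treating a missing result as a failure keeps a crashed run from passing silently.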
28
Sample Results
29
Revisiting LUT Size Again!
When transistor-level design optimized at each point, smaller LUTs get better area!
30
Delay vs. LUT and Cluster Size
31
Area Delay vs LUT and Cluster Size
32
Process Scaling – Area vs. LUT Size
Unaffected by process
33
VPR 5.0 Release
Download website: www.eecg.utoronto.ca/vpr/
VPR 5.0 Beta released February 15th, 2008
VPR 5.0 full release July 22nd, 2008
  Has been downloaded 203 unique times outside UofT as of Feb 16, 2009
VPR 5.0 patch release Feb 21, 2009
Next release should be in mid-summer
Note: Added note about patch
34
Current and Future Work
35
Future Features – Complete/In Progress
Combining VPACK with VPR
  To use the same timing engines
  Done!
Generic timing-driven packing algorithms for new heterogeneous blocks
  Underway (Jason Luu)
Selectable registered inputs and outputs for logic blocks
Robust flow from Verilog through VPR
  Underway: work from Peter Jamieson & Ken Kent
36
Full Synthesis from 1 Architecture File
[Figure: flow from Verilog circuits through Odin II (elaboration), ABC (synthesis & tech map), packing, and VPR placement and routing to timing analysis & area estimation, driven by a single architecture file]
37
More Future Features – Need Help!
Full power modeling
Bus-based routing
Carry chains
Depopulation of logic clusters
Direct supply ratio specification of heterogeneous blocks
Tileable switch block pattern
  Tileable quantization (not done)
  Tileable switch block (not done)
Simulatable transistor-level output
38
Acknowledgements
Many people, in addition to the authors, have contributed to this work:
  Vaughn Betz
  Sandy Marquardt
  Russ Tessier
  Danny Paladino