-1- Soft Core Viterbi Decoder EECS 290A Project Dave Chinnery, Rhett Davis, Chris Taylor, Ning Zhang.

Presentation transcript:

-1- Soft Core Viterbi Decoder EECS 290A Project Dave Chinnery, Rhett Davis, Chris Taylor, Ning Zhang

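-2- High Level Architecture
[Figure: high-level block diagram with each block annotated with its % Gates, % Area, and % Power. Block labels were lost; values in extraction order: 23%, 36%, 29%, 0%, 48%, 18%, 38%, 8%, 21%, 18%, 4%, 15%, 9%, 2%, 8%, 4%, 1%, 4%, 2%, 1%, 4%.]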

-3- Branch & Path Metric Generation
• Branch Metrics Computation apparently implemented with a CORDIC block (contains 840 MUXes, 58 adders & flip-flops, bit busses)
• Branch Metrics hard-wired to each ACS unit
• Path Metrics stored in ACS units
• Each ACS unit handles 16 states
[Figure: ACS units, each with U and L inputs, joined by the Hard-wired Path Metric Interconnect]

-4- ACS Architecture
• Each ACS unit stores 32 path metrics
• Only two SRAMs are active at a time
• Across all four ACS units, each path metric is stored twice
• SRAM accounts for 88% of the area and 27% of the power for each ACS unit
[Figure: ACS unit datapath. Labels: 8x9 SRAM; PM U, PM L; BM U, BM L; Add, Compare, Select; Pipeline Register; MUX]
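
The add-compare-select step named in the diagram, and the storage bookkeeping in the bullets, can be sketched roughly as follows. This is a minimal illustration, not the project's RTL: the function name `acs` is hypothetical, and it assumes the smaller metric survives, a convention the slides do not state.

```python
from typing import Tuple

def acs(pm_u: int, bm_u: int, pm_l: int, bm_l: int) -> Tuple[int, int]:
    """Add-compare-select for one trellis state.

    Adds each incoming path metric (PM) to its branch metric (BM),
    compares the two candidates, and selects the survivor.
    Returns (surviving path metric, decision bit), where decision 0
    means the U branch won and 1 means the L branch won.
    Assumption: the smaller metric wins.
    """
    cand_u = pm_u + bm_u          # add (U branch)
    cand_l = pm_l + bm_l          # add (L branch)
    if cand_u <= cand_l:          # compare
        return cand_u, 0          # select U
    return cand_l, 1              # select L

# Bookkeeping taken from the slides.
ACS_UNITS = 4
STATES_PER_UNIT = 16
METRICS_STORED_PER_UNIT = 32

total_states = ACS_UNITS * STATES_PER_UNIT
copies_per_metric = ACS_UNITS * METRICS_STORED_PER_UNIT // total_states

print(total_states, copies_per_metric)
```

With the figures on the slides, four units of 16 states give 64 states, and 4 x 32 stored metrics over 64 states gives two copies of each metric, which agrees with the "stored twice" bullet.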

-5- Traceback Architecture
• State-machine blocks are just large sum-of-products combinational networks (351 gates each)
• Each memory unit contains a 16x64 SRAM and logic (192 MUXes, 128 flip-flops)
[Figure: Traceback Unit. Labels: Decision Bits; Traceback; Next_ramin; Pipeline Register; MUX; SRAM; 192; Out. Traceback Memory Unit: 22% Area, 20% Power. Finite State Machine: 11% Area, 13% Power.]
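
To illustrate how stored decision bits drive a traceback, here is a small hard-decision Viterbi decoder in Python. It is a sketch under stated assumptions: the constraint length `K = 3` and generators `GENS = (7, 5)` octal are toy values chosen to keep the example short, not the project's parameters, and the names `encode` and `decode` are hypothetical. One decision bit per state is stored at each trellis step, then read backwards from the best final state.

```python
K = 3                          # toy constraint length (assumption)
GENS = (0b111, 0b101)          # toy generators, 7 and 5 octal (assumption)
N_STATES = 1 << (K - 1)
LOW_MASK = (1 << (K - 2)) - 1  # state bits kept when stepping backwards

def parity(x):
    return bin(x).count("1") & 1

def branch_output(bit, state):
    """Encoder output pair for input `bit` leaving `state`."""
    reg = (bit << (K - 1)) | state
    return tuple(parity(reg & g) for g in GENS)

def encode(bits):
    state, out = 0, []
    for b in bits:
        out.append(branch_output(b, state))
        state = ((b << (K - 1)) | state) >> 1
    return out

def decode(symbols):
    inf = float("inf")
    pm = [0] + [inf] * (N_STATES - 1)   # encoder starts in state 0
    decisions = []                      # one list of decision bits per step
    for sym in symbols:
        new_pm = [inf] * N_STATES
        dec = [0] * N_STATES
        for ns in range(N_STATES):
            bit = ns >> (K - 2)         # input bit that leads into ns
            for d in (0, 1):            # d = bit shifted out of the predecessor
                s = ((ns & LOW_MASK) << 1) | d
                bm = sum(e != r for e, r in zip(branch_output(bit, s), sym))
                cand = pm[s] + bm       # add
                if cand < new_pm[ns]:   # compare, select
                    new_pm[ns], dec[ns] = cand, d
        decisions.append(dec)
        pm = new_pm
    # Traceback: start at the best final state and follow decision bits.
    state = min(range(N_STATES), key=pm.__getitem__)
    bits = []
    for dec in reversed(decisions):
        bits.append(state >> (K - 2))
        state = ((state & LOW_MASK) << 1) | dec[state]
    bits.reverse()
    return bits

message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
received = encode(message)
received[2] = (received[2][0] ^ 1, received[2][1])  # inject one bit error
print(decode(received) == message)
```

The traceback memory on the slide is 64 bits wide, which would match one decision bit per state in a 64-state trellis; that reading is an assumption, since the transcript does not spell it out.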

-6- Design Flow
• Design Compiler synthesis script (from Mentor/Inventra)
• SRAM Generator (from Norman Walker)
• VHDL gate-level sims (timing verification, switching activity annotation)
• PowerMill simulations (SRAM, core)
• Design Compiler, Power Compiler (static timing, power analysis)
• Floor Planning (Preview)
• Place & Route (Silicon Ensemble)
• Interconnect Parasitic Extraction (“report simcap”)
• PowerMill simulations, PathMill static analysis
• Design Compiler, Power Compiler (static timing, power analysis with back-annotated interconnect parasitics)
[Figure: flow stages. Labels: Synthesis & Module Generation; Pre-Layout Verification & Analysis; Post-Layout Verification & Analysis; Floor Planning; Place & Route]

-7- Synthesis and SRAM Generation
• Synthesis with Synopsys Design Compiler
  » Constraint: 66 kHz clock (effectively infinite)
  » Bottom-up synthesis of 62 VHDL entities
• Low-Power SRAM generator (from Pleiades)
  » Very large sense-amps, control logic
  » Optimized for power, speed at low supply voltages
  » Word-length limited to a power of 2

-8- Simulation Models
• Behavioral C
  » Parameterized, bit-true, and fast
  » Used for system level design and BER simulations
• Behavioral VHDL
  » Parameterized, bit-true, and cycle-true
  » Used for structural simulations and test bench reference
• RTL VHDL
  » Synthesizable, crafted for specific parameters and implementation structure
  » Used for synthesis quality

-9- BER Simulation Results

-10- SRAM
• Simulation Tools: TimeMill & PowerMill
• Parameters
  » 66 MHz clock
  » Voltage 2.5 V
  » Randomly generated test vectors
• Results
  » Power Analysis
  » Timing Analysis

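-11- SRAM: Power Numbers
• SRAM used for ACS Unit
  » 8 words by 9 data bits
[Table, values lost in extraction. Columns: Operations; Avg. (µA); Avg. (mW); Avg. (pJ). Rows: Read Activity; Write Activity; Read/Write]
Parasitic Extraction
[Table, values lost in extraction. Columns: Operations; Avg. (µA); Avg. (mW); Avg. (pJ). Rows: Read Activity; Write Activity; Read/Write]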

-12- SRAM: Power Numbers
• SRAM used for Traceback Unit
  » 16 words by 64 data bits
[Table, values lost in extraction. Columns: Operations; Avg. (µA); Avg. (mW); Avg. (pJ). Rows: Read Activity; Write Activity; Read/Write]

-13- SRAM: Timing Numbers
• Delays
  » Setup Time; Hold Time: time needed for data/address to become stable
                 Setup (ns)   Hold (ns)   Data Resolution (ns)
ACS SRAM         ~1           ~2          ~1.8
Traceback SRAM   ~1           ~2          ~5

-14- Place and Route
• Floor planning of the Viterbi SRAM macro cells and standard cells was done in Preview, and Silicon Ensemble was used for routing.
• Total SRAM macro cell area was 1.58 mm² (1.08 mm² with 9x8 SRAMs)
  » Area of the 16 9x8 bit SRAM macro cells: mm² each, 62% larger than required, as 16x8 bit SRAMs were used (SRAM generator output had been verified for powers of 2)
  » Area of the 3 16x64 bit SRAM macro cells: 0.25 mm² each
• Area of the standard cells: 1.02 mm² (0.35 mm² from DEF file)
• Final chip area was 4.0 mm² (original estimate 2.5 mm²)
• Parasitics for timing simulation were extracted from the final routed nets in Silicon Ensemble.

-15- Wiring Statistics
• Six metal layers, layers 5 and 6 used for power and ground respectively
• Ground and power spaced alternately 100 µm apart horizontally and vertically.
• There were about 6200 nets and 46,114 vias. Total wire lengths:
  » metal layer 1: 3,293 µm
  » metal layer 2: 458,440 µm
  » metal layer 3: 510,517 µm
  » metal layer 4: 218,023 µm
  » metal layer 5: 96,882 µm signal, and 38,400 µm power
  » metal layer 6: 8,660 µm signal, and 37,500 µm ground
  » wire length: 685 mm horizontal, 611 mm vertical, total 1296 mm
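
As a quick consistency check on the figures above, the per-layer signal lengths can be summed and compared with the quoted total. The short Python sketch below uses only numbers from the slide; the variable names are illustrative.

```python
# Signal wire length per metal layer, in micrometres (from the slide).
signal_um = {
    1: 3_293,
    2: 458_440,
    3: 510_517,
    4: 218_023,
    5: 96_882,
    6: 8_660,
}

total_signal_mm = sum(signal_um.values()) / 1000.0   # um -> mm
quoted_total_mm = 685 + 611                          # horizontal + vertical

print(f"summed signal wire: {total_signal_mm:.1f} mm")
print(f"quoted total:       {quoted_total_mm} mm")

# Share of signal routing carried by each layer.
for layer, length in signal_um.items():
    share = 100.0 * length / sum(signal_um.values())
    print(f"metal {layer}: {share:.1f}%")

# Average number of vias per net, using the approximate net count.
vias_per_net = 46_114 / 6_200
print(f"vias per net: {vias_per_net:.1f}")
```

The summed signal length rounds to the quoted 1296 mm, so the total appears to count signal wiring only, with the power and ground straps on layers 5 and 6 listed separately.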

-16- Final Placement and Routing
• Significant routing congestion at 16 by 64 bit SRAM outputs, due to Silicon Ensemble grid size of 1 µm (observe white and light blue wires).
• Minimum of 6 unroutable nets observed, even at 12 mm² chip area.
• Final size was 1.25 mm x 3.2 mm, 4 mm², with 9 unroutable nets.
• Violation reports in Silicon Ensemble did not identify which nets were unroutable, other than problems with ground and power connections.

-17- Static Timing Checks
• All timing checks performed with Design Compiler’s report_timing command
• Parasitic capacitances back-annotated with the set_load command
• No RC parasitics annotated
• No SRAM model was used for timing checks
• Critical path was from ACS control logic, through a PM output MUX select signal (in an ACS unit), through the following ACS unit.
• Checks performed at 2.5 V

-18- Static Power Checks
• All power checks performed with Design Compiler’s report_power command
• Switching activity was measured for every output port (transition counts over 16,000-cycle simulation)
• Back-annotation performed with SAIF files
• No SRAM model was used for power checks (added in manually)
• Checks performed at 2.5 V w/ 60 MHz clock
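
The way a transition count turns into a power figure can be sketched with the textbook switching-power relation P = 0.5 * C * Vdd^2 * TR, where TR is the toggle rate in transitions per second. The Python below is only an illustration of that relation, not a reproduction of the report_power algorithm: the function names are hypothetical, and the 50 fF load and 4,000 transitions are made-up example values. The 16,000-cycle window, 2.5 V supply, and 60 MHz clock come from the slide.

```python
def toggle_rate(transitions: int, cycles: int, f_clk_hz: float) -> float:
    """Transitions per second, given a count over a simulated cycle window."""
    return transitions * f_clk_hz / cycles

def switching_power_w(c_load_f: float, vdd_v: float, tr_hz: float) -> float:
    """Textbook switching power for one net: 0.5 * C * Vdd^2 * TR."""
    return 0.5 * c_load_f * vdd_v ** 2 * tr_hz

# Values from the slide.
CYCLES = 16_000
VDD = 2.5
F_CLK = 60e6

# Hypothetical net, for illustration only.
example_load_f = 50e-15      # 50 fF back-annotated load (made up)
example_transitions = 4_000  # toggles seen in the simulation (made up)

tr = toggle_rate(example_transitions, CYCLES, F_CLK)
p = switching_power_w(example_load_f, VDD, tr)
print(f"toggle rate: {tr / 1e6:.1f} M transitions/s")
print(f"switching power: {p * 1e6:.3f} uW")
```

Summing this quantity over all annotated nets gives a switching-power estimate; internal cell power and leakage would be separate terms.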

-19- Delay and Energy Scaling

-20- Performance Results For fixed throughput requirement 100ksps:

-21- Summary
• Performance in intended operation (100 ksps)
  » Clock Speed: 1.6 MHz
  » Power Dissipation: 0.14 mW
  » Power Density: 34.9 µW per mm²
• Cost
  » Die Size: 4 mm²
  » Design effort: 30 work days
• Predictability and portability
  » Mentor/Inventra predictions vs. measured results
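
A few derived figures follow directly from the summary numbers. The Python sketch below works them out; it reads "ksps" as thousands of samples per second, which is an assumption, and the variable names are illustrative.

```python
# Figures quoted in the summary.
throughput_sps = 100e3     # 100 ksps
clock_hz = 1.6e6           # 1.6 MHz
power_mw = 0.14            # 0.14 mW
die_area_mm2 = 4.0         # 4 mm^2
die_w_mm, die_h_mm = 1.25, 3.2   # final size from the placement slide

cycles_per_sample = clock_hz / throughput_sps
power_density_uw_per_mm2 = power_mw * 1000.0 / die_area_mm2
energy_per_sample_nj = power_mw * 1e-3 / throughput_sps * 1e9

print(f"clock cycles per sample: {cycles_per_sample:.0f}")
print(f"power density: {power_density_uw_per_mm2:.1f} uW/mm^2")
print(f"energy per sample: {energy_per_sample_nj:.2f} nJ")
print(f"die area from dimensions: {die_w_mm * die_h_mm:.1f} mm^2")
```

The 16 cycles per sample lines up with each ACS unit handling 16 states. The rounded inputs give 35.0 µW per mm², close to the 34.9 quoted, which presumably came from unrounded power and area values.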