Handshake protocols for de-synchronization I. Blunno, J. Cortadella, A. Kondratyev, L. Lavagno, K. Lwin and C. Sotiriou Politecnico di Torino, Italy Universitat.

Slides:



Advertisements
Similar presentations
Asynchronous Circuits Jordi Cortadella Universitat Politècnica de Catalunya, Barcelona Collège de France May 14 th, 2013.
Advertisements

Digital Systems Verification Lecture 13 Alessandra Nardi.
Self-Timed Logic Timing complexity growing in digital design -Wiring delays can dominate timing analysis (increasing interdependence between logical and.
Data Synchronization Issues in GALS SoCs Rostislav (Reuven) Dobkin and Ran Ginosar Technion Christos P. Sotiriou FORTH ICS- FORTH.
Automatic Verification Book: Chapter 6. What is verification? Traditionally, verification means proof of correctness automatic: model checking deductive:
Systematic method for capturing “design intent” of Clock Domain Crossing (CDC) logic in constraints Ramesh Rajagopalan Cisco Systems.
Automated Method Eliminates X Bugs in RTL and Gates Kai-hui Chang, Yen-ting Liu and Chris Browy.
Self-Timed Systems Timing complexity growing in digital design -Wiring delays can dominate timing analysis (increasing interdependence between logical.
ECE C03 Lecture 81 Lecture 8 Memory Elements and Clocking Hai Zhou ECE 303 Advanced Digital Design Spring 2002.
ELEC 256 / Saif Zahir UBC / 2000 Timing Methodology Overview Set of rules for interconnecting components and clocks When followed, guarantee proper operation.
Avshalom Elyada, Ran GinosarPipeline Synchronization 1 A Unique and Successfully Implemented Approach to the Synchronization Problem Based on the article.
Digital Logic Design Lecture 22. Announcements Homework 7 due today Homework 8 on course webpage, due 11/20. Recitation quiz on Monday on material from.
Jordi Cortadella, Universitat Politecnica de Catalunya, Barcelona Mike Kishinevsky, Intel Corp., Strategic CAD Labs, Hillsboro.
FPGA Latency Optimization Using System-level Transformations and DFG Restructuring Daniel Gomez-Prado, Maciej Ciesielski, and Russell Tessier Department.
CSE241 Formal Verification.1Cichy, UCSD ©2003 CSE241A VLSI Digital Circuits Winter 2003 Recitation 6: Formal Verification.
CSE477 L19 Timing Issues; Datapaths.1Irwin&Vijay, PSU, 2002 CSE477 VLSI Digital Circuits Fall 2002 Lecture 19: Timing Issues; Introduction to Datapath.
RTL Hardware Design by P. Chu Chapter 161 Clock and Synchronization.
Hazard-free logic synthesis and technology mapping I Jordi Cortadella Michael Kishinevsky Alex Kondratyev Luciano Lavagno Alex Yakovlev Univ. Politècnica.
Hardware and Petri nets Synthesis of asynchronous circuits from Signal Transition Graphs.
Digital Integrated Circuits© Prentice Hall 1995 Timing ISSUES IN TIMING.
Hardware and Petri nets: application to asynchronous circuit design Jordi CortadellaUniversitat Politècnica de Catalunya, Spain Michael KishinevskyIntel.
Formal Verification of Safety Properties in Timed Circuits Marco A. Peña (Univ. Politècnica de Catalunya) Jordi Cortadella (Univ. Politècnica de Catalunya)
© Ran GinosarAsynchronous Design and Synchronization 1 VLSI Architectures Lecture 2: Theoretical Aspects (S&F 2.5) Data Flow Structures.
ECE Synthesis & Verification1 ECE 667 Spring 2011 Synthesis and Verification of Digital Systems Verification Introduction.
1 Logic synthesis from concurrent specifications Jordi Cortadella Universitat Politecnica de Catalunya Barcelona, Spain In collaboration with M. Kishinevsky,
Jordi Cortadella, Universitat Politècnica de Catalunya, Spain
Chapter 7 Design Implementation (II)
1 Bridging the gap between asynchronous design and designers Thanks to Jordi Cortadella, Luciano Lavagno, Mike Kishinevsky and many others.
STG-based synthesis and Petrify J. Cortadella (Univ. Politècnica Catalunya) Mike Kishinevsky (Intel Corporation) Alex Kondratyev (University of Aizu) Luciano.
Design of Fault Tolerant Data Flow in Ptolemy II Mark McKelvin EE290 N, Fall 2004 Final Project.
Synthesis of synchronous elastic architectures Jordi Cortadella (Universitat Politècnica Catalunya) Mike Kishinevsky (Intel Corp.) Bill Grundmann (Intel.
1 State Encoding of Large Asynchronous Controllers Josep Carmona and Jordi Cortadella Universitat Politècnica de Catalunya Barcelona, Spain.
Synthesis of Asynchronous Control Circuits with Automatically Generated Relative Timing Assumptions Jordi Cortadella, University Politècnica de Catalunya.
1 32-bit parallel load register with clock gating ECE Department, 200 Broun Hall, Auburn University, Auburn, AL 36849, USA Lan Luo ELEC.
ENEE 408C Lab Capstone Project: Digital System Design Fall 2005 Sequential Circuit Design.
Automatic synthesis and verification of asynchronous interface controllers Jordi CortadellaUniversitat Politècnica de Catalunya, Spain Michael KishinevskyIntel.
Embedded Systems Hardware: Storage Elements; Finite State Machines; Sequential Logic.
Chapter #6: Sequential Logic Design 6.2 Timing Methodologies
مرتضي صاحب الزماني  The registers are master-slave flip-flops (a.k.a. edge-triggered) –At the beginning of each cycle, propagate values from primary inputs.
Asynchronous Circuit Verification and Synthesis with Petri Nets J. Cortadella Universitat Politècnica de Catalunya, Barcelona Thanks to: Michael Kishinevsky.
1 Recap: Lectures 5 & 6 Classic Pipeline Styles 1. Williams and Horowitz’s PS0 pipeline 2. Sutherland’s micropipelines.
Advanced Digital Design Asynchronous EDA by A. Steininger, J. Lechner and R. Najvirt Vienna University of Technology.
Micropipeline design in asynchronous circuit Wilson Kwan M.A.Sc. Candidate Ottawa-Carleton Institute for Electrical & Computer Engineering (OCIECE) Carleton.
Using Mathematica for modeling, simulation and property checking of hardware systems Ghiath AL SAMMANE VDS group : Verification & Modeling of Digital systems.
© 2003 Xilinx, Inc. All Rights Reserved FPGA Design Techniques.
Synthesis Presented by: Ms. Sangeeta L. Mahaddalkar ME(Microelectronics) Sem II Subject: Subject:ASIC Design and FPGA.
2/6/2003IDEAL-IST Workshop, Christos P. Sotiriou, ICS-FORTH 1 IDEAL-IST Workshop Christos P. Sotiriou, Institute of Computer Science, FORTH.
Approaches to design entry Paolo PRINETTO Politecnico di Torino (Italy) University of Illinois at Chicago, IL (USA)
FORMAL VERIFICATION OF ADVANCED SYNTHESIS OPTIMIZATIONS Anant Kumar Jain Pradish Mathews Mike Mahar.
© 2003 Xilinx, Inc. All Rights Reserved Global Timing Constraints FPGA Design Flow Workshop.
1 COMP541 Sequential Circuits Montek Singh Feb 1, 2012.
Sp09 CMPEN 411 L18 S.1 CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture 16: Static Sequential Circuits [Adapted from Rabaey’s Digital Integrated Circuits,
CEC 220 Digital Circuit Design Latches and Flip-Flops Monday, March 03 CEC 220 Digital Circuit Design Slide 1 of 19.
Automatic Pipelining during Sequential Logic Synthesis Jordi Cortadella Universitat Politècnica de Catalunya, Barcelona Joint work with Marc Galceran-Oms.
03/31/031 ECE 551: Digital System Design & Synthesis Lecture Set 8 8.1: Miscellaneous Synthesis (In separate file) 8.2: Sequential Synthesis.
1 Bridging the gap between asynchronous design and designers Peter A. BeerelFulcrum Microsystems, Calabasas Hills, CA, USA Jordi CortadellaUniversitat.
An Abstract Model of De- synchronous Circuit Design and Its Area Optimization Jin Gang University of Manchester.
Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Zhiyi Yu, Bevan Baas VLSI Computation Lab, ECE Department University of California,
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 20: October 25, 2010 Pass Transistors.
CENG 241 Digital Design 1 Lecture 7 Amirali Baniasadi
Dept. of Electrical Engineering
Specification mining for asynchronous controllers Javier de San Pedro† Thomas Bourgeat ‡ Jordi Cortadella† † Universitat Politecnica de Catalunya ‡ Massachusetts.
1 Advanced Digital Design Asynchronous Design Automation by A. Steininger and J. Lechner Vienna University of Technology.
Asynchronous Interface Specification, Analysis and Synthesis
Reactive Clocks with Variability-Tracking Jitter
Ring Oscillator Clocks and Margins
Synthesis of asynchronous controllers from Signal Transition Graphs:
HIGH LEVEL SYNTHESIS.
De-synchronization: from synchronous to asynchronous
Design Methodology & HDL
Presentation transcript:

Handshake protocols for de-synchronization I. Blunno, J. Cortadella, A. Kondratyev, L. Lavagno, K. Lwin and C. Sotiriou Politecnico di Torino, Italy Universitat Politecnica de Catalunya, Barcelona, Spain Cadence Berkeley Lab, Berkeley, USA ICS-FORTH, Crete, Greece

Asynchronous for dummies I. Blunno, J. Cortadella, A. Kondratyev, L. Lavagno, K. Lwin and C. Sotiriou Politecnico di Torino, Italy Universitat Politecnica de Catalunya, Barcelona, Spain Cadence Berkeley Lab, Berkeley, USA ICS-FORTH, Crete, Greece

Outline What is de-synchronization ? What is de-synchronization ? Behavioral equivalence Behavioral equivalence 4-phase protocols for de-synchronization 4-phase protocols for de-synchronization Concurrency Concurrency Correctness Correctness An example An example

Synchronous CLK Asynchronous De-synchronize CLK

MS flip-flop Synchronous circuit CLK LLLL L L

De-synchronization LLLL L L C C C C C C

C C C C C C Distributed controllers substitute the clock network The data path remains intact !

Design flow Think synchronous Think synchronous Design synchronous: one clock and edge-triggered flip-flops Design synchronous: one clock and edge-triggered flip-flops De-synchronize (automatically) De-synchronize (automatically) Run it asynchronously Run it asynchronously

Prior work Micropipelines (Sutherland, 1989) Micropipelines (Sutherland, 1989) Local generation of clocks Local generation of clocks  Varshavsky et al., 1995  Kol and Ginosar, 1996 Theseus Logic (Ligthart et al., 2000) Theseus Logic (Ligthart et al., 2000)  Commercial HDL synthesis tools  Direct translation and special registers Phased logic (Linder and Harden, 1996) (Reese, Thornton, Traver, 2003) Phased logic (Linder and Harden, 1996) (Reese, Thornton, Traver, 2003)  Conceptually similar  Different handshake protocol (2 phase vs. 4 phase)

Automatic de-synchronization Devise an automatic method for de-synchronization Devise an automatic method for de-synchronization Identify a subclass of synchronous circuits suitable for de-synchronization Identify a subclass of synchronous circuits suitable for de-synchronization Formally prove correctness Formally prove correctness

Outline What is de-synchronization ? What is de-synchronization ? Behavioral equivalence Behavioral equivalence 4-phase protocols for de-synchronization 4-phase protocols for de-synchronization Concurrency Concurrency Correctness Correctness An example An example

Synchronous flow

De-synchronized flow

+

Flow equivalence [Guernic, Talpin, Lann, 2003]

A B

Flow equivalence CLK A B A B Synchronous behavior De-synchronized behavior

Flow equivalence CLK A B A B Synchronous behavior De-synchronized behavior

Outline What is de-synchronization ? What is de-synchronization ? Behavioral equivalence Behavioral equivalence 4-phase protocols for de-synchronization 4-phase protocols for de-synchronization Concurrency Concurrency Correctness Correctness An example An example

LLLL L L C C C C C C

C C C C C C

C L

ABCD A+ B- C+ D- A- B+ C- D+ A+ B- C+ D- A- B+ C- D+ A+ B- C+ D A latch cannot read another data item until the successor has captured the current one

ABCD A+ B- C+ D- A- B+ C- D+ A+ B- C+ D- A- B+ C- D+ A+ B- C+ D A latch cannot read another data item until the successor has captured the current one

ABCD A+ B- C+ D- A- B+ C- D+ A+ B- C+ D- A- B+ C- D+ A+ B- C+ D A latch cannot read another data item until the successor has captured the current one

ABCD A+ B- C+ D- A- B+ C- D+ A+ B- C+ D- A- B+ C- D+ A+ B- C+ D A latch cannot read another data item until the successor has captured the current one

ABCD A+ B- C+ D- A- B+ C- D+ A+ B- C+ D- A- B+ C- D+ A+ B- C+ D- 0000

ABCD A- B+ C- D+ A+ B- C+ D- A- B+ C- D+ A+ B- C+ D- 0001

ABCD A- B+ C- D+ A+ B- C+ D- A- B+ C- D+ A+ B- C+ D- 0000

ABCD A- B+ C- D+ A+ B- C+ D- A- B+ C- D+ A+ B- C+ D A latch cannot become opaque before having captured the data item from its predecessor

ABCD A+ B- C+ D- A- B+ C- D+ A+ B- C+ D- A- B+ C- D+ A+ B- C+ D A latch cannot become opaque before having captured the data item from its predecessor

ABCD A+ B- C+ D- A- B+ C- D+ A+ B- C+ D- A- B+ C- D+ A+ B- C+ D A latch cannot become opaque before having captured the data item from its predecessor

ABCD A+ B- C+ D- A- B+ C- D+ A+ B- C+ D- A- B+ C- D+ A+ B- C+ D A latch cannot become opaque before having captured the data item from its predecessor

ABCD A+ B- C+ D- A- B+ C- D+ A+ B- C+ D- A- B+ C- D+ A+ B- C+ D- 0000

ABCD A+ B+ C+ D+ A- B- C- D- A B

Outline What is de-synchronization ? What is de-synchronization ? Behavioral equivalence Behavioral equivalence 4-phase protocols for de-synchronization 4-phase protocols for de-synchronization Concurrency Concurrency Correctness Correctness An example An example

A+ B+ A- B- Can we increase concurrency ? A+ B+ A- B- A+ B+ A- B- not flow-equivalent

A B A B data overrun A B data lost

A+ B+ A- B- Can we reduce concurrency ? How much ?

A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- (8 states) (6 states) A+ B+ A- B- (5 states) A+ B+ A- B- A+ B+ A- B- (4 states)

A B A B A B A B A B A B fully decoupled (Furber & Day) simple 4-phase semi-decoupled (Furber & Day) non-overlapping GasP, IPCMOS de-synchronizationmodel

A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- simple 4-phase non-overlapping semi-decoupled (Furber & Day) fully decoupled (Furber & Day) GasP, IPCMOS de-synchronizationmodel

AB cntrl Ri Ai Rx Ax Ro Ao Ri+ A- Rx+ B- Ro+ Ai+ Ax+ Ao+ Ri- A+ Rx- B+ Ro- Ai- Ax- Ao- (semi-decoupled 4-phase protocol)

AB cntrl Ri Ai Rx Ax Ro Ao A- B- A+ B+ (semi-decoupled 4-phase protocol)

AB cntrl Ri Ai Rx Ax Ro Ao A- B- A+ B+ (semi-decoupled 4-phase protocol)

AB cntrl Ri Ai Rx Ax Ro Ao A- B- A+ B+ (semi-decoupled 4-phase protocol)

AB cntrl Ri Ai Rx Ax Ro Ao A- B- A+ B+ (semi-decoupled 4-phase protocol)

AB cntrl Ri Ai Rx Ax Ro Ao A- B- A+ B+ (semi-decoupled 4-phase protocol)

AB cntrl Ri Ai Rx Ax Ro Ao A- B- A+ B+ (semi-decoupled 4-phase protocol)

A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- A+ B+ A- B-

Outline What is de-synchronization ? What is de-synchronization ? Behavioral equivalence Behavioral equivalence 4-phase protocols for de-synchronization 4-phase protocols for de-synchronization Concurrency Concurrency Correctness Correctness An example An example

Which protocols are valid for de-synchronization ?

A+ B+ A- B- Theorem: the de-synchronization protocol preserves flow-equivalence Proof: by induction on the length of the traces Induction hypothesis: same latch values at reset Induction step: same values at cycle i  same values at cycle i+1

A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- A+ B+ A- B-

A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- Theorem: any reduction in concurrency preserves flow-equivalence

Any hybrid approach preserves flow-equivalence ! Semi-decoupledSemi-decoupled Semi-decoupled non-overlappingnon-overlapping FullydecoupledFullydecoupled

ABCD A+ B+ C+ D+ A- B- C- D-

ABCD A+ B+ C+ D+ A- B- C- D- semi- decoupled non- overlapping fully decoupled Flow-equivalence is preserved, … but …

Liveness Preservation of flow-equivalence: all the generated traces are equivalent Preservation of flow-equivalence: all the generated traces are equivalent Are all traces generated ? (Is the marked graph live ?) Not always ! Are all traces generated ? (Is the marked graph live ?) Not always !

A+ B+ C+ D+ A- B- C- D- Liveness: all cycles have at least one token [Commoner 1971] Semi-decoupled 4-phase handshake protocol

A+ B+ C+ D+ A- B- C- D- Simple 4-phase handshake protocol

Results about liveness At least three latches in a ring are required with only one data token circulating [Muller 1962] At least three latches in a ring are required with only one data token circulating [Muller 1962] (this paper): any hybrid combination of protocols is live if the simple 4-phase protocol is not used any cycle has at least one token Theorem (this paper): any hybrid combination of protocols is live if the simple 4-phase protocol is not used Proof: any cycle has at least one token

A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- simple 4-phase non-overlapping semi-decoupled (Furber & Day) fully decoupled (Furber & Day) GasP, IPCMOS de-synchronizationmodel

Outline What is de-synchronization ? What is de-synchronization ? Behavioral equivalence Behavioral equivalence 4-phase protocols for de-synchronization 4-phase protocols for de-synchronization Concurrency Concurrency Correctness Correctness An example An example

Async DLX block diagram

Synchronous RTL = Synchronous Desynchronized Cycle: 4.4ns Power: 70.9mW Area: 372,656  m Cycle: 4.45ns Power: 71.2mW Area: 378,058  m All numbers are after Placement & Routing All numbers are after Placement & Routing Total of 1500 flip-flops, 3000 latches Total of 1500 flip-flops, 3000 latches DE-SYNC design includes 5 controllers, each driving 2 clock trees DE-SYNC design includes 5 controllers, each driving 2 clock trees Power numbers include the clock tree Power numbers include the clock tree Technology: UCM/Virtual Silicon 0.18 µm Technology: UCM/Virtual Silicon 0.18 µm

De-synchronized DLX on FPGA (demo outside the conference room)

Discussion The de-synchronization model provides an abstraction of the timing behavior The de-synchronization model provides an abstraction of the timing behavior

[2,3] [1,2][8,9] [5,7] [3,5] [2,4] ABE F G C D [0,0][3,5] [5,7] [2,3] [2,4] [1,2] [8,9] [1,2] Timing analysis Exploration of the design space

A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- A+ B+ A- B- simple 4-phase non-overlapping semi-decoupled (Furber & Day) fully decoupled (Furber & Day) GasP, IPCMOS de-synchronizationmodel

Conclusions EDA tools require a formal support (they must work for all circuits) EDA tools require a formal support (they must work for all circuits) A complete characterization of 4-phase protocols has been presented (partial order based on concurrency) A complete characterization of 4-phase protocols has been presented (partial order based on concurrency) Design flow developed at Cadence Berkeley Labs Design flow developed at Cadence Berkeley Labs  Automated from gate netlist  Static timing analysis to derive matched delays  Constrained P&R to meet timing constraints