Published by Vincent O’Connor. Modified over 9 years ago.
1
A Cycle-Based Decomposition Method for Burst-Mode Asynchronous Controllers
Melinda Y. Agyekum, Steven M. Nowick
Department of Computer Science, Columbia University
{melinda, nowick}@cs.columbia.edu
2
Research Objective
Transform a Burst-Mode (BM) or Extended Burst-Mode (XBM) asynchronous controller into a set of decomposed controllers.
Decomposed BM & XBM controllers must:
  Collectively maintain the same behavior as the original
  Individually adhere to all BM & XBM controller rules
[Figure: original BM controller state graph alongside the decomposed BM controllers]
3
Challenges & Motivation
Decomposition
  Technique used to divide a controller into smaller controllers
Synchronous Decomposition
  A large amount of work in this area
Challenges
  Asynchronous decomposition is more challenging than synchronous
    –No regular clock or discrete schedule system
    –Loosely coupled concurrent system
  Limited work in this area
Motivation
  Our Primary Goal: Improve runtime of CAD tool (esp. for larger controllers)
4
Challenges & Motivation (continued)
Motivation
  Our Secondary Goals:
    Reduce next-state complexity
      –Decomposed controllers: much smaller next-state logic
      –Simplifies timing requirements
        »Narrows BM fundamental mode timing constraint
    Potential reduction in power consumption
      –Only a single controller is active at a time
      –Control passed from controller to controller
    Assists the designer
      –Alleviating manual decomposition
      –Providing a higher level of abstraction
        »Can write a single testbench for original BM controller
        »Apply it to the set of decomposed controllers
5
Contributions
Novel Method for Decomposition
  For Burst-Mode: 4 major parts
    –Decomposition algorithm
    –Controller micro-architecture
    –Inter-controller communication protocol
    –Auxiliary hardware
      »Optimizations to eliminate or simplify hardware
  Method for Extended Burst-Mode (XBM)
CAD Tool Implementation
  For both BM & XBM
6
Contributions (continued)
Improved Synthesis Results
  Runtime: 16-200x improvement
  1st time synthesis of some examples
    –Using a burst-mode synthesis tool (Minimalist/3D)
  Combinational blocks of logic
    –For several decomposed controllers
7
Outline
Background
  Review of Burst-Mode Controllers
Overview of Approach
Decomposition Method
Details of Hardware Implementation
  Decomposed Controllers
  Output Generators
XBM Extension
Experimental Results
Related Work and Conclusions
8
Background: Burst-Mode Specification
[Figure: burst-mode state graph with states 0-5. Each arc is a state transition labeled with its input and output signal transitions (e.g. Ain+, Zout+).]
INPUTS: Ain Bin Cin Din
OUTPUTS: Yout Zout
PROPERTIES:
  1) Non-empty input burst
  2) Maximal Input Set
  3) Unique Entry Point
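The first two properties can be checked mechanically. The sketch below is illustrative only: the arc tuple format, the function names, and the small example specification are assumptions, not the specification drawn on the slide. It reads the maximal set property as "no input burst leaving a state may be a subset of another input burst leaving the same state". The unique entry point check is omitted.

```python
from collections import defaultdict

# Hypothetical arc format: (source, destination, input_burst, output_burst).
# Bursts are sets of signal edges such as "Ain+" or "Bin-".
ARCS = [
    (0, 1, {"Ain+"}, {"Zout+"}),
    (1, 2, {"Bin+"}, {"Zout-"}),
    (1, 3, {"Cin+"}, {"Yout+"}),
    (2, 0, {"Ain-", "Bin-"}, set()),
    (3, 0, {"Ain-", "Cin-"}, {"Yout-", "Zout-"}),
]


def empty_input_bursts(arcs):
    """Return (source, destination) for every arc whose input burst is empty."""
    return [(src, dst) for src, dst, ins, _ in arcs if not ins]


def maximal_set_violations(arcs):
    """Return (state, dst_a, dst_b) where the burst to dst_a is a subset of the burst to dst_b."""
    by_state = defaultdict(list)
    for src, dst, ins, _ in arcs:
        by_state[src].append((dst, frozenset(ins)))
    violations = []
    for state, outgoing in by_state.items():
        for i, (dst_a, burst_a) in enumerate(outgoing):
            for j, (dst_b, burst_b) in enumerate(outgoing):
                if i != j and burst_a <= burst_b:
                    violations.append((state, dst_a, dst_b))
    return violations
```

Adding an arc such as `(1, 4, {"Bin+", "Cin+"}, set())` to the example would be reported, because the bursts `{"Bin+"}` and `{"Cin+"}` leaving state 1 are both subsets of it.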
9
Background: Burst-Mode Implementation
Burst-Mode Controllers
  A Huffman-style asynchronous state machine
  Consists of:
    –Primary inputs
    –Primary outputs
    –Fed-back state
  State is stored in the fed-back loops
[Figure: combinational logic block with primary inputs i1, i2, primary outputs o1, o2, and state signals s1, s2 fed back through a delay]
10
Background: Burst-Mode Implementation
Two Simple One-Sided Timing Constraints
  Fed-back State Requirement
    –Fed-back path must be slower than the worst-case forward output path
  Generalized Fundamental Mode Requirement
    –New inputs cannot arrive until the entire machine has stabilized from the previous input burst
    –Hold-time requirement
[Figure: combinational logic block with primary inputs i1, i2, primary outputs o1, o2, and state signals s1, s2 fed back through a delay]
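Both constraints are one-sided inequalities on delays, which the following sketch makes explicit. The function names, parameters, and the idea of passing delays as plain numbers in a common time unit are assumptions for illustration; no such check is described on the slides.

```python
def feedback_ok(feedback_delay, output_path_delays):
    """Fed-back state requirement: feedback is slower than the worst-case output path."""
    return feedback_delay > max(output_path_delays)


def fundamental_mode_ok(settle_time, next_burst_time):
    """Generalized fundamental mode: the next input burst arrives only after settling."""
    return next_burst_time >= settle_time
```

For example, `feedback_ok(1.2, [0.8, 1.0])` holds, while a feedback delay of 0.9 against the same output paths would violate the requirement.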
11
Burst-Mode Applications
BM Machines in Practice
  Used in a large number of applications
  Fabricated chips for:
    Hewlett-Packard: Mayfly & Stetson projects
    NASA Goddard Space Flight Center (2006-present)
      –Uses Minimalist & BM controllers for space instrumentation
      –First fabricated chip has just come back
  Additional substantial real-world applications
    Cache, Diff-eq Solver, DRAM- & SCSI-controllers
Several of these projects perform manual decomposition for complex specifications
13
Overview of Approach: Decomposition Method Example
[Figure: the original monolithic spec. (states 0-5, inputs Ain, Bin, Cin, Din, outputs Yout, Zout) shown beside the decomposed specs. Labels: Top-Level Controller, Child Monitoring Arcs, Entry pt. Handshake signals shown: REQ1, REQ2, ACK1a, ACK1b, ACK2.]
GOAL: Govern inter-controller communication & synchronization on channels
14
Overview of Approach: Partial Micro-Architecture
[Figure: specification fragments joined by an implicit connection, and the corresponding micro-architecture after the step "Add Communication Channel": Top-level Controller, BM Cntrl A (Parent), BM Cntrl B (Leaf), BM Cntrl C (Parent), BM Cntrl D (Leaf), linked by Channel 1 and Channel 2 carrying REQ1, ACK1a, ACK1b, REQ2, ACK2.]
15
Overview of Approach: Top-Level Communication Protocol
  Top-Level controller is active
  The parent broadcasts a REQ to all of its children
  Only a single child responds with an ACK
  Parent de-asserts REQ, passes control to the child, and suspends
  The child is now active and continues
  At some point the child completes and de-asserts ACK
  The parent then becomes active and resumes control
[Figure: micro-architecture (Top-level Controller, BM Cntrl A (Parent), BM Cntrl B (Leaf), BM Cntrl C (Parent), BM Cntrl D (Leaf), Channel 1, Channel 2) with protocol states. Parent: Parent Active, Parent sends REQ, Parent Suspends. Child: Child Disabled, Child is polled, Child Active. Transitions: Child wins, Child loses.]
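The protocol steps above can be sketched as an ordered event trace for one transfer of control. This is a behavioral illustration, not the hardware: the function name, the child names, and the use of a predicate over primary inputs to decide which child responds are all assumptions.

```python
def handshake(children, inputs):
    """Return the event trace for one parent-to-child control transfer.

    children maps a (hypothetical) child name to a predicate that says
    whether that child responds for the given primary inputs.
    """
    trace = ["parent: REQ+"]  # parent broadcasts REQ to all of its children
    winners = [name for name, wins in children.items() if wins(inputs)]
    if len(winners) != 1:
        raise ValueError("protocol assumes exactly one child responds")
    child = winners[0]
    trace.append(f"{child}: ACK+")      # only a single child responds
    trace.append("parent: REQ-")        # parent de-asserts REQ
    trace.append("parent: suspended")   # control passes to the child
    trace.append(f"{child}: active")
    trace.append(f"{child}: ACK-")      # child completes
    trace.append("parent: active")      # parent resumes control
    return trace


# Hypothetical children selected by different primary inputs.
CHILDREN = {
    "child_b": lambda i: i["bin"] == 1,
    "child_d": lambda i: i["din"] == 1,
}
```

Calling `handshake(CHILDREN, {"bin": 1, "din": 0})` yields a trace in which `child_b` acknowledges, runs, and returns control to the parent.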
16
Complete System Micro-Architecture
[Figure: primary inputs ain, bin, cin, din feed the decomposed controllers (Top-level Controller, BM Cntrl A (Parent), BM Cntrl B (Leaf), BM Cntrl C (Parent), BM Cntrl D (Leaf), with channel signals REQ1, ACK1a, ACK1b, REQ2, ACK2). Their intermediate output signals A_to_zout, B_to_zout, C_to_zout, C_to_yout, D_to_yout feed the primary output generators, which produce the primary outputs zout and yout.]
18
Decomposition Method: Intuition on Approach
Goal: determine where partitioning is possible
Idea: start at root and traverse graph
  “Region” = self-contained sub-graph
  If region is “closed”
    –Cut & form a new partition
    –Continue hierarchical exploration
  If region is “not closed”
    –Indicates multiple ways to exit region
    –Do not cut & continue exploring hierarchically
19
Decomposition Method: Formal View
Main Idea: Identify and cut “closed regions”
  Region = self-contained sub-graph
    –Reachable via “ancestor path” = simple path from root to a decision point
    –Region starts at a decision point
    –Includes a given outgoing arc
    –Contains all reachable states & arcs not previously visited
  Closed = only a single point for entry and exit
    –Must enter and exit region through same point!
[Figure: closed region. Labels: Start Point, Ancestor path, Decision pt, Outgoing arcs, Region, Entry, Exit.]
20
Decomposition Method: Formal View
Main Idea: Identify and cut “closed regions”
[Figure: a region that is NOT closed; its entry and exit are at different points. Labels: Start Point, Ancestor path, Decision pt (two of them), Outgoing arcs, Region, Entry, Exit.]
21
Decomposition Method: Example
Example with two decision points
[Figure: state graph with states 0-4. The top-level segment is the ancestor path. At each decision point the question "Can we cut here?" is asked. Labels: Decision Point, Entry point, Exit point, Closed Region, Cut-point.]
22
Decomposition Method: Example
Example with two decision points
[Figure: state graph with states 0-4. Top-level segment = ancestor path. One candidate region is closed and is cut at its cut-point. For the other, the answer to "Can we cut here?" is: Do not cut! Hit ancestor decision pt.]
23
Decomposition Method: Example
Example with two decision points
[Figure: the state graph with states 0-4 and the partitions created from it: a top-level controller, cut partitions with their entry points, and an uncut region.]
24
Decomposition Method: Algorithm
Formal Algorithm
  Graph-based algorithm performs modified DFS
    Forward Direction
      –Explores reachable regions
      –Only marks decision points (revisits non-decision points)
    Backward Direction
      –Controller strands grown
      –Tests for “closed reachability”
      –When detected, cut strands (= create a new controller)
Complete details of the formal algorithm are presented in the paper
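The closed-region test can be illustrated with a much simplified sketch. It is not the formal algorithm from the paper: it omits strand growing and hierarchical exploration, and it treats a region as closed when exploration from one outgoing arc of a decision point never reaches another state on the ancestor path. The graph encoding, function name, and example graph are assumptions.

```python
def explore_region(graph, decision_pt, first_state, ancestor_path):
    """Explore the region entered from decision_pt via first_state.

    graph maps each state to a list of successor states.
    Returns (states_in_region, closed).
    """
    ancestors = set(ancestor_path) - {decision_pt}
    seen, stack, closed = set(), [first_state], True
    while stack:
        state = stack.pop()
        if state == decision_pt or state in seen:
            continue  # returned to the entry point, or already visited
        if state in ancestors:
            closed = False  # region exits through an ancestor state: do not cut
            continue
        seen.add(state)
        stack.extend(graph.get(state, []))
    return seen, closed


# Hypothetical graph: state 1 is a decision point reached via ancestor path [0, 1].
GRAPH = {0: [1], 1: [2, 3], 2: [1], 3: [4], 4: [0]}
```

Here the region entered through state 2 returns to decision point 1 and could be cut, while the region entered through state 3 reaches ancestor state 0 and is left uncut.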
26
Details of Hardware Imp.: Decomposed Controllers
Primary Input Latches
  Transparent D-latches
    –Control when primary inputs can be received
  By default all primary inputs are blocked
    –Input Latch Enable: controlled by “activation channel”
  Handles two scenarios:
    –“Controller as a child” = activated
    –“Controller as a parent” = activating
[Figure: generic input latch structure. A D-latch takes a primary input and produces a filtered input for the core BM controller; its latch enable is derived from the REQ and ACK signals of the activation channel of the decomposed controller.]
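The filtering behavior of a transparent D-latch can be modeled in a few lines. This is a behavioral sketch only: the class and method names are assumptions, and the enable is passed in as a plain boolean because the logic that derives it from REQ and ACK is not reproduced here.

```python
class TransparentLatch:
    """Behavioral model of a transparent D-latch with an enable input."""

    def __init__(self, initial=0):
        self.q = initial  # filtered input seen by the core controller

    def update(self, d, enable):
        """When enabled, the output follows d; otherwise it holds its last value."""
        if enable:
            self.q = d
        return self.q
```

With the enable low, which matches the default of blocking all primary inputs, a change on the primary input does not reach the core controller until the latch is enabled.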
27
Details of Hardware Imp.: Decomposed Controllers
“Controller as a Child”
  Handles case when controller is activated
[Figure: latch structure for input Ain. A D-latch takes Ain and produces Ain_i; a Latch Enable Unit is driven by the activation channel (REQ, ACK). BM specification fragment: states 1-3 with a marked entry pt. and arcs Ain+ | Zout+ and Bin+ | Zout-.]
28
Details of Hardware Imp.: Decomposed Controllers
“Controller as a Parent”
  Idea:
    –“parent” gets latch disabled when control passed to child
    –latch re-enabled when child completes
[Figure: gate-level implementation of the generic input latch structure. A Disabling Unit (Parent’s REQ, Parent’s ACK) and an Enabling Unit (Child’s REQ, Child’s ACK) control a D-latch that passes the primary input through as the filtered input.]
29
Details of Hardware Imp.: Output Generator
Primary Output Generator
  Can be XOR, XNOR, AND, OR, or a single wire
  Output generator is determined by the initial output value of the decomposed controllers and the original BM controller.
[Figure: block view. The output logic of decomposed BM controllers CntrlA, CntrlB, and CntrlD drives CntrlA_To_Output, CntrlB_To_Output, and CntrlD_To_Output into the primary output generator, which produces the primary output.]
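One plausible reading of the XOR/XNOR case can be sketched as follows. It assumes that every transition on any intermediate output corresponds to exactly one transition on the primary output, and that XOR versus XNOR is chosen so the generator reproduces the original controller's initial output value. That selection rule, the function names, and the list-of-bits interface are assumptions, not the tool's documented behavior.

```python
from functools import reduce
from operator import xor


def make_output_generator(initial_intermediate, initial_primary):
    """Build an XOR- or XNOR-style generator matching the initial primary output."""
    # Use XNOR (inverted parity) when plain XOR would give the wrong initial value.
    invert = reduce(xor, initial_intermediate, 0) != initial_primary

    def generate(intermediate):
        value = reduce(xor, intermediate, 0)
        return value ^ 1 if invert else value

    return generate
```

For example, with all intermediate outputs initially 0 and an initial primary output of 0, the generator behaves as XOR: toggling any single intermediate signal toggles the primary output.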
31
Extended Burst-Mode (XBM) Extension
XBM Background
  More expressive form of BM controller
  Supports 2 new features:
    Directed Don’t Cares (DDCs)
      –Allow concurrent inputs and outputs
    Conditionals
      –Permit level sampling of signals
  XBM can handle glitchy inputs!
Decomposition method can also be applied to XBM
[Figure: XBM state graph with states 0-6 over signals ok, Rin, FAin, FRout, Aout; several arcs use directed don’t cares such as Rin*.]
32
Extended Burst-Mode (XBM) Extension
XBM Decomposition Method
  Graph-based decomposition:
    –Uses same method as for BM
  New: simple post-processing step
    –Remove/modify some XBM signals: locally mimic BM spec
    –Most signals remain unaffected
34
Experimental Results
Automated CAD Tool (bm-decomp)
  Approx. 2100 lines of C code
  Fully automated & implements all optimizations
Benchmarks:
  From wide range of academic & industrial projects
  Cutoff criteria used to focus on larger examples
    –BM examples: 12-71 states & up to 16 inputs/19 outputs
    –XBM examples: 9-28 states & up to 21 inputs/24 outputs
BM Synthesis Flow
  Uses Minimalist CAD framework [Fuhrer/Nowick]
    –Default runs: Used existing speed script up to 10 hours
      »With optimal state assignment
    –If run failed: Used command-line mode
      »With basic critical race free state assignment
XBM Synthesis Flow
  Uses 3D CAD tool [Yun/Dill]
    –Default runs: Used script given with tool
    –If run failed: No backup mode (3D has only one mode)
35
Experimental Results (Burst-Mode)
[Table: burst-mode synthesis results; surviving values: 83.72, 2.58]
Annotations:
  Over 200x runtime improvement
  Over 400x runtime improvement
  Produced simple combinational logic blocks for 7 out of 10 runs
  Less than 1 second to run on all examples
  5 failed to complete optimal script after 10 hrs
  1 benchmark for 1 decomposed controller failed on optimal script
  1 failed manual run
36
Experimental Results (XBM)
[Table: XBM synthesis results]
Annotations:
  In some cases between 4-12x runtime improvement
  No implementation returned
37
Experimental Results: Input Optimizations
Basic Goal: Remove or simplify input latches
Two Techniques
  Complete Input Latch Removal
  Reduction in Strength
Results:
  31%: Unlatched inputs
  44%: 2-input gates
  25%: Latched inputs
38
Experimental Results: Output Optimization
Reduction in Strength
  Basic Idea: XOR/XNOR can always be used
  Goal: Replace with AND/OR or single wire (when possible)
AND, OR, & single wire used 84% of the time
40
Related Work
System-Level Decomposition: asynchronous
  A large system is decomposed into datapath & control
  Handshake circuits synthesis (Berkel92, Bardsley97)
  Quasi-Delay insensitive (QDI) flow (Martin86, 90)
  High-Level synthesis flow (Theobald01)
  Differences:
    –Do not focus on individual controllers
    –Fairly coarse-grained
  High-Level synthesis flow (Kudva96)
    –Control partitioned into sub-controllers
    –Limitations: Specification must follow a strict series-parallel structure
Controller-Based Decomposition: sync and async
  Synchronous:
    Decomposition for low power (Benini98)
      –Differences: Partitions based on computational locality
41
Related Work (continued)
Controller-Based Decomposition
  Asynchronous: QDI Circuits
    Net contraction (Chu87, Yoneda04)
      –Projects a Petri-net specification into smaller controllers
    Source language-level decomposition technique (Kapoor04)
      –Introduce heuristics to resolve state coding conflicts
    Direct mapping approach (Bystrov02)
      –Template-based mapping of places into David Cells
    Differences:
      –Limited structure
      –Alternative methods for decomposing
  Asynchronous: Burst-Mode Controllers (Beister99)
    Output partitioning to translate a Petri-net into XBM controller
      –Variant of net contraction
      –Limitations:
        »A complicated basic method
        »No benchmark results reported
42
Conclusions
Decomposition Approach
  Decomposition technique for BM and XBM controllers
    –Main Idea: Partitions if a sub-region is “closed”
  Inter-controller communication protocol
  Additional hardware
    –Optimizations proposed to remove & reduce hardware
  CAD tool developed
  Significant improvements:
    –16-200x runtime improvement
    –1st time synthesis of several larger examples
43
Any Questions?