A Cycle-Based Decomposition Method for Burst-Mode Asynchronous Controllers Melinda Y. Agyekum Steven M. Nowick Department of Computer Science Columbia.

1 A Cycle-Based Decomposition Method for Burst-Mode Asynchronous Controllers
Melinda Y. Agyekum, Steven M. Nowick
Department of Computer Science, Columbia University
{melinda, nowick}@cs.columbia.edu

2 Research Objective
Transform a Burst-Mode (BM) or Extended Burst-Mode (XBM) asynchronous controller into a set of decomposed controllers.
Decomposed BM & XBM controllers must:
- Collectively maintain the same behavior as the original
- Individually adhere to all BM & XBM controller rules
[Figure: original BM controller beside the decomposed BM controllers]

3 Challenges & Motivation
Decomposition
- Technique used to divide a controller into smaller controllers
- Synchronous decomposition: a large amount of work in this area
Challenges
- Asynchronous decomposition is more challenging than synchronous
  - No regular clock or discrete schedule system
  - Loosely coupled concurrent system
  - Limited work in this area
Motivation
- Our primary goal: improve runtime of CAD tool (esp. for larger controllers)

4 Challenges & Motivation (continued)
Motivation
- Our secondary goals:
  - Reduce next-state complexity
    - Decomposed controllers: much smaller next-state logic
    - Simplifies timing requirements: narrows the BM fundamental-mode timing constraint
  - Potential reduction in power consumption
    - Only a single controller is active at a time
    - Control passed from controller to controller
  - Assists the designer
    - Alleviates manual decomposition
    - Provides a higher level of abstraction: can write a single testbench for the original BM controller and apply it to the set of decomposed controllers

5 Contributions
Novel method for decomposition
- For Burst-Mode: 4 major parts
  - Decomposition algorithm
  - Controller micro-architecture
  - Inter-controller communication protocol
  - Auxiliary hardware, with optimizations to eliminate or simplify hardware
- Method for Extended Burst-Mode (XBM)
CAD tool implementation
- For both BM & XBM

6 Contributions (continued)
Improved synthesis results
- Runtime: 16-200x improvement
- First-time synthesis of some examples, using a burst-mode synthesis tool (Minimalist/3D)
- Combinational blocks of logic for several decomposed controllers

7 Outline
- Background: Review of Burst-Mode Controllers
- Overview of Approach
- Decomposition Method
- Details of Hardware Implementation: Decomposed Controllers, Output Generators
- XBM Extension
- Experimental Results
- Related Work and Conclusions

8 Background: Burst-Mode Specification
[Figure: example burst-mode specification with states 0-5; legend: state, transition arc, and the input and output bursts labeling each arc]
Inputs: Ain, Bin, Cin, Din
Outputs: Yout, Zout
Properties:
1) Non-empty input burst
2) Maximal input set
3) Unique entry point
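
The three properties can be checked mechanically. The sketch below is illustrative only: the `Arc` type, the function names, and the three-state `SPEC` are hypothetical and are not the specification shown on the slide. It assumes the usual readings of the properties: no input burst leaving a state is a subset of another input burst leaving the same state, and every state is always entered with the same signal values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    src: int
    dst: int
    inputs: frozenset   # input burst, e.g. {"Ain+"}
    outputs: frozenset  # output burst, may be empty


def nonempty_input_bursts(arcs):
    """Property 1: every arc carries at least one input transition."""
    return all(arc.inputs for arc in arcs)


def maximal_set(arcs):
    """Property 2: no input burst leaving a state is a subset of another
    input burst leaving the same state."""
    return not any(
        a is not b and a.src == b.src and a.inputs <= b.inputs
        for a in arcs for b in arcs
    )


def apply_arc(values, arc):
    """Return the signal values after the arc's input and output bursts."""
    after = dict(values)
    for edge in arc.inputs | arc.outputs:
        after[edge[:-1]] = 1 if edge.endswith("+") else 0
    return after


def unique_entry_points(arcs, start, start_values):
    """Property 3: each state is always entered with the same signal values."""
    entered = {start: dict(start_values)}
    stack = [start]
    while stack:
        state = stack.pop()
        for arc in (a for a in arcs if a.src == state):
            after = apply_arc(entered[state], arc)
            if arc.dst not in entered:
                entered[arc.dst] = after
                stack.append(arc.dst)
            elif entered[arc.dst] != after:
                return False
    return True


# Hypothetical three-state specification (not the one on the slide).
SPEC = [
    Arc(0, 1, frozenset({"Ain+"}), frozenset({"Zout+"})),
    Arc(1, 2, frozenset({"Bin+"}), frozenset({"Zout-"})),
    Arc(2, 0, frozenset({"Ain-", "Bin-"}), frozenset()),
]
START_VALUES = {"Ain": 0, "Bin": 0, "Zout": 0}

if __name__ == "__main__":
    print(nonempty_input_bursts(SPEC))
    print(maximal_set(SPEC))
    print(unique_entry_points(SPEC, 0, START_VALUES))
```

Adding a second arc out of state 0 with input burst {Ain+, Bin+} would violate the maximal-set check, because {Ain+} is a subset of it.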

9 Background: Burst-Mode Implementation
Burst-Mode controllers
- A Huffman-style asynchronous state machine
- Consists of:
  - Primary inputs
  - Primary outputs
  - Fed-back state
- State is stored in the fed-back loops
[Figure: combinational logic block with primary inputs i1, i2, primary outputs o1, o2, and fed-back state s1, s2 passing through a delay]

10 Background: Burst-Mode Implementation
Two simple one-sided timing constraints
- Fed-back state requirement
  - Fed-back path must be slower than the worst-case forward output path
- Generalized fundamental mode requirement
  - New inputs cannot arrive until the entire machine has stabilized from the previous input burst
  - Hold-time requirement
[Figure: combinational logic block with primary inputs i1, i2, primary outputs o1, o2, and fed-back state s1, s2 passing through a delay]

11 Burst-Mode Applications
BM machines in practice
- Used in a large number of applications
- Fabricated chips for:
  - Hewlett-Packard: Mayfly & Stetson projects
  - NASA Goddard Space Flight Center (2006-present)
    - Uses Minimalist & BM controllers for space instrumentation
    - First fabricated chip has just come back
- Additional substantial real-world applications
  - Cache, diff-eq solver, DRAM & SCSI controllers
- Several of these projects perform manual decomposition for complex specifications

13 Overview of Approach: Decomposition Method Example
[Figure: original monolithic specification (states 0-5) beside the decomposed specifications; each decomposed specification has its own entry point, and a top-level controller is added with child monitoring arcs on the REQ1, REQ2, ACK1a, ACK1b, and ACK2 signals]
Goal: govern inter-controller communication & synchronization on channels

14 Overview of Approach: Partial Micro-Architecture
[Figure: specification view, with implicit connections between the top-level controller and BM Cntrl A (Parent), BM Cntrl B (Leaf), BM Cntrl C (Parent), BM Cntrl D (Leaf), beside the micro-architecture view, in which communication channels are added (Channel 1, Channel 2; signals REQ1, ACK1a, ACK1b, REQ2, ACK2)]

15 Overview of Approach: Top-Level Communication Protocol
[Figure: micro-architecture with the top-level controller, BM Cntrl A (Parent), B (Leaf), C (Parent), D (Leaf), Channel 1 and Channel 2 (signals REQ1, ACK1a, ACK1b, REQ2, ACK2)]
Protocol:
- Top-level controller is active
- The parent broadcasts a REQ to all of its children
- Only a single child responds with an ACK
- Parent de-asserts REQ, passes control to the child, and suspends
- The child is now active and continues
- At some point the child completes and de-asserts ACK
- The parent then becomes active and resumes control
State labels in the figure: Parent Active, Parent sends REQ, Parent Suspends; Child Disabled, Child is polled, Child wins, Child loses, Child Active
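
The protocol steps above can be traced with a small simulation. This is a minimal sketch, not the authors' hardware: the function name `hand_off`, the status strings, and the `winner` argument are hypothetical, and the sketch does not model how the responding child is selected.

```python
def hand_off(children, winner):
    """Trace one parent-to-child hand-off on a REQ/ACK channel.

    `winner` names the single child that answers the request; how the winner
    is chosen is outside this sketch.
    """
    if winner not in children:
        raise ValueError("winner must be one of the children")

    status = {"parent": "active"}
    status.update({child: "disabled" for child in children})
    trace = []

    def log(event):
        trace.append((event, dict(status)))

    # Parent broadcasts REQ; every child is polled.
    req = 1
    for child in children:
        status[child] = "polled"
    log("parent REQ+")

    # Exactly one child acknowledges; the others lose and stay disabled.
    ack = {child: int(child == winner) for child in children}
    for child in children:
        status[child] = "wins" if ack[child] else "disabled"
    log(f"{winner} ACK+")

    # Parent de-asserts REQ, passes control to the child, and suspends.
    req = 0
    status["parent"] = "suspended"
    status[winner] = "active"
    log("parent REQ-")

    # Child completes and de-asserts ACK; parent resumes control.
    ack[winner] = 0
    status[winner] = "disabled"
    status["parent"] = "active"
    log(f"{winner} ACK-")

    assert req == 0 and not any(ack.values())  # channel is back at rest
    return trace


if __name__ == "__main__":
    for event, status in hand_off(["B", "C"], "C"):
        print(event, status)
```

At every point in the trace at most one controller is marked active, which is the property the slides rely on for the potential power reduction.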

16 Complete System Micro-Architecture
[Figure: the primary inputs (ain, bin, cin, din) feed the decomposed controllers (top-level controller; BM Cntrl A (Parent), B (Leaf), C (Parent), D (Leaf); signals REQ1, ACK1a, ACK1b, REQ2, ACK2). The controllers produce intermediate output signals (A_to_zout, B_to_zout, C_to_zout, C_to_yout, D_to_yout), which the primary output generators combine into the primary outputs zout and yout]

18 Decomposition Method: Intuition on Approach
Goal: determine where partitioning is possible
Idea: start at root and traverse graph
- “Region” = self-contained sub-graph
- If region is “closed”
  - Cut & form a new partition
  - Continue hierarchical exploration
- If region is “not closed”
  - Indicates multiple ways to exit region
  - Do not cut & continue exploring hierarchically

19 Decomposition Method: Formal View
Main idea: identify and cut “closed regions”
- Region = self-contained sub-graph
  - Reachable via “ancestor path” = simple path from root to a decision point
  - Region starts at a decision point
  - Includes a given outgoing arc
  - Contains all reachable states & arcs not previously visited
- Closed = only a single point for entry and exit
  - Must enter and exit region through same point!
[Figure: region closed. An ancestor path leads from the start point to a decision point with outgoing arcs; the region's entry and exit are the same point]

20 Decomposition Method: Formal View
Main idea: identify and cut “closed regions”
- Region = self-contained sub-graph
  - Reachable via “ancestor path” = simple path from root to a decision point
  - Region starts at a decision point
  - Includes a given outgoing arc
  - Contains all reachable states & arcs not previously visited
- Closed = only a single point for entry and exit
  - Must enter and exit region through same point!
[Figure: region NOT closed. Labels: start point, ancestor path, two decision points, outgoing arcs, region, entry, exit]
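
The closed-region test can be sketched on a plain successor graph. This is a simplified illustration under stated assumptions, not the formal algorithm from the paper: the graph `GRAPH`, the function `explore_region`, and the rule "closed means every exit from the region returns to the decision point it started from" are this sketch's reading of the definitions above, and nested decision points inside a region are not handled.

```python
def explore_region(graph, ancestor_path, first_state):
    """Collect the region entered from the last state of `ancestor_path`
    (the decision point) via the arc to `first_state`.

    Returns (region, closed). States on the ancestor path count as
    previously visited, so they bound the region.
    """
    decision_point = ancestor_path[-1]
    visited_before = set(ancestor_path)
    region, exits, stack = set(), set(), [first_state]
    while stack:
        state = stack.pop()
        if state in visited_before:
            exits.add(state)      # the region leaves through this state
            continue
        if state in region:
            continue
        region.add(state)
        stack.extend(graph.get(state, []))
    # Closed: the only way out is back to the decision point itself.
    return region, exits == {decision_point}


# Hypothetical graph: state 1 is a decision point with two outgoing arcs.
GRAPH = {0: [1], 1: [2, 3], 2: [1], 3: [4], 4: [0]}

if __name__ == "__main__":
    print(explore_region(GRAPH, [0, 1], 2))  # region {2} returns to state 1
    print(explore_region(GRAPH, [0, 1], 3))  # region {3, 4} exits to state 0
```

In this toy graph the region reached through state 2 would be cut into its own controller, while the region reached through state 3 hits an ancestor other than its decision point and would be left uncut.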

21 Decomposition Method: Example
Example with two decision points
[Figure: graph with states 0-4. Labels: decision point, top-level segment = ancestor path, entry point, exit point, cut-point ("Can we cut here?"), closed region]

22 Decomposition Method: Example
Example with two decision points
[Figure: graph with states 0-4. Labels: decision point, top-level segment = ancestor path, entry point, exit point, cut-point ("Can we cut here?"), closed region. One candidate cut is rejected: "Do not cut! Hit ancestor decision pt."]

23 Decomposition Method: Example
Example with two decision points
[Figure: partitions created from the graph with states 0-4: the top-level controller, the cut partitions with their entry points, and an uncut region]

24 Decomposition Method: Algorithm
Formal algorithm
- Graph-based algorithm performs modified DFS
  - Forward direction
    - Explores reachable regions
    - Only marks decision points (revisits non-decision points)
  - Backward direction
    - Controller strands grown
    - Tests for “closed reachability”
    - When detected, cut strands (= create a new controller)
- Complete details of the formal algorithm are presented in the paper

26 Details of Hardware Imp.: Decomposed Controllers
Primary input latches
- Transparent D-latches
  - Control when primary inputs can be received
- By default all primary inputs are blocked
Input latch enable: controlled by “activation channel”
- Handles two scenarios:
  - “Controller as a child” = activated
  - “Controller as a parent” = activating
[Figure: generic input latch structure. Inside the decomposed controller, a D-latch passes the primary input to the core BM controller as a filtered input; the latch enable is derived from the activation channel (REQ, ACK)]

27 Details of Hardware Imp.: Decomposed Controllers
“Controller as a Child”
- Handles case when controller is activated
[Figure: latch structure for input Ain: a D-latch (input Ain, output Ain_i) with a latch enable unit driven by the activation channel (REQ, ACK). BM specification fragment: entry point, states 1, 2, 3, arcs Ain+ | Zout+ and Bin+ | Zout-]

28 Details of Hardware Imp.: Decomposed Controllers
“Controller as a Parent”
Idea:
- “Parent” gets latch disabled when control is passed to child
- Latch re-enabled when child completes
[Figure: gate-level implementation of the generic input latch structure: a disabling unit and an enabling unit, driven by the parent's REQ/ACK and the child's REQ/ACK, control the D-latch between the primary input and the filtered input]
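
The blocking behavior of a transparent input latch can be illustrated with a behavioral model. This is only a sketch: the class and function names are hypothetical, and the latch enable is supplied directly as an input because the gate-level enabling and disabling units are not reproduced in this transcript.

```python
class TransparentDLatch:
    """Level-sensitive latch: Q follows D while enabled, holds otherwise."""

    def __init__(self, q=0):
        self.q = q

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


def filtered_input(samples):
    """Feed (primary_input, latch_enable) pairs through one latch and
    return the filtered input seen by the core controller."""
    latch = TransparentDLatch()
    return [latch.update(d, enable) for d, enable in samples]


# Hypothetical activity on one primary input of a parent controller.
SAMPLES = [
    (1, 0),  # controller not yet activated: input is blocked
    (1, 1),  # controller activated: input passes through
    (0, 0),  # control passed to a child: latch disabled, old value held
    (0, 1),  # child completed: latch re-enabled, new value is seen
]

if __name__ == "__main__":
    print(filtered_input(SAMPLES))
```

While the latch is disabled, changes on the primary input never reach the core controller, so a suspended parent cannot react to input bursts intended for its active child.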

29 Details of Hardware Imp.: Output Generator
Block view
[Figure: the output logic of decomposed BM controllers A, B, and D produces CntrlA_To_Output, CntrlB_To_Output, and CntrlD_To_Output, which feed the primary output generator that drives the primary output]
- The primary output generator can be XOR, XNOR, AND, OR, or a single wire
- The output generator is determined by the initial output value of the decomposed controllers and the original BM controller
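
A small model shows how the gate choice can matter. It is a sketch under assumptions that are not taken from the slides: each primary-output transition is caused by exactly one intermediate signal toggling, and the function names and signal traces are hypothetical. Under that assumption XOR always toggles the output, while OR matches it only when the inactive controllers rest at 0.

```python
from functools import reduce
from operator import xor


def xor_generator(intermediate):
    """Output toggles whenever exactly one intermediate signal toggles."""
    return reduce(xor, intermediate, 0)


def or_generator(intermediate):
    """Cheaper gate; valid only if inactive controllers rest at 0."""
    return int(any(intermediate))


# Hypothetical traces of (A_to_out, B_to_out, C_to_out), one change per step.
RETURN_TO_ZERO = [[0, 0, 0], [1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
HOLD_VALUE = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]

if __name__ == "__main__":
    for name, trace in (("return-to-zero", RETURN_TO_ZERO), ("hold", HOLD_VALUE)):
        print(name,
              [xor_generator(step) for step in trace],
              [or_generator(step) for step in trace])
```

In the first trace both gates produce the same output, so the simpler OR gate would suffice; in the second trace only XOR keeps toggling the output.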

31 Extended Burst-Mode (XBM) Extension
XBM background
- More expressive form of BM controller
- Supports 2 new features:
  - Directed Don’t Cares (DDCs): allow concurrent inputs and outputs
  - Conditionals: permit level sampling of signals
- XBM can handle glitchy inputs!
- Decomposition method can also be applied to XBM
[Figure: example XBM specification with states 0-6 over the signals ok, Rin, FAin, FRout, and Aout; several arcs carry directed don't-cares (Rin*)]

32 Extended Burst-Mode (XBM) Extension
XBM decomposition method
- Graph-based decomposition: uses same method as for BM
- New: simple post-processing step
  - Remove/modify some XBM signals: locally mimic BM spec
  - Most signals remain unaffected

34 Experimental Results
Automated CAD tool (bm-decomp)
- Approx. 2100 lines of C code
- Fully automated & implements all optimizations
Benchmarks: from a wide range of academic & industrial projects
- Cutoff criteria used to focus on larger examples
  - BM examples: 12-71 states & up to 16 inputs / 19 outputs
  - XBM examples: 9-28 states & up to 21 inputs / 24 outputs
BM synthesis flow
- Uses Minimalist CAD framework [Fuhrer/Nowick]
  - Default runs: used existing speed script (up to 10 hours), with optimal state assignment
  - If run failed: used command-line mode, with basic critical-race-free state assignment
XBM synthesis flow
- Uses 3D CAD tool [Yun/Dill]
  - Default runs: used script given with tool
  - If run failed: no backup mode (3D has only one mode)

35 Experimental Results (Burst-Mode)
[Table: burst-mode synthesis results; only two values (83.72, 2.58) and the annotations below survive in the transcript]
- Over 200x runtime improvement
- Over 400x runtime improvement
- Produced simple combinational logic blocks for 7 out of 10 runs
- Less than 1 second to run on all examples
- 5 failed to complete optimal script after 10 hrs
- 1 benchmark for 1 decomposed controller failed on optimal script
- 1 failed manual run

36 Experimental Results (XBM)
[Table: XBM synthesis results; only the annotations below survive in the transcript]
- In some cases between 4-12x runtime improvement
- No implementation returned

37 Experimental Results: Input Optimizations
Basic goal: remove or simplify input latches
Two techniques
- Complete input latch removal
- Reduction in strength
Results: 31% unlatched inputs, 44% 2-input gates, 25% latched inputs

38 Experimental Results: Output Optimization
Reduction in strength
- Basic idea: XOR/XNOR can always be used
- Goal: replace with AND/OR or single wire (when possible)
- AND, OR, & single wire used 84% of the time

40 Related Work
System-level decomposition: asynchronous
A large system is decomposed into datapath & control
- Handshake circuits synthesis (Berkel92, Bardsley97)
- Quasi-delay-insensitive (QDI) flow (Martin86, 90)
- High-level synthesis flow (Theobald01)
- Differences:
  - Do not focus on individual controllers
  - Fairly coarse-grained
- High-level synthesis flow (Kudva96)
  - Control partitioned into sub-controllers
  - Limitations: specification must follow a strict series-parallel structure
Controller-based decomposition: sync and async
- Synchronous:
  - Decomposition for low power (Benini98)
    - Differences: partitions based on computational locality

41 Related Work (continued)
Controller-based decomposition
- Asynchronous: QDI circuits
  - Net contraction (Chu87, Yoneda04)
    - Projects a Petri-net specification into smaller controllers
  - Source language-level decomposition technique (Kapoor04)
    - Introduces heuristics to resolve state coding conflicts
  - Direct mapping approach (Bystrov02)
    - Template-based mapping of places into David cells
  - Differences:
    - Limited structure
    - Alternative methods for decomposing
- Asynchronous: Burst-Mode controllers (Beister99)
  - Output partitioning to translate a Petri-net into XBM controller
    - Variant of net contraction
    - Limitations: a complicated basic method; no benchmark results reported

42 Conclusions
Decomposition approach
- Decomposition technique for BM and XBM controllers
  - Main idea: partition if a sub-region is “closed”
- Inter-controller communication protocol
- Additional hardware
  - Optimizations proposed to remove & reduce hardware
- CAD tool developed
- Significant improvements:
  - 16-200x runtime improvement
  - First-time synthesis of several larger examples

43 Any Questions?

