

Lab for Reliable Computing
Generalized Latency-Insensitive Systems for Single-Clock and Multi-Clock Architectures
Singh, M.; Theobald, M.; Design, Automation and Test in Europe Conference and Exhibition, 2004.

Lab for Reliable Computing, 2004/4 Kun-Sheng Huang

Reference
- A methodology for correct-by-construction latency insensitive design. Carloni, L.P.; McMillan, K.L.; Saldanha, A.; Sangiovanni-Vincentelli, A.L.; Computer-Aided Design, Digest of Technical Papers IEEE/ACM International Conference

Outline
- Introduction
- Latency-Insensitive Systems
- New Approach – Part I: More Flexible Synchronous Modules
- New Approach – Part II: Arbitrary Communication Network Topologies
- Conclusions

Introduction
- Latency-insensitive systems were originally proposed for the design of single-clock SoCs
- A synchronous module is said to be latency-insensitive if it can operate correctly in the presence of arbitrary delays on its input and output channels
- Two limitations of the original research:
  1. It assumes that the data rates on all input and output channels of a synchronous module are identical
  2. It only considers point-to-point interconnects

Latency-Insensitive Systems (1/4)
- Using clock gating to stall a module whenever any of its communication channels is unavailable
- Encapsulating the synchronous modules inside specially designed "wrapper" circuits
- As a result of this encapsulation, the synchronous blocks become more modular, thereby facilitating design reuse
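
The stall rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the wrapper circuit from the paper: the names `clock_enable` and `count_ticks` are hypothetical, and modeling each channel as a single boolean "available" flag per cycle is an assumption made for the example.

```python
def clock_enable(inputs_valid, outputs_ready):
    """Basic rule: the module's clock ticks only when every channel is available.

    inputs_valid  -- one boolean per input channel (data is present)
    outputs_ready -- one boolean per output channel (receiver can accept data)
    """
    return all(inputs_valid) and all(outputs_ready)


def count_ticks(trace):
    """Count the cycles in which the gated clock actually ticks."""
    return sum(1 for inputs_valid, outputs_ready in trace
               if clock_enable(inputs_valid, outputs_ready))


# Hypothetical four-cycle trace for a module with two inputs and one output.
trace = [
    ([True, True], [True]),    # all channels available: module runs
    ([True, False], [True]),   # second input missing: module stalls
    ([True, True], [False]),   # output blocked: module stalls
    ([True, True], [True]),    # module runs again
]

print(count_ticks(trace))  # 2
```

In this model a single unavailable channel stalls the whole module, which is the behavior the later slides set out to relax.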

Latency-Insensitive Systems (2/4)
Figure 1. Carloni et al.'s approach to latency-insensitive design

Latency-Insensitive Systems (3/4)
- Communication between the modules is achieved using point-to-point channels
- The complete design flow consists of four basic steps:
  1. Specification of synchronous components
  2. Encapsulation
  3. Physical layout, placement and routing
  4. Relay station insertion

Latency-Insensitive Systems (4/4)

More Flexible Synchronous Modules
- Carloni et al.'s approach uses a simplifying assumption: every input/output channel is exercised by a module on every clock tick
- This may cause a significant loss of throughput by generating more stalls than necessary

Example (1/2)

Example (2/2)
- Carloni et al.'s approach can be made to work in this scenario provided M1 sends nine "garbage" data values to M2
- This approach may introduce additional critical paths into the system, thereby potentially causing loss of performance
- Transmitting unnecessary data values wastes power
- The new approach applies to the stall generation circuitry inside the wrapper circuit

Generalized Latency-Insensitive Modules (1/3)
Simple combinational gate

Generalized Latency-Insensitive Modules (2/3)
More sophisticated finite-state machine (FSM)

Generalized Latency-Insensitive Modules (3/3)
- The generalization presented here has two key benefits:
  1. A significant reduction in unnecessary stalls may be obtained, since stalls are no longer caused by the unavailability of those channels that are not currently needed
  2. Modules that are not currently producing needed outputs can be safely stalled, without fear of stalling their neighbors; as a result, significant savings in power consumption may be obtained

Wrapper Specification and Synthesis (1/2)

Wrapper Specification and Synthesis (2/2)
- There are two interesting features of Figure 5:
  1. The machine's output g is latched by a register on the negative clock edge before being used to gate the module's clock
  2. The register that stores the state bits is controlled by gclock, not by the original clock (clock)

Example synchronous module
(b) State-transition labels:
  - xx/1: the module's clock is enabled
  - */0: represents the remaining conditions, i.e., when the module is stalled
(c) Logic equations:
  - g = y'ac + ybc
  - Y = y'ac
  - S0: y = 0; S1: y = 1
  - g: the FSM output; Y: next-state value
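
The equations in (c) can be exercised with a small Python sketch. It rests on assumptions that the transcript does not confirm: a, b and c are read here as per-cycle availability flags for three channels, and the function names `fsm_outputs` and `run_wrapper` are hypothetical. The state update is applied only when g is 1, which mirrors the earlier remark that the state register is clocked by gclock rather than by the original clock.

```python
def fsm_outputs(y, a, b, c):
    """Evaluate g = y'ac + ybc and Y = y'ac for the current state bit y."""
    g = (not y and a and c) or (y and b and c)
    next_y = (not y) and a and c
    return bool(g), bool(next_y)


def run_wrapper(availability, y=False):
    """Step the stall FSM over a list of (a, b, c) tuples.

    Returns one (state, g) pair per cycle, with state 0 for S0 and 1 for S1.
    """
    trace = []
    for a, b, c in availability:
        g, next_y = fsm_outputs(y, a, b, c)
        trace.append((int(y), int(g)))
        if g:  # state register is clocked by gclock, so it holds while stalled
            y = next_y
    return trace


# Hypothetical availability pattern, one (a, b, c) tuple per clock cycle.
pattern = [(1, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 0)]
print(run_wrapper(pattern))  # [(0, 1), (1, 0), (1, 1), (0, 0)]
```

Under this reading, the module in S0 waits only for a and c, and in S1 only for b and c, so an unavailable channel that is not needed in the current state does not cause a stall. If the state register were driven by the ungated clock, Y = y'ac would pull the machine back to S0 during a stall in S1.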

Arbitrary Communication Network Topologies
- The basic approach to latency-insensitive design assumes that all channels in the system are point-to-point channels
- The new approach augments the basic approach with arbitrary communication network topologies

Example
- The actual throughput obtained may even be less than the rate of the slowest module because, in addition, the slowest module may be stalled at times

Generalized Communication Network (1/3)
- Using a number of specialized blocks to implement the communication network
- Specialized blocks include:
  (i) forks, which replicate one input data stream onto multiple output channels
  (ii) splits, which distribute data from one input channel onto multiple output channels
  (iii) merges, which combine (i.e., interleave) multiple input data streams onto one output channel
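
The three block types can be illustrated on plain Python lists. This is a behavioral sketch only: the function names are hypothetical, and the round-robin policy used for `split` and `merge` is an assumption, since the slide does not say how data is distributed or interleaved.

```python
from itertools import zip_longest


def fork(stream, n):
    """Replicate one input stream onto n output channels."""
    items = list(stream)
    return [list(items) for _ in range(n)]


def split(stream, n):
    """Distribute items of one input stream across n outputs (round-robin, assumed)."""
    outputs = [[] for _ in range(n)]
    for i, item in enumerate(stream):
        outputs[i % n].append(item)
    return outputs


def merge(*streams):
    """Interleave several input streams onto one output (round-robin, assumed)."""
    missing = object()  # sentinel for streams that have run out of data
    out = []
    for group in zip_longest(*streams, fillvalue=missing):
        out.extend(x for x in group if x is not missing)
    return out


print(fork([1, 2, 3], 2))          # [[1, 2, 3], [1, 2, 3]]
print(split([1, 2, 3, 4, 5], 2))   # [[1, 3, 5], [2, 4]]
print(merge([1, 3, 5], [2, 4]))    # [1, 2, 3, 4, 5]
```

With these policies, `merge` undoes `split`, which makes the difference from `fork` visible: a fork copies every item to every output, while a split sends each item to exactly one output.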

Generalized Communication Network (2/3)
- Three steps:
  1. Specify the communication network topology, either using the specialized blocks, or using a high-level CSP-like language such as Tangram or Balsa
  2. Choose between a synchronous and an asynchronous implementation; if synchronous, implement as stallable finite-state machines; if asynchronous, implement using predesigned handshake circuits available in Tangram and Balsa
  3. Identify wires with long latencies. Segment them, and insert relay stations (synchronous) or FIFO handshake cells (asynchronous)
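
Step 3 can be given a rough numerical feel with the sketch below. The rule it uses is an assumption, not something stated in the slides: a wire is cut into the smallest number of segments such that each segment's delay fits within one clock period, and one relay station sits between consecutive segments. The function name `relay_stations_needed` is hypothetical, and the model ignores setup margins and the delay of the relay stations themselves.

```python
import math


def relay_stations_needed(wire_delay, clock_period):
    """Estimate the relay stations for one wire under a one-period-per-segment rule.

    wire_delay and clock_period must use the same time unit (e.g. nanoseconds).
    """
    if wire_delay <= 0 or clock_period <= 0:
        raise ValueError("delays must be positive")
    segments = math.ceil(wire_delay / clock_period)
    return segments - 1  # one relay station between consecutive segments


print(relay_stations_needed(0.8, 1.0))  # 0: the wire already fits in one cycle
print(relay_stations_needed(2.5, 1.0))  # 2: three segments, two relay stations
```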

Generalized Communication Network (3/3)
- The net impact of the proposed generalization of the communication network is two-fold:
  1. A significantly greater degree of expressivity is offered for the specification of inter-module communication
  2. The designer is offered much greater freedom to "mix-'n-match" modules of different speeds and different types of interfaces

Conclusions
- The first extension allows much greater flexibility in interfacing a synchronous module with its I/O channels, thereby allowing higher system throughput through elimination of unnecessary stalls
- The second extension proposes more general communication network topologies than the currently popular point-to-point interconnects
- The third extension allows the handling of multiple clock domains

Relay station