Performance-oriented Peephole Optimisation of Balsa Dual-Rail Circuits

Luis Tarazona and Doug Edwards
Advanced Processor Technologies Group, School of Computer Science

Syntax-directed compilation
- Used in Tangram and Balsa
- One-to-one mapping of each language construct into a network of handshake components (HCs)
- Benefit: transparency and flexibility for the designer
- Drawback: medium-to-low performance
- Solutions proposed so far: control resynthesis and peephole optimisation
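The one-to-one mapping can be illustrated with a toy translator. This is a sketch only. The AST classes, the tuple-based netlist and the `Action` placeholder are hypothetical. The control components are named after the corresponding Balsa handshake components (Sequence, Concur, Loop). Each construct produces exactly one component, so the netlist mirrors the source structure. This is what makes the flow transparent. It also leaves performance on the table, because no optimisation happens across construct boundaries.

```python
from dataclasses import dataclass, field
from itertools import count


# Toy command AST: a tiny Balsa-like subset (hypothetical, for illustration).
@dataclass
class Action:
    name: str  # an atomic command, e.g. a transfer; left opaque here


@dataclass
class Seq:
    left: object
    right: object


@dataclass
class Par:
    left: object
    right: object


@dataclass
class Loop:
    body: object


@dataclass
class Netlist:
    # Each entry: (component kind, passive activation channel, active channels)
    components: list = field(default_factory=list)
    _ids: count = field(default_factory=count)

    def channel(self) -> int:
        """Allocate a fresh channel number."""
        return next(self._ids)


def compile_cmd(cmd, activation: int, net: Netlist) -> None:
    """Syntax-directed translation: each construct maps to exactly one
    handshake component, activated on `activation`, whose active ports
    activate the translations of its sub-commands."""
    if isinstance(cmd, Action):
        net.components.append(("Action:" + cmd.name, activation, ()))
    elif isinstance(cmd, (Seq, Par)):
        kind = "Sequence" if isinstance(cmd, Seq) else "Concur"
        a, b = net.channel(), net.channel()
        net.components.append((kind, activation, (a, b)))
        compile_cmd(cmd.left, a, net)
        compile_cmd(cmd.right, b, net)
    elif isinstance(cmd, Loop):
        body = net.channel()
        net.components.append(("Loop", activation, (body,)))
        compile_cmd(cmd.body, body, net)
    else:
        raise TypeError(f"unknown construct: {cmd!r}")
```

For example, `loop x ; (y || z) end` compiles to Loop, Sequence, Action:x, Concur, Action:y and Action:z, one component per construct in source order.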

Related work
- The Tangram and Balsa compilers perform some peephole optimisations as a post-processing step
- T. Chelcea has proposed resynthesis and peephole optimisations for Balsa, targeting a burst-mode back-end
- Plana et al. have proposed some optimised HCs for Balsa, targeting single-rail and dual-rail back-ends
- This work focuses mainly on dual-rail back-ends, because of their potential immunity to process variability

The optimisations
- Eliminating redundant FalseVariable (FV) components
- New concurrent return-to-zero (RTZ) Fetch component
- Conditional parallel/sequencer component: ParSeq

Eliminating redundant FVs
- Pattern: i -> then CMD end
- Targets active input control
- Applies to single-access, single-read-port FalseVariable or eagerFalseVariable HCs

Eliminating redundant FVs: example
    a, b -> then o <- a + b end
- Reduces latency and area
- Preserves external behaviour
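The matching half of this peephole pass can be sketched over a toy netlist. This is a hypothetical representation, not the Balsa tools' data structures. The pattern follows the slide: a FalseVariable or eagerFalseVariable with a single read port that is accessed once. The slides give the optimised replacement circuit only as a figure. Here it is therefore stood in for by a placeholder kind, `DirectControl`, which is an assumption. The sketch does not describe the real circuit.

```python
from dataclasses import dataclass, replace

# Component kinds matched by the pattern (spelled as on the slide).
FV_KINDS = {"FalseVariable", "eagerFalseVariable"}


@dataclass(frozen=True)
class HC:
    """A handshake component in a toy netlist (hypothetical representation)."""
    ident: int
    kind: str
    read_ports: int = 0  # number of read ports (variables only)
    accesses: int = 0    # number of reads issued by the enclosed command


def is_redundant_fv(hc: HC) -> bool:
    # Single-access, single-read-port (eager)FalseVariable.
    return hc.kind in FV_KINDS and hc.read_ports == 1 and hc.accesses == 1


def peephole(netlist: list[HC]) -> tuple[list[HC], int]:
    """One pass over the netlist. Each matching FV is replaced by a
    placeholder 'DirectControl' component (hypothetical stand-in for the
    optimised circuit). Returns the new netlist and the rewrite count."""
    out, rewrites = [], 0
    for hc in netlist:
        if is_redundant_fv(hc):
            out.append(replace(hc, kind="DirectControl", read_ports=0, accesses=0))
            rewrites += 1
        else:
            out.append(hc)  # anything else passes through unchanged
    return out, rewrites
```

In the `a, b -> then o <- a + b end` example, the operands `a` and `b` are each presumably held by an FV read once by the adder, so both would match. An FV whose data is read twice inside the enclosure would be left alone.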

Concurrent RTZ Fetch
- Wires-only dual-rail Fetch and its STG (signal transition graph)

New concurrent RTZ Fetch and its STG
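Both Fetch variants operate on dual-rail data. Each bit uses two wires. A codeword is valid when exactly one wire per bit is high. In the return-to-zero (RTZ) phase, all wires fall back to the all-zero spacer before the next value. The sketch below shows only this encoding and the completion checks it enables. The slides give the Fetch components' internal sequencing only as STG figures, so it is not modelled here. The function names are illustrative.

```python
from typing import Optional, Sequence, Tuple

Rail = Tuple[int, int]  # (true_rail, false_rail) for one bit
SPACER: Rail = (0, 0)   # RTZ state: both rails low


def encode(value: int, width: int) -> list[Rail]:
    """Dual-rail code, LSB first: bit 1 -> (1, 0), bit 0 -> (0, 1)."""
    return [((value >> i) & 1, 1 - ((value >> i) & 1)) for i in range(width)]


def is_valid(word: Sequence[Rail]) -> bool:
    # Completion detection: every bit has exactly one rail high
    # ((1, 1) is illegal and (0, 0) means the bit has not arrived yet).
    return all(t + f == 1 for t, f in word)


def is_spacer(word: Sequence[Rail]) -> bool:
    # The return-to-zero phase is complete when every rail is low again.
    return all(bit == SPACER for bit in word)


def decode(word: Sequence[Rail]) -> Optional[int]:
    """Return the value, or None while the codeword is incomplete."""
    if not is_valid(word):
        return None
    return sum(t << i for i, (t, _f) in enumerate(word))
```

Because validity is detected from the data itself, a wires-only Fetch can pass the codeword straight through. The receiver still has to see every bit both arrive and return to zero, so the RTZ phase lies on the cycle time of each transfer.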

The ParSeq
- Acts conditionally as a Concur (parallel) or as a Sequencer HC
- Few opportunities to apply it in the available design examples
- Perhaps because it did not exist when those designs were written?
- An interesting increase in performance, though
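The performance argument can be made concrete with a simple latency model. This is a behavioural sketch under stated assumptions, not the circuit. A Concur finishes when its slowest branch finishes, and a Sequencer takes the sum of its branches. The run-time condition (the hypothetical flag `parallel`) picks one of the two. The fixed `overhead` for evaluating the condition is also an assumption.

```python
from typing import Sequence


def concur_latency(durations: Sequence[float]) -> float:
    # Concur: all branches activated together; done when the slowest finishes.
    return max(durations, default=0.0)


def sequencer_latency(durations: Sequence[float]) -> float:
    # Sequencer: branches activated one after another.
    return sum(durations)


def parseq_latency(parallel: bool, durations: Sequence[float],
                   overhead: float = 0.0) -> float:
    """Behavioural sketch of a ParSeq: the condition `parallel` (hypothetical
    name) selects Concur-like or Sequencer-like behaviour for this activation.
    `overhead` is an assumed fixed cost of evaluating the condition."""
    core = concur_latency(durations) if parallel else sequencer_latency(durations)
    return overhead + core
```

With branch durations of 3 and 5 units, the ParSeq costs 5 when the condition allows parallel execution and 8 when it does not. A design that can only sometimes run the branches concurrently would otherwise need a plain Sequencer and pay 8 every time.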

Handshake circuit implementation

Optimised ParSeq Schematics

Simulation Results
- Pre-layout, transistor-level simulations, 180 nm technology

Conclusions and Future Work
- Incorporate the optimisations into the Balsa design flow
- ParSeq: as a language construct or as a peephole optimisation?
- Evaluate other peephole and HC optimisations currently under study

Thank you very much! Questions?

Acknowledgement: Thanks to Luis Plana, Andrew, Charlie and Will for their suggestions and comments. This work and the PhD are supported by EPSRC and UoM School of Computer Science scholarships.