ROUTING ARCHITECTURE AND ALGORITHMS FOR A SUPERCONDUCTIVITY CIRCUITS-BASED COMPUTING HARDWARE Farhad Mehdipour, Hiroaki Honda, Hiroshi Kataoka, Koji Inoue,


ROUTING ARCHITECTURE AND ALGORITHMS FOR A SUPERCONDUCTIVITY CIRCUITS-BASED COMPUTING HARDWARE
Farhad Mehdipour, Hiroaki Honda, Hiroshi Kataoka, Koji Inoue, Kazuaki Murakami
Kyushu University, Japan
CCECE 2011

CREST-JST project (2006~): low-power, high-performance, reconfigurable processor using single-flux quantum (SFQ) circuits (SFQ-LSRDP)
- Kyushu Univ. (K. Murakami, K. Inoue, H. Honda, F. Mehdipour, H. Kataoka): architecture, compiler and applications
- Superconducting Research Lab. (SRL) (S. Nagasawa et al.): SFQ process
- Yokohama National Univ. (N. Yoshikawa et al.): SFQ-FPU chip, cell library
- Nagoya Univ. (A. Fujimaki et al.): SFQ-RDP chip, cell library, and wiring
- Nagoya Univ. (N. Takagi (Leader) et al.): CAD for logic design and arithmetic circuits
Our mission: architecture, compiler and application development

Outline of Large-Scale Reconfigurable Data-Path (LSRDP) Processor
SFQ features:
- High-speed switching and signal transmission
- Low power consumption
- Compact implementation (smaller area)
- Suitable for pipeline processing

How it works
The GPP configures the LSRDP once and then, in each loop iteration, prepares the data, launches the LSRDP and synchronizes with it:

    inst;
    …
    conf_LSRDP();                 // load the configuration bit-stream into the LSRDP
    Loop:
        rearrange_input_data();   // GPP rearranges the input data in memory
        set_IO_info();            // pass I/O information to the memory controller
        run_LSRDP();              // start the LSRDP
        inst;                     // the GPP keeps executing instructions
        …
        sync_lsrdp();             // GPP waits until the LSRDP terminates the operation
        rearrange_output_data();  // GPP rearranges the output data
    End_Loop
    inst;
    …

Hardware involved: GPP, memory, memory controller, buffers, and the LSRDP.
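
The key point of this flow is that the GPP keeps working between run_LSRDP() and sync_lsrdp(). The Python sketch below mimics that pattern with a thread standing in for the accelerator. MockLSRDP, host_loop and the toy "bit-stream" (a function applied to each datum) are hypothetical illustrations, not the project's actual API, and the rearrangement and set_IO_info() steps are folded away.

```python
import threading
import time


class MockLSRDP:
    """Hypothetical stand-in for the LSRDP; methods mirror the slide's calls."""

    def __init__(self):
        self.config = None
        self.output = None
        self._done = threading.Event()

    def conf(self, bitstream):
        # conf_LSRDP(): here the "bit-stream" is just a Python function
        self.config = bitstream

    def run(self, data):
        # run_LSRDP(): start the datapath and return immediately
        self._done.clear()

        def work():
            time.sleep(0.01)  # pretend pipeline latency
            self.output = [self.config(x) for x in data]
            self._done.set()

        threading.Thread(target=work).start()

    def sync(self):
        # sync_lsrdp(): block the GPP until the LSRDP terminates the operation
        self._done.wait()
        return self.output


def host_loop(blocks, acc):
    acc.conf(lambda x: 2 * x + 1)  # configure once, before the loop
    outputs, gpp_side = [], []
    for block in blocks:
        acc.run(list(block))            # rearrange_input_data()/set_IO_info() omitted
        gpp_side.append(sum(block))     # independent GPP work overlaps the LSRDP run
        outputs.extend(acc.sync())      # wait, then collect the results
    return outputs, gpp_side


outs, side = host_loop([[1, 2], [3]], MockLSRDP())
print(outs, side)  # [3, 5, 7] [3, 3]
```

Configuring once outside the loop reflects the slide's placement of conf_LSRDP(); only data movement and synchronization repeat per iteration.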

Architecture Exploration
PE structures (each PE contains an FU and a TU):
- Basic PE arch.: 3 inputs / 2 outputs
- PE arch. I: 4 inputs / 3 outputs
- PE arch. II: 3 inputs / 3 outputs
LSRDP layouts (illustrated for MCL = 1 and MCL = 2):
- Number of rows = 1.5×M, number of columns = 4×MCL
- Number of rows = 2×M, number of columns = 6×MCL+2
- Number of rows = 1.5×M, number of columns = 4×MCL+1
ORN structures
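
The layout formulas can be evaluated directly to compare grid sizes. The sketch below treats M and MCL as given inputs (the slide does not define them further) and uses hypothetical labels A, B and C for the three formulas in the order listed; it assumes M is chosen so that 1.5×M is a whole number of rows.

```python
from fractions import Fraction

# Hypothetical labels A, B, C for the three layout formulas, in slide order:
# (rows per unit of M, columns as a function of MCL).
LAYOUTS = {
    "A": (Fraction(3, 2), lambda mcl: 4 * mcl),
    "B": (Fraction(2), lambda mcl: 6 * mcl + 2),
    "C": (Fraction(3, 2), lambda mcl: 4 * mcl + 1),
}


def layout_size(layout, m, mcl):
    """Return (rows, columns) of the LSRDP grid for the given layout."""
    row_factor, columns = LAYOUTS[layout]
    rows = row_factor * m
    if rows.denominator != 1:
        raise ValueError("rows = 1.5*M must be an integer; choose an even M")
    return int(rows), columns(mcl)


for name in LAYOUTS:
    print(name, layout_size(name, 8, 1))
```

Using Fraction keeps the 1.5×M factor exact, so an odd M is reported as an error rather than silently rounded.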

LSRDP Tool Chain
Flow 1, assembly code generation for the GPP:
- Application C code → modifying the application code (inserting LSRDP instructions) → modified application code
- Modified application code + LSRDP library file (function definitions and declarations) → ISAcc or COINS compiler → .asm code for the MIPS-based GPP
Flow 2, configuration bit-stream generation for the LSRDP:
- Modified application code → DFG extraction → data flow graphs
- Data flow graphs + LSRDP architecture description → placing and routing tool → configuration file + various text and schematic reports
Simulator: performance evaluation

Mapping DFGs onto LSRDP
Inputs: DFG and LSRDP architecture description (the figure also highlights the longest connections)
1. Placing input nodes
2. Placing operational and output nodes
3. Routing nets
4. Routing IO nets
Output: final map
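
The slide names the placement stages but not the placer itself. As a hypothetical illustration of one common first step, the sketch below levelizes a DFG ASAP (longest path from the input nodes), which is one way to assign nodes to rows on a row-to-row datapath, and flags edges that skip rows as long connections. This is a generic technique, not necessarily the authors' algorithm; asap_rows and long_connections are illustrative names.

```python
from collections import defaultdict, deque


def asap_rows(nodes, edges):
    """Assign each node the length of the longest path reaching it from an input."""
    succ = defaultdict(list)
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    row = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)  # input nodes start in row 0
    seen = 0
    while queue:  # Kahn's topological order: a node is final once all preds are done
        u = queue.popleft()
        seen += 1
        for v in succ[u]:
            row[v] = max(row[v], row[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if seen != len(nodes):
        raise ValueError("DFG contains a cycle")
    return row


def long_connections(edges, row):
    """Edges spanning more than one row need extra routing resources."""
    return [(u, v) for u, v in edges if row[v] - row[u] > 1]


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d"), ("d", "e")]
rows = asap_rows(nodes, edges)
print(rows, long_connections(edges, rows))
```

Here the edge a→d jumps from row 0 to row 2, so it is the one connection that must pass through an intermediate row.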

Global Routing Algorithms
Routing DFG connections between source and destination PEs (figure legend: src, dest, vacant, fully-occupied):
- Exhaustive search-based algorithm: very time consuming
- Branch and bound algorithm: very fast
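
To show why branch and bound is so much faster than exhaustive search, here is a minimal sketch on an assumed model: PEs are cells of a 2-D grid, each either vacant or fully occupied, and a route may pass only through vacant cells (plus its destination) using 4-neighbour steps. The grid model and the bb_route function are hypothetical, not the authors' router. A partial route is pruned as soon as its cost plus the Manhattan distance to the destination cannot beat the best route found so far; since that distance never overestimates, pruning never discards a shorter route.

```python
def bb_route(grid, src, dst):
    """Shortest 4-neighbour route; grid[r][c] is True if the PE is fully occupied."""
    rows, cols = len(grid), len(grid[0])
    best_path, best_cost = None, float("inf")
    expanded = 0

    def bound(cell):
        # Manhattan distance never overestimates the remaining hops
        return abs(cell[0] - dst[0]) + abs(cell[1] - dst[1])

    def dfs(cell, path, on_path):
        nonlocal best_path, best_cost, expanded
        expanded += 1
        cost = len(path) - 1
        if cost + bound(cell) >= best_cost:
            return  # prune: this branch cannot beat the best route so far
        if cell == dst:
            best_path, best_cost = list(path), cost
            return
        r, c = cell
        # branch toward the destination first to tighten the bound early
        for nxt in sorted([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)], key=bound):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (not grid[nr][nc] or nxt == dst) and nxt not in on_path:
                path.append(nxt)
                on_path.add(nxt)
                dfs(nxt, path, on_path)
                path.pop()
                on_path.remove(nxt)

    dfs(src, [src], {src})
    return best_path, expanded


grid = [
    [False, True, False],
    [False, True, False],
    [False, False, False],
]
path, n = bb_route(grid, (0, 0), (0, 2))
print(path, n)
```

The search returns the detour around the occupied column; `expanded` counts visited search nodes, which is the quantity the pruning keeps small.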

Micro-Routing Problem Definition
Inputs:
- LSRDP basic specifications: layout, width (W), MCL, PE architecture, etc.
- List of connections between consecutive rows
- ORN structure, including the number of CBs and T2s in each row, the number of CB rows, and the topology of connections among CBs
Output:
- Detailed routes via cross-bar switches: the list of CBs used for routing each connection, and the configuration of the CBs
(Figure: the ORN between the i-th row and the (i+1)-th row.)
A micro-routing algorithm has been implemented for the LSRDP with underlying layout II and PE arch. III.

ORN Micro-routing CB ½ CB (PE1  PE 5) (PE2  PE5, PE6, PE7) (PE3  PE6, PE8 ) (PE4  PE7, PE8) (PE1  PE 5) (PE2  PE5, PE6, PE7) (PE3  PE6, PE8 ) (PE4  PE7, PE8) 1/2CB: 1-input/2-ouput CB: 2-input/2-output Micro-nets Example 10 PE 1 PE 2 PE 3 PE 5 PE 6 PE 7 PE 4 PE 8 ½ CB CB (CB) CB

ORN Micro-Routing Example: Heat-8x2, ORN between the 3rd and 4th rows
(Figure: PEs in the 3rd row connected through the ORN to PEs in the 4th row.)

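Specifications of Attempted DFGs
DFG | Total # of nodes | # of inputs | # of outputs | # of ops
Heat-8x | … | … | … | …
Heat-8x | … | … | … | …
Heat-16x | … | … | … | …
Poisson-3x | … | … | … | …
Vibration-4x | … | … | … | …
Vibration-8x | … | … | … | …
Vibration-8x | … | … | … | …
ERI | … | … | … | …
ERI | … | … | … | …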

Example of a DFG Mapping: Vibration-8x2

Results of Routing Nets Using the Proposed Algorithms
DFG | Avg. horizontal C.L. | Avg./max. vertical C.L. | # of global/micro nets to route | Time to map (sec)
Heat-8x | … | …/3 | 36/… | …
Heat-8x | … | …/5 | 68/… | …
Heat-16x | … | …/7 | 204/… | …
Poisson-3x | … | …/16 | 67/… | …
Vibration-4x | … | …/9 | 50/… | …
Vibration-8x | … | …/10 | 154/… | …
Vibration-8x | … | …/16 | 348/… | …
ERI | … | …/9 | 111/… | …
ERI | … | …/9 | 95/… | …
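
The slides do not define the connection-length (C.L.) metrics in this table. Assuming horizontal C.L. is the column distance and vertical C.L. is the row distance between a connection's source and destination PEs, the statistics could be computed from a placement as in this hypothetical sketch (connection_lengths and the toy placement are illustrative, not taken from the paper).

```python
def connection_lengths(placement, edges):
    """placement: node -> (row, column); edges: (source, destination) pairs.

    Returns (average horizontal C.L., average vertical C.L., maximum vertical C.L.)
    under the assumed definitions |column difference| and row difference.
    """
    if not edges:
        raise ValueError("need at least one connection")
    hor = [abs(placement[u][1] - placement[v][1]) for u, v in edges]
    ver = [placement[v][0] - placement[u][0] for u, v in edges]
    return sum(hor) / len(hor), sum(ver) / len(ver), max(ver)


placement = {"a": (0, 0), "b": (0, 2), "c": (1, 1), "d": (3, 1)}
edges = [("a", "c"), ("b", "c"), ("c", "d")]
print(connection_lengths(placement, edges))
```

A maximum vertical C.L. above 1 means at least one connection spans several rows and must be carried through intermediate routing resources.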

Thank you for your attention!