Experiences Implementing Tinuso in gem5
Maxwell Walter, Pascal Schleuniger, Andreas Erik Hindborg, Carl Christian Kjærgaard, Nicklas Bo Jensen, Sven Karlsson
Technical University of Denmark



14/06/2015, Maxwell Walter, DTU Compute, Technical University of Denmark
Motivation
We have developed the Tinuso architecture
–For multi-core research
–Targeted for FPGAs
Application-dependent accelerators are important for multi-core research
Software/hardware co-design is difficult!


Motivation (continued)
So we would like to do it automatically
[Figure: automated flow with the labels app, parameters, toolchain, evaluate, feedback]

Contributions
Implementation of the Tinuso processor architecture in gem5
Discussion of gem5 and designing application-specific accelerators

Outline:
Motivation
Contributions
Tinuso Architecture
Gem5 Implementation
Design Space Exploration
Conclusions

Tinuso
Philosophy: move complexity to software
–Predicated execution to lower branch costs
–Very fast 8 stage pipeline
–No pipeline interlocking; compiler must produce a valid schedule
GCC 4.9 toolchain
Designed for FPGA synthesis
Will be released as open source
Small and fast
Tinuso      MicroBlaze
376 MHz     194 MHz
1322 LUTs   2024 LUTs
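Because the pipeline has no interlocks, a read that comes too soon after the write it depends on is a compiler bug, not a stall. The following is a minimal sketch of such a schedule check for straight-line code only. The `Instr` record, the `find_hazards` helper, and the single `latency` parameter are hypothetical names chosen for illustration; they are not taken from the Tinuso toolchain or from gem5, and branches and predication are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record (not a Tinuso or gem5 type)."""
    name: str
    dest: Optional[str]          # register written, or None
    srcs: Tuple[str, ...] = ()   # registers read


def find_hazards(program: List[Instr], latency: int) -> List[Tuple[int, int, str]]:
    """Return (producer_index, consumer_index, register) for every read that
    occurs fewer than `latency` instructions after the most recent write to
    that register. `latency` is an assumed, uniform result latency."""
    hazards: List[Tuple[int, int, str]] = []
    last_write: Dict[str, int] = {}  # register -> index of most recent writer
    for i, ins in enumerate(program):
        # Check reads before recording this instruction's own write.
        for reg in ins.srcs:
            j = last_write.get(reg)
            if j is not None and i - j < latency:
                hazards.append((j, i, reg))
        if ins.dest is not None:
            last_write[ins.dest] = i
    return hazards
```

With an assumed latency of 2, an `add` that writes `r1` followed directly by a `sub` that reads `r1` is reported as a hazard, and inserting one unrelated instruction between them clears it.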


Gem5 Implementation
Instruction Predication
–Easily handled in the instruction decoder
Configurable branch delay slots
–New PCState with counter and NNPC
Instruction delay slots for compiler validation
–Tracked by the ISA/Decoder
–Validated at instruction decode
Gem5 implementation was easy and painless
–A good fit into our workflow
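The idea of a program-counter state that carries a delay-slot counter can be sketched as follows. This is an illustration only, not the authors' PCState and not gem5's C++ API: the `DelaySlotPC` class, its fields, and the `trace` helper are hypothetical, the instruction width is assumed to be 4 bytes, and the sketch keeps a pending branch target where the slide mentions an NNPC field. A branch inside a delay slot is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

INSTR_BYTES = 4  # assumed fixed instruction width


@dataclass
class DelaySlotPC:
    """Hypothetical PC state with a configurable number of branch delay slots."""
    pc: int
    delay_slots: int                      # configurable delay-slot count
    pending_target: Optional[int] = None  # target of a taken branch, if any
    remaining: int = 0                    # delay-slot instructions still to run

    def take_branch(self, target: int) -> None:
        """Record a taken branch; the redirect happens after the delay slots."""
        self.pending_target = target
        self.remaining = self.delay_slots

    def advance(self) -> None:
        """Move to the next instruction to fetch."""
        if self.pending_target is not None and self.remaining == 0:
            # All delay slots have executed: redirect to the branch target.
            self.pc = self.pending_target
            self.pending_target = None
        else:
            if self.pending_target is not None:
                self.remaining -= 1
            self.pc += INSTR_BYTES


def trace(state: DelaySlotPC, branches: Dict[int, int], steps: int) -> List[int]:
    """Return the PCs fetched over `steps` instructions. `branches` maps the
    PC of a taken branch to its target."""
    fetched: List[int] = []
    for _ in range(steps):
        fetched.append(state.pc)
        if state.pc in branches and state.pending_target is None:
            state.take_branch(branches[state.pc])
        state.advance()
    return fetched
```

For a branch at address 0 with target 0x100 and two delay slots, `trace` fetches 0, 4, 8, and then 0x100; with zero delay slots the target is fetched immediately after the branch.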

Gem5 In Our Workflow
RTL simulator validation
–Simulator built directly from VHDL sources
Toolchain validation
Test, RTL Time, Gem5 Time
memcpy-chk.x16.47s   3.5s
memmove.x421.78s   3.7s


Tinuso multicore systems
[Figure: 2D mesh of tiles, each labelled PE, NI, and R]
Figure callouts: barrel shifter, multiplier, FPU instructions, profiling infrastructure, cache sizes, pipeline depth; data link width, arbitration scheme
Up to 480 processor cores on a Xilinx Virtex-7 device
Synthesizable processor cores
Packet switched 2D mesh interconnect

Design Space Exploration
Tinuso is intended for multi-core accelerator systems
–Easily configured for specific applications
Many configuration parameters
–ISA, cache sizes, pipeline depth, # of cores
Changing parameters manually is tedious and can be error prone
Effective searching requires fast simulation

Design Space Exploration
Use gem5 for quick performance estimation
–Can help direct the performance optimization
Use more accurate tools, like Vivado, for power estimation and resource usage
[Figure: flow with the labels app, parameters, toolchain, feedback]
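The search over configuration parameters might be organised roughly as below. This is a sketch under stated assumptions: it uses a plain exhaustive sweep, which is a stand-in and not necessarily the feedback-directed flow in the figure, and `toy_cycle_estimate` is a made-up formula that stands in for a real simulator run. The parameter names and values are hypothetical.

```python
import itertools
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Config = Dict[str, int]


def enumerate_configs(space: Dict[str, List[int]]) -> Iterator[Config]:
    """Yield every combination of the parameter values in `space`."""
    names = list(space)
    for values in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, values))


def explore(space: Dict[str, List[int]],
            evaluate: Callable[[Config], float]) -> Tuple[Optional[Config], float]:
    """Exhaustively evaluate the space and return the lowest-cost configuration.
    In a real flow `evaluate` would launch a fast simulation and read back
    the cycle count."""
    best_cfg: Optional[Config] = None
    best_score = float("inf")
    for cfg in enumerate_configs(space):
        score = evaluate(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def toy_cycle_estimate(cfg: Config) -> float:
    """Placeholder cost model: a made-up formula, NOT derived from Tinuso,
    gem5, or any measurement."""
    work = 1_000_000
    miss_penalty = 200_000 / cfg["cache_kib"]
    return work / cfg["cores"] + miss_penalty + 1_000 * cfg["pipeline_depth"]


# Hypothetical parameter values, chosen only to exercise the sweep.
EXAMPLE_SPACE: Dict[str, List[int]] = {
    "cache_kib": [2, 4, 8],
    "pipeline_depth": [6, 8],
    "cores": [1, 4, 16],
}
```

Calling `explore(EXAMPLE_SPACE, toy_cycle_estimate)` visits all 18 combinations. An exhaustive sweep grows multiplicatively with each added parameter, which is why evaluation speed matters for this kind of search.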

Conclusions
We have implemented the Tinuso architecture in gem5
–It was an easy and painless process
The Tinuso gem5 implementation is useful in our workflow for RTL simulator validation and toolchain validation
We leverage gem5 for design space exploration of custom multi-core accelerators