Hybrid Prototyping of MPSoCs
Samar Abdi
Electrical and Computer Engineering, Concordia University, Montreal, Canada

Virtual vs. FPGA Prototyping
[Figure: Virtual Prototyping and FPGA Prototyping, each rated on ease of debug, flexibility, scalability, speed, and accuracy]
Can we get the best of both worlds?
Observations:
- Only a few unique SW processors
- Heterogeneity of clock frequency and memory organization

Hybrid Prototyping System
- Only one core instantiated in FPGA
- Multicore Emulation Kernel (MEK) executes on the physical core
- MEK provides the services of a simulation scheduler
- Application task code executed directly on the target core
[Figure: hybrid prototyping rated on ease of debug, flexibility, scalability, speed, and accuracy]

Multi-core Emulation Kernel
- MEK supports discrete event simulation [DATE 2013]
  - Blocking waits and non-blocking notifies
  - Logical timestamps associated with each task
  - Events keep track of notification and wait times
  - Complex communication models built on top of discrete events
- Time management
  - Physical time advanced by hardware time when application tasks execute
  - Logical time advanced only inside MEK primitives
- Task (core) state management
  - Task (core) context switched when a running task is blocked
  - Round-robin scheduling policy used by MEK
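The kernel behaviour listed above can be illustrated with a minimal Python sketch. It is not the MEK implementation: the class names `Event`, `Task`, and `Kernel`, the `("exec" | "notify" | "wait", arg)` request format, and the rule that a wait completes at the later of the waiter's and the notifier's timestamps are all assumptions made for illustration. The sketch also folds measured execution time and logical-time adjustments into a single per-task timestamp, which simplifies the physical/logical split described on the slide.

```python
from collections import deque


class Event:
    """Discrete event holding a pending notification or a blocked waiter."""

    def __init__(self):
        self.notify_time = None  # logical time of an unconsumed notify
        self.waiter = None       # task currently blocked in wait()


class Task:
    def __init__(self, name, body):
        self.name = name
        self.time = 0            # logical timestamp of the emulated core
        self.steps = body()      # generator yielding (op, arg) requests


class Kernel:
    """Hypothetical stand-in for the MEK scheduler (illustration only)."""

    def __init__(self):
        self.ready = deque()     # round-robin ready queue
        self.context_switches = 0

    def spawn(self, name, body):
        task = Task(name, body)
        self.ready.append(task)
        return task

    def notify(self, task, event):
        # Non-blocking: wake a blocked waiter or leave the notification pending.
        if event.waiter is not None:
            waiter, event.waiter = event.waiter, None
            waiter.time = max(waiter.time, task.time)
            self.ready.append(waiter)
        else:
            event.notify_time = task.time

    def wait(self, task, event):
        # Blocking: returns True if the task may continue immediately.
        if event.notify_time is not None:
            task.time = max(task.time, event.notify_time)
            event.notify_time = None
            return True
        event.waiter = task
        return False

    def run(self):
        while self.ready:
            task = self.ready.popleft()
            for op, arg in task.steps:
                if op == "exec":
                    task.time += arg   # stands in for measured hardware time
                elif op == "notify":
                    self.notify(task, arg)
                elif op == "wait" and not self.wait(task, arg):
                    self.context_switches += 1
                    break              # blocked: switch to the next ready task


kernel = Kernel()
done = Event()


def producer():
    yield ("exec", 30)
    yield ("notify", done)
    yield ("exec", 20)


def consumer():
    yield ("exec", 10)
    yield ("wait", done)
    yield ("exec", 5)


t2 = kernel.spawn("T2", consumer)  # T2 is scheduled first and blocks in wait
t1 = kernel.spawn("T1", producer)
kernel.run()
print(t1.time, t2.time)
```

Generators are used here only because they make suspending a blocked task trivial in Python; on the real platform the equivalent step is saving and restoring the core context.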

Simulation on Hybrid Prototype
Emulation of tasks on two different cores
Case 1: MEK runs T1 first
[Figure: timeline of T1 and T2 showing notify, context switch (CS), wait, and the intervals t11, t12, t21, t22]

Simulation on Hybrid Prototype
Emulation of tasks on two different cores
Case 2: MEK runs T2 first
[Figure: timeline of T1 and T2 showing wait, two context switches (CS), notify, and the intervals t21, t11, t12, t22]
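The two cases can be compared with a small Python sketch. It rests on assumptions that the slides do not state explicitly: t11 and t12 are taken to be T1's execution intervals before and after its notify, t21 and t22 are T2's intervals before and after its wait, and a wait is assumed to complete at the later of the waiter's and the notifier's logical times. The function `emulate` and the numeric values are hypothetical.

```python
def emulate(first, t11, t12, t21, t22):
    """Return (T1 end time, T2 end time, physical execution order)."""
    trace = []
    if first == "T1":
        # Case 1: T1 runs to completion, so the notify is pending when T2 waits.
        t1 = t11
        notify_time = t1
        t1 += t12
        trace += ["T1:t11", "notify", "T1:t12", "CS"]
        t2 = max(t21, notify_time)   # wait returns at the later timestamp
        t2 += t22
        trace += ["T2:t21", "wait", "T2:t22"]
    else:
        # Case 2: T2 blocks in wait, T1 runs, and its notify releases T2.
        t2 = t21
        trace += ["T2:t21", "wait", "CS"]
        t1 = t11
        t2 = max(t2, t1)             # notify stamps the blocked waiter
        t1 += t12
        trace += ["T1:t11", "notify", "T1:t12", "CS"]
        t2 += t22
        trace += ["T2:t22"]
    return t1, t2, trace


case1 = emulate("T1", t11=30, t12=20, t21=10, t22=5)
case2 = emulate("T2", t11=30, t12=20, t21=10, t22=5)
print(case1[:2], case2[:2])
```

Under these assumptions the order of execution on the single physical core differs between the cases, while the logical end times of both tasks are the same.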

JPEG Case Study
- JPEG application with 5 tasks (easily pipelined)
- MicroBlaze-based MPSoC platforms with up to 5 cores
  - Connected with fast simplex links (FSL)
  - Operating at 60 MHz [3.04 mW] or 125 MHz [6.28 mW]
  - On-chip block RAMs (BRAMs) used for program and data
- Single MicroBlaze used for hybrid prototyping
- Total of 162 designs modeled
  - Differentiated by number of cores, frequency, and mapping
[Figure: JPEG application (tasks Read, DCT, Quant., Zigzag, Huff.) and MPSoC platform (Core1 to Core5, each a MicroBlaze)]
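The slides do not say how the 162 designs are enumerated. One enumeration that happens to produce exactly 162 is sketched below in Python, under two assumptions: each core runs a contiguous group of pipeline stages, and each core independently runs at 60 MHz or 125 MHz. The function name `enumerate_designs` and the task labels are illustrative, and the real design space may have been defined differently.

```python
from itertools import combinations, product

TASKS = ["Read", "DCT", "Quant", "Zigzag", "Huff"]
FREQS_MHZ = (60, 125)


def enumerate_designs(tasks=TASKS, freqs=FREQS_MHZ):
    """List designs as [(task group, clock in MHz), ...], one entry per core."""
    designs = []
    n = len(tasks)
    for n_cores in range(1, n + 1):
        # Pick n_cores - 1 cut points between adjacent pipeline stages.
        for cuts in combinations(range(1, n), n_cores - 1):
            bounds = (0, *cuts, n)
            groups = [tuple(tasks[a:b]) for a, b in zip(bounds, bounds[1:])]
            # Every core independently takes one of the available clocks.
            for clocks in product(freqs, repeat=n_cores):
                designs.append(list(zip(groups, clocks)))
    return designs


designs = enumerate_designs()
print(len(designs))
```

With k cores there are C(4, k-1) contiguous partitions and 2^k clock assignments, so the total is 2 * (1 + 2)^4 = 162.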

Results: Simulation Quality
- Hybrid prototype enables fast, scalable, and accurate simulation
  - Simulation takes seconds, compared with hours for cycle-accurate software simulation
  - Simulation time scales linearly with the number of cores
    - assuming inter-core communication scales accordingly
  - Accuracy depends on the accuracy of the communication timing model
    - <0.001% error for JPEG compared to FPGA prototype
[Figure: simulation time (ms) vs. number of cores]

Results: Design Space Exploration
- Hybrid prototype enables extensive design space exploration
  - 162 JPEG design alternatives evaluated in about 5 minutes*
  - Full FPGA prototyping of all alternatives takes more than 5 hours*
[Figure: energy consumption (nJ) vs. execution time (ms) for the design alternatives, with the ideal designs marked]
* Includes FPGA synthesis time only. Simulation time is negligible.
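If the "ideal designs" in the plot are read as the Pareto-optimal points (an assumption; the slides do not define the term), they can be picked out of the simulated results with a short Python sweep. The function `pareto_front` and the sample tuples are hypothetical; the numbers are made up for illustration and are not results from the case study.

```python
def pareto_front(designs):
    """Return names of designs not dominated in (execution time, energy).

    designs: iterable of (name, time_ms, energy_nj); lower is better for both.
    """
    front = []
    best_energy = float("inf")
    # Visit designs from fastest to slowest; a design is kept only if it
    # uses less energy than every design that is at least as fast.
    for name, time_ms, energy_nj in sorted(designs, key=lambda d: (d[1], d[2])):
        if energy_nj < best_energy:
            front.append(name)
            best_energy = energy_nj
    return front


# Made-up sample points, NOT measurements from the slides.
samples = [
    ("A", 10, 50),
    ("B", 12, 40),
    ("C", 12, 45),
    ("D", 15, 60),
    ("E", 20, 30),
]
front = pareto_front(samples)
print(front)
```

Sorting first keeps the selection at O(n log n), which is more than enough for a few hundred design points.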

Future Plans
- Memory hierarchy
  - Model caches as peripherals [DSD 2013]
  - Swap cache context when core context changes
- Dynamically scheduled tasks
  - Build RTOS model on top of MEK [ICCD 2012, ISQED 2013]
  - POSIX API to support unmodified applications
- Hardware accelerators
  - Model using MEK primitives (similar to communication)
  - Implement on FPGA alongside emulation core
- Asymmetric cores
  - Instantiate one emulation core for each core type
  - Maintain consistency of simulation time across cores
- Looking for collaborations!