MPSoCSim extension: An OVP Simulator for the Evaluation of Cluster-based Multi and Many-core architectures Philipp Wehner, Jens Rettkowski and Diana Göhringer.

Slides:



Advertisements
Similar presentations
Professur für Technische Informatik A Self Distributing Virtual Machine for FPGA Multicores Klaus Waldschmidt J. W. Goethe-University Technische Informatik.
Advertisements

System-level Trade-off of Networks-on-Chip Architecture Choices Network-on-Chip System-on-Chip Group, CSE-IMM, DTU.
System Simulation Of 1000-cores Heterogeneous SoCs Shivani Raghav Embedded System Laboratory (ESL) Ecole Polytechnique Federale de Lausanne (EPFL)
1 Multi - Core fast Communication for SoPC Multi - Core fast Communication for SoPC Technion – Israel Institute of Technology Department of Electrical.
1 Multi-Core Debug Platform for NoC-Based Systems Shan Tang and Qiang Xu EDA&Testing Laboratory.
ARTIST2 Network of Excellence on Embedded Systems Design cluster meeting –Bologna, May 22 nd, 2006 System Modelling Infrastructure Activity leader : Jan.
Network-on-Chip: Communication Synthesis Department of Computer Science Texas A&M University.
Sven Ubik, Petr Žejdl CESNET TNC2008, Brugges, 19 May 2008 Passive monitoring of 10 Gb/s lines with PC hardware.
Yao Wang, Yu Wang, Jiang Xu, Huazhong Yang EE. Dept, TNList, Tsinghua University, Beijing, China Computing System Lab, Dept. of ECE Hong Kong University.
Course Outline DayContents Day 1 Introduction Motivation, definitions, properties of embedded systems, outline of the current course How to specify embedded.
Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical.
An approach for solving the Helmholtz Equation on heterogeneous platforms An approach for solving the Helmholtz Equation on heterogeneous platforms G.
Simulation of Cloud Environments
CHALLENGING SCHEDULING PROBLEM IN THE FIELD OF SYSTEM DESIGN Alessio Guerri Michele Lombardi * Michela Milano DEIS, University of Bologna.
High Performance Computing & Communication Research Laboratory 12/11/1997 [1] Hyok Kim Performance Analysis of TCP/IP Data.
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
DiProNN Resource Management System (DiProNN = Distributed Programmable Network Node) Tomáš Rebok Faculty of Informatics MU, Brno Czech.
Of Rostock University DuDE: A D istributed Computing System u sing a D ecentralized P2P E nvironment The 4th International Workshop on Architectures, Services.
TEMPLATE DESIGN © Hardware Design, Synthesis, and Verification of a Multicore Communication API Ben Meakin, Ganesh Gopalakrishnan.
Advanced Spectrum Management in Multicell OFDMA Networks enabling Cognitive Radio Usage F. Bernardo, J. Pérez-Romero, O. Sallent, R. Agustí Radio Communications.
Autonomic scheduling of tasks from data parallel patterns to CPU/GPU core mixes Published in: High Performance Computing and Simulation (HPCS), 2013 International.
Computation and data migration in an embedded many-core SoC January Matthieu BRIEDA Anca MOLNOS Julien.
An Architecture and Prototype Implementation for TCP/IP Hardware Support Mirko Benz Dresden University of Technology, Germany TERENA 2001.
Performance Analysis of a JPEG Encoder Mapped To a Virtual MPSoC-NoC Architecture Using TLM 林孟諭 Dept. of Electrical Engineering National Cheng Kung.
11 CLUSTERING AND AVAILABILITY Chapter 11. Chapter 11: CLUSTERING AND AVAILABILITY2 OVERVIEW  Describe the clustering capabilities of Microsoft Windows.
Department of Computer Science MapReduce for the Cell B. E. Architecture Marc de Kruijf University of Wisconsin−Madison Advised by Professor Sankaralingam.
Teaching The Principles Of System Design, Platform Development and Hardware Acceleration Tim Kranich
A Memory-hierarchy Conscious and Self-tunable Sorting Library To appear in 2004 International Symposium on Code Generation and Optimization (CGO ’ 04)
Multi-objective Topology Synthesis and FPGA Prototyping Framework of Application Specific Network-on-Chip m Akram Ben Ahmed Xinyu LI, Omar Hammami.
Pushpin Computing System Overview Joshua Lifton et. al Ubicomp class reading Presented by BURT.
1 of 14 Lab 2: Formal verification with UPPAAL. 2 of 14 2 The gossiping persons There are n persons. All have one secret to tell, which is not known to.
CDA-5155 Computer Architecture Principles Fall 2000 Multiprocessor Architectures.
1 of 14 Lab 2: Design-Space Exploration with MPARM.
POLITECNICO DI MILANO A SystemC-based methodology for the simulation of dynamically reconfigurable embedded systems Dynamic Reconfigurability in Embedded.
Fast Energy Evaluation of Embedded Applications for Many-core Systems Felipe Rosa, Luciano Ost, Thiago Raupp, Fernando Moraes, Ricardo Reis.
Research Interests  NOCs – Networks-on-Chip  Embedded Real-Time Software  Real-Time Embedded Operating Systems (RTOS)  System Level Modeling and Synthesis.
1 Scalability and Accuracy in a Large-Scale Network Emulator Nov. 12, 2003 Byung-Gon Chun.
Liana Duenha (FACOM and Unicamp) A SystemC Benchmark Suite for Evaluating MPSoC Tools and Methodologies Rodolfo Azevedo (Unicamp)
Runtime Reconfigurable Network-on- chips for FPGA-based systems Mugdha Puranik Department of Electrical and Computer Engineering
Towards a High Performance Extensible Grid Architecture Klaus Krauter Muthucumaru Maheswaran {krauter,
CLOUD ARCHITECTURE Many organizations and researchers have defined the architecture for cloud computing. Basically the whole system can be divided into.
Deterministic Communication with SpaceWire
LIGHTWEIGHT CLOUD COMPUTING FOR FAULT-TOLERANT DATA STORAGE MANAGEMENT
Simulation and load testing with TTCN-3 Mobile Node Emulator
Introduction to Load Balancing:
Distributed Network Traffic Feature Extraction for a Real-time IDS
Definition of Distributed System
Course Outline Introduction in algorithms and applications
ESE532: System-on-a-Chip Architecture
Grid Computing.
Pablo Abad, Pablo Prieto, Valentin Puente, Jose-Angel Gregorio
Map-Scan Node Accelerator for Big-Data
Azeddien M. Sllame, Amani Hasan Abdelkader
Improving java performance using Dynamic Method Migration on FPGAs
FTCS Explicit Finite Difference Method for Evaluating European Options
Advanced Operating Systems
Israel Cidon, Ran Ginosar and Avinoam Kolodny
by Manuel Saldaña, Daniel Nunes, Emanuel Ramalho, and Paul Chow
Emu: Rapid FPGA Prototyping of Network Services in C#
Department of Computer Science University of California, Santa Barbara
Milind A. Bhandarkar Adaptive MPI Milind A. Bhandarkar
Dynamic Packet-filtering in High-speed Networks Using NetFPGAs
Energy Efficient Scheduling in IoT Networks
Paraskevi Raftopoulou, Euripides G.M. Petrakis
Department of Electrical Engineering Joint work with Jiong Luo
Operating System Introduction.
Mark McKelvin EE249 Embedded System Design December 03, 2002
Maria Méndez Real, Vincent Migliore, Vianney Lapotre, Guy Gogniat
Department of Computer Science University of California, Santa Barbara
Relax and Adapt: Computing Top-k Matches to XPath Queries
Presentation transcript:

MPSoCSim extension: An OVP Simulator for the Evaluation of Cluster-based Multi and Many-core architectures Philipp Wehner, Jens Rettkowski and Diana Göhringer Ruhr-University, Bochum, Germany Maria Méndez Real, Vincent Migliore, Vianney Lapotre and Guy Gogniat Université Bretagne-Sud, Lab-STICC, Lorient, France

2 Outline Motivation MPSoCSim MPSoCSim extension Results Use case Conclusion

3 Motivation Evaluation of -Multi/Many core architectures -Clustered Network-on-Chip (NoC)-based architectures -Shared resources within clusters -Independent applications running in parallel -Very fast simulation time MPSoCSim: An Open Virtual Platform (OVP)-based simulator for the evaluation of NoC-based systems Extension of MPSoCSim in order to support -Clusters composed of several processors -Processors access private and shared resources -Processors execute different applications

4 MPSoCSim MPSoCSim [1] motivation: Design space of NoC-based systems  An adjustable SystemC NoC  Traffic generators + OVP processor models  Provides performance and communication statistics results Overview [1] “P. Wehner, et al., “MPSoCSim: An extended OVP Simulator for Modeling and Evaluation of NoC based heterogeneous MPSoCs”, in proc. of ViPES in SAMOS, 2015.

5 MPSoCSim Adjustable SystemC NoC (NoC size, NoC frequency, routing parameters) 2-D mesh supporting wormhole routing Several routing algorithms: XY, minimal west-first, adaptive west-first NoC [1] “P. Wehner, et al., “MPSoCSim: An extended OVP Simulator for Modeling and Evaluation of NoC based heterogeneous MPSoCs”, in proc. of ViPES in SAMOS, 2015.

6 MPSoCSim SystemC Transaction Level Modeling (TLM) Network Interface (NI) Memory accessible by local elements to communicate with distant nodes processors => SendData API Traffic generators + OVP processors models OVP suitable as it provides several peripheral and processor models (ARM, MIPS, Xilinx, ORK1,…) An MPSoCSim node [1] Node [1] “P. Wehner, et al., “MPSoCSim: An extended OVP Simulator for Modeling and Evaluation of NoC based heterogeneous MPSoCs”, in proc. of ViPES in SAMOS, 2015.

7 MPSoCSim MPSoCSim parameters NoC parameters -NoC size -NoC frequency -routing algorithm -delay routing -delay pass-through OVP processor Frequency, MIPS, Quantum

8 MPSoCSim Exploitation results NoC parameters OVP processor NI statistics -Number of messages received -Mean number of hops -Mean delay -Bytes received -Bytes transmitted -Max data rate received -Max data rate transmitted -… OVP results -User time -System time -Elapsed time -Simulated time -Number of simulated instructions Simulation -Simulated exec. Time (Timer module)

9 c00c01c02c03 c10c11c12c13 c20c21c22c23 c30c31c32c33 ARM µB2 µB1 MPSoCSim Matrices multiplications: ARM generates the matrices, splits the computation, sends data to µBs and collects the results Xilinx ZedBoard, 667 MHz (ARM), 100MHz (FPGA) 2x2 NoC Comparison with HW implementation a00a01a02a03 a10a11a12a13 a20a21a22a23 a30a31a32a33 b00b01b02b03 b10b11a12b13 b20b21b22b23 b30b31b32b33 µB3 [1] “P. Wehner, et al., “MPSoCSim: An extended OVP Simulator for Modeling and Evaluation of NoC based heterogeneous MPSoCs”, in proc. of ViPES in SAMOS, 2015.

10 MPSoCSim Comparison with HW implementation Deviation (sim. exec. time / exec. time on HW) from 17% down to 2,5% [1] “P. Wehner, et al., “MPSoCSim: An extended OVP Simulator for Modeling and Evaluation of NoC based heterogeneous MPSoCs”, in proc. of ViPES in SAMOS, 2015.

11 MPSoCSim extension Cluster

12 MPSoCSim extension Subgroup Shared RAM Local RAM Local bus Shared bus A cluster Local memory for code, heap and stack Shared memory for communication Adjustable/heterogeneous number of subgroups and processors

13 Evaluation Regular clusters composed of 4 µBs, one cluster composed of 1 ARM 2x2 (12 µBs + 1 ARM) and 4x4 (60 µBs + 1 ARM) clusters architecture Evaluation through matrices multiplications Evaluation of the scalability of simulated systems Debugging, detection of bottleneck… Simulation of homogeneous/heterogeneous systems Experimental protocol

14 Evaluation Simulated execution time - Scalabilité en termes de temps de simulation -Aussi on peut voir ici le moment à partir duquel ça vaut le coup de passer à une architecture plus large (lorsque le cout comm est amorti) Trade-off between computing resources and communication costs

15 Evaluation Simulated execution time + scalability Very fast simulations on the host machine for a large number of simulated instructions N.B. Results generated on an Intel Core 2 Quad Q9400, 2.66GHz frequency PC with 3.87 GB RAM (usable)

16 Evaluation Validation of execution scenarios Communication problem detection, validation of the execution scenario The µBs perform the same number of instructions, only the communication distance varies between clusters Also, NI statistics are useful for validation of execution scenarios

17 Use case Dynamic scenario A specific processor behaves as a controller of the platform executing strategies of dynamic deployment of applications: Scheduling, monitoring, mapping, resources allocation Applications R R R RR R Ctr R Many-core architecture L2 PE R R L1 PE L1 [2] “M. Méndez Real, et al., “Dynamic Spatially Isolated Secure Zones for NoC-based Many-core Accelerators”, in proc. of ReCoSoC, 2016.

18 MPSoCSim extension for evaluation of clustered NoC-based systems -Clusters composed of an adjustable number of subgroups, that can be heterogeneous between different clusters -Processor models within clusters may be heterogeneous -Local memory for code, heap and stack -Shared memory within clusters, distributed between clusters for communication -Processors may execute independent concurrent applications Future work -Comparison of a large system with the HW implementation -Evaluation on further applications/benchmarks Conclusion

19 Thank you for your attention Questions?

20 Router

21 Flits

22 Related work Overview of simulation platforms for NoC-based MPSoCs [1] SimulatorModelling language Communication infrastructure TopologyParameterizableProcessing elementsSimulation Results NirgamSystemCNoCMesh, Torus, Butterfly, etc. YesTraffic GeneratorsPerformance, power NoximSystemCNoCMeshYesTraffic GeneratorsPerformance, power BooksimC++NoCMesh, Torus, Butterfly, etc. YesTraffic GeneratorsPerformance HNoCsC++NoCMeshYesTraffic GeneratorsPerformance, power Rosa et al.SystemCBus--Traffic GeneratorsPerformance, power … MpSoCSimSystemCNoCMeshYesTraffic Generators + OVP processor models Performance A large number of NoC-based MPSoCs Each suitable for a specific type of exploration problems We searched for complex clustered NoC-based multi/many-core systems supporting processor models [1] “P. Wehner, et al., “MPSoCSim: An extended OVP Simulator for Modeling and Evaluation of NoC based heterogeneous MPSoCs”, in proc. of ViPES in SAMOS, 2015.

23 MPSoCSim Exploitation results NoC parameters OVP processor NI statistics OVP results -User time: Time spent for the execution on the host machine -System time: Spent by the host machine to execute instr. of the simulation process -Elapsed time: Simulation time from beginning to end -Simulated time: Duration of the simulation process in simulated time -Number of simulated instructions Simulation