Liana Duenha (FACOM and Unicamp) A SystemC Benchmark Suite for Evaluating MPSoC Tools and Methodologies Rodolfo Azevedo (Unicamp)


1 Liana Duenha (FACOM and Unicamp) A SystemC Benchmark Suite for Evaluating MPSoC Tools and Methodologies Rodolfo Azevedo (Unicamp)

2 Outline: Motivations and Goals; Simulation Infrastructure; How to use the benchmark; Characterization

3 Motivation: the complexity of Multiprocessor System-on-Chip (MPSoC) designs pushes designers toward ever higher-level design methodologies (SystemC). Credits: http://www.arm.com/markets/mobile/smartphones.php http://www.sonymobile.com/br/products/phones/xperia-play/

4 Motivation - New Challenges. The adoption of Multiprocessor Systems-on-Chip (MPSoCs) as the state of the art in embedded systems (Hardware & Software). - Design Productivity: providing a software development platform before the final MPSoC architecture details are fixed. - Requirements: the lack of a benchmark suite to assist the validation and evaluation of new techniques and tools delays the life cycle of development tools.

5 Our goal is to provide a complete SystemC simulation infrastructure for a hardware/software multiprocessor environment, in order to facilitate the deployment, analysis, and verification of new concepts, tools, and methodologies in MPSoC design.

6 The MPSoCBench has: an open-source, scalable set of MPSoCs with 1 to 64 cores; four processor models (ARM, PowerPC, SPARC, and MIPS); IPs, interconnections, and devices using TLM 2.0; different abstraction levels; power characterization for MIPS and SPARC; 17 parallel applications, including a POSIX thread emulation library; 864 configurations, automated by scripts. http://archc.org/benchs/mpsocbench/

7 Infrastructure: 1, 2, 4, 8, 16, 32, or 64 cores; a shared memory (preconfigured with 512 MB); a hardware lock device; different interconnections. [Diagram: Core 1, Core 2, ..., Core 64, the memory, and the lock attached to the interconnection.]
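The slide names a hardware lock device but does not describe its register semantics. As a rough illustration only, the sketch below assumes a common test-and-set convention: reading the lock returns its previous value and leaves it set, and writing 0 releases it. The class and function names are hypothetical and are not taken from MPSoCBench.

```cpp
#include <atomic>

// Hypothetical model of a memory-mapped lock register (assumed semantics,
// not confirmed by the slides): a read returns the old value and sets the
// lock; a write stores the given value (0 means "free").
class HardwareLockModel {
public:
    int read() { return flag_.exchange(1); }   // atomic test-and-set
    void write(int value) { flag_.store(value); }

private:
    std::atomic<int> flag_{0};
};

// A core owns the lock only if the value it read back was 0.
bool try_acquire(HardwareLockModel& lock) { return lock.read() == 0; }

// Releasing is a plain write of 0.
void release(HardwareLockModel& lock) { lock.write(0); }
```

Under this assumption, software spins on try_acquire until it returns true, which would account for lock-access counts growing with the number of contending cores.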

8 ArchC Processor Models: PowerPC, MIPS, SPARC, ARM. ArchC is a SystemC-based Architecture Description Language. MIPS and SPARC include power consumption estimates. [Diagram: Core 1, Core 2, ..., Core 64, the memory, and the lock attached to the interconnection.]

9 Design using a Router (Loosely Timed). Communication uses the TLM 2 blocking transport interface with timing annotation. [Diagram: Core 1, Core 2, ..., Core n, the memory, and the lock attached to the router.]

10 Design using a NoC. [Diagram: mesh of NoC nodes N0,0 ... N7,8, with wrappers attaching processors or IPs.] Ni,j: NoC nodes; W: wrappers; P: processors or IPs. Mesh-based NoC using the XY routing protocol. The NoC is fully configurable at run time through user parameters. Two approaches: NoC-LT (loosely timed) and NoC-AT (approximately timed).
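XY routing on a mesh is usually defined as follows: a package first travels along the X dimension until it reaches the destination column, then along the Y dimension until it reaches the destination row. The sketch below shows that rule in plain C++. The Node type, the function names, and the choice of "column = X, row = Y" are assumptions for illustration and are not taken from the MPSoCBench sources.

```cpp
#include <vector>

// Hypothetical coordinates of a mesh node N(row, col).
struct Node {
    int row;
    int col;
};

// One XY routing step: fix the column (X) first, then the row (Y).
// Returns the current node unchanged once the destination is reached.
Node xy_next_hop(Node cur, Node dst) {
    if (cur.col != dst.col) {
        cur.col += (dst.col > cur.col) ? 1 : -1;
    } else if (cur.row != dst.row) {
        cur.row += (dst.row > cur.row) ? 1 : -1;
    }
    return cur;
}

// Full path from src to dst, both endpoints included.
std::vector<Node> xy_route(Node src, Node dst) {
    std::vector<Node> path{src};
    while (path.back().row != dst.row || path.back().col != dst.col) {
        path.push_back(xy_next_hop(path.back(), dst));
    }
    return path;
}
```

Because every package takes all its X hops before any Y hop, the route between two nodes is deterministic and its length equals the Manhattan distance between them.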

11 NoC Approximately Timed (NoC-AT). Forward and backward transport: nb_transport_fw and nb_transport_bw. One sc_thread per core; one sc_thread per NoC node. [Diagram: an initiator (core, sc_thread) connects through a wrapper to a NoC node (sc_thread) using simple_init_socket and simple_target_socket; each NoC node has N, S, W, E, and LOCAL sockets; the target sits on the far side of the NoC.]

12 Design using a NoC: non-blocking forward path. Initiator (sc_thread): creates a generic payload, sends a request on the forward path, then waits for the answer (wait(answer)). Wrapper on the initiator side: creates an extension with route information and puts the package into the NoC. NoC nodes: forward the package using the XY protocol. Wrapper on the target side: clears the extension at the target node. [Diagram: initiator, wrapper, grid of NoC nodes, wrapper, target, connected through simple_init_socket and simple_target_socket.]

13 Design using a NoC: non-blocking backward path. Wrapper on the target side: b_transport to the target, then creates an extension with route information and puts the package into the NoC. NoC nodes: forward the package using the XY protocol. Wrapper on the initiator side: clears the extension and calls answer.notify(). [Diagram: initiator (sc_thread), wrapper, grid of NoC nodes, wrapper, target, connected through simple_init_socket and simple_target_socket.]

14 17 Parallel Applications: 7 from ParMiBench; 4 from SPLASH-2; 5 multisoftware applications, each composed of a set of single-core programs from MiBench; 1 multisoftware parallel application composed of 4 applications from ParMiBench combined in different multithreaded versions (1, 2, 4, 8, and 16 threads each).

15 How To Use ./MPSoCBench
Build and run:
    $ ./MPSoCBench -r -p=mips -s=fft -pw -i=noc.at -n=64
    Software: FFT; platform: 64-core MIPS; with power consumption; using NoC-AT.
Build (without running):
    $ ./MPSoCBench -b -p=all -s=all -i=router -n=16
    All software; all 16-core platforms (with all processor models); using the router (LT by default).

16 How To Use ./MPSoCBench
Build:
    $ ./MPSoCBench -b -p=all -s=all -i=all -n=all -c
    All software; all platforms; all interconnections; -c enables execution in a Condor cluster.
This command line creates a directory for each platform, including all executable files and input files required for parallel execution on a cluster.

17 Number of instructions executed on single-core platforms using the four processor models. [Chart: applications grouped into those with a higher computational load and those with a lower computational load.]

18 How To Use the Reports
SystemC 2.3.0-ASI --- Apr 20 2013 11:53:51
Copyright (c) 1996-2012 by all Contributors, ALL RIGHTS RESERVED
ArchC: Reading ELF application file: susanedges.powerpc.x
ArchC: -------------------- Starting Simulation ---------------
ArchC: -------------------- Starting Simulation ----------------
--------------------------------------------------------------------
------------------------- MPSoCBench -------------------------
--------------------- Running: susanedges ----------------------
--------------- The results will be available in -----------------
--------------------- the output.pgm file -------------------------
Total Time Taken (seconds):63.620517
Simulation advance (seconds):0.133877
MPSoCBench: Ending the time simulation measurement.
ArchC: Simulation statistics
Times: 64.50 user, 0.16 system, 63.63 real
Number of instructions executed: 66519093
Simulation speed: 1031.30 K instr/s
ArchC: Simulation statistics
Times: 64.50 user, 0.16 system, 63.63 real
Number of instructions executed: 66792904
Simulation speed: 1035.55 K instr/s
Platform ./platform.router.lt.x with 4 cores running susanedges.powerpc.x
Total Time Taken (seconds):63.620517
Simulation advance (seconds):0.133877
Lock Access:3825
Router Access:74357076
Memory Reads:57192219
Memory Writes:173 2
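The "Simulation speed" lines in this report are consistent with dividing the number of executed instructions by the user time and expressing the result in thousands of instructions per second: 66519093 / 64.50 s is about 1031.30 K instr/s, and 66792904 / 64.50 s is about 1035.55 K instr/s. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the use of user time rather than real time is inferred from the numbers above, not stated by the slides.

```cpp
#include <cstdint>

// Simulation speed in K instr/s, assuming (from the report's figures)
// that the divisor is the "user" time printed on the Times line.
double simulation_speed_kips(std::uint64_t instructions, double user_seconds) {
    return static_cast<double>(instructions) / user_seconds / 1000.0;
}
```

For example, simulation_speed_kips(66519093, 64.50) evaluates to roughly 1031.30, matching the first core's line in the report.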

19 Application Output Example (susan-edges)

20 Memory accesses (read/write): number of memory accesses on single-PowerPC platforms.

21 Number of lock accesses: all processors, 1 to 64 cores

22 Simulation Time: multi-PowerPC platforms (1 to 16 cores). Strong scaling applications; weak scaling applications.

23 Simulation Time Comparison among Router, NoC-LT, and NoC-AT: multi-PowerPC running Dijkstra using Router, NoC-LT, and NoC-AT as the interconnection device

24 Power measurements: dual-MIPS running FFT

25 Power measurements: quad-MIPS running Basicmath

26 Power measurements: dual-SPARC running Dijkstra

27 Power measurements: dual-SPARC running Dijkstra

28 POSIX Pthread emulation. We provide a library that emulates the main functionality of POSIX Pthreads so that well-known parallel software can run on our infrastructure. To adapt parallel software that uses Pthreads to run on MPSoCBench, we need to: include our library instead of pthread.h; ensure that a particular start code (our specific main function) executes before the application's main function.

29 Some issues. In development: NoC RTL (PucRS); NoC power consumption model; cache. Open issues: heterogeneous platforms (ARM big.LITTLE?); distributed memories; clusterization; AcMPI.

30 Conclusions. We have proposed MPSoCBench, an open-source benchmark composed of a scalable, configurable, and extensible set of MPSoCs. It is available in two forms: a virtual machine with all infrastructure ready for use, or the source code. Ready for your research and evaluation! The benchmark and tutorials are at: http://archc.org/benchs/mpsocbench/

31 A SystemC Benchmark Suite for Evaluating MPSoC Tools and Methodologies. Liana Duenha, liana.duenha@lsc.ic.unicamp.br, lianaduenha@gmail.com. Thank you! Do you have any questions?

32 BACKUP

33 Main Goals: new tools or methodologies in MPSoC design; design refinements for lower abstraction levels; evaluation of different techniques for parallelization and scalability characterization; analysis and optimization of new hardware components; comparisons among different techniques for power consumption measurement.

34 What is happening in each node
Forward path:
    tlm_node::nb_transport_fw (…) {
        addToBuffer (payload,phase,timeInfo);   // queue the incoming request
        this->wake_up.notify();                 // wake the node thread
        phase = tlm::END_REQ;
        return tlm::TLM_UPDATED;
    }
Node thread:
    tlm_node::threadNode (…) {
        while(true){
            while (getNumberOfPackagesInBuffer() != 0) {
                removeFromBuffer(...);   // it routes the package
                //...
                wait(1,SC_NS);
            }
            wait(wake_up);               // sleep until a new package arrives
        }
    }
Backward path:
    tlm_node::nb_transport_bw (…) {
        addToBuffer (payload,phase,timeInfo);   // queue the incoming response
        this->wake_up.notify();                 // wake the node thread
        phase = tlm::END_RESP;
        return tlm::TLM_COMPLETED;
    }
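The node behavior on this slide separates two roles: the transport calls only enqueue a package and signal the node, while the node thread drains the buffer and spends 1 ns of simulated time per routed package. The sketch below mimics that split in plain C++ with no SystemC kernel, so the event wait is replaced by an explicit drain() call and simulated time by a counter. All names are hypothetical stand-ins, not the MPSoCBench classes.

```cpp
#include <cstdint>
#include <queue>

// Hypothetical package type; the real node carries a TLM generic payload.
struct Package {
    int id;
};

// Plain C++ stand-in for one NoC node (assumption: one package is routed
// per nanosecond, as suggested by wait(1, SC_NS) on the slide).
class NodeModel {
public:
    // Plays the role of addToBuffer(): enqueue only, no routing here.
    void add_to_buffer(Package p) { buffer_.push(p); }

    // Plays the role of one wake-up of the node thread: drain the buffer,
    // charging 1 ns of simulated time for each routed package.
    int drain() {
        int routed = 0;
        while (!buffer_.empty()) {
            buffer_.pop();   // the real node would forward the package here
            now_ns_ += 1;    // stands in for wait(1, SC_NS)
            ++routed;
        }
        return routed;
    }

    std::uint64_t now_ns() const { return now_ns_; }

private:
    std::queue<Package> buffer_;
    std::uint64_t now_ns_ = 0;
};
```

Keeping the transport calls short and moving the timed work into the thread is what lets a non-blocking call return immediately while the package is still in flight.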

35 Applications: 13 parallel applications. 7 from ParMiBench: Basicmath, Dijkstra, SHA, Stringsearch, Susan corners, Susan edges, Susan smoothing. 4 from SPLASH-2: FFT, LU, Water-nsquared, Water-spatial. 1 multisoftware application: 16 sequential applications from MiBench compiled as a single application running in a 16-core environment with no data dependency among applications. 1 multisoftware parallel application: four parallel applications from ParMiBench (Dijkstra, SHA, Stringsearch, and Basicmath) compiled as a single application; 64 cores with 16 threads for each application, 32 cores with 8 threads for each application, and so on.
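The "and so on" in the multisoftware parallel application follows a simple pattern: the four ParMiBench applications share the cores equally, so each one gets cores / 4 threads (64 cores give 16 threads each, 32 cores give 8). A small sketch of that split, with a hypothetical function name and the assumption that only even splits are valid:

```cpp
// Threads assigned to each application when `applications` programs share
// `cores` cores equally. Returns 0 when an even split is not possible
// (an assumption made for this sketch, not a rule stated on the slide).
int threads_per_application(int cores, int applications = 4) {
    if (applications <= 0 || cores < applications || cores % applications != 0) {
        return 0;
    }
    return cores / applications;
}
```

With the default of four applications, 16, 8, and 4 cores give 4, 2, and 1 threads each, which lines up with the 1, 2, 4, 8, and 16 thread versions listed for this application.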

36 Summarizing: 4 processor models: PowerPC, MIPS, SPARC, ARM (MIPS and SPARC with power consumption estimates); 3 interconnection devices: Router-LT, NoC-LT, NoC-AT; 7 multicore configurations: 1, 2, 4, 8, 16, 32, 64 cores; 13 parallel applications from ParMiBench, SPLASH-2, and MiBench; 844 configurations.

37 eslbench script: ./eslbench
    -r or --run: run
    -b or --build: build the simulators
    -pw or --power: enable power consumption for SPARC and MIPS platforms
    -p or --processor: choose the processor models
    -n or --numcores: choose the number of cores (1, 2, 4, 8, 16, 32, or 64)
    -s or --software: choose the application
    -i or --interconnection: choose the interconnection device
    -t or --temp: choose LT or AT as the abstraction level of the NoC
    -c or --condor: enable execution on HTCondor
    -l or --clean: clean
    -h or --help: help
Example:
    ./eslbench -r -p mips -pw -s all -i noc -t LT -n 64 -b
    Builds and runs all programs on the 64-core MIPS platform, including power consumption, using a NoC-LT as the interconnection device.
The "all" option makes it easier to execute the whole benchmark.

38 Design using a NoC: Loosely Timed (blocking transport, TLM 2.0, with timing annotation) or Approximately Timed (non-blocking transport, TLM 2.0). The NoC is fully configured at run time. [Diagram: mesh of NoC nodes N0,0 ... N7,8.]

39 Number of instructions executed on the 16-PowerPC platform running the multisoftware application

40 Number of wait() calls: powerpc.router.lt, strong scaling applications

41 Number of wait() calls: powerpc.router.lt, weak scaling applications

