Download presentation
Presentation is loading. Please wait.
Published byDaisy Andrews Modified over 9 years ago
1
1 of 14 Lab 2: Design-Space Exploration with MPARM
2
2 of 14 2 Outline The MPARM simulation framework Hardware platform Software platform Design-space exploration for energy minimization Communication in MPSoC Mapping and scheduling
3
3 of 14 3 MPSoC Architecture Bus ARM Interrupt Device Private Memory Private Memory Private Memory Semaphore Device Shared Memory CACHE
4
4 of 14 4 System Design Flow Informal specification, constraints Modeling System model Mapped and scheduled model Estimation System architecture Architecture selection Hardware and Software Implementation Prototype Fabrication ok Testing Functional simulation not ok Mapping Scheduling not ok
5
5 of 14 5 System Design Flow Hardware platform Software Application(s) Extract Task Graph Extract Task Parameters Optimize (Mapping & Sched) Implement Formal Simulation
6
6 of 14 6 MPARM: Hardware Up to eight ARM7 processors Variable frequency (dynamic and static) Instruction and data caches Private memory Scratchpad Shared memory Communication bus Read more in: /home/TDTS07/sw/mparm/MPARM/doc/simulator_statistics.txt
7
7 of 14 7 MPARM: Software C crosscompiler toolchain for building software applications No operating system Small set of functions (such as WAIT and SIGNAL) (look in the application code)
8
8 of 14 8 MPARM: Why? Cycle-accurate simulation of the system Various statistics: number of clock cycles executed, bus utilization, cache efficiency, and energy/power consumption of the components (CPUs, buses, and memories)
9
9 of 14 9 MPARM: How? mpsim.x -c2 — run on two processors, collecting default statistics mpsim.x -c2 -w — run on two processors, collecting power/energy statistics mpsim.x -c1 --is=9 --ds=10 — run on one processor with instruction cache of 512 bytes and data cache of 1024 bytes mpsim.x -c2 -F0,2 -F1,1 -F3,3 — run on two processors operating at 100 MHz and 200 MHz and the bus operating at 66 MHz 200 MHz is the ”default” frequency mpsim.x -h — show other options Simulation results are in the file stats.txt
10
10 of 14 10 Design-Space Exploration Platform optimization Select the number of processors Select the speed of each processor Select the type, associativity, and size of the cache Select the bus type Application optimization Select the interprocessor communication style (shared memory or distributed message passing) Select the best mapping and schedule
11
11 of 14 11 Given a GSM codec Running on one ARM7 processor Variables are: cache parameters processor frequency Using MPARM, find a hardware configuration that minimizes the energy of the system Assignment 1
12
12 of 14 12 Energy/Speed Tradeoff CPU model
13
13 of 14 13 Frequency Selection: ARM Core Energy 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 1 1.5 2 2.5 3 3.5 4 Energy [mJ] Freq. divider
14
14 of 14 14 Frequency Selection: Total Energy 8.5 9.0 9.5 10 10.5 11 1 1.5 2 2.5 3 3.5 4 Energy [mJ] Freq. divider
15
15 of 14 15 Instruction Cache Size: Total Energy log2(CacheSize) 8.50 9.00 9.50 10.0 10.5 11.0 11.5 12.0 12.5 9 10 11 12 13 14 Energy [mJ] 2 9 =512 bytes 2 14 =16 kbytes
16
16 of 14 16 Instruction Cache Size: Execution Time t [cycles] 5e+07 5.5e+07 6e+07 6.5e+07 7e+07 7.5e+07 8e+07 8.5e+07 9e+07 9.5e+07 1e+08 9 10 11 12 13 14 log2(CacheSize)
17
17 of 14 17 Interprocessor Data Communication CPU 1 CPU 2... a =1... print a ;... BUS How?
18
18 of 14 18 Shared Memory CPU 1 CPU 2... a =1... print a ; BUS Shared Mem a a =2 a=? Synchronization
19
19 of 14 19 Synchronization CPU 1 CPU 2 a =1 signal(sem_a) print a ; Shared Mem a a =2 Semaphore sem_a wait(sem_a) a =2 With semaphores BUS
20
20 of 14 20 Synchronization Internals (1) CPU 1 CPU 2 a =1 signal(sem_a) print a ; Shared Mem a a =2 Semaphore sem_a while (sem_a==0) wait(sem_a) polling sem_a=1 BUS
21
21 of 14 21 Synchronization Internals (2) Disadvantages of polling Results in higher power consumption (energy from the battery) Larger execution time of the application Blocking important communication on the bus
22
22 of 14 22 Distributed Message Passing Instead Direct CPU-CPU communication with distributed semaphores Each CPU has its own scratchpad Smaller and faster than a RAM Smaller energy consumption than a cache Put frequently used variables on the scratchpad Cache controlled by hardware (cache lines, hits/misses, …) Scratchpad controlled by software (e.g., compiler) Semaphores allocated on scratchpads No polling
23
23 of 14 23 Distributed Message Passing (1) CPU 1 a=1 signal(sem_a) BUS Shared Mem a CPU 2 print a; a=2 wait(sem_a) sem_a
24
24 of 14 24 Distributed Message Passing (2) CPU 1 (prod) signal(sem_a) BUS CPU 2 (cons) print a; wait(sem_a) sem_a a=1
25
25 of 14 25 Assignment 2 Given two implementations of the GSM codec Shared memory Distributed message passing Simulate and compare these two approaches Energy Runtime
26
26 of 14 26 Thank you! Questions?
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.