1
Predictable Implementation of Real-Time Applications on Multiprocessor Systems-on-Chip Alexandru Andrei Embedded Systems Laboratory Linköping University, Sweden
2
2 GSM Phone: Search, Radio Link Control, Talking. MP3 player. Digital Camera: Take Photo, Restore Photo. ... High performance, low power, predictable.
3
3 Design Flow. Inputs: Hardware platform (CPU0, CPU1, ASIC0, Bus) and Software Application(s). Steps: Extract Task Graph, Extract Task Parameters (worst case execution times, task power, dl), Optimize, Formal Simulation, Implement. Example application code: for (i=0;i<99;i++) x=x+a[i]; for (j=0;j<100;j++) y=y+b[j]; if (x<y) z=y;
4
4 Application Model [Figure: task graph with a deadline dl.]
5
5 Hardware Architecture [Figure: block diagram with the components CPU, CACHE, Private Memory (three instances), Bus, Shared Memory, Interrupt Device and Semaphore Device.]
6
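6 Execution Model [Figure: CPU 1 and CPU 2 with Cache, Private Mem 1, Private Mem 2, BUS and Shared Mem. Original TG with tasks 1 and 2: task 1 uses comp(x) and copy(x,s); task 2 uses copy(s,y) and use(y). Memory contents as labeled: x, Instructions 1, s, y, Instructions 2.]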
7
7 Task Model [Figure: Original TG with tasks i and j. Extended TG with tasks i and j plus the explicit communication wi and rj.]
8
8 Motivational Example. WCET: task 1 = 60; task 2 = 25; w2 = 12. Tasks 1 and 2 have a deadline at time 63. [Figure: tasks 1 and 2 with the communication wi, on a platform with CPU 1, CPU 2, PMem 1, PMem 2, ShMem and Bus.]
9
9 Motivational Example (2) [Figure: schedule of CPU 1, CPU 2 and BUS for tasks 1 and 2 against the deadline dl=63. Labeled explicit communication (w2) and implicit communication (bus transfers M1 to M5 and I1 to I5).]
10
10 Motivational Example (3): Using a FCFS bus arbiter. [Figure: schedule of CPU 1, CPU 2 and BUS for tasks 1 and 2 with w2 and the bus transfers M1 to M5 and I1 to I5; dl=63.] Deadline violation!
11
11 Motivational Example (4): Using a bus schedule. [Figure: schedule of CPU 1, CPU 2 and BUS for tasks 1 and 2 with w2 and the bus transfers labeled M and I; dl=63.]
12
12 Motivational Example: Message. In multiprocessor systems, the WCET depends on the bus load! In multiprocessor systems, the WCET depends on the schedule! In multiprocessor systems, the schedule depends on the WCET!
13
13 Implicit Communication
Benchmark   Bus Utilization   Impl. Communication
GSM 1)      12%               39%
MP3 2)      26%               42%
MP3 3)      49%               86%
Setup: ARM7 cores, ST bus protocol
1) Icache: 4096b, Dcache: 1024b
2) Icache: 4096b, Dcache: 1024b
3) Icache: 16b, Dcache: 256b
14
14 WCET Analysis. Difficult for both single-processor and multiprocessor systems. Single-processor tools: SymTA/P, AbsInt aiT. These handle instruction and data caches. Basic idea: enumerate all the possible paths of the program (CFG) and always consider the longest one.
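The basic idea on this slide can be sketched in C. This is only an illustration, not how SymTA/P or aiT work internally: it assumes a loop-free CFG whose blocks are already numbered in topological order, and the type Cfg, the function wcet_longest_path and the per-block costs are hypothetical names. For an acyclic graph, taking the maximum over the predecessors of each block gives the same result as enumerating every path and keeping the longest one.

```c
#include <assert.h>

#define MAX_NODES 16

/* Hypothetical CFG representation: blocks are numbered in topological
 * order, cost[i] is the execution time of block i, and edge[i][j] != 0
 * means control can flow from block i to block j (with i < j). */
typedef struct {
    int n;
    int cost[MAX_NODES];
    int edge[MAX_NODES][MAX_NODES];
} Cfg;

/* Returns the cost of the longest path from block 0 to block n-1,
 * or -1 if the exit block cannot be reached from the entry block. */
long wcet_longest_path(const Cfg *g)
{
    long best[MAX_NODES];
    assert(g->n > 0 && g->n <= MAX_NODES);

    for (int i = 0; i < g->n; i++)
        best[i] = -1;                  /* -1 marks "not reachable yet" */
    best[0] = g->cost[0];

    for (int i = 0; i < g->n; i++) {
        if (best[i] < 0)
            continue;                  /* block i is unreachable */
        for (int j = i + 1; j < g->n; j++) {
            if (g->edge[i][j] && best[i] + g->cost[j] > best[j])
                best[j] = best[i] + g->cost[j];
        }
    }
    return best[g->n - 1];
}
```

For an if/else diamond with block costs 2, 10, 3 and 1, the function returns 13, the cost of the more expensive branch plus entry and exit.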
15
15 WCET Analysis Flow [Figure: flow from the source files and the binary file to the WCET. Labeled steps: Data flow analysis, Instr. address extraction, Program segment simulation, Abstract syntax tree generation, Data dependency analysis, Data flow extraction, Data address analysis, CFG construction, Annotated CFG, Instr. cache analysis. Labeled elements: Instruction cache, Data cache.]
16
16 WCET Analysis: Example void foo() { int i, temp; for (i=0; i<100; i++) { temp=a[i]; a[temp]=0; } }
17
17 WCET Analysis: CFG 1:void foo() { 2: int i, temp; 3: for (i=0; 4: i<N; 5: i++) { 6:temp=a[i]; 7:a[temp]=0; 8: } 9:} id: 2 id: 17 Lno:3,4,9 id: 12 Lno:3,4,6 id: 4 id: 13 Lno:6,7,5,4,6 id: 16 Lno:6,7,5,4,8 id: 11
18
18 WCET Analysis: CFG id: 2 id: 17 Lno:3,4 id: 12 Lno:3,4 id: 4 id: 13 Lno:6,7,5,4,6 id: 16 Lno:6,7,5,4,8 id: 11. Control nodes: 2, 4, 11. Basic blocks: 12, 17, 13, 16. id: 4: loop bound (for example N=100).
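A loop bound is what makes the longest path through a cyclic CFG finite. The following C sketch shows the arithmetic under one possible reading of the CFG above, which is an assumption and not stated on the slide: block 17 is the path that skips the loop, block 12 enters it, block 13 is an iteration that loops back, and block 16 is the final iteration that leaves the loop. The function name loop_wcet and all block costs are hypothetical.

```c
#include <assert.h>

/* WCET of a bounded loop, assuming fixed (hypothetical) block costs:
 *   c_skip  - path that checks the condition once and skips the loop
 *   c_enter - initialisation plus the first condition check
 *   c_iter  - one iteration that loops back
 *   c_last  - the final iteration, which leaves the loop
 *   bound   - maximum number of iterations (N) */
long loop_wcet(long c_skip, long c_enter, long c_iter, long c_last, long bound)
{
    assert(bound >= 0);
    if (bound == 0)
        return c_skip;                 /* the body can never execute */

    /* bound iterations = (bound - 1) looping ones plus the final one */
    long taken = c_enter + (bound - 1) * c_iter + c_last;
    return taken > c_skip ? taken : c_skip;
}
```

With made-up costs of 5, 4, 10 and 11 time units and N=100, the bound is 4 + 99 * 10 + 11 = 1005.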
19
19 WCET Analysis with Instruction Cache. Generate the address traces for each program block. Always assume a miss at the beginning of each block. Use a cache simulator to get the cache hit/miss ratio for each block. We can do better.
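The per-block cache simulation described here can be sketched as follows. The sketch assumes a direct-mapped cache with a made-up geometry (NUM_LINES, LINE_BYTES) and a hypothetical function count_misses; the slides do not say which cache model the tool uses. The cache starts empty for every trace, which corresponds to the pessimistic assumption of a miss at the beginning of each block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LINES  4     /* assumed cache geometry, for illustration only */
#define LINE_BYTES 16

/* Counts the misses of a direct-mapped cache over an address trace.
 * The cache starts empty, so the first access to each line misses. */
size_t count_misses(const uint32_t *trace, size_t len)
{
    uint32_t tag[NUM_LINES];
    int valid[NUM_LINES] = {0};
    size_t misses = 0;

    assert(trace != NULL || len == 0);
    for (size_t k = 0; k < len; k++) {
        uint32_t block = trace[k] / LINE_BYTES;   /* memory block number */
        uint32_t index = block % NUM_LINES;       /* cache line it maps to */
        if (!valid[index] || tag[index] != block) {
            misses++;                             /* cold or conflict miss */
            valid[index] = 1;
            tag[index] = block;
        }
    }
    return misses;
}
```

For the trace 0x00, 0x04, 0x10, 0x00, 0x40, 0x00 the function reports 4 misses: two cold misses, then 0x40 evicts the line holding 0x00, so the last access misses again.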
20
20 WCET Analysis with ICache: Unrolled CFG 1:void foo() { 2: int i, temp; 3: for (i=0; 4: i<100; 5: i++) { 6:temp=a[i]; 7:a[temp]=0; 8: } 9:} id: 2 id: 17 Lno:3,4 id: 12 Lno:3,4 id: 4 id: 13 Lno:6,7,5,4,6 id: 16 Lno:6,7,5,4,8 id: 11 id: 104 id: 13 Lno:6,7,5,4,6
21
21 WCET Analysis with ICache: Unrolled CFG id: 2 id: 17 Lno:3,4 id: 12 Lno:3,4 id: 4 id: 13 Lno:6,7,5,4,6 id: 16 Lno:6,7,5,4,8 id: 11 id: 104 id: 13 Lno:6,7,5,4,6 miss lno 6 (d) lno 6 miss lno 7 (d) lno 7, 5, 4 miss lno 6 (d) miss lno 6 (i) lno 6 miss lno 7 (i) miss lno 7 (d) lno 7 miss lno 5 (i) lno 5, 4 miss lno 3 (i) miss lno 3 (d) lno 3 miss lno 4 (i) lno 4
22
22 WCET Analysis: Multiprocessor. The cache miss penalty is constant in the single-processor case. The cache miss penalty is variable in the multiprocessor case.
23
23 Predictable MPSoC Bus Access. Partition the bus period into bus slots (TDMA). Assign bus slots to the processors. The bus arbiter grants the bus to a processor only during its allocated slots. This eliminates the bus interference. Not flexible: an idle bus slot cannot be used by another processor.
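With a TDMA bus the miss penalty is still variable, but it can be computed from the time of the request alone. The C sketch below illustrates this under simplifying assumptions that are not taken from the slides: every CPU owns one slot per period, all slots have the same length, and a transfer must fit entirely inside one slot of its owner. The function name tdma_finish_time is hypothetical.

```c
#include <assert.h>

/* Assumed TDMA layout: the period is split into num_cpus slots of
 * slot_len time units each, and slot k belongs to CPU k.
 * Returns the time at which a transfer of length xfer, requested by
 * `cpu` at time t, completes. */
long tdma_finish_time(long t, int cpu, int num_cpus, long slot_len, long xfer)
{
    assert(t >= 0 && cpu >= 0 && cpu < num_cpus);
    assert(xfer > 0 && xfer <= slot_len);

    long period = (long)num_cpus * slot_len;
    long base = (t / period) * period;     /* start of the current period */
    long start = base + cpu * slot_len;    /* this CPU's slot in that period */

    if (t > start + slot_len - xfer)       /* too late to fit in this slot */
        start += period;
    if (t > start)                         /* request arrives inside the slot */
        start = t;
    return start + xfer;
}
```

With two CPUs, slots of 8 time units and a transfer of 6, a miss on CPU 0 at time 0 completes at time 6, while the same miss at time 3 no longer fits in the slot and completes at time 22. The penalty changes from 6 to 19, yet it does not depend on what the other CPU does.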
24
24 Analysis & Bus Access id: 2 id: 17 Lno:3,4 id: 12 Lno:3,4 id: 4 id: 13 Lno:6,7,5,4,6 id: 16 Lno:6,7,5,4,8 id: 11 id: 104 id: 13 Lno:6,7,5,4,6 miss lno 6 (d) lno 6 miss lno 7 (d) lno 7, 5, 4 miss lno 6 (d) miss lno 6 (i) lno 6 miss lno 7 (i) miss lno 7 (d) lno 7 miss lno 5 (i) lno 5, 4 miss lno 3 (i) miss lno 3 (d) lno 3 miss lno 4 (i) lno 4 Bus schedule CPU1 CPU2 CPU1 CPU2 CPU1... 2432 0 816 42 52
25
25 Multiprocessor Analysis and Optimization. In multiprocessor systems, the WCET depends on the schedule! In multiprocessor systems, the schedule depends on the WCET!
26
26 Overall Approach [Figure: task graph with tasks 1 to 5 and its schedule on CPU 1, CPU 2, CPU 3 and the BUS. Mapping: CPU 1: tasks 1, 4; CPU 2: task 2; CPU 3: tasks 3, 5.]
27
27 Overall Approach. Take the set of all tasks that are active at time t. Select bus schedule B for the time interval starting at t (bus schedule optimization). Determine the WCET of the tasks from this set. Find the earliest time at which a task from this set finishes. Schedule a new task at a time t >= this earliest finishing time (new task to schedule).
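The loop on this slide can be sketched in C as a simple list scheduler. This is a heavily simplified illustration under stated assumptions, not the author's tool: bus schedule selection and WCET analysis are hidden behind a hypothetical callback (WcetFn), example_wcet is a toy cost model, and the names Task and schedule are made up. The point it shows is that the WCET of a task is only known once its start time is fixed, and that the next scheduling decision is taken at the earliest finishing time of an active task.

```c
#include <assert.h>

#define MAX_TASKS 8
#define MAX_CPUS  4

typedef struct {
    int cpu;          /* CPU the task is mapped to */
    unsigned preds;   /* bit i set: task i must finish before this task starts */
} Task;

/* Hypothetical stand-in for "select a bus schedule for the interval that
 * starts at `start` and analyse the task's WCET under it". */
typedef long (*WcetFn)(int task, long start);

/* Toy WCET model used only for the example: a task started after time 0
 * is assumed to pay a fixed extra bus penalty. */
long example_wcet(int task, long start)
{
    static const long base[3] = {60, 25, 10};
    return base[task] + (start > 0 ? 5 : 0);
}

/* Schedules tasks 0..n-1, fills finish[], and returns the schedule
 * length, or -1 if some task can never start. */
long schedule(const Task *tasks, int n, WcetFn wcet, long finish[])
{
    long cpu_free[MAX_CPUS] = {0};
    unsigned started = 0, done = 0;
    long t = 0, length = 0;

    assert(n > 0 && n <= MAX_TASKS);
    for (;;) {
        /* Start every task that is ready at t and whose CPU is idle. */
        for (int i = 0; i < n; i++) {
            int cpu = tasks[i].cpu;
            assert(cpu >= 0 && cpu < MAX_CPUS);
            if ((started & (1u << i)) || (tasks[i].preds & ~done) ||
                cpu_free[cpu] > t)
                continue;
            finish[i] = t + wcet(i, t);   /* WCET depends on the start time */
            cpu_free[cpu] = finish[i];
            started |= 1u << i;
            if (finish[i] > length)
                length = finish[i];
        }
        /* Earliest finishing time among the tasks that are active at t. */
        long theta = -1;
        for (int i = 0; i < n; i++)
            if ((started & (1u << i)) && !(done & (1u << i)) &&
                (theta < 0 || finish[i] < theta))
                theta = finish[i];
        if (theta < 0)
            return done == (1u << n) - 1 ? length : -1;
        /* Advance to that time and retire the tasks finishing there. */
        t = theta;
        for (int i = 0; i < n; i++)
            if ((started & (1u << i)) && finish[i] <= t)
                done |= 1u << i;
    }
}
```

With tasks 0 and 1 independent on CPUs 0 and 1, and task 2 on CPU 1 waiting for task 0, the toy model yields finishing times 60, 25 and 75.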
29
29 Bus Schedule: BSA1 [Figure: the bus schedule is a table with the columns slot_start and owner, defined over a period. Slots start at t0, t1, t2, t3, t4, ... and are owned by CPU 1, CPU 2, CPU 1, ...]
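A BSA1-style table maps directly onto an array of (slot_start, owner) pairs. The following C sketch shows a lookup of the bus owner at a given time; the struct layout, the function name bus_owner and the requirement that the first slot starts at offset 0 are assumptions made for the example, only the two column names come from the slide.

```c
#include <assert.h>

typedef struct {
    long slot_start;   /* offset of the slot within the period */
    int  owner;        /* CPU that may use the bus during the slot */
} Slot;

/* Returns the owner of the bus at absolute time t. Slots must be sorted
 * by slot_start, the first must start at 0, and the last one extends to
 * the end of the period. */
int bus_owner(const Slot *slots, int n, long period, long t)
{
    assert(n > 0 && period > 0 && t >= 0 && slots[0].slot_start == 0);

    long offset = t % period;          /* the table repeats every period */
    int owner = slots[0].owner;
    for (int i = 1; i < n && slots[i].slot_start <= offset; i++)
        owner = slots[i].owner;        /* last slot starting at or before t */
    return owner;
}
```

Because slot lengths are arbitrary in this representation, it is the most flexible of the approaches but also the one with the largest table to optimize.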
30
30 Bus Schedule: BSA2 [Figure: the period is divided into segments, each described by seg_start, seg_size, owners and an owner/size table. Segment 1: seg_start t0, owners 1, 2, seg_size 12, owner/size entries 1, 3. Segment 2: seg_start t4, owners 2, 1, seg_size 7, owner/size entries 2, 5.]
31
31 Bus Schedule: BSA3 [Figure: the period is divided into segments, each described by seg_start, owners and slot_size. Segment 1: seg_start t0, owners 1, 2, slot_size 3. Segment 2: seg_start t4, owners 2, 1, slot_size 6.]
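In a BSA3-style schedule each segment only stores its start, its owners and one slot size, and the individual slots follow from a round-robin over the owners. The C sketch below expands such a description into an owner lookup. The struct layout, the function name bsa3_owner and the round-robin reading of the owners list are assumptions; the field names seg_start, owners and slot_size come from the slide.

```c
#include <assert.h>

#define MAX_OWNERS 4

typedef struct {
    long seg_start;            /* offset of the segment within the period */
    int  owners[MAX_OWNERS];   /* CPUs served in round-robin order */
    int  num_owners;
    long slot_size;            /* every slot in the segment has this length */
} Segment;

/* Returns the bus owner at absolute time t. Segments must be sorted by
 * seg_start, the first must start at 0, and the last one runs to the end
 * of the period. */
int bsa3_owner(const Segment *segs, int n, long period, long t)
{
    assert(n > 0 && period > 0 && t >= 0 && segs[0].seg_start == 0);

    long offset = t % period;
    const Segment *s = &segs[0];
    for (int i = 1; i < n && segs[i].seg_start <= offset; i++)
        s = &segs[i];                  /* last segment starting at or before t */

    assert(s->num_owners > 0 && s->num_owners <= MAX_OWNERS && s->slot_size > 0);
    long slot = (offset - s->seg_start) / s->slot_size;
    return s->owners[slot % s->num_owners];
}
```

Using the slide's values (owners 1, 2 with slot_size 3, then owners 2, 1 with slot_size 6) and assuming t0 = 0, t4 = 12 and a period of 36, the bus belongs to CPU 1 at time 0, to CPU 2 at time 3 and again to CPU 2 at time 12.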
32
32 Experimental Results [Figure: normalized schedule length versus number of CPUs for BSA 1, BSA 2, BSA 3 and BSA 4.]
33
33 Experimental Results [Figure: normalized schedule length versus number of CPUs.]
34
34 Real-life Example. Smart phone: GSM voice codec (encoder + decoder) and MP3 player. 64 tasks, between 100 and 2000 lines of C code per task. 4 ARM7 processors, interconnected via a bus.
35
35 Real-life Example
GSM + MP3, 64 tasks, 4 ARM7 processors
BSA_1   BSA_2   BSA_3   BSA_4
1.17    1.33    1.31    1.62
36
36 Conclusions. Realistic model for MPSoC. WCET analysis must be integrated in the system scheduling. Tool for system level scheduling and WCET. Tested on real applications.
37
37 ARTIST LiU TU Braunschweig U. of Bologna Original SymtaP code Bus controller Implementation