Published by Cornelius Bailey. Modified over 9 years ago.
1
Predictable Integration of Safety-Critical Software on COTS-based Embedded Systems
Marco Caccamo
University of Illinois at Urbana-Champaign
2
Outline
Motivation
PRedictable Execution Model (PREM)
– Peripheral scheduler & real-time bridge
– Memory-centric scheduling
MemGuard
– Memory bandwidth isolation
Colored Lockdown
– Cache space management
3
Real-Time Applications
Resource-intensive real-time applications
– Multimedia processing(*), real-time data analytics(**), object tracking
Requirements
– Need more performance at lower cost: Commercial Off-The-Shelf (COTS) hardware
– Performance guarantee (i.e., temporal predictability and isolation)
(*) ARM, QoS for High-Performance and Power-Efficient HD Multimedia, 2010
(**) Intel, The Growing Importance of Big Data and Real-Time Analytics, 2012
4
Modern System-on-Chip (SoC)
More cores
– Freescale P4080 has 8 cores
More sharing
– Shared memory hierarchy (LLC, MC, DRAM)
– Shared I/O channels
More performance, less energy, less cost.
But what about isolation?
5
SoC: challenges for RT safety-critical systems
In a multicore chip, memory controllers, the last-level cache, memory, the on-chip network and I/O channels are globally shared by the cores. Unless a globally shared resource is over-provisioned, it must be partitioned, reserved, or scheduled. Otherwise:
– Complexity, cost and schedule: the schedulability analysis, testing and temporal certification of an IMA (Integrated Modular Avionics) partition on one core will also depend on the tasks running on other cores.
– Safety concerns: a software change on one core could cause tasks in other cores' IMA partitions to miss their deadlines. This is unacceptable!
6
Problem: Shared Memory Hierarchy
Shared hardware resources; the OS has little control.
– Shared last-level cache (LLC): space sharing
– Memory controller (MC) and DRAM: access contention
[Figure: App 1 to App 4 running on Core1 to Core4, all sharing the LLC, the memory controller and DRAM]
7
Problem: Task-Peripheral conflict (1 core)
Task-peripheral conflict:
– A master peripheral is working for Task B.
– Task A suffers a cache miss.
– Processor activity can be stalled due to interference at the front-side bus (FSB) level.
How relevant is the problem?
– Up to 49% increased WCET for memory-intensive tasks.
– Contention for access to main memory can greatly increase a task's worst-case computation time! This effect MUST be considered in WCET computation!!
[Figure: CPU and DDRAM on the front-side bus; host PCI bridge to the PCI bus with a master peripheral and a slave peripheral; Task A and Task B]
Sebastian Schonberg, Impact of PCI-Bus Load on Applications in a PC Architecture, RTSS 03
8
Experiment: Task and Peripherals
Experiment on an Intel platform at typical embedded-system speed.
PCI-X 133 MHz, 64-bit, fully loaded by a traffic-generator peripheral.
The task suffers continuous cache misses.
Up to 44% WCET increase.
9
Experiment: 2 Cores Interference
Task A suffers the maximum number of cache misses (92% stall time).
Task B has variable cache stall time.
The WCET increase is proportional to the cache stall time.
Max WCET increase ≈ cache stall time of task A.
Adding PCI-E peripheral interference → 196% WCET increase!
Multicore interference is a serious problem!!!
10
Problem: Bus Contention
Two DMA peripherals transmitting at full speed on the PCI-X bus. Round-robin arbitration does not allow timing guarantees.
Transaction length    Bandwidth (256B)
No interference       596 MB/s (100%)
128 bytes             441 MB/s (74%)
256 bytes             346 MB/s (58%)
512 bytes             241 MB/s (40%)
[Figure: two DMA peripherals sharing the bus to RAM and CPU]
11
Problem: Bus Contention
Two DMA peripherals transmitting at full speed on the PCI-X bus. Round-robin arbitration does not allow timing guarantees.
[Figure: NO BUS SHARING; transfer timelines of the two peripherals to RAM over t = 0 to 16]
12
Problem: Bus Contention
Two DMA peripherals transmitting at full speed on the PCI-X bus. Round-robin arbitration does not allow timing guarantees.
[Figure: BUS CONTENTION, 50% / 50%; transfer timelines of the two peripherals to RAM over t = 0 to 16]
13
Problem: Bus Contention
Two DMA peripherals transmitting at full speed on the PCI-X bus. Round-robin arbitration does not allow timing guarantees.
[Figure: BUS CONTENTION, 33% / 66%; transfer timelines of the two peripherals to RAM over t = 0 to 16]
Integration Nightmare!!!
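A minimal fluid-sharing model makes the point concrete. It is our own simplification, not the analysis behind these slides: each flow needs a fixed amount of exclusive bus time and, while active, receives a bandwidth fraction proportional to a hypothetical share weight (equal weights roughly approximate round-robin with equal transaction lengths). The function `finish_times` and its parameters are illustrative.

```python
def finish_times(demands, shares):
    """Fluid model of a shared bus.

    demands[i]: time flow i would need with the bus all to itself.
    shares[i]:  positive weight; active flows split the bus in proportion
                to their weights, and a finished flow's bandwidth is
                redistributed to the remaining ones.
    Returns the completion time of every flow.
    """
    remaining = [float(d) for d in demands]
    active = {i for i, d in enumerate(remaining) if d > 0}
    finish = [0.0] * len(demands)
    now = 0.0
    while active:
        total = sum(shares[i] for i in active)
        rates = {i: shares[i] / total for i in active}
        # advance to the next completion at the current rates
        dt = min(remaining[i] / rates[i] for i in active)
        now += dt
        done = set()
        for i in active:
            remaining[i] -= rates[i] * dt
            if remaining[i] <= 1e-9:
                finish[i] = now
                done.add(i)
        active -= done
    return finish
```

With equal weights, two 3-unit transfers both finish at t = 6 instead of t = 3; with weights 1:2 the second finishes at 4.5 and the first at 6. A transfer's latency therefore depends on what the other flow does, which is why round-robin arbitration alone gives no timing guarantee.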
14
Cache Delay Analysis (contention-based access)
Compute the worst-case increase in task computation time due to peripheral interference (single-core system).
Main idea: treat the memory subsystem as a switch that multiplexes accesses between the CPU and the peripherals.
The same analysis was later extended to multicore platforms.
[Figure: task cache fetches over t and peripheral bandwidth over t combine into the WCET increase on top of the WCET with no interference]
R. Pellizzoni and M. Caccamo, "Impact of Peripheral-Processor Interference on WCET Analysis of Real-Time Embedded Systems", IEEE Transactions on Computers (TC), Vol. 59, No. 3, March 2010.
15
Modeling I/O traffic: Peripheral Arrival Curve
Key idea: the maximum task delay depends on the amount of peripheral traffic (single core).
Peripheral arrival curve: the maximum amount of time required by all peripherals to access main memory. It can be obtained using:
– Measurement
– Distributed traffic analysis
– Enforcement through an engineering solution (more on that later...)
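As a sketch of the measurement option, assume the peripheral activity has been sampled as memory-occupancy time per fixed-length slot; the empirical arrival curve is then the largest total occupancy seen in any window of w consecutive slots. The trace format and `arrival_curve` are hypothetical, and a measured curve only reflects the traffic that was actually observed, so on its own it is not a safe upper bound.

```python
def arrival_curve(busy, max_window):
    """Empirical peripheral arrival curve from a measured trace.

    busy[k]: time the peripherals occupied main memory during slot k.
    Returns alpha, where alpha[w] is the maximum total occupancy observed
    in any window of w consecutive slots, for w = 0 .. max_window.
    """
    n = len(busy)
    # prefix[k] = busy[0] + ... + busy[k-1], so window sums are O(1)
    prefix = [0]
    for b in busy:
        prefix.append(prefix[-1] + b)
    alpha = [0] * (max_window + 1)
    for w in range(1, max_window + 1):
        if w >= n:
            alpha[w] = prefix[n]  # the window covers the whole trace
        else:
            alpha[w] = max(prefix[i + w] - prefix[i] for i in range(n - w + 1))
    return alpha
```

For the trace [0, 2, 2, 0, 1, 0, 3] and windows up to 3 slots, the sketch returns [0, 3, 4, 4].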
16
The Need for Engineering Solutions
Analysis bounds are tight, but they depend on very peculiar arrival patterns.
The average case is significantly lower than the worst case.
– Main issue: COTS arbiters are not designed for predictability.
We propose engineering solutions to:
1. schedule memory accesses at a high level (coarse granularity): memory-centric real-time scheduling;
2. control the cores' memory bandwidth usage;
3. manage cache space in a predictable manner.
17
Outline
Motivation
PRedictable Execution Model (PREM)
– Peripheral scheduler & real-time bridge
– Memory-centric scheduling
MemGuard
– Memory bandwidth isolation
Colored Lockdown
– Cache space management
18
Peripheral Scheduling
Solution: enforce the peripheral schedule (single-resource scheduling). No need to know low-level parameters!
[Figure: IMPLICIT SCHEDULE ENFORCEMENT; CPU, RAM and peripheral timelines over t = 0 to 16, with BLOCK intervals]
COTS peripherals do not provide block functionality, so how do we do this?
19
Real-Time I/O Management System
A Real-Time Bridge is interposed between each peripheral and the bus.
The RT-Bridge buffers incoming/outgoing data and delivers it predictably.
The Peripheral Scheduler enforces traffic isolation.
[Figure: CPU and RAM on the North Bridge; PCIe, South Bridge, ATA and PCI-X links; each peripheral sits behind an RT Bridge, and all RT Bridges are connected to the Peripheral Scheduler]
E. Betti, S. Bak, R. Pellizzoni, M. Caccamo and L. Sha, "Real-Time I/O Management System with COTS Peripherals", IEEE Transactions on Computers (TC), Vol. 62, No. 1, pp. 45-58, January 2013.
20
Peripheral Scheduler
The Peripheral Scheduler receives data_rdy_i information from the Real-Time Bridges and outputs block_i signals.
Server i provides isolation by enforcing a timing reservation.
Fixed priority, cyclic executive, etc. can be implemented in HW with very little area.
Fixed-priority (FP) scheduler logic, where server i raises READY_i and receives EXEC_i:
EXEC_1 = READY_1
EXEC_2 = READY_2 and not EXEC_1
...
EXEC_i = READY_i and not EXEC_1 and ... and not EXEC_(i-1)
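The fixed-priority equations above amount to a priority encoder. The Python sketch mirrors them for illustration only, since the real scheduler is hardware. Two points are our assumptions rather than details from the slides: server index 0 is the highest priority, and a bridge's block signal is simply the negation of its server's EXEC signal. READY_i is taken as given; in the real design it presumably also reflects the server's remaining budget.

```python
def fp_exec(ready):
    """EXEC_i = READY_i and not EXEC_1 and ... and not EXEC_(i-1).

    ready[0] belongs to the highest-priority server (assumption).
    """
    exec_signals = []
    granted = False  # True once some higher-priority server executes
    for r in ready:
        e = r and not granted
        exec_signals.append(e)
        granted = granted or e
    return exec_signals


def block_signals(ready):
    """Hypothetical mapping: a bridge is blocked unless its server executes."""
    return [not e for e in fp_exec(ready)]
```

At most one EXEC signal is ever true, so the accumulated `granted` flag is exactly the OR of the higher-priority EXEC terms in the equations.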
21
Real-Time Bridge
System-on-Chip design on an FPGA with a CPU, external memory, and a custom DMA engine.
Connected to the main system and to the peripheral through available PCI/PCIe bridge modules.
[Figure: FPGA containing CPU, PLB, interrupt controller, DMA engine, local RAM, memory controller and two PCI bridges, with IntMain, IntFPGA, block and data_rdy signals; host side with CPU, main memory and System + PCI host; controlled peripheral on PCI]
22
Real-Time Bridge
The controlled peripheral reads/writes from/to Local RAM instead of Main Memory (completely transparent to the peripheral).
The DMA Engine transfers data between Main Memory and Local RAM.
[Figure: Real-Time Bridge block diagram (FPGA, DMA engine, local RAM, PCI bridges, block and data_rdy signals, host main memory, controlled peripheral)]
23
Peripheral Virtualization
The RT-Bridge supports peripheral virtualization.
A single peripheral (e.g., a Network Interface Card) can serve different software partitions.
HW virtualization enforces strict timing isolation.
24
Implemented Prototype
Xilinx TEMAC 1 Gb/s Ethernet card (integrated on the FPGA).
Optimized virtual driver implementation with no software packet copy (PowerPC running Linux).
Full VHDL HW code and SW implementation available.
25
Evaluation
3 x Real-Time Bridges, 1 x Traffic Generator with synthetic traffic.
Rate Monotonic with Sporadic Servers.
Utilization 1, harmonic periods.
Peripheral    Transfer time    Budget    Period
RT Bridge     7.5 ms           9 ms      72 ms
Generator     4.4 ms           5 ms      8 ms
Scheduling flows without the peripheral scheduler (block always low) leads to deadline misses!
[Figure: Generator and RT-Bridge flow timelines]
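A quick arithmetic check of the table, with budgets and periods in milliseconds: three RT Bridge servers at 9/72 each and one generator server at 5/8 add up to 3/8 + 5/8 = 1, and 72 is a multiple of 8, so the periods are harmonic. Exact fractions avoid floating-point rounding in the equality test; the helper names are ours.

```python
from fractions import Fraction


def utilization(servers):
    """Total utilization of (count, budget, period) server groups."""
    return sum(count * Fraction(budget) / Fraction(period)
               for count, budget, period in servers)


def harmonic(periods):
    """True if, after sorting, every period divides the next one."""
    ps = sorted(periods)
    return all(ps[k + 1] % ps[k] == 0 for k in range(len(ps) - 1))


# Budgets and periods (ms) from the evaluation table:
# 3 RT Bridge servers and 1 traffic-generator server.
servers = [(3, 9, 72), (1, 5, 8)]
```

For harmonic periods, Rate Monotonic can schedule periodic workloads up to 100% utilization, which is presumably why the experiment is configured at exactly U = 1.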
26
Evaluation
3 x Real-Time Bridges, 1 x Traffic Generator with synthetic traffic.
Rate Monotonic with Sporadic Servers.
Peripheral    Transfer time    Budget    Period
RT Bridge     7.5 ms           9 ms      72 ms
Generator     4.4 ms           5 ms      8 ms
No deadline misses with the peripheral scheduler.
[Figure: Generator and RT-Bridge flow timelines]
27
Testbed (single core, distributed)
Embedded testbed used to prove the applicability of our techniques.
System objective: control a 3DOF Quanser helicopter.
– Non-linear control.
– 100 Hz sensing and actuation.
End-to-end delay control using:
– I/O Management System.
– Real-Time Bridge.
28
Testbed (single core, distributed)
The Sensor Node performs sensing/actuation.
The Control Node executes the control algorithm.
Data are exchanged over a real-time network.
[Figure: Quanser 3DOF helicopter, Sensor Node and Control Node connected by the RT Network]
29
Testbed
[Figure: sensing/actuation node, control node, GUI node and RT switch; components include CPU, RAM, memory logic, Peripheral Scheduler, PCI, RT Bridge, traffic generator, RT NIC cards, ADC/DAC card and NICs; flows labeled sensing data, actuation and disturb]
30
Real-Time Bridge Demo
31
Predictable Execution Model (PREM uni-core)
(The rule) Real-time embedded applications should be compiled according to a new set of rules to achieve predictability.
(The effect) The execution of a task can be divided into a memory-intensive phase (with cache prefetching) and a local computation phase (with cache hits).
(The benefit) High-level co-scheduling can be enforced among all active components of a COTS system: contention for access to shared resources is implicitly resolved by the high-level co-scheduler, without relying on low-level arbiters.
R. Pellizzoni, E. Betti, S. Bak, G. Yao, J. Criswell, M. Caccamo and R. Kegley, "A Predictable Execution Model for COTS-based Embedded Systems", Proceedings of the 17th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), Chicago, USA, April 2011.
37
Memory-centric scheduling (multicore)
It uses the PREM task model: each task is composed of a sequence of intervals, each including a memory phase followed by a computation phase.
It enforces a coarse-grain TDMA schedule for granting memory access to each core.
Each core can be analyzed in isolation, as if its tasks were running on a "single-core equivalent" platform.
G. Yao, R. Pellizzoni, S. Bak, E. Betti and M. Caccamo, "Memory-centric scheduling for multicore hard real-time systems", Real-Time Systems Journal, Vol. 48, No. 6, pp. 681-715, November 2012.
38
Core Isolation
Two cores example:
[Figure: TDMA slots of core 1 and jobs J1, J2, J3 split into memory phases and computation phases, t = 0 to 12]
With a coarse-grained TDMA, tasks on one core can perform memory accesses only when the TDMA slot is granted.
39
Memory-centric scheduling: three rules
Assumption: fixed-priority, partitioned scheduling.
Rule 1: enforce a coarse-grain TDMA schedule among the cores for granting access to main memory;
Rule 2: raise the scheduling priority of memory phases over execution phases when the TDMA memory slot is granted;
Rule 3: memory phases are non-preemptive.
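Rule 1 can be sketched as a slot-ownership check. We assume equal-length slots handed to the cores in round-robin order, so that core c's slot starts at c * slot within each TDMA period; the slides only say "coarse-grain TDMA", so this layout and the helper names are illustrative. Rules 2 and 3 would live in the per-core scheduler on top of this check.

```python
def slot_owner(t, slot, n_cores):
    """Core that owns main memory at time t (round-robin TDMA, equal slots)."""
    return int(t // slot) % n_cores


def next_memory_grant(t, core, slot, n_cores):
    """Earliest time >= t at which `core` may start or continue a memory phase."""
    period = slot * n_cores
    start = (t // period) * period + core * slot  # core's slot in this period
    if start <= t < start + slot:
        return t                 # already inside the core's slot
    if t < start:
        return start             # the slot comes later in this period
    return start + period        # the slot has passed: wait for the next period
```

With two cores and 4-unit slots, a memory phase of core 0 that becomes ready at t = 5 must wait until t = 8.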
40
Raise priority of memory phases during the TDMA slot
[Figure: schedule of J1, J2, J3 (memory and computation phases) before and after raising memory-phase priority, t = 0 to 12]
41
Make memory phases non-preemptive
[Figure: schedule of J1, J2, J3 before and after making memory phases non-preemptive, t = 0 to 12]
42
Summary of two cores example
Rule 1 – TDMA memory schedule
Rule 2 – Prioritize memory phases during a TDMA memory slot
Rule 3 – Memory phases are non-preemptive
43
Intuition of response time analysis
The linearized TDMA model:
1. b is the memory bandwidth assigned to the core (b = TDMA_slot / TDMA_period);
2. each memory phase is inflated by a factor 1/b; each execution phase is inflated by a factor 1/(1-b);
3. interfering jobs that contribute to the worst-case response time can be separated into a memory chain followed by an execution chain.
[Figure: jobs J1 to J5 over t = 0 to 40, split into a memory chain and an execution chain]
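The inflation step of the linearized model can be written directly. The sketch assumes integer slot and period lengths so that b stays an exact fraction; `linearize` is a hypothetical helper name.

```python
from fractions import Fraction


def linearize(mem, exe, tdma_slot, tdma_period):
    """Inflate a (memory, execution) phase pair under the linearized TDMA model.

    b = tdma_slot / tdma_period is the core's memory bandwidth share;
    the memory phase is stretched by 1/b, the execution phase by 1/(1-b).
    """
    b = Fraction(tdma_slot, tdma_period)
    if not 0 < b < 1:
        raise ValueError("b must lie strictly between 0 and 1")
    return mem / b, exe / (1 - b)
```

For example, with a 1-unit slot per 4-unit TDMA period (b = 1/4), a memory phase of 1 becomes 4 and an execution phase of 3 becomes 4.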
44
Pipelining memory and execution phases
Key observations:
– The inflated memory and execution phases can run in parallel.
– Only ONE joint job contributes to both the memory and the execution chains (in this figure, J3 is the joint job).
[Figure: jobs J1 to J5 over t = 0 to 40, split into a memory chain and an execution chain]
45
Worst-case response time of job J_i
The bound is the sum of four terms:
1. The joint job: both its memory and its computation phases (an upper bound of the memory phase of the joint job);
2. Memory blocking: the longest memory phase of one lower-priority job (due to non-preemptive memory phases);
3. For each higher-priority job (hp(i)), the max of its memory and computation phases;
4. The computation phase of the job under analysis.
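The four terms can be combined into an iterative bound in the usual fixed-point style. This is a sketch under our own assumptions, not the paper's exact formulation: phases are already inflated by the linearized model, deadlines equal periods, the number of higher-priority jobs in a window of length R is ceil(R / T_j), and the joint job is conservatively taken as the largest memory-plus-computation pair among J_i and the higher-priority tasks.

```python
import math


def wcrt(i, tasks, max_iter=1000):
    """Response-time bound for task i (hypothetical helper).

    tasks: (mem, exe, period) tuples with ALREADY inflated phases, sorted by
    decreasing priority (index 0 = highest). Returns None if the bound
    exceeds the period (assumed to be the deadline).
    """
    _, e_i, t_i = tasks[i]
    hp, lp = tasks[:i], tasks[i + 1:]
    # 1. joint job: worst memory + computation pair among hp(i) and J_i
    joint = max(m + e for m, e, _ in hp + [tasks[i]])
    # 2. blocking by one non-preemptive lower-priority memory phase
    block = max((m for m, _, _ in lp), default=0)
    r = joint + block + e_i
    for _ in range(max_iter):
        # 3. each higher-priority job adds max(memory, computation)
        interference = sum(math.ceil(r / t) * max(m, e) for m, e, t in hp)
        # 4. own computation phase e_i
        r_next = joint + block + interference + e_i
        if r_next == r:
            return r
        if r_next > t_i:
            return None
        r = r_next
    return None
```

For the task set [(1, 2, 10), (2, 1, 20), (1, 1, 40)], the sketch returns bounds of 7, 7 and 8 time units.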
46
Schedulability of synthetic tasks
In an 8-core, 10-task system, the memory-centric scheduling bound is superior to the contention-based scheduling bound.
[Figure: schedulability ratio as a function of core utilization and memory utilization]
47
Schedulability of synthetic tasks
[Figure: contour line at the 50% schedulable level (schedulability ratio = 0.5) as a function of core utilization and memory utilization]
48
Outline
Motivation
PRedictable Execution Model (PREM)
– Peripheral scheduler & real-time bridge
– Memory-centric scheduling
MemGuard
– Memory bandwidth isolation
Colored Lockdown
– Cache space management
49
Memory Interference
Key observations:
– Memory bandwidth (variable) != CPU bandwidth (constant)
– Memory controller queuing/access delay is unpredictable
[Figure: foreground slowdown ratio of the X-axis benchmarks when co-run with 470.lbm on a background core (Intel Core2, shared L2 and shared memory); annotated bandwidths 1.6 GB/s, 1.5 GB/s, 1.4 GB/s, 2.1 GB/s]
50
Memory Access Pattern
Memory access patterns vary over time.
Static resource reservation is inefficient.
[Figure: LLC misses over time (ms), two panels]
51
Memory Bandwidth Isolation
MemGuard provides an OS mechanism to enforce a memory bandwidth reservation for each core.
H. Yun, G. Yao, R. Pellizzoni, M. Caccamo and L. Sha, "MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isolation in Multi-core Platforms", to appear at IEEE RTAS, April 2013.
52
MemGuard
Characteristics
– Memory bandwidth reservation system
– Memory bandwidth: guaranteed + best-effort
– Prediction-based dynamic reclaiming for efficient utilization of the guaranteed bandwidth
– Maximize throughput by using best-effort bandwidth whenever possible
Goal
– Minimum memory performance guarantee
– A dedicated (slower) memory system for each core in multicore systems
53
Memory Bandwidth Reservation
Idea
– Control interference by regulating per-core memory traffic
– The OS monitors and enforces each core's memory bandwidth usage, using per-core HW performance counters (PMC) and the scheduler
[Figure: budget and core activity (computation / memory fetch) over t = 0 to 20; tasks are dequeued when the budget is exhausted and enqueued again at the next period]
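The regulation loop can be modeled per period. This toy model is not MemGuard's kernel code: it assumes the budget is counted in memory accesses, as a per-core PMC would count LLC misses, and that accesses denied in one period are carried over as backlog. `regulate` and its arguments are hypothetical.

```python
def regulate(demand, budget):
    """Toy per-core memory regulator over consecutive periods.

    demand[k]: accesses the core would issue in period k if unthrottled.
    budget:    accesses allowed per period.
    Returns (served, throttled): accesses served per period, and whether the
    core's tasks were dequeued (budget exhausted) during that period.
    """
    served, throttled = [], []
    backlog = 0
    for d in demand:
        want = backlog + d
        s = min(want, budget)
        served.append(s)
        throttled.append(want > budget)  # dequeue until the next period
        backlog = want - s               # assumed: denied work carries over
    return served, throttled
```

With a budget of 5 and demands [3, 8, 1, 0], the core is throttled only in the second period, and the 3 deferred accesses are served in the third.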
54
Guaranteed Bandwidth: r_min
Definition
– Minimum memory transfer rate when requests are backlogged in the DRAM controller; worst-case access pattern: same bank, row miss
Example (PC6400-DDR2*)
– Peak B/W: 6.4 GB/s
– Measured minimum B/W: 1.2 GB/s
(*) PC6400-DDR2 with 5-5-5 (RAS-CAS-CL latency setting)
55
Memory Bandwidth Reservation
System-wide reservation rule
– The total reservation may be up to the guaranteed bandwidth r_min: $\sum_{i=1}^{m} B_i \le r_{min}$, where m is the number of cores
MemGuard approximates a dedicated (ideal) memory subsystem for each core
– bandwidth: B_i (bytes/sec)
– latency: 1/B_i (sec/byte)
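The reservation rule, and the conversion of a reservation into a per-period budget, can be sketched as follows. r_min = 1.2 GB/s comes from the PC6400-DDR2 example; the 64-byte cache line and the 1 ms regulation period are assumptions for illustration. Integer arithmetic keeps the budget exact.

```python
R_MIN = 1_200_000_000  # bytes/s: measured minimum bandwidth (PC6400-DDR2)
LINE_BYTES = 64        # assumed cache-line size: bytes moved per LLC miss
PERIOD_US = 1_000      # assumed regulation period: 1 ms


def admit(reservations, r_min=R_MIN):
    """System-wide rule: the sum of per-core reservations B_i must not exceed r_min."""
    return sum(reservations) <= r_min


def budget_in_misses(b_i, period_us=PERIOD_US, line=LINE_BYTES):
    """Per-period budget, in LLC misses, for a core reserved b_i bytes/s."""
    return b_i * period_us // (1_000_000 * line)
```

Under these assumptions, four cores at 300 MB/s each are admissible, four at 400 MB/s are not, and a core reserved 200 MB/s gets 3125 misses per millisecond.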
56
Memory Bandwidth Reclaim
Key objective
– Use the guaranteed bandwidth efficiently
Regulator
– Predicts memory usage based on history
– Donates the surplus to the reclaim manager at the beginning of every period
– When the remaining budget (assigned - donated) is depleted, tries to reclaim from the reclaim manager
Reclaim manager
– Collects the surplus from all cores
– Grants reclaimed bandwidth to individual cores on demand
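A minimal sketch of the donate/reclaim interaction follows. The predictor (next period's need equals last period's usage), the single `consume` call per period, and all class and method names are assumptions made for illustration; they are not MemGuard's actual implementation.

```python
class ReclaimManager:
    """Collects surplus budget from all cores and grants it on demand."""

    def __init__(self):
        self.pool = 0

    def new_period(self):
        self.pool = 0  # donations are only valid for one period

    def donate(self, amount):
        self.pool += amount

    def reclaim(self, amount):
        granted = min(amount, self.pool)
        self.pool -= granted
        return granted


class Regulator:
    """Per-core regulator with an assumed last-period-usage predictor."""

    def __init__(self, assigned):
        self.assigned = assigned   # guaranteed budget per period
        self.predicted = assigned  # no history yet: predict the full budget
        self.budget = 0

    def start_period(self, manager):
        keep = min(self.predicted, self.assigned)
        manager.donate(self.assigned - keep)  # surplus goes to the pool
        self.budget = keep

    def consume(self, demand, manager):
        """Serve up to `demand` accesses, reclaiming when the budget runs out."""
        if demand > self.budget:
            self.budget += manager.reclaim(demand - self.budget)
        used = min(demand, self.budget)
        self.budget -= used
        self.predicted = used  # history for the next prediction
        return used
```

If a core predicted to need 2 of its 10 accesses donates 8, a neighbor can reclaim 5 of them and the first core can still cover a demand of 4. Had the neighbor asked for 18, it would have drained the pool and the first core would get only 2 of its 4 accesses although 10 were reserved for it: the misprediction case that makes reclaiming a soft reservation.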
57
Hard/Soft Reservation on MemGuard
Hard reservation (w/o reclaiming)
– Guarantees memory bandwidth B_i regardless of other cores
– Selectively applicable on a per-core basis
Soft reservation (w/ reclaiming)
– Does not guarantee the reserved bandwidth, due to potential misprediction
– Error cases can occur due to misprediction
– The error rate is small (shown in the evaluation)
Best-effort bandwidth
– After all cores have used their budgets, and before the next period begins, MemGuard broadcasts to all cores that they may continue executing
58
Evaluation Platform
Intel Core2Quad 8400, 4 MB L2 cache, PC6400 DDR2 DRAM
Modified Linux kernel 3.6.0 + MemGuard kernel module
– https://github.com/heechul/memguard/wiki/MemGuard
Used all 29 benchmarks from SPEC2006, plus synthetic benchmarks
[Figure: Intel Core2Quad with Cores 0 to 3, each with private L1-I and L1-D; one L2 cache shared by Cores 0 and 1 and another by Cores 2 and 3; system bus to DRAM]
59
Isolation Effect of Reservation
Core 0: 1.0 GB/s for the X-axis benchmark
Core 2: 0.2 to 2.0 GB/s for lbm
[Figure: IPC of the X-axis benchmarks normalized to Solo IPC@1.0GB/s, showing isolation]
60
Effects of Reclaiming and Spare Sharing
Guarantee the foreground (SPEC@1.0GB/s)
Improve the throughput of the background (lbm@0.2GB/s): 368%
61
Effect of MemGuard
Soft real-time application on each core.
Provides differentiated memory bandwidth
– weight for each core = 1:2:4:8 for the guaranteed b/w; spare bandwidth sharing is enabled
62
Outline
Motivation
PRedictable Execution Model (PREM)
– Peripheral scheduler & real-time bridge
– Memory-centric scheduling
MemGuard
– Memory bandwidth isolation
Colored Lockdown
– Cache space management
63
LVL3 Cache & Storage Interference
Inter-core interference
– The biggest issue w.r.t. modular certification
– Fetches by one core might evict cache blocks owned by another core
– Hard to analyze!
Inter-task/inter-partition interference
Intra-task interference
– Also present in single-core systems; intra-task interference is mainly a result of cache self-eviction.
64
Inter-Core Interference: Options
Private cache
– Often not available: the majority of COTS multicore platforms have a last-level cache shared among cores
Cache-way partitioning
– Easy to apply, but inflexible
– Reducing the number of ways per core can greatly increase cache conflicts
Colored Lockdown
– Our proposed approach
– Use coloring to solve cache conflicts
– Fine-grained assignment of cache resources (page size: 4 KB)
– Use cache locking instructions to lock "hot" pages of RT critical tasks; locked pages cannot be evicted from the cache
R. Mancuso, R. Dudko, E. Betti, M. Cesati, M. Caccamo and R. Pellizzoni, "Real-Time Cache Management Framework for Multi-core Architectures", to appear at IEEE RTAS, Philadelphia, USA, April 2013.
65
How Coloring Works
The position of a cache block inside the cache depends on the value of the index bits within its physical address.
Key idea: the OS decides the physical memory mapping of a task's virtual memory pages, so it can manipulate the index bits to map different pages into non-overlapping sets of cache lines (colors).
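The color of a page follows from the cache geometry. A minimal sketch, assuming a physically indexed cache, 4 KB pages, and more set-index bits than page-offset bits; the 1 MB, 16-way cache in the example is hypothetical.

```python
PAGE_SHIFT = 12  # 4 KB pages


def num_colors(cache_size, ways, page_size=1 << PAGE_SHIFT):
    """Number of page colors: bytes per cache way divided by the page size."""
    return cache_size // (ways * page_size)


def page_color(paddr, colors):
    """Color of the physical page holding paddr.

    The color is given by the set-index bits above the page offset,
    i.e. the page frame number modulo the number of colors.
    """
    return (paddr >> PAGE_SHIFT) % colors
```

For that cache, 1 MB / (16 × 4 KB) gives 16 colors, and physical pages 0x0, 0x10000, 0x20000, ... all share color 0, so the OS can keep two tasks apart in the cache by handing them page frames of disjoint colors.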
67
How Coloring Works
You can think of a set-associative cache as an array...
[Figure: cache drawn as an array of 32 ways (columns) by 16 colors (rows)]
68
How Coloring Works
You can think of a set-associative cache as an array...
Using only cache-way partitioning, you are restricted to assigning cache blocks by columns.
Note: assigning one way turns it into a direct-mapped cache!
69
How Coloring + Locking Works
You can think of the cache as an array...
Combining coloring and locking, you can assign an arbitrary position to cache blocks, independently of the replacement policy.
70
Colored Lockdown
Final goal
Target model: suffer cache misses in hot memory regions only once:
– During the startup phase, prefetch & lock the hot memory regions
– Sharp improvement in terms of WCET reduction (and schedulability)
[Figure: T1 on CPU 1 and T2 on CPU 2, each with a startup phase followed by memory access and execution]
71
Detecting Hot Regions
In the general case, the cache is not large enough to hold the working sets of all running RT critical tasks.
For each RT critical task, we can identify some high-usage virtual memory regions, called hot memory regions. Such regions can be identified through profiling.
Critical tasks do NOT color dynamically linked libraries.
Dynamic memory allocation is allowed only during the startup phase.
72
Detecting Hot Regions
How can we detect hot pages? Given an address space:
– Their location is unknown
– Their absolute virtual memory addresses change from run to run
[Figure: process address space with text, data and heap sections]
73
Detecting Hot Regions
Execute the unmodified task inside a profiling environment.
The output is the list of every accessed virtual memory address.
We keep per-page access counters: hotter pages record a higher number of accesses.
[Figure: profiling environment in which instrumentation code, added at run time, catches the memory accesses of the observed task]
74
Detecting Hot Regions
Rank the virtual pages by number of accesses.
Since absolute addresses change from run to run, identify each page as a pair of values:
– The index of the section that contains the page
– The offset, expressed in pages, from the beginning of the section
E.g.: virtual page #0x8040A → Section #3 (text) + 0x3
Execute the task again outside the profiling environment to obtain an unaltered list of sections.
Compute the relative position of each hot page according to the unaltered list of sections.
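The counting and relocation steps can be sketched as follows, assuming the profiler yields a flat list of accessed virtual addresses together with the section map of the same run as (index, start, end) ranges with page-aligned starts. `hot_page_profile` is a hypothetical helper, not the framework's actual tool; the example reproduces the mapping of virtual page 0x8040A to section #3 + 0x3 using a made-up section start.

```python
from collections import Counter

PAGE_SHIFT = 12  # 4 KB pages


def hot_page_profile(addresses, sections):
    """Rank accessed pages and express each one as (section, page offset).

    addresses: virtual addresses observed by the profiler.
    sections:  (index, start, end) ranges from the same run, end exclusive;
               starts are assumed to be page-aligned.
    Returns (section_index, page_offset, count) tuples, hottest first.
    Pages outside every listed section are ignored.
    """
    counts = Counter(a >> PAGE_SHIFT for a in addresses)  # per-page counters
    profile = []
    for page, count in counts.items():
        addr = page << PAGE_SHIFT
        for index, start, end in sections:
            if start <= addr < end:
                profile.append((index, page - (start >> PAGE_SHIFT), count))
                break
    profile.sort(key=lambda entry: -entry[2])
    return profile


# Made-up section map: section 3 (text) starts at virtual page 0x80407.
sections = [(3, 0x80407000, 0x80410000), (25, 0xB0000000, 0xB0010000)]
```

Because each entry is relative to its section, the resulting profile stays valid when the absolute addresses change in the next run.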
75
Detecting Hot Regions
The final memory profile will look like:
Rank    # + page offset
A       1 + 0x0002
B       1 + 0x0004
C       25 + 0x0000
D       1 + 0x0001
E       25 + 0x0003
I       3 + 0x0000
K       4 + 0x0000
O       6 + 0x0002
P       1 + 0x0005
Q       1 + 0x0000
...
where A, B, ... is the page ranking and "#" is the section index.
It can be fed into the kernel to perform selective Colored Lockdown.
How many pages should be locked per process? Task WCET, as a function of the number of locked pages, has an approximately convex shape; convex optimization can be used to allocate the cache among RT critical tasks.
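One simple way to exploit the convex shape is greedy marginal allocation, assuming the goal is to minimize the sum of the tasks' WCETs under a fixed number of lockable pages (the slides do not state the objective). With convex, non-increasing WCET curves, giving each next page to the task that gains the most is optimal for this separable problem. The curves and the helper name are illustrative.

```python
import heapq


def allocate_pages(wcet_curves, total_pages):
    """Greedy split of lockable cache pages among tasks (hypothetical helper).

    wcet_curves[t][k]: profiled WCET of task t with k locked pages,
    assumed convex and non-increasing in k.
    Returns the number of pages assigned to each task.
    """
    alloc = [0] * len(wcet_curves)
    heap = []  # max-heap of marginal WCET reductions, stored negated
    for t, curve in enumerate(wcet_curves):
        if len(curve) > 1:
            heapq.heappush(heap, (-(curve[0] - curve[1]), t))
    for _ in range(total_pages):
        if not heap:
            break
        neg_gain, t = heapq.heappop(heap)
        if -neg_gain <= 0:
            break  # no task benefits from another page
        alloc[t] += 1
        k = alloc[t]
        curve = wcet_curves[t]
        if k + 1 < len(curve):
            heapq.heappush(heap, (-(curve[k] - curve[k + 1]), t))
    return alloc
```

For the curves [[10, 6, 4, 3], [8, 6.5, 6, 5.9]] and 3 pages, the sketch returns [2, 1], for a total WCET of 10.5 instead of 11 with [3, 0].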
76
EEMBC Results
EEMBC Automotive benchmarks
– Benchmarks converted into periodic tasks
– Each task has a 30 ms period
ARM-based platform
– 1 GHz dual-core Cortex-A9 CPU
– 1 MB L2 cache + private L1 (disabled)
Tasks observed on Core 0
– Each plotted sample summarizes the execution of 100 jobs
Interference generated with synthetic tasks on Core 1
77
EEMBC Results
Angle to time conversion benchmark (a2time)
Baseline reached when 4 hot pages are locked / 81% of accesses caught
78
EEMBC Results
CAN remote data request benchmark (canrdr)
Baseline reached when 3 pages are locked / 91% of accesses caught
79
EEMBC Results
Same experiment executed on 7 EEMBC benchmarks
Benchmark    Total pages    Hot pages    % accesses in hot pages
a2time       15             4            81%
basefp       21             6            97%
bitmnp       19             5            80%
cacheb       30             5            92%
canrdr       16             3            85%
rspeed       14             4            85%
tblook       17             3            81%
80
EEMBC Results
One benchmark at a time scheduled on Core 0
Only the hot pages are locked
[Figure: results for three configurations: no protection / no interference, no protection / interference, protection / interference]
81
EEMBC Results
Four benchmarks at a time scheduled on Core 0
Only the hot pages are locked
[Figure: results per priority level, from Prio 4 (top priority) to Prio 1 (low priority)]
82
Conclusions
In a multicore chip, memory controllers, the last-level cache, memory, the on-chip network and I/O channels are globally shared by the cores. Unless a globally shared resource is over-provisioned, it must be partitioned, reserved, or scheduled.
We proposed a set of engineering solutions to:
1. schedule memory accesses at a high level (PREM + memory-centric scheduling),
2. control the cores' memory bandwidth usage (MemGuard),
3. manage cache space in a predictable manner (Colored Lockdown).
We demonstrated our techniques on different platforms based on Intel and ARM, and tested them against other options.
Questions?
83
Acknowledgements
Part of this research is joint work with Prof. Lui Sha and Prof. Rodolfo Pellizzoni.
This presentation is from selected research sponsored by
– National Science Foundation (NSF), Office of Naval Research (ONR)
– Lockheed Martin Corporation
– Rockwell Collins
Graduate students and postdocs involved in this research: Stanley Bak, Heechul Yun, Renato Mancuso, Roman Dudko, Emiliano Betti, Gang Yao
References
E. Betti, S. Bak, R. Pellizzoni, M. Caccamo and L. Sha, "Real-Time I/O Management System with COTS Peripherals", IEEE Transactions on Computers (TC), Vol. 62, No. 1, pp. 45-58, January 2013.
R. Pellizzoni, E. Betti, S. Bak, G. Yao, J. Criswell, M. Caccamo and R. Kegley, "A Predictable Execution Model for COTS-based Embedded Systems", Proceedings of the 17th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), Chicago, USA, April 2011.
G. Yao, R. Pellizzoni, S. Bak, E. Betti and M. Caccamo, "Memory-centric scheduling for multicore hard real-time systems", Real-Time Systems Journal, Vol. 48, No. 6, pp. 681-715, November 2012.
H. Yun, G. Yao, R. Pellizzoni, M. Caccamo and L. Sha, "MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isolation in Multi-core Platforms", to appear at IEEE RTAS, April 2013.
R. Mancuso, R. Dudko, E. Betti, M. Cesati, M. Caccamo and R. Pellizzoni, "Real-Time Cache Management Framework for Multi-core Architectures", to appear at IEEE RTAS, Philadelphia, USA, April 2013.