The Priority Division Arbiter for Low WCET and High Resource Utilization in Multi-core Architectures
Hardik Shah, Kai Huang and Alois Knoll
Department of Informatics VI, Technische Universität München
Shared resource arbiters
Shared resources to reduce cost
Conflicts on the shared memory
Latency bounds on shared memory accesses
Shared resources are employed to reduce the overall cost of a product by reducing the package size. Typically, arbiters are used to resolve access conflicts when multiple masters try to access the shared resource at the same time. For example, the cores in a multi-core processor share the main memory. The cores are equipped with data and instruction caches; when a cache miss occurs on a core, the shared main memory is accessed, typically in a non-preemptive burst fashion. A shared memory access from one core may block an access from another core, so the interference on the shared memory affects the access latency and thereby the execution time of the applications executing on the cores. It is very difficult to predict collisions of accesses from different cores; hence, the shared memory access latency is characterized by a best-case latency and a worst-case latency.
Traditional arbiters
Statically scheduled: no interference, low efficiency
Dynamically scheduled: high interference, high efficiency
Efficiency here refers to resource utilization
Goal
Priority Division arbiter [16]: hybrid, efficient, with worst-case latencies equal to TDMA
Study the effects of an arbitration scheme on applications' WCET and on resource utilization
Compare the SP, TDMA, RR and PD arbiters
Exclusive focus on the shared memory interference: cache replacement policy, branch predictors etc. are assumed to have a constant effect, since these are well addressed single-core problems
Agenda
Related work
Background
Priority Division Arbiter
Comparison of arbiters
Conclusion
Related work - I
Special arbiters derived from traditional arbiters: dTDMA [14], IABA [11], slot reservation [13], Priority Division [16]; in corner cases, all behave as their parent arbiter
Randomized arbiters: Lottery arbiter [10, 7], RT_Lottery [4]
Budget based arbiters: CCSP [2], PBS [20], MBBA [3], Deficit round robin [19]
Probabilistic analysis: our previous work, DATE '12, '13
Related work - II: Arbiter comparisons
[12, Pitter et al.] and [9, Kopetz et al.] found TDMA to be the most predictable arbitration scheme
[8, Kelter et al.] present a static WCET analysis approach for comparing the SP, RR, TDMA and PD arbiters
Background: latency and utilization
Cache-line fill using a non-preemptive burst access
Access latency depends on the employed arbitration scheme, the instantaneous activity of the co-existing masters, and the memory responsiveness
Utilization: the ability of an arbiter to utilize the shared resource when only the "test master" is active, i.e. the efficiency of an arbiter under low load conditions
Background: latency and utilization (continued)
Notation used in the latency formulas: the right side of "|" is only valid if the left side is true
Background: Computation trace
A cache miss is modeled as a timeless event on an execution path
A cache miss latency delays the subsequent cache misses
The trace can be used to calculate the BCET or the WCET
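One way to make this trace model concrete (a hedged reading of the slides; the summation form is assumed rather than quoted from the paper): for a path with k cache misses, computation times c_i between them, and per-miss shared-memory latencies L_i,

\[
  ET \;=\; \sum_{i=1}^{k} c_i \;+\; \sum_{i=1}^{k} L_i ,
  \qquad
  BCET \;=\; \sum_{i=1}^{k} c_i + \sum_{i=1}^{k} BL_i ,
  \qquad
  WCET \;=\; \sum_{i=1}^{k} c_i + \sum_{i=1}^{k} WL_i .
\]

For TDMA-like arbiters the bound on L_i depends on where in the wheel the request falls, which is why the latency of one miss shifts, and thereby influences, the misses that follow it.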
Background: Static (fixed) priority arbiter
Each master is granted access to the shared memory according to its priority
A higher priority master cannot preempt an ongoing lower priority burst access
WL_SP = 2 x SS and BL_SP = SS for the highest priority master; WL = ∞ for the lower priority masters
Work conserving: U_SP = 100%
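A small worked instance of these bounds (SS is read here as the length of one non-preemptive burst in cycles; the concrete value is only an illustration): with SS = 8 cycles, the highest priority master may, at worst, request just after a lower priority burst has been granted, wait for that burst to finish and then perform its own burst:

\[
  WL_{SP} \;=\; \underbrace{SS}_{\text{blocking burst}} \;+\; \underbrace{SS}_{\text{own burst}} \;=\; 16 \ \text{cycles},
  \qquad
  BL_{SP} \;=\; SS \;=\; 8 \ \text{cycles}.
\]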
Background: TDMA arbiter - I
Each master is granted a fixed, exclusive window (slot) to access the shared memory, so there is no interference
No effect of co-existing applications
Poor resource utilization
Background: TDMA arbiter - II
Latency and utilization: the rotation of the wheel is fixed, so both depend on the time between two cache misses, the "computation time" c_i
N = total number of masters
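The latency and utilization formulas referenced on this slide appear as figures in the original deck; as a stand-in, the following toy model sketches how a per-access TDMA latency can be computed. The grant rule (a burst of length SS only starts at the owning master's slot boundary) and all names are assumptions for illustration, not the paper's exact model.

def tdma_latency(t, master, N, SS):
    """Cycles from a request of `master` at time t until its burst completes,
    assuming bursts are only granted at the master's own slot boundary."""
    period = N * SS
    slot_start = master * SS                      # position of the master's slot in the wheel
    offset = (t - slot_start) % period            # how far past the slot boundary the request is
    wait = 0 if offset == 0 else period - offset  # wait until the next own slot boundary
    return wait + SS                              # waiting time plus the burst itself

# Example with N = 4 masters and SS = 8 cycles: a request issued exactly at the
# own slot boundary sees the best case, one issued a cycle later waits almost a
# whole wheel rotation.
print(tdma_latency(0, 0, 4, 8))   # -> 8   (best case)
print(tdma_latency(1, 0, 4, 8))   # -> 39  (close to the worst case)

With only the test master active, every miss whose computation time c_i does not land it on its own slot boundary pays such a waiting penalty, which is where TDMA's poor low-load utilization comes from.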
Background: Round robin arbiter
As soon as an active master is encountered, its slot is started: a "greedy TDMA"
WL_RR = N x SS, BL_RR = SS, U_RR = 100%
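The round-robin worst case can be reconstructed with a one-line argument (a standard derivation, assumed to match the bound quoted above): a request may find the other N - 1 masters each owed one burst ahead of it, after which its own burst is served:

\[
  WL_{RR} \;=\; (N-1)\cdot SS + SS \;=\; N \cdot SS ,
  \qquad
  BL_{RR} \;=\; SS .
\]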
Priority Division arbiter - I
A mix of TDMA and SP: fixed slots, with priorities inside each slot
Starvation free if every master has at least one slot in which it has the highest priority
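A minimal behavioural sketch of the grant decision just described, assuming a slot/priority table in which each slot lists the masters from highest to lowest priority (function and variable names are illustrative, not the RTL of the paper):

def pd_grant(slot, requests, priority_table):
    """Return the master granted in the given slot, or None if nobody requests.

    requests[m] is True when master m has a pending shared-memory request;
    priority_table[slot] lists master indices from highest to lowest priority,
    with the slot owner in front."""
    for master in priority_table[slot]:
        if requests[master]:
            return master        # highest priority active requester in this slot wins
    return None                  # the slot stays idle only if nobody requests at all

# Example: 4 masters, 4 slots, master i owns slot i and the others fill the
# lower priorities in rotating order.
table = [[0, 1, 2, 3],
         [1, 2, 3, 0],
         [2, 3, 0, 1],
         [3, 0, 1, 2]]

# Slot 0: its owner (master 0) is idle, so master 2 gets the slot instead of
# it being wasted as under plain TDMA; the owner would still win if it asked.
print(pd_grant(0, [False, False, True, True], table))   # -> 2
print(pd_grant(0, [True,  False, True, True], table))   # -> 0

Because the owner is always first in its own slot, its worst-case latency is the TDMA one; because unclaimed slots are handed to other requesters, the arbiter stays busy whenever anyone requests.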
Priority Division arbiter - II
Latency: a master keeps the TDMA worst-case latency in the slots where it has the highest priority
Utilization: higher than TDMA, because slots left unused by their owner are granted to lower priority requesters
Priority Division arbiter - Benefits
Equal WCET (as with TDMA) at higher resource utilization
Simple architecture: the additional complexity compared to the complete system is negligible
Incremental certification, using stress patterns on the co-existing cores (m2 - m4)
Priority Division arbiter – h1 configuration
Only one HRT master has the highest priority in all slots (a mixed-criticality system with a single HRT master)
Latency: produces a lower WCET bound for the highest priority master than SP, for which WL_SP = 2 x SS
At the cost of a utilization penalty
Arbiter comparison: Complexity
                SP    TDMA   RR    PD
Number of LEs   281   277    288   285

Dynamically scheduled arbiters (SP, RR and PD) are slightly more complex than the statically scheduled arbiter (TDMA)
@ 125 MHz, Cyclone III FPGA, for a 4-port arbiter
Arbiter comparison: Test architecture
Quad-core processor built using NIOS II/f cores with 512-byte instruction and data caches (I$ and D$)
On-chip memory used as the shared main memory
Test applications from the Mälardalen WCET benchmark suite
Recorded traces were extracted by probing the test master's (m1's) interface with the arbiter
Utilization was measured by observing busy and idle cycles while keeping the co-existing masters (m2 - m4) off
Arbiter comparison: TDMA vs RR vs PD
Arbiter comparison: TDMA vs RR vs PD
Advantage of PD over TDMA and RR
Drawback of PD over RR
Arbiter comparison: SP vs PDh1
Advantage of PDh1 over SP
Drawback of PDh1 over SP
Conclusion
Priority Division is a promising arbitration scheme for predictable and high-performance multi-core architectures
It enables incremental certification and increases resource utilization at a minor increase in complexity compared to TDMA and SP (in h1 mode)
PDh1 produces a lower WCET than SP for the highest priority master
Thank you. Questions?