1
Be-Nice Scheduling for Embedded SMT Processors
Handong Ye
April 6, 2008, Boston
2
ITS (Inter-Thread Stall) Introduction
Be-Nice Scheduling
Some experimental results
3
ITS Introduction
– ITS in out-of-order processors
– ITS in in-order processors
Be-Nice Scheduling
Some experimental results
4
ITS Introduction
– ITS in out-of-order machines
  A thread holds (or fills up) shared resources for too long, e.g., the instruction queue, reservation stations, ..., and blocks the other threads
  Flush, …
– ITS in in-order machines
  A thread holds functional units (FUs), blocking the other threads
  2 examples
  What can the compiler do?
5
ITS Introduction
– ITS in in-order machines
  Examples, assume:
  – SMT, 2 threads
  – Embedded
  – 2 LS (load/store) units and 2 ALUs
  – Separate dispatch buffer
6
ITS Introduction
– ITS in in-order machines
  Example 1 (same-FU ITS)
  – A missed load can block other threads that are using the same LS unit
7
[Figure: Example 1, same-FU block. The Thread-A and Thread-B dispatch buffers (holding add and ld instructions) feed LS1, LS2, ALU1, and ALU2 through the EXE, MEM, and WB stages; one load is marked MISS.]
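The same-FU block in Example 1 can be approximated with a toy model. This is only a sketch under stated assumptions, not the simulator used in the talk: the function name, the one-load-per-cycle issue rate for Thread-B, and the fixed miss latency are all hypothetical. It counts how many cycles Thread-B loses when one of its loads is sent to the LS unit that Thread-A's missed load is holding.

```python
def count_b_stall_cycles(b_units, blocked_unit, miss_latency):
    """Count cycles Thread-B loses while Thread-A's miss holds one LS unit.

    b_units:      LS unit (1 or 2) chosen for each Thread-B load, in program order.
    blocked_unit: LS unit held by Thread-A's missed load, starting at cycle 0.
    miss_latency: assumed number of cycles that unit stays blocked.
    """
    cycle = 0
    stalls = 0
    for unit in b_units:
        if unit == blocked_unit and cycle < miss_latency:
            # In-order issue: the load waits for the unit, and so does
            # every Thread-B instruction behind it.
            stalls += miss_latency - cycle
            cycle = miss_latency
        cycle += 1  # the Thread-B load itself hits and uses one issue slot
    return stalls


alternating = [1, 2, 1, 2]   # Thread-B loads spread over both LS units
steered_away = [2, 2, 2, 2]  # Thread-B loads kept off the blocked unit

print(count_b_stall_cycles(alternating, blocked_unit=1, miss_latency=10))   # 10
print(count_b_stall_cycles(steered_away, blocked_unit=1, miss_latency=10))  # 0
```

In this model the stall disappears as soon as Thread-B's loads avoid the blocked unit, which is the intuition behind steering each thread's loads to one LS unit.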
8
ITS Introduction
– ITS in in-order machines
  Example 2 (cross-FU ITS)
  – A missed load can block other threads that are using non-LS functional units, e.g., an ALU
9
[Figure: Example 2, cross-FU block. The Thread-A and Thread-B dispatch buffers (holding add and ld instructions) feed LS1, LS2, ALU1, and ALU2 through the EXE, MEM, and WB stages; one load is marked MISS.]
10
ITS Introduction
– ITS in in-order machines
The effect of ITS, from Thread-A to Thread-B
Assume:
1. Thread-A has a cache miss rate of around 1% to 2%
2. Thread-B always hits
Results:
1. Half of the idle cycles are due to ITS
2. Almost 1/3 of the cycles are idle
11
ITS Introduction
– ITS in in-order machines
  What can the compiler do?
  – Focus on in-order embedded processors
  – Needs a few simple pieces of HW support
  – Uses Open64, in instruction scheduling
12
ITS (Inter-Thread Stall) Introduction
Be-Nice Scheduling
Some experimental results
13
Be-Nice Scheduling
Intuitive thinking
– Prefetch: unacceptable for embedded systems
– Reduce cross-FU ITS: reduce the number of FUs held by Thread-A
– Reduce same-FU ITS: avoid issuing instructions from other threads into the blocked FUs
14
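[Figure: the original Thread-A instruction stream (add and ld instructions) and its rescheduled ("sched") version, shown with the Thread-A and Thread-B dispatch buffers feeding LS1, LS2, ALU1, and ALU2 through the EXE, MEM, and WB stages.]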
15
Be-Nice Scheduling
– Objective
  Schedule n (>= 2) loads back-to-back
  Issue the n loads to the same FU
– Compiler + HW solution
  HW side
  – Add an extra load instruction, ld.n (n = 1, 2), which sends the load only to the n-th LS unit
  – Each thread has its preferred LS unit
  Compiler side
  – Profile to find the loads that are highly likely to miss, say 'load_a'
  – Schedule another load, say 'load_b', right behind 'load_a', and glue the two together as a pseudo-op
  – Change 'load_a' and 'load_b' to use the thread's preferred LS unit, e.g., both are changed to 'ld.1'
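The HW side of this scheme can be pictured as a small dispatch rule. The following is a minimal sketch, assuming a plain `ld` is sent to whichever unit the default dispatch policy would pick; the function names, the `default_unit` parameter, and the thread-to-unit table are hypothetical and not taken from the talk.

```python
def select_ls_unit(opcode, default_unit):
    """Pick the LS unit (1 or 2) for a load.

    opcode:       "ld" (unsteered), "ld.1", or "ld.2" (steered).
    default_unit: unit an unsteered load would get from the normal
                  dispatch policy (assumed to be decided elsewhere).
    """
    if opcode == "ld":
        return default_unit
    if opcode in ("ld.1", "ld.2"):
        # The suffix names the only LS unit this load may use.
        return int(opcode[-1])
    raise ValueError(f"not a load opcode: {opcode!r}")


# Hypothetical assignment of a preferred LS unit to each thread.
PREFERRED_UNIT = {"Thread-A": 1, "Thread-B": 2}


def steered_opcode(thread):
    """Opcode the compiler would emit for a paired load of `thread`."""
    return f"ld.{PREFERRED_UNIT[thread]}"


print(select_ls_unit("ld", default_unit=2))                        # 2
print(select_ls_unit(steered_opcode("Thread-A"), default_unit=2))  # 1
```

Giving the two threads different preferred units means that a miss in one thread's paired loads ties up a single LS unit and leaves the other unit free.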
16
– A compiler + HW solution

BB1:
  $r1 = ld $r2
  $r2 = $r2 + 4
  $r3 = ld $r4
  $r3 = $r3 + 4
  $r5 = $r1 + $r3

BB1:
  $r1 = ld $r2
  $r3 = ld $r4
  $r2 = $r2 + 4
  $r3 = $r3 + 4
  $r5 = $r1 + $r3

BB1:
  $r1 = ld $r2
  $r2 = $r2 + 4
  $r3 = ld $r4
  $r3 = $r3 + 4
  $r5 = $r1 + $r3

Identified to miss

BB1:
  $r1 = ld.1 $r2
  $r3 = ld.1 $r4
  $r2 = $r2 + 4
  $r3 = $r3 + 4
  $r5 = $r1 + $r3
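The BB1 transformation can be reproduced with a short sketch of the compiler-side step. It is not the Open64 implementation: `Instr`, `independent`, and `be_nice_pair` are hypothetical names, only register dependences are checked, and the block is assumed to contain no stores (a real pass would also need memory alias checks).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instr:
    dest: str
    op: str      # "ld", "ld.1", "ld.2", or an arithmetic operator such as "+"
    srcs: tuple

    def __str__(self):
        if self.op.startswith("ld"):
            return f"{self.dest} = {self.op} {self.srcs[0]}"
        return f"{self.dest} = {self.srcs[0]} {self.op} {self.srcs[1]}"


def independent(a, b):
    """True if a and b have no RAW, WAR, or WAW register dependence."""
    return a.dest != b.dest and a.dest not in b.srcs and b.dest not in a.srcs


def be_nice_pair(block, load_a_index, unit):
    """Hoist the nearest movable later load right behind load_a and steer
    both loads to LS unit `unit`. Returns a new list; the block is returned
    unchanged if no later load can legally be hoisted."""
    load_a = block[load_a_index]
    for j in range(load_a_index + 1, len(block)):
        load_b = block[j]
        if not load_b.op.startswith("ld"):
            continue
        crossed = block[load_a_index + 1:j]
        # load_b may move up only if it is independent of everything it crosses.
        if all(independent(load_b, other) for other in crossed):
            steered = f"ld.{unit}"
            pair = [replace(load_a, op=steered), replace(load_b, op=steered)]
            return block[:load_a_index] + pair + crossed + block[j + 1:]
    return list(block)


bb1 = [
    Instr("$r1", "ld", ("$r2",)),       # profiled as likely to miss (load_a)
    Instr("$r2", "+", ("$r2", "4")),
    Instr("$r3", "ld", ("$r4",)),       # load_b
    Instr("$r3", "+", ("$r3", "4")),
    Instr("$r5", "+", ("$r1", "$r3")),
]

for instr in be_nice_pair(bb1, load_a_index=0, unit=1):
    print(instr)
```

Run on the original BB1, the sketch prints the same instruction order as the final block on the slide, with both loads rewritten to `ld.1`.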
17
[Figure: Open64 code generator flow. Labels: WHIRL, CG-expand, CGIR, Control flow opt., If-conversion, Loop optimizations, Software pipelining, Loop unrolling, Scheduling pre-pass (GCM here), Local register alloc, Scheduling post-pass, Prolog and Epilog, Extended block optimizer, Code emission, .s, Global register alloc.]
18
Be-Nice Scheduling (in Open64 GCM and LIS)
– The key points during code motion
  Use GCM to find candidate pairs
  Move the pair as a single 'pseudo' instruction
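One way to picture moving the pair as a single 'pseudo' instruction is to merge the register definitions and uses of the two loads, then run the usual legality check on the merged summary. This is a sketch under that assumption; `glue` and `can_move_across` are hypothetical helpers, not Open64 GCM or LIS functions, and memory dependences are ignored.

```python
def glue(first, second):
    """Merge two (dest, srcs) register summaries into one pseudo-op summary.

    Returns (defs, uses) as sets of register names.
    """
    first_dest, first_srcs = first
    second_dest, second_srcs = second
    defs = {first_dest, second_dest}
    # A value produced by `first` and consumed by `second` stays inside
    # the pair, so it is not an external use of the pseudo-op.
    uses = set(first_srcs) | (set(second_srcs) - {first_dest})
    return defs, uses


def can_move_across(pseudo, other):
    """True if the glued pair can be reordered with `other` (dest, srcs)."""
    defs, uses = pseudo
    other_dest, other_srcs = other
    return (
        other_dest not in uses               # `other` overwrites nothing the pair reads
        and other_dest not in defs           # no output (WAW) conflict
        and defs.isdisjoint(other_srcs)      # `other` reads nothing the pair writes
    )


pair = glue(("$r1", ("$r2",)), ("$r3", ("$r4",)))  # the two loads from BB1

print(can_move_across(pair, ("$r6", ("$r7", "4"))))  # True
print(can_move_across(pair, ("$r2", ("$r2", "4"))))  # False
```

Because the check treats the two loads as one unit, code motion either moves both or neither, so they stay back-to-back.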
19
Some experimental results
– Be-Nice scheduling applied to Thread-A
– Performance difference measured on Thread-B
20
Some experimental results
[Figure: the number of ITS cycles in Thread-B, with vs. without Be-Nice.]
21
Some experimental results
[Figure: IPC improvement of Thread-B with Be-Nice instruction scheduling.]