1
Detailed look at the TigerSHARC pipeline: cycle counting for the IALU version of the DC_Removal algorithm
2
DC_Removal algorithm performance 2 / 28
To be tackled today
- Expected and actual cycle count for the J-IALU version of the DC_Removal algorithm
- Understanding why the stalls occur and how to fix them
- Differences between the first time into a function (cache empty) and the second time into the function
3
Set up time
In principle, 1 cycle / instruction
Expected: 2 instructions (set up pointers to buffers) + 4 instructions (insert values into buffers)
4
First key element: Sum Loop, Order (N): 4 instructions + N * 5 instructions
Second key element: Shift Loop, Order (log2 N): 1 + 2 * log2 N instructions
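The Order (log2 N) cost of the shift loop comes from dividing by N one bit at a time. The sketch below is an illustration only: it assumes N is a power of two and that each loop pass performs one arithmetic right shift (the slides mention ASHIFTR). The name `shift_divide` is hypothetical, and Python stands in for the TigerSHARC assembly, which is not shown in the slides.

```python
def shift_divide(total, n):
    """Divide total by n (assumed to be a power of two) using one
    arithmetic right shift per loop pass, so log2(n) passes in all."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    passes = 0
    while n > 1:
        total >>= 1   # arithmetic shift: the sign of total is preserved
        n >>= 1
        passes += 1
    return total, passes


if __name__ == "__main__":
    # For N = 128 the loop runs log2(128) = 7 times.
    print(shift_divide(1280, 128))
```

The pass count is what drives the `2 * log2 N` term: two instructions per pass, repeated log2 N times.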
5
Third key element: FIFO circular buffer, Order (N)
Update outgoing parameters: 6 instructions
Update FIFO: 3 + 6 * N instructions
Function return: 2 instructions
6
TigerSHARC pipeline
7
Using the “Pipeline Viewer”
Available with the TigerSHARC simulator ONLY
VIEW | Debug Windows | Pipeline viewer
- F1 to F4: instruction fetch unit pipeline
- PD, D, I: Integer ALU pipeline
- A, EX1, EX2: Compute Block pipeline
8
Pipeline symbols (Control-click)
A – Abort
B – Bubble
H – BTB Hit (Jumps)
S – Stall
W – Wait
X – Illegal fetch (F1 – F4)
X – Illegal instruction (PD – E2)
9
Time in theory
Set up pointers to buffers      2
Insert values into buffers      4
SUM LOOP                        4 + N * 5
SHIFT LOOP                      1 + 2 * log2 N
Update outgoing parameters      6
Update FIFO                     3 + 6 * N
Function return                 2
---------------------------
Total                           22 + 11 N + 2 log2 N
N = 128: instructions = 1444
1444 cycles + 1100 delay cycles
C++ debug mode: 9500 cycles?
Note: other tests were executed before this test, which means “cache filled”.
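The theoretical total can be checked by adding up the per-stage instruction counts from this slide. This is a minimal sketch of that arithmetic, assuming N is a power of two and one cycle per instruction. The function name `theoretical_instructions` and the dictionary keys are illustrative and not part of the original code.

```python
def theoretical_instructions(n):
    """Return (per-stage counts, total) for an N-point DC_Removal call,
    using the instruction counts quoted on the slide."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    log2_n = n.bit_length() - 1          # exact log2 for a power of two
    parts = {
        "set up pointers to buffers": 2,
        "insert values into buffers": 4,
        "sum loop": 4 + n * 5,
        "shift loop": 1 + 2 * log2_n,
        "update outgoing parameters": 6,
        "update FIFO": 3 + 6 * n,
        "function return": 2,
    }
    return parts, sum(parts.values())


if __name__ == "__main__":
    parts, total = theoretical_instructions(128)
    print(total)   # 1444, matching 22 + 11 N + 2 log2 N for N = 128
```

The two Order (N) stages, the sum loop and the FIFO update, account for 1415 of the 1444 instructions, so they are the stages where stalls matter most.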
10
Test environment
Examine the pipeline the 2nd time around the loop
“Cache filled”?
11
Set up time
Expected: 2 + 4 instructions
Actual: 2 + 4 instructions + 2 stalls
Why not 4 stalls?
12
First time round sum loop
Expected: 9 instructions
LC0 load: 3 stalls
Each memory fetch: 4 stalls
Actual: 9 instructions + 11 stalls
13
Other times around the loop
Expected: 5 instructions
Each memory fetch: 4 stalls
Actual: 5 instructions + 8 stalls
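The per-pass figures for the sum loop can be accumulated over N passes to estimate its share of the delay cycles. This is a rough sketch under stated assumptions: the cache is already filled, the first pass costs 9 instructions + 11 stalls, and every later pass costs 5 instructions + 8 stalls. The 11 and 8 are consistent with two memory fetches per pass at 4 stalls each, plus 3 stalls for the LC0 load on the first pass. The function name `sum_loop_cycles` is hypothetical.

```python
def sum_loop_cycles(n):
    """Return (instructions, stalls) for an n-pass sum loop using the
    per-pass figures observed in the pipeline viewer (cache filled)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    first_instr, first_stalls = 9, 11    # includes loop set-up and LC0 load
    later_instr, later_stalls = 5, 8     # 2 memory fetches * 4 stalls each
    instructions = first_instr + (n - 1) * later_instr
    stalls = first_stalls + (n - 1) * later_stalls
    return instructions, stalls


if __name__ == "__main__":
    instructions, stalls = sum_loop_cycles(128)
    print(instructions, stalls)
```

For N = 128 the instruction count agrees with the theoretical 4 + N * 5 = 644. Under these assumptions the sum loop alone contributes roughly a thousand stall cycles.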
14
Shift Loop: 1st time around
Expected: 3 instructions
No stalls on LC0 load?
4 stalls on ASHIFTR
BTB hit followed by 5 aborts
15
Shift loop: 2nd and later times around
Expected: 2
Actual: 2
16
Store back of &left, &right
Expected: 6
Actual: 6 + 3 stalls
17
Exercise 1
Based on knowledge to this point, determine the expected stalls during the last piece of code: the FIFO buffer operation
18
Third key element: FIFO circular buffer, Order (N)
Update outgoing parameters: 6 instructions
Update FIFO: 3 + 6 * N instructions
Function return: 2 instructions
19
Answer
23
Second time into function
24
What happens if the cache is not full (first time the function is called)?
Was: 2 + 2 stalls in loop
Now: 11 + 12 stalls in loop
25
First time function called: 2nd time around the loop
Ditto 3rd, 4th, 5th, 6th, 7th, 8th times
26
9th time around the loop
Ditto 17th, 25th, 33rd, 41st, 49th
27
What is happening?
- With cache filled: memory read accesses require 4 cycles
- With cache unfilled: the first read requires “12 cycles”, then the next 7 require 4 cycles
- Total guess: is the extra time associated with doing extra reads to fill the cache?
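The observed pattern can be written down as a simple cost model. This sketch only encodes the guess stated on this slide and is not a description of how the TigerSHARC memory system actually works. It assumes sequential reads, a cost of 12 cycles for the first read in each group of 8 when the cache is unfilled, and 4 cycles for every other read. The function name `read_cycles` and its parameters are hypothetical.

```python
def read_cycles(num_reads, cache_filled, group=8, miss_cost=12, hit_cost=4):
    """Total cycles for num_reads sequential memory reads under the
    slide's guessed model: with an unfilled cache, the first read of
    every group of `group` reads costs miss_cost, the rest hit_cost."""
    total = 0
    for i in range(num_reads):
        if not cache_filled and i % group == 0:
            total += miss_cost    # read that appears to trigger a cache fill
        else:
            total += hit_cost
    return total


if __name__ == "__main__":
    print(read_cycles(128, cache_filled=True))
    print(read_cycles(128, cache_filled=False))
```

Under this model the penalty for an empty cache is 8 extra cycles per group of 8 reads, which would show up on the 1st, 9th, 17th, ... passes of the loop, as seen in the pipeline viewer.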
28
Tackled today
- Expected and actual cycle count for the J-IALU version of the DC_Removal algorithm
- Understanding why the stalls occur and how to fix them
- Differences between the first time into a function (cache empty) and the second time into the function
- Further unknowns: how memory operations really work