1
TigerSHARC CLU
Closer look at the XCORRS
M. Smith, University of Calgary, Canada
smithmr@ucalgary.ca
3
The practice
Suppose we have the vector of in-phase and out-of-phase data gathered over an antenna, from a satellite for example. Gain issues scale it by 16:
-16-16j, 16+16j, 16+16j, -16-16j, 16+16j, 16+16j, -16-16j, 16+16j, 16+16j, -16-16j, 16+16j, 16+16j, -16-16j, 16+16j, 16+16j, etc.
Question: if the original data from the satellite had this form
-1-j, 1+j, 1+j, -1-j, 1+j, 1+j, -1-j, 1+j, 1+j, -1-j, 1+j, 1+j, -1-j, 1+j, 1+j, -1-j, 1+j, 1+j, etc.
by how much is the satellite data delayed?
FOR THIS EXAMPLE: 0, 3, 6, 9, 12, etc.
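The delay question above can be checked numerically. The following is a minimal sketch, not the TigerSHARC implementation: it assumes a conventional circular cross-correlation (multiplying by the conjugate of the reference), and the names `circular_correlation`, `pattern`, `reference` and `received` are hypothetical, chosen only for this illustration.

```python
# Hypothetical sketch: find the candidate delays of a repeating complex
# pattern by circular cross-correlation. Names are illustrative only.

def circular_correlation(received, reference):
    """Return the real part of the correlation at every circular shift."""
    n = len(reference)
    results = []
    for shift in range(n):
        total = 0
        for k in range(n):
            # Assumption: conjugate the reference so matching samples add up.
            total += received[(k + shift) % n] * reference[k].conjugate()
        results.append(total.real)
    return results

pattern = [-1 - 1j, 1 + 1j, 1 + 1j]
reference = pattern * 5                   # 15 samples of the original data
received = [16 * v for v in reference]    # antenna data with the x16 gain

corr = circular_correlation(received, reference)
peak = max(corr)
peak_shifts = [s for s, c in enumerate(corr) if c == peak]
print(peak_shifts)   # shifts where the correlation is largest
```

Because the pattern repeats every three samples, every multiple of 3 gives the same maximum, which is why the example lists 0, 3, 6, 9, 12 as equally valid delays.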
4
Tackle the issue with FIR
First, modify the correlation function to handle complex values. Ignore that issue for the moment.
Imagine 1024 data points + 1024 PRN values.
Need to do 1024 FIRs, each of 1024 taps.
We know how to optimize to do 2 taps every cycle (one in X and one in Y).
Total time is 1024 * 512 cycles, about 1 ms at 500 MHz.
XCORRS can do 8 * 16 = 128 taps each cycle in each compute block: 128 times faster.
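The cycle estimate can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the figures quoted on the slide; the variable names are hypothetical, and the speed-up compares taps per cycle within a single compute block.

```python
# Hypothetical sketch of the slide's cycle-count estimate.
points = 1024                 # data points to correlate
taps = 1024                   # PRN values (taps) per correlation
fir_taps_per_cycle = 2        # one tap in X plus one tap in Y
clock_hz = 500e6              # 500 MHz core clock

fir_cycles = points * taps // fir_taps_per_cycle     # 1024 * 512
fir_time_ms = fir_cycles / clock_hz * 1e3

xcorrs_taps_per_block = 8 * 16                       # taps per cycle per block
fir_taps_per_block = fir_taps_per_cycle // 2         # one tap per cycle per block
speedup = xcorrs_taps_per_block // fir_taps_per_block

print(fir_cycles, round(fir_time_ms, 3), speedup)
```

The result is 524288 cycles, or just over 1 ms at 500 MHz, which is where the "1 ms" figure comes from.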
5
Where does the CLU fit in?
6
XCORRS definition
7
THEORY
Mathematical definition
Uses registers TR, D, C
And something called CUT
8
Satellite data
Quad fetch brings in 8 complex values (8-bit real and 8-bit imaginary parts)
Pattern here is -1 + 0j, 1 + 0j, 1 + 0j, -1 + 0j, 1 + 0j, 1 + 0j, ……….
9
PRN code: 2-bit complex number
Seems strange to have two dummy bits, but it actually makes sense
PRN: -1 - j, 1 + j, 1 + j, -1 - j, 1 + j, 1 + j, ……….
+1, -1 are associated with the PSK (more next lecture)
Problem: BINARY means 1 and 0, so how do we represent 1 and -1?
10
PRN
11
The value 0x3 goes in as C15 and C16
0011: C15 = -1 - j, C16 = +1 + j
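The 0x3 example suggests a mapping in which a set bit stands for -1 and a clear bit for +1, with two bits per complex PRN value. The sketch below decodes a word under that reading. It is an assumption drawn from this one example, not a statement of the hardware format: in particular, which bit of each pair is real and which is imaginary is a guess, and `decode_prn_pair` and `decode_prn_word` are hypothetical helper names.

```python
# Hypothetical sketch: decode 2-bit complex PRN values from an integer word.
# Assumption: bit = 1 means -1, bit = 0 means +1; the low bit of each pair is
# taken as the real part and the high bit as the imaginary part.

def decode_prn_pair(two_bits):
    real = -1 if two_bits & 0b01 else 1
    imag = -1 if two_bits & 0b10 else 1
    return complex(real, imag)

def decode_prn_word(word, count):
    """Decode `count` values, lowest bit pair first."""
    return [decode_prn_pair((word >> (2 * i)) & 0b11) for i in range(count)]

# 0x3 = 0b0011: the low pair (11) and the next pair (00)
first, second = decode_prn_word(0x3, 2)
print(first, second)
```

With 0x3 the low pair decodes to -1 - j and the next pair to +1 + j, matching the C15 and C16 values in the example.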
12
Loading the THR registers
13
Standard XCORRS instruction
Lower 46 bits of THR1:0
R7:4
TR0, TR1, TR2 ……. TR15
14
TR15:0 = XCORRS(R7:4, THR3:0)
TR0 += D7 * C22 + D6 * C21 + … 8 taps
TR1 += D7 * C21 + D6 * C20 + … 8 taps
………..
TR15 += D7 * C7 + D6 * C6 + … 8 taps
64 taps each cycle, on both X and Y compute blocks if set up properly: 128 taps each cycle
These are “complex taps”, compared to 2 real taps / cycle after Lab 3
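The tap listing follows a regular index pattern: TRn accumulates D[i] * C[i + 15 - n] for i = 0..7. The following is a minimal software model of that pattern only, written from the listing above. It is a sketch, not the hardware definition: it ignores bit widths, saturation and any conjugation the instruction may apply, and the function name `xcorrs_model` is hypothetical.

```python
# Hypothetical software model of the tap pattern in the listing:
#   TRn += sum over i = 0..7 of D[i] * C[i + 15 - n]
# Bit widths, saturation and conjugation are NOT modelled.

def xcorrs_model(tr, d, c):
    """tr: 16 accumulators, d: D0..D7 (8 values), c: C0..C22 (23 values)."""
    for n in range(16):
        for i in range(8):
            tr[n] += d[i] * c[i + 15 - n]
    return tr

data_pattern = [-1 + 0j, 1 + 0j, 1 + 0j]
prn_pattern = [-1 - 1j, 1 + 1j, 1 + 1j]

d = [data_pattern[i % 3] for i in range(8)]    # D0..D7
c = [prn_pattern[j % 3] for j in range(23)]    # C0..C22

tr = xcorrs_model([0j] * 16, d, c)
best = max(v.real for v in tr)
peaks = [n for n, v in enumerate(tr) if v.real == best]
print(peaks)   # TR registers holding the largest real correlation
```

With these repeating test patterns the largest real parts land in every third TR register, consistent with a pattern that repeats every three samples.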
15
TR15:0 = XCORRS(R7:4, THR3:0) (CUT -7)
TR0 += D7 * C22 + D6 * C21 + … 8 taps
TR1 += D7 * C21 + D6 * C20 + … 8 taps
………..
TR14 += D7 * C8 + D6 * C7 2 taps
TR15 += D7 * C7 1 tap
16
TR15:0 = XCORRS(R7:4, THR3:0) (CUT -15)
TR0 += D7 * C22 + D6 * C21 + … 8 taps
TR1 += D7 * C21 + D6 * C20 + … 7 taps
………..
TR7 += D7 * C15 1 tap
TR8 += 0 0 taps
………..
TR15 += 0 0 taps
17
TR15:0 = XCORRS(R7:4, THR3:0) (CUT +15)
TR0 += 0 0 taps
TR1 += D0 * C14 1 tap
………..
TR7 += D6 * C14 + D5 * C13 + … 7 taps
TR8 += D7 * C14 + D6 * C13 + … 8 taps
………..
TR15 += D7 * C7 + D6 * C6 + … 8 taps
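The three CUT listings are consistent with a simple rule on the PRN index: a negative CUT of -c appears to drop every term whose C index is below c, and a positive CUT of +c appears to drop every term whose C index is c or above. The sketch below counts the surviving taps per TR register under that rule. The rule is inferred from the tap listings on these slides rather than taken from the processor manual, and `taps_kept` and `tap_counts` are hypothetical names.

```python
# Hypothetical sketch of the CUT rule as inferred from the slide listings:
#   CUT -c keeps only terms with C index >= c
#   CUT +c keeps only terms with C index <  c
# TRn uses the terms D[i] * C[i + 15 - n] for i = 0..7.

def taps_kept(n, cut=0):
    """Return the (D index, C index) pairs that survive for register TRn."""
    kept = []
    for i in range(8):
        j = i + 15 - n
        if cut < 0 and j < -cut:
            continue          # term removed by a negative CUT
        if cut > 0 and j >= cut:
            continue          # term removed by a positive CUT
        kept.append((i, j))
    return kept

def tap_counts(cut=0):
    return [len(taps_kept(n, cut)) for n in range(16)]

print(tap_counts(-7))    # 8 taps for TR0..TR8, tapering to 1 tap at TR15
print(tap_counts(-15))   # 8 taps at TR0, tapering to 0 from TR8 onward
print(tap_counts(+15))   # 0 taps at TR0, rising to 8 from TR8 onward
```

Under this rule the CUT -15 and CUT +15 tap counts add up to 8 for every register, so the two passes together supply each TR register with a full set of 8 taps.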
19
TR15:0 = XCORRS(R7:4, THR3:0) (CUT -15)
TR0 += D7 * C22 + D6 * C21 + … 8 taps
TR1 += D7 * C21 + D6 * C20 + … 7 taps
………..
TR7 += D7 * C15 1 tap
TR8 += 0 0 taps
………..
TR15 += 0 0 taps
21
TR15:0 = XCORRS(R7:4, THR3:0) (CUT -7)
TR0 += D7 * C22 + D6 * C21 + … 8 taps
TR1 += D7 * C21 + D6 * C20 + … 8 taps
………..
TR14 += D7 * C8 + D6 * C7 2 taps
TR15 += D7 * C7 1 tap
23
TR15:0 = XCORRS(R7:4, THR3:0)
TR0 += D7 * C22 + D6 * C21 + … 8 taps
TR1 += D7 * C21 + D6 * C20 + … 8 taps
………..
TR15 += D7 * C7 + D6 * C6 + … 8 taps
64 taps each cycle, on both X and Y compute blocks if set up properly: 128 taps each cycle
These are “complex taps”, compared to 2 real taps / cycle after Lab 3
25
Problem at this point: THR3:2 is empty
Need to bring in more PRN values
26
TR15:0 = XCORRS(R7:4, THR3:0) (CUT +15)
TR0 += 0 0 taps
TR1 += D0 * C14 1 tap
………..
TR7 += D6 * C14 + D5 * C13 + … 7 taps
TR8 += D7 * C14 + D6 * C13 + … 8 taps
………..
TR15 += D7 * C7 + D6 * C6 + … 8 taps
28
Final Result
Maximum correlation occurs every 3 shifts, which is what we expect
Is it the correct result?
29
Correlation: result expected
In step
-1 + 0j, 1 + 0j, 1 + 0j, … 16 times
with
-1 - j, 1 + j, 1 + j, … 16 times
-1 * -1 + 1 * 1 + 1 * 1 + … = 48 = 0x30 (real component)
Out of step
-1 + 0j, 1 + 0j, 1 + 0j, … 16 times
with
1 + j, 1 + j, -1 - j, … 16 times
-1 * 1 + 1 * 1 + 1 * -1 + … = -16 = -0x10 = 0xFFF0
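The expected values can be verified directly. This is a small sketch of the arithmetic only: it sums the real part of the plain products over 16 repetitions of each three-sample pattern and shows the result as a 16-bit two's-complement word. The names `real_correlation` and `to_hex16` are hypothetical.

```python
# Hypothetical check of the expected in-step and out-of-step values.
data = [-1 + 0j, 1 + 0j, 1 + 0j] * 16
prn_in_step = [-1 - 1j, 1 + 1j, 1 + 1j] * 16
prn_out_of_step = [1 + 1j, 1 + 1j, -1 - 1j] * 16

def real_correlation(a, b):
    """Real part of the sum of element-by-element products."""
    return int(sum(x * y for x, y in zip(a, b)).real)

def to_hex16(value):
    """Format a signed value as a 16-bit two's-complement hex word."""
    return format(value & 0xFFFF, "#06x")

in_step = real_correlation(data, prn_in_step)
out_of_step = real_correlation(data, prn_out_of_step)
print(in_step, to_hex16(in_step))          # 48 0x0030
print(out_of_step, to_hex16(out_of_step))  # -16 0xfff0
```

Each in-step group of three contributes +3 to the real part and each out-of-step group contributes -1, so 16 repetitions give 48 and -16 respectively.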
30
Final Result
1) Now have correlation values for 16 shifts in TR registers: store to external memory. Repeat for all other necessary shifts and find the maximum.
2) Now make parallel in SISD mode
3) Now make parallel in SIMD mode
31
Take-home Quiz 4
Old requirement: Do Lab 4 with FFT and XCORRS
Write tests and demonstrate XCORRS used for correlation
a) Not parallel instruction format, but in a loop
b) Now do in optimized SISD mode
c) Now do in optimized SIMD mode