1
Specialized Video (8-bit) and Vector (16-bit) Instructions on the Blackfin
There is always a “MAKE-UP-YOUR-QUESTION-AND-ANSWER-IT” question on a Dr. Smith final.
It must be at an appropriate level for a third-year course.
Expand on these ideas for the Q9 question and answer on the final.
2
6/2/2015 Video, Copyright M. Smith, ECE, University of Calgary, Canada
Problem to solve
Using the video capability on Blackfin
Specialized instructions
Examine and explain examples in detail (working program) for the Q9 format on the final.
For example, take something we have done in a laboratory and vectorize it.
Vectorize: take a set of 32-bit data operations and demonstrate the same concept with 8-bit data, doing 4 operations at the same time.
3
Blackfin Evaluation Board 8-bit values
4
Getting video information in
DMA activity
Values come in through the PPI (Parallel Port Interface), which shares many of its pins with PF
Video information is stored in SDRAM
BF561: 2-core Blackfin, handles both video in and out at the same time
Possible Q9 question: one whole chapter in the Hardware book covers this area
Video library for Blackfin developed in 2004 by Swiss International Internship students
www.enel.ucalgary.ca/People/Smith/ECE-ADI-Project/CourseInfo/VideoCourseInfoFrame.htm
5
Special 8-bit ALUs
6
Video image
Blanking information
Frame 1: luminance + colour information
Blanking information
Frame 2: luminance + colour information
Blanking information
We have the ability to manipulate frame information without touching blanking information.
7
Frame information
Pixel 1 uses G1 + CB1 + CR1
Pixel 2 uses G2 + CB1 + CR1
Pixel 3 uses G3 + CB3 + CR3
Pixel 4 uses G4 + CB3 + CR3
CB1 G1 CR1 G2 CB3 G3 CR3 G4 CB5 G5 CR5 G6
Image brightness decreasing
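The pixel layout above can be sketched in Python. This is only an illustrative model: the function name `unpack_422` and the sample byte values are made up for this example, and the stream is assumed to repeat in the order CB, G, CR, G shown on the slide.

```python
def unpack_422(stream):
    """Split an interleaved CB, G, CR, G byte stream into (G, CB, CR) pixels."""
    pixels = []
    # Each group of four bytes describes two neighbouring pixels.
    for i in range(0, len(stream) - 3, 4):
        cb, g_a, cr, g_b = stream[i:i + 4]
        pixels.append((g_a, cb, cr))  # first pixel of the pair
        pixels.append((g_b, cb, cr))  # second pixel reuses the same colour bytes
    return pixels


# Hypothetical sample data: CB1 G1 CR1 G2 CB3 G3 CR3 G4
stream = bytes([100, 200, 110, 190, 101, 180, 111, 170])
pixels = unpack_422(stream)
```

Each pair of pixels shares one CB and one CR byte, which is why four bytes are enough to describe two pixels.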
8
Frame information
R0 = [P0] brings in information on pixel 1 and pixel 2 intensity and colour
One memory access, 2 pixels of information: CB1 G1 CR1 G2 CB3 G3 CR3 G4 CB5 G5 CR5 G6
One memory access, 4 pixels of intensity information: G1 G2 G3 G4 G5 G6 G7 G8
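A small Python model of what one 32-bit load delivers may help here. The helper names `load_word` and `byte_of` and the sample values are assumptions for illustration; the only hardware fact relied on is that the Blackfin is little-endian, so the first byte in memory ends up in the lowest byte of the register.

```python
import struct


def load_word(memory, address):
    """Model R0 = [P0]: one little-endian 32-bit read from byte-addressed memory."""
    return struct.unpack_from("<I", memory, address)[0]


def byte_of(word, n):
    """Return byte n of a 32-bit word (B0 is the least significant byte)."""
    return (word >> (8 * n)) & 0xFF


memory = bytes([0x10, 0x20, 0x30, 0x40])  # CB1, G1, CR1, G2 (made-up values)
r0 = load_word(memory, 0)
g1, g2 = byte_of(r0, 1), byte_of(r0, 3)
```

With an intensity-only buffer the same single load would deliver G1 to G4 in bytes B0 to B3.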
9
BYTEOP16P (Quad 8-bit ADD)
Four 8-bit ADDs in a single cycle
8 pixel values stored in R1 and R0 (2 registers used at the same time)
The I0 register selects which 4 pixel values are used in the operation; this is called “byte alignment”
    I0 = 0; use all bytes in R0
    I0 = 1; lowest byte in R1, top 3 in R0
8 pixel values stored in R3 and R2
I1 selects the 4 pixels used
If I0 = 1 and I1 = 1, then
    (R4, R6) = byteop16p(R3:2, R1:0);   // sum, 6 registers at the same time
gives four 16-bit answers (in “my byte notation”):
    R4.H = R3.B0 + R1.B0;   // bottom byte of the upper registers
    R4.L = R2.B3 + R0.B3;   // highest byte of the lower registers
    R6.H = R2.B2 + R0.B2;   // next highest byte
    R6.L = R2.B1 + R0.B1;   // next highest byte
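The byte alignment and the four sums can be modelled in Python as a rough sketch. The functions `align` and `byteop16p` below are illustrative stand-ins written for this example, not an Analog Devices API, and they follow the slide's description rather than being checked bit-for-bit against the hardware.

```python
def align(hi, lo, offset):
    """Pick 4 consecutive bytes from the register pair hi:lo, skipping `offset` low bytes."""
    pair = (hi << 32) | lo
    return (pair >> (8 * offset)) & 0xFFFFFFFF


def byteop16p(r3, r2, r1, r0, i0=0, i1=0):
    """Return (r4, r6): four byte-wise sums, each stored as a 16-bit half word."""
    a = align(r1, r0, i0)  # 4 pixels selected from R1:R0
    b = align(r3, r2, i1)  # 4 pixels selected from R3:R2
    sums = [((a >> s) & 0xFF) + ((b >> s) & 0xFF) for s in (24, 16, 8, 0)]
    r4 = (sums[0] << 16) | sums[1]
    r6 = (sums[2] << 16) | sums[3]
    return r4, r6


# Made-up register contents, with I0 = I1 = 1 as on the slide
r4, r6 = byteop16p(0x80706050, 0x40302010, 0x08070605, 0x04030201, i0=1, i1=1)
```

The results need 16 bits each because two 8-bit values can sum to as much as 510, which no longer fits in a byte.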
10
BYTEOP1P (Quad 8-bit Average)
Four 8-bit ADDs + average in a single cycle
8 pixel values stored in R1 and R0
The I0 register selects which 4 pixel values are used in the operation; this is called “byte alignment”
    I0 = 0; use all bytes in R0
    I0 = 1; lowest byte in R1, top 3 in R0
8 pixel values stored in R3 and R2
I1 selects the 4 pixels used
If I0 = 1 and I1 = 1, then
    R4 = byteop1p(R3:2, R1:0);   // sum and average
gives four 8-bit answers packed into R4:
    R4.B3 = (R3.B0 + R1.B0) / 2;   // bottom byte of the upper registers
    R4.B2 = (R2.B3 + R0.B3) / 2;   // highest byte of the lower registers
    R4.B1 = (R2.B2 + R0.B2) / 2;   // next highest byte
    R4.B0 = (R2.B1 + R0.B1) / 2;   // next highest byte
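A minimal Python sketch of the byte-wise average, assuming no byte alignment (I0 = I1 = 0) and the plain "/ 2" of the slide, implemented here as truncating integer division. The function name and sample values are made up; how the hardware rounds should be checked against the Analog Devices manual.

```python
def byteop1p(src1, src0):
    """Average corresponding bytes of two 32-bit words (truncating division)."""
    result = 0
    for n in range(4):
        a = (src0 >> (8 * n)) & 0xFF
        b = (src1 >> (8 * n)) & 0xFF
        result |= ((a + b) // 2) << (8 * n)  # the average always fits back in 8 bits
    return result


r4 = byteop1p(0x20406080, 0x10203040)
```

Unlike the quad add, the answers stay 8 bits wide, so all four fit into a single destination register.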
11
BYTEOP2P (Quad 8-bit Average, Half word)
Six 8-bit ADDs + average in a single cycle
8 pixel values stored in R1 and R0
The I0 register selects which 4 pixel values are used in the operation; this is called “byte alignment”
    I0 = 0; use all bytes in R0
    I0 = 1; lowest byte in R1, top 3 in R0
8 pixel values stored in R3 and R2
I1 selects the 4 pixels used
If I0 = 1 and I1 = 1, then
    R4 = byteop2p(R3:2, R1:0);   // sum and average
gives two 8-bit answers, each the average of 4 bytes:
    R4.B2 = (R3.B0 + R1.B0 + R2.B3 + R0.B3) / 4;   // highest 4 bytes
    R4.B0 = (R2.B2 + R0.B2 + R2.B1 + R0.B1) / 4;   // lowest 4 bytes
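The four-byte averages can be sketched the same way. This model assumes no byte alignment (I0 = I1 = 0), truncating division, and results placed in bytes B2 and B0 as on the slide; the function name and inputs are illustrative only.

```python
def byteop2p(src1, src0):
    """Average the upper two byte columns and the lower two byte columns of two words."""
    def byte(word, n):
        return (word >> (8 * n)) & 0xFF

    # Three adds per average, six adds in total
    upper = (byte(src1, 3) + byte(src0, 3) + byte(src1, 2) + byte(src0, 2)) // 4
    lower = (byte(src1, 1) + byte(src0, 1) + byte(src1, 0) + byte(src0, 0)) // 4
    return (upper << 16) | lower  # results land in B2 and B0


r4 = byteop2p(0x50607080, 0x10203040)
```

If the two source words hold neighbouring rows of an image, each result is the average of a 2 x 2 block of pixels.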
12
BYTEOP16M (Quad 8-bit SUBTRACT)
4 subtracts in 1 cycle
8 pixel values stored in R1 and R0
The I0 register selects which 4 pixel values are used in the operation; this is called “byte alignment”
    I0 = 0; use all bytes in R0
    I0 = 1; lowest byte in R1, top 3 in R0
8 pixel values stored in R3 and R2
I1 selects the 4 pixels used
If I0 = 1 and I1 = 1, then
    (R4, R6) = byteop16m(R3:2, R1:0);   // MINUS operation
gives four 16-bit answers (in “my byte notation”):
    R4.H = R3.B0 - R1.B0;   // bottom byte of the upper registers
    R4.L = R2.B3 - R0.B3;   // highest byte of the lower registers
    R6.H = R2.B2 - R0.B2;   // next highest byte
    R6.L = R2.B1 - R0.B1;   // next highest byte
Application: video surveillance
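For the surveillance use case, the subtraction of two frames can be modelled as below. This is a sketch with no byte alignment (I0 = I1 = 0); the names `byteop16m` and `to_signed16` and the pixel values are invented for the example, and negative differences are assumed to be stored as 16-bit two's complement.

```python
def byteop16m(x, y):
    """Return (r4, r6): the four byte differences x - y packed as 16-bit half words."""
    diffs = [((x >> s) & 0xFF) - ((y >> s) & 0xFF) for s in (24, 16, 8, 0)]
    halves = [d & 0xFFFF for d in diffs]  # two's complement encoding of negatives
    return (halves[0] << 16) | halves[1], (halves[2] << 16) | halves[3]


def to_signed16(half):
    """Interpret a 16-bit pattern as a signed value."""
    return half - 0x10000 if half & 0x8000 else half


# Four pixels of a new frame minus the same four pixels of an old frame
r4, r6 = byteop16m(0x0A141E28, 0x0A0A2828)
changes = [to_signed16(r4 >> 16), to_signed16(r4 & 0xFFFF),
           to_signed16(r6 >> 16), to_signed16(r6 & 0xFFFF)]
```

A pixel that has not changed between the two frames gives a difference of zero, so non-zero entries mark movement in the scene.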
13
SAA (Quad 8-bit Subtract, Absolute and Accumulate)
Take the differences between 2 images
    R0 = [P0++];   // 4 pixels, reading from image 1
    R1 = [I1++];   // 4 pixels, reading from image 2
    R2 = 0;        // sum of differences
    Loop N - 1 times   // zero-overhead loop over the whole of both images
        R2 = SAA(R1, R0) || R0 = [P0++] || R1 = [I1++];
    R2 = SAA(R1, R0);  // finish off the operations in the Blackfin pipeline
Now demonstrate adding the 4 bytes together from R2. How to do this efficiently sounds like a Q9 question to me.
    R2 = SAA(R1, R0) || R0 = [P0++] || R1 = [I1++];
This line does 4 subtracts, 4 absolute values, 8 pixel reads AND two pointer updates, all in a single cycle. Parallel instructions are denoted by ||.
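The loop can be modelled in Python to check what the final answer should be. This sketch follows the slide's simplified notation: the four running sums are kept in a plain list here, and the names `saa` and `sum_abs_diff` and the image words are made up for illustration.

```python
def saa(acc, a, b):
    """Add |a.Bn - b.Bn| into lane n of the four running sums."""
    return [acc[n] + abs(((a >> (8 * n)) & 0xFF) - ((b >> (8 * n)) & 0xFF))
            for n in range(4)]


def sum_abs_diff(image1, image2):
    """Sum of absolute pixel differences; each list entry is one 32-bit word (4 pixels)."""
    acc = [0, 0, 0, 0]
    for word1, word2 in zip(image1, image2):
        acc = saa(acc, word2, word1)
    return sum(acc)  # final step: add the four lane sums together


image1 = [0x01020304, 0x10101010]
image2 = [0x04030201, 0x10101011]
total = sum_abs_diff(image1, image2)
```

A total of zero means the two images are identical, and a large total suggests that something in the scene has changed.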
14
Worked Examples
15
Vector operations
Many of the operations give 16-bit results
Example: Quad 8-bit Add
    (R4, R6) = byteop16p(R3:2, R1:0);
Now you want to add the results together
    R5 = R4 +|+ R6;              // R4.H + R6.H together with R4.L + R6.L
    R5.L = R5.H + R5.L (NS);     // add the two halves, no saturation
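The two-step reduction of the four 16-bit sums can be sketched in Python. The helpers `vec_add` and `add_halves` are hypothetical names for this example, both halves are added in the first step as the slide describes, and results are assumed to wrap at 16 bits because no saturation is requested.

```python
def vec_add(x, y):
    """Add the high halves and the low halves of two 32-bit words independently."""
    hi = ((x >> 16) + (y >> 16)) & 0xFFFF
    lo = ((x & 0xFFFF) + (y & 0xFFFF)) & 0xFFFF
    return (hi << 16) | lo


def add_halves(x):
    """Add the high half of a word to its low half (wraps at 16 bits)."""
    return ((x >> 16) + (x & 0xFFFF)) & 0xFFFF


r4, r6 = 0x00550044, 0x00330022  # four 16-bit sums from a quad 8-bit add
r5 = vec_add(r4, r6)             # two partial sums
total = add_halves(r5)           # one final sum
```

Two instructions are enough to reduce four values to one because each step halves the number of values still to be added.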
16
Vector operations
    R0 = R1 +|+ R0;          // R0.H = R1.H + R0.H, R0.L = R1.L + R0.L
    R0 = R1 +|+ R0 (co);     // R0.H = R1.L + R0.L, R0.L = R1.H + R0.H
co: word order “cross over”, the two results are swapped in the destination
The operator can be +|+, +|-, -|+ or -|-
    R3 = R1 + R0, R4 = R1 - R0;
32-bit add and subtract at the same time. Must use the same source registers.
    R3 = R1 +|+ R0, R4 = R1 -|- R0;
2 16-bit adds and 2 16-bit subtracts at the same time
    R3 = R1 +|+ R0, R4 = R1 -|- R0 (asr);
2 16-bit adds and 2 16-bit subtracts at the same time, followed by an arithmetic shift right (add and average, subtract and average)
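The combined add and subtract can be modelled as follows. This is a sketch only: `add_sub`, `pack`, `halves` and `s16` are names invented here, the halves are treated as signed 16-bit values, and the (asr) option is modelled as halving each result with Python's arithmetic right shift.

```python
def s16(half):
    """Interpret a 16-bit pattern as a signed value."""
    return half - 0x10000 if half & 0x8000 else half


def halves(x):
    """Split a 32-bit word into signed (high, low) half words."""
    return s16((x >> 16) & 0xFFFF), s16(x & 0xFFFF)


def pack(hi, lo):
    """Pack two values into the high and low halves of a 32-bit word."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def add_sub(r1, r0, asr=False):
    """Return (sums, differences) of the half words of r1 and r0."""
    h1, l1 = halves(r1)
    h0, l0 = halves(r0)
    shift = 1 if asr else 0  # asr halves every result
    r3 = pack((h1 + h0) >> shift, (l1 + l0) >> shift)
    r4 = pack((h1 - h0) >> shift, (l1 - l0) >> shift)
    return r3, r4


r1, r0 = pack(10, 6), pack(4, 2)
sums, diffs = add_sub(r1, r0)
avg_sums, avg_diffs = add_sub(r1, r0, asr=True)
```

Both results come from the same pair of source registers, which is what lets the sums and the differences be produced together.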
17
Vector operations
Normal max instruction (32-bit)
    R0 = MAX(R1, R2);       // R0 is the largest of R1 and R2
Ditto MIN(R1, R2);
Vector max
    R0 = MAX(R1, R2) (V);
    R0.H is the largest of R1.H and R2.H
    R0.L is the largest of R1.L and R2.L
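A short Python model of the vector max, assuming the halves are compared as signed 16-bit values. The function name `vec_max` and the inputs are made up for this example.

```python
def s16(half):
    """Interpret a 16-bit pattern as a signed value."""
    return half - 0x10000 if half & 0x8000 else half


def vec_max(r1, r2):
    """Return a word whose halves are the signed maxima of the matching halves."""
    hi = max((r1 >> 16) & 0xFFFF, (r2 >> 16) & 0xFFFF, key=s16)
    lo = max(r1 & 0xFFFF, r2 & 0xFFFF, key=s16)
    return (hi << 16) | lo


r0 = vec_max(0x0005FFFF, 0x00030002)  # halves (5, -1) against (3, 2)
```

The low half of the answer is 2 rather than 0xFFFF, because 0xFFFF is -1 when read as a signed value.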
18
VIT_MAX (Compare and Select)
    R0 = VIT_MAX(R1, R2);
    R1 = 0x23000002
    R2 = 0x70000001
R2.H and R1.L are largest
    R0 = 0x70000002
    A0.W = binary 10, indicating that R2.H and R1.L were largest
19
Other neat vector operations
Vector ABS
    R2 = ABS R1 (V);        // absolute values of R1.H and R1.L stored in R2
Vector arithmetic shift
    R2 = R1 >>> 3 (V);      // 2 16-bit shifts
Vector multiply and accumulate
Vector negate, etc.
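The vector ABS and vector shift can be sketched in Python as below. The names `vec_abs` and `vec_asr` are illustrative, and clamping the absolute value of -32768 to 32767 is an assumption of this model that should be checked against the Analog Devices manual.

```python
def s16(half):
    """Interpret a 16-bit pattern as a signed value."""
    return half - 0x10000 if half & 0x8000 else half


def vec_abs(r1):
    """Absolute value of each 16-bit half (|-32768| clamped to 32767 in this model)."""
    hi = min(abs(s16((r1 >> 16) & 0xFFFF)), 0x7FFF)
    lo = min(abs(s16(r1 & 0xFFFF)), 0x7FFF)
    return (hi << 16) | lo


def vec_asr(r1, count):
    """Arithmetic shift right of each 16-bit half, keeping the sign of each half."""
    hi = (s16((r1 >> 16) & 0xFFFF) >> count) & 0xFFFF
    lo = (s16(r1 & 0xFFFF) >> count) & 0xFFFF
    return (hi << 16) | lo


r1 = 0xFFF80010            # halves -8 and 16
r2_abs = vec_abs(r1)
r2_shift = vec_asr(r1, 3)
```

The shift treats each half separately, so the sign bits of the low half never leak into the high half or the other way round.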
20
Worked examples
21
Problem to solve
Using the video capability on Blackfin
Specialized instructions
Examine and explain examples in detail (working program) for the Q9 format on the final.
For example, take something we have done in a laboratory and vectorize it.
22
Information taken from Analog Devices On-line Manuals with permission
http://www.analog.com/processors/resources/technicalLibrary/manuals/
Information furnished by Analog Devices is believed to be accurate and reliable. However, Analog Devices assumes no responsibility for its use or for any infringement of any patent or other rights of any third party which may result from its use. No license is granted by implication or otherwise under any patent or patent right of Analog Devices. Copyright Analog Devices, Inc. All rights reserved.