1
DMA example: video image manipulation
2
Problem to solve
Build video images in SDRAM
Scale all the images (increase grey scale by a fixed scaling factor)
Determine whether it is more efficient to
    1. Work using the images in SDRAM
    2. Bring images from SDRAM (using DMA), scale them, then put them back
    3. Use a multi-threaded version of task 2
Multiplication and division issues
Some possible Q9 areas for the final
Video, Copyright M. Smith, ECE, University of Calgary, Canada 4/6/2019
3
Video image
Blanking information
Frame 1 - luminance + colour information
Blanking information
Frame 2 - luminance + colour information
Blanking information
Have the ability to manipulate frame information without touching the blanking information
4
Frame information
Pixel 1 uses G1 + CB1 + CR1
Image brightness decreasing
5
Set up TEST
Tasks done with DMA but occurring one after another
Tasks done with DMA occurring at the same time as other tasks
6
3 threads: sequential
Scaling intensity by 19
7
Task being performed
Note: the instructions are out of order relative to the associated C++ code
The loop involves 1 read and 1 write, plus 2 operations that do not read or write memory, which gives the DMA operation some bus bandwidth to work with
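The loop described above can be sketched in C as follows. This is only an illustration: the 16-bit unsigned pixel type, the clamp at the maximum value, and the names scale_pixel and scale_image are assumptions, not taken from the slide.

```c
#include <stddef.h>
#include <stdint.h>

/* Scale one grey value by an integer factor.
   ASSUMPTION: pixels are unsigned 16-bit and the result is clamped at 0xFFFF. */
uint16_t scale_pixel(uint16_t pixel, uint32_t factor)
{
    uint32_t scaled = (uint32_t)pixel * factor;              /* operation 1: multiply */
    return (uint16_t)(scaled > 0xFFFFu ? 0xFFFFu : scaled);  /* operation 2: clamp    */
}

/* One memory read and one memory write per pixel. The two arithmetic
   steps in between do not touch memory. */
void scale_image(const uint16_t *src, uint16_t *dst, size_t count, uint32_t factor)
{
    for (size_t i = 0; i < count; i++) {
        uint16_t pixel = src[i];               /* 1 read  */
        dst[i] = scale_pixel(pixel, factor);   /* 1 write */
    }
}
```

With a factor of 19 (the value used in the sequential test), a pixel of 10 becomes 190 and a pixel of 60000 clamps to 65535.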
8
Three threads in parallel
Not the best solution?
Start first DMA transfer, then wait
Start second DMA transfer and start the math operation, done in parallel
Wait till second DMA done
Transfer math results back, then wait
Start third DMA transfer
Wait till third DMA done
9
Results of the tests
Need to profile the code to determine where the time is now being wasted
10
Multiplication code: 16-bit
Note: the instructions are out of order relative to the associated C++ code
IS: integer signed multiplication
FS: fractional signed multiplication (a form of block floating point), found on many processors
11
Multiplication details
12
Multiplication possibilities
    R1.L = R2.L * R3.L;    // Using multiplier 0
    R1.H = R2.H * R3.H;    // Using multiplier 1
    R1.L = R2.L * R3.L, R1.H = R2.H * R3.H;    // Using both multipliers in parallel
    R2 = [P0++]; R3 = [P1++]; [P2++] = R1;
    R1.L = R2.L * R3.L, R1.H = R2.H * R3.H || R4 = [P0++] || R5 = [I1++];
13
Multiply and add test
14
Multiply and add result and code -- in SDRAM
3-cycle loop
Note the special MAC instruction
    A0 += R0.L * R1.L (IS);
It involves both an add and a multiplication
MAC: multiply and accumulate
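What one integer-mode MAC step does can be modelled in C. This is a sketch under stated assumptions: the 40-bit accumulator is held in an int64_t and is assumed to saturate rather than wrap, and the names mac_is and dot_product are made up for the illustration.

```c
#include <stdint.h>

#define ACC_MAX ((int64_t)0x7FFFFFFFFFLL)   /* largest signed 40-bit value  */
#define ACC_MIN (-ACC_MAX - 1)              /* smallest signed 40-bit value */

/* One MAC step, acc += a * b, with integer-signed operands.
   ASSUMPTION: the accumulator saturates at the 40-bit limits. */
int64_t mac_is(int64_t acc, int16_t a, int16_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;   /* 16 x 16 gives a 32-bit product */
    int64_t sum = acc + product;                 /* the add half of the MAC        */
    if (sum > ACC_MAX) sum = ACC_MAX;
    if (sum < ACC_MIN) sum = ACC_MIN;
    return sum;
}

/* Sum of products over two arrays: one MAC per element pair. */
int64_t dot_product(const int16_t *x, const int16_t *c, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = mac_is(acc, x[i], c[i]);
    return acc;
}
```

The multiply and the add happen in one step, which is why the hardware loop can be so short.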
15
MAC syntax details
16
Hints at possible advantage
    A0 += R2.L * R3.L, A1 -= R2.H * R3.H || R4 = [P0++] || R5 = [I1++];
Involves 2 multiplies
Involves 4 adds -- A0 +=, A1 -=, P0++ and I1++
Involves 2 memory reads

    MNOP || R2 = W[P0++] (X) || R3 = W[I1++] (X);    // MNOP multiplier NOP
    P1 = 100 - 2;
    LSET (START, FINISH) LC1 = P1 >> 1;    // Go round the loop 49 times
START:  A0 += R2.L * R3.L, A1 -= R2.H * R3.H || R4 = W[P0++] (X) || R5 = W[I1++] (X);
FINISH: A0 += R4.L * R5.L, A1 -= R4.H * R5.H || R2 = W[P0++] (X) || R3 = W[I1++] (X);

Using R2, R3 and then R4, R5 in an attempt to avoid pipeline issues
May not be required; would have to examine the pipeline viewer to see what happens
FINAL EXAM REVIEW -- What is the syntax error?
17
Multiply and accumulate operation
Filter operation on 16-bit values
    sum = 0;
    for count = 0 to N - 1
        sum = sum + value[count] * coeff[count];
    sum = sum / N;
Does not take much to overflow a signed sixteen-bit register
    value1 = 32000; value2 = 32000;
    value1 + value2 = 64000, which wraps to -1536 as a signed 16-bit value
    value1 = value2 = 32000; coeff1 = coeff2 = 32000;
    value1 * coeff1 + value2 * coeff2 = 2,048,000,000, which is within 5% of the signed 32-bit maximum of 2,147,483,647; a third such product overflows it
18
Multiply and accumulate operation – solving the problem
Does not take much to overflow a signed sixteen-bit register
    value1 = 32000; value2 = 32000;
    value1 + value2 = 64000, which wraps to -1536 as a signed 16-bit value
    value1 = value2 = 32000; coeff1 = coeff2 = 32000;
    value1 * coeff1 + value2 * coeff2 = 2,048,000,000, which is within 5% of the signed 32-bit maximum of 2,147,483,647; a third such product overflows it
Dividing every input value by N guarantees that the sum of N values will not overflow the number representation, but it does not give an accurate answer. What if the inputs are 32000, 32000, ... today but 1, 3, 5, 7, ... tomorrow?
Use a special 40-bit register for storing the sum. This makes an overflow less likely.
Do a theoretical calculation to determine how many bits are needed to store an accurate answer
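The theoretical calculation can be sketched in C. The code below assumes integer-signed operands, for which the largest possible product is (-32768) * (-32768) = 2^30; the function names are illustrative only.

```c
#include <stdint.h>

/* 16-bit addition with wrap-around, written so that no signed overflow
   occurs in C: add modulo 2^16, then reinterpret as a signed value. */
int16_t add16_wrapped(int16_t a, int16_t b)
{
    uint16_t raw = (uint16_t)((uint16_t)a + (uint16_t)b);
    return (int16_t)(raw >= 0x8000u ? (int32_t)raw - 0x10000 : (int32_t)raw);
}

/* Smallest signed width, in bits, that can hold the sum of n products of
   two signed 16-bit integers in the worst case (n * 2^30). */
int bits_needed(uint32_t n)
{
    uint64_t worst = (uint64_t)n << 30;   /* n * 2^30             */
    int bits = 1;                         /* start with sign bit  */
    while (worst != 0) {                  /* count magnitude bits */
        bits++;
        worst >>= 1;
    }
    return bits;
}
```

Under these assumptions a single worst-case product needs 32 bits, 100 products need 38 bits, and a 40-bit accumulator is guaranteed safe for up to 511 products.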
19
Multiplier is 16 x 16, giving 32 bits
Adder is 40 bits
Accumulator is 40 bits
20
Example – filter 100 values in only 50 instructions
    .section data1
    .byte2 array[100], coeffs[100];

    P0.H = hi(array); P0.L = lo(array);
    I1.H = hi(coeffs); I1.L = lo(coeffs);
    MNOP || R2 = W[P0++] (X) || R3 = W[I1++] (X);    // MNOP multiplier NOP
    P1 = ;
    LSET (START, FINISH) LC1 = P1 >> 1;    // Go round 49 times
START:  A0 += R2.L * R3.L, A1 -= R2.H * R3.H || R4 = [P0++] (X) || R5 = [I1++] (X);
FINISH: A0 += R4.L * R5.L, A1 -= R4.H * R5.H || R2 = [P0++] (X) || R3 = [I1++] (X);
    R0.L = (A0 += R2.L * R3.L), R0.H = (A1 -= R2.H * R3.H);
    R0.L = R0.L + R0.H (NS);
21
Convert the following code using parallel instructions and ensuring maximum accuracy
    #define N 1024
    .section data
    .byte2 array[N];
    // Convert the code
    #define N 1024
    // short array[N];
    // short CalculateAverage( ) {
    // Determine sum;
    // return average
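For reference, one possible C version of the routine to be converted is sketched here. The slide only outlines it ("Determine sum", "return average"), so the 32-bit sum and the single divide at the end are assumptions about what "maximum accuracy" asks for.

```c
#include <stdint.h>

#define N 1024

short array[N];

/* Average of N signed 16-bit samples.
   The sum is kept in 32 bits: N * 32768 is 2^25, so it cannot overflow.
   Dividing once at the end avoids losing low-order bits on every sample. */
short CalculateAverage(void)
{
    int32_t sum = 0;
    for (int i = 0; i < N; i++)
        sum += array[i];
    return (short)(sum / N);
}
```

Dividing each sample by N before adding would also avoid overflow, but small inputs such as 1, 3, 5, 7 would all become 0.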
22
Option for doing multiplication
R0 = R1 * R2;    32-bit
    Mimics C++ multiplication
    User must make sure that the multiplication does not overflow 32 bits; there are no flags on error
R0.L = R1.L * R2.H (mode);    16-bit
    Default: signed fraction
    IS: integer signed
    IU: integer unsigned
    Uses A0 and A1 multipliers
23
Warning
For more details see the article "When = 2; but 2 * 2 != 4"
Published in Circuit Cellar magazine
Link available from the December 415 web page
Sounds like a good Q9 to me for the final if you add some more details
24
Addition and multiplication on Blackfin
If R0 = 0x – then what is the result of R0.L = 0xFFFF, and why?
Math question: what is the result of 0.1 * * 10^-2? Express the answer in the format 0.XYZ * 10^-2
Math question: what is the result of 0.1 * * 0.2 * 10^-2? Express the answer in the format 0.XYZ * 10^-2
R0.L = 0x6; R1.L = 0x7;
What is the result of R2.L = R0.L + R1.L (NS); and why?
    Treated as a 2's complement number
    Treated as a signed fractional number (format R0.L = 6 * 2^-15)
What is the result of R2.H = R0.L * R1.L; and why?
What is the result of R2.H = R0.L * R1.L (IS); and why?
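The difference between the two multiplication questions can be explored with a small C model. It is a sketch, not a description of the hardware: it assumes the default mode is signed fraction (1.15 format) with rounding, that both modes saturate to 16 bits, and the names mult_fract and mult_int are invented for the illustration.

```c
#include <stdint.h>

/* Signed-fraction (1.15) multiply: each input stands for value / 2^15.
   The 32-bit product is doubled to realign the binary point, rounded,
   and the upper 16 bits are kept. */
int16_t mult_fract(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;        /* at most 2^30          */
    if (product == 0x40000000)                        /* -1.0 * -1.0 = +1.0    */
        return INT16_MAX;                             /* saturate, cannot fit  */
    int64_t rounded = (int64_t)product * 2 + 0x8000;  /* realign, then round   */
    /* Offset by 2^32 so the right shift never sees a negative value. */
    return (int16_t)(((rounded + 0x100000000LL) >> 16) - 0x10000);
}

/* Integer-signed multiply, saturated to the 16-bit range (assumption). */
int16_t mult_int(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    if (product > INT16_MAX) return INT16_MAX;
    if (product < INT16_MIN) return INT16_MIN;
    return (int16_t)product;
}
```

In this model 6 * 7 gives 42 in integer mode but 0 in fraction mode, because 6 * 2^-15 times 7 * 2^-15 is far below the smallest value a 16-bit fraction can hold. Addition gives 13 under either interpretation, which is why addition needs no mode but multiplication does.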
25
Other “multiplication” types
Multiply by 2 or 4
    R0 = (R1 + R2) << 1;    (or << 2) (or Pn)
    P0 = P1 + (P2 << 1);    (or << 2) P only
    Useful when using P2 as the index in a loop
Multiply by 1/2, 1/4, 1/8, 1/2^N
    R0 >>= 3;    divide by 8 (R0 unsigned number)
    0x / 8 = 0x (unsigned (+ve) number)
    R0 >>>= 3;    divide by 8 (R0 signed number)
    0x / 8 = 0xE (negative number)
    R0 = ASHIFT R1 BY -3;    (negative divide, +ve mult)
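The two right shifts can be compared in C. This is a minimal sketch with made-up function names; the arithmetic shift is written so that it never shifts a negative value, since that is implementation-defined in C. Shift counts are assumed to be below 32.

```c
#include <stdint.h>

/* Models R0 >>= n : logical shift, zero fill.
   Divides an unsigned value by 2^n. */
uint32_t shift_logical(uint32_t value, unsigned n)
{
    return value >> n;
}

/* Models R0 >>>= n : arithmetic shift, sign fill.
   Divides a signed value by 2^n, rounding toward minus infinity. */
int32_t shift_arithmetic(int32_t value, unsigned n)
{
    if (value >= 0)
        return value >> n;
    return ~(~value >> n);   /* ~value is non-negative, so the shift is well defined */
}
```

The same bit pattern gives different answers: 0x80000000 shifted by 3 is 0x10000000 as an unsigned number, but -2147483648 shifted by 3 is -268435456 (0xF0000000) as a signed number. Note also that -7 shifted by 3 gives -1, not 0, so the arithmetic shift is not identical to C's truncating division.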
26
Division
Fast divide by 2, 4, 8, 2^N using shift
    R0 >>= 3;    divide by 8 (R0 unsigned number)
    0x / 8 = 0x (unsigned (+ve) number)
    R0 >>>= 3;    divide by 8 (R0 signed number)
    0x / 8 = 0xE (negative number)
    R0 = ASHIFT R1 BY -3;    (negative divide, +ve mult)
More flexible using DIVS and DIVQ
    Slow: must be performed in a loop
Example code: 70 / 5
27
Code example -- P10-25
    .global _DivideASM;
_DivideASM:
    R0 = 70;    // Divide(70, 5);
    R1 = 5;
    P0 = 15;    // Evaluate quotient to 15 bits (loop info)
    R0 <<= 1;    // Book says "needed for integer division"
    DIVS(R0, R1);    // Determines MSB of quotient
    LOOP .div_prim LC0 = P0;    // DIFFERENT LOOP SYNTAX
    LOOP_BEGIN .div_prim;
    DIVQ(R0, R1);
    LOOP_END .div_prim;
    R0 = R0.L (X);
    RTS;
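Why division needs a loop can be seen from a generic shift-and-subtract divider in C, which produces one quotient bit per pass. This is a textbook restoring-division sketch for unsigned 16-bit values; it is not claimed to match what DIVS and DIVQ do step for step, and the name divide_bitwise is hypothetical. The divisor is assumed to be non-zero.

```c
#include <stdint.h>

/* Shift-and-subtract division: one quotient bit per loop pass. */
uint16_t divide_bitwise(uint16_t dividend, uint16_t divisor)
{
    uint32_t remainder = 0;
    uint16_t quotient = 0;
    for (int bit = 15; bit >= 0; bit--) {
        /* Bring down the next dividend bit, most significant first. */
        remainder = (remainder << 1) | ((uint32_t)(dividend >> bit) & 1u);
        quotient = (uint16_t)(quotient << 1);
        if (remainder >= divisor) {     /* divisor fits: this quotient bit is 1 */
            remainder -= divisor;
            quotient |= 1u;
        }
    }
    return quotient;
}
```

Sixteen passes give sixteen quotient bits, which parallels the assembly above issuing one DIVS followed by fifteen DIVQ instructions.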
28
Problem to solve
Build video images in SDRAM
Scale all the images (increase grey scale by a fixed scaling factor)
Determine whether it is more efficient to
    1. Work using the images in SDRAM
    2. Bring images from SDRAM (using DMA), scale them, then put them back
    3. Use a multi-threaded version of task 2
Multiplication and division issues
29
Information taken from Analog Devices On-line Manuals with permission
Information furnished by Analog Devices is believed to be accurate and reliable. However, Analog Devices assumes no responsibility for its use or for any infringement of any patent or other rights of any third party which may result from its use. No license is granted by implication or otherwise under any patent or patent right of Analog Devices.
Copyright Analog Devices, Inc. All rights reserved.