Systematic development of programs with parallel instructions: SHARC ADSP2106X processor
M. Smith, Electrical and Computer Engineering, University of Calgary, Canada, smithmr@ucalgary.ca

6/2/2015 ENCM515 -- Systematic development of parallel instructions on SHARC ADSP21061 Copyright smithmr@ucalgary.ca 2 / 44 -- two days
To be tackled today: What’s the problem? Standard code development of “C” code. Process for “code with parallel instructions”: rewrite with specialized resources, move to a “resource chart”, unroll the loop, adjust the code, reroll the loop, and check whether it was worth the effort.

ADSP-2106x -- Parallelism opportunities: parallel memory operations, one each on the pm, dm and instruction-cache buses; memory pointer operations (post-modify of 2 index registers, automatic circular-buffer operations, automatic bit-reverse addressing); many parallel operations and register-to-register bus transfers (Rn = Rx + Ry or Rn = Rx * Ry; Rm = Rx + Ry, Rn = Rx - Ry with or without Rp = Rq * Rr); zero-overhead loops; instruction pipeline issues. Key issue -- only 48 bits are available in the opcode. Selecting one of 16 data registers takes 4 bits, so 3 destinations and 6 sources need 9 * 4 = 36 bits. Two memory accesses need another 2 * (3 index + 3 modify + 4 data) = 20 bits. Then come condition-code selection, 32-bit constants, etc.
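
The opcode squeeze can be checked by simply counting select bits. The sketch below assumes a hypothetical, fully general encoding in which every register field can name any register in its file. The function names are illustrative only and are not part of any Analog Devices tool.

```c
#include <assert.h>

/* Number of bits needed to select one register out of `count`. */
unsigned bits_to_select(unsigned count)
{
    unsigned bits = 0;
    assert(count >= 1);
    while ((1u << bits) < count)
        bits++;
    return bits;
}

/* Hypothetical fully general encoding: 3 destinations + 6 sources,
   each chosen from the 16 data registers. */
unsigned register_field_bits(void)
{
    return (3u + 6u) * bits_to_select(16);
}

/* Two memory accesses, each naming an index register (1 of 8),
   a modify register (1 of 8) and a data register (1 of 16). */
unsigned memory_field_bits(void)
{
    return 2u * (bits_to_select(8) + bits_to_select(8) + bits_to_select(16));
}
```

Register and memory fields alone come to 56 bits, already more than a 48-bit opcode, before any operation or condition bits are added.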

Compiler is only somewhat useful -- see the article from Embedded System Design (Sept./October 2000) in the course notes. We need a systematic process that provides parallelism without pain, and we need to know what to worry about and what not to. Lab 3 -- implement an FIR filter in parallel (help provided). Lab. -- library version of the FFT; custom version of the Burg algorithm (AR modeling).

Basic code development -- any system: Write the “C” code for the function void Convert(float *temperature, int N), which converts an array of temperatures measured in Celsius (Canadian market) to Fahrenheit (American market). Convert the code to ADSP-21061, 68K, etc. assembly code, following the standard coding and documentation practices.

Parallel instruction code development: Write the 21k assembly code for the function void Convert(float *temperature, int N), which converts the array as before. Determine the instruction flow through the architecture using a resource usage diagram. Theoretically optimize the code -- a 2-minute counting process. Compare and contrast the time taken to perform the subroutine before and after customization.

Standard “C” code:
    void Convert(float *temperature, int N)
    {
        int count;
        for (count = 0; count < N; count++) {
            *temperature = (*temperature) * 9 / 5 + 32;
            temperature++;
        }
    }
Standard warning -- what does an optimizing compiler do with 9 / 5: does it become 1 or 1.8? Written as above, the expression groups as ((*temperature) * 9) / 5, so all the arithmetic is done in float. Written as (9 / 5), it is integer division and gives 1.
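
A small sketch makes the 9 / 5 warning concrete. The three helper names are hypothetical, and the compile-time check assumes a C11 compiler.

```c
#include <assert.h>

/* C11 compile-time check: integer 9 / 5 truncates to 1. */
static_assert(9 / 5 == 1, "9 / 5 is integer division");

/* As in the slide: parsed as ((c * 9) / 5) + 32, all in float. */
float to_fahrenheit_as_written(float c)
{
    return c * 9 / 5 + 32;
}

/* Parenthesised constant: (9 / 5) == 1, so this is just c + 32. */
float to_fahrenheit_int_ratio(float c)
{
    return c * (9 / 5) + 32;
}

/* Float constants keep the intended 1.8 ratio. */
float to_fahrenheit_float_ratio(float c)
{
    return c * (9.0F / 5.0F) + 32.0F;
}
```

With 100.0F as input, the first and last helpers give 212, but the integer-ratio helper gives 132.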

Process for developing parallel code: rewrite the “C” code using “LOAD/STORE” techniques, which accounts for the SHARC super-scalar RISC DSP architecture; write the assembly code using a hardware loop; rewrite the assembly code using instructions that could be used in parallel if you could find the correct optimization approach; move the algorithm to a “Resource Usage Chart”; optimize using the techniques; compare and contrast the time -- setup and loop.

21061-style load/store “C” code:
    void Convert(register float *temperature, register int N)
    {
        register int count;
        register float *pt = temperature;
        register float scratch;
        for (count = 0; count < N; count++) {
            scratch = *pt;
            scratch = scratch * (9.0F / 5.0F);   /* NOT (9 / 5): integer division gives 1 */
            scratch = scratch + 32;
            *pt = scratch;
            pt++;
        }
    }
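
One way to confirm that the load/store rewrite preserves behavior is to run both styles side by side. This is a sketch under assumptions: the function names are hypothetical, and the ratio is taken to be the intended 1.8. A tolerance is used because x * 9 / 5 and x * 1.8F can round differently in the last bit.

```c
#include <assert.h>

/* Standard style: one compound expression per element. */
void convert_standard(float *temperature, int n)
{
    for (int count = 0; count < n; count++) {
        *temperature = (*temperature) * 9 / 5 + 32;
        temperature++;
    }
}

/* Load/store style: each statement is a single load, operation or
   store, roughly one SHARC instruction each. */
void convert_load_store(float *temperature, int n)
{
    float *pt = temperature;
    float scratch;
    assert(n >= 0);
    for (int count = 0; count < n; count++) {
        scratch = *pt;             /* load:     F2 = dm(pt, 0)          */
        scratch = scratch * 1.8F;  /* multiply                          */
        scratch = scratch + 32.0F; /* add                               */
        *pt = scratch;             /* store:    dm(pt, 1) = F2          */
        pt++;                      /* folded into the store's post-modify */
    }
}

/* Loose comparison for results that may differ in the last bit. */
int close_enough(float a, float b)
{
    float d = a - b;
    if (d < 0)
        d = -d;
    return d < 1e-3F;
}
```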

Process for developing parallel code: rewrite the “C” code using “LOAD/STORE” techniques, which accounts for the SHARC super-scalar RISC DSP architecture; write the assembly code using a hardware loop, and check that the end-of-loop label is in the correct place; rewrite the assembly code using instructions that could be used in parallel if you could find the correct optimization approach; move the algorithm to a “Resource Usage Chart”; optimize using the techniques; compare and contrast the time -- setup and loop.

Assembly code -- PROLOGUE: appropriate defines to make the code easy to read; saving of non-volatile registers. BODY: try to plan ahead for parallel operations; know which 21k “multi-function” instructions are valid. EPILOGUE: recover the non-volatile registers.

Straight conversion -- PROLOGUE:
    // void Convert(reg float *temperature, reg int N) {
    .segment/pm seg_pmco;
    .global _Convert;
    _Convert:
        // register int count = GARBAGE;
        #define countR1 scratchR1
        // register float *pt = temperature;
        #define pt scratchDMpt
        pt = INPAR1;
        // float scratch = GARBAGE;
        #define scratchF2 F2
        // For the CURRENT code, no non-volatile registers are
        // needed -- this may not remain true

Straight conversion of code:
        // for (count = 0; count < N; count++) {
        LCNTR = INPAR2, DO LOOP_END - 1 UNTIL LCE;
        // scratch = *pt;
            scratchF2 = dm(0, pt);          // Not ++ as pt is re-used
        // scratch = scratch * (9 / 5);
        // INPAR1 (R4) is dead -- can reuse as F4
        #define constantF4 F4               // Must be float
            constantF4 = 1.8;               // No division, use a register constant
            scratchF2 = scratchF2 * constantF4;
        // scratch = scratch + 32;
        #define F0_32 F0                    // Must be float
            F0_32 = 32.0;                   // NOT F0 = 32, which gives a tiny denormal (about 10^-44), not 32.0
            scratchF2 = scratchF2 + F0_32;
        // *pt = scratch; pt++;
            dm(pt, 1) = scratchF2;
    LOOP_END:
        // 5 magic lines of code

Avoid this error. The intended loop:
    LCNTR = INPAR2, DO LOOP_END UNTIL LCE;
        scratchF2 = dm(0, pt);
        scratchF2 = scratchF2 * constantF4;
        F0_32 = 32.0;
        scratchF2 = scratchF2 + F0_32;
    LOOP_END: dm(pt, 1) = scratchF2;    // INTENDED LAST LINE OF LOOP
If the label ends up on the line after the store:
    LCNTR = INPAR2, DO LOOP_END UNTIL LCE;
        scratchF2 = dm(0, pt);
        scratchF2 = scratchF2 * constantF4;
        F0_32 = 32.0;
        scratchF2 = scratchF2 + F0_32;
        dm(pt, 1) = scratchF2;          // intended last line of loop
    LOOP_END: Rest of the code          // STILL THE LAST LINE OF THE LOOP
then the first line of the “rest of the code” has become part of the loop.

Process to avoid the error -- this particular error is going to be very easy to make. Once we have taken account of the ALU/FPU pipeline to maximize parallelism, the “rest of the code” will look very similar to the “loop internals”. SUGGESTED APPROACH TO AVOID THIS TIME-WASTING ERROR:
    LCNTR = INPAR2, DO LOOP_END - 1 UNTIL LCE;
        scratchF2 = dm(0, pt);
        scratchF2 = scratchF2 * constantF4;
        F0_32 = 32.0;
        scratchF2 = scratchF2 + F0_32;
        dm(pt, 1) = scratchF2;
    LOOP_END: Rest of the code
The loop ends at the instruction just before the label, so the label always marks the first line after the loop. This approach was adopted from the compiler output -- the concept of a label was beyond most people in ENCM415.
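
The effect of where the loop ends can be modelled by counting how often each instruction address runs. The sketch below uses simplifying assumptions and ignores pipeline effects. The addresses and names are hypothetical: addresses 0-4 hold the loop body, and address 5 is where LOOP_END: sits, on the first line of the rest of the code.

```c
#include <assert.h>

#define PROGRAM_SIZE 6  /* hypothetical: addresses 0-4 = loop body, 5 = rest of code */
#define LOOP_END 5      /* hypothetical address of the LOOP_END: label */

/* Simplified model of "DO end_addr UNTIL LCE": addresses start_addr..end_addr
   INCLUSIVE run loop_count times, then execution falls through to
   end_addr + 1 and the remaining addresses run once.
   Addresses before start_addr are not modelled (their count stays 0). */
void simulate_do_loop(int start_addr, int end_addr, int loop_count,
                      int exec_count[PROGRAM_SIZE])
{
    assert(0 <= start_addr && start_addr <= end_addr && end_addr < PROGRAM_SIZE);
    assert(loop_count >= 1);
    for (int pc = 0; pc < PROGRAM_SIZE; pc++)
        exec_count[pc] = 0;
    for (int pass = 0; pass < loop_count; pass++)
        for (int pc = start_addr; pc <= end_addr; pc++)
            exec_count[pc]++;
    for (int pc = end_addr + 1; pc < PROGRAM_SIZE; pc++)
        exec_count[pc]++;
}
```

With DO LOOP_END - 1 (end_addr = LOOP_END - 1), the rest of the code runs once. With DO LOOP_END and the label moved onto the rest of the code, that line runs once per loop pass.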

Process for developing parallel code: rewrite the “C” code using “LOAD/STORE” techniques, which accounts for the SHARC super-scalar RISC DSP architecture; write the assembly code using a hardware loop, and check that the end-of-loop label is in the correct place; rewrite the assembly code using instructions that could be used in parallel if you could find the correct optimization approach -- that is, place values in the appropriate registers to permit parallelism, BUT don’t actually write the parallel operations at this point; move the algorithm to a “Resource Usage Chart”; optimize using the techniques (attempt to); compare and contrast the time -- setup and loop.

Speed rules for memory access:
    scratch = dm(0, pt);
    scratch = dm(pt, 0);            // Not ++ as to be re-used
    dm(pt, 1) = scratch;
Constants cannot be used as modifiers in these fast accesses. There are not enough bits in the opcode, since each constant needs 32 bits, so modify registers must be used to hold these constants. Several useful constants are placed in modify registers (DAG1 and DAG2) during “C-code” initialization, if that code is linked in:
    scratch = dm(pt, zeroDM);       // Not ++ as to be re-used
    dm(pt, plus1DM) = scratch;
Can’t use PRE-MODIFY, period. Can’t use POST-MODIFY operations with CONSTANTS.
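
Post-modify addressing with modify registers can be sketched in C as a tiny model of one data address generator. This is a hypothetical model, not an Analog Devices API. The register slots used in the usage note for zeroDM and plus1DM are illustrative only.

```c
#include <assert.h>

/* Hypothetical model of one DAG: 8 index (I) and 8 modify (M) registers. */
typedef struct {
    int index[8];
    int modify[8];
} Dag;

/* Models "value = dm(Ii, Mm)": use Ii as the address, THEN Ii += Mm. */
float dm_read(Dag *dag, const float *memory, int i, int m)
{
    assert(i >= 0 && i < 8 && m >= 0 && m < 8);
    float value = memory[dag->index[i]];
    dag->index[i] += dag->modify[m];   /* post-modify */
    return value;
}

/* Models "dm(Ii, Mm) = value": store at Ii, THEN Ii += Mm. */
void dm_write(Dag *dag, float *memory, int i, int m, float value)
{
    assert(i >= 0 && i < 8 && m >= 0 && m < 8);
    memory[dag->index[i]] = value;
    dag->index[i] += dag->modify[m];   /* post-modify */
}
```

Suppose modify[5] = 0 stands in for zeroDM and modify[6] = 1 for plus1DM. A read using slot 5 followed by a write using slot 6 then processes one element and advances the pointer by one, which is the pattern in the loop above.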

Speed rules IF you want adds and multiplies to occur on the same line:
    F1 = F2 * F3, F4 = F5 + F6;
We want to do this as a single instruction, but there are not enough bits in the opcode. The register description needs 4 + 4 + 4 + 4 + 4 + 4 = 24 bits, plus bits describing the math operations, conditions and memory operations. The allowed form is therefore restricted:
    Fn = F(0, 1, 2 or 3) * F(4, 5, 6 or 7), Fm = F(8, 9, 10 or 11) + F(12, 13, 14 or 15)
The register description is then 4 + 2 + 2 + 4 + 2 + 2 = 16 bits, with the other bits “understood”. Register usage must be rearranged in the program code for this to be possible. This is inconvenient rather than limiting, e.g.
    F6 = F0 * F4, F7 = F8 + F12, F9 = F8 - F12;     // accepted
    F6 = F4 * F0, F7 = F8 + F12, F9 = F8 - F12;     // Not accepted
    F7 = F8 + F12, F9 = F8 - F12, F6 = F0 * F4;     // Not accepted
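
The register-bank rule quoted above can be written as a small checker, which is handy when planning register assignments by hand. It is a hypothetical helper that checks only the source-operand ranges, Fn = F(0-3) * F(4-7), Fm = F(8-11) + F(12-15). It does not check clause order or any other assembler rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Checks the source-register ranges for "Fn = Fx * Fy, Fm = Fa + Fb"
   (register numbers 0..15). Destinations are unrestricted here. */
bool multifunction_operands_ok(int x, int y, int a, int b)
{
    assert(x >= 0 && x < 16 && y >= 0 && y < 16);
    assert(a >= 0 && a < 16 && b >= 0 && b < 16);
    bool mul_ok = (x <= 3) && (y >= 4 && y <= 7);
    bool add_ok = (a >= 8 && a <= 11) && (b >= 12);
    return mul_ok && add_ok;
}
```

F6 = F0 * F4, F7 = F8 + F12 passes. F6 = F4 * F0 fails because the multiply operands are swapped, and the slide's first example, F1 = F2 * F3, F4 = F5 + F6, fails on every source.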

When should we worry about the register assignment?
    #define count scratchR1
    #define pt scratchDMpt
    #define scratchF2 F2
    LCNTR = INPAR2, DO LOOP_END - 1 UNTIL LCE;
        scratchF2 = dm(pt, 0);              // Not ++ as to be re-used
        // INPAR1 (R4) is dead -- can reuse
    #define constantF4 F4                   // Must be float
        constantF4 = 1.8;
        scratchF2 = scratchF2 * constantF4;
    #define F0_32 F0                        // Must be float
        F0_32 = 32.0;
        scratchF2 = scratchF2 + F0_32;
        dm(pt, 1) = scratchF2;              // store the result, not F0_32
    LOOP_END:

Check on required register use:
    #define count scratchR1
    #define pt scratchDMpt
    #define scratchF2 F2
    LCNTR = INPAR2, DO LOOP_END - 1 UNTIL LCE;
        scratchF2 = dm(pt, zeroDM);         // Special requirements on F2? It becomes a source later
        // INPAR1 (R4) is dead -- can reuse
    #define constantF4 F4                   // Must be float
        constantF4 = 1.8;
        scratchF2 = scratchF2 * constantF4; // Rule: Fn = F(0, 1, 2 or 3) * F(4, 5, 6 or 7)
    #define F0_32 F0                        // Must be float
        F0_32 = 32.0;
        scratchF2 = scratchF2 + F0_32;      // Rule: Fm = F(8, 9, 10 or 11) + F(12, 13, 14 or 15)
        dm(pt, plus1DM) = scratchF2;

Register re-assignment -- Step 1:
    #define count scratchR1
    #define pt scratchDMpt
    #define scratchF2 F2                    // OKAY
    LCNTR = INPAR2, DO LOOP_END - 1 UNTIL LCE;
        scratchF2 = dm(pt, zeroDM);
        // INPAR1 (R4) is dead -- can reuse
    #define constantF4 F4                   // Must be float -- OKAY
        constantF4 = 1.8;
        scratchF2 = scratchF2 * constantF4; // SOURCES okay here: Fn = F(0,1,2 or 3) * F(4,5,6 or 7)
                                            // WRONG to use F2 as DEST this early -- it is an addition source next
    #define F0_32 F0                        // Must be float
        F0_32 = 32.0;                       // WRONG to use F0 here -- it is an ADDITION source
        scratchF2 = scratchF2 + F0_32;      // Fm = F(8, 9, 10 or 11) + F(12, 13, 14 or 15)
        dm(pt, plus1DM) = scratchF2;        // OKAY

Register re-assignment -- Step 2:
    #define count scratchR1
    #define pt scratchDMpt
    #define scratchF2 F2
    #define scratchF8 F8                    // INPAR2 (R8) is dead once LCNTR is loaded
    LCNTR = INPAR2, DO LOOP_END - 1 UNTIL LCE;
        scratchF2 = dm(pt, zeroDM);
        // INPAR1 (R4) is dead -- can reuse
    #define constantF4 F4                   // Must be float
        constantF4 = 1.8;
        scratchF8 = scratchF2 * constantF4; // answer must be in F(8, 9, 10 or 11)
    #define F12_32 F12                      // INPAR3 is available
        F12_32 = 32.0;
        scratchF2 = scratchF8 + F12_32;     // Fm = F(8, 9, 10 or 11) + F(12, 13, 14 or 15)
        dm(pt, plus1DM) = scratchF2;

Fix poor coding practice -- “C” or assembly:
    #define count scratchR1
    #define pt scratchDMpt
    #define scratchF2 F2
    #define scratchF8 F8                    // INPAR2 (R8) is dead once LCNTR is loaded
    LCNTR = INPAR2, DO LOOP_END - 1 UNTIL LCE;
        scratchF2 = dm(pt, zeroDM);
        // INPAR1 (R4) is dead -- can reuse
    #define constantF4 F4                   // Must be float
        constantF4 = 1.8;                   // MOVE OUTSIDE LOOP
        scratchF8 = scratchF2 * constantF4; // answer must be in F(8, 9, 10 or 11)
    #define F12_32 F12                      // INPAR3 is available
        F12_32 = 32.0;                      // MOVE OUTSIDE LOOP
        scratchF2 = scratchF8 + F12_32;     // Fm = F(8, 9, 10 or 11) + F(12, 13, 14 or 15)
        dm(pt, plus1DM) = scratchF2;
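
In C terms, moving the two constant loads outside the loop is ordinary loop-invariant hoisting. The sketch below is a hypothetical C mirror of the assembly above, not the author's code. The comments pair each statement with the register it corresponds to in the listing.

```c
#include <assert.h>

/* Loop-invariant constants hoisted out of the loop, mirroring moving
   "constantF4 = 1.8" and "F12_32 = 32.0" ahead of the DO loop. */
void convert_hoisted(float *temperature, int n)
{
    const float scale = 1.8F;    /* constantF4, loaded once */
    const float offset = 32.0F;  /* F12_32, loaded once     */
    float *pt = temperature;
    assert(n >= 0);
    for (int count = 0; count < n; count++) {
        float loaded = *pt;             /* F2 = dm(pt, zeroDM)            */
        float product = loaded * scale; /* F8 = F2 * F4                   */
        *pt++ = product + offset;       /* F2 = F8 + F12; dm(pt, plus1DM) */
    }
}
```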


Resource Management -- Chart 1 -- Basic code (chart of the loop DO LOOPEND - 1 UNTIL LCE). In theory, if we could find out how to run the *, + and dm operations in parallel, the DATA BUS (dm) would be the limiting resource, and a 2-cycle loop would be possible. Before proceeding: is a 2-cycle loop needed? Is a 2-cycle loop enough?
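
The “2-cycle loop” bound follows from counting uses of each resource per element. The sketch assumes each resource can do one operation per cycle and that both memory accesses go over the dm bus, as in the chart. The struct and function names are hypothetical bookkeeping.

```c
#include <assert.h>

/* Per-element use of each resource (hypothetical bookkeeping). */
typedef struct {
    int dm_accesses;  /* loads + stores on the dm bus */
    int multiplies;   /* multiplier operations        */
    int adds;         /* ALU operations               */
} ResourceUse;

/* With one operation per resource per cycle, the busiest resource
   sets a lower bound on the cycles needed per element. */
int min_cycles_per_element(ResourceUse use)
{
    assert(use.dm_accesses >= 0 && use.multiplies >= 0 && use.adds >= 0);
    int bound = use.dm_accesses;
    if (use.multiplies > bound)
        bound = use.multiplies;
    if (use.adds > bound)
        bound = use.adds;
    return bound;
}
```

For Convert, each element needs one read and one write on dm, one multiply and one add. That gives a bound of 2 cycles per element, set by the dm bus.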

Process for developing parallel code: rewrite the “C” code using “LOAD/STORE” techniques, which accounts for the SHARC super-scalar RISC DSP architecture; write the assembly code using a hardware loop, and check that the end-of-loop label is in the correct place; rewrite the assembly code using instructions that could be used in parallel if you could find the correct optimization approach -- that is, place values in the appropriate registers to permit parallelism, BUT don’t actually write the parallel operations at this point; move the algorithm to a “Resource Usage Chart”; optimize parallelism using the techniques -- attempt to, but watch out for special situations where the code will fail; compare and contrast the time -- setup and loop.

Un-roll the loop -- for various methods of “unrolling the loop”, see the papers by Jeanne Anne Booth. Final exam question -- what are the relative advantages of the various techniques (with examples)?

Resource 2 -- unroll the loop, 5 times here. Each pass through the loop involves a read, a multiply, an add and a write.
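
At the C level, unrolling five times means five independent read-multiply-add-write chains per pass. On the SHARC, these are the operations that can later be interleaved. The sketch below is hypothetical, not the author's code. It adds a clean-up loop for the last n % 5 elements, which the straight-line resource chart does not need.

```c
#include <assert.h>

/* Convert loop unrolled five times, plus a clean-up loop for the
   remaining n % 5 elements. */
void convert_unrolled5(float *t, int n)
{
    const float scale = 1.8F;
    const float offset = 32.0F;
    int i = 0;
    assert(n >= 0);
    for (; i + 5 <= n; i += 5) {
        /* five independent read-multiply-add-write chains */
        t[i]     = t[i]     * scale + offset;
        t[i + 1] = t[i + 1] * scale + offset;
        t[i + 2] = t[i + 2] * scale + offset;
        t[i + 3] = t[i + 3] * scale + offset;
        t[i + 4] = t[i + 4] * scale + offset;
    }
    for (; i < n; i++)          /* clean-up for the n % 5 leftovers */
        t[i] = t[i] * scale + offset;
}
```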

Resource Management 3 -- identify resource usage during the decode and writeback stages of each instruction. The model used depends on where the operands are relative to the equals sign. ‘Reading’ is fetching things for the ALU/FPU, like the 68K decode phase. ‘Writeback’ is storing results from the ALU/FPU. THESE PHASES ARE ‘CONCEPTS’ RATHER THAN ‘IMPLEMENTED’.

Resource Management 4 -- check what can be moved in parallel with other instructions. Chart annotations: OKAY TO MOVE -- the F2 source is freed up before the F2 destination occurs. OKAY TO MOVE -- into an empty spot, provided the * and + instructions that this instruction MUST follow can also be moved. NO!!! (or just possibly no?) -- why is this a problem? (F2 =)

Memory resource availability -- move F2 = dm(pt, zeroDM) up from the second loop component into the first. However, we now have a possible conflict about which F2 should be used by the dm(pt, plus1DM) = F2 instruction if we optimize further by trying to fill the other empty delay slots -- see the next slide.

Resource management -- overlapping two parts of the loop.

Resource Management 5 -- What’s up, Doc? We are attempting to fill all unused resource availability. Why spend time simulating the algorithm to see if the problem really exists, when there is a simple solution -- use different registers? The problem may or may not exist in this simple example, but it is very likely to exist in a more complex algorithm.

Resource 6 -- Solution -- save and then use F9.

Resource Management 7 -- some parallelism is possible, with Read, Mult, Add and Write mixed across 5 loop components. Problem 1 -- no resource is at maximum usage, so the code is inefficient. Problem 2 -- this is worth only about 50% on an exam question on parallelism. We have answered “Optimize the straight-line code for a loop of the form ‘for count = 0, count < 5’”. What if the loop size is 2048 or more?

WRONG -- CONCEPT GOOD, IMPLEMENTATION BAD, as we are no longer indexing correctly through the data.

Need 1 resource to be maxed out, otherwise the algorithm is inefficient. You have to try a lot of different approaches. Here is my code.

Resource Management 8 -- unroll the loop a bit more, to 9 loop components. DM BUS USAGE IS NOW MAXED OUT (after a while), and a CODE PATTERN IS APPEARING.

Now to “reroll the loop”. The loop is currently just straight-line coded. It must be put back into “loop format” for coding efficiency, maintainability and seg_pmco limitations. The “rerolled loop” for a loop of the form “count = 0, count < N” has three components: fill the ALU/FPU pipeline (typically 1 stage from the loop); overlap N - 2 stages; empty the ALU/FPU pipeline (typically 1 stage).
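
The fill / overlap / empty structure can be sketched in C as a software-pipelined loop. This is a hypothetical, simplified version that overlaps one element per stage, not the author's 3-fold SHARC code. In each body pass, the add and write for element i, the multiply for element i + 1 and the read for element i + 2 are the operations a rerolled SHARC loop body would overlap.

```c
#include <assert.h>

/* Software-pipelined Convert (needs n >= 2).
   Invariant at the top of each body pass i:
   product == original t[i] * scale, loaded == original t[i + 1]. */
void convert_pipelined(float *t, int n)
{
    const float scale = 1.8F;
    const float offset = 32.0F;
    assert(n >= 2);

    /* FILL: read + multiply element 0, read element 1 */
    float product = t[0] * scale;
    float loaded = t[1];

    /* BODY: N - 2 overlapped stages */
    for (int i = 0; i < n - 2; i++) {
        float sum = product + offset;  /* ADD   element i     */
        product = loaded * scale;      /* MULT  element i + 1 */
        loaded = t[i + 2];             /* READ  element i + 2 */
        t[i] = sum;                    /* WRITE element i     */
    }

    /* EMPTY: finish the last two elements */
    t[n - 2] = product + offset;
    t[n - 1] = loaded * scale + offset;
}
```

The body never writes past t[i] before reading t[i + 2], so the in-place update stays correct.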

Resource Management 9 -- identify the loop components: FILL ALU/FPU PIPE, LOOP BODY, EMPTY ALU/FPU PIPELINE.

Resource 9 -- Final code version (chart of the rerolled loop, DO LOOPEND - 1 UNTIL LCE ... LOOPEND:, with its FILL, USE and EMPTY ALU/FPU PIPE sections).

Speed improvements (cycle counts, from the START, LOOP, EXIT and ENTRY overheads):
    BEFORE:                       4 + N * 4 + 5 + 5 = 14 + 4 * N
    With 2-fold loop unfolding:   4 + 7 + (N - 2) * 5 / 2 + 5 + 8 + 5 = 24 + 2.5 * N
    With 3-fold loop unfolding:   4 + 5 + (N - 2) * 6 / 3 + 5 + 1 + 5 = 16 + 2 * N
For large N, this is a speed-up of 4 / 2.5 = 1.6 with a little effort, and 4 / 2 = 2 with more effort.
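
The cycle-count formulas can be evaluated directly to check the simplifications and the large-N speed-up. The function names are hypothetical, and the numbers are simply the slide's formulas.

```c
#include <assert.h>

/* Cycle-count formulas from the slide (N = number of elements). */
double cycles_before(double n)     { return 4 + n * 4 + 5 + 5; }
double cycles_two_fold(double n)   { return 4 + 7 + (n - 2) * 5 / 2 + 5 + 8 + 5; }
double cycles_three_fold(double n) { return 4 + 5 + (n - 2) * 6 / 3 + 5 + 1 + 5; }

/* Speed-up of the 3-fold version over the original code. */
double speedup_three_fold(double n)
{
    assert(n >= 2);  /* the unfolded code needs the FILL and EMPTY stages */
    return cycles_before(n) / cycles_three_fold(n);
}
```

For N = 2048, this gives 8206 cycles before and 4112 cycles with 3-fold unfolding, a ratio of about 1.996, close to the limiting factor of 2.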

Question to ask -- now that we know the final code, should we have made the substitution of F2 to F9? Who cares -- do it anyway, as it is more likely to be necessary than unnecessary in most algorithms! There is no real disadvantage, since we can probably overlap the save and recovery of the non-volatile R9 with other instructions. Will the code work?

Resource 9 -- Final code version (chart: DO LOOPEND - 1 UNTIL LCE ... LOOPEND:). Checking N = 0, 1, 2, ..., 14: the code only works if (N - 2) / 3 is an integer.
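
The N condition can be checked mechanically. The helpers below are hypothetical. They encode the slide's rule, with FILL and EMPTY taking one stage each and each body pass consuming 3 stages. One possible fix, an assumption not taken from the slides, is to run the rerolled code on the first N - leftover elements and finish the 1 or 2 leftovers with the straight loop.

```c
#include <assert.h>
#include <stdbool.h>

/* True when the 3-fold rerolled loop covers exactly n elements:
   n >= 2 and (n - 2) / 3 is an integer. */
bool three_fold_fits(int n)
{
    assert(n >= 0);
    return n >= 2 && (n - 2) % 3 == 0;
}

/* Elements left over (0, 1 or 2) if the rerolled code is applied to
   the largest prefix it can handle; requires n >= 2. */
int leftover_elements(int n)
{
    assert(n >= 2);
    return (n - 2) % 3;
}
```

Of N = 0, 1, ..., 14, only N = 2, 5, 8, 11 and 14 fit exactly.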

Tackled today: What’s the problem? Standard code development of “C” code. Process for “code with parallel instructions”: rewrite with specialized resources, move to a “resource chart”, unroll the loop, adjust the code, reroll the loop, and check whether it was worth the effort. To come -- tutorial practice of parallel coding. To come -- optimum FIR filter with parallelism.
