Squish-DSP Application of a Project Management Tool to manage low-level DSP processor resources M. Smith, University of Calgary, Canada ucalgary.ca.

Squish-DSP Tool 2/28 Series of Talks and Workshops
- CACHE-DSP: talk on a simple process tool to identify cache conflicts in DSP code.
- SQUISH-DSP: talk on using a project management tool to automate identification of parallel DSP processor instructions.
- SHARC Ecology 101: workshop showing how to systematically write parallel 2106X code.
- SHARC Ecology 201: workshop on the SQUISH-DSP and CACHE-DSP tools.

Squish-DSP Tool 3/28 Scope of Talk
- Overview of hand optimization of code
- Paradigm shift in microprocessor resource scheduling
- Project management tool application
- Translating ‘microprocessor’ language into a ‘business’ format
- Examples and limitations
- Better optimization from VisualDSP code
- Future directions

Squish-DSP Tool 4/28 Standard “C” code
void Convert(float *temperature, int N) {
    int count;
    for (count = 0; count < N; count++) {
        *temperature = (*temperature) * 9 / 5 + 32;
        temperature++;
    }
}

Squish-DSP Tool 5/28 X-style load/store “C” code
void Convert(register float *temperature, register int N) {
    register int count;
    register float *pt = temperature;   // Ireg <- Dreg
    register float scratch;
    for (count = 0; count < N; count++) {
        scratch = *pt;
        scratch = scratch * (9.0F / 5.0F);  // (9 / 5) alone is integer division, i.e. 1
        scratch = scratch + 32;             // Order of Ops
        *pt = scratch;
        pt++;
    }
}
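
In C, `9 / 5` written with two integer constants is evaluated with integer division and yields 1, so a loop that multiplies by `(9 / 5)` only adds 32. The minimal sketch below contrasts the two forms; the function names `convert_float` and `convert_int_ratio` are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Celsius -> Fahrenheit in place; 9.0F / 5.0F keeps the scale at 1.8F. */
void convert_float(float *temperature, size_t n)
{
    assert(temperature != NULL || n == 0);
    for (size_t count = 0; count < n; count++) {
        temperature[count] = temperature[count] * (9.0F / 5.0F) + 32.0F;
    }
}

/* Pitfall: (9 / 5) is computed in integer arithmetic and equals 1,
   so this version effectively just adds 32. */
void convert_int_ratio(float *temperature, size_t n)
{
    assert(temperature != NULL || n == 0);
    for (size_t count = 0; count < n; count++) {
        temperature[count] = temperature[count] * (9 / 5) + 32.0F;
    }
}
```

The same issue applies to any scale factor written as a ratio of integer literals; making either operand a floating-point literal is enough to fix it.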

Squish-DSP Tool 6/28 Check on required register use
#define count scratchR1
#define pt scratchDMpt
#define scratchF2 F2
    LCNTR = INPAR2, DO LOOP_END UNTIL LCE;
        scratchF2 = dm(pt, zeroDM);            // Any special requirements here on F2??
// INPAR1 (R4) is dead -- can reuse
#define constantF4 F4                          // Must be float
        constantF4 = 1.8;
        scratchF2 = scratchF2 * constantF4;    // Fn = F(0,1,2 or 3) * F(4,5,6 or 7)
#define F0_32 F0                               // Must be float
        F0_32 = 32.0;
        scratchF2 = scratchF2 + F0_32;         // Fm = F(8,9,10 or 11) + F(12,13,14 or 15)
LOOP_END: dm(pt, plus1DM) = scratchF2;
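
The register annotations on this slide can be turned into a mechanical check. The sketch below encodes only the two rules written on the slide (multiplier operands from F0-F3 and F4-F7, ALU operands from F8-F11 and F12-F15, when the operation is to be placed in a multifunction instruction); the helper names are hypothetical. By these rules, the slide's `scratchF2 * constantF4` (F2 * F4) passes, while `scratchF2 + F0_32` (F2 + F0) would not, which is presumably the point of the "check".

```c
#include <assert.h>
#include <stdbool.h>

/* Register numbers are the n in Fn; only F0..F15 exist. */
static bool in_range(int reg, int lo, int hi)
{
    assert(reg >= 0 && reg <= 15);
    return reg >= lo && reg <= hi;
}

/* Slide rule for the multiplier: Fn = F(0..3) * F(4..7). */
bool mul_operands_ok(int x, int y)
{
    return in_range(x, 0, 3) && in_range(y, 4, 7);
}

/* Slide rule for the ALU: Fm = F(8..11) + F(12..15). */
bool alu_operands_ok(int x, int y)
{
    return in_range(x, 8, 11) && in_range(y, 12, 15);
}
```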

Squish-DSP Tool 7/28 Resource Chart -- Basic code

Squish-DSP Tool 8/28 Unroll the loop -- 5 times here
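
At the "C" level, unrolling the conversion loop 5 times looks roughly like the sketch below. Each pass holds five independent load, multiply, add, store chains, which is what gives the scheduler work to overlap. The name `convert_unrolled5` and the tail loop for lengths that are not a multiple of 5 are assumptions; the slide does not show how leftover elements are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Celsius -> Fahrenheit in place, loop body unrolled 5 times. */
void convert_unrolled5(float *t, size_t n)
{
    const float scale = 9.0F / 5.0F;
    size_t i = 0;

    assert(t != NULL || n == 0);

    /* Five chains per pass; none reads another's result, so their
       loads, multiplies, adds and stores can be interleaved freely. */
    for (; i + 5 <= n; i += 5) {
        float s0 = t[i], s1 = t[i + 1], s2 = t[i + 2];
        float s3 = t[i + 3], s4 = t[i + 4];
        s0 = s0 * scale + 32.0F;
        s1 = s1 * scale + 32.0F;
        s2 = s2 * scale + 32.0F;
        s3 = s3 * scale + 32.0F;
        s4 = s4 * scale + 32.0F;
        t[i] = s0;
        t[i + 1] = s1;
        t[i + 2] = s2;
        t[i + 3] = s3;
        t[i + 4] = s4;
    }

    /* Remainder: at most 4 elements, one at a time. */
    for (; i < n; i++) {
        t[i] = t[i] * scale + 32.0F;
    }
}
```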

Squish-DSP Tool 9/28 Parallelism causes Register/Resource Conflicts

Squish-DSP Tool 10/28 Unroll the loop a bit more

Squish-DSP Tool 11/28 Final code version

Squish-DSP Tool 12/28 Real life is not made up of ‘short loops’
- Probably using a DSP-intelligent compiler as a starting point
- Longer loops mean more tasks to make parallel
- Many different opportunities for task ordering
- Complicated resource-management and register-dependency issues
- Need a tool to help get the product ‘out the door’

Squish-DSP Tool 13/28 Business Management Tool
One evening I went looking for a ‘tree’ program to manage the scheduling of microprocessor resources. In frustration, I decided to take the 2106X tasks and put them into Microsoft Project. By mistake, I found that I had developed a very useful microprocessor management tool, especially with the MS Project GUI! Question: how to get it to function in a systematic manner?

Squish-DSP Tool 14/28 MS Project -- 21XXX processor
Requires a paradigm shift.
- Business project concept: one person can’t be doing two tasks in the same time slot.
- This becomes: one data bus can’t be transferring two data items at the same time.
- Handled by identifying the ‘processor resources’ needed to complete each ‘basic task’.
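
The "one person, one task per time slot" rule becomes a simple over-allocation check once every atomic task is tagged with the single processor resource it needs. The sketch below assumes a hypothetical resource list built from the slides' terminology (multiplier, ALU, DM bus, PM bus) and counts pairs of tasks that claim the same resource in the same instruction cycle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resource list, named after the slides' terminology. */
enum resource { RES_MULTIPLIER, RES_ALU, RES_DMBUS, RES_PMBUS, RES_COUNT };

struct atomic_task {
    int cycle;            /* instruction slot the task is scheduled in */
    enum resource res;    /* the one processor resource it needs */
};

/* Number of task pairs that use the same resource in the same cycle,
   i.e. the processor version of "one person doing two tasks at once". */
int count_conflicts(const struct atomic_task *tasks, size_t n)
{
    int conflicts = 0;
    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (tasks[i].cycle == tasks[j].cycle &&
                tasks[i].res == tasks[j].res) {
                conflicts++;
            }
        }
    }
    return conflicts;
}
```

The quadratic pair scan is fine for an inner loop of a few dozen atomic tasks; a per-cycle occupancy table would scale better for longer code.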

Squish-DSP Tool 15/28 MS Project -- 21XXX processor
Business project concept: if you delay building a wall (Task A), then you must delay painting it (Task B). HOWEVER, if you build the wall earlier, you could paint it earlier, but you don’t have to. It might make more sense to delay Task B so that Task C can be done earlier, since doing Task C allows Task D to be completed in parallel with Task B, so that the whole project finishes earlier.
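
The freedom to paint the wall "earlier, but you don’t have to" is what project tools call slack: the gap between a task's earliest and latest start that does not delay the whole project. The sketch below is a standard forward/backward (critical path) pass, not a description of how MS Project computes it internally; the `struct task` layout and the requirement that predecessors have smaller indices are assumptions made to keep it short.

```c
#include <assert.h>

#define MAX_TASKS 8

/* Tasks must be listed so that every predecessor has a smaller index. */
struct task {
    int duration;            /* in time slots (instruction cycles) */
    int npred;
    int pred[MAX_TASKS];     /* indices of predecessor tasks */
};

/* Fills early[] and late[] start times; returns the project length. */
int schedule(const struct task *t, int n, int *early, int *late)
{
    int finish = 0;

    /* Forward pass: start as soon as every predecessor has finished. */
    for (int i = 0; i < n; i++) {
        early[i] = 0;
        for (int k = 0; k < t[i].npred; k++) {
            int p = t[i].pred[k];
            assert(p < i);
            if (early[p] + t[p].duration > early[i])
                early[i] = early[p] + t[p].duration;
        }
        if (early[i] + t[i].duration > finish)
            finish = early[i] + t[i].duration;
    }

    /* Backward pass: latest start that still lets every successor
       start on time and the project end at `finish`. */
    for (int i = 0; i < n; i++)
        late[i] = finish - t[i].duration;
    for (int i = n - 1; i >= 0; i--) {
        for (int k = 0; k < t[i].npred; k++) {
            int p = t[i].pred[k];
            if (late[i] - t[p].duration < late[p])
                late[p] = late[i] - t[p].duration;
        }
    }
    return finish;
}
```

For tasks A (wall), B (paint, after A), C, and D (after C), a task whose late start exceeds its early start, like B here, can be delayed to make room for others without lengthening the project.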

Squish-DSP Tool 16/28 Simple Example
1) F6 = dm(I4, M4);
10) F1 = F2 * F4, F8 = F8 + F12, F12 = pm(I12, M12);
16) F5 = F3 * F6, F8 = F8 + F12, F12 = pm(I12, M12);
- Task 1 might be able to move in parallel with any of instructions 2 through 15, BUT not in parallel with 16, because Task 16 reads the F6 that Task 1 writes.
- If Task 10 moves earlier, so can Task 16, BUT not before Task 10.
- In Task 10, ‘F12 = …’ can be made parallel with ‘F6 = …’, BUT Task 10’s ‘F8 = …’ can’t!

Squish-DSP Tool 17/28 SquishDSP -- parser
1) F6 = dm(I4, M4);
10) F1 = F2 * F4, F8 = F8 + F12, F12 = pm(I12, M12);
16) F5 = F3 * F6, F8 = F8 + F12, F12 = pm(I12, M12);
Task 16 split into 3 atomic tasks:
- F12 = pm(I12, M12) -- PMBUS resource; must come after ‘F12 = …’ from Task 10, and after ‘F8 = …’ in the current task
- F8 = F8 + F12 -- ALU resource; must come after ‘F8 = …’ and ‘F12 = …’ from Task 10
- F5 = F3 * F6 -- MULTIPLIER resource; must come after ‘F6 = …’ from Task 1
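
A first step of such a parser is splitting one instruction into its comma-separated atomic tasks and tagging each with a resource. The sketch below is a deliberately rough illustration, not the SquishDSP parser: it splits only at commas outside parentheses (so `pm(I12, M12)` stays whole) and classifies by a few textual cues; the names `split_instruction` and `classify` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum resource { RES_DMBUS, RES_PMBUS, RES_MULTIPLIER, RES_ALU, RES_OTHER };

#define MAX_ATOMS 4
#define MAX_TEXT 48

struct atom {
    char text[MAX_TEXT];
    enum resource res;
};

/* Rough classifier: memory access first, then arithmetic operator. */
static enum resource classify(const char *s)
{
    if (strstr(s, "dm(") || strstr(s, "DM(")) return RES_DMBUS;
    if (strstr(s, "pm(") || strstr(s, "PM(")) return RES_PMBUS;
    if (strchr(s, '*')) return RES_MULTIPLIER;
    if (strchr(s, '+') || strchr(s, '-')) return RES_ALU;
    return RES_OTHER;
}

/* Splits at top-level commas, stops at ';', trims spaces.
   Over-long pieces are silently truncated. Returns the atom count. */
size_t split_instruction(const char *instr, struct atom *out, size_t max)
{
    size_t count = 0, len = 0;
    int depth = 0;
    char buf[MAX_TEXT];

    assert(instr != NULL && out != NULL);
    for (const char *p = instr; ; p++) {
        char c = *p;
        bool end = (c == '\0' || c == ';' || (c == ',' && depth == 0));
        if (!end) {
            if (c == '(') depth++;
            if (c == ')') depth--;
            if (len + 1 < MAX_TEXT && !(len == 0 && isspace((unsigned char)c)))
                buf[len++] = c;
            continue;
        }
        while (len > 0 && isspace((unsigned char)buf[len - 1]))
            len--;
        if (len > 0 && count < max) {
            buf[len] = '\0';
            strcpy(out[count].text, buf);
            out[count].res = classify(buf);
            count++;
        }
        len = 0;
        if (c == '\0' || c == ';')
            break;
    }
    return count;
}
```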

Squish-DSP Tool 18/28 Preparation for Microsoft Project
- .asm code broken up into sub-tasks, with intra- and inter-instruction dependencies recognized
- Reformatted as a Microsoft Project text file
- Rescheduled within Microsoft Project, either automatically or through the GUI
- Reformatted as .asm code with increased parallelism
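
The "reformatted as a Microsoft Project text file" step amounts to writing one row per atomic task. The slides do not give the file layout, so the tab-separated columns below (ID, name, duration, predecessors, resource) and the `format_row` helper are hypothetical, shown only to illustrate the kind of translation involved.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical row layout, not the format SQUISH-DSP actually produced. */
struct task_row {
    int id;
    const char *name;      /* the atomic task text */
    int duration;          /* in instruction cycles */
    const char *preds;     /* predecessor IDs, e.g. "1,10", or "" */
    const char *resource;  /* e.g. "MULTIPLIER", "PMBUS" */
};

/* Writes one tab-separated row; returns its length, or -1 if buf is too small. */
int format_row(char *buf, size_t size, const struct task_row *t)
{
    assert(buf != NULL && t != NULL);
    int n = snprintf(buf, size, "%d\t%s\t%d\t%s\t%s\n",
                     t->id, t->name, t->duration, t->preds, t->resource);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```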

Squish-DSP Tool 19/28 Example GUI screen capture (annotations: instructions broken into atomic tasks; atomic tasks showing resources and dependencies; atomic tasks with resource conflicts)

Squish-DSP Tool 20/28 Task scheduling after ‘LEVELING’
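
Resource leveling in MS Project uses its own algorithm; as a stand-in, the sketch below uses generic greedy list scheduling. Tasks are taken in program order, and each goes into the earliest cycle that satisfies its dependencies and leaves its resource free. The dependency gap of 1 (strictly later) or 0 (same cycle allowed, for write-after-read) follows the parser slide’s distinctions; the data layout and the `level` name are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

enum resource { RES_MULTIPLIER, RES_ALU, RES_DMBUS, RES_PMBUS, RES_COUNT };

#define MAX_TASKS 16
#define MAX_PREDS 4
#define MAX_CYCLES 32

/* min_gap: 1 = must be in a later cycle, 0 = same cycle allowed. */
struct dep { int task; int min_gap; };

struct task {
    enum resource res;
    int ndeps;
    struct dep deps[MAX_PREDS];
};

/* Greedy list scheduling; fills cycle[] and returns the number of
   cycles used, or -1 if the schedule does not fit in MAX_CYCLES. */
int level(const struct task *t, int n, int *cycle)
{
    bool busy[MAX_CYCLES][RES_COUNT] = {{false}};
    int used = 0;

    assert(n <= MAX_TASKS);
    for (int i = 0; i < n; i++) {
        int c = 0;
        for (int k = 0; k < t[i].ndeps; k++) {
            const struct dep *d = &t[i].deps[k];
            assert(d->task < i);  /* predecessors come first */
            if (cycle[d->task] + d->min_gap > c)
                c = cycle[d->task] + d->min_gap;
        }
        while (c < MAX_CYCLES && busy[c][t[i].res])
            c++;                  /* resource taken: slide later */
        if (c >= MAX_CYCLES)
            return -1;
        busy[c][t[i].res] = true;
        cycle[i] = c;
        if (c + 1 > used)
            used = c + 1;
    }
    return used;
}
```

For two independent load, multiply, add, store chains (8 tasks, 8 cycles if run one after another), this sketch packs them into 5 cycles, with the two loads and two stores forced apart because they share the DM bus.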

Squish-DSP Tool 21/28 Initial ‘C’ code

Squish-DSP Tool 22/28 Code from VisualDSP: VisualDSP unrolled the loop 3 times

Squish-DSP Tool 23/28 Code from SQUISH-DSP: 12 VisualDSP cycles squished to 8

Squish-DSP Tool 24/28 Final version of code (loop change)

Squish-DSP Tool 25/28 Final SQUISH: 12 VisualDSP cycles squished to 6

Squish-DSP Tool 26/28 Advantages and Limitations
- Current version intended to handle the inner critical loop of an algorithm
- Not handling ‘cache’ conflicts
- Not optimized for instructions in the delay slots of jumps and conditional jumps
- Not optimized for multiple DAG delays, e.g. I4 = …. ; DM(I4, M2) = ; I5 =…
- Moving to ‘task profile management’ macros with the Primavera PV3 tool
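
A DAG delay of the kind listed here can at least be flagged automatically. The sketch assumes, as the slide’s example suggests, that writing an index register and using it for addressing in the very next instruction costs extra cycle(s); the bit-mask summary per instruction and the `count_dag_hazards` name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-instruction summary: bit k set means Ik is involved. */
struct dag_use {
    unsigned writes_i;  /* I registers loaded, e.g. "I4 = ..." */
    unsigned addr_i;    /* I registers used for addressing, e.g. "DM(I4, M2)" */
};

/* Counts adjacent pairs where an I register written in one instruction
   is used for addressing in the next, i.e. the expected DAG delays. */
size_t count_dag_hazards(const struct dag_use *code, size_t n)
{
    size_t hazards = 0;
    assert(code != NULL || n == 0);
    for (size_t i = 0; i + 1 < n; i++) {
        if (code[i].writes_i & code[i + 1].addr_i)
            hazards++;
    }
    return hazards;
}
```

Under this assumption, the slide’s order `I4 = …; DM(I4, M2) = …; I5 = …` has one such pair, while moving `I5 = …` between the other two removes it, which is the kind of reordering a scheduler could exploit.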

Squish-DSP Tool 27/28 Conclusion
- SquishDSP is a prototype scheduling tool that identifies microprocessor resource operations and reschedules them in parallel
- Already useful in its current form for ‘inner DSP loops’
- Microsoft Project used for concept work, but the Primavera PV3 tool offers more long-term promise

Squish-DSP Tool 28/28 Acknowledgements
- Financial support of the Natural Sciences and Engineering Research Council (NSERC) of Canada and the University of Calgary
- Financial support from Analog Devices. Dr. Mike Smith is ADI University Professor 2001/2002
- Future financial support from the Alberta Provincial Government through the Alberta Software Engineering Research Consortium (ASERC)