Moving Arrays -- 1
Completion of ideas needed for a general and complete program
Final concepts needed for Final
Review for Final: Loop efficiency



11/12/2015 DMA, Copyright M. Smith, ECE, University of Calgary, Canada 2 / 29
Tackled today
- Declaring and initializing arrays off the stack (review, plus a little that is new)
  - Useful for background DMA tasks
  - Useful for minimizing the total memory used in a non-general program
- Declaring arrays and variables on the stack (review, plus a little that is new)
  - Needed for re-entrant and thread-safe code
- Demonstrating memory-to-memory DMA

Slide 3 / 29: Declaring fixed arrays in memory, not on the stack

    short foo_startarray[40];
    short far_finalarray[40];

    void HalfWaveRectifyASM( ) {
        // Take the signal from foo_startarray[ ], rectify it,
        // and place the result in far_finalarray[ ]
        // Half wave rectify: if > 0 keep the same; if < 0 make zero
        // Full wave rectify: if > 0 keep the same; if < 0 then abs value
        for (int count = 0; count < 40; count++) {
            if (foo_startarray[count] < 0) far_finalarray[count] = 0;
            else far_finalarray[count] = foo_startarray[count];
        }
    }

The program code is the same, but the data part is not.
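The comments in the slide code mention full-wave rectification as well as half-wave. A minimal C sketch of both variants follows, assuming the arrays are passed in as parameters rather than being fixed globals. The function names half_wave_rectify and full_wave_rectify are illustrative and do not come from the lecture.

```c
#include <assert.h>
#include <stddef.h>

/* Half-wave rectify: negative samples become zero, others are copied. */
void half_wave_rectify(const short *in, short *out, size_t n) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = (in[i] < 0) ? 0 : in[i];
    }
}

/* Full-wave rectify: negative samples are replaced by their absolute value.
   Assumes no sample equals SHRT_MIN, whose negation does not fit in a short. */
void full_wave_rectify(const short *in, short *out, size_t n) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = (in[i] < 0) ? (short)(-in[i]) : in[i];
    }
}
```

Passing pointers keeps the same routine usable for arrays declared in memory and for arrays declared on the stack.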

Slide 4 / 29: First attempt to get the correct answer
- .section data1: tells the linker to place this data in the memory-map location data1.
- .align 4: adjusts the address so that it ends in 0, 4, 8 or C. We know the processor works best when things start on a boundary between groups of 4 bytes.
- [N * 2]: we need N short ints, and we know the processor works with addresses in bytes, so asking for N * 2 bytes sounds sensible.

Slide 5 / 29: The “wrong approach”
It does not match what C / C++ does with memory. 20 bytes (16 bits per value): for N short values in C++ the size is N * 2 bytes.

Slide 6 / 29: “Correct approach was NOT what I expected”
- ASM array with space for N long ints: .var arrayASM[N]; better: .byte4 arrayASM[N];
- ASM array with space for N short ints: .var arrayASM[N / 2]; better: .byte2 arrayASM[N];
- ASM array with space for N chars: .var arrayASM[N / 4]; better: .byte arrayASM[N];
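The sizing rule on this slide is consistent with .var reserving one 32-bit word per element. That is an assumption read off the slide, not a quotation from the assembler manual. A small C sketch of the arithmetic follows, using the hypothetical helper name words32_needed. It rounds up, because N / 2 and N / 4 are exact only when N is a multiple of 2 or 4.

```c
#include <assert.h>
#include <stddef.h>

/* Number of 32-bit words needed to hold n elements of elem_bytes each,
   rounded up so that an odd element count still fits. */
size_t words32_needed(size_t n, size_t elem_bytes) {
    assert(elem_bytes == 1 || elem_bytes == 2 || elem_bytes == 4);
    return (n * elem_bytes + 3) / 4;
}
```

For the 40-element short arrays used earlier this gives 20 words, matching the N / 2 rule.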

Slide 7 / 29: The better answer is “Look at the assembler manual”

Slide 8 / 29: Improving what we did before
Big warning: external array initialization occurs on “reload” of your program code and NOT on “restart” of your program code (WHY?). Understanding why this is true, and why it is a problem, will solve many issues when programming.


Slide 10 / 29: When DMA might be useful -- Video manipulation
Program
- Wait for picture 1 to come in (video in)
- Process picture 1 (lots of mathematics perhaps)
- Wait for picture 1 to be transmitted (video out)
The program spends a lot of time waiting rather than doing.

Slide 11 / 29: When DMA might be useful -- Double buffering
Program
1. Wait for picture 2 memory to fill (video in)
2. Picture 3 comes into memory (background DMA task from input)
   Process picture 2 and place the result into the picture 0 location
3. Picture 4 comes into memory (background DMA task from input)
   Process picture 3 and place the result into the picture 1 location
   Transmit picture 0 (background DMA task to output)
4. Picture 0 comes into memory (background DMA task from input)
   Process picture 4 and place the result into the picture 2 location
   Transmit picture 1 (background DMA task to output)
5. Picture 1 comes into memory (background DMA task from input)
   Process picture 0 and place the result into the picture 3 location
   Transmit picture 2 (background DMA task to output)
6. Picture 2 comes into memory (background DMA task from input)
   Process picture 1 and place the result into the picture 4 location
   Transmit picture 3 (background DMA task to output)
7. REPEAT STEPS FOR EVER
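The schedule on this slide rotates four roles around five picture locations. One way to sketch that rotation in C is shown below, assuming the step numbering used on the slide. The type BufferRoles and the function roles_for_step are hypothetical names. The transmit role is only meaningful from step 3 onward, since nothing has been processed before then.

```c
#include <assert.h>

#define NUM_BUFFERS 5  /* picture locations 0..4, as on the slide */

/* Buffer roles for one step of the schedule. */
typedef struct {
    int input;     /* location being filled by the input DMA task */
    int process;   /* location the processor reads from */
    int result;    /* location the processor writes its result into */
    int transmit;  /* location being sent by the output DMA task */
} BufferRoles;

/* Roles for slide step number `step` (2, 3, 4, ...). */
BufferRoles roles_for_step(int step) {
    assert(step >= 2);
    BufferRoles r;
    r.input    = (step + 1) % NUM_BUFFERS;
    r.process  = step % NUM_BUFFERS;
    r.result   = (step + 3) % NUM_BUFFERS;
    r.transmit = (step + 2) % NUM_BUFFERS;
    return r;
}
```

At every step the four roles land on four different locations, so the processor and the two background DMA tasks never touch the same picture at the same time.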

Slide 12 / 29: We are only going to look at a simple DMA task
Normal code when trying to move data from one location to another (there are a number of simple examples in Lab. 3 using the SPI interface):

    1) P0 <- address of start_array[0];
    2) P1 <- address of final_array[0];
    3) R0 <- number of data items to be transferred
    4) R1 <- how many values have already been transferred
    5) R1 = 0;
    LOOP:
    6) CC = R0 <= R1;
    7) IF CC JUMP DONE;
    8) R2 = [P0++];
    9) [P1++] = R2;
       R1 += 1;        // count the transfer (not shown on the slide; without it the loop never ends)
    10) JUMP LOOP;
    DONE:
       Do something else

Problems with this approach:
- VERY BIG PIPELINE LATENCY ISSUES
- MANY INTERNAL PROCESSOR STALLS ON THE DATA BUS WHILE WAITING FOR R2 TO BE READ, STORED and then TRANSMITTED
- THE INSTRUCTION BUS STALLS EVERY TIME THE CODE JUMPS -- LOSE 4 CYCLES
- The processor must wait until the loop is done before it can do something else
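For comparison, the same processor-driven copy can be sketched in C. The comments map each variable to the register it stands in for on the slide. The function name software_copy is illustrative, and the sketch includes the counter increment that any working version of the loop needs.

```c
#include <assert.h>
#include <stddef.h>

/* Processor-driven copy: every element costs a read, a write,
   a counter update and a jump back to the loop test. */
void software_copy(const int *start_array, int *final_array, size_t count) {
    assert(start_array != NULL && final_array != NULL);
    const int *p0 = start_array;   /* plays the role of P0 */
    int *p1 = final_array;         /* plays the role of P1 */
    size_t transferred = 0;        /* plays the role of R1 */
    while (transferred < count) {  /* leave the loop once R0 <= R1 */
        int r2 = *p0++;            /* R2 = [P0++] */
        *p1++ = r2;                /* [P1++] = R2 */
        transferred++;
    }
}
```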

Slide 13 / 29: We are only going to look at a simple DMA task
DMA: special hardware that works without the processor

    1) DMA_source_address_register <- address of start_array[0];
    2) DMA_destination_address_register <- address of final_array[0];
    3) DMA_max_count_register <- max value needed to transfer
    4) DMA_count_register <- how many values have already been transferred
    5) DMA_enable = true
       Do something else

These five steps replace the software loop (R1 = 0; LOOP: CC = R0 <= R1; IF CC JUMP DONE; R2 = [P0++]; [P1++] = R2; JUMP LOOP; DONE:).
- The DMA transfer happens in the background
- Minimized pipeline issues
- The processor can do something else immediately while the DMA hardware handles all the memory transfers WITHOUT PROCESSOR HELP.
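The five DMA steps can be mimicked with a small software model. This is only a sketch of the idea. The struct fields mirror the generic register names on the slide and are not the real Blackfin memory-mapped DMA registers, and DmaModel, dma_start and dma_tick are hypothetical names. On real hardware the work done in dma_tick happens in the background without any processor instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Software model of the slide's generic DMA registers (not real Blackfin MMRs). */
typedef struct {
    const int *source_address;
    int *destination_address;
    size_t max_count;
    size_t count;
    bool enable;
} DmaModel;

/* Steps 1 to 5 from the slide: program the registers, then enable. */
void dma_start(DmaModel *dma, const int *src, int *dst, size_t n) {
    assert(dma != NULL && src != NULL && dst != NULL);
    dma->source_address = src;
    dma->destination_address = dst;
    dma->max_count = n;
    dma->count = 0;
    dma->enable = true;
}

/* Models one background transfer.
   Returns true while the transfer is still in progress. */
bool dma_tick(DmaModel *dma) {
    assert(dma != NULL);
    if (dma->enable && dma->count < dma->max_count) {
        dma->destination_address[dma->count] = dma->source_address[dma->count];
        dma->count++;
    }
    if (dma->count >= dma->max_count) {
        dma->enable = false;
    }
    return dma->enable;
}
```

In the model, the caller has to invoke dma_tick itself. The point of real DMA is that the processor returns from the equivalent of dma_start and carries on with other work.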

Slide 14 / 29: Write some tests so we know how to proceed -- Test 1
Is DMA useful when the arrays being moved are in the processor’s internal memory and placed on the stack, as with the code shown on the slide?

Slide 15 / 29: Write some tests so we know how to proceed -- Test 2
Is DMA useful when both arrays are placed in external memory?
- SDRAM is needed for large video images
- SDRAM -- MANY MEGS AVAILABLE
- SDRAM addresses are hard-coded in this example

Slide 16 / 29: Write some tests so we know how to proceed -- Test 3
Most probable way to use DMA: store video arrays in SLOW external memory, move them to FAST internal memory for processing, then put the result back into external memory.
- SDRAM addresses are hard-coded in this example
- WAIL -- Can use compiler section (“SDRAM”) syntax

Slide 17 / 29: Some results (code details later)
Columns: Compiler Debug Mode, Compiler Release Mode
Row labels, in slide order:
- L1 -> L1, internal memory
- L1 -> L1, DMA (DMA slower)
- SDRAM -> SDRAM, external
- SDRAM -> SDRAM, DMA
- SDRAM -> L1, DMA
- SDRAM -> L1, DMA
- L1 -> SDRAM, DMA

Slide 18 / 29: Memory-to-memory move, debug code

Slide 19 / 29: Review for final
A) What happened here?
B) What happened here?
C) What happened here?
D) Why did this happen?
E) What happened here?
F) Determine loop efficiency, in terms of instructions and in terms of cycles / read_write op

Slide 20 / 29: Answer questions A, B, C, D, E

Slide 21 / 29: Review for final, internal memory to internal memory
F) Determine loop efficiency in terms of cycles / read_write op
- Internal memory -> internal memory; array size was 300
- Useful reads: 300; useful writes: 300
- Cycles: 8748 as measured
- 8748 / 600 = 14.58 cycles per useful read or write. Why not an exact number?
- Instructions in loop? 19
- Total number of reads / writes: 9 per loop, so 9 * 300 = 2700 reads / writes, or around 3 cycles each (8748 / 2700 = 3.24)
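The efficiency figures on this slide are simple ratios of measured cycles to memory operations. A minimal C sketch of that calculation follows, using the slide's measured values in the usage note. The helper name cycles_per_op is illustrative.

```c
#include <assert.h>

/* Average cycles per memory operation for a measured loop. */
double cycles_per_op(unsigned long cycles, unsigned long ops) {
    assert(ops > 0);
    return (double)cycles / (double)ops;
}
```

With the numbers above, cycles_per_op(8748, 600) gives 14.58 cycles per useful read or write, and cycles_per_op(8748, 2700) gives 3.24 cycles per read or write of any kind.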

Slide 22 / 29: Review for final, SDRAM to SDRAM
F) Determine loop efficiency in terms of cycles / read_write op
- External SDRAM -> external SDRAM
- Useful reads / writes: 300 each
- Cycles as measured, divided by 600 (the measured value is missing from this transcript). Why not an exact number?
- Instructions in loop? 19
- Total number of reads / writes: 9 per loop, of which 7 * 300 = 2100 are internal and 2 * 300 = 600 are external
- Time per external read / write = (measured cycles - 2100 * internal cost) / 600 = 5.5 cycles
- Factor of 2 slower

Slide 23 / 29: Memory-to-memory move, release mode

Slide 24 / 29: Review for final
A) What happened here?
B) What happened here?
C) What happened here?
D) Why did this happen inside the loop?
E) What happened here?
F) Determine loop efficiency, in terms of instructions and in terms of cycles / read_write op

Slide 25 / 29: Answer questions A, B, C, D, E

Slide 26 / 29: Release mode, internal to internal
F) Determine loop efficiency in terms of cycles / read_write op
- Internal memory -> internal memory; array size was 300
- Useful reads: 300; useful writes: 300
- Cycles: 625 as measured
- 625 / 600 = 1.04 cycles per useful read or write. Why not an exact number?
- Instructions in loop? * 4 = 1200
- WE WOULD EXPECT 1200 cycles!!!! Where did the difference go?

Slide 27 / 29: Release mode, external to external
F) Determine loop efficiency in terms of cycles / read_write op
- SDRAM -> SDRAM; array size was 300
- Useful reads: 300; useful writes: 300
- Cycles as measured, divided by 600 = 47 (the measured value is missing from this transcript)
- SDRAM access: 47 cycles; L1 memory: 1 cycle
- It would make sense to process in L1 memory, so move the data from SDRAM to L1 to process it

Slide 28 / 29: External to internal
F) Determine loop efficiency in terms of cycles / read_write op
- SDRAM -> internal memory; array size was 300
- Useful reads: 300; useful writes: 300
- Cycles: 4836 as measured; 300 of those are L1 writes, leaving 4536
- 4536 / 300 = 15 cycles per SDRAM read (approximately)
- SDRAM read before: 47 cycles; SDRAM read now: 15 cycles; L1 -> L1: 1 cycle
- It would make sense to process in L1 memory, so move the data from SDRAM to L1 to process it
- Loads of overhead in SDRAM to SDRAM
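The estimate on this slide subtracts the cost of the internal writes from the measured total and divides what is left by the number of external reads. A short C sketch of that step follows, assuming one cycle per L1 write as the slide does. The function name external_read_cost is hypothetical.

```c
#include <assert.h>

/* Cycles per external read, after subtracting the cost of the internal writes. */
double external_read_cost(unsigned long total_cycles,
                          unsigned long internal_writes,
                          unsigned long cycles_per_internal_write,
                          unsigned long external_reads) {
    assert(external_reads > 0);
    assert(total_cycles >= internal_writes * cycles_per_internal_write);
    unsigned long remaining =
        total_cycles - internal_writes * cycles_per_internal_write;
    return (double)remaining / (double)external_reads;
}
```

For the measured 4836 cycles, external_read_cost(4836, 300, 1, 300) gives 15.12, which the slide rounds to 15 cycles per SDRAM read.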

Slide 29 / 29: Tackled today
- Review of handling external arrays (global arrays) from assembly code
  - Arrays declared in another file
  - Arrays declared in this file -- NEW
  - Needed for arrays used by ISRs
- Arrays declared on the stack
  - Pointers passed as parameters to a subroutine
  - Can’t use arrays on the stack when they are used by an ISR

Information taken from Analog Devices On-line Manuals with permission. Information furnished by Analog Devices is believed to be accurate and reliable. However, Analog Devices assumes no responsibility for its use or for any infringement of any patent or other rights of any third party which may result from its use. No license is granted by implication or otherwise under any patent or patent right of Analog Devices. Copyright © Analog Devices, Inc. All rights reserved.