M. Smith Electrical and Computer Engineering University of Calgary,

Presentation transcript:

My attempt to multi-thread an audio talk-through program using batches of data
M. Smith, Electrical and Computer Engineering, University of Calgary, Smithmr @ ucalgary.ca

Laboratory 5 – Done in “C and C++”
Stage 1 – 30%
  Develop and investigate a multi-tasking system where the threads are free-running. Thread tasks are “Sleep(time_task)”
  Develop and investigate a multi-tasking system where the threads communicate through semaphores to control the order of operation
Stage 2 – 55%
  Demonstrate and investigate turning an “audio talk-through program” into a multi-threaded system – one point processed per interrupt
Stage 3 – 15%
  Demonstrate a batch processing system as a multi-threaded system
Options
  Use SHARC ADSP-21061 boards (40 MHz) – existing audio libraries – have not attempted
  Use Blackfin ADSP-BF533 boards (600 MHz) – existing audio libraries – have been successful at home, but not here
  Use Blackfin ADSP-BF533 boards (600 MHz) – using a very simple, no-frills audio talk-through library – surprisingly simple with 1 to 32 points being processed. Fails with 33 points. Code logic issue, not a timing issue, as I can waste 25000 cycles per block at 32 points
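As a rough illustration of the Stage 1 idea of using semaphores to control the order of operation, here is a minimal single-threaded model in plain C. It is not the VDK API: `ModelSem`, `model_try_pend`, `model_post` and `run_one_batch` are hypothetical names invented for this sketch. The three stages are polled in the "wrong" order on purpose, so any ordering that appears comes from the semaphores alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an RTOS counting semaphore (not the VDK API). */
typedef struct { int count; } ModelSem;

/* Non-blocking pend: take one token if available and report success. */
int model_try_pend(ModelSem *s) {
    assert(s != NULL);
    if (s->count == 0) return 0;
    s->count--;
    return 1;
}

/* Post: make one more token available. */
void model_post(ModelSem *s) {
    assert(s != NULL);
    s->count++;
}

/*
 * Pushes one batch through Read -> Process -> Write.
 * Each pass polls Write first, then Process, then Read.
 * The order actually taken is written to log (room for 3 letters + '\0').
 * Returns the number of stages that ran.
 */
int run_one_batch(char log[4]) {
    ModelSem start     = {1};  /* Read is allowed to run once          */
    ModelSem data_read = {0};  /* posted by Read, pended by Process    */
    ModelSem data_done = {0};  /* posted by Process, pended by Write   */
    int ran = 0;

    for (int pass = 0; pass < 3; pass++) {
        if (model_try_pend(&data_done)) { log[ran++] = 'W'; }
        if (model_try_pend(&data_read)) { log[ran++] = 'P'; model_post(&data_done); }
        if (model_try_pend(&start))     { log[ran++] = 'R'; model_post(&data_read); }
    }
    log[ran] = '\0';
    return ran;
}
```

Calling `run_one_batch(log)` leaves the stage letters in `log` in the order they executed. In a real multi-threaded build each stage would be its own thread blocking on its semaphore rather than being polled.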

Implementing a multi-thread system -- Laboratory 5 – Part B concepts
Collect 1 pts @ 44 kHz -> array1
Collect 1 pts @ 44 kHz -> array2
Collect 1 pts @ 44 kHz -> array3
Move array1 -> array4, SimulateComplex
Move array2 -> array5, SimulateComplex
Move array3 -> array6, SimulateComplex
Transmit N pts @ 44 kHz <- array4
Transmit N pts @ 44 kHz <- array5
Transmit N pts @ 44 kHz <- array6

Final ReadThread – Single Point

Final ProcessThread – Single Point

Final WriteThread – Single Point

Read Thread – ISR driven

Thread Status History – ISR driven


Implementing a multi-thread system -- Laboratory 5 concepts
Collect N pts @ 44 kHz -> array1
Collect N pts @ 44 kHz -> array2
Collect N pts @ 44 kHz -> array3
Move array1 -> array4, SimulateComplex
Move array2 -> array5, SimulateComplex
Move array3 -> array6, SimulateComplex
Transmit N pts @ 44 kHz <- array4
Transmit N pts @ 44 kHz <- array5
Transmit N pts @ 44 kHz <- array6
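The collect / move / transmit scheme can be sketched in plain C as three input buffers and three output buffers used in rotation. This is only a sequential model under stated assumptions: `N_POINTS`, `Pipeline`, `collect`, `process`, `transmit` and `run_batches` are hypothetical names, and `simulate_complex` is a placeholder (a gain of 2) standing in for whatever SimulateComplex really does.

```c
#include <assert.h>
#include <stddef.h>

#define N_POINTS  4   /* batch size (hypothetical; the slides try 4 and 32) */
#define N_BUFFERS 3   /* array1..array3 for input, array4..array6 for output */

typedef struct {
    int in[N_BUFFERS][N_POINTS];   /* array1, array2, array3 */
    int out[N_BUFFERS][N_POINTS];  /* array4, array5, array6 */
} Pipeline;

/* Placeholder for SimulateComplex(): a simple gain of 2 (assumption). */
int simulate_complex(int sample) { return 2 * sample; }

/* Collect: copy one batch of incoming samples into input buffer `slot`. */
void collect(Pipeline *p, size_t slot, const int *samples) {
    assert(p != NULL && slot < N_BUFFERS);
    for (size_t i = 0; i < N_POINTS; i++) p->in[slot][i] = samples[i];
}

/* Move: input buffer -> matching output buffer, processing on the way. */
void process(Pipeline *p, size_t slot) {
    assert(p != NULL && slot < N_BUFFERS);
    for (size_t i = 0; i < N_POINTS; i++)
        p->out[slot][i] = simulate_complex(p->in[slot][i]);
}

/* Transmit: copy one batch out of output buffer `slot`. */
void transmit(const Pipeline *p, size_t slot, int *dest) {
    assert(p != NULL && slot < N_BUFFERS);
    for (size_t i = 0; i < N_POINTS; i++) dest[i] = p->out[slot][i];
}

/* Runs n_batches through the pipeline, rotating through the buffers. */
void run_batches(const int *input, int *output, size_t n_batches) {
    Pipeline p;
    for (size_t b = 0; b < n_batches; b++) {
        size_t slot = b % N_BUFFERS;          /* 0, 1, 2, 0, 1, ... */
        collect(&p, slot, input + b * N_POINTS);
        process(&p, slot);
        transmit(&p, slot, output + b * N_POINTS);
    }
}
```

In the threaded version the three calls inside the loop would belong to ReadThread, ProcessThread and WriteThread, each working on a different slot at the same time; the rotation is what keeps them off each other's buffers.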

Read – Handling 4 points No-audio intention – just see if it will work test

Process – Handling 4 points No-audio intention – just see if it will work test

Write – Handling 4 points No-audio intention – just see if it will work test

Net Result
Not working as expected – equal priority (5) on each task
We are obviously missing samples

Changing Priorities
Priorities
  ReadThread 3 -- obviously the most critical
  ProcessThread 5
  WriteThread 5

Different Priorities
Priorities
  ReadThread 3 -- obviously the most critical
  ProcessThread 5
  WriteThread 4
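To see what the two priority settings change, here is a small sketch of a priority-based pick among ready threads. It assumes, as the slides imply by calling priority 3 "the most critical", that a smaller number means a more urgent thread; `ModelThread` and `pick_next` are hypothetical names, and a real kernel's tie-breaking rule may differ from the "first in the table" rule used here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    int priority;   /* smaller number = more urgent (assumption) */
    int ready;      /* non-zero if the thread is able to run     */
} ModelThread;

/*
 * Returns the index of the ready thread with the smallest priority
 * number, or -1 if nothing is ready. Ties go to the earliest entry.
 */
int pick_next(const ModelThread *threads, size_t n) {
    assert(threads != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!threads[i].ready) continue;
        if (best < 0 || threads[i].priority < threads[best].priority)
            best = (int)i;
    }
    return best;
}
```

With ReadThread 3, ProcessThread 5 and WriteThread 4, the model runs WriteThread ahead of ProcessThread whenever ReadThread is not ready. With both at 5 the choice between them falls to the tie-break.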

No idle time available – Optimize the code
Priorities: ReadThread 3, ProcessThread 5, WriteThread 5
Priorities: ReadThread 3, ProcessThread 5, WriteThread 4

Implementing a multi-thread system -- Laboratory 5
Collect N pts @ 44 kHz -> array1
Collect N pts @ 44 kHz -> array2
Collect N pts @ 44 kHz -> array3
Move array1 -> array4, SimulateComplex
Move array2 -> array5, SimulateComplex
Move array3 -> array6, SimulateComplex
Transmit N pts @ 44 kHz <- array4
Transmit N pts @ 44 kHz <- array5
Transmit N pts @ 44 kHz <- array6

Problem – NOT coding what we intended
Collect N pts @ 44 kHz -> array1
Collect N pts @ 44 kHz -> array2
Collect N pts @ 44 kHz -> array3
Move array1 -> array4, SimulateComplex
Move array2 -> array5, SimulateComplex
Move array3 -> array6, SimulateComplex
Transmit N pts @ 44 kHz <- array4
Transmit N pts @ 44 kHz <- array5
Transmit N pts @ 44 kHz <- array6

Proper Code

Net Result
TOTAL SYSTEM HANG
BLOCKED SEMAPHORES

Tried a number of things
  Worked out which semaphore was blocking
  Different priorities
  Different TIC times
Better – but obviously missing cycles – particularly write
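One way to reason about "which semaphore was blocking" is to take a snapshot of the semaphore counts and of which semaphore each thread is pending on. The sketch below does that in plain C. It is a simplified model, not a kernel facility: `find_blocked` is a hypothetical helper, and it assumes each thread pends on its own semaphore, so two threads competing for one token are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * sem_counts[s]  : current count of semaphore s
 * waits_on[t]    : index of the semaphore that thread t is pending on
 * blocked_out[t] : set to 1 if thread t cannot proceed, else 0
 *
 * Returns the number of blocked threads. A result equal to n_threads
 * corresponds to the "total system hang" case: nobody can run, so
 * nobody can post.
 */
size_t find_blocked(const int *sem_counts, size_t n_sems,
                    const int *waits_on, size_t n_threads,
                    int *blocked_out) {
    size_t blocked = 0;
    for (size_t t = 0; t < n_threads; t++) {
        assert(waits_on[t] >= 0 && (size_t)waits_on[t] < n_sems);
        blocked_out[t] = (sem_counts[waits_on[t]] == 0);
        blocked += (size_t)blocked_out[t];
    }
    return blocked;
}
```

If every count is zero and every thread is pending, the function reports all threads blocked. Giving a single semaphore an initial count of 1 is enough to let one thread start in this model.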

Decided to tie WriteThread to interrupt as well as ReadThread

Final Test Result
Seems to be behaving as expected
However – when the MAXIMUM COUNT for the READ / WRITE ISR semaphores is changed, the status history changes
This could indicate that we are missing some interrupts
Could mean nothing – interrupts are asynchronous to timer TICs
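The link between a semaphore's maximum count and missed interrupts can be illustrated with a small model. It assumes that a post arriving when the count is already at its maximum is simply discarded; whether the real kernel behaves this way should be checked against its documentation. `BoundedSem`, `bounded_post`, `bounded_try_pend` and `count_dropped` are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned count;      /* tokens currently available                */
    unsigned max_count;  /* ceiling on count                          */
    unsigned dropped;    /* posts discarded because count was at max  */
} BoundedSem;

/* Post from the "ISR": saturates at max_count and records the loss. */
void bounded_post(BoundedSem *s) {
    assert(s != NULL);
    if (s->count < s->max_count) s->count++;
    else s->dropped++;
}

/* Non-blocking pend from the "thread". */
int bounded_try_pend(BoundedSem *s) {
    assert(s != NULL);
    if (s->count == 0) return 0;
    s->count--;
    return 1;
}

/*
 * Models a thread that services only one post per burst while the ISR
 * posts posts_per_burst times. Returns how many posts were lost.
 */
unsigned count_dropped(unsigned max_count, unsigned bursts,
                       unsigned posts_per_burst) {
    BoundedSem s = {0, max_count, 0};
    for (unsigned b = 0; b < bursts; b++) {
        for (unsigned p = 0; p < posts_per_burst; p++) bounded_post(&s);
        (void)bounded_try_pend(&s);
    }
    return s.dropped;
}
```

With a maximum count of 1 and two posts per burst, one post is lost in every burst. With a large maximum the backlog is kept instead, which would show up as a different status history without any interrupt actually being lost.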

Could handle 800 waste “times” every 32 samples – plenty of time
50000 cycles +
Inner loop = 2 * BUFFERLENGTH
Outer loop = Wastetime * (3 + INNER)
Total = 13 + Inner loop
BUFFER = 32, waste time = 800
Cycles around 800 * 64 = 50000+
68K Blackfin SHARC D0 (8) R0 (16) R0 A0 (6) P0 (with a bit of MIPS) (6) I0 (4) I0 (16)
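The waste-loop arithmetic can be checked against the time available per batch. The sketch below uses the loop-cost formulas from the slide and compares them with a cycle budget derived from the 600 MHz clock and 44 kHz sample rate quoted earlier. The function names are hypothetical, and the budget ignores kernel and context-switch overhead, so it is an upper bound rather than a measurement.

```c
#include <assert.h>

/* Inner loop cost from the slide: 2 * BUFFERLENGTH cycles. */
long long inner_cycles(long long buffer_length) {
    return 2 * buffer_length;
}

/* Outer loop cost from the slide: Wastetime * (3 + INNER) cycles. */
long long outer_cycles(long long waste_time, long long buffer_length) {
    return waste_time * (3 + inner_cycles(buffer_length));
}

/*
 * Cycles that elapse while one batch of samples arrives:
 * clock_hz * batch / sample_rate_hz (integer division, rounds down).
 */
long long budget_cycles(long long clock_hz, long long sample_rate_hz,
                        long long batch) {
    assert(sample_rate_hz > 0);
    return clock_hz * batch / sample_rate_hz;
}
```

For BUFFER = 32 and waste time = 800 the formulas give 800 * 67 = 53600 cycles, against a budget of about 436000 cycles per 32-sample batch at 600 MHz and 44 kHz. That is roughly 12% of the available time, consistent with the "plenty of time" remark.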

Real life test -- small buffers
Absolutely nothing
However
  4 audio connections in
  6 audio connections out
  Got the correct connections
Set buffer = 1 – worked first time
Set buffer = 32 – worked first time

Larger buffers
BUFFERSIZE – 64 – out of bsz memory error
Fix .LDF file – manually (GUI window works how?)

MEMORY
{
    mem_VDK_strt        { TYPE(RAM) START(0xFFA00000) END(0xFFA00003) WIDTH(8) }
    mem_l1_code         { TYPE(RAM) START(0xFFA00004) END(0xFFA0FFFF) WIDTH(8) }
    mem_l1_code_cache   { TYPE(RAM) START(0xFFA10000) END(0xFFA13FFF) WIDTH(8) }
    mem_EVT_all         { TYPE(RAM) START(0xFF900000) END(0xFF900003) WIDTH(8) }
    mem_EVT_NMI         { TYPE(RAM) START(0xFF900004) END(0xFF900007) WIDTH(8) }
    mem_EVT_EVX         { TYPE(RAM) START(0xFF900008) END(0xFF90000B) WIDTH(8) }
    mem_EVT_IRPTEN      { TYPE(RAM) START(0xFF90000C) END(0xFF90000F) WIDTH(8) }
    mem_EVT_IVHW        { TYPE(RAM) START(0xFF900010) END(0xFF900013) WIDTH(8) }
    mem_EVT_IVTMR       { TYPE(RAM) START(0xFF900014) END(0xFF900017) WIDTH(8) }
    mem_EVT_IVG7        { TYPE(RAM) START(0xFF900018) END(0xFF90001B) WIDTH(8) }
    mem_EVT_IVG8        { TYPE(RAM) START(0xFF90001C) END(0xFF90001F) WIDTH(8) }
    mem_EVT_IVG9        { TYPE(RAM) START(0xFF900020) END(0xFF900023) WIDTH(8) }
    mem_EVT_IVG10       { TYPE(RAM) START(0xFF900024) END(0xFF900027) WIDTH(8) }
    mem_EVT_IVG11       { TYPE(RAM) START(0xFF900028) END(0xFF90002B) WIDTH(8) }
    mem_EVT_IVG12       { TYPE(RAM) START(0xFF90002C) END(0xFF90002F) WIDTH(8) }
    mem_EVT_IVG13       { TYPE(RAM) START(0xFF900030) END(0xFF900033) WIDTH(8) }
    mem_EVT_IVG14       { TYPE(RAM) START(0xFF900034) END(0xFF900037) WIDTH(8) }
    mem_EVT_IVG15       { TYPE(RAM) START(0xFF900038) END(0xFF90003B) WIDTH(8) }
    mem_sysstack        { TYPE(RAM) START(0xFF90003C) END(0xFF90083B) WIDTH(8) }
    mem_l1_data_b       { TYPE(RAM) START(0xFF90083C) END(0xFF903FFF) WIDTH(8) }
    mem_l1_data_b_cache { TYPE(RAM) START(0xFF904000) END(0xFF907FFF) WIDTH(8) }
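When editing the memory map by hand it helps to check how many bytes each region actually holds. The sketch below computes a region's size from its START and END addresses, treating END as the last valid byte address, as the WIDTH(8) entries in the listing suggest. `region_size` and `region_fits` are hypothetical helpers, not part of any linker tool.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of a region whose END address is inclusive. */
uint32_t region_size(uint32_t start, uint32_t end) {
    assert(end >= start);
    return end - start + 1u;
}

/* Non-zero if bytes_needed fits in the region [start, end]. */
int region_fits(uint32_t start, uint32_t end, uint32_t bytes_needed) {
    return bytes_needed <= region_size(start, end);
}
```

Applied to the listing, `mem_sysstack` (0xFF90003C to 0xFF90083B) comes out at 0x800 bytes and `mem_l1_data_b` (0xFF90083C to 0xFF903FFF) at 0x37C4 bytes. Comparing the second figure with the total size of the data placed in that region is a quick sanity check before blaming the linker.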

Memory issues – on 64 point data batches
Still did not work
Did I change the memory map correctly?
  No – seems okay as works with 32 – but perhaps having caching issue
  Went back to old memory map
Went to configure external SDRAM and use that
  Modified only 1 array – left channel
  Left channel fails – right channel works
Spending too much time in context switching
  Group ReadThread and WriteThread code together
  Does not even work with 32 !!!!!!
Am convinced that there is a logical issue associated with the semaphore handling.

Bonus – 20% bonus
If you can get all parts of Lab. 5 running and then solve the issue of why it fails at 64 points (even when not wasting cycles) – 20% bonus on this lab's marks and either a mention or a co-authorship on one of the Circuit Cellar articles
May even be worth some money if I manage to sell the articles