My attempt to multi-thread an audio talk-through program using batches of data. M. Smith, Electrical and Computer Engineering, University of Calgary

Presentation transcript:

Laboratory 5 – Done in “C and C++”
Stage 1 – 30%
- Develop and investigate a multi-tasking system where the threads are free-running. Thread tasks are “Sleep(time_task)”.
- Develop and investigate a multi-tasking system where the threads communicate through semaphores to control the order of operation.
Stage 2 – 55%
- Demonstrate and investigate turning an “audio talk-through program” into a multi-threaded system, with one point processed per interrupt.
Stage 3 – 15%
- Demonstrate a batch processing system as a multi-threaded system.
Options
- Use SHARC ADSP boards (40 MHz) with existing audio libraries: have not attempted.
- Use Blackfin ADSP-BF533 boards (600 MHz) with existing audio libraries: have been successful at home, but not here.
- Use Blackfin ADSP-BF533 boards (600 MHz) with a very simple, no-frills audio talk-through library: surprisingly simple with 1 to 32 points being processed. Fails with 33 points. This is a code logic issue, not a timing issue, as I can waste cycles per block at 32 points.
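The second Stage 1 task, threads that use semaphores to control their order of operation, can be sketched in portable C++. This is only an illustration under stated assumptions: the `Semaphore` class and the `run_ordered` function are hypothetical names built on the standard library, not the VDK calls used in the lab.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Minimal counting semaphore built from standard C++ primitives.
// It is a portable stand-in for illustration, NOT the VDK semaphore API.
class Semaphore {
public:
    explicit Semaphore(int initial) : count_(initial) {}
    void post() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        ready_.notify_one();
    }
    void pend() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    int count_;
};

// Three threads pass a single token around (A -> B -> C -> A ...), so the
// semaphores alone decide who runs next. Returns the observed run order.
std::string run_ordered(int rounds) {
    assert(rounds >= 0);
    Semaphore a_may_run(1), b_may_run(0), c_may_run(0);  // A starts with the token
    std::string order;  // written only by the current token holder

    auto task = [&](char name, Semaphore& mine, Semaphore& next) {
        for (int i = 0; i < rounds; ++i) {
            mine.pend();            // block until it is this thread's turn
            order.push_back(name);
            next.post();            // hand the token to the next thread
        }
    };

    std::thread a([&] { task('A', a_may_run, b_may_run); });
    std::thread b([&] { task('B', b_may_run, c_may_run); });
    std::thread c([&] { task('C', c_may_run, a_may_run); });
    a.join();
    b.join();
    c.join();
    return order;
}
```

Calling `run_ordered(3)` gives the same A, B, C sequence on every run, whereas free-running threads that only sleep give no such guarantee.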

Implementing a multi-thread system -- Laboratory 5 – Part B concepts
- Collect 1 point at 44 kHz → array1
- Collect 1 point at 44 kHz → array2
- Collect 1 point at 44 kHz → array3
- Collect 1 point at 44 kHz → array1
- Collect 1 point at 44 kHz → array2
- Move array1 → array4, SimulateComplex
- Move array2 → array5, SimulateComplex
- Move array3 → array6, SimulateComplex
- Move array1 → array4, SimulateComplex
- Transmit N points at 44 kHz → array4
- Transmit N points at 44 kHz → array5
- Transmit N points at 44 kHz → array6

Final ReadThread – Single Point

Final ProcessThread – Single Point

Final WriteThread – Single Point

Read Thread – ISR driven

Thread Status History – ISR driven

Implementing a multi-thread system -- Laboratory 5 concepts
- Collect N points at 44 kHz → array1
- Collect N points at 44 kHz → array2
- Collect N points at 44 kHz → array3
- Collect N points at 44 kHz → array1
- Collect N points at 44 kHz → array2
- Move array1 → array4, SimulateComplex
- Move array2 → array5, SimulateComplex
- Move array3 → array6, SimulateComplex
- Move array1 → array4, SimulateComplex
- Transmit N points at 44 kHz → array4
- Transmit N points at 44 kHz → array5
- Transmit N points at 44 kHz → array6
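The rotating-buffer scheme on this slide can be sketched as a three-stage pipeline in portable C++. This is a sketch under stated assumptions, not the lab code: `Semaphore`, `run_pipeline`, and the four semaphore names are hypothetical, the "collect" step reads from a vector instead of an audio ISR, and the SimulateComplex step is left out so that the output can be compared with the input.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Portable stand-in for a counting semaphore (NOT the VDK API).
class Semaphore {
public:
    explicit Semaphore(std::size_t initial) : count_(initial) {}
    void post() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        ready_.notify_one();
    }
    void pend() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::size_t count_;
};

// Read -> Process -> Write over two rings of three buffers each.
// input.size() must be a multiple of n (points per batch).
std::vector<int> run_pipeline(const std::vector<int>& input, std::size_t n) {
    assert(n > 0 && input.size() % n == 0);
    const std::size_t kRing = 3;
    const std::size_t batches = input.size() / n;
    std::vector<std::vector<int>> in(kRing, std::vector<int>(n));   // array1..array3
    std::vector<std::vector<int>> out(kRing, std::vector<int>(n));  // array4..array6
    Semaphore in_free(kRing), in_full(0), out_free(kRing), out_full(0);
    std::vector<int> transmitted;
    transmitted.reserve(input.size());

    std::thread reader([&] {
        for (std::size_t b = 0; b < batches; ++b) {
            in_free.pend();                      // wait for an empty input buffer
            for (std::size_t i = 0; i < n; ++i) in[b % kRing][i] = input[b * n + i];
            in_full.post();                      // batch b is ready to process
        }
    });
    std::thread processor([&] {
        for (std::size_t b = 0; b < batches; ++b) {
            in_full.pend();
            out_free.pend();
            out[b % kRing] = in[b % kRing];      // "Move array1 -> array4", etc.
            in_free.post();                      // input buffer may be refilled
            out_full.post();
        }
    });
    std::thread writer([&] {
        for (std::size_t b = 0; b < batches; ++b) {
            out_full.pend();
            const std::vector<int>& buf = out[b % kRing];
            transmitted.insert(transmitted.end(), buf.begin(), buf.end());
            out_free.post();                     // output buffer may be reused
        }
    });
    reader.join();
    processor.join();
    writer.join();
    return transmitted;
}
```

Because each stage visits the buffers in the same rotating order, two counting semaphores per ring (free and full) are enough to stop a stage from overwriting a buffer that the next stage has not finished with.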

Read – Handling 4 points. No audio intended: just a test to see if it will work.

Process – Handling 4 points. No audio intended: just a test to see if it will work.

Write – Handling 4 points. No audio intended: just a test to see if it will work.

Net Result
- Not working as expected with equal priority (5) on each task.
- We are obviously missing samples.

Changing Priorities
Priorities: ReadThread 3 (obviously the most critical), ProcessThread 5, WriteThread 5

Different Priorities
Priorities: ReadThread 3 (obviously the most critical), ProcessThread 5, WriteThread 4

No idle time available – Optimize the code
Priorities: ReadThread 3, ProcessThread 5, WriteThread 5
Priorities: ReadThread 3, ProcessThread 5, WriteThread 4

Implementing a multi-thread system -- Laboratory 5
- Collect N points at 44 kHz → array1
- Collect N points at 44 kHz → array2
- Collect N points at 44 kHz → array3
- Collect N points at 44 kHz → array1
- Collect N points at 44 kHz → array2
- Move array1 → array4, SimulateComplex
- Move array2 → array5, SimulateComplex
- Move array3 → array6, SimulateComplex
- Move array1 → array4, SimulateComplex
- Transmit N points at 44 kHz → array4
- Transmit N points at 44 kHz → array5
- Transmit N points at 44 kHz → array6

Problem – NOT coding what we intended
- Collect N points at 44 kHz → array1
- Collect N points at 44 kHz → array2
- Collect N points at 44 kHz → array3
- Collect N points at 44 kHz → array1
- Collect N points at 44 kHz → array2
- Move array1 → array4, SimulateComplex
- Move array2 → array5, SimulateComplex
- Move array3 → array6, SimulateComplex
- Move array1 → array4, SimulateComplex
- Transmit N points at 44 kHz → array4
- Transmit N points at 44 kHz → array5
- Transmit N points at 44 kHz → array6

Proper Code

Net Result: TOTAL SYSTEM HANG, BLOCKED SEMAPHORES

Tried a number of things
- Worked out which semaphore was blocking
- Different priorities
- Different TIC times
Better, but obviously missing cycles, particularly write
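One way to work out which semaphore is blocking is to swap each unbounded pend for a timed one and record the name of any semaphore that times out. The following is a minimal sketch in portable C++, assuming hypothetical names (`TimedSemaphore`, `pend_or_report`, and the semaphore names in the usage note); it is not how the lab code did it.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Named counting semaphore with a timed pend (portable stand-in, NOT the VDK API).
class TimedSemaphore {
public:
    TimedSemaphore(std::string name, int initial)
        : name_(std::move(name)), count_(initial) { assert(initial >= 0); }
    void post() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        ready_.notify_one();
    }
    // Returns false if no post arrived within the timeout.
    bool pend_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!ready_.wait_for(lock, timeout, [this] { return count_ > 0; })) {
            return false;
        }
        --count_;
        return true;
    }
    const std::string& name() const { return name_; }
private:
    std::string name_;
    std::mutex mutex_;
    std::condition_variable ready_;
    int count_;
};

// Drop-in replacement for a plain pend while debugging a hang: on timeout the
// semaphore's name is appended to `stalled` instead of blocking forever.
bool pend_or_report(TimedSemaphore& sem, std::chrono::milliseconds timeout,
                    std::vector<std::string>& stalled) {
    if (sem.pend_for(timeout)) return true;
    stalled.push_back(sem.name());
    return false;
}
```

With a semaphore named `read_done` that has been posted and one named `process_done` that has not, only `process_done` ends up in the `stalled` list, which points at the stage that never signalled.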

Decided to tie WriteThread to interrupt as well as ReadThread

Final Test Result
- Seems to be behaving as expected.
- However, when the MAXIMUM COUNT for the read/write ISR semaphores is changed, the status history changes.
- This could indicate that some interrupts are being missed.
- It could also mean nothing, since interrupts are asynchronous to timer TICs.
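Why a semaphore's maximum count could matter can be shown with a small single-threaded model. It assumes that a post made while the count is already at its maximum is silently dropped; whether the kernel used here behaves exactly this way is an assumption, and `BoundedCount` and `lost_interrupts` are hypothetical names.

```cpp
#include <cassert>

// Counting-semaphore state with a maximum count (single-threaded model).
// ASSUMPTION: a post made while the count is at the maximum is dropped.
struct BoundedCount {
    int max_count;
    int count = 0;
    int dropped = 0;
    explicit BoundedCount(int max) : max_count(max) { assert(max > 0); }
    void post() {
        if (count < max_count) ++count;
        else ++dropped;
    }
    bool try_pend() {
        if (count == 0) return false;
        --count;
        return true;
    }
};

// Simulates `burst` interrupts arriving before the thread gets to run once,
// and returns how many of those interrupts the thread never sees.
int lost_interrupts(int max_count, int burst) {
    BoundedCount sem(max_count);
    for (int i = 0; i < burst; ++i) sem.post();  // ISR side
    int handled = 0;
    while (sem.try_pend()) ++handled;            // thread side drains the count
    return burst - handled;
}
```

Under this model, four interrupts arriving back to back lose three posts with a maximum count of 1 and none with a maximum count of 4 or more, so a status history that changes with the maximum count is at least consistent with the thread sometimes falling behind the ISR.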

Could handle 800 waste “times” every 32 samples – plenty of time
- cycles + Inner loop = 2 * BUFFERLENGTH
- Outer loop = Wastetime * (3 + INNER)
- Total = 13 + Inner loop
- BUFFER = 32, waste time = 800
- Cycles around 800 * 64 = 51.2 K
Register comparison, Blackfin and SHARC: D0 (8), R0 (16), R0, A0 (6), P0 (with a bit of MIPS) (6), I0 (4), I0 (16)
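The cycle estimate can be checked against the time available per block. The sketch below reads the slide's formula as total = 13 + Wastetime * (3 + 2 * BUFFERLENGTH), which is an interpretation of the slide rather than something it states outright, and takes the 600 MHz core clock and 44 kHz sample rate quoted in these slides. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Cycles spent in the waste loop, reading the slide's formula as
//   total = 13 + waste_time * (3 + 2 * buffer_length).
// This reading of the slide is an assumption.
std::int64_t waste_cycles(std::int64_t buffer_length, std::int64_t waste_time) {
    assert(buffer_length >= 0 && waste_time >= 0);
    const std::int64_t inner = 2 * buffer_length;
    return 13 + waste_time * (3 + inner);
}

// Core cycles available while one block of `points` samples arrives,
// rounded down to a whole cycle.
std::int64_t cycles_per_block(std::int64_t core_hz, std::int64_t sample_hz,
                              std::int64_t points) {
    assert(sample_hz > 0 && core_hz >= 0 && points >= 0);
    return core_hz * points / sample_hz;
}
```

With BUFFER = 32 and waste time = 800 this gives 53,613 wasted cycles, against 436,363 cycles available per 32-sample block at 600 MHz and 44 kHz, so the waste loop uses roughly an eighth of the budget. That fits the slide's "plenty of time" remark.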

Real life test -- small buffers
Absolutely nothing. However:
- 4 audio connections in
- 6 audio connections out
Got the correct connections:
- Set buffer = 1: worked first time
- Set buffer = 32: worked first time

Larger buffers
BUFFERSIZE – 64 – out of bsz memory error
Fix .LDF file manually (GUI window works how?)
MEMORY {
    mem_VDK_strt        { TYPE(RAM) START(0xFFA00000) END(0xFFA00003) WIDTH(8) }
    mem_l1_code         { TYPE(RAM) START(0xFFA00004) END(0xFFA0FFFF) WIDTH(8) }
    mem_l1_code_cache   { TYPE(RAM) START(0xFFA10000) END(0xFFA13FFF) WIDTH(8) }
    mem_EVT_all         { TYPE(RAM) START(0xFF900000) END(0xFF900003) WIDTH(8) }
    mem_EVT_NMI         { TYPE(RAM) START(0xFF900004) END(0xFF900007) WIDTH(8) }
    mem_EVT_EVX         { TYPE(RAM) START(0xFF900008) END(0xFF90000B) WIDTH(8) }
    mem_EVT_IRPTEN      { TYPE(RAM) START(0xFF90000C) END(0xFF90000F) WIDTH(8) }
    mem_EVT_IVHW        { TYPE(RAM) START(0xFF900010) END(0xFF900013) WIDTH(8) }
    mem_EVT_IVTMR       { TYPE(RAM) START(0xFF900014) END(0xFF900017) WIDTH(8) }
    mem_EVT_IVG7        { TYPE(RAM) START(0xFF900018) END(0xFF90001B) WIDTH(8) }
    mem_EVT_IVG8        { TYPE(RAM) START(0xFF90001C) END(0xFF90001F) WIDTH(8) }
    mem_EVT_IVG9        { TYPE(RAM) START(0xFF900020) END(0xFF900023) WIDTH(8) }
    mem_EVT_IVG10       { TYPE(RAM) START(0xFF900024) END(0xFF900027) WIDTH(8) }
    mem_EVT_IVG11       { TYPE(RAM) START(0xFF900028) END(0xFF90002B) WIDTH(8) }
    mem_EVT_IVG12       { TYPE(RAM) START(0xFF90002C) END(0xFF90002F) WIDTH(8) }
    mem_EVT_IVG13       { TYPE(RAM) START(0xFF900030) END(0xFF900033) WIDTH(8) }
    mem_EVT_IVG14       { TYPE(RAM) START(0xFF900034) END(0xFF900037) WIDTH(8) }
    mem_EVT_IVG15       { TYPE(RAM) START(0xFF900038) END(0xFF90003B) WIDTH(8) }
    mem_sysstack        { TYPE(RAM) START(0xFF90003C) END(0xFF90083B) WIDTH(8) }
    mem_l1_data_b       { TYPE(RAM) START(0xFF90083C) END(0xFF903FFF) WIDTH(8) }
    mem_l1_data_b_cache { TYPE(RAM) START(0xFF904000) END(0xFF907FFF) WIDTH(8) }
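When an .LDF file is edited by hand, a quick sanity check is to compute each segment's size from its START and END addresses and confirm that neighbouring segments do not overlap. The sketch below does this for four of the data-bank entries copied from the memory map above. `Segment`, `size_bytes`, `no_overlap`, and `data_bank_b` are hypothetical helper names, and the check only covers address arithmetic, not how the linker places sections.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Segment {
    const char* name;
    std::uint32_t start;  // first byte address, inclusive
    std::uint32_t end;    // last byte address, inclusive (as written in the LDF)
};

// Size of a segment in bytes; END is inclusive, hence the +1.
std::uint32_t size_bytes(const Segment& s) {
    assert(s.end >= s.start);
    return s.end - s.start + 1;
}

// True when every segment starts after the previous one ends.
// Expects the segments listed in ascending address order.
bool no_overlap(const std::vector<Segment>& segs) {
    for (std::size_t i = 1; i < segs.size(); ++i) {
        if (segs[i].start <= segs[i - 1].end) return false;
    }
    return true;
}

// Four consecutive entries copied from the memory map in the slide.
std::vector<Segment> data_bank_b() {
    return {
        {"mem_EVT_IVG15",       0xFF900038u, 0xFF90003Bu},
        {"mem_sysstack",        0xFF90003Cu, 0xFF90083Bu},
        {"mem_l1_data_b",       0xFF90083Cu, 0xFF903FFFu},
        {"mem_l1_data_b_cache", 0xFF904000u, 0xFF907FFFu},
    };
}
```

For these entries the helper reports 2,048 bytes for `mem_sysstack`, 14,276 bytes for `mem_l1_data_b`, and 16,384 bytes for `mem_l1_data_b_cache`, with no overlaps, so the address ranges themselves look consistent.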

Memory issues – on 64 point data batches
Still did not work
- Did I change the memory map correctly?
- No: it seems okay, as it works with 32, but perhaps there is a caching issue
Went back to old memory map
- Went to configure external SDRAM and use that
- Modified only 1 array: left channel
Left channel fails; right channel works
Spending too much time in context switching
- Group ReadThread and WriteThread code together
- Does not even work with 32 !!!!!!
Am convinced that there is a logical issue associated with the semaphore handling.

Bonus – 20% bonus
If you can get all parts of Lab. 5 running and then solve the issue of why it fails at 64 points (even when not wasting cycles): 20% bonus on the marks for this lab, and either a mention or a “co-authorship” on one of the Circuit Cellar articles. May even be worth some money if I manage to sell the articles.