Published by Daniel Robbins. Modified over 9 years ago.
1
s3.kth.se DSP Lecture 30/3-2010 Per Zetterberg
2
Agenda: General. Starting CCS. Comparing Matlab and DSP results. Profiling when comparing Matlab and DSP results. Matlab DSP communication. EDMA. QUAD_DAC_ADC (headphones). _empty. State machine using case statement. Data formats. Overlap and add. Stack and heap. Simple optimization rules. Cache. Some advice.
3
DSP Programming. Setup in the project course: PC (the "host") and DSP (the "target", or DSK).
4
What is a DSP ? A CPU which is optimized for signal processing: Special instructions for common signal processing operations, e.g. multiply and accumulate. Often on-chip circuits that handle input/output (IO). Low power consumption. Cheap (compared to processors in e.g. desktop computers).
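The multiply-and-accumulate operation mentioned above is the inner step of a dot product or FIR filter. A minimal sketch in portable C, assuming float data; the function name dot_product is chosen here for illustration and is not taken from the course skeletons:

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two float vectors: one multiply-accumulate per element. */
float dot_product(const float *x, const float *h, size_t n)
{
    float acc = 0.0f;
    size_t i;

    assert(x != NULL && h != NULL);
    for (i = 0; i < n; i++) {
        acc += x[i] * h[i];   /* multiply and accumulate */
    }
    return acc;
}
```

The loop body contains no function calls, so it is the kind of loop a DSP compiler can map onto its multiply and add units.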
5
Project Prototype: DSP versus PC. Programs run concurrently on both the DSP and the PC. DSP card used for: signal processing; IO (sampling/playback). PC used for: Graphical User Interface (GUI); controlling the application and receiving results.
6
The DSP in the project course. You will use a Texas Instruments C6713 floating-point digital signal processor. Massively parallel architecture (VLIW): up to eight 32-bit instructions are executed simultaneously. Running at 225 MHz, giving 1.35 GFLOPS peak performance. Belongs to the TI C6x family of DSPs. Widely used in industry.
7
Software pipelining. The processor can be programmed to perform eight operations in parallel (e.g. MULT, ADD, MV). Every instruction has a certain latency. The compiler will pipeline code, i.e. perform several instructions in parallel in loops, if: –There are no function calls in the loop. –Optimization -o3 is selected. –.. Check that important loops are pipelined.
8
Technical Requirements of Prototype Real-time functionality DSP-card: signal processing, PC: user interface User interface through a GUI (windows style) implemented in matlab. No unnecessary use of processor time on the PC Well structured and adequately commented source code For more details see www.s3.kth.se/signal/edu/projekt/examination.shtml
9
Development Tools Matlab –Algorithm development. –Prototype verification. –User interface development (GUI) –Control of DSP card –Control of code profiling. DSP: Code Composer Studio –Algorithm implementation in C/Assembler –Debugging in conjunction with Matlab implementation –Code profiling.
10
How to learn … How to Quickly Learn DSP Programming: http://www.s3.kth.se/signal/edu/projekt/DSPsupport/getting_started.shtml Our web pages: http://www.s3.kth.se/signal/edu/projekt/DSPsupport/ Ask me: perz@ee.kth.se Search on the net, newsgroups, ….
11
PC programming (GUI). Two methods: Using GUIDE (a GUI for creating a GUI). Programmatically.
12
Starting CCS. CCStudio v3.3 is the code development environment. Use Setup CCStudio v3.3 when you need to change between targets. –C6713 DSK-USB (the hardware) –C6713 Device Cycle Accurate Simulator (little endian) –C6416 Device Cycle Accurate Simulator (little endian) (when doing the tutorial). Connect to Matlab: –cc=ccsdsp; –cc.visible(0), cc.run, cc.isrunning.
13
Comparing Matlab and DSP results. Principle for testing isolated functions, e.g. a decoder: Generate input in Matlab. Write input to the DSP. Call DSP version of function. Read output from the DSP. Call Matlab version of function. Compare results. Let’s have a look at the compare_with_matlab_31 skeleton!
14
Test important functions as follows: Copy the entire compare_with_matlab_31.pjt project. Replace FuncionToBeTested with your code: –In the C code. –In the Matlab code. Define input and output data & parameters as relevant for your function. Change the Matlab code to generate relevant input data. Sometimes called a "test harness" in industry.
15
Matlab DSP communication 1(3). Sending data between Matlab and DSP when the DSP is not running (Matlab code):
    Input_obj=createobj(cc,'Input');  % Input is a global in the DSP code.
    write(Input_obj,Input);           % write data
    Input=read(Input_obj);            % read data
16
DSP -> PC communication 2(3). When the DSP is running (RTDX):
On the DSP side:
    RTDX_write(&ctrl_chan_dsp2pc, &data_to_matlab, sizeof(float)*NO_FLOATS_TO_MATLAB);
On the Matlab side:
    data_from_DSP=readmsg(cc.rtdx,'ctrl_chan_dsp2pc','single')
Recommendation: Re-use code in the "_empty" skeletons.
17
Matlab DSP communication 3(3). The PC DSP interface is slow. Allowed cheating (if necessary): Pre-read data into memory before real-time processing. Read result from memory after real-time processing. Large memory areas are available in external memory:
    #pragma DATA_SECTION(Data,".external_mem")   // On DSP
    short Data[1000];                            // On DSP
    write(cc,h_Data.address(1), int16(Data));    % In Matlab
The data is not cleared when the program is reloaded.
18
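Enhanced Direct Memory Access (EDMA). [Figure: in memory, a TX buffer and an RX buffer; one EDMA channel moves data from the TX buffer to the McBSP DXR register and on to the DAC, another EDMA channel moves data from the ADC via the McBSP DRR register to the RX buffer.] Triggers interrupt HWI_INT8 when ready. Leaves the DSP free from moving data back and forth to the ADC/DAC!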
19
EDMA PaRAM
20
Ping-Pong Buffering. hEdmaReloadXmtPing: SRC=&gBufferXmtPing, DST=DXR, LINK=hEdmaReloadXmtPong. hEdmaReloadXmtPong: SRC=&gBufferXmtPong, DST=DXR, LINK=hEdmaReloadXmtPing. Let me show you EDMA_RTDX_GPIO_empty and QUAD_DAC_ADC_empty!
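The ping-pong idea can be modelled in plain C without any EDMA calls. The sketch below is only a software model under stated assumptions: on the real hardware the swap is done by the EDMA LINK fields, and the names ping, pong, dma_uses_ping and swap_buffers are hypothetical, not taken from the skeletons.

```c
#include <assert.h>

#define BUFFSIZE 1024

static short ping[BUFFSIZE];
static short pong[BUFFSIZE];
static int dma_uses_ping = 1;   /* model: the DMA starts by sending "ping" */

/* Model of a "transfer complete" event: the buffer the DMA just finished
   is handed to the CPU, and the DMA moves on to the other buffer. */
short *swap_buffers(void)
{
    short *cpu_buffer = dma_uses_ping ? ping : pong;
    short *dma_buffer;

    dma_uses_ping = !dma_uses_ping;
    dma_buffer = dma_uses_ping ? ping : pong;

    /* The CPU and the DMA never own the same buffer at the same time. */
    assert(cpu_buffer != dma_buffer);
    return cpu_buffer;
}
```

Each call returns the buffer the CPU may fill next, so two consecutive calls return different buffers and every second call returns the same one.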
21
Skeleton programs handling EDMA+RTDX. "Single-antenna": EDMA_RTDX_GPIO_31_empty (code development), EDMA_RTDX_GPIO_31 (Matlab prototype). "Dual-antenna": QUAD_ADC_DAC_31_empty (code development), QUAD_ADC_DAC_31 (Matlab prototype).
22
QUAD_DAC_ADC_31 Let’s go through QUAD_DAC_ADC_31_empty Then go through QUAD_DAC_ADC_31 This is the DSP matlab interface to be used in the matlab prototype!! Note: Documentation in “main.c”!
23
State Machine using Case Statement in appl_Process
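A minimal sketch of a state machine written with a case statement. The states and the input flags (start_cmd, sync_found, frame_done) are hypothetical and chosen only for illustration; the skeleton's appl_Process defines its own.

```c
#include <assert.h>

/* Hypothetical states, for illustration only. */
typedef enum { STATE_IDLE, STATE_SYNC, STATE_RECEIVE } app_state_t;

/* Return the next state, given the current state and three assumed flags
   that would be computed elsewhere for each processed buffer. */
app_state_t next_state(app_state_t state, int start_cmd,
                       int sync_found, int frame_done)
{
    switch (state) {
    case STATE_IDLE:
        return start_cmd ? STATE_SYNC : STATE_IDLE;
    case STATE_SYNC:
        return sync_found ? STATE_RECEIVE : STATE_SYNC;
    case STATE_RECEIVE:
        return frame_done ? STATE_IDLE : STATE_RECEIVE;
    default:
        assert(0 && "unknown state");
        return STATE_IDLE;
    }
}
```

Keeping the transition logic in one function that is called once per buffer makes it easy to test on the PC before it is moved to the DSP.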
24
Data formats. C types: char=8 bits, short=16 bits, int=32 bits, float=32 bits. Integers are signed or unsigned. Float: sign=1 bit, exponent=8 bits, fraction=23 bits. In C, conversion is automatic (when pointers are not involved…). However, note the range …..
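The range matters when a float result is written back to a 16-bit buffer: converting an out-of-range float to short is not well defined in C. A minimal sketch of an explicit conversion with saturation and rounding; the function name float_to_short_sat is chosen here for illustration.

```c
#include <assert.h>

/* Convert a float to a 16-bit integer, clipping to [-32768, 32767]
   and rounding to the nearest integer. */
short float_to_short_sat(float x)
{
    assert(sizeof(short) == 2);    /* short is 16 bits, as on the slide */

    if (x >= 32767.0f)
        return 32767;
    if (x <= -32768.0f)
        return -32768;
    return (short)(x >= 0.0f ? x + 0.5f : x - 0.5f);
}
```

For example, 40000.0f clips to 32767 instead of producing an unpredictable value.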
25
The buffers in QUAD_DAC_ADC … appl_Process(short *receive_buffer,short *transmit_buffer) The buffers consist of BUFFSIZE shorts (range [-2^15,2^15-1]). BUFFSIZE is defined in EDMA_RTDX_GPIO.h to be 1024. The number of bytes is 2*BUFFSIZE=2048. In EDMA_RTDX_GPIO there are 4 channels (i.e. ADC and DAC converters) which are interleaved. Thus the number of 4-dimensional vector samples is BUFFSIZE/4=256. BUFFSIZE can be changed.
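A minimal sketch of splitting an interleaved buffer into one array per channel. It assumes sample-by-sample interleaving in the order channel 0, 1, 2, 3; the actual ordering should be checked in the skeleton's main.c. The names deinterleave and NUM_CHANNELS are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CHANNELS 4

/* Split [c0 c1 c2 c3 c0 c1 c2 c3 ...] into four separate arrays.
   Each output array must hold buffsize / NUM_CHANNELS samples. */
void deinterleave(const short *in, size_t buffsize, short *out[NUM_CHANNELS])
{
    size_t n, c;
    size_t frames = buffsize / NUM_CHANNELS;

    assert(buffsize % NUM_CHANNELS == 0);
    for (n = 0; n < frames; n++) {
        for (c = 0; c < NUM_CHANNELS; c++) {
            out[c][n] = in[NUM_CHANNELS * n + c];
        }
    }
}
```

With 1024 shorts and 4 channels, each output array receives 256 samples.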
26
Overlap and add. Say we want to implement a FIR filter. The input buffer is 128 samples. The filter is 10 samples long. The filtered signal is 128+10-1=137 samples. But the output buffer is 128 samples …. Solution: overlap and add. Variant 1: Save the last 9 samples. Add them to the next buffer. Variant 2: Overlap-and-add with an additional buffer. See next slide.
27
Overlap and Add: With additional buffer. [Figure: a buffer of 128 samples extended by 9 samples. Labels: "Zero these samples", "Add the new signal", "Move 128+9 samples".] Good if the transmit signal is 128 samples and unsynchronized!
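A minimal sketch of the first variant (save the last 9 samples and add them to the next buffer), assuming float data, a block length of 128 and a filter length of 10. The function and macro names are chosen here for illustration and are not taken from the skeletons.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_LEN 128               /* samples per input/output buffer */
#define FILT_LEN  10                /* FIR filter length */
#define TAIL_LEN  (FILT_LEN - 1)    /* 9 samples spill into the next block */

/* Tail of the previous block's convolution; all zero before the first call. */
static float tail[TAIL_LEN];

/* Filter one block of BLOCK_LEN samples. in and out must not overlap. */
void fir_overlap_add(const float *in, const float *h, float *out)
{
    float full[BLOCK_LEN + TAIL_LEN];   /* 137 samples, on the stack */
    int n, k;

    assert(in != out);

    /* Start from the saved tail and zero the remaining samples. */
    memcpy(full, tail, sizeof(tail));
    memset(full + TAIL_LEN, 0, BLOCK_LEN * sizeof(float));

    /* Add the convolution of the new block. */
    for (n = 0; n < BLOCK_LEN; n++) {
        for (k = 0; k < FILT_LEN; k++) {
            full[n + k] += in[n] * h[k];
        }
    }

    /* First 128 samples go out, last 9 are kept for the next call. */
    memcpy(out, full, BLOCK_LEN * sizeof(float));
    memcpy(tail, full + BLOCK_LEN, sizeof(tail));
}
```

Calling the function once per buffer gives the same output as filtering the whole signal in one pass, apart from the 9 samples still held in tail at the end.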
28
Stack and Heap
    float myfunction(short *buffer) { float internal_buffer[1000]; …
This data is stored on the stack. At least 4000 bytes are needed. The stack size is set in "build options". No warning is given by the compiler if the stack size is too small!!!
    float *internal_buffer; internal_buffer = (float *) malloc(1000*sizeof(float)); …
Allocated on the heap. The heap size is also set in "build options". Also no warning!!!
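Since neither case produces a compiler warning, two defensive habits may help: checking the pointer returned by malloc, and using static storage for large work buffers so that they are sized at link time. A minimal sketch; the names alloc_work_buffer, get_static_buffer and WORK_LEN are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define WORK_LEN 1000

/* Static storage: uses neither the stack nor the heap. */
static float static_buffer[WORK_LEN];

float *get_static_buffer(void)
{
    return static_buffer;
}

/* Heap storage: malloc returns NULL when the heap is too small,
   so the caller must check the result before using it. */
float *alloc_work_buffer(size_t len)
{
    assert(len > 0);
    return (float *)malloc(len * sizeof(float));
}
```

A caller would test the result of alloc_work_buffer against NULL once, during initialization, rather than inside the real-time loop.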
29
Code Optimization Let me show you optimization_example.
30
Simple Optimization Rules 1(2). Turn optimization on: flag "-o3", program mode compilation "-pm" and "-op3" if possible. Turn debug off, i.e. do not use "-g". Avoid function calls inside loops! Use of division "/" is a function call! Use the _rcpsp intrinsic (reciprocal approximation) instead. For other intrinsics see table 8-6 in spru187n. Avoid math functions such as "sin(x)"; use look-up tables instead. Check that all important loops are pipelined by searching for "SOFTWARE PIPELINE INFORMATION" in the generated ".asm" files.
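One way to keep division out of a loop in portable C is to compute the reciprocal once, before the loop, and multiply inside it. This sketch does not use the TI-specific _rcpsp intrinsic; the function name scale_by_reciprocal is chosen here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Divide every sample by 'divisor'. The single division is done once,
   outside the loop; the loop body is a plain multiply with no call. */
void scale_by_reciprocal(const float *in, float *out, size_t n, float divisor)
{
    float inv;
    size_t i;

    assert(divisor != 0.0f);
    inv = 1.0f / divisor;
    for (i = 0; i < n; i++) {
        out[i] = in[i] * inv;
    }
}
```

The result can differ from a per-sample division in the last bit, which is usually acceptable for signal processing.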
31
Simple Optimization Rules 2(2). Allocate all time-critical code and data in internal memory (in our skeletons this is the default; allocating to external memory requires a #pragma statement). Use the touch function in an initialization routine to have the most important data structures cached in internal memory. (This function can be copied from the cache_miss_example skeleton.)
    float ImportantData[100]; …. touch(ImportantData,100);
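The idea behind a touch function is to read one element per cache line so that every line of the array is loaded. The sketch below is an illustrative guess at such a function, not the one from cache_miss_example: it assumes a 32-byte line, takes a size in bytes, and is named touch_bytes to avoid confusion with the skeleton's touch.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 32

/* Read one byte from every cache line covered by the array. The sum is
   returned, and the pointer is volatile, so the reads are not removed. */
int touch_bytes(const void *data, size_t nbytes)
{
    const volatile unsigned char *p = (const volatile unsigned char *)data;
    int sum = 0;
    size_t i;

    assert(data != NULL);
    for (i = 0; i < nbytes; i += CACHE_LINE_BYTES) {
        sum += p[i];
    }
    if (nbytes > 0) {
        sum += p[nbytes - 1];   /* last line, if the array is not aligned */
    }
    return sum;
}
```

A call such as touch_bytes(ImportantData, sizeof(ImportantData)) would be placed in the initialization routine.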
32
TMS320C6713 cache. CPU core. L1P (program cache): 4 kB. L1D (data cache): 4 kB. Memory: 256 kB internal, 16 MB external.
33
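One-way cache (L1P). [Figure: cache lines Line 0, Line 1, …, Line 127 and the SDRAM addresses that map to them.] Line 0: Mem 0x0000-0x001F and Mem 0x1000-0x101F. Line 1: Mem 0x0020-0x003F and Mem 0x1020-0x103F. Line 127: Mem 0x0FE0-0x0FFF and Mem 0x1FE0-0x1FFF.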
34
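Two-way cache (L1D). [Figure: cache lines Line 0A, Line 1A, …, Line 63A and Line 0B, Line 1B, …, Line 63B, and the memory addresses that map to them.] Lines 0A/0B: Mem 0x000-0x01F and Mem 0x800-0x81F. Lines 1A/1B: Mem 0x020-0x03F and Mem 0x820-0x83F. Lines 63A/63B: Mem 0x7E0-0x7FF and Mem 0xFE0-0xFFF.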
35
L1D cache. L1D address allocation: Offset = bits 0-4, Set index = bits 5-10, Tag = bits 11-31. A new line of 32 bytes is loaded on a read miss, with a penalty of 4 clock cycles. If two words are loaded per clock cycle (reading sequentially from a memory segment), the overhead is 8/32*4 = 1 clock cycle per instruction cycle. A write miss does not lead to the loading of a new line. A write buffer of four words handles up to four misses without penalty.
36
cache_miss_example. main.c: Illustrates the impact of L1D write and read misses (compulsory misses). main2.c: Illustrates the problem with several data objects in the same set (thrashing). Two data objects are in the same set if: Aa = K*2048 + Ab, for some addresses Aa and Ab in object A and B respectively, and for some K. Two code objects are in the same set if: Aa = K*4096 + Ab, for some addresses Aa and Ab in object A and B respectively, and for some K.
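The rule for data objects can be checked numerically. The sketch below assumes the L1D layout given in these slides (32-byte lines, 64 sets, two ways); the function names are chosen here for illustration.

```c
#include <assert.h>

#define L1D_LINE_BYTES 32u
#define L1D_NUM_SETS   64u    /* 4 kB / (2 ways * 32 bytes per line) */

/* Set index of an address: address bits 5 to 10. */
unsigned int l1d_set_index(unsigned long addr)
{
    /* One way spans 2048 bytes, which is where the K*2048 rule comes from. */
    assert(L1D_LINE_BYTES * L1D_NUM_SETS == 2048u);
    return (unsigned int)((addr / L1D_LINE_BYTES) % L1D_NUM_SETS);
}

/* Non-zero if the two addresses compete for the same L1D set. */
int l1d_same_set(unsigned long a, unsigned long b)
{
    return l1d_set_index(a) == l1d_set_index(b);
}
```

Applying l1d_same_set to the start addresses listed in the .map file gives a quick indication of which buffers may thrash.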
37
What to consider when programming to make good use of the cache. Align all data buffers on 32-byte boundaries (#pragma DATA_ALIGN). Avoid allocating more than two objects that map to the same set in the same algorithm. Avoid having two or more computationally complex algorithms that map to the same set. Profile the algorithms with and without cached data and program (see cache_miss_example). Force caching of important data and code before the real-time program starts (e.g. in appl_Init()) by reading the data (touch) and calling the functions. Test processing data in smaller buffers to see if performance improves.
38
Some advice 1(2). Start with a skeleton. Only insert functions which have been checked against Matlab. Make one change at a time => much easier to find out what went wrong. Save "before" and "after" code. Don’t use printf.
39
Some advice 2(2). Check that all pointers are initialized. If a variable is corrupted, check the .map file to see how it could have been overwritten. Use an extern declaration both in the file where the variable is declared and where it is used. In real-time debugging, store results in "debug globals". When using sqrt, log, log10, use "#include <math.h>".