Lab 2 – DSP software architecture and the real-life DSP characteristics of signals that make it necessary.



Lab 2 – same as Lab 1 but more so
1. Handling FIR algorithms in C++ and assembly code
2. Working to understand how C++ is trying to use the processor architecture
3. Trying to understand why we are not achieving the 2.5 GFLOPs (or whatever) that is available
4. Using the C++ compiler as a tutor
   1. Get the compiler to do something, look at the code generated and understand it
5. The "more so" – persuading the compiler to
   1. Use both data busses (pmda and dmda) and the instruction bus (pmco)
   2. Use SIMD

FIR filter is a natural for all these optimizations – from 24N down to N/2 cycles
(The name in parentheses is the memory each item is fetched from.)

SISD -- Von Neumann Architecture – One memory – may have superscalar capability
  FIR_output = Fetch in instructions I(MEM) for this operation and then do (3N to 24N fetches)
               sum ( X(MEM)[n] * coeff(MEM)[n] ) ; 0 <= n < N

SISD -- Harvard Architecture – Two memories with superscalar capability
  FIR_output = Fetch in instructions I(PM) for future operation and then do (only 2N fetches)
               sum ( X(DM)[n] * coeff(DM)[n] ) ; 0 <= n < N

SISD -- Super Harvard Architecture – Three memories with superscalar capability
  FIR_output = Fetch in instructions I(PM-CO) for future operation and then do (only N fetches)
               sum ( X(DM-DA)[n] * coeff(PM-DA)[n] ) ; 0 <= n < N

SIMD -- Super Harvard Architecture – Three memories with superscalar capability
  FIR_output = Fetch in instructions I(PM-CO) for future operation and then do (only N/2 fetches)
               sum ( X(DM-DA)[2m] * coeff(PM-DA)[2m] )
             + sum ( X(DM-DA)[2m + 1] * coeff(PM-DA)[2m + 1] ) ; 0 <= m < N/2
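As a rough illustration of the sum and the fetch counts quoted above, here is a minimal C++ sketch. The names `fir_output`, `Arch` and `fetch_cycles` are hypothetical and not from the lab material, and the Von Neumann case uses only the best-case 3N figure from the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One FIR output value: sum( x[n] * coeff[n] ), 0 <= n < N.
// x holds the N most recent input samples, coeff holds the N taps.
// (Hypothetical helper, written only to illustrate the slide formula.)
float fir_output(const std::vector<float>& x, const std::vector<float>& coeff) {
    assert(x.size() == coeff.size());
    float sum = 0.0f;
    for (std::size_t n = 0; n < x.size(); ++n) {
        sum += x[n] * coeff[n];  // one multiply-accumulate per tap
    }
    return sum;
}

// The four architectures compared on the slide (hypothetical enum).
enum class Arch { VonNeumann, Harvard, SuperHarvard, SuperHarvardSimd };

// Best-case fetch cycles for an N-tap FIR, using the slide's counts:
// 3N (one memory), 2N (two memories), N (three memories), N/2 (SIMD).
long fetch_cycles(Arch arch, long n_taps) {
    switch (arch) {
        case Arch::VonNeumann:       return 3 * n_taps;
        case Arch::Harvard:          return 2 * n_taps;
        case Arch::SuperHarvard:     return n_taps;
        case Arch::SuperHarvardSimd: return n_taps / 2;
    }
    return 0;  // not reached for valid enum values
}
```

For a 256-tap filter this simple model gives 768, 512, 256 and 128 fetch cycles respectively, which is the "down to N/2" trend the slide describes.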

Problem to tackle – we want to filter a signal – How many FIR taps are needed?

Signal chain (from the slide diagram):
  Signal
  -> Analog low pass anti-aliasing filter (you must have this; strength of LP filter must match sample rate, Nyquist criterion)
  -> A/D (sample rate)
  -> DSP algorithm: discrete sample values with changed frequency characteristics from input signal
  -> D/A (sample rate)
  -> Analog low pass anti-aliasing filter (strength of LP filter must match sample rate, Nyquist criterion)
  -> Analog signal with amplitude values with changed frequency characteristics

Analog Devices A/D work roughly like this

  Signal
  -> Analog low pass anti-aliasing filter (you must have this; strength of LP filter must match sample rate, Nyquist criterion)
  -> A/D, 8 TIMES OVER SAMPLING
  -> FIRMWARE (LP averaging) anti-aliasing filter
  -> Down-sampler by x2, x4, x8, x16
  -> 44 kHz discrete sample values with changed frequency characteristics from input signal

How much processing power per sample?

FIR works on time domain signals – so that's where we calculate the processing power needed for an N-tap FIR filter on any architecture:

  N * (FIR values fetched + coeffs fetched + add + multiply + 4 instructions)
  + (N – 1) * (2 * FIR values fetched + 2 instructions)

So what value of N is needed?

Work out the impact of N by examining FIR characteristics in frequency domain

If you sample at 44 kHz then the frequency characteristics of the digital signal are like this:

[Figure: spectrum split into bands S-2, S-1, S0, S1, S2, each 44 kHz wide; S0 spans -22 kHz to +22 kHz, with further band edges at -66 kHz and +66 kHz]

The digitized signal is the sum of all frequencies to infinity, grouped in stages of 44 kHz
Processed signal == S0 + S1 + S2 + S... + S-1 + S-2 + S...

In DSP – YOU MUST WORK WITH BAND-LIMITED SIGNALS AND BAND-LIMITED NOISE

Signals must be in the range -22 kHz to +22 kHz if you sample at 44 kHz – Strict Nyquist
Better signal processing if you arrange, and can afford the time, to process at 44 kHz but use band-limited signals of -11 kHz to +11 kHz
Hence the need for an ANALOG anti-aliasing filter, because once you have sampled – you have an aliased digital signal
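To make the aliasing point concrete, the following is a small sketch (not from the lab code) of where a tone lands after sampling. The function name `alias_frequency_hz` is hypothetical; it simply folds the input frequency into the range 0 to fs/2.

```cpp
#include <cassert>
#include <cmath>

// Apparent frequency (in Hz) of a tone at f_hz after sampling at fs_hz.
// Any tone outside -fs/2 .. +fs/2 folds back into that range, which is
// why the analog anti-aliasing filter has to come before the A/D.
// (Hypothetical helper for illustration only.)
double alias_frequency_hz(double f_hz, double fs_hz) {
    assert(fs_hz > 0.0);
    // Subtract the nearest whole multiple of the sample rate.
    const double folded = f_hz - fs_hz * std::round(f_hz / fs_hz);
    return std::fabs(folded);
}
```

With a 44 kHz sample rate, a 10 kHz tone stays at 10 kHz, but a 30 kHz tone shows up at 14 kHz and a 50 kHz tone at 6 kHz, indistinguishable from real signals at those frequencies.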

Work out the impact of N by examining FIR characteristics in frequency domain

If you sample at 44 kHz then the frequency characteristics of the digital signal run from -22 kHz to +22 kHz
The digitized signal is the sum of all frequencies to infinity, grouped in stages of 44 kHz
Processed signal == S0 + S1 + S2 + S... + S-1 + S-2 + S...

N point FIR filter
[Figure: N frequency points (x) spread evenly from -22 kHz to +22 kHz; low pass band marked from -3 kHz to +3 kHz]

Frequency resolution = 44 kHz / N
Say you want a filter of 3.3 kHz low pass with fairly smooth frequency characteristics (the pass band from -3.3 kHz to +3.3 kHz is 6.6 kHz wide; ask for about 20 frequency points across it):
  44 / N = 6.6 / 20
  N = 44 * 20 / 6.6, roughly 133, so around a 128 tap filter
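The tap estimate above can be written as a short calculation. This is a sketch under the slide's assumptions (resolution of fs / N and about 20 frequency points across the pass band); `estimate_taps` and `nearest_power_of_two` are hypothetical names, and rounding to a power of two is only a convenience matching the slide's "around 128".

```cpp
#include <cassert>

// Taps needed so that points_in_band frequency points, each fs_hz / N
// apart, fit across a band that is band_hz wide:
//   fs / N = band / points   =>   N = fs * points / band
// (Hypothetical helper for illustration only.)
double estimate_taps(double fs_hz, double band_hz, double points_in_band) {
    assert(fs_hz > 0.0 && band_hz > 0.0 && points_in_band > 0.0);
    return fs_hz * points_in_band / band_hz;
}

// Round a tap estimate to the closest power of two.
long nearest_power_of_two(double value) {
    assert(value >= 1.0);
    long lower = 1;
    while (static_cast<double>(lower) * 2.0 <= value) {
        lower *= 2;  // largest power of two not above value
    }
    const long upper = lower * 2;
    return (value - lower <= upper - value) ? lower : upper;
}
```

Calling `estimate_taps(44000.0, 6600.0, 20.0)` gives a little over 133, and `nearest_power_of_two` of that value is 128.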

Ranchlands -- 3 sound sources at Hz, Hz, Hz
That means the bandwidth of each of the 3 filters is 2 Hz
If a 3.3 kHz filter needed a 128 tap filter
Then a 2 Hz filter will need more than a 128,000 tap filter
Problem – each filter is doing its own independent frequency modification – very inefficient
Solution – combine some of the frequency modification from all the filters
  Solution 1 – Combine frequency modification of data in time domain – Lab 2
  Solution 2 – Switch to modification of data in frequency domain – Lab 4 (fastest, but not frequently done because of bad historical press)
Solution – bring in co-processor support (Labs 2 and 4)

Lab 2 – we are going to do this

  Signal
  -> Analog low pass anti-aliasing filter
  -> A/D
  -> FIRMWARE DIGITAL (summing) anti-aliasing filter
  -> Down-sampler x2, x4, x8, x16
  -> 44 kHz
  -> DIGITAL (32 x FIR) anti-aliasing (LP) filter
  -> Down-sampler x32
  -> 4 kHz
  -> three filters in parallel:
       DIGITAL (256 x FIR) BAND PASS FILTER 42 Hz
       DIGITAL (256 x FIR) BAND PASS FILTER 57 Hz
       DIGITAL (256 x FIR) BAND PASS FILTER 37 Hz

3 * N * N operations

Lab 4 – we are going to do this

  Signal
  -> Analog low pass anti-aliasing filter
  -> A/D
  -> FIRMWARE DIGITAL (summing) anti-aliasing filter
  -> Down-sampler x2, x4, x8, x16
  -> 44 kHz
  -> DIGITAL (32 x FIR) anti-aliasing (LP) filter
  -> Down-sampler x32
  -> 4 kHz
  -> DIGITAL (256 x FFT) BP FILTER covering 42 Hz, 57 Hz and 37 Hz

2 * N * log2(N) operations
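To see how far apart the Lab 2 and Lab 4 operation counts are, the two expressions can be evaluated for N = 256. This is only a sketch of the arithmetic; the function names are hypothetical and N is assumed to be a power of two.

```cpp
#include <cassert>

// log2 of a power of two, computed with integer shifts.
// (Hypothetical helper for illustration only.)
long log2_exact(long n) {
    assert(n > 0 && (n & (n - 1)) == 0);  // n must be a power of two
    long bits = 0;
    while (n > 1) {
        n >>= 1;
        ++bits;
    }
    return bits;
}

// Lab 2 count from the slide: three N-tap FIR filters, 3 * N * N.
long fir_bank_ops(long n) {
    return 3 * n * n;
}

// Lab 4 count from the slide: 2 * N * log2(N).
long fft_ops(long n) {
    return 2 * n * log2_exact(n);
}
```

For N = 256 the FIR route costs 196608 operations by this count and the FFT route 4096, a factor of 48.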

Works like this

  Init_UTTCOS
  Add_PreEmptiveThread( )
  While (1)
    Wait
    Dispatch

Pre-emptive task
  1) Do low-pass FIR on new value and place in Output Buffer
  2) When full – down-sample the output buffer (discard 31 out of 32 points) into a new FIFO buffer for the filters
  3) Launch 3 tasks (outside of interrupt handler) to apply FIR filters 2, 3 and 4

Non-pre-emptive task – 3 band-pass FIR filters
  1) Do FIR on new value and place in Output Buffer
  2) Do an average power calculation on the output buffer
  3) If average power is above a certain level then launch a task to turn on the LED, or launch a task to turn it off
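The down-sampling and average power steps in the two tasks can be sketched in C++ as follows. All names here (`downsample`, `average_power`, `led_should_be_on`) are hypothetical, the choice to keep the first sample of each group of 32 is an assumption, and the threshold value is left to the caller since the slide does not give one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Keep one sample out of every `factor` samples (the first of each group),
// discarding the rest. With factor = 32 this discards 31 out of 32 points.
// (Hypothetical helper for illustration only.)
std::vector<float> downsample(const std::vector<float>& in, std::size_t factor) {
    assert(factor > 0);
    std::vector<float> out;
    for (std::size_t i = 0; i < in.size(); i += factor) {
        out.push_back(in[i]);
    }
    return out;
}

// Average power of a buffer: the mean of the squared sample values.
float average_power(const std::vector<float>& buf) {
    if (buf.empty()) {
        return 0.0f;  // nothing to measure yet
    }
    float sum = 0.0f;
    for (float v : buf) {
        sum += v * v;
    }
    return sum / static_cast<float>(buf.size());
}

// Decision used to launch the LED-on or LED-off task.
bool led_should_be_on(const std::vector<float>& buf, float threshold) {
    return average_power(buf) > threshold;
}
```

In the lab structure the pre-emptive task would call something like `downsample(buffer, 32)` once its output buffer is full, and each band-pass task would call `led_should_be_on` on its own output buffer.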

Software FIR Loop -- Used in many places – when functions are called in the loop

Pseudo code
    count = 0;
    sum_Rx = 0;
LOOP_START:
    if count >= MAX jump PAST_LOOP_END
    nop;
    nop;
    LOOP BODY -- sum_Rx = sum_Rx + FIR[I++] ……….
    count++;
    jump LOOP_START;
    nop;
    nop;
PAST_LOOP_END:
    OUTSIDE LOOP

Hardware loop

Pseudo code
    R0 = MAX;
    sum_Rx = 0;
    LCNTR = R0, DO (PC /* LOOP START IS NEXT PC */, PAST_LOOP_END-1) UNTIL LCE;
LOOP_START:
    LOOP BODY -- sum_Rx = sum_Rx + FIR[I++] ……….
PAST_LOOP_END:    // Note PAST_LOOP_END-1
    OUTSIDE LOOP

Hardware loop with SIMD – MAX EVEN

Pseudo code
    R0 = MAX / 2;
    SWITCH TO SIMD
    nop;
    nop;
    sum_Rx = 0;    // Hidden: sum_Sx = 0;
    LCNTR = R0, DO (PC /* LOOP START IS NEXT PC */, PAST_LOOP_END-1) UNTIL LCE;
LOOP_START:
    LOOP BODY -- sum_Rx = sum_Rx + FIR[I++] ……….
    // Hidden -- sum_Sx = sum_Sx + FIR[I++] ……….
    // DOUBLE MEMORY INCREMENT
PAST_LOOP_END:    // Note PAST_LOOP_END-1
    OUTSIDE LOOP
    SWITCH TO SISD
    nop;
    nop;
    Ry = Sx;
    // STALL
    Rx = Rx + Ry;

Hardware loop with SIMD – MAX ODD

Pseudo code
    R0 = MAX / 2;    -- rounds down
    SWITCH TO SIMD
    nop;
    nop;
    sum_Rx = 0;    // Hidden: sum_Sx = 0;
    LCNTR = R0, DO (PC /* LOOP START IS NEXT PC */, PAST_LOOP_END-1) UNTIL LCE;
LOOP_START:
    LOOP BODY -- sum_Rx = sum_Rx + FIR[I++] ……….
    // Hidden -- sum_Sx = sum_Sx + FIR[I++] ……….
    // DOUBLE MEMORY INCREMENT
PAST_LOOP_END:    // Note PAST_LOOP_END-1
    OUTSIDE LOOP
    SWITCH TO SISD
    nop;
    nop;
    Ry = Sx;
    // STALL
    Rx = Rx + Ry;
    sum_Rx = sum_Rx + FIR[I++] …    -- one extra SISD step for the leftover value
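The same loop structure can be mimicked in portable C++ to check the even and odd cases. This is a sketch, not SHARC code: `sum_two_lanes` is a hypothetical name, the two accumulators stand in for the Rx and Sx registers, and the loop body only accumulates `FIR[I++]` as in the pseudo code (whatever the slide elides with "……….", such as a multiply by a coefficient, is left out).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum an array using two accumulators, one per SIMD lane, followed by a
// single extra step for the leftover value when the length (MAX) is odd.
// (Hypothetical helper for illustration only.)
float sum_two_lanes(const std::vector<float>& fir) {
    const std::size_t pairs = fir.size() / 2;  // R0 = MAX / 2, rounds down
    float sum_rx = 0.0f;                       // visible lane (Rx)
    float sum_sx = 0.0f;                       // hidden lane (Sx)
    std::size_t i = 0;
    for (std::size_t count = 0; count < pairs; ++count) {
        sum_rx += fir[i];      // lane 1 takes the even index
        sum_sx += fir[i + 1];  // lane 2 takes the odd index
        i += 2;                // double memory increment
    }
    float sum = sum_rx + sum_sx;  // Ry = Sx; Rx = Rx + Ry
    if (i < fir.size()) {
        sum += fir[i];            // MAX odd: one more step after the loop
    }
    return sum;
}
```

With an even-length input the final `if` never runs, matching the MAX EVEN slide; with an odd-length input it adds the one value the paired loop could not reach.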