
Speech Recognition Front End
[Figure: front-end block diagram with blocks labeled Speech, Pre-emphasis, Windowing, Spectral Analysis, Frequency Features, Temporal Features, Enhance Features, Consolidate Features, and Feature Vectors]
This week's focus is on the spectral analysis.

Spectral Analysis
Goal: Find useful frequency-related features
Approaches
– Without Fourier analysis: apply a bank of recursive band-pass filters, or use linear predictive coding (LPC)
– With Fourier analysis: calculate a Fourier transform, then warp the results onto the Mel scale
Applications
– Auditory models that mimic human hearing
– Eliminate noise by removing non-voice frequencies
– Detect the formants present in the signal
– Perform cepstral analysis to detect pitch and recognize speech
Auditory models: auditory nerves stop responding to extended occurrences of the same frequency
– Idea: De-emphasize frequencies that are present for extended periods
– Result: Effective for speech recognition in noisy environments
This week's emphasis will be on Fourier analysis

The Fourier Transform Family
Fourier Series: A weighted sum of sinusoidal functions that models an arbitrary periodic continuous function
Fourier Transform: A linear operation that maps an arbitrary function defined over an infinite range into a spectrum of its frequency components
Discrete Fourier Transform (DFT): A Fourier transform applied to a finite series of N complex numbers, treated as one period of an infinitely repeating series
Discrete Time Fourier Transform (DTFT): A Fourier transform applied to an aperiodic discrete series of complex numbers that extends from −∞ to +∞
Fast Fourier Transform (FFT): A fast algorithm for calculating the DFT

The Number e
$e = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n$
When n = 1: $e \approx (1 + 1)^1 = 2$
When n = 2: $e \approx (1 + \frac{1}{2})^2 = 9/4 = 2.25$
When n = 3: $e \approx (1 + \frac{1}{3})^3 = 64/27 \approx 2.37037$
When n is extremely large, the value approaches e = 2.718281828…
What does this have to do with sound? Answer: the following slides will tell.
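A quick numerical sketch of the limit, written in Java like the code later in these slides. The class name `EulerLimit` and method `approx` are hypothetical, chosen only for illustration.

```java
final class EulerLimit {
    // n-th term of the sequence (1 + 1/n)^n, which approaches e as n grows.
    static double approx(int n) {
        return Math.pow(1.0 + 1.0 / n, n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 3, 10, 1000, 1_000_000}) {
            System.out.printf("n = %d: %.6f%n", n, approx(n));
        }
        System.out.printf("Math.E = %.6f%n", Math.E);
    }
}
```

The first three values reproduce the hand calculations (2, 2.25, 64/27); the sequence converges slowly, with an error of roughly e/(2n).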

Quick Calculus Review
The derivative of a function at a point is the slope of the function at that point (change in y over change in x).
Example: the derivative of $x^2$ is $2x$ (notation: $(x^2)' = 2x$)
$\lim_{\Delta x\to 0} \frac{(x+\Delta x)^2 - x^2}{\Delta x} = \lim_{\Delta x\to 0} \frac{x^2 + 2x\Delta x + \Delta x^2 - x^2}{\Delta x} = \lim_{\Delta x\to 0} (2x + \Delta x) = 2x$
Tables of derivatives proved by mathematicians exist. We will need these:
– $(x^n)' = n x^{n-1}$
– $(\sin x)' = \cos x$, $(\cos x)' = -\sin x$
– $(e^x)' = e^x$, $(e^{ax})' = a e^{ax}$
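The limit definition can be checked numerically by evaluating the difference quotient for shrinking Δx. This is a minimal sketch; `DerivativeCheck` and `differenceQuotient` are hypothetical names, and `java.util.function.DoubleUnaryOperator` is the standard Java functional interface for a double-to-double function.

```java
final class DerivativeCheck {
    // Difference quotient (f(x + dx) - f(x)) / dx; it approaches f'(x) as dx -> 0.
    static double differenceQuotient(java.util.function.DoubleUnaryOperator f, double x, double dx) {
        return (f.applyAsDouble(x + dx) - f.applyAsDouble(x)) / dx;
    }

    public static void main(String[] args) {
        java.util.function.DoubleUnaryOperator square = x -> x * x;
        // For x^2 at x = 3 the quotient equals 2x + dx = 6 + dx (up to rounding).
        for (double dx : new double[] {1.0, 0.1, 0.001}) {
            System.out.printf("dx = %g: %.6f%n", dx, differenceQuotient(square, 3.0, dx));
        }
    }
}
```

The same helper can be pointed at `Math::sin` or `x -> Math.exp(2 * x)` to check the table entries for sin and $e^{ax}$.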

Complex Numbers
Extend the number line to a plane
– Horizontal axis: real numbers
– Vertical axis: imaginary numbers
– Rectangular notation: a + bi, with a along the real axis and b along the imaginary axis
Operations
– Addition: (a+bi) + (c+di) = (a+c) + (b+d)i
– Multiplication: (a+bi)(c+di) = (ac − bd) + (ad + bc)i
– Division: (a+bi)/(c+di) is solved by multiplying the numerator and denominator by the conjugate of c+di, which is c−di, giving ((ac + bd) + (bc − ad)i)/(c² + d²)
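A minimal immutable complex-number class implementing the operations above. It is a hypothetical helper, not a library class; its `plus`, `minus`, and `times` method names are chosen to match how a `Complex` class is used in the recursive FFT later in these slides.

```java
// Minimal immutable complex number (hypothetical helper, not a library class).
final class Complex {
    final double re, im;

    Complex(double re, double im) { this.re = re; this.im = im; }

    // (a+bi) + (c+di) = (a+c) + (b+d)i
    Complex plus(Complex o) { return new Complex(re + o.re, im + o.im); }

    // (a+bi) - (c+di) = (a-c) + (b-d)i
    Complex minus(Complex o) { return new Complex(re - o.re, im - o.im); }

    // (a+bi)(c+di) = (ac - bd) + (ad + bc)i
    Complex times(Complex o) { return new Complex(re * o.re - im * o.im, re * o.im + im * o.re); }

    // Multiply numerator and denominator by the conjugate c - di:
    // (a+bi)/(c+di) = ((ac + bd) + (bc - ad)i) / (c^2 + d^2)
    Complex divide(Complex o) {
        double denom = o.re * o.re + o.im * o.im;
        return new Complex((re * o.re + im * o.im) / denom, (im * o.re - re * o.im) / denom);
    }

    @Override
    public String toString() { return re + (im < 0 ? " - " : " + ") + Math.abs(im) + "i"; }
}
```

For example, `new Complex(1, 2).times(new Complex(3, 4))` gives −5 + 10i, and dividing that result by 3 + 4i returns 1 + 2i.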

Polar Notation
Polar form gives the distance (magnitude) and angle from the origin.
Rectangular form 4 + 3i converts to polar form (5, 36.87°)
– M = sqrt(4² + 3²) = 5
– θ = arctan(3/4) ≈ 36.87°
Convert back to rectangular: a + ib = M(cos θ + i sin θ)
Note: At 90 and 270 degrees the real part a is 0, so arctan(b/a) involves a divide by zero
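One common way around the divide by zero is `Math.atan2(b, a)`, which takes the two parts separately and also picks the correct quadrant; `Math.hypot` computes the magnitude. This is a sketch with hypothetical names (`PolarDemo`, `toPolar`, `toRectangular`); angles are in degrees to match the slide.

```java
final class PolarDemo {
    // Rectangular a + bi -> {magnitude, angle in degrees}.
    // Math.atan2 needs no division, so a == 0 (90 and 270 degrees) is fine.
    // Angles come back between -180 and 180, so 270 degrees is reported as -90.
    static double[] toPolar(double a, double b) {
        return new double[] { Math.hypot(a, b), Math.toDegrees(Math.atan2(b, a)) };
    }

    // {magnitude M, angle in degrees} -> {a, b}, where a + ib = M(cos theta + i sin theta).
    static double[] toRectangular(double magnitude, double angleDegrees) {
        double theta = Math.toRadians(angleDegrees);
        return new double[] { magnitude * Math.cos(theta), magnitude * Math.sin(theta) };
    }
}
```

`toPolar(4, 3)` gives a magnitude of 5 and an angle of about 36.87 degrees, and `toPolar(0, 2)` gives 90 degrees without any special case.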

Maclaurin Series for e, sin, cos
A Maclaurin series estimates any well-behaved function in terms of polynomials:
$f(x) = \frac{f(0)x^0}{0!} + \frac{f'(0)x^1}{1!} + \cdots + \frac{f^{(n)}(0)x^n}{n!} + \cdots$
Try it out, say, for the third derivative at x = 0. Differentiating the series three times and setting x = 0 leaves only the $x^3$ term:
$f'''(0) = 0 + 0 + 0 + \frac{3\cdot 2\cdot 1\, f'''(0)}{3\cdot 2\cdot 1} + 0 + 0 + \cdots$
All the derivatives match at x = 0.
Series that we will need:
$e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + \cdots$
$\sin x = x - x^3/3! + x^5/5! - x^7/7! + \cdots$
$\cos x = 1 - x^2/2! + x^4/4! - x^6/6! + \cdots$
Another way to calculate e: $e = 1 + 1 + 1/2! + 1/3! + \cdots$ (note: 0! = 1)
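The three series can be summed term by term, computing each term from the previous one rather than from separate powers and factorials. This is a sketch; `MaclaurinSeries` and its method names are hypothetical.

```java
final class MaclaurinSeries {
    // e^x = 1 + x + x^2/2! + ..., using the first `terms` terms.
    static double exp(double x, int terms) {
        double term = 1.0, sum = 0.0; // term = x^n / n!
        for (int n = 0; n < terms; n++) {
            sum += term;
            term *= x / (n + 1); // x^(n+1)/(n+1)! from x^n/n!
        }
        return sum;
    }

    // sin x = x - x^3/3! + x^5/5! - ...
    static double sin(double x, int terms) {
        double term = x, sum = 0.0; // term = (-1)^n x^(2n+1) / (2n+1)!
        for (int n = 0; n < terms; n++) {
            sum += term;
            term *= -x * x / ((2 * n + 2) * (2 * n + 3));
        }
        return sum;
    }

    // cos x = 1 - x^2/2! + x^4/4! - ...
    static double cos(double x, int terms) {
        double term = 1.0, sum = 0.0; // term = (-1)^n x^(2n) / (2n)!
        for (int n = 0; n < terms; n++) {
            sum += term;
            term *= -x * x / ((2 * n + 1) * (2 * n + 2));
        }
        return sum;
    }
}
```

`exp(1.0, terms)` is exactly the "another way to calculate e" series; about 20 terms already match `Math.E` to double precision.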

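Sine, Cosine and e
$e^{i\theta} = 1 + i\theta + \frac{(i\theta)^2}{2!} + \frac{(i\theta)^3}{3!} + \frac{(i\theta)^4}{4!} + \frac{(i\theta)^5}{5!} + \frac{(i\theta)^6}{6!} + \frac{(i\theta)^7}{7!} + \cdots$
(Use $i^2 = -1$ to eliminate higher powers of i)
$= 1 + i\theta - \frac{\theta^2}{2!} - \frac{i\theta^3}{3!} + \frac{\theta^4}{4!} + \frac{i\theta^5}{5!} - \frac{\theta^6}{6!} - \frac{i\theta^7}{7!} + \cdots$
(Gather the real and imaginary terms together)
$= \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \frac{\theta^6}{6!} + \cdots\right) + i\left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \frac{\theta^7}{7!} + \cdots\right)$
(Substitute cos and sin for the series)
$e^{i\theta} = \cos\theta + i\sin\theta$ (This is called Euler's formula)
From the previous slide:
$e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + \cdots$
$\sin x = x - x^3/3! + x^5/5! - x^7/7! + \cdots$
$\cos x = 1 - x^2/2! + x^4/4! - x^6/6! + \cdots$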
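Euler's formula can be checked numerically by summing the series for $e^{i\theta}$ with the real and imaginary parts tracked separately, then comparing against `Math.cos` and `Math.sin`. A minimal sketch; the class and method names are hypothetical.

```java
final class EulerFormulaCheck {
    // Sums the first `terms` terms of the Maclaurin series of e^(i*theta),
    // keeping real and imaginary parts separately. Returns {real, imag}.
    static double[] expI(double theta, int terms) {
        double termRe = 1.0, termIm = 0.0; // current term (i*theta)^n / n!, starting at n = 0
        double sumRe = 0.0, sumIm = 0.0;
        for (int n = 0; n < terms; n++) {
            sumRe += termRe;
            sumIm += termIm;
            // Next term = current term * (i*theta) / (n+1);
            // multiplying (re + i*im) by i gives (-im + i*re).
            double scale = theta / (n + 1);
            double nextRe = -termIm * scale;
            double nextIm = termRe * scale;
            termRe = nextRe;
            termIm = nextIm;
        }
        return new double[] { sumRe, sumIm };
    }

    public static void main(String[] args) {
        double theta = 0.7;
        double[] z = expI(theta, 30);
        System.out.printf("series:  %.12f + %.12fi%n", z[0], z[1]);
        System.out.printf("cos/sin: %.12f + %.12fi%n", Math.cos(theta), Math.sin(theta));
    }
}
```

Because multiplying by i rotates each term by 90 degrees, the real parts collect exactly the cosine series and the imaginary parts the sine series, as in the derivation.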

Key Formulae and Identities
Euler's formula: $e^{ix} = \cos x + i\sin x$
Trigonometric identities:
– $\cos x = \cos(-x)$ and $\sin x = -\sin(-x)$
– $\cos x = (e^{ix} + e^{-ix})/2$ and $\sin x = (e^{ix} - e^{-ix})/(2i)$
– $\sin^2 x + \cos^2 x = 1$
– $\sin(x+y) = \sin x\cos y + \cos x\sin y$
– $\cos(x+y) = \cos x\cos y - \sin x\sin y$

Quick Linear Algebra Review
Linear algebra extends Euclidean space beyond three dimensions.
<3,4,5> represents a vector going from point (0,0,0) to point (3,4,5).
Inner product: sum the products of corresponding coordinates
Two vectors are orthogonal (roughly speaking, perpendicular) if their inner (dot) product equals 0.
– Example: <1,0,0>·<0,1,0> = 1*0 + 0*1 + 0*0 = 0
– Example: <3,1>·<-1,3> = 3*(-1) + 1*3 = 0
Two functions are orthogonal between a and b if $\int_a^b f(x)g(x)\,dx = 0$
A set of functions is mutually orthogonal if $\int_a^b f_i(x)f_j(x)\,dx = 0$ when i ≠ j and $= c > 0$ when i = j
Why do we need this? Orthogonal function sets can be used to decompose or construct signals.

Basis to Span a Space
Consider the orthogonal basis <1,0,0>, <0,1,0>, <0,0,1>
– These form a basis of a three-dimensional space.
– Why? Any three-dimensional vector is a linear combination of these.
– Example: <4,3,2> = 4*<1,0,0> + 3*<0,1,0> + 2*<0,0,1>
Consider the orthogonal basis vectors <1,2>, <-2,1>
– They are orthogonal because <1,2>·<-2,1> = 1*(-2) + 2*1 = 0
Consider the basis vectors <1/√5, 2/√5>, <-2/√5, 1/√5>
– Also orthogonal, because their inner (dot) product is 0
– <1/√5, 2/√5> has a length of unity: ((1/√5)² + (2/√5)²)^½ = 1
– <-2/√5, 1/√5> also has a length of unity (same distance calculation)
Orthonormal basis vectors: orthogonal and of unit length

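Orthogonal and Orthonormal Experiment (intuitive example, not mathematically precise)
Goal: construct <4,7> from basis vectors
– Orthogonal basis: <1,2> and <-2,1>
– <4,7>·<1,2> = 18 and <4,7>·<-2,1> = -1
– 18*<1,2> + (-1)*<-2,1> = <20,35>, which is five times <4,7>
Another experiment
– Orthonormal basis: <1/√5, 2/√5>, <-2/√5, 1/√5>
– <4,7>·<1/√5, 2/√5> = 18/√5 and <4,7>·<-2/√5, 1/√5> = -1/√5
– (18/√5)*<1/√5, 2/√5> + (-1/√5)*<-2/√5, 1/√5> = <18/5 + 2/5, 36/5 - 1/5> = <4,7>
Conclusion: Correlating a vector with each orthonormal basis vector gives the multiple of that basis vector. (The factor of five in the first experiment is the squared length of the unnormalized basis vectors.)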
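The experiment can be reproduced in a few lines: correlate a vector with each basis vector (dot product) and add up coefficient × basis vector. The example uses the vector <4,7> with the basis built from <1,2> and <-2,1>; the class and method names are hypothetical.

```java
final class OrthonormalProjection {
    // Inner product: sum the products of corresponding coordinates.
    static double dot(double[] u, double[] v) {
        double sum = 0.0;
        for (int i = 0; i < u.length; i++) sum += u[i] * v[i];
        return sum;
    }

    // Correlates v with each basis vector and adds up coefficient * basis vector.
    // With an orthonormal basis this rebuilds v; with an orthogonal but
    // unnormalized basis, each term is scaled by that vector's squared length.
    static double[] reconstruct(double[] v, double[][] basis) {
        double[] result = new double[v.length];
        for (double[] b : basis) {
            double coefficient = dot(v, b);
            for (int i = 0; i < v.length; i++) result[i] += coefficient * b[i];
        }
        return result;
    }
}
```

With the unnormalized basis {<1,2>, <-2,1>}, `reconstruct` returns <20,35>; with the basis scaled by 1/√5 it returns <4,7>.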

Fourier Series
A Fourier series is a sum (possibly, but not necessarily, infinite) of sines and cosines that models a continuous signal. Fourier modeling allows us to decompose a signal, process the components, and recombine the results to solve the original problem.

Fourier Decomposition
The top signal decomposes into nine cosine and sine waves.

Fourier Square Wave Synthesis
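A small synthesis sketch: a square wave alternating between +1 and −1 with period 2π has the Fourier series (4/π)·Σ sin(kt)/k over odd k, so adding more odd harmonics brings the partial sum closer to the square shape. The class and method names are hypothetical.

```java
final class SquareWaveSynthesis {
    // Partial Fourier series of a square wave that is +1 on (0, pi) and -1 on (pi, 2*pi):
    // x(t) ~ (4/pi) * sum over odd k of sin(k t) / k.
    // `harmonics` is the number of odd harmonics (k = 1, 3, 5, ...) to add.
    static double partialSum(double t, int harmonics) {
        double sum = 0.0;
        for (int n = 0; n < harmonics; n++) {
            int k = 2 * n + 1; // even harmonics have zero weight for this wave
            sum += Math.sin(k * t) / k;
        }
        return 4.0 / Math.PI * sum;
    }

    public static void main(String[] args) {
        // More harmonics bring the value at t = pi/2 closer to +1.
        for (int h : new int[] {1, 3, 9, 100}) {
            System.out.printf("%d harmonics: %.4f%n", h, partialSum(Math.PI / 2, h));
        }
    }
}
```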

Fourier Cosine Series
The set of functions $\{\cos(k\,2\pi F_0 t)\}$, where k is an integer ≥ 0 and $F_0 = 1/T$:
– Mutually orthogonal from −T to T: $\int_{-T}^{T} \cos(k_1 2\pi F_0 t)\cos(k_2 2\pi F_0 t)\,dt = 0$ if $k_1 \ne k_2$, and $\ne 0$ if $k_1 = k_2$
– The proof requires some calculus, namely integration
$x(t) = a_0\cos(0\cdot 2\pi F_0 t) + a_1\cos(1\cdot 2\pi F_0 t) + a_2\cos(2\cdot 2\pi F_0 t) + \cdots$
$x(t) = a_0 + \sum_{k=1}^{\infty} a_k \cos(k\,2\pi F_0 t)$, where $F_0 = 1/T$
Comment: The series doesn't include phases; if we add phases, we have twice as many unknowns to compute.
[Figure: cos(πx/3), cos(2πx/3), and the integral of their product cos(πx/3)·cos(2πx/3)]
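The orthogonality of cos(πx/3) and cos(2πx/3) can be checked with a numerical integral. These functions correspond to F₀ = 1/6, so one full period is [−3, 3]. This sketch uses the midpoint rule; the class and method names are hypothetical.

```java
final class CosineOrthogonality {
    // Midpoint-rule approximation of the integral of f over [a, b] using n slices.
    static double integrate(java.util.function.DoubleUnaryOperator f, double a, double b, int n) {
        double h = (b - a) / n, sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += f.applyAsDouble(a + (i + 0.5) * h);
        }
        return sum * h;
    }

    // Integral of cos(k1*pi*x/3) * cos(k2*pi*x/3) over one full period, [-3, 3].
    static double cosineInnerProduct(int k1, int k2) {
        return integrate(x -> Math.cos(k1 * Math.PI * x / 3) * Math.cos(k2 * Math.PI * x / 3), -3, 3, 10_000);
    }
}
```

`cosineInnerProduct(1, 2)` comes out as essentially 0, while `cosineInnerProduct(1, 1)` is 3 (half the period length), which is the "≠ 0 if k₁ = k₂" case.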

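A General Orthogonal Function Set
Euler's equation: $e^{i\varphi} = \cos\varphi + i\sin\varphi$
– Radius = magnitude (always unity); φ = phase
Consider the function set $\{e^{i\omega_k t}\}$
– Angular frequency: $\omega_k = 2\pi k F_0 = 2\pi k/T_0$
– $F_0$, $T_0$: fundamental frequency and period
– k = the speed at which $e^{i\omega_k t}$ traverses the circle
– Orthogonal over one period because $\int_0^{T_0} e^{j\omega_n t}\,\overline{e^{j\omega_m t}}\,dt = \int_0^{T_0} e^{j(\omega_n - \omega_m)t}\,dt = 0$ whenever n ≠ m (equivalently, $\int_0^{T_0} e^{j\omega_n t}e^{j\omega_m t}\,dt = 0$ whenever n ≠ −m)
Notes
1. The book uses j instead of i
2. Electrical engineers prefer j
3. Mathematicians prefer i
4. Get used to both!
5. In the diagram, φ = 2πF₀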

Orthogonality Example
Left: correlating the top signal with the middle signal gives a product (bottom) whose area is not 0.
Right: correlating the top signal with the middle signal gives a product (bottom) whose area is 0.
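On N samples the orthogonality integral becomes a sum, which is the form the DFT relies on. This sketch (hypothetical class and method names) correlates $e^{i2\pi kn/N}$ with the conjugate of $e^{i2\pi mn/N}$ over one period; the product simplifies to $e^{i2\pi(k-m)n/N}$.

```java
final class ComplexExponentialOrthogonality {
    // Sum over n = 0..N-1 of e^{i 2 pi k n / N} times the conjugate of e^{i 2 pi m n / N},
    // which simplifies to e^{i 2 pi (k - m) n / N}. Returns {real, imag}.
    static double[] innerProduct(int k, int m, int N) {
        double re = 0.0, im = 0.0;
        for (int n = 0; n < N; n++) {
            double angle = 2 * Math.PI * (k - m) * n / N;
            re += Math.cos(angle);
            im += Math.sin(angle);
        }
        return new double[] { re, im };
    }
}
```

The result is N when k = m and 0 otherwise. One caveat of sampling: k and m that differ by a multiple of N (for example 3 and 19 with N = 16) are indistinguishable and also give N.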

Putting It All Together
$\{e^{i\omega_k t}\}$ is an orthogonal basis for signals
– Each function $e^{i\omega_k t}$ is a basis function
– We can use the basis functions to synthesize signals
Synthesize (Fourier series)
– Source: frequency magnitudes; sink: time signal
– $x(t) = \sum_k a_k e^{ik\omega_0 t}$, where x(t) is the signal at time t and $a_k$ is the (complex) magnitude of frequency $k\omega_0$
– The number of basis functions may be infinite
For computer processing, we need a discrete counterpart
– Why? We don't want to deal with infinitely many points or basis functions
– $x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\,e^{i2\pi kn/N}$
– k determines how fast each term traverses the circle (higher k, faster)
– N basis functions and N frequencies
Note: For periodic functions, we can use [0, T] instead of (−∞, ∞)

Fourier Analysis
Goal: Compute the coefficients given the signal, where $\omega_0 = 2\pi/T_0$.
Synthesis equation: $x(t) = \sum_{m=-\infty}^{\infty} a_m e^{im\omega_0 t}$
Multiply both sides by $e^{-ik\omega_0 t}$: $x(t)e^{-ik\omega_0 t} = \sum_{m=-\infty}^{\infty} a_m e^{im\omega_0 t}e^{-ik\omega_0 t}$
Integrate over one period, from 0 to $T_0$: $\int_0^{T_0} x(t)e^{-ik\omega_0 t}\,dt = \sum_{m=-\infty}^{\infty} a_m \int_0^{T_0} e^{i(m-k)\omega_0 t}\,dt$
Each integral on the right is zero except when m = k, where the integrand is 1:
$\int_0^{T_0} x(t)e^{-ik\omega_0 t}\,dt = a_k\int_0^{T_0} dt = a_k T_0$
The answer (value of coefficient k): $a_k = \frac{1}{T_0}\int_0^{T_0} x(t)e^{-ik\omega_0 t}\,dt$
Note: $1/T_0$ is simply a constant that scales the result
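The analysis formula for $a_k$ can be approximated with a sum over equally spaced samples in one period. The test signal below, 3 + 2cos(ω₀t) + sin(2ω₀t) with T₀ = 2, is a made-up example whose coefficients are known: a₀ = 3, a₁ = 1, and a₂ = −0.5i (since sin θ = (e^{iθ} − e^{−iθ})/(2i)). Class and method names are hypothetical.

```java
final class FourierCoefficient {
    // Approximates a_k = (1/T0) * integral from 0 to T0 of x(t) e^{-i k w0 t} dt,
    // with w0 = 2*pi/T0, using a Riemann sum over `samples` equally spaced points.
    // Returns {real part, imaginary part}.
    static double[] coefficient(java.util.function.DoubleUnaryOperator x, double T0, int k, int samples) {
        double w0 = 2 * Math.PI / T0;
        double dt = T0 / samples;
        double re = 0.0, im = 0.0;
        for (int n = 0; n < samples; n++) {
            double t = n * dt;
            double value = x.applyAsDouble(t);
            // e^{-i k w0 t} = cos(k w0 t) - i sin(k w0 t)
            re += value * Math.cos(k * w0 * t);
            im -= value * Math.sin(k * w0 * t);
        }
        return new double[] { re * dt / T0, im * dt / T0 };
    }
}
```

For a signal built from a few harmonics, a sum over one full period is accurate to rounding error as long as the number of samples is well above the highest harmonic.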

Discrete Version
Definition: continuous Fourier transform and inverse
– Transform: $X(\omega) = \int_{-\infty}^{\infty} x(t)e^{-i\omega t}\,dt$
– Inverse: $x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(\omega)e^{i\omega t}\,d\omega$
Convert from the continuous version:
– Evaluate at N equally spaced points (the period is now N samples)
– Use sums to approximate the integrals
– Note: x(t) is the value at time t; x[n] is x(t) evaluated at the nth sample time
Discrete Fourier transform and inverse
– Transform: $X[k] = \sum_{n=0}^{N-1} x[n]\,e^{-i2\pi kn/N}$
– Inverse: $x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\,e^{i2\pi kn/N}$
Note: X[k] is a complex number representing magnitude and phase
Conclusion: We can go back and forth between the time and frequency domains
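A direct sketch of the DFT pair, $X[k] = \sum_n x[n]e^{-i2\pi kn/N}$ and $x[n] = \frac{1}{N}\sum_k X[k]e^{i2\pi kn/N}$, showing the round trip between domains. Real and imaginary parts are kept in separate arrays here; that layout and the names `DftRoundTrip`, `forward`, and `inverse` are illustrative choices.

```java
final class DftRoundTrip {
    // Computes sum over n of (re[n] + i im[n]) * e^{sign * i 2 pi k n / N} for every k.
    // sign = -1 gives the forward DFT; sign = +1 (then divide by N) gives the inverse.
    static double[][] transform(double[] re, double[] im, int sign) {
        int N = re.length;
        double[] outRe = new double[N], outIm = new double[N];
        for (int k = 0; k < N; k++) {
            for (int n = 0; n < N; n++) {
                double angle = sign * 2 * Math.PI * k * n / N;
                double c = Math.cos(angle), s = Math.sin(angle);
                // (re + i im)(c + i s) = (re c - im s) + i (re s + im c)
                outRe[k] += re[n] * c - im[n] * s;
                outIm[k] += re[n] * s + im[n] * c;
            }
        }
        return new double[][] { outRe, outIm };
    }

    static double[][] forward(double[] re, double[] im) {
        return transform(re, im, -1);
    }

    static double[][] inverse(double[] re, double[] im) {
        double[][] r = transform(re, im, +1);
        int N = re.length;
        for (int n = 0; n < N; n++) {
            r[0][n] /= N;
            r[1][n] /= N;
        }
        return r;
    }
}
```

For x = {1, 2, 3, 4}, `forward` gives X = {10, −2 + 2i, −2, −2 − 2i}, and `inverse` of that restores {1, 2, 3, 4}.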

Signal Plot
The phases are shown in the spectrum plot in the complex plane. The phase affects how the time-domain signal looks, but the amplitudes in the spectrum plot remain constant regardless of phase.

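Fourier Transform of Square Wave
Fourier transforms exhibit the property of duality
– A square (rectangular) window in frequency corresponds to a sinc function in time, and vice versa
– Convolution in time corresponds to multiplication in frequency, and vice versa
Proof with calculus, for the pulse x(t) = 1 when −1/2 ≤ t ≤ 1/2 and 0 otherwise:
$X(\omega) = \int_{-\infty}^{\infty} x(t)e^{-j\omega t}\,dt = \int_{-1/2}^{1/2} e^{-j\omega t}\,dt = \left.\frac{e^{-j\omega t}}{-j\omega}\right|_{-1/2}^{1/2} = \frac{1}{j\omega}\left(e^{j\omega/2} - e^{-j\omega/2}\right) = \frac{\sin(\omega/2)}{\omega/2}$
[Figure: rectangular pulse from −1/2 to 1/2]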
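The closed form can be checked by integrating $e^{-j\omega t}$ over the pulse numerically and comparing with sin(ω/2)/(ω/2). A sketch using the midpoint rule; the class and method names are hypothetical.

```java
final class RectPulseTransform {
    // Midpoint-rule approximation of X(w) = integral from -1/2 to 1/2 of e^{-j w t} dt
    // for the unit pulse x(t) = 1 on [-1/2, 1/2]. Returns {real, imag}.
    static double[] transform(double w, int slices) {
        double dt = 1.0 / slices, re = 0.0, im = 0.0;
        for (int i = 0; i < slices; i++) {
            double t = -0.5 + (i + 0.5) * dt;
            re += Math.cos(w * t) * dt; // e^{-j w t} = cos(w t) - j sin(w t)
            im -= Math.sin(w * t) * dt;
        }
        return new double[] { re, im };
    }

    // Closed form sin(w/2) / (w/2), using its limit 1 at w = 0.
    static double sinc(double w) {
        return w == 0.0 ? 1.0 : Math.sin(w / 2) / (w / 2);
    }
}
```

The imaginary part cancels because the pulse is even (symmetric about t = 0), so the transform is purely real.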

Complex DFT by Correlation
    // time[] holds N complex samples: even indices = real part, odd indices = imaginary part
    static double[] DFT(double[] time, int N) {
        double[] freq = new double[2 * N];
        for (int k = 0; k < N; k++) {
            for (int i = 0; i < N; i++) {
                // e^(-i2PIki/N) = cos(2PIki/N) - i sin(2PIki/N)
                double real = Math.cos(2 * Math.PI * k * i / N);
                double imag = -Math.sin(2 * Math.PI * k * i / N);
                // Complex multiply sample i by e^(-i2PIki/N) and accumulate
                freq[2*k]   += time[2*i] * real - time[2*i+1] * imag;
                freq[2*k+1] += time[2*i+1] * real + time[2*i] * imag;
            }
        }
        return freq;
    }
Note: even indices = real part, odd indices = imaginary part
Complexity: O(N²) because of the two nested loops of N iterations each
Example: for 512 samples, the inner loop body runs 262,144 times
Evaluation: too slow, but the FFT is O(N lg N)

The FFT Algorithm
The FFT algorithm is based on divide-and-conquer.
The running time complexity is O(n log n).

Why Do We Need the FFT?
The correlation algorithm is O(N²), which is too slow to be practical even on today's processors
An optimized FFT is O(N lg N), which is orders of magnitude faster for large N
Assume 512 elements in a window:
– O(N) = C * 512
– O(N²) = C * 512 * 512 = C * 262,144
– O(N lg N) = C * 512 * 9 = C * 4,608 (about 57 times less work than O(N²))

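Theory for Optimization
Base case: a one-point DFT is the point itself, X[0] = x[0]
Recursive relationship (split the sum into even- and odd-indexed samples):
$\sum_{t=0}^{N-1} x[t]e^{-i2\pi kt/N} = \sum_{t=0}^{N/2-1} x[2t]e^{-i2\pi k(2t)/N} + \sum_{t=0}^{N/2-1} x[2t+1]e^{-i2\pi k(2t+1)/N}$
$= \sum_{t=0}^{N/2-1} x[2t]e^{-i2\pi kt/(N/2)} + \sum_{t=0}^{N/2-1} x[2t+1]e^{-i2\pi k(2t+1)/N}$
$= \sum_{t=0}^{N/2-1} x[2t]e^{-i2\pi kt/(N/2)} + e^{-i2\pi k/N}\sum_{t=0}^{N/2-1} x[2t+1]e^{-i2\pi kt/(N/2)}$
$= F_k^{\text{even}} + e^{-i2\pi k/N} F_k^{\text{odd}}$
Note: the work at each level is O(N), and there are lg N levels, so the total is O(N lg N)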
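The split can be verified on a small input: compute the DFT directly, then rebuild it from the half-length DFTs of the even- and odd-indexed samples. Because $e^{-i2\pi(k+N/2)/N} = -e^{-i2\pi k/N}$, the second half uses the same product with a minus sign. A sketch with hypothetical names; the input length must be even.

```java
final class EvenOddSplitCheck {
    // Direct (correlation) DFT of a real signal; returns {re[], im[]}.
    static double[][] dft(double[] x) {
        int N = x.length;
        double[] re = new double[N], im = new double[N];
        for (int k = 0; k < N; k++) {
            for (int t = 0; t < N; t++) {
                double angle = -2 * Math.PI * k * t / N;
                re[k] += x[t] * Math.cos(angle);
                im[k] += x[t] * Math.sin(angle);
            }
        }
        return new double[][] { re, im };
    }

    // Rebuilds the DFT of x (N even) from the DFTs of its even and odd samples,
    // with W^k = e^{-i 2 pi k / N}:
    //   X[k]       = E[k] + W^k O[k]
    //   X[k + N/2] = E[k] - W^k O[k]   (because W^(k + N/2) = -W^k)
    static double[][] combine(double[] x) {
        int N = x.length, half = N / 2;
        double[] even = new double[half], odd = new double[half];
        for (int t = 0; t < half; t++) {
            even[t] = x[2 * t];
            odd[t] = x[2 * t + 1];
        }
        double[][] E = dft(even), O = dft(odd);
        double[] re = new double[N], im = new double[N];
        for (int k = 0; k < half; k++) {
            double wRe = Math.cos(-2 * Math.PI * k / N), wIm = Math.sin(-2 * Math.PI * k / N);
            double tRe = wRe * O[0][k] - wIm * O[1][k]; // real part of W^k * O[k]
            double tIm = wRe * O[1][k] + wIm * O[0][k]; // imaginary part of W^k * O[k]
            re[k] = E[0][k] + tRe;
            im[k] = E[1][k] + tIm;
            re[k + half] = E[0][k] - tRe;
            im[k + half] = E[1][k] - tIm;
        }
        return new double[][] { re, im };
    }
}
```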

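Simple Recursive FFT Solution
    // Assumes a Complex class with plus, minus, and times methods; N must be a power of 2
    static Complex[] fft(Complex[] x) {
        int N = x.length;
        if (N == 1) return new Complex[] { x[0] };   // base case
        // Split into even- and odd-indexed samples
        Complex[] even = new Complex[N/2];
        Complex[] odd = new Complex[N/2];
        for (int m = 0; m < N/2; m++) {
            even[m] = x[2*m];
            odd[m] = x[2*m+1];
        }
        Complex[] q = fft(even), r = fft(odd);
        // Combine: y[k] = q[k] + wk*r[k], y[k+N/2] = q[k] - wk*r[k]
        Complex[] y = new Complex[N];
        for (int k = 0; k < N/2; k++) {
            double exp = -2 * k * Math.PI / N;
            Complex wk = new Complex(Math.cos(exp), Math.sin(exp));   // e^(-i2PIk/N)
            y[k] = q[k].plus(wk.times(r[k]));
            y[k + N/2] = q[k].minus(wk.times(r[k]));
        }
        return y;
    }
Note: $e^{-i2\pi(k+N/2)/N} = -e^{-i2\pi k/N}$, so y[k+N/2] subtracts the same product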

Inefficiencies
– The Complex class creates many small objects, causing many memory jumps and putting pressure on the hardware cache
– Allocating and copying arrays at every step slows things down by at least half
– Repeated calculations of sines and cosines are extremely slow
– N>>1 (a shift) can be faster than N/2 (a division)
– The overhead of creating an activation record for each recursive call is significant
– The computations are still an order of magnitude slower than needed

Eliminating the Recursion
The numbers in the rectangles are the array indices, written in binary. You see the original indices as we pass through each level of recursion. Can you see a pattern?
– Original order: 000 001 010 011 100 101 110 111
– After the first even/odd split: 000 010 100 110 001 011 101 111
– After the second split (final order): 000 100 010 110 001 101 011 111
Butterfly algorithm
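The pattern is bit reversal: the element that ends up in position i is the one whose original index is i with its lg N bits written backwards (position 001 holds index 100, and so on). A sketch using the standard `Integer.reverse` method; the class and method names are hypothetical.

```java
final class BitReversal {
    // Reverses the lowest `bits` bits of i, e.g. 001 -> 100 when bits = 3.
    static int reverseBits(int i, int bits) {
        return Integer.reverse(i) >>> (Integer.SIZE - bits);
    }

    // Final order of the original indices after all levels of even/odd splitting.
    // n must be a power of 2 (n >= 2).
    static int[] order(int n) {
        int bits = Integer.numberOfTrailingZeros(n); // lg n for a power of 2
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = reverseBits(i, bits);
        }
        return result;
    }
}
```

`order(8)` returns {0, 4, 2, 6, 1, 5, 3, 7}, which is the final row of the diagram read as decimal numbers.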

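Butterfly Code
    // Reorder x[0..N-1] into bit-reversed order (for complex data, swap both parts)
    int j = N >> 1, k;                  // j = bit reversal of i; starts at rev(1) = N/2
    for (int i = 1; i < N - 1; i++) {   // indices 0 and N-1 never move
        if (i < j) {                    // swap each pair only once
            double temp = x[i];
            x[i] = x[j];
            x[j] = temp;
        }
        k = N >> 1;
        while (k >= 2 && j >= k) {      // clear set bits, starting at the most significant bit
            j -= k;
            k >>= 1;
        }
        j += k;                         // set the first clear bit: j = bit reversal of i + 1
    }
Most significant bit: SwapBit(x, x + lg N)
Second most significant bit: SwapBit(x, x + lg(N/2))
Third most significant bit: SwapBit(x, x + lg(N/4))
kth most significant bit: SwapBit(x, x + lg(N/2^k))
Flip bits from left to right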

Sine and Cosine Table Lookup
$e^{i2\pi k/N} = \cos(2\pi k/N) + i\sin(2\pi k/N)$
We can store sin(2π·0/N), sin(2π·1/N), sin(2π·2/N), sin(2π·3/N), …, sin(2π(N−1)/N) in an array (sinX[])
cos(2πk/N) = sinX[(k + (N>>2)) % N], because cos θ = sin(θ + π/2) and a quarter period is N/4 entries
Compute the values ahead of time to save repetitive calculations
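A sketch of the lookup table: one array of sines, with cosines read a quarter period (N/4 entries) further along. The class name `TrigTable` and its methods are hypothetical; the table size should be a multiple of 4, and indices are assumed non-negative.

```java
final class TrigTable {
    private final double[] sines; // sines[k] = sin(2 pi k / n)
    private final int n;

    // n should be a multiple of 4 so that a quarter period is a whole number of entries.
    TrigTable(int n) {
        this.n = n;
        this.sines = new double[n];
        for (int k = 0; k < n; k++) {
            sines[k] = Math.sin(2 * Math.PI * k / n);
        }
    }

    // sin(2 pi k / n) for k >= 0.
    double sin(int k) {
        return sines[k % n];
    }

    // cos(theta) = sin(theta + pi/2), and pi/2 is n/4 entries further along.
    double cos(int k) {
        return sines[(k + (n >> 2)) % n];
    }
}
```

The trigonometric functions are evaluated n times when the table is built, instead of once per butterfly at every stage.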

Optimized FFT – after butterfly code
    // Perform the FFT calculations.
    kInc = N >> 1;   // Number of 2PIk/N table steps for odd/even entries; set once, halved per stage
    for (int stage = 1; stage <= M; stage++)   // M = lg N
    {
        // Remember that complex numbers require pairs of doubles
        fftSubGroupGap = 2 << stage;   // 4, 8, 16,... : subgroup distance
        gap = fftSubGroupGap >> 1;     // 2, 4, 8,...  : odd/even distance
        // Outer loop: each sub-FFT group; inner loop: combine group elements
        for (int even = 0; even < complex.length; even += fftSubGroupGap)
        {
            k = 0;   // Index into the trigonometric lookup table.
            for (int element = even; element < even + gap; element += 2)
            {
                // ***** See next slide *****
                k += kInc;   // Position for next lookup.
            }
        }
        kInc >>= 1;
    }

Multiplication Portion
    // Look up e^(-2PIik/N), avoiding trig calculations here.
    realW = sines[(k + (N >> 2)) % N];   // cos(2PIk/N)
    imagW = -sines[k % N];               // -sin(2PIk/N)
    // Complex multiplication of the odd entry of the subgroup
    // with e^(-2PIik/N) = cos(2PIk/N) - i * sin(2PIk/N)
    j = element + gap;
    tempReal = realW * complex[j] - imagW * complex[j + 1];
    tempImag = realW * complex[j + 1] + imagW * complex[j];
    // Adjust the odd entry (subtract, because e^(-2PIi(k+N/2)/N) = -e^(-2PIik/N)).
    complex[j] = complex[element] - tempReal;
    complex[j + 1] = complex[element + 1] - tempImag;
    // Adjust the even entry.
    complex[element] += tempReal;
    complex[element + 1] += tempImag;
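Putting the butterfly reordering, the sine table, the stage loops, and the multiplication portion together gives one self-contained in-place routine. This is a sketch under stated assumptions: the class name `IterativeFft` is hypothetical, variable names follow the slides, the input is N complex numbers stored as (real, imaginary) pairs, and N must be a power of 2 that is at least 4 (so that the quarter-period cosine offset N/4 is a whole number of table entries).

```java
final class IterativeFft {
    // In-place radix-2 FFT. `complex` holds N complex numbers as (real, imaginary)
    // pairs, so its length is 2N. N must be a power of 2 and at least 4.
    static void fft(double[] complex) {
        int N = complex.length / 2;
        int M = Integer.numberOfTrailingZeros(N); // M = lg N

        // Sine table: sines[t] = sin(2 pi t / N).
        double[] sines = new double[N];
        for (int t = 0; t < N; t++) {
            sines[t] = Math.sin(2 * Math.PI * t / N);
        }

        // Butterfly (bit-reversal) reordering, swapping whole (re, im) pairs.
        int j = N >> 1, k;
        for (int i = 1; i < N - 1; i++) {
            if (i < j) {
                double tempRe = complex[2 * i], tempIm = complex[2 * i + 1];
                complex[2 * i] = complex[2 * j];
                complex[2 * i + 1] = complex[2 * j + 1];
                complex[2 * j] = tempRe;
                complex[2 * j + 1] = tempIm;
            }
            k = N >> 1;
            while (k >= 2 && j >= k) {
                j -= k;
                k >>= 1;
            }
            j += k;
        }

        // Combine sub-FFTs of 2, 4, 8, ... complex numbers.
        int kInc = N >> 1; // table step; halves at every stage
        for (int stage = 1; stage <= M; stage++) {
            int fftSubGroupGap = 2 << stage; // subgroup size in doubles
            int gap = fftSubGroupGap >> 1;   // even/odd distance in doubles
            for (int even = 0; even < complex.length; even += fftSubGroupGap) {
                k = 0;
                for (int element = even; element < even + gap; element += 2) {
                    double realW = sines[(k + (N >> 2)) % N]; // cos(2 pi k / N)
                    double imagW = -sines[k % N];             // -sin(2 pi k / N)
                    j = element + gap;
                    double tempReal = realW * complex[j] - imagW * complex[j + 1];
                    double tempImag = realW * complex[j + 1] + imagW * complex[j];
                    complex[j] = complex[element] - tempReal;
                    complex[j + 1] = complex[element + 1] - tempImag;
                    complex[element] += tempReal;
                    complex[element + 1] += tempImag;
                    k += kInc;
                }
            }
            kInc >>= 1;
        }
    }
}
```

`kInc` is initialized once before the stage loop and halved after each stage, so the table step follows the subgroup size (N/2 for 2-point groups, down to 1 for the final N-point combine).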

Final Notes
Standard Fast Fourier Transform
– Requires N to be a power of 2 for the recursion to work
– Can pad the array with zeroes to reach a power of 2 (this extends the frequency domain to more points)
Can it work if N is not a power of 2?
– Yes, but special, slower processing is needed
How do we know if it works?
– For a real-valued signal: point N/2−1 mirrors point N/2+1, point N/2−2 mirrors point N/2+2, and in general point N/2−k mirrors point N/2+k (equal real parts, opposite imaginary parts)
– Note: points 0 and N/2 have no mirror partner, so don't check these
– The inverse FFT should restore the time-domain signal
– Compare to the slower correlation DFT calculation
– Try some simple impulses and check the results
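A small harness for some of these checks. `dft` here is the slow correlation calculation, usable as a reference that an FFT's output could be compared against; `mirrorSymmetric` checks that points N/2−k and N/2+k are complex conjugates for a real-valued signal; `impulse` builds a simple test input. All names are hypothetical.

```java
final class FftChecks {
    // Reference correlation DFT of a real signal; returns {re[], im[]}.
    static double[][] dft(double[] x) {
        int N = x.length;
        double[] re = new double[N], im = new double[N];
        for (int k = 0; k < N; k++) {
            for (int n = 0; n < N; n++) {
                double angle = -2 * Math.PI * k * n / N;
                re[k] += x[n] * Math.cos(angle);
                im[k] += x[n] * Math.sin(angle);
            }
        }
        return new double[][] { re, im };
    }

    // For a real-valued signal, points N/2 - k and N/2 + k are complex conjugates:
    // equal real parts, opposite imaginary parts. Points 0 and N/2 are skipped.
    static boolean mirrorSymmetric(double[][] X, double tol) {
        int N = X[0].length;
        for (int k = 1; k < N / 2; k++) {
            int lo = N / 2 - k, hi = N / 2 + k;
            if (Math.abs(X[0][lo] - X[0][hi]) > tol || Math.abs(X[1][lo] + X[1][hi]) > tol) {
                return false;
            }
        }
        return true;
    }

    // Unit impulse of length N with a 1 at `position`.
    static double[] impulse(int N, int position) {
        double[] x = new double[N];
        x[position] = 1.0;
        return x;
    }
}
```

An impulse at position 0 should transform to 1 at every frequency, and an impulse at any other position should give magnitude 1 everywhere with a phase that rotates with k.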
