Published by Daniella Tucker. Modified over 9 years ago.
1
Comparing Audio Signals
What makes it difficult?
– Phase misalignment
– Deeper peaks and valleys
– Pitch misalignment
– Energy misalignment
– Embedded noise
– Length of vowels
– Phoneme variance
2
Review: Minimum Distance Algorithm

        E  X  E  C  U  T  I  O  N
     0  1  2  3  4  5  6  7  8  9
  I  1  1  2  3  4  5  6  6  7  8
  N  2  2  2  3  4  5  6  7  7  7
  T  3  3  3  3  4  5  5  6  7  8
  E  4  3  4  3  4  5  6  6  7  8
  N  5  4  4  4  4  5  6  7  7  7
  T  6  5  5  5  5  5  5  6  7  8
  I  7  6  6  6  6  6  6  5  6  7
  O  8  7  7  7  7  7  7  6  5  6
  N  9  8  8  8  8  8  8  7  6  5

Array[i,j] = min{1 + Array[i-1,j], cost(i,j) + Array[i-1,j-1], 1 + Array[i,j-1]}
3
Pseudo Code (minDistance(target, source))
n = number of characters in source
m = number of characters in target
Create array, distance, with dimensions n+1, m+1
FOR r=0 TO n
    distance[r,0] = r
FOR c=0 TO m
    distance[0,c] = c
FOR each row r from 1 TO n
    FOR each column c from 1 TO m
        IF source[r]=target[c] cost = 0 ELSE cost = 1   //characters are indexed from 1
        distance[r,c] = minimum of
            distance[r-1,c] + 1,       //deletion of source[r]
            distance[r,c-1] + 1,       //insertion of target[c]
            distance[r-1,c-1] + cost   //substitution
Result is in distance[n,m]
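The pseudo code above can be sketched in Python roughly as follows. The function name `min_distance` and the unit costs are taken from the slide; the 0-based string indexing (`source[r - 1]`) is an adaptation for Python, not something the slide specifies.

```python
def min_distance(target: str, source: str) -> int:
    """Unit-cost minimum edit distance between source and target."""
    n, m = len(source), len(target)
    # distance[r][c] = cost of turning the first r source characters
    # into the first c target characters.
    distance = [[0] * (m + 1) for _ in range(n + 1)]
    for r in range(n + 1):
        distance[r][0] = r          # delete r source characters
    for c in range(m + 1):
        distance[0][c] = c          # insert c target characters
    for r in range(1, n + 1):
        for c in range(1, m + 1):
            # Python strings are 0-based, so row r refers to source[r - 1].
            cost = 0 if source[r - 1] == target[c - 1] else 1
            distance[r][c] = min(
                distance[r - 1][c] + 1,         # deletion
                distance[r][c - 1] + 1,         # insertion
                distance[r - 1][c - 1] + cost,  # substitution (or match)
            )
    return distance[n][m]


print(min_distance("EXECUTION", "INTENTION"))  # 5
```

The printed value matches the bottom-right cell of the INTENTION/EXECUTION table on the review slide.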
4
Is Minimum Distance Applicable?
Maybe?
– The optimal distance at indices [a,b] is a function of the costs at smaller indices.
– This suggests that a dynamic programming approach may work.
Problems
– The cost function is more complex. A binary equal-or-not-equal test doesn’t work.
– We need to define a distance metric. But what should that metric be? Answer: It depends on which audio features we use.
– Longer vowels may still represent the same speech. The classical solution is to apply no cost when going from index [i-1,j] or [i,j-1] to [i,j]. Unfortunately, this assumption can lead to singularities, which result in incorrect comparisons.
5
Complexity of Minimum Distance
The basic algorithm is O(m*n), where m is the length (in samples) of one audio signal and n is the length of the other. If m=n, the algorithm is O(n^2). Why? Count the number of cells that need to be filled in.
O(n^2) may be too slow. Alternate solutions have been devised.
– Don’t fill in all of the cells.
– Use a multi-level approach.
Question: Are the faster approaches needed for our purposes? Perhaps not!
6
Don’t Fill in all of the Cells
Problem: May miss the optimal minimum distance path
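One common way to skip cells is to fill only a band around the diagonal. The slide does not say which restriction it has in mind, so the band below is an assumption, as are the name `banded_dtw`, the `band` parameter, and the use of the absolute difference between samples as the local cost. The sketch also shows the problem noted above: a band that is too narrow excludes the best path.

```python
import math


def banded_dtw(a, b, band):
    """Warped distance between numeric sequences a and b, filling only
    the cells with |i - j| <= band. Returns math.inf if no path from
    the first cell to the last one stays inside the band."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        # Only columns inside the band are visited for this row.
        for j in range(max(1, i - band), min(m, i + band) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


a = [1, 1, 1, 1, 2, 3]
b = [1, 2, 3, 3, 3, 3]
print(banded_dtw(a, b, band=5))  # wide band: the zero-cost alignment is found
print(banded_dtw(a, b, band=1))  # narrow band: that alignment is excluded
```

Here the zero-cost alignment has to pair the fourth sample of `a` with the first sample of `b`, which lies three cells off the diagonal, so any band narrower than 3 reports a larger distance.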
7
The Multilevel Approach
Concept
1. Down sample to coarsen the array
2. Run the algorithm
3. Refine the array (up sample)
4. Adjust the solution
5. Repeat steps 3-4 until the original sample rate is restored
Notes
– The multilevel approach is a common technique for reducing the complexity of many algorithms from O(n^2) to O(n lg n)
– An example is partitioning a graph to balance work loads among threads or processors
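The five steps above can be sketched as a recursive coarse-to-fine search. This is only an illustration under several assumptions that are not in the slides: down sampling averages adjacent pairs of samples, "adjust the solution" means re-running the algorithm inside a window around the up-sampled path, and the names `halve`, `expand`, `dtw_in_window`, `multilevel_dtw`, `radius` and `min_size` are all hypothetical. The code favours clarity over speed, so it does not itself achieve the complexity quoted in the notes.

```python
def dtw_in_window(a, b, window):
    """Run the recurrence only on the 0-based (i, j) cells in `window`.
    Returns (cost, path) for the best path from (0, 0) to the last cell."""
    best = {}  # cell -> (accumulated cost, predecessor cell)
    for i, j in sorted(window):
        cost = abs(a[i] - b[j])
        if i == 0 and j == 0:
            best[(i, j)] = (cost, None)
            continue
        # Diagonal listed first so that ties prefer the diagonal step.
        preds = [p for p in ((i - 1, j - 1), (i - 1, j), (i, j - 1)) if p in best]
        if preds:
            prev = min(preds, key=lambda p: best[p][0])
            best[(i, j)] = (cost + best[prev][0], prev)
    cell = (len(a) - 1, len(b) - 1)
    total = best[cell][0]
    path = []
    while cell is not None:
        path.append(cell)
        cell = best[cell][1]
    return total, path[::-1]


def halve(x):
    """Step 1: down sample by averaging adjacent pairs of samples."""
    return [sum(x[k:k + 2]) / len(x[k:k + 2]) for k in range(0, len(x), 2)]


def expand(path, n, m, radius):
    """Step 3: project a coarse path onto the finer grid, widened by radius."""
    window = set()
    for ci, cj in path:
        for i in range(2 * ci - radius, 2 * ci + 2 + radius):
            for j in range(2 * cj - radius, 2 * cj + 2 + radius):
                if 0 <= i < n and 0 <= j < m:
                    window.add((i, j))
    return window


def multilevel_dtw(a, b, radius=1, min_size=4):
    """Steps 1-5: solve a coarse problem, then refine level by level."""
    if len(a) <= min_size or len(b) <= min_size:
        full = {(i, j) for i in range(len(a)) for j in range(len(b))}
        return dtw_in_window(a, b, full)          # step 2 at the coarsest level
    _, coarse_path = multilevel_dtw(halve(a), halve(b), radius, min_size)
    window = expand(coarse_path, len(a), len(b), radius)
    return dtw_in_window(a, b, window)            # step 4: adjust the solution
```

Because each level searches only a window around the coarser path, the result is an approximation: it can never be lower than the full-array distance, but it may be higher when the best path lies outside the window.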
8
Singularities
Assumption
– The minimum distance comparing two signals depends only on the previous adjacent entries
– The cost function accounts for the varied length of a particular phoneme, which causes the cost at particular array indices to no longer be well-defined
Problem: The algorithm can produce incorrect results because of mismatched alignments
Possible solutions:
– Compare based on the change of feature values between windows instead of the values themselves
– Pre-process to eliminate the causes of the mismatches
9
Possible Preprocessing
Remove the phase from the audio:
– Compute the Fourier transform
– Perform a discrete cosine transform on the amplitudes
Normalize the energy of voiced audio:
– Compute the energy of both signals
– Multiply the larger by the percentage difference
Remove the DC offset: Subtract the average amplitude from all samples
Brick wall normalize the peaks and valleys:
– Find the average peak and valley value
– Set values larger than the average equal to the average
Normalize the pitch: Use PSOLA to align the pitch of the two signals
Remove duplicate frames: Autocorrelate frames at pitch points
Remove noise from the signal: Implement a noise removal algorithm
Normalize the speed of the speech:
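Two of the simpler steps above, DC offset removal and energy normalization, might look like the following sketch. The function names are hypothetical. "Multiply the larger by the percentage difference" is read here as scaling the higher-energy signal down until both energies match, which is an interpretation rather than something the slide states. The sketch also works on whole signals and does not separate voiced from unvoiced audio.

```python
import math


def remove_dc_offset(samples):
    """Subtract the average amplitude from every sample."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def energy(samples):
    """Signal energy, taken here as the sum of squared samples."""
    return sum(s * s for s in samples)


def match_energy(a, b):
    """Scale the higher-energy signal down so both energies are equal.
    Energy grows with the square of amplitude, so the gain applied to
    the samples is the square root of the energy ratio."""
    ea, eb = energy(a), energy(b)
    if ea == 0 or eb == 0:
        return list(a), list(b)   # nothing sensible to scale
    if ea > eb:
        gain = math.sqrt(eb / ea)
        return [s * gain for s in a], list(b)
    gain = math.sqrt(ea / eb)
    return list(a), [s * gain for s in b]


print(remove_dc_offset([1, 2, 3]))                      # [-1.0, 0.0, 1.0]
loud, quiet = match_energy([2, -2, 2, -2], [1, -1, 1, -1])
print(energy(loud), energy(quiet))                      # 4.0 4
```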
10
Which Audio Features?
– Cepstrals: They are statistically independent and phase differences are removed
– ΔCepstrals, or ΔΔCepstrals: Reflect how the signal is changing from one frame to the next
– Energy: Distinguishes the frames that are voiced versus those that are unvoiced
– Normalized LPC coefficients: Represent the shape of the vocal tract, normalized by vocal tract length for different speakers
These are the popular features used for speech recognition.
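As a small illustration of the Δ and ΔΔ features, the sketch below takes simple differences between consecutive frames. This is the most basic reading of "how the signal is changing from one frame to the next"; the slides do not give a formula, so the function name `deltas`, the zero padding of the first frame, and the toy input values are all assumptions.

```python
def deltas(frames):
    """First-order differences between consecutive feature frames.
    `frames` is a list of equal-length feature vectors. The first
    output frame is all zeros so the result has the same length."""
    if not frames:
        return []
    out = [[0.0] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        out.append([c - p for p, c in zip(prev, cur)])
    return out


cepstra = [[1.0, 2.0], [2.0, 2.0], [4.0, 1.0]]   # toy values, not real cepstra
delta = deltas(cepstra)          # change from one frame to the next
delta_delta = deltas(delta)      # change of that change
print(delta)        # [[0.0, 0.0], [1.0, 0.0], [2.0, -1.0]]
print(delta_delta)  # [[0.0, 0.0], [1.0, 0.0], [1.0, -1.0]]
```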
11
Which Distance Metric?
General formula: array[i,j] = distance(i,j) + min{array[i-1,j], array[i-1,j-1], array[i,j-1]}
Assumption: There is no cost assessed for duplicate or eliminated frames.
Distance formula:
– Euclidean: sum the squared differences between corresponding features (the square root of this sum is the Euclidean distance)
– Linear: sum the absolute values of the differences between corresponding features
Weighting the features: Multiply each metric’s difference by a weighting factor to give greater or lesser emphasis to certain features
Example of a distance metric using linear distance: $\sum_i w_i \, |f_a[i] - f_b[i]|$, where $f_a[i]$ and $f_b[i]$ are the values of audio feature $i$ for signals a and b, and $w_i$ is that feature’s weight
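Putting the general formula and the weighted linear distance together gives a sketch like the one below. The function names, the toy feature vectors and the weights are illustrative assumptions. The recurrence is the one on the slide, with no extra cost for duplicate or eliminated frames.

```python
def linear_distance(fa, fb, weights):
    """Weighted linear distance: sum of w[i] * |fa[i] - fb[i]|."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, fa, fb))


def warped_distance(a, b, weights):
    """array[i,j] = distance(i,j) + min(array[i-1,j], array[i-1,j-1], array[i,j-1]).
    `a` and `b` are lists of feature vectors, one vector per frame."""
    n, m = len(a), len(b)
    inf = float("inf")
    array = [[inf] * (m + 1) for _ in range(n + 1)]
    array[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = linear_distance(a[i - 1], b[j - 1], weights)
            array[i][j] = d + min(array[i - 1][j],
                                  array[i - 1][j - 1],
                                  array[i][j - 1])
    return array[n][m]


weights = [1.0, 0.5]                              # hypothetical feature weights
a = [[1.0, 0.0], [1.0, 0.0], [3.0, 1.0]]          # first frame is duplicated
b = [[1.0, 0.0], [3.0, 1.0]]
print(warped_distance(a, b, weights))             # 0.0
print(warped_distance(a, [[1.0, 0.0], [3.0, 3.0]], weights))  # 1.0
```

In the first call the duplicated frame in `a` is absorbed at no cost, which is the assumption stated on the slide. In the second call the only mismatch is the second feature of the last frame, weighted by 0.5.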