
Comparing Audio Signals

What makes it difficult?
– Phase misalignment
– Deeper peaks and valleys
– Pitch misalignment
– Energy misalignment
– Embedded noise
– Length of vowels
– Phoneme variance

Review: Minimum Distance Algorithm

Example: the edit-distance table aligning INTENTION with EXECUTION.

Array[i,j] = min{ 1 + Array[i-1,j], cost(i,j) + Array[i-1,j-1], 1 + Array[i,j-1] }

Pseudo Code: minDistance(target, source)

n = characters in source
m = characters in target
Create array, distance, with dimensions n+1, m+1
FOR r = 0 TO n: distance[r,0] = r
FOR c = 0 TO m: distance[0,c] = c
FOR each row r
  FOR each column c
    IF source[r] = target[c] THEN cost = 0 ELSE cost = 1
    distance[r,c] = minimum of
      distance[r-1,c] + 1,      // insertion
      distance[r,c-1] + 1,      // deletion
      distance[r-1,c-1] + cost  // substitution
Result is in distance[n,m]
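As a sanity check, the pseudocode above can be written as a short Python function (a sketch; the function and variable names follow the pseudocode, not any library API):

```python
def min_distance(target, source):
    """Minimum edit distance between two strings (the pseudocode above)."""
    n, m = len(source), len(target)
    # distance[r][c] = cost of transforming source[:r] into target[:c]
    distance = [[0] * (m + 1) for _ in range(n + 1)]
    for r in range(n + 1):
        distance[r][0] = r          # r deletions
    for c in range(m + 1):
        distance[0][c] = c          # c insertions
    for r in range(1, n + 1):
        for c in range(1, m + 1):
            cost = 0 if source[r - 1] == target[c - 1] else 1
            distance[r][c] = min(distance[r - 1][c] + 1,         # insertion
                                 distance[r][c - 1] + 1,         # deletion
                                 distance[r - 1][c - 1] + cost)  # substitution
    return distance[n][m]
```

With unit substitution cost, `min_distance("EXECUTION", "INTENTION")` returns 5: the last four letters (TION) and the middle T match, and the remaining five positions are substitutions.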

Is Minimum Distance Applicable?

Maybe?
– The optimal distance at indices [a,b] is a function of the costs at smaller indices.
– This suggests that a dynamic programming approach may work.

Problems
– The cost function is more complex: a binary equal/not-equal test doesn't work.
– We need to define a distance metric. But what should that metric be? Answer: it depends on which audio features we use.
– Longer vowels may still represent the same speech. The classical solution is to apply no cost when moving from index [i-1,j] or [i,j-1] to [i,j]. Unfortunately, this assumption can lead to singularities, which result in incorrect comparisons.

Complexity of Minimum Distance

The basic algorithm is O(m*n), where m is the length (in samples) of one audio signal and n is the length of the other. If m = n, the algorithm is O(n²). Why? Count the number of cells that need to be filled in.

O(n²) may be too slow, so alternate solutions have been devised.
– Don't fill in all of the cells.
– Use a multi-level approach.

Question: Are the faster approaches needed for our purposes? Perhaps not!

Don't Fill in All of the Cells

Problem: We may miss the optimal minimum-distance path.
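One standard way to skip cells is to restrict the computation to a fixed-width band around the diagonal (the Sakoe–Chiba band). A minimal sketch, assuming one-dimensional samples and an illustrative window width `w` (both are our choices, not from the slides):

```python
def banded_dtw(a, b, w):
    """DTW distance, filling only cells within w of the diagonal.

    Cells outside the band stay at infinity, so any warping path is
    forced to remain near the diagonal -- faster, but a path that
    strays farther than w is missed (the problem noted above).
    """
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        # only columns within the band are computed
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

This fills only O(n*w) cells instead of O(n*m).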

The Multilevel Approach

Concept
1. Down-sample to coarsen the array.
2. Run the algorithm.
3. Refine the array (up-sample).
4. Adjust the solution.
5. Repeat steps 3-4 until the original sample rate is restored.

Notes
The multilevel approach is a common technique for reducing many algorithms' complexity from O(n²) to O(n lg n). An example is partitioning a graph to balance work loads among threads or processors.

Singularities

Assumption
– The minimum distance comparing two signals depends only on the previous adjacent entries.
– The cost function accounts for the varied length of a particular phoneme, which causes the cost at particular array indices to no longer be well-defined.

Problem: The algorithm can compute incorrect comparisons due to mismatched alignments.

Possible solutions:
– Compare based on the change of feature values between windows instead of the values themselves.
– Pre-process to eliminate the causes of the mismatches.
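The first solution, comparing changes between windows, can be sketched as a first difference of the per-frame feature vectors (an illustrative sketch, not the slides' exact formulation):

```python
def delta_features(frames):
    """First difference of successive feature vectors: frames[t] - frames[t-1].

    A stretched phoneme produces runs of nearly identical frames; their
    raw values differ between speakers, but their deltas are near zero
    in both signals, which reduces singularity-prone alignments.
    """
    return [[cur - prev for cur, prev in zip(frames[t], frames[t - 1])]
            for t in range(1, len(frames))]
```

The output has one fewer frame than the input, since each delta needs a predecessor.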

Possible Preprocessing

– Remove the phase from the audio: compute the Fourier transform, then perform a discrete cosine transform on the amplitudes.
– Normalize the energy of voiced audio: compute the energy of both signals and multiply the larger by the percentage difference.
– Remove the DC offset: subtract the average amplitude from all samples.
– Brick-wall normalize the peaks and valleys: find the average peak and valley value, then set values larger than the average equal to the average.
– Normalize the pitch: use PSOLA to align the pitch of the two signals.
– Remove duplicate frames: autocorrelate frames at pitch points.
– Remove noise from the signal: implement a noise-removal algorithm.
– Normalize the speed of the speech:
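Two of the simpler steps above, DC-offset removal and energy normalization, might look like the following sketch (the slides' "percentage difference" scaling is interpreted here as scaling the larger signal so both total energies match):

```python
def remove_dc_offset(samples):
    """Subtract the average amplitude so the signal is centered on zero."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]

def match_energy(a, b):
    """Scale down the higher-energy signal so both have equal total energy."""
    ea = sum(s * s for s in a)
    eb = sum(s * s for s in b)
    if ea > eb:
        a = [s * (eb / ea) ** 0.5 for s in a]
    elif eb > ea:
        b = [s * (ea / eb) ** 0.5 for s in b]
    return a, b
```

Both steps are cheap, linear passes, so doing them before the O(n²) comparison costs essentially nothing.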

Which Audio Features?

– Cepstrals: statistically independent, and phase differences are removed.
– ΔCepstrals, or ΔΔCepstrals: reflect how the signal is changing from one frame to the next.
– Energy: distinguishes the frames that are voiced versus those that are unvoiced.
– Normalized LPC Coefficients: represent the shape of the vocal tract, normalized by vocal tract length to compare different speakers.

These are the popular features used for speech recognition.

Which Distance Metric?

General formula: array[i,j] = distance(i,j) + min{ array[i-1,j], array[i-1,j-1], array[i,j-1] }

Assumption: there is no cost assessed for duplicate or eliminated frames.

Distance formulas:
– Euclidean: sum the squares of the differences between corresponding features.
– Linear: sum the absolute values of the differences between corresponding features.

Weighting the features: multiply each feature's difference by a weighting factor to give greater or lesser emphasis to certain features.

Example of a distance metric using linear distance: ∑ w[i] |f_a[i] – f_b[i]|, where f[i] is a particular audio feature for signals a and b, and w[i] is that feature's weight.
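The weighted linear distance and the general recurrence can be sketched together (weights and frame values below are illustrative, not tuned):

```python
def weighted_linear_distance(fa, fb, w):
    """Weighted linear distance: sum_i w[i] * |fa[i] - fb[i]|."""
    return sum(wi * abs(x - y) for wi, x, y in zip(w, fa, fb))

def dtw(frames_a, frames_b, w):
    """array[i,j] = distance(i,j) + min of the three predecessors,
    with no extra cost for duplicated or eliminated frames (the
    assumption stated above)."""
    n, m = len(frames_a), len(frames_b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = weighted_linear_distance(frames_a[i - 1], frames_b[j - 1], w)
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Raising a feature's weight makes mismatches in that feature dominate the alignment; setting it to zero ignores the feature entirely.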