Pitch Estimation by Enhanced Super Resolution Determinator. By Sunya Santananchai and Chia-Ho Ling.


Objective  Estimate the fundamental frequency of speech using the enhanced Super Resolution fundamental frequency Determinator (eSRFD).

Introduction  The fundamental frequency of speech is defined as the rate of glottal pulses generated by the vibration of the vocal folds.  The pitch of speech is the perceptual correlate of fundamental frequency.  The fundamental frequency of speech is important in the prosodic features of stress and intonation.

Fundamental frequency Determination Algorithms (FDAs)  Determine the fundamental frequency of a speech waveform, i.e. analyse the pitch automatically.  Motivated by the desire to examine methods of fundamental frequency extraction that use radically different techniques.

Algorithms for fundamental frequency determination  Cepstrum-based F0 determinator (CFD) (Noll, 1969)  Harmonic product spectrum (HPS) (Schroeder, 1968; Noll, 1970)  Feature-based F0 tracker (FBFT) (Phillips, 1985)  Parallel processing method (PP) (Gold & Rabiner, 1969)  Integrated F0 tracking algorithm (IFTA) (Secrest & Doddington, 1983)  Super resolution F0 determinator (SRFD) (Medan et al., 1991)

Enhanced Super Resolution Determinator (eSRFD)  Based on the SRFD method, which uses the normalized cross-correlation coefficient as a waveform similarity metric.  Extends the SRFD algorithm in order to reduce the occurrence of errors.

The eSRFD algorithm  The speech waveform is initially passed through a low-pass filter.

 Each frame of filtered sample data is processed by a silence detector.  The signal is analysed frame-by-frame, in non-overlapping intervals of 6.4 ms.  Each frame contains a set of samples and is divided into three consecutive analysis segments.
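As a rough sketch of this framing step (in Python rather than the MATLAB used in the report; the 20 kHz sampling rate and the `split_frames` helper name are illustrative, not from the original):

```python
import numpy as np

def split_frames(signal, fs, frame_ms=6.4):
    """Split a speech signal into non-overlapping analysis frames
    of frame_ms milliseconds each (6.4 ms in the eSRFD description);
    trailing samples that do not fill a whole frame are dropped."""
    frame_len = int(round(fs * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len
    return signal[:n_frames * frame_len].reshape(n_frames, frame_len)

# Example: 1 s of a 20 kHz signal -> 156 frames of 128 samples each
frames = split_frames(np.zeros(20000), fs=20000)
print(frames.shape)  # (156, 128)
```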

Analysis segments for the enhanced super resolution determinator

 Normalized cross-correlation for a ‘voiced’ frame:  If a frame of data is not classified as silence or unvoiced, candidate values for the fundamental period are obtained using the first normalized cross-correlation coefficient.

 Threshold for candidate values  Candidate values of the fundamental period are obtained by locating peaks in the normalized cross-correlation coefficient whose value exceeds a specified threshold.
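A minimal Python illustration of this peak-picking step (hypothetical helper names; the segment lengths, lag range, and threshold value here are illustrative, not the ones specified in Bagshaw's description):

```python
import numpy as np

def norm_xcorr(x, lag, n):
    """Normalized cross-correlation between the length-n segment at
    the start of the frame buffer and the segment offset by lag."""
    a, b = x[:n], x[lag:lag + n]
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def period_candidates(x, n, min_lag, max_lag, threshold=0.5):
    """Candidate fundamental periods: lags whose coefficient is a
    local peak and exceeds the threshold."""
    rho = [norm_xcorr(x, k, n) for k in range(min_lag, max_lag + 1)]
    return [(min_lag + i, rho[i])
            for i in range(1, len(rho) - 1)
            if rho[i] > threshold
            and rho[i] >= rho[i - 1] and rho[i] >= rho[i + 1]]

# A signal with a 50-sample period yields a candidate at lag 50
x = np.sin(2 * np.pi * np.arange(600) / 50.0)
cands = period_candidates(x, n=200, min_lag=20, max_lag=80)
print(cands[0][0])  # 50
```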

 A second normalized cross-correlation coefficient  The frame is classified as ‘voiced’ if at least one candidate exceeds the threshold.  A second normalized cross-correlation coefficient is then determined for each candidate.

 Candidate scoring  Candidates for which the second coefficient exceeds the threshold are given a score of 2; the others a score of 1.  If there are one or more candidates with a score of 2 in a frame, all candidates with a score of 1 are removed from the list of candidates.  If only one candidate remains (with score 1 or 2), that candidate is assumed to be the best estimate of the fundamental period of the frame.
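This scoring and filtering rule might be sketched as follows (hypothetical names; the dictionary of second coefficients stands in for values the algorithm computes per candidate):

```python
def score_candidates(candidates, second_rho, threshold):
    """Score each (lag, rho) candidate: 2 if its second normalized
    cross-correlation coefficient exceeds the threshold, else 1.
    If any candidate scores 2, all score-1 candidates are dropped."""
    scored = [(lag, rho, 2 if second_rho[lag] > threshold else 1)
              for lag, rho in candidates]
    if any(s == 2 for _, _, s in scored):
        scored = [c for c in scored if c[2] == 2]
    return scored

# Two candidates; only lag 100's second coefficient clears the threshold
kept = score_candidates([(50, 0.8), (100, 0.9)],
                        second_rho={50: 0.4, 100: 0.7},
                        threshold=0.6)
print(kept)  # [(100, 0.9, 2)]
```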

 Otherwise, an optimal fundamental period is sought from the set of remaining candidates, by calculating the coefficient of each candidate.  The first candidate’s coefficient is initially assumed to be the optimal value. If a subsequent candidate’s coefficient multiplied by 0.77 exceeds the current optimal value, that candidate becomes the optimal one.
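This selection rule can be sketched as a short Python function (assuming, as the peak search suggests, that candidates are ordered by increasing lag):

```python
def select_optimal(candidates, bias=0.77):
    """Select the optimal fundamental period from (lag, rho) pairs
    ordered by increasing lag. The first candidate is assumed
    optimal; a later candidate replaces it only when its coefficient
    times the 0.77 bias factor still exceeds the current optimum,
    which favours shorter periods and so discourages pitch-halving."""
    best_lag, best_rho = candidates[0]
    for lag, rho in candidates[1:]:
        if rho * bias > best_rho:
            best_lag, best_rho = lag, rho
    return best_lag

print(select_optimal([(50, 0.80), (100, 0.90)]))  # 50  (0.90*0.77 < 0.80)
print(select_optimal([(50, 0.60), (100, 0.90)]))  # 100 (0.90*0.77 > 0.60)
```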

 If there is only one candidate with score 1 and none with score 2, the frame status is reconsidered based on the state of the previous frame.  If the previous frame is ‘silent’, the current value is held and the decision depends on the next frame.  If the next frame is also ‘silent’, the current frame is classified as ‘silent’.  Otherwise, the current frame is classified as ‘voiced’ and the held value is taken as the estimate for the current frame.
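A sketch of this reconsideration logic (the behaviour when the previous frame is not ‘silent’ is not spelled out in the slides; treating that case as ‘voiced’ with the held value is an assumption of this sketch):

```python
def reconsider_frame(prev_state, next_state, held_period):
    """Reconsider a frame whose only candidate scored 1 (no score-2
    candidate). If the previous and next frames are both 'silent',
    the frame is silent; otherwise it is voiced and the held period
    is taken as its estimate."""
    if prev_state == 'silent' and next_state == 'silent':
        return 'silent', None
    return 'voiced', held_period

print(reconsider_frame('silent', 'silent', 120))  # ('silent', None)
print(reconsider_frame('silent', 'voiced', 120))  # ('voiced', 120)
```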

 Modification: biasing is applied to the candidate coefficients if all of the following conditions hold:  The two previous frames were classified as ‘voiced’.  The value of the previous frame is not being temporarily held.  The fundamental period of the previous frame is less than 7/4 of that of its preceding voiced frame, and greater than 5/8 of it.  The biasing tends to increase the percentage of unvoiced regions of speech incorrectly classified as ‘voiced’.
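The three conditions above can be checked with a small predicate (hypothetical names; only the 7/4 and 5/8 bounds come from the slides):

```python
def biasing_applies(prev2_voiced, prev_voiced, prev_is_held,
                    prev_period, prev2_period):
    """True when biasing should be applied: the two previous frames
    were voiced, the previous frame's value is not temporarily held,
    and the previous period lies strictly between 5/8 and 7/4 of the
    period of its preceding voiced frame."""
    return (prev2_voiced and prev_voiced
            and not prev_is_held
            and 5.0 / 8.0 * prev2_period < prev_period < 7.0 / 4.0 * prev2_period)

print(biasing_applies(True, True, False, 100, 100))  # True
print(biasing_applies(True, True, False, 200, 100))  # False (200 >= 7/4 * 100)
```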

 Calculating the fundamental period:  The fundamental period for the frame is estimated from the selected candidate.
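Given the estimated fundamental period in samples and the sampling rate, the fundamental frequency follows directly:

```python
def fundamental_frequency(period_samples, fs):
    """Convert a fundamental period measured in samples to F0 in Hz."""
    return fs / period_samples

# Example: a 100-sample period at fs = 20 kHz corresponds to F0 = 200 Hz
print(fundamental_frequency(100, 20000))  # 200.0
```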

Implementation  This report covers the eSRFD algorithm, implemented as a MATLAB (ver. 7.2b) program following the eSRFD algorithm.

Results

Conclusion  The acoustic correlate of pitch is the fundamental frequency of speech.  The enhanced SRFD (eSRFD) extends the SRFD and can reduce the occurrence of errors involved in the extraction of fundamental frequency [1].  Errors still occur in the results, and their frequency depends on the kind of speech waveform.  In addition, the results of this project contain more errors than Paul Bagshaw’s results [2], because of problems in designing and implementing the program following the eSRFD algorithm.

References  [1] Paul Christopher Bagshaw (1994). Automatic prosodic analysis for computer aided pronunciation teaching. PhD thesis, The University of Edinburgh.  [2] Bagshaw, Paul C., Hiller, S. M., and Jack, Mervyn A. (1993). Enhanced pitch tracking and the processing of F0 contours for computer aided intonation teaching. In Proc. Eurospeech ’93, Berlin, volume 2, 1993.