Detection of Vowel Onset Point in Speech S.R. Mahadeva Prasanna & Jinu Mariam Zachariah Department of Computer Science & Engineering Indian Institute.

Slides:



Advertisements
Similar presentations
© Fraunhofer FKIE Corinna Harwardt Automatic Speaker Recognition in Military Environment.
Advertisements

Acoustic Characteristics of Consonants
Coarticulation Analysis of Dysarthric Speech Xiaochuan Niu, advised by Jan van Santen.
Spotting Multilingual Consonant-Vowel Units of Speech using Neural Network Models Suryakanth V.Gangashetty, C. Chandra Sekhar, and B.Yegnanarayana Speech.
Acoustic Model Adaptation Based On Pronunciation Variability Analysis For Non-Native Speech Recognition Yoo Rhee Oh, Jae Sam Yoon, and Hong Kook Kim Dept.
Look Who’s Talking Now SEM Exchange, Fall 2008 October 9, Montgomery College Speaker Identification Using Pitch Engineering Expo Banquet /08/09.
Digital Systems: Hardware Organization and Design
: Recognition Speech Segmentation Speech activity detection Vowel detection Duration parameters extraction Intonation parameters extraction German Italian.
Introduction to Acoustics Words contain sequences of sounds Each sound (phone) is produced by sending signals from the brain to the vocal articulators.
VOICE CONVERSION METHODS FOR VOCAL TRACT AND PITCH CONTOUR MODIFICATION Oytun Türk Levent M. Arslan R&D Dept., SESTEK Inc., and EE Eng. Dept., Boğaziçi.
Robust Voice Activity Detection for Interview Speech in NIST Speaker Recognition Evaluation Man-Wai MAK and Hon-Bill YU The Hong Kong Polytechnic University.
6/3/20151 Voice Transformation : Speech Morphing Gidon Porat and Yizhar Lavner SIPL – Technion IIT December
Neural Net Algorithms for SC Vowel Recognition Presentation for EE645 Neural Networks and Learning Algorithms Spring 2003 Diana Stojanovic.
Recognition of Voice Onset Time for Use in Detecting Pronunciation Variation ● Project Description ● What is Voice Onset Time (VOT)? – Physical Realization.
Advanced Technology Center Stuttgart EMOTIONAL SPACE IMPROVES EMOTION RECOGNITION Raquel Tato, Rocio Santos, Ralf Kompe Man Machine Interface Lab Advance.
SPEECH PERCEPTION The Speech Stimulus Perceiving Phonemes Top-Down Processing Is Speech Special?
Pitch Prediction for Glottal Spectrum Estimation with Applications in Speaker Recognition Nengheng Zheng Supervised under Professor P.C. Ching Nov. 26,
Text-To-Speech System for Marathi Miss. Deepa V. Kadam Indian Institute of Technology, Bombay.
Separation of Multispeaker Speech Using Excitation Information B.Yegnanarayana, R.Kumara Swamy and S.R.Mahadeva Prasanna Dept of Computer Science and.
Speech Sounds of American English and Some Iranian Languages
Normalization of the Speech Modulation Spectra for Robust Speech Recognition Xiong Xiao, Eng Siong Chng, and Haizhou Li Wen-Yi Chu Department of Computer.
IIT Bombay ICA 2004, Kyoto, Japan, April 4 - 9, 2004   Introdn HNM Methodology Results Conclusions IntrodnHNM MethodologyResults.
Speech Perception. Phoneme - a basic unit of a speech sound that distinguishes one word from another Phonemes do not have meaning on their own but they.
All features considered separately are relevant in a speech / music classification task. The fusion allows to raise the accuracy rate up to 94% for speech.
Time-Domain Methods for Speech Processing 虞台文. Contents Introduction Time-Dependent Processing of Speech Short-Time Energy and Average Magnitude Short-Time.
Speech Perception 4/6/00 Acoustic-Perceptual Invariance in Speech Perceptual Constancy or Perceptual Invariance: –Perpetual constancy is necessary, however,
June 28th, 2004 BioSecure, SecurePhone 1 Automatic Speaker Verification : Technologies, Evaluations and Possible Future Gérard CHOLLET CNRS-LTCI, GET-ENST.
As a conclusion, our system can perform good performance on a read speech corpus, but we will have to develop more accurate tools in order to model the.
Utterance Verification for Spontaneous Mandarin Speech Keyword Spotting Liu Xin, BinXi Wang Presenter: Kai-Wun Shih No.306, P.O. Box 1001,ZhengZhou,450002,
1 Speech Perception 3/30/00. 2 Speech Perception How do we perceive speech? –Multifaceted process –Not fully understood –Models & theories attempt to.
Speech Perception1 Fricatives and Affricates We will be looking at acoustic cues in terms of … –Manner –Place –voicing.
International Conference on Intelligent and Advanced Systems 2007 Chee-Ming Ting Sh-Hussain Salleh Tian-Swee Tan A. K. Ariff. Jain-De,Lee.
Speech Science Fall 2009 Oct 28, Outline Acoustical characteristics of Nasal Speech Sounds Stop Consonants Fricatives Affricates.
Ekapol Chuangsuwanich and James Glass MIT Computer Science and Artificial Intelligence Laboratory,Cambridge, Massachusetts 02139,USA 2012/07/2 汪逸婷.
♥♥♥♥ 1. Intro. 2. VTS Var.. 3. Method 4. Results 5. Concl. ♠♠ ◄◄ ►► 1/181. Intro.2. VTS Var..3. Method4. Results5. Concl ♠♠◄◄►► IIT Bombay NCC 2011 : 17.
Jun-Won Suh Intelligent Electronic Systems Human and Systems Engineering Department of Electrical and Computer Engineering Speaker Verification System.
Improving Speech Modelling Viktoria Maier Supervised by Prof. Hynek Hermansky.
Automatic Identification and Classification of Words using Phonetic and Prosodic Features Vidya Mohan Center for Speech and Language Engineering The Johns.
Authors: Sriram Ganapathy, Samuel Thomas, and Hynek Hermansky Temporal envelope compensation for robust phoneme recognition using modulation spectrum.
NOISE DETECTION AND CLASSIFICATION IN SPEECH SIGNALS WITH BOOSTING Nobuyuki Miyake, Tetsuya Takiguchi and Yasuo Ariki Department of Computer and System.
IIT Bombay 1/26 Automated CVR Modification for Improving Perception of Stop Consonants A. R. Jayan & P. C. Pandey EE Dept, IIT.
SEPARATION OF CO-OCCURRING SYLLABLES: SEQUENTIAL AND SIMULTANEOUS GROUPING or CAN SCHEMATA OVERRULE PRIMITIVE GROUPING CUES IN SPEECH PERCEPTION? William.
Overview ► Recall ► What are sound features? ► Feature detection and extraction ► Features in Sphinx III.
Robust Entropy-based Endpoint Detection for Speech Recognition in Noisy Environments 張智星
New Acoustic-Phonetic Correlates Sorin Dusan and Larry Rabiner Center for Advanced Information Processing Rutgers University Piscataway,
IIT Bombay 14 th National Conference on Communications, 1-3 Feb. 2008, IIT Bombay, Mumbai, India 1/27 Intro.Intro.
Voice Activity Detection based on OptimallyWeighted Combination of Multiple Features Yusuke Kida and Tatsuya Kawahara School of Informatics, Kyoto University,
© 2005, it - instituto de telecomunicações. Todos os direitos reservados. Arlindo Veiga 1,2 Sara Cadeias 1 Carla Lopes 1,2 Fernando Perdigão 1,2 1 Instituto.
Performance Comparison of Speaker and Emotion Recognition
A DYNAMIC APPROACH TO THE SELECTION OF HIGH ORDER N-GRAMS IN PHONOTACTIC LANGUAGE RECOGNITION Mikel Penagarikano, Amparo Varona, Luis Javier Rodriguez-
Unit 5 Phonetics and Phonology. Phonetics Sounds produced by the human speech organs are called the “phonic/auditory medium” Phonetics is the study of.
Automatic Speech Recognition A summary of contributions from multiple disciplines Mark D. Skowronski Computational Neuro-Engineering Lab Electrical and.
SRINIVAS DESAI, B. YEGNANARAYANA, KISHORE PRAHALLAD A Framework for Cross-Lingual Voice Conversion using Artificial Neural Networks 1 International Institute.
Speech Perception.
Speaker Change Detection using Support Vector Machines V.Kartik, D.Srikrishna Satish and C.Chandra Sekhar Speech and Vision Laboratory Department of Computer.
An i-Vector PLDA based Gender Identification Approach for Severely Distorted and Multilingual DARPA RATS Data Shivesh Ranjan, Gang Liu and John H. L. Hansen.
IIT Bombay 17 th National Conference on Communications, Jan. 2011, Bangalore, India Sp Pr. 1, P3 1/21 Detection of Burst Onset Landmarks in Speech.
A. R. Jayan, P. C. Pandey, EE Dept., IIT Bombay 1 Abstract Perception of speech under adverse listening conditions may be improved by processing it to.
Suprasegmental Properties of Speech Robert A. Prosek, Ph.D. CSD 301 Robert A. Prosek, Ph.D. CSD 301.
Phone-Level Pronunciation Scoring and Assessment for Interactive Language Learning Speech Communication, 2000 Authors: S. M. Witt, S. J. Young Presenter:
Speech Enhancement using Excitation Source Information B. Yegnanarayana, S.R. Mahadeva Prasanna & K. Sreenivasa Rao Department of Computer Science & Engineering.
Text-To-Speech System for English
Automated Detection of Speech Landmarks Using
Speech Perception.
Ala’a Spaih Abeer Abu-Hantash Directed by Dr.Allam Mousa
A maximum likelihood estimation and training on the fly approach
Attentive Tracking of Sound Sources
Anthor: Andreas Tsiartas, Prasanta Kumar Ghosh,
Presenter: Shih-Hsiang(士翔)
2017 APSIPA A Study on Landmark Detection Based on CTC and Its Application to Pronunciation Error Detection Chuanying Niu1, Jinsong Zhang1, Xuesong Yang2.
Presentation transcript:

Detection of Vowel Onset Point in Speech S.R. Mahadeva Prasanna & Jinu Mariam Zachariah Department of Computer Science & Engineering Indian Institute of Technology Madras, India 1

Significance of VOP for speech analysis Issues in VOP detection Acoustic-phonetic description of VOP Acoustic cues for VOP detection Detection of VOP in speech Begin-end detection using VOP Summary and conclusions Objective & Organization 2

Significance of VOP for Speech Analysis 3 Syllabification Begin-end detection Speech recognition Speech enhancement Vowel/Non-vowel classification

Issues in VOP Detection 4 CVs with voiced consonants Nasals, semivowels and aspirated sounds

Changes in excitation source and vocal tract system characteristics C and V regions in terms of acoustic features Acoustic-Phonetic Description of VOP 5 Acoustic Cues for VOP Detection Formant transition (F tr (t)) Epoch intervals (E i (t)) Strength of instants (S i (t)) Itakura distance (I d (t)) Ratio of signal energy to residual energy (S r (t))

Manual VOP Detection 6

VOP Detection in Isolated CVs Performance 7

VOP Detection in Continuous Speech 8

Clean Speech Degraded Speech 9 Performance of VOP Detection Algorithm

Begin-End Detection using VOP 10

Summary & Conclusions Automatic VOP detection algorithm using source features Acoustic-phonetic description for VOP Acoustic cues for VOP detection based on instants of significant excitation Begin-end detection using the knowledge of VOP Need for a method to combine spectral features with proposed source features for robust detection of VOP 11

Paper # 1410 Detection of Vowel Onset Point in Speech

S.R. Mahadeva Prasanna & Jinu Mariam Zachariah

Department of Computer Science & Engineering Indian Institute of Technology Madras, India

ABSTRACT Sound units in many languages are syllabic in nature, and frequently used syllables are of consonant-vowel (CV) type. Vowel onset point (VOP) is an important event in CV units. Knowledge of VOPs helps in many applications such as speech recognition, speaker recognition, speech enhancement, begin-end detection, segmentation of speech into vowel/nonvowel-like units and finding duration of vowels. In this paper we describe parameters or features useful for manually identifying the VOPs for different types of CV units. An automatic algorithm is proposed for detecting VOPs in continuous speech, which is motivated by the nature of production and perception of speech. Speech signal is a result of exciting a time varying vocal tract system with time varying excitation. Changes in the source and system characteristics around the VOP are both useful for the detection of VOPs. In this paper we use the changes in the source characteristics for detecting the VOPs. The performance of the proposed algorithm is evaluated using 25 sentences for which a total of 236 VOPs have been identified manually. It is found that 216 VOPs have been detected within a resolution of +/- 30 ms. Compared to the energy-based approach, VOP-based begin-end detection has significantly improved the performance in the case of text- dependent speaker verification system. For telephone database of 32 speakers consisting of 480 genuine utterances and 1984 impostor utterances, the performance of the system has improved from an equal error rate (EER) of 5.9% to 2.6%.