Modeling Prosody for Language Identification on Read and Spontaneous Speech

Conclusion

Our system performs well on a read speech corpus, but we will have to develop more accurate tools to model the prosody of spontaneous speech, which seems to be too complex and too variable across speakers for our features.

Recognition

Language identification on read speech: Experiments were previously made on the five languages of the MULTEXT database: English, French, German, Italian and Spanish. Japanese was added thanks to Mr. Kitasawa. The tests use 20-second read speech utterances and consist of a six-way identification task. On the read speech corpus, our system achieves 79% correct identification on six languages. The main confusions are between English and German (both stress-timed languages), and between Spanish and Italian.

Language identification on spontaneous speech: Experiments are made on ten languages of the OGI Multilingual Telephone Speech Corpus: English, Farsi, French, German, Japanese, Korean, Mandarin, Spanish, Tamil and Vietnamese. The tests use the 45-second spontaneous speech utterances and consist of a pair discrimination task. On the spontaneous speech corpus, discrimination is easier between languages that do not belong to the same rhythmic and intonation classes.

Pseudo Syllable

Speech segmentation: statistical segmentation (André-Obrecht, 1988)
  Short segments (bursts and transient parts of sounds)
  Longer segments (steady parts of sounds)
Speech activity detection and vowel detection (Pellegrino & Obrecht, 2000)
  Spectral analysis of the signal
  Language- and speaker-independent algorithm
Pseudo-syllable segmentation
  Derived from the most frequent syllable structure in the world: CV
  The speech signal is parsed into patterns matching the structure C^nV (n is an integer and can be 0).
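The C^nV parsing described above can be illustrated with a minimal sketch. It assumes that segmentation and vowel detection have already labelled each segment as consonantal ("C") or vocalic ("V"); the function name `pseudo_syllables`, the tuple layout and the handling of trailing consonants are hypothetical choices made for illustration and are not taken from the poster.

```python
from typing import List, Tuple

# (label, duration in seconds); label is "C" (consonantal) or "V" (vocalic).
Segment = Tuple[str, float]


def pseudo_syllables(segments: List[Segment]) -> List[List[Segment]]:
    """Group a labelled segment sequence into C^nV pseudo-syllables.

    Each pseudo-syllable is zero or more consonantal segments closed by
    exactly one vocalic segment.
    """
    syllables: List[List[Segment]] = []
    current: List[Segment] = []
    for label, duration in segments:
        if label not in ("C", "V"):
            raise ValueError(f"unexpected label: {label!r}")
        current.append((label, duration))
        if label == "V":
            # A vocalic segment closes the current pseudo-syllable.
            syllables.append(current)
            current = []
    # Assumption: consonants left over at the end, with no following vowel,
    # do not form a pseudo-syllable and are dropped.
    return syllables


example = [("C", 0.05), ("C", 0.03), ("V", 0.12), ("V", 0.10),
           ("C", 0.04), ("V", 0.09), ("C", 0.06)]
groups = pseudo_syllables(example)
patterns = ["".join(label for label, _ in group) for group in groups]
print(patterns)  # ['CCV', 'V', 'CV']
```

The second pattern in the example shows the n = 0 case: a vowel that directly follows another vowel forms a pseudo-syllable on its own.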
Duration Parameters

Three parameters are computed:
  Global duration of the consonantal segments
  Global vocalic segment duration
  Syllable complexity (Nc: number of consonantal segments in the pseudo-syllable)

[Figure: system overview, from signal to language: speech segmentation, speech activity detection, vowel detection, pseudo-syllable generation, duration parameters extraction, intonation parameters extraction, L1 model, L2 model]

Intonation Parameters

Fundamental frequency extraction: « MESSIGNAIX » toolbox, a combination of three methods (AMDF, spectral comb, autocorrelation)
Fundamental frequency modeling: computation of statistics on each pseudo-syllable (skewness and kurtosis)

Jean-Luc ROUAS 1, Jérôme FARINAS 1, François PELLEGRINO 2 and Régine ANDRÉ-OBRECHT 1
{rouas, jfarinas,
1 Institut de Recherche en Informatique de Toulouse, UMR 5505 CNRS - Université Paul Sabatier - INP, Toulouse Cedex 4 - France
2 Laboratoire Dynamique du Langage, UMR 5596 CNRS - Université Lumière Lyon, Lyon Cedex 7 - France

Experiments

[Figure: Results of the prosodic system on read speech (MULTEXT corpus)]
[Figure: Results of the prosodic system on spontaneous telephone speech (OGI MLTS corpus)]

Model

The prosodic modeling uses Gaussian Mixture Models (GMM) on a set of nine parameters extracted from each pseudo-syllable: Dc, Dv, Nc, F0 mean, F0 variance, F0 skewness, F0 kurtosis, accent location and F0 bandwidth. Language-specific models are learned using the VQ and EM algorithms on learning subsets of the corpus.
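As a rough illustration of how per-pseudo-syllable F0 statistics and GMM scoring fit together, here is a small sketch under several assumptions that the poster does not confirm: moments are population moments, kurtosis is the non-excess form, covariances are diagonal, and an utterance is scored by summing log-likelihoods over its pseudo-syllables. The names `f0_statistics`, `log_gmm` and `identify` are hypothetical, the model parameters are hand-set toy values rather than VQ/EM-trained ones, and the toy vectors have two dimensions instead of the nine parameters listed above.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One diagonal-covariance Gaussian component: (weight, means, variances).
Component = Tuple[float, Sequence[float], Sequence[float]]


def f0_statistics(f0: Sequence[float]) -> Tuple[float, float, float, float]:
    """Mean, variance, skewness and kurtosis of the F0 values of one pseudo-syllable."""
    n = len(f0)
    mean = sum(f0) / n
    m2 = sum((x - mean) ** 2 for x in f0) / n
    if m2 == 0.0:
        # Assumption: a perfectly flat contour gets zero higher-order statistics.
        return mean, 0.0, 0.0, 0.0
    m3 = sum((x - mean) ** 3 for x in f0) / n
    m4 = sum((x - mean) ** 4 for x in f0) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2


def log_gmm(x: Sequence[float], gmm: List[Component]) -> float:
    """Log-density of one feature vector under a diagonal-covariance GMM."""
    logs = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            ll -= 0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        logs.append(ll)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(value - top) for value in logs))


def identify(vectors: List[Sequence[float]], models: Dict[str, List[Component]]) -> str:
    """Return the language whose GMM gives the highest summed log-likelihood."""
    scores = {lang: sum(log_gmm(v, gmm) for v in vectors)
              for lang, gmm in models.items()}
    return max(scores, key=scores.get)


# Toy two-dimensional models with hand-set parameters (illustration only).
models = {
    "L1": [(1.0, [0.10, 0.10], [0.01, 0.01])],
    "L2": [(0.5, [0.30, 0.05], [0.01, 0.01]),
           (0.5, [0.25, 0.08], [0.01, 0.01])],
}
vectors = [[0.11, 0.09], [0.09, 0.12]]
decision = identify(vectors, models)
stats = f0_statistics([100.0, 110.0, 120.0, 130.0, 140.0])
print(decision, stats)
```

In a fuller version, each vector would hold all nine parameters of a pseudo-syllable, and the component weights, means and variances would come from training on the learning subsets.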