Keywords: recognition, speech segmentation, speech activity detection, vowel detection, duration parameter extraction, intonation parameter extraction, German, Italian.


[Figure: system overview. The speech signal (example spectrograms for German and Italian, 0-8 kHz, amplitude vs. time) goes through speech segmentation, speech activity detection and vowel detection, then duration parameter extraction (rhythm: consonant duration, vowel duration, complexity) and intonation parameter extraction (F0 features). Vowel, non-vowel and pause segments are grouped into pseudo-syllables such as CCVV, CCV, CV, CCCV.]

Speech segmentation
Statistical segmentation [André-Obrecht, 1988] produces:
- short segments (bursts and transient parts of sounds),
- longer segments (steady parts of sounds).

Speech activity detection and vowel detection
[Pellegrino & André-Obrecht, 2000]:
- based on a spectral analysis of the signal,
- language- and speaker-independent algorithm.

Pseudo-syllable segmentation
The pseudo-syllable is derived from the most frequent syllable structure in the world's languages: CV. The speech signal is parsed into patterns matching the structure CnV (n integer, possibly 0).

Duration parameters
Three parameters are computed for each pseudo-syllable:
- global duration of the consonantal segments (Dc),
- global duration of the vocalic segment (Dv),
- syllable complexity (Nc: the number of consonantal segments in the pseudo-syllable).
The rhythm vector is thus {Dc, Dv, Nc}.

Intonation parameters
Fundamental frequency extraction uses the "MESSIGNAIX" toolbox, which combines three methods (AMDF, spectral comb, autocorrelation). Two fundamental frequency features are then computed for each pseudo-syllable:
- a measurement of the accent location (position of the F0 maximum with respect to the vocalic onset),
- the normalized fundamental frequency bandwidth over the syllable.
These two parameters form the intonation vector.

Modelling
Each pseudo-syllable is therefore characterized by two vectors: one describing the rhythmic unit and one describing the intonation on that unit. For each language, two Gaussian mixture models are trained, one for the rhythm vectors and one for the intonation vectors, using the EM algorithm with vector quantization (VQ) initialization.

Recognition experiments
Experiments were previously carried out on the five languages of the MULTEXT database: English, French, German, Italian and Spanish.
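The pseudo-syllable parsing and rhythm-vector computation described above can be sketched as follows. This is an illustrative toy, not the authors' code: segment labels, durations and the pause-handling convention are assumptions for the example.

```python
# Sketch (not the authors' implementation): parse a sequence of labelled
# segments into C^nV pseudo-syllables and compute the rhythm vector {Dc, Dv, Nc}.
# Segments are (label, duration_s) pairs with labels "C" (consonant),
# "V" (vowel) or "#" (pause); a pause is assumed to reset the current cluster.

def pseudo_syllables(segments):
    """Group segments into C^nV pseudo-syllables (consonants up to the next vowel)."""
    syllables, current = [], []
    for label, dur in segments:
        if label == "#":          # pause: discard unattached consonants
            current = []
        elif label == "C":
            current.append((label, dur))
        elif label == "V":        # a vowel closes the pseudo-syllable
            current.append((label, dur))
            syllables.append(current)
            current = []
    return syllables

def rhythm_vector(syllable):
    """Return (Dc, Dv, Nc) for one pseudo-syllable."""
    dc = sum(d for l, d in syllable if l == "C")   # total consonant duration
    dv = sum(d for l, d in syllable if l == "V")   # total vowel duration
    nc = sum(1 for l, _ in syllable if l == "C")   # syllable complexity
    return (dc, dv, nc)

segs = [("C", 0.05), ("C", 0.04), ("V", 0.11),    # CCV
        ("V", 0.09),                              # V (n = 0)
        ("#", 0.30),
        ("C", 0.06), ("V", 0.10)]                 # CV
sylls = pseudo_syllables(segs)
print([rhythm_vector(s) for s in sylls])
```

Note that n = 0 is allowed, so an isolated vowel forms a pseudo-syllable of its own, matching the CnV definition in the text.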
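The two intonation features can likewise be sketched for one pseudo-syllable. This helper is hypothetical (it is not the MESSIGNAIX toolbox), and the exact normalizations, dividing the peak offset by the syllable duration and the F0 range by the F0 maximum, are assumptions for illustration.

```python
# Sketch (hypothetical helper, not MESSIGNAIX): compute the two intonation
# features for one pseudo-syllable from an F0 contour sampled over the syllable.

def intonation_vector(f0_times, f0_values, vocalic_onset, syll_dur):
    """Return (accent_location, normalized F0 bandwidth) for one pseudo-syllable.

    f0_times / f0_values : F0 samples (s, Hz); 0 Hz marks unvoiced frames
    vocalic_onset        : time of the vowel onset (s)
    syll_dur             : pseudo-syllable duration (s), used for normalization
    """
    voiced = [(t, f) for t, f in zip(f0_times, f0_values) if f > 0]
    if not voiced:
        return (0.0, 0.0)
    t_max, f_max = max(voiced, key=lambda tf: tf[1])
    f_min = min(f for _, f in voiced)
    accent_location = (t_max - vocalic_onset) / syll_dur  # F0 peak vs. vowel onset
    bandwidth = (f_max - f_min) / f_max                   # normalized F0 range
    return (accent_location, bandwidth)

times = [0.00, 0.02, 0.04, 0.06, 0.08]
f0s = [110.0, 118.0, 130.0, 124.0, 0.0]   # 0 Hz = unvoiced frame
print(intonation_vector(times, f0s, vocalic_onset=0.03, syll_dur=0.10))
```

A positive accent location means the F0 peak falls after the vocalic onset; values near zero indicate an accent aligned with the vowel.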
Japanese was added thanks to Mr. Kitasawa. The tests use 20-second read-speech utterances and consist of a six-way identification task. On this read-speech corpus, the rhythm system achieves good performance (79% correct identification over the six languages), while the intonation system reaches up to 62% correct identification.

[Table: results of the intonation system on read speech (MULTEXT corpus)]
[Table: results of the rhythmic system on read speech (MULTEXT corpus)]

Conclusion
Rhythmic model:
- captures rhythmic information and achieves a clear separation between languages of different rhythmic classes;
- rather than capturing each language's rhythm globally, it characterizes language-specific rhythmic units; modelling sequences of these units would yield a complete rhythm model.
Fundamental frequency model:
- better characterizes languages with well-defined intonational rules, such as Japanese and English;
- the system still needs to be tested on many more languages to confirm (or refute) the linguistic rhythm-class hypothesis.

Automatic Modelling of Rhythm and Intonation for Language Identification
Jean-Luc ROUAS (1), Jérôme FARINAS (1), François PELLEGRINO (2) and Régine ANDRÉ-OBRECHT (1)
(1) Institut de Recherche en Informatique de Toulouse, UMR 5505 CNRS - Université Paul Sabatier - INP, Toulouse Cedex 4, France
(2) Laboratoire Dynamique du Langage, UMR 5596 CNRS - Université Lumière Lyon, Lyon Cedex 7, France
{jean-luc.rouas, jerome.farinas,

[Figure: pseudo-syllable generation and the per-language models (French, English, Japanese, Spanish)]
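The identification step, picking the language whose Gaussian mixture model best explains the utterance's pseudo-syllable vectors, can be sketched as below. This is a minimal illustration, not the authors' system: the model parameters and feature values are made-up toy numbers, the mixtures here are single diagonal-covariance Gaussians, and training (EM with VQ initialization) is omitted.

```python
# Sketch (not the authors' implementation): maximum-likelihood language
# identification with per-language Gaussian mixture models.  Each GMM is a list
# of (weight, means, diagonal variances); an utterance is a bag of
# pseudo-syllable feature vectors, assumed independent given the language.

import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(x, gmm):
    """log sum_k w_k N(x; mu_k, var_k), computed with the log-sum-exp trick."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    mx = max(terms)
    return mx + math.log(sum(math.exp(t - mx) for t in terms))

def identify(utterance, models):
    """Pick the language whose GMM gives the highest total log-likelihood."""
    return max(models, key=lambda lang: sum(gmm_loglik(x, models[lang])
                                            for x in utterance))

# Toy two-language example on (Dc, Dv, Nc)-like vectors (hypothetical numbers).
models = {
    "french":   [(1.0, (0.08, 0.10, 1.2), (0.001, 0.001, 0.5))],
    "japanese": [(1.0, (0.05, 0.12, 0.8), (0.001, 0.001, 0.5))],
}
utt = [(0.05, 0.12, 1.0), (0.06, 0.11, 1.0)]
print(identify(utt, models))  # the utterance lies closer to the "japanese" model
```

In the poster's setup there would be two such decisions per utterance, one from the rhythm GMMs and one from the intonation GMMs, each trained per language with EM.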