Automatic Estimation of Speaking Rate in Multilingual Spontaneous Speech

François Pellegrino 1, J. Farinas 2 & J.-L. Rouas 2
1 Laboratoire Dynamique Du Langage, UMR 5596 CNRS – Univ. Lumière Lyon 2, France
2 Institut de Recherche en Informatique de Toulouse, UMR 5505 CNRS – Univ. Toulouse 3, France
{jfarinas;

Corpus

Experiments are performed on a subset of the OGI Multilingual Telephone Speech Corpus for which a hand-made phonetic transcription is provided. For each speaker, one excerpt about 40 seconds long is phonetically labeled and tagged as 'spontaneous' or 'read'. This tagging is missing for Hindi.

Language    Number of speakers (spontaneous speech)    Mean duration per speaker in s (std)
English     144 (111)                                  47.1 (3.4)
German      98 (89)                                    42.7 (8.4)
Hindi       68 (n.a.)                                  46.5 (6.0)
Japanese    64 (55)                                    46.1 (5.1)
Mandarin    69 (69)                                    39.9 (10.7)
Spanish     108 (106)                                  45.6 (5.6)

Speech rate calculation

Let u be the utterance for which the SR is computed. Let N_V(u) be the number of vowel segments labeled along this utterance and D(u) the duration of the utterance. The mean speaking rate along the utterance, SR(u), is thus defined as:

$$\mathrm{SR}(u) = \frac{N_V(u)}{D(u)}$$

Automatic speech rate calculation

The vowel detection algorithm provides an estimate of the actual number of vowels present in the waveform. It thus provides an estimate of SR(u) when the detected vowel count is used in place of N_V(u).

Cross-linguistic comparison

Mean and standard deviation values are computed in terms of hand-labeled vowels per second. The lowest mean SR is found for Mandarin (3.0), while the highest is found for Japanese (4.9).

Language    Mean SR with pauses (± CI)    Mean SR no pauses (± CI)
English     3.8 (± 0.11)                  5.0 (± 0.09)
German      3.6 (± 0.11)                  5.0 (± 0.12)
Hindi       3.7 (± 0.16)                  5.7 (± 0.14)
Japanese    4.9 (± 0.25)                  7.0 (± 0.19)
Mandarin    3.0 (± 0.19)                  4.7 (± 0.16)
Spanish     4.2 (± 0.14)                  6.0 (± 0.13)

Correlation (R, R²) and linear regression per language. The R and R² columns and the Mean row carry no values in this transcript:

Language    Linear regression
English     y = 0.89x
German      y = 0.63x
Hindi       y = 0.94x
Japanese    y = 1.14x
Mandarin    y = 0.98x
Spanish     y = 1.00x
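The speaking-rate definition (vowel count divided by utterance duration) and the "with pauses" / "no pauses" variants can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Segment` type, the label names and the toy utterance are assumptions, and D(u) is taken here as the sum of the segment durations, which presumes the segments are contiguous.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    label: str    # "Vowel", "NonVowel" or "Pause" (assumed label set)
    start: float  # segment start time, in seconds
    end: float    # segment end time, in seconds


def speaking_rate(segments, exclude_pauses=False):
    """Return vowels per second, i.e. N_V(u) / D(u).

    With exclude_pauses=True, pause segments are left out of D(u),
    which gives the "no pauses" variant of the rate.
    """
    n_vowels = sum(1 for s in segments if s.label == "Vowel")
    duration = sum(
        s.end - s.start
        for s in segments
        if not (exclude_pauses and s.label == "Pause")
    )
    if duration <= 0:
        raise ValueError("utterance has no usable duration")
    return n_vowels / duration


# Hypothetical 2-second utterance with 3 vowels and 1.25 s of pauses.
utterance = [
    Segment("NonVowel", 0.0, 0.1),
    Segment("Vowel", 0.1, 0.25),
    Segment("NonVowel", 0.25, 0.35),
    Segment("Vowel", 0.35, 0.5),
    Segment("Pause", 0.5, 1.0),
    Segment("NonVowel", 1.0, 1.1),
    Segment("Vowel", 1.1, 1.25),
    Segment("Pause", 1.25, 2.0),
]

sr_with_pauses = speaking_rate(utterance)
sr_no_pauses = speaking_rate(utterance, exclude_pauses=True)
print(f"SR with pauses: {sr_with_pauses:.2f} vowels/s")
print(f"SR no pauses:   {sr_no_pauses:.2f} vowels/s")
```

The same function would apply to hand-labeled or automatically detected segments, so the two rates can be compared utterance by utterance.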
Speaking rate estimation

Results are given both in terms of correlation coefficients and of linear regression (computed with SPSS). All correlations are highly significant (p < .0001). The lowest correlation is obtained for German, but it is still fairly high. It may be explained by the slope of the linear regression (0.63), which indicates that a significant number of false alarms occur. Conversely, for Japanese, the number of vowels seems to be underestimated (slope of 1.14). This value is due to a bias introduced by the hand-labeling procedure, in which phonemic long vowels are labeled as two successive segments. Merging these two segments into a single vowel leads to a mean SR of 3.9 (SRns = 5.4), an R coefficient of 0.89 and a more conventional linear regression equation: y = 0.92x.

Conclusion

In conclusion, the speaking rate detector performs well on all studied languages (on average, R = 0.84). Correlation is quite good, especially for Hindi. This means that the approach may be useful to adapt a system to a specific speaking rate and to perform a basic normalization for prosodic modeling purposes. However, it is still necessary to evaluate the specific impact of the SR on vowels and on consonants. Going further and estimating the local speaking rate, in number of vowels per effective second of speech, requires an efficient speech activity detector and, last but not least, the detection of filled pauses. To reach this goal, we plan to take advantage of the statistical segmentation we already use.

Vowel detection algorithm

[Figure: spectrogram (frequency in kHz vs. time in s) and waveform (amplitude vs. time in s) of an utterance, segmented into NonVowel, Pause and Vowel.]

Discussion

The automatic vowel detection algorithm provides a good way to estimate the mean SR. However, several parameters may influence the precision of the prediction.

First, it appears that considering an SR averaged along the excerpt may be problematic, especially when the speaking rate varies widely along the utterance, and obviously because of the presence of pauses. Other effects are more difficult to predict. For instance, the performance of the algorithm varies across very different SRs: two speakers, a very fast one (bold lines, SR = 6.9) and a more standard one (thin lines, SR = 3.8), are represented (automatic vowel detection in red lines vs. hand-labeled phonetic transcription in black lines).
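The correlation and regression figures reported for each language can be reproduced in spirit with a few lines of code. The sketch below is an assumption-laden stand-in for the SPSS analysis, not the authors' procedure: it takes x as the automatically estimated SR and y as the hand-labeled SR (the reading under which a slope below 1 points to false alarms and a slope above 1 to missed vowels), and the sample values are invented for illustration.

```python
from math import sqrt


def pearson_and_fit(x, y):
    """Return (r, slope, intercept) for the least-squares line y = slope*x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    r = sxy / sqrt(sxx * syy)        # Pearson correlation coefficient
    slope = sxy / sxx                # least-squares slope
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical per-speaker rates in vowels per second (not data from the poster).
estimated_sr = [3.0, 4.0, 5.0, 6.0]     # x: automatic vowel detection
hand_labeled_sr = [2.1, 2.5, 3.3, 3.7]  # y: hand-labeled transcription

r, slope, intercept = pearson_and_fit(estimated_sr, hand_labeled_sr)
print(f"R = {r:.2f}, R^2 = {r * r:.2f}, y = {slope:.2f}x + {intercept:.2f}")
```

In this toy sample the slope is well below 1, so the detector reports more vowels than the labelers did, which is the pattern described for German.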