Building a sentential model for automatic prosody evaluation Kyuchul Yoon School of English Language & Literature Yeungnam University 2009.06.19 Korea.


Building a sentential model for automatic prosody evaluation
Kyuchul Yoon, School of English Language & Literature, Yeungnam University
Korea University
Part A

Introduction: English pronunciation evaluation
- English pronunciation proficiency evaluation
  - Ultimate goals: evaluation at
    - The segmental level
    - The suprasegmental level
  - Current goals: evaluation at
    - The suprasegmental level

Introduction: English pronunciation evaluation
- The goal of the present study
  - Prosody evaluation of a single target utterance
    - Produced by a Korean student
    - Given
      - An English target sentence
      - A sentential model for prosody evaluation

Introduction: Manual vs. automatic
- Problems of manual evaluation
  - What to evaluate
  - How to evaluate
  - Consistency
- Problems of automatic evaluation
  - How to reflect human knowledge

Introduction: Manual vs. automatic
- A possible solution?
  - Avoid knowledge-based abstraction
    - Compare a target utterance with native speakers’ utterances
  - Use multiple utterances for comparison
    - Multiple “good” utterances from native speakers
  - Adopt raw values
    - Calculate difference values between the target and the “good” utterances in terms of the three prosodic aspects: F0, intensity, durations → 3D coordinates

Introduction: How to build the model
- Use multivariate statistical analysis
  - A discriminant analysis
- The components of the model (with the segmental proficiency scores controlled)
  - The manual prosody evaluation scores (response)
  - The automatic prosody evaluation scores (factors)
- The requirements of the model
  - The correlation between the two levels
    - Manual scores vs. automatic scores
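As a rough illustration of how a discriminant analysis can assign an automatic score to a manual-score group, the Python sketch below implements a simplified discriminant rule. It assumes equal priors and a pooled diagonal covariance (so it ignores correlations between the F0, intensity, and duration axes) and is not the analysis run in the study. The function names, group labels, and coordinate values are all hypothetical.

```python
from statistics import mean

def fit_diagonal_lda(groups):
    """groups: dict mapping a label to a list of equal-length coordinate tuples.
    Returns (per-group mean vectors, pooled per-axis variances)."""
    dims = len(next(iter(groups.values()))[0])
    means = {g: [mean(p[d] for p in pts) for d in range(dims)]
             for g, pts in groups.items()}
    # Pooled within-group variance per axis, shared by all groups.
    dof = sum(len(pts) for pts in groups.values()) - len(groups)
    pooled = []
    for d in range(dims):
        ss = sum((p[d] - means[g][d]) ** 2
                 for g, pts in groups.items() for p in pts)
        pooled.append(ss / dof)  # assumes nonzero within-group variance
    return means, pooled

def classify(point, means, pooled):
    """Assign the point to the group with the smallest variance-scaled distance."""
    def scaled_distance(g):
        return sum((x - m) ** 2 / v for x, m, v in zip(point, means[g], pooled))
    return min(means, key=scaled_distance)

# Hypothetical (F0, Int, Dur) training coordinates for two manual-score groups.
training = {
    "score 4": [(300, 90, 150), (350, 100, 170), (320, 95, 160)],
    "score 2": [(800, 150, 400), (850, 170, 380), (900, 160, 420)],
}
means, pooled = fit_diagonal_lda(training)
predicted = classify((360, 92, 190), means, pooled)
print(predicted)
```

A full discriminant analysis would use the complete pooled covariance matrix; the diagonal version is used here only to keep the sketch dependency-free.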

Introduction: How to build the model
- The manual prosody scores (an ideal case), on a scale of 1 (worst) to 5 (best)
  - The “good” utterance versions (point 5) by many native speakers of English
  - The utterance versions by Korean students whose prosodic proficiencies are
    - High (point 5)
    - Intermediate (point 3)
    - Low (point 1)

Introduction: How to build the model
- The automatic prosody scores
  - Use of Praat scripts
  - Comparison between a single target utterance and multiple native speakers’ utterances to yield scores for
    - The F0 difference
    - The intensity difference
    - The duration difference
  - Scores take the form of 3D coordinates (x, y, z) = (F0, Int, Dur)
  - One utterance yields as many coordinates as the number of “good” native speakers
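The bookkeeping behind "one coordinate per native model utterance" can be sketched as follows. The study computed its scores with Praat scripts; this Python sketch only mirrors the structure of the result. The `Utterance` layout, the helper names, and the sample values are hypothetical, and the sketch assumes the contours are already time-aligned and normalized so that corresponding points can be compared directly.

```python
import math
from typing import NamedTuple, Sequence

class Utterance(NamedTuple):
    f0: Sequence[float]         # F0 samples at aligned time points
    intensity: Sequence[float]  # intensity samples at the same points
    durations: Sequence[float]  # one duration per matching segment

def euclidean(p, q):
    """Euclidean distance between two equal-length sequences."""
    if len(p) != len(q):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def score_array(target, natives):
    """Return one (F0, Int, Dur) difference coordinate per native utterance."""
    return [
        (euclidean(target.f0, n.f0),
         euclidean(target.intensity, n.intensity),
         euclidean(target.durations, n.durations))
        for n in natives
    ]

# Hypothetical data: one learner utterance against two native model utterances.
learner = Utterance([1.0, 2.0], [60.0, 62.0], [0.10, 0.20])
natives = [
    Utterance([1.0, 2.0], [60.0, 62.0], [0.10, 0.20]),
    Utterance([4.0, 6.0], [60.0, 62.0], [0.10, 0.20]),
]
scores = score_array(learner, natives)
print(scores)
```

With 10 native model utterances, the same call would return 10 coordinates for a single learner utterance.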

Introduction: How to build the model
- Evaluation by comparisons

Introduction: A 3D sentential model for prosody evaluation
- A 3D model
  - 3D axes: F0, intensity, durations
    - (F0, Int, Dur) coordinates = (x, y, z)
  - Automatic scores as scatterplot points
  - Manually evaluated scores group the points

Introduction: A 3D sentential model for prosody evaluation
- Validity of the model
  - Sufficient separation of groups with different manual scores
  - Colors: manual scores
  - Arrowheads: automatic scores

Methods: Sentential prosody evaluation [7]
Before & after duration manipulation
[Figure panels: native, learner before, learner after]

Methods: Sentential prosody evaluation [7]
F0: point-to-point comparison between native and learner after normalization
[Figure panels: native, learner after]
Automatic score: (F0, Int, Dur) = (x, y, z)

Methods: Sentential prosody evaluation [7]
Intensity: point-to-point comparison between native and learner after normalization
[Figure panels: native, learner after]
Automatic score: (F0, Int, Dur) = (x, y, z)
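The point-to-point comparison described for F0 and intensity can be sketched as below. The slides do not say which normalization was applied, so z-score normalization is used here purely as an assumption; the function names and sample contours are also hypothetical. The sketch further assumes that, after the duration manipulation, both contours are sampled at the same number of time points.

```python
import math
from statistics import mean, pstdev

def z_normalize(contour):
    """Shift and scale a contour to zero mean and unit standard deviation."""
    mu, sigma = mean(contour), pstdev(contour)
    if sigma == 0:
        return [0.0 for _ in contour]  # a flat contour carries no shape
    return [(v - mu) / sigma for v in contour]

def contour_distance(native, learner):
    """Point-to-point Euclidean distance between two normalized contours."""
    if len(native) != len(learner):
        raise ValueError("contours must have the same number of points")
    a, b = z_normalize(native), z_normalize(learner)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical F0 contours in Hz.
native_f0 = [120.0, 150.0, 180.0, 150.0]
shifted_f0 = [220.0, 250.0, 280.0, 250.0]  # same shape, higher register
flat_f0 = [200.0, 200.0, 200.0, 200.0]     # monotone delivery

same_shape = contour_distance(native_f0, shifted_f0)
monotone = contour_distance(native_f0, flat_f0)
print(same_shape, monotone)
```

Under this assumed normalization, a speaker with a higher overall pitch but the same contour shape is not penalized, whereas a monotone contour is.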

Methods: Sentential prosody evaluation [7]
Duration: segment-to-segment comparison between native and learner
[Figure panels: native, learner before]
Evaluation measure: Euclidean distance metric between P = (p1, p2, p3, ..., pn) and Q = (q1, q2, q3, ..., qn) in Euclidean n-dimensional space
Automatic score: (F0, Int, Dur) = (x, y, z)
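For reference, the Euclidean distance between P and Q is d(P, Q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2). A minimal Python sketch of the segment-to-segment duration comparison follows; the segment durations are invented, and the sketch assumes both utterances contain the same segments in the same order.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two points in n-dimensional space."""
    if len(p) != len(q):
        raise ValueError("P and Q must have the same number of dimensions")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Hypothetical segment durations in milliseconds, one value per matching segment.
native_durations = [80, 120, 60]
learner_durations = [83, 124, 60]

duration_score = euclidean_distance(native_durations, learner_durations)
print(duration_score)  # sqrt(3^2 + 4^2 + 0^2) = 5.0
```

A smaller value means the learner's segment durations are closer to those of the native model utterance.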

Methods: Manual evaluation of sentential prosody
Manual scores for Set B utterances: “The dancing queen likes only the apple pies”

Methods: Sentential prosody evaluation [7]
A sample score array for one utterance from group K5: one learner utterance vs. 10 model native utterances
Automatic prosody score for K5.U1 = {(899,142,408), (360,92,190), …(716,178,183)}

Results: A prosody evaluation model by a Korean phonetician
[Figures on two slides: Korean phonetician’s model]

Results: A sample prosody evaluation with a discriminant analysis

Discussion: To make this fully automatic
- For manual evaluation of the training model
  - The number of Korean learners
    - The more the better
  - The levels of English proficiency
    - The more diverse the better (scores 1 through 5)
- For automatic evaluation of the trainees
  - Need automatic segmentation (ASR)
  - Need to deal with redundant/missing segments

Building a sentential model for automatic evaluation of pronunciation proficiency
What about segmental evaluation?
Part B

Methods: Segmental evaluation by spectral comparison
- Sex/age controlled (no normalization was used)
  - Adult male (native/Korean) speakers were selected
- Spectral comparison
  - Three equally spaced spectral slices were used for each pair of matching segments
  - A Euclidean distance measure was computed for each pair of matching spectral envelopes
- Four coordinates for pronunciation proficiency evaluation
  - Segments, F0, intensity, durations
  - (w, x, y, z) becomes one entry of the score array
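The spectral comparison can be sketched as follows. The slides say only that three equally spaced slices were taken per segment; the exact slice positions (here the interior points at 1/4, 1/2, and 3/4 of the segment) and the averaging of the three slice distances are assumptions, as are the function names and sample values. Each spectral envelope is represented as a list of amplitudes over the same frequency bins.

```python
import math

def slice_times(start, end, n_slices=3):
    """Return n equally spaced interior time points of a segment."""
    step = (end - start) / (n_slices + 1)
    return [start + step * (k + 1) for k in range(n_slices)]

def envelope_distance(env_a, env_b):
    """Euclidean distance between two spectral envelopes on the same bins."""
    if len(env_a) != len(env_b):
        raise ValueError("envelopes must cover the same frequency bins")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(env_a, env_b)))

def segment_score(native_envelopes, learner_envelopes):
    """Average the slice-by-slice distances for one pair of matching segments."""
    distances = [envelope_distance(a, b)
                 for a, b in zip(native_envelopes, learner_envelopes)]
    return sum(distances) / len(distances)

# Hypothetical segment from 1.0 s to 2.0 s, with toy two-bin envelopes.
times = slice_times(1.0, 2.0)
native = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
learner = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
w = segment_score(native, learner)
print(times, w)
```

The resulting value would play the role of the w coordinate in (w, x, y, z); how the per-segment values were combined into one sentence-level score is not stated in the slides.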

Methods: Manual evaluation of overall proficiency
Manual scores for Set C utterances: “Put your toys away right now”
[Table: the overall scores of the 34 utterances for the Set C sentence “Put your toys away right now”. The manual evaluation was performed by a Korean phonetician. Note that the subjects were all male adults.]

Results: A pronunciation proficiency evaluation model by a Korean phonetician
[Figure: Korean phonetician’s models (intensity axis not shown)]

Results: A prosody evaluation model by a Korean phonetician
[Figure: Korean phonetician’s model]

Results: A discriminant analysis
[Table: the classification table from the discriminant analysis of one set of test data. The number in each cell represents the probability of the automatic pronunciation proficiency score being classified into the predicted group.]
[Table: the confusion matrix for the classification table.]
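A confusion matrix of the kind mentioned above tabulates actual groups against predicted groups. The sketch below builds one with the Python standard library; the group labels and predictions are invented for illustration and are not the study's results.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual groups, columns are predicted groups."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Hypothetical labels for seven classified items.
labels = ["high", "low"]
actual = ["high", "high", "high", "low", "low", "low", "low"]
predicted = ["high", "low", "high", "low", "low", "high", "low"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

The diagonal cells count the correct classifications, so the overall accuracy is the sum of the diagonal divided by the number of items.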

Results: Discriminant analyses with leave-one-out cross-validation
Testing for score 4: 6 out of 9 correct
Testing for score 2: 12 out of 15 correct

Results: Discriminant analyses with leave-one-out cross-validation
- For the N4 & K2 groups, evaluation models were built by using
  - The discriminant analysis with
  - Leave-one-out cross-validation
- The number of models (built by discriminant analyses) was 24
  - Group N4: 9 subjects
  - Group K2: 15 subjects
- Success rate
  - Group N4: 6 out of 9 predicted correctly
  - Group K2: 12 out of 15 predicted correctly
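Leave-one-out cross-validation builds one model per held-out subject, which is why 9 + 15 = 24 models were needed. The reported counts correspond to success rates of 6/9 (about 67%) for N4, 12/15 (80%) for K2, and 18/24 (75%) overall. The Python sketch below shows the loop itself, with a nearest-centroid classifier standing in for the discriminant analysis (a deliberate simplification); the labels and one-dimensional sample values are hypothetical.

```python
from statistics import mean

def nearest_centroid(train, value):
    """Stand-in classifier: pick the label whose training mean is closest."""
    centroids = {
        label: mean(v for lab, v in train if lab == label)
        for label in {lab for lab, _ in train}
    }
    return min(centroids, key=lambda lab: abs(value - centroids[lab]))

def leave_one_out(data, classify):
    """Hold out each item once, train on the rest, and count correct predictions."""
    correct = 0
    for i, (label, value) in enumerate(data):
        train = data[:i] + data[i + 1:]  # one model per held-out item
        if classify(train, value) == label:
            correct += 1
    return correct, len(data)

# Hypothetical (label, score) pairs.
data = [("high", 1.0), ("high", 1.2), ("high", 3.1),
        ("low", 3.0), ("low", 3.4), ("low", 2.9)]
correct, total = leave_one_out(data, nearest_centroid)
print(f"{correct} out of {total} predicted correctly")
```

Because every item is tested on a model that never saw it, the resulting count is a less optimistic estimate than testing on the training data.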

Discussion: Automatic evaluation of pronunciation proficiency
- Viability of sentential models for the evaluation of
  - Segmental proficiency: spectral comparison
  - Prosodic proficiency: F0/intensity/durations
  - Both in the form of multiple score array coordinates (segments, F0, intensity, durations) = (w, x, y, z)
- Comparison seems to work
  - A target utterance vs. multiple model native utterances
- Better models can be built with
  - More (controlled) utterances
  - More score resolution
    - Current: score 2 (bad), score 4 (good)
    - Future: score 1 (worst), score 3 (fair), score 5 (best)

References
[1] Boersma, Paul, “Praat, a system for doing phonetics by computer”, Glot International 5(9/10).
[2] Mahalanobis, P.C., “On the generalized distance in statistics”, Proceedings of the National Institute of Science of India 12, pp. 49-55.
[3] Moulines, E. & F. Charpentier, “Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones”, Speech Communication 9.
[4] Ramus, F., M. Nespor, J. Mehler, “Correlates of linguistic rhythm in the speech signal”, Cognition 73.
[5] Rhee, S., S. Lee, Y. Lee & S. Kang, “Design and construction of Korean-Spoken English Corpus (K-SEC)”, Malsori 46.
[6] Yoon, K., “Imposing native speakers' prosody on non-native speakers' utterances: The technique of cloning prosody”, Journal of the Modern British & American Language & Literature 25(4).
[7] Yoon, K., Synthesis and evaluation of prosodically exaggerated utterances. Unpublished manuscript.