Using the HTK speech recogniser to analyse prosody in a corpus of German spoken learners' English
Toshifumi Oba, Eric Atwell
University of Leeds, School.

Presentation transcript:

1 Using the HTK speech recogniser to analyse prosody in a corpus of German spoken learners' English
Toshifumi Oba, Eric Atwell
University of Leeds, School of Computing

2 Outline
Introduction
- Intonation and Speech Recognition
- Tendency of Speech Recognition Research
- ISLE Speech Corpus
- HTK Hidden Markov Model Toolkit
Prosodic Annotation
Human Evaluation of Intonation Abilities
Grouping of German Speakers by Intonation Ability
HTK speech recognition experiments
Conclusions
Q & A

3 Intonation and Speech Recognition
Intonation is important in human communication.
- It conveys the meaning and attitude of the speaker.
Intonation is important for speech recognition.
- Acoustic models (duration, F0, intensity)
- Language models (identifying the dialogue type)

4 Tendency of Speech Recognition Research
Intonation << Pronunciation
Non-native speaker << Native speaker
Speech recognition research on non-native speakers' intonation is unique.
Intonation also receives less attention than pronunciation in CALL.

5 Features of Various Speech Recognition Research
Research Reference         Non-N  German  Inton  G/P  HTK
(Taylor, 1998)             N      N       Y      N    Y
(Uebler, 1998)             Y      N       N      N    N
(Stemmer, 2001)            Y      Y       N      N    N
(Teixeira, 1996)           Y      Y       N      N    Y
(Hansen, 1995)             Y      Y       Y      Y    N
(Yan and Vaseghi, 2002)    N      N       Y      N    Y
(Jurafsky et al., 1994)    Y      Y       N      N    N
(Berkling et al., 1998)    Y      N       N      N    Y
(Oba and Atwell, 2003)     Y      Y       Y      Y    Y

6 Objectives
Analysis of non-native speakers' English intonation.
- Is HTK able to distinguish intonation?
- Is it possible to train distinct models for different intonation ability groups?
Prosodic annotation of written English text to produce model intonation patterns.
Human evaluation to group German speakers by English intonation ability.

7 ISLE Speech Corpus (1)
Re-use of the speech corpus collected in the ISLE (Interactive Spoken Language Education) project.
Leeds University, Universität Hamburg, Università di Milano-Bicocca, Entropic Ltd., Ernst Klett Verlag GmbH, and Dida*El S.R.L.
Time-aligned audio recordings of English spoken by 23 German and 23 Italian learners, plus 2 native English speakers.

8 ISLE Speech Corpus (2)
Speaker adaptation
- 82 sentences edited from The Ascent of Everest
  e.g. It is in fact a story of many years, in which men tried to climb that mountain.
Typical EFL exercises
- Minimal pairs and polysyllabic words
  e.g. I said bad not bed. He's a photographer.

9 ISLE Speech Corpus (3)
Annotated corpus
- Pronunciation errors at word and phone levels
- Stress errors at word level
In our research, prosodic annotation was added to a written transcription of the speech corpus.

10 HTK Hidden Markov Model Toolkit
Developed at Cambridge University Engineering Department (CUED).
Free toolkit for building Hidden Markov Models (HMMs).
Modules can be called from both the command line and script files.
Used in speech recognition research and other pattern recognition research.
e.g. handwriting recognition, facial recognition

11 Prosodic Annotation
Purpose: Predict model intonation patterns to be compared against German spoken learners' English.
Instructions: From text structure to prosodic structure (Knowles, 1996)
Environment: Windows Excel
Amount: First 27 sentences from The Ascent of Everest

12 Result of Prosodic Annotation (1)
27 sentences, consisting of 429 words, were divided into 84 tone groups (prosodic phrases).
- 1 low rise, 3 high rise, 52 fall-rise and 28 fall patterns.
The first 10 sentences were modified according to native speakers' recordings.
- 15 fall-rise and 10 fall patterns
- 1 low rise, 2 high rise and 4 fall-rise were deleted.

13 Result of Prosodic Annotation (2)
(A_01) This is the story of how two men reached the top of Everest on the twenty-ninth of May nineteen fifty-three and came back safely to their friends below.
(A_02) Yet this will not be the whole story.
(A_03) The ascent of Everest was not the work of one day, nor even of those few unforgettable weeks in which we prepared and climbed that summer.

14 Human Evaluation of German Spoken Learners' English Intonation Abilities
Purpose: Group German speakers into good and poor intonation groups.
Evaluator I: Computational linguistics researcher
Evaluator II: English language teaching researcher
Quantity: First 10 utterances from each speaker.
- An utterance was judged correct if all of its tone types matched the model pattern; otherwise it was judged incorrect.
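
The judging rule on this slide can be sketched in Python. This is a minimal illustration rather than the evaluators' actual procedure: the tone labels, the list-of-strings representation of an utterance, and the function name `utterance_correct` are all assumptions made for the example, as is the choice to treat a different number of tone groups as a mismatch.

```python
def utterance_correct(observed, model):
    """Return True only if every tone group carries the model tone type."""
    # Assumption: a different number of tone groups counts as a mismatch.
    if len(observed) != len(model):
        return False
    return all(o == m for o, m in zip(observed, model))


# Hypothetical model pattern for one utterance with three tone groups.
model = ["fall-rise", "fall-rise", "fall"]

print(utterance_correct(["fall-rise", "fall-rise", "fall"], model))  # True
print(utterance_correct(["fall", "fall-rise", "fall"], model))       # False
```

Under this all-or-nothing rule a single mismatched tone group makes the whole utterance incorrect, so longer utterances are harder to get right.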

15 Grouping of 23 German Speakers
Grouping I: based on Evaluator I (computational linguistics researcher)
Grouping II: based on Evaluator II (English language teaching (ELT) researcher)
Grouping III: agreement of Evaluators I and II.
23 speakers: 3 exceptionally poor pronunciation speakers, 8 good, 4 intermediate and 8 poor intonation speakers

16 Result of Human Evaluation and Grouping
The two evaluators agreed on about 63% of utterances (144 out of 230).
Evaluator II marked 109 errors, while Evaluator I marked 78 errors.
However, 7 poor and 5 good speakers were the same in Grouping I and Grouping II.
2 speakers were added to the good intonation group in Grouping III.
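
The agreement figure on this slide is a raw percentage, which can be reproduced as follows. The function name `percent_agreement` is an assumption for the example; the counts come from the slide (23 speakers with 10 utterances each gives 230 judged utterances).

```python
def percent_agreement(agreed, total):
    """Share of items on which two raters gave the same judgement, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * agreed / total


# 23 speakers x 10 utterances = 230 judged utterances; 144 agreed (from the slide).
agreement = percent_agreement(144, 23 * 10)
print(f"{agreement:.1f} %")  # 62.6 %
```

Raw agreement does not correct for agreement expected by chance. A chance-corrected measure would need the per-utterance judgements of both evaluators, which the slides do not give.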

17 Conditions of HTK Speech Recognition Experiments
Monophone and triphone HMMs were trained.
No language models were used.
A Perl script and a configuration file were used for module calls.
Number of training speakers: 6 speakers from the same intonation group.
Number of test speakers: 2 speakers from each group (1 for Grouping III).
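
As a rough picture of what scripted module calls look like, the sketch below builds the argument lists for repeated `HERest` re-estimation passes, one intonation group at a time. It is written in Python rather than the Perl used by the authors, and it is not their script: the directory layout and the file names (`config`, `phones.mlf`, `train.scp`, `monophones`, `hmm0`, `hmm1`, ...) are hypothetical and only follow the naming style of the HTK Book tutorial. The `-C`, `-I`, `-S`, `-H` and `-M` options are standard `HERest` options.

```python
from pathlib import PurePosixPath


def herest_command(config, mlf, scp, src_dir, dst_dir, hmm_list):
    """Build the argument list for one HERest re-estimation pass."""
    src = PurePosixPath(src_dir)
    return [
        "HERest",
        "-C", config,                       # configuration file
        "-I", mlf,                          # master label file with phone transcriptions
        "-S", scp,                          # script file listing the training feature files
        "-H", str(src / "macros"),          # input model files
        "-H", str(src / "hmmdefs"),
        "-M", str(PurePosixPath(dst_dir)),  # output directory for the re-estimated models
        hmm_list,                           # list of HMM names (monophones or triphones)
    ]


def training_passes(group, n_passes=3):
    """Chain passes hmm0 -> hmm1 -> ... for one intonation group (hypothetical layout)."""
    return [
        herest_command(
            config="config",
            mlf=f"{group}/phones.mlf",
            scp=f"{group}/train.scp",
            src_dir=f"{group}/hmm{i}",
            dst_dir=f"{group}/hmm{i + 1}",
            hmm_list="monophones",
        )
        for i in range(n_passes)
    ]


for argv in training_passes("good"):
    print(" ".join(argv))
```

With HTK installed, each list could be run with `subprocess.run(argv, check=True)`. Keeping one model directory per group mirrors the design on the slide, where training speakers are drawn from a single intonation group.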

18 Results of HTK experiments
Recognition accuracy was generally higher when the test and training speakers' intonation abilities were the same.
Improvement was higher with triphone HMMs.
Improvement was most significant in Experiment II.
One poor intonation speaker showed negative improvement in all three experiments.
Another poor speaker also showed negative improvement in Experiment I.

19 Average Recognition Accuracies of Good Intonation Speakers
(Parentheses show results against monophone HMMs)
                 Trained Models
                 Good             Poor             Improvement
Experiment I       % (33.31 %)      % (20.11 %)      % (13.20 %)
Experiment II      % (34.84 %)      % (19.41 %)      % (15.43 %)
Experiment III     % (34.50 %)      % (18.09 %)      % (16.41 %)
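
The Improvement column appears to be the accuracy with matched training speakers minus the accuracy with mismatched training speakers. The monophone figures that survive in the table are consistent with that reading, as the short check below shows. The dictionary layout and the function name `improvement` are assumptions for the example; the numbers are copied from the table.

```python
from decimal import Decimal

# Monophone accuracies (%) of good intonation test speakers, from the table:
# experiment -> (models trained on good speakers, models trained on poor speakers)
monophone = {
    "Experiment I": (Decimal("33.31"), Decimal("20.11")),
    "Experiment II": (Decimal("34.84"), Decimal("19.41")),
    "Experiment III": (Decimal("34.50"), Decimal("18.09")),
}


def improvement(matched, mismatched):
    """Accuracy gain from training on speakers of the same intonation group."""
    return matched - mismatched


for name, (good_trained, poor_trained) in monophone.items():
    print(name, improvement(good_trained, poor_trained))
```

`Decimal` is used so that the two-decimal percentages subtract exactly, without binary floating-point rounding.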

20 Average Recognition Accuracies of Poor Intonation Speakers
(Parentheses show results against monophone HMMs)
                 Trained Models
                 Good             Poor             Improvement
Experiment I       % (22.88 %)      % (20.03 %)    6.76 % (-2.85 %)
Experiment II      % (34.84 %)      % (19.41 %)      % (1.67 %)
Experiment III     % (20.60 %)      % (20.12 %)      % (-0.48 %)

21 Prosodic Keywords
Tone type is decided by the last accented syllable (Knowles, 1996).
We called the word containing the last accented syllable of each tone group the prosodic keyword.
Recognition accuracy among prosodic keywords was counted for the triphone cases of Experiment II.
Improvement in recognition accuracy among prosodic keywords was higher than the overall improvement.
- Good test speakers: 26.00% (overall 19.20%)
- Poor test speakers: 24.50% (overall 15.50%)
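
The definition of a prosodic keyword can be made concrete with a small sketch. Everything about the data representation here is an assumption: tone groups are lists of `(word, has_accent)` pairs, the accent marks on the example words are invented for illustration, and a keyword counts as recognised if it appears anywhere in the recogniser output, which is a simplification of scoring against an aligned transcription.

```python
def prosodic_keyword(tone_group):
    """Return the word holding the last accented syllable of a tone group.

    tone_group is a list of (word, has_accent) pairs in spoken order.
    """
    for word, has_accent in reversed(tone_group):
        if has_accent:
            return word
    return None  # no accented word marked in this tone group


def keyword_accuracy(tone_groups, recognised_words):
    """Percentage of prosodic keywords found in the recogniser output."""
    keywords = [k for k in map(prosodic_keyword, tone_groups) if k is not None]
    if not keywords:
        return 0.0
    hits = sum(1 for k in keywords if k in recognised_words)
    return 100.0 * hits / len(keywords)


# Sentence A_02 split into two tone groups; the accent marks are hypothetical.
groups = [
    [("yet", False), ("this", True)],
    [("will", False), ("not", True), ("be", False),
     ("the", False), ("whole", True), ("story", True)],
]
recognised = {"yet", "this", "will", "not", "be", "a", "whole", "store"}

print([prosodic_keyword(g) for g in groups])  # ['this', 'story']
print(keyword_accuracy(groups, recognised))   # 50.0
```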

22 Irrelevance of Pronunciation Abilities
Good intonation speakers tended to have slightly better pronunciation ability than poor intonation speakers, although the 3 exceptionally poor pronunciation speakers had been excluded.
Additional experiments were run using the 2 best pronunciation speakers from the poor intonation group and the 2 worst pronunciation speakers from the good intonation group.
Similar improvement was observed in these experiments too.

23 Conclusions
Matching the test and training speakers' intonation abilities brought about higher recognition accuracy.
HTK was able to distinguish good and poor intonation.
Confirmed that German speakers' weakness in English intonation was generally in fall-rise patterns.
Human evaluation was successful enough.

24 Future Work
Expand the tone types (beyond fall-rise and fall patterns).
Apply the approach to other languages and to different native-speaker groups.
Use the results in practical language-teaching systems.