On the use of Intonation in ASR: preliminary results
Meeting COST 275, Halmstad, March 13-14, 2003

OUR GROUP
UPV, EI Bilbao: Inma Hernáez (leader), Eva Navas, Jon Sanchez.
UVA, ETSII Valladolid: Valentín Cardeñoso, Isaac Moro, Carlos Vivaracho, David Escudero.
Involved in a CICYT biometrics project with Javier Ortega (Madrid) and Marcos Faundez (Barcelona).
Experience in ASR and in modelling intonation.

OUR AIM
To work on the CICYT project together with the other groups.
To apply our knowledge of intonation to the field of ASR.
Here we present our preliminary results.

INTRODUCTION
Why make use of intonation in ASR?
It is a feature that characterizes the speaker: speakers of the same group have similar prosody, and each speaker can have their own prosody.
It is a very robust feature across different sessions and different microphones.
Other experiences in applying intonation to ASR: SUPER SID, which used a very simple model of intonation.

INTRODUCTION
Aim of this preliminary work:
To show the potential of intonation when facing different sessions and microphones.
To show that it can be important to use “sophisticated” intonation models to obtain benefits in ASR.
Overview:
Presentation of the intonation model.
The corpus.
The speaker verification experiment.
Considerations about the robustness of the results.
Considerations about the use of the intonation model.
Conclusions and future work.

Modelling Intonation

BASIC IDEA FOR ITS APPLICATION TO ASR: TO COMPARE THE MODELS OF DIFFERENT SPEAKERS

Modelling Intonation

The Corpus
Recorded at EUPMT by Marcos Faundez.
One paragraph read by 16 speakers in 2 sessions with 3 microphones, i.e. 6 recordings per speaker.
Each paragraph contains 11 sentences and 106 stress groups (intonation units).
All speakers are male and belong to the same social group.
The pitch was obtained automatically and segmented into intonation units by hand.
Intonation was parameterised according to the intonation model.
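The slides do not spell out the intonation model itself, so the following is only a minimal sketch of what per-unit parameterisation can look like. The four parameters (mean F0, F0 range, least-squares slope, duration) and the names `parameterise_unit`, `recording_vector` and `UnitFeatures` are hypothetical stand-ins, not the authors' model. It assumes each hand-segmented intonation unit arrives as frame times plus an F0 track in which 0 marks unvoiced frames.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class UnitFeatures:
    """Hypothetical per-unit parameters; NOT the authors' intonation model."""
    mean_f0: float   # Hz, averaged over voiced frames
    f0_range: float  # Hz, max - min over voiced frames
    slope: float     # Hz per second, least-squares line through voiced frames
    duration: float  # seconds, first to last voiced frame


def parameterise_unit(times, f0):
    """Summarise one intonation unit.

    times: frame times in seconds; f0: pitch in Hz, with 0 marking unvoiced frames.
    """
    voiced = [(t, f) for t, f in zip(times, f0) if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    ts = [t for t, _ in voiced]
    fs = [f for _, f in voiced]
    t_mean, f_mean = fmean(ts), fmean(fs)
    # Ordinary least-squares slope of F0 against time.
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (f - f_mean) for t, f in zip(ts, fs))
    return UnitFeatures(
        mean_f0=f_mean,
        f0_range=max(fs) - min(fs),
        slope=sxy / sxx,
        duration=ts[-1] - ts[0],
    )


def recording_vector(units):
    """Concatenate per-unit features in reading order.

    Assumption of this sketch: every speaker reads the same paragraph, so
    unit i of one recording can be aligned with unit i of any other.
    """
    vec = []
    for times, f0 in units:
        u = parameterise_unit(times, f0)
        vec.extend([u.mean_f0, u.f0_range, u.slope, u.duration])
    return vec
```

Because all speakers read the same text, concatenating the 106 units in reading order yields vectors whose positions line up across recordings; under this sketch's four parameters per unit, that is 424 values per recording.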

The Experiment
Speaker verification.
We have 6 recordings for each speaker: 5 for modelling and 1 for testing.
Each speaker has its own impostor model, built from the samples of the remaining speakers.
The verification experiment is repeated six times for each speaker, once for each possible choice of test recording.
The classifier is based on C4.5 decision trees, using the freeware WEKA toolkit.
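The protocol described here amounts to leave-one-recording-out cross-validation per speaker: 16 speakers times 6 held-out recordings gives 96 verification runs. The sketch below only generates the splits and leaves the classifier out. The microphone labels, the function name `verification_folds`, and the choice to hold out the same recording for the impostor speakers (so they supply impostor test trials) are assumptions, not details from the slides. For reference, WEKA exposes its C4.5 learner as `weka.classifiers.trees.J48`.

```python
from itertools import product

SESSIONS = (1, 2)
MICROPHONES = ("mic_a", "mic_b", "mic_c")  # hypothetical labels
# 2 sessions x 3 microphones = 6 recordings per speaker.
RECORDINGS = list(product(SESSIONS, MICROPHONES))


def verification_folds(speakers):
    """Yield (target, held_out, train, test) for every target speaker and held-out recording.

    Each item in train/test is (speaker, recording, label), where label is
    "target" for the speaker being verified and "impostor" for everyone else.
    Assumption of this sketch: the same recording is held out for all
    speakers, so impostor test trials come from unseen material.
    """
    for target in speakers:
        for held_out in RECORDINGS:
            train, test = [], []
            for spk in speakers:
                label = "target" if spk == target else "impostor"
                for rec in RECORDINGS:
                    item = (spk, rec, label)
                    (test if rec == held_out else train).append(item)
            yield target, held_out, train, test
```

With 16 speakers, each fold's training list holds 5 target and 75 impostor recordings for a binary target/impostor classifier, and the test list holds one genuine trial and 15 impostor trials.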

Results
Low recognition rates, except for some of the speakers.

Results: robustness
No significant changes when the test input changes (different session or microphone).

Results: relevance of prosodic knowledge
Some parts of the utterance are more relevant than others, depending on the speaker.
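The slides do not say how relevance was measured. One plausible way to relate it to a C4.5 tree is through the criterion C4.5 uses to choose splits, the gain ratio: features (for instance, the parameters of one particular intonation unit) with a high gain ratio for a given speaker's target/impostor labels are the ones the tree is likely to split on. The sketch below is simplified (binary thresholds at observed values, without C4.5's further corrections for continuous attributes), and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` (simplified C4.5 criterion)."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    info_gain = entropy(labels) - w_left * entropy(left) - w_right * entropy(right)
    # Split information penalises splits that fragment the data.
    split_info = -(w_left * log2(w_left) + w_right * log2(w_right))
    return info_gain / split_info


def best_gain_ratio(values, labels):
    """Best gain ratio over thresholds placed at the observed feature values."""
    # The largest value is skipped: it would leave the right-hand side empty.
    candidates = sorted(set(values))[:-1]
    return max((gain_ratio(values, labels, t) for t in candidates), default=0.0)
```

Applied column by column to per-speaker training data, this gives one score per feature, so the scores for different speakers can be compared to see which parts of the utterance matter most for each.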

Conclusions and future work
Promising results: some speakers are recognised with high rates.
Results are robust to changes of session and microphone.
Future work:
To test the benefits of including these results in an ASR system.
To explore the use of our intonation-modelling methodology in a more general way, making use of more classes of intonation.
To find out which classes of intonation are most relevant for characterizing the speaker.
New corpora are welcome.

Stop the war
Thank you