DU, C-SIIT1 Collecting and Transcribing Real Chinese Spontaneous Telephone Speech Corpus Limin Du, Chair Professor Director, Center for Speech Interactive.


Presentation transcript:

DU, C-SIIT1 Collecting and Transcribing Real Chinese Spontaneous Telephone Speech Corpus
Limin Du, Chair Professor
Director, Center for Speech Interactive Information Technology
Institute of Acoustics, Chinese Academy of Sciences
October 21, 2000

DU, C-SIIT2 Background
• Spontaneous speech interaction over the telephone is a very promising application, so speech recognition systems for telephone use must be built to handle the variations in acoustics and speaking style
• No large-scale Chinese spontaneous telephone speech corpus is available for research
  – Simulated telephone speech corpus (1997, C-SIIT, IOA, CAS)
    • Microphone speech corpus -> pipeline to telephone -> telephone speech
  – Collecting real telephone speech data seems to be a formidable task
    • Laws
    • Costs
• The Chinese-English speech translation (CEST) project, a collaboration between CAS and AT&T ( ), is a strong driver for this work

DU, C-SIIT3 Real Telephone Speech Collection
• A “dialogue oriented” collection paradigm
  – Human-human conversations
  – Human-machine dialogues
[Figure: collection setup. Labels: Real Information Service Center; Caller; Hotel Information Desk; Computer-phone; OR; Dialogue card; Caller; Simulated Human or Machine Service Agent; Computer; Data storage; "Labeling is so cool!"]

DU, C-SIIT4 Speech Data Processing
• Sampling
  – 8 kHz sampling rate
  – 16-bit A/D quantization
• Utterance segmentation
  – One utterance per speaker turn: a new utterance starts at each speaker switch
  – Average utterance length: 3 seconds
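
The segmentation rule on this slide (cut at each speaker switch) can be sketched as follows. This is a minimal illustration, not the tool used for the corpus: the `Segment` type, the field names, and the idea that the input arrives as speaker-labelled time spans are all assumptions made for the example. Only the 8 kHz sampling rate comes from the slide.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple

SAMPLE_RATE_HZ = 8000  # from the slide: 8 kHz sampling


@dataclass
class Segment:
    """A speaker-labelled time span (hypothetical input format)."""
    speaker: str
    start_s: float
    end_s: float


def segment_by_speaker(segments: List[Segment]) -> List[Segment]:
    """Merge consecutive spans of the same speaker into one utterance,
    so that every speaker switch starts a new utterance."""
    utterances = []
    for speaker, group in groupby(segments, key=lambda s: s.speaker):
        spans = list(group)
        utterances.append(Segment(speaker, spans[0].start_s, spans[-1].end_s))
    return utterances


def sample_range(utt: Segment) -> Tuple[int, int]:
    """Convert an utterance's time span to sample indices at 8 kHz."""
    return round(utt.start_s * SAMPLE_RATE_HZ), round(utt.end_s * SAMPLE_RATE_HZ)


if __name__ == "__main__":
    spans = [
        Segment("caller", 0.0, 1.2),
        Segment("caller", 1.2, 3.0),
        Segment("agent", 3.0, 5.5),
    ]
    for utt in segment_by_speaker(spans):
        print(utt.speaker, sample_range(utt))
```

Grouping consecutive spans keeps the rule independent of how finely the input was cut: any number of adjacent same-speaker spans still yields a single utterance.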

DU, C-SIIT5 Speech Data Transcribing
• What to Label?
• How to Label?

DU, C-SIIT6 What to Label?
• Information about speakers and environments
  – Speaker's dialect, mood, gender, speech quality
• Transcription
  – Chinese characters
  – Pinyin
  – Other acoustic event labels
    • Laugh, lip smack, throat clearing, breath, cough, filled pauses, telephone adjusting, background speech, etc.
• Time stamps
  – Time stamps are added automatically, as brackets, when transcribing with a special software tool
  – Other acoustic events are bracketed with time stamps in the same way
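
One way to picture the label inventory above is as a record per utterance. The sketch below is purely illustrative: the class and field names are invented for this example and do not describe the actual file format of the corpus or of the labeling tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpeakerInfo:
    """Speaker and environment information listed on the slide."""
    dialect: str
    mood: str
    gender: str
    speech_quality: str


@dataclass
class AcousticEvent:
    """A non-speech event (e.g. LAUGH, BREATH) with its time stamps."""
    label: str
    start_s: float
    end_s: float


@dataclass
class UtteranceRecord:
    """One transcribed utterance (hypothetical layout)."""
    speaker: SpeakerInfo
    characters: str      # Chinese-character transcription
    pinyin: List[str]    # one toned syllable per character
    start_s: float
    end_s: float
    events: List[AcousticEvent] = field(default_factory=list)

    def duration_s(self) -> float:
        return self.end_s - self.start_s


if __name__ == "__main__":
    record = UtteranceRecord(
        speaker=SpeakerInfo("Beijing", "neutral", "female", "clear"),  # made-up values
        characters="稍等",
        pinyin=["shao1", "deng3"],
        start_s=12.0,
        end_s=13.5,
        events=[AcousticEvent("BREATH", 12.0, 12.25)],
    )
    print(record.duration_s(), [e.label for e in record.events])
```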

DU, C-SIIT7 Detailed Issues Concerned
• Mispronunciation
  – Mispronunciation often occurs in daily speech. For example, a speaker may read the Chinese character “山” (whose correct pronunciation is “shan1”) as “san2”. In such a case, the associated speech segment is transcribed as “山(san2)” to record both the correct text and the actual pronunciation
• Numbers
  – Writing numbers in Arabic numerals is natural, but a numeral cannot be mapped to a single pronunciation. Transcribers are therefore required to transcribe all numbers in Chinese characters
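
The point about numbers can be made concrete with a small sketch: the same numeral has several plausible spoken forms, and only the Chinese-character transcription records which one was said. The function names are hypothetical, the quantity reading is deliberately limited to 0 through 99, and the use of 幺 for the digit 1 reflects the common telephone-number reading rather than anything stated on the slide.

```python
DIGITS = "零一二三四五六七八九"


def read_digit_by_digit(numeral: str, telephone_style: bool = False) -> str:
    """Read each digit separately, as in a phone or room number.
    With telephone_style=True the digit 1 is written 幺 (yao1)."""
    chars = [DIGITS[int(d)] for d in numeral]
    if telephone_style:
        chars = ["幺" if c == "一" else c for c in chars]
    return "".join(chars)


def read_as_quantity(n: int) -> str:
    """Read n as a quantity. Illustration only: supports 0..99."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0..99")
    tens, ones = divmod(n, 10)
    if tens == 0:
        return DIGITS[ones]
    prefix = "十" if tens == 1 else DIGITS[tens] + "十"
    return prefix + (DIGITS[ones] if ones else "")


if __name__ == "__main__":
    numeral = "21"
    # Three different character strings for the same written numeral.
    print(read_digit_by_digit(numeral))                        # 二一
    print(read_digit_by_digit(numeral, telephone_style=True))  # 二幺
    print(read_as_quantity(int(numeral)))                      # 二十一
```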

DU, C-SIIT8 Other Acoustic Events
File      Recognition result   Auditory judgment
PAUSE1    AI                   [UH]
PAUSE14   AI                   [UH]
PAUSE12   A                    [UNG]
PAUSE33   KA A                 [UNG]
PAUSE20   ANG                  [UNG]
PAUSE26   ANG                  [UNG]
PAUSE19   AN                   [EN]
PAUSE4    CHA                  [AO]
PAUSE18   GAN                  [UH]
PAUSE21   HE                   [EN]
PAUSE27   NE                   [EN]
PAUSE22   YUN                  [UM]
PAUSE34   LENG                 [UH]
PAUSE15   TONG                 [UH]

DU, C-SIIT9 Other Acoustic Events (cont.)
File      Recognition result   Auditory judgment
PAUSE31   NONG                 [EN]
PAUSE17   HEN                  [EN]
PAUSE24   EN                   [EN]
Labels:
  – [AA]
  – [AI]
  – [EN]
  – [UH]
  – [AO]
  – [SIL] silent segment
  – [NOISE]
  – [LAUGH]
  – [ANG]
  – [BREATH] breath
  – [HESITATION] hesitation

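DU, C-SIIT10 Transcription Example
[FILLER] [NOISE] “北京游乐园怎么走” (How do I get to Beijing Amusement Park?)
东直门到哪 (From Dongzhimen to where?)
“北京 游乐园” (Beijing Amusement Park)
北京游乐园是吗 (Beijing Amusement Park, is that right?)
“[FILLER]”
[FILLER] 稍等 (One moment)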
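
A transcription line of this kind mixes bracketed event labels with spoken text. The sketch below shows one way to separate the two. It assumes labels are always upper-case ASCII in square brackets, as in the example above; the `tokenize` function and its output format are hypothetical and not part of the labeling tool.

```python
import re
from typing import List, Tuple

# Assumption: event labels look like [FILLER], [NOISE], [LAUGH], ...
TAG = re.compile(r"\[([A-Z]+)\]")


def tokenize(line: str) -> List[Tuple[str, str]]:
    """Split a transcription line into ("event", label) and ("text", chars)
    tokens, preserving their order."""
    tokens = []
    pos = 0
    for match in TAG.finditer(line):
        text = line[pos:match.start()].strip()
        if text:
            tokens.append(("text", text))
        tokens.append(("event", match.group(1)))
        pos = match.end()
    tail = line[pos:].strip()
    if tail:
        tokens.append(("text", tail))
    return tokens


if __name__ == "__main__":
    print(tokenize("[FILLER] [NOISE] 北京游乐园怎么走"))
    print(tokenize("[FILLER] 稍等"))
```

Keeping events and text in one ordered token list preserves where in the utterance each event occurred, which a pair of separate lists would lose.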

DU, C-SIIT11 How to Label?
• Improving transcribers' efficiency and reducing the chance of errors
  – A labeling tool was developed specially for this task
• Training transcribers
  – Transcribers are usually our employees who have assisted speech research for more than one year and have good working records
  – Part-time employees are trained by our employees before working

DU, C-SIIT12 Statistical Results in General
Chinese Spontaneous Telephone Speech Corpus (CSTSC)
Number of speakers                      600
Number of human-human (h-h) dialogues   1000
Number of human-machine (h-m) dialogues 38
Average duration per dialogue           3.5 minutes
Sampling rate of speech                 8 kHz
Quantization of speech                  16 bits
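
The figures in this table allow a rough estimate of the corpus size. The calculation below is a back-of-the-envelope sketch under assumptions the slide does not state: single-channel, uncompressed PCM audio, and the 3.5-minute average applying equally to human-human and human-machine dialogues.

```python
SAMPLE_RATE_HZ = 8000        # from the table
BITS_PER_SAMPLE = 16         # from the table
N_HH_DIALOGUES = 1000        # from the table
N_HM_DIALOGUES = 38          # from the table
AVG_MINUTES_PER_DIALOGUE = 3.5


def bytes_per_second(rate_hz: int, bits: int, channels: int = 1) -> int:
    """Data rate of uncompressed PCM audio (mono assumed by default)."""
    return rate_hz * bits // 8 * channels


def corpus_seconds() -> int:
    """Total speech duration, assuming every dialogue has the average length."""
    dialogues = N_HH_DIALOGUES + N_HM_DIALOGUES
    return int(dialogues * AVG_MINUTES_PER_DIALOGUE * 60)


def corpus_bytes() -> int:
    return corpus_seconds() * bytes_per_second(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)


if __name__ == "__main__":
    print(f"{corpus_seconds() / 3600:.2f} hours")   # 60.55 hours
    print(f"{corpus_bytes() / 1e9:.2f} GB")         # 3.49 GB
```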

DU, C-SIIT13 Statistical Results in Detail
180 human-human dialogues, 38 human-machine dialogues

DU, C-SIIT14 Summary
• C-SIIT, CAS started building telephone speech corpora 3 years ago, under a very limited budget
• The efforts and experience in collecting a real Chinese telephone speech corpus were introduced
• C-SIIT will continue its work on real Chinese telephone and mobile phone speech corpora and will do its best to release to the public most of the corpora already built, under construction, or planned
• Suggestions and comments from all of you are appreciated
Thanks!