DU, C-SIIT

Collecting and Transcribing Real Chinese Spontaneous Telephone Speech Corpus

Limin Du, Chair Professor
Director, Center for Speech Interactive Information Technology
Institute of Acoustics, Chinese Academy of Sciences
October 21, 2000

Background
• Spontaneous spoken interaction over the telephone is a very promising application, so speech recognition systems for telephone use must be built to cope with the variations in acoustics and speaking styles
• No large-scale Chinese spontaneous telephone speech corpus is available for research
  – Simulated telephone speech corpus (1997, C-SIIT, IOA, CAS)
    • Microphone speech corpus – pipeline to telephone – telephone speech
  – Collecting real telephone speech data seems to be a formidable task
    • Laws
    • Costs
• The Chinese-English speech translation (CEST) project, a collaboration between CAS and AT&T ( ), is a strong driving force for this work

Real Telephone Speech Collection
• A “dialogue oriented” collection paradigm
  – Human-human conversations
  – Human-machine dialogues
[Figure: collection setup. Labels: Real Information Service Center; Caller; Hotel Information Desk; Computer-phone; OR; Dialogue card; Caller; Simulated Human or Machine Service Agent; Computer; Data storage; Labeling]

Speech Data Processing
• Sampling
  – 8 kHz sampling rate
  – 16-bit A/D quantization
• Utterance segmentation
  – Each speaker change starts a new utterance
  – Utterances average about 3 seconds in length
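
As a rough illustration of the segmentation rule and the sampling parameters above, the sketch below merges consecutive segments from the same speaker into one utterance and computes the raw size of an utterance. The segment tuples, the function names, and the assumption of single-channel uncompressed PCM are illustrative only and are not taken from the collection tool.

```python
from itertools import groupby

SAMPLE_RATE_HZ = 8000   # 8 kHz sampling, as stated on the slide
BYTES_PER_SAMPLE = 2    # 16-bit quantization
# Assumption: single-channel (mono), uncompressed PCM.


def split_by_speaker(segments):
    """Merge consecutive (speaker, start_s, end_s) segments of the same speaker.

    Each speaker change therefore starts a new utterance.
    """
    utterances = []
    for speaker, group in groupby(segments, key=lambda seg: seg[0]):
        group = list(group)
        utterances.append((speaker, group[0][1], group[-1][2]))
    return utterances


def pcm_bytes(duration_s):
    """Raw size in bytes of a mono recording of the given duration."""
    return int(round(duration_s * SAMPLE_RATE_HZ)) * BYTES_PER_SAMPLE


# Hypothetical segments: (speaker, start in seconds, end in seconds).
segments = [
    ("caller", 0.0, 1.2),
    ("caller", 1.2, 2.9),
    ("agent", 2.9, 5.0),
    ("caller", 5.0, 6.1),
]
utterances = split_by_speaker(segments)
```

Under these assumptions, an average 3-second utterance occupies `pcm_bytes(3.0)` = 48000 bytes.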

Speech Data Transcribing
• What to Label?
• How to Label?

What to Label?
• Information about speakers and environments
  – Speaker’s dialect, mood, gender, speech quality
• Transcription
  – Chinese characters
  – Pinyin
  – Other acoustic event labels
    • Laugh, lip smack, throat clearing, breath, cough, filled pauses, telephone adjusting, background speech, etc.
• Time stamps
  – [...] are bracketed with time stamps automatically when transcribing with a special software tool
  – Other acoustic events are bracketed with time stamps automatically when transcribing with a special software tool
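
One way to picture the items above is as a single record per utterance. The sketch below is only an assumption about how such a record could be laid out; the class names, field names, and sample values are hypothetical and do not describe the actual labeling tool or corpus format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TimedLabel:
    """A label bracketed by time stamps, e.g. an acoustic event such as [BREATH]."""
    label: str
    start_s: float
    end_s: float

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


@dataclass
class UtteranceRecord:
    """Hypothetical record holding the items listed on the slide."""
    speaker_dialect: str
    speaker_mood: str
    speaker_gender: str
    speech_quality: str
    characters: str   # transcription in Chinese characters
    pinyin: str       # transcription in Pinyin with tone numbers
    events: List[TimedLabel] = field(default_factory=list)


# Made-up sample values, for illustration only.
record = UtteranceRecord(
    speaker_dialect="Beijing",
    speaker_mood="neutral",
    speaker_gender="female",
    speech_quality="good",
    characters="稍等",
    pinyin="shao1 deng3",
    events=[TimedLabel("[BREATH]", 0.0, 0.4)],
)
```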

Detailed Issues Concerned
• Mispronunciation
  – Mispronunciation often occurs in daily life. For example, a speaker may read the Chinese character “山” (whose correct pronunciation is “shan1”) as “san2”. In such a case, the associated speech segment is transcribed as “山(san2)” to record both the correct text and the actual pronunciation
• Numbers
  – Writing numbers in Arabic numerals is natural, but a numeral cannot be mapped to a single pronunciation. Transcribers are therefore required to transcribe all numbers with Chinese characters
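
The two conventions above lend themselves to a simple automatic check. The following is a minimal sketch, assuming the annotation always has the form of one Chinese character followed by a parenthesized Pinyin syllable with a tone digit from 1 to 5; the function names and the sample sentences are hypothetical.

```python
import re

# One Chinese character followed by the pronunciation actually spoken, e.g. 山(san2).
# Assumption: Pinyin is lowercase ASCII with a tone digit 1-5 (5 = neutral tone).
MISPRONUNCIATION = re.compile(r"([\u4e00-\u9fff])\(([a-z]+[1-5])\)")


def find_mispronunciations(transcript):
    """Return (character, spoken_pinyin) pairs annotated in the transcript."""
    return MISPRONUNCIATION.findall(transcript)


def plain_text(transcript):
    """Strip the pronunciation annotations, keeping only the correct text."""
    return MISPRONUNCIATION.sub(r"\1", transcript)


def has_arabic_digits(transcript):
    """True if a number was left in Arabic numerals instead of Chinese characters.

    Tone digits inside pronunciation annotations are ignored.
    """
    return re.search(r"[0-9]", plain_text(transcript)) is not None


# Made-up sample sentences, for illustration only.
ok_line = "上山(san2)要三个小时"
bad_line = "上山要3个小时"
```

Here `find_mispronunciations(ok_line)` yields the pair `("山", "san2")`, and only `bad_line` is flagged by `has_arabic_digits`.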

Other Acoustic Events

File      Recognition result   Auditory judgement
PAUSE1    AI                   [UH]
PAUSE14   AI                   [UH]
PAUSE12   A                    [UNG]
PAUSE33   KA A                 [UNG]
PAUSE20   ANG                  [UNG]
PAUSE26   ANG                  [UNG]
PAUSE19   AN                   [EN]
PAUSE4    CHA                  [AO]
PAUSE18   GAN                  [UH]
PAUSE21   HE                   [EN]
PAUSE27   NE                   [EN]
PAUSE22   YUN                  [UM]
PAUSE34   LENG                 [UH]
PAUSE15   TONG                 [UH]

Other Acoustic Events (cont.)

File      Recognition result   Auditory judgement
PAUSE31   NONG                 [EN]
PAUSE17   HEN                  [EN]
PAUSE24   EN                   [EN]

– [AA]
– [AI]
– [EN]
– [UH]
– [AO]
– [SIL] (silent segment)
– [NOISE]
– [LAUGH]
– [ANG]
– [BREATH] (breath)
– [HESITATION] (hesitation)
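
Transcription Example
[FILLER] [NOISE]
“北京游乐园怎么走” (How do I get to Beijing Amusement Park?)
东直门到哪 (From Dongzhimen to where?)
“北京游乐园” (Beijing Amusement Park)
北京游乐园是吗 (Beijing Amusement Park, is that right?)
“[FILLER]”
[FILLER] 稍等 (One moment, please)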
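
A transcript line like those in the example can be split into event tags and text automatically, which also catches misspelled tags. The sketch below assumes that events are always written as an uppercase name in square brackets. The tag set is collected from the labels that appear on these slides, and the function name and token format are hypothetical.

```python
import re

# Assumption: tag inventory gathered from the slides (event list, auditory
# judgements, and the [FILLER] tag used in the transcription example).
KNOWN_TAGS = {
    "AA", "AI", "EN", "UH", "AO", "UNG", "UM", "ANG",
    "SIL", "NOISE", "LAUGH", "BREATH", "HESITATION", "FILLER",
}

# Either a bracketed tag or a run of text without brackets or whitespace.
TOKEN = re.compile(r"\[([^\[\]]*)\]|([^\[\]\s]+)")


def tokenize(line):
    """Split a transcript line into ("event", tag) and ("text", chars) tokens."""
    tokens = []
    for match in TOKEN.finditer(line):
        tag = match.group(1)
        if tag is not None:
            if tag not in KNOWN_TAGS:
                raise ValueError(f"unknown event tag: [{tag}]")
            tokens.append(("event", tag))
        else:
            tokens.append(("text", match.group(2)))
    return tokens


tokens = tokenize("[FILLER] [NOISE] 北京游乐园怎么走")
```

Rejecting unknown tags with an exception is a design choice for this sketch; a real tool might instead report them for review.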

How to Label?
• Improving transcribers’ efficiency and reducing the chance of errors
  – A labeling tool was developed specially for this task
• Training transcribers
  – Transcribers are usually our own employees who have assisted speech research for more than one year and have good work records
  – Part-time employees are trained by our employees before they start working

Statistical Results in General

Chinese Spontaneous Telephone Speech Corpus (CSTSC)

Number of speakers                       600
Number of human-human (h-h) dialogues    1000
Number of human-machine (h-m) dialogues  38
Average duration per dialogue            3.5 minutes
Sampling rate                            8 kHz
Quantization                             16 bits
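
The figures above allow a back-of-the-envelope estimate of the corpus size. This is only a sketch: it assumes the 3.5-minute average applies to both dialogue types and that the audio is stored as single-channel uncompressed PCM, neither of which is stated on the slide.

```python
# Figures from the slide.
H_H_DIALOGUES = 1000
H_M_DIALOGUES = 38
AVG_MINUTES_PER_DIALOGUE = 3.5
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 16

# Assumption: the average duration holds for h-h and h-m dialogues alike.
total_dialogues = H_H_DIALOGUES + H_M_DIALOGUES
total_seconds = total_dialogues * AVG_MINUTES_PER_DIALOGUE * 60
total_hours = total_seconds / 3600

# Assumption: mono, uncompressed PCM.
raw_bytes = int(total_seconds) * SAMPLE_RATE_HZ * (BITS_PER_SAMPLE // 8)

print(f"{total_dialogues} dialogues, {total_hours:.2f} hours, {raw_bytes / 1e9:.2f} GB")
```

Under these assumptions the corpus amounts to roughly 60.5 hours of speech and about 3.5 GB of raw audio.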

Statistical Results in Detail
180 human-human dialogues, 38 human-machine dialogues

Summary
• C-SIIT, CAS started building telephone speech corpora three years ago under a very limited budget
• The efforts and experience in collecting a real Chinese telephone speech corpus were introduced
• C-SIIT will continue its work on real Chinese telephone and mobile phone speech corpora and will try its best to release to the public most of the corpora already built, under construction, or planned
• Suggestions and comments from all of you are appreciated
Thanks!