Published by Shawn Marcy. Modified over 9 years ago.
1
MAI Internship April-May 2002
2
MAI Internship 2002 Slide 2 of 14
What?
The AST Project promotes development of speech technology for official languages of South Africa
– SA English, Afrikaans, Zulu, Xhosa, Sesotho
– Create reusable databases & software
– Prototype hotel booking dialogue system
– 2000-2003
3
AST dialogue system: basics
[Diagram: Telephone Network, Speech Recognition, Natural Language Understanding, Dialogue Manager, Speech Synthesis, Database]
4
AST Speech Database
Use?
– Input ASR: acoustic training
– Output ASR: dictionary
Start from scratch, even for SAE
Telephone data based on SpeechDat
– Datasheet utterances
– Hierarchical recruiting method
Labeling tool: PRAAT
5
No.  Language spoken             Code  No. of speakers
1    English (E)                       1500-2000
       Speech varieties:               300-400 per variety
       Mother-tongue English     EE
       Black English             BE
       Coloured English          CE
       Asian English             ASE
       Afrikaans English         AE
2    isiXhosa (X)                XX    300-400
3    Sesotho (S)                 SS    300-400
4    isiZulu (Z)                 ZZ    300-400
5    Afrikaans (A)                     900-1200
       Speech varieties:               300-400 per variety
       Mother-tongue Afrikaans   AA
       Black Afrikaans           BA
       Coloured Afrikaans        CA
6
AST Speech Database
[Diagram. Levels: Orthographic annotation, Phonemic transcription, Acoustic signal, Phonetic alignment. Methods: Manual labour; Rules & dictionary: Patana; Forced alignment: HTK]
7
AST Speech Recognition
Difficult:
– Speaker independent, noisy conditions
– Medium-size vocabulary (10,000 words)
– Training data sparse
Not so difficult:
– Dialogue Manager helps
Phoneme-based HMMs (future: diphones)
Finite-state language model
Pitch & clicks of African languages ignored
8
AST Natural Language Understanding
Same finite-state network as the recogniser's language model
– +: all utterances ‘understood’
– -: FSGs are limited
Makes no sense to recognise more than we can understand
Semantic labels are activated
Alternative: robust parsing (Phoenix, ATIS)
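The idea of one finite-state network serving both as language model and as NLU component can be sketched roughly as follows. This is a minimal illustration only, not the AST implementation: the grammar, the state names, and the `room_type` label are all hypothetical. Arcs carry a word and an optional semantic label, and walking a recognised utterance through the network "activates" the labels it passes.

```python
# Hypothetical toy grammar: state -> {word: (next_state, semantic_label_or_None)}.
# None of these states, words or labels come from the AST system itself.
GRAMMAR = {
    "start": {"i": ("verb", None)},
    "verb": {"want": ("det", None), "need": ("det", None)},
    "det": {"a": ("type", None)},
    "type": {
        "single": ("noun", ("room_type", "single")),
        "double": ("noun", ("room_type", "double")),
    },
    "noun": {"room": ("end", None)},
}
FINAL_STATES = {"end"}


def understand(utterance):
    """Walk the utterance through the network and collect activated labels.

    Returns a dict of semantic labels, or None if the utterance is not
    covered by the finite-state grammar.
    """
    state = "start"
    meaning = {}
    for word in utterance.lower().split():
        arcs = GRAMMAR.get(state, {})
        if word not in arcs:
            return None  # outside the grammar: cannot be recognised or understood
        state, label = arcs[word]
        if label is not None:
            slot, value = label
            meaning[slot] = value
    return meaning if state in FINAL_STATES else None


print(understand("I want a double room"))
print(understand("I want a suite"))
```

Because the same network constrains recognition, every utterance the recogniser can output is by construction one the NLU step can interpret; the cost is that anything outside the grammar is rejected outright.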
9
AST Natural Language Understanding
[Diagram: Speech Recognition, NLU, Dialogue Manager; labels: FSG, Recognised utterance, Grammar ID, Meaning]
10
AST Natural Language Understanding
Embedded semantic tags:
‘drie honderd duisend agt en neëntig’ (Afrikaans: "three hundred thousand eight and ninety", i.e. 300,098)
3 0 0 0 9 8
V6=3 V5=0 V4=0 V3=0 V2=9 V1=8
t1=3 t2=0 t3=0
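The mapping from the spoken number to the V6..V1 digit tags can be reproduced with a short sketch. It assumes a small, hypothetical subset of Afrikaans number words and a simple accumulate-and-multiply parser; the slide does not say how the AST grammar actually attaches the tags, and the t1..t3 tags are left out because the slide does not define them.

```python
# Hypothetical vocabulary: a small subset of Afrikaans number words.
UNITS = {"een": 1, "twee": 2, "drie": 3, "vier": 4, "vyf": 5,
         "ses": 6, "sewe": 7, "agt": 8, "nege": 9}
TENS = {"twintig": 20, "dertig": 30, "veertig": 40, "vyftig": 50,
        "sestig": 60, "sewentig": 70, "tagtig": 80, "neëntig": 90}


def parse_number(phrase):
    """Convert a space-separated Afrikaans number phrase to an integer."""
    total = 0    # value of completed thousands groups
    current = 0  # value of the group being built
    for word in phrase.lower().split():
        if word == "en":  # "and" joins units and tens, carries no value
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "honderd":
            current = max(current, 1) * 100
        elif word == "duisend":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"word not in toy vocabulary: {word!r}")
    return total + current


def digit_tags(value, width=6):
    """Return one tag per decimal digit, V<width> (leftmost) down to V1."""
    digits = str(value).zfill(width)
    if len(digits) > width:
        raise ValueError("value does not fit in the requested number of tags")
    return {f"V{width - i}": int(d) for i, d in enumerate(digits)}


value = parse_number("drie honderd duisend agt en neëntig")
print(value)
print(digit_tags(value))
```

For the slide's phrase this yields 300098 and the tag assignment V6=3, V5=0, V4=0, V3=0, V2=9, V1=8.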
11
AST Dialogue Manager
Trade-off: naturalness vs. response restriction
– System-directed: predictable user utterances, simple dialogues
– Mixed-initiative: shorter dialogues, more recognition errors
– User-initiative: unpopular
12
AST Dialogue Manager
Design:
– Early focus on users and task
– Wizard-of-Oz: pay no attention to the man behind the curtain
– System-in-the-loop
Finite-state structure because of simplicity and functionality
Possible frame-based approach in future
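A finite-state, system-directed dialogue manager of the kind described here might look roughly like the sketch below. The states, prompts, and slot names are hypothetical and chosen only to fit the hotel booking domain. Each state asks one question, fills one slot, and moves to a fixed next state; an empty answer stands in for "not understood" and triggers a re-prompt.

```python
# Hypothetical states for a hotel booking dialogue:
# state -> (prompt, slot to fill, next state).
DIALOGUE = {
    "ask_arrival": ("On which date will you arrive?", "arrival", "ask_nights"),
    "ask_nights": ("How many nights will you stay?", "nights", "ask_room"),
    "ask_room": ("Would you like a single or a double room?", "room_type", "confirm"),
}
START, END = "ask_arrival", "confirm"


def run_dialogue(answers):
    """Drive the dialogue with a sequence of (already understood) user answers.

    Returns the final state, the filled slots, and the prompts that were played.
    """
    state = START
    frame = {}
    prompts = []
    answers = iter(answers)
    while state != END:
        prompt, slot, next_state = DIALOGUE[state]
        prompts.append(prompt)
        answer = next(answers, None)
        if answer is None:
            break  # no more input, e.g. the caller hung up
        if answer == "":
            continue  # not understood: stay in this state and ask again
        frame[slot] = answer
        state = next_state
    return state, frame, prompts


state, frame, prompts = run_dialogue(["12 May", "", "2", "double"])
print(state, frame, len(prompts))
```

The fixed state order is what makes user utterances predictable at each step; a frame-based manager would instead let the caller fill several slots in any order.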
13
AST Speech Synthesis
– Fixed machine utterances: pre-recorded speech
– Database queries: limited-domain synthesis (Festival platform)
14
Conclusion
Finite-state approach in
– Recogniser
– NLU component
– Dialogue manager
Workable prototype
New funding in 2003