MAI Internship April-May 2002
MAI Internship 2002 Slide 2 of 14
What?
– The AST Project promotes the development of speech technology for five of South Africa's official languages: SA English, Afrikaans, Zulu, Xhosa and Sesotho
– Create reusable databases and software
– Build a prototype hotel booking dialogue system
AST dialogue system: basics
[Diagram: telephone network, speech recognition, natural language understanding, dialogue manager, speech synthesis, database]
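The data flow of one dialogue turn through these components can be sketched in Python. This is a minimal illustration under stated assumptions, not the AST implementation: every function name is hypothetical, the telephone audio is represented by an already-transcribed string, and the database is a toy dictionary of room availability.

```python
from dataclasses import dataclass, field


def speech_recognition(audio: str) -> str:
    # Stand-in recogniser (hypothetical): the "audio" is already a transcript.
    return audio.lower().strip()


def natural_language_understanding(utterance: str) -> dict:
    # Stand-in NLU (hypothetical): turn recognised words into slot values.
    if "single" in utterance:
        return {"room": "single"}
    if "double" in utterance:
        return {"room": "double"}
    return {}


@dataclass
class DialogueManager:
    database: dict  # hypothetical table: room type -> rooms available
    state: dict = field(default_factory=dict)

    def next_prompt(self, meaning: dict) -> str:
        # Merge the new meaning into the dialogue state, then decide what to say.
        self.state.update(meaning)
        room = self.state.get("room")
        if room is None:
            return "Would you like a single or a double room?"
        available = self.database.get(room, 0)
        return f"There are {available} {room} rooms available."


def speech_synthesis(text: str) -> str:
    # Stand-in synthesiser (hypothetical): return the text that would be spoken.
    return text


def dialogue_turn(audio: str, dm: DialogueManager) -> str:
    # One turn: telephone input -> ASR -> NLU -> dialogue manager -> synthesis.
    utterance = speech_recognition(audio)
    meaning = natural_language_understanding(utterance)
    prompt = dm.next_prompt(meaning)
    return speech_synthesis(prompt)


dm = DialogueManager(database={"single": 3, "double": 2})
print(dialogue_turn("Hello", dm))                  # Would you like a single or a double room?
print(dialogue_turn("A double room, please", dm))  # There are 2 double rooms available.
```

Keeping each stage behind a plain function makes it easy to replace a stand-in with a real recogniser or synthesiser without touching the rest of the turn.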
AST Speech Database
– Use:
  – ASR input: acoustic training
  – ASR output: dictionary
– Start from scratch, even for SA English (SAE)
– Telephone data based on SpeechDat:
  – datasheet utterances
  – hierarchical recruiting method
– Labelling tool: Praat
Language spoken                      Code
English (E), speech varieties:
  Mother-tongue English              EE
  Black English                      BE
  Coloured English                   CE
  Asian English                      ASE
  Afrikaans English                  AE
isiXhosa (X)                         XX
Sesotho (S)                          SS
isiZulu (Z)                          ZZ
Afrikaans (A), speech varieties:
  Mother-tongue Afrikaans            AA
  Black Afrikaans                    BA
  Coloured Afrikaans                 CA
AST Speech Database
– Acoustic signal
– Orthographic annotation: manual labour
– Phonemic transcription: rules & dictionary (Patana)
– Phonetic alignment: forced alignment (HTK)
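The "rules & dictionary" step can be illustrated with a dictionary lookup that falls back to letter-to-sound rules for unknown words. This is a hedged sketch only: the lexicon entries, phone symbols and rules are invented toy examples, and it reproduces neither Patana's rules nor the forced alignment done with HTK.

```python
from __future__ import annotations

# Toy pronunciation dictionary (hypothetical entries and phone symbols).
LEXICON = {
    "hotel": ["h", "@", "t", "E", "l"],
    "room": ["r", "u", "m"],
}

# Toy letter-to-sound rules for multi-letter graphemes (hypothetical).
# Any letter not covered by a rule is used as its own phone symbol.
RULES = {
    "oo": ["u"],
    "ee": ["i"],
    "sh": ["S"],
    "ch": ["tS"],
}
MAX_GRAPHEME = max(len(g) for g in RULES)


def letter_to_sound(word: str) -> list[str]:
    """Apply the toy rules greedily, longest grapheme first."""
    phones: list[str] = []
    i = 0
    while i < len(word):
        for length in range(MAX_GRAPHEME, 0, -1):
            grapheme = word[i:i + length]
            if len(grapheme) == length and grapheme in RULES:
                phones.extend(RULES[grapheme])
                i += length
                break
        else:
            # No rule matched: fall back to the letter itself.
            phones.append(word[i])
            i += 1
    return phones


def transcribe(word: str) -> list[str]:
    """Dictionary first; letter-to-sound rules only for unknown words."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    return letter_to_sound(word)


print(transcribe("Hotel"))  # ['h', '@', 't', 'E', 'l'] (dictionary entry)
print(transcribe("sheet"))  # ['S', 'i', 't'] (rules)
```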
AST Speech Recognition
Difficult:
– speaker-independent, noisy conditions
– medium-size vocabulary
– sparse training data
Not so difficult:
– the dialogue manager helps
Approach:
– phoneme-based HMMs (diphones in future)
– finite-state language model
– pitch and clicks in the African languages are ignored
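A finite-state language model can be pictured as a word network: the recogniser may only hypothesise words that label an arc leaving the current state, and only complete paths count as sentences. This sketch assumes a small, hypothetical hotel-booking grammar; a real recogniser would search such a network jointly with the phoneme HMMs, which is not shown.

```python
# Hypothetical word network for one hotel-booking question.
# Each state maps an allowed word to the state it leads to.
START = 0
FINAL_STATES = {5, 6}
NETWORK = {
    0: {"i": 1, "a": 3},
    1: {"want": 2},
    2: {"a": 3},
    3: {"single": 4, "double": 4},
    4: {"room": 5},
    5: {"please": 6},
}


def allowed_next_words(state: int) -> list:
    """Words the recogniser may hypothesise next from this state."""
    return sorted(NETWORK.get(state, {}))


def accepts(words: list) -> bool:
    """True if the word sequence is a complete path through the network."""
    state = START
    for word in words:
        next_state = NETWORK.get(state, {}).get(word)
        if next_state is None:
            # Word not allowed here: a recogniser constrained by this
            # network cannot output this sequence.
            return False
        state = next_state
    return state in FINAL_STATES


print(accepts("i want a double room".split()))  # True
print(accepts("a single room please".split()))  # True
print(accepts("i want a suite".split()))        # False
print(allowed_next_words(3))                     # ['double', 'single']
```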
AST Natural Language Understanding
– Uses the same finite-state network as the recogniser's language model
  – Pro: every recognised utterance is ‘understood’
  – Con: finite-state grammars (FSGs) are limited
– It makes no sense to recognise more than we can understand
– Semantic labels are activated
– Alternative: robust parsing (Phoenix, ATIS)
AST Natural Language Understanding
[Diagram: speech recognition, NLU and dialogue manager, linked by the FSG, the recognised utterance, the grammar ID and the meaning]
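Using the recogniser's network for understanding amounts to attaching semantic labels to arcs and collecting the labels along the recognised path, with the grammar ID selecting the network. In this minimal sketch the grammar IDs, slots and words are hypothetical, and each network is assumed to be deterministic (at most one arc per word in a state). Because the recogniser only outputs paths through the same network, the None result should not occur for real recogniser output; it is there for text that did not come from the recogniser.

```python
from typing import Optional

# Hypothetical grammars keyed by grammar ID. Each arc is
# (word, next_state, semantic_label); the label is a (slot, value) pair or None.
GRAMMARS = {
    "room_type": {
        "start": 0,
        "final": {3},
        "arcs": {
            0: [("a", 1, None),
                ("single", 2, ("room", "single")),
                ("double", 2, ("room", "double"))],
            1: [("single", 2, ("room", "single")),
                ("double", 2, ("room", "double"))],
            2: [("room", 3, None)],
        },
    },
    "confirm": {
        "start": 0,
        "final": {1},
        "arcs": {
            0: [("yes", 1, ("confirm", True)),
                ("no", 1, ("confirm", False))],
        },
    },
}


def understand(grammar_id: str, utterance: str) -> Optional[dict]:
    """Follow the utterance through the selected grammar, collecting the
    semantic labels on the arcs it uses. Returns None if it is not covered."""
    grammar = GRAMMARS[grammar_id]
    state = grammar["start"]
    meaning = {}
    for word in utterance.split():
        for arc_word, next_state, label in grammar["arcs"].get(state, []):
            if arc_word == word:
                if label is not None:
                    slot, value = label
                    meaning[slot] = value
                state = next_state
                break
        else:
            return None  # no arc for this word in the current state
    return meaning if state in grammar["final"] else None


print(understand("room_type", "a double room"))  # {'room': 'double'}
print(understand("confirm", "yes"))              # {'confirm': True}
```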
AST Natural Language Understanding
Embedded semantic tags:
‘drie honderd duisend agt en neëntig’ (Afrikaans for "three hundred thousand and ninety-eight")
V6=3 V5=0 V4=0 V3=0 V2=9 V1=8
t1=3 t2=0 t3=0
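The V tags can be checked by computing the value of the Afrikaans number words and splitting it into digit positions; as the example implies, V1 is the units digit and V6 the hundred-thousands digit. Note that this sketch swaps in a plain arithmetic parser instead of tags embedded in the grammar. Its word list is an assumption covering only units, tens, honderd and duisend (no teens), and the t1 to t3 tags are not modelled because the slide does not define them.

```python
import unicodedata

# Afrikaans units and tens covered by this sketch (teens omitted).
UNITS_AND_TENS = {
    "een": 1, "twee": 2, "drie": 3, "vier": 4, "vyf": 5,
    "ses": 6, "sewe": 7, "agt": 8, "nege": 9, "tien": 10,
    "twintig": 20, "dertig": 30, "veertig": 40, "vyftig": 50,
    "sestig": 60, "sewentig": 70, "tagtig": 80,
    "neëntig": 90, "negentig": 90,
}


def parse_afrikaans_number(text: str) -> int:
    # Normalise so that a decomposed "e" + diaeresis still matches "neëntig".
    text = unicodedata.normalize("NFC", text)
    total = 0    # value already closed off by "duisend"
    current = 0  # value accumulated since the last "duisend"
    for word in text.split():
        if word == "en":
            continue  # "agt en neëntig" = eight and ninety = 98
        elif word in UNITS_AND_TENS:
            current += UNITS_AND_TENS[word]
        elif word == "honderd":
            current = (current or 1) * 100
        elif word == "duisend":
            total += (current or 1) * 1000
            current = 0
        else:
            raise ValueError(f"unknown number word: {word!r}")
    return total + current


def digit_slots(value: int, width: int = 6) -> dict:
    digits = str(value).zfill(width)
    if len(digits) > width:
        raise ValueError("value too large for the slot width")
    # Highest position first: V6 (hundred-thousands) down to V1 (units).
    return {f"V{width - i}": int(d) for i, d in enumerate(digits)}


value = parse_afrikaans_number("drie honderd duisend agt en neëntig")
print(value)               # 300098
print(digit_slots(value))  # {'V6': 3, 'V5': 0, 'V4': 0, 'V3': 0, 'V2': 9, 'V1': 8}
```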
AST Dialogue Manager
Trade-off: naturalness versus restriction of responses
– System-directed: predictable user utterances, simple dialogues
– Mixed-initiative: shorter dialogues, more recognition errors
– User-initiative: unpopular
AST Dialogue Manager
Design:
– early focus on users and task
– Wizard-of-Oz: "pay no attention to the man behind the curtain"
– system-in-the-loop
– finite-state structure, chosen for its simplicity and functionality
– possibly a frame-based approach in future
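A finite-state dialogue manager of this kind can be sketched as a table of states, each with a prompt, the grammar ID to load, and a transition chosen from the NLU meaning. The states, prompts, grammar IDs and slot names are hypothetical hotel-booking examples, not the AST dialogue design.

```python
# Hypothetical transition functions: each picks the next state from the
# meaning the NLU returned for the user's answer.
def after_room(meaning: dict) -> str:
    return "ask_nights" if "room" in meaning else "ask_room"


def after_nights(meaning: dict) -> str:
    return "confirm" if "nights" in meaning else "ask_nights"


def after_confirm(meaning: dict) -> str:
    if "confirm" not in meaning:
        return "confirm"  # not understood: ask again
    return "done" if meaning["confirm"] else "ask_room"  # "no" starts over


STATES = {
    "ask_room": {"prompt": "Would you like a single or a double room?",
                 "grammar": "room_type", "next": after_room},
    "ask_nights": {"prompt": "For how many nights?",
                   "grammar": "number", "next": after_nights},
    "confirm": {"prompt": "Shall I make the booking?",
                "grammar": "confirm", "next": after_confirm},
}


def run_dialogue(meanings: list) -> tuple:
    """Drive the state machine with a scripted list of NLU meanings.

    Returns the states visited and the slots collected. A full system would
    play STATES[state]["prompt"] and load STATES[state]["grammar"] before
    each user turn.
    """
    state = "ask_room"
    visited = [state]
    slots = {}
    for meaning in meanings:
        if state == "done":
            break
        slots.update(meaning)
        state = STATES[state]["next"](meaning)
        visited.append(state)
    return visited, slots


visited, slots = run_dialogue([{"room": "double"}, {}, {"nights": 2}, {"confirm": True}])
print(visited)  # ['ask_room', 'ask_nights', 'ask_nights', 'confirm', 'done']
print(slots)    # {'room': 'double', 'nights': 2, 'confirm': True}
```

A frame-based manager would instead keep a set of slots and ask for whichever is still empty, rather than following fixed transitions.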
AST Speech Synthesis
– Fixed machine utterances: pre-recorded speech
– Database queries: limited-domain synthesis (Festival platform)
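The split between pre-recorded prompts and synthesised database output can be sketched as a small output planner. The prompt IDs and file paths are hypothetical, and the actual call into Festival is deliberately left out.

```python
# Hypothetical mapping from fixed machine utterances to recorded prompt files.
RECORDED_PROMPTS = {
    "welcome": "prompts/welcome.wav",
    "ask_room": "prompts/ask_room.wav",
}


def plan_output(prompt_id=None, query_result=None) -> list:
    """Return (method, content) pairs describing how to produce the speech."""
    plan = []
    if prompt_id is not None:
        # Fixed wording: play the pre-recorded file.
        plan.append(("play", RECORDED_PROMPTS[prompt_id]))
    if query_result is not None:
        # Variable text from a database query: hand it to the synthesiser.
        plan.append(("synthesise", query_result))
    return plan


print(plan_output("welcome"))
print(plan_output(query_result="Hotel A has two double rooms."))
```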
Conclusion
– Finite-state approach in:
  – recogniser
  – NLU component
  – dialogue manager
– Workable prototype
– New funding for 2003