Multilingual Conversational Systems
Computer Science and Artificial Intelligence Laboratory

Presentation transcript:

Multilingual Conversational Systems
[Architecture diagram. Labels: Speech Recognition; Language Understanding; Language Generation; Speech Synthesis; Dialogue Manager; Database; Graphs & Tables; Discourse Context; Meaning Representation; Models; Rules; Language Independent; Language Transparent; Language Dependent]

Steps to Develop Language Learning System
1. Begin with existing mature system in English
2. Develop English-to-Mandarin translation capability
3. Induce Mandarin corpus from English corpus
4. Train LM statistics for both recognizers from corpora
5. Develop parsing grammar for Mandarin queries and generation rules for Mandarin responses
Not yet completed:
1. Develop domain-specific user simulation capability
2. Generate thousands of dialogues in both languages
3. Train recognizers and users from simulated dialogues

Activities over the Last Nine Months
Translation from English to Mandarin
  – Mainly focused on user queries (as contrasted with responses)
  – Integrating generation-based translation with example-based approach
  – Exploring the use of statistical machine translation
    * Use phrase-based statistical translation framework developed by Philipp Koehn
    * Used the formal methods to generate domain-specific parallel corpus in weather query domain
    * Implemented a finite-state transducer version of the decoder and integrated it with Galaxy
Translation from Mandarin to English
  – Use statistical method to obtain Chinese-to-English translation capability
  – Explore grammar induction techniques to create parsing grammar for Mandarin queries, towards developing formal methods for Mandarin-to-English translation

Activities over the Last Nine Months, Cont'd
System Development
  – Upgraded weather harvesting process
  – Upgraded database server to support Postgres in addition to Oracle
  – Improved dialogue management
    * Better handling of meta queries
  – Developed a new GUI that overcomes firewall limitations
    * Support automatic checking and correction of typed tone errors
    * Better display of tones as diacritics
  – Developed a new concatenative speech synthesis capability, using Envoice, for high-quality translation of user queries spoken in English
  – Developed a batch-mode capability to process synthetic speech through dialogue interaction to aid system development

Activities over the Last Nine Months, Cont'd
Presentations:
  – Three talks at InSTIL Workshop in Venice
    * Wang and Seneff: Translation
    * Seneff et al.: LL Systems
    * Peabody et al.: Web-based interface for tone acquisition
  – ISCSLP:
    * Seneff et al.: Focused on MuXing system overall
  – SIGdial Demo Session
    * Wang and Seneff: Presentation and live demonstration
  – One-hour seminar at Microsoft China's Speech Group
  – One-hour seminar at Defense Language Institute in Monterey
  – Demonstrated system to Julian Wheatley, head of Chinese department at MIT, and to Henry Jenkins, director of MIT Comparative Media Studies

Activities over the Last Nine Months, Cont'd
Data collection initiatives:
  – Eight subjects have completed Web-based exercise at MIT
  – Two visits by Stephanie Seneff to Defense Language Institute in Monterey, California
    * One successful class participation exercise
    * Another attempted but aborted due to power outage
  – Installed Web-based exercise system on computers at MIT Language Lab
    * Julian Wheatley has agreed to support data collection initiatives with students in the MIT Chinese classes

Bilingual Recognizer Construction
Automatically translate existing English corpus into Mandarin
Use NL grammar to automatically induce language model for both English and Mandarin recognizers
Two languages compete in common search space
[Diagram labels: English corpus; Parse; Semantic Frame; Generate; Chinese corpus; English Recognizer Language Model; Chinese Recognizer Language Model; English Network; Chinese Network; Recognizer]

Automatic Grammar Induction
Once translation ability exists from English to target language, can create reverse system almost effortlessly
Utilizes English parse tree and Mandarin generation lexicon to induce Mandarin parse tree
[Diagram labels: English Sentence; parse; Interlingua; generate; Mandarin Sentence; Corpus Pairs; Grammar Induction; Mandarin Parsing Grammar]

Multilingual Spoken Translation Framework
Common meaning representation: semantic frame
[Diagram labels: Recognition; NLU; Semantic Frame; NLG; Synthesis; Parsing Rules; Generation Rules; Models; Speech Corpora; English, Chinese, Spanish, Japanese (listed twice)]

Challenges in Cross-language Generation for Translation
Some expressions have very different syntactic structures in different languages
  – What is your name? 你 (you) 叫 (call) 什么 (what) 名字 (name)?
  – I like her. Ella me gusta.
  – Where is a bank nearby? 附近 (vicinity) 哪儿 (where) 有 (have) 银行 (bank)?
Syntactic features are expressed in many different ways
  – Determiners (English but not Chinese)
  – Particles (Chinese but not English)
  – Gender (extensive in Spanish)
Examples:
  – that hotel: 那 (that) 家 ( ) 旅馆 (hotel)
  – I lost my key. 我 (I) 丢 (lose) 了 ( ) 我的 (my) 钥匙 (key).

An Example: English/Chinese
English: How long does it take to take a taxi there
Chinese: 坐出租车去那里要多久 ( take taxi go there need how long )
  – Function words disappear in Chinese: How long take take taxi there
  – Two instances of "take" have different translations: How long need take taxi there
  – Verb "go" omitted in English: How long need take taxi go there
  – Sentence structure is very different

Semantic Frame for Example
Semantic frame is identical for both inputs, except for missing function words in Mandarin
Where necessary, constituent movement is invoked to render the same hierarchical structure
English generation predicts missing function words
Mandarin generation infers "go" from "destination" predicate

{c wh_question
   :aux "do"
   :phatic_pronoun "it"
   :pred {p take_time
      :trace "how_long"
      :aux "to_inf"
      :v_complement {p take_ride
         :topic {q taxi
            :quantifier "indef" }
         :pred {p destination
            :topic "there" } } } }

[Figure braces labeled "English" and "Chinese" mark portions of the frame]

Strategies for Achieving High Quality and Robustness
Interlingua-based translation
  – Maintain consistency of semantic frame representation across different languages whenever possible
  – Seed grammar rules for each new language on English grammar rules
  – Target-language-dependent generation rules specify constituent order
  – Word sense disambiguation achieved through semantic features
Restricted conversational domains (lesson plans)
  – Emphasis on mechanisms to enable rapid porting to new domains and languages
Use parsability to assess quality of translation outputs
  – Back off to example-based method when parse fails

Schematic of Generation into Mandarin

{c verify
   :aux "will"
   :subject "it"
   :pred {p rain
      :pred {p locative
         :prep "in"
         :topic {q city
            :name "boston" } }
      :pred {p temporal
         :topic {q weekday
            :quantifier "this"
            :name "weekend" } } } }

Mandarin outputs:
  – bo1 shi4 dun4 zhe4 zhou4 mo4 hui4 bu2 hui4 xia4 yu3 ? ( Boston this weekend will-not-will rain ? )
  – zhe4 zhou4 mo4 bo1 shi4 dun4 hui4 xia4 yu3 ma5 ? ( this weekend Boston will rain ? )
Figure annotations:
  – pulled to the front
  – "will" conditioned by "verify"

Generation-based Translation
Semantic frame serves as interlingua
Translation achieved by parsing and generation
Use Mandarin grammar to detect potential problems
Rejected sentences routed to example-based translation for a second chance
[Flow diagram: English Input → Parse (English Grammar) → Semantic Frame → Generate (Chinese Rules) → Chinese Sentence → Parse? (Chinese Grammar) → accepted: Chinese Output; rejected: Example-based Translation]
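The accept/reject routing on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `translate_with_backoff` and the three callables passed to it are hypothetical stand-ins, not part of the Galaxy or TINA software. The order of preference (generation first, example-based second, empty output if both fail) follows the evaluation setup described later in the talk.

```python
def translate_with_backoff(sentence, generate, parses, example_based):
    """Route a sentence through generation-based translation with a fallback.

    All three callables are hypothetical stand-ins:
      generate(sentence)      -> candidate translation or None
      parses(candidate)       -> True if the target-language grammar accepts it
      example_based(sentence) -> fallback translation or None
    Returns (translation, method), where method names the path taken.
    """
    candidate = generate(sentence)
    # Parsability of the output is used as the quality check.
    if candidate is not None and parses(candidate):
        return candidate, "generation"
    # Rejected sentences get a second chance via example-based translation.
    fallback = example_based(sentence)
    if fallback is not None:
        return fallback, "example"
    # Both methods failed: produce no output rather than a doubtful one.
    return None, "failed"
```

Returning the method name alongside the translation makes it easy to tabulate the yield of each method, which is one of the evaluation criteria listed in the talk.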

Example-based Translation
Requires translation pairs and a retrieval mechanism
  – Corpus automatically obtained via the generation-based approach
  – Retrieval based on lean semantic information
    * Encoded as key-value pairs
    * Obtained from semantic frame via simple generation rules
    * Generalizes words to classes (e.g., city name, weekday, etc.) to overcome data sparseness

Example-based Translation Procedure
Key-value string serves as interlingua
Translation achieved by parsing and table lookup
City name masked during retrieval and recovered in final surface string
Example:
  – English input: Is there any chance of rain in San Francisco?
  – Key-value string: WEATHER: rain CITY: San Francisco
  – City name mapping: { : San Francisco } { : jiu4 jin1 shan1 }
  – Retrieved Chinese string: hui4 bu2 hui4 xia4 yu3?
  – Recovered city name: jiu4 jin1 shan1
[Flow diagram: English Input → Parser (English Grammar) → Semantic Frame → Generator (Key-value Rules) → KV String → KV-Chinese Table → Chinese Output]
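A toy version of the masked key-value lookup might look like the following Python sketch. Everything in it is assumed for illustration: the `<CITY>` slot notation, the two small tables, the keyword matching that stands in for real parsing and key-value generation rules, and the position of the city slot in the Chinese template.

```python
# Hypothetical class lexicon: English surface form -> (class name, pinyin).
CITY_LEXICON = {
    "san francisco": ("CITY", "jiu4 jin1 shan1"),
    "boston": ("CITY", "bo1 shi4 dun4"),
}

# Hypothetical KV-to-Chinese table. The city is masked as <CITY> in the key,
# so one entry covers every city in the lexicon.
KV_TO_CHINESE = {
    "WEATHER: rain CITY: <CITY>": "<CITY> hui4 bu2 hui4 xia4 yu3?",
}


def to_key_value(english):
    """Toy stand-in for parsing plus key-value generation rules."""
    text = english.lower()
    pairs = []
    bindings = {}
    if "rain" in text:
        pairs.append("WEATHER: rain")
    for surface, (cls, pinyin) in CITY_LEXICON.items():
        if surface in text:
            slot = f"<{cls}>"
            pairs.append(f"{cls}: {slot}")   # mask the specific city
            bindings[slot] = pinyin          # remember it for later recovery
            break
    return " ".join(pairs), bindings


def translate(english):
    """Look up the masked key, then restore the masked words."""
    key, bindings = to_key_value(english)
    template = KV_TO_CHINESE.get(key)
    if template is None:
        return None
    for slot, value in bindings.items():
        template = template.replace(slot, value)
    return template


print(translate("Is there any chance of rain in San Francisco?"))
```

Because the key contains the class rather than the city itself, a single table entry serves any city in the lexicon, which is the data-sparseness point made on the previous slide.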

Evaluation: English to Mandarin Weather Domain
Evaluation data
  – Drawn from the publicly available Jupiter weather system
  – Telephone recordings; conversational speech
  – Unparsable utterances (English grammar) were excluded
  – Total of 695 utterances, with 6.5 words per utterance on average
System configuration
  – Text input or speech input
    * Recognizer achieved 6.9% word error rate and 19.0% sentence error rate
  – Generation-based method preferred over example-based method
  – NULL output if both failed
Evaluation criteria
  – Yield of each translation method
  – Human judgment of translation quality
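For readers unfamiliar with the two recognizer metrics quoted above, here is a minimal Python sketch using their standard definitions: word error rate is the word-level edit distance divided by the number of reference words, and sentence error rate is the fraction of utterances with any error. The function names are chosen for this example, and this is not the scoring tool used in the evaluation.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def sentence_error_rate(pairs):
    """Fraction of (reference, hypothesis) pairs that differ in any word."""
    pairs = list(pairs)
    wrong = sum(1 for ref, hyp in pairs if ref.split() != hyp.split())
    return wrong / len(pairs)
```

A sentence error rate well above the word error rate, as reported here, is expected: one wrong word is enough to count the whole utterance as an error.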

Spoken Language Translation: Evaluation Results
Recognizer WER was 6.9%
Bilingual judge rated translations
Example-based translation increased yield by 6%
Incorrect translation provided only 2% of the time
  – Often due to recognition errors
  – English paraphrase provides context for errors
[Results table. Columns: Perfect, Adequate, Wrong, Failed. Rows: Rule, Example, Total. Surviving cell values: (83%); 50 (7%); 13 (2%); 8%; 85%; 15%; 100%; 13 (2%)]

Multilingual Weather Responses
English source: Some thunderstorms may be accompanied by gusty winds and hail
Semantic frame:
  clause: weather_event
  topic: precip_act, name: thunderstorm, num: pl
  quantifier: some
  pred: accompanied_by
  adverb: possibly
  topic: wind, num: pl, pred: gusty
  and: precip_act, name: hail
Frame indexed under wind, rain, storm, and hail
Japanese:
Spanish: Algunas tormentas posiblemente acompañadas por vientos racheados y granizo
Chinese: 些 雷 雨 可 能 會 伴 有 陣 風 和 冰 雹

Stage 1: Drill Exercises
Web-based interface to provide practice in typing queries in the weather domain
10 weather scenarios to be solved using typed pinyin: "Boston, rain, tomorrow"
  – Student given feedback on both query completeness and tone accuracy
Separate recording sessions allow user to practice both read and spontaneous spoken queries
  – Recordings will be used to train the system on accented speech
  – Recordings will also be assessed for tone quality
The Defense Language Institute in Monterey conducted a successful experiment using this Web-based interface in a class of 30 students
We are planning to introduce the exercise in the language laboratory at MIT

Lexical Tone Correction
Character representation does not explicitly encode tone:
  – 洛杉矶星期一刮风吗？
Exploit pinyin to help student acquire tonal knowledge:
  – Diacritic: luò shān jī xīng qī yī guā fēng ma?
  – Numeric: luo4 shan1 ji1 xing1 qi1 yi1 gua1 feng1 ma5?
Hypothesis: Errors in typed pinyin reflect inaccurate knowledge of tones
  – luo3 shan1 ji3 xing1 qi2 yi1 gua4 feng2 ma2?
Provide explicit feedback about typed tone errors
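The numeric and diacritic forms above are two spellings of the same information. As an illustration, the Python sketch below converts numeric pinyin to diacritic pinyin using the usual mark-placement convention (mark "a" or "e" if present, the "o" of "ou", otherwise the last vowel). It is not the display code of the system described in the talk, the function names are invented for the example, and punctuation is assumed to have been stripped from the input.

```python
# Tone-marked forms of each vowel for tones 1 to 4.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def mark_syllable(syllable):
    """Convert one numeric-tone syllable (e.g. 'luo4') to diacritic form."""
    base, tone = syllable[:-1], syllable[-1]
    if not tone.isdigit():
        return syllable              # no tone digit: leave unchanged
    base = base.replace("v", "ü")    # 'v' is a common keyboard stand-in for 'ü'
    tone = int(tone)
    if tone not in (1, 2, 3, 4):
        return base                  # neutral tone (5) carries no mark
    lower = base.lower()
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("o")
    else:
        vowels = [i for i, ch in enumerate(lower) if ch in TONE_MARKS]
        if not vowels:
            return base
        idx = vowels[-1]
    marked = TONE_MARKS[lower[idx]][tone - 1]
    if base[idx].isupper():
        marked = marked.upper()
    return base[:idx] + marked + base[idx + 1:]


def to_diacritics(sentence):
    """Convert a whitespace-separated numeric-pinyin sentence."""
    return " ".join(mark_syllable(s) for s in sentence.split())


print(to_diacritics("luo4 shan1 ji1 xing1 qi1 yi1 gua1 feng1 ma5"))
```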

Lexical Tone Correction
Exploit some features of Chinese
  – Syllable lexicon is small, approximately 420 unique syllables
  – 5 tones (including neutral tone)
Exploit some abilities of TINA NL system
  – Ability to parse weighted word FST using probabilistic models
  – FST normally represents a list of recognizer hypotheses
  – A path through the FST represents the most likely correct parse
Given some input
1) Generate FST of single sentence
2) Expand the tones on each syllable
3) Attempt to parse FST
4) Selected path through FST represents corrected tones

FST Example: Step 1
Step 1: Generate simple FST
Given: luo3 shan1 ji3 xing1 qi2 yi1 gua4 feng2 ma2

FST Example: Step 2
Step 2: Assign benefit of doubt to items that appear in lexicon
Given: luo3 shan1 ji3 xing1 qi2 yi1 gua4 feng2 ma2
Items that do not appear in lexicon are removed.

FST Example: Step 3
Step 3: Expand each syllable to alternate tones. More compact than specifying each possible sentence variant.
Given: luo3 shan1 ji3 xing1 qi2 yi1 gua4 feng2 ma2

FST Example: Step 4
Step 4: Remaining probability is uniformly distributed among alternate tones
Given: luo3 shan1 ji3 xing1 qi2 yi1 gua4 feng2 ma2

FST Example: Step 5
Step 5: Parsing reveals the correct tones
Given: luo3 shan1 ji3 xing1 qi2 yi1 gua4 feng2 ma2
Correct: luo4 shan1 ji1 xing1 qi1 yi1 gua1 feng1 ma5
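The expand-then-parse idea in these steps can be imitated with a much simpler stand-in. The Python sketch below is not TINA and does not build a real FST: it expands each typed syllable into all five tones, keeps an assumed share of the probability (`KEEP`) on the typed tone, spreads the rest uniformly over the alternates, and then uses a tiny hypothetical word lexicon in place of the parsing grammar to pick the best-scoring path. The syllable-lexicon filtering of Step 2 is omitted.

```python
import math

TONES = "12345"
KEEP = 0.6  # assumed probability mass kept on the tone the student typed

# Hypothetical toy lexicon standing in for the parsing grammar. Each word is
# a tuple of numeric-tone pinyin syllables.
LEXICON = [
    ("luo4", "shan1", "ji1"),   # Los Angeles
    ("xing1", "qi1", "yi1"),    # Monday
    ("gua1", "feng1"),          # to be windy
    ("ma5",),                   # question particle
]


def expand(syllable):
    """Map one typed syllable to all five tonal variants with weights."""
    base, typed = syllable[:-1], syllable[-1]
    other = (1.0 - KEEP) / (len(TONES) - 1)
    return {base + t: (KEEP if t == typed else other) for t in TONES}


def correct_tones(sentence):
    """Return the best-scoring sequence of lexicon words, or None."""
    lattice = [expand(s) for s in sentence.split()]
    n = len(lattice)
    # best[i] = (log score, path) for the best cover of the first i syllables.
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for start in range(n):
        score, path = best[start]
        if score == -math.inf:
            continue
        for word in LEXICON:
            end = start + len(word)
            if end > n:
                continue
            if not all(syl in lattice[start + k] for k, syl in enumerate(word)):
                continue
            total = score + sum(
                math.log(lattice[start + k][syl]) for k, syl in enumerate(word)
            )
            if total > best[end][0]:
                best[end] = (total, path + list(word))
    if best[n][0] == -math.inf:
        return None
    return " ".join(best[n][1])


print(correct_tones("luo3 shan1 ji3 xing1 qi2 yi1 gua4 feng2 ma2"))
```

Keeping the alternatives per syllable rather than per sentence mirrors the compactness remark in Step 3: nine syllables with five tones each would otherwise mean almost two million sentence variants.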

Web interface: Practice Exercise
Student is prompted for city, time, and event
San Francisco Tuesday Hot

Web interface: Practice Exercise
Student types in:
  – A question concerning this topic in Mandarin using pinyin, OR
  – An English word or phrase for a translation
Xing1 qi1 er3 jiu3 jin3 shan1 hui4 bu2 hui4 re1

Web interface: Practice Exercise
Student is given feedback

Web interface

Spoken Conversational Interaction
Weather information domain (rain, snow, wind, temperature, etc.)
Initial version configured for Americans learning Mandarin
Recognizer supports both English and Mandarin
  – Seamless language switching
English queries are translated into Mandarin
Mandarin queries are answered in Mandarin
  – User can ask for a translation into English of the response at any time
Uses Mandarin synthesizer provided by DELTA Electronics for responses, Envoice concatenative synthesizer for query translations
System can be configured as telephone-only or as telephone augmented with a Web-based GUI

Illustration of Dialogue Interaction
User: Bo1 Shi4 Dun4 ming2 tian1 hui4 xia4 yu3 ma5? (Is it going to rain tomorrow in Boston?)
System: Tian1 qi4 yu4 bao4 ming2 tian1 Bo1 shi4 dun4 mei2 you3 yu3. (The forecast calls for no rain tomorrow in Boston)
User: (in English) What is the temperature?
System: (translates) Qi4 wen1 shi4 duo1 shao3?
User: (emulates) Qi4 wen1 shi4 duo1 shao3?
System: Bo1 Shi4 Dun4 ming2 tian1 zui4 gao1 qi4 wen1 er4 she4 shi4 du4, ming2 tian1 ye4 jian1, zui4 di4 qi4 wen1 ling2 xia4 wu3 she4 shi4 du4.
User: Could you translate that?
System: In Boston tomorrow, high 2 degrees Celsius. Tomorrow night, low -5 Celsius.

Example Dialogue in Weather Domain
"What is the forecast for San Francisco tomorrow?"
  – System paraphrases request, then answers
"Please translate"
  – High quality synthesis for translation using MIT's Envoice concatenative synthesis framework
"Could you repeat that"
  – System provides translation
  – User emulates in Mandarin and system repeats previous response
"Will it rain in London?"
"I'm sorry I didn't understand you."
  – Response given when it fails to recognize or parse the user query

Video Clip Demo

Assessment
Phonetic aspects
  – Expand phonological rules to support non-native realizations (e.g., /dh/ → /d/ or schwa insertion)
  – Allow realizations of selected phones from native language to compete in recognizer search
Tonal aspects (Mandarin)
  – Use tone recognition system (Wang et al., 1998) to score tone productions; highlight worst-scoring words
  – Tabulate frequencies of tone errors in typed inputs (pinyin)
  – Use phase-vocoder techniques (Tang et al., 2001) to repair user's tone productions by replacing prosodic contour with native speech patterns
Fluency measures
  – Word-by-word speaking rate (Chung & Seneff, 1999)
  – Percentage of utterance containing pauses and disfluencies

Tone analysis: Native vs Non-Native Mandarin
Creating pitch contours
  – F0 extracted using algorithm in (Wang and Seneff, 2000)
  – Statistics of each pitch contour over each syllable considered without regard for left or right contexts
Normalization
  – Duration normalized by sampling at 10% intervals
  – Pitch normalized according to: [equation not preserved in transcript]
Comparisons based on (Wang et al., 2003)
  – Include normalized F0 value, peak, valley, range, peak position, valley position, falling range, and rising range
Corpus (from the Defense Language Institute)
  – 2065 utterances from 4 native speakers
  – 4657 utterances from 20 non-native speakers
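Duration normalization of this kind can be sketched as resampling each syllable's F0 contour at fixed fractions of its length. The Python function below is one possible reading, not the authors' code: it assumes linear interpolation between frames and assumes that "10% intervals" means 11 samples at 0%, 10%, ..., 100%. The pitch normalization formula is missing from the transcript, so it is deliberately not reproduced here.

```python
def resample_contour(f0, steps=10):
    """Resample an F0 contour at equal fractions of the syllable duration.

    f0 is a list of pitch values, one per frame. The result has steps + 1
    values taken at 0%, 10%, ..., 100% of the duration (for steps=10), using
    linear interpolation between neighbouring frames. Both the interpolation
    and the endpoint convention are assumptions of this sketch.
    """
    if not f0:
        raise ValueError("contour must contain at least one frame")
    if len(f0) == 1:
        return [float(f0[0])] * (steps + 1)
    last = len(f0) - 1
    samples = []
    for k in range(steps + 1):
        pos = k * last / steps          # fractional frame index
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        samples.append(f0[lo] * (1 - frac) + f0[hi] * frac)
    return samples
```

After resampling, every syllable is described by the same number of points regardless of how long it lasted, so contours from fast and slow speakers can be averaged position by position.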

Tonal averages over all syllables: Native Example
[Figure not preserved in transcript]

Tonal averages over all syllables: Non-Native Example
[Figure not preserved in transcript]

Capturing Phonological Errors
Leverage phonological modeling capabilities of SUMMIT
  – Model typical pronunciation errors explicitly
  – Direct and intuitive mapping from linguistic rules
  – Support both within-language and cross-language substitutions
Initial experiments completed on Koreans learning English (Kim et al., ICSLP 2004)
  – Phonological rules capture typical problems such as schwa insertion and /dh/ → /d/ confusions
  – Best path in alignment used to detect errors
  – Verbal feedback given to student
Current research to apply to Americans learning Mandarin
  – Build single recognizer to support both languages
  – Use data-driven approaches to discover most likely cross-language phone substitution errors
  – Explicitly encode such errors in formal phonological rules
  – Side benefit may be improved recognition for English-accented Mandarin

Detecting Phonological Errors
{} dh {} => dh | [dcl] d ; // Becomes an onset stop as in 'they'. No [dh] in Korean phonemes.
{} dd {} => dcl [d [ax]] ; // A vowel may be inserted after a coda consonant (Staccato Rhythm)
{CONSONANT} td {CONSONANT} => [tcl] [t] | tcl t [ax]; // No CCC allowed in Korean

Future Plans
Develop tools to rapidly port to new domains and languages
  – Automatic grammar induction
  – Generic dialogue modeling
  – Simulated dialogue interactions
Develop various scoring algorithms for quality assessment of student's speech
Develop high quality synthesis capability for Mandarin translations, for multiple domains of knowledge
Collect and transcribe data from language learners and evaluate both system and students
  – Begin with weather domain, our most mature system
  – Extend to other domains once they are better developed
Refine all aspects of systems based on collected data