Integrating Multiple Knowledge Sources For Improved Speech Understanding Sherif Abdou, Michael Scordilis Department of Electrical and Computer Engineering, University of Miami Coral Gables, Florida 33124, U.S.A.

Abstract The sentence produced by the decoder with the highest recognition probability may not be the best choice for extracting the intended concepts. The more knowledge sources share in the selection process, the better the result that can be achieved. In the late disambiguation approach, many hypotheses are allowed to propagate through the system until there is enough knowledge to select the best one. In this work the recognition score, parsing score, dialog expectations and prosody are combined to select the best hypothesis. The scaling weights of the combined scores are determined automatically by an optimization procedure.
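The weighted combination described here can be pictured as a small N-best rescoring routine. The following is a minimal sketch, not the system's actual code; the function name, the four source labels, and the example scores and weights are all illustrative assumptions.

```python
# Minimal sketch of late disambiguation by weighted score combination:
# every N-best hypothesis carries a score from each knowledge source, and
# the hypothesis with the highest weighted sum wins. All names and numbers
# here are illustrative.

def select_best(hypotheses, weights):
    """hypotheses: list of dicts mapping knowledge-source name -> score;
    weights: dict mapping the same names -> scaling weight."""
    def combined(hyp):
        return sum(weights[src] * hyp[src] for src in weights)
    return max(hypotheses, key=combined)

nbest = [
    {"recognizer": -118.2, "parser": 0.55, "dialog": 0.40, "prosody": 0.60},
    {"recognizer": -120.5, "parser": 0.90, "dialog": 0.75, "prosody": 0.60},
]
weights = {"recognizer": 0.01, "parser": 1.0, "dialog": 0.8, "prosody": 0.5}
print(select_best(nbest, weights))  # picks the hypothesis favored by parse and dialog fit
```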

System Architecture (block diagram)
Main modules: I/O Interface, Speech Recognizer, Parser, Dialog Manager, Synthesizer.
Associated knowledge sources and resources: Acoustic Model, Language Model, Grammar, Goal Trees, Dialog History, Prerecorded Speech Units, Flights Database, Prosodic Utterance Classifier.

Domain Plans: hierarchical goal tree
Travel → User; Departure Route (Depart Loc, Arrive Loc, Depart Date, Depart Time); Return Route (Return Date, Return Time)
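One way to picture this structure is as a nested tree of goal nodes. The sketch below is illustrative only: the node names follow the slide, but the Goal class, its filled flag, and the unfilled() helper are assumptions made for this example.

```python
# Illustrative encoding of the hierarchical goal tree as nested dataclasses.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Goal:
    name: str
    filled: bool = False
    children: List["Goal"] = field(default_factory=list)

travel = Goal("Travel", children=[
    Goal("User"),
    Goal("Departure Route", children=[
        Goal("Depart Loc"), Goal("Arrive Loc"),
        Goal("Depart Date"), Goal("Depart Time"),
    ]),
    Goal("Return Route", children=[
        Goal("Return Date"), Goal("Return Time"),
    ]),
])

def unfilled(goal: Goal) -> List[str]:
    """Depth-first list of leaf goals that still need a value from the user."""
    if not goal.children:
        return [] if goal.filled else [goal.name]
    missing = []
    for child in goal.children:
        missing.extend(unfilled(child))
    return missing

print(unfilled(travel))  # all leaves are still unfilled at the start of a dialog
```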

Discourse Plans
Clarifications: User-initiated subdialogs, usually posed as questions, that ask about some feature of a concept related to one of the current plans on the focus stack.
Corrections: User-initiated subdialogs intended to correct part of an already constructed plan. They usually appear after explicit or implicit system confirmations.
Meta-communications: User-initiated subdialogs that refer to the dialogue itself, such as asking for repetitions or signaling non-understanding.

Parser Score
User utterance: "I need a flight from Miami to Boston two days after Christmas"
Non-Fragmented Parse
Recognizer output (i): I_need flight from Miami to Boston two days after Christmas
Parser output (i):
Flight_Constraints:[departloc] ( FROM [Location] ( [city] ( [City_Name] ( MIAMI ) ) ) )
Flight_Constraints:[arriveloc] ( TO [Location] ( [city] ( [City_Name] ( BOSTON ) ) ) )
Flight_Constraints:[Date_Time] ( [Date] ( [Date_Relative] ( [date_offset] ( [day_offset] ( [Number] ( TWO ) ) DAYS [_days_after] ( AFTER ) ) ) ) [holiday] ( [holiday_name] ( CHRISTMAS ) ) ) )

Fragmented Parse
Recognizer output (j): I_need flight from Miami to Boston two days other Christmas
Parser output (j):
Flight_Constraints:[departloc] ( FROM [Location] ( [city] ( [City_Name] ( MIAMI ) ) ) )
Flight_Constraints:[arriveloc] ( TO [Location] ( [city] ( [City_Name] ( BOSTON ) ) ) ) )
Flight_Constraints:[Time_Range] ( [Time] ( [Hour] ( TWO ) ) )
Flight_Constraints:[Date_Time] ( [Date] ( [holiday] ( [holiday_name] ( CHRISTMAS ) ) ) )
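The preference for the non-fragmented parse (i) over the fragmented parse (j) can be captured with a simple score that rewards word coverage and penalizes fragmentation, in line with the later point about ranking parses by word coverage and information content. This is a sketch only: the penalty constant and the counts are assumed, not the system's actual parameters.

```python
# Minimal sketch of a parse score that rewards word coverage and penalizes
# fragmentation. The 0.1 penalty per extra top-level fragment and the counts
# below are illustrative.

def parse_score(words_covered: int, words_total: int, num_fragments: int) -> float:
    coverage = words_covered / max(words_total, 1)
    fragmentation_penalty = 0.1 * max(num_fragments - 1, 0)
    return coverage - fragmentation_penalty

# Hypothesis (i): one coherent Date_Time constraint -> 3 top-level slots.
print(parse_score(10, 10, 3))   # 0.8
# Hypothesis (j): the date splits into Time_Range + Date_Time -> 4 slots.
print(parse_score(10, 10, 4))   # 0.7
```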

Utterance Type Classification Tree (figure): a decision tree that classifies each utterance as a question (Q) or statement (S) from prosodic features such as F0_dif, F0_range, F0_pen_dif, End_slope, Pen_slope and Reg_shape. The root splits on F0_dif > 15 vs. F0_dif < 15, and each leaf carries a class label with its probability (roughly 0.52 to 0.89).
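A hand-written version of such a classifier might look like the sketch below. The feature names and the root thresholds follow the figure, but the deeper branches are collapsed, so the structure and the leaf probabilities are illustrative rather than the trained tree.

```python
# Toy prosodic question/statement classifier in the spirit of the tree above.
# Feature names (F0_dif, End_slope, F0_range) come from the figure; the exact
# branching and leaf probabilities are simplified placeholders.

def classify_utterance(f: dict) -> tuple:
    """f: dict of prosodic features; returns (label, probability)."""
    if f["F0_dif"] > 15:                 # large F0 rise: question-leaning branch
        if f["End_slope"] > 4.07:
            return ("Q", 0.85)
        return ("Q", 0.67)
    if f["F0_range"] > 9:                # small F0 rise: statement-leaning branch
        return ("S", 0.84)
    return ("S", 0.69)

print(classify_utterance({"F0_dif": 18, "End_slope": 5.0, "F0_range": 6}))  # ('Q', 0.85)
```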

How Prosody Can Help
Utterance transcription: and what's the first flight in the morning
Recognizer output (1): and with the first flight in the morning
Parser output (1): Flight_Reservation:[Flight_Reference] ( WITH THE [Earliest] ( FIRST FLIGHT IN THE [Time_Range] ( [Time_Spec] ( [Period_Of_Day] ( MORNING ) ) ) ) )
Recognizer output (2): I'd what's the first flight in the morning
Parser output (2): Flight_Reservation:[Request] ( [Wh_form] ( WHAT'S [Flight_Reference] ( THE [Earliest] ( FIRST FLIGHT IN THE [Time_Range] ( [Time_Spec] ( [Period_Of_Day] ( MORNING ) ) ) ) ) ) )
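One way the prosodic Q/S decision can break this tie is to credit the hypothesis whose parse type (WH-request vs. plain flight reference) agrees with the prosodic label. This is a sketch of the idea, not the paper's scoring formula; the label and probability reuse the illustrative tree output above.

```python
# Sketch: the prosodic Q/S label rewards the hypothesis whose parse agrees
# with it. Hypothesis (2) parses as a [Request]/[Wh_form] (a question),
# hypothesis (1) does not, so a rising contour favors hypothesis (2).

def prosody_match(parse_is_question: bool, label: str, prob: float) -> float:
    if (label == "Q") == parse_is_question:
        return prob          # prosody agrees with the parse type
    return 1.0 - prob        # prosody contradicts the parse type

label, prob = "Q", 0.85
print(prosody_match(False, label, prob))  # 0.15 for "and with the first flight ..."
print(prosody_match(True, label, prob))   # 0.85 for "what's the first flight ..."
```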

Weights Computation: Least-Squares Minimization / Hill Climbing
The error function: $E = \sum_i \big( G_i - \sum_j W_j S_{ij} \big)^2$
i: training sample index
j: knowledge source index
$G_i$: training score, selected manually, for training sample i
$W_j$: score weight for knowledge source j
$S_{ij}$: score of knowledge source j for sample i
The minimum error is obtained by solving the system of k linear equations: $-2 \sum_i S_{ik} \big( G_i - \sum_j W_j S_{ij} \big) = 0$ for each k.
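The normal equations above amount to an ordinary least-squares fit of the manually assigned targets G to the per-source scores S. The sketch below shows this with NumPy; the score matrix and targets are made up for illustration.

```python
# Minimal sketch of the weight estimation: solving
#   -2 * sum_i S_ik (G_i - sum_j W_j S_ij) = 0   for all k
# is an ordinary least-squares fit of G to the columns of S.
import numpy as np

# S[i, j]: score of knowledge source j (recognizer, parser, dialog, prosody)
# for training sample i; G[i]: manually selected training score for sample i.
S = np.array([
    [-120.5, 0.90, 0.75, 0.60],
    [-118.2, 0.55, 0.40, 0.60],
    [-130.0, 0.80, 0.90, 0.30],
    [-125.3, 0.35, 0.20, 0.45],
])
G = np.array([1.0, 0.0, 1.0, 0.0])

W, residuals, rank, _ = np.linalg.lstsq(S, G, rcond=None)
print("estimated weights W_j:", W)
```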

Experimental Results

Table 1. Comparison of system performance
                   Testing with cumulative errors   Testing without cumulative errors
Baseline system                65%                               69%
Proposed system                71%                               78%

Table 2. Measure of each knowledge source's contribution
Source     Recognizer   Parser   Dialog context
Measure       83%         59%         48%

Conclusion and Future Work
Maximize the amount of information passed between system modules, and use all the higher-level knowledge to evaluate the different hypotheses.
Decisions are made whenever possible and delayed when necessary.
Rank parse results according to word coverage and information content.
Use the expectation list generated from the current dialog state to select the most appropriate hypothesis.
Future work: use confidence measures from the recognizer output to confirm the selection.