Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems
Yun-Nung (Vivian) Chen | Email: yvchen@cs.cmu.edu | Website: http://vivianchen.idv.tw


Outline Introduction Ontology Induction: Frame-Semantic Parsing Structure Learning: Knowledge Graph Propagation Spoken Language Understanding (SLU): Matrix Factorization Experiments Conclusions

A Popular Robot - Baymax Baymax is capable of maintaining a good spoken dialogue system and of learning new knowledge to better understand and interact with people. Big Hero 6 -- video content owned and licensed by Disney Entertainment, Marvel Entertainment, LLC, etc.

Spoken Dialogue System (SDS) Spoken dialogue systems are intelligent agents that help users finish tasks more efficiently via speech interaction. They are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.): Apple's Siri, Microsoft's Cortana, Microsoft's XBOX Kinect, Amazon's Echo, Samsung's SMART TV, Google Now. https://www.apple.com/ios/siri/ http://www.windowsphone.com/en-us/how-to/wp8/cortana/meet-cortana http://www.xbox.com/en-US/ http://www.amazon.com/oc/echo/ http://www.samsung.com/us/experience/smart-tv/ https://www.google.com/landing/now/

Large Smart Device Population The number of global smartphone users will surpass 2 billion in 2016. As of 2012, there were 1.1 billion automobiles on earth. As device input evolves, speech is becoming the more natural and convenient modality.

Challenges for SDS An SDS in a new domain requires: a hand-crafted domain ontology, utterances labeled with semantic representations, and an SLU component for mapping utterances into those semantic representations. As spoken interactions increase, building domain ontologies and annotating utterances is so costly that the data does not scale up. The goal is to enable an SDS to automatically learn this knowledge so that open-domain requests can be handled. (Speaker note: a new domain requires 1) a hand-crafted ontology, 2) slot representations for utterances (labeled information), and 3) a parser for mapping utterances into them; the labeling cost is too high, so we want to automate this process.)

Interaction Example User: "find an inexpensive eating place for taiwanese food" Intelligent Agent: "Inexpensive Taiwanese eating places include Din Tai Fung, etc. Which one do you want to choose? I can help you go there." Q: How does a dialogue system process this request?

SDS Process – Available Domain Ontology User: "find an inexpensive eating place for taiwanese food" The intelligent agent consults organized domain knowledge: a domain knowledge representation (graph) whose nodes are slots such as seeking, target, price, and food, linked by relations such as AMOD, NN, and PREP_FOR.

SDS Process – Available Domain Ontology (Ontology Induction: semantic slots) The same domain knowledge graph, highlighting the semantic slots (seeking, target, price, food) obtained by ontology induction.

SDS Process – Available Domain Ontology (Structure Learning: inter-slot relations) The same domain knowledge graph, highlighting the inter-slot relations (AMOD, NN, PREP_FOR) obtained by structure learning.

SDS Process – Spoken Language Understanding (SLU) User: "find an inexpensive eating place for taiwanese food" Given the organized domain knowledge (graph), spoken language understanding maps the utterance onto the slots: seeking="find", target="eating place", price="inexpensive", food="taiwanese food".

SDS Process – Dialogue Management (DM) The lexical units are mapped into concepts to form a database query: SELECT restaurant { restaurant.price="inexpensive" restaurant.food="taiwanese food" }, which returns Din Tai Fung, Boiling Point, etc. Behavior prediction yields the predicted behavior: navigation. The agent replies: "Inexpensive Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there." (navigation)

Goals User: "find an inexpensive eating place for taiwanese food" 1. Ontology Induction (semantic slots: seeking, target, price, food, with relations AMOD, NN, PREP_FOR) 2. Structure Learning (inter-slot relations) 3. Spoken Language Understanding (SELECT restaurant { restaurant.price="inexpensive" restaurant.food="taiwanese food" }) 4. Behavior Prediction (predicted behavior: navigation)

Goals User: "find an inexpensive eating place for taiwanese food" Knowledge Acquisition: 1. Ontology Induction, 2. Structure Learning. SLU Modeling: 3. Semantic Decoding, 4. Behavior Prediction.

Spoken Language Understanding Input: user utterances. Output: the domain-specific semantic concepts included in each utterance, e.g., "can I have a cheap restaurant" → SLU model → target="restaurant", price="cheap". Framework: an unlabeled collection goes through frame-semantic parsing for ontology induction (semantic knowledge graph) and structure learning (lexical knowledge graph); the resulting feature model (Fw, Fs) and knowledge graph propagation model (word relation model Rw, slot relation model Rs) feed SLU modeling by matrix factorization, which produces the semantic representation.

Outline Introduction Ontology Induction: Frame-Semantic Parsing Structure Learning: Knowledge Graph Propagation Spoken Language Understanding (SLU): Matrix Factorization Experiments Conclusions

Probabilistic Frame-Semantic Parsing FrameNet [Baker et al., 1998]: a linguistically principled semantic resource based on frame-semantics theory; words/phrases can be represented as frames, e.g., in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element. SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences. Baker et al., "The Berkeley FrameNet Project," in Proc. of the International Conference on Computational Linguistics, 1998. Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

Frame-Semantic Parsing for Utterances Example: "can i have a cheap restaurant" → Frame: capability (FT LU: can, FE LU: i); Frame: expensiveness (FT LU: cheap); Frame: locale_by_use (FT/FE LU: restaurant). FT: Frame Target; FE: Frame Element; LU: Lexical Unit. SEMAFOR outputs a set of frames that hopefully includes all domain slots; some of these frames are picked as slot candidates. 1st Issue: adapting generic frames to domain-specific settings for SDSs.
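To make the slide's idea concrete, here is a minimal Python sketch of turning frame annotations into slot candidates; the FrameAnnotation structure is a made-up stand-in for illustration, not SEMAFOR's actual output format.

```python
from dataclasses import dataclass

# Hypothetical container for one frame annotation; SEMAFOR's real output is a
# richer structure, this keeps only what the slide uses.
@dataclass
class FrameAnnotation:
    frame: str         # evoked frame, e.g. "expensiveness"
    lexical_unit: str  # word span that evokes it, e.g. "cheap"

def frames_to_slot_candidates(annotations):
    """Treat every evoked frame as a domain slot candidate,
    and its lexical unit as a slot-filler candidate."""
    return {a.frame: a.lexical_unit for a in annotations}

# Parse of "can i have a cheap restaurant" as shown on the slide.
parse = [
    FrameAnnotation("capability", "can"),
    FrameAnnotation("expensiveness", "cheap"),
    FrameAnnotation("locale_by_use", "restaurant"),
]
print(frames_to_slot_candidates(parse))
# {'capability': 'can', 'expensiveness': 'cheap', 'locale_by_use': 'restaurant'}
```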

Spoken Language Understanding (recap) The same framework: frame-semantic parsing for ontology induction and structure learning feeds the feature model (Fw, Fs) and the knowledge graph propagation model (Rw, Rs), which drive SLU modeling by matrix factorization. Y.-N. Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

Outline Introduction Ontology Induction: Frame-Semantic Parsing Structure Learning: Knowledge Graph Propagation (for 1st issue) Spoken Language Understanding (SLU): Matrix Factorization Experiments Conclusions

1st Issue: How to adapt generic slots to a domain-specific setting? Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies on each other. Slot candidates induced from the corpus include locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, and capability. Training utterances such as "i would like a cheap restaurant" and "find a restaurant with chinese food", together with test utterances such as "show me a list of cheap restaurants", give a sparse matrix of word observations and slot candidates. The word relation matrix (word relation model) and the slot relation matrix (slot relation model) allow each node to propagate scores to its neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
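A minimal numpy sketch of that propagation idea, using a made-up five-word lexical graph: after one multiplication by a row-normalized relation matrix, words strongly related to observed domain words gain score while generic words stay low.

```python
import numpy as np

# Toy lexical knowledge graph over five words; entry [i, j] is the relation
# weight between word i and word j (all values are made up).
words = ["cheap", "restaurant", "food", "a", "the"]
R_w = np.array([
    [1.0, 0.8, 0.6, 0.1, 0.1],
    [0.8, 1.0, 0.7, 0.1, 0.1],
    [0.6, 0.7, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.2],
    [0.1, 0.1, 0.1, 0.2, 1.0],
])

# Row-normalize so each node distributes its score among its neighbors.
R_w = R_w / R_w.sum(axis=1, keepdims=True)

# Initial observation scores from one utterance (each observed word = 1).
scores = np.array([1.0, 1.0, 0.0, 1.0, 0.0])

# One propagation step: "food" gains score because it is strongly related
# to the observed domain words, while "the" stays low.
propagated = scores @ R_w
print(dict(zip(words, propagated.round(2))))
```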

Knowledge Graph Construction Syntactic dependency parsing is applied to the utterances, e.g., "can i have a cheap restaurant" with dependencies ccomp, nsubj, dobj, det, and amod, evoking the slots capability, expensiveness, and locale_by_use. Two graphs are built: a slot-based semantic knowledge graph (nodes: capability, locale_by_use, expensiveness) and a word-based lexical knowledge graph (nodes: can, i, have, a, cheap, restaurant). What is the weight for each edge?
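A sketch of how the word-based lexical graph could be assembled from dependency parses; spaCy is used here only as a stand-in parser, not necessarily the parser used in the original work.

```python
import spacy
from collections import Counter

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def word_graph_edges(utterances):
    """Count typed dependency edges between words across all utterances;
    these counts can serve as raw edge weights in the lexical knowledge graph."""
    edges = Counter()
    for utt in utterances:
        for token in nlp(utt):
            if token.dep_ != "ROOT":
                edges[(token.head.text, token.text, token.dep_)] += 1
    return edges

edges = word_graph_edges([
    "can i have a cheap restaurant",
    "find a restaurant with chinese food",
])
for (head, child, rel), count in edges.items():
    print(f"{head} -[{rel}]-> {child}  x{count}")
```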

Knowledge Graph Construction The edge between a node pair is weighted by relation importance, which is used to propagate scores via a relation matrix. How do we decide the weights that represent relation importance, for both the slot-based semantic knowledge graph and the word-based lexical knowledge graph?

Weight Measurement by Embeddings Dependency-based word embeddings are trained from the dependency-parsed utterances (e.g., can = [0.8 … 0.24], have = [0.3 … 0.21], …), and dependency-based slot embeddings are trained analogously from the slot-level parses (e.g., expensiveness = [0.12 … 0.7], capability = [0.3 … 0.6], …). Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.

Weight Measurement by Embeddings Edge weights represent relation importance: the slot-to-slot semantic relation $R_s^S$ (similarity between slot embeddings), the slot-to-slot dependency relation $R_s^D$ (dependency score between slot embeddings), the word-to-word semantic relation $R_w^S$ (similarity between word embeddings), and the word-to-word dependency relation $R_w^D$ (dependency score between word embeddings). Over the graph for all utterances these are combined as $R_s^{SD} = R_s^S + R_s^D$ and $R_w^{SD} = R_w^S + R_w^D$. Y.-N. Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL, 2015.
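A numpy sketch of the weight computation under the definitions above: the semantic relation matrix is cosine similarity between (toy) slot embeddings, the dependency relation matrix is given here as made-up numbers, and the two are summed as on the slide.

```python
import numpy as np

def cosine_similarity_matrix(E):
    """Pairwise cosine similarity between rows of an embedding matrix E."""
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    E_unit = E / np.clip(norms, 1e-12, None)
    return E_unit @ E_unit.T

# Toy slot embeddings (values are made up); rows correspond to slots.
slots = ["expensiveness", "locale_by_use", "capability"]
E_slot = np.array([[0.12, 0.70, 0.10],
                   [0.20, 0.65, 0.05],
                   [0.80, 0.10, 0.30]])

R_s_semantic = cosine_similarity_matrix(E_slot)   # R_s^S

# Stand-in for the dependency-based relation scores R_s^D; in the work these
# come from dependency statistics, here they are just given numbers.
R_s_dependency = np.array([[1.0, 0.6, 0.2],
                           [0.6, 1.0, 0.3],
                           [0.2, 0.3, 1.0]])

R_s_combined = R_s_semantic + R_s_dependency      # R_s^SD = R_s^S + R_s^D
print(np.round(R_s_combined, 2))
```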

Knowledge Graph Propagation Model The word observation / slot candidate matrix (over train and test utterances, e.g., cheap, restaurant, food / expensiveness, locale_by_use) is multiplied by the word relation matrix $R_w^{SD}$ and the slot relation matrix $R_s^{SD}$, so that the word relation model and the slot relation model refine the slot induction.
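One plausible reading of the diagram, sketched in numpy: the word/slot feature matrix is multiplied by a block matrix holding the word relation matrix and the slot relation matrix. The exact form of the propagation matrix in the original model may differ; this is only an illustration of the matrix-multiplication view.

```python
import numpy as np

def propagate(F, R_w, R_s):
    """Knowledge graph propagation as a single matrix product: the
    utterance-by-(words+slots) feature matrix F is multiplied by a
    block matrix built from R_w (word-word) and R_s (slot-slot)."""
    n_w, n_s = R_w.shape[0], R_s.shape[0]
    R = np.zeros((n_w + n_s, n_w + n_s))
    R[:n_w, :n_w] = R_w
    R[n_w:, n_w:] = R_s
    return F @ R

# Toy sizes: 3 utterances, 4 words, 2 slots (all values are made up).
F = np.random.default_rng(1).random((3, 6))
R_w = np.eye(4) + 0.3   # dense toy word-relation matrix
R_s = np.eye(2) + 0.5   # dense toy slot-relation matrix
print(propagate(F, R_w, R_s).shape)   # (3, 6)
```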

2nd Issue: unobserved hidden semantics may benefit understanding. In the feature model (Fw from ontology induction, Fs from structure learning for SLU), training utterances such as "i would like a cheap restaurant" and "find a restaurant with chinese food" contribute observed word/slot entries (1s), while a test utterance such as "show me a list of cheap restaurants" contains hidden semantics: entries that are never observed but receive estimated scores (e.g., .90, .97, .85, .95) after slot induction.

Outline Introduction Ontology Induction: Frame-Semantic Parsing Structure Learning: Knowledge Graph Propagation Spoken Language Understanding (SLU): Matrix Factorization (for 2nd issue) Experiments Conclusions

2nd Issue: How to learn implicit semantics? Matrix Factorization (MF). The word observation / slot candidate matrix, combined with the word relation matrix $R_w^{SD}$ and the slot relation matrix $R_s^{SD}$ (word relation model, slot induction, slot relation model), is completed by reasoning with matrix factorization: the MF method completes a partially-missing matrix based on a low-rank latent semantics assumption.

Matrix Factorization (MF) The decomposed matrices represent low-rank latent semantics for utterances and for words/slots respectively, and their product fills in the probability of hidden semantics: $M_{|U| \times (|W|+|S|)} \approx U_{|U| \times d} \cdot V_{d \times (|W|+|S|)}$, where $|U|$ is the number of utterances, $|W|+|S|$ the number of words plus slot candidates, and $d$ the latent dimension.
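A minimal sketch of the low-rank decomposition with the dimensions given above; the factor names U and V and all values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_utterances, n_words, n_slots, d = 100, 50, 10, 8

# Observed feature matrix: utterances x (words + slots), mostly zeros.
M = (rng.random((n_utterances, n_words + n_slots)) > 0.9).astype(float)

# Low-rank factors: U holds utterance latent vectors, V holds word/slot latent vectors.
U = rng.normal(scale=0.1, size=(n_utterances, d))
V = rng.normal(scale=0.1, size=(d, n_words + n_slots))

# The reconstruction U @ V assigns scores to unobserved cells as well,
# which is how hidden semantics receive non-zero probability.
M_hat = 1.0 / (1.0 + np.exp(-(U @ V)))   # sigmoid maps scores into (0, 1)
print(M.shape, M_hat.shape)              # (100, 60) (100, 60)
```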

Bayesian Personalized Ranking for MF Model implicit feedback: do not treat unobserved facts as negative samples (they may be true or false); instead, give observed facts higher scores than unobserved facts. Objective: for each utterance $u$, rank every observed fact $f^+$ above every unobserved fact $f^-$, i.e., maximize $\sum_{u} \sum_{f^+} \sum_{f^-} \ln \sigma\!\left(x_{u,f^+} - x_{u,f^-}\right)$. The goal is to learn a set of well-ranked semantic slots per utterance.
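A generic stochastic BPR update, sketched in numpy; this follows the standard BPR recipe (sample one observed and one unobserved fact per utterance, push the observed one higher), not necessarily the exact optimization used in the original work.

```python
import numpy as np

def bpr_epoch(M, U, V, lr=0.05, reg=0.01, rng=np.random.default_rng(0)):
    """One pass of Bayesian Personalized Ranking updates on factors U, V:
    for each utterance u, rank an observed fact f+ above a sampled unobserved f-."""
    n_utt, _ = M.shape
    for u in range(n_utt):
        observed = np.flatnonzero(M[u] > 0)
        unobserved = np.flatnonzero(M[u] == 0)
        if len(observed) == 0 or len(unobserved) == 0:
            continue
        f_pos = rng.choice(observed)
        f_neg = rng.choice(unobserved)
        x_diff = U[u] @ (V[:, f_pos] - V[:, f_neg])   # x_{u,f+} - x_{u,f-}
        g = 1.0 / (1.0 + np.exp(x_diff))              # sigma(-x): weight of the gradient of ln sigma(x)
        # Gradient ascent on ln sigma(x_diff) with L2 regularization.
        U[u]        += lr * (g * (V[:, f_pos] - V[:, f_neg]) - reg * U[u])
        V[:, f_pos] += lr * (g * U[u] - reg * V[:, f_pos])
        V[:, f_neg] += lr * (-g * U[u] - reg * V[:, f_neg])
    return U, V
```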

Outline Introduction Ontology Induction: Frame-Semantic Parsing Structure Learning: Knowledge Graph Propagation Spoken Language Understanding (SLU): Matrix Factorization Experiments Conclusions

Experimental Setup Dataset: Cambridge University SLU corpus [Henderson, 2012]: restaurant recommendation in an in-car setting in Cambridge; WER = 37%; vocabulary size = 1,868; 2,166 dialogues; 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type. A mapping table between induced and reference slots is used for evaluation. Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012.

Experiment 1: Quality of Semantics Estimation Metric: Mean Average Precision (MAP) of all estimated slot probabilities for each utterance.
Approach (explicit) | ASR | Manual
Support Vector Machine | 32.5 | 36.6
Multinomial Logistic Regression | 34.0 | 38.8

Experiment 1: Quality of Semantics Estimation Metric: Mean Average Precision (MAP) of all estimated slot probabilities for each utterance. To model implicit semantics, implicit approaches are added to the comparison: Baseline (Random, Majority) and MF (Feature Model, Feature Model + Knowledge Graph Propagation), alongside the explicit baselines Support Vector Machine (ASR 32.5, Manual 36.6) and Multinomial Logistic Regression (ASR 34.0, Manual 38.8).

Experiment 1: Quality of Semantics Estimation Metric: Mean Average Precision (MAP) of all estimated slot probabilities for each utterance; implicit approaches are scored here without explicit semantics.
Approach | ASR | Manual
Support Vector Machine (explicit) | 32.5 | 36.6
Multinomial Logistic Regression (explicit) | 34.0 | 38.8
Baseline: Random (implicit) | 3.4 | 2.6
Baseline: Majority (implicit) | 15.4 | 16.4
MF: Feature Model (implicit) | 24.2 | 22.6
MF: Feature Model + Knowledge Graph Propagation (implicit) | 40.5* (+19.1%) | 52.1* (+34.3%)

Experiment 1: Quality of Semantics Estimation Metric: Mean Average Precision (MAP) of all estimated slot probabilities for each utterance; implicit approaches are reported without / with explicit semantics.
Approach | ASR (w/o / w/ Explicit) | Manual (w/o / w/ Explicit)
Support Vector Machine (explicit) | 32.5 | 36.6
Multinomial Logistic Regression (explicit) | 34.0 | 38.8
Baseline: Random (implicit) | 3.4 / 22.5 | 2.6 / 25.1
Baseline: Majority (implicit) | 15.4 / 32.9 | 16.4 / 38.4
MF: Feature Model (implicit) | 24.2 / 37.6* | 22.6 / 45.3*
MF: Feature Model + Knowledge Graph Propagation (implicit) | 40.5* (+19.1%) / 43.5* (+27.9%) | 52.1* (+34.3%) / 53.4* (+37.6%)
The MF approach effectively models hidden semantics to improve SLU, and adding the knowledge graph propagation model further improves performance.

Experiment 2: Effectiveness of Relations
Approach (word relation / slot relation) | ASR | Manual
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation, Semantic ($R_w^S$ / $R_s^S$) | 41.4* | 51.6*
Feature + Knowledge Graph Propagation, Dependency ($R_w^D$ / $R_s^D$) | 41.6* | 49.0*
Feature + Knowledge Graph Propagation, Word only ($R_w^{SD}$ / none) | 39.2* | 45.2
Feature + Knowledge Graph Propagation, Slot only (none / $R_s^{SD}$) | 42.1* | 49.9*
Feature + Knowledge Graph Propagation, Both ($R_w^{SD}$ / $R_s^{SD}$) | (next slide) | (next slide)
All types of relations are useful for inferring hidden semantics.

Experiment 2: Effectiveness of Relations
Approach (word relation / slot relation) | ASR | Manual
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation, Semantic ($R_w^S$ / $R_s^S$) | 41.4* | 51.6*
Feature + Knowledge Graph Propagation, Dependency ($R_w^D$ / $R_s^D$) | 41.6* | 49.0*
Feature + Knowledge Graph Propagation, Word only ($R_w^{SD}$ / none) | 39.2* | 45.2
Feature + Knowledge Graph Propagation, Slot only (none / $R_s^{SD}$) | 42.1* | 49.9*
Feature + Knowledge Graph Propagation, Both ($R_w^{SD}$ / $R_s^{SD}$) | 43.5* (+15.7%) | 53.4* (+17.9%)
All types of relations are useful for inferring hidden semantics, and combining different relations further improves the performance.

Outline Introduction Ontology Induction: Frame-Semantic Parsing Structure Learning: Knowledge Graph Propagation Spoken Language Understanding (SLU): Matrix Factorization Experiments Conclusions

Conclusions Ontology induction and knowledge graph construction enable systems to automatically acquire open-domain knowledge. MF for SLU provides a principled model that unifies the automatically acquired knowledge, adapts it to a domain-specific setting, and allows systems to consider implicit semantics for better understanding. The work shows the feasibility and the potential of improving the generalization, maintenance, efficiency, and scalability of SDSs. The proposed unsupervised SLU achieves 43% MAP on ASR-transcribed conversations.

Q & A. Thanks for your attention!

Cambridge University SLU Corpus Example utterances and their semantic annotations:
"hi i'd like a restaurant in the cheap price range in the centre part of town" : type=restaurant, pricerange=cheap, area=centre
"um i'd like chinese food please" : food=chinese
"how much is the main cost" : pricerange
"okay and uh what's the address" : addr
"great uh and if i wanted to uh go to an italian restaurant instead" : food=italian, type=restaurant
"italian please" : food=italian
"what's the address" : addr
"i would like a cheap chinese restaurant" : pricerange=cheap, food=chinese, type=restaurant
"something in the riverside" : area=centre
[back]

Word Embeddings Training Process Each word w is associated with a vector. The contexts within a window of size c are considered as the training data D. The CBOW model predicts the centre word w_t from the surrounding context words (w_{t-2}, w_{t-1}, w_{t+1}, w_{t+2}), which are summed at the projection layer (input, projection, output). Objective function: see the reconstruction below. Mikolov et al., "Efficient Estimation of Word Representations in Vector Space," in Proc. of ICLR, 2013. Mikolov et al., "Distributed Representations of Words and Phrases and their Compositionality," in Proc. of NIPS, 2013. Mikolov et al., "Linguistic Regularities in Continuous Space Word Representations," in Proc. of NAACL-HLT, 2013. [back]
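The objective itself appeared only as a figure on the slide; assuming it is the standard CBOW objective from Mikolov et al. (2013), a reconstruction is:

```latex
% Average log-probability of each centre word given its context window of size c
% (standard CBOW objective; a reconstruction, not copied from the original slide).
\[
  J = \frac{1}{T}\sum_{t=1}^{T}
      \log p\!\left(w_t \mid w_{t-c},\dots,w_{t-1},\,w_{t+1},\dots,w_{t+c}\right)
\]
```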

Dependency-Based Embeddings Word & Context Extraction For "can i have a cheap restaurant" with dependencies ccomp, nsubj, dobj, det, amod:
Word | Contexts
can | have/ccomp
i | have/nsubj-1
have | can/ccomp-1, i/nsubj, restaurant/dobj
a | restaurant/det-1
cheap | restaurant/amod-1
restaurant | have/dobj-1, a/det, cheap/amod
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.

Dependency-Based Embeddings Training Process Each word w is associated with a vector v_w and each context c is represented as a vector v_c. Vector representations are learned for both words and contexts such that the dot product v_w · v_c of good word-context pairs (those in the training data D) is maximized. Objective function: see the reconstruction below. Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014. [back]
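This objective was also a figure on the slide; assuming the skip-gram negative-sampling formulation adopted by Levy and Goldberg (2014), with k negative contexts c' drawn per observed pair, a reconstruction is:

```latex
% Negative-sampling objective over observed word-context pairs (w, c) in D
% (a reconstruction under the stated assumption, not copied from the slide).
\[
  J = \sum_{(w,c)\in D}
      \Big[ \log\sigma(v_w \cdot v_c)
            + k\,\mathbb{E}_{c'\sim P_D}\big[\log\sigma(-v_w \cdot v_{c'})\big] \Big]
\]
```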

Slot Mapping Table A mapping is created when the slot fillers of an induced slot are included in those of a reference slot; for example, the induced slots "origin" and "food" (with fillers such as asian, japan, beer, and noodle observed across utterances u1 … un) both map to the reference slot "food". [back]
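The mapping rule on the slide, sketched as a small Python check; the slot names and fillers come from the slide's example, and the subset test is an assumption about what "included by" means.

```python
def map_induced_to_reference(induced_fillers, reference_fillers):
    """Map an induced slot to a reference slot when the induced slot's fillers
    are included in the reference slot's fillers."""
    mapping = {}
    for ind_slot, fillers in induced_fillers.items():
        for ref_slot, ref_set in reference_fillers.items():
            if fillers <= ref_set:   # subset test
                mapping[ind_slot] = ref_slot
    return mapping

# Toy fillers based on the slide's example: induced "origin" and "food"
# both map to the reference slot "food".
induced = {"origin": {"asian", "japan"}, "food": {"beer", "noodle"}}
reference = {"food": {"asian", "japan", "beer", "noodle"}}
print(map_induced_to_reference(induced, reference))  # {'origin': 'food', 'food': 'food'}
```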

SEMAFOR Performance The SEMAFOR evaluation [back]

Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors are observable and usually correspond to user intents. Examples: "can i have a cheap restaurant" → SLU → price="cheap", target="restaurant" → behavior=navigation; "i plan to dine in din tai fung tonight" → SLU → restaurant="din tai fung", time="tonight" → behavior=reservation.

Behavior Prediction SLU modeling for behavior prediction mirrors the earlier framework: given an utterance such as "play lady gaga's bad romance", the SLU model predicts the behavior. An unlabeled collection goes through behavior identification, yielding a feature model (Ff, Fb) and a relation model (feature relation model Rf, behavior relation model Rb). A feature observation / behavior matrix over training utterances (e.g., "play lady gaga's song bad romance" and "i'd like to listen to lady gaga's bad romance", with features such as play, song, listen, pandora, youtube, maps) is completed by matrix factorization, using the feature relations and behavior relations, to predict behaviors for test utterances.