Using String-Kernels for Learning Semantic Parsers


Using String-Kernels for Learning Semantic Parsers Rohit J. Kate Raymond J. Mooney

Semantic Parsing: Transforming natural language (NL) sentences into complete, computer-executable meaning representations (MRs) for some application. Example application domains: CLang (RoboCup Coach Language) and Geoquery (a database query application).

CLang: RoboCup Coach Language. In the RoboCup Coach competition, teams compete to coach simulated players [http://www.robocup.org]. The coaching instructions are given in a formal language called CLang [Chen et al., 2003]. For example, the instruction "If the ball is in our goal area then player 1 should intercept it." is semantically parsed into the CLang expression (bpos (goal-area our) (do our {1} intercept)). [Figure: simulated soccer field.]

Geoquery: A Database Query Application. A query application for a U.S. geography database containing about 800 facts [Zelle & Mooney, 1996]. For example, "Which rivers run through the states bordering Texas?" is semantically parsed into the query answer(traverse(next_to(stateid(‘texas’)))), whose answer is: Arkansas, Canadian, Cimarron, Gila, Mississippi, Rio Grande, …

Learning Semantic Parsers. We assume meaning representation languages (MRLs) have deterministic context-free grammars; this is true for almost all computer languages, so MRs can be parsed unambiguously.

NL: Which rivers run through the states bordering Texas?
MR: answer(traverse(next_to(stateid(‘texas’))))
Parse tree of the MR [figure]:
Non-terminals: ANSWER, RIVER, TRAVERSE, STATE, NEXT_TO, STATEID
Terminals: answer, traverse, next_to, stateid, ‘texas’
Productions: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), STATE → NEXT_TO(STATE), TRAVERSE → traverse, NEXT_TO → next_to, STATEID → ‘texas’
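To make the running example concrete, the following is a minimal sketch (not from the original slides or from KRISP's code) of how these productions and the unambiguous MR parse tree could be represented in Python; all names and structures are illustrative only.

```python
from dataclasses import dataclass, field

# Productions of the MRL grammar used in the running Geoquery example.
PRODUCTIONS = [
    "ANSWER -> answer(RIVER)", "RIVER -> TRAVERSE(STATE)", "STATE -> NEXT_TO(STATE)",
    "STATE -> STATEID", "TRAVERSE -> traverse", "NEXT_TO -> next_to", "STATEID -> 'texas'",
]

@dataclass
class Node:
    """A node of the MR parse tree: a production plus its child subtrees."""
    production: str
    children: list = field(default_factory=list)

# Unambiguous parse of answer(traverse(next_to(stateid('texas')))).
mr_parse = Node("ANSWER -> answer(RIVER)", [
    Node("RIVER -> TRAVERSE(STATE)", [
        Node("TRAVERSE -> traverse"),
        Node("STATE -> NEXT_TO(STATE)", [
            Node("NEXT_TO -> next_to"),
            Node("STATE -> STATEID", [Node("STATEID -> 'texas'")]),
        ]),
    ]),
])
```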

Learning Semantic Parsers. Assume meaning representation languages (MRLs) have deterministic context-free grammars; this is true for almost all computer languages, so MRs can be parsed unambiguously. The training data consists of NL sentences paired with their MRs. The goal is to induce a semantic parser which can map novel NL sentences to their correct MRs. This learning problem differs from that of syntactic parsing, where the training data has trees annotated over the NL sentences.

KRISP: Kernel-based Robust Interpretation for Semantic Parsing. Learns a semantic parser from NL sentences paired with their respective MRs, given the MRL grammar. Productions of the MRL are treated like semantic concepts. An SVM classifier with a string subsequence kernel is trained for each production to identify whether an NL substring represents that semantic concept. These classifiers are then used to compositionally build the MRs of sentences.

Overview of KRISP. Training: NL sentences paired with MRs, together with the MRL grammar, are used to collect positive and negative examples; these train the string-kernel-based SVM classifiers that make up the semantic parser, and the parser's best MRs (correct and incorrect) are fed back to collect new examples. Testing: the trained semantic parser maps novel NL sentences to their best MRs.

KRISP’s Semantic Parsing. We first define the semantic derivation of an NL sentence, and then the probability of a semantic derivation. Semantic parsing of an NL sentence amounts to finding its most probable semantic derivation; it is straightforward to obtain the MR from a semantic derivation.

Semantic Derivation of an NL Sentence. MR parse with non-terminals on the nodes [figure: the parse tree of answer(traverse(next_to(stateid(‘texas’)))) shown over the sentence "Which rivers run through the states bordering Texas?"].

Semantic Derivation of an NL Sentence. MR parse with productions on the nodes:
ANSWER → answer(RIVER)
RIVER → TRAVERSE(STATE)
TRAVERSE → traverse
STATE → NEXT_TO(STATE)
NEXT_TO → next_to
STATE → STATEID
STATEID → ‘texas’
Which rivers run through the states bordering Texas?

Semantic Derivation of an NL Sentence. Semantic derivation: each node covers an NL substring:
ANSWER → answer(RIVER)
RIVER → TRAVERSE(STATE)
TRAVERSE → traverse
STATE → NEXT_TO(STATE)
NEXT_TO → next_to
STATE → STATEID
STATEID → ‘texas’
Which rivers run through the states bordering Texas?

Semantic Derivation of an NL Sentence. Semantic derivation: each node contains a production and the substring of the NL sentence it covers (the sentence's words are numbered 1 through 9):
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..4])
(STATE → NEXT_TO(STATE), [5..9])
(NEXT_TO → next_to, [5..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])
Which rivers run through the states bordering Texas?

Semantic Derivation of an NL Sentence. Substrings in the NL sentence may be in a different order:
ANSWER → answer(RIVER)
RIVER → TRAVERSE(STATE)
TRAVERSE → traverse
STATE → NEXT_TO(STATE)
NEXT_TO → next_to
STATE → STATEID
STATEID → ‘texas’
Through the states that border Texas which rivers run?

Semantic Derivation of an NL Sentence. Nodes are allowed to permute the children productions from the original MR parse (words numbered 1 through 10):
(ANSWER → answer(RIVER), [1..10])
(RIVER → TRAVERSE(STATE), [1..10])
(STATE → NEXT_TO(STATE), [1..6])
(TRAVERSE → traverse, [7..10])
(NEXT_TO → next_to, [1..5])
(STATE → STATEID, [6..6])
(STATEID → ‘texas’, [6..6])
Through the states that border Texas which rivers run?

Probability of a Semantic Derivation. Let Pπ(s[i..j]) be the probability that production π covers the substring s[i..j] of sentence s. For example, for π = NEXT_TO → next_to, Pπ("the states bordering") = 0.99 for the substring covering words 5..7. These probabilities are obtained from the string-kernel-based SVM classifier trained for each production π. Assuming independence, the probability of a semantic derivation D is the product of its node probabilities: P(D) = ∏ over nodes (π, [i..j]) of D of Pπ(s[i..j]).
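A minimal sketch of this product, not KRISP's actual interfaces: the node layout, probability table, and the example values (taken from the pairing of probabilities on the next slide) are illustrative assumptions.

```python
def derivation_probability(node, p_pi):
    """Probability of a semantic derivation: the product of its node probabilities."""
    production, (i, j), children = node
    prob = p_pi[(production, i, j)]
    for child in children:
        prob *= derivation_probability(child, p_pi)
    return prob

# Illustrative only: probabilities keyed by (production, first word, last word).
p_pi = {("NEXT_TO -> next_to", 5, 7): 0.99,
        ("STATEID -> 'texas'", 8, 9): 0.98,
        ("STATE -> STATEID", 8, 9): 0.93,
        ("STATE -> NEXT_TO(STATE)", 5, 9): 0.89}

derivation = ("STATE -> NEXT_TO(STATE)", (5, 9), [
    ("NEXT_TO -> next_to", (5, 7), []),
    ("STATE -> STATEID", (8, 9), [
        ("STATEID -> 'texas'", (8, 9), [])])])

print(derivation_probability(derivation, p_pi))  # 0.89 * 0.99 * 0.93 * 0.98
```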

Probability of a Semantic Derivation contd. Example derivation with per-node probabilities (words 1..9 of "Which rivers run through the states bordering Texas?"):
(ANSWER → answer(RIVER), [1..9]) 0.98
(RIVER → TRAVERSE(STATE), [1..9]) 0.9
(TRAVERSE → traverse, [1..4]) 0.95
(STATE → NEXT_TO(STATE), [5..9]) 0.89
(NEXT_TO → next_to, [5..7]) 0.99
(STATE → STATEID, [8..9]) 0.93
(STATEID → ‘texas’, [8..9]) 0.98

Computing the Most Probable Semantic Derivation. The task of semantic parsing is to find the most probable semantic derivation of the NL sentence given all the probabilities Pπ(s[i..j]). This is implemented by extending Earley's [1970] context-free grammar parsing algorithm. It resembles PCFG parsing but differs because the probability of a production depends on which substring of the sentence it covers, and the leaves are not terminals but substrings of words.

Computing the Most Probable Semantic Derivation contd. The parser does a greedy approximation search with beam width ω = 20 and returns the ω most probable derivations it finds, using a threshold θ = 0.05 to prune low-probability trees.
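KRISP itself extends Earley's algorithm with a beam and a pruning threshold; the following is only a much-simplified exhaustive sketch of the same idea, written to show why a production's probability depends on the substring it covers and how children may split (and permute over) the parent's span. The grammar encoding, the `p_pi` callback, and the tiny demo values are all illustrative assumptions, not the authors' code.

```python
from functools import lru_cache
from itertools import permutations

def best_derivation(grammar, p_pi, n_words, start):
    """Most probable semantic derivation over words 0..n_words-1 (simplified sketch).

    grammar: {nonterminal: [(production_name, [child nonterminals])]}
    p_pi(production_name, i, j): probability that the production covers words i..j
    Note: assumes the grammar has no unary cycles; KRISP's real parser is an
    extended Earley parser with beam width omega and pruning threshold theta.
    """
    @lru_cache(maxsize=None)
    def best(nt, i, j):
        best_prob, best_tree = 0.0, None
        for prod, children in grammar[nt]:
            p_here = p_pi(prod, i, j)
            if not children:                      # terminal production covers the whole span
                if p_here > best_prob:
                    best_prob, best_tree = p_here, (prod, (i, j), [])
                continue
            for order in permutations(children):  # children may appear in any order
                for split_prob, subtrees in splits(order, i, j):
                    prob = p_here * split_prob
                    if prob > best_prob:
                        best_prob, best_tree = prob, (prod, (i, j), subtrees)
        return best_prob, best_tree

    def splits(children, i, j):
        """Yield (probability, subtrees) for every way of dividing words i..j among children."""
        if len(children) == 1:
            p, t = best(children[0], i, j)
            if t is not None:
                yield p, [t]
            return
        for k in range(i + 1, j + 1):             # first child takes words i..k-1
            p_first, t_first = best(children[0], i, k - 1)
            if t_first is None:
                continue
            for p_rest, rest in splits(children[1:], k, j):
                yield p_first * p_rest, [t_first] + rest

    return best(start, 0, n_words - 1)

# Tiny demo over the 5-word fragment "the states bordering texas ?" (words 0..4).
grammar = {
    "STATE":   [("STATE -> NEXT_TO(STATE)", ["NEXT_TO", "STATE"]),
                ("STATE -> STATEID", ["STATEID"])],
    "NEXT_TO": [("NEXT_TO -> next_to", [])],
    "STATEID": [("STATEID -> 'texas'", [])],
}
p = {("NEXT_TO -> next_to", 0, 2): 0.99, ("STATEID -> 'texas'", 3, 4): 0.98,
     ("STATE -> STATEID", 3, 4): 0.93, ("STATE -> NEXT_TO(STATE)", 0, 4): 0.89}
prob, tree = best_derivation(grammar, lambda pr, i, j: p.get((pr, i, j), 0.01), 5, "STATE")
print(prob, tree)
```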

Overview of KRISP (training loop). NL sentences with MRs and the MRL grammar are used to collect positive and negative examples; these train the string-kernel-based SVM classifiers Pπ(s[i..j]) that make up the semantic parser, and the parser's best semantic derivations (correct and incorrect) are fed back to collect new examples. Testing: novel NL sentences are mapped to their best MRs.

KRISP’s Training Algorithm. Takes NL sentences paired with their respective MRs as input and obtains the MR parses. It induces the semantic parser and refines it over iterations. In the first iteration, for every production π: the sentences whose MR parses use that production are called positives, and the remaining sentences are called negatives.

KRISP’s Training Algorithm contd. First iteration, for the production STATE → NEXT_TO(STATE). Positives: "which rivers run through the states bordering texas?", "what is the most populated state bordering oklahoma ?", "what is the largest city in states that border california ?", … Negatives: "what state has the highest population ?", "what states does the delaware river run through ?", "which states have cities named austin ?", "what is the lowest point of the state with the largest area ?". These train a string-kernel-based SVM classifier.

String Subsequence Kernel. Define the kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002]. For example, with s = "states that are next to" and t = "the states next to", the common subsequences are: states, next, to, states next, states to, next to, and states next to, so K(s,t) = 7.

String Subsequence Kernel contd. The kernel is normalized to remove any bias due to different string lengths. Lodhi et al. [2002] give an O(n|s||t|) algorithm for computing the string subsequence kernel. It has been used for text categorization [Lodhi et al., 2002] and information extraction [Bunescu & Mooney, 2005].
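A small sketch of the idea behind this kernel, under the assumption that "number of common subsequences" means the count of matching word-subsequence pairs with no gap penalty (the decay-factor version is covered in the extra slides). The normalization shown is the standard kernel normalization, which the slide does not spell out, so treat it as an assumption. The check reproduces K = 7 for the example above.

```python
import math

def subsequence_kernel(s, t):
    """Count common (possibly non-contiguous) word subsequences of s and t."""
    s, t = s.split(), t.split()
    # dp[i][j] = number of matching subsequence pairs (including the empty one)
    # between s[:i] and t[:j]
    dp = [[1] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            dp[i][j] = dp[i - 1][j] + dp[i][j - 1] - dp[i - 1][j - 1]
            if s[i - 1] == t[j - 1]:
                dp[i][j] += dp[i - 1][j - 1]
    return dp[len(s)][len(t)] - 1          # exclude the empty subsequence

def normalized_kernel(s, t):
    """Standard normalization to remove bias from different string lengths."""
    return subsequence_kernel(s, t) / math.sqrt(
        subsequence_kernel(s, s) * subsequence_kernel(t, t))

print(subsequence_kernel("states that are next to", "the states next to"))  # 7
```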

String Subsequence Kernel contd. The examples are implicitly mapped to the feature space of all subsequences, and the kernel computes the dot products there. [Figure: example substrings such as "state with the capital of", "states with area larger than", "the states next to", "states that border", "states through which", "states bordering", and "states that share border" plotted in this feature space.]

Support Vector Machines. SVMs find a separating hyperplane such that the margin is maximized. [Figure: the substrings from the previous slide separated by a hyperplane; for example, "states that are next to" falls on the positive side with probability 0.97.] A probability estimate of an example belonging to a class can be obtained using its distance from the hyperplane [Platt, 1999].
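As a hedged illustration of this step (not KRISP's actual implementation), one could train an SVM over a precomputed string-kernel Gram matrix and get Platt-style probability estimates, for example with scikit-learn. The phrases echo the slides, but their positive/negative labels here are made up for illustration, and the block reuses the normalized_kernel function from the sketch above.

```python
import numpy as np
from sklearn.svm import SVC

# Phrases from the slides; the labels below are illustrative, not the real training data.
phrases = ["states that are next to", "the states next to", "states bordering",
           "states that border", "states that share border",
           "state with the capital of", "states with area larger than",
           "states through which"]
labels = [1, 1, 1, 1, 1, 0, 0, 0]

# Gram matrix of normalized string-subsequence kernel values.
gram = np.array([[normalized_kernel(a, b) for b in phrases] for a in phrases])

clf = SVC(kernel="precomputed", probability=True)  # Platt scaling for probabilities
clf.fit(gram, labels)

# Probability that a new substring represents the semantic concept
# (with this few examples the estimate is only illustrative).
test = ["the states bordering"]
k_test = np.array([[normalized_kernel(a, b) for b in phrases] for a in test])
print(clf.predict_proba(k_test)[:, 1])
```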

KRISP’s Training Algorithm contd. First iteration, for STATE → NEXT_TO(STATE): the same positives and negatives as above train a string-kernel-based SVM classifier, which provides the probability Pπ(s[i..j]) for π = STATE → NEXT_TO(STATE).


KRISP’s Training Algorithm contd. Using these classifiers Pπ(s[i..j]), obtain the ω best semantic derivations of each training sentence. Derivations that yield the correct MR are called correct derivations; those that yield incorrect MRs are called incorrect derivations. For the next iteration, positives are collected from the most probable correct derivation; the extended Earley's algorithm can be forced to produce only correct derivations by making sure all subtrees it generates exist in the correct MR parse. Negatives are collected from incorrect derivations with higher probability than the most probable correct derivation.

KRISP’s Training Algorithm contd. Most probable correct derivation, from which positive examples are collected (words 1..9 of "Which rivers run through the states bordering Texas?"):
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..4])
(STATE → NEXT_TO(STATE), [5..9])
(NEXT_TO → next_to, [5..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])

KRISP’s Training Algorithm contd. Incorrect derivation with probability greater than the most probable correct derivation, from which negative examples are collected; it yields the incorrect MR answer(traverse(stateid(‘texas’))):
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])

KRISP’s Training Algorithm contd. To decide which productions get negative examples, compare the most probable correct derivation with the more probable incorrect derivation (both over "Which rivers run through the states bordering Texas?"):
Correct: (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..4]), (STATE → NEXT_TO(STATE), [5..9]), (NEXT_TO → next_to, [5..7]), (STATE → STATEID, [8..9]), (STATEID → ‘texas’, [8..9])
Incorrect: (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..7]), (STATE → STATEID, [8..9]), (STATEID → ‘texas’, [8..9])
Traverse both trees in breadth-first order until the first nodes where their productions differ are found. Mark the words under these nodes. Consider all the productions covering the marked words, and collect negatives for productions which cover any marked word in the incorrect derivation but not in the correct derivation.

KRISP’s Training Algorithm contd. Next iteration: more refined positive and negative examples for STATE → NEXT_TO(STATE). Positives: "the states bordering texas?", "state bordering oklahoma ?", "states that border california ?", "states which share border", "next to state of iowa", … Negatives: "what state has the highest population ?", "what states does the delaware river run through ?", "which states have cities named austin ?", "what is the lowest point of the state with the largest area ?", "which rivers run through states bordering", … These retrain the string-kernel-based SVM classifier providing Pπ(s[i..j]) for π = STATE → NEXT_TO(STATE).

Experimental Corpora. CLang [Kate, Wong & Mooney, 2005]: 300 randomly selected pieces of coaching advice from the log files of the 2003 RoboCup Coach Competition; 22.52 words on average in the NL sentences and 13.42 tokens on average in the MRs. Geoquery [Tang & Mooney, 2001]: 880 queries for the given U.S. geography database; 7.48 words on average in the NL sentences and 6.47 tokens on average in the MRs.

Experimental Methodology. Evaluated using standard 10-fold cross validation. Correctness: for CLang, the output exactly matches the correct representation; for Geoquery, the resulting query retrieves the same answer as the correct representation. Metrics: precision (the fraction of output MRs that are correct) and recall (the fraction of all sentences for which a correct MR is produced).

Experimental Methodology contd. Compared systems: CHILL [Tang & Mooney, 2001], an Inductive Logic Programming based semantic parser; SILT [Kate, Wong & Mooney, 2005], which learns transformation rules relating NL sentences to MR expressions; SCISSOR [Ge & Mooney, 2005], which learns an integrated syntactic-semantic parser and needs extra annotations; WASP [Wong & Mooney, 2006], which uses statistical machine translation techniques; and Zettlemoyer & Collins (2005), a CCG-based semantic parser with a different experimental setup (600 training and 280 testing examples) whose results are available only for the Geoquery corpus.

Experimental Methodology contd. KRISP gives probabilities for its semantic derivations, which are taken as confidences of the MRs. We plot precision-recall curves by sorting the best MR for each sentence by confidence and then finding the precision at every recall value. WASP and SCISSOR also output confidences, so we show their precision-recall curves; results of the other systems are shown as points on the precision-recall graphs.
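A small sketch of how such a precision-recall curve can be computed from per-sentence confidences, assuming one best MR per parsed sentence and a flag for whether it is correct; this is illustrative, not the authors' evaluation code.

```python
def precision_recall_curve(results, n_sentences):
    """results: list of (confidence, is_correct) for the best MR of each parsed sentence."""
    results = sorted(results, key=lambda r: r[0], reverse=True)
    curve, correct = [], 0
    for k, (confidence, is_correct) in enumerate(results, start=1):
        correct += is_correct
        precision = correct / k                 # fraction of output MRs that are correct
        recall = correct / n_sentences          # fraction of all sentences answered correctly
        curve.append((recall, precision))
    return curve

# Example: 5 sentences, 4 of them parsed, sorted by the parser's confidence.
print(precision_recall_curve([(0.9, True), (0.8, True), (0.6, False), (0.3, True)], 5))
```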

Results on CLang. [Precision-recall graph.] SCISSOR requires more annotation on the training corpus. CHILL gives 49.2% precision and 12.67% recall with 160 examples and cannot be run beyond that.

Results on Geoquery. [Precision-recall graph.]

Experiments with Noisy NL Sentences. Any application of a semantic parser is likely to face noise in the input. If the input is coming from a speech recognizer: interjections (um's and ah's), environment noise (door slams, phone rings, etc.), out-of-domain words, ill-formed utterances, etc. KRISP does not use hard-matching rules, unlike other systems, and is hence more robust to noise. We show this by introducing simulated speech recognition errors into the corpus.

Experiments with Noisy NL Sentences contd. Interjections, environment noise, etc. are likely to be recognized as real words; this is simulated by adding a word after every word with probability Padd, where the extra word w is chosen with probability P(w) proportional to its frequency in the BNC. A speech recognizer may also completely fail to detect a word, so each word is dropped with probability Pdrop. Example: in "If the ball is in our goal area then player 1 should intercept it.", an extra word such as "you" may be inserted and a word such as "player" dropped.

Experiments with Noisy NL Sentences contd. A speech recognizer may confuse a word with a high-frequency, phonetically close word; a word is therefore substituted by another word w with probability p^ed(w) * P(w), where p is a parameter in [0,1], ed(w) is w's edit distance from the original word [Levenshtein, 1966], and P(w) is w's probability proportional to its frequency in the BNC. Example: words in "If the ball is in our goal area then player 1 should intercept it." may be replaced by close-sounding words (the slide's figure shows "you" and "when").
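A sketch of this noise model under the stated parameters; the tiny vocabulary and word probabilities below are simplified stand-ins (a real run would use BNC frequencies), and the function names are illustrative.

```python
import random

def edit_distance(a, b):
    """Levenshtein distance between two words."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def corrupt(words, vocab, p_word, p_add=0.1, p_drop=0.1, p=0.01):
    """Simulate speech-recognition noise as described on these slides.

    vocab: candidate extra/substitute words; p_word[w]: their probability
    (proportional to corpus frequency, e.g. from the BNC).
    """
    out = []
    for w in words:
        if random.random() < p_drop:                          # word not detected
            continue
        # substitute w by w' with probability p**ed(w') * P(w')
        weights = {v: (p ** edit_distance(w, v)) * p_word[v] for v in vocab if v != w}
        if random.random() < sum(weights.values()):
            w = random.choices(list(weights), weights=list(weights.values()))[0]
        out.append(w)
        if random.random() < p_add:                           # spurious extra word
            out.append(random.choices(vocab, weights=[p_word[v] for v in vocab])[0])
    return out

print(" ".join(corrupt("if the ball is in our goal area".split(),
                       vocab=["the", "a", "you", "when", "in"],
                       p_word={"the": 0.4, "a": 0.3, "you": 0.1, "when": 0.1, "in": 0.1})))
```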

Experiments with Noisy NL Sentences contd. Four noise levels were created by varying the parameters Padd and Pdrop from 0 at level zero to 0.1 at level four, and the parameter p from 0 at level zero to 0.01 at level four. Results are shown when only the test sentences are corrupted; results are qualitatively similar when both test and training sentences are corrupted. We report the best F-measure (the harmonic mean of precision and recall).

Results on Noisy CLang Corpus. [Graph: best F-measure at each noise level.]

Conclusions. KRISP is a new string-kernel-based approach for learning semantic parsers: string-kernel-based SVM classifiers are trained for each MRL production, and the classifiers are used to compositionally build complete MRs of NL sentences. Evaluated on two real-world corpora, it performs better than rule-based systems, performs comparably to other statistical systems, and is more robust to noise.

Thank You! Our corpora can be downloaded from: http://www.cs.utexas.edu/~ml/nldata.html Check out our online demo for Geoquery at: http://www.cs.utexas.edu/~ml/geo.html Questions??

Extra: Experiments with Other Natural Languages

Extra: Dealing with Constants. The MRL grammar may contain productions corresponding to constants in the domain, e.g. STATEID → ‘new york’, RIVERID → ‘colorado’, NUM → ‘2’, STRING → ‘DR4C10’. The user can specify these as constant productions, giving their NL substrings; classifiers are not learned for these productions. The matching substring's probability is taken as 1. If n constant productions have the same substring, then each gets probability 1/n, e.g. STATEID → ‘colorado’ and RIVERID → ‘colorado’.
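A tiny sketch of this rule for constant productions (the data structure and names are illustrative): the matching substring gets probability 1, shared evenly when several constants have the same NL substring.

```python
def constant_probability(production, substring, constant_productions):
    """P_pi for a constant production: 1 if its NL string matches, split evenly if ambiguous."""
    if constant_productions[production] != substring:
        return 0.0
    # e.g. STATEID -> 'colorado' and RIVERID -> 'colorado' both match "colorado"
    n = sum(1 for s in constant_productions.values() if s == substring)
    return 1.0 / n

constants = {"STATEID -> 'colorado'": "colorado", "RIVERID -> 'colorado'": "colorado",
             "STATEID -> 'new york'": "new york"}
print(constant_probability("STATEID -> 'colorado'", "colorado", constants))  # 0.5
```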

Extra: String Subsequence Kernel. Subsequences with gaps should be downweighted: a decay factor λ in the range (0,1] penalizes gaps. All subsequences are the implicit features, and the penalties are the feature values. For example, with s = "left side of our penalty area" and t = "our left penalty area", the subsequence u = "left penalty" has a gap of 3 in s (λ^3) and a gap of 0 in t (λ^0), so it contributes λ^3 · λ^0. Overall, K(s,t) = 4 + 3λ + 3λ^3 + λ^5.
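A sketch of the decayed kernel under one reading of these slides: each common subsequence contributes λ raised to the number of gap words it skips in s times λ raised to the gaps it skips in t, so a contiguous match contributes 1. The exact weighting in Lodhi et al.'s original formulation decays by the total span length rather than only by gaps, so treat this as illustrative rather than the precise kernel used by KRISP.

```python
def gap_weighted_kernel(s, t, lam=0.5):
    """Sum, over common word subsequences of s and t, of lam**(gap words skipped)."""
    s, t = s.split(), t.split()
    # m[i][j]: total decayed weight of common subsequences whose last match is (s[i], t[j])
    m = [[0.0] * len(t) for _ in range(len(s))]
    total = 0.0
    for i in range(len(s)):
        for j in range(len(t)):
            if s[i] != t[j]:
                continue
            m[i][j] = 1.0        # the single-word subsequence ending here
            for a in range(i):
                for b in range(j):
                    # extend every earlier match, paying for the words skipped in between
                    m[i][j] += m[a][b] * lam ** (i - a - 1) * lam ** (j - b - 1)
            total += m[i][j]
    return total

# With lam = 1 this reduces to the plain subsequence count from the earlier slides (7);
# with lam < 1, gapped subsequences such as "left penalty" are downweighted.
print(gap_weighted_kernel("states that are next to", "the states next to", lam=1.0))  # 7.0
```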

Extra: KRISP’s Average Running Times. Average running times per fold, in minutes, taken by KRISP:
Corpus   Average training time (min)   Average testing time (min)
Geo250   1.44                          0.05
Geo880   18.1                          0.65
CLang    58.85                         3.18

Extra: Experimental Methodology. Correctness: for CLang, the output exactly matches the correct representation; for Geoquery, the resulting query retrieves the same answer as the correct representation. Example sentence: "If the ball is in our penalty area, all our players except player 4 should stay in our half."
Correct: ((bpos (penalty-area our)) (do (player-except our{4}) (pos (half our))))
Output: ((bpos (penalty-area opp)) (do (player-except our{4}) (pos (half our))))

Extra: Computing the Most Probable Semantic Derivation. The task of semantic parsing is to find the most probable semantic derivation of the NL sentence. Let E_n,s[i..j], a partial derivation, denote any subtree of a derivation tree with n as the LHS non-terminal of the root production, covering sentence s from index i to j. Example of E_STATE,s[5..9] over "the states bordering Texas?" (words 5..9):
(STATE → NEXT_TO(STATE), [5..9])
(NEXT_TO → next_to, [5..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])
The derivation D is then E_ANSWER,s[1..|s|].

Extra: Computing the Most Probable Semantic Derivation contd. Let E*_STATE,s[5..9] denote the most probable partial derivation among all E_STATE,s[5..9]. It is computed recursively: for the production STATE → NEXT_TO(STATE) covering [5..9], consider every way of splitting the span between the children E*_NEXT_TO,s[i..j] and E*_STATE,s[i..j], i.e. the splits [5..5]/[6..9], [5..6]/[7..9], [5..7]/[8..9], and [5..8]/[9..9], as well as the permuted order of the children, and take the combination with the highest probability.