
Learning for Semantic Parsing of Natural Language
Raymond J. Mooney
Machine Learning Group, Department of Computer Sciences, University of Texas at Austin
December 19, 2005
With Ruifang Ge, Rohit Kate, Yuk Wah Wong, John Zelle, Cynthia Thompson

2 Syntactic Natural Language Learning
Most computational research in natural-language learning has addressed "low-level" syntactic processing:
- Morphology (e.g., past-tense generation)
- Part-of-speech tagging
- Shallow syntactic parsing (chunking)
- Syntactic parsing

3 Semantic Natural Language Learning
Learning for semantic analysis has been restricted to relatively "shallow" meaning representations:
- Word sense disambiguation (e.g., SENSEVAL)
- Semantic role assignment (determining agent, patient, instrument, etc., e.g., FrameNet, PropBank)
- Information extraction

4 Semantic Parsing
A semantic parser maps a natural-language sentence to a complete, detailed semantic representation: a logical form or meaning representation (MR). For many applications, the desired output is immediately executable by another program. Two application domains:
- CLang: RoboCup Coach Language
- GeoQuery: a database query application

5 CLang: RoboCup Coach Language
In the RoboCup Coach competition, teams compete to coach simulated players on a simulated soccer field. The coaching instructions are given in a formal language called CLang.
Example of semantic parsing from NL to CLang:
NL: If the ball is in our penalty area, then all our players except player 4 should stay in our half.
CLang: ((bpos (penalty-area our)) (do (player-except our {4}) (pos (half our))))

6 GeoQuery: A Database Query Application
Query application for a U.S. geography database containing about 800 facts [Zelle & Mooney, 1996].
Example of semantic parsing from NL to a logical query:
User: How many cities are there in the US?
Query: answer(A, count(B, (city(B), loc(B, C), const(C, countryid(USA))), A))
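As a toy illustration of what "immediately executable" means for such a query, here is a sketch that evaluates a count query against a tiny fact set. The facts and the helper function below are invented for illustration; the real GeoQuery database has about 800 facts and a Prolog-style query evaluator.

```python
# Toy illustration of executing a GeoQuery-style MR such as
# answer(A, count(B, (city(B), loc(B, C), const(C, countryid(USA))), A)).
# The facts below are made up; they are not the actual GeoQuery database.
cities = {"austin", "houston", "boston"}
located_in = {"austin": "usa", "houston": "usa", "boston": "usa"}

def answer_how_many_cities(country):
    """Count B such that city(B), loc(B, C), const(C, countryid(country))."""
    return sum(1 for b in cities if located_in.get(b) == country)

print(answer_how_many_cities("usa"))  # 3
```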

7 Learning Semantic Parsers
Manually programming robust semantic parsers is difficult due to the complexity of the task. Semantic parsers can instead be learned automatically from sentences paired with their logical forms.
Pipeline: NL→LF training examples → semantic-parser learner → semantic parser (natural language → logical form).

8 Engineering Motivation
- Most computational language-learning research strives for broad coverage while sacrificing depth ("scaling up by dumbing down").
- Realistic semantic parsing currently entails domain dependence.
- Domain-dependent natural-language interfaces have a large potential market.
- Learning makes developing specific applications more tractable.
- Training corpora can be easily developed by tagging existing corpora of formal statements with natural-language glosses.

9 Cognitive Science Motivation
Most natural-language learning methods require supervised training data that is not available to a child:
- General lack of negative feedback on grammar.
- No POS-tagged or treebank data.
Assuming a child can infer the likely meaning of an utterance from context, NL→LF pairs are more cognitively plausible training data.

10 Our Semantic-Parser Learners
- CHILL+WOLFIE (Zelle & Mooney, 1996; Thompson & Mooney, 1999, 2003): separates parser learning and semantic-lexicon learning; learns a deterministic parser using ILP techniques.
- COCKTAIL (Tang & Mooney, 2001): improved ILP algorithm for CHILL.
- SILT (Kate, Wong & Mooney, 2005): learns symbolic transformation rules for mapping directly from NL to LF.
- SCISSOR (Ge & Mooney, 2005): integrates semantic interpretation into Collins' statistical syntactic parser.
- WASP (Wong & Mooney, in preparation): uses syntax-based statistical machine translation methods.
- KRISP (Kate & Mooney, in preparation): uses a series of SVM classifiers employing a string kernel to iteratively build semantic representations.

11 SCISSOR: Semantic Composition that Integrates Syntax and Semantics to get Optimal Representations
Based on a fairly standard approach to compositional semantics [Jurafsky and Martin, 2000]:
- A statistical parser is used to generate a semantically augmented parse tree (SAPT); Collins' head-driven model 2 is augmented to incorporate semantic labels.
- The SAPT is then translated into a complete formal meaning representation (MR).
Example SAPT for "our player 2 has the ball" (MR: bowner(player(our,2))):
(S-bowner (NP-player (PRP$-team our) (NN-player player) (CD-unum 2)) (VP-bowner (VB-bowner has) (NP-null (DT-null the) (NN-null ball))))

12 Overview of SCISSOR
Training: SAPT training examples → learner → integrated semantic parser.
Testing: NL sentence → integrated semantic parser → SAPT → ComposeMR → MR.

13 SCISSOR SAPT Parser Implementation
- Semantic labels added to Bikel's (2004) open-source version of the Collins statistical parser.
- Head-driven derivation of production rules augmented to also generate semantic labels.
- Parameter estimates during training employ an augmented smoothing technique to account for the additional data sparsity created by semantic labels.
- Parsing of test sentences to find the most probable SAPT is performed using a standard beam-search-constrained version of the CKY chart-parsing algorithm.
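As background for the last point, here is a minimal CKY recognizer over a toy grammar in Chomsky normal form. The grammar, labels, and sentence below are invented for illustration; SCISSOR itself augments a full lexicalized Collins-style parser with semantic labels and beam search, not this bare-bones version.

```python
from itertools import product

# Minimal CKY recognizer for a toy CNF grammar (illustrative only).
unary = {"our": {"PRP$"}, "player": {"NN"}, "2": {"CD"}}    # word -> labels
binary = {("PRP$", "NN"): {"NPB"}, ("NPB", "CD"): {"NP"}}   # (B, C) -> {A}

def cky_recognize(words, start="NP"):
    n = len(words)
    # chart[i][j] = set of non-terminals spanning words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(unary.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary.get((b, c), set())
    return start in chart[0][n]

print(cky_recognize(["our", "player", "2"]))  # True
```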

14 ComposeMR
SAPT for "our player 2 has the ball" with only the semantic labels kept: our:team, player:player, 2:unum, has:bowner, the:null, ball:null; the NP node is labeled player, the VP and S nodes bowner.

15 ComposeMR
Each semantic label is replaced by its MR template with open argument slots: team, player(_,_), unum, bowner(_), null.

16 ComposeMR
Argument slots are filled bottom-up: player(team,unum) instantiates to player(our,2), and bowner(player) instantiates to bowner(player(our,2)), the complete MR.
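The bottom-up slot filling on slides 14–16 can be sketched as follows. The SAPT below mirrors the slide example, but the node representation, TEMPLATES table, and compose function are an illustrative simplification, not SCISSOR's actual implementation.

```python
# Sketch of ComposeMR: fill each node's MR template with the MRs of the
# children whose semantic labels match its argument slots (illustrative only).
TEMPLATES = {"player": ["team", "unum"], "bowner": ["player"]}

def compose(node):
    """Return (semantic_label, mr_string_or_None) for a SAPT node."""
    label, kids = node
    if isinstance(kids, str):                # leaf: a word with a semantic label
        # 'team'/'unum' leaves denote constants; template labels are not yet filled
        return (label, kids if label in ("team", "unum") else None)
    children = [compose(k) for k in kids]
    filled = {lab: mr for lab, mr in children if mr is not None}
    slots = TEMPLATES.get(label, [])
    if slots and all(s in filled for s in slots):
        return (label, "%s(%s)" % (label, ",".join(filled[s] for s in slots)))
    # otherwise propagate whatever has been built so far for this label
    return (label, filled.get(label))

sapt = ("bowner", [                          # S-bowner
    ("player", [("team", "our"), ("player", "player"), ("unum", "2")]),
    ("bowner", [("bowner", "has"),
                ("null", [("null", "the"), ("null", "ball")])]),
])
print(compose(sapt)[1])  # bowner(player(our,2))
```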

17 WASP: A Machine Translation Approach to Semantic Parsing
- Based on a semantic grammar of the natural language.
- Uses machine translation techniques: synchronous context-free grammars (SCFG) (Wu, 1997; Melamed, 2004; Chiang, 2005) and word alignments (Brown et al., 1993; Och & Ney, 2003).
- Hence the name: Word Alignment-based Semantic Parsing.

18 Synchronous Context-Free Grammars (SCFG)
Developed by Aho & Ullman (1972) as a theory of compilers that combines syntax analysis and code generation in a single phase. An SCFG generates a pair of strings in a single derivation.

19 Compiling, Machine Translation, and Semantic Parsing
- SCFG: formal language to formal language (compiling)
- Alignment models: natural language to natural language (machine translation)
- WASP: natural language to formal language (semantic parsing)

20 Context-Free Semantic Grammar
QUERY → What is CITY
CITY → the capital CITY
CITY → of STATE
STATE → Ohio
Derivation of "What is the capital of Ohio": QUERY ⇒ What is CITY ⇒ What is the capital CITY ⇒ What is the capital of STATE ⇒ What is the capital of Ohio.

21 Productions of Synchronous Context-Free Grammars
QUERY → What is CITY / answer(CITY)
Each production pairs an NL pattern with an MR template. These were referred to as transformation rules in Kate, Wong & Mooney (2005).

22 Synchronous Context-Free Grammars
QUERY → What is CITY / answer(CITY)
CITY → the capital CITY / capital(CITY)
CITY → of STATE / loc_2(STATE)
STATE → Ohio / stateid('ohio')
A single synchronous derivation simultaneously yields the NL string "What is the capital of Ohio" and the MR answer(capital(loc_2(stateid('ohio')))).
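The paired rewriting can be sketched directly. This is an illustrative toy in which string substitution stands in for a proper tree derivation; the CITY1/CITY2 keys merely distinguish the two CITY productions and are not part of the grammar.

```python
# Sketch of a synchronous derivation: each production pairs an NL pattern
# with an MR template; expanding a non-terminal rewrites both sides at once.
SCFG = {
    "QUERY": ("What is CITY", "answer(CITY)"),
    "CITY1": ("the capital CITY", "capital(CITY)"),   # first CITY production
    "CITY2": ("of STATE", "loc_2(STATE)"),            # second CITY production
    "STATE": ("Ohio", "stateid('ohio')"),
}

def derive(rule_sequence):
    """Apply productions left-most, rewriting NL and MR in lockstep."""
    nl, mr = "QUERY", "QUERY"
    for rule in rule_sequence:
        lhs = rule.rstrip("12")              # CITY1/CITY2 both rewrite CITY
        pat, tmpl = SCFG[rule]
        nl = nl.replace(lhs, pat, 1)
        mr = mr.replace(lhs, tmpl, 1)
    return nl, mr

nl, mr = derive(["QUERY", "CITY1", "CITY2", "STATE"])
print(nl)   # What is the capital of Ohio
print(mr)   # answer(capital(loc_2(stateid('ohio'))))
```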

23 Parsing Model of WASP
- N (non-terminals) = {QUERY, CITY, STATE, …}
- S (start symbol) = QUERY
- T_m (MRL terminals) = {answer, capital, loc_2, (, ), …}
- L (lexicon), e.g.:
  STATE → Ohio / stateid('ohio')
  QUERY → What is CITY / answer(CITY)
  CITY → the capital CITY / capital(CITY)
  CITY → of STATE / loc_2(STATE)
- T_n (NL words) = {What, is, the, capital, of, Ohio, …}
- λ (parameters of the probabilistic model) = ?

24 Probabilistic Parsing Model
Derivation d1 of "capital of Ohio", yielding capital(loc_2(stateid('ohio'))):
CITY → capital CITY / capital(CITY)
CITY → of STATE / loc_2(STATE)
STATE → Ohio / stateid('ohio')

25 Probabilistic Parsing Model
Derivation d2 of "capital of Ohio", yielding capital(loc_2(riverid('ohio'))):
CITY → capital CITY / capital(CITY)
CITY → of RIVER / loc_2(RIVER)
RIVER → Ohio / riverid('ohio')

26 Probabilistic Parsing Model
Derivations d1 (using STATE → Ohio / stateid('ohio')) and d2 (using RIVER → Ohio / riverid('ohio')) compete for the same input:
Pr(d1 | capital of Ohio) = exp(Σ_i λ_i f_i(d1)) / Z
Pr(d2 | capital of Ohio) = exp(Σ_i λ_i f_i(d2)) / Z
where each production i carries a weight λ_i, f_i(d) is the number of times production i is used in d, and Z is the normalization constant.
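A toy numeric version of this log-linear model, with invented weights (only the relative scores matter for choosing between the STATE and RIVER readings):

```python
import math

# Sketch of log-linear scoring over competing derivations: each production
# has a weight; a derivation's score is exp(sum of its rule weights), and Z
# normalizes over the candidate derivations. Weights below are invented.
weights = {"STATE->Ohio": 1.2, "RIVER->Ohio": 0.3,
           "CITY->capital CITY": 0.5, "CITY->of STATE": 0.4, "CITY->of RIVER": 0.4}
d1 = ["CITY->capital CITY", "CITY->of STATE", "STATE->Ohio"]
d2 = ["CITY->capital CITY", "CITY->of RIVER", "RIVER->Ohio"]

def score(d):
    return math.exp(sum(weights[r] for r in d))

Z = score(d1) + score(d2)
p1, p2 = score(d1) / Z, score(d2) / Z
print(round(p1 + p2, 10))  # 1.0 -- the two probabilities are normalized
print(p1 > p2)             # True: the STATE reading outweighs the RIVER one
```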

27 Parsing Model of WASP
- N (non-terminals) = {QUERY, CITY, STATE, …}
- S (start symbol) = QUERY
- T_m (MRL terminals) = {answer, capital, loc_2, (, ), …}
- L (lexicon), e.g.:
  STATE → Ohio / stateid('ohio')
  QUERY → What is CITY / answer(CITY)
  CITY → the capital CITY / capital(CITY)
  CITY → of STATE / loc_2(STATE)
- T_n (NL words) = {What, is, the, capital, of, Ohio, …}
- λ (parameters of the probabilistic model)

28 Overview of WASP
Training: an unambiguous CFG of the MRL and a training set {(e, f)} are used for lexical acquisition (producing the lexicon L) and parameter estimation (producing a parsing model parameterized by λ).
Testing: input sentence e′ → semantic parsing → output MR f′.

29 Lexical Acquisition
Transformation rules are extracted from word alignments between an NL sentence, e, and its correct MR, f, for each training example (e, f).

30 Word Alignments
A mapping from French words to their meanings expressed in English:
English: And the program has been implemented
French: Le programme a été mis en application

31 Lexical Acquisition
- Train a statistical word alignment model (IBM Model 5) on the training set.
- Obtain the most probable n-to-1 word alignments for each training example.
- Extract transformation rules from these word alignments.
- The lexicon L consists of all extracted transformation rules.

32 Word Alignment for Semantic Parsing
How to introduce syntactic tokens such as parens?
MR: ( ( true ) ( do our { 1 } ( pos ( half our ) ) ) )
NL: The goalie should always stay in our half

33 Use of MRL Grammar
The MR is represented by the top-down, left-most derivation of an unambiguous CFG, and words align n-to-1 with its productions:
RULE → (CONDITION DIRECTIVE)
CONDITION → (true)
DIRECTIVE → (do TEAM {UNUM} ACTION)
TEAM → our
UNUM → 1
ACTION → (pos REGION)
REGION → (half TEAM)
TEAM → our
NL: The goalie should always stay in our half

34 Extracting Transformation Rules
In the derivation, the word "our" is linked to the production TEAM → our; this yields the rule:
TEAM → our / our

35 Extracting Transformation Rules
With "our" rewritten as TEAM (MR (half our) built bottom-up), the word "half" is linked to REGION → (half TEAM), yielding:
REGION → TEAM half / (half TEAM)

36 Extracting Transformation Rules
With "our half" rewritten as REGION (MR (pos (half our))), the words "stay in" are linked to ACTION → (pos REGION), yielding:
ACTION → stay in REGION / (pos REGION)
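The extraction procedure on slides 34–36 can be sketched as follows. The word links, span computation, and helper functions are an illustrative reconstruction, not WASP's actual code.

```python
# Sketch of transformation-rule extraction: given the MR derivation (each
# production with its child non-terminals) and n-to-1 links from words to
# productions, build each production's NL pattern bottom-up by replacing the
# word spans covered by child productions with their non-terminal names.
words = "The goalie should always stay in our half".split()
links = {"TEAM": [6], "REGION": [7], "ACTION": [4, 5]}  # production -> word indices
kids = {"ACTION": ["REGION"], "REGION": ["TEAM"], "TEAM": []}
templates = {"TEAM": "our", "REGION": "(half TEAM)", "ACTION": "(pos REGION)"}

def span(p):
    """All word indices covered by production p, including its children."""
    s = set(links[p])
    for c in kids[p]:
        s |= span(c)
    return s

def pattern(p):
    """NL pattern for p: its own words, with child spans collapsed to names."""
    out, covered = [], {i: c for c in kids[p] for i in span(c)}
    for i in sorted(span(p)):
        if i in covered:
            if not out or out[-1] != covered[i]:
                out.append(covered[i])       # child span -> non-terminal
        else:
            out.append(words[i])             # directly linked word
    return " ".join(out)

rules = {p: f"{p} -> {pattern(p)} / {templates[p]}" for p in templates}
print(rules["REGION"])  # REGION -> TEAM half / (half TEAM)
print(rules["ACTION"])  # ACTION -> stay in REGION / (pos REGION)
```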

37 Probabilistic Parsing Model
- Based on a maximum-entropy model: Pr_λ(d | e) ∝ exp(Σ_i λ_i f_i(d)).
- The features f_i(d) are the number of times each transformation rule is used in a derivation d.
- The output translation is the yield of the most probable derivation.

38 Parameter Estimation
- Maximum conditional log-likelihood criterion.
- Since correct derivations are not included in the training data, the parameters λ* are learned in an unsupervised manner.
- EM algorithm combined with improved iterative scaling, where the hidden variables are the correct derivations (Riezler et al., 2000).

39 KRISP: Kernel-based Robust Interpretation by Semantic Parsing
- Learns a semantic parser from NL sentences paired with their respective MRs, given the MRL grammar.
- Productions of the MRL are treated like semantic concepts.
- An SVM classifier is trained for each production with a string subsequence kernel.
- These classifiers are used to compositionally build the MRs of sentences.

40 Kernel Functions
A kernel K is a similarity function over a domain X which maps any two objects x, y in X to their similarity score K(x,y). For x1, x2, …, xn in X, the n-by-n matrix (K(xi,xj))ij must be symmetric and positive semidefinite; the kernel then computes the dot product of implicit feature vectors in some high-dimensional feature space. Machine learning algorithms which use the data only to compute similarities can be kernelized (e.g., support vector machines, nearest neighbor, etc.).

41 String Subsequence Kernel
Define the kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002]. All possible subsequences become the implicit feature vectors, and the kernel computes their dot products.
s = "left side of our penalty area"
t = "our left penalty area"
K(s,t) = ?

42–47 String Subsequence Kernel (contd.)
Counting the common subsequences u one at a time: u = left (K(s,t) = 1 + …), u = our (2 + …), u = penalty (3 + …), u = area (4 + …), u = left penalty (5 + …), and so on, giving K(s,t) = 11.

48 Normalized String Subsequence Kernel
- Normalize the kernel to the range [0,1] to remove any bias due to different string lengths: K̂(s,t) = K(s,t) / √(K(s,s)·K(t,t)).
- Lodhi et al. [2002] give an O(n|s||t|) algorithm for computing the string subsequence kernel.
- Used for text categorization [Lodhi et al., 2002] and information extraction [Bunescu & Mooney, 2005b].
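A brute-force version of this kernel and its normalization, counting distinct common word subsequences as on the preceding slides. Note this is a simplified illustration: the full kernel of Lodhi et al. [2002] adds gap-decay weighting and the O(n|s||t|) dynamic program, so brute force is only viable at this toy scale.

```python
import math
from itertools import combinations

def subseqs(words):
    """All distinct non-empty (ordered) word subsequences of a word list."""
    return {c for r in range(1, len(words) + 1) for c in combinations(words, r)}

def K(s, t):
    """Unweighted subsequence kernel: number of common subsequences."""
    return len(subseqs(s.split()) & subseqs(t.split()))

def K_norm(s, t):
    """Length-normalized kernel, in (0, 1]."""
    return K(s, t) / math.sqrt(K(s, s) * K(t, t))

s = "left side of our penalty area"
t = "our left penalty area"
print(K(s, t))                # 11, matching the count on the previous slides
print(0 < K_norm(s, t) <= 1)  # True
```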

49 Support Vector Machines
SVMs are classifiers that learn linear separators that maximize the margin between the data and the classification boundary. Kernels allow SVMs to learn non-linear separators by implicitly mapping the data to a higher-dimensional feature space. (Figure: separating hyperplane with margin ρ.)

50 Overview of KRISP
Training: NL sentences paired with MRs, together with the MRL grammar, are used to collect positive and negative examples, which train the string-kernel-based SVM classifiers; the best semantic derivations (correct and incorrect) are fed back to collect new examples.
Testing: novel NL sentences → semantic parser → best MRs.

51 Overview of KRISP's Semantic Parsing
- We first define the semantic derivation of an NL sentence.
- We then define the probability of a semantic derivation.
- Semantic parsing of an NL sentence involves finding its most probable semantic derivation.
- It is straightforward to obtain the MR from a semantic derivation.

52 Semantic Derivation of an NL Sentence
Which rivers run through the states bordering Texas?
MR parse with non-terminals on the nodes: ANSWER, RIVER, TRAVERSE_2, STATE, NEXT_TO, STATEID, over the MRL terminals answer, traverse_2, next_to, stateid, 'texas' (for the MR answer(traverse_2(next_to(stateid('texas'))))).

53 Semantic Derivation of an NL Sentence
Which rivers run through the states bordering Texas?
MR parse with productions on the nodes:
ANSWER → answer(RIVER)
RIVER → TRAVERSE_2(STATE)
TRAVERSE_2 → traverse_2
STATE → NEXT_TO(STATE)
NEXT_TO → next_to
STATE → STATEID
STATEID → 'texas'

54 Semantic Derivation of an NL Sentence
Semantic derivation: each node of the MR parse additionally covers a substring of the NL sentence "Which rivers run through the states bordering Texas?".

55 Semantic Derivation of an NL Sentence
Which rivers run through the states bordering Texas?
Each node contains a production and the substring of the NL sentence it covers:
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE_2(STATE), [1..9])
(TRAVERSE_2 → traverse_2, [1..4])
(STATE → NEXT_TO(STATE), [5..9])
(NEXT_TO → next_to, [5..7])
(STATE → STATEID, [8..9])
(STATEID → 'texas', [8..9])

56 Probability of a Semantic Derivation
Let P_π(s[i..j]) be the probability that production π covers the substring s[i..j]; e.g., P_{NEXT_TO → next_to}("the states bordering") for the node (NEXT_TO → next_to, [5..7]) covering words 5–7. These probabilities are obtained from the string-kernel-based SVM classifiers trained for each production π. The probability of a semantic derivation D is the product over its nodes:
P(D) = Π_{(π, [i..j]) ∈ D} P_π(s[i..j])
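A sketch of this product, using the derivation from slide 55. The classifier probabilities below are invented for illustration; in KRISP they come from the per-production SVM classifiers.

```python
from math import prod

# The probability of a semantic derivation is the product of the
# per-production classifier probabilities P_pi(s[i..j]).
derivation = [
    ("ANSWER -> answer(RIVER)",     (1, 9), 0.95),
    ("RIVER -> TRAVERSE_2(STATE)",  (1, 9), 0.90),
    ("TRAVERSE_2 -> traverse_2",    (1, 4), 0.85),
    ("STATE -> NEXT_TO(STATE)",     (5, 9), 0.80),
    ("NEXT_TO -> next_to",          (5, 7), 0.75),
    ("STATE -> STATEID",            (8, 9), 0.99),
    ("STATEID -> 'texas'",          (8, 9), 0.98),
]

def derivation_probability(d):
    return prod(p for _, _, p in d)

print(round(derivation_probability(derivation), 4))  # 0.4231
```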

57 Computing the Most Probable Semantic Derivation
- Implemented by extending Earley's [1970] context-free grammar parsing algorithm.
- A dynamic programming algorithm generates and compactly stores each subtree once.
- It does a greedy approximation search with beam width ω and returns the ω most probable derivations it finds.

58 KRISP’s Training Algorithm Takes NL sentences paired with their respective MRs as input Obtains MR parses Proceeds in iterations In the first iteration, for every production π: –Call those sentences positives whose MR parses use that production –Call the remaining sentences negatives

59 KRISP’s Training Algorithm contd. STATE  NEXT_TO(STATE) which rivers run through the states bordering texas? what is the most populated state bordering oklahoma ? what is the largest city in states that border california ? … what state has the highest population ? what states does the delaware river run through ? which states have cities named austin ? what is the lowest point of the state with the largest area ? … PositivesNegatives P STATE  NEXT_TO(STATE) (s[i..j]) String-kernel-based SVM classifier First Iteration

60 KRISP’s Training Algorithm contd. Using these classifiers P π (s[i..j]), obtain the ω best semantic derivations of each training sentence Some of these derivations will give the correct MR, called correct derivations, some will give incorrect MRs, called incorrect derivations For the next iteration, collect positives from most probable correct derivation Collect negatives from incorrect derivations with higher probability than the most probable correct derivation

61 KRISP’s Training Algorithm contd. STATE  NEXT_TO(STATE) the states bordering texas? state bordering oklahoma ? states that border california ? states which share border next to state of iowa … what state has the highest population ? what states does the delaware river run through ? which states have cities named austin ? what is the lowest point of the state with the largest area ? which rivers run through states bordering … PositivesNegatives P STATE  NEXT_TO(STATE) (s[i..j]) String-kernel-based SVM classifier Next Iteration

62 Experimental Corpora
CLang:
- 300 randomly selected pieces of coaching advice from the log files of the 2003 RoboCup Coach Competition
- 22.52 words on average in the NL sentences
- 14.24 tokens on average in the formal expressions
GeoQuery [Zelle & Mooney, 1996]:
- 250 queries for the given U.S. geography database
- 6.87 words on average in the NL sentences
- 5.32 tokens on average in the formal expressions

63 Experimental Methodology
- Evaluated using standard 10-fold cross validation.
- Correctness: for CLang, the output exactly matches the correct representation; for GeoQuery, the resulting query retrieves the same answer as the correct representation.
- Metrics: precision (fraction of returned MRs that are correct) and recall (fraction of sentences for which the correct MR is returned).
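For reference, the standard precision/recall computation these metrics imply. The counts below are invented; a parser may decline to return an MR for some sentences, which is why precision and recall can differ.

```python
# Precision: correct MRs / MRs returned. Recall: correct MRs / all sentences.
def precision_recall(num_correct, num_returned, num_sentences):
    precision = num_correct / num_returned
    recall = num_correct / num_sentences
    return precision, recall

p, r = precision_recall(num_correct=80, num_returned=90, num_sentences=100)
print(round(p, 4), r)  # 0.8889 0.8
```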

64 Precision Learning Curve for CLang

65 Recall Learning Curve for CLang

66 Precision Learning Curve for GeoQuery

67 Recall Learning Curve for GeoQuery

68 Future Work
- Explore methods that can automatically generate SAPTs to minimize the annotation effort for SCISSOR.
- Learn semantic parsers just from sentences paired with their perceptual context.

69 Conclusions
- Learning semantic parsers is an important and challenging problem in natural-language learning.
- We have obtained promising results on several applications using a variety of approaches with different strengths and weaknesses.
- Not many others have explored this problem; I would encourage others to consider it.
- More and larger corpora are needed for training and testing semantic parser induction.

70 Thank You!
Our papers on learning semantic parsers are on-line at:
Our corpora can be downloaded from:
Questions??