Machine Learning Group Department of Computer Sciences University of Texas at Austin Learning for Semantic Parsing with Kernels under Various Forms of Supervision Rohit J. Kate Ph.D. Final Defense Supervisor: Raymond J. Mooney
2 Semantic Parsing Semantic Parsing: transforming natural language (NL) sentences into complete, computer-executable meaning representations (MRs) for domain-specific applications Requires deeper semantic analysis than other semantic tasks such as semantic role labeling, word sense disambiguation, and information extraction Example application domains –CLang: RoboCup Coach Language –Geoquery: a database query application
3 CLang: RoboCup Coach Language In the RoboCup Coach competition, teams compete to coach simulated soccer players The coaching instructions are given in a formal language called CLang [Chen et al., 2003] Example (simulated soccer field) NL: If the ball is in our goal area then player 1 should intercept it. Semantic Parsing → CLang: (bpos (goal-area our) (do our {1} intercept))
4 Geoquery: A Database Query Application Query application for a U.S. geography database containing about 800 facts [Zelle & Mooney, 1996] NL: Which rivers run through the states bordering Texas? Semantic Parsing → Query: answer(traverse(next_to(stateid(‘texas’)))) → Answer: Arkansas, Canadian, Cimarron, Gila, Mississippi, Rio Grande …
5 Engineering Motivation for Semantic Parsing Most computational language-learning research analyzes open-domain text but the analysis is shallow Realistic semantic parsing currently entails domain dependence Applications of domain-dependent semantic parsing –Natural language interfaces to computing systems –Communication with robots in natural language –Personalized software assistants –Question-answering systems Machine Learning makes developing semantic parsers for specific applications more tractable
6 Cognitive Science Motivation for Semantic Parsing Most natural-language learning methods require supervised training data that is not available to a child –No POS-tagged or treebank data Assuming a child can infer the likely meaning of an utterance from context, NL-MR pairs are more cognitively plausible training data
7 Thesis Contributions A new framework for learning for semantic parsing based on kernel-based string classification –Requires no feature engineering –Does not use any hard-matching rules or any grammar rules for natural language, which makes it robust First semi-supervised learning system for semantic parsing Considers learning for semantic parsing under a cognitively motivated, weaker and more general form of supervision: ambiguous supervision Introduces transformations for meaning representation grammars to make them conform better to natural language semantics
8 Outline KRISP: A Semantic Parsing Learning System Utilizing Weaker Forms of Supervision –Semi-supervision –Ambiguous supervision Transforming meaning representation grammar Directions for Future Work Conclusions
9 KRISP: Kernel-based Robust Interpretation for Semantic Parsing [Kate & Mooney, 2006] Learns semantic parser from NL sentences paired with their respective MRs given meaning representation language (MRL) grammar Productions of MRL are treated like semantic concepts SVM classifier with string subsequence kernel is trained for each production to identify if an NL substring represents the semantic concept These classifiers are used to compositionally build MRs of the sentences
10 Overview of KRISP Training: NL sentences with MRs and the MRL grammar → collect positive and negative examples → train string-kernel-based SVM classifiers → semantic parser; the best MRs (correct and incorrect) are fed back to example collection. Testing: novel NL sentences → semantic parser → best MRs
12 Meaning Representation Language MR: answer(traverse(next_to(stateid(‘texas’)))) Parse tree of the MR over the nonterminals ANSWER, RIVER, TRAVERSE, STATE, NEXT_TO, STATEID Productions: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), TRAVERSE → traverse, STATE → NEXT_TO(STATE), NEXT_TO → next_to, STATE → STATEID, STATEID → ‘texas’
13 Semantic Parsing by KRISP SVM classifier for each production gives the probability that a substring represents the semantic concept of the production Which rivers run through the states bordering Texas? NEXT_TO → next_to: 0.95
14 Semantic Parsing by KRISP SVM classifier for each production gives the probability that a substring represents the semantic concept of the production Which rivers run through the states bordering Texas? TRAVERSE → traverse
15 Semantic Parsing by KRISP Semantic parsing is done by finding the most probable derivation of the sentence [Kate & Mooney 2006] Which rivers run through the states bordering Texas? Derivation: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), TRAVERSE → traverse, STATE → NEXT_TO(STATE), NEXT_TO → next_to, STATE → STATEID, STATEID → ‘texas’ Probability of the derivation is the product of the probabilities at the nodes.
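The derivation probability described on the slide above can be made concrete with a small sketch (a simplified illustration, not KRISP's actual data structures): each derivation node stores the production it applies and the SVM probability for the substring it covers, and the derivation's probability is the product over all nodes.

```python
# Minimal sketch (illustrative, not KRISP's implementation): a derivation node
# holds the production it applies, the classifier's probability for its word
# span, and its child nodes.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DerivationNode:
    production: str                    # e.g. "NEXT_TO -> next_to"
    probability: float                 # SVM estimate for the covered substring
    children: List["DerivationNode"] = field(default_factory=list)

def derivation_probability(node: DerivationNode) -> float:
    """Probability of a derivation = product of the probabilities at its nodes."""
    p = node.probability
    for child in node.children:
        p *= derivation_probability(child)
    return p
```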
16 Overview of KRISP Training: NL sentences with MRs and the MRL grammar → collect positive and negative examples → train string-kernel-based SVM classifiers (classification probabilities) → semantic parser; the best semantic derivations (correct and incorrect) are fed back to example collection. Testing: novel NL sentences → semantic parser → best MRs
17 KRISP’s Training Algorithm Takes NL sentences paired with their respective MRs as input Obtains MR parses Induces the semantic parser and refines it in iterations In the first iteration, for every production: –Call those sentences positives whose MR parses use that production –Call the remaining sentences negatives
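The first-iteration labeling described above is simple enough to sketch directly (the data layout is assumed for illustration; this is not the thesis code): a sentence is a positive example for every production appearing in its MR parse and a negative example for all other productions.

```python
# Minimal sketch of first-iteration example collection (assumed data layout).
from collections import defaultdict

def first_iteration_examples(corpus, all_productions):
    """corpus: list of (sentence, productions_used_in_its_MR_parse) pairs."""
    positives = defaultdict(list)
    negatives = defaultdict(list)
    for sentence, used in corpus:
        for production in all_productions:
            bucket = positives if production in used else negatives
            bucket[production].append(sentence)
    return positives, negatives
```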
18 KRISP’s Training Algorithm contd. First Iteration, production STATE → NEXT_TO(STATE) Positives: which rivers run through the states bordering texas? / what is the most populated state bordering oklahoma? / what is the largest city in states that border california? / … Negatives: what state has the highest population? / what states does the delaware river run through? / which states have cities named austin? / what is the lowest point of the state with the largest area? / … → String-kernel-based SVM classifier
19 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” K(s,t) = ?
20 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = states K(s,t) = 1+?
21 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = next K(s,t) = 2+?
22 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = to K(s,t) = 3+?
23 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = states next K(s,t) = 4+?
24 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” K(s,t) = 7
25 String Subsequence Kernel contd. The kernel is normalized to remove any bias due to different string lengths Lodhi et al. [2002] give an O(n|s||t|) algorithm for computing the string subsequence kernel Used for text categorization [Lodhi et al., 2002] and information extraction [Bunescu & Mooney, 2005]
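For concreteness, here is a sketch of the word-level subsequence count used in the running example above, plus the length normalization mentioned on this slide. It deliberately omits the gap-decay factor of the full Lodhi et al. [2002] kernel, so it illustrates the counting idea rather than the exact kernel used in KRISP.

```python
# Minimal sketch: count matching (possibly non-contiguous) word subsequences.
# The full Lodhi et al. kernel additionally downweights gappy subsequences.
def common_subsequences(s: str, t: str) -> int:
    a, b = s.split(), t.split()
    n, m = len(a), len(b)
    # C[i][j] = common subsequence occurrence pairs of a[:i] and b[:j], incl. the empty one
    C = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            C[i][j] = C[i - 1][j] + C[i][j - 1] - C[i - 1][j - 1]
            if a[i - 1] == b[j - 1]:
                C[i][j] += C[i - 1][j - 1]
    return C[n][m] - 1                     # drop the empty subsequence

def normalized_kernel(s: str, t: str) -> float:
    """Normalized to remove bias from different string lengths."""
    return common_subsequences(s, t) / (
        common_subsequences(s, s) * common_subsequences(t, t)) ** 0.5

print(common_subsequences("states that are next to", "the states next to"))  # 7, as in the example
```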
26 String Subsequence Kernel contd. The examples are implicitly mapped to the feature space of all subsequences, and the kernel computes the dot products there Example substrings for the production STATE → NEXT_TO(STATE): states bordering, states that border, states that share border, states with area larger than, states through which, state with the capital of, the states next to
27 Support Vector Machines SVMs find a separating hyperplane such that the margin is maximized A probability estimate of an example belonging to a class can be obtained from its distance to the hyperplane [Platt, 1999] Example substrings for the production STATE → NEXT_TO(STATE): the states next to, states that are next to, states bordering, states that border, states that share border, state with the capital of, states with area larger than, states through which (e.g. probability 0.97 for one of them)
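The classifier training on this slide can be illustrated with scikit-learn (an assumption for the sketch; the thesis used its own SVM setup), reusing `normalized_kernel` from the sketch above as a precomputed Gram matrix and Platt scaling [Platt, 1999] for probability estimates. The positive/negative split of the example phrases below is illustrative.

```python
# Minimal sketch (scikit-learn assumed; example phrases taken from the slides,
# labels assigned here only for illustration).
import numpy as np
from sklearn.svm import SVC

def gram_matrix(X, Y, kernel):
    return np.array([[kernel(x, y) for y in Y] for x in X])

train_strings = ["the states next to", "states that are next to", "states bordering",
                 "states that border", "states that share border",            # positives
                 "state with the capital of", "states with area larger than",
                 "states through which", "what state has the highest population",
                 "which states have cities named austin"]                     # negatives
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

K_train = gram_matrix(train_strings, train_strings, normalized_kernel)
clf = SVC(kernel="precomputed", probability=True).fit(K_train, labels)        # Platt scaling

K_test = gram_matrix(["the states that are next to texas"], train_strings, normalized_kernel)
print(clf.predict_proba(K_test)[:, 1])   # probability the substring expresses the concept
```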
28 KRISP’s Training Algorithm contd. First Iteration, production STATE → NEXT_TO(STATE) Positives: which rivers run through the states bordering texas? / what is the most populated state bordering oklahoma? / what is the largest city in states that border california? / … Negatives: what state has the highest population? / what states does the delaware river run through? / which states have cities named austin? / what is the lowest point of the state with the largest area? / … → String-kernel-based SVM classifier → Classification probabilities
29 Overview of KRISP Training: NL sentences with MRs and the MRL grammar → collect positive and negative examples → train string-kernel-based SVM classifiers (classification probabilities) → semantic parser; the best semantic derivations (correct and incorrect) are fed back to example collection. Testing: novel NL sentences → semantic parser → best MRs
31 KRISP’s Training Algorithm contd. Using these classifiers, obtain the ω best semantic derivations of each training sentence Some of these derivations will give the correct MR, called correct derivations, some will give incorrect MRs, called incorrect derivations For the next iteration, collect positives from most probable correct derivation Collect negatives from incorrect derivations with higher probability than the most probable correct derivation
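The iterative collection described above can be summarized in a high-level sketch. The helpers `best_derivations()`, `mr_of()`, `spans_of()` and the beam width `beam_size` (the slides' ω) are hypothetical stand-ins for KRISP's internals; the real negative-example collection is more fine-grained than shown here.

```python
# High-level sketch of one retraining iteration (hypothetical helpers).
from collections import defaultdict

def retraining_iteration(classifiers, training_data, beam_size):
    positives, negatives = defaultdict(list), defaultdict(list)
    for sentence, correct_mr in training_data:
        derivations = best_derivations(sentence, classifiers, beam_size)   # sorted by probability
        correct = [d for d in derivations if mr_of(d) == correct_mr]
        if not correct:
            continue
        best_correct = correct[0]
        for production, substring in spans_of(best_correct):
            positives[production].append(substring)        # from the most probable correct derivation
        for d in derivations:
            if d.probability <= best_correct.probability:
                break                                      # only incorrect derivations ranked above it
            if mr_of(d) != correct_mr:
                for production, substring in spans_of(d):
                    negatives[production].append(substring)
    return positives, negatives
```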
32 KRISP’s Training Algorithm contd. Most probable correct derivation for: Which rivers run through the states bordering Texas? (ANSWER → answer(RIVER), [1..9]) (RIVER → TRAVERSE(STATE), [1..9]) (TRAVERSE → traverse, [1..4]) (STATE → NEXT_TO(STATE), [5..9]) (NEXT_TO → next_to, [5..7]) (STATE → STATEID, [8..9]) (STATEID → ‘texas’, [8..9])
33 KRISP’s Training Algorithm contd. Collect positive examples from the most probable correct derivation: each production becomes a positive example paired with the word span its node covers (e.g. NEXT_TO → next_to with words [5..7], “the states bordering”)
35 KRISP’s Training Algorithm contd. Incorrect derivation with probability greater than the most probable correct derivation: Which rivers run through the states bordering Texas? (ANSWER → answer(RIVER), [1..9]) (RIVER → TRAVERSE(STATE), [1..9]) (TRAVERSE → traverse, [1..7]) (STATE → STATEID, [8..9]) (STATEID → ‘texas’, [8..9]) Incorrect MR: answer(traverse(stateid(‘texas’)))
36 KRISP’s Training Algorithm contd. Collect negative examples from such incorrect derivations: (ANSWER → answer(RIVER), [1..9]) (RIVER → TRAVERSE(STATE), [1..9]) (TRAVERSE → traverse, [1..7]) (STATE → STATEID, [8..9]) (STATEID → ‘texas’, [8..9]) Incorrect MR: answer(traverse(stateid(‘texas’)))
37 KRISP’s Training Algorithm contd. Next iteration: more refined positive and negative examples for the production STATE → NEXT_TO(STATE) Positives: the states bordering texas? / state bordering oklahoma? / states that border california? / states which share border / next to state of iowa / … Negatives: what state has the highest population? / what states does the delaware river run through? / which states have cities named austin? / what is the lowest point of the state with the largest area? / which rivers run through states bordering / … → String-kernel-based SVM classifier → Better classification probabilities
38 Overview of KRISP Training: NL sentences with MRs and the MRL grammar → collect positive and negative examples → train string-kernel-based SVM classifiers (classification probabilities) → semantic parser; the best semantic derivations (correct and incorrect) are fed back to example collection. Testing: novel NL sentences → semantic parser → best MRs
39 Experimental Corpora CLang [Kate, Wong & Mooney, 2005] –300 randomly selected pieces of coaching advice from the log files of the 2003 RoboCup Coach Competition –22.52 words on average in NL sentences –13.42 tokens on average in MRs Geoquery [Tang & Mooney, 2001] –880 queries for the given U.S. geography database –7.48 words on average in NL sentences –6.47 tokens on average in MRs
40 Experimental Methodology Evaluated using standard 10-fold cross validation Correctness –CLang: output exactly matches the correct representation –Geoquery: the resulting query retrieves the same answer as the correct representation Metrics: precision, recall, and F-measure (see the formulas below)
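For reference, the metrics above are the ones standardly used for these corpora: precision over the sentences for which the parser produced an MR, recall over all test sentences, and their harmonic mean.

```latex
\[
\text{Precision} = \frac{\#\,\text{correct MRs}}{\#\,\text{sentences with an output MR}}, \qquad
\text{Recall} = \frac{\#\,\text{correct MRs}}{\#\,\text{test sentences}}, \qquad
F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]
```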
41 Experimental Methodology contd. Compared Systems: –CHILL [Tang & Mooney, 2001]: Inductive Logic Programming based semantic parser –SCISSOR [Ge & Mooney, 2005]: learns an integrated syntactic-semantic parser, needs extra annotations –WASP [Wong & Mooney, 2006]: uses statistical machine translation techniques –Zettlemoyer & Collins (2007): Combinatory Categorial Grammar (CCG) based semantic parser; different experimental setup (600 training, 280 testing examples); requires an initial hand-built lexicon
42 Experimental Methodology contd. KRISP gives probabilities for its semantic derivations, which are taken as confidences of the MRs We plot precision-recall curves by sorting the best MR for each sentence by confidence and then finding precision at every recall value WASP and SCISSOR also output confidences, so we show their precision-recall curves Results of other systems are shown as points on the precision-recall graphs
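The curve construction described above amounts to sorting sentences by the confidence of their best MR and sweeping a threshold; a minimal sketch (hypothetical input layout) is:

```python
# Minimal sketch: trace a precision-recall curve from per-sentence confidences.
def precision_recall_curve(results):
    """results: list of (confidence, is_correct) pairs, one per test sentence."""
    results = sorted(results, key=lambda r: r[0], reverse=True)
    total = len(results)
    curve, correct = [], 0
    for answered, (_, is_correct) in enumerate(results, start=1):
        correct += int(is_correct)
        precision = correct / answered       # among sentences answered so far
        recall = correct / total             # among all test sentences
        curve.append((recall, precision))
    return curve
```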
43 Results on CLang (precision-recall curves) CHILL gives 49.2% precision and 12.67% recall with 160 examples, can’t run beyond that. Requires more annotation on the training corpus
44 Results on Geoquery
45 Robustness of KRISP KRISP does not use grammar rules for natural language String-kernel-based classification softly captures a wide range of natural language expressions Robust to rephrasing and noise
46 Robustness of KRISP Which rivers run through the states bordering Texas? TRAVERSE → traverse: 0.95
47 Robustness of KRISP (rephrasing) Which are the rivers that run through the states bordering Texas? TRAVERSE → traverse: 0.78
48 Robustness of KRISP (misspelled word) Which rivers run though the states bordering Texas? TRAVERSE → traverse: 0.68
49 Robustness of KRISP (missing word) Which rivers through the states bordering Texas? TRAVERSE → traverse: 0.65
50 Robustness of KRISP (interjection) Which rivers ahh.. run through the states bordering Texas? TRAVERSE → traverse: 0.81
51 Experiments with Noisy NL Sentences Any application of a semantic parser is likely to face noise in the input If the input is coming from a speech recognizer: –Interjections (um’s and ah’s) –Environment noise (door slams, phone rings, etc.) –Out-of-domain words, ill-formed utterances, etc. We demonstrate the robustness of KRISP by introducing simulated speech recognition errors into the corpus
52 Experiments with Noisy NL Sentences contd. Noise was introduced in the NL sentences by: –Adding extra words chosen according to their frequencies in the BNC –Dropping words randomly –Substituting words with phonetically close high-frequency words Four levels of noise were created by increasing the probabilities of the above Results are shown when only test sentences are corrupted; qualitatively similar results when both test and train sentences are corrupted We show best F-measures (harmonic mean of precision and recall)
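A sketch of the three corruption operations listed above (the word-frequency list and phonetic-confusion table are placeholders; the actual experiments drew extra words according to BNC frequencies and substituted phonetically close high-frequency words):

```python
# Illustrative noise injection; p_add, p_drop, p_sub control the noise level.
import random

def corrupt(sentence, p_add, p_drop, p_sub, frequent_words, confusions):
    noisy = []
    for word in sentence.split():
        if random.random() < p_add:                          # add an extra word
            noisy.append(random.choice(frequent_words))
        if random.random() < p_drop:                         # drop the word
            continue
        if random.random() < p_sub and word in confusions:   # substitute a similar-sounding word
            word = random.choice(confusions[word])
        noisy.append(word)
    return " ".join(noisy)
```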
53 Results on Noisy CLang Corpus
54 Outline KRISP: A Supervised Learning System Utilizing Weaker Forms of Supervision –Semi-supervision –Ambiguous supervision Transforming meaning representation grammar Directions for Future Work Conclusions
55 Semi-Supervised Semantic Parsing Building annotated training data is expensive Utilize NL sentences not annotated with their MRs, usually cheaply available KRISP can be turned into a semi-supervised learner if the SVM classifiers are given appropriate unlabeled examples Which substrings should be the unlabeled examples for which productions’ SVMs?
56 SEMISUP-KRISP: Semi-Supervised Semantic Parser Learner [Kate & Mooney, 2007a] First learns a semantic parser from the supervised data using KRISP
57 SEMISUP-KRISP: Semi-Supervised Semantic Parser Learner contd. Supervised corpus (sentences with MRs): Which rivers run through the states bordering Texas? answer(traverse(next_to(stateid(‘texas’)))) / What is the lowest point of the state with the largest area? answer(lowest(place(loc(largest_one(area(state(all))))))) / What is the largest city in states that border California? answer(largest(city(loc(next_to(stateid('california')))))) / … Unsupervised corpus (sentences only): Which states have a city named Springfield? / What is the capital of the most populous state? / How many rivers flow through Mississippi? / How many states does the Mississippi run through? / How high is the highest point in the smallest state? / Which rivers flow through the states that border California? / … KRISP collects labeled examples from the supervised corpus (semantic parsing) and trains the SVM classifiers
58 SEMISUP-KRISP: Semi-Supervised Semantic Parser Learner First learns a semantic parser from the supervised data using KRISP Applies the learned parser on the unsupervised NL sentences Whenever an SVM classifier is called to estimate the probability of a substring, that substring becomes an unlabeled example for that classifier These substrings are representative of the examples the classifiers will encounter during testing Which rivers run through the states bordering Texas? NEXT_TO → next_to, TRAVERSE → traverse
59 SVMs with Unlabeled Examples Production NEXT_TO → next_to; separating hyperplane over labeled and unlabeled substrings such as: the states next to, states that are next to, the states bordering, states that border, states that share border, state with the capital of, area larger than, through which
60 SVMs with Unlabeled Examples Using unlabeled test examples during training can help find a better hyperplane [Joachims 1999] Production: NEXT_TO → next_to
61 Transductive SVMs contd. Find a labeling that separates all the examples with maximum margin Finding the exact solution is intractable but approximation algorithms exist [Joachims 1999], [Chen et al. 2003], [Collobert et al. 2006]
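For reference, the transductive SVM objective from Joachims [1999] treats the labels of the unlabeled examples as additional variables to optimize, so the margin is maximized over both labeled and (self-labeled) unlabeled examples:

```latex
\begin{align*}
\min_{\mathbf{w},\,b,\;y^{*}_{1}\ldots y^{*}_{k}}\quad & \tfrac{1}{2}\|\mathbf{w}\|^{2}
  + C\sum_{i=1}^{n}\xi_{i} + C^{*}\sum_{j=1}^{k}\xi^{*}_{j}\\
\text{s.t.}\quad & y_{i}\,(\mathbf{w}\cdot\mathbf{x}_{i}+b) \ge 1-\xi_{i},\quad \xi_{i}\ge 0
  \quad\text{(labeled examples)}\\
 & y^{*}_{j}\,(\mathbf{w}\cdot\mathbf{x}^{*}_{j}+b) \ge 1-\xi^{*}_{j},\quad \xi^{*}_{j}\ge 0,
  \quad y^{*}_{j}\in\{-1,+1\}\quad\text{(unlabeled examples)}
\end{align*}
```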
62 SEMISUP-KRISP: Semi-Supervised Semantic Parser Learner contd. (Same supervised and unsupervised corpora as above.) Labeled examples are collected from the supervised corpus and unlabeled examples from semantically parsing the unsupervised corpus; both are given to transductive SVM classifiers, yielding the learned semantic parser
63 Experiments Compared the performance of SEMISUP-KRISP and KRISP on the Geoquery domain Corpus contains 250 NL sentences annotated with their correct MRs Collected 1037 unannotated sentences from our web-based demo Evaluated by 10-fold cross validation, keeping the unsupervised data the same in each fold Increased the amount of supervised training data and measured the best F-measure
64 Results
65 Results SEMISUP-KRISP achieves a 25% saving in annotated training examples GEOBASE: hand-built semantic parser [Borland International, 1988], shown for comparison
66 Outline KRISP: A Supervised Learning System Utilizing Weaker Forms of Supervision –Semi-supervision –Ambiguous supervision Transforming meaning representation grammar Directions for Future Work Conclusions
67 Unambiguous Supervision for Learning Semantic Parsers The training data for semantic parsing consists of hundreds of natural language sentences unambiguously paired with their meaning representations
68 Unambiguous Supervision for Learning Semantic Parsers The training data for semantic parsing consists of hundreds of natural language sentences unambiguously paired with their meaning representations Which rivers run through the states bordering Texas? answer(traverse(next_to(stateid(‘texas’)))) What is the lowest point of the state with the largest area? answer(lowest(place(loc(largest_one(area(state(all))))))) What is the largest city in states that border California? answer(largest(city(loc(next_to(stateid( 'california')))))) ……
69 Shortcomings of Unambiguous Supervision It requires considerable human effort to annotate each sentence with its correct meaning representation Does not model the type of supervision children receive when they are learning a language –Children are not taught meanings of individual sentences –They learn to identify the correct meaning of a sentence from several meanings possible in their perceptual context
70 ??? “Mary is on the phone”
71 Ambiguous Supervision for Learning Semantic Parsers A computer system simultaneously exposed to perceptual contexts and natural language utterances should be able to learn the underlying language semantics We consider ambiguous training data of sentences associated with multiple potential meaning representations –Siskind (1996) uses this type of “referentially uncertain” training data to learn meanings of words Capturing meaning representations from perceptual contexts is a difficult unsolved problem –Our system works directly with symbolic meaning representations
72 “Mary is on the phone” ???
74 Ironing(Mommy, Shirt) “Mary is on the phone” ???
75 Ironing(Mommy, Shirt) Working(Sister, Computer) “Mary is on the phone” ???
76 Ironing(Mommy, Shirt) Working(Sister, Computer) Carrying(Daddy, Bag) “Mary is on the phone” ???
77 Ironing(Mommy, Shirt) Working(Sister, Computer) Carrying(Daddy, Bag) Talking(Mary, Phone) Sitting(Mary, Chair) “Mary is on the phone” Ambiguous Training Example ???
78 Ironing(Mommy, Shirt) Working(Sister, Computer) Talking(Mary, Phone) Sitting(Mary, Chair) “Mommy is ironing shirt” Next Ambiguous Training Example ???
79 Ambiguous Supervision for Learning Semantic Parsers contd. Our model of ambiguous supervision corresponds to the type of data that will be gathered from a temporal sequence of perceptual contexts with occasional language commentary We assume each sentence has exactly one meaning in a perceptual context Each meaning is associated with at most one sentence in a perceptual context
80 Sample Ambiguous Corpus Daisy gave the clock to the mouse. Mommy saw that Mary gave the hammer to the dog. The dog broke the box. John gave the bag to the mouse. The dog threw the ball. ate(mouse, orange) gave(daisy, clock, mouse) ate(dog, apple) saw(mother, gave(mary, dog, hammer)) broke(dog, box) gave(woman, toy, mouse) gave(john, bag, mouse) threw(dog, ball) runs(dog) saw(john, walks(man, dog)) Forms a bipartite graph
81 KRISPER: KRISP with EM-like Retraining Extension of KRISP that learns from ambiguous supervision Uses an iterative EM-like method to gradually converge on a correct meaning for each sentence Given a sentence and a meaning representation, KRISP can also estimate the probability that it is the correct meaning representation for the sentence
82 KRISPER’s Training Algorithm Step 1: Assume every possible meaning for a sentence is correct (all edges of the sentence-MR bipartite graph from the sample corpus)
84 KRISPER’s Training Algorithm contd. Step 2: The resulting NL-MR pairs are weighted and given to KRISP (a sentence with k candidate MRs contributes each of its pairs with weight 1/k, hence the edge weights 1/2, 1/3, 1/4, 1/5)
85 KRISPER’s Training Algorithm contd. Step 3: Estimate the confidence of each NL-MR pair using the resulting parser
87 KRISPER’s Training Algorithm contd. Step 4: Use maximum weighted matching on the sentence-MR bipartite graph to find the best NL-MR pairs [Munkres, 1957]
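Step 4 can be illustrated with an off-the-shelf assignment solver (scipy is an assumption here; any Hungarian/Munkres implementation works). Sentence-MR pairs that are not connected in the bipartite graph can simply be given a very low confidence so the matching avoids them.

```python
# Minimal sketch of maximum-weighted bipartite matching between sentences and MRs.
import numpy as np
from scipy.optimize import linear_sum_assignment

def best_pairs(sentences, mrs, confidence):
    """confidence[i][j]: parser's estimate that mrs[j] is the meaning of sentences[i]."""
    cost = -np.array(confidence, dtype=float)     # negate: the solver minimizes cost
    rows, cols = linear_sum_assignment(cost)      # handles rectangular matrices
    return [(sentences[i], mrs[j]) for i, j in zip(rows, cols)]
```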
89 KRISPER’s Training Algorithm contd. Step 5: Give the best pairs to KRISP in the next iteration; continue until convergence
90 Ambiguous Corpus Construction There is no real-world ambiguous corpus yet available for semantic parsing to our knowledge We artificially obfuscated the real-world unambiguous corpus by adding extra distracter MRs to each training pair (Ambig-Geoquery) We also created an artificial ambiguous corpus (Ambig-ChildWorld) which more accurately models real-world ambiguities in which potential candidate MRs are often related
91 Ambiguity in Corpora Three levels of ambiguity were created. Distribution over the number of candidate MRs per NL sentence:
MRs per NL: 1, 2, 3, 4, 5, 6, 7
Level 1: 25%, 50%, 25%
Level 2: 11%, 22%, 34%, 22%, 11%
Level 3: 6%, 13%, 19%, 26%, 18%, 12%, 6%
92 Results on Ambig-Geoquery Corpus
93 Results on Ambig-ChildWorld Corpus
94 Outline KRISP: A Supervised Learning System Utilizing Weaker Forms of Supervision –Semi-supervision –Ambiguous supervision Transforming meaning representation grammar Directions for Future Work Conclusions
95 Why Transform Meaning Representation Grammar? Productions of the meaning representation grammar (MRG) may not correspond well with NL semantics CLang MR expression for “our midfield”: (rec (pt -32 -35) (pt 0 35)) Productions used: REGION → (rec POINT POINT), POINT → (pt NUM NUM), NUM → -32, NUM → -35, NUM → 0, NUM → 35; none of them corresponds to the NL concept “our midfield”
96 Why Transform Meaning Representation Grammar? Productions of the meaning representation grammar (MRG) may not correspond well with NL semantics Geoquery MR: answer(longest(river(loc_2(stateid(‘Texas’))))) Which is the longest river in Texas? Productions: ANSWER → answer(RIVER), RIVER → longest(RIVER), RIVER → river(LOCATIONS), LOCATIONS → loc_2(STATE), STATE → STATEID, STATEID → stateid(‘Texas’)
98 Manual Engineering of MRG Several awkward constructs from the original CLang grammar were manually replaced with NL-compatible MR expressions The MRG for Geoquery was manually constructed for its functional MRL, which was derived from the original Prolog expressions Requires expertise in the MRL and domain knowledge Goal: automatically transform the MRG to improve semantic parsing
99 Transforming Meaning Representation Grammar Train KRISP using the given MRG and parse the training sentences Collect “bad” productions which KRISP often uses incorrectly (its output MR parses use them but the correct MR parses do not, or vice versa) Modify these productions using four context-free grammar transformation operators The transformed MRG accepts the same MRL as the original MRG
100 Transformation Operators 1. Create non-terminal from a terminal: introduces a new semantic concept Bad productions: STATE → largest STATE, CITY → largest CITY, PLACE → largest PLACE New production: LARGEST → largest
101 Transformation Operators 1. Create non-terminal from a terminal (result): STATE → LARGEST STATE, CITY → LARGEST CITY, PLACE → LARGEST PLACE, LARGEST → largest
102 Transformation Operators 2. Merge non-terminals: generalizes productions Bad productions: STATE → LARGEST STATE, CITY → LARGEST CITY, PLACE → LARGEST PLACE, STATE → SMALLEST STATE, CITY → SMALLEST CITY, PLACE → SMALLEST PLACE
103 Transformation Operators 2. Merge non-terminals: generalizes productions Bad productions as above; new productions: QUALIFIER → LARGEST, QUALIFIER → SMALLEST
104 Transformation Operators 2. Merge non-terminals (result): STATE → QUALIFIER STATE, CITY → QUALIFIER CITY, PLACE → QUALIFIER PLACE, QUALIFIER → LARGEST, QUALIFIER → SMALLEST
105 Transformation Operators 3. Combine non-terminals: combines the concepts Bad productions: CITY → SMALLEST MAJOR CITY, LAKE → SMALLEST MAJOR LAKE New production: SMALLEST_MAJOR → SMALLEST MAJOR
106 Transformation Operators 3. Combine non-terminals (result): CITY → SMALLEST_MAJOR CITY, LAKE → SMALLEST_MAJOR LAKE, SMALLEST_MAJOR → SMALLEST MAJOR
107 Transformation Operators 4. Delete production: eliminates a semantic concept Productions: NUM → AREA LEFTBR STATE RIGHTBR, NUM → DENSITY LEFTBR CITY RIGHTBR Bad productions: LEFTBR → (, RIGHTBR → )
108 Transformation Operators 4. Delete production: LEFTBR deleted and inlined: NUM → AREA ( STATE RIGHTBR, NUM → DENSITY ( CITY RIGHTBR Bad productions: LEFTBR → (, RIGHTBR → )
109 Transformation Operators 4. Delete production (result): NUM → AREA ( STATE ), NUM → DENSITY ( CITY )
110 MRG Transformation Algorithm A heuristic search is used to find a good MRG among all possible MRGs All possible instances of each type of operator are applied, then the training examples are re-parsed and the semantic parser is re-trained Two iterations were sufficient for convergence of performance
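A high-level sketch of the search loop described above; `train_krisp()`, `bad_productions()`, and `apply_all_instances()` are hypothetical helpers standing in for training KRISP, identifying often-misused productions, and the four CFG transformation operators.

```python
# High-level sketch of the MRG transformation search (hypothetical helpers).
OPERATORS = ["create_nonterminal_from_terminal", "merge_nonterminals",
             "combine_nonterminals", "delete_production"]

def transform_mrg(grammar, training_data, iterations=2):
    parser = train_krisp(grammar, training_data)
    for _ in range(iterations):                            # two iterations sufficed in practice
        bad = bad_productions(parser, grammar, training_data)
        for op in OPERATORS:                               # apply all instances of each operator
            grammar = apply_all_instances(grammar, op, bad)
        parser = train_krisp(grammar, training_data)       # re-parse and re-train on the new MRG
    return grammar, parser
```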
111 Results on Geoquery Using Transformation Operators
112 Rest of the Dissertation Utilizing More Supervision –Utilize syntactic parses using a tree kernel –Utilize Semantically Augmented Parse Trees [Ge & Mooney, 2005] –Not much improvement in performance Meaning representation macros to transform MRG Ensembles of semantic parsers –Simple majority ensemble of KRISP, WASP and SCISSOR achieves the best overall performance
113 Outline KRISP: A Supervised Learning System Utilizing Weaker Forms of Supervision –Semi-supervision –Ambiguous supervision Transforming meaning representation grammar Directions for Future Work Conclusions
114 Directions for Future Work Improve KRISP’s semantic parsing framework –Do not make the independence assumption –Allow words to overlap –Will increase the complexity of the system Example derivation: Which rivers run through the states bordering Texas? ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), TRAVERSE → traverse, STATE → NEXT_TO(STATE), NEXT_TO → next_to, STATE → STATEID, STATEID → ‘texas’
115 Directions for Future Work Improve KRISP’s semantic parsing framework –Do not make the independence assumption –Allow words to overlap –Will increase the complexity of the system Better kernels: –Dependency tree kernels –Use word categories or a domain-specific word ontology –Noise-resistant kernel Learn from perceptual contexts –Combine with a vision-based system to map real-world perceptual contexts into symbolic MRs
116 Directions for Future Work contd. Structured Information Extraction Most IE work has focused on extracting single entities or binary relations, e.g. “person”, “company”, “employee-of” Structured IE, like extracting complex n-ary relations [McDonald et al., 2005], is more useful in automatically building databases and text mining The level of semantic analysis required is intermediate between normal IE and semantic parsing
117 Directions for Future Work contd. Complex relation (person, job, company) NL sentence: John Smith is the CEO of Inc. Corp. MR: (John Smith, CEO, Inc. Corp.) The complex relation decomposes into binary relations (person, job) and (job, company) that combine into (person, job, company) KRISP should be applicable to extracting complex relations by treating a complex relation like a higher-level production composed of lower-level productions
118 Directions for Future Work contd. Broaden the applicability of semantic parsers to open domains Difficult to construct one MRL for an open domain But a suitable MRL may be constructed by narrowing down the meaning of open-domain natural language based on the actions expected from the computer Will need help from open-domain techniques such as word-sense disambiguation, anaphora resolution, etc.
119 Conclusions A new string-kernel-based approach for learning semantic parsers that is more robust to noisy input Extension for semi-supervised semantic parsing to utilize unannotated training data Learns from a more general and weaker form of supervision: ambiguous supervision Transforms the meaning representation grammar to improve semantic parsing In the future, the scope and applicability of semantic parsing can be broadened
120 Thank You! Questions??