Machine Learning Group Department of Computer Sciences University of Texas at Austin Learning for Semantic Parsing with Kernels under Various Forms of.


Machine Learning Group Department of Computer Sciences University of Texas at Austin Learning for Semantic Parsing with Kernels under Various Forms of Supervision Rohit J. Kate Ph.D. Final Defense Supervisor: Raymond J. Mooney

2 Semantic Parsing Semantic Parsing: Transforming natural language (NL) sentences into computer executable complete meaning representations (MRs) for domain-specific applications Requires deeper semantic analysis than other semantic tasks like semantic role labeling, word sense disambiguation, information extraction Example application domains –CLang: Robocup Coach Language –Geoquery: A Database Query Application

3 CLang: RoboCup Coach Language In the RoboCup Coach competition, teams compete to coach simulated players. The coaching instructions are given in a formal language called CLang [Chen et al. 2003]. Example (simulated soccer field): "If the ball is in our goal area then player 1 should intercept it." → Semantic Parsing → CLang: (bpos (goal-area our) (do our {1} intercept))

4 Geoquery: A Database Query Application Query application for a U.S. geography database containing about 800 facts [Zelle & Mooney, 1996]. Example: "Which rivers run through the states bordering Texas?" → Semantic Parsing → Query: answer(traverse(next_to(stateid(‘texas’)))) → Answer: Arkansas, Canadian, Cimarron, Gila, Mississippi, Rio Grande …

5 Engineering Motivation for Semantic Parsing Most computational language-learning research analyzes open-domain text but the analysis is shallow Realistic semantic parsing currently entails domain dependence Applications of domain-dependent semantic parsing –Natural language interfaces to computing systems –Communication with robots in natural language –Personalized software assistants –Question-answering systems Machine Learning makes developing semantic parsers for specific applications more tractable

6 Cognitive Science Motivation for Semantic Parsing Most natural-language learning methods require supervised training data that is not available to a child –No POS-tagged or treebank data Assuming a child can infer the likely meaning of an utterance from context, NL - MR pairs are more cognitively plausible training data

7 Thesis Contributions A new framework for learning for semantic parsing based on kernel-based string classification –Requires no feature engineering –Does not use any hard-matching rules or any grammar rules for natural language which makes it robust First semi-supervised learning system for semantic parsing Considers learning for semantic parsing under cognitively motivated weaker and more general form of ambiguous supervision Introduces transformations for meaning representation grammars to make them conform better with natural language semantics

8 Outline KRISP: A Semantic Parsing Learning System Utilizing Weaker Forms of Supervision –Semi-supervision –Ambiguous supervision Transforming meaning representation grammar Directions for Future Work Conclusions

9 KRISP: Kernel-based Robust Interpretation for Semantic Parsing [Kate & Mooney, 2006] Learns semantic parser from NL sentences paired with their respective MRs given meaning representation language (MRL) grammar Productions of MRL are treated like semantic concepts SVM classifier with string subsequence kernel is trained for each production to identify if an NL substring represents the semantic concept These classifiers are used to compositionally build MRs of the sentences

10 Overview of KRISP Train string-kernel-based SVM classifiers Semantic Parser Collect positive and negative examples MRL Grammar NL sentences with MRs Novel NL sentences Best MRs Best MRs (correct and incorrect) Training Testing

12 Meaning Representation Language
MR: answer(traverse(next_to(stateid(‘texas’))))
Parse tree of MR: [figure: tree rooted at ANSWER, with nodes RIVER, TRAVERSE, STATE, NEXT_TO, STATE, STATEID]
Productions:
ANSWER → answer(RIVER)
RIVER → TRAVERSE(STATE)
TRAVERSE → traverse
STATE → NEXT_TO(STATE)
NEXT_TO → next_to
STATE → STATEID
STATEID → ‘texas’

13 Semantic Parsing by KRISP SVM classifier for each production gives the probability that a substring represents the semantic concept of the production. Which rivers run through the states bordering Texas? NEXT_TO → next_to 0.95

14 Semantic Parsing by KRISP SVM classifier for each production gives the probability that a substring represents the semantic concept of the production. Which rivers run through the states bordering Texas? TRAVERSE → traverse

15 Semantic Parsing by KRISP Semantic parsing is done by finding the most probable derivation of the sentence [Kate & Mooney 2006]
Which rivers run through the states bordering Texas?
ANSWER → answer(RIVER)
RIVER → TRAVERSE(STATE)
TRAVERSE → traverse
STATE → NEXT_TO(STATE)
NEXT_TO → next_to
STATE → STATEID
STATEID → ‘texas’
Probability of the derivation is the product of the probabilities at the nodes.
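The product rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name and every probability value below are made up for the example, and the slides show no per-node numbers for this derivation.

```python
from math import prod

def derivation_probability(nodes):
    """Multiply the classifier probabilities attached to each derivation node."""
    return prod(probability for _production, probability in nodes)

# Hypothetical (production, probability) pairs for one derivation of
# "Which rivers run through the states bordering Texas?"
nodes = [
    ("ANSWER -> answer(RIVER)", 0.98),
    ("RIVER -> TRAVERSE(STATE)", 0.90),
    ("TRAVERSE -> traverse", 0.95),
    ("STATE -> NEXT_TO(STATE)", 0.89),
    ("NEXT_TO -> next_to", 0.95),
    ("STATE -> STATEID", 0.99),
    ("STATEID -> 'texas'", 0.99),
]

print(derivation_probability(nodes))
```

With long derivations, summing log-probabilities instead of multiplying raw values is a common way to avoid numerical underflow.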

16 Overview of KRISP Train string-kernel-based SVM classifiers Semantic Parser Collect positive and negative examples MRL Grammar NL sentences with MRs Novel NL sentences Best MRs Training Testing Classification probabilities Best semantic derivations (correct and incorrect)

17 KRISP’s Training Algorithm Takes NL sentences paired with their respective MRs as input Obtains MR parses Induces the semantic parser and refines it in iterations In the first iteration, for every production: –Call those sentences positives whose MR parses use that production –Call the remaining sentences negatives

18 KRISP’s Training Algorithm contd. First Iteration
Production: STATE → NEXT_TO(STATE)
Positives:
which rivers run through the states bordering texas?
what is the most populated state bordering oklahoma ?
what is the largest city in states that border california ?
…
Negatives:
what state has the highest population ?
what states does the delaware river run through ?
which states have cities named austin ?
what is the lowest point of the state with the largest area ?
…
Positives and negatives are used to train the string-kernel-based SVM classifier
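The first-iteration split can be sketched as follows. The corpus layout (each sentence paired with the set of productions used in its MR parse) and the function name are assumptions made for this illustration, not KRISP's actual data structures.

```python
def first_iteration_examples(corpus, production):
    """Split whole sentences into positives and negatives for one production.

    corpus: list of (sentence, productions_used) pairs, where productions_used
    is the set of productions appearing in the parse of the sentence's MR.
    """
    positives, negatives = [], []
    for sentence, productions_used in corpus:
        if production in productions_used:
            positives.append(sentence)
        else:
            negatives.append(sentence)
    return positives, negatives

# Hypothetical, abbreviated production sets for two sentences from the slide.
corpus = [
    ("which rivers run through the states bordering texas?",
     {"ANSWER -> answer(RIVER)", "STATE -> NEXT_TO(STATE)"}),
    ("what state has the highest population ?",
     {"ANSWER -> answer(STATE)"}),
]

pos, neg = first_iteration_examples(corpus, "STATE -> NEXT_TO(STATE)")
print(pos)
print(neg)
```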

19 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” K(s,t) = ?

20 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = states K(s,t) = 1+?

21 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = next K(s,t) = 2+?

22 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = to K(s,t) = 3+?

23 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = states next K(s,t) = 4+?

24 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” K(s,t) = 7
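The count of 7 can be checked with a brute-force sketch that enumerates distinct word subsequences of both strings and intersects them. This follows the simplified definition on the slide only; the kernel of Lodhi et al. additionally weights subsequences by their gaps, and the enumeration here is exponential in string length, so it is suitable only for tiny examples. Function names are illustrative.

```python
from itertools import combinations

def subsequences(words):
    """All distinct non-empty word subsequences (order kept, gaps allowed)."""
    return {
        tuple(words[i] for i in indices)
        for length in range(1, len(words) + 1)
        for indices in combinations(range(len(words)), length)
    }

def common_subsequence_count(s, t):
    """Number of distinct word subsequences shared by strings s and t."""
    return len(subsequences(s.split()) & subsequences(t.split()))

s = "states that are next to"
t = "the states next to"
print(common_subsequence_count(s, t))  # 7
```

The seven shared subsequences are: states, next, to, states next, states to, next to, and states next to.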

25 String Subsequence Kernel contd. The kernel is normalized to remove any bias due to different string lengths. Lodhi et al. [2002] give an O(n|s||t|) algorithm for computing the string subsequence kernel. Used for Text Categorization [Lodhi et al., 2002] and Information Extraction [Bunescu & Mooney, 2005]
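A small sketch of the normalization step, using the usual form K(s,t) / sqrt(K(s,s) K(t,t)) on top of the simplified common-subsequence count. The brute-force kernel below is an assumption for illustration and is not the O(n|s||t|) dynamic program cited on the slide.

```python
import math
from itertools import combinations

def subsequence_kernel(s, t):
    """Simplified kernel: number of distinct common word subsequences."""
    def subsequences(words):
        return {
            tuple(words[i] for i in idx)
            for n in range(1, len(words) + 1)
            for idx in combinations(range(len(words)), n)
        }
    return len(subsequences(s.split()) & subsequences(t.split()))

def normalized_kernel(s, t):
    """Scale K(s, t) so that every string has similarity 1 with itself."""
    return subsequence_kernel(s, t) / math.sqrt(
        subsequence_kernel(s, s) * subsequence_kernel(t, t)
    )

s = "states that are next to"
t = "the states next to"
print(normalized_kernel(s, t))
```

For the slide's example, K(s,s) = 31 and K(t,t) = 15, so the normalized value is 7 / sqrt(465), roughly 0.32.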

26 String Subsequence Kernel contd. The examples are implicitly mapped to the feature space of all subsequences and the kernel computes the dot products
Production: STATE → NEXT_TO(STATE)
Example substrings: states bordering; states that border; states that share border; states with area larger than; states through which; state with the capital of; the states next to

27 Support Vector Machines SVMs find a separating hyperplane such that the margin is maximized. Probability estimate of an example belonging to a class can be obtained using its distance from the hyperplane [Platt, 1999]
Production: STATE → NEXT_TO(STATE)
Examples on either side of the separating hyperplane: the states next to; states that are next to; states bordering; states that border; states that share border; state with the capital of; states with area larger than; states through which
Probability estimate shown in the figure: 0.97
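Platt's method turns the signed SVM decision value into a probability by passing it through a fitted sigmoid. The sketch below shows only the mapping itself; the parameter values `a` and `b` are placeholders chosen for illustration, whereas in practice they are fit on held-out decision values.

```python
import math

def platt_probability(decision_value, a=-2.0, b=0.0):
    """Map an SVM decision value f to P(positive | f) = 1 / (1 + exp(a*f + b)).

    a and b are hypothetical here; a negative a makes examples farther on the
    positive side of the hyperplane receive probabilities closer to 1.
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))

for f in (-1.0, 0.0, 0.5, 2.0):
    print(f, round(platt_probability(f), 3))
```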

28 KRISP’s Training Algorithm contd. First Iteration
Production: STATE → NEXT_TO(STATE)
Positives:
which rivers run through the states bordering texas?
what is the most populated state bordering oklahoma ?
what is the largest city in states that border california ?
…
Negatives:
what state has the highest population ?
what states does the delaware river run through ?
which states have cities named austin ?
what is the lowest point of the state with the largest area ?
…
String-kernel-based SVM classifier → Classification probabilities

29 Overview of KRISP Train string-kernel-based SVM classifiers Semantic Parser Collect positive and negative examples MRL Grammar NL sentences with MRs Novel NL sentences Best MRs Best semantic derivations (correct and incorrect) Training Testing Classification probabilities

31 KRISP’s Training Algorithm contd. Using these classifiers, obtain the ω best semantic derivations of each training sentence Some of these derivations will give the correct MR, called correct derivations, some will give incorrect MRs, called incorrect derivations For the next iteration, collect positives from most probable correct derivation Collect negatives from incorrect derivations with higher probability than the most probable correct derivation
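The selection rule on this slide can be sketched at the level of whole derivations. The `Derivation` record and the sample values are hypothetical; how individual nodes of the selected derivations become training examples is not specified on this slide and is left out.

```python
from dataclasses import dataclass

@dataclass
class Derivation:
    probability: float
    mr: str  # meaning representation produced by this derivation

def select_derivations(derivations, correct_mr):
    """Return (most probable correct derivation, higher-scoring incorrect ones)."""
    correct = [d for d in derivations if d.mr == correct_mr]
    if not correct:
        return None, []
    best_correct = max(correct, key=lambda d: d.probability)
    worse_offenders = [
        d for d in derivations
        if d.mr != correct_mr and d.probability > best_correct.probability
    ]
    return best_correct, worse_offenders

# Hypothetical best derivations for one training sentence.
candidates = [
    Derivation(0.40, "answer(traverse(stateid('texas')))"),
    Derivation(0.30, "answer(traverse(next_to(stateid('texas'))))"),
    Derivation(0.10, "answer(next_to(stateid('texas')))"),
]
best, offenders = select_derivations(
    candidates, "answer(traverse(next_to(stateid('texas'))))"
)
print(best)
print(offenders)
```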

32 KRISP’s Training Algorithm contd. Most probable correct derivation:
Which rivers run through the states bordering Texas?
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..4])
(STATE → NEXT_TO(STATE), [5..9])
(NEXT_TO → next_to, [5..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])

33 KRISP’s Training Algorithm contd. Most probable correct derivation: Collect positive examples
Which rivers run through the states bordering Texas?
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..4])
(STATE → NEXT_TO(STATE), [5..9])
(NEXT_TO → next_to, [5..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])

35 KRISP’s Training Algorithm contd. Incorrect derivation with probability greater than the most probable correct derivation:
Which rivers run through the states bordering Texas?
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])
Incorrect MR: answer(traverse(stateid(‘texas’)))

36 KRISP’s Training Algorithm contd. Incorrect derivation with probability greater than the most probable correct derivation: Collect negative examples
Which rivers run through the states bordering Texas?
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])
Incorrect MR: answer(traverse(stateid(‘texas’)))

37 KRISP’s Training Algorithm contd. Next Iteration: more refined positive and negative examples
Production: STATE → NEXT_TO(STATE)
Positives:
the states bordering texas?
state bordering oklahoma ?
states that border california ?
states which share border
next to state of iowa
…
Negatives:
what state has the highest population ?
what states does the delaware river run through ?
which states have cities named austin ?
what is the lowest point of the state with the largest area ?
which rivers run through states bordering
…
String-kernel-based SVM classifier → Better classification probabilities

38 Overview of KRISP Train string-kernel-based SVM classifiers Semantic Parser Collect positive and negative examples MRL Grammar NL sentences with MRs Novel NL sentences Best MRs Best semantic derivations (correct and incorrect) Training Testing Classification probabilities

39 Experimental Corpora CLang [Kate, Wong & Mooney, 2005] –300 randomly selected pieces of coaching advice from the log files of the 2003 RoboCup Coach Competition –22.52 words on average in NL sentences –13.42 tokens on average in MRs Geoquery [Tang & Mooney, 2001] –880 queries for the given U.S. geography database –7.48 words on average in NL sentences –6.47 tokens on average in MRs

40 Experimental Methodology Evaluated using standard 10-fold cross validation Correctness –CLang: output exactly matches the correct representation –Geoquery: the resulting query retrieves the same answer as the correct representation Metrics

41 Experimental Methodology contd. Compared Systems: –CHILL [Tang & Mooney, 2001]: Inductive Logic Programming based semantic parser –SCISSOR [Ge & Mooney, 2005]: learns an integrated syntactic-semantic parser, needs extra annotations –WASP [Wong & Mooney, 2006]: uses statistical machine translation techniques –Zettlemoyer & Collins (2007): Combinatory Categorial Grammar (CCG) based semantic parser; different experimental setup (600 training, 280 testing examples); requires an initial hand-built lexicon

42 Experimental Methodology contd. KRISP gives probabilities for its semantic derivation which are taken as confidences of the MRs We plot precision-recall curves by first sorting the best MR for each sentence by confidences and then finding precision for every recall value WASP and SCISSOR also output confidences so we show their precision-recall curves Results of other systems shown as points on precision-recall graphs
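The curve construction described on this slide can be sketched as below. The metric definitions are an assumption, since the formulas on the Metrics slide did not survive in this transcript: precision is taken as correct outputs over outputs kept at a confidence cutoff, and recall as correct outputs over all test sentences. The data and function name are illustrative.

```python
def precision_recall_points(outputs, num_sentences):
    """outputs: (confidence, is_correct) for each sentence's best MR.

    Returns (recall, precision) points obtained by keeping only the top-k
    most confident outputs, for k = 1 .. len(outputs).
    """
    ranked = sorted(outputs, key=lambda o: o[0], reverse=True)
    points, correct = [], 0
    for k, (_confidence, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        points.append((correct / num_sentences, correct / k))
    return points

# Hypothetical parser outputs: three sentences parsed out of four in the test set.
outputs = [(0.9, True), (0.4, False), (0.7, True)]
for recall, precision in precision_recall_points(outputs, num_sentences=4):
    print(f"recall={recall:.2f} precision={precision:.2f}")
```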

43 Results on CLang CHILL gives 49.2% precision and 12.67% recall with 160 examples and can’t run beyond that. Figure callout: requires more annotation on the training corpus

44 Results on Geoquery

45 Robustness of KRISP KRISP does not use grammar rules for natural language. String-kernel-based classification softly captures a wide range of natural language expressions → Robust to rephrasing and noise

46 Robustness of KRISP Which rivers run through the states bordering Texas? TRAVERSE → traverse 0.95

47 Robustness of KRISP Which are the rivers that run through the states bordering Texas? TRAVERSE → traverse 0.78

48 Robustness of KRISP Which rivers run though the states bordering Texas? TRAVERSE → traverse 0.68

49 Robustness of KRISP Which rivers through the states bordering Texas? TRAVERSE → traverse 0.65

50 Robustness of KRISP Which rivers ahh.. run through the states bordering Texas? TRAVERSE → traverse 0.81

51 Experiments with Noisy NL Sentences Any application of semantic parser is likely to face noise in the input If the input is coming from a speech recognizer: –Interjections (um’s and ah’s) –Environment noise (door slams, phone rings etc.) –Out-of-domain words, ill-formed utterances etc. We demonstrate robustness of KRISP by introducing simulated speech recognition errors in the corpus

52 Experiments with Noisy NL Sentences contd. Noise was introduced in the NL sentences by: –Adding extra words chosen according to their frequencies in the BNC –Dropping words randomly –Substituting words with phonetically close high-frequency words Four levels of noise were created by increasing the probabilities of the above. Results are shown for the case where only test sentences are corrupted; results are qualitatively similar when both test and train sentences are corrupted. We show best F-measures (harmonic mean of precision and recall)
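The F-measure mentioned here is the harmonic mean of precision and recall, and "best F-measure" can be read as the maximum over the points of a precision-recall curve. A minimal sketch, with illustrative function names and made-up curve points:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def best_f_measure(points):
    """points: iterable of (recall, precision) pairs from a precision-recall curve."""
    return max(f_measure(p, r) for r, p in points)

# Hypothetical curve points, not results from the slides.
curve = [(0.25, 1.0), (0.5, 0.8), (0.6, 0.6)]
print(best_f_measure(curve))
```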

53 Results on Noisy CLang Corpus

54 Outline KRISP: A Supervised Learning System Utilizing Weaker Forms of Supervision –Semi-supervision –Ambiguous supervision Transforming meaning representation grammar Directions for Future Work Conclusions

55 Semi-Supervised Semantic Parsing Building annotated training data is expensive Utilize NL sentences not annotated with their MRs, usually cheaply available KRISP can be turned into a semi-supervised learner if the SVM classifiers are given appropriate unlabeled examples Which substrings should be the unlabeled examples for which productions’ SVMs?

56 SEMISUP-KRISP: Semi-Supervised Semantic Parser Learner [Kate & Mooney, 2007a] First learns a semantic parser from the supervised data using KRISP

57 SEMISUP-KRISP: Semi-Supervised Semantic Parser Learner contd.
Supervised Corpus:
Which rivers run through the states bordering Texas? answer(traverse(next_to(stateid(‘texas’))))
What is the lowest point of the state with the largest area? answer(lowest(place(loc(largest_one(area(state(all)))))))
What is the largest city in states that border California? answer(largest(city(loc(next_to(stateid(‘california’))))))
……
Unsupervised Corpus:
Which states have a city named Springfield?
What is the capital of the most populous state?
How many rivers flow through Mississippi?
How many states does the Mississippi run through?
How high is the highest point in the smallest state?
Which rivers flow through the states that border California?
…….
Diagram labels: KRISP; Collect labeled examples; SVM classifiers; Semantic Parsing

58 SEMISUP-KRISP: Semi-Supervised Semantic Parser Learner First learns a semantic parser from the supervised data using KRISP. Applies the learned parser on the unsupervised NL sentences. Whenever an SVM classifier is called to estimate the probability of a substring, that substring becomes an unlabeled example for that classifier. These substrings are representative of examples that the classifiers will encounter during testing. Which rivers run through the states bordering Texas? NEXT_TO → next_to; TRAVERSE → traverse
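One way to picture the collection step is a thin wrapper that records every substring a production's classifier is asked to score. The wrapper, the pool structure, and the constant scoring function are all hypothetical; they only illustrate the bookkeeping described on the slide.

```python
from collections import defaultdict

def make_recording_classifier(production, score_fn, pool):
    """Wrap score_fn so each scored substring is stored as an unlabeled example."""
    def classify(substring):
        pool[production].append(substring)
        return score_fn(substring)
    return classify

unlabeled_pool = defaultdict(list)

# Placeholder scorer standing in for a trained string-kernel SVM.
next_to = make_recording_classifier(
    "NEXT_TO -> next_to", lambda substring: 0.5, unlabeled_pool
)

# Parsing an unannotated sentence would call the classifier on many substrings.
for substring in ("the states that border", "states that border California"):
    next_to(substring)

print(dict(unlabeled_pool))
```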

59 SVMs with Unlabeled Examples Production: NEXT_TO → next_to
Examples around the separating hyperplane: the states next to; states that are next to; the states bordering; states that border; states that share border; state with the capital of; area larger than; through which

60 SVMs with Unlabeled Examples Using unlabeled test examples during training can help find a better hyperplane [Joachims 1999] Production: NEXT_TO → next_to

61 Transductive SVMs contd. Find a labeling that separates all the examples with maximum margin Finding the exact solution is intractable but approximation algorithms exist [Joachims 1999], [Chen et al. 2003], [Collobert et al. 2006]

62 SEMISUP-KRISP: Semi-Supervised Semantic Parser Learner contd.
Supervised Corpus:
Which rivers run through the states bordering Texas? answer(traverse(next_to(stateid(‘texas’))))
What is the lowest point of the state with the largest area? answer(lowest(place(loc(largest_one(area(state(all)))))))
What is the largest city in states that border California? answer(largest(city(loc(next_to(stateid(‘california’))))))
……
Unsupervised Corpus:
Which states have a city named Springfield?
What is the capital of the most populous state?
How many rivers flow through Mississippi?
How many states does the Mississippi run through?
How high is the highest point in the smallest state?
Which rivers flow through the states that border California?
…….
Diagram labels: Collect labeled examples; Collect unlabeled examples; Semantic Parsing; Transductive SVM classifiers; Learned semantic parser

63 Experiments Compared the performance of SEMISUP-KRISP and KRISP on the Geoquery domain. Corpus contains 250 NL sentences annotated with their correct MRs. Collected 1037 unannotated sentences from our web-based demo. Evaluated by 10-fold cross validation, keeping the unsupervised data the same in each fold. Increased the amount of supervised training data and measured the best F-measure

64 Results

65 Results 25% saving GEOBASE: Hand-built semantic parser [Borland International, 1988]

66 Outline KRISP: A Supervised Learning System Utilizing Weaker Forms of Supervision –Semi-supervision –Ambiguous supervision Transforming meaning representation grammar Directions for Future Work Conclusions

67 Unambiguous Supervision for Learning Semantic Parsers The training data for semantic parsing consists of hundreds of natural language sentences unambiguously paired with their meaning representations

68 Unambiguous Supervision for Learning Semantic Parsers The training data for semantic parsing consists of hundreds of natural language sentences unambiguously paired with their meaning representations Which rivers run through the states bordering Texas? answer(traverse(next_to(stateid(‘texas’)))) What is the lowest point of the state with the largest area? answer(lowest(place(loc(largest_one(area(state(all))))))) What is the largest city in states that border California? answer(largest(city(loc(next_to(stateid( 'california')))))) ……

69 Shortcomings of Unambiguous Supervision It requires considerable human effort to annotate each sentence with its correct meaning representation Does not model the type of supervision children receive when they are learning a language –Children are not taught meanings of individual sentences –They learn to identify the correct meaning of a sentence from several meanings possible in their perceptual context

70 ??? “Mary is on the phone”

71 Ambiguous Supervision for Learning Semantic Parsers A computer system simultaneously exposed to perceptual contexts and natural language utterances should be able to learn the underlying language semantics. We consider ambiguous training data of sentences associated with multiple potential meaning representations –Siskind (1996) uses this type of “referentially uncertain” training data to learn meanings of words. Capturing meaning representations from perceptual contexts is a difficult unsolved problem –Our system directly works with symbolic meaning representations

72 “Mary is on the phone” ???

74 Ironing(Mommy, Shirt) “Mary is on the phone” ???

75 Ironing(Mommy, Shirt) Working(Sister, Computer) “Mary is on the phone” ???

76 Ironing(Mommy, Shirt) Working(Sister, Computer) Carrying(Daddy, Bag) “Mary is on the phone” ???

77 Ironing(Mommy, Shirt) Working(Sister, Computer) Carrying(Daddy, Bag) Talking(Mary, Phone) Sitting(Mary, Chair) “Mary is on the phone” Ambiguous Training Example ???

78 Ironing(Mommy, Shirt) Working(Sister, Computer) Talking(Mary, Phone) Sitting(Mary, Chair) “Mommy is ironing shirt” Next Ambiguous Training Example ???

79 Ambiguous Supervision for Learning Semantic Parsers contd.
Our model of ambiguous supervision corresponds to the type of data that will be gathered from a temporal sequence of perceptual contexts with occasional language commentary
We assume each sentence has exactly one meaning in a perceptual context
Each meaning is associated with at most one sentence in a perceptual context

80 Sample Ambiguous Corpus
Sentences:
  Daisy gave the clock to the mouse.
  Mommy saw that Mary gave the hammer to the dog.
  The dog broke the box.
  John gave the bag to the mouse.
  The dog threw the ball.
Meaning representations:
  ate(mouse, orange)
  gave(daisy, clock, mouse)
  ate(dog, apple)
  saw(mother, gave(mary, dog, hammer))
  broke(dog, box)
  gave(woman, toy, mouse)
  gave(john, bag, mouse)
  threw(dog, ball)
  runs(dog)
  saw(john, walks(man, dog))
Forms a bipartite graph
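One way to hold such a corpus in memory is as a map from each sentence to its candidate MRs, which is the edge list of the bipartite graph. The sketch below is illustrative only: the candidate sets are hypothetical, since the slide's actual edges are not recoverable from the text, and `mr_to_sentences` is a made-up helper name.

```python
from collections import defaultdict

# Hypothetical candidate sets; the real edges of the slide's graph are unknown.
candidates = {
    "Daisy gave the clock to the mouse.": [
        "ate(mouse, orange)",
        "gave(daisy, clock, mouse)",
    ],
    "The dog broke the box.": [
        "gave(daisy, clock, mouse)",
        "broke(dog, box)",
        "gave(woman, toy, mouse)",
    ],
}


def mr_to_sentences(candidates):
    """Invert the sentence -> MR edges to view the graph from the MR side."""
    inverse = defaultdict(list)
    for sentence, mrs in candidates.items():
        for mr in mrs:
            inverse[mr].append(sentence)
    return dict(inverse)


# Flat edge list of the bipartite graph: one (sentence, MR) pair per edge.
edges = [(s, mr) for s, mrs in candidates.items() for mr in mrs]
```

An MR that appears in the candidate sets of two sentences is where the one-to-one assumptions of slide 79 become useful for disambiguation.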

81 KRISPER: KRISP with EM-like Retraining
Extension of KRISP that learns from ambiguous supervision
Uses an iterative EM-like method to gradually converge on a correct meaning for each sentence
Given a sentence and a meaning representation, KRISP can also find the probability that it is the correct meaning representation for the sentence

82 KRISPER’s Training Algorithm
1. Assume every possible meaning for a sentence is correct
[Figure: the sample ambiguous corpus of slide 80, shown as a bipartite graph of sentences and MRs]

84 KRISPER’s Training Algorithm contd.
2. Resulting NL-MR pairs are weighted and given to KRISP
[Figure: the sample ambiguous corpus of slide 80, with the weights 1/2, 1/4, 1/5, 1/3 shown]
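The fractions on this slide are consistent with giving each of a sentence's n candidate MRs the weight 1/n. That reading is an assumption, not something the slide states. A minimal sketch under that assumption, with hypothetical candidate sets and a made-up function name `initial_weights`:

```python
from fractions import Fraction


def initial_weights(candidates):
    """Weight each (sentence, MR) pair by 1 / (number of candidates of the sentence)."""
    return {
        (sentence, mr): Fraction(1, len(mrs))
        for sentence, mrs in candidates.items()
        for mr in mrs
    }


# Hypothetical candidate sets, not the slide's actual edges.
example = {
    "The dog broke the box.": ["broke(dog, box)", "gave(woman, toy, mouse)"],
    "The dog threw the ball.": [
        "threw(dog, ball)",
        "runs(dog)",
        "saw(john, walks(man, dog))",
    ],
}
weights = initial_weights(example)
```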

85 KRISPER’s Training Algorithm contd.
3. Estimate the confidence of each NL-MR pair using the resulting parser
[Figure: the sample ambiguous corpus of slide 80]

87 KRISPER’s Training Algorithm contd.
4. Use maximum weighted matching on a bipartite graph to find the best NL-MR pairs [Munkres, 1957]
[Figure: the sample ambiguous corpus of slide 80]
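To make the matching step concrete, here is a brute-force sketch that tries every one-to-one assignment and keeps the one with the highest total confidence. It is not the Munkres (Hungarian) algorithm cited on the slide, only a small stand-in that gives the same answer on tiny inputs. The function name and the confidence values are hypothetical.

```python
from itertools import permutations


def best_matching(sentences, mrs, confidence):
    """Exhaustively search one-to-one assignments of sentences to MRs.

    `confidence` maps (sentence, mr) to a score; pairs missing from it are
    not candidate edges. Assumes len(sentences) <= len(mrs). Returns the
    highest-scoring list of (sentence, mr) pairs, or [] if no complete
    assignment exists.
    """
    best_score, best_pairs = float("-inf"), []
    for chosen in permutations(mrs, len(sentences)):
        pairs = list(zip(sentences, chosen))
        if any(pair not in confidence for pair in pairs):
            continue  # this assignment uses a non-candidate edge
        score = sum(confidence[pair] for pair in pairs)
        if score > best_score:
            best_score, best_pairs = score, pairs
    return best_pairs


# Hypothetical confidences: s2 has only one candidate, so s1 must give up "a".
confidence = {("s1", "a"): 0.9, ("s1", "b"): 0.8, ("s2", "a"): 0.85}
matching = best_matching(["s1", "s2"], ["a", "b"], confidence)
```

Picking the best MR for each sentence independently would assign "a" to both sentences here, which violates the assumption that each meaning belongs to at most one sentence. The exhaustive search is factorial in cost, so a polynomial-time assignment algorithm is the practical choice for real corpora.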

89 KRISPER’s Training Algorithm contd.
5. Give the best pairs to KRISP in the next iteration, and continue until convergence
[Figure: the sample ambiguous corpus of slide 80]
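The five steps can be sketched as a loop in which the learner is passed in as callables. This is only an outline of the control flow under stated assumptions: `train`, `score` and `select` are hypothetical stand-ins for KRISP training, KRISP's confidence estimate and the matching step, and the weight of 1.0 for selected pairs and the stopping test are guesses, not details taken from the slides.

```python
def em_like_retraining(candidates, train, score, select, max_iters=10):
    """Iteratively retrain on the currently best (sentence, MR) pairs.

    candidates: dict mapping each sentence to its list of candidate MRs.
    train(weighted_pairs) -> model
    score(model, sentence, mr) -> confidence
    select(confidence) -> comparable collection of chosen (sentence, mr) pairs
    """
    # Steps 1-2: treat every candidate as correct, weighted 1/n per sentence.
    weighted = {
        (s, m): 1.0 / len(ms) for s, ms in candidates.items() for m in ms
    }
    chosen = None
    for _ in range(max_iters):
        model = train(weighted)
        # Step 3: confidence of every candidate pair under the new parser.
        confidence = {
            (s, m): score(model, s, m)
            for s, ms in candidates.items()
            for m in ms
        }
        # Step 4: pick the best pairs (maximum weighted matching on the slide).
        new_chosen = select(confidence)
        if new_chosen == chosen:
            break  # the selected pairs stopped changing
        chosen = new_chosen
        # Step 5: retrain on the selected pairs only (weight 1.0 is assumed).
        weighted = {pair: 1.0 for pair in chosen}
    return chosen
```

Confidence is re-estimated over all candidate pairs in every iteration, so a pair dropped early can still be selected later if the retrained parser comes to prefer it.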

90 Ambiguous Corpus Construction
To our knowledge, no real-world ambiguous corpus is yet available for semantic parsing
We artificially obfuscated the real-world unambiguous corpus by adding extra distracter MRs to each training pair (Ambig-Geoquery)
We also created an artificial ambiguous corpus (Ambig-ChildWorld), which more accurately models real-world ambiguities in which potential candidate MRs are often related
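The obfuscation idea can be sketched as follows. How the distracter MRs were actually chosen is not stated on the slide. The sketch assumes they are drawn at random from the other MRs in the same corpus, and `add_distractors` is a hypothetical name.

```python
import random


def add_distractors(pairs, num_distractors, seed=0):
    """Turn unambiguous (sentence, MR) pairs into (sentence, candidate MRs).

    Assumption: distracters are sampled uniformly from the other MRs in the
    corpus. The correct MR is always kept, and the candidates are shuffled
    so that its position carries no information.
    """
    rng = random.Random(seed)
    all_mrs = sorted({mr for _, mr in pairs})
    ambiguous = []
    for sentence, correct in pairs:
        pool = [mr for mr in all_mrs if mr != correct]
        extra = rng.sample(pool, min(num_distractors, len(pool)))
        cands = [correct] + extra
        rng.shuffle(cands)
        ambiguous.append((sentence, cands))
    return ambiguous
```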

91 Ambiguity in Corpora
Three levels of ambiguity were created:
MRs per NL
  Level 1: 25% 50% 25%
  Level 2: 11% 22% 34% 22% 11%
  Level 3: 6% 13% 19% 26% 18% 12% 6%

92 Results on Ambig-Geoquery Corpus

93 Results on Ambig-ChildWorld Corpus

95 Why Transform Meaning Representation Grammar?
Productions of meaning representation grammar (MRG) may not correspond well with NL semantics
  REGION → (rec POINT POINT)
  POINT → (pt NUM NUM)
  NUM → -32
  NUM → -35
  NUM → 0
  NUM → 35
“our midfield” ????
CLang MR expression: (rec (pt ) (pt 0 35) )

96 Why Transform Meaning Representation Grammar?
Productions of meaning representation grammar (MRG) may not correspond well with NL semantics
Which is the longest river in Texas?
Geoquery MR: answer(longest(river(loc_2(stateid('Texas')))))
  ANSWER → answer ( RIVER )
  RIVER → longest ( RIVER )
  RIVER → river ( LOCATIONS )
  LOCATIONS → loc_2 ( STATE )
  STATE → STATEID
  STATEID → stateid ( 'Texas' )
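As a check that these six productions generate the MR above, the following sketch applies them in the listed order, each time rewriting the leftmost occurrence of the production's left-hand side. The list encoding and the name `derive` are illustrative choices, not part of KRISP.

```python
# Each production is (left-hand side, list of right-hand-side symbols).
productions = [
    ("ANSWER", ["answer", "(", "RIVER", ")"]),
    ("RIVER", ["longest", "(", "RIVER", ")"]),
    ("RIVER", ["river", "(", "LOCATIONS", ")"]),
    ("LOCATIONS", ["loc_2", "(", "STATE", ")"]),
    ("STATE", ["STATEID"]),
    ("STATEID", ["stateid", "(", "'Texas'", ")"]),
]


def derive(start, productions):
    """Apply productions in order, rewriting the leftmost matching symbol."""
    symbols = [start]
    for lhs, rhs in productions:
        i = symbols.index(lhs)  # raises ValueError if lhs is not present
        symbols[i:i + 1] = rhs
    return "".join(symbols)


mr = derive("ANSWER", productions)
```

Note that the production STATE → STATEID contributes nothing to the surface string, which is one example of a production with no natural counterpart in the sentence.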

98 Manual Engineering of MRG
Several awkward constructs from the original CLang grammar were manually replaced with NL-compatible MR expressions
MRG for Geoquery was manually constructed for its functional MRL, which was derived from the original Prolog expressions
Requires expertise in MRL and domain knowledge
Automatically transform MRG to improve semantic parsing

99 Transforming Meaning Representation Grammar
Train KRISP using the given MRG and parse the training sentences
Collect “bad” productions which KRISP often uses incorrectly (its output MR parses use them but the correct MR parses do not, or vice versa)
Modify these productions using four Context-Free Grammar transformation operators
The transformed MRG accepts the same MRL as the original MRG
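The "uses them but the correct parse does not, or vice versa" test is a symmetric difference between two sets of productions. A minimal sketch, assuming each parse is summarized as the set of productions it uses; the function names and the `min_count` threshold standing in for "often" are hypothetical.

```python
from collections import Counter


def bad_production_counts(examples):
    """Count, per production, the examples where it is used incorrectly.

    examples: iterable of (predicted_productions, correct_productions).
    A production is counted when it appears in exactly one of the two sets.
    """
    counts = Counter()
    for predicted, correct in examples:
        for production in set(predicted) ^ set(correct):
            counts[production] += 1
    return counts


def frequent(counts, min_count):
    """Productions misused in at least `min_count` examples (assumed cutoff)."""
    return sorted(p for p, n in counts.items() if n >= min_count)
```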

100 Transformation Operators
1. Create non-terminal from a terminal: Introduces a new semantic concept
Bad productions:
  STATE → largest STATE
  CITY → largest CITY
  PLACE → largest PLACE
LARGEST → largest

101 Transformation Operators
1. Create non-terminal from a terminal: Introduces a new semantic concept
  STATE → LARGEST STATE
  CITY → LARGEST CITY
  PLACE → LARGEST PLACE
  LARGEST → largest
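A sketch of this operator on a grammar stored as a list of (left-hand side, right-hand-side tuple) productions. The encoding and the name `create_nonterminal` are illustrative assumptions.

```python
def create_nonterminal(grammar, terminal, new_nt):
    """Replace `terminal` by `new_nt` in every right-hand side and add
    the production new_nt -> terminal, so the accepted language is unchanged."""
    rewritten = [
        (lhs, tuple(new_nt if sym == terminal else sym for sym in rhs))
        for lhs, rhs in grammar
    ]
    rewritten.append((new_nt, (terminal,)))
    return rewritten


before = [
    ("STATE", ("largest", "STATE")),
    ("CITY", ("largest", "CITY")),
    ("PLACE", ("largest", "PLACE")),
]
after = create_nonterminal(before, "largest", "LARGEST")
```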

102 Transformation Operators
2. Merge non-terminals: Generalizes productions
Bad productions:
  STATE → LARGEST STATE
  CITY → LARGEST CITY
  PLACE → LARGEST PLACE
  STATE → SMALLEST STATE
  CITY → SMALLEST CITY
  PLACE → SMALLEST PLACE

103 Transformation Operators
2. Merge non-terminals: Generalizes productions
Bad productions:
  STATE → LARGEST STATE
  CITY → LARGEST CITY
  PLACE → LARGEST PLACE
  STATE → SMALLEST STATE
  CITY → SMALLEST CITY
  PLACE → SMALLEST PLACE
QUALIFIER → LARGEST
QUALIFIER → SMALLEST

104 Transformation Operators
2. Merge non-terminals: Generalizes productions
  STATE → QUALIFIER STATE
  CITY → QUALIFIER CITY
  PLACE → QUALIFIER PLACE
  QUALIFIER → LARGEST
  QUALIFIER → SMALLEST
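The merge can be sketched with the same list-of-productions encoding: occurrences of the merged non-terminals on right-hand sides are replaced by the new one, duplicates collapse, and one production per merged non-terminal keeps the language unchanged. `merge_nonterminals` is a hypothetical name.

```python
def merge_nonterminals(grammar, to_merge, merged):
    """Replace each non-terminal in `to_merge` by `merged` on right-hand
    sides, drop duplicate productions, and add merged -> nt for each one."""
    out = []
    for lhs, rhs in grammar:
        production = (lhs, tuple(merged if sym in to_merge else sym for sym in rhs))
        if production not in out:
            out.append(production)
    for nt in to_merge:
        out.append((merged, (nt,)))
    return out


before = [
    ("STATE", ("LARGEST", "STATE")),
    ("CITY", ("LARGEST", "CITY")),
    ("PLACE", ("LARGEST", "PLACE")),
    ("STATE", ("SMALLEST", "STATE")),
    ("CITY", ("SMALLEST", "CITY")),
    ("PLACE", ("SMALLEST", "PLACE")),
]
after = merge_nonterminals(before, ["LARGEST", "SMALLEST"], "QUALIFIER")
```

Six productions collapse to three, which is the sense in which the operator generalizes: evidence for "largest" and "smallest" contexts is pooled under one non-terminal.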

105 Transformation Operators
3. Combine non-terminals: Combines the concepts
Bad productions:
  CITY → SMALLEST MAJOR CITY
  LAKE → SMALLEST MAJOR LAKE
SMALLEST_MAJOR → SMALLEST MAJOR

106 Transformation Operators
3. Combine non-terminals: Combines the concepts
  CITY → SMALLEST_MAJOR CITY
  LAKE → SMALLEST_MAJOR LAKE
  SMALLEST_MAJOR → SMALLEST MAJOR
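A sketch of combining two adjacent non-terminals into one, again on a list of (left-hand side, right-hand-side tuple) productions. The encoding and the name `combine_nonterminals` are assumptions made for illustration.

```python
def combine_nonterminals(grammar, first, second, combined):
    """Replace each adjacent pair `first second` on a right-hand side by
    `combined`, and add the production combined -> first second."""
    out = []
    for lhs, rhs in grammar:
        new_rhs, i = [], 0
        while i < len(rhs):
            if rhs[i:i + 2] == (first, second):
                new_rhs.append(combined)
                i += 2  # skip both symbols of the pair
            else:
                new_rhs.append(rhs[i])
                i += 1
        out.append((lhs, tuple(new_rhs)))
    out.append((combined, (first, second)))
    return out


before = [
    ("CITY", ("SMALLEST", "MAJOR", "CITY")),
    ("LAKE", ("SMALLEST", "MAJOR", "LAKE")),
]
after = combine_nonterminals(before, "SMALLEST", "MAJOR", "SMALLEST_MAJOR")
```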

107 Transformation Operators
4. Delete production: Eliminates a semantic concept
  NUM → AREA LEFTBR STATE RIGHTBR
  NUM → DENSITY LEFTBR CITY RIGHTBR
Bad productions:
  LEFTBR → (
  RIGHTBR → )

108 Transformation Operators
4. Delete production: Eliminates a semantic concept
  NUM → AREA ( STATE RIGHTBR
  NUM → DENSITY ( CITY RIGHTBR
Bad productions:
  LEFTBR → (
  RIGHTBR → )

109 Transformation Operators
4. Delete production: Eliminates a semantic concept
  NUM → AREA ( STATE )
  NUM → DENSITY ( CITY )
Bad productions:
  LEFTBR → (
  RIGHTBR → )
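Deleting a production amounts to inlining its right-hand side wherever its non-terminal is used. The sketch below assumes the non-terminal has exactly one production, as LEFTBR and RIGHTBR do here; the encoding and the name `delete_production` are illustrative.

```python
def delete_production(grammar, nonterminal):
    """Inline the single production of `nonterminal` into every right-hand
    side that uses it, then drop that production from the grammar."""
    expansions = [rhs for lhs, rhs in grammar if lhs == nonterminal]
    if len(expansions) != 1:
        raise ValueError("sketch only handles non-terminals with one production")
    expansion = expansions[0]
    out = []
    for lhs, rhs in grammar:
        if lhs == nonterminal:
            continue  # the deleted production itself
        new_rhs = []
        for sym in rhs:
            new_rhs.extend(expansion if sym == nonterminal else (sym,))
        out.append((lhs, tuple(new_rhs)))
    return out


before = [
    ("NUM", ("AREA", "LEFTBR", "STATE", "RIGHTBR")),
    ("NUM", ("DENSITY", "LEFTBR", "CITY", "RIGHTBR")),
    ("LEFTBR", ("(",)),
    ("RIGHTBR", (")",)),
]
step1 = delete_production(before, "LEFTBR")   # corresponds to slide 108
after = delete_production(step1, "RIGHTBR")   # corresponds to slide 109
```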

110 MRG Transformation Algorithm
A heuristic search is used to find a good MRG among all possible MRGs
All possible instances of each type of operator are applied, then the training examples are re-parsed and the semantic parser is re-trained
Two iterations were sufficient for convergence of performance

111 Results on Geoquery Using Transformation Operators

112 Rest of the Dissertation
Utilizing More Supervision
  – Utilize syntactic parses using tree-kernel
  – Utilize Semantically Augmented Parse Trees [Ge & Mooney, 2005]
  Not much improvement in performance
Meaning representation macros to transform MRG
Ensembles of semantic parsers
  – Simple majority ensemble of KRISP, WASP and SCISSOR achieves the best overall performance
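A simple majority ensemble can be sketched as a vote over the MRs produced by the individual parsers. The slide does not say how disagreements or missing outputs were handled. The sketch assumes a strict majority is required and returns None otherwise, and `majority_vote` is a hypothetical name.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the MR produced by more than half of the parsers, else None.

    outputs: one entry per parser; None means that parser produced no MR.
    """
    votes = Counter(mr for mr in outputs if mr is not None)
    if not votes:
        return None
    mr, count = votes.most_common(1)[0]
    return mr if count > len(outputs) / 2 else None
```

With three parsers, an MR is returned only when at least two of them agree on it exactly.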

114 Directions for Future Work
Improve KRISP’s semantic parsing framework
  – Do not make independence assumption
  – Allow words to overlap
  Will increase complexity of the system
Which rivers run through the states bordering Texas?
  ANSWER → answer(RIVER)
  RIVER → TRAVERSE(STATE)
  TRAVERSE → traverse
  STATE → NEXT_TO(STATE)
  NEXT_TO → next_to
  STATE → STATEID
  STATEID → 'texas'

115 Directions for Future Work
Improve KRISP’s semantic parsing framework
  – Do not make independence assumption
  – Allow words to overlap
  Will increase complexity of the system
Better kernels:
  – Dependency tree kernels
  – Use word categories or domain-specific word ontology
  – Noise resistant kernel
Learn from perceptual contexts
  – Combine with a vision-based system to map real-world perceptual contexts into symbolic MRs

116 Directions for Future Work contd.
Structured Information Extraction
Most IE work has focused on extracting single entities or binary relations, e.g. “person”, “company”, “employee-of”
Structured IE, such as extracting complex n-ary relations [McDonald et al., 2005], is more useful in automatically building databases and text mining
Level of semantic analysis required is intermediate between normal IE and semantic parsing

117 Directions for Future Work contd.
Complex relation: (person, job, company)
NL sentence: John Smith is the CEO of Inc. Corp.
MR: (John Smith, CEO, Inc. Corp.)
[Figure: the sentence “John Smith is the CEO of Inc. Corp.” labeled with (person, job), (job, company) and (person, job, company)]
KRISP should be applicable to extracting complex relations by treating each one as a higher-level production composed of lower-level productions.

118 Directions for Future Work contd.
Broaden the applicability of semantic parsers to open-domain language
It is difficult to construct one MRL for the open domain
But a suitable MRL may be constructed by narrowing down the meaning of open-domain natural language based on the actions expected from the computer
Will need help from open-domain techniques such as word-sense disambiguation and anaphora resolution

119 Conclusions
A new string-kernel-based approach for learning semantic parsers, more robust to noisy input
Extension for semi-supervised semantic parsing to utilize unannotated training data
Learns from a more general and weaker form of supervision: ambiguous supervision
Transforms meaning representation grammar to improve semantic parsing
In the future, the scope and applicability of semantic parsing can be broadened

120 Thank You! Questions?