Machine Learning Group Department of Computer Sciences University of Texas at Austin Learning Language Semantics from Ambiguous Supervision Rohit J. Kate.


Machine Learning Group Department of Computer Sciences University of Texas at Austin Learning Language Semantics from Ambiguous Supervision Rohit J. Kate Raymond J. Mooney

2 Semantic Parsing Involves learning language semantics to transform natural language (NL) sentences into complete, computer-executable meaning representations (MRs) for some application Geoquery: an example database query application Which rivers run through the states bordering Texas? Query answer(traverse(next_to(stateid(‘texas’)))) Semantic Parsing Answer: Arkansas, Canadian, Cimarron, Gila, Mississippi, Rio Grande …

3 Learning for Semantic Parsing Learning for semantic parsing consists of inducing a semantic parser from training data that can map novel sentences into their meaning representations Many accurate learning systems for semantic parsing have recently been developed: [Ge & Mooney, 2005], [Zettlemoyer & Collins, 2005], [Wong & Mooney, 2006], [Kate & Mooney, 2006], [Nguyen, Shimazu & Phan, 2006]

4 Unambiguous Supervision for Learning Semantic Parsers The training data for semantic parsing consists of hundreds of natural language sentences unambiguously paired with their meaning representations

5 Unambiguous Supervision for Learning Semantic Parsers The training data for semantic parsing consists of hundreds of natural language sentences unambiguously paired with their meaning representations Which rivers run through the states bordering Texas? answer(traverse(next_to(stateid(‘texas’)))) What is the lowest point of the state with the largest area? answer(lowest(place(loc(largest_one(area(state(all))))))) What is the largest city in states that border California? answer(largest(city(loc(next_to(stateid( 'california')))))) ……

6 Shortcomings of Unambiguous Supervision It requires considerable human effort to annotate each sentence with its correct meaning representation Does not model the type of supervision children receive when they are learning a language –Children are not taught meanings of individual sentences –They learn to identify the correct meaning of a sentence from several meanings possible in their perceptual context

7 ??? “Mary is on the phone”

8 Ambiguous Supervision for Learning Semantic Parsers A computer system simultaneously exposed to perceptual contexts and natural language utterances should be able to learn the underlying language semantics We consider ambiguous training data of sentences associated with multiple potential meaning representations –Siskind (1996) uses this type of “referentially uncertain” training data to learn meanings of words Capturing meaning representations from perceptual contexts is a difficult unsolved problem –Our system works directly with symbolic meaning representations

9 “Mary is on the phone” ???

10 “Mary is on the phone” ???

11 Ironing(Mommy, Shirt) “Mary is on the phone” ???

12 Ironing(Mommy, Shirt) Working(Sister, Computer) “Mary is on the phone” ???

13 Ironing(Mommy, Shirt) Working(Sister, Computer) Carrying(Daddy, Bag) “Mary is on the phone” ???

14 Ironing(Mommy, Shirt) Working(Sister, Computer) Carrying(Daddy, Bag) Talking(Mary, Phone) Sitting(Mary, Chair) “Mary is on the phone” Ambiguous Training Example ???

15 Ironing(Mommy, Shirt) Working(Sister, Computer) Talking(Mary, Phone) Sitting(Mary, Chair) “Mommy is ironing shirt” Next Ambiguous Training Example ???

16 Ambiguous Supervision for Learning Semantic Parsers contd. Our model of ambiguous supervision corresponds to the type of data that would be gathered from a temporal sequence of perceptual contexts with occasional language commentary We assume each sentence has exactly one meaning in a perceptual context Each meaning is associated with at most one sentence in a perceptual context

17 Sample Ambiguous Corpus Daisy gave the clock to the mouse. Mommy saw that Mary gave the hammer to the dog. The dog broke the box. John gave the bag to the mouse. The dog threw the ball. ate(mouse, orange) gave(daisy, clock, mouse) ate(dog, apple) saw(mother, gave(mary, dog, hammer)) broke(dog, box) gave(woman, toy, mouse) gave(john, bag, mouse) threw(dog, ball) runs(dog) saw(john, walks(man, dog)) Forms a bipartite graph

18 Rest of the Talk Brief background on KRISP, the semantic parsing learning system for unambiguous supervision KRISPER: Extended system to handle ambiguous supervision Corpus construction Experiments

19 KRISP: Semantic Parser Learner for Unambiguous Supervision KRISP: Kernel-based Robust Interpretation for Semantic Parsing [Kate & Mooney 2006] Takes NL sentences unambiguously paired with their MRs as training data Treats the formal MR language grammar’s productions as semantic concepts Trains an SVM classifier for each production with a string subsequence kernel [Lodhi et al. 2002]
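To make the kernel concrete, here is a minimal sketch of a gap-weighted string subsequence kernel in the spirit of Lodhi et al. (2002). The function name, the decay parameter default, and the choice to count common subsequences of all lengths up to n are assumptions made for this sketch, not details taken from KRISP itself.

```python
def subseq_kernel(s, t, n, lam=0.5):
    """Gap-weighted subsequence kernel (after Lodhi et al., 2002):
    counts subsequences of length up to n common to s and t,
    penalizing gaps by a decay factor lam per spanned position.
    Works on any sequences, e.g. strings or lists of words."""
    ls, lt = len(s), len(t)
    # Kp[m][i][j]: auxiliary DP value for length-m subsequences of
    # the prefixes s[:i] and t[:j]; Kp[0] is identically 1.
    Kp = [[[1.0 if m == 0 else 0.0 for _ in range(lt + 1)]
           for _ in range(ls + 1)] for m in range(n + 1)]
    K = 0.0
    for m in range(1, n + 1):
        for i in range(1, ls + 1):
            Kpp = 0.0  # running value of the second auxiliary table
            for j in range(1, lt + 1):
                if s[i - 1] == t[j - 1]:
                    Kpp = lam * (Kpp + lam * Kp[m - 1][i - 1][j - 1])
                    # a matching pair extends length-(m-1) subsequences
                    K += lam * lam * Kp[m - 1][i - 1][j - 1]
                else:
                    Kpp = lam * Kpp
                Kp[m][i][j] = lam * Kp[m][i - 1][j] + Kpp
    return K
```

With lam=1.0 and identical inputs the kernel simply counts common subsequences: for "ab" vs "ab" with n=2 that is 'a', 'b', and 'ab', giving 3.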

20 Meaning Representation Language MR: answer(traverse(next_to(stateid(‘texas’)))) Parse tree of MR (shown as a figure on the slide) Productions: ANSWER → answer(RIVER) RIVER → TRAVERSE(STATE) STATE → NEXT_TO(STATE) STATE → STATEID TRAVERSE → traverse NEXT_TO → next_to STATEID → ‘texas’

21 Semantic Parsing by KRISP SVM classifier for each production gives the probability that a substring represents the semantic concept of the production Which rivers run through the states bordering Texas? NEXT_TO → next_to (probability 0.95)

22 Semantic Parsing by KRISP SVM classifier for each production gives the probability that a substring represents the semantic concept of the production Which rivers run through the states bordering Texas? TRAVERSE → traverse

23 Semantic Parsing by KRISP Semantic parsing is done by finding the most probable derivation of the sentence [Kate & Mooney 2006] Which rivers run through the states bordering Texas? ANSWER → answer(RIVER) RIVER → TRAVERSE(STATE) TRAVERSE → traverse STATE → NEXT_TO(STATE) NEXT_TO → next_to STATE → STATEID STATEID → ‘texas’ Probability of the derivation is the product of the probabilities at the nodes.
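The "product of the probabilities at the nodes" rule can be sketched directly. `DerivationNode` here is a hypothetical structure invented for this example, not KRISP's actual internal representation.

```python
from dataclasses import dataclass, field

@dataclass
class DerivationNode:
    production: str      # e.g. "STATE -> NEXT_TO(STATE)"
    prob: float          # classifier's probability for the covered substring
    children: list = field(default_factory=list)

def derivation_prob(node):
    """Probability of a derivation = product of the probabilities
    at all of its nodes."""
    p = node.prob
    for child in node.children:
        p *= derivation_prob(child)
    return p
```

For example, a two-node derivation with probabilities 0.8 and 0.95 has probability 0.8 × 0.95 = 0.76.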

24 Semantic Parsing by KRISP Given a sentence and a meaning representation, KRISP can also find the probability that it is the correct meaning representation for the sentence

25 KRISPER: KRISP with EM-like Retraining Extension of KRISP that learns from ambiguous supervision Uses an iterative EM-like method to gradually converge on a correct meaning for each sentence

26 KRISPER’s Training Algorithm Daisy gave the clock to the mouse. Mommy saw that Mary gave the hammer to the dog. The dog broke the box. John gave the bag to the mouse. The dog threw the ball. ate(mouse, orange) gave(daisy, clock, mouse) ate(dog, apple) saw(mother, gave(mary, dog, hammer)) broke(dog, box) gave(woman, toy, mouse) gave(john, bag, mouse) threw(dog, ball) runs(dog) saw(john, walks(man, dog)) 1. Assume every possible meaning for a sentence is correct

27 KRISPER’s Training Algorithm Daisy gave the clock to the mouse. Mommy saw that Mary gave the hammer to the dog. The dog broke the box. John gave the bag to the mouse. The dog threw the ball. ate(mouse, orange) gave(daisy, clock, mouse) ate(dog, apple) saw(mother, gave(mary, dog, hammer)) broke(dog, box) gave(woman, toy, mouse) gave(john, bag, mouse) threw(dog, ball) runs(dog) saw(john, walks(man, dog)) 1. Assume every possible meaning for a sentence is correct

28 KRISPER’s Training Algorithm contd. Daisy gave the clock to the mouse. Mommy saw that Mary gave the hammer to the dog. The dog broke the box. John gave the bag to the mouse. The dog threw the ball. ate(mouse, orange) gave(daisy, clock, mouse) ate(dog, apple) saw(mother, gave(mary, dog, hammer)) broke(dog, box) gave(woman, toy, mouse) gave(john, bag, mouse) threw(dog, ball) runs(dog) saw(john, walks(man, dog)) 2. Resulting NL-MR pairs are weighted and given to KRISP (the figure shows weights 1/2, 1/4, 1/5, 1/3)
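The weighting in step 2 can be sketched as follows; the corpus representation as (sentence, candidate-MR list) tuples is a simplification assumed for this example.

```python
def initial_weighted_pairs(corpus):
    """First KRISPER iteration: every candidate MR of a sentence is
    taken as correct, and each resulting NL-MR pair is down-weighted
    by the sentence's number of candidates (the 1/2, 1/4, 1/5, 1/3
    weights shown on the slide)."""
    pairs = []
    for sentence, candidate_mrs in corpus:
        weight = 1.0 / len(candidate_mrs)
        for mr in candidate_mrs:
            pairs.append((sentence, mr, weight))
    return pairs
```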

29 KRISPER’s Training Algorithm contd. Daisy gave the clock to the mouse. Mommy saw that Mary gave the hammer to the dog. The dog broke the box. John gave the bag to the mouse. The dog threw the ball. ate(mouse, orange) gave(daisy, clock, mouse) ate(dog, apple) saw(mother, gave(mary, dog, hammer)) broke(dog, box) gave(woman, toy, mouse) gave(john, bag, mouse) threw(dog, ball) runs(dog) saw(john, walks(man, dog)) 3. Estimate the confidence of each NL-MR pair using the resulting parser

30 KRISPER’s Training Algorithm contd. Daisy gave the clock to the mouse. Mommy saw that Mary gave the hammer to the dog. The dog broke the box. John gave the bag to the mouse. The dog threw the ball. ate(mouse, orange) gave(daisy, clock, mouse) ate(dog, apple) saw(mother, gave(mary, dog, hammer)) broke(dog, box) gave(woman, toy, mouse) gave(john, bag, mouse) threw(dog, ball) runs(dog) saw(john, walks(man, dog)) 3. Estimate the confidence of each NL-MR pair using the resulting parser

31 KRISPER’s Training Algorithm contd. Daisy gave the clock to the mouse. Mommy saw that Mary gave the hammer to the dog. The dog broke the box. John gave the bag to the mouse. The dog threw the ball. ate(mouse, orange) gave(daisy, clock, mouse) ate(dog, apple) saw(mother, gave(mary, dog, hammer)) broke(dog, box) gave(woman, toy, mouse) gave(john, bag, mouse) threw(dog, ball) runs(dog) saw(john, walks(man, dog)) 4. Use maximum weighted matching on a bipartite graph to find the best NL-MR pairs [Munkres, 1957]

32 KRISPER’s Training Algorithm contd. Daisy gave the clock to the mouse. Mommy saw that Mary gave the hammer to the dog. The dog broke the box. John gave the bag to the mouse. The dog threw the ball. ate(mouse, orange) gave(daisy, clock, mouse) ate(dog, apple) saw(mother, gave(mary, dog, hammer)) broke(dog, box) gave(woman, toy, mouse) gave(john, bag, mouse) threw(dog, ball) runs(dog) saw(john, walks(man, dog)) 4. Use maximum weighted matching on a bipartite graph to find the best NL-MR pairs [Munkres, 1957]
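Step 4 can be sketched in a self-contained way. The exhaustive search below is only for illustration on tiny graphs; KRISPER itself would use the polynomial-time Hungarian algorithm [Munkres, 1957].

```python
from itertools import permutations

def best_matching(conf):
    """conf[i][j]: estimated confidence that sentence i has meaning j
    (use 0.0 where the bipartite graph has no edge). Assumes at least
    as many MRs as sentences. Returns the maximum-weight one-to-one
    matching as (sentence_index, mr_index) pairs."""
    n_sent, n_mr = len(conf), len(conf[0])
    best_score, best = float("-inf"), None
    for assignment in permutations(range(n_mr), n_sent):
        score = sum(conf[i][j] for i, j in enumerate(assignment))
        if score > best_score:
            best_score, best = score, list(enumerate(assignment))
    return best
```

Note that the global matching can differ from greedily taking each sentence's top MR: with confidences [[0.9, 0.8], [0.85, 0.1]], greedy pairing gives total weight 0.9 + 0.1 = 1.0, while the maximum-weight matching pairs sentence 0 with MR 1 and sentence 1 with MR 0 for 1.65.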

33 KRISPER’s Training Algorithm contd. Daisy gave the clock to the mouse. Mommy saw that Mary gave the hammer to the dog. The dog broke the box. John gave the bag to the mouse. The dog threw the ball. ate(mouse, orange) gave(daisy, clock, mouse) ate(dog, apple) saw(mother, gave(mary, dog, hammer)) broke(dog, box) gave(woman, toy, mouse) gave(john, bag, mouse) threw(dog, ball) runs(dog) saw(john, walks(man, dog)) 5. Give the best pairs to KRISP in the next iteration, continue until convergence
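Putting steps 1 through 5 together, the retraining loop can be sketched as below. `train` and `confidence` are hypothetical stand-ins for the underlying KRISP learner's interface, and a greedy one-to-one matching replaces the maximum weighted bipartite matching [Munkres, 1957] for brevity.

```python
def krisper_train(corpus, train, confidence, iters=10):
    """EM-like retraining loop over ambiguous data.
    corpus: list of (sentence, [candidate MRs]) pairs."""
    # Step 1: assume every candidate meaning is correct, weighted 1/k
    pairs = [(s, m, 1.0 / len(ms)) for s, ms in corpus for m in ms]
    parser = None
    for _ in range(iters):
        parser = train(pairs)  # Step 2: train KRISP on weighted pairs
        # Step 3: estimate the confidence of each candidate NL-MR pair
        scored = sorted(((confidence(parser, s, m), s, m)
                         for s, ms in corpus for m in ms), reverse=True)
        # Step 4: keep a one-to-one matching of sentences to MRs
        # (greedy here for brevity; KRISPER uses maximum weighted
        # bipartite matching)
        used_s, used_m, pairs = set(), set(), []
        for c, s, m in scored:
            if s not in used_s and m not in used_m:
                used_s.add(s)
                used_m.add(m)
                pairs.append((s, m, 1.0))
        # Step 5: the best pairs feed the next training iteration
    return parser
```

With a toy `confidence` function that prefers each sentence's uppercased form, two iterations suffice to disambiguate a two-sentence corpus.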

34 Corpus Construction To our knowledge, no real-world ambiguous corpus is yet available for semantic parsing We artificially obfuscated the real-world unambiguous corpus by adding extra distracter MRs to each training pair (Ambig-Geoquery) We also created an artificial ambiguous corpus (Ambig-ChildWorld) that more accurately models real-world ambiguities, in which potential candidate MRs are often related

35 Ambig-Geoquery Corpus Start with the unambiguous Geoquery corpus of NL-MR pairs

36 Ambig-Geoquery Corpus Insert 0 to  random MRs from the corpus between each pair

37 Ambig-Geoquery Corpus Form a window of width from 0 to  in either direction for each NL sentence

38 Ambig-Geoquery Corpus Form the ambiguous corpus by pairing each sentence with all the MRs in its window
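The construction on slides 35-38 can be sketched as follows. `alpha` and `beta` stand in for the two corpus parameters, whose actual symbols and values are not legible in this transcript; drawing distracters uniformly with replacement is likewise an assumption of this sketch.

```python
import random

def ambiguate(pairs, alpha=2, beta=2, seed=0):
    """Build an Ambig-Geoquery-style corpus from unambiguous
    (sentence, MR) pairs: insert 0..alpha random distracter MRs
    between consecutive gold MRs, then attach to each sentence a
    window of MRs extending 0..beta positions to either side of
    its gold MR."""
    rng = random.Random(seed)
    all_mrs = [m for _, m in pairs]
    stream, gold_pos = [], []
    for _, m in pairs:
        # distracters drawn from the corpus's own MRs
        stream.extend(rng.choice(all_mrs)
                      for _ in range(rng.randint(0, alpha)))
        gold_pos.append(len(stream))
        stream.append(m)
    corpus = []
    for (s, _), pos in zip(pairs, gold_pos):
        lo = max(0, pos - rng.randint(0, beta))
        hi = min(len(stream), pos + rng.randint(0, beta) + 1)
        corpus.append((s, stream[lo:hi]))  # gold MR always inside
    return corpus
```

By construction each sentence's candidate window always contains its gold MR, so a perfect learner could still recover the original pairing.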

39 Ambig-ChildWorld Corpus Although the Ambig-Geoquery corpus uses real-world NL-MR pairs, it does not model the relatedness between potential MRs for each sentence that is common in perceptual contexts Constructed a synchronous grammar [Aho & Ullman, 1972] to simultaneously generate artificial NL-MR pairs Uses 15 verbs and 37 nouns (people, animals, things); MRs are in predicate logic without quantifiers

40 Ambig-ChildWorld Corpus contd. Different perceptual contexts were modeled by choosing subsets of productions of the synchronous grammar This leads to subsets of verbs and nouns (e.g. only Mommy, Daddy, Mary), causing more relatedness among potential MRs For each such perceptual context, data was generated in a way similar to the Ambig-Geoquery corpus

41 Ambiguity in Corpora Three levels of ambiguity were created by varying the parameters  and 

MRs per NL:  1    2    3    4    5    6    7
Level 1:    25%  50%  25%
Level 2:    11%  22%  34%  22%  11%
Level 3:     6%  13%  19%  26%  18%  12%   6%

42 Methodology Performed 10-fold cross validation Metric: best F-measure across the precision-recall curve obtained by varying the output confidence threshold
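The metric can be sketched as a sweep over confidence thresholds; the input format below (one scored output per answered sentence) is an assumption made for illustration.

```python
def best_f1(outputs, n_total):
    """outputs: (confidence, is_correct) for each sentence the parser
    produced an MR for; n_total: total number of test sentences.
    Sweeps the confidence threshold down the ranked outputs and
    returns the best F-measure on the resulting precision-recall
    curve."""
    best, correct = 0.0, 0
    # thresholding at each output's confidence = taking the top k
    for k, (conf, ok) in enumerate(sorted(outputs, reverse=True), start=1):
        correct += ok  # ok is a bool, counted as 0/1
        precision, recall = correct / k, correct / n_total
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

For example, with ranked outputs correct, correct, wrong, correct over 4 test sentences, the sweep yields F-measures of 0.4, 0.67, 0.57, and 0.75, so the best F-measure is 0.75 (at the threshold that admits all outputs).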

43 Results on Ambig-Geoquery Corpus

44 Results on Ambig-ChildWorld Corpus

45 Future Work Construct a real-world ambiguous corpus and test this approach Combine this system with a vision-based system that extracts MRs from perceptual contexts

46 Conclusions We presented the problem of learning language semantics from ambiguous supervision This form of supervision is more representative of the natural training environment for a language learning system We presented an approach that learns from ambiguous supervision by iteratively re-training a system for unambiguous supervision Experimental results on two artificial corpora showed that this approach is able to cope with ambiguity and learn accurate semantic parsers

47 Thank you! Questions??