Learning to Transform Natural to Formal Languages

Learning to Transform Natural to Formal Languages
Rohit J. Kate, Yuk Wah Wong, Raymond J. Mooney
July 13, 2005

Introduction
Semantic Parsing: transforming natural language sentences into complete, executable formal representations.
Different from Semantic Role Labeling, which involves only shallow semantic analysis.
Two application domains:
CLang: RoboCup Coach Language
GeoQuery: a database query application

CLang: RoboCup Coach Language
In the RoboCup Coach competition, teams compete to coach simulated soccer players.
The coaching instructions are given in a formal language called CLang.
Example (coach instruction to the simulated soccer field, via semantic parsing):
NL: If the ball is in our penalty area, then all our players except player 4 should stay in our half.
CLang: ((bpos (penalty-area our)) (do (player-except our {4}) (pos (half our))))

GeoQuery: A Database Query Application
A query application for a U.S. geography database containing about 800 facts [Zelle & Mooney, 1996].
Example (user question, via semantic parsing):
NL: How many cities are there in the US?
Query: answer(A, count(B, (city(B), loc(B, C), const(C, countryid(USA))), A))

Outline
Semantic Parsing using Transformation Rules
Learning Transformation Rules
Experiments
Conclusions

Semantic Parsing using Transformation Rules
SILT (Semantic Interpretation by Learning Transformations)
Uses pattern-based transformation rules that map natural language phrases to formal language constructs.
Transformation rules are repeatedly applied to the sentence to construct its formal language expression.

Formal Language Grammar
NL: If our player 4 has the ball, our player 4 should shoot.
CLang: ((bowner our {4}) (do our {4} shoot))
Non-terminals: RULE, CONDITION, ACTION, ...
Terminals: bowner, our, 4, ...
Productions:
RULE → CONDITION DIRECTIVE
DIRECTIVE → do TEAM UNUM ACTION
ACTION → shoot
[Figure: the CLang parse tree for the example — RULE dominates CONDITION (bowner our 4) and DIRECTIVE (do TEAM UNUM ACTION, with ACTION → shoot).]

Transformation Rule Representation
A rule has two components: a natural language pattern and an associated formal language template.
Two versions of SILT:
String-based rules: convert the natural language sentence directly to the formal language.
Tree-based rules: convert the sentence's syntactic tree to the formal language.
Example string-based rule:
Pattern: TEAM UNUM has [1] ball ([1] is a word gap allowing at most one intervening word)
Template: CONDITION → (bowner TEAM {UNUM})
Example tree-based rule:
Pattern: [Figure: syntax subtree S → NP(TEAM UNUM) VP(VBZ "has" NP(DT "the" NN "ball"))]
Template: CONDITION → (bowner TEAM {UNUM})
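
To make the word-gap notation concrete, here is a minimal sketch (not from the paper's implementation) reading a gap like [1] as "at most one intervening word", via a hypothetical regex encoding:

```python
import re

# Hypothetical encoding of the pattern "has [1] ball": the word gap
# [1] permits at most one word between "has" and "ball".
pattern = re.compile(r"\bhas\s+(?:\w+\s+)?ball\b")

for s in ["has the ball", "has ball", "has lost the ball"]:
    print(s, "->", bool(pattern.search(s)))
# has the ball -> True
# has ball -> True
# has lost the ball -> False
```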

Example of Semantic Parsing
Transformation rules (pattern ⇒ template):
"our" ⇒ TEAM → our
"player 4" ⇒ UNUM → 4
"shoot" ⇒ ACTION → shoot
"TEAM UNUM has [1] ball" ⇒ CONDITION → (bowner TEAM {UNUM})
"TEAM UNUM should ACTION" ⇒ DIRECTIVE → (do TEAM {UNUM} ACTION)
"If CONDITION, DIRECTIVE." ⇒ RULE → (CONDITION DIRECTIVE)
Derivation (applying the rules step by step):
If our player 4 has the ball, our player 4 should shoot.
If TEAM player 4 has the ball, TEAM player 4 should shoot.   [TEAM: our]
If TEAM UNUM has the ball, TEAM UNUM should shoot.   [UNUM: 4]
If TEAM UNUM has the ball, TEAM UNUM should ACTION.   [ACTION: shoot]
If CONDITION, TEAM UNUM should ACTION.   [CONDITION: (bowner our {4})]
If CONDITION, DIRECTIVE.   [DIRECTIVE: (do our {4} shoot)]
RULE: ((bowner our {4}) (do our {4} shoot))
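
The whole derivation above can be reproduced by a short rule-application loop. The sketch below is an illustrative reimplementation, not the authors' code; the token representation and the fixed rule order are assumptions chosen to mirror the slides:

```python
import re

# Each rule: (pattern tokens, LHS nonterminal, formal-language template).
# Uppercase pattern tokens are nonterminal slots; "[n]" is a word gap
# allowing up to n skipped words. Rules are taken from the slide above.
RULES = [
    (["our"], "TEAM", "our"),
    (["player", "4"], "UNUM", "4"),
    (["shoot"], "ACTION", "shoot"),
    (["TEAM", "UNUM", "has", "[1]", "ball"], "CONDITION",
     "(bowner TEAM {UNUM})"),
    (["TEAM", "UNUM", "should", "ACTION"], "DIRECTIVE",
     "(do TEAM {UNUM} ACTION)"),
    (["If", "CONDITION", ",", "DIRECTIVE", "."], "RULE",
     "(CONDITION DIRECTIVE)"),
]

def match(pat, sent, i):
    """Match pattern `pat` against the (symbol, partial MR) tokens of
    `sent` starting at position i; return (end index, bindings) or None."""
    if not pat:
        return i, {}
    tok = pat[0]
    gap = re.fullmatch(r"\[(\d+)\]", tok)
    if gap:                                    # word gap: skip 0..n words
        for skip in range(int(gap.group(1)) + 1):
            found = match(pat[1:], sent, i + skip)
            if found:
                return found
        return None
    if i >= len(sent) or sent[i][0] != tok:
        return None
    found = match(pat[1:], sent, i + 1)
    if not found:
        return None
    end, binds = found
    if tok.isupper():                          # record the slot's partial MR
        binds = {**binds, tok: sent[i][1]}
    return end, binds

def parse(words):
    sent = [(w, w) for w in words]             # (symbol, partial MR) pairs
    for pat, lhs, template in RULES:           # low-level rules first
        i = 0
        while i < len(sent):                   # apply the rule wherever it fits
            m = match(pat, sent, i)
            if m:
                end, binds = m
                mr = template
                for nt, sub in binds.items():  # fill template slots
                    mr = mr.replace(nt, sub)
                sent[i:end] = [(lhs, mr)]
            i += 1
    # a realistic parser would iterate until no rule applies
    return sent

tokens = "If our player 4 has the ball , our player 4 should shoot .".split()
print(parse(tokens))   # [('RULE', '((bowner our {4}) (do our {4} shoot))')]
```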

Learning Transformation Rules
SILT induces rules from a corpus of NL sentences paired with their formal representations.
Patterns are learned for each production by bottom-up rule learning.
For every production:
Sentences whose formal representations' parses use that production are the positives.
The remaining sentences are the negatives.
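
The labeling step is simple enough to state in code. A minimal sketch, assuming a corpus of (sentence, MR) pairs and a helper `productions_used` (hypothetical name) that returns the grammar productions appearing in an MR's parse:

```python
def split_examples(corpus, production, productions_used):
    """corpus: list of (sentence, mr) pairs;
    productions_used(mr): set of grammar productions in mr's parse."""
    positives, negatives = [], []
    for sentence, mr in corpus:
        if production in productions_used(mr):
            positives.append(sentence)
        else:
            negatives.append(sentence)
    return positives, negatives
```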

Rule Learning for a Production
SILT applies a greedy-covering, bottom-up rule induction method that repeatedly generalizes positives until they start covering negatives.
Production: CONDITION → (bpos REGION)
Positives:
The ball is in REGION , our player 7 is in REGION and no opponent is around our player 7 within 1.5 distance.
If the ball is in REGION and not in REGION then player 3 should intercept the ball.
During normal play if the ball is in the REGION then player 7 , 9 and 11 should dribble the ball to the REGION .
When the play mode is normal and the ball is in the REGION then our player 2 should pass the ball to the REGION .
All players except the goalie should pass the ball to REGION if it is in RP18 .
If the ball is inside rectangle ( -54 , -36 , 0 , 36 ) then player 10 should position itself at REGION with a ball attraction of REGION .
Player 2 should pass the ball to REGION if it is in REGION .
Negatives:
If our player 6 has the ball then he should take a shot on goal.
If player 4 has the ball , it should pass the ball to player 2 or 10.
If the condition DR5C3 is true , then player 2 , 3 , 7 and 8 should pass the ball to player 3.
During play on , if players 6 , 7 or 8 is in REGION , they should pass the ball to players 9 , 10 or 11.
If "Clear_Condition" , players 2 , 3 , 7 or 5 should clear the ball REGION .
If it is before the kick off , after our goal or after the opponent's goal , position player 3 at REGION .
If the condition MDR4C9 is met , then players 4-6 should pass the ball to player 9.
If Pass_11 then player 11 should pass to player 9 and no one else.

Generalization of String Patterns
Production: ACTION → (pos REGION)
Pattern 1: Always position player UNUM at REGION .
Pattern 2: Whenever the ball is in REGION , position player UNUM near the REGION .
Find the highest-scoring common subsequence.
Generalization: position player UNUM [2] REGION .
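
A rough sketch of this generalization step, where a plain longest common subsequence stands in for SILT's highest-scoring common subsequence (the real system uses a scoring function, not just length); shared tokens are kept and a word gap [k] is inserted where the two patterns diverge:

```python
def generalize(p1, p2):
    """Generalize two string patterns via LCS plus word gaps."""
    a, b = p1.split(), p2.split()
    n, m = len(a), len(b)
    # longest-common-subsequence dynamic program, computed backwards
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            L[i][j] = (L[i + 1][j + 1] + 1 if a[i] == b[j]
                       else max(L[i + 1][j], L[i][j + 1]))
    out, i, j, pi, pj = [], 0, 0, 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            skipped = max(i - pi, j - pj)   # words either pattern skipped
            if out and skipped:
                out.append("[%d]" % skipped)
            out.append(a[i])
            i, j = i + 1, j + 1
            pi, pj = i, j
        elif L[i + 1][j] >= L[i][j + 1]:
            i += 1
        else:
            j += 1
    return " ".join(out)

p1 = "Always position player UNUM at REGION ."
p2 = "Whenever the ball is in REGION , position player UNUM near the REGION ."
print(generalize(p1, p2))   # position player UNUM [2] REGION .
```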

Generalization of Tree Patterns
Production: REGION → (penalty-area TEAM)
Pattern 1: [Tree: NP → PRP$(TEAM) NN("penalty") NN("area")]
Pattern 2: [Tree: NP → NP(NN(TEAM) POS("'s")) NN("penalty") NN("box")]
Find common subgraphs.
Generalization: [Tree: NP → * NN("penalty") NN — TEAM and "penalty" are kept; differing subtrees become wildcards.]
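
A hypothetical sketch of tree-pattern generalization (not SILT's exact common-subgraph computation): walk two syntax trees top-down, keep nodes whose labels agree, and replace disagreeing subtrees with a wildcard "*", as in the NP example above:

```python
def generalize_trees(t1, t2):
    """Trees are (label, [children]) tuples; leaves have no children."""
    label1, kids1 = t1
    label2, kids2 = t2
    if label1 != label2:
        return ("*", [])                      # disagreement -> wildcard
    if len(kids1) != len(kids2):
        return (label1, [])                   # keep node, drop children
    return (label1, [generalize_trees(c1, c2)
                     for c1, c2 in zip(kids1, kids2)])

np1 = ("NP", [("PRP$", [("TEAM", [])]),
              ("NN", [("penalty", [])]), ("NN", [("area", [])])])
np2 = ("NP", [("NP", [("NN", [("TEAM", [])]), ("POS", [("'s", [])])]),
              ("NN", [("penalty", [])]), ("NN", [("box", [])])])
print(generalize_trees(np1, np2))
# ('NP', [('*', []), ('NN', [('penalty', [])]), ('NN', [('*', [])])])
```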

Rule Learning for a Production (continued)
Production: CONDITION → (bpos REGION)
By this stage, lower-level rules have already replaced region constants such as RP18 with REGION. Running the bottom-up rule learner on the positives and negatives yields rules such as:
"ball is [2] REGION" ⇒ CONDITION → (bpos REGION)
"it is in REGION" ⇒ CONDITION → (bpos REGION)
The phrases these rules cover are then replaced by CONDITION in the positives, e.g. "If the ball is in REGION and not in REGION then player 3 should intercept the ball." becomes "If the CONDITION and not in REGION then player 3 should intercept the ball."

Rule Learning for All Productions
Transformation rules for different productions must cooperate globally to generate complete semantic parses.
Redundantly cover every positive example with the β = 5 best rules.
Then find the subset of these rules that best cooperates to generate complete semantic parses on the training data, trading off coverage against accuracy.
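
A minimal sketch of this final selection step as a greedy set cover, under the assumption of a hypothetical predicate `parses_correctly` that checks whether a rule set yields an example's complete MR (the paper's actual selection criterion balances coverage and accuracy):

```python
def select_rules(pool, examples, parses_correctly):
    """pool: candidate rules (up to beta=5 per positive example);
    examples: list of (sentence, mr) pairs;
    parses_correctly(rules, example): True if the rules produce the
    example's complete MR."""
    chosen, uncovered = [], set(range(len(examples)))
    while uncovered and pool:
        def gain(rule):
            # how many still-uncovered examples this rule would complete
            return sum(parses_correctly(chosen + [rule], examples[i])
                       for i in uncovered)
        best = max(pool, key=gain)
        if gain(best) == 0:
            break                      # no candidate completes a new parse
        chosen.append(best)
        uncovered -= {i for i in uncovered
                      if parses_correctly(chosen, examples[i])}
    return chosen
```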

Experimental Corpora
CLang:
300 randomly selected pieces of coaching advice from the log files of the 2003 RoboCup Coach Competition
22.52 words on average per NL sentence; 14.24 tokens on average per formal expression
GeoQuery [Zelle & Mooney, 1996]:
250 queries for the given U.S. geography database
6.87 words on average per NL sentence; 5.32 tokens on average per formal expression
[Speaker notes: SILT is tested on the CLang corpus and on GeoQuery, an established benchmark corpus for semantic parsing. GeoQuery example: "How many cities are there in the US?"]

Experimental Methodology
Evaluated using standard 10-fold cross validation.
Syntactic parses needed by the tree-based version were obtained by training Collins' parser [Bikel, 2004] on the WSJ treebank plus gold-standard parses of the training sentences.
Correctness:
CLang: the output exactly matches the correct representation.
GeoQuery: the resulting query retrieves the same answer as the correct representation.
Metrics: precision (fraction of produced parses that are correct) and recall (fraction of all test sentences parsed correctly).
[Speaker notes: GeoQuery uses the same evaluation method as previous papers. Partial-match credit is not used because it makes little sense here: "player 2" instead of "player 3", or "left" instead of "right", changes the meaning entirely.]
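
For clarity, the two metrics as defined above, in a small sketch (names are illustrative; `None` marks a sentence for which the parser produced no output):

```python
def precision_recall(outputs, is_correct):
    """outputs: one parser output per test sentence (None = no parse);
    is_correct(output): True if the output counts as correct."""
    produced = [o for o in outputs if o is not None]
    correct = sum(1 for o in produced if is_correct(o))
    precision = correct / len(produced) if produced else 0.0
    recall = correct / len(outputs) if outputs else 0.0
    return precision, recall
```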

Compared Systems
CHILL [Zelle & Mooney, 1996]: learns control rules for shift-reduce parsing using Inductive Logic Programming (ILP), with two ILP rule inducers: CHILLIN [Zelle & Mooney, 1996] and COCKTAIL [Tang & Mooney, 2001].
GEOBASE: hand-built parser for GeoQuery [Borland International, 1988].
SILT-string: maps strings directly to meaning representations (MRs).
SILT-tree: maps syntactic trees to MRs.
[Speaker notes: one SILT version takes syntax into account; the other does not.]

Precision Learning Curves for CLang

Recall Learning Curves for CLang

Precision Learning Curves for GeoQuery

Recall Learning Curves for GeoQuery

Related Work
SCISSOR [Ge & Mooney, 2005]: integrates semantic and syntactic statistical parsing; requires extensive annotation but gives better results.
PRECISE [Popescu et al., 2003]: designed specifically for NL database interfaces.
Speech recognition community [Zue & Glass, 2000]: handles simpler queries in the ATIS corpus.

Conclusions
SILT is a new approach to semantic parsing that uses transformation rules.
SILT learns transformation rules by bottom-up rule induction, exploiting the target language grammar.
Tested on two very different domains, it performs better than previous ILP-based approaches.

Thank You!
Our corpora can be downloaded from: http://www.cs.utexas.edu/~ml/nldata.html
Questions?

F-measure Learning Curves for CLang

F-measure Learning Curves for GeoQuery

Extra Slide: Average Training Time in Minutes

System        CLang   GeoQuery
SILT-string     3.2      0.35
CHILLIN        10.4      6.3
SILT-tree      81.4     21.5
COCKTAIL        —       39.6

Extra Slide: Variations of Rule Representation
Context in the patterns: a minimal pattern such as
"in REGION" ⇒ CONDITION → (bpos REGION)
misfires on phrases like "TEAM UNUM has the ball in REGION", where
"TEAM UNUM has [1] ball" ⇒ CONDITION → (bowner TEAM {UNUM})
should apply; adding context ("the ball in REGION") disambiguates.
Templates with multiple productions:
"TEAM UNUM has the ball in REGION" ⇒ CONDITION → (and (bowner TEAM {UNUM}) (bpos REGION))

Extra Slide: Experimental Methodology
Correctness:
CLang: the output exactly matches the correct representation.
GeoQuery: the resulting query retrieves the same answer as the correct representation.
Example: If the ball is in our penalty area, all our players except player 4 should stay in our half.
Correct: ((bpos (penalty-area our)) (do (player-except our {4}) (pos (half our))))
Output (incorrect): ((bpos (penalty-area opp)) (do (player-except our {4}) (pos (half our))))

Extra Slide: Future Work
Hard-matching symbolic patterns are sometimes too brittle; exploit string and tree kernels as classifiers instead [Lodhi et al., 2002].
A unified implementation of the string-based and tree-based versions for direct comparisons.