1 Learning for Semantic Parsing Using Statistical Syntactic Parsing Techniques
Ruifang Ge, Ph.D. Final Defense
Supervisor: Raymond J. Mooney
Machine Learning Group, Department of Computer Science, The University of Texas at Austin
2 Semantic Parsing
Semantic parsing: transforming natural language (NL) sentences into completely formal meaning representations (MRs).
Sample application domains where MRs are directly executable by another computer system to perform some task:
CLang: RoboCup Coach Language
Geoquery: a database query application
3 CLang (RoboCup Coach Language)
In the RoboCup Coach competition, teams compete to coach simulated players. The coaching instructions are given in a formal language called CLang.
Example (advice from the coach about the simulated soccer field):
NL: If our player 2 has the ball, then position our player 5 in the midfield.
CLang: ((bowner (player our {2})) (do (player our {5}) (pos (midfield))))
Semantic parsing maps the NL sentence to the CLang instruction.
4 GeoQuery: A Database Query Application
Query application for a U.S. geography database [Zelle & Mooney, 1996].
User: What are the rivers in Texas?
Query (via semantic parsing): answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
Database answer: Angelina, Blanco, …
5 Motivation for Semantic Parsing
Theoretically, it answers the question of how people interpret language.
Practical applications: question answering, natural language interfaces, knowledge acquisition, reasoning.
6 Motivating Example
Semantic parsing is a compositional process: sentence structures are needed for building meaning representations.
NL: If our player 2 has the ball, our player 4 should stay in our half
MR: ((bowner (player our {2})) (do our {4} (pos (half our))))
(bowner: ball owner; pos: position)
7 Syntax-Based Approaches
Meaning composition follows the tree structure of a syntactic parse: the meaning of a constituent is composed from the meanings of its sub-constituents.
Hand-built approaches: Woods (1970), Warren and Pereira (1982).
Learned approaches: Miller et al. (1996), conceptually simple sentences; Zettlemoyer & Collins (2005), hand-built Combinatory Categorial Grammar (CCG) template rules.
8-12 Example (parse-tree figures)
Sentence: "our player 2 has the ball"; target MR: bowner(player(our,2)).
Slide 8: use the structure of a syntactic parse (POS tags PRP$ NN CD VB DT NN under NP, VP, and S).
Slide 9: assign semantic concepts to words: our → our, player → player(_,_), 2 → 2, has → bowner(_), the/ball → null.
Slides 10-12: compose meaning for the internal nodes bottom-up: NP → player(our,2), VP → bowner(_), and finally S → bowner(player(our,2)).
13 Semantic Grammars
Non-terminals in a semantic grammar correspond to semantic concepts in application domains.
Hand-built approaches: Hendrix et al. (1978).
Learned approaches: Tang & Mooney (2001), Kate & Mooney (2006), Wong & Mooney (2006).
14 Example (semantic-grammar parse figure)
Sentence: "our player 2 has the ball"; MR: bowner(player(our,2)).
Semantic-grammar rule: bowner → player has the ball, with player covering "our player 2" (our, 2).
15 Thesis Contributions
Introduce two novel syntax-based approaches to semantic parsing, theoretically well-founded in computational semantics (Blackburn and Bos, 2005).
Great opportunity: leverage the significant progress made in statistical syntactic parsing for semantic parsing (Collins, 1997; Charniak and Johnson, 2005; Huang, 2008).
16 Thesis Contributions
SCISSOR: a novel integrated syntactic-semantic parser.
SYNSEM: exploits an existing syntactic parser to produce disambiguated parse trees that drive the composition of meaning representations.
Investigate when knowledge of syntax can help.
17 Representing Semantic Knowledge in a Meaning Representation Language Grammar (MRLG)
Assumes the meaning representation language (MRL) is defined by an unambiguous context-free grammar, where each production rule introduces a single predicate in the MRL. The parse of an MR then gives its predicate-argument structure.

Production                       Predicate
CONDITION → (bowner PLAYER)      P_BOWNER
PLAYER → (player TEAM {UNUM})    P_PLAYER
UNUM → 2                         P_UNUM
TEAM → our                       P_OUR
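A minimal sketch of how such an MRLG could be stored (the data layout is assumed for illustration, not taken from the thesis):

    # Each MRL production is paired with the single predicate it introduces, so a
    # parse of an MR with this grammar exposes its predicate-argument structure.
    MRLG = [
        ("CONDITION", "(bowner PLAYER)",       "P_BOWNER"),
        ("PLAYER",    "(player TEAM {UNUM})",  "P_PLAYER"),
        ("UNUM",      "2",                     "P_UNUM"),
        ("TEAM",      "our",                   "P_OUR"),
    ]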
18 Roadmap
SCISSOR
SYNSEM
Future Work
Conclusions
19 SCISSOR: Semantic Composition that Integrates Syntax and Semantics to get Optimal Representations
Integrated syntactic-semantic parsing: allows both syntax and semantics to be used simultaneously to obtain an accurate combined syntactic-semantic analysis.
A statistical parser is used to generate a semantically augmented parse tree (SAPT).
20 Syntactic Parse (figure)
Syntactic parse of "our player 2 has the ball": POS tags PRP$ NN CD VB DT NN under NP, VP, and S.
21-22 SAPT (figures)
In a SAPT, non-terminals have both syntactic and semantic labels; a node's semantic label is the dominant predicate of its sub-tree.
Word-level labels for "our player 2 has the ball": PRP$-P_OUR, NN-P_PLAYER, CD-P_UNUM, VB-P_BOWNER, DT-NULL, NN-NULL.
Internal labels: NP-P_PLAYER, NP-NULL, VP-P_BOWNER, S-P_BOWNER.
MR: P_BOWNER(P_PLAYER(P_OUR, P_UNUM))
23-24 SCISSOR Overview (diagrams)
Training: SAPT training examples → learner → integrated semantic parser.
Testing: NL sentence → integrated semantic parser → SAPT → compose MR → MR.
25 Extending Collins' (1997) Syntactic Parsing Model
Find the SAPT with the maximum probability.
A lexicalized head-driven syntactic parsing model.
Extend the parsing model to generate semantic labels simultaneously with syntactic labels.
26 Why Extend Collins' (1997) Syntactic Parsing Model
Suitable for incorporating semantic knowledge:
Head dependency: predicate-argument relation.
Syntactic subcategorization: the set of arguments that a predicate appears with.
The Bikel (2004) implementation is easily extendable.
27 Parser Implementation
Supervised training on annotated SAPTs is just frequency counting.
Testing uses a variant of the standard CKY chart-parsing algorithm.
Details in the thesis.
28 Smoothing
Each label in a SAPT is the combination of a syntactic label and a semantic label, which increases data sparsity.
Break the parameters down:
P_h(H | P, w) = P_h(H_syn, H_sem | P, w) = P_h(H_syn | P, w) × P_h(H_sem | P, w, H_syn)
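A small sketch of this back-off decomposition (plain relative-frequency estimation assumed for illustration; the thesis follows Collins-style smoothed estimates):

    from collections import Counter

    ctx = Counter()   # (parent label, head word)
    syn = Counter()   # (syntactic head label, parent label, head word)
    sem = Counter()   # (semantic head label, syntactic head label, parent label, head word)

    def observe(h_syn, h_sem, parent, word):
        # Collect counts from the annotated SAPTs.
        ctx[(parent, word)] += 1
        syn[(h_syn, parent, word)] += 1
        sem[(h_sem, h_syn, parent, word)] += 1

    def p_head(h_syn, h_sem, parent, word):
        # P(Hsyn, Hsem | P, w) = P(Hsyn | P, w) * P(Hsem | P, w, Hsyn);
        # unseen contexts would need further back-off smoothing.
        p_syn = syn[(h_syn, parent, word)] / ctx[(parent, word)]
        p_sem = sem[(h_sem, h_syn, parent, word)] / syn[(h_syn, parent, word)]
        return p_syn * p_sem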
29 Experimental Corpora
CLang (Kate, Wong & Mooney, 2005): 300 pieces of coaching advice, 22.52 words per sentence.
Geoquery (Zelle & Mooney, 1996): 880 queries on a geography database, 7.48 words per sentence; MRLs: Prolog and FunQL.
30-31 Prolog vs. FunQL (Wong, 2007)
NL: What are the rivers in Texas?
Prolog: answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas)))), where x1: river and x2: texas.
FunQL: answer(river(loc_2(stateid(texas))))
Logical forms are widely used as MRLs in computational semantics and support reasoning.
Prolog allows a flexible argument order, while FunQL imposes a strict order, giving better generalization on Prolog.
32 Experimental Methodology
Standard 10-fold cross validation.
Correctness: for CLang, the output exactly matches the correct MR; for Geoquery, the output retrieves the same answers as the correct MR.
Metrics:
Precision: % of the returned MRs that are correct.
Recall: % of NL sentences with their MRs correctly returned.
F-measure: harmonic mean of precision and recall.
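For concreteness, the three metrics as they would be computed (helper name and inputs assumed, not from the thesis):

    def evaluate(num_correct, num_returned, num_sentences):
        precision = num_correct / num_returned    # correct MRs among those returned
        recall = num_correct / num_sentences      # sentences whose MR is correctly returned
        f_measure = 2 * precision * recall / (precision + recall)
        return precision, recall, f_measure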
33-35 Compared Systems
COCKTAIL (Tang & Mooney, 2001): deterministic, inductive logic programming.
WASP (Wong & Mooney, 2006): semantic grammar, machine translation; λ-WASP is the variant handling logical forms.
KRISP (Kate & Mooney, 2006): semantic grammar, string kernels.
Z&C (Zettlemoyer & Collins, 2007): syntax-based, combinatory categorial grammar (CCG); uses a hand-built lexicon for Geoquery and manual CCG template rules.
LU (Lu et al., 2008): semantic grammar, generative parsing model.
36-37 Results on CLang
System     Precision  Recall  F-measure
COCKTAIL   -          -       -          (memory overflow)
SCISSOR    89.5       73.7    80.8
WASP       88.9       61.9    73.0
KRISP      85.2       61.9    71.7
Z&C        -          -       -          (not reported)
LU         82.4       57.7    67.8
(LU: F-measure after reranking is 74.4%)
38-39 Results on Geoquery
System     MRL     Precision  Recall  F-measure
SCISSOR    FunQL   92.1       72.3    81.0
WASP       FunQL   87.2       74.8    80.5
KRISP      FunQL   93.3       71.7    81.1
LU         FunQL   86.2       81.8    84.0
COCKTAIL   Prolog  89.9       79.4    84.3
λ-WASP     Prolog  92.0       86.6    89.2
Z&C        Prolog  95.5       83.2    88.9
(LU: F-measure after reranking is 85.2%)
On the FunQL results (slide 39), SCISSOR is competitive.
40 Why Knowledge of Syntax Does Not Help (on Geoquery)
Geoquery sentences are short (7.48 words per sentence), so sentence structure can be feasibly learned from NLs paired with MRs.
The gain from knowledge of syntax is weighed against the loss of flexibility.
41-42 Limitation of Using Prior Knowledge of Syntax (figures)
Example: "What state is the smallest", MR: answer(smallest(state(all))).
A traditional syntactic analysis groups the words differently from the MR structure, whereas a semantic grammar yields a parse that is isomorphic with the MR structure, giving better generalization.
43 Why Prior Knowledge of Syntax Does Not Help (on Geoquery)
Geoquery sentences are short (7.48 words per sentence), so sentence structure can be feasibly learned from NLs paired with MRs; the gain from knowledge of syntax is weighed against the loss of flexibility.
LU vs. WASP and KRISP: LU's decomposed model for the semantic grammar.
44 Detailed CLang Results on Sentence Length (chart)
Sentence-length bins: 0-10 (7%), 11-20 (33%), 21-30 (46%), 31-40 (13%).
45 SCISSOR Summary
An integrated syntactic-semantic parsing approach that learns accurate semantic interpretations by utilizing the SAPT annotations.
Knowledge of syntax improves performance on long sentences.
46 Roadmap
SCISSOR
SYNSEM
Future Work
Conclusions
47 SYNSEM Motivation
SCISSOR requires extra SAPT annotation for training and must learn both syntax and semantics from the same limited training corpus.
High-performance syntactic parsers are available that are trained on existing large corpora (Collins, 1997; Charniak & Johnson, 2005).
48 SCISSOR Requires SAPT Annotation (figure)
The SAPT for "our player 2 has the ball" must be annotated by hand, which is time consuming. Automate it!
49 Part I: Syntactic Parse (figure)
Use a statistical syntactic parser to obtain the syntactic parse of "our player 2 has the ball" (PRP$ NN CD VB DT NN under NP, VP, S).
50 Part II: Word Meanings (figure)
Use a word alignment model (Wong and Mooney, 2006) to link words to predicates: our → P_OUR, player → P_PLAYER, 2 → P_UNUM, has → P_BOWNER, the/ball → NULL.
51 Learning a Semantic Lexicon
IBM Model 5 word alignment (GIZA++); keep the top 5 word/predicate alignments for each training example.
Assume each word alignment together with the syntactic parse defines a possible SAPT for composing the correct MR.
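A rough sketch of how the lexicon could be collected from the alignment output (the input format is assumed; the GIZA++ invocation and output parsing are omitted):

    from collections import defaultdict

    def build_lexicon(aligned_examples):
        # aligned_examples: one list of (word, predicate) links per training pair,
        # taken from the top-5 word/predicate alignments; unaligned words carry None.
        lexicon = defaultdict(set)
        for links in aligned_examples:
            for word, predicate in links:
                if predicate is not None:
                    lexicon[word.lower()].add(predicate)
        return lexicon

    # e.g. build_lexicon([[("our", "P_OUR"), ("player", "P_PLAYER"), ("2", "P_UNUM"),
    #                      ("has", "P_BOWNER"), ("the", None), ("ball", None)]])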
52-58 Learning Semantic Composition Rules (figures)
Slide 52: introduce λ variables in semantic labels for missing arguments (a1: the first argument). Word labels for "our player 2 has the ball": our → P_OUR, player → λa1 λa2 P_PLAYER, 2 → P_UNUM, has → λa1 P_BOWNER, the/ball → NULL.
Slide 53: Part III: internal semantic labels — how to choose the dominant predicate at each internal node?
Slides 54-55: learned rule for "player 2": λa1 λa2 P_PLAYER + P_UNUM → {λa1 P_PLAYER, a2 = c2} (c2: the second child).
Slide 56: learned rule for "our player 2": P_OUR + λa1 P_PLAYER → {P_PLAYER, a1 = c1}.
Slides 57-58: learned rule at the sentence level: P_PLAYER + λa1 P_BOWNER → {P_BOWNER, a1 = c1}, yielding the MR P_BOWNER(P_PLAYER(P_OUR, P_UNUM)).
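A toy illustration of what applying such a rule amounts to (representation assumed, not the thesis code): a semantic label is a predicate with unfilled argument slots, and a composition rule fills one slot of the head child with the other child's meaning.

    def apply_rule(head, child, slot):
        # head/child are (predicate, open_slots, filled_args) triples,
        # e.g. ("P_PLAYER", ["a1", "a2"], {}); the rule fills one slot of the head.
        pred, open_slots, filled = head
        new_filled = dict(filled)
        new_filled[slot] = child
        return (pred, [a for a in open_slots if a != slot], new_filled)

    player = ("P_PLAYER", ["a1", "a2"], {})
    unum = ("P_UNUM", [], {})
    # lambda a1 lambda a2 P_PLAYER + P_UNUM  =>  {lambda a1 P_PLAYER, a2 = c2}
    np_label = apply_rule(player, unum, "a2")   # ("P_PLAYER", ["a1"], {"a2": unum})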
59-60 Ensuring Meaning Composition
Example: "What state is the smallest" with MR answer(smallest(state(all))): the syntactic parse is not isomorphic to the MR parse.
Non-isomorphism between the NL parse and the MR parse arises from various linguistic phenomena, from the machine translation between NL and MRL, and from using automated syntactic parses.
Solution: introduce macro-predicates that combine multiple predicates, ensuring that the MR can be composed using a syntactic parse and word alignment.
61-62 SYNSEM Overview (diagrams)
Before training and testing: each training/test sentence S is run through a syntactic parser to obtain its syntactic parse tree T.
Training (inputs: the training set {(S, T, MR)} and the unambiguous CFG of the MRL): semantic knowledge acquisition yields a semantic lexicon and composition rules; parameter estimation yields a probabilistic parsing model.
Testing: semantic parsing maps an input sentence S (with its parse T) to an output MR.
63 Parameter Estimation
Apply the learned semantic knowledge to all training examples to generate possible SAPTs.
Use a standard maximum-entropy model similar to that of Zettlemoyer & Collins (2005) and Wong & Mooney (2006).
Training finds parameters that (approximately) maximize the sum of the conditional log-likelihood of the training set, including syntactic parses.
This is incomplete data, since SAPTs are hidden variables.
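Written out as a formula (a sketch; the notation is assumed rather than quoted from the thesis), the objective over training examples (S_i, T_i, MR_i) with SAPT derivations d as hidden variables is

    \mathcal{L}(\theta) = \sum_i \log \sum_{d \in D(S_i, T_i)} P_\theta(\mathrm{MR}_i, d \mid S_i, T_i)

where D(S_i, T_i) is the set of SAPT derivations that the learned lexicon and composition rules license for sentence S_i with syntactic parse T_i.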
64 Features
Lexical features:
Unigram features: number of times a word is assigned a predicate.
Bigram features: number of times a word is assigned a predicate given its previous/subsequent word.
Rule features: number of times a composition rule is applied in a derivation.
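A compact sketch of these feature counts for one candidate derivation (data structures assumed for illustration):

    from collections import Counter

    def extract_features(words, word_preds, rules_used):
        feats = Counter()
        for i, (w, p) in enumerate(zip(words, word_preds)):
            feats[("unigram", w, p)] += 1                    # word assigned a predicate
            prev = words[i - 1] if i > 0 else "<s>"
            nxt = words[i + 1] if i + 1 < len(words) else "</s>"
            feats[("bigram_prev", prev, w, p)] += 1          # ...given its previous word
            feats[("bigram_next", nxt, w, p)] += 1           # ...given its subsequent word
        for rule in rules_used:
            feats[("rule", rule)] += 1                       # composition rule applied
        return feats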
65-67 Handling Logical Forms (Prolog example)
NL: What are the rivers in Texas?
MR: answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
Shared logical variables are handled using the lambda calculus (v: variable); the word meanings become
λv1 P_ANSWER(x1), λv1 P_RIVER(x1), λv1 λv2 P_LOC(x1,x2), λv1 P_EQUAL(x2).
68-70 Prolog Example (figures)
Slide 68: start from a syntactic parse of "What are the rivers in Texas" (WHNP, VBP, NP, PP, IN, SQ, SBARQ).
Slide 69: add predicates to words: What → λv1 λa1 P_ANSWER, are/the → NULL, rivers → λv1 P_RIVER, in → λv1 λv2 P_LOC, Texas → λv1 P_EQUAL.
Slide 70: learn a rule with variable unification: λv1 λv2 P_LOC(x1, x2) + λv1 P_EQUAL(x2) → λv1 P_LOC.
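One plausible reading of the unification step above (a sketch in the slides' own notation, not a quote from the thesis): the rule unifies the x2 argument of P_LOC with the x2 argument of P_EQUAL, so that in the final MR loc(x1,x2) and equal(x2,stateid(texas)) share the variable x2; the resulting constituent λv1 P_LOC then combines with λv1 P_RIVER(x1) and λv1 λa1 P_ANSWER(x1) to yield answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas)))).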
71 Experimental Results: CLang and Geoquery (Prolog)
72 Syntactic Parsers (Bikel, 2004)
WSJ only: CLang (SYN0) F-measure = 82.15%; Geoquery (SYN0) F-measure = 76.44%.
WSJ + in-domain sentences: CLang (SYN20, 20 sentences) F-measure = 88.21%; Geoquery (SYN40, 40 sentences) F-measure = 91.46%.
Gold-standard syntactic parses (GOLDSYN).
73 Questions
Q1. Can SYNSEM produce accurate semantic interpretations?
Q2. Can more accurate treebank syntactic parsers produce more accurate semantic parsers?
Q3. Does it also improve on long sentences?
Q4. Does it improve on limited training data, due to the prior knowledge from large treebanks?
Q5. Can it handle syntactic errors?
74 Results on CLang
System     Precision  Recall  F-measure
GOLDSYN    84.7       74.0    79.0       (SYNSEM)
SYN20      85.4       70.0    76.9       (SYNSEM)
SYN0       87.0       67.0    75.7       (SYNSEM)
SCISSOR    89.5       73.7    80.8       (SAPTs)
WASP       88.9       61.9    73.0
KRISP      85.2       61.9    71.7
LU         82.4       57.7    67.8
(LU: F-measure after reranking is 74.4%)
GOLDSYN > SYN20 > SYN0
75 Questions
Q1. Can SYNSEM produce accurate semantic interpretations? [yes]
Q2. Can more accurate treebank syntactic parsers produce more accurate semantic parsers? [yes]
Q3. Does it also improve on long sentences?
76 Detailed CLang Results on Sentence Length (chart)
Sentence-length bins: 0-10 (7%), 11-20 (33%), 21-30 (46%), 31-40 (13%).
Trade-off: prior knowledge + flexibility + syntactic errors = ?
77 Questions
Q1. Can SYNSEM produce accurate semantic interpretations? [yes]
Q2. Can more accurate treebank syntactic parsers produce more accurate semantic parsers? [yes]
Q3. Does it also improve on long sentences? [yes]
Q4. Does it improve on limited training data, due to the prior knowledge from large treebanks?
78 Results on CLang (training size = 40)
System     Precision  Recall  F-measure
GOLDSYN    61.1       35.7    45.1       (SYNSEM)
SYN20      57.8       31.0    40.4       (SYNSEM)
SYN0       53.5       22.7    31.9       (SYNSEM)
SCISSOR    85.0       23.0    36.2       (SAPTs)
WASP       88.0       14.4    24.7
KRISP      68.35      20.0    31.0
The quality of the syntactic parser is critically important!
79 Questions
Q1. Can SYNSEM produce accurate semantic interpretations? [yes]
Q2. Can more accurate treebank syntactic parsers produce more accurate semantic parsers? [yes]
Q3. Does it also improve on long sentences? [yes]
Q4. Does it improve on limited training data, due to the prior knowledge from large treebanks? [yes]
Q5. Can it handle syntactic errors?
80 Handling Syntactic Errors
Training ensures meaning composition even from syntactic parses with errors.
For test sentences that generate correct MRs, measure the F-measures of their syntactic parses: SYN0: 85.5%; SYN20: 91.2%.
Example sentence: If DR2C7 is true then players 2, 3, 7 and 8 should pass to player 4.
81 Questions
Q1. Can SYNSEM produce accurate semantic interpretations? [yes]
Q2. Can more accurate treebank syntactic parsers produce more accurate semantic parsers? [yes]
Q3. Does it also improve on long sentences? [yes]
Q4. Does it improve on limited training data, due to the prior knowledge from large treebanks? [yes]
Q5. Is it robust to syntactic errors? [yes]
82 Results on Geoquery (Prolog)
System     Precision  Recall  F-measure
GOLDSYN    91.9       88.2    90.0       (SYNSEM)
SYN40      90.2       86.9    88.5       (SYNSEM)
SYN0       81.8       79.0    80.4       (SYNSEM)
COCKTAIL   89.9       79.4    84.3
λ-WASP     92.0       86.6    89.2
Z&C        95.5       83.2    88.9
SYN0 does not perform well; all other recent systems perform competitively.
83 SYNSEM Summary
Exploits an existing syntactic parser to drive the meaning composition process.
Prior knowledge of syntax improves performance on long sentences and on limited training data.
Robust to syntactic errors.
84 Discriminative Reranking for Semantic Parsing
Adapts global features used for reranking syntactic parses to semantic parsing.
Improvement on CLang; no improvement on Geoquery, where sentences are short and global features are less likely to help.
85 Roadmap
SCISSOR
SYNSEM
Future Work
Conclusions
86 Future Work
Improve SCISSOR: discriminative SCISSOR (Finkel et al., 2008); handling logical forms; SCISSOR without extra annotation (Klein and Manning, 2002, 2004).
Improve SYNSEM: utilizing syntactic parsers with improved accuracy and in other syntactic formalisms.
87 Future Work
Utilizing wide-coverage semantic representations (Curran et al., 2007): better generalization over syntactic variations.
Utilizing semantic role labeling (Gildea and Palmer, 2002): provides a layer of correlated semantic information.
88 Roadmap
SCISSOR
SYNSEM
Future Work
Conclusions
89 Conclusions
SCISSOR: a novel integrated syntactic-semantic parser.
SYNSEM: exploits an existing syntactic parser to produce disambiguated parse trees that drive the composition of meaning representations.
Both produce accurate semantic interpretations.
Using knowledge of syntax improves performance on long sentences; SYNSEM also improves performance on limited training data.
90 Thank you! Questions?