1 Natural Language Processing COMPSCI 423/723 Rohit Kate

2 Semantic Parsing Some of the slides have been adapted from the ACL 2010 tutorial by Rohit Kate and Yuk Wah Wong, and from Raymond Mooney’s NLP course at UT Austin.

Introduction to the Semantic Parsing Task

4 Semantic Parsing “Semantic Parsing” is, ironically, a semantically ambiguous term –Semantic role labeling (find agent, patient etc. for a verb) –Finding generic relations in text (find part-whole, member-of relations) –Transforming a natural language sentence into its meaning representation 

5 Semantic Parsing Semantic Parsing: Transforming natural language (NL) sentences into computer executable complete meaning representations (MRs) for domain-specific applications Realistic semantic parsing currently entails domain dependence Example application domains –ATIS: Air Travel Information Service –CLang: Robocup Coach Language –Geoquery: A Database Query Application

6 ATIS: Air Travel Information Service Interface to an air travel database [Price, 1990]; a widely-used benchmark for spoken language understanding. Example: "May I see all the flights from Cleveland to Dallas?" is semantically parsed into the frame Air-Transportation, Show: (Flight-Number), Origin: (City "Cleveland"), Destination: (City "Dallas"), and the query returns flights such as NA 1439, TQ 23, …

7 CLang: RoboCup Coach Language In the RoboCup Coach competition, teams compete to coach simulated soccer players. The coaching instructions are given in a computer language called CLang [Chen et al. 2003]. Example: "If the ball is in our goal area then player 1 should intercept it." is semantically parsed into the CLang expression (bpos (goal-area our) (do our {1} intercept)).

8 Geoquery: A Database Query Application Query application for a U.S. geography database containing about 800 facts [Zelle & Mooney, 1996]. Example: "Which rivers run through the states bordering Texas?" is semantically parsed into the query answer(traverse(next_to(stateid('texas')))), which returns the answer: Arkansas, Canadian, Cimarron, Gila, Mississippi, Rio Grande, …

9 What is the meaning of “meaning”? Representing the meaning of natural language is ultimately a difficult philosophical question Many attempts have been made to define generic formal semantics of natural language –Can they really be complete? –What can they do for us computationally? –Not so useful if the meaning of Life is defined as Life’ Our meaning representation for semantic parsing does something useful for an application Procedural Semantics: The meaning of a sentence is a formal representation of a procedure that performs some action that is an appropriate response –Answering questions –Following commands

10 Meaning Representation Languages The meaning representation language (MRL) for an application is assumed to be given. The MRL is designed by the creators of the application to suit the application's needs, independent of natural language. CLang was designed by the RoboCup community to send formal coaching instructions to simulated players; Geoquery's MRL was based on its Prolog database. The MRL is unambiguous by design.

11 Meaning Representation Languages MR: answer(traverse(next_to(stateid('texas')))) The MR has an unambiguous parse tree, built from the productions: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), STATE → NEXT_TO(STATE), TRAVERSE → traverse, NEXT_TO → next_to, STATEID → 'texas'. (The slide shows the corresponding parse tree of the MR.)

12 Engineering Motivation for Semantic Parsing Applications of domain-dependent semantic parsing –Natural language interfaces to computing systems –Communication with robots in natural language –Personalized software assistants –Question-answering systems Machine learning makes developing semantic parsers for specific applications more tractable Training corpora can be easily developed by tagging natural-language glosses with formal statements

13 Cognitive Science Motivation for Semantic Parsing Most natural-language learning methods require supervised training data that is not available to a child –No POS-tagged or treebank data Assuming a child can infer the likely meaning of an utterance from context, NL - MR pairs are more cognitively plausible training data

14 Distinctions from Other NLP Tasks: Deeper Semantic Analysis Information extraction involves shallow semantic analysis. Example: from "Show the long Alice sent me yesterday", extract the record Sender = Alice, Sent-to = Me, Type = Long, Time = 7/10/2010.

15 Distinctions from Other NLP Tasks: Deeper Semantic Analysis Semantic role labeling also involves shallow semantic analysis: for "Show the long Alice sent me yesterday", it labels roles such as sender, recipient, and theme.

16 Distinctions from Other NLP Tasks: Deeper Semantic Analysis Semantic parsing involves deeper semantic analysis to understand the whole sentence for some application Show the long Alice sent me yesterday Semantic Parsing

17 Distinctions from Other NLP Tasks: Final Representation Part-of-speech tagging, syntactic parsing, SRL, etc. generate intermediate linguistic representations, typically for later processing; in contrast, semantic parsing generates a final representation. (The slide shows the POS tags and syntactic parse tree of "Show the long Alice sent me yesterday" as an example of such intermediate representations.)

18 Distinctions from Other NLP Tasks: Computer Readable Output The output of some NLP tasks, like question answering, summarization and machine translation, is in NL and meant for humans to read. Since humans are intelligent, there is some room for incomplete, ungrammatical or incorrect output in these tasks; credit is given for partially correct output. In contrast, the output of semantic parsing is in a formal language and is meant for computers to read; it is critical to get the exact output, so evaluation is strict, with no partial credit.

19 Distinctions from Other NLP Tasks Shallow semantic processing –Information extraction –Semantic role labeling Intermediate linguistic representations –Part-of-speech tagging –Syntactic parsing –Semantic role labeling Output meant for humans –Question answering –Summarization –Machine translation

20 Relations to Other NLP Tasks: Word Sense Disambiguation Semantic parsing includes performing word sense disambiguation. Example: in "Which rivers run through the states bordering Mississippi?", does "Mississippi" refer to the state or the river? The MR answer(traverse(next_to(stateid('mississippi')))) resolves it to the state.

21 Relations to Other NLP Tasks: Syntactic Parsing Semantic parsing inherently includes syntactic parsing, but as dictated by the semantics. MR: bowner(player(our,2)) (The slide shows a semantic derivation for "our player 2 has the ball": the words are assigned the concepts our, player(_,_), 2, bowner(_), and null, which compose into player(our,2) and then bowner(player(our,2)).)

22 Relations to Other NLP Tasks: Syntactic Parsing Semantic parsing inherently includes syntactic parsing, but as dictated by the semantics. MR: bowner(player(our,2)) (The slide shows the same derivation with combined syntactic-semantic labels: PRP$-our, NN-player(_,_), CD-2, VB-bowner(_), NP-player(our,2), VP-bowner(_), S-bowner(player(our,2)).)

23 Relations to Other NLP Tasks: Machine Translation The MR could be looked upon as another NL [Papineni et al., 1997; Wong & Mooney, 2006] Which rivers run through the states bordering Mississippi? answer(traverse(next_to(stateid(‘mississippi’))))

24 Relations to Other NLP Tasks: Natural Language Generation Reversing a semantic parsing system becomes a natural language generation system [Jacobs, 1985; Wong & Mooney, 2007a] Which rivers run through the states bordering Mississippi? answer(traverse(next_to(stateid(‘mississippi’)))) Semantic Parsing NL Generation

25 Relations to Other NLP Tasks Tasks being performed within semantic parsing –Word sense disambiguation –Syntactic parsing as dictated by semantics Tasks closely related to semantic parsing –Machine translation –Natural language generation

26 References
Chen et al. (2003). Users manual: RoboCup soccer server manual for soccer server version 7.07 and later.
P. Jacobs (1985). PHRED: A generator for natural language interfaces. Comp. Ling., 11(4).
K. Papineni, S. Roukos, T. Ward (1997). Feature-based language understanding. In Proc. of EuroSpeech, Rhodes, Greece.
P. Price (1990). Evaluation of spoken language systems: The ATIS domain. In Proc. of the Third DARPA Speech and Natural Language Workshop.
Y. W. Wong, R. Mooney (2006). Learning for semantic parsing with statistical machine translation. In Proc. of HLT-NAACL, New York, NY.
Y. W. Wong, R. Mooney (2007a). Generation by inverting a semantic parser that uses statistical machine translation. In Proc. of NAACL-HLT, Rochester, NY.
J. Zelle, R. Mooney (1996). Learning to parse database queries using inductive logic programming. In Proc. of AAAI, Portland, OR.

27 An Early Hand-Built System: Lunar (Woods et al., 1972) English as the query language for a 13,000- entry lunar geology database built after the Apollo program Hand-built system to do syntactic analysis and then semantic interpretation A non-standard logic as a formal language System contains: –Grammar for a subset of English –Semantic interpretation rules –Dictionary of 3,500 words

28 Lunar (Woods et al., 1972) How many breccias contain olivine (FOR THE X12 / (SEQL (NUMBER X12 / (SEQ TYPECS) : (CONTAIN X12 (NPR* X14 / (QUOTE OLIV)) (QUOTE NIL)))) : T ; (PRINTOUT X12))  (5) What are they (FOR EVERY X12 / (SEQ TYPECS) : (CONTAIN X12 (NPR* X14 / (QUOTE OLIV)) (QUOTE NIL)) ; (PRINTOUT X12))  S10019, S10059, S10065, S10067, S10073

29 References
J. Dowding, R. Moore, F. Andry, D. Moran (1994). Interleaving syntax and semantics in an efficient bottom-up parser. In Proc. of ACL, Las Cruces, NM.
S. Seneff (1992). TINA: A natural language system for spoken language applications. Comp. Ling., 18(1).
W. Ward, S. Issar (1996). Recent improvement in the CMU spoken language understanding system. In Proc. of the ARPA HLT Workshop.
D. Warren, F. Pereira (1982). An efficient easily adaptable system for interpreting natural language queries. American Journal of CL, 8(3-4).
W. Woods, R. Kaplan, B. Nash-Webber (1972). The lunar sciences natural language information system: Final report. Tech. Rep. 2378, BBN Inc., Cambridge, MA.

30 Learning for Semantic Parsing: Motivations Manually programming robust semantic parsers is difficult It is easier to develop training corpora by associating natural-language sentences with meaning representations The increasing availability of training corpora, and the decreasing cost of computation, relative to engineering cost, favor the learning approach

31 Learning Semantic Parsers (The slide shows the learning setup: training sentences paired with meaning representations go into a semantic parser learner, which produces a semantic parser; the parser then maps a novel sentence to its meaning representation.)

32 Learning Semantic Parsers Training sentences & meaning representations, e.g.: "Which rivers run through the states that are next to Texas?" → answer(traverse(next_to(stateid('texas')))); "What is the lowest point of the state with the largest area?" → answer(lowest(place(loc(largest_one(area(state(all))))))); "What is the largest city in states that border California?" → answer(largest(city(loc(next_to(stateid('california')))))). These train a semantic parser, which then maps a novel sentence such as "Which rivers run through the states bordering Mississippi?" to its MR answer(traverse(next_to(stateid('mississippi')))).

33 Outline: Recent Semantic Parser Learners Wong & Mooney (2006, 2007a, 2007b) –Syntax-based machine translation methods Zettlemoyer & Collins (2005, 2007) –Structured learning with combinatory categorial grammars (CCG) Kate & Mooney (2006), Kate (2008a, 2008b) –SVM with kernels for robust semantic parsing Lu et al. (2008) –A generative model for semantic parsing Ge & Mooney (2005, 2009) –Exploiting syntax for semantic parsing

Semantic Parsing using Machine Translation Techniques

35 WASP: A Machine Translation Approach to Semantic Parsing Based on a semantic grammar of the natural language Uses machine translation techniques –Synchronous context-free grammars (SCFG) –Word alignments (Brown et al., 1993) Wong & Mooney (2006)

36 Synchronous Context-Free Grammar Developed by Aho & Ullman (1972) as a theory of compilers that combines syntax analysis and code generation in one phase –Formal to formal languages Used for syntax-based machine translation (Wu, 1997; Chiang 2005) –Natural to natural languages Generates a pair of strings in a derivation

37 Context-Free Semantic Grammar QUERY → What is CITY; CITY → the capital CITY; CITY → of STATE; STATE → Ohio. (The slide shows the parse tree of "What is the capital of Ohio" under this grammar.)

38 Production of a Context-Free Grammar: QUERY → What is CITY

39 Production of a Synchronous Context-Free Grammar: QUERY → What is CITY / answer(CITY) (natural language on the left of the slash, formal language on the right)

40 Synchronous Context-Free Grammar Derivation Productions: QUERY → What is CITY / answer(CITY); CITY → the capital CITY / capital(CITY); CITY → of STATE / loc_2(STATE); STATE → Ohio / stateid('ohio'). Starting from QUERY, the derivation simultaneously generates the NL string "What is the capital of Ohio" and the MR answer(capital(loc_2(stateid('ohio')))).
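To make the synchronized rewriting concrete, here is a minimal Python sketch (illustrative only, not WASP's code): it applies the four productions from the slide to the NL side and the MR side in lockstep. The rule list and the `apply` helper are made-up names for this example.

```python
# Minimal sketch of a synchronous CFG derivation: each production rewrites a
# non-terminal into an NL string and an MR string simultaneously.
derivation = [
    ("QUERY", "What is CITY",     "answer(CITY)"),
    ("CITY",  "the capital CITY", "capital(CITY)"),
    ("CITY",  "of STATE",         "loc_2(STATE)"),
    ("STATE", "Ohio",             "stateid('ohio')"),
]

def apply(rule, nl, mr):
    """Replace the leftmost occurrence of the rule's non-terminal on both sides."""
    lhs, nl_rhs, mr_rhs = rule
    return nl.replace(lhs, nl_rhs, 1), mr.replace(lhs, mr_rhs, 1)

nl, mr = "QUERY", "QUERY"
for rule in derivation:
    nl, mr = apply(rule, nl, mr)

print(nl)  # What is the capital of Ohio
print(mr)  # answer(capital(loc_2(stateid('ohio'))))
```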

41 Probabilistic SCFG for Semantic Parsing S (start symbol) = QUERY. L (lexicon) = { STATE → Ohio / stateid('ohio'), QUERY → What is CITY / answer(CITY), CITY → the capital CITY / capital(CITY), CITY → of STATE / loc_2(STATE) }. w (feature weights). Features: f_i(x, d) is the number of times production i is used in derivation d. Log-linear model: P_w(d | x) ∝ exp(w · f(x, d)). Best derivation: d* = argmax_d w · f(x, d).

42 Probabilistic Parsing Model Derivation d1 for "capital of Ohio": CITY → capital CITY / capital(CITY), CITY → of STATE / loc_2(STATE), STATE → Ohio / stateid('ohio'), yielding capital(loc_2(stateid('ohio'))).

43 Probabilistic Parsing Model Derivation d2 for "capital of Ohio": CITY → capital CITY / capital(CITY), CITY → of RIVER / loc_2(RIVER), RIVER → Ohio / riverid('ohio'), yielding capital(loc_2(riverid('ohio'))).

44 Probabilistic Parsing Model The two derivations d1 and d2 differ in the productions STATE → Ohio / stateid('ohio') vs. RIVER → Ohio / riverid('ohio'), each with its own weight λ. Pr(d1 | capital of Ohio) = exp(…) / Z and Pr(d2 | capital of Ohio) = exp(…) / Z, where Z is the normalization constant.

45 Overview of WASP Training: from the unambiguous CFG of the MRL and a training set {(e, f)}, lexical acquisition produces a lexicon L, and parameter estimation produces a parsing model parameterized by λ. Testing: semantic parsing maps an input sentence e' to an output MR f'.

46 Lexical Acquisition SCFG productions are extracted from word alignments between an NL sentence, e, and its correct MR, f, for each training example, (e, f)

47 Word Alignments A mapping from French words to their meanings expressed in English And the program has been implemented Le programme a été mis en application

48 Lexical Acquisition Train a statistical word alignment model (IBM Model 5) on training set Obtain most probable n-to-1 word alignments for each training example Extract SCFG productions from these word alignments Lexicon L consists of all extracted SCFG productions

49 Lexical Acquisition SCFG productions are extracted from word alignments between training sentences and their meaning representations

50 Use of MRL Grammar Sentence: "The goalie should always stay in our half" MR parse (a top-down, left-most derivation of an unambiguous CFG): RULE → (CONDITION DIRECTIVE), CONDITION → (true), DIRECTIVE → (do TEAM {UNUM} ACTION), TEAM → our, UNUM → 1, ACTION → (pos REGION), REGION → (half TEAM), TEAM → our. Word alignments are n-to-1 (several NL words may align to one MR production).

51 Extracting SCFG Productions Sentence: "The goalie should always stay in our half" with the MR parse above. The word "our", aligned to the production TEAM → our, yields the SCFG production TEAM → our / our.

52 Extracting SCFG Productions After replacing "our" with TEAM, the phrase "TEAM half", aligned to REGION → (half TEAM), yields the SCFG production REGION → TEAM half / (half TEAM); the covered MR subtree is REGION → (half our).

53 Extracting SCFG Productions Similarly, "stay in REGION", aligned to ACTION → (pos REGION), yields the SCFG production ACTION → stay in REGION / (pos REGION); the covered MR subtree is ACTION → (pos (half our)).

54 Output SCFG Productions TEAM → our / our; REGION → TEAM half / (half TEAM); ACTION → stay in REGION / (pos REGION); UNUM → goalie / 1; RULE → [the] UNUM should always ACTION / ((true) (do our {UNUM} ACTION)). Phrases can be non-contiguous.

55 Probabilistic Parsing Model Based on a maximum-entropy model. Features f_i(d) are the number of times each transformation rule is used in a derivation d. The output translation is the yield of the most probable derivation.

56 Parameter Estimation Maximum conditional log-likelihood criterion. Since correct derivations are not included in the training data, parameters λ* are iteratively computed that maximize the probability of the training data.
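The transcript omits the formula; as a hedged reconstruction of the criterion named on the slide, the objective can be written as a conditional log-likelihood in which the hidden derivations are summed out:

```latex
% Conditional log-likelihood over training pairs (e, f); derivations d are hidden,
% so the probability of f sums over all derivations whose MR yield is f.
\lambda^{*} = \arg\max_{\lambda} \sum_{(e,f)} \log P_{\lambda}(f \mid e)
            = \arg\max_{\lambda} \sum_{(e,f)} \log \sum_{d \,:\, \mathrm{MR}(d)=f} P_{\lambda}(d \mid e)
```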

57 References
A. Aho, J. Ullman (1972). The Theory of Parsing, Translation, and Compiling. Prentice Hall.
P. Brown, V. Della Pietra, S. Della Pietra, R. Mercer (1993). The mathematics of statistical machine translation: Parameter estimation. Comp. Ling., 19(2).
D. Chiang (2005). A hierarchical phrase-based model for statistical machine translation. In Proc. of ACL, Ann Arbor, MI.
P. Koehn, F. Och, D. Marcu (2003). Statistical phrase-based translation. In Proc. of HLT-NAACL, Edmonton, Canada.
Y. W. Wong, R. Mooney (2006). Learning for semantic parsing with statistical machine translation. In Proc. of HLT-NAACL, New York, NY.

58 References
Y. W. Wong, R. Mooney (2007a). Generation by inverting a semantic parser that uses statistical machine translation. In Proc. of NAACL-HLT, Rochester, NY.
Y. W. Wong, R. Mooney (2007b). Learning synchronous grammars for semantic parsing with lambda calculus. In Proc. of ACL, Prague, Czech Republic.
D. Wu (1997). Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Comp. Ling., 23(3).

Semantic Parsing using CCG

60 Combinatory Categorial Grammar (CCG) Highly structured lexical entries A few general parsing rules (Steedman, 2000; Steedman & Baldridge, 2005) Each lexical entry is a word paired with a category Texas := NP borders := (S \ NP) / NP Mexico := NP New Mexico := NP

61 Parsing Rules (Combinators) Combinators describe how adjacent categories are combined. Functional application: A/B B ⇒ A (>) and B A\B ⇒ A (<). Example for "Texas borders New Mexico": the categories are NP, (S\NP)/NP, NP; forward application (>) gives S\NP, and backward application (<) gives S. (The slide also names forward composition and backward composition.)

62 CCG for Semantic Parsing Extend categories with semantic types (lambda calculus expressions): Texas := NP : texas; borders := (S\NP)/NP : λx.λy.borders(y, x). Functional application with semantics: A/B : f B : a ⇒ A : f(a) (>) and B : a A\B : f ⇒ A : f(a) (<).

63 Sample CCG Derivation "Texas borders New Mexico": Texas := NP : texas, borders := (S\NP)/NP : λx.λy.borders(y, x), New Mexico := NP : new_mexico. Forward application (>) gives S\NP : λy.borders(y, new_mexico); backward application (<) then gives S : borders(texas, new_mexico).
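A tiny Python sketch of this derivation (illustrative only): Python lambdas stand in for the λ-calculus semantics, and `fapply`/`bapply` are hypothetical helpers for forward (>) and backward (<) application, not part of any CCG toolkit.

```python
# Illustrative sketch of the CCG derivation for "Texas borders New Mexico".
# Each constituent is (category string, semantics).
texas   = ("NP", "texas")
nm      = ("NP", "new_mexico")
borders = ("(S\\NP)/NP", lambda x: lambda y: f"borders({y}, {x})")  # λx.λy.borders(y, x)

def fapply(fun, arg):
    """Forward application (>):  A/B : f   B : a  =>  A : f(a)."""
    cat, sem = fun
    return (cat.split("/")[0].strip("()"), sem(arg[1]))

def bapply(arg, fun):
    """Backward application (<):  B : a   A\\B : f  =>  A : f(a)."""
    cat, sem = fun
    return (cat.split("\\")[0], sem(arg[1]))

vp = fapply(borders, nm)   # S\NP : λy.borders(y, new_mexico)
s  = bapply(texas, vp)     # S    : borders(texas, new_mexico)
print(s)                   # ('S', 'borders(texas, new_mexico)')
```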

64 Another Sample CCG Derivation "Texas touches New Mexico": if "touches" is given the same category (S\NP)/NP : λx.λy.borders(y, x), the derivation again yields S : borders(texas, new_mexico).

65 Probabilistic CCG for Semantic Parsing Zettlemoyer & Collins (2005) L (lexicon) = { Texas := NP : texas, borders := (S\NP)/NP : λx.λy.borders(y, x), Mexico := NP : mexico, New Mexico := NP : new_mexico }. w (feature weights). Features: f_i(s, d) is the number of times lexical item i is used in derivation d of sentence s. Log-linear model: P_w(d | s) ∝ exp(w · f(s, d)). Best derivation: d* = argmax_d w · f(s, d), considering all possible derivations d for the sentence s given the lexicon L.
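The log-linear scoring here (and in the SCFG model earlier) reduces to a dot product between feature counts and weights. A minimal sketch follows; the weights and feature counts are made up for illustration and are not from the slides.

```python
# Sketch of log-linear scoring over candidate derivations: P_w(d|s) ∝ exp(w·f(s,d)).
# Feature i counts how often lexical item i is used in derivation d.
import math

w = {"Texas:=NP": 1.2, "borders:=(S\\NP)/NP": 0.8, "Mexico:=NP": -0.5, "New Mexico:=NP": 0.9}

derivations = {
    "d1": {"Texas:=NP": 1, "borders:=(S\\NP)/NP": 1, "New Mexico:=NP": 1},
    "d2": {"Texas:=NP": 1, "borders:=(S\\NP)/NP": 1, "Mexico:=NP": 1},  # wrong split of "New Mexico"
}

def score(feats):
    return sum(w.get(name, 0.0) * count for name, count in feats.items())

Z = sum(math.exp(score(f)) for f in derivations.values())          # normalization constant
probs = {d: math.exp(score(f)) / Z for d, f in derivations.items()}
best = max(derivations, key=lambda d: score(derivations[d]))
print(probs, best)   # d1 gets the higher probability
```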

66 Learning Probabilistic CCG (The slide shows the training pipeline: training sentences & logical forms go into lexical generation, which produces a lexicon L; the CCG parser and parameter estimation then produce the feature weights w.)

67 Lexical Generation Input: "Texas borders New Mexico" with logical form borders(texas, new_mexico). Output lexicon: Texas := NP : texas; borders := (S\NP)/NP : λx.λy.borders(y, x); New Mexico := NP : new_mexico.

68 Lexical Generation Input sentence: Texas borders New Mexico Output substrings: Texas borders New Mexico Texas borders borders New New Mexico Texas borders New … Input logical form: borders(texas, new_mexico) Output categories:

69 Category Rules Input trigger → output category: constant c → NP : c; arity-one predicate p → N : λx.p(x); arity-one predicate p → S\NP : λx.p(x); arity-two predicate p → (S\NP)/NP : λx.λy.p(y, x); arity-two predicate p → (S\NP)/NP : λx.λy.p(x, y); arity-one predicate p → N/N : λg.λx.p(x) ∧ g(x); arity-two predicate p and constant c → N/N : λg.λx.p(x, c) ∧ g(x); arity-two predicate p → (N\N)/NP : λx.λg.λy.p(y, x) ∧ g(x); arity-one function f → NP/N : λg.argmax/min(g(x), λx.f(x)); arity-one function f → S/NP : λx.f(x)

70 Lexical Generation Input sentence: "Texas borders New Mexico"; input logical form: borders(texas, new_mexico). Output substrings: Texas; borders; New; New Mexico; Texas borders; … Output categories: NP : texas; NP : new_mexico; (S\NP)/NP : λx.λy.borders(y, x); (S\NP)/NP : λx.λy.borders(x, y); … Take the cross product (substrings × categories) to form an initial lexicon, and include some domain-independent entries such as What := S/(S\NP)/N : λf.λg.λx.f(x) ∧ g(x).

71 Parameter Estimation A lexicon can lead to multiple parses; the parameters decide the best parse. Maximum conditional likelihood: iteratively find parameters that maximize the probability of the training data. Derivations d are not annotated and are treated as hidden variables. Stochastic gradient ascent (LeCun et al., 1998). Keep only those lexical items that occur in the highest scoring derivations of the training set.
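The formulas are not in the transcript; under the standard log-linear treatment the slide describes (derivations d as hidden variables), the objective and the gradient that stochastic gradient ascent follows take roughly this form:

```latex
% Marginal conditional log-likelihood of logical forms z_i given sentences x_i,
% with hidden derivations d
O(\bar{\theta}) = \sum_{i} \log P(z_i \mid x_i; \bar{\theta})
                = \sum_{i} \log \sum_{d} P(z_i, d \mid x_i; \bar{\theta})
% Gradient for feature j: expected counts given the correct logical form
% minus expected counts over all derivations
\frac{\partial O}{\partial \theta_j}
  = \sum_{i} \Big( E_{P(d \mid x_i, z_i; \bar{\theta})}\!\left[ f_j(x_i, d) \right]
                 - E_{P(d \mid x_i; \bar{\theta})}\!\left[ f_j(x_i, d) \right] \Big)
```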

72 References
Y. LeCun, L. Bottou, Y. Bengio, P. Haffner (1998). Gradient-based learning applied to document recognition. Proc. of the IEEE, 86(11).
M. Steedman (2000). The Syntactic Process. MIT Press.
M. Steedman, J. Baldridge (2005). Combinatory categorial grammar. To appear in: R. Borsley, K. Borjars (eds.) Non-Transformational Syntax, Blackwell.
L. Zettlemoyer, M. Collins (2005). Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proc. of UAI, Edinburgh, Scotland.
L. Zettlemoyer, M. Collins (2007). Online learning of relaxed CCG grammars for parsing to logical form. In Proc. of EMNLP-CoNLL, Prague, Czech Republic.
T. Kwiatkowski, L. Zettlemoyer, S. Goldwater, M. Steedman (2010). Inducing probabilistic CCG grammars from logical form with higher-order unification. In Proc. of EMNLP.

73 Homework 7 Given the following synchronous grammar, parse the sentence "player three should always intercept" and hence find its meaning representation.
TEAM → our / our
TEAM → opponent's / opponent
REGION → TEAM half / (half TEAM)
ACTION → stay in REGION / (pos REGION)
ACTION → intercept / (intercept)
UNUM → goalie / 1
UNUM → player three / 3
RULE → UNUM should always ACTION / ((true) (do our {UNUM} ACTION))

Semantic Parsing Using Kernels

75 KRISP: Kernel-based Robust Interpretation for Semantic Parsing Learns semantic parser from NL sentences paired with their respective MRs given MRL grammar Productions of MRL are treated like semantic concepts A string classifier is trained for each production to estimate the probability of an NL string representing its semantic concept These classifiers are used to compositionally build MRs of the sentences Kate & Mooney (2006), Kate (2008a)

76 Meaning Representation Language MR: answer(traverse(next_to(stateid('texas')))) Productions in its parse tree: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), TRAVERSE → traverse, STATE → NEXT_TO(STATE), NEXT_TO → next_to, STATE → STATEID, STATEID → 'texas'.

77 Overview of KRISP (The slide shows KRISP's architecture: the MRL grammar and NL sentences with MRs go into the semantic parser learner, which produces string classification probabilities used by the semantic parser; the parser maps novel NL sentences to their best MRs.)

78 Semantic Parsing by KRISP The string classifier for each production gives the probability that a substring represents the semantic concept of the production. For "Which rivers run through the states bordering Texas?", the classifier for NEXT_TO → next_to gives probability 0.95 for the relevant substring.

79 Semantic Parsing by KRISP Likewise, the classifier for TRAVERSE → traverse gives the probability that a substring of "Which rivers run through the states bordering Texas?" represents its concept.

80 Semantic Derivation of an NL Sentence Semantic derivation for "Which rivers run through the states bordering Texas?": each node covers an NL substring, using the productions ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), TRAVERSE → traverse, STATE → NEXT_TO(STATE), NEXT_TO → next_to, STATE → STATEID, STATEID → 'texas'.

81 Semantic Derivation of an NL Sentence Substrings in the NL sentence may be in a different order: "Through the states that border Texas which rivers run?" uses the same productions ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), TRAVERSE → traverse, STATE → NEXT_TO(STATE), NEXT_TO → next_to, STATE → STATEID, STATEID → 'texas'.

82 Semantic Derivation of an NL Sentence Nodes are allowed to permute the children productions from the original MR parse. For "Through the states that border Texas which rivers run?": (ANSWER → answer(RIVER), [1..10]), (RIVER → TRAVERSE(STATE), [1..10]), (TRAVERSE → traverse, [7..10]), (STATE → NEXT_TO(STATE), [1..6]), (NEXT_TO → next_to, [1..5]), (STATE → STATEID, [6..6]), (STATEID → 'texas', [6..6]).

83 Semantic Parsing by KRISP Semantic parsing reduces to finding the most probable derivation of the sentence. Efficient dynamic programming algorithm with beam search (extended Earley's algorithm) [Kate & Mooney 2006]. Example: "Which rivers run through the states bordering Texas?" with productions ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), TRAVERSE → traverse, STATE → NEXT_TO(STATE), NEXT_TO → next_to, STATE → STATEID, STATEID → 'texas'. The probability of the derivation is the product of the probabilities at the nodes.
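A minimal sketch of scoring one semantic derivation as the product of its node probabilities, as the slide states. The 0.95 for NEXT_TO → next_to comes from the earlier slide; the other probabilities are made up for illustration, and this is not KRISP's actual code.

```python
# Sketch: derivation probability = product of the classifier probabilities
# at its nodes (production, covered substring, probability).
from math import prod

derivation = [
    ("ANSWER -> answer(RIVER)",  "which rivers run through the states bordering texas", 0.98),
    ("RIVER -> TRAVERSE(STATE)", "which rivers run through the states bordering texas", 0.91),
    ("TRAVERSE -> traverse",     "which rivers run through",                            0.89),
    ("STATE -> NEXT_TO(STATE)",  "the states bordering texas",                          0.93),
    ("NEXT_TO -> next_to",       "bordering",                                           0.95),
    ("STATE -> STATEID",         "texas",                                               0.99),
    ("STATEID -> 'texas'",       "texas",                                               0.99),
]

p = prod(prob for _, _, prob in derivation)
print(f"derivation probability = {p:.3f}")
```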

84 Overview of KRISP Semantic Parser Semantic Parser Learner MRL Grammar NL sentences with MRs String classification probabilities

85 KRISP’s Training Algorithm Takes NL sentences paired with their respective MRs as input Obtains MR parses using MRL grammar Induces the semantic parser and refines it in iterations In the first iteration, for every production: –Call those sentences positives whose MR parses use that production –Call the remaining sentences negatives –Trains Support Vector Machine (SVM) classifier [Cristianini & Shawe-Taylor 2000] using string- subsequence kernel

86 Overview of KRISP Semantic Parser Semantic Parser Learner MRL Grammar NL sentences with MRs

87 Overview of KRISP (Training loop: from the MRL grammar and the NL sentences with MRs, collect positive and negative examples, then train string-kernel-based SVM classifiers used by the semantic parser.)

89 KRISP’s Training Algorithm contd. STATE  NEXT_TO(STATE) which rivers run through the states bordering texas? what is the most populated state bordering oklahoma ? what is the largest city in states that border california ? … what state has the highest population ? what states does the delaware river run through ? which states have cities named austin ? what is the lowest point of the state with the largest area ? … PositivesNegatives String-kernel-based SVM classifier First Iteration

90 Support Vector Machines (SVMs) A very popular machine learning method for classification. Finds the linear separator that maximizes the margin between the classes. Based on computational learning theory, which explains why max-margin is a good approach (Vapnik, 1995). Good at avoiding over-fitting in high-dimensional feature spaces. Performs well on various text and language problems, which tend to be high-dimensional.

91 Picking a Linear Separator Which of the alternative linear separators is best?

92 Classification Margin Consider the distance of points from the separator. Examples closest to the hyperplane are support vectors. The margin ρ of the separator is the width of separation between the classes. (The slide illustrates the margin ρ and the distance r of an example from the hyperplane.)
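The transcript drops the figure; for reference, the standard hard-margin formulation behind "maximize the margin" (the soft-margin variant adds slack terms) is:

```latex
% Hard-margin SVM: maximize the margin 2/||w|| by minimizing ||w||^2,
% subject to every training example lying on the correct side of the hyperplane.
\min_{\mathbf{w},\,b} \ \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
\quad \text{s.t.} \quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 \ \ \forall i,
\qquad \rho = \frac{2}{\lVert \mathbf{w} \rVert}
```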

93 SVM Algorithms Finding the max-margin separator is an optimization problem Algorithms that guarantee an optimal margin take at least O(n 2 ) time and do not scale well to large data sets Approximation algorithms like SVM-light (Joachims, 1999) and SMO (Platt, 1999) allow scaling to realistic problems

94 SVM and Kernels An interesting observation: SVM sees the training and test data only through their dot products This observation has been exploited with the idea of a kernel

95 Kernels SVMs can be extended to learning non-linear separators by using kernel functions. A kernel function is a similarity function between two instances, K(x 1,x 2 ), that must satisfy certain mathematical constraints. A kernel function implicitly maps instances into a higher dimensional feature space where (hopefully) the categories are linearly separable. A kernel-based method (like SVMs) can use a kernel to implicitly operate in this higher-dimensional space without having to explicitly map instances into this much larger (perhaps infinite) space (called “the kernel trick”). Kernels can be defined on non-vector data like strings, trees, and graphs, allowing the application of kernel-based methods to complex, unbounded, non- vector data structures.

96 Non-linear SVMs: Feature spaces General idea: the original feature space can always be mapped to some higher- dimensional feature space where the training set is separable: Φ: x → φ(x)

97 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” K(s,t) = ?

98 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = states K(s,t) = 1+?

99 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = next K(s,t) = 2+?

100 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = to K(s,t) = 3+?

101 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = states next K(s,t) = 4+?

102 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = states to K(s,t) = 5+?

103 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = next to K(s,t) = 6+?

104 String Subsequence Kernel Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = states next to K(s,t) = 7

105 String Subsequence Kernel contd. The kernel is normalized to remove any bias due to different string lengths Lodhi et al. [2002] give O(n|s||t|) algorithm for computing string subsequence kernel
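A small Python sketch of the idea on these slides: it counts the common (possibly gappy) word subsequences of two strings with a dynamic program. The actual kernel of Lodhi et al. (2002) also uses gap-penalty weights and subsequences up to a fixed length, and is then normalized as described above, so treat this as a simplified illustration.

```python
# Count common (non-contiguous) word subsequences of two sentences.
# Simplified: no gap penalties, no length cap (cf. Lodhi et al., 2002).
def common_subsequences(s, t):
    s, t = s.split(), t.split()
    n, m = len(s), len(t)
    # C[i][j] = number of common subsequences (including the empty one)
    # of the prefixes s[:i] and t[:j]
    C = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            C[i][j] = C[i - 1][j] + C[i][j - 1] - C[i - 1][j - 1]
            if s[i - 1] == t[j - 1]:
                C[i][j] += C[i - 1][j - 1]
    return C[n][m] - 1   # drop the empty subsequence

K = common_subsequences("states that are next to", "the states next to")
print(K)   # 7, matching the running example on the slides
```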

106 String Subsequence Kernel The examples are implicitly mapped to the feature space of all subsequences, and the kernel computes the dot products there. (The slide plots training strings for STATE → NEXT_TO(STATE), e.g. "the states next to", "states bordering", "states that border", "states that share border", "state with the capital of", "states through which", "states with area larger than", in this implicit feature space.)

107 Support Vector Machines SVMs find a separating hyperplane such that the margin is maximized. A probability estimate of an example belonging to a class can be obtained using its distance from the hyperplane [Platt, 1999]. (For STATE → NEXT_TO(STATE), the slide shows probabilities such as 0.97 and 0.63 assigned to new strings like "states that are next to" and "next to state".)

108 Support Vector Machines SVMs with the string subsequence kernel softly capture different ways of expressing the semantic concept (same illustration as the previous slide).

109 KRISP’s Training Algorithm contd. STATE  NEXT_TO(STATE) which rivers run through the states bordering texas? what is the most populated state bordering oklahoma ? what is the largest city in states that border california ? … what state has the highest population ? what states does the delaware river run through ? which states have cities named austin ? what is the lowest point of the state with the largest area ? … PositivesNegatives String classification probabilities String-kernel-based SVM classifier First Iteration

110 Overview of KRISP Train string-kernel-based SVM classifiers Semantic Parser Collect positive and negative examples MRL Grammar NL sentences with MRs Training

112 Overview of KRISP Train string-kernel-based SVM classifiers Semantic Parser Collect positive and negative examples MRL Grammar NL sentences with MRs Training String classification probabilities

113 Overview of KRISP Train string-kernel-based SVM classifiers Semantic Parser Collect positive and negative examples MRL Grammar NL sentences with MRs Best MRs (correct and incorrect) Training

114 Overview of KRISP Train string-kernel-based SVM classifiers Semantic Parser Collect positive and negative examples MRL Grammar NL sentences with MRs Best semantic derivations (correct and incorrect) Training

115 KRISP’s Training Algorithm contd. Using these classifiers, it tries to parse the sentences in the training data Some of these derivations will give the correct MR, called correct derivations, some will give incorrect MRs, called incorrect derivations For the next iteration, collect positive examples from correct derivations and negative examples from incorrect derivations Extended Earley’s algorithm can be forced to derive only the correct derivations by making sure all subtrees it generates exist in the correct MR parse Collect negatives from incorrect derivations with higher probability than the most probable correct derivation

116 KRISP's Training Algorithm contd. Most probable correct derivation for "Which rivers run through the states bordering Texas?": (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..4]), (STATE → NEXT_TO(STATE), [5..9]), (STATE → STATEID, [8..9]), (STATEID → 'texas', [8..9]), (NEXT_TO → next_to, [5..7]).

117 KRISP's Training Algorithm contd. Collect positive examples from the most probable correct derivation: (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..4]), (STATE → NEXT_TO(STATE), [5..9]), (STATE → STATEID, [8..9]), (STATEID → 'texas', [8..9]), (NEXT_TO → next_to, [5..7]).

118 KRISP's Training Algorithm contd. An incorrect derivation with probability greater than the most probable correct derivation: (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..7]), (STATE → STATEID, [8..9]), (STATEID → 'texas', [8..9]). Incorrect MR: answer(traverse(stateid('texas'))).

119 KRISP's Training Algorithm contd. Collect negative examples from this incorrect derivation with probability greater than the most probable correct derivation: (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..7]), (STATE → STATEID, [8..9]), (STATEID → 'texas', [8..9]). Incorrect MR: answer(traverse(stateid('texas'))).

120 KRISP's Training Algorithm contd. For "Which rivers run through the states bordering Texas?": Most probable correct derivation: (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..4]), (STATE → NEXT_TO(STATE), [5..9]), (STATE → STATEID, [8..9]), (STATEID → 'texas', [8..9]), (NEXT_TO → next_to, [5..7]). Incorrect derivation: (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..7]), (STATE → STATEID, [8..9]), (STATEID → 'texas', [8..9]). Traverse both trees in breadth-first order till the first nodes where their productions differ are found.

125 KRISP's Training Algorithm contd. Mark the words under these differing nodes in both derivations.

127 KRISP's Training Algorithm contd. Consider all the productions covering the marked words. Collect negatives for productions which cover any marked word in the incorrect derivation but not in the correct derivation.

129 Overview of KRISP Train string-kernel-based SVM classifiers Semantic Parser Collect positive and negative examples MRL Grammar NL sentences with MRs Best semantic derivations (correct and incorrect) Training

131 Overview of KRISP Train string-kernel-based SVM classifiers Semantic Parser Collect positive and negative examples MRL Grammar NL sentences with MRs Best semantic derivations (correct and incorrect) Novel NL sentences Best MRs Testing

132 A Dependency-based Word Subsequence Kernel A word subsequence kernel can count linguistically meaningless subsequences –A fat cat was chased by a dog. –A cat with a red collar was chased two days ago by a fat dog A new kernel that counts only the linguistically meaningful subsequences Kate (2008a)

133 A Dependency-based Word Subsequence Kernel Count the number of common paths in the dependency trees of the two sentences; there is an efficient algorithm to do it. Outperforms the word subsequence kernel on semantic parsing. (The slide shows the dependency trees of the two example sentences.) Kate (2008a)
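A brute-force Python sketch of the path-counting idea (Kate, 2008a, gives an efficient algorithm; here hypothetical toy dependency trees and plain enumeration are used purely to illustrate what counting "common paths" means):

```python
# Illustrative only: enumerate every path between two nodes in each dependency
# tree as a word sequence, then count the paths common to both trees.
def paths(edges, words):
    """All simple paths (as word tuples) between ordered pairs of nodes."""
    adj = {i: set() for i in words}
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    out = set()
    def walk(node, seen):
        for nxt in adj[node] - set(seen):
            out.add(tuple(words[i] for i in seen + [nxt]))
            walk(nxt, seen + [nxt])
    for start in words:
        walk(start, [start])
    return out

# Hypothetical toy dependency trees (edges are head-dependent links).
words1 = {0: "a", 1: "fat", 2: "cat", 3: "was", 4: "chased", 5: "by", 6: "a", 7: "dog"}
edges1 = [(2, 0), (2, 1), (3, 2), (3, 4), (4, 5), (5, 7), (7, 6)]   # "a fat cat was chased by a dog"
words2 = {0: "a", 1: "cat", 2: "was", 3: "chased", 4: "by", 5: "a", 6: "fat", 7: "dog"}
edges2 = [(1, 0), (2, 1), (2, 3), (3, 4), (4, 7), (7, 5), (7, 6)]   # "a cat was chased by a fat dog"

common = paths(edges1, words1) & paths(edges2, words2)
print(len(common))   # number of common dependency paths
```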

134 References
H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, C. Watkins (2002). Text classification using string kernels. Journal of Machine Learning Research, 2.
J. C. Platt (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers, MIT Press.
R. J. Kate and R. J. Mooney (2006). Using string-kernels for learning semantic parsers. In Proc. of COLING/ACL-2006, Sydney, Australia, July 2006.
R. J. Kate (2008a). A dependency-based word subsequence kernel. In Proc. of EMNLP-2008, Waikiki, Honolulu, Hawaii, October 2008.
R. J. Kate (2008b). Transforming meaning representation grammars to improve semantic parsing. In Proc. of CoNLL-2008, Manchester, UK, August 2008.

Exploiting Syntax for Semantic Parsing

136 Semantic Composition that Integrates Syntax and Semantics to get Optimal Representations Integrated syntactic-semantic parsing –Allows both syntax and semantics to be used simultaneously to obtain an accurate combined syntactic-semantic analysis A statistical parser is used to generate a semantically augmented parse tree (SAPT) SCISSOR Ge & Mooney (2005)

137 Syntactic Parse (The slide shows the syntactic parse tree of "our player 2 has the ball", with POS tags PRP$, NN, CD, VB, DT, NN and phrases NP, VP, S.)

138 SAPT Non-terminals now have both syntactic and semantic labels, e.g. PRP$-P_OUR, NN-P_PLAYER, CD-P_UNUM, VB-P_BOWNER, DT-NULL, NN-NULL, NP-P_PLAYER, VP-P_BOWNER, S-P_BOWNER for "our player 2 has the ball".

139 SAPT The MR is composed from this SAPT, yielding MR: (bowner (player our {2})).

140 SCISSOR Overview (Training: SAPT training examples are fed to a learner, which produces the integrated semantic parser.)

141 SCISSOR Overview (Testing: an NL sentence is parsed by the integrated semantic parser into a SAPT, from which the MR is composed.)

142 Integrated Syntactic-Semantic Parsing Find a SAPT with the maximum probability A lexicalized head-driven syntactic parsing model Extended Collins (1997) syntactic parsing model to generate semantic labels simultaneously with syntactic labels Smoothing –Each label in SAPT is the combination of a syntactic label and a semantic label which increases data sparsity –Break the parameters down into syntactic label and semantic label given syntactic label

143 Experimental Corpora CLang ( Kate, Wong & Mooney, 2005 ) –300 pieces of coaching advice –22.52 words per sentence Geoquery ( Zelle & Mooney, 1996 ) –880 queries on a geography database –7.48 word per sentence –MRL: Prolog and FunQL

144 Prolog vs. FunQL (Wong & Mooney, 2007b) "What are the rivers in Texas?" Prolog: answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas)))) FunQL: answer(river(loc_2(stateid(texas)))) Logical forms are widely used as MRLs in computational semantics and support reasoning.

145 Prolog vs. FunQL (Wong & Mooney, 2007b) Prolog: answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas)))) allows a flexible argument order, while FunQL: answer(river(loc_2(stateid(texas)))) imposes a strict order; the flexible order gives better generalization on Prolog.

146 Experimental Methodology Standard 10-fold cross validation Correctness –CLang: exactly matches the correct MR –Geoquery: retrieves the same answers as the correct MR Metrics –Precision: % of the returned MRs that are correct –Recall: % of NLs with their MRs correctly returned –F-measure: harmonic mean of precision and recall
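As a quick reference for the metrics defined on this slide, a minimal computation with made-up counts (not from the papers):

```python
# Evaluation metrics used on the following result slides (illustrative counts).
returned_correct = 70     # returned MRs that are correct
returned_total   = 80     # MRs the system returned at all
sentences_total  = 100    # all test sentences

precision = returned_correct / returned_total          # % of returned MRs that are correct
recall    = returned_correct / sentences_total         # % of NLs with their MRs correctly returned
f_measure = 2 * precision * recall / (precision + recall)
print(f"P={precision:.3f} R={recall:.3f} F={f_measure:.3f}")
```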

147 Compared Systems COCKTAIL (Tang & Mooney, 2001) –Deterministic, inductive logic programming WASP (Wong & Mooney, 2006) –Semantic grammar, machine translation KRISP (Kate & Mooney, 2006) –Semantic grammar, string kernels Z&C (Zettlemoyer & Collins, 2007) –Syntax-based, combinatory categorial grammar (CCG) LU (Lu et al., 2008) –Semantic grammar, generative parsing model SCISSOR (Ge & Mooney, 2005) –Integrated syntactic-semantic parsing

148 Compared Systems COCKTAIL (Tang & Mooney, 2001) –Deterministic, inductive logic programming WASP (Wong & Mooney, 2006) –Semantic grammar, machine translation KRISP (Kate & Mooney, 2006) –Semantic grammar, string kernels Z&C (Zettlemoyer & Collins, 2007) –Syntax-based, combinatory categorial grammar (CCG); uses a hand-built lexicon for Geoquery (a small part of the lexicon) LU (Lu et al., 2008) –Semantic grammar, generative parsing model SCISSOR (Ge & Mooney, 2005) –Integrated syntactic-semantic parsing

149 Compared Systems COCKTAIL (Tang & Mooney, 2001) –Deterministic, inductive logic programming WASP (Wong & Mooney, 2006, 2007b) –Semantic grammar, machine translation; λ-WASP handles logical forms KRISP (Kate & Mooney, 2006) –Semantic grammar, string kernels Z&C (Zettlemoyer & Collins, 2007) –Syntax-based, combinatory categorial grammar (CCG) LU (Lu et al., 2008) –Semantic grammar, generative parsing model SCISSOR (Ge & Mooney, 2005) –Integrated syntactic-semantic parsing

150 Results on CLang (Table of precision, recall, and F-measure for COCKTAIL, SCISSOR, WASP, KRISP, Z&C, and LU; COCKTAIL ran out of memory and Z&C's results were not reported. LU's F-measure after reranking is 74.4%.)

151 Results on CLang (Table of precision, recall, and F-measure for SCISSOR, WASP, KRISP, and LU; LU's F-measure after reranking is 74.4%.)

152 Results on Geoquery (Table of precision, recall, and F-measure; SCISSOR, WASP, KRISP, and LU use FunQL, while COCKTAIL, λ-WASP, and Z&C use Prolog. LU's F-measure after reranking is 85.2%.)

153 Results on Geoquery (FunQL) (Precision, Recall, F-measure per system): SCISSOR, WASP, KRISP, LU, all competitive (LU: F-measure after reranking is 85.2%)

154 When the Prior Knowledge of Syntax Does Not Help Geoquery: 7.48 words per sentence Short sentences –Sentence structure can be feasibly learned from NLs paired with MRs Gain from knowledge of syntax vs. flexibility loss

155 Limitation of Using Prior Knowledge of Syntax What state is the smallest N1 N2 answer(smallest(state(all))) Traditional syntactic analysis

156 Limitation of Using Prior Knowledge of Syntax What state is the smallest N1 N2 answer(smallest(state(all))) Traditional syntactic analysis vs. semantic grammar: the semantic grammar's parse is isomorphic to the MR structure, giving better generalization

157 When the Prior Knowledge of Syntax Does Not Help Geoquery: 7.48 words per sentence Short sentences –Sentence structure can be feasibly learned from NLs paired with MRs Gain from knowledge of syntax vs. flexibility loss

158 CLang Results by Sentence Length (chart over sentence-length buckets containing 7%, 33%, 46%, and 13% of sentences, the shortest bucket being 0-10 words) Knowledge of syntax improves performance on long sentences

159 SynSem SCISSOR requires extra SAPT annotation for training Must learn both syntax and semantics from the same limited training corpus High-performance syntactic parsers trained on existing large corpora are available (Collins, 1997; Charniak & Johnson, 2005) Ge & Mooney (2009)

160 SCISSOR Requires SAPT Annotation SAPT for "our player 2 has the ball": our/PRP$-P_OUR player/NN-P_PLAYER 2/CD-P_UNUM has/VB-P_BOWNER the/DT-NULL ball/NN-NULL, with phrases NP-P_PLAYER, NP-NULL, VP-P_BOWNER, S-P_BOWNER Time consuming. Automate it!
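A SAPT can be pictured as an ordinary parse tree whose nodes carry a semantic label alongside the syntactic one. The sketch below is an illustrative data structure for such trees, not the actual annotation format used to train SCISSOR.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SAPTNode:
    syn: str                          # syntactic label, e.g. "NP"
    sem: str                          # semantic label, e.g. "P_PLAYER" or "NULL"
    word: Optional[str] = None        # set only on leaves
    children: Optional[List["SAPTNode"]] = None

# the subject NP of "our player 2 has the ball"
player_np = SAPTNode("NP", "P_PLAYER", children=[
    SAPTNode("PRP$", "P_OUR", word="our"),
    SAPTNode("NN", "P_PLAYER", word="player"),
    SAPTNode("CD", "P_UNUM", word="2"),
])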

161 SynSem Overview (Ge & Mooney, 2009): NL sentence → syntactic parser → syntactic parse; semantic lexicon → multiple word alignments; composition rules → multiple SAPTs; disambiguation model → best SAPT

162 SynSem Training: Learn Semantic Knowledge (NL sentence → syntactic parser → syntactic parse; semantic lexicon → multiple word alignments; composition rules)

163 Syntactic Parser Syntactic parse of "our player 2 has the ball": our/PRP$ player/NN 2/CD has/VB the/DT ball/NN, with phrases NP, VP, NP, S Use a statistical syntactic parser

164 SynSem Training: Learn Semantic Knowledge (NL sentence and MR → syntactic parse → multiple word alignments → semantic lexicon and composition rules)

165 Semantic Lexicon Word alignment between "our player 2 has the ball" and the predicates P_OUR, P_PLAYER, P_UNUM, P_BOWNER (unaligned words map to NULL) Use a word alignment model (Wong & Mooney, 2006)

166 Learning a Semantic Lexicon IBM Model 5 word alignment (GIZA++) Top 5 word/predicate alignments for each training example Assume each word alignment and syntactic parse defines a possible SAPT for composing the correct MR
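A minimal sketch of how top word/predicate alignments might be collapsed into a semantic lexicon. The input format below (word list, predicate list, index links) is an assumption standing in for GIZA++ output, not the exact SynSem pipeline.

from collections import Counter, defaultdict

def build_lexicon(alignments, min_count=1):
    # alignments: list of (words, predicates, links), where each link is a
    # (word_index, predicate_index) pair taken from a word alignment model
    counts = defaultdict(Counter)
    for words, predicates, links in alignments:
        for wi, pi in links:
            counts[words[wi]][predicates[pi]] += 1
    return {w: c.most_common() for w, c in counts.items()
            if sum(c.values()) >= min_count}

example = [(["our", "player", "2", "has", "the", "ball"],
            ["P_OUR", "P_PLAYER", "P_UNUM", "P_BOWNER"],
            [(0, 0), (1, 1), (2, 2), (3, 3)])]
print(build_lexicon(example))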

167 SynSem Training: Learn Semantic Knowledge (NL sentence and MR → syntactic parse → multiple word alignments → semantic lexicon and composition rules)

168 Introduce λ Variables Introduce λ variables in semantic labels for missing arguments (a1: the first argument) Labels over "our player 2 has the ball": our → P_OUR, player → λa1λa2.P_PLAYER, 2 → P_UNUM, has → λa1.P_BOWNER, the/ball → NULL (phrases: NP, NP, VP, S)

169 Internal Semantic Labels How to choose the dominant predicates for the internal nodes (NP, VP, S)? Leaf labels: P_OUR, λa1λa2.P_PLAYER, P_UNUM, λa1.P_BOWNER, NULL; candidate predicates from the correct MR: P_BOWNER, P_PLAYER, P_UNUM, P_OUR

170 Collect Semantic Composition Rules Combining "player" (λa1λa2.P_PLAYER) and "2" (P_UNUM): λa1λa2.P_PLAYER + P_UNUM → {λa1.P_PLAYER, a2=c2} (c2: child 2)

171 Collect Semantic Composition Rules Applying λa1λa2.P_PLAYER + P_UNUM → {λa1.P_PLAYER, a2=c2}: the node over "player 2" gets label λa1.P_PLAYER; the next parent node is still undetermined

172 Collect Semantic Composition Rules Combining "our" (P_OUR) with λa1.P_PLAYER: P_OUR + λa1.P_PLAYER → {P_PLAYER, a1=c1}; the NP over "our player 2" gets label P_PLAYER

173 Collect Semantic Composition Rules The VP over "has the ball" gets label λa1.P_BOWNER (NULL children contribute nothing); the root label is still undetermined

174 Collect Semantic Composition Rules Combining the subject NP (P_PLAYER) with the VP (λa1.P_BOWNER): P_PLAYER + λa1.P_BOWNER → {P_BOWNER, a1=c1}; the root S gets label P_BOWNER
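The collected rules can be thought of as a lookup from pairs of child semantic labels to a parent label plus argument bindings ("a1=c1" meaning the first child fills the parent's first missing argument). The encoding below is illustrative, not SynSem's internal representation.

# hypothetical rule table: (left child label, right child label) -> (parent label, bindings)
RULES = {
    ("λa1λa2.P_PLAYER", "P_UNUM"): ("λa1.P_PLAYER", {"a2": "c2"}),
    ("P_OUR", "λa1.P_PLAYER"): ("P_PLAYER", {"a1": "c1"}),
    ("P_PLAYER", "λa1.P_BOWNER"): ("P_BOWNER", {"a1": "c1"}),
}

def compose(left_label, right_label):
    if (left_label, right_label) not in RULES:
        raise ValueError("no composition rule for %s + %s" % (left_label, right_label))
    return RULES[(left_label, right_label)]

# bottom-up composition for "our player 2 has the ball"
label, binds = compose("λa1λa2.P_PLAYER", "P_UNUM")   # -> λa1.P_PLAYER
label, binds = compose("P_OUR", label)                # -> P_PLAYER
label, binds = compose(label, "λa1.P_BOWNER")         # -> P_BOWNER
print(label)                                          # P_BOWNER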

175 Ensuring Meaning Composition What state is the smallest N1 N2 answer(smallest(state(all))) Non-isomorphism

176 Ensuring Meaning Composition Non-isomorphism between the NL parse and the MR parse arises from: –Various linguistic phenomena –Word alignment between NL and MRL –Use of automated syntactic parses Introduce macro-predicates that combine multiple predicates Ensure that the MR can be composed using a syntactic parse and word alignment

177 SynSem Training: Learn Disambiguation Model (syntactic parse, multiple word alignments, semantic lexicon, and composition rules → multiple SAPTs; those composing the correct MR are the correct SAPTs used to train the disambiguation model)

178 Parameter Estimation Apply the learned semantic knowledge to all training examples to generate possible SAPTs Use a standard maximum-entropy model similar to those of Zettlemoyer & Collins (2005) and Wong & Mooney (2006) Training finds parameters that (approximately) maximize the sum of the conditional log-likelihoods of the training set, including syntactic parses Incomplete data, since SAPTs are hidden variables
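A toy sketch of the hidden-variable objective: for each example, the model maximizes the log-probability mass assigned to SAPTs that compose the correct MR relative to all SAPTs. Enumerating derivations explicitly, as below, is only feasible for illustration; the real system relies on dynamic programming and approximate optimization.

import math

def log_likelihood(theta, examples):
    # theta: feature name -> weight
    # examples: list of (all_derivs, good_derivs), each derivation a feature-count dict;
    # good_derivs are the derivations that yield the correct MR
    def logsumexp(derivs):
        scores = [sum(theta.get(f, 0.0) * v for f, v in d.items()) for d in derivs]
        m = max(scores)
        return m + math.log(sum(math.exp(s - m) for s in scores))

    total = 0.0
    for all_derivs, good_derivs in examples:
        total += logsumexp(good_derivs) - logsumexp(all_derivs)
    return total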

179 Features Lexical features: –Unigram features: # of times a word is assigned a predicate –Bigram features: # of times a word is assigned a predicate given its previous/subsequent word Rule features: # of times a composition rule is applied in a derivation
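These feature templates might be instantiated roughly as follows for a single candidate derivation; the exact templates and their names here are assumptions for illustration.

from collections import Counter

def extract_features(words, word_predicates, rules_used):
    # words: the sentence tokens; word_predicates: predicate assigned to each word
    # rules_used: composition rules applied in this derivation
    feats = Counter()
    for i, (w, p) in enumerate(zip(words, word_predicates)):
        prev_w = words[i - 1] if i > 0 else "<s>"
        next_w = words[i + 1] if i + 1 < len(words) else "</s>"
        feats[("unigram", w, p)] += 1
        feats[("bigram_prev", prev_w, w, p)] += 1
        feats[("bigram_next", w, next_w, p)] += 1
    for rule in rules_used:
        feats[("rule", rule)] += 1
    return feats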

180 SynSem Testing (NL sentence → syntactic parse → multiple word alignments → multiple SAPTs → best SAPT, using the semantic lexicon, composition rules, and disambiguation model)

181 Syntactic Parsers (Bikel, 2004) WSJ only –CLang (SYN0): F-measure=82.15% –Geoquery (SYN0): F-measure=76.44% WSJ + in-domain sentences –CLang (SYN20): 20 sentences, F-measure=88.21% –Geoquery (SYN40): 40 sentences, F-measure=91.46% Gold-standard syntactic parses (GOLDSYN)

182 Questions Q1. Can SynSem produce accurate semantic interpretations? Q2. Can more accurate Treebank syntactic parsers produce more accurate semantic parsers?

183 Results on CLang (Precision, Recall, F-measure per system): SynSem variants GOLDSYN, SYN20, SYN0; SAPT-based SCISSOR; WASP, KRISP, LU. GOLDSYN > SYN20 > SYN0 (LU: F-measure after reranking is 74.4%)

184 Questions Q1. Can SynSem produce accurate semantic interpretations? [yes] Q2. Can more accurate Treebank syntactic parsers produce more accurate semantic parsers? [yes] Q3. Does it also improve on long sentences?

185 Detailed CLang Results by Sentence Length (chart over sentence-length buckets containing 7%, 33%, 46%, and 13% of sentences, the shortest bucket being 0-10 words) Prior knowledge + syntactic error + flexibility = ?

186 Questions Q1. Can SynSem produce accurate semantic interpretations? [yes] Q2. Can more accurate Treebank syntactic parsers produce more accurate semantic parsers? [yes] Q3. Does it also improve on long sentences? [yes] Q4. Does it improve on limited training data due to the prior knowledge from large treebanks?

187 Results on CLang (training size = 40) (Precision, Recall, F-measure per system): SynSem variants GOLDSYN, SYN20, SYN0; SAPT-based SCISSOR; WASP, KRISP. The quality of the syntactic parser is critically important!

188 Questions Q1. Can SynSem produce accurate semantic interpretations? [yes] Q2. Can more accurate Treebank syntactic parsers produce more accurate semantic parsers? [yes] Q3. Does it also improve on long sentences? [yes] Q4. Does it improve on limited training data due to the prior knowledge from large treebanks? [yes]

189 References Bikel (2004). Intricacies of Collins' parsing model. Computational Linguistics, 30(4). Eugene Charniak and Mark Johnson (2005). Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proc. of ACL-2005, Ann Arbor, MI, June 2005. Michael Collins (1997). Three generative, lexicalized models for syntactic parsing. In Proc. of ACL-97, pages 16-23, Madrid, Spain. Ruifang Ge and Raymond J. Mooney (2005). A statistical semantic parser that integrates syntax and semantics. In Proc. of CoNLL-2005, pp. 9-16, Ann Arbor, MI, June 2005. Ruifang Ge and Raymond J. Mooney (2006). Discriminative reranking for semantic parsing. In Proc. of COLING/ACL-2006, Sydney, Australia, July 2006. Ruifang Ge and Raymond J. Mooney (2009). Learning a compositional semantic parser using an existing syntactic parser. In Proc. of ACL-2009, Suntec, Singapore, August 2009.

Underlying Commonalities and Differences between Semantic Parsers

191 Underlying Commonalities between Semantic Parsers A model to connect language and meaning

192 A Model to Connect Language and Meaning Zettlemoyer & Collins: CCG grammar with semantic types WASP: Synchronous CFG KRISP: Probabilistic string classifiers borders := (S \ NP) / NP : λx.λy.borders(y, x) QUERY → What is CITY / answer(CITY) Which rivers run through the states bordering Texas? NEXT_TO → next_to (0.95)

193 A Model to Connect Language and Meaning Lu et al.: Hybrid tree, hybrid patterns: a hybrid tree T pairs the NL sentence w ("How many states do not have rivers?") with the MR m built from QUERY:answer(NUM), NUM:count(STATE), STATE:exclude(STATE STATE), STATE:state(all), STATE:loc_1(RIVER), RIVER:river(all) SCISSOR/SynSem: Semantically annotated parse trees (SAPT for "our player 2 has the ball" with labels such as PRP$-P_OUR, NN-P_PLAYER, CD-P_UNUM, VB-P_BOWNER, NP-P_PLAYER, VP-P_BOWNER, S-P_BOWNER)

194 A model to connect language and meaning A mechanism for meaning composition Underlying Commonalities between Semantic Parsers

195 A Mechanism for Meaning Composition Zettlemoyer & Collins: CCG parsing rules WASP: Meaning representation grammar KRISP: Meaning representation grammar Lu et al.: Meaning representation grammar SCISSOR: Semantically annotated parse trees SynSem: Syntactic parses

196 A model to connect language and meaning A mechanism for meaning composition Parameters for selecting a meaning representation out of many Underlying Commonalities between Semantic Parsers

197 Parameters to Select a Meaning Representation Zettlemoyer & Collins: Weights for lexical items and CCG parsing rules WASP: Weights for grammar productions KRISP: SVM weights Lu et al.: Generative model parameters SCISSOR: Parsing model weights SynSem: Parsing model weights

198 A model to connect language and meaning A mechanism for meaning composition Parameters for selecting a meaning representation out of many An iterative EM-like method for training to find the right associations between NL and MR components Underlying Commonalities between Semantic Parsers

199 An Iterative EM-like Method for Training Zettlemoyer & Collins: Stochastic gradient ascent, structured perceptron WASP: Quasi-Newton method (L-BFGS) KRISP: Re-parse the training sentences to find more refined positive and negative examples Lu et al.: Inside-outside algorithm with EM SCISSOR: None SynSem: Quasi-Newton method (L-BFGS)

200 A model to connect language and meaning A mechanism for meaning composition Parameters for selecting a meaning representation out of many An iterative EM-like loop in training to find the right associations between NL and MR components A generalization mechanism Underlying Commonalities between Semantic Parsers

201 A Generalization Mechanism Zettlemoyer & Collins: Different combinations of lexicon items and parsing rules WASP: Different combinations of productions KRISP: Different combinations of meaning productions, string similarity Lu et al.: Different combinations of hybrid patterns SCISSOR: Different combinations of parsing productions SynSem: Different combinations of parsing productions

202 A model to connect language and meaning A mechanism for meaning composition Parameters for selecting a meaning representation out of many An iterative EM-like loop in training to find the right associations between NL and MR components A generalization mechanism Underlying Commonalities between Semantic Parsers

203 Differences between Semantic Parsers Learn lexicon or not

204 Learn Lexicon or Not Learn it as a first step: –Zettlemoyer & Collins –WASP –SynSem –COCKTAIL Do not learn it: –KRISP –Lu et al. –SCISSOR

205 Differences between Semantic Parsers Learn lexicon or not Utilize knowledge of natural language or not

206 Utilize Knowledge of Natural Language or Not Utilize knowledge of English syntax: –Zettlemoyer & Collins: CCG –SCISSOR: Phrase Structure Grammar –SynSem: Phrase Structure Grammar Leverages knowledge of natural language Do not utilize: –WASP –KRISP –Lu et al. Portable to other natural languages

207 Precision Learning Curve for GeoQuery (WASP)

208 Recall Learning Curve for GeoQuery (WASP)

209 Differences between Semantic Parsers Learn lexicon or not Utilize general syntactic parsing grammars or not Use matching patterns or not

210 Use Matching Patterns or Not Use matching patterns: –Zettlemoyer & Collins –WASP –Lu et al. –SCISSOR/SynSem The systems can be inverted to form a generation system, e.g. WASP^-1 Do not use matching patterns: –KRISP Makes it robust to noise

211 Robustness of KRISP KRISP does not use matching patterns String-kernel-based classification softly captures a wide range of natural language expressions → Robust to rephrasing and noise Which rivers run through the states bordering Texas? TRAVERSE → traverse (0.95)

212 Robustness of KRISP KRISP does not use matching patterns String-kernel-based classification softly captures a wide range of natural language expressions → Robust to rephrasing and noise Which rivers through the states bordering Texas? TRAVERSE → traverse (0.65)
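For intuition about why subsequence-based similarity degrades gracefully, the sketch below counts common (possibly gapped) word subsequences of a fixed length between two phrases. KRISP's actual kernel additionally down-weights subsequences by the size of the gaps they span, so this is a simplified variant.

from functools import lru_cache

def subseq_kernel(s, t, n):
    # count common (possibly gapped) word subsequences of length n in s and t,
    # summing over all embeddings (i.e. the gap-weighted kernel with decay 1)
    s, t = tuple(s), tuple(t)

    @lru_cache(maxsize=None)
    def k(i, j, r):
        if r == 0:
            return 1
        if len(s) - i < r or len(t) - j < r:
            return 0
        return sum(k(a + 1, b + 1, r - 1)
                   for a in range(i, len(s))
                   for b in range(j, len(t))
                   if s[a] == t[b])

    return k(0, 0, n)

print(subseq_kernel("rivers run through the states".split(),
                    "rivers through the states".split(), 2))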

213 Experiments with Noisy NL Sentences (contd.) Noise was introduced in the NL sentences by: –Adding extra words chosen according to their frequencies in the BNC –Dropping words randomly –Substituting words with phonetically close high-frequency words Four levels of noise were created by increasing the probabilities of the above We show best F-measures (harmonic mean of precision and recall)
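A rough sketch of that corruption process; the word list and substitution table below are placeholders, not the actual BNC frequency data or phonetic model used in the experiments, and raising the probabilities gives successively noisier levels.

import random

frequent_words = ["the", "of", "and", "to", "a"]          # stand-in for BNC frequencies
phonetic_neighbors = {"ball": "bell", "player": "prayer"}  # stand-in for a phonetic model

def add_noise(words, p_insert=0.1, p_drop=0.1, p_subst=0.1):
    out = []
    for w in words:
        if random.random() < p_insert:       # insert an extra high-frequency word
            out.append(random.choice(frequent_words))
        if random.random() < p_drop:         # drop the word
            continue
        if random.random() < p_subst and w in phonetic_neighbors:
            w = phonetic_neighbors[w]         # substitute a phonetically close word
        out.append(w)
    return out

print(add_noise("our player 2 has the ball".split()))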

214 Results on Noisy CLang Corpus

215 Differences between Semantic Parsers Learn lexicon or not Utilize general syntactic parsing grammars or not Use matching patterns or not Different amounts of supervision

216 Different Amounts of Supervision Zettlemoyer & Collins –NL-MR pairs, CCG category rules, a small number of predefined lexical items WASP, KRISP, Lu et al. –NL-MR pairs, MR grammar SCISSOR –Semantically annotated parse trees SynSem –NL-MR pairs, syntactic parser

217 Differences between Semantic Parsers Learn lexicon or not Utilize general syntactic parsing grammars or not Use matching patterns or not Different forms of supervision

218 Conclusions Semantic parsing maps NL sentences to completely formal MRs. Semantic parsing is an interesting and challenging task, and is critical for developing computing systems that can understand and process natural language input. Semantic parsers can be effectively learned from supervised corpora consisting of only sentences paired with their formal MRs (and possibly also SAPTs). The state of the art in semantic parsing has been significantly advanced in recent years using a variety of statistical machine learning techniques and grammar formalisms.

219 Resources ACL 2010 Tutorial: parsing-tutorial-acl10.ppt Geoquery and CLang data: WASP and KRISP semantic parsers: Geoquery Demo:

220 Homework 8 What features will an SVM with a string subsequence kernel consider implicitly when considering the example string "near our goal"? And when considering the example string "near goal area"? What will it explicitly take as the only information about the two examples when it considers them together?