1 Textual Entailment: A Perspective on Applied Text Understanding
Ido Dagan, Bar-Ilan University, Israel
Joint works with: Oren Glickman, Idan Szpektor, Roy Bar-Haim (Bar-Ilan University, Israel); Maayan Geffet (Hebrew University, Israel); Hristo Tanev, Bernardo Magnini, Alberto Lavelli, Lorenza Romano (ITC-irst, Italy); Bonaventura Coppola and Milen Kouylekov (University of Trento and ITC-irst, Italy)
2 Talk Focus: A Framework for "Applied Semantics"
–The textual entailment task: what and why?
–Empirical evaluation: PASCAL RTE Challenge
–Problem scope, decomposition and analysis
–Different perspective on semantic inference
–Probabilistic framework
Cf. syntax, MT: clear task, methodology and community
3 Natural Language and Meaning [Diagram: the mapping between Meaning and Language, characterized by Ambiguity and Variability]
4 Variability of Semantic Expression
–Dow ends up
–Dow climbs 255
–The Dow Jones Industrial Average closed up 255
–Stock market hits a record high
–Dow gains 255 points
–All major stock markets surged
5 Variability Recognition – Major Inference in Applications Information Retrieval (IR) Question Answering (QA) Multi Document Summarization (MDS) Information Extraction (IE)
6 Typical Application Inference
Question: Who bought Overture? Expected answer form: X bought Overture
Text (t): Overture's acquisition by Yahoo ⇒ hypothesized answer (h): Yahoo bought Overture
–Similar for IE: X buy Y
–Similar for "semantic" IR: t: Overture was bought …
–Summarization (multi-document): identify redundant info
–MT evaluation (and recent proposals for MT?)
7 KRAQ'05 Workshop - KNOWLEDGE and REASONING for ANSWERING QUESTIONS (IJCAI-05) CFP: –Reasoning aspects: * information fusion, * search criteria expansion models * summarization and intensional answers, * reasoning under uncertainty or with incomplete knowledge, –Knowledge representation and integration: * levels of knowledge involved (e.g. ontologies, domain knowledge), * knowledge extraction models and techniques to optimize response accuracy, * coherence and integration.
8 Inference for Textual Question Answering Workshop (AAAI-05) CFP:
–abductions, default reasoning, inference with epistemic logic or description logic
–inference methods for QA need to be robust, cover all ambiguities of language
–available knowledge sources that can be used for inference
… but similar needs for other applications: can we address a uniform empirical task?
9 Applied Textual Entailment: Abstract Semantic Variability Inference
QA: "Where was John Wayne born?" –Answer: Iowa
Text (t): The birthplace of John Wayne is in Iowa ⇒ (inference) Hypothesis (h): John Wayne was born in Iowa
10 The Generic Entailment Task
Text (t): The birthplace of John Wayne is in Iowa ⇒ (inference) Hypothesis (h): John Wayne was born in Iowa
Given the text t, can we infer that h is (most likely) true?
11 Classical Entailment Definition. Chierchia & McConnell-Ginet (2001): A text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true. Strict entailment: it doesn't account for the uncertainty that applications allow.
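For propositional formulas the classical definition can be checked mechanically: enumerate every possible world (truth assignment) and verify that h holds wherever t holds. A minimal Python sketch, assuming texts have already been mapped to hypothetical boolean formulas over a few named atoms; the atom names and the background-knowledge encoding are illustrative only.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List

World = Dict[str, bool]            # one possible world: atom -> truth value
Formula = Callable[[World], bool]  # a proposition evaluated in a world


def all_worlds(atoms: List[str]) -> Iterator[World]:
    """Enumerate every truth assignment over the given atoms."""
    for values in product([True, False], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def strictly_entails(t: Formula, h: Formula, atoms: List[str]) -> bool:
    """Classical entailment: h is true in every world in which t is true."""
    return all(h(w) for w in all_worlds(atoms) if t(w))


# Hypothetical toy encoding, with background knowledge "born in Paris -> born in France".
ATOMS = ["born_in_paris", "born_in_france"]


def text(w: World) -> bool:
    # t: "John was born in Paris", conjoined with the background knowledge
    return w["born_in_paris"] and (not w["born_in_paris"] or w["born_in_france"])


def hypothesis(w: World) -> bool:
    # h: "John was born in France"
    return w["born_in_france"]


print(strictly_entails(text, hypothesis, ATOMS))  # True
print(strictly_entails(hypothesis, text, ATOMS))  # False
```

A single world in which t holds and h fails is enough to break strict entailment, which is why "almost certain" inferences fall outside this definition.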
12 “Almost certain” Entailments t: The technological triumph known as GPS … was incubated in the mind of Ivan Getting. h: Ivan Getting invented the GPS. t: According to the Encyclopedia Britannica, Indonesia is the largest archipelagic nation in the world, consisting of 13,670 islands. h: 13,670 islands make up Indonesia.
13 Textual Entailment ≈ Human Reading Comprehension From a children’s English learning book (Sela and Greenberg): Reference Text: “…The Bermuda Triangle lies in the Atlantic Ocean, off the coast of Florida. …” Hypothesis (True/False?): The Bermuda Triangle is near the United States ???
14 Reading Comprehension QA (from the Canadian Broadcasting Corporation) T: The school has turned its one-time metal shop – lost to budget cuts almost two years ago - into a money-making professional fitness club. Q: When did the metal shop close? A: Almost two years ago
15 Recognizing Textual Entailment (RTE) Challenge, PASCAL NOE Challenge. Ido Dagan, Oren Glickman, Bar-Ilan University, Israel; Bernardo Magnini, ITC-irst, Trento, Italy
16 Generic Dataset by Application Use
–QA
–IE
–Similar for "semantic" IR: Overture was acquired by Yahoo
–Comparable documents (summarization)
–MT evaluation
–Reading comprehension
–Paraphrase acquisition
17 Some Examples
# | Text | Hypothesis | Task | Entailment
1 | iTunes software has seen lower sales in Europe. | Strong sales for iTunes in Europe. | IR | False
2 | Cavern Club sessions paid the Beatles £15 evenings and £5 lunchtime. | The Beatles perform at Cavern Club at lunchtime. | IR | True
3 | …: a shootout at the Guadalajara airport in May, 1993, that killed Cardinal Juan Jesus Posadas Ocampo and six others. | Cardinal Juan Jesus Posadas Ocampo died in … | QA | True
567 development examples, 800 test examples
18 Dataset Characteristics
–Examples selected and annotated manually, using automatic systems where available
–Balanced True/False split
–True: certain or highly probable entailment; controversial examples filtered out
–Example distribution?
–Mode: explorative rather than competitive
19 Arthur Bernstein Competition "… Competition, even a piano competition, is legitimate … as long as it is just an anecdotal side effect of the musical culture scene, and doesn't threaten to overtake the center stage" Haaretz newspaper, Culture section, April 1st, 2005
20 Submissions 17 participating groups –26 system submissions –Microsoft Research: manual analysis of dataset at lexical-syntactic matching level
21 Broad Range of System Types Knowledge sources and inferences –Direct t-h matching: Word overlap / Syntactic tree matching –Lexical relations: WordNet & statistical (corpus based) –Theorem Provers / Logical inference Adding a fuzzy scoring mechanism Supervised / unsupervised learning methods
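The simplest knowledge source listed above, direct t-h word overlap, fits in a few lines. A minimal Python sketch, assuming plain lowercase tokenization and a hypothetical coverage threshold of 0.75; it is not any participant's system.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens; no lemmatization or stop-word removal."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(t: str, h: str) -> float:
    """Fraction of distinct hypothesis tokens that also occur in the text."""
    h_tokens = tokens(h)
    if not h_tokens:
        return 1.0
    return len(h_tokens & tokens(t)) / len(h_tokens)


def predict_entailment(t: str, h: str, threshold: float = 0.75) -> bool:
    """Hypothetical decision rule: entailment if enough of h is covered by t."""
    return overlap_score(t, h) >= threshold


coyote_t = "A coyote was shot after biting girl in park"
coyote_h = "A girl was shot in a park"
print(overlap_score(coyote_t, coyote_h))       # 1.0
print(predict_entailment(coyote_t, coyote_h))  # True
```

The coyote pair is fully covered although the girl was not the one shot, while the John Wayne pair (birthplace vs. born) scores only 4/6 and is rejected: overlap both over- and under-predicts entailment.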
23 Accuracy
24 Where are we?
25 What's next: RTE-2
Organizers: Bar-Ilan, CELCT (Trento), MITRE, MS-Research
Main dataset: utilizing real systems' outputs (QA, IE, IR, summarization)
Human performance dataset: reading comprehension, human QA (planned)
Schedule (RTE website):
–October: development set
–January: test set; February: results submission
–April 10: PASCAL workshop in Venice, right after EACL!
26 Other Evaluation Modes
–Entailment subtask evaluations: lexical, lexical-syntactic, alignment…
–"Seek" mode: input: h and a corpus; output: all entailing t's in the corpus. Nicely captures information-seeking needs, but requires post-run annotation (as in TREC)
–Contribution to specific applications
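The "seek" mode turns the pairwise task into retrieval: given h, scan the corpus and keep every t judged to entail it. A minimal Python sketch, assuming any pairwise judge can be plugged in; `naive_entails` is a hypothetical stand-in, not a proposed RTE system.

```python
from typing import Callable, Iterable, List

EntailFn = Callable[[str, str], bool]  # entails(t, h) -> bool


def seek(h: str, corpus: Iterable[str], entails: EntailFn) -> List[str]:
    """'Seek' mode: return every text t in the corpus judged to entail h."""
    return [t for t in corpus if entails(t, h)]


def naive_entails(t: str, h: str) -> bool:
    """Hypothetical stand-in judge: every (lowercased) word of h occurs in t."""
    return set(h.lower().split()) <= set(t.lower().split())


corpus = [
    "Yahoo bought Overture last year",
    "Overture was bought by Yahoo",
    "Google bought a startup",
]
print(seek("Yahoo bought Overture", corpus, naive_entails))
# ['Yahoo bought Overture last year', 'Overture was bought by Yahoo']
```

Because the output set depends on the judge, its correctness can only be assessed by annotating the returned texts after the run, as the slide notes.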
27 Decomposition of Entailment Levels. Roy Bar-Haim, Idan Szpektor, Oren Glickman, Bar-Ilan University. ACL-05 Workshop on Empirical Modeling of Semantic Equivalence and Entailment
28 Why? Entailment Modeling is Complex!!
–Was apparent at RTE1
How can we decompose it, for
–Better analysis and sub-task modeling
–Piecewise evaluation
Avoid the "this is the performance of my complex system…" methodology
29 Combination of Inference Types
T: The oddest thing about the UAE is that only 500,000 of the 2 million people living in the country are UAE citizens.
H: The population of the United Arab Emirates is 2 million.
30 Combination of Inference Types
T: The oddest thing about the UAE is that only 500,000 of the 2 million people living in the country are UAE citizens.
⇒ (co-reference) The oddest thing about the UAE is that only 500,000 of the 2 million people living in the UAE are UAE citizens.
⇒ (syntactic transformation) 2 million people live in UAE.
⇒ (paraphrasing) The population of the UAE is 2 million.
⇒ (lexical world knowledge) H: The population of the United Arab Emirates is 2 million.
Diverse inference types, different levels of representation
31 Defining Intermediate Models Lexical Lexical-syntactic
32 Lexical Model: T and H are represented as bags of terms.
$T \Rightarrow_L H$ if for each term $u \in H$ there exists a term $v \in T$ such that $v \Rightarrow_L u$
$v \Rightarrow_L u$ if
–they share the same lemma and POS, OR
–they are connected by a chain of lexical transformations
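The lexical model defined above can be sketched directly: each term is a (lemma, POS) pair, transformations are directed edges, and "a chain of lexical transformations" is reachability in that graph. A minimal Python sketch; the rule base is hypothetical and hand-written from transformations named on these slides, not an actual resource.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Term = Tuple[str, str]         # (lemma, part of speech)
Rules = Dict[Term, Set[Term]]  # directed lexical transformations v -> u


def term_entails(v: Term, u: Term, rules: Rules) -> bool:
    """v =>_L u: same lemma and POS, or reachable via a chain of transformations."""
    seen = {v}
    queue = deque([v])
    while queue:
        current = queue.popleft()
        if current == u:
            return True
        for nxt in rules.get(current, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def lexically_entails(t: Iterable[Term], h: Iterable[Term], rules: Rules) -> bool:
    """T =>_L H: every term u of H is entailed by some term v of T."""
    t_terms = set(t)
    return all(any(term_entails(v, u, rules) for v in t_terms) for u in set(h))


# Hypothetical rule base.
RULES: Rules = {
    ("soar", "V"): {("rise", "V")},            # synonym
    ("acquisition", "N"): {("acquire", "V")},  # morphological derivation
    ("acquire", "V"): {("buy", "V")},          # synonym, stored in both directions
    ("buy", "V"): {("acquire", "V")},
}

T = [("Yahoo", "PROPN"), ("acquisition", "N"), ("Overture", "PROPN")]
H = [("Yahoo", "PROPN"), ("buy", "V"), ("Overture", "PROPN")]
print(lexically_entails(T, H, RULES))  # True, via acquisition -> acquire -> buy
```

The breadth-first search with a `seen` set keeps symmetric synonym edges from looping forever.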
33 Lexical Transformations (we assume perfect word sense disambiguation)
–Morphological derivations: acquisition ⇒ acquire, terrorist ⇒ terror
–Synonyms: buy ⇒ acquire
–Hypernyms: produce ⇒ make
–Meronyms: executive ⇒ company
–Lexical world knowledge: ontological relations (Bill Gates ⇒ Microsoft's founder), kill ⇒ die
34 Lexical Entailment - Examples: #1952 from RTE1 ($T \Rightarrow H$)
T: Crude oil prices soared to record levels
H: Crude oil prices rise
$T \Rightarrow_L H$ ?
35 Lexical Entailment - Examples: #1361 from RTE1 ($T \Rightarrow H$)
T: Crude oil prices soared to record levels
H: Crude oil prices rise
36 Lexical Entailment - Examples: #1361 from RTE1 ($T \Rightarrow H$)
T: Crude oil prices soared to record levels
H: Crude oil prices rise
soared ⇒ rise: synonym
37 Lexical Entailment - Examples: #1952 from RTE1 ($T \Rightarrow H$)
T: Crude oil prices soared to record levels
H: Crude oil prices rise
soared ⇒ rise: synonym. Result: $T \Rightarrow_L H$
38 Lexical Entailment - Examples: #2127 from RTE1 ($T \nRightarrow H$)
T: A coyote was shot after biting girl in park
H: A girl was shot in a park
$T \Rightarrow_L H$ ?
39 Lexical Entailment - Examples: #2127 from RTE1 ($T \nRightarrow H$)
T: A coyote was shot after biting girl in Vanier Park
H: A girl was shot in a park
Result: $T \Rightarrow_L H$ (every term of H appears in T)
40 Lexical-Syntactic Model: T and H are represented by syntactic dependency relations.
$T \Rightarrow_{LS} H$ if the relations within H can be matched by the relations in T
The coverage can be obtained through a sequence of lexical-syntactic transformations
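The lexical-syntactic condition (every dependency relation of H matched by one in T) can be sketched over (head, label, dependent) triples. A minimal Python sketch, assuming hypothetical hand-written parses in which the passive subject is labeled `subj`, and only one-step lexical rules (chains omitted for brevity).

```python
from typing import Dict, Iterable, Set, Tuple

Relation = Tuple[str, str, str]  # (head lemma, dependency label, dependent lemma)


def word_entails(v: str, u: str, rules: Dict[str, Set[str]]) -> bool:
    """Identity or a single lexical rule v -> u."""
    return v == u or u in rules.get(v, set())


def ls_entails(t_rels: Iterable[Relation], h_rels: Iterable[Relation],
               rules: Dict[str, Set[str]]) -> bool:
    """T =>_LS H: every relation of H is matched by a relation of T with the
    same label whose head and dependent lexically entail H's head and dependent."""
    t_list = list(t_rels)
    return all(
        any(label_t == label_h
            and word_entails(head_t, head_h, rules)
            and word_entails(dep_t, dep_h, rules)
            for head_t, label_t, dep_t in t_list)
        for head_h, label_h, dep_h in h_rels
    )


RULES = {"soar": {"rise"}}  # hypothetical synonym rule

oil_t = [("soar", "subj", "price"), ("price", "mod", "oil"), ("oil", "mod", "crude"),
         ("soar", "to", "level")]
oil_h = [("rise", "subj", "price"), ("price", "mod", "oil"), ("oil", "mod", "crude")]

coyote_t = [("shoot", "subj", "coyote"), ("shoot", "after", "bite"),
            ("bite", "obj", "girl"), ("bite", "in", "park")]
coyote_h = [("shoot", "subj", "girl"), ("shoot", "in", "park")]

print(ls_entails(oil_t, oil_h, RULES))        # True
print(ls_entails(coyote_t, coyote_h, RULES))  # False: girl is not the subject of shoot
```

Unlike the bag-of-terms check, matching relations rejects the coyote pair, which is the intuition behind the precision gain of this level.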
41 Lexical-Syntactic Transformations (we assume perfect disambiguation and reference resolution)
–Lexical: synonyms, hypernyms, etc. (as before)
–Syntactic (do not change lexical elements): active/passive, apposition
–Lexical-syntactic entailment paraphrases (change both lexical elements and structure): X take in Y ⇒ Y join X; X is Y man by birth ⇒ X was born in Y
–Co-reference: the country ⇒ UAE
42 Lexical-Syntactic Entailment - Examples: #1361 from RTE1 ($T \Rightarrow H$)
T: Crude oil prices soared to record levels
H: Crude oil prices rise
subj(soared, prices) matches subj(rise, prices). Result: $T \Rightarrow_{LS} H$
43 Lexical-Syntactic Entailment - Examples: #2127 from RTE1 ($T \nRightarrow H$)
T: A Coyote was shot after biting girl in Vanier Park
H: A girl was shot in a park
In T the subj of "shot" is coyote, not girl. Result: $T \nRightarrow_{LS} H$
44 Beyond Lexical-Syntactic Models
T: The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5%.
H: The SPD was defeated by the opposition parties.
Future work…
45 Empirical Analysis
46 Annotation
–240 T-H pairs of the RTE1 dataset, annotated for $T \Rightarrow_L H$ and $T \Rightarrow_{LS} H$
–High annotator agreement (authors), measured as percent agreement and Kappa for each entailment model (Lexical, Lexical-Syntactic)
–Kappa: "substantial agreement"
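Kappa corrects raw agreement for the agreement two annotators would reach by chance. A minimal Python sketch of Cohen's kappa for two annotators; the label sequences are invented for illustration and are not the study's annotations.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: both annotators independently pick the same label.
    expected = sum(counts_a[k] * counts_b[k] for k in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both used one identical label throughout; kappa is undefined here
    return (observed - expected) / (1.0 - expected)


# Hypothetical entailment judgments for ten T-H pairs.
ann1 = [True, True, True, True, False, False, False, False, True, False]
ann2 = [True, True, True, False, False, False, False, False, True, True]
print(round(cohen_kappa(ann1, ann2), 3))  # 0.6
```

Here raw agreement is 0.8 but chance agreement is 0.5, so kappa is 0.6. Values between 0.61 and 0.80 are commonly read as "substantial agreement".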
47 Model evaluation results
–Low precision for the Lexical model: lexical match fails to predict entailment
–High precision for the Lexical-Syntactic model: checking syntactic relations is crucial
–Medium recall for both levels: higher levels of inference are missing
Recall: 44% (Lexical), 50% (Lexical-Syntactic)
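Precision, recall and F1 here treat "entails" as the positive class: precision asks how many predicted entailments are correct, recall how many gold entailments are found. A minimal Python sketch; the gold and predicted labels are hypothetical, not the evaluation data.

```python
from typing import Sequence, Tuple


def precision_recall_f1(gold: Sequence[bool], pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of positive (entailment) predictions."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [True, True, True, True, False, False]
pred = [True, True, False, False, True, False]
print([round(x, 3) for x in precision_recall_f1(gold, pred)])  # [0.667, 0.5, 0.571]
```

A model that misses higher-level inferences loses recall (false negatives) even when every positive it does predict is correct.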
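48 Contribution of individual components (RTE1 positive examples)
Level | Inference type | f | R | %
Lexical | Synonym | 19 | 14% | 16%
Lexical | Morphological | 16 | 10% | 14%
Lexical | Lexical world knowledge | 12 | 8% | 10%
Lexical | Hypernym | 7 | 4% | 6%
Lexical | Meronym | 1 | | 1%
Lex-Syn | Entailment paraphrases | 37 | 26% | 31%
Lex-Syn | Syntactic transformations | 22 | 17% | 19%
Lex-Syn | Co-reference | 10 | 5% | 8%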
49 Summary (1) Annotating and analysing entailment components: guides research on entailment; opens new research problems and redirects old ones
50 Summary (2) Allows better evaluation of systems –Performance of individual components Future work – expand analysis to additional levels of representation and inferences –Identify the exciting semantic phenomena …
51 A Different Perspective on Semantic Inference
52 Text Mapping vs. Interpretation
Focus on the entailment relation as a (directed) mapping between language expressions
–Identify the contextual constraints for mappings
Vs. interpreting language into meaning representations (explicitly stipulated senses, logical form, etc.)
–Can still be a means, rather than a goal
How far (faster) can we get?
–Cf. MT: direct, transfer, interlingua
53 Making sense of (implicit) senses What is the RIGHT set of senses? –Any concrete set is problematic/subjective –… but WSD forces you to choose one A lexical entailment perspective: –Instead of identifying an explicitly stipulated sense of a word occurrence … –identify whether a word occurrence (i.e. its implicit sense) entails another word occurrence, in context
54 That's what applications need
Lexical matching: recognize sense equivalence
–Q: announcement of new models of chairs; T1: IKEA announced a new comfort chair; T2: MIT announced a new CS chair position
Lexical expansion: recognize sense entailment
–Q: announcement of new models of furniture; T1: IKEA announced a new comfort chair; T2: MIT announced a new CS chair position
55 Bottom Line
Address semantic inference as text mapping, rather than interpretation
From the applications' perspective, interpretation may be a means, not the goal
–We shouldn't create artificial problems, which might be harder than those we need to solve
56 Probabilistic Framework for Textual Entailment Oren Glickman, Ido Dagan, Moshe Koppel and Jacob Goldberger Bar Ilan University ACL-05 Workshop, AAAI-05
57 Motivation Approach entailment uncertainty by principled probabilistic models –Following success of statistical MT, parsing, language modeling etc. –Integrating inferences and knowledge sources –Vs. ad-hoc scoring Need to define concrete probability space –Generative model
58 Notation
–t: a text ($t \in T$)
–h: a hypothesis ($h \in H$), a propositional statement which can be assigned a truth value
–$w: H \to \{\text{true}, \text{false}\}$: a possible world, i.e. a truth assignment for every hypothesis
59 A Generative Model
We assume a probabilistic generative model:
–generation event of $\langle t, w \rangle$: a text along with a (hidden) possible world
–based on a joint probability distribution
Example. Text (t): John was born in France. Hidden possible world (w): John speaks French = 1; John was born in Paris = 1; John likes foie gras = 0; John is married to Alice = 1; …
60 Probabilities
For a given text t and hypothesis h, we consider the following probabilities:
–$P(\mathrm{Tr}_h = 1)$: probability that h is assigned a truth value of 1 in a generated pair
–$P(\mathrm{Tr}_h = 1 \mid t)$: probability that h is assigned a truth value of 1 given that the corresponding text is t
61 Probabilistic Textual Entailment
Definition: t probabilistically entails h if $P(\mathrm{Tr}_h = 1 \mid t) > P(\mathrm{Tr}_h = 1)$
–t increases the likelihood of h being true
–Positive PMI: t provides information on h's truth
$P(\mathrm{Tr}_h = 1 \mid t)$: entailment confidence
–The relevant entailment score for applications
–In practice: high confidence required
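With an explicit joint distribution over generated pairs ⟨t, w⟩, both probabilities and the entailment test are simple sums. A minimal Python sketch over a hypothetical four-event toy distribution (the numbers are invented); a real model would estimate these probabilities rather than list them.

```python
from math import log
from typing import Dict, List, Tuple

# A generated event: (text t, hidden possible world w, probability of the pair).
Event = Tuple[str, Dict[str, bool], float]

HYP = "John speaks French"
EVENTS: List[Event] = [  # hypothetical toy joint distribution; sums to 1
    ("John was born in France", {HYP: True}, 0.3),
    ("John was born in France", {HYP: False}, 0.1),
    ("John lives in Canada", {HYP: True}, 0.2),
    ("John lives in Canada", {HYP: False}, 0.4),
]


def p_true(h: str, events: List[Event]) -> float:
    """P(Tr_h = 1): probability that h is true in a generated pair."""
    return sum(p for _, w, p in events if w[h])


def p_true_given(h: str, t: str, events: List[Event]) -> float:
    """P(Tr_h = 1 | t): the same, restricted to pairs whose text is t."""
    p_t = sum(p for text, _, p in events if text == t)
    return sum(p for text, w, p in events if text == t and w[h]) / p_t


def probabilistically_entails(t: str, h: str, events: List[Event]) -> bool:
    """t probabilistically entails h iff P(Tr_h = 1 | t) > P(Tr_h = 1)."""
    return p_true_given(h, t, events) > p_true(h, events)


def pmi(t: str, h: str, events: List[Event]) -> float:
    """log of P(Tr_h = 1 | t) / P(Tr_h = 1); positive exactly when t entails h."""
    return log(p_true_given(h, t, events) / p_true(h, events))


print(probabilistically_entails("John was born in France", HYP, EVENTS))  # True
print(probabilistically_entails("John lives in Canada", HYP, EVENTS))     # False
```

Here $P(\mathrm{Tr}_h = 1) = 0.5$, while conditioning on the France text raises it to 0.75 and conditioning on the Canada text lowers it to about 0.33.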
62 Setting Properties (1)
Logical vs. Textual Entailment
–Logical entailment: proposition ⇒ proposition
–Textual entailment: text ⇒ text
Conditioning on generation of texts rather than on propositional values
–e.g. t: David's father was born in Italy; h: David was born in Italy
Possible ambiguities of the texts are taken into account
–e.g. t: Play baseball with a bat; h: play baseball with an animal
63 Setting Properties (2)
We do not distinguish between inferences that are based on
–language semantics: e.g. murdering ⇒ killing
–vs. domain or world knowledge: e.g. live in Paris ⇒ live in France
The setting accounts for all causes of uncertainty
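64 Setting Properties (3)
For a given text t, in general $\sum_h P(\mathrm{Tr}_h = 1 \mid t) \neq 1$
But rather, for each hypothesis h: $P(\mathrm{Tr}_h = 1 \mid t) + P(\mathrm{Tr}_h = 0 \mid t) = 1$
Vs. generative language models (cf. speech, MT, LM for IR), where the probabilities of the generated strings sum to 1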
65 Having a probability space we can now define concrete probabilistic models for various entailment phenomena
66 Initial Lexical Models Alignment-based (ACL-05 Workshop) –The probability that a term in h is entailed by a particular term in t Bayesian classification (AAAI-05) –The probability that a term in h is entailed by (fits in) the entire text of t –An unsupervised text categorization setting (with EM) – each term is a category Demonstrate directions for probabilistic modeling and unsupervised estimation
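One simple way to turn the alignment idea into a score, assuming (as our simplification) that each h term takes its probability from its best-aligned t term and that term probabilities combine independently by a product. The lexical probability table is hypothetical, not estimated from any corpus, and the Bayesian classification model is not shown.

```python
from math import prod
from typing import Dict, List, Tuple

# Hypothetical lexical entailment probabilities: (t term, h term) -> P(h term true | t term).
LexProbs = Dict[Tuple[str, str], float]


def term_probability(u: str, t_terms: List[str], lex: LexProbs) -> float:
    """Probability that h term u is entailed, via its best-aligned term in t."""
    return max((1.0 if v == u else lex.get((v, u), 0.0) for v in t_terms), default=0.0)


def entailment_confidence(t_terms: List[str], h_terms: List[str], lex: LexProbs) -> float:
    """Product over h terms (an independence assumption)."""
    return prod(term_probability(u, t_terms, lex) for u in h_terms)


LEX: LexProbs = {("birthplace", "born"): 0.8}
t = ["birthplace", "john", "wayne", "iowa"]
h = ["john", "wayne", "born", "iowa"]
print(entailment_confidence(t, h, LEX))  # 0.8
```

Identical terms align with probability 1, so the score is driven by the one non-trivial alignment, birthplace to born.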
67 Additional Work: Acquiring Entailment Relations Lexical (Geffet and Dagan, 2004/2005) –A clear goal for distributional similarity –Obtain characteristic features via bootstrapping –Test characteristic feature inclusion (vs. overlap) Lexical Syntactic – TEASE (Szpektor et al. 2004) –Deduce entailment from joint anchor sets –Initial prospects for unsupervised IE Next: obtain probabilities for these entailment “rules”
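The contrast between feature inclusion and feature overlap can be sketched with weighted context-feature vectors. A minimal Python sketch, assuming hypothetical vectors and taking the top-k weighted features as "characteristic" in place of the bootstrapping step; it is not the published method.

```python
from typing import Dict, Set

Features = Dict[str, float]  # context feature -> association weight


def characteristic_features(vec: Features, k: int) -> Set[str]:
    """The k highest-weighted features (a stand-in for bootstrapped features)."""
    ranked = sorted(vec.items(), key=lambda item: item[1], reverse=True)
    return {feature for feature, _ in ranked[:k]}


def inclusion_test(narrow: Features, broad: Features, k: int = 3) -> bool:
    """Directional: do all characteristic features of `narrow` occur with `broad`?"""
    return characteristic_features(narrow, k) <= set(broad)


def jaccard_overlap(a: Features, b: Features) -> float:
    """Symmetric overlap of feature sets, for contrast."""
    return len(set(a) & set(b)) / len(set(a) | set(b))


murder = {"obj:victim": 5.0, "subj:gunman": 4.0, "in:cold_blood": 3.0, "obj:wife": 1.0}
kill = {"obj:bacteria": 6.0, "obj:victim": 4.0, "subj:gunman": 3.0,
        "obj:time": 2.0, "in:cold_blood": 1.0, "obj:wife": 0.5}
print(inclusion_test(murder, kill))  # True
print(inclusion_test(kill, murder))  # False
```

Overlap gives the same score in both directions, while the inclusion test is asymmetric and so can suggest a direction (murder ⇒ kill, not the reverse).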
68 Conclusions: Textual entailment… Provides a framework for semantic inference: an application-independent abstraction; text mapping rather than interpretation. Raises interesting problems to work on. Amenable to empirical evaluation and decomposition. May be modeled in principled probabilistic terms. Thank you!
69 Textual Entailment References
Workshops
· PASCAL Challenges Workshop for Recognizing Textual Entailment, 2005. Note: see also the 2nd RTE Challenge.
· ACL 2005 Workshop on Empirical Modeling of Semantic Equivalence and Entailment, 2005.
Papers from recent conferences and workshops
· J. Bos and K. Markert. Recognising Textual Entailment with Logical Inference. Proceedings of EMNLP.
· R. Braz, R. Girju, V. Punyakanok, D. Roth, and M. Sammons. An Inference Model for Semantic Entailment in Natural Language. Twentieth National Conference on Artificial Intelligence (AAAI-05).
· R. Braz, R. Girju, V. Punyakanok, D. Roth, and M. Sammons. Knowledge Representation for Semantic Entailment and Question-Answering. IJCAI-05 Workshop on Knowledge and Reasoning for Answering Questions.
· C. Corley, A. Csomai and R. Mihalcea. Text Semantic Similarity, with Applications. RANLP-05.
· I. Dagan and O. Glickman. Probabilistic Textual Entailment: Generic Applied Modeling of Language Variability. PASCAL Workshop on Learning Methods for Text Understanding and Mining, Grenoble.
70 Textual Entailment References (2)
· M. Geffet and I. Dagan. Feature Vector Quality and Distributional Similarity. Proceedings of the 20th International Conference on Computational Linguistics (COLING).
· M. Geffet and I. Dagan. The Distributional Inclusion Hypotheses and Lexical Entailment. ACL 2005, Michigan, USA.
· O. Glickman, I. Dagan and M. Koppel. A Probabilistic Classification Approach for Lexical Textual Entailment. Twentieth National Conference on Artificial Intelligence (AAAI-05).
· A. Haghighi, A. Y. Ng, and C. D. Manning. Robust Textual Inference via Graph Matching. HLT-EMNLP.
· M. Kouylekov and B. Magnini. Tree Edit Distance for Textual Entailment. RANLP.
· R. Raina, A. Y. Ng, and C. Manning. Robust Textual Inference via Learning and Abductive Reasoning. Twentieth National Conference on Artificial Intelligence (AAAI-05).
· V. Rus, A. Graesser and K. Desai. Lexico-Syntactic Subsumption for Textual Entailment. RANLP.
· M. Tatu and D. Moldovan. A Semantic Approach to Recognizing Textual Entailment. HLT-EMNLP.
We would be glad to receive more references on textual entailment. Please send them to