1 Textual Entailment as a Framework for Applied Semantics Ido Dagan, Bar-Ilan University, Israel. Joint work with: Oren Glickman, Idan Szpektor, Roy Bar Haim, Maayan Geffet, Moshe Koppel (Bar-Ilan University); Shachar Mirkin (Hebrew University, Israel); Hristo Tanev, Bernardo Magnini, Alberto Lavelli, Lorenza Romano (ITC-irst, Italy); Bonaventura Coppola, Milen Kouylekov (University of Trento and ITC-irst, Italy); Danilo Giampiccolo (CELCT, Italy)
2 Talk Perspective: A Framework for “Applied Semantics” The textual entailment task – what and why? Evaluation – PASCAL RTE Challenges Modeling approach: –Knowledge acquisition –Inference –Applications An appealing framework for semantic inference –Cf. syntax, MT – clear task, methodology and community
3 Natural Language and Meaning [Diagram: mapping between Language and Meaning, with Ambiguity and Variability as the two directions of the mapping]
4 Variability of Semantic Expression Model variability as relations between text expressions: Equivalence: expr1 ⇔ expr2 (paraphrasing); Entailment: expr1 ⇒ expr2 – the general case – incorporates inference as well. Example variants: Dow ends up / Dow climbs 255 / The Dow Jones Industrial Average closed up 255 / Stock market hits a record high / Dow gains 255 points
5 Typical Application Inference Question: Who bought Overture? >> Expected answer form: X bought Overture. Text: Overture’s acquisition by Yahoo ⇒ Hypothesized answer: Yahoo bought Overture (the text entails the hypothesized answer). Similar for IE: X buy Y. Similar for “semantic” IR: t: Overture was bought … Summarization (multi-document) – identify redundant info. MT evaluation (and recent ideas for MT). Educational applications.
6 KRAQ'05 Workshop - KNOWLEDGE and REASONING for ANSWERING QUESTIONS (IJCAI-05) CFP: –Reasoning aspects: * information fusion, * search criteria expansion models * summarization and intensional answers, * reasoning under uncertainty or with incomplete knowledge, –Knowledge representation and integration: * levels of knowledge involved (e.g. ontologies, domain knowledge), * knowledge extraction models and techniques to optimize response accuracy … but similar needs for other applications – can entailment provide a common empirical task?
7 Classical Entailment Definition Chierchia & McConnell-Ginet (2001): A text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true Strict entailment - doesn't account for some uncertainty allowed in applications
8 “Almost certain” Entailments t: The technological triumph known as GPS … was incubated in the mind of Ivan Getting. h: Ivan Getting invented the GPS. t: According to the Encyclopedia Britannica, Indonesia is the largest archipelagic nation in the world, consisting of 13,670 islands. h: 13,670 islands make up Indonesia.
9 Applied Textual Entailment Directional relation between two text fragments: Text (t) and Hypothesis (h): t entails h (t ⇒ h) if, typically, a human reading t would infer that h is most likely true Operational (applied) definition: –Human gold standard - as in NLP applications –Assuming common background knowledge – which is indeed expected from applications!
10 Probabilistic Interpretation Definition: t probabilistically entails h if: –P(h is true | t) > P(h is true) t increases the likelihood of h being true ≡ Positive PMI – t provides information on h’s truth P(h is true | t ): entailment confidence –The relevant entailment score for applications –In practice: “most likely” entailment expected
11 PASCAL Recognizing Textual Entailment (RTE) Challenges FP-6 Funded PASCAL NOE. Bar-Ilan University; ITC-irst and CELCT, Trento; MITRE; Microsoft Research
12 Generic Dataset by Application Use 7 application settings in RTE-1, 4 in RTE-2/3: – QA – IE – “Semantic” IR – Comparable documents / multi-doc summarization – MT evaluation – Reading comprehension – Paraphrase acquisition. Most data created from actual applications’ output. RTE-2: 800 examples in each of the development and test sets; 50-50% YES/NO split
13 Some Examples
1. Text: Regan attended a ceremony in Washington to commemorate the landings in Normandy. Hypothesis: Washington is located in Normandy. Task: IE. Entailment: False
2. Text: Google files for its long awaited IPO. Hypothesis: Google goes public. Task: IR. Entailment: True
3. Text: …: a shootout at the Guadalajara airport in May, 1993, that killed Cardinal Juan Jesus Posadas Ocampo and six others. Hypothesis: Cardinal Juan Jesus Posadas Ocampo died in 1993. Task: QA. Entailment: True
4. Text: The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5%. Hypothesis: The SPD is defeated by the opposition parties. Task: IE. Entailment: True
14 Participation and Impact Very successful challenges, world-wide: –RTE-1 – 17 groups –RTE-2 – 23 groups; 30 groups in total, ~150 downloads! –RTE-3 underway; workshop planned at ACL-07. High interest in the research community –Papers, conference keywords, sessions and areas, PhDs, influence on funded projects –Textual Entailment special issue at JNLE
15 Methods and Approaches Measure similarity match between t and h (coverage of h by t): –Lexical overlap (unigram, N-gram, subsequence) –Lexical substitution (WordNet, statistical) –Syntactic matching/transformations –Lexical-syntactic variations (“paraphrases”) –Semantic role labeling and matching –Global similarity parameters (e.g. negation, modality) Cross-pair similarity Detect mismatch (for non-entailment) Logical interpretation and inference (vs. matching)
16 Dominant approach: Supervised Learning. Features model similarity and mismatch; a classifier determines the relative weights of the information sources. Train on the development set and auxiliary t-h corpora. Pipeline: (t, h) → similarity features (lexical, n-gram, syntactic, semantic, global) → feature vector → classifier → YES/NO
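A minimal sketch of this feature-plus-classifier setup, assuming scikit-learn; the word-overlap features, toy training pairs, and classifier choice are illustrative only, not any particular RTE-2 system:

```python
# Sketch of the feature-based RTE approach: similarity features over (t, h)
# feed a supervised classifier that outputs YES/NO.
from sklearn.linear_model import LogisticRegression

def overlap_features(t, h):
    """Toy similarity features: unigram and bigram coverage of h by t."""
    t_words, h_words = t.lower().split(), h.lower().split()
    t_uni, h_uni = set(t_words), set(h_words)
    bigrams = lambda ws: set(zip(ws, ws[1:]))
    t_bi, h_bi = bigrams(t_words), bigrams(h_words)
    uni_cov = len(t_uni & h_uni) / max(len(h_uni), 1)
    bi_cov = len(t_bi & h_bi) / max(len(h_bi), 1)
    return [uni_cov, bi_cov]

# Tiny hypothetical development set: (text, hypothesis, entails?)
dev = [
    ("Google files for its long awaited IPO.", "Google goes public.", 1),
    ("Regan attended a ceremony in Washington.", "Washington is located in Normandy.", 0),
]
X = [overlap_features(t, h) for t, h, _ in dev]
y = [label for _, _, label in dev]
clf = LogisticRegression().fit(X, y)
print(clf.predict([overlap_features("Dow gains 255 points", "Dow climbs 255")]))
```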
17 Results First Author (Group) – Accuracy – Average Precision:
Hickl (LCC) – 75.4% – 80.8%
Tatu (LCC) – 73.8% – 71.3%
Zanzotto (Milan & Rome) – 63.9% – 64.4%
Adams (Dallas) – 62.6% – 62.8%
Bos (Rome & Leeds) – 61.6% – 66.9%
11 groups – accuracy 58.1%-60.5%
7 groups – accuracy 52.9%-55.6%
Average accuracy: 60%; median: 59%
18 Analysis For the first time: deeper methods (semantic/syntactic/logical) clearly outperform shallow methods (lexical/n-gram). Cf. Kevin Knight’s invited talk at EACL, titled: “Isn’t Linguistic Structure Important, Asked the Engineer”. Still, most systems based on deep analysis did not score significantly better than the lexical baseline.
19 Why? System reports point at: –Lack of knowledge (syntactic transformation rules, paraphrases, lexical relations, etc.) –Lack of training data It seems that systems that coped better with these issues performed best: –Hickl et al. - acquisition of large entailment corpora for training –Tatu et al. – large knowledge bases (linguistic and world knowledge)
20 Some suggested research directions Knowledge acquisition –Unsupervised acquisition of linguistic and world knowledge from general corpora and web –Acquiring larger entailment corpora –Manual knowledge engineering for concise knowledge E.g. syntactic transformations, logical axioms Inference –Principled framework for inference and fusing information levels –Are we happy with bags of features?
21 Complementary Evaluation Modes Entailment subtask evaluations –Lexical, lexical-syntactic, logical, alignment… “Seek” mode: –Input: h and corpus –Output: all entailing t’s in the corpus –Captures information-seeking needs, but requires post-run annotation (TREC style). Contribution to specific applications! –Cf. QA - Harabagiu & Hickl, ACL-06; RE – EACL-06
22 Where are we?
23 Our Own Research Directions Acquisition Inference Applications
24 Learning Entailment Rules Text: Aspirin prevents Heart Attacks. Q: What reduces the risk of Heart Attacks? Entailment Rule (template): X prevent Y ⇨ X reduce risk of Y. Hypothesis: Aspirin reduces the risk of Heart Attacks. Need a large knowledge base of entailment rules.
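A toy sketch of what applying such a rule amounts to; real systems match templates over parse trees rather than surface strings, and the regex form of the rule here is purely illustrative:

```python
# Toy illustration of applying the rule "X prevent Y => X reduce risk of Y"
# as a surface string template; real systems match over dependency trees.
import re

RULE = (r"(?P<X>.+) prevents? (?P<Y>.+)", r"\g<X> reduces the risk of \g<Y>")

def apply_rule(text, rule=RULE):
    lhs, rhs = rule
    return re.sub(lhs, rhs, text) if re.match(lhs, text) else None

print(apply_rule("Aspirin prevents heart attacks"))
# -> "Aspirin reduces the risk of heart attacks"
```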
25 TEASE – Algorithm Flow Input template: X subj-accuse-obj Y (resources: Web, lexicon). Sample corpus for input template: Paula Jones accused Clinton… / Sanhedrin accused St.Paul… → Anchor Set Extraction (ASE): {Paula Jones subj; Clinton obj}, {Sanhedrin subj; St.Paul obj}, … → Sample corpus for anchor sets: Paula Jones called Clinton indictable… / St.Paul defended before the Sanhedrin… → Template Extraction (TE): X call Y indictable, Y defend before X, … → iterate
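A schematic sketch of this loop; search_corpus, extract_anchor_sets and extract_templates are hypothetical stand-ins for the web search, parsing, and statistical extraction/ranking components:

```python
# Schematic sketch of the TEASE loop: extract anchor sets for a template,
# then extract new templates from sentences containing those anchors,
# optionally iterating on the newly found templates.
def tease(input_template, search_corpus, extract_anchor_sets, extract_templates,
          iterations=1):
    templates = {input_template}
    frontier = [input_template]
    for _ in range(iterations):
        new_templates = set()
        for template in frontier:
            sentences = search_corpus(template)            # e.g. web hits for "X accuse Y"
            anchor_sets = extract_anchor_sets(sentences)   # e.g. {X: Paula Jones, Y: Clinton}
            for anchors in anchor_sets:
                evidence = search_corpus(anchors)          # sentences containing both anchors
                new_templates |= set(extract_templates(evidence, anchors))
        frontier = list(new_templates - templates)         # iterate on newly found templates
        templates |= new_templates
    return templates - {input_template}
```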
26 Sample of Extracted Anchor-Sets for X prevent Y X=‘sunscreens’, Y=‘sunburn’ X=‘sunscreens’, Y=‘skin cancer’ X=‘vitamin e’, Y=‘heart disease’ X=‘aspirin’, Y=‘heart attack’ X=‘vaccine candidate’, Y=‘infection’ X=‘universal precautions’, Y=‘HIV’ X=‘safety device’, Y=‘fatal injuries’ X=‘hepa filtration’, Y=‘contaminants’ X=‘low cloud cover’, Y= ‘measurements’ X=‘gene therapy’, Y=‘blindness’ X=‘cooperation’, Y=‘terrorism’ X=‘safety valve’, Y=‘leakage’ X=‘safe sex’, Y=‘cervical cancer’ X=‘safety belts’, Y=‘fatalities’ X=‘security fencing’, Y=‘intruders’ X=‘soy protein’, Y=‘bone loss’ X=‘MWI’, Y=‘pollution’ X=‘vitamin C’, Y=‘colds’
27 Sample of Extracted Templates for X prevent Y: X reduce Y; X protect against Y; X eliminate Y; X stop Y; X avoid Y; X for prevention of Y; X provide protection against Y; X combat Y; X ward Y; X lower risk of Y; X be barrier against Y; X fight Y; X reduce Y risk; X decrease the risk of Y; relationship between X and Y; X guard against Y; X be cure for Y; X treat Y; X in war on Y; X in the struggle against Y; X a day keeps Y away; X eliminate the possibility of Y; X cut risk Y; X inhibit Y
28 Experiment and Evaluation 48 randomly chosen input verbs; 1392 templates extracted; human judgments. Encouraging results: average yield per verb – 29 correct templates; average precision per verb – 45.3%. Future work: improving precision, estimating probabilities.
29 Acquiring Lexical Entailment Relations COLING-04, ACL-05 Lexical entailment via distributional similarity –Individual features characterize semantic properties –Obtain characteristic features via bootstrapping –Test characteristic feature inclusion (vs. overlap) COLING-ACL-06 Integrate pattern-based extraction –NP such as NP1, NP2, … –Complementary information to distributional evidence –Integration using ML with minimal supervision (10 words)
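A minimal sketch of the pattern-based side, assuming a single Hearst-style "such as" pattern; the regex and the example sentence are illustrative only:

```python
# Minimal sketch of pattern-based extraction of lexical entailment candidates:
# "NP such as NP1, NP2 and NP3" suggests NP1..NP3 entail (are more specific
# than) NP. Single-word noun phrases only, for illustration.
import re

PATTERN = re.compile(r"(\w+) such as ([\w ,]+?)(?:\.|$)")

def such_as_pairs(sentence):
    pairs = []
    for match in PATTERN.finditer(sentence):
        hypernym = match.group(1)
        for hyponym in re.split(r", | and ", match.group(2)):
            if hyponym.strip():
                pairs.append((hyponym.strip(), hypernym))
    return pairs

print(such_as_pairs("The fund invests in companies such as banks, airlines and manufacturers."))
# -> [('banks', 'companies'), ('airlines', 'companies'), ('manufacturers', 'companies')]
```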
30 Acquisition Example Does not overlap traditional ontological relations Top-ranked entailments for “ company ” : firm, bank, group, subsidiary, unit, business, supplier, carrier, agency, airline, division, giant, entity, financial institution, manufacturer, corporation, commercial bank, joint venture, maker, producer, factory …
31 Analysis Discovering Different Relation Types 80% of hyponymy relations – from the pattern-based method 75% of the synonyms – from distributional similarity Determining Entailment Direction Typically, distributional similarity is non-directional The integrated method correctly identified the single correct entailment direction of 73% of distributional pairs Filtering Co-hyponyms Most co-hyponyms originate in the distributional candidates 65% of co-hyponyms were successfully filtered Precision of distributional pairs nearly doubled
32 Initial Probabilistic Lexical Co-occurrence Models Alignment-based (RTE-1 & ACL-05 Workshop) –The probability that a term in h is entailed by a particular term in t Bayesian classification (AAAI-05) –The probability that a term in h is entailed by (fits in) the entire text of t –An unsupervised text categorization setting – each term is a category Demonstrate directions for probabilistic modeling and unsupervised estimation
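A hedged sketch of the alignment-based idea: approximate the entailment confidence as a product, over the terms of h, of the best lexical entailment probability offered by any term of t; the probability table below is a hypothetical stand-in for estimates derived from corpus co-occurrence statistics:

```python
# Sketch of an alignment-based lexical entailment score: each h term is
# "covered" by its best-matching t term, and per-term probabilities are
# combined by product. lex_prob stands in for corpus-derived estimates.
def entailment_confidence(t_terms, h_terms, lex_prob):
    confidence = 1.0
    for u in h_terms:
        confidence *= max(lex_prob.get((v, u), 1.0 if v == u else 0.01)
                          for v in t_terms)
    return confidence

lex_prob = {("bought", "acquisition"): 0.7}   # hypothetical estimate
t = ["Yahoo", "bought", "Overture"]
h = ["Overture", "acquisition", "by", "Yahoo"]
print(entailment_confidence(t, h, lex_prob))
```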
33 Manual Syntactic Transformations Example: ‘X prevent Y’ matched against “Sunscreen, which prevents moles and sunburns, …” – the relative clause and the conjunction are handled by transformations over the dependency parse (subj, obj, rel, mod, conj relations), yielding X = sunscreen and Y = {moles, sunburns}.
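A toy sketch of such a transformation, using hand-written dependency triples as stand-ins for parser output and expanding the conjunction on the object:

```python
# Toy sketch of matching the template "X prevent Y" over dependency relations,
# expanding conjunctions so that "sunscreen which prevents moles and sunburns"
# yields both (sunscreen, moles) and (sunscreen, sunburns). The triples are
# hand-written stand-ins for parser output.
deps = [
    ("prevents", "subj", "sunscreen"),   # subject reached via the relative pronoun "which"
    ("prevents", "obj", "moles"),
    ("moles", "conj", "sunburns"),
]

def match_prevent(deps):
    subj = [d for h, r, d in deps if h == "prevents" and r == "subj"]
    objs = [d for h, r, d in deps if h == "prevents" and r == "obj"]
    # conjunction expansion: anything coordinated with a matched object also counts
    objs += [d for h, r, d in deps if r == "conj" and h in objs]
    return [(x, y) for x in subj for y in objs]

print(match_prevent(deps))   # -> [('sunscreen', 'moles'), ('sunscreen', 'sunburns')]
```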
34 Syntactic Variability Phenomena Template: X activate Y
Phenomenon – Example
Passive form – Y is activated by X
Apposition – X activates its companion, Y
Conjunction – X activates Z and Y
Set – X activates two proteins: Y and Z
Relative clause – X, which activates Y
Coordination – X binds and activates Y
Transparent head – X activates a fragment of Y
Co-reference – X is a kinase, though it activates Y
35 Takeout Promising potential for creating huge entailment knowledge bases –Mostly by unsupervised approaches
36 Application: Unsupervised Relation Extraction EACL 2006
37 Relation Extraction Subfield of Information Extraction. Identify different ways of expressing a target relation –Examples: Management Succession, Birth - Death, Mergers and Acquisitions, Protein Interaction. Traditionally performed in a supervised manner –Requires dozens to hundreds of examples per relation –Examples should cover broad semantic variability –Costly - Feasible??? Little work on unsupervised approaches
38 Our Goals Entailment Approach for Relation Extraction Unsupervised Relation Extraction System Evaluation Framework for Entailment Rule Acquisition and Matching
39 Proposed Approach Pipeline: input template (X prevent Y) → entailment rule acquisition (TEASE) → templates (X prevention for Y, X treat Y, X reduce Y) → syntactic matcher (transformation rules) → relation instances
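A schematic sketch of this pipeline; tease() and match_template() are hypothetical stand-ins for the rule-acquisition and syntactic-matching components described above:

```python
# Schematic sketch of the unsupervised RE pipeline: acquire entailment-rule
# templates for the input template, then match all templates against the
# corpus to collect relation instances.
def extract_relation_instances(input_template, corpus, tease, match_template):
    templates = {input_template} | set(tease(input_template))
    instances = set()
    for sentence in corpus:
        for template in templates:
            # match_template returns (X, Y) pairs instantiating the template
            instances |= set(match_template(template, sentence))
    return instances
```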
40 Dataset Bunescu 2005: recognizing interactions between annotated protein pairs –200 Medline abstracts –Gold-standard dataset of protein pairs. Input template: X interact with Y
41 Manual Analysis - Results 93% of interacting protein pairs can be identified with lexical-syntactic templates. Occurrence percentage of each syntactic phenomenon: transparent head 34%, apposition 24%, conjunction 24%, set 13%, relative clause 8%, co-reference 7%, coordination 7%, passive form 2%. A second table relates the number of templates to recall (within the 93%).
42 TEASE Output for X interact with Y A sample of correct templates learned: X binding to Y; X bind to Y; X Y interaction; X activate Y; X attach to Y; X stimulate Y; X interaction with Y; X couple to Y; X trap Y; interaction between X and Y; X recruit Y; X become trapped in Y; X associate with Y; X Y complex; X be linked to Y; X recognize Y; X target Y; X block Y
43 TEASE Algorithm – Potential Recall on Training Set Iterative – taking the top 5 ranked templates as input; Morph – recognizing morphological derivations (cf. semantic role labeling vs. matching). Experiment – Recall: input – 39%; input + iterative – 49%; input + iterative + morph – 63%
44 Results for Full System Recall: input – 18%; input + iterative – 29% (precision and F1 columns: …). Problems: dependency parser and syntactic matching errors; no morphological derivation recognition; TEASE precision (incorrect templates)
45 Vs Supervised Approaches 180 training abstracts
46 Inference Goal: infer hypothesis from text –Match and apply entailment rules –Heuristically bridge inference gaps –Cost/probability based Current approaches: mapping language constructs –Vs. semantic interpretation –Lexical-syntactic structures as meaning representation Amenable for unsupervised learning –Entailment rule transformations over syntactic trees
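A minimal sketch of cost-based inference by rule application, with strings standing in for syntactic trees and an illustrative rule cost and budget; the rule set and numbers are assumptions, not values from the talk:

```python
# Sketch of cost-based inference by entailment-rule application: rewrite t
# step by step toward h, accumulating rule costs; entailment is accepted if
# h is reached within a cost budget.
import heapq

def entails(t, h, rules, budget=1.0):
    """rules: list of (lhs, rhs, cost) string-rewrite rules."""
    frontier = [(0.0, t)]
    seen = {t}
    while frontier:
        cost, current = heapq.heappop(frontier)
        if current == h:
            return True, cost
        for lhs, rhs, rule_cost in rules:
            if lhs in current:
                new = current.replace(lhs, rhs, 1)
                if new not in seen and cost + rule_cost <= budget:
                    seen.add(new)
                    heapq.heappush(frontier, (cost + rule_cost, new))
    return False, None

rules = [("prevents", "reduces the risk of", 0.2)]   # illustrative rule and cost
print(entails("Aspirin prevents heart attacks",
              "Aspirin reduces the risk of heart attacks", rules))
# -> (True, 0.2)
```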
47 Textual Entailment as a Framework for Applied Semantic Inference
48 Classical Approach = Interpretation Map Language (by nature) onto a Stipulated Meaning Representation (by scholar), coping with Variability. Logical forms, word senses, semantic roles, named entity types, … - scattered works. Feasible/suitable framework for applied semantics?
49 Textual Entailment = Text Mapping Map between Language expressions (by nature), relative to Assumed Meaning (by humans), coping with Variability. Entailment mapping is the actual applied goal - but also a touchstone for understanding! Interpretation becomes a possible means.
50 Adding inference to the picture Mapping between different meanings that can be inferred from each other
51 Textual Entailment ≈ Human Reading Comprehension From a children’s English learning book (Sela and Greenberg): Reference Text: “…The Bermuda Triangle lies in the Atlantic Ocean, off the coast of Florida. …” Hypothesis (True/False?): The Bermuda Triangle is near the United States ???
52 Opens up a framework for investigating semantics Classical problems can be cast (linguistics) –All boys are nice ⇒ All tall boys are nice. But also… a new slant on old problems, revealing many new ones
53 Making sense of (implicit) senses What is the RIGHT set of senses? –Any concrete set is problematic/subjective –… but WSD forces you to choose one A lexical entailment perspective: –Instead of identifying an explicitly stipulated sense of a word occurrence … –identify whether a word occurrence (i.e. its implicit sense) entails another word occurrence, in context –ACL-2006
54 Lexical Matching for Applications Sense equivalence: T1: IKEA announced a new comfort chair / Q: announcement of new models of chairs / T2: MIT announced a new CS chair position. Sense entailment in substitution: T1: IKEA announced a new comfort chair / Q: announcement of new models of furniture / T2: MIT announced a new CS chair position.
55 Investigated Methods Matching: indirect / direct; Learning: supervised / unsupervised; Task: classification / ranking
56 Unsupervised Direct: kNN-ranking Test example score: average Cosine similarity with the k most similar training examples. Rationale: –positive examples will be similar to some source occurrence (of the corresponding sense) –negative examples won’t be similar to the source. Rank test examples by score –A classification slant on language modeling
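A minimal sketch of this kNN ranking, assuming NumPy; the toy vectors stand in for the context feature vectors of word occurrences:

```python
# Minimal sketch of kNN ranking: score a test vector by its average cosine
# similarity to its k most similar training vectors, then rank test examples
# by that score.
import numpy as np

def knn_score(test_vec, train_vecs, k=3):
    sims = train_vecs @ test_vec / (
        np.linalg.norm(train_vecs, axis=1) * np.linalg.norm(test_vec) + 1e-9)
    return float(np.mean(np.sort(sims)[-k:]))   # average of the k best similarities

train = np.array([[1.0, 0.0, 1.0], [0.9, 0.1, 0.8], [0.0, 1.0, 0.0]])
tests = {"ex1": np.array([1.0, 0.0, 0.9]), "ex2": np.array([0.0, 1.0, 0.1])}
ranking = sorted(tests, key=lambda name: knn_score(tests[name], train, k=2), reverse=True)
print(ranking)
```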
57 Results (for synonyms), ranking: kNN improves precision by 8-18% at recall levels up to 25%
58 Other problems Named Entity Classification –Which pickup trucks are produced by Mitsubishi? Magnum pickup truck Lexical semantic relationships (e.g. Wordnet) –Which relations contribute to entailment inference? How? Lexical reference – lexical-projection of entailment –(EMNLP-06 ) …
59 Distinguish Goal from Means The essence of the proposal - TE as goal: –Adopt textual entailment problems as the test goal for semantic models –Base applied semantic inference on entailment “engines” Interpretations and mapping methods may compete Open question: which inference –can be represented at language level? –requires logical or specialized representation and inference? (temporal, mathematical, spatial, …)
60 Meeting the knowledge challenge – by a coordinated effort? A vast amount of “entailment rules” needed Speculation: is it possible to have a public effort for knowledge acquisition? –Simple, uniform representations –Assuming mostly automatic acquisition (millions of rules?) –Human Genome Project analogy First step: RTE-3 Resources Pool at ACLWiki
61 Optimistic Conclusions: Textual Entailment… A promising framework for applied semantics –Application-independent abstraction –Text mapping as goal rather than interpretation –Amenable for empirical evaluation Defines new semantic problems to work on May be modeled probabilistically Appealing potential for knowledge acquisition Thank you!