Slide 1/20: Extracting a Lexical Entailment Rule-base from Wikipedia
Eyal Shnarch, Libby Barak, Ido Dagan (Bar Ilan University)
Outline: Textual Entailment | Learning Lexical Entailment | Wikipedia | Extraction Types | Results & Evaluations | Conclusions & Future Work
Slide 2/20: Entailment: what is it and what is it good for?
Question Answering: "Which luxury cars are produced in Britain?"
Information Retrieval: "The Beatles"
Slide 3/20: Lexical Entailment
Lexical entailment rules model such lexical relations.
Part of the Textual Entailment paradigm, a generic framework for semantic inference.
Encompasses a variety of relations:
– Synonymy: Hypertension → Elevated blood-pressure
– IS-A: Jim Carrey → actor
– Predicates: Crime and Punishment → Fyodor Dostoyevsky
– Reference: Abbey Road → The Beatles
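To make the relations above concrete, here is a minimal sketch (not the authors' implementation) of what a lexical entailment rule-base looks like as a data structure: a mapping from a left-hand-side term to the set of terms it entails.

```python
# Minimal illustrative rule-base: LHS term -> set of entailed RHS terms.
# The entries mirror the examples on the slide; lookup is case-insensitive.
RULES = {
    "hypertension": {"elevated blood-pressure"},
    "jim carrey": {"actor"},
    "crime and punishment": {"fyodor dostoyevsky"},
    "abbey road": {"the beatles"},
}

def entails(lhs: str, rhs: str) -> bool:
    """True if the rule-base contains the rule lhs -> rhs."""
    return rhs.lower() in RULES.get(lhs.lower(), set())

print(entails("Abbey Road", "The Beatles"))  # -> True
print(entails("Abbey Road", "actor"))        # -> False
```

A real rule-base of the scale described later in the deck (~10 million rules) would of course live in an indexed store, but the lookup interface is the same.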
Slide 4/20: What was done so far?
– WordNet: a lexical database built for computational consumption, the standard NLP resource. Costly, requires experts, many years of development (since 1985).
– Distributional similarity: Country and State share similar contexts, but so do Nurse and Doctor, or Bear and Tiger. Low precision.
– Patterns: "NP1 such as NP2" (luxury car such as Jaguar), "NP1 and other NP2" (dogs and other domestic pets). Low coverage, mainly IS-A patterns.
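The pattern-based approach on this slide can be sketched in a few lines. This is an illustrative Hearst-style matcher, not the deck's code; noun phrases are approximated by plain word sequences, which is an assumption made purely for brevity.

```python
import re

# Crude "NP1 such as NP2" matcher. A real system would identify noun
# phrases with a parser; here an NP is just a run of word characters.
SUCH_AS = re.compile(r"(\w[\w ]*?) such as (\w[\w ]*)")

def extract_isa(sentence: str):
    """Yield (hyponym, hypernym) pairs from 'X such as Y' patterns."""
    pairs = []
    for m in SUCH_AS.finditer(sentence):
        hypernym, hyponym = m.group(1).strip(), m.group(2).strip()
        pairs.append((hyponym, hypernym))
    return pairs

print(extract_isa("luxury cars such as Jaguar"))
# -> [('Jaguar', 'luxury cars')]
```

The low coverage the slide mentions is visible here: only sentences that happen to use one of the fixed surface patterns yield a rule.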
Slide 5/20: Our approach: utilize definitions
Pen: an instrument for writing or drawing with ink.
– pen is-an instrument
– pen is used for writing / drawing
– ink is part of pen
Sources of definitions:
– Dictionary: describes language terms, slow growth
– Encyclopedia: contains knowledge, proper names, events, concepts; grows rapidly
We chose Wikipedia:
– Very dynamic, constantly growing and updating
– Covers a vast range of domains
– Gaining popularity in research (AAAI 2008 workshop)
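The "pen is-an instrument" extraction above can be sketched as follows. This is a deliberately crude heuristic standing in for the syntactic analysis the approach actually relies on: take the first word of the definition after any leading article as the definition's head noun.

```python
# Illustrative sketch: the defined term entails the head noun of its
# definition. A real system parses the definition; skipping the leading
# article and taking the next word is an assumption made for brevity.
def definition_head(definition: str) -> str:
    words = definition.strip(" .").split()
    while words and words[0].lower() in {"a", "an", "the"}:
        words.pop(0)                  # drop leading articles
    return words[0] if words else ""

head = definition_head("an instrument for writing or drawing with ink")
print(f"pen -> {head}")  # candidate rule: pen -> instrument
```

This only recovers the IS-A rule; the "used for" and "part of" relations on the slide need the rest of the definition's structure.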
Slide 6/20: Extraction Types
– Be-Comp: a noun in the position of a complement of the verb 'be'
– All-Nouns: all nouns in the definition; each has a different likelihood of being entailed
Slide 7/20: Ranking All-Nouns Rules
The likelihood of entailment depends greatly on the syntactic path connecting the title and the noun (a path in the parsed tree).
An unsupervised entailment likelihood score is computed for each syntactic path p within a definition.
Split Def-N into Def-Ntop and Def-Nbot:
– Indicative of rule reliability: the precision of Def-Ntop rules is much higher than that of Def-Nbot rules.
Example: [title] – a film directed by [noun]; path: subj – vrel – by-subj – pcomp-n
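The slide does not spell out the unsupervised path-scoring formula, so the following is a hypothetical proxy rather than the paper's method: score a syntactic path by the fraction of the rules it extracts that are also supported by an independent extraction type (e.g. Redirect), on the assumption that such agreement indicates reliability.

```python
from collections import defaultdict

def path_scores(extractions, supported):
    """extractions: list of (path, rule) pairs; supported: set of rules
    confirmed by another extraction type. Returns {path: score}."""
    total, hits = defaultdict(int), defaultdict(int)
    for path, rule in extractions:
        total[path] += 1
        if rule in supported:
            hits[path] += 1
    return {p: hits[p] / total[p] for p in total}

# Hypothetical example data (not from the paper):
extractions = [
    ("subj-vrel-by-subj-pcomp-n", ("Memento", "Christopher Nolan")),
    ("subj-vrel-by-subj-pcomp-n", ("Dune", "Frank Herbert")),
    ("conj", ("Memento", "thriller")),
]
supported = {("Memento", "Christopher Nolan")}
print(path_scores(extractions, supported))
# -> {'subj-vrel-by-subj-pcomp-n': 0.5, 'conj': 0.0}
```

Whatever the exact score, splitting All-Nouns paths into a high-scoring (Def-Ntop) and low-scoring (Def-Nbot) group is then just a threshold on these per-path values.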
Slide 8/20: Extraction Types (cont.)
– Redirect: a Wikipedia redirect title and the title of its target page
– Parenthesis: the disambiguation term in parentheses in a page title
– Link: nouns linked within the definition
Slide 9/20: Ranking Rules by Supervised Learning
Slide 12/20: Ranking Rules by Supervised Learning
An alternative approach for deciding which rules to select out of all extracted rules. Each rule is represented by:
– 6 binary features: one for each extraction type
– 2 binary features: one for each side of the rule, indicating whether it is a named entity
– 2 numerical features: the co-occurrence count of the rule's sides and the number of times the rule was extracted
– 1 numerical feature: the score of the path for the Def-N extraction type
A manually annotated set was used to train SVM-light; the J parameter was varied to obtain different recall-precision tradeoffs.
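The 11-dimensional rule representation described above can be sketched as a feature-vector builder. The set of extraction-type names below is illustrative (the deck does not list which six types back the six indicator features), as are the example values.

```python
# Illustrative list; the actual six types used in the paper may differ.
EXTRACTION_TYPES = ["Redirect", "Be-Comp", "Def-N", "Parenthesis",
                    "Link", "All-N"]

def rule_features(types, lhs_is_ne, rhs_is_ne, cooc, count, path_score):
    """Build the 11-dim vector: 6 type indicators, 2 NE indicators,
    co-occurrence, extraction count, and the Def-N path score."""
    vec = [1.0 if t in types else 0.0 for t in EXTRACTION_TYPES]
    vec += [float(lhs_is_ne), float(rhs_is_ne),
            float(cooc), float(count), float(path_score)]
    return vec

# Hypothetical rule extracted by Redirect and Def-N, both sides NEs:
v = rule_features({"Redirect", "Def-N"}, True, True,
                  cooc=12, count=2, path_score=0.8)
print(len(v), v[:6])  # -> 11 [1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

Each vector would then be written out in SVM-light's sparse `label index:value` format for training.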
Slide 13/20: Results and Evaluation
The obtained knowledge base includes:
– About 10 million rules (for comparison, Snow's extension to WordNet includes 400,000 relations)
– More than 2.4 million distinct RHSs
– 18% of the rules were extracted by more than one extraction type
– Mostly named entities and specific concepts, as expected from an encyclopedia
Two evaluation types:
– Rule-based: rule correctness relative to human judgment
– Inside a real application: the utility of the extracted rules for lexical expansion in keyword-based text categorization
Slide 14/20: Rule-base Evaluation
Randomly sampled 830 rules and annotated them for correctness (inter-annotator agreement reached a Kappa of 0.7).
Precision: the percentage of correct rules.
Est. # of correct rules: the number of rules annotated as correct, scaled up by the sampling proportion (equivalently, sample precision times the number of extracted rules).

Extraction Type | P (per type) | Est. # rules | P (accumulated) | R (accumulated)
Redirect        | 0.87         | 2,232,877    | 0.87            | 0.31
Be-Comp         | 0.80         | 2,740,957    | 0.82            | 0.60
Def-Ntop        | 0.72         | 2,179,395    | 0.77            | 0.71
Parenthesis     | 0.71         | 66,853       | 0.77            | 0.72
Link            | 0.70         | 708,638      | 0.76            | 0.80
Def-Nbot        | 0.47         | 1,657,944    | 0.66            | 1.00
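The estimate of correct rules described above is a one-line computation: sample precision multiplied by the total number of extracted rules. The numbers below are illustrative, not the paper's.

```python
def estimate_correct(total_extracted: int, sample_size: int,
                     sample_correct: int) -> float:
    """Scale the annotated sample up to the full extraction:
    est_correct = (sample_correct / sample_size) * total_extracted."""
    precision = sample_correct / sample_size
    return precision * total_extracted

# Hypothetical numbers: 87 of 100 sampled rules judged correct,
# out of 1,000,000 rules extracted by some type:
print(estimate_correct(1_000_000, 100, 87))  # -> 870000.0
```

This is why per-type precision and the estimated rule counts in the table move together: the estimate is precision times volume.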
Slide 15/20: Supervised Learning Evaluation
5-fold cross validation on the annotated sample:

J | 1.3  | 1.1  | 0.9  | 0.5  | 0.4  | 0.3
P | 1.00 | 0.91 | 0.81 | 0.73 | 0.59 | 0.32
R | 0.66 | 0.70 | 0.75 | –    | 0.82 | 0.86

Although it considers additional information, performance is almost identical to considering only extraction types. Further research is needed to improve the current feature set and classification performance.
Slide 16/20: Text Categorization Evaluation
Represent each category by a feature vector of its characteristic terms; the characteristic terms should entail the category name.
Compare the term-based feature vector of a classified document with the feature vectors of all categories, and assign the document to the category yielding the highest cosine similarity score (single-class classification).
Dataset: the 20 Newsgroups collection.
3 baselines: no expansion, WordNet, and WikiBL [Snow].
Also evaluated the union of Wikipedia and WordNet.
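The categorization scheme above can be sketched directly: categories and documents are bags of terms, and a document is assigned to the category with the highest cosine similarity. In the evaluated setting the category vectors would additionally be expanded with entailing terms from the rule-base; the categories and document below are illustrative.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(doc_terms, categories):
    """Assign the document to the category with highest cosine score."""
    doc = Counter(doc_terms)
    return max(categories, key=lambda c: cosine(doc, Counter(categories[c])))

# Hypothetical categories; "jaguar" plays the role of an expansion term
# that a rule like Jaguar -> luxury car would contribute.
categories = {
    "autos":    ["car", "engine", "jaguar"],
    "politics": ["election", "senate", "vote"],
}
print(classify(["the", "jaguar", "engine", "roared"], categories))
# -> "autos"
```

Without the expansion terms, a document mentioning only "Jaguar" would miss the "autos" category entirely, which is exactly the gap lexical expansion is meant to close.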
Slide 17/20: Text Categorization Evaluation (results)

Group            | Rule Base                      | P    | R    | F1
Baselines        | No Expansion                   | 0.53 | 0.19 | 0.28
                 | WordNet                        | 0.46 | 0.29 | 0.36
                 | WikiBL                         | 0.53 | 0.19 | 0.28
Extraction Types | Redirect only                  | 0.54 | 0.21 | 0.30
                 | + Be-Comp                      | 0.55 | 0.21 | 0.30
                 | + Parenthesis and Link         | 0.41 | 0.30 | 0.35
                 | + Def-Ntop                     | 0.42 | 0.30 | 0.35
                 | + Def-Nbot (all rules)         | 0.39 | 0.32 | 0.35
SVM              | J = 0.3                        | 0.55 | 0.21 | 0.31
                 | J = 1.1                        | 0.31 | 0.28 | 0.30
Union            | WN + Wiki (all)                | 0.40 | 0.34 | 0.37
                 | WN + Wiki (Redirect + Be-Comp) | 0.50 | 0.33 | 0.39
Slide 18/20: Promising Directions for Future Work
Learning semantic relations in addition to taxonomical relations (hyponymy, synonymy): fine-grained lexical entailment relations are important for inference.

Relation   | Rule                                      | Path Pattern
Location   | Lovek → Cambodia                          | "Lovek, city in Cambodia"
Occupation | George Bogdan Kistiakowsky → chemistry    | "George Bogdan Kistiakowsky, chemistry professor"
Creation   | Crime and Punishment → Fyodor Dostoyevsky | "Crime and Punishment is a novel by Fyodor Dostoyevsky"
Origin     | Willem van Aelst → Dutch                  | "Willem van Aelst, Dutch artist"
Alias      | Dean Moriarty → Benjamin Linus            | "Dean Moriarty is an alias of Benjamin Linus on Lost"
Spelling   | Egushawa → Agushaway                      | "Egushawa, also spelled Agushaway..."
Slide 19/20: Promising Directions for Future Work (cont.)
Natural types, naturally phrased entities:
– 56,000 terms entail Album
– 31,000 terms entail Politician
– 11,000 terms entail Footballer
– 20,000 terms entail Actor
– 15,000 terms entail Actress
– 4,000 terms entail American Actor
Slide 20/20: Conclusions
– The first large-scale rule base aimed at covering lexical entailment.
– Learns an ontology, which is highly important knowledge for reasoning systems (one of the conclusions of the first 3 RTE benchmarks).
– Lexical entailment rules are extracted automatically from an unstructured source.
– On a real NLP task, results are comparable to a costly, manually crafted resource such as WordNet.
Slide 21/20: Inference System (Textual Entailment example)
t: Strong sales were shown for Abbey Road in 1969.
grammar rule (passive to active): Abbey Road showed strong sales in 1969.
lexical entailment rule (Abbey Road → The Beatles): The Beatles showed strong sales in 1969.
lexico-syntactic rule (show strong sales → gain commercial success): The Beatles gained commercial success in 1969.
h: The Beatles gained commercial success in 1969.
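The entailment chain on this slide can be sketched as sequential rule application. Plain string substitution stands in for the real syntactic transformations here, an obvious simplification, but the derivation order is the one shown above.

```python
# Rewrite rules applied in order: grammar, then lexical entailment,
# then lexico-syntactic. String replacement is a stand-in for the
# tree transformations an actual inference system would perform.
CHAIN = [
    ("Strong sales were shown for Abbey Road",
     "Abbey Road showed strong sales"),                     # passive -> active
    ("Abbey Road", "The Beatles"),                          # lexical entailment
    ("showed strong sales", "gained commercial success"),   # lexico-syntactic
]

def derive(text: str) -> str:
    for src, tgt in CHAIN:
        text = text.replace(src, tgt)
    return text

t = "Strong sales were shown for Abbey Road in 1969."
print(derive(t))  # -> "The Beatles gained commercial success in 1969."
```

The derived string matches the hypothesis h, so t entails h under this rule chain.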