Slide 1/20: Extracting a Lexical Entailment Rule-base from Wikipedia
Eyal Shnarch, Libby Barak, Ido Dagan (Bar Ilan University)
Outline: Textual Entailment | Learning Lexical Entailment | Wikipedia | Extraction Types | Results & Evaluations | Conclusions & Future Work
Slide 2/20: Entailment: what is it and what is it good for?
Question Answering: "Which luxury cars are produced in Britain?"
Information Retrieval: "The Beatles"
Slide 3/20: Lexical Entailment
Lexical entailment rules model such lexical relations.
Part of the Textual Entailment paradigm, a generic framework for semantic inference.
Encompasses a variety of relations:
– Synonymy: Hypertension → Elevated blood-pressure
– IS-A: Jim Carrey → actor
– Predicates: Crime and Punishment → Fyodor Dostoyevsky
– Reference: Abbey Road → The Beatles
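To make the relations above concrete, here is a minimal sketch (not the authors' implementation) of what a lexical entailment rule-base looks like as a data structure: a mapping from a left-hand-side term to the set of terms it entails.

```python
# Minimal illustrative rule-base: LHS term -> set of entailed RHS terms.
# The entries mirror the examples on the slide; lookup is case-insensitive.
RULES = {
    "hypertension": {"elevated blood-pressure"},
    "jim carrey": {"actor"},
    "crime and punishment": {"fyodor dostoyevsky"},
    "abbey road": {"the beatles"},
}

def entails(lhs: str, rhs: str) -> bool:
    """True if the rule-base contains the rule lhs -> rhs."""
    return rhs.lower() in RULES.get(lhs.lower(), set())

print(entails("Abbey Road", "The Beatles"))  # -> True
print(entails("Abbey Road", "actor"))        # -> False
```

A real rule-base of the scale described later in the deck (~10 million rules) would of course live in an indexed store, but the lookup interface is the same.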
Slide 4/20: What was done so far?
– WordNet: a lexical database built for computational consumption, the standard NLP resource. Costly, requires experts, many years of development (since 1985).
– Distributional similarity: Country and State share similar contexts, but so do Nurse and Doctor, or Bear and Tiger. Low precision.
– Patterns: "NP1 such as NP2" (luxury car such as Jaguar), "NP1 and other NP2" (dogs and other domestic pets). Low coverage, mainly IS-A patterns.
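The pattern-based approach on this slide can be sketched in a few lines. This is an illustrative Hearst-style matcher, not the deck's code; noun phrases are approximated by plain word sequences, which is an assumption made purely for brevity.

```python
import re

# Crude "NP1 such as NP2" matcher. A real system would identify noun
# phrases with a parser; here an NP is just a run of word characters.
SUCH_AS = re.compile(r"(\w[\w ]*?) such as (\w[\w ]*)")

def extract_isa(sentence: str):
    """Yield (hyponym, hypernym) pairs from 'X such as Y' patterns."""
    pairs = []
    for m in SUCH_AS.finditer(sentence):
        hypernym, hyponym = m.group(1).strip(), m.group(2).strip()
        pairs.append((hyponym, hypernym))
    return pairs

print(extract_isa("luxury cars such as Jaguar"))
# -> [('Jaguar', 'luxury cars')]
```

The low coverage the slide mentions is visible here: only sentences that happen to use one of the fixed surface patterns yield a rule.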
Slide 5/20: Our approach: utilize definitions
Pen: an instrument for writing or drawing with ink.
– pen is-an instrument
– pen is used for writing / drawing
– ink is part of pen
Sources of definitions:
– Dictionary: describes language terms, slow growth
– Encyclopedia: contains knowledge, proper names, events, concepts; grows rapidly
We chose Wikipedia:
– Very dynamic, constantly growing and updating
– Covers a vast range of domains
– Gaining popularity in research (AAAI 2008 workshop)
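The "pen is-an instrument" extraction above can be sketched as follows. This is a deliberately crude heuristic standing in for the syntactic analysis the approach actually relies on: take the first word of the definition after any leading article as the definition's head noun.

```python
# Illustrative sketch: the defined term entails the head noun of its
# definition. A real system parses the definition; skipping the leading
# article and taking the next word is an assumption made for brevity.
def definition_head(definition: str) -> str:
    words = definition.strip(" .").split()
    while words and words[0].lower() in {"a", "an", "the"}:
        words.pop(0)                  # drop leading articles
    return words[0] if words else ""

head = definition_head("an instrument for writing or drawing with ink")
print(f"pen -> {head}")  # candidate rule: pen -> instrument
```

This only recovers the IS-A rule; the "used for" and "part of" relations on the slide need the rest of the definition's structure.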
Slide 6/20: Extraction Types
– Be-Comp: a noun in the position of a complement of the verb 'be'
– All-Nouns: all nouns in the definition; each has a different likelihood of being entailed
Slide 7/20: Ranking All-Nouns Rules
The likelihood of entailment depends greatly on the syntactic path connecting the title and the noun (a path in the parsed tree).
An unsupervised entailment likelihood score is computed for each syntactic path p within a definition.
Split Def-N into Def-Ntop and Def-Nbot:
– Indicative of rule reliability: the precision of Def-Ntop rules is much higher than that of Def-Nbot rules.
Example: [title] – a film directed by [noun]; path: subj – vrel – by-subj – pcomp-n
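The slide does not spell out the unsupervised path-scoring formula, so the following is a hypothetical proxy rather than the paper's method: score a syntactic path by the fraction of the rules it extracts that are also supported by an independent extraction type (e.g. Redirect), on the assumption that such agreement indicates reliability.

```python
from collections import defaultdict

def path_scores(extractions, supported):
    """extractions: list of (path, rule) pairs; supported: set of rules
    confirmed by another extraction type. Returns {path: score}."""
    total, hits = defaultdict(int), defaultdict(int)
    for path, rule in extractions:
        total[path] += 1
        if rule in supported:
            hits[path] += 1
    return {p: hits[p] / total[p] for p in total}

# Hypothetical example data (not from the paper):
extractions = [
    ("subj-vrel-by-subj-pcomp-n", ("Memento", "Christopher Nolan")),
    ("subj-vrel-by-subj-pcomp-n", ("Dune", "Frank Herbert")),
    ("conj", ("Memento", "thriller")),
]
supported = {("Memento", "Christopher Nolan")}
print(path_scores(extractions, supported))
# -> {'subj-vrel-by-subj-pcomp-n': 0.5, 'conj': 0.0}
```

Whatever the exact score, splitting All-Nouns paths into a high-scoring (Def-Ntop) and low-scoring (Def-Nbot) group is then just a threshold on these per-path values.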
Slide 8/20: Extraction Types (cont.)
– Redirect: a Wikipedia redirect title and the title of its target page
– Parenthesis: the disambiguation term in parentheses in a page title
– Link: nouns linked within the definition
Slide 9/20: Ranking Rules by Supervised Learning
Slide 12/20: Ranking Rules by Supervised Learning
An alternative approach for deciding which rules to select out of all extracted rules. Each rule is represented by:
– 6 binary features: one for each extraction type
– 2 binary features: one for each side of the rule, indicating whether it is a named entity
– 2 numerical features: the co-occurrence count of the rule's sides and the number of times the rule was extracted
– 1 numerical feature: the score of the path for the Def-N extraction type
A manually annotated set was used to train SVM-light; the J parameter was varied to obtain different recall-precision tradeoffs.
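The 11-dimensional rule representation described above can be sketched as a feature-vector builder. The set of extraction-type names below is illustrative (the deck does not list which six types back the six indicator features), as are the example values.

```python
# Illustrative list; the actual six types used in the paper may differ.
EXTRACTION_TYPES = ["Redirect", "Be-Comp", "Def-N", "Parenthesis",
                    "Link", "All-N"]

def rule_features(types, lhs_is_ne, rhs_is_ne, cooc, count, path_score):
    """Build the 11-dim vector: 6 type indicators, 2 NE indicators,
    co-occurrence, extraction count, and the Def-N path score."""
    vec = [1.0 if t in types else 0.0 for t in EXTRACTION_TYPES]
    vec += [float(lhs_is_ne), float(rhs_is_ne),
            float(cooc), float(count), float(path_score)]
    return vec

# Hypothetical rule extracted by Redirect and Def-N, both sides NEs:
v = rule_features({"Redirect", "Def-N"}, True, True,
                  cooc=12, count=2, path_score=0.8)
print(len(v), v[:6])  # -> 11 [1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

Each vector would then be written out in SVM-light's sparse `label index:value` format for training.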
Slide 13/20: Results and Evaluation
The obtained knowledge base includes:
– About 10 million rules (for comparison, Snow's extension to WordNet includes 400,000 relations)
– More than 2.4 million distinct RHSs
– 18% of the rules were extracted by more than one extraction type
– Mostly named entities and specific concepts, as expected from an encyclopedia
Two evaluation types:
– Rule-based: rule correctness relative to human judgment
– Inside a real application: the utility of the extracted rules for lexical expansion in keyword-based text categorization
Slide 14/20: Rule-base Evaluation
Randomly sampled 830 rules and annotated them for correctness (inter-annotator agreement reached a Kappa of 0.7).
Precision: the percentage of correct rules.
Est. # of correct rules: the number of rules annotated as correct, scaled up by the sampling proportion (equivalently, sample precision times the number of extracted rules).

Extraction Type | P (per type) | Est. # rules | P (accumulated) | R (accumulated)
Redirect        | 0.87         | 2,232,877    | 0.87            | 0.31
Be-Comp         | 0.80         | 2,740,957    | 0.82            | 0.60
Def-Ntop        | 0.72         | 2,179,395    | 0.77            | 0.71
Parenthesis     | 0.71         | 66,853       | 0.77            | 0.72
Link            | 0.70         | 708,638      | 0.76            | 0.80
Def-Nbot        | 0.47         | 1,657,944    | 0.66            | 1.00
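The estimate of correct rules described above is a one-line computation: sample precision multiplied by the total number of extracted rules. The numbers below are illustrative, not the paper's.

```python
def estimate_correct(total_extracted: int, sample_size: int,
                     sample_correct: int) -> float:
    """Scale the annotated sample up to the full extraction:
    est_correct = (sample_correct / sample_size) * total_extracted."""
    precision = sample_correct / sample_size
    return precision * total_extracted

# Hypothetical numbers: 87 of 100 sampled rules judged correct,
# out of 1,000,000 rules extracted by some type:
print(estimate_correct(1_000_000, 100, 87))  # -> 870000.0
```

This is why per-type precision and the estimated rule counts in the table move together: the estimate is precision times volume.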
Slide 15/20: Supervised Learning Evaluation
5-fold cross validation on the annotated sample:

J | 1.3  | 1.1  | 0.9  | 0.5  | 0.4  | 0.3
P | 1.00 | 0.91 | 0.81 | 0.73 | 0.59 | 0.32
R | 0.66 | 0.70 | 0.75 | –    | 0.82 | 0.86

Although it considers additional information, performance is almost identical to considering only extraction types. Further research is needed to improve the current feature set and classification performance.
Slide 16/20: Text Categorization Evaluation
Represent each category by a feature vector of its characteristic terms; the characteristic terms should entail the category name.
Compare the term-based feature vector of a classified document with the feature vectors of all categories, and assign the document to the category yielding the highest cosine similarity score (single-class classification).
Dataset: the 20 Newsgroups collection.
3 baselines: no expansion, WordNet, and WikiBL [Snow].
Also evaluated the union of Wikipedia and WordNet.
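The categorization scheme above can be sketched directly: categories and documents are bags of terms, and a document is assigned to the category with the highest cosine similarity. In the evaluated setting the category vectors would additionally be expanded with entailing terms from the rule-base; the categories and document below are illustrative.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(doc_terms, categories):
    """Assign the document to the category with highest cosine score."""
    doc = Counter(doc_terms)
    return max(categories, key=lambda c: cosine(doc, Counter(categories[c])))

# Hypothetical categories; "jaguar" plays the role of an expansion term
# that a rule like Jaguar -> luxury car would contribute.
categories = {
    "autos":    ["car", "engine", "jaguar"],
    "politics": ["election", "senate", "vote"],
}
print(classify(["the", "jaguar", "engine", "roared"], categories))
# -> "autos"
```

Without the expansion terms, a document mentioning only "Jaguar" would miss the "autos" category entirely, which is exactly the gap lexical expansion is meant to close.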
Slide 17/20: Text Categorization Evaluation (results)

Group            | Rule Base                      | P    | R    | F1
Baselines        | No Expansion                   | 0.53 | 0.19 | 0.28
                 | WordNet                        | 0.46 | 0.29 | 0.36
                 | WikiBL                         | 0.53 | 0.19 | 0.28
Extraction Types | Redirect only                  | 0.54 | 0.21 | 0.30
                 | + Be-Comp                      | 0.55 | 0.21 | 0.30
                 | + Parenthesis and Link         | 0.41 | 0.30 | 0.35
                 | + Def-Ntop                     | 0.42 | 0.30 | 0.35
                 | + Def-Nbot (all rules)         | 0.39 | 0.32 | 0.35
SVM              | J = 0.3                        | 0.55 | 0.21 | 0.31
                 | J = 1.1                        | 0.31 | 0.28 | 0.30
Union            | WN + Wiki (all)                | 0.40 | 0.34 | 0.37
                 | WN + Wiki (Redirect + Be-Comp) | 0.50 | 0.33 | 0.39
Slide 18/20: Promising Directions for Future Work
Learning semantic relations in addition to taxonomical relations (hyponymy, synonymy): fine-grained lexical entailment relations are important for inference.

Relation   | Rule                                      | Path Pattern
Location   | Lovek → Cambodia                          | "Lovek, city in Cambodia"
Occupation | George Bogdan Kistiakowsky → chemistry    | "George Bogdan Kistiakowsky, chemistry professor"
Creation   | Crime and Punishment → Fyodor Dostoyevsky | "Crime and Punishment is a novel by Fyodor Dostoyevsky"
Origin     | Willem van Aelst → Dutch                  | "Willem van Aelst, Dutch artist"
Alias      | Dean Moriarty → Benjamin Linus            | "Dean Moriarty is an alias of Benjamin Linus on Lost"
Spelling   | Egushawa → Agushaway                      | "Egushawa, also spelled Agushaway..."
Slide 19/20: Promising Directions for Future Work (cont.)
Natural types, naturally phrased entities:
– 56,000 terms entail Album
– 31,000 terms entail Politician
– 11,000 terms entail Footballer
– 20,000 terms entail Actor
– 15,000 terms entail Actress
– 4,000 terms entail American Actor
Slide 20/20: Conclusions
– The first large-scale rule base aimed at covering lexical entailment.
– Learns an ontology, which is highly important knowledge for reasoning systems (one of the conclusions of the first 3 RTE benchmarks).
– Lexical entailment rules are extracted automatically from an unstructured source.
– On a real NLP task, results are comparable to a costly, manually crafted resource such as WordNet.
Slide 21/20: Inference System (Textual Entailment example)
t: Strong sales were shown for Abbey Road in 1969.
grammar rule (passive to active): Abbey Road showed strong sales in 1969.
lexical entailment rule (Abbey Road → The Beatles): The Beatles showed strong sales in 1969.
lexico-syntactic rule (show strong sales → gain commercial success): The Beatles gained commercial success in 1969.
h: The Beatles gained commercial success in 1969.
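The entailment chain on this slide can be sketched as sequential rule application. Plain string substitution stands in for the real syntactic transformations here, an obvious simplification, but the derivation order is the one shown above.

```python
# Rewrite rules applied in order: grammar, then lexical entailment,
# then lexico-syntactic. String replacement is a stand-in for the
# tree transformations an actual inference system would perform.
CHAIN = [
    ("Strong sales were shown for Abbey Road",
     "Abbey Road showed strong sales"),                     # passive -> active
    ("Abbey Road", "The Beatles"),                          # lexical entailment
    ("showed strong sales", "gained commercial success"),   # lexico-syntactic
]

def derive(text: str) -> str:
    for src, tgt in CHAIN:
        text = text.replace(src, tgt)
    return text

t = "Strong sales were shown for Abbey Road in 1969."
print(derive(t))  # -> "The Beatles gained commercial success in 1969."
```

The derived string matches the hypothesis h, so t entails h under this rule chain.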