
Unsupervised Acquisition of Axioms to Paraphrase Noun Compounds and Genitives
CICLING 2012, New Delhi
Anselmo Peñas, NLP & IR Group, UNED, Spain
Ekaterina Ovchinnikova, USC Information Sciences Institute, USA

Texts omit information
- Humans optimize language generation effort
- We omit information that we know the receiver is able to predict and recover
- Our research goal is to make explicit the information omitted in texts

Implicit predicates
- In particular, some noun compounds and genitives are used in this way
- In these cases, we want to recover the implicit predicate
- For example:
  - Morning coffee -> coffee drunk in the morning
  - Malaria mosquito -> mosquito that carries malaria

How to find the candidates?
- Nakov & Hearst (2006): search the web
  - N1 N2 -> N2 THAT * N1
  - Malaria mosquito -> mosquito THAT * malaria
- Here we use Proposition Stores
  - Harvest a text collection that will serve as context
  - Parse the documents
  - Count N-V-N, N-V-P-N, N-P-N, … structures
  - Build Proposition Stores (Peñas & Hovy, 2010)
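To make the counting step concrete, here is a minimal sketch of proposition extraction over dependency parses. It is not the authors' code: the token-dict input format, the field names, and the restriction to nvn / nvpn / npn patterns are illustrative assumptions.

```python
from collections import Counter

# Minimal sketch of proposition-store construction (not the authors' code).
# Assumes sentences are already dependency-parsed into token dicts of the
# hypothetical form {"lemma", "pos", "head": index, "deprel"}.

def children(sent, head_idx, deprel):
    return [(j, t) for j, t in enumerate(sent)
            if t["head"] == head_idx and t["deprel"] == deprel]

def extract_propositions(sent):
    """Yield simplified nvn / nvpn / npn propositions from one parsed sentence."""
    props = []
    for i, tok in enumerate(sent):
        if tok["pos"].startswith("V"):
            subjs = [t["lemma"] for _, t in children(sent, i, "nsubj")]
            objs = [t["lemma"] for _, t in children(sent, i, "dobj")]
            preps = children(sent, i, "prep")
            for s in subjs:
                for o in objs:
                    props.append(("nvn", s, tok["lemma"], o))
                for j, prep in preps:
                    for _, pobj in children(sent, j, "pobj"):
                        props.append(("nvpn", s, tok["lemma"], prep["lemma"], pobj["lemma"]))
        elif tok["pos"].startswith("N"):
            for j, prep in children(sent, i, "prep"):
                for _, pobj in children(sent, j, "pobj"):
                    props.append(("npn", tok["lemma"], prep["lemma"], pobj["lemma"]))
    return props

def build_store(parsed_sentences):
    """Count propositions over the whole background collection."""
    store = Counter()
    for sent in parsed_sentences:
        store.update(extract_propositions(sent))
    return store

# Toy parse of "A bomb exploded in the attack."
toy = [{"lemma": "bomb", "pos": "NN", "head": 1, "deprel": "nsubj"},
       {"lemma": "explode", "pos": "VBD", "head": -1, "deprel": "root"},
       {"lemma": "in", "pos": "IN", "head": 1, "deprel": "prep"},
       {"lemma": "attack", "pos": "NN", "head": 2, "deprel": "pobj"}]
print(build_store([toy]))  # Counter({('nvpn', 'bomb', 'explode', 'in', 'attack'): 1})
```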

Proposition Stores
- Example: propositions that relate "bomb" and "attack":
    npn:[bomb:n, in:in, attack:n]:13.
    nvpn:[bomb:n, explode:v, in:in, attack:n]:11.
    nvnpn:[bomb:n, kill:v, people:n, in:in, attack:n]:8.
    npn:[attack:n, with:in, bomb:n]:8.
    …
- All of them could be paraphrases for the noun compound "bomb attack"
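Given such a store, retrieving the paraphrase candidates for a compound like "bomb attack" amounts to looking up every proposition whose arguments mention both nouns and ranking them by frequency. A tiny sketch, reusing the hypothetical `store` structure from the previous snippet:

```python
def paraphrase_candidates(store, noun1, noun2):
    """Return the propositions (with counts) whose arguments mention both
    nouns, ordered by frequency: these are the paraphrase candidates."""
    hits = [(prop, count) for prop, count in store.items()
            if noun1 in prop and noun2 in prop]
    return sorted(hits, key=lambda kv: -kv[1])

# e.g. paraphrase_candidates(store, "bomb", "attack") would return
# [(('nvpn', 'bomb', 'explode', 'in', 'attack'), 1)] for the toy store above.
```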

NE Semantic Classes
- Now, what happens if we have a Named Entity?
  - Shakespeare's tragedy -> write
- Why? Consider:
  - John's tragedy
  - Airbus' tragedy

NE Semantic Classes
- We are considering the "semantic classes" of the NE
  - Shakespeare -> writer
  - writer, tragedy -> write

Class-Instance relations
- Fortunately, relevant semantic classes are pointed out in texts through well-known structures: appositions, copulative verbs, "such as", …
- Here we take advantage of dependency parsing to get class-instance relations, looking at links between a proper noun (NNP) and a common noun (NN):
  - nn (noun compound, e.g. "quarterback Favre")
  - appos (apposition)
  - be (copular verb)
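A rough sketch of how class-instance pairs could be harvested from such dependency links. The token-dict format and the restriction to nn / appos links are illustrative assumptions, not the authors' exact extraction rules:

```python
from collections import Counter

def extract_class_instance(sent):
    """sent: list of token dicts {"lemma", "pos", "head", "deprel"} (hypothetical format).
    Returns (class_noun, instance) pairs from nn / appos links between NN and NNP."""
    pairs = []
    for tok in sent:
        if tok["pos"] == "NNP" and tok["deprel"] in ("nn", "appos"):
            head = sent[tok["head"]]
            if head["pos"] == "NN":
                pairs.append((head["lemma"], tok["lemma"]))
        if tok["pos"] == "NN" and tok["deprel"] in ("nn", "appos"):
            head = sent[tok["head"]]
            if head["pos"] == "NNP":
                pairs.append((tok["lemma"], head["lemma"]))
    return pairs

# Toy parse of "quarterback Favre": nn(Favre, quarterback)
toy = [{"lemma": "quarterback", "pos": "NN", "head": 1, "deprel": "nn"},
       {"lemma": "Favre", "pos": "NNP", "head": -1, "deprel": "root"}]
print(Counter(extract_class_instance(toy)))  # Counter({('quarterback', 'Favre'): 1})
```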

Class-Instance relations (World News)
    has_instance(leader,'Yasir':'Arafat'):1491.
    has_instance(spokesman,'Marlin':'Fitzwater'):1001.
    has_instance(leader,'Mikhail':'S.':'Gorbachev'):980.
    has_instance(chairman,'Yasir':'Arafat'):756.
    has_instance(agency,'Tass'):637.
    has_instance(leader,'Radovan':'Karadzic'):611.
    has_instance(adviser,'Condoleezza':'Rice'):590.
    …

So far
- Propositions: P(p,a)
  - p: predicate
  - a: list of arguments
  - P(p,a): joint probability
- Class-instance relations: P(c,i)
  - c: class
  - i: instance
  - P(c,i): joint probability
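Read as estimates, both joint probabilities can plausibly be obtained by maximum likelihood from the counts in the two stores; the slides do not spell out the estimator, so the following is an assumption:

```latex
% Hypothetical maximum-likelihood estimates from the stored counts
P(p,a) \approx \frac{\mathrm{count}(p,a)}{\sum_{p',a'} \mathrm{count}(p',a')}
\qquad
P(c,i) \approx \frac{\mathrm{count}(c,i)}{\sum_{c',i'} \mathrm{count}(c',i')}
```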

Probability of a predicate
- Let's consider the following example: "Favre pass"
- Assume the text has pointed out that he is a quarterback
- What is Favre doing with the pass?
  - The same as other quarterbacks
  - The quarterbacks we observed before in the background collection (the Proposition Store)

Probability of a predicate
- Favre pass -> p : P(p|i)
- Favre -> quarterback : P(c|i)
- quarterback, pass -> throw : P(p|c)
- We already have: P(c|i)
- We need to estimate: P(p|c) (what other quarterbacks do with passes)
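One way to read this slide as a single equation (the exact formulation is left implicit on the slide, so this is an assumption): the predicate distribution for an instance is obtained by marginalizing over its semantic classes,

```latex
P(p \mid i) \;=\; \sum_{c} P(p \mid c)\, P(c \mid i)
```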

Probability of a predicate
- quarterback pass -> p : P(p|c)
- Steve:Young pass -> throw : P(p|i)
- Culpepper pass -> complete : P(p|i)
- …
- We already have P(p|i): it comes from previous observations, the Proposition Store
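Putting the two slides together, here is a small sketch of the ranking step. The dictionaries, field names, and toy numbers are invented for illustration and are not the authors' implementation:

```python
# Rough sketch of the estimation step (not the authors' code).
# c_given_i and p_given_c are assumed to be plain dicts of conditional
# probabilities derived from the class-instance and proposition stores.

def predicate_distribution(instance, other_noun, c_given_i, p_given_c):
    """Estimate P(p | instance, other_noun) by marginalizing over the
    semantic classes of the instance: sum_c P(p | c, other_noun) * P(c | instance)."""
    scores = {}
    for cls, p_cls in c_given_i.get(instance, {}).items():
        for pred, p_pred in p_given_c.get((cls, other_noun), {}).items():
            scores[pred] = scores.get(pred, 0.0) + p_pred * p_cls
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy numbers, made up purely for illustration:
c_given_i = {"Favre": {"quarterback": 0.9, "player": 0.1}}
p_given_c = {("quarterback", "pass"): {"throw": 0.6, "complete": 0.4},
             ("player", "pass"): {"catch": 1.0}}
print(predicate_distribution("Favre", "pass", c_given_i, p_given_c))
# [('throw', 0.54), ('complete', 0.36), ('catch', 0.1)]
```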

Evaluation
- We want to address the following questions:
  - Do we find the paraphrases required to enable Textual Entailment?
  - Do all the noun-noun dependencies need to be paraphrased?
  - How frequently do NEs appear in them?

Experimental setting
- Proposition Store built from 216,303 World News documents
  - 7,800,000 sentences parsed
- RTE-2 (Recognizing Textual Entailment)
  - 83 entailment decisions depend on noun-noun paraphrases
  - 77 different noun-noun paraphrases

Results
- How frequently do NEs appear in these pairs?
  - 82% of the paraphrases contain at least one NE
  - 62% are paraphrasing NE-N (e.g. Vikings quarterback)

Results
- Do all the noun-noun dependencies need to be paraphrased?
  - No, only 54% in our test set
- Some compounds encode semantic relations such as:
  - Locative relations: 12% (e.g. New York club)
  - Temporal relations (e.g. April 23rd strike, Friday semi-final)
  - Class-instance relations (e.g. quarterback Favre)
  - Measure, …
- Some are trivial: 27% are paraphrased with "of"

Results
- Do we find the paraphrases required to enable Textual Entailment?
  - Yes, in 63% of the non-trivial cases

Proposition type | Paraphrase
NPN              | Jackson trial ↔ trial against Jackson
                 | engine problem ↔ problem with engine
NVN              | U.S. Ambassador ↔ Ambassador represents the U.S.
                 | ETA bombing ↔ ETA carried_out bombing
NVNPN            | wife of Joseph Wilson ↔ wife is married to Joseph Wilson
NVPN             | Vietnam veteran ↔ veteran comes from Vietnam
                 | Shapiro's office ↔ Shapiro works in office
                 | Germany's people ↔ people live in Germany
                 | Abu Musab al-Zarqawi's group ↔ group led by Abu Musab al-Zarqawi

Results
- RTE-2 pair 485: paraphrase not found
  - United Nations vehicle ↔ United Nations produces vehicles
  - United Nations doesn't share any class with the instances that "produce vehicles"
  - Toyota vehicle -> develop, build, sell, produce, make, export, recall, assemble, …

Conclusions
- A significant proportion of noun-noun dependencies includes Named Entities
- Some noun-noun dependencies don't require the retrieval of implicit predicates
- The proposed method is sensitive to different NEs
  - Different NEs retrieve different predicates
- Current work: select the most relevant paraphrase according to the text
  - We are exploring weighted abduction

Unsupervised Acquisition of Axioms to Paraphrase Noun Compounds and Genitives
CICLING 2012, New Delhi
Thanks!