Using Semantic Relations to Improve Information Retrieval

Presentation transcript:

Using Semantic Relations to Improve Information Retrieval Tom Morton

Introduction
NLP techniques have been largely unsuccessful at information retrieval. Why?
Document retrieval has been the primary measure of information retrieval success.
Document retrieval reduces the need for NLP techniques. Discourse factors can be ignored. Query words perform word-sense disambiguation.
Lack of robustness: NLP techniques are typically not as robust as word indexing.

Introduction
Paragraph retrieval for natural-language questions.
Paragraphs can be influenced by discourse factors.
Correctness of answers to natural-language questions can be accurately determined automatically.
Standard precursor to the TREC question answering task.
What NLP technologies might help with this information retrieval task, and are they robust enough?

NLP Technologies
Question Analysis: Questions tend to specify the semantic type of their answer. This component tries to identify this type.
Named-Entity Detection: Named-entity detection determines the semantic type of proper nouns and numeric amounts in text.

How do these technologies help?
Question Analysis: The predicted category is appended to the question.
Named-Entity Detection: The NE categories found in text are included as new terms. This approach requires additional question terms to be in the paragraph.
Example question: What party is John Major in? (ORGANIZATION)
Example paragraph: It probably won't be clear for some time whether the Conservative Party has chosen in John Major a truly worthy successor to Margaret Thatcher, who has been a giant on the world stage.
Terms added to the paragraph: +ORGANIZATION +PERSON
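The term expansion on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: the helper names (`tokenize`, `expand_question`, `expand_paragraph`, `matches`) are hypothetical, the NE categories are supplied by hand rather than by a tagger, and the matching rule (category term plus at least one ordinary question word) is only one assumed reading of "requires additional question terms to be in the paragraph".

```python
import re

def tokenize(text):
    """Lowercase word tokens; apostrophes are kept so "won't" stays one term."""
    return re.findall(r"[a-z0-9']+", text.lower())

def expand_question(question, answer_category):
    """Append the predicted answer category to the question terms."""
    return tokenize(question) + ["+" + answer_category]

def expand_paragraph(paragraph, ne_categories):
    """Index the paragraph's words plus one marker term per NE category found in it."""
    return tokenize(paragraph) + ["+" + c for c in sorted(set(ne_categories))]

def matches(query_terms, paragraph_terms):
    """Assumed rule: the category term and at least one ordinary question word must match."""
    query, para = set(query_terms), set(paragraph_terms)
    category = {t for t in query if t.startswith("+")}
    words = query - category
    return category <= para and bool(words & para)

question = expand_question("What party is John Major in?", "ORGANIZATION")
paragraph = expand_paragraph(
    "the Conservative Party has chosen in John Major a truly worthy "
    "successor to Margaret Thatcher",
    ["ORGANIZATION", "PERSON"],  # hand-supplied; a real NE tagger would produce these
)
print(matches(question, paragraph))
```

Category markers carry a "+" prefix and upper case so they cannot collide with ordinary lowercase word terms.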

NLP Technologies
Coreference Relations: Interpretation of a paragraph may depend on the context in which it occurs.
Syntactically-based Categorical Relation Extraction: Appositive and predicate nominative constructions provide descriptive terms about entities.

How do these technologies help?
Coreference: Use coreference relationships to introduce new terms referred to but not present in the paragraph's text.
Example question: How long was Margaret Thatcher the prime minister? (DURATION)
Example paragraph: The truth, which has been added to over each of her 11 1/2 years in power, is that they don't make many like her anymore.
Terms added to the paragraph: +MARGARET +THATCHER +PRIME +MINISTER +DURATION
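A small sketch of the coreference-based expansion follows. It assumes the coreference chain for "her" has already been computed elsewhere in the document; the function name `coreference_terms` and the hand-written mention list are hypothetical and only illustrate how terms absent from the paragraph could be added.

```python
import re

def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def coreference_terms(paragraph, entity_mentions):
    """Return terms from an entity's other mentions that the paragraph lacks."""
    present = set(tokenize(paragraph))
    added = []
    for mention in entity_mentions:
        for term in tokenize(mention):
            if term not in present and term not in added:
                added.append(term)
    return added

paragraph = ("The truth, which has been added to over each of her 11 1/2 years "
             "in power, is that they don't make many like her anymore.")
# Hypothetical chain for the entity that "her" refers to, gathered from
# earlier paragraphs of the same document.
chain = ["Margaret Thatcher", "the prime minister", "her"]
print(coreference_terms(paragraph, chain))
```

Terms already in the paragraph ("the", "her") are skipped, so only the missing ones are added to the index entry.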

How do these technologies help?
Categorical Relation Extraction: Identifies the DESCRIPTION category. Allows descriptive terms to be used in term expansion.
Example questions: Who is Frank Lloyd Wright? (DESCRIPTION) What architect designed Robie House? (PERSON)
Example paragraph: Famed architect Frank Lloyd Wright…
Terms added to the paragraph: +DESCRIPTION
Example paragraph: Buildings he designed include the Guggenheim Museum in New York and Robie House in Chicago.
Terms added to the paragraph: +FRANK +LLOYD +WRIGHT +FAMED +ARCHITECT
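As a rough illustration of how descriptive terms might be pulled from text, the sketch below uses a toy pattern over hand-tagged tokens: a run of adjectives and common nouns directly followed by a run of proper nouns. This stands in for the syntactic extraction of appositive and predicate nominative constructions and is not the method used in the system; the function name and the Penn Treebank style tags supplied by hand are assumptions.

```python
def prenominal_descriptions(tagged):
    """Find (descriptor_terms, name_terms) pairs: JJ/NN tokens directly before NNP tokens."""
    results = []
    i, n = 0, len(tagged)
    while i < n:
        start = i
        # Collect a run of adjectives / common nouns.
        while i < n and tagged[i][1] in ("JJ", "NN"):
            i += 1
        descriptors = [word for word, _ in tagged[start:i]]
        name_start = i
        # Collect the run of proper nouns that follows, if any.
        while i < n and tagged[i][1] == "NNP":
            i += 1
        name = [word for word, _ in tagged[name_start:i]]
        if descriptors and name:
            results.append((descriptors, name))
        if i == start:
            i += 1  # token is neither descriptor nor name; skip it
    return results

# Hand-tagged tokens; a real system would obtain these from a POS tagger.
tagged = [("Famed", "JJ"), ("architect", "NN"),
          ("Frank", "NNP"), ("Lloyd", "NNP"), ("Wright", "NNP")]
print(prenominal_descriptions(tagged))
```

Once the descriptors are attached to the entity, a later paragraph that mentions the entity only as "he" can receive both the name terms and the descriptive terms through the coreference expansion described earlier.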

System overview
[Diagram. Indexing: Documents, Pre-processing, NE Detection, Coreference Resolution, Categorical Relation Extraction, Paragraphs+, Search Engine. Retrieval: Question, Question Analysis, Search Engine, Paragraphs.]

Will it work?
Will these semantic relations improve paragraph retrieval?
Are the implementations robust enough to see a benefit across large document collections and question sets?
Are there enough questions where these relationships are required to find an answer? Questions need only be answered once.
Short answer: Yes!

How does it work?
Coreference: Use the approach described in ACL (Morton 2000).
Divide referring expressions into three classes and create a separate resolution approach for each:
Singular third person pronouns: Statistical
Proper nouns: Rule-based
Definite noun phrases: Rule-based
Apply the resolution approaches to text in an interleaved fashion.
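The class-based, interleaved design can be sketched as a single pass over mentions in text order, dispatching each one to a resolver for its class. Everything below is a hypothetical stand-in: the classifier is a crude surface heuristic, the proper-noun rule is a last-word match, the pronoun resolver is a stub in place of the statistical model (it ignores gender and simply prefers the entity with the most mentions so far), and the definite noun phrase resolver always starts a new entity.

```python
PRONOUNS = {"he", "she", "him", "her", "his", "hers"}

def mention_class(mention):
    """Crude surface classification of a referring expression."""
    words = mention.split()
    if len(words) == 1 and words[0].lower() in PRONOUNS:
        return "pronoun"
    if all(w[0].isupper() for w in words):
        return "proper_noun"
    if words[0].lower() == "the":
        return "definite_np"
    return "other"

def resolve_proper_noun(mention, entities):
    """Rule-based stand-in: same last word as an earlier mention."""
    last = mention.split()[-1]
    for i, entity in enumerate(entities):
        if any(m.split()[-1] == last for m in entity):
            return i
    return None

def resolve_pronoun(mention, entities):
    """Stub for the statistical model: pick the entity with the most mentions."""
    if not entities:
        return None
    return max(range(len(entities)), key=lambda i: len(entities[i]))

def resolve_definite_np(mention, entities):
    """Placeholder: always start a new entity."""
    return None

RESOLVERS = {
    "pronoun": resolve_pronoun,
    "proper_noun": resolve_proper_noun,
    "definite_np": resolve_definite_np,
}

def resolve_all(mentions):
    """Interleaved pass: each resolver sees the entities built by all the others."""
    entities = []
    for mention in mentions:
        resolver = RESOLVERS.get(mention_class(mention))
        target = resolver(mention, entities) if resolver else None
        if target is None:
            entities.append([mention])
        else:
            entities[target].append(mention)
    return entities

mentions = ["Margaret Thatcher", "John Major", "Thatcher",
            "the undoubted exception", "she"]
print(resolve_all(mentions))
```

Because the resolvers share one growing list of entities, the pronoun "she" is attached to an entity (a set of mentions) rather than to a single earlier extent.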

Coreference
[Figure labels: John Major, a truly worthy… / Margaret Thatcher, her, … / The Conservative Party / the undoubted exception / Winston Churchill … / she ? / 20% 70% 10% 5%]
The pronoun is resolved to an entity rather than to the most recent extent.

Paragraph Retrieval Results

Conclusion
Developed and evaluated new techniques in: Coreference Resolution. Categorical Relation Extraction. Question Analysis.
Integrated these techniques with existing NLP components: NE detection, POS tagging, sentence detection, etc.
Demonstrated that these techniques can be used to improve performance in an information retrieval task: paragraph retrieval for natural language questions.

Future Work
Extend answer categories and named-entity detection to include new types.
Develop a completely statistical coreference resolution mechanism.
Re-run the paragraph retrieval evaluation.