LING 573: Deliverable 3 Group 7 Ryan Cross Justin Kauhl Megan Schneider.



The Basics
– Implemented in Python with Indri
– For document retrieval, used the standard #combine("query") operator
  #combine(x1 x2 … xn) = (score for x1)^(1/n) * (score for x2)^(1/n) * … * (score for xn)^(1/n)
– Used passage#:# to get windows for passage retrieval (100:50, 150:50, 150:75, also 150:10, 150:15, and longer windows)
– Used regexes to clean up the Indri printPassages output
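The #combine formula above is a geometric mean of the per-term scores. A minimal sketch of that arithmetic follows, assuming each term score is a belief in [0, 1]; the function name `combine_score` is hypothetical and is not part of Indri.

```python
import math


def combine_score(beliefs):
    """Geometric mean of per-term beliefs: prod(b ** (1/n)).

    Hypothetical helper that mirrors the #combine formula on the slide;
    it is not an Indri API.
    """
    if not beliefs:
        raise ValueError("need at least one term belief")
    if any(b < 0.0 for b in beliefs):
        raise ValueError("beliefs must be non-negative")
    if any(b == 0.0 for b in beliefs):
        # A single zero belief makes the whole product zero.
        return 0.0
    n = len(beliefs)
    # Work in log space: exp(mean(log b)) equals prod(b ** (1/n)).
    return math.exp(sum(math.log(b) for b in beliefs) / n)


if __name__ == "__main__":
    print(combine_score([0.5, 0.125]))
```

Working in log space avoids underflow when a query has many low-scoring terms.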

Approaches
– Stemming
– Stop word removal
– Question word removal
– Query expansion

Approaches (cont.)
Stemming
– Tried stemming in the index and stemming the query
– Compared the Porter and Krovetz stemmers
– Krovetz performed better (it is less aggressive)
Table (TREC 2004, 150/75/20): MAP, Strict MRR, and Lenient MRR for Porter vs. Krovetz [values missing in source]

Approaches (cont.)
Stop word removal
– Removing stop words from the index made runtime faster
– Removing them from queries improved results in all circumstances
Question word removal
– Applied to the query in almost all cases; gave some improvement
– Largely intuitive. However, some questions had slightly better results when the question words were left in, because of Q&A files in the corpus
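The two query-side filters above can be sketched as follows. The word lists are tiny illustrative assumptions, not the lists the group used, and `clean_query` is a hypothetical helper name.

```python
import re

# Illustrative lists only (assumption); a real run would use a fuller stop list.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "is", "was",
              "did", "do", "does", "to", "for"}
QUESTION_WORDS = {"who", "what", "when", "where", "why", "which", "how"}


def clean_query(question, drop_question_words=True):
    """Lowercase and tokenize, then drop stop words and (optionally) question words."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    drop = STOP_WORDS | (QUESTION_WORDS if drop_question_words else set())
    return [t for t in tokens if t not in drop]


if __name__ == "__main__":
    q = "Who was the first president of the United States?"
    print(clean_query(q))
    print(clean_query(q, drop_question_words=False))
```

Keeping `drop_question_words` as a switch makes it easy to compare both settings, which matters for the questions that scored better with question words left in.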

Approaches (cont.)
Query expansion
– Tried adding synonyms from WordNet
– Only added synonyms for nouns, verbs, adjectives, and adverbs
– Restricted the synonyms added based on a word's POS (as predicted by nltk.pos_tag)
– Also tried not restricting synonyms by POS
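The difference between POS-restricted and unrestricted expansion can be sketched with a toy lexicon. Everything below is an assumption for illustration: `TOY_SYNONYMS` stands in for WordNet, the POS tags are supplied by hand rather than by a tagger, and `expand_query` is a hypothetical name.

```python
# Hypothetical stand-in for WordNet: (word, POS) -> synonyms.
TOY_SYNONYMS = {
    ("run", "VERB"): ["sprint", "jog"],
    ("run", "NOUN"): ["streak", "tally"],
    ("film", "NOUN"): ["movie"],
}

# Only content words are expanded, as on the slide.
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}


def expand_query(tagged_terms, restrict_by_pos=True):
    """Append synonyms after each content word.

    tagged_terms: list of (word, pos) pairs.
    restrict_by_pos: if True, only use synonyms listed under the word's own POS.
    """
    expanded = []
    for word, pos in tagged_terms:
        expanded.append(word)
        if pos not in CONTENT_POS:
            continue
        for (syn_word, syn_pos), synonyms in TOY_SYNONYMS.items():
            if syn_word != word:
                continue
            if restrict_by_pos and syn_pos != pos:
                continue
            for synonym in synonyms:
                if synonym not in expanded:
                    expanded.append(synonym)
    return expanded


if __name__ == "__main__":
    query = [("longest", "ADJ"), ("run", "NOUN")]
    print(expand_query(query))                         # restricted by POS
    print(expand_query(query, restrict_by_pos=False))  # all senses
```

In the unrestricted case the noun "run" also picks up verb synonyms such as "sprint", which shows how expansion can pull in terms unrelated to the question.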

Approaches (cont.)
Query expansion
– In both cases, retrieval results were worse with query expansion
Table (TREC 2004 data): MAP, Strict MRR, and Lenient MRR for no synonyms, restricted synonyms, and full synonyms [values missing in source]

Approaches (cont.)
Passage retrieval
– Used the Indri #combine[passage size:increment]("query") operator
– Originally intended to use only documents returned from the document retrieval phase
– Decided instead to run passage retrieval as a standalone system
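A passage query of this form can be assembled as a plain string before it is handed to Indri. This is a minimal sketch: `passage_query` is a hypothetical helper, and how the string reaches Indri (parameter file, command line, or API) is left out because the slides do not say.

```python
def passage_query(terms, window, increment):
    """Build an Indri passage query string, e.g. #combine[passage150:75](a b).

    Hypothetical helper; window and increment are counted in terms.
    """
    if window <= 0 or increment <= 0:
        raise ValueError("window and increment must be positive")
    if not terms:
        raise ValueError("need at least one query term")
    return "#combine[passage{}:{}]({})".format(window, increment, " ".join(terms))


if __name__ == "__main__":
    print(passage_query(["first", "president", "united", "states"], 150, 75))
```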

Approaches (cont.)
Passage retrieval results
– Tried a few different settings
– Used Krovetz stemming with stop words and question words removed
– Looked for a window size that did not return too many characters, with meaningful increments
Table (TREC 2004 data): Strict MRR and Lenient MRR by window size/increment [row labels and values missing in source]
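To see how window size and increment interact, the sketch below lists the term spans a sliding window would cover. It assumes windows start at every multiple of the increment and that the last window is cut off at the end of the document; Indri's exact boundary handling may differ, and `passage_windows` is a hypothetical name.

```python
def passage_windows(doc_length, window, increment):
    """Return (start, end) term offsets for overlapping windows, end exclusive."""
    if window <= 0 or increment <= 0:
        raise ValueError("window and increment must be positive")
    spans = []
    start = 0
    while start < doc_length:
        end = min(start + window, doc_length)
        spans.append((start, end))
        if end == doc_length:
            # The final window already reaches the end of the document.
            break
        start += increment
    return spans


if __name__ == "__main__":
    # A 300-term document with the 150/75 setting.
    print(passage_windows(300, 150, 75))
```

A smaller increment gives more overlapping candidates per document, which raises the chance that some window is centered on the answer at the cost of more near-duplicate passages.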

Overall
– Krovetz stemmer
– Stop words removed from the query (kept in the index)
Table (150/75/20): MAP, Strict MRR, and Lenient MRR for two TREC data sets [years and values missing in source]
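The two metrics reported here can be computed per question and then averaged. The sketch below uses the standard definitions of reciprocal rank and average precision; the function names are hypothetical, and the strict and lenient variants are assumed to differ only in which items are passed in as `relevant`.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean(values):
    """Average over questions; gives MRR or MAP depending on the input."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    # Two made-up questions with made-up rankings, for illustration only.
    runs = [
        (["d3", "d1", "d2"], {"d1"}),
        (["d1", "d3", "d2"], {"d1", "d2"}),
    ]
    print("MRR:", mean(reciprocal_rank(r, rel) for r, rel in runs))
    print("MAP:", mean(average_precision(r, rel) for r, rel in runs))
```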

Critical Analysis
Our query expansion attempts did not help
– Too many misleading terms were introduced
Stop word results were unusual
– We had assumed that removing them from the index would help
Passage retrieval yielded better results than document retrieval
– It is more meaningful to see a query term in a passage


Questions?