Group 3: Chad Mills, Esad Suskic, Wee Teck Tan
Outline
- System and Data
- Document Retrieval
- Passage Retrieval
- Results
- Conclusion
System and Data
- System: Indri
- Data: development on TREC 2004, testing on TREC 2005
Document Retrieval Baseline:
- Remove "?"
- Add the target string
- MAP: 0.307
Document Retrieval Attempted Improvement 1:
- Settings from baseline
- Rewrite "When was…" questions as "[target] was [last word] on" queries (sketch below)
- MAP: (not shown); best so far: 0.307
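A minimal sketch of that rewrite, assuming whitespace tokenization; the question text and target in the comment are made-up examples, and the real system's handling may differ:

def rewrite_when_was(question, target):
    # Rewrite a "When was ..." question as "[target] was [last word] on".
    words = question.rstrip("?").split()
    if len(words) >= 3 and words[0].lower() == "when" and words[1].lower() == "was":
        return "%s was %s on" % (target, words[-1])
    return question  # leave all other question forms unchanged

# Hypothetical example: rewrite_when_was("When was the company founded?", "AARP")
# returns "AARP was founded on"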
Document Retrieval Attempted Improvement 2:
- Settings from baseline
- Remove "wh" words
- Remove stop words
- Replace pronouns with the target string (cleanup sketch below)
- MAP: (not shown); best so far: (not shown)
- "Wh" / stop words: what, who, where, why, how many, how often, how long, which, how did, does, is, the, a, an, of, was, as
- Pronouns: he, she, it, its, they, their, his
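A minimal sketch of the Improvement 2 cleanup, using the word lists from the slide; the tokenization and the order of operations are assumptions:

WH_AND_STOP = {"what", "who", "where", "why", "how many", "how often", "how long",
               "which", "how did", "does", "is", "the", "a", "an", "of", "was", "as"}
PRONOUNS = {"he", "she", "it", "its", "they", "their", "his"}

def clean_question(question, target):
    # Drop wh-/stop words and replace pronouns with the target string.
    text = question.rstrip("?").lower()
    for phrase in (p for p in WH_AND_STOP if " " in p):
        text = text.replace(phrase, " ")      # multi-word phrases like "how many"
    tokens = []
    for tok in text.split():
        if tok in WH_AND_STOP:
            continue                          # single-word wh-/stop words
        tokens.append(target if tok in PRONOUNS else tok)
    return " ".join(tokens)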
Document Retrieval Attempted Improvement 3:
- Settings from Improvement 2
- Stemmed index (Krovetz stemmer)
- MAP: (not shown); best so far: 0.319
Document Retrieval Attempted Improvement 4:
- Settings from Improvement 3
- Remove punctuation
- Remove non-alphanumeric characters
- MAP: (not shown); best so far: 0.336
Document Retrieval Attempted Improvement 5:
- Settings from Improvement 4
- Remove duplicate words (sketch below)
- MAP: (not shown); best so far: 0.374
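A minimal sketch of the last two cleanup steps (Improvements 4 and 5); keeping the first occurrence of each repeated word is an assumption:

import re

def finalize_query(query):
    # Remove punctuation and any other non-alphanumeric characters.
    query = re.sub(r"[^A-Za-z0-9 ]+", " ", query)
    # Remove duplicate words, preserving the order of first occurrence.
    seen, kept = set(), []
    for tok in query.split():
        if tok not in seen:
            seen.add(tok)
            kept.append(tok)
    return " ".join(kept)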
Passage Retrieval Baseline:
- Out-of-the-box Indri, same question formulation
- Changed "#combine(" to "#combine[passageX:Y](" (sketch below)
- Passage windows, top 20 passages, no re-ranking
- X=40, Y=20: Strict 0.126, Lenient 0.337
- X=200, Y=100: Strict 0.414, Lenient 0.537
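A minimal sketch of that query change, assuming X is the passage window size and Y the increment in Indri's passage-extent syntax; the example query in the comment is made up:

def to_passage_query(doc_query, x=200, y=100):
    # Swap the document-level #combine for a passage-window #combine[passageX:Y].
    return doc_query.replace("#combine(", "#combine[passage%d:%d](" % (x, y), 1)

# e.g. to_passage_query("#combine(aarp founded)") returns
#      "#combine[passage200:100](aarp founded)"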
Passage Retrieval Attempted Re-ranking:
- Mallet MaxEnt classifier
- Training set: TREC 2004
  - 80% train : 20% dev
  - Split by target to avoid cheating, e.g. all 1.* questions land in either train or dev (sketch below)
- Labels: positive = passage contains the correct answer; negative = passage does not
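A minimal sketch of the by-target split; the question records, the target_id field, and the random seed are assumptions:

import random

def split_by_target(questions, dev_fraction=0.2, seed=0):
    # Hold out whole targets so every question for one target (e.g. all of 1.*)
    # lands entirely in train or entirely in dev -- no cheating across the split.
    targets = sorted({q["target_id"] for q in questions})
    random.Random(seed).shuffle(targets)
    n_dev = max(1, int(len(targets) * dev_fraction))
    dev_targets = set(targets[:n_dev])
    train = [q for q in questions if q["target_id"] not in dev_targets]
    dev = [q for q in questions if q["target_id"] in dev_targets]
    return train, dev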
Passage Retrieval Features used:
- For both the passage and the question+target:
  - unigrams, bigrams, trigrams
  - POS tags: unigrams, bigrams, trigrams
- Question/passage correspondence (sketch below):
  - number of overlapping terms (and bigrams)
  - distance between overlapping terms
- Tried the top 20 passages from Indri, then expanded to the top 200
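A minimal sketch of the correspondence features, assuming token lists as input; reading "distance between overlapping terms" as the span from the first to the last overlapping term in the passage is our interpretation, and the feature names are made up:

def overlap_features(question_tokens, passage_tokens):
    # Count shared unigrams/bigrams and measure how far apart the shared
    # terms sit inside the passage.
    q_terms = set(question_tokens)
    q_bigrams = set(zip(question_tokens, question_tokens[1:]))
    p_bigrams = set(zip(passage_tokens, passage_tokens[1:]))
    positions = [i for i, tok in enumerate(passage_tokens) if tok in q_terms]
    return {
        "term_overlap": len(q_terms & set(passage_tokens)),
        "bigram_overlap": len(q_bigrams & p_bigrams),
        "overlap_span": positions[-1] - positions[0] if positions else 0,
    }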
Passage Retrieval Result: all re-ranking attempts were worse than the un-re-ranked Indri baseline.
- Example confusion matrix: (values not shown)
- Many negative examples; 67-69% accuracy on every feature combination tried
Passage Re-Ranking: Indri was very good to start with (e.g., Q10.1)
- Indri ranking: 1 Yes, 2 No, 3 Yes, 4 (not shown), 5 No
- Our ranking (has answer): 1 No, 2 No, 3 Yes, 4 No, 5 Yes (P(Yes), P(No), and original Indri ranks not shown)
- Our first 2 were wrong, and only 1 of Indri's top 5 made our top 5
- If we completely replace Indri's ranking, ours must be very good
- Many low confidence scores (e.g., the best P(Yes) was 7.6%)
- A slight edit to Indri's ranking was less bad, but we found no variant that actually helped
  - E.g., bump a high-confidence Yes to the top of the list and leave the others in Indri order (see the sketch below)
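A minimal sketch of that adjustment; the passage records, the p_yes field, and the 0.5 threshold are assumptions:

def bump_confident_yes(passages_in_indri_order, threshold=0.5):
    # Move passages the classifier is highly confident contain the answer to
    # the front; everything else keeps Indri's original order.
    confident = [p for p in passages_in_indri_order if p["p_yes"] >= threshold]
    rest = [p for p in passages_in_indri_order if p["p_yes"] < threshold]
    return confident + rest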
Results
- TREC 2004 (development): MAP 0.377, Strict MRR 0.414, Lenient MRR 0.537
- TREC 2005 (test): MAP 0.316, Strict MRR 0.366, Lenient MRR 0.543
Conclusions
- Cleaning the input queries helped
- A small, targeted stop word list worked well
- With minimal settings, Indri performs passage retrieval well out of the box
- A re-ranking implementation needs to be really good to beat Indri's own ranking
- Feature selection didn't help
- A slight adjustment to Indri's ranking, rather than a wholly different ranking, might help