Q/A SYSTEM
Project by: Abdullah Alotayq, Dong Wang, Ed Pham
COMPONENTS
- Query Processing
- Passage Retrieval
- Answer Extraction
QUERY PROCESSING
Classification package: Mallet
Classifiers: MaxEnt, DecisionTree, NaiveBayes, BalancedWinnow
QUERY PROCESSING
Features:
- Semantic
- Morphological
- Neighboring (syntactic)
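The classifiers above were trained with Mallet; as a rough illustration of the feature-to-label setup for question classification, here is a minimal sketch using NLTK's NaiveBayesClassifier instead, with made-up features, questions, and labels.

    import nltk

    WH_WORDS = {"who", "what", "when", "where", "why", "which", "how"}

    def question_features(question):
        # Toy stand-ins for the morphological / neighboring-word features;
        # the real feature set (semantic, morphological, syntactic) was richer.
        tokens = question.lower().rstrip("?").split()
        return {
            "wh_word": next((t for t in tokens if t in WH_WORDS), "none"),
            "first_two_words": " ".join(tokens[:2]),
            "contains_digit": any(any(c.isdigit() for c in t) for t in tokens),
            "last_word": tokens[-1] if tokens else "",
        }

    # Hypothetical labeled examples (question -> expected answer type).
    train = [
        (question_features("Who founded the company?"), "PERSON"),
        (question_features("When was the treaty signed?"), "DATE"),
        (question_features("Where is the river located?"), "LOCATION"),
    ]

    classifier = nltk.NaiveBayesClassifier.train(train)
    print(classifier.classify(question_features("Who wrote the novel?")))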
QUERY PROCESSING
Stemming: NLTK stemmer
Trigrams: poor classification results
Named Entity Recognition: NLTK NER, pre-trained model for this task; 6 NE types
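A minimal sketch of the NLTK stemming and pre-trained NER steps above; the example sentence is ours, and the usual NLTK data packages (punkt, a POS tagger model, maxent_ne_chunker, words) must be downloaded first.

    import nltk
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    question = "Where were the 1998 Winter Olympics held?"

    tokens = nltk.word_tokenize(question)
    print([stemmer.stem(t) for t in tokens])

    # NLTK's pre-trained named-entity chunker runs over POS-tagged tokens.
    tree = nltk.ne_chunk(nltk.pos_tag(tokens))
    for subtree in tree.subtrees():
        if subtree.label() != "S":
            print(subtree.label(), " ".join(word for word, tag in subtree.leaves()))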
QUERY PROCESSING
Our results:
- Binary features: BalancedWinnow testing accuracy = 0.804; MaxEnt testing accuracy = 0.78
- Real-valued features: BalancedWinnow testing accuracy = 0.784; MaxEnt testing accuracy = 0.758
- Named Entity Recognition testing accuracy = 0.802
QUERY EXPANSION
Two different methods:
- Target Concatenation: append each question's target to the end of the question.
- Deletion/Addition: delete wh-words and function words; add synonyms and hypernyms (via WordNet).
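A rough sketch of the two reformulation steps; the function-word list below is a toy placeholder, since the actual list is not given on the slides.

    WH_WORDS = {"who", "what", "when", "where", "why", "which", "how"}
    FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "is", "are", "was", "were", "do", "does", "did"}

    def concatenate_target(question, target):
        # Target Concatenation: append the question's target to the query.
        return question.rstrip(" ?") + " " + target

    def delete_wh_and_function_words(question):
        # Deletion step: drop wh-words and (an abbreviated, hypothetical set of) function words.
        return [t for t in question.lower().rstrip("?").split()
                if t not in WH_WORDS and t not in FUNCTION_WORDS]

    print(concatenate_target("When were the Winter Olympics held", "1998 Winter Olympics"))
    print(delete_wh_and_function_words("When were the Winter Olympics held?"))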
QUERY EXPANSION
Addition of:
- Synonyms
- Hypernyms (first ancestor)
- Morphological variants (WordNet as thesaurus: wordnet.morphy)
Poor results.
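A minimal sketch of the WordNet-based addition step (synonyms, first-ancestor hypernyms, and the wordnet.morphy base form), using NLTK's WordNet interface; the helper name expand_term is ours. Requires the NLTK 'wordnet' data package.

    from nltk.corpus import wordnet as wn

    def expand_term(word):
        # Collect synonyms, first-ancestor hypernyms, and the base
        # morphological form for a single query term.
        expansions = set()
        base = wn.morphy(word)                     # morphological variant (base form)
        if base:
            expansions.add(base)
        for synset in wn.synsets(word):
            expansions.update(l.replace("_", " ") for l in synset.lemma_names())
            for hyper in synset.hypernyms():       # first ancestor only
                expansions.update(l.replace("_", " ") for l in hyper.lemma_names())
        expansions.discard(word)
        return expansions

    print(expand_term("founded"))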
PASSAGE RETRIEVAL
- Used Indri/Lemur
- Ran both query reformulation/expansion approaches through the software
- Took the top 50 documents per query
PASSAGE RETRIEVAL
- Used Indri/Lemur
- Took the top passage from each of the top 50 documents for each query
- Query grammar: #combine[passageWIDTH:INC]
- System defaults: 120 terms, 1000-term window
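A sketch of how such a passage-retrieval query might be formatted with Indri's #combine[passageWIDTH:INC] operator; the helper is hypothetical, and the width/increment values simply echo the defaults mentioned above rather than the exact run configuration.

    def indri_passage_query(terms, width=1000, inc=120):
        # WIDTH is the passage window size in terms and INC the increment
        # between windows; 1000/120 here are assumptions based on the slide.
        return "#combine[passage{0}:{1}]( {2} )".format(width, inc, " ".join(terms))

    print(indri_passage_query(["winter", "olympics", "1998", "nagano"]))
    # -> #combine[passage1000:120]( winter olympics 1998 nagano )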
PASSAGE RETRIEVAL
Passage re-ranking:
- Modified the window size: 500, 1000 terms
- Modified the number of top passages taken from the top 50 documents: 1, 5, 10, 20, 25 passages
ANSWER EXTRACTION
Stemming: applied to queries
Stopwords: applied during indexing
- Removed all stopwords
- Removed all but the wh-words
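A sketch of the two stopword-filtering variants compared above (remove everything vs. keep the wh-words), using NLTK's English stopword list; exactly where this hooks into the indexing pipeline is not shown on the slides. Requires the NLTK 'stopwords' data package.

    from nltk.corpus import stopwords

    WH_WORDS = {"who", "what", "when", "where", "why", "which", "how", "whom", "whose"}
    ALL_STOPWORDS = set(stopwords.words("english"))

    def filter_tokens(tokens, keep_wh=True):
        # Drop stopwords; optionally keep the wh-words.
        drop = ALL_STOPWORDS - WH_WORDS if keep_wh else ALL_STOPWORDS
        return [t for t in tokens if t.lower() not in drop]

    print(filter_tokens("where were the 1998 winter olympics held".split()))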
ANSWER EXTRACTION
Term weighting:
- Applied to the queries
- Changed the weights of the target terms and query terms
- Implemented via the Indri query grammar
Snippet extraction:
- Used Indri's API
- Encountered problems with the fixed snippet size (due to hardcoding)
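A sketch of weighting query terms against target terms with Indri's #weight operator; the helper name is ours, and the 0.66/0.33 split mirrors one of the settings evaluated later.

    def weighted_indri_query(query_terms, target_terms, q_weight=0.66, t_weight=0.33):
        # Weight the question terms and the target terms separately.
        return "#weight( {q} #combine( {qt} ) {t} #combine( {tt} ) )".format(
            q=q_weight, qt=" ".join(query_terms),
            t=t_weight, tt=" ".join(target_terms))

    print(weighted_indri_query(["when", "founded"], ["1998", "winter", "olympics"]))
    # -> #weight( 0.66 #combine( when founded ) 0.33 #combine( 1998 winter olympics ) )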
EVALUATION
Document ranking (Note: all results are based on TREC-2004)
QE Approach              MAP
Target Concatenation     0.3223
Subtraction + WordNet    0.2381
EVALUATION (CONT.)
Stopwords in indexing    MAP
No stopwords removed     0.3223
Stopwords removed        0.3262
Keeping wh-words         0.3407
EVALUATION (CONT.)
Passage retrieval
QE Approach              Strict MRR          Lenient MRR
Target Concatenation     0.195439095783      0.392501775644
Subtraction + WordNet    0.180698539579      0.341009813194
EVALUATION (CONT.)
Passage re-ranking: window size
Window Size    Strict MRR          Lenient MRR
1000           0.195439095783      0.392501775644
500            0.209317276517      0.383193743722
100            0.340829170969      0.48166863823
EVALUATION (CONT.)
Stemming on query terms (using the Porter stemmer)
               Strict MRR          Lenient MRR
Non-stemmed    0.340829170969      0.48166863823
Stemmed        0.372366362694      0.500396143568
EVALUATION (CONT.)
Snippet extraction (using Indri/Lemur with different window sizes)
Window Size    Strict MRR          Lenient MRR
100            0.290138907213      0.407219304738
500            0.212304221474      0.352469418397
1000           0.201755977721      0.332300412861
EVALUATION (CONT.)
Term weighting on queries
Weighting                    Strict MRR          Lenient MRR
Balanced (no weights)        0.372366362694      0.500396143568
Query = .33, Target = .66    0.27309189993       0.415572848272
Query = .66, Target = .33    0.34215110982       0.466614280652
Query = .80, Target = .20    0.302979241834      0.420147005052
FINAL RESULTS
TREC 2004 (Training Data)
         Strict              Lenient
100      0.078516123253      0.132782385953
250      0.260734831756      0.36403939276
1000     0.385047304625      0.518316248372
FINAL RESULTS
TREC 2004 (Training Data)
         Strict              Lenient
100      0.0617858062617     0.150900599492
250      0.131509659052      0.237742464648
1000     0.294352184431      0.477678260382
CONCLUSIONS
Some things were helpful:
- Stemming
- Stopwords
- Window size / query grammar changes
While others weren't:
- Our attempt at query expansion
- Term weighting
We saw improvement over the previous deliverable, but nothing dramatic. There is still a lot left to be desired for future work (e.g., applying other answer extraction methods).
FUTURE WORK
- Work more with the text snippet feature from Indri: change the code to enable different snippet sizes
- Apply the work from query classification to our answer extraction or passage re-ranking
- Semantic Role Labeling
- Finding bad candidates
- Using redundancy-based QA (ARANEA)
- Structure-based extraction (FrameNet)
SOFTWARE PACKAGES USED
- Mallet
- Indri/Lemur
- NLTK
- Porter Stemmer
- Self-written code
- Stanford Parser, Berkeley Parser
READINGS
- Employing Two Question Answering Systems in TREC-2005, Sanda Harabagiu et al.
- Query expansion/reformulation: Kwok, Etzioni, and Weld (2001); Lin (2007); Fang (2008); Aktolga et al. (2011)
- Passage retrieval: Tiedemann et al. (2008)
- Indri/Lemur documentation