LING 573 Deliverable 3
Jonggun Park, Haotian He, Maria Antoniak, Ron Lockwood

Closed-Class Filters
14 lists: animals, colors, companies, continents, countries, sports teams, languages, occupations, periodic-table elements, races, US cities, US presidents, US states, and US universities.
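The closed-class filtering described above can be sketched as a simple membership test; the list contents and class names below are tiny illustrative samples, not the actual lists used by the system:

```python
# Minimal sketch of a closed-class filter: a candidate answer passes
# only if it appears in the list for the question's expected class.
# These sets are small illustrative samples, not the real lists.
CLOSED_CLASSES = {
    "color": {"red", "green", "blue", "yellow"},
    "continent": {"africa", "antarctica", "asia", "australia",
                  "europe", "north america", "south america"},
}

def passes_closed_class_filter(candidate, question_class):
    """Keep a candidate only if it belongs to the closed class."""
    allowed = CLOSED_CLASSES.get(question_class)
    if allowed is None:  # no list for this class: do not filter
        return True
    return candidate.lower() in allowed
```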

Query Expansion!

Query Expansion
"Who is the president of the United States?" → president united states nations council
"How long did it take to build the Tower of Pisa?" → long build tower pisa women's station
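The keyword-stripping step behind these examples (dropping wh-words, auxiliaries, and other stopwords before expansion) might look like the following sketch. The stopword set is an illustrative guess chosen to reproduce the examples above; the added expansion terms ("nations council", "women's station") come from a separate expansion step not shown here:

```python
import re

# Illustrative stopword/wh-word list (not the system's actual list).
STOPWORDS = {"who", "what", "when", "where", "how", "is", "are",
             "the", "a", "an", "of", "did", "does", "do", "it",
             "take", "to"}

def extract_query_terms(question):
    """Lowercase, strip punctuation, and drop stopwords/wh-words."""
    tokens = re.findall(r"[a-z]+", question.lower())
    return [t for t in tokens if t not in STOPWORDS]
```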

Question Classification
Software package: Mallet
Classification algorithms: MaxEnt, NaiveBayes, Winnow, DecisionTree
Training data: TREC-2004.xml; Training set 5 (5,500 labeled questions) (Li & Roth)
Test data: TREC-2005.xml; Testing set (Li & Roth)

Feature selection: Unigram, Bigram, Trigram, Question word, NER tags
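The listed feature templates can be sketched as follows; NER-tag features would require an external tagger and are omitted, and the feature-name prefixes are illustrative rather than Mallet's actual format:

```python
import re

WH_WORDS = {"who", "what", "when", "where", "why", "how", "which"}

def extract_features(question):
    """Unigram, bigram, trigram, and question-word features for one question."""
    tokens = re.findall(r"[a-z]+", question.lower())
    feats = [f"uni={t}" for t in tokens]
    feats += [f"bi={a}_{b}" for a, b in zip(tokens, tokens[1:])]
    feats += [f"tri={a}_{b}_{c}"
              for a, b, c in zip(tokens, tokens[1:], tokens[2:])]
    # Question-word feature: the leading wh-word, if any.
    wh = tokens[0] if tokens and tokens[0] in WH_WORDS else "none"
    feats.append(f"wh={wh}")
    return feats
```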

Conclusion: Maximum accuracy
With TREC-2005 as the test file: MaxEnt, Unigram + Bigram + Wh-words
With TREC-10 as the test file: MaxEnt, Unigram + Bigram + Wh-words

Other findings:
1. Trigram features do not help and drag accuracy down.
2. NER features do not help and cause a slight drop in accuracy.

Web Boosting
Resources: jsoup, Bing.com
Query: original question + target string
Results: top 50 web snippets, stored in a text file

Web Boosting: Challenges and Successes
Which search engine or answer website to use?
How to avoid throttling?
How to integrate results into our system?
How to edit results to make them more useful for our answer-ranking system?

Main Changes
Use the web query as input to the redundancy-based answer extraction engine; this replaces our paragraph-based index.
Answer type classification now feeds into answer extraction.
Candidate answers are filtered by answer type in combination with NER on the answers.
The following types are handled: NUM, LOC, HUM, ENTY.
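The answer-type filtering step can be sketched as a mapping from the classifier's coarse label to compatible NER tags; the tag names below are illustrative and not necessarily those produced by the system's tagger:

```python
# Illustrative mapping from coarse answer types to compatible NER tags.
TYPE_TO_NER = {
    "HUM": {"PERSON"},
    "LOC": {"LOCATION", "GPE"},
    "NUM": {"NUMBER", "DATE"},
    "ENTY": {"ORGANIZATION", "MISC"},
}

def filter_by_answer_type(candidates, answer_type):
    """candidates: list of (text, ner_tag) pairs.
    Keep only candidates whose NER tag matches the expected answer type."""
    allowed = TYPE_TO_NER.get(answer_type)
    if not allowed:  # unhandled type: pass everything through
        return [text for text, _ in candidates]
    return [text for text, tag in candidates if tag in allowed]
```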

Main Changes (continued)
Filter closed-class questions using lists (e.g., pro sports teams, colors).
Filter out terms occurring in fewer than 2 snippets.
Return a 250-character answer instead of 1-4 words.

Answer Extraction Details
Input to the extraction engine:
- Query word list
- Stop-word list
- Focus-word list (e.g., meters, liters, miles)
- Passage list: the paragraph results of the query
Steps:
1. N-gram generation and occurrence counting
2. Filtering out stop words and query words
3. Filtering by answer type
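Steps 1 and 2 above (n-gram generation, occurrence counting, and stop/query-word filtering) can be sketched as follows; answer-type filtering (step 3) is not shown:

```python
from collections import Counter

def candidate_ngrams(snippets, query_words, stop_words, max_n=3):
    """Generate n-grams (n <= max_n) from the snippets, count occurrences,
    and drop any n-gram containing a stop word or a query word."""
    counts = Counter()
    for snippet in snippets:
        tokens = snippet.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if any(t in stop_words or t in query_words for t in gram):
                    continue
                counts[" ".join(gram)] += 1
    return counts
```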

Answer Extraction Details (continued)
4. Combine unigram counts with n-gram counts
5. Weight candidates with idf scores
6. Re-rank candidates:
   - Eliminate those without evidence in at least 2 snippets
   - Eliminate those that don't match a closed-class list (for certain questions)
7. Verify candidates in documents:
   - Use a bag-of-words query from the candidate sub-snippet + query words against the Lucene index
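Steps 5 and 6 can be sketched roughly as follows, assuming a precomputed idf table and per-candidate snippet-support counts (both illustrative here; the real system derives idf from its document index):

```python
def rank_candidates(counts, snippet_support, idf, min_support=2):
    """Score each candidate by count * mean idf of its words, dropping
    candidates supported by fewer than min_support snippets."""
    scored = []
    for cand, count in counts.items():
        if snippet_support.get(cand, 0) < min_support:
            continue  # evidence in fewer than 2 snippets: eliminate
        words = cand.split()
        weight = sum(idf.get(w, 1.0) for w in words) / len(words)
        scored.append((count * weight, cand))
    return [c for _, c in sorted(scored, reverse=True)]
```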

Results
D2: strict = 0.01, lenient =
D3: strict = , lenient = 0.371