Group 3 Chad Mills Esad Suskic Wee Teck Tan 1. Outline  Pre-D4 Recap  General Improvements  Short-Passage Improvements  Results  Conclusion 2.

Slides:



Advertisements
Similar presentations
Microsoft® Office Word 2007 Training
Advertisements

Location Recognition Given: A query image A database of images with known locations Two types of approaches: Direct matching: directly match image features.
1 Opinion Summarization Using Entity Features and Probabilistic Sentence Coherence Optimization (UIUC at TAC 2008 Opinion Summarization Pilot) Nov 19,
Elliot Holt Kelly Peterson. D4 – Smells Like D3 Primary Goal – improve D3 MAP with lessons learned After many experiments: TREC 2004 MAP = >
Group 3 Chad Mills Esad Suskic Wee Teck Tan. Outline  System and Data  Document Retrieval  Passage Retrieval  Results  Conclusion.
A review on “Answering Relationship Queries on the Web” Bhushan Pendharkar ASU ID
Final Project of Information Retrieval and Extraction by d 吳蕙如.
AP & SAT MULTIPLE CHOICE STRATEGIES Practice Using the Excerpt.
Passage Retrieval & Re-ranking Ling573 NLP Systems and Applications May 5, 2011.
Add 2 Numbers using a Function Input Number 1 Input Number 2 Call a function that adds these numbers Write the answer out to screen.
MANISHA VERMA, VASUDEVA VARMA PATENT SEARCH USING IPC CLASSIFICATION VECTORS.
Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang National Central University
LING 573: Deliverable 3 Group 7 Ryan Cross Justin Kauhl Megan Schneider.
Recommender systems Ram Akella November 26 th 2008.
Web Search – Summer Term 2006 II. Information Retrieval (Basics) (c) Wolfgang Hürst, Albert-Ludwigs-University.
Quark QuarkXPress 4 Intermediate Level Course. Working with Master Pages The Document Layout Palette allows you to add, delete, and move document and.
Chapter 2: Algorithm Discovery and Design
Overview of Search Engines
How to construct a handbag
1/16 Final project: Web Page Classification By: Xiaodong Wang Yanhua Wang Haitang Wang University of Cincinnati.
UNIX Filters.
1 Functional Testing Motivation Example Basic Methods Timing: 30 minutes.
Combining Keyword Search and Forms for Ad Hoc Querying of Databases Eric Chu, Akanksha Baid, Xiaoyong Chai, AnHai Doan, Jeffrey Naughton University of.
Longbiao Kang, Baotian Hu, Xiangping Wu, Qingcai Chen, and Yan He Intelligent Computing Research Center, School of Computer Science and Technology, Harbin.
ACT: The Reading Test.
Advanced Lesson 1: Advanced Data Organization In Excel 2007, you can use a table to manage and organize related data. You can use the Autofilter tools.
DETECTING NEAR-DUPLICATES FOR WEB CRAWLING Authors: Gurmeet Singh Manku, Arvind Jain, and Anish Das Sarma Presentation By: Fernando Arreola.
Beyond Usability: Measuring Speech Application Success Silke Witt-Ehsani, PhD VP, VUI Design Center TuVox.
Forcing a change into the Global Change Queue A strategy for handling heading changes when there is no matching authority record.
Parsing IV Bottom-up Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
In this presentation we will elaborate more on the importance of Choropleth Maps, Group Layers, Scales, Attribute Classification, Definition Queries, Hyperlinks,
Place value and ordering
Short Division.
1 Leaning Tower Output: Stack them leaning as far over as possible. Input: Lots of 2in wide blocks More of the output? With out falling How far over can.
COMPUTER-ASSISTED PLAGIARISM DETECTION PRESENTER: CSCI 6530 STUDENT.
AnswerBus Question Answering System Zhiping Zheng School of Information, University of Michigan HLT 2002.
Incident Threading for News Passages (CIKM 09) Speaker: Yi-lin,Hsu Advisor: Dr. Koh, Jia-ling. Date:2010/06/14.
Organizing Information: Things to Consider Amber Roberts Sr. Legal Analyst Gavilon June 13, 2015.
Planning a search strategy.  A search strategy may be broadly defined as a conscious approach to decision making to solve a problem or achieve an objective.
Changing the Way We Write Jennifer Weninger Mountain Creek STEM Academy.
Pete Bohman Adam Kunk.  ChronoSearch: A System for Extracting a Chronological Timeline ChronoChrono.
Participate in a Team to Achieve Organizational Goal
The Internet 8th Edition Tutorial 4 Searching the Web.
Review 1.Lexical Analysis 2.Syntax Analysis 3.Semantic Analysis 4.Code Generation 5.Code Optimization.
© EZ-R Stats, LLC Duplicate Payments Slide 1 Auditing for Duplicate Payments A better way … Web CAAT.
NTCIR /21 ASQA: Academia Sinica Question Answering System for CLQA (IASL) Cheng-Wei Lee, Cheng-Wei Shih, Min-Yuh Day, Tzong-Han Tsai, Tian-Jian Jiang,
Intelligent Web Topics Search Using Early Detection and Data Analysis by Yixin Yang Presented by Yixin Yang (Advisor Dr. C.C. Lee) Presented by Yixin Yang.
11/25/2015Slide 1 Scripts are short programs that repeat sequences of SPSS commands. SPSS includes a computer language called Sax Basic for the creation.
LOGO 1 Corroborate and Learn Facts from the Web Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Shubin Zhao, Jonathan Betz (KDD '07 )
LING 573 Deliverable 3 Jonggun Park Haotian He Maria Antoniak Ron Lockwood.
Using a Named Entity Tagger to Generalise Surface Matching Text Patterns for Question Answering Mark A. Greenwood and Robert Gaizauskas Natural Language.
Number Sense Disambiguation Stuart Moore Supervised by: Anna Korhonen (Computer Lab)‏ Sabine Buchholz (Toshiba CRL)‏
Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.
Search Technologies. Examples Fast Google Enterprise – Google Search Solutions for business – Page Rank Lucene – Apache Lucene is a high-performance,
Divided Pretreatment to Targets and Intentions for Query Recommendation Reporter: Yangyang Kang /23.
User-Friendly Systems Instead of User-Friendly Front-Ends Present user interfaces are not accepted because the underlying systems are too difficult to.
Week 2 English Welcome. Work on Vocabulary AND Dictionary Skills Monday.
INFO 638Lecture #91 Software Project Management Conclude Adaptive Project Framework INFO 638 Glenn Booker.
Question Answering Passage Retrieval Using Dependency Relations (SIGIR 2005) (National University of Singapore) Hang Cui, Renxu Sun, Keya Li, Min-Yen Kan,
Neural Networks Lecture 4 out of 4. Practical Considerations Input Architecture Output.
The Data Large Number of Workbooks Each Workbook has multiple worksheets Transaction worksheets have large (LARGE) number of lines (millions of records.
F. López-Ostenero, V. Peinado, V. Sama & F. Verdejo
Parsing IV Bottom-up Parsing
A Closer Look at Instruction Set Architectures: Expanding Opcodes
Topic 3: Data Binary Arithmetic.
ABDULLAH ALOTAYQ, DONG WANG, ED PHAM PROJECT BY:
Parsing and More Parsing
Programming We have seen various examples of programming languages
Dennis Zhao,1 Dragomir Radev PhD1 LILY Lab
Software Development Techniques
Presentation transcript:

Group 3 Chad Mills Esad Suskic Wee Teck Tan 1

Outline  Pre-D4 Recap  General Improvements  Short-Passage Improvements  Results  Conclusion 2

 Question Classification: not used  Document Retrieval: Indri  Passage Retrieval: Indri  Passage Retrieval Features: Remove non-alphanumeric characters Replace pronoun or append target Remove stop words Stemming Pre-D4 Recap 3

 Best MRR (2004):  Baseline: Same methodology New passage sizes Entering D4 Passage SizeMRR

 Trimming Improvements: Remove, tags Chop off beginnings like: ○ ___ (Xinhua) – ○ ___ (AP) – Results: General Improvements Best so far: Passage SizeMRR

 Aranea Query Aranea Question-neutral filtering: ○ Edge stopword ○ Question terms First Aranea answer matching a passage: ○ Move first matching passage to top ○ “Match:” ≥ 60%, by token General Improvements Best so far:

 Results: General Improvements Best so far: Passage SizeQuestion Only

 Results: General Improvements Best so far: Passage SizeQuestion OnlyQuestion + Target

 Results: General Improvements Best so far: Passage SizeQuestion OnlyQuestion + TargetIndri Input  Improvement: 11-25% (relative) 9

 Aranea Re-query: Ignore recent improvements ○ Add Aranea answers to query ○ Integrate if useful For 100-char passages: General Improvements Best so far: ConditionsMRR Before Aranea0.186 Top 5 terms0.169 Top 7 terms0.143 Top 10 terms0.125  Lots of Problems Many Qs: no results Add top 5+ Didn’t combine with non-Aranea output 10

 1000-character MRR “good enough”  100-character MRR needs help  New Focus: short passages only Focus Shift Best so far:

 What’s going wrong? Short-Passage Improvements Best so far: Short Passage: no answer at all 12

 What’s going wrong? Short-Passage Improvements Best so far: Short Passage Long Passages: answer in 2 nd passage 13

 What’s going wrong? 16 word passages: too short for Indri  Approach for Short Passages 82% of questions: answers in long passages Shorten long passages Don’t rely directly on Indri as much Needed: a way to shorten them Short-Passage Improvements Best so far:

Short-Passage Improvements Best so far:  36% of questions: date or location  56%: date, location, name, number 15

Short-Passage Improvements Best so far:  Putting these together: Answers do exist in the longer passages A few categories: large % of answer types  Solution Approach: Named Entity Recognition OpenNLP (C# port of java library) Handles date, time, location, people, percentage, … 16

Short-Passage Improvements Best so far:  “When” questions: Go through long passages w/NER for dates Require a year (filter out “last week” types) Center passage at NE, add surrounding tokens up to 100 characters Put these on top of short passage list MRR: (21% improvement) 17

Short-Passage Improvements Best so far: Before 18

Short-Passage Improvements Best so far: Before After 19

Short-Passage Improvements Best so far:  “When” questions (cont’d): Find dates in top 5 Aranea outputs ○ “July 3, 1995” ← not recognized as date ○ “blah blah blah July 3, 1995 blah blah blah” is Take Aranea+NER dates, then NER dates Passage matches date if year matches MRR:

Short-Passage Improvements Best so far:  “Where” questions: Basically the same as “When” Use “location” instead of “date” NER Location matches passage if: ○ >50% of NE chars are in exact token matches MRR: Ick! 21

Short-Passage Improvements Best so far:  Trying to fix “Where” logic: “…blah location blah…” trick doesn’t work well Lots of news stories starting with locations: ○ Examples: REFUGEE. OXFORD, England _ ARGENTAN, France (AP) _ William J. Broad. RELIGION-COLUMN (Undated) _ The weekly religion column. By Gustav Niebuhr words. &UR; COMMENTARY (k) &LR; NYHAN-COLUMN (Chappaqua, N.Y.) -- The wife of the outgo ○ Filter these if: _ or – has a “)” or 5+ caps in 15 chars to left Remove duplicate passages ○ “Jacksonville” and “Florida” both match “Jacksonville, Florida” If short passages have locations, put those first MRR:

Short-Passage Improvements Best so far:  Trying to fix “Where” logic: Don’t put all short passage locations over long passage locations ○ Only if in the top 5 short passages MRR:

Short-Passage Improvements Best so far:  Wikipedia Bing query for question targets only ○ site://wikipedia.org restriction Parse factbox as key/value pairs Match question terms, factbox keys ○ Levenshtein Distance poor man’s stemmer ○ NER for dates only (“When” Qs) MRR:

Short-Passage Improvements Best so far:  Revisiting Aranea output Before: Now that 100-character passages are doing better, try Question+Target again MRR: Passage SizeQuestion OnlyQuestion + TargetIndri Input

Final Results 2004 Baseline vs. Final Passage Size InitialFinalImprovement % % % 26

Future Work  Get re-querying w/Aranea to work  Improve location parsing  Add person, organization NER  Expand Wikipedia beyond dates 27

Conclusions  Indri: good on long, not on short  Aranea was very useful  NER on dates was similarly effective  Location NER was difficult but workable  Overall NER was the best Even with many more places to use NER left  Looking at the data is essential  Plenty to do – prioritization is important 28

Questions? 29