
JAVELIN Project Briefing, AQUAINT Program
AQUAINT 6-month Meeting, 10/08/04

Slide 1: JAVELIN II: Scenarios and Variable Precision Reasoning for Advanced QA from Multilingual, Distributed Sources

Eric Nyberg, Teruko Mitamura, Jamie Callan, Jaime Carbonell, Bob Frederking
Language Technologies Institute, Carnegie Mellon University

Slide 2: JAVELIN II Research Areas

1. Scenario Dialog. User: "I'm focusing on the new Iraqi minister Al Tikriti. What can you tell me about his family and associates?"
2. Scenario Representation (an entity/relation graph).
3. Distributed, Multilingual Retrieval (producing relevant documents).
4. Multi-Strategy Information Gathering: NL parsing, statistical extraction, pattern matching, feeding an emerging fact base.
5. Variable-Precision Knowledge Representation and Reasoning: scenario reasoning, search guidance, belief revision, answer justification.
6. Answer Visualization and Scenario Refinement. System: <displays instantiated scenario>. User: "Can you find more about his brother-in-law's business associates?"

Slide 3: Recent Highlights

Multi-Strategy Information Gathering
- Participation in the Relationship Pilot
- Training extractors with MinorThird

Variable-Precision KR and Reasoning
- Text Processor module (1st version complete)
- Fact Base (1st prototype complete)

Distributed, Multilingual QA
- Keyword translation for CLQA (English to Chinese)

Slide 4: Relationship Pilot

50 sample scenarios, e.g.: "The analyst is interested in knowing if a particular country is a member of an international organization. Is Chechnya a member of the United Nations?"

The Phase I JAVELIN system was used with manual tweaking: the output of the Question Analyzer module was manually corrected to
- decompose the scenario into subquestions (17 of 50 scenarios), and
- gather key terms from the background text.

Slide 5: NIST Evaluation Methodology

- Two categories of information "nuggets": vital (must be present) and okay (relevant but not necessary).
- Each answer item could match more than one nugget.
- Recall is determined by vital nuggets; precision is based on answer length.
- F-scores are computed with recall weighted three times as heavily as precision (F with beta = 3).
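The recall-weighted score the slide describes is the standard F-beta measure with beta = 3; a minimal sketch (the recall and precision inputs would come from the nugget and length computations above):

```python
def f_beta(recall, precision, beta=3.0):
    """F-beta score; beta=3 weights recall three times as heavily as precision,
    matching the NIST setup described on this slide."""
    if recall == 0.0 and precision == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With beta = 3, a system with high recall and mediocre precision outscores one with the reverse profile, which is why recall-oriented strategies dominate this evaluation.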

Slide 6: JAVELIN Performance Statistics

- Average F-score computed by NIST
- Average F-score with recall based on both vital and okay nuggets: 0.322
- Total scenarios with F = 1: 0
- Total scenarios with all vital information correct: 9
  - 1/1 vital: scenarios 18, 19, 36, 38
  - 2/2 vital: scenarios 4, 16, 34, 37
  - 3/3 vital: scenario 33
- Total scenarios with F = 0: 19
- Total scenarios without any correct answers (vital or okay): 10
  - no answer found: scenarios 3, 5
  - bad answers: scenarios 6, 8, 10, 11, 13, 27, 29, 30

Slide 7: JAVELIN Performance Statistics (continued)

- Average recall (vital)
- Average precision: 0.261
- Matches per answer item:
  - no nuggets matched
  - 1 nugget matched: 57
  - 2 nuggets matched: 10
  - 3 nuggets matched: 6
  - 4 nuggets matched: 1
- Not done (but potentially useful): determine which decomposed questions we provided relevant information for.

Slide 8: General Observations

- Nugget quality and assessment vary considerably (e.g., questions #3 and #8). Nuggets overlap, repeat given information, and sometimes represent cues rather than answers; other relevant information does not count if it was not in the assessors' original set.
- Retrieval performance is difficult to assess: no document IDs are provided in the nugget file.
- Precision scores are difficult to reproduce: the relevant text spans appear to have been determined manually and are not noted in the annotated file.

Slide 9: MinorThird

- A standardized testbed for building and evaluating machine learning algorithms that work on text.
- Includes a pattern language (Mixup) for building taggers (compiles to FSTs).
- Can we use MinorThird as a factory to build new information extractors for the QA task?
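Mixup's concrete syntax is not shown in this briefing; as an illustrative stand-in (not actual Mixup code), a hand-written pattern of the kind such a language would express declaratively and compile into a finite-state matcher might look like:

```python
import re

# Hypothetical pattern: an ORGANIZATION is a run of capitalized words ending in
# a corporate or institutional suffix. A real Mixup program would state this
# declaratively and compile it to an FST rather than using a regex.
ORG_PATTERN = re.compile(r"(?:[A-Z][a-z]+ )+(?:Inc|Corp|University|Institute)\b")

def tag_organizations(text):
    """Return (start, end, span) triples for each pattern match in the text."""
    return [(m.start(), m.end(), m.group()) for m in ORG_PATTERN.finditer(text)]
```

The tagger produces standoff spans over the input, which is the same output shape a learned MinorThird tagger would emit.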

Slide 10: Initial Training Experiments

Can MinorThird train new taggers for specific tags and corpora, based on bootstrap information from existing tagger(s)?

Setup:
- Use Identifinder to annotate 101 messages (focus: ORGANIZATION).
- Manually fix incorrect tags.
- Training set: 81 messages; test set: 20 messages.

Experiments:
- Vary the training set size: 40, 61, 81 messages.
- Vary the history-size and window-size parameters of the MinorThird Learner class.

Slide 11: Varying Size of Training Set

Slide 12: The Text Processor (TP)

- A server capable of processing text annotation requests (batch or run-time).
- Receives a text stream as input and assigns multiple levels of tags or features.
- The application can specify which processors to run on a text, and in what order.
- Provides a single API for a variety of processors: Brill Tagger, BBN Identifinder, MXTerminator, Link Parser, RASP, WordNet, CLAWS, FrameNet.
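The "single API, caller-chosen order" design can be sketched as a registry of named annotators. The class and method names below are assumptions for illustration, not the actual TP interface:

```python
class TextProcessor:
    """Toy sketch of a TP-style core: named annotators, run in caller order."""

    def __init__(self):
        self.processors = {}

    def register(self, name, fn):
        """Expose one underlying tool (tagger, parser, ...) under a name."""
        self.processors[name] = fn

    def annotate(self, text, order):
        """Run the requested processors, in the requested order."""
        annotations = {}
        for name in order:
            annotations[name] = self.processors[name](text)
        return annotations

# Two trivial stand-ins for real processors such as MXTerminator or Brill:
tp = TextProcessor()
tp.register("sentences", lambda t: t.split(". "))
tp.register("tokens", lambda t: t.split())
result = tp.annotate("JAVELIN answers questions. It uses many tools.",
                     ["sentences", "tokens"])
```

A real TP would stream annotations back to the caller and layer them as features on text spans, as described on the Fact Base slides.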

Slide 13: TP Object Model

Slide 14: Fact Base

A relational data model containing:
- documents and metadata;
- standoff annotations for:
  - linguistic analysis (segmentation, POS, parsing, predicate extraction),
  - semantic interpretation (frame filling, yielding facts, events, etc.),
  - reasoning (reference resolution, inference).
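One way such a fact base could be realized over an RDBMS, shown here as a sketch with assumed table and column names (not the actual JAVELIN schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, source TEXT, body TEXT);
-- Standoff annotations: features attached to character spans, never inline,
-- so every derived item stays directly linked to its input source.
CREATE TABLE annotations (
    ann_id  INTEGER PRIMARY KEY,
    doc_id  INTEGER REFERENCES documents(doc_id),
    start   INTEGER,
    end     INTEGER,   -- character offsets into documents.body
    layer   TEXT,      -- e.g. 'pos', 'parse', 'frame', 'coref'
    label   TEXT
);
""")
conn.execute("INSERT INTO documents VALUES (1, 'AQUAINT', "
             "'Al Tikriti met his associates.')")
conn.execute("INSERT INTO annotations VALUES (1, 1, 0, 10, 'ner', 'PERSON')")

# Recover the annotated surface span by joining back to the source text:
span = conn.execute(
    "SELECT substr(d.body, a.start + 1, a.end - a.start) "
    "FROM annotations a JOIN documents d ON d.doc_id = a.doc_id "
    "WHERE a.ann_id = 1").fetchone()[0]
```

Because every annotation row carries its document and offsets, relational queries can chain associations across layers, the property the BANKS citation on the next slide points to.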

Slide 15: Fact Base [2]

Pipeline (through the Text Processor API: segmenter, taggers, parsers, framers):
1. Relevant documents or passages are processed by the TP modules.
2. Results are stored as features on text spans.
3. Extracted frames are stored as possible facts, events, etc.

All derived information is directly linked to its input source(s) at each level. Persistent storage in an RDBMS supports:
- training/learning on any combination of features;
- reuse of results across sessions, analysts, etc., when appropriate;
- relational querying for association chains (cf. G. Bhalotia et al., "Keyword Searching and Browsing in Databases Using BANKS", ICDE 2002, San Jose, CA).

Slide 16: CLQA: The Keyword Translation Problem

Given keywords extracted from the question, how do we correctly translate them into the languages of the information sources? A keyword translator maps keywords in language A into keywords in languages B, C, and so on.

Slide 17: Tools for Query/Keyword Translation

Machine-Readable Dictionaries (MRD)
- Pros: easily obtained for high-density languages; domain-specific dictionaries provide good coverage in-domain.
- Cons: publicly available general dictionaries usually have low coverage; cannot translate sentences.

MT Systems
- Pros: usually provide more coverage than publicly available MRDs; translate whole sentences.
- Cons: translation quality varies; low language-pair coverage compared to MRDs.

Parallel Corpora
- Pros: good for domain-specific translation.
- Cons: poor for open-domain translation.


Slide 19: Research Questions

- Can we improve keyword translation correctness by building a selection model that chooses one translation from the candidates produced by multiple MT systems?
- Can we improve keyword translation correctness by using the question sentence?

Slide 20: The Translation Selection Problem

Given a set of translation candidates and the question sentence, how do we select the translation that is most likely correct? Each MT system translates both the source keyword and the source question; the selection model scores each (target keyword, target question) pair, yielding one score per candidate.

Slide 21: Keyword Selection Model

A set of scoring metrics:
- Each translation candidate is assigned an initial base score of 0.
- Each scoring metric adds to or subtracts from the candidate's running total.
- After all candidates go through the model, the candidate with the highest score is selected as the most likely correct translation.
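The additive scheme above can be sketched as follows; the two toy metrics stand in for the paper's real word-matching and penalty metrics and are not its actual definitions:

```python
def select_translation(candidates, metrics):
    """Each candidate starts at a base score of 0; every metric adds to or
    subtracts from its running total; the highest-scoring candidate wins."""
    best, best_score = None, float("-inf")
    for cand in candidates:
        score = 0.0
        for metric in metrics:
            score += metric(cand)   # a metric may return a negative penalty
        if score > best_score:
            best, best_score = cand, score
    return best, best_score

# Placeholder metrics standing in for sentence-matching and the penalty rule:
metrics = [
    lambda c: 1.0 if c["in_target_question"] else 0.0,   # matched in question
    lambda c: -1.0 if c["untranslated"] else 0.0,        # untranslated penalty
]
cands = [
    {"text": "候选一", "in_target_question": True, "untranslated": False},
    {"text": "candidate2", "in_target_question": False, "untranslated": True},
]
best, score = select_translation(cands, metrics)
```

The design is deliberately simple: metrics are independent, so new evidence sources can be added without retraining anything, at the cost of hand-set increments.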

Slide 22: The Experiment

- Language pair: English to Chinese.
- Uses three free web-based MT systems.
- Training data: 50 input questions (125 keywords) from TREC-8, TREC-9, and TREC-10.
- Testing data: 50 input questions (147 keywords) from TREC-8, TREC-9, and TREC-10.
- Evaluation: translation correctness.

Slide 23: Scoring Metrics

In this experiment we constructed different selection models, each using a combination of the following five scoring metrics:
I. Baseline
II. Segmented Word-Matching and Partial Word-Matching
III. Full Sentence Word-Matching without Fall Back to Partial Word-Matching
IV. Full Sentence Word-Matching with Fall Back to Partial Word-Matching
V. Penalty for Partially Translated or Un-Translated Keywords
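As one reading of what the word-matching metrics might compute (an assumption for illustration, not the paper's exact definition), partial word-matching for Chinese can be approximated by character overlap with the target-language text:

```python
def partial_match_score(candidate, reference_text):
    """Fraction of the candidate's characters found in the reference text.
    A full match scores 1.0, partial overlap a value in (0, 1), no overlap 0.0.
    This is an illustrative approximation, not the published metric."""
    if not candidate:
        return 0.0
    hits = sum(1 for ch in candidate if ch in reference_text)
    return hits / len(candidate)
```

Character-level matching is attractive here because it needs no word segmentation, which matters given that the segmentation-dependent models performed worst (slide 27).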

Slide 24: Scoring Metrics Summary

Abbreviations:
B  = Baseline
S  = Segmented Word-Matching and Partial Word-Matching
F¹ = Full Sentence Word-Matching without Fall Back to Partial Word-Matching
F² = Full Sentence Word-Matching with Fall Back to Partial Word-Matching
P  = Penalty for Partially Translated or Un-Translated Keywords

Scoring legend (shown as color blocks on the original slide): full match, partial match, supported by more than one MT system, not fully translated.

Slide 25: Results

Keyword translation accuracy of the different models on the test set:

Model     S¹S²      S¹S²S³
B         78.23%    78.91%
B+S       61.90%    64.63%
B+F¹      80.27%    80.95%
B+F²      75.51%    78.91%
B+P       78.23%    78.91%
B+F¹+P    82.99%    85.71%
B+F²+P    78.23%    83.67%

Improvement of each model over the base model:

Model     S¹S²      S¹S²S³
B         0%        0%
B+S       -20.87%   -18.10%
B+F¹      2.61%     2.59%
B+F²      -3.48%    0.00%
B+P       0.00%     0.00%
B+F¹+P    6.08%     8.62%
B+F²+P    0.00%     6.95%

[Lin, F. and T. Mitamura, "Keyword Translation from English to Chinese for Multilingual QA", Proceedings of AMTA 2004, Georgetown.]

Slide 26: Results (continued)

(Repeats the accuracy table from slide 25.)

- Best single-MT-system performance: 78.23%
- Best multiple-MT model performance (B+F¹+P): 85.71%
- Best possible result if the correct keyword is selected every time it is produced: 92.52%

Slide 27: Observations

- Models that include scoring metrics requiring segmentation did poorly.
- Using more MT systems improves translation correctness.
- Using the translated question improves keyword translation accuracy.
- There is still room for improvement (85.71% achieved vs. 92.52% possible).

Slide 28: More to Do

- Use statistical/machine-learning techniques:
  - treat the result of each scoring metric as a feature in a classification problem (SVM, MaxEnt);
  - train weights for each scoring metric (EM).
- Use additional or improved scoring metrics:
  - validate translations using search engines;
  - use better segmentation tools.
- Compare with other evaluation methods:
  - retrieval performance;
  - end-to-end (QA) system performance.
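Recasting the metric outputs as weighted features, as the first bullet proposes, could look like the sketch below; the weight values are invented for illustration, not trained parameters:

```python
def score_with_weights(metric_scores, weights):
    """Weighted combination of per-metric outputs. Learned weights would
    replace the hand-set +1/-1 increments of the additive model, and could be
    fit by a classifier (SVM, MaxEnt) or by EM as the slide suggests."""
    return sum(w * s for w, s in zip(weights, metric_scores))

# Hypothetical weights for three metric features [full-match, partial, penalty]:
weights = [0.6, 0.3, -0.8]
candidate_features = [1.0, 0.5, 0.0]
candidate_score = score_with_weights(candidate_features, weights)
```

Candidate selection then stays the same as before: score every candidate and keep the maximum, only now the weighting is learned rather than fixed.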

Slide 29: Questions?