1
JAVELIN Project Briefing, AQUAINT Workshop, October 2005
Eric Nyberg, Teruko Mitamura, Jamie Callan, Robert Frederking, Jaime Carbonell, Matthew Bilotti, Jeongwoo Ko, Frank Lin, Lucian Lita, Vasco Pedro, Andrew Schlaikjer, Hideki Shima, Luo Si, David Svoboda
Language Technologies Institute, Carnegie Mellon University
2
Status Update
Project Start: September 30, 2004 (now in Month 13)
Last Six Months:
– Initial CLQA system evaluated in NTCIR (English-Japanese, English-Chinese)
– Multilingual Distributed IR evaluated in the CLEF competition
– Initial Phase II English system in the TREC relationship track
3
Multilingual QA
4
Javelin Multilingual QA
End-to-end systems for English-to-Chinese and English-to-Japanese
Participated in the NTCIR-5 CLQA-1 (E-C, E-J) evaluation
– http://www.slt.atr.jp/CLQA/
The NTCIR-5 workshop will be held in Tokyo, Japan, December 6-9, 2005
5
NTCIR CLQA1 Task Overview
EC, CC, CE, EJ, JE subtasks
– Answers are named entities (e.g. person name, organization name, location, artifact, date, money, time, etc.)
– We were the only team that participated in both the EC and EJ subtasks
Question/answer data set
– EC: 200 questions for training and the formal run
– EJ: 300 questions for training and 200 for the formal run
Corpus
– EC: United Daily News 2000-2001 (466,564 articles)
– EJ: Yomiuri Newspaper 2000-2001 (658,719 articles)
6
CLQA1 Evaluation Criteria
Only the top answer candidate is judged, along with its supporting document
Correct answers that were not properly supported by the returned document were judged to be unsupported
An answer is judged incorrect even if a substring of it is correct
Issue: we found that the gold-standard (supported) document set is not complete
7
MLQA Architecture
[Architecture diagram: the QA, RS, IX, and AG pipeline modules plus a Keyword Translator and EM; English, Chinese, and Japanese corpora with their indexes; original modules/resources are distinguished from new multilingual (ML) modules/resources]
8
Example question: "How much did the Japan Bank for International Cooperation decide to loan to the Taiwan High-Speed Corporation?"
9
Question Analyzer output: Answer Type = MONEY; Keywords = "Bank for International Cooperation", "Taiwan High-Speed Corporation", "loan"
10
Keyword Translator output: Answer Type = MONEY; keywords translated into the target language (translated keywords shown on the slide)
11
Retrieval Strategist output (ranked documents):
DocID = JY-20010705J1TYMCC1300010, Confidence = 44.01
DocID = JY-20011116J1TYMCB1300010, Confidence = 42.95
...
12
Information Extractor output: answer candidate (in the target language) with Confidence = 0.0718 and its supporting passage
13
Answer Generator: cluster and re-rank answer candidates.
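As a rough illustration of this step (not the actual JAVELIN Answer Generator), candidates that share a normalized surface form can be clustered and each cluster scored by the combined confidence of its members; the normalization and scoring below are illustrative assumptions.

```python
from collections import defaultdict

def cluster_and_rerank(candidates):
    """candidates: list of (answer_text, confidence) pairs from the extractor.

    Groups candidates that share a normalized surface form, scores each cluster
    by the sum of member confidences, and returns representative answers ranked
    by that score. Illustrative sketch only, not JAVELIN's algorithm.
    """
    clusters = defaultdict(list)
    for text, conf in candidates:
        key = " ".join(text.lower().split())   # naive surface-form normalization
        clusters[key].append((text, conf))

    ranked = []
    for members in clusters.values():
        score = sum(conf for _, conf in members)           # redundancy boosts the rank
        best_text = max(members, key=lambda m: m[1])[0]    # representative answer string
        ranked.append((best_text, score))
    return sorted(ranked, key=lambda r: r[1], reverse=True)

# Example: three noisy candidates, two of which agree after normalization
print(cluster_and_rerank([("400 billion yen", 0.07), ("400  billion yen", 0.05), ("30 billion", 0.06)]))
```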
14
Final answer returned (shown on the slide in the target language).
15
Formal Run Results
Only the top answer candidate is judged; measured by the number of correct answers (numbers in parentheses include unsupported answers).

Subtask  Participants  Submissions  MAX      MIN    MEDIAN     AVE
EC       4             8            25 (33)  6 (8)  14.5 (19)  15.63 (19.75)
EJ       4             11           25 (31)  0 (0)  17 (18)    12.73 (14.61)
16
With Partial Gold Standard Input
(Per-module results were shown in a table on the slide; the footnotes below define the measures.)
a. Average precision of answer-type detection
b. Average precision of keyword translation over the 200 formal-run questions
c. Average precision of document retrieval; counted if a correct document was ranked between 1st and 15th
d. Average precision of answer extraction; counted if a correct answer was ranked between 1st and 100th
e. MRR measure of IX performance, calculated by averaging the reciprocal of each answer's rank
f. Overall accuracy of the system
g. Accuracy including unsupported answers
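For reference, mean reciprocal rank (MRR) over a set of questions is commonly computed as the average of 1/rank of the first correct answer for each question (0 when no correct answer is returned); the slide's per-answer IX measure may differ in detail, so the sketch below is only the standard definition.

```python
def mean_reciprocal_rank(ranked_answers_per_question, gold_answers_per_question):
    """ranked_answers_per_question: list of ranked answer lists, one per question.
    gold_answers_per_question: list of sets of acceptable answers, one per question.
    Returns the mean of 1/rank of the first correct answer (0 if none is correct)."""
    total = 0.0
    for ranked, gold in zip(ranked_answers_per_question, gold_answers_per_question):
        rr = 0.0
        for rank, answer in enumerate(ranked, start=1):
            if answer in gold:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(ranked_answers_per_question)

# Example: correct answer at rank 2 for Q1 and rank 1 for Q2 -> (0.5 + 1.0) / 2 = 0.75
print(mean_reciprocal_rank([["a", "b"], ["x"]], [{"b"}, {"x"}]))
```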
17
Question Analyzer and Retrieval Strategist have relatively high accuracy
18
QA and RS have relatively high accuracy
Translation accuracy greatly affects overall accuracy
– Accuracy in RS increased by 26.5% in EC and 22.5% in EJ
– If unsupported answers are considered, there is a 10.5% improvement in accuracy for EC and 2.5% for EJ
– We found correct documents that are not in the gold-standard set
19
QA and RS have relatively high accuracy
Translation accuracy greatly affects overall accuracy
There is room for improvement in IX
– Raise accuracy and reduce noise (average precision of answer extraction counts correct answers ranked between 1st and 100th; the MRR measure of IX performance averages the reciprocal of each answer's rank)
20
Translation accuracy greatly affects overall accuracy
QA and RS have relatively high accuracy
There is room for improvement in IX
– Raise accuracy and reduce noise
The validation function in AG is crucial
– Filter out noise in IX output
– Boost the rank of the correct answer
Since only the topmost answer candidate is judged at the end, errors here cause a big drop in accuracy
21
Next Steps for Multilingual QA
Improve translation of keywords for E-C and E-J (e.g. named entity translation)
Improve extraction using syntactic and semantic information in Chinese and Japanese (e.g. use of CaboCha)
Improve the validation function in AG
Upcoming evaluation(s):
– NTCIR CLQA-2, if available in 2006
– AQUAINT E-C definition question pilot, when training/test data is available
Integrate with Distributed IR (next slides)
22
Current Multilingual QA Systems
[Diagram: English questions feed three separate systems (English QA, Chinese CLQA, Japanese CLQA) over English, Chinese, and Japanese corpora, producing answers in Chinese and Japanese]
Three separate systems, no distributed IR
23
Future Vision
[Diagram: English questions feed English QA, Chinese CLQA, and Japanese CLQA sharing a Distributed IR layer over the English, Chinese, and Japanese collections, producing answers in Chinese and Japanese]
A single, integrated system with distributed IR
24
Multilingual Distributed Information Retrieval
25
What Is Distributed IR?
A method of searching across multiple full-text search engines
– "federated search", "the hidden Web"
It is important when relevant information is scattered across many search engines
– Within an organization
– On the Web
– Which ones have the information you need?
26
Many Search Engines Don't Speak English
27
Multilingual Distributed IR: Recent Progress (Research)
Extend monolingual algorithms to multilingual environments
Multilingual query-based sampling
– Monolingual corpora
Multilingual result merging
– Given retrieval results in N languages, produce a single multilingual ranked list
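Query-based sampling, the monolingual technique these slides extend, characterizes an uncooperative search engine by issuing probe queries and sampling the returned documents; the sketch below shows only the general idea, with an assumed search_engine callable and illustrative parameter values, not the CMU implementation or its multilingual extension.

```python
import random
from collections import Counter

def query_based_sample(search_engine, seed_terms, num_queries=75, docs_per_query=4):
    """Build a rough vocabulary model of an uncooperative search engine.

    search_engine(query, k) is assumed to return the top-k documents as strings;
    both the callable and the parameter defaults are illustrative assumptions.
    """
    vocabulary = Counter()
    sampled_docs = []
    probe_terms = list(seed_terms)
    for _ in range(num_queries):
        query = random.choice(probe_terms)            # one-term probe query
        for doc in search_engine(query, docs_per_query):
            sampled_docs.append(doc)
            tokens = doc.lower().split()
            vocabulary.update(tokens)
            probe_terms.extend(tokens)                 # later probes drawn from sampled text
    return vocabulary, sampled_docs
```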
28
Multilingual Distributed IR: Recent Progress (Evaluation)
CLEF Multi-8 Ad-hoc Retrieval task
– English (2), Spanish (1), French (2), Italian (2), Swedish (1), German (2), Finnish (1), Dutch (2)
Why CLEF?
– More languages than NTCIR; more languages is harder
– CLEF is focusing on result merging this year
Models uncooperative environments, where we have no control over individual search engines
29
CLEF 2005: Two Cross-Lingual Retrieval Tasks
Usual ad-hoc cross-lingual retrieval
– Cooperative search engines, under our control
– English queries, documents in 8 languages, 8 search engines
Multilingual results merging task
– Uncooperative search engines, nothing under our control; we get only ranked lists of documents from each engine
– We treat the task as a multilingual federated search problem: documents in language l are stored in search engine s
– Minimize the cost of downloading, indexing, and translating documents
30
CLEF 2005: Usual Ad-hoc Cross-lingual Retrieval
For each query:
1. Four distinct retrieval methods r
– Translate English queries into the target language, with and without pseudo relevance feedback
– Translate all documents into English, with and without pseudo relevance feedback
– Lemur search engine
2. Combine all results from method r into a multilingual result
3. Combine results from all methods into a final result (see the sketch below)
– Use training data to maximize combination accuracy
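A rough sketch of combining per-method result lists into one ranking by a weighted sum of normalized scores (a CombSUM-style merge); the min-max normalization and example weights are illustrative assumptions, not the exact combination trained on CLEF data.

```python
from collections import defaultdict

def combine_runs(runs, weights):
    """runs: list of dicts mapping doc_id -> retrieval score, one dict per method.
    weights: per-method weights, e.g. tuned on training queries.
    Returns doc_ids ranked by the weighted sum of min-max normalized scores."""
    combined = defaultdict(float)
    for run, w in zip(runs, weights):
        lo, hi = min(run.values()), max(run.values())
        for doc_id, score in run.items():
            norm = (score - lo) / (hi - lo) if hi > lo else 1.0
            combined[doc_id] += w * norm
    return sorted(combined, key=combined.get, reverse=True)

# Example: two methods over overlapping documents, the second weighted slightly higher
print(combine_runs([{"d1": 12.0, "d2": 9.5}, {"d2": 0.8, "d3": 0.6}], [1.0, 1.2]))
```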
31
CLEF 2005: Cross-lingual Results Merging Task
For each query:
1. Download a few top-ranked documents from each source
2. Create "comparable scores" for each downloaded document by combining the results of the four methods (previous slide)
3. For each downloaded document we now have both its source-specific score and a comparable score
4. Train language-specific, query-specific logistic models to transform any source-specific score into a comparable score (see the sketch below)
5. Estimate comparable scores for all ranked documents from each source
6. Merge documents by their comparable scores
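A minimal sketch of steps 4-6, assuming each source returns (doc_id, source_score) pairs and that comparable scores in [0, 1] are already available for the few downloaded documents; the logistic parameterization and fitting details are illustrative, not the exact CLEF models.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, a, b):
    # logistic transform from a source-specific score to a comparable score
    return 1.0 / (1.0 + np.exp(-(a * x + b)))

def merge_sources(sources, downloaded):
    """sources: {source_id: [(doc_id, source_score), ...]} full ranked lists.
    downloaded: {source_id: [(source_score, comparable_score), ...]} for the few
    documents actually downloaded from that source.
    Fits one logistic model per source, maps every source score to a comparable
    score, and merges all documents into a single ranked list."""
    merged = []
    for source_id, ranked in sources.items():
        xs, ys = zip(*downloaded[source_id])
        (a, b), _ = curve_fit(logistic, np.array(xs), np.array(ys), p0=[1.0, 0.0], maxfev=5000)
        for doc_id, score in ranked:
            merged.append((doc_id, float(logistic(score, a, b))))
    return sorted(merged, key=lambda d: d[1], reverse=True)
```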
32
Multilingual Distributed IR: CLEF Results
Mean Average Precision (MAP) across 40 queries

Task                             Our Best Run   Other Best Run   Median Run
Ad hoc Cross-lingual Retrieval   0.449          0.333            0.261
Result Merging                   0.419          0.329            0.298
33
Extending JAVELIN with Domain Semantics
[Architecture diagram: off-line indexing runs the Corpus through a Text Annotator (Identifinder, ASSERT, MXTerminator) into an Annotations Database and Semantic Index; at question time the Question Analyzer produces Key Predicates, the Retrieval Strategist (with an Ontology) produces Candidate Predicates and a Ranked Predicate List, and the Information Extractor and Answer Generator produce Answer Passages and answers]
34
Example: "John S. gave Mary an orchid for her birthday."
[Annotation pipeline: basic tokens, NE tagger, entity tagger, semantic parser, reference resolver, verb expansion, unified terms, predicate structure formation]
All tags are stand-off annotations stored in a relational data model
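A minimal sketch of what a stand-off annotation record might look like in a relational store: each tag references character offsets into the source text rather than modifying it. The table layout and field names are assumptions for illustration, not JAVELIN's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (doc_id INTEGER PRIMARY KEY, text TEXT);
-- each annotation points into the document by character offsets (stand-off)
CREATE TABLE annotation (
    ann_id     INTEGER PRIMARY KEY,
    doc_id     INTEGER REFERENCES document(doc_id),
    layer      TEXT,      -- e.g. 'token', 'named_entity', 'predicate', 'argument'
    label      TEXT,      -- e.g. 'PERSON', 'give', 'ARG0'
    start_char INTEGER,   -- offset where the span begins
    end_char   INTEGER,   -- offset where the span ends
    parent     INTEGER REFERENCES annotation(ann_id)  -- e.g. argument -> its predicate
);
""")

text = "John S. gave Mary an orchid for her birthday."
conn.execute("INSERT INTO document VALUES (1, ?)", (text,))
# 'gave' as the predicate and 'John S.' as its ARG0, linked via the parent column
conn.execute("INSERT INTO annotation VALUES (1, 1, 'predicate', 'give', 8, 12, NULL)")
conn.execute("INSERT INTO annotation VALUES (2, 1, 'argument', 'ARG0', 0, 7, 1)")
for row in conn.execute("SELECT layer, label, start_char, end_char FROM annotation"):
    print(row, "->", text[row[2]:row[3]])
```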
35
Retrieval on Predicate-Argument Structure
[Pipeline diagram: Input Question; Question Analysis; Document Retrieval; Answer Extraction; Post-Processing; Output Answers]
Example input question: "Who did Smith meet?"
36
Retrieval on Predicate-Argument Structure (continued)
Question Analysis produces a predicate-argument template for "Who did Smith meet?": predicate = meet, ARG0 = Smith, ARG1 = ?x
37
Retrieval on Predicate-Argument Structure (continued)
What the IR engine sees: the template meet(ARG0 = Smith, ARG1 = ?x)
Some retrieved documents: "Frank met Alice. Smith dislikes Bob." and "Smith met Jones."
38
Retrieval on Predicate-Argument Structure (continued)
Matching against predicate instances stored in an RDBMS:
– meet(ARG0 = Smith, ARG1 = Jones), from "Smith met Jones.", matches the template and yields the answer "Jones"
– meet(ARG0 = Frank, ARG1 = Alice) and dislikes(ARG0 = John, ARG1 = Bob), from "Frank met Alice. John dislikes Bob.", do not match
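A minimal sketch of the matching step, assuming predicate instances have already been extracted into tuples; the data structures and function are illustrative, not the JAVELIN RDBMS schema or query code.

```python
def match_template(template, instances):
    """template: dict like {"predicate": "meet", "ARG0": "Smith", "ARG1": "?x"};
    values starting with '?' are variables to be bound.
    instances: list of dicts with the same keys, extracted from retrieved text.
    Returns the variable bindings from every matching instance."""
    answers = []
    for inst in instances:
        bindings = {}
        ok = True
        for role, value in template.items():
            if value.startswith("?"):
                bindings[value] = inst.get(role)       # variable slot: bind it
            elif inst.get(role) != value:
                ok = False                              # constant slot: must match exactly
                break
        if ok:
            answers.append(bindings)
    return answers

instances = [
    {"predicate": "meet", "ARG0": "Frank", "ARG1": "Alice"},
    {"predicate": "dislikes", "ARG0": "John", "ARG1": "Bob"},
    {"predicate": "meet", "ARG0": "Smith", "ARG1": "Jones"},
]
# "Who did Smith meet?" -> binds ?x to "Jones"
print(match_template({"predicate": "meet", "ARG0": "Smith", "ARG1": "?x"}, instances))
```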
39
Preliminary Results: TREC 2005 Relationship QA Track
Partial system:
– Semantic indexing not fully integrated
– Question analysis module incomplete
Our goal: measure the ability to retrieve relevant nuggets
Submitted a second run with manual predicate bracketing of the questions
Results (MRR of relevant nuggets):
– Run 1: 0.1356
– Run 2: 0.5303
40
Example: Question Analysis
"The analyst is interested in Iraqi oil smuggling. Specifically, is Iraq smuggling oil to other countries, and if so, which countries? In addition, who is behind the Iraqi oil smuggling?"
Extracted predicate structures:
– interested(ARG0: the analyst, ARG1: Iraqi oil smuggling)
– smuggling(ARG0: Iraq, ARG1: oil, ARG2: other countries)
– smuggling(ARG0: Iraq, ARG1: oil, ARG2: which countries)
– is behind(ARG0: who, ARG1: the Iraqi oil smuggling)
41
Example: Results
"The analyst is interested in Iraqi oil smuggling. Specifically, is Iraq smuggling oil to other countries, and if so, which countries? In addition, who is behind the Iraqi oil smuggling?"
1. "The amount of oil smuggled out of Iraq has doubled since August last year, when oil prices began to increase," Gradeck said in a telephone interview Wednesday from Bahrain.
2. U.S.: Russian Tanker Had Iraqi Oil. By ROBERT BURNS, AP Military Writer. WASHINGTON (AP) – Tests of oil samples taken from a Russian tanker suspected of violating the U.N. embargo on Iraq show that it was loaded with petroleum products derived from both Iranian and Iraqi crude, two senior defense officials said.
5. With no American or allied effort to impede the traffic, between 50,000 and 60,000 barrels of Iraqi oil and fuel products a day are now being smuggled along the Turkish route, Clinton administration officials estimate.
(7 of 15 nuggets judged relevant)
42
Next Steps
Better question analysis
– Retrain an ASSERT-style annotator, or incorporate rule-based NLP from HALO (KANTOO)
Semantic indexing and retrieval
– Moving to Indri allows exact representation of our predicate structure in the index and in queries
Ranking retrieved predicate instances
– Aggregating information across documents
Extracting answers from predicate-argument structure
43
Key Predicates Using Event Semantics from a Domain Ontology
possess is a precondition of operate, export, ...
possess is a postcondition of assemble, buy, ...
More useful passages are matched
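A minimal sketch of how precondition/postcondition relations in a domain ontology could expand a key predicate into related predicates worth matching; the ontology entries and function are illustrative assumptions, not the JAVELIN ontology.

```python
# Illustrative ontology fragment: which events have 'possess' as a
# precondition or postcondition (only the slide's examples are listed).
PRECONDITION_OF = {"possess": {"operate", "export"}}
POSTCONDITION_OF = {"possess": {"assemble", "buy"}}

def expand_key_predicate(predicate):
    """Expand a key predicate into the set of predicates worth matching:
    the predicate itself, events it enables, and events that produce it."""
    expanded = {predicate}
    expanded |= PRECONDITION_OF.get(predicate, set())   # possess enables operate, export, ...
    expanded |= POSTCONDITION_OF.get(predicate, set())  # assemble, buy, ... result in possess
    return expanded

# A question about possessing something can now also match passages about
# buying, assembling, operating, or exporting it.
print(expand_key_predicate("possess"))
```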
44
Improved Results
[Figure: related predicates matched include assemble, operate, install, develop, export, import, manufacture]
45
Indexing of Predicate Structures
Implemented using Indri (October '05); web demo available
Example query:
#combine[predicate]( buy.target #any:gpe.arg0 weapon.arg1 )
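As an illustration of how a predicate-argument template might be rendered into an Indri query of this form, here is a small string-building sketch; the template representation and helper function are assumptions, and the emitted syntax simply mirrors the example above (with gpe presumably a geopolitical-entity type).

```python
def indri_predicate_query(predicate, args):
    """predicate: the target verb/frame, e.g. 'buy'.
    args: mapping from argument slot to either a literal term or
    ('#any:', entity_type) for a wildcard over an entity type.
    Returns a #combine[predicate](...) query string mirroring the slide's example."""
    parts = [f"{predicate}.target"]
    for slot, value in args.items():
        if isinstance(value, tuple) and value[0] == "#any:":
            parts.append(f"#any:{value[1]}.{slot}")    # any entity of this type in the slot
        else:
            parts.append(f"{value}.{slot}")            # a specific term in the slot
    return f"#combine[predicate]( {' '.join(parts)} )"

# Reproduces the slide's example query
print(indri_predicate_query("buy", {"arg0": ("#any:", "gpe"), "arg1": "weapon"}))
```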
46
Some Recent Papers
E. Nyberg, R. Frederking, T. Mitamura, M. Bilotti, K. Hannan, L. Hiyakumoto, J. Ko, F. Lin, L. Lita, V. Pedro, and A. Schlaikjer, "JAVELIN I and II Systems at TREC 2005", notebook paper submitted to TREC 2005.
F. Lin, H. Shima, M. Wang, and T. Mitamura, "CMU JAVELIN System for NTCIR5 CLQA1", to appear in Proceedings of the 5th NTCIR Workshop.
L. Si and J. Callan, "Modeling Search Engine Effectiveness for Federated Search", Proceedings of the Twenty-Eighth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, Brazil.
L. Si and J. Callan, "CLEF 2005: Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists", Sixth Workshop of the Cross-Language Evaluation Forum, CLEF 2005, Vienna, Austria.
E. Nyberg, T. Mitamura, R. Frederking, V. Pedro, M. Bilotti, A. Schlaikjer, and K. Hannan (2005), "Extending the JAVELIN QA System with Domain Semantics", to appear in Proceedings of AAAI 2005 (Workshop on Question Answering in Restricted Domains).
L. Hiyakumoto, L. V. Lita, and E. Nyberg (2005), "Multi-Strategy Information Extraction for Question Answering", FLAIRS 2005, to appear.
http://www.cs.cmu.edu/~ehn/JAVELIN
47
Questions?