Slide 1: Answering Spanish Questions from English Documents
CLEF 2003, 22 August 2003
Abdessamad Echihabi, Douglas W. Oard, Daniel Marcu, Ulf Hermjakob
USC Information Sciences Institute
Slide 2: Outline
- Development collection
- Choosing a cross-language approach
- TextMap-TMT architecture
- What did we learn?
Slide 3: Cross-Language QA
Evaluation conditions:
- English documents (LA Times)
- 200 questions (we chose Spanish)
- Exact answers
ISI development collection:
- English documents (TREC-2003 QA track)
- 100 Spanish questions (translated from TREC)
- Answer patterns (a pattern-scoring sketch follows this list)
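Answer patterns in TREC-style development collections are regular expressions matched against a system's answer strings. A minimal sketch of that scoring step, assuming a plain list of regex patterns per question; the helper name and pattern format are illustrative, not the actual ISI tooling:

```python
import re

def matches_any(answer: str, patterns: list[str]) -> bool:
    """True if the answer string matches at least one answer pattern."""
    return any(re.search(p, answer) for p in patterns)

# Hypothetical patterns for "When did Alaska become a state?"
patterns = [r"January\s+3,?\s+1959", r"\b1959\b"]
print(matches_any("January 3, 1959", patterns))  # True
print(matches_any("1867", patterns))             # False
```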
Slide 4: Design Space
Architecture:
- Question translation + English QA
- Document translation + Spanish QA
- Mix of language-specific + translation components (the first two options are sketched below)
Translation approaches:
- Statistical MT, trained on European Parliament proceedings
- Transfer-method MT (Systran on the Web)
- Human translation (as an upper bound)
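A minimal sketch of the first two architectural options, with every component stubbed out. The function names stand in for real MT and QA systems and are not ISI's actual interfaces:

```python
def translate_es_en(text):              # stub: Spanish-to-English MT
    return text

def translate_en_es(text):              # stub: English-to-Spanish MT
    return text

def english_qa(question, docs):         # stub: monolingual English QA
    return docs[0] if docs else None

def spanish_qa(question, docs):         # stub: monolingual Spanish QA
    return docs[0] if docs else None

def qa_via_question_translation(spanish_question, english_docs):
    """Option 1: translate the question, run English QA on English documents."""
    return english_qa(translate_es_en(spanish_question), english_docs)

def qa_via_document_translation(spanish_question, english_docs):
    """Option 2: translate the documents, run Spanish QA on the translations."""
    return spanish_qa(spanish_question,
                      [translate_en_es(d) for d in english_docs])
```

The third option mixes translation components into the pipeline itself rather than applying MT wholesale at either end, which is the direction the TextMap-TMT architecture on the next slide takes.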
Slide 5: TextMap-TMT Architecture
[Architecture diagram. Recoverable details: question reformulation patterns such as "Alaska became a state on" OR "Alaska became a state in"; answer typing (DATE, ISLAND, BASEBALL-SPORTS-TEAM); redundancy counts over candidate answers, e.g., Alaska (49), state (7), {January 3 1959} OR {1867} OR {1959} (3); a question-translation rule chain cuanto -> whichever -> "how many"; some components marked optional.]
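Two mechanics are recoverable from the diagram: literal reformulation phrases generated from the question, and redundancy counts over candidate answers. A minimal sketch of both, assuming a fixed template list for one question shape; the actual TextMap rules are far richer:

```python
from collections import Counter

def reformulate(entity: str) -> list[str]:
    """Generate literal search phrases, as in "Alaska became a state on/in"."""
    templates = ['{e} became a state on', '{e} became a state in']
    return [t.format(e=entity) for t in templates]

def rank_by_redundancy(candidates: list[str]) -> list[tuple[str, int]]:
    """Rank candidate answers by how often they recur across snippets."""
    return Counter(candidates).most_common()

print(reformulate("Alaska"))
snippets = ["Alaska"] * 49 + ["state"] * 7 + ["January 3 1959"] * 3
print(rank_by_redundancy(snippets))
# [('Alaska', 49), ('state', 7), ('January 3 1959', 3)]
```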
Slide 6: Development Test Results
Slide 7: Official Results
Conditions reported:
- Top-1, validated
- Top-1, not validated
- Top-3, not validated
Slide 8: Lessons Learned
- Cross-language QA is a tractable problem: better than 25% accuracy at top 1, almost 40% at top 3.
- Our best MT systems are statistical, but our best QA systems are heavily rule-based.
- Virtually every component needs to be redone; it is as complex as building a new monolingual system.
- Strong synergy with CLIR is possible (web search, local collection search).