1
Cross-Lingual IR
Salim Roukos
IBM T. J. Watson Research Center
9/11/02
2
Assumptions for 2010 (Asilomar Report)
- 1 TB memory, 1000 TB disk, 1B users, 1T devices => 1B servers
- Self-managing, very secure, and very reliable
- Auto-x: install, heal, adaptive, auto-tuning wizard
- Information discovery: metadata for describing schema, cast operations
- Federation across 1k, 1m databases
- "Find the average enterprise-wide employee salary."
- "Are there any really good Italian restaurants within 5 miles of where I live?"
3
Exploit multilingual information streams
- Parallel vs comparable documents
- Build translingual search
[Figure: document streams from Xinhua, SDA, AFP, AP, ...]
4
X-lingual Retrieval
[Figure labels: Query; English; Chinese; E => X MT; E => C MT; IR scoring; English Docs; French Docs; Chinese Docs; xxx Docs; Ranked Docs; X => E MT; online "English" for gisting]
Caveat: Machine translation isn't perfect, and queries tend to be short.
5
From information need to query
Who has the largest market share for notebooks: IBM or Dell?
- Q1: notebook market share
- Q2: laptop market share IBM Dell
- Q3: ThinkPad IBM Dell
P(q | I) = P(q | D is R, C) ?
[Figure labels: I, q, D]
P(v in q | w in D): stemming, synonyms, translation
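The word-level probability P(v in q | w in D) on this slide can be illustrated with a small sketch. It assumes one common way of using such a table, which the slide does not spell out: each query word v is generated by first picking a document word w with probability p(w | D) and then mapping it to v, so that p(q | D) is the product over v of the sum over w of t(v | w) p(w | D). The function name, the `t_table` layout, and the probabilities in the usage example are hypothetical.

```python
from collections import Counter


def translation_query_likelihood(query, doc, t_table):
    """Return p(q | D) = prod_v sum_w t(v | w) * p(w | D).

    query   -- list of query words v
    doc     -- list of document words w
    t_table -- hypothetical nested dict: t_table[w][v] = t(v | w); the same
               table can encode stemming, synonym, or translation links
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    if doc_len == 0:
        return 0.0

    p_query = 1.0
    for v in query:
        # Marginalize over every document word that could have produced v.
        p_v = sum(
            t_table.get(w, {}).get(v, 0.0) * count / doc_len
            for w, count in doc_counts.items()
        )
        p_query *= p_v
    return p_query


# Illustrative numbers only: "thinkpad" in a document may surface as
# "notebook" or "laptop" in a query.
example_table = {
    "thinkpad": {"thinkpad": 0.2, "notebook": 0.4, "laptop": 0.4},
    "ibm": {"ibm": 1.0},
}
example_doc = ["ibm", "thinkpad"]
example_score = translation_query_likelihood(["laptop", "ibm"], example_doc, example_table)
```

With such a table, the query "laptop" can match a document that only mentions "ThinkPad", which is the effect the Q1 to Q3 reformulations on the slide are after.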
6
Probabilistic Models of IR
P(D is R | q, C) ∝ P(q | D is R, C) P(D is R | C)
- D = document, C = doc collection, q = query
- Prior P(D is R | C): link analysis, other?
- LM P(q | D is R, C): beyond 1-grams? Currently P(q | D is R) = k p(q | D) + (1 - k) p(q)
- Need training data to estimate the model: on the order of 100k queries (not 1k)
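A minimal sketch of ranking with the interpolated model above. It assumes a unigram model in which the mixture k p(. | D) + (1 - k) p(.) is applied per query term, with p(.) estimated from the whole collection; the slide states the mixture only at the query level, so this factorization is an assumption. The function names, the default k = 0.7, and the optional `log_prior` argument (standing in for P(D is R | C)) are hypothetical.

```python
import math
from collections import Counter


def smoothed_query_log_likelihood(query, doc, collection, k=0.7):
    """log P(q | D is R), scoring each term as k * p(t | D) + (1 - k) * p(t)."""
    doc_counts = Counter(doc)
    coll_counts = Counter(collection)
    doc_len = len(doc)
    coll_len = len(collection)

    log_p = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_background = coll_counts[term] / coll_len if coll_len else 0.0
        p = k * p_doc + (1 - k) * p_background
        if p == 0.0:
            # The term occurs nowhere in the collection.
            return float("-inf")
        log_p += math.log(p)
    return log_p


def rank_documents(query, docs, k=0.7, log_prior=None):
    """Return document indices sorted by log P(q | D is R) + log prior."""
    collection = [term for doc in docs for term in doc]
    scored = []
    for index, doc in enumerate(docs):
        score = smoothed_query_log_likelihood(query, doc, collection, k)
        if log_prior is not None:
            score += log_prior[index]  # e.g. a link-analysis prior
        scored.append((score, index))
    return [index for score, index in sorted(scored, key=lambda pair: -pair[0])]


example_docs = [
    ["ibm", "thinkpad", "notebook", "market", "share"],
    ["italian", "restaurant", "review"],
]
example_ranking = rank_documents(["notebook", "market", "share"], example_docs)
```

The background term keeps a document from scoring zero when it lacks a query word, and k is the kind of parameter that the training data mentioned on the slide would be used to estimate.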
7
Probabilistic Model of What?
P(R | a, D, q, C)
Many features in ME/MIX models:
- word ngrams
- synonyms
- WordNet
- ontologies
- hidden: topics, top N docs, ...
8
Goal: give users the info they are seeking, in context
- Is XIR different from IR?
- Can translingual search improve monolingual retrieval?
- Monolingual vs multilingual users
- How are XIR and MT related?
- How can we scale up?
- Create training sets to foster probabilistic modeling research for IR (100k queries)
- Modeling the multilingual web: content and link structure
- Dialog interaction
It's about modeling!