With or without users? Julio Gonzalo, UNED, http://nlp.uned.es
The classical IR model. Diagram: information need (fixed), query, document collection, relevant docs (precise). Techniques: query expansion, formal models, indexing, clustering, query/document comparison, data structures, weighting heuristics, visualization, feedback, filtering. Goal: all relevant information and only relevant information.
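The stated goal maps onto the two standard IR measures: "all relevant information" is recall, and "only relevant information" is precision. A minimal sketch of both, assuming a retrieved list and a known relevant set (the document IDs are made up for illustration):

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved list against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: fraction of retrieved documents that are relevant ("only relevant").
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant documents that were retrieved ("all relevant").
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical run: 4 documents retrieved, 3 documents actually relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d7"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Both measures presuppose a fixed information need and a precise set of relevant documents, which is exactly the assumption the following slides question for web search.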
Does it apply to web search?
Is relevance what the user needs? Most frequent questions, Infoseek 1999 (SIGIR Forum): 1. Empty question; 2. sex; 8. Pamela Anderson (first multiword question in the rank). Google: No! It is quality, saliency, reliability... in one or two links.
Is word frequency useful?
PageRank addresses user needs. Clasificados.wanadoo.es, Realizadores.tv, Chat.rincondelvago.com, mx.dir.yahoo.com, telecinco. The text of the links is the most valuable text to index!
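As a reminder of what PageRank computes, here is a minimal power-iteration sketch. It is an illustration under assumptions, not Google's implementation: the damping factor 0.85, the iteration count, and the toy link graph with `.example` hosts are all hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a baseline share from the random-jump term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages point at a hub, the hub points back at one.
toy_links = {
    "a.example": ["hub.example"],
    "b.example": ["hub.example"],
    "c.example": ["hub.example", "a.example"],
    "hub.example": ["a.example"],
}
ranks = pagerank(toy_links)
top_page = max(ranks, key=ranks.get)
print(top_page)
```

The score depends only on who links to whom, not on word frequencies inside the page, which is why it behaves as a proxy for quality and saliency rather than for topical relevance.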
With or without users? Google’s first commandment: Focus on the user and all the rest will come along. “With or without users?” is not the right question. “With or without user focus?” YES.
Is CLEF focusing on users? Multilingual track: If I have equivalent sets of relevant news in many languages, I do not want a merged set. I want the subset in my native language! Q&A track: How much does it take to find an answer with an IR engine? (Ask QA assessors!!) Interactive track: natural user task, but artificial users! Only ImageCLEF and GIRT partially pass the test. Why is the intersection between ECDL and CLEF almost null? Multilingual web track: danger of making the same pre-Google mistake.
The web is truly multilingual by nature... But the web is redundant, and average users are looking for a single perfect link!! Almost no need for cross-language users (cf. Google).
Vertical search engines? Diagram: web pages, extraction, structured data; information need, query.
Conclusions. We need more focus on user needs... And all the rest will come along! Google’s tenth commandment: great just isn’t good enough.