1
Information Retrieval in Department 1
Visit of the Scientific Advisory Board
Saarbrücken, June 2nd – 3rd, 2005
Holger Bast
Max-Planck-Institut für Informatik (MPII), Saarbrücken, Germany
2
How it got started …
I shifted from formerly very theoretical work … to information retrieval topics
Over time a number of PhD/Master/Bachelor students joined in …
Regis Newo, Christian Klein, Ingmar Weber, Benedikt Grundmann, Daniel Fischer, Christian Mortensen, Thomas Warken, Josiane Parreira, Debapriyo Majumdar
Josiane: master, now PhD in Gerhard's group; Deb: PhD in D1; Ingmar: PhD in D1; Thomas joined one of our projects after his PhD
… and a lot of interaction with Gerhard Weikum's group: joint master students, ADFOCS summer school, EU-project DELIS; we will intensify this …
3
What we are doing …
Motivation
most search facilities I encounter do not really make me happy …
even basic retrieval tasks are still far from being solved satisfactorily, e.g. searching my …
Two main research areas in the past 2 years
Concept-based retrieval
Searching with Autocompletion
This presentation
main idea behind these areas
lots of demos and examples
highlight two results
4
Concept-Based Retrieval
Hawaii, 2nd June 2004
Dear Pen Pal,
I am writing to you from Hawaii. They have got internet access right on the beach here, isn’t that great? I’ll go surfing now!
your friend, CB
[Figure: the letter as a document expressed in terms (internet, web, surfing, beach, hawaii), next to a query expressed in the same terms. Annotation: Equally dissimilar to query!]
Now that sounds nice, but does it work?
5
Concept-Based Retrieval
[Figure: a document and a query expressed in terms (internet, web, surfing, beach, hawaii), and the same document and query expressed in concepts (WWW, Hawaii)]
6
Concept-Based Retrieval
[Figure: a document expressed in terms (internet, web, surfing, beach, hawaii), a concept expressed in terms, and the document expressed in concepts (WWW, Hawaii)]
7
Concept-Based Retrieval
[Figure: the document expressed in terms (internet, web, surfing, beach, hawaii) written as a product of the concepts expressed in terms and the document expressed in concepts (WWW, Hawaii); ● = matrix multiplication]
8
Concept-Based Retrieval
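[Figure: the document expressed in terms (internet, web, surfing, beach, hawaii) written as a product of the concepts expressed in terms and the document expressed in concepts (WWW, Hawaii); ● = matrix multiplication]
Now that sounds nice, but does it work?
Finding concepts = approximate low-rank matrix decomposition
The approximation actually adds to the precision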
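A minimal sketch of "finding concepts = approximate low-rank matrix decomposition", assuming a truncated SVD as the decomposition (the slides do not name a specific method here). The term-document counts, the choice k = 2, and the helper names `low_rank_concepts`, `to_concepts` and `cosine` are hypothetical and only serve the illustration.

```python
import numpy as np

# Hypothetical toy term-document matrix: rows = terms, columns = documents.
terms = ["internet", "web", "surfing", "beach", "hawaii"]
A = np.array([
    [2.0, 1.0, 0.0, 0.0],  # internet
    [1.0, 2.0, 0.0, 0.0],  # web
    [1.0, 1.0, 1.0, 1.0],  # surfing
    [0.0, 0.0, 2.0, 1.0],  # beach
    [0.0, 0.0, 1.0, 2.0],  # hawaii
])

def low_rank_concepts(A, k):
    """Return (T, C) with A ~ T @ C, using a rank-k truncated SVD (an assumption)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    T = U[:, :k]                 # each column: a concept expressed in terms
    C = s[:k, None] * Vt[:k, :]  # each column: a document expressed in concepts
    return T, C

def to_concepts(T, term_vector):
    """Express a term vector (document or query) in concepts."""
    return T.T @ term_vector

def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

T, C = low_rank_concepts(A, k=2)
A_approx = T @ C  # the matrix multiplication from the slide

# A document that only says "internet" and a query that only says "web".
doc = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
query = np.array([0.0, 1.0, 0.0, 0.0, 0.0])

print("cosine in term space:   ", cosine(doc, query))
print("cosine in concept space:", cosine(to_concepts(T, doc), to_concepts(T, query)))
```

In this toy matrix the document and the query share no term, so their cosine in term space is 0, while in the two-concept space they end up close together. Whether this helps on real data depends on the collection and on the number of concepts.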
9
A Concrete Example
676 abstracts from the Max-Planck-Institute, for example:
We present two theoretically interesting and empirically successful techniques for improving the linear programming approaches, namely graph transformation and local cuts, in the context of the Steiner problem. We show the impact of these techniques on the solution of the largest benchmark instances ever solved.
3283 words (words like and, or, this, … removed)
abstracts come from 5 departments: Algorithms, Logic, Graphics, CompBio, Databases
reduce to 10 concepts
10
How many concepts?
Bast/Majumdar SIGIR 2005
Implicitly, the matrix decomposition assigns a relatedness score to each pair of terms
[Figure: three plots of relatedness (y-axis) against number of concepts (x-axis, ticks at 200, 400, 600), for the term pairs voronoi / diagram, logic / logics, and logic / voronoi]
→ every fixed number of concepts is wrong!
11
How many concepts?
Bast/Majumdar SIGIR 2005
we instead assess the shape of the curves!
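A rough sketch of how a relatedness curve for a pair of terms could be computed. It assumes that the decomposition is a truncated SVD with term matrix U_k and that the relatedness of terms i and j at k concepts is the (i, j) entry of U_k U_k^T; both are assumptions of this sketch, not something the slides spell out. The toy matrix and the function name `relatedness_curve` are hypothetical.

```python
import numpy as np

# Hypothetical toy term-document matrix: rows = terms, columns = documents.
terms = ["internet", "web", "surfing", "beach", "hawaii"]
A = np.array([
    [2.0, 1.0, 0.0, 0.0],  # internet
    [1.0, 2.0, 0.0, 0.0],  # web
    [1.0, 1.0, 1.0, 1.0],  # surfing
    [0.0, 0.0, 2.0, 1.0],  # beach
    [0.0, 0.0, 1.0, 2.0],  # hawaii
])

def relatedness_curve(A, i, j):
    """Entry (i, j) of U_k U_k^T for k = 1, 2, ..., as an array indexed by k - 1.

    Assumed definition: U_k holds the first k left singular vectors of A.
    """
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    # Adding one concept adds one product U[i, c] * U[j, c] to the score,
    # so the whole curve is a running sum over the concepts.
    return np.cumsum(U[i, :] * U[j, :])

curve = relatedness_curve(A, terms.index("internet"), terms.index("web"))
for k, score in enumerate(curve, start=1):
    print(f"{k} concepts: relatedness(internet, web) = {score:+.3f}")
```

Even on this tiny matrix the score changes with the number of concepts, which is why looking at the whole curve rather than at a single fixed k is attractive. A real collection would have hundreds of points per curve, as in the plots.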
12
Searching with Autocompletion
An interactive search technology
suggests completions of the word that is currently being typed
along with that, hits are displayed (for the yet to be completed query)
best understood by example
and you can try it yourself via the new MPII webpages
13
Useful in many ways
Learn about formulations used in the collection, e.g. "guestbook"
Minimum of information required, e.g. people's names
Gives stemming functionality (without stemmer), e.g. "raghavans", "raghavan3", …
Gives error-correction functionality (without error-correction), e.g. "raghvan", "ragavan", …
Database-like queries, e.g. publications by Kurt Mehlhorn
all this with a single functionality: no dictionary, no training, readily applicable to any collection
14
The core algorithmic problem
Given
a set of documents D (the hits of the preceding part of the query)
a range of words W (all completions of the last word the user has started typing)
Compute
the subset of documents D' ⊆ D that contain at least one word from W
the subset of words W' ⊆ W that occur in at least one document of D
typically |W'| << |W|
Example
D = 17, 23, 48, 116, …
raga: 11, 47, 97, 134, …
ragade: 15, 77, 214, …
ragan: 58, 917, …
ragchi: 6, 107, 514, …
ragavan: 23, 118, …
rage: 211
raged: 6, 111, 517, …
ragen: 37, 919, …
ragged: 14, 77, 112, 245, …
raggett: 17, 51, 116, …
raggio: 7, 22, 50, 714, …
raghavan: 23, 57, 116, …
15
The core algorithmic problem
Ordinary Inverted Index: ~|W| time per query
Bast/Mortensen/Weber: ~|W'| time per query
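To make the problem concrete, here is a sketch of the ordinary inverted-index baseline, which touches every word in the range W. It is not the Bast/Mortensen/Weber method, whose details these slides do not give. The lists are the visible prefixes from the example (the slide truncates most of them with "…"), and the names `index`, `vocab` and `complete` are hypothetical.

```python
from bisect import bisect_left

# Inverted lists from the example, cut off where the slide shows "…".
index = {
    "raga": [11, 47, 97, 134],
    "ragade": [15, 77, 214],
    "ragan": [58, 917],
    "ragchi": [6, 107, 514],
    "ragavan": [23, 118],
    "rage": [211],
    "raged": [6, 111, 517],
    "ragen": [37, 919],
    "ragged": [14, 77, 112, 245],
    "raggett": [17, 51, 116],
    "raggio": [7, 22, 50, 714],
    "raghavan": [23, 57, 116],
}
vocab = sorted(index)  # sorted vocabulary, so a prefix maps to a contiguous range

def complete(D, prefix):
    """Baseline: intersect D with the list of every word in W (one step per word)."""
    lo = bisect_left(vocab, prefix)
    hi = bisect_left(vocab, prefix + "\uffff")  # first word past the prefix range
    W = vocab[lo:hi]
    D_set = set(D)
    D_prime, W_prime = set(), []
    for w in W:  # this loop is what makes the baseline cost grow with |W|
        hits = D_set.intersection(index[w])
        if hits:
            W_prime.append(w)
            D_prime |= hits
    return sorted(D_prime), W_prime, W

D = [17, 23, 48, 116]
D_prime, W_prime, W = complete(D, "rag")
print("D' =", D_prime)
print("W' =", W_prime)
print("|W'| =", len(W_prime), "out of |W| =", len(W))
```

On these truncated lists only three of the twelve candidate words occur in D, yet the baseline still scans all twelve lists. An algorithm whose running time is about |W'| per query avoids most of that work.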