Information Retrieval in Department 1


Visit of the Scientific Advisory Board, Saarbrücken, June 2nd – 3rd, 2005
Holger Bast, Max-Planck-Institut für Informatik (MPII), Saarbrücken, Germany

How it got started
I shifted from formerly very theoretical work … to information retrieval topics. Over time, a number of PhD, Master's, and Bachelor's students joined in: Regis Newo, Christian Klein, Ingmar Weber, Benedikt Grundmann, Daniel Fischer, Christian Mortensen, Thomas Warken, Josiane Parreira, Debapriyo Majumdar … and there is a lot of interaction with Gerhard Weikum's group.
Notes: Josiane: Master's, now PhD in Gerhard's group; Deb: PhD in D1; Ingmar: PhD in D1; Thomas joined one of our projects after his PhD. With Gerhard's group: joint Master's students, the ADFOCS summer school, the EU project DELIS; we will intensify this.

What we are doing …
Motivation: even basic retrieval tasks are still far from being solved satisfactorily, e.g. searching my email; most search facilities I encounter do not really make me happy …
Two main research areas in the past 2 years:
- Concept-based retrieval
- Searching with autocompletion
This presentation:
- main idea behind these areas
- lots of demos and examples
- highlight two results

Concept-Based Retrieval

Example document (a letter):
"Hawaii, 2nd June 2004
Dear Pen Pal,
I am writing to you from Hawaii. They have got internet access right on the beach here, isn't that great? I'll go surfing now!
Your friend, CB"

[Figure: the document and a query, each expressed in terms (internet, web, surfing, beach, hawaii). Annotation: "Equally dissimilar to query!"]
[Figure: the query and the document expressed in concepts (WWW, Hawaii), with each concept in turn expressed in terms.]
[Figure: the document expressed in terms is obtained by matrix multiplication of the concepts expressed in terms with the document expressed in concepts.]

Finding concepts = approximate low-rank matrix decomposition. The approximation actually adds to the precision.

Now that sounds nice, but does it work?
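A minimal Python sketch of the idea, with loudly hypothetical numbers: the concept weights and the one-word query "web" are illustrative assumptions, not values read off the slide; only the term counts come from the letter. In term space the letter shares no word with the query, so its cosine similarity is 0. After mapping both vectors into concept space (here simply by multiplying with the transposed concept matrix, a simplifying assumption), the similarity becomes positive. Multiplying the concept matrix by the document's concept vector gives back a term vector in which "web" now has nonzero weight.

```python
import math

# Hypothetical vocabulary and concepts; the weights are illustrative only.
TERMS = ["internet", "web", "surfing", "beach", "hawaii"]
CONCEPTS = {
    "WWW":    [1.0, 1.0, 0.5, 0.0, 0.0],  # concept expressed in terms
    "Hawaii": [0.0, 0.0, 0.5, 1.0, 1.0],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def to_concepts(term_vec):
    """Map a term vector to concept space (multiply with C transposed)."""
    return [dot(c, term_vec) for c in CONCEPTS.values()]


def from_concepts(concept_vec):
    """Map a concept vector back to term space (multiply with C)."""
    return [
        sum(w * c[t] for w, c in zip(concept_vec, CONCEPTS.values()))
        for t in range(len(TERMS))
    ]


# Term counts of the letter: internet 1, surfing 1, beach 1, hawaii 2.
doc = [1, 0, 1, 1, 2]
# Hypothetical query "web".
query = [0, 1, 0, 0, 0]

print(cosine(doc, query))                            # 0.0: no shared term
print(cosine(to_concepts(doc), to_concepts(query)))  # positive in concept space
print(from_concepts(to_concepts(doc)))               # "web" now has weight 1.5
```

In a real system the concepts are not written by hand but found by the low-rank decomposition the slide mentions.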

A Concrete Example
- 676 abstracts from the Max-Planck-Institute, for example: "We present two theoretically interesting and empirically successful techniques for improving the linear programming approaches, namely graph transformation and local cuts, in the context of the Steiner problem. We show the impact of these techniques on the solution of the largest benchmark instances ever solved."
- 3283 words (words like and, or, this, … removed)
- the abstracts come from 5 departments: Algorithms, Logic, Graphics, CompBio, Databases
- reduce to 10 concepts
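The slide does not say which decomposition reduces the 3283-word by 676-abstract matrix to 10 concepts. A common choice, as in latent semantic indexing, is the truncated singular value decomposition; the sketch below assumes that choice and uses a small random matrix as stand-in data, not the MPII abstracts. The function name `reduce_to_concepts` is hypothetical.

```python
import numpy as np


def reduce_to_concepts(A, k):
    """Approximate the term-document matrix A by a rank-k product T @ D.

    Uses a truncated singular value decomposition (an assumed choice).
    T has shape (terms, k): column c is concept c expressed in terms.
    D has shape (k, documents): column d is document d expressed in concepts.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    T = U[:, :k] * s[:k]  # scale each concept column by its singular value
    D = Vt[:k, :]
    return T, D


# Stand-in data: 50 "terms" x 30 "documents" of random counts.
rng = np.random.default_rng(0)
A = rng.poisson(0.3, size=(50, 30)).astype(float)

T, D = reduce_to_concepts(A, k=10)
A_k = T @ D
print(T.shape, D.shape)  # (50, 10) (10, 30)
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))  # relative approximation error
```

For a matrix the size of the example, or larger and sparse, `scipy.sparse.linalg.svds` is the usual way to compute only the top k singular triplets instead of the full decomposition.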

How many concepts? (Bast/Majumdar, SIGIR 2005)
Implicitly, the matrix decomposition assigns a relatedness score to each pair of terms.
[Figure: relatedness vs. number of concepts for the term pairs voronoi / diagram, logic / logics, and logic / voronoi.]
→ Every fixed number of concepts is wrong! We instead assess the shape of the curves.
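A minimal sketch of how such curves could be computed, under an assumption of ours that the slide does not spell out: the relatedness of terms i and j for k concepts is taken to be the (i, j) entry of U_k U_k^T, where U_k holds the first k left singular vectors of the term-document matrix. The hypothetical function `relatedness_curve` returns the whole curve for k = 1, …, r, so its shape can be inspected rather than fixing k in advance. The matrix is random stand-in data.

```python
import numpy as np


def relatedness_curve(A, i, j):
    """Relatedness of terms i and j as a function of the number of concepts k.

    Assumes relatedness(k) = (U_k U_k^T)[i, j], with U_k the first k left
    singular vectors of the term-document matrix A. Entry k-1 of the
    returned array is the value for k concepts.
    """
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    # Running sum of U[i, m] * U[j, m] over the singular vectors m = 0, 1, ...
    return np.cumsum(U[i, :] * U[j, :])


# Stand-in data: 40 "terms" x 25 "documents" of random counts.
rng = np.random.default_rng(1)
A = rng.poisson(0.3, size=(40, 25)).astype(float)

curve = relatedness_curve(A, 0, 1)
print(len(curve))  # one value per number of concepts, k = 1..25
```

Computing one SVD and taking running sums gives all values of k at once, which is what makes looking at the whole curve affordable.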

Searching with Autocompletion
An interactive search technology:
- suggests completions of the word that is currently being typed
- along with that, displays the hits (for the yet-to-be-completed query)
Best understood by example, and you can try it yourself via the new MPII webpages.
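The first step, finding all completions of the word being typed, can be sketched as two binary searches over a sorted vocabulary: the completions of a prefix always form one contiguous range. The toy vocabulary and the function `completions` are hypothetical; the slide does not say how the MPII system stores its word list.

```python
from bisect import bisect_left

# Hypothetical toy vocabulary; a real index would hold every word of the collection.
VOCAB = sorted([
    "raga", "ragade", "ragan", "ragavan", "ragchi", "rage", "raged", "ragen",
    "ragged", "raggett", "raggio", "raghavan", "raghavan3", "raghavans", "rain",
])


def completions(vocabulary, prefix):
    """Return all words of the sorted vocabulary that start with prefix.

    They form one contiguous range, found with two binary searches. Assumes
    no word contains the character U+10FFFF, used here as an upper sentinel.
    """
    lo = bisect_left(vocabulary, prefix)
    hi = bisect_left(vocabulary, prefix + "\U0010ffff", lo)
    return vocabulary[lo:hi]


print(completions(VOCAB, "raghavan"))  # ['raghavan', 'raghavan3', 'raghavans']
print(completions(VOCAB, "ragg"))      # ['ragged', 'raggett', 'raggio']
```

The first call also shows why prefix completion behaves like stemming without a stemmer: the word variants simply share the typed prefix.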

Useful in many ways
- Learn about formulations used in the collection, e.g. "guestbook"
- Minimum of information required, e.g. people's names
- Gives stemming functionality (without a stemmer), e.g. "raghavans", "raghavan3", …
- Gives error-correction functionality (without error correction), e.g. "raghvan", "ragavan", …
- Database-like queries, e.g. publications by Kurt Mehlhorn
All this with a single functionality: no dictionary, no training, readily applicable to any collection.

The core algorithmic problem
Given:
- a set of documents D (the hits of the preceding part of the query)
- a range of words W (all completions of the last word the user has started typing)
Compute:
- the subset of documents D' ⊆ D that contain at least one word from W
- the subset of words W' ⊆ W that occur in at least one document of D
Typically |W'| << |W|.

Example from the slide (document lists as extracted):
D = 17, 23, 48, 116, …
raga      11, 47, 97, 134, …
ragade    15, 77, 214, …
ragan     58, 917, …
ragchi    6, 107, 514, …
ragavan   23, 118, …
rage      211
raged     6, 111, 517, …
ragen     37, 919, …
ragged    14, 77, 112, 245, …
raggett   17, 51, 116, …
raggio    7, 22, 50, 714, …
raghavan  23, 57, 116, …

Time per query: ordinary inverted index ~|W|; Bast/Mortensen/Weber ~|W'|.
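For reference, the ordinary inverted index baseline can be sketched as follows: for every word in W, intersect its document list with D. This touches all |W| lists, which is where the ~|W| cost per query comes from. The slide does not describe how the Bast/Mortensen/Weber index achieves ~|W'|, so that method is not shown here. The function `complete_query` and the toy data are hypothetical.

```python
def complete_query(D, W, postings):
    """Ordinary inverted index solution of the autocompletion problem.

    D: set of document ids matching the preceding part of the query.
    W: candidate words (all completions of the last, partially typed word).
    postings: dict mapping each word to its list of document ids.
    Returns (D_prime, W_prime): the documents of D containing some word of W,
    and the words of W occurring in some document of D.
    """
    D_prime = set()
    W_prime = []
    for word in W:  # one document list per candidate word: ~|W| lists
        hits = D.intersection(postings.get(word, ()))
        if hits:
            W_prime.append(word)
            D_prime |= hits
    return D_prime, W_prime


# Hypothetical toy data, not the lists from the slide.
D = {2, 5, 9}
W = ["raga", "rage", "ragged", "raghavan"]
postings = {"raga": [1, 5], "rage": [3], "ragged": [4, 7], "raghavan": [2, 9, 12]}

D_prime, W_prime = complete_query(D, W, postings)
print(sorted(D_prime), W_prime)  # [2, 5, 9] ['raga', 'raghavan']
```

Here half of the candidate words contribute nothing, yet their lists are still read; with |W'| << |W| that wasted work dominates, which is the gap the ~|W'| bound addresses.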