Fuzzy Multi-Dimensional Search in the Wayfinder File System Christopher Peery, Wei Wang, Amélie Marian, Thu D. Nguyen Computer Science Department, Rutgers.

Slides:



Advertisements
Similar presentations
Chapter 5: Introduction to Information Retrieval
Advertisements

Modern information retrieval Modelling. Introduction IR systems usually adopt index terms to process queries IR systems usually adopt index terms to process.
Indexing. Efficient Retrieval Documents x terms matrix t 1 t 2... t j... t m nf d 1 w 11 w w 1j... w 1m 1/|d 1 | d 2 w 21 w w 2j... w 2m 1/|d.
Basic IR: Modeling Basic IR Task: Slightly more complex:
Effective Keyword Based Selection of Relational Databases Bei Yu, Guoliang Li, Karen Sollins, Anthony K.H Tung.
ISYS Search Software Personal Edition for CJA Attorneys.
IR Models: Overview, Boolean, and Vector
Information Retrieval in Practice
Architecture of a Search Engine
Basic IR: Queries Query is statement of user’s information need. Index is designed to map queries to likely to be relevant documents. Query type, content,
Suggestion of Promising Result Types for XML Keyword Search Joint work with Jianxin Li, Chengfei Liu and Rui Zhou ( Swinburne University of Technology,
1 The Four Dimensions of Search Engine Quality Jan Pedersen Chief Scientist, Yahoo! Search 19 September 2005.
Improving Query Results using Answer Corroboration Amélie Marian Rutgers University.
T.Sharon - A.Frank 1 Internet Resources Discovery (IRD) IR Queries.
Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.
Circumventing Data Quality Problems Using Multiple Join Paths Yannis Kotidis, Athens University of Economics and Business Amélie Marian, Rutgers University.
Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities Francisco Matias Cuenca-Acuna, Christopher Peery, Richard P.
INEX 2003, Germany Searching in an XML Corpus Using Content and Structure INEX 2003, Germany Yiftah Ben-Aharon, Sara Cohen, Yael Grumbach, Yaron Kanza,
Text-Based Content Search and Retrieval in ad hoc P2P Communities Francisco Matias Cuenca-Acuna Thu D. Nguyen
IR Models: Review Vector Model and Probabilistic.
Query Relaxation for XML Database Award #: PI: Wesley W. Chu Computer Science Dept. UCLA.
Personalized Ontologies for Web Search and Caching Susan Gauch Information and Telecommunications Technology Center Electrical Engineering and Computer.
Chapter 5: Information Retrieval and Web Search
Overview of Search Engines
DBease: Making Databases User-Friendly and Easily Accessible Guoliang Li, Ju Fan, Hao Wu, Jiannan Wang, Jianhua Feng Database Group, Department of Computer.
1 Ranking Inexact Answers. 2 Ranking Issues When inexact querying is allowed, there may be MANY answers –different answers have a different level of incompleteness.
Chapter 2 Architecture of a Search Engine. Search Engine Architecture n A software architecture consists of software components, the interfaces provided.
CSE 6331 © Leonidas Fegaras Information Retrieval 1 Information Retrieval and Web Search Engines Leonidas Fegaras.
Information Retrieval Models - 1 Boolean. Introduction IR systems usually adopt index terms to process queries Index terms:  A keyword or group of selected.
PMLAB Finding Similar Image Quickly Using Object Shapes Heng Tao Shen Dept. of Computer Science National University of Singapore Presented by Chin-Yi Tsai.
The Key to Successful Searching Software patents pending. ™ Trademarks of SLICCWARE Corporation All rights reserved. SM Service Mark of SLICCWARE Corporation.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Giorgos Giannopoulos (IMIS/”Athena” R.C and NTU Athens, Greece) Theodore Dalamagas (IMIS/”Athena” R.C., Greece) Timos Sellis (IMIS/”Athena” R.C and NTU.
Video Google: A Text Retrieval Approach to Object Matching in Videos Josef Sivic and Andrew Zisserman.
Chapter 6: Information Retrieval and Web Search
Search Engines. Search Strategies Define the search topic(s) and break it down into its component parts What terms, words or phrases do you use to describe.
IR Homework #2 By J. H. Wang Mar. 31, Programming Exercise #2: Query Processing and Searching Goal: to search relevant documents for a given query.
1 Tree Indexing (1) Linear index is poor for insertion/deletion. Tree index can efficiently support all desired operations: –Insert/delete –Multiple search.
Web search basics (Recap) The Web Web crawler Indexer Search User Indexes Query Engine 1.
Web- and Multimedia-based Information Systems Lecture 2.
Johannes Kepler University Linz Department of Business Informatics Data & Knowledge Engineering Altenberger Str. 69, 4040 Linz Austria/Europe
Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD.
CONTEXTUAL SEARCH AND NAME DISAMBIGUATION IN USING GRAPHS EINAT MINKOV, WILLIAM W. COHEN, ANDREW Y. NG SIGIR’06 Date: 2008/7/17 Advisor: Dr. Koh,
Xiaoying Gao Computer Science Victoria University of Wellington COMP307 NLP 4 Information Retrieval.
Vertical Search for Courses of UIUC Homepage Classification The aim of the Course Search project is to construct a database of UIUC courses across all.
1 Ranking Inexact Answers. 2 Ranking Issues When inexact querying is allowed, there may be MANY answers –different answers have a different level of incompleteness.
Information Retrieval in Practice
Efficient Multi-User Indexing for Secure Keyword Search
Search Engine Architecture
Inferring People’s Site Preference in Web Search
Information Retrieval and Web Search
Aidan Hogan CC Procesamiento Masivo de Datos Otoño 2018 Lecture 7 Information Retrieval: Ranking Aidan Hogan
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Naming and Directories
Multimedia Information Retrieval
Naming and Directories
Naming and Directories
Structure and Content Scoring for XML
6. Implementation of Vector-Space Retrieval
Chapter 5: Information Retrieval and Web Search
Dagstuhl Seminar on Ranked XML Querying
Structure and Content Scoring for XML
Recuperação de Informação B
Naming and Directories
Information Retrieval and Web Design
Recuperação de Informação B
Advanced information retrieval
Relax and Adapt: Computing Top-k Matches to XPath Queries
Ranking using Multiple Document Types in Desktop Search
CoXML: A Cooperative XML Query Answering System
Presentation transcript:

Fuzzy Multi-Dimensional Search in the Wayfinder File System Christopher Peery, Wei Wang, Amélie Marian, Thu D. Nguyen Computer Science Department, Rutgers University Improve the quality of file system search by: Allowing approximate matches to query conditions Ranking results based on their relevance across multiple dimensions Unified Scoring Framework [EDBT08][SEARCH08]: Relaxation approaches defined for conditions in multiple dimension: content, metadata, and structure IDF-based scoring to support ranking of approximate matches Score decreases as number of matching files increases Individual dimension scores are aggregated into a single relevance score 1)Edge Generalization: /a/b/c  /a/b//c 2)Path Extension: /a/b  /a/b/* 3)Node Deletion: /a/b/c  /a//c 4)Node Inversion: /a/b/c  /a/(b/c) [/a/b/c or /a/c/b] Structure Index Query Path = /Personal/Ebooks/JackLondon Date Query = August 25, 2006 Example Query: [FileType = *.doc” and Content = “paper draft” and CreateDate = “11/10/07” and Path = “/pubs/submit”] Current tools will not return relevant files created around 11/10/07, similar to type.doc, or stored near “/pubs/submit” All relaxation compositions represented in a DAG Root represents exact match, leaf node represents relaxed form that matches all directories Each combination represented by node and has associated IDF score DAG is query dependent and computed at query time Indices/Algorithms developed to optimize the creation and evaluation of DAGs Exponential number of combinations Real User data set (approx. 27,000 files) Queries of varying dimensions and relaxations; built around specific target files Structure relaxation based on XML structure query relaxations Approximations are generalizations of query values Date: Exact Time Day Month Year File Type: PDF Documents All Files IDF score associated with each approximation Relaxations/Index are represented as a DAG Leaf to root node represents exact match to most general approximation IDF scores decrease from leaves to root IDF-based dimension scores measure similar information Vector projection used to aggregate scores from multiple dimensions [EDBT08] C. Peery, W. Wang, A. Marian, and T. D. Nguyen. Multi-Dimensional Search for Personal Information Management Systems. [SEARCH08] W. Wang, C. Peery, A. Marian, and T. D. Nguyen. Efficient Multi-Dimensional Query Processing in Personal Information Management Systems. Technical Report DCS- TR-627, Dept. of Computer Science, Rutgers University, Query Conditions Query (Content, Type, Path) RankComment Q1C49Base Query Q2C.txt/p/e/n/j2Correct Value (all Dim.) Q3C.txt6Correct Value Q4C.doc45Incorrect Value Q5CDocs.21Relaxed Value Q6C/p/e/n/j3Correct Path Q7C/j/e3Incorrect Path Q8CDocs./p/e/n15Relaxed (all Dim.) Q9CDocs./j/e2Incorrect & Relaxed Q10C.pdf/j/e2Incorrect Value (all Dim.) Keyword conditions scored using standard TFxIDF over inverted indices Exact Query Example Date Relaxation DAG: Evaluation Data Set Citations Behavior of Inaccurate Queries – 2DImpact of Flexible Multi-Dimensional Search Aggregation Metadata Index Metadata Scoring Content Scoring and Indexing Structure ScoringMotivations Contributions 107 Current desktop search tools use: Keyword search for ranking results Other dimensions such as metadata and structure for filtering results This approach is insufficient for personal file search Users may remember imprecise information about target files Limited use for filtering Highly useful to guide fuzzy search