MindReader: Querying databases through multiple examples
Yoshiharu Ishikawa (Nara Institute of Science and Technology, Japan)
Ravishankar Subramanya (Pittsburgh Supercomputing Center)
Christos Faloutsos (Carnegie Mellon University)

Outline
- Background & Introduction
  - Query by Example
  - Our Approach
  - Relevance Feedback
  - What’s New in MindReader?
- Proposed Method
  - Problem Formulation
  - Theorems
- Experimental Results
- Discussion & Conclusion

Query-by-Example: an example
Searching for “mildly overweight” patients (figure: scatter plot of Weight vs. Height; legend marks examples rated “very good” and “good”)
- The doctor selects examples by browsing the patient database
- The examples have “oblique” correlation
- We can “guess” the implied query point q

Query-by-Example: the question
Assume that
- the user gives multiple examples
- the user optionally assigns scores to the examples
- the samples have spatial correlation
How can we “guess” the implied query?

Outline
- Background & Introduction
  - Query by Example
  - Our Approach
  - Relevance Feedback
  - What’s New in MindReader?
- Proposed Method
  - Problem Formulation
  - Theorems
- Experimental Results
- Discussion & Conclusion

Our Approach
- Automatically derive the distance measure from the given examples
- Two important notions:
  1. diagonal query: isosurfaces of queries have ellipsoid shapes
  2. multiple-level scores: the user can specify “goodness scores” on samples

Isosurfaces of Distance Functions
(figure: isosurfaces around a query point q for three distance functions: a circle for Euclidean, an axis-aligned ellipse for weighted Euclidean, and an oblique ellipse for the generalized ellipsoid distance)

Distance Function Formulas
- Euclidean: D(x, q) = (x − q)ᵀ(x − q)
- Weighted Euclidean: D(x, q) = Σᵢ mᵢ (xᵢ − qᵢ)²
- Generalized ellipsoid distance: D(x, q) = (x − q)ᵀ M (x − q)
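The three formulas translate directly into NumPy (a sketch; the function names are mine, not from the slides):

```python
import numpy as np

def euclidean(x, q):
    """Euclidean distance: D(x, q) = (x - q)^T (x - q)."""
    d = np.asarray(x) - np.asarray(q)
    return float(d @ d)

def weighted_euclidean(x, q, m):
    """Weighted Euclidean: D(x, q) = sum_i m_i (x_i - q_i)^2."""
    d = np.asarray(x) - np.asarray(q)
    return float(np.sum(np.asarray(m) * d * d))

def ellipsoid(x, q, M):
    """Generalized ellipsoid distance: D(x, q) = (x - q)^T M (x - q)."""
    d = np.asarray(x) - np.asarray(q)
    return float(d @ np.asarray(M) @ d)
```

With m set to all ones (or M to the identity), the three functions coincide; an M with nonzero off-diagonal entries is what produces the oblique ellipses.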

Outline
- Background & Introduction
  - Query by Example
  - Our Approach
  - Relevance Feedback
  - What’s New in MindReader?
- Proposed Method
  - Problem Formulation
  - Theorems
- Experimental Results
- Discussion & Conclusion

Relevance Feedback
- Popular method in IR
- The query is modified based on relevance judgments from the user
- Two major approaches:
  1. query-point movement
  2. re-weighting

Relevance Feedback — Query-point Movement
- The query point is moved towards “good” examples (Rocchio’s formula in IR)
(figure: Q0 = initial query point, dots = retrieved data with relevance judgments, Q1 = new query point)
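Rocchio's update can be sketched as follows (the alpha/beta/gamma weights are illustrative defaults, not values prescribed by the slides):

```python
import numpy as np

def rocchio(q0, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio's query-point movement: pull the query toward the centroid
    of the relevant examples and push it away from the centroid of the
    non-relevant ones."""
    q1 = alpha * np.asarray(q0, dtype=float)
    if len(relevant) > 0:
        q1 = q1 + beta * np.mean(relevant, axis=0)
    if len(nonrelevant) > 0:
        q1 = q1 - gamma * np.mean(nonrelevant, axis=0)
    return q1
```

Note that the update only translates the query point; the distance function itself stays fixed, which is exactly the limitation MindReader removes.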

Relevance Feedback — Re-weighting
- Standard Deviation Method in the MARS (UIUC) image retrieval system
- Assumption: if the deviation of a feature is high, the feature is not important
- For each feature i, the weight wᵢ = 1/σᵢ is assigned
- MARS didn’t provide any justification for this formula
(figure: features f1 and f2, one a “good” low-deviation feature and one a “bad” high-deviation feature, with the implied query region)
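The standard deviation method reduces to a one-liner (a sketch; the function name and toy data are mine):

```python
import numpy as np

def stddev_weights(good_examples):
    """MARS-style re-weighting: a feature whose values spread widely among
    the good examples is deemed unimportant, so feature i gets w_i = 1/sigma_i."""
    sigma = np.std(good_examples, axis=0, ddof=1)  # per-feature std deviation
    return 1.0 / sigma

samples = np.array([[0.0, 0.0],
                    [0.1, 2.0],
                    [0.2, 4.0]])  # feature f1 is tight, f2 is spread out
w = stddev_weights(samples)      # the low-variance feature gets more weight
```

Because the weights are per-feature, the resulting isosurfaces are always axis-aligned ellipses: correlations between features are invisible to this method.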

Outline
- Background & Introduction
  - Query by Example
  - Our Approach
  - Relevance Feedback
  - What’s New in MindReader?
- Proposed Method
  - Problem Formulation
  - Theorems
- Experimental Results
- Discussion & Conclusion

What’s New in MindReader?
MindReader
- does not use ad-hoc heuristics (cf. Rocchio’s expression, re-weighting in MARS)
- can handle multiple levels of scores
- can derive the generalized ellipsoid distance

What’s New in MindReader?
MindReader can derive generalized ellipsoid distances (figure: oblique ellipsoid isosurface around query point q)

Isosurfaces of Distance Functions
(figure: isosurfaces around a query point q, labeled by method: Euclidean (Rocchio), weighted Euclidean (MARS), generalized ellipsoid distance (MindReader))

Outline
- Background & Introduction
  - Query by Example
  - Our Approach
  - Relevance Feedback
  - What’s New in MindReader?
- Proposed Method
  - Problem Formulation
  - Theorems
- Experimental Results
- Discussion & Conclusion

Method: distance function
Generalized ellipsoid distance function:
- D(x, q) = (x − q)ᵀ M (x − q), or equivalently
- D(x, q) = Σⱼ Σₖ mⱼₖ (xⱼ − qⱼ)(xₖ − qₖ)
where
- q: query point vector
- x: data point vector
- M = [mⱼₖ]: symmetric distance matrix

Method: definitions
- N: number of samples
- n: number of dimensions (features)
- xᵢ: n-d sample data vectors, xᵢ = [xᵢ₁, …, xᵢₙ]ᵀ
- X: N×n sample data matrix, X = [x₁, …, x_N]ᵀ
- v: N-d score vector, v = [v₁, …, v_N]

Method: problem formulation
Given
- N sample n-d vectors
- multiple-level scores (optional)
Estimate
- the optimal distance matrix M
- the optimal new query point q

Method: optimality
- How do we measure “optimality”? By minimization of a “penalty”
- What is the “penalty”? The score-weighted sum of distances between the query point and the sample vectors
- Therefore, minimize Σᵢ vᵢ (xᵢ − q)ᵀ M (xᵢ − q) under the constraint det(M) = 1
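The penalty is easy to evaluate directly, which makes the formulation concrete (a sketch; the helper name and toy values are mine):

```python
import numpy as np

def penalty(X, v, q, M):
    """Penalty = sum_i v_i (x_i - q)^T M (x_i - q): the score-weighted sum
    of ellipsoid distances from the query point q to the sample vectors."""
    d = np.asarray(X) - np.asarray(q)            # (N, n) difference vectors
    return float(np.sum(np.asarray(v) * np.einsum('ij,jk,ik->i', d, M, d)))

X = np.array([[1.0, 0.0], [0.0, 1.0]])
v = np.array([1.0, 2.0])
q = np.zeros(2)
M = np.eye(2)                                    # det(M) = 1 meets the constraint
```

The det(M) = 1 constraint is what rules out the trivial minimizer M = 0: the ellipsoid may change shape and orientation but not shrink to nothing.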

Outline
- Background & Introduction
  - Query by Example
  - Our Approach
  - Relevance Feedback
  - What’s New in MindReader?
- Proposed Method
  - Problem Formulation
  - Theorems
- Experimental Results
- Discussion & Conclusion

Theorems: theorem 1
- Solved with Lagrange multipliers
- Theorem 1 (optimal query point): q = x̄ = [x̄₁, …, x̄ₙ]ᵀ = Xᵀv / Σᵢ vᵢ
- The optimal query point is the weighted average of the sample data vectors

Theorems: theorems 2 & 3
- Theorem 2 (optimal distance matrix): M = (det(C))^(1/n) C⁻¹
  - C = [cⱼₖ] is the weighted covariance matrix, cⱼₖ = Σᵢ vᵢ (xᵢⱼ − x̄ⱼ)(xᵢₖ − x̄ₖ)
- Theorem 3: if we restrict M to a diagonal matrix, our method equals the standard deviation method
  - MindReader includes MARS!
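Theorems 1 and 2 together give a closed-form estimation step, which can be sketched in NumPy (assuming positive scores v_i and an invertible weighted covariance C; the function name is mine):

```python
import numpy as np

def mindreader_step(X, v):
    """One MindReader estimation step, following Theorems 1 and 2.
    X: (N, n) matrix of sample vectors, v: (N,) score vector.
    Returns the optimal query point q and distance matrix M (det(M) = 1)."""
    X = np.asarray(X, dtype=float)
    v = np.asarray(v, dtype=float)
    q = (X.T @ v) / v.sum()          # Theorem 1: weighted average of samples
    d = X - q
    C = (d.T * v) @ d                # weighted covariance matrix C = [c_jk]
    n = X.shape[1]
    M = np.linalg.det(C) ** (1.0 / n) * np.linalg.inv(C)   # Theorem 2
    return q, M
```

The det(C)^(1/n) factor rescales C⁻¹ so that det(M) = 1, satisfying the constraint from the problem formulation.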

Outline
- Background & Introduction
  - Query by Example
  - Our Approach
  - Relevance Feedback
  - What’s New in MindReader?
- Proposed Method
  - Problem Formulation
  - Theorems
- Experimental Results
- Discussion & Conclusion

Experiments
1. Estimation of the optimal distance function
   - Can MindReader estimate the hidden target distance matrix M_hidden appropriately?
   - Based on synthetic data
   - Comparison with the standard deviation method
2. Query-point movement
3. Application to real data sets (GIS data)

Experiment 1: target data Two-dimensional normal distribution

Experiment 1: idea
- Assume the user has a “hidden” distance matrix M_hidden in mind
- Simulate iterative query refinement
- Q: how fast can we discover the “hidden” distance?
- The query point is fixed at (0, 0)

Experiment 1: iteration steps
1. Make initial samples: compute the k-NNs with Euclidean distance
2. For each object x, calculate a score that reflects the hidden distance M_hidden
3. MindReader estimates the matrix M
4. Retrieve the k-NNs with the derived matrix M
5. If the result improved, go to step 2
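The five steps can be sketched as a small simulation. Everything here is an assumption for illustration: 500 synthetic normal points, k = 20, an arbitrary M_hidden, a fixed number of iterations, and the raw score s = exp(−d²/2) used directly as the weight (the deck instead applies a logit transform to the score):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))               # synthetic 2-d data set
M_hidden = np.array([[2.0, 1.0], [1.0, 2.0]]) # the user's "hidden" metric
q = np.zeros(2)                                # query point fixed at (0, 0)
k = 20

def dists(X, M):
    """Ellipsoid distances from every row of X to the query point q."""
    d = X - q
    return np.einsum('ij,jk,ik->i', d, M, d)

M_est = np.eye(2)                              # step 1: start from Euclidean
for _ in range(5):
    # steps 1/4: retrieve the k-NNs under the current estimate
    knn = data[np.argsort(dists(data, M_est))[:k]]
    # step 2: score each object using the hidden distance
    v = np.exp(-dists(knn, M_hidden) ** 2 / 2)
    # step 3: re-estimate M from the scored k-NNs (Theorem 2, q kept fixed)
    d = knn - q
    C = (d.T * v) @ d
    M_est = np.linalg.det(C) ** 0.5 * np.linalg.inv(C)
```

After a few iterations, M_est is a unit-determinant symmetric matrix whose oblique shape tracks M_hidden.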

Experiment 1: scores
Calculation of scores in terms of the “hidden” distance function:
1. Calculate the distance from the query point q based on the hidden distance matrix M_hidden: d = D(x, q)
2. Translate the distance value d to a score s = exp(−d²/2) (0 < s ≤ 1), then to a weight v = log(s / (1 − s)) (−∞ < v < ∞)
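The two-stage translation writes down directly (a sketch; the function names are mine):

```python
import numpy as np

def score(d):
    """Translate a hidden-metric distance d into a score s = exp(-d^2/2),
    so that s lies in (0, 1] and d = 0 gives the top score s = 1."""
    return np.exp(-np.asarray(d, dtype=float) ** 2 / 2)

def weight(s):
    """Logit transform v = log(s / (1 - s)): maps s in (0, 1) onto the
    whole real line, with v = 0 at s = 0.5."""
    s = np.asarray(s, dtype=float)
    return np.log(s / (1 - s))
```

The logit step spreads the bounded scores over the whole real line, so small differences between near-perfect scores still produce distinct weights.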

Experiment 1: evaluation measures
- Used to check whether the query result improved or not
- CD-k measure (“CD” stands for cumulative distance): for the k-NNs retrieved by matrix M, compute the actual distance with matrix M_hidden, then take the sum
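A minimal sketch of the measure (the function name is mine):

```python
import numpy as np

def cd_k(retrieved, q, M_hidden):
    """CD-k: for the k-NNs retrieved with the *estimated* matrix, sum their
    actual distances under the hidden matrix M_hidden. Lower is better; the
    minimum is attained when the retrieved set equals the true k-NNs."""
    d = np.asarray(retrieved) - np.asarray(q)
    return float(np.sum(np.einsum('ij,jk,ik->i', d, M_hidden, d)))
```

Because CD-k is computed with M_hidden rather than the estimate, it measures retrieval quality as the simulated user would perceive it.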

Experiment 1: final k-NNs
(figure)
- Ellipse: isosurface for M_hidden
- Red points: final k-NNs obtained by the standard deviation method
- Green points: final k-NNs obtained by MindReader

Experiment 1: speed of convergence
(plot)
- x-axis: number of iterations
- y-axis: CD-k measure value
- Red: standard deviation method
- Green: MindReader
- Blue: best possible CD-k value for the data set

Experiment 1: changes of isosurfaces After 0th and 2nd iterations

Experiment 1: changes of isosurfaces After 4th and 8th iterations

Experiment 2: query-point movement
- Starts from the query point (0.5, 0.5)
- MindReader converges to M_hidden in five iterations

Experiment 3: real data set
- End-points of road segments from Montgomery County, MD
- Data is normalized to [−1, 1] × [−1, 1]
- The query specifies five points along route I-270
- Can we estimate a good distance function?

Experiment 3: isosurfaces After 0th and 2nd iterations: fast convergence!

Discussion: efficiency
- Don’t worry about speed!
- Ellipsoid queries are supported by spatial access methods:
  - Seidl & Kriegel [VLDB97]
  - Ankerst, Braunmüller, Kriegel, Seidl [VLDB98]
- For the derived distance, we can efficiently use a spatial index

Conclusion
MindReader automatically guesses diagonal queries from the given examples
- handles multiple levels of scores
- includes “Rocchio” and “MARS” (the standard deviation method)
- problem formulation & solution
- evaluation based on the experiments