MindReader: Querying databases through multiple examples

Yoshiharu Ishikawa (Nara Institute of Science and Technology, Japan)
Ravishankar Subramanya (Pittsburgh Supercomputing Center)
Christos Faloutsos (Carnegie Mellon University)
Outline

1. Background & Introduction: Query by Example; Our Approach; Relevance Feedback; What's New in MindReader?
2. Proposed Method: Problem Formulation; Theorems
3. Experimental Results
4. Discussion & Conclusion
Query-by-Example: an example

Searching for "mildly overweight" patients. The doctor selects examples by browsing the patient database, marking each one "very good" or "good". Plotted over the features Height and Weight, the examples show an "oblique" correlation, so we can "guess" the implied query point q.
Query-by-Example: the question

Assume that the user gives multiple examples, optionally assigns scores to the examples, and the samples have spatial correlation. How can we "guess" the implied query?
Our Approach

Automatically derive the distance measure from the given examples. Two important notions:
1. diagonal query: the isosurfaces of queries are ellipsoid-shaped
2. multiple-level scores: the user can specify "goodness scores" on samples
Isosurfaces of Distance Functions

Euclidean (a circle), weighted Euclidean (an axis-aligned ellipse), and generalized ellipsoid distance (an arbitrarily oriented ellipse), each centered at the query point q.
Distance Function Formulas

Euclidean: D(x, q) = sum_i (x_i - q_i)^2
Weighted Euclidean: D(x, q) = sum_i m_i (x_i - q_i)^2
Generalized ellipsoid distance: D(x, q) = (x - q)^T M (x - q)
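To make the three formulas concrete, here is a minimal pure-Python sketch (the function names are mine, not from the talk):

```python
def euclidean(x, q):
    # D(x, q) = sum_i (x_i - q_i)^2
    return sum((xi - qi) ** 2 for xi, qi in zip(x, q))

def weighted_euclidean(x, q, m):
    # D(x, q) = sum_i m_i (x_i - q_i)^2, one weight m_i per feature
    return sum(mi * (xi - qi) ** 2 for mi, xi, qi in zip(m, x, q))

def ellipsoid(x, q, M):
    # D(x, q) = (x - q)^T M (x - q), M a symmetric matrix;
    # off-diagonal entries of M let the isosurface rotate away from the axes
    d = [xi - qi for xi, qi in zip(x, q)]
    n = len(d)
    return sum(M[j][k] * d[j] * d[k] for j in range(n) for k in range(n))

x, q = (3.0, 1.0), (1.0, 0.0)
print(euclidean(x, q))                            # 5.0
print(weighted_euclidean(x, q, (2.0, 1.0)))       # 9.0
print(ellipsoid(x, q, [[1.0, 0.0], [0.0, 1.0]]))  # 5.0 (identity M = Euclidean)
```

With the identity matrix the ellipsoid distance collapses to the Euclidean one, and with a diagonal M to the weighted Euclidean one, which is exactly the nesting of distance families the talk exploits.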
Relevance Feedback

A popular method in IR: the query is modified based on relevance judgments from the user. Two major approaches:
1. query-point movement
2. re-weighting
Relevance Feedback — Query-point Movement

The query point Q0 is moved towards the "good" examples among the retrieved data, according to the user's relevance judgments, yielding a new query point Q1 (Rocchio's formula in IR).
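A sketch of Rocchio-style query-point movement; the alpha, beta, gamma weights below are conventional illustrative values, not taken from the slides:

```python
def rocchio(q0, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Q1 = alpha*Q0 + beta*mean(relevant) - gamma*mean(nonrelevant)."""
    n = len(q0)
    def centroid(vecs):
        if not vecs:
            return [0.0] * n
        return [sum(v[i] for v in vecs) / len(vecs) for i in range(n)]
    r, s = centroid(relevant), centroid(nonrelevant)
    return [alpha * q0[i] + beta * r[i] - gamma * s[i] for i in range(n)]

# the new query point Q1 is pulled towards the "good" retrieved examples
q1 = rocchio([0.0, 0.0], relevant=[[1.0, 1.0], [3.0, 1.0]], nonrelevant=[])
print(q1)  # [1.5, 0.75]
```

Note that no matter how the weights are tuned, the result is always a point query with circular isosurfaces, which is the limitation the next slides address.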
Relevance Feedback — Re-weighting

The standard deviation method in MARS (UIUC), an image retrieval system. Assumption: if the deviation of a feature is high, the feature is not important. For example, if the good examples vary widely along one feature but tightly along another, the high-deviation feature is a "bad" feature and the low-deviation feature is a "good" feature for the implied query. For each feature i, the weight w_i = 1/sigma_i is assigned, where sigma_i is the standard deviation of the good examples along feature i. MARS didn't provide any justification for this formula.
What's New in MindReader?

MindReader does not use ad-hoc heuristics (cf. Rocchio's formula, re-weighting in MARS), can handle multiple levels of scores, and can derive a generalized ellipsoid distance.
What's New in MindReader?

MindReader can derive generalized ellipsoid distances (an arbitrarily oriented ellipse centered at the query point q).
Isosurfaces of Distance Functions

Euclidean (circle): Rocchio
Weighted Euclidean (axis-aligned ellipse): MARS
Generalized ellipsoid distance (arbitrarily oriented ellipse): MindReader

Each isosurface is centered at the query point q.
Method: distance function

Generalized ellipsoid distance function:
D(x, q) = (x - q)^T M (x - q) = sum_j sum_k m_jk (x_j - q_j)(x_k - q_k)

q: query point vector
x: data point vector
M = [m_jk]: symmetric distance matrix
Method: definitions

N: number of samples
n: number of dimensions (features)
x_i: n-d sample data vectors, x_i = [x_i1, ..., x_in]^T
X: N x n sample data matrix, X = [x_1, ..., x_N]^T
v: N-d score vector, v = [v_1, ..., v_N]
Method: problem formulation

Given: N sample n-d vectors and (optional) multiple-level scores.
Estimate: the optimal distance matrix M and the optimal new query point q.
Method: optimality

How do we measure "optimality"? By minimization of a "penalty": the score-weighted sum of distances between the query point and the sample vectors. Therefore, minimize

sum_i v_i (x_i - q)^T M (x_i - q)

under the constraint det(M) = 1.
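A small sketch of the penalty being minimized (helper names are mine). Both candidate matrices below satisfy the constraint det(M) = 1, yet one fits the sample layout better:

```python
def ellipsoid_dist(x, q, M):
    # D(x, q) = (x - q)^T M (x - q)
    d = [xi - qi for xi, qi in zip(x, q)]
    n = len(d)
    return sum(M[j][k] * d[j] * d[k] for j in range(n) for k in range(n))

def penalty(X, v, q, M):
    # score-weighted sum of sample-to-query distances: sum_i v_i * D(x_i, q)
    return sum(vi * ellipsoid_dist(x, q, M) for x, vi in zip(X, v))

X = [(1.0, 0.0), (0.0, 2.0)]
v = (1.0, 1.0)
q = (0.0, 0.0)
M_identity = [[1.0, 0.0], [0.0, 1.0]]  # det = 1
M_stretch = [[2.0, 0.0], [0.0, 0.5]]   # det = 1 as well
print(penalty(X, v, q, M_identity))  # 5.0
print(penalty(X, v, q, M_stretch))   # 4.0, a better matrix under the same constraint
```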
Theorems: theorem 1

Solved with Lagrange multipliers.

Theorem 1 (optimal query point): q = x̄ = [x̄_1, ..., x̄_n]^T = X^T v / sum_i v_i

The optimal query point is the weighted average of the sample data vectors.
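Theorem 1 in code: a score-weighted average of the sample vectors (pure-Python sketch):

```python
def optimal_query_point(X, v):
    # q = X^T v / sum_i v_i : score-weighted average of the sample vectors
    n = len(X[0])
    total = sum(v)
    return [sum(v[i] * X[i][j] for i in range(len(X))) / total
            for j in range(n)]

X = [[0.0, 0.0], [2.0, 2.0]]  # two 2-d samples
v = [1.0, 3.0]                # the second example scored higher
print(optimal_query_point(X, v))  # [1.5, 1.5], pulled towards the better example
```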
Theorems: theorems 2 & 3

Theorem 2 (optimal distance matrix): M = (det(C))^(1/n) C^(-1), where C = [c_jk] is the weighted covariance matrix with c_jk = sum_i v_i (x_ij - x̄_j)(x_ik - x̄_k).

Theorem 3: if we restrict M to a diagonal matrix, our method is equal to the standard deviation method. MindReader includes MARS!
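A 2-D sketch of Theorem 2 (the 2x2 matrix inverse is hard-coded for brevity, so n = 2; the covariance is taken around the score-weighted mean from Theorem 1):

```python
def optimal_matrix_2d(X, v):
    """Theorem 2 in 2-D: M = det(C)^(1/n) * C^-1 with C the score-weighted
    covariance matrix; the scaling enforces the constraint det(M) = 1."""
    total = sum(v)
    xb = [sum(v[i] * X[i][j] for i in range(len(X))) / total for j in (0, 1)]
    c = [[sum(v[i] * (X[i][j] - xb[j]) * (X[i][k] - xb[k])
              for i in range(len(X)))
          for k in (0, 1)] for j in (0, 1)]
    det_c = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    inv = [[c[1][1] / det_c, -c[0][1] / det_c],
           [-c[1][0] / det_c, c[0][0] / det_c]]
    scale = det_c ** 0.5  # det(C)^(1/n) with n = 2
    return [[scale * e for e in row] for row in inv]

# samples stretched along the x-axis: M penalizes x-deviations less
X = [[-2.0, 0.0], [2.0, 0.0], [0.0, -0.5], [0.0, 0.5]]
M = optimal_matrix_2d(X, [1.0, 1.0, 1.0, 1.0])
print(M[0][0] < M[1][1])  # True
det_m = M[0][0] * M[1][1] - M[0][1] * M[1][0]
print(abs(det_m - 1.0) < 1e-9)  # True: det(M) = 1
```

Because C has large entries along high-variance directions and M is proportional to its inverse, the derived distance is exactly the "low deviation means important" assumption, now generalized to oblique directions.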
Experiments

1. Estimation of the optimal distance function: can MindReader estimate the hidden target distance matrix M_hidden appropriately? Based on synthetic data; comparison with the standard deviation method.
2. Query-point movement.
3. Application to real data sets (GIS data).
Experiment 1: target data

Two-dimensional normal distribution.
Experiment 1: idea

Assume that the user has a "hidden" distance matrix M_hidden in mind, and simulate iterative query refinement. Q: how fast can we discover the "hidden" distance? The query point is fixed at (0, 0).
Experiment 1: iteration steps

1. Make initial samples: compute the k-NNs with the Euclidean distance.
2. For each object x, calculate a score that reflects the hidden distance M_hidden.
3. MindReader estimates the matrix M.
4. Retrieve the k-NNs with the derived matrix M.
5. If the result improved, go to step 2.
Experiment 1: scores

Calculation of scores in terms of the "hidden" distance function:
1. Calculate the distance from the query point q based on the hidden distance matrix M_hidden: d = D(x, q).
2. Translate the distance value d to a score s = exp(-d^2/2), with 0 < s <= 1.
3. Convert the score to a level v = log(s / (1 - s)), which ranges over (-inf, +inf).
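The slide text is garbled here, so take this reading as a hedge: assuming the fragments decode as s = exp(-d^2/2) followed by the logit v = log(s/(1-s)), the mapping looks like:

```python
import math

def score_levels(d):
    """Distance d (> 0) -> score s in (0, 1) -> unbounded level v."""
    s = math.exp(-d * d / 2.0)   # closer to q => score nearer 1
    v = math.log(s / (1.0 - s))  # logit: maps (0, 1) onto (-inf, +inf)
    return s, v

s, v = score_levels(1.0)
print(round(s, 3))  # 0.607
print(v > 0.0)      # True: s > 0.5, so the level is positive
```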
Experiment 1: evaluation measure

Used to check whether the query result improved or not. CD-k measure (CD stands for "cumulative distance"): for the k-NNs retrieved with matrix M, compute the actual distance of each with the hidden matrix M_hidden, then take the sum (lower is better).
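The CD-k measure as described can be computed as follows (a sketch, with the generalized ellipsoid distance inlined so the snippet is self-contained):

```python
def ellipsoid_dist(x, q, M):
    # D(x, q) = (x - q)^T M (x - q)
    d = [xi - qi for xi, qi in zip(x, q)]
    n = len(d)
    return sum(M[j][k] * d[j] * d[k] for j in range(n) for k in range(n))

def cd_k(retrieved, q, M_hidden, dist):
    """Cumulative distance: sum the *actual* (hidden-matrix) distances of
    the k-NNs that were retrieved with the estimated matrix M."""
    return sum(dist(x, q, M_hidden) for x in retrieved)

q = (0.0, 0.0)
M_hidden = [[4.0, 0.0], [0.0, 1.0]]   # hidden metric penalizes x-deviation
retrieved = [(1.0, 0.0), (0.0, 1.0)]  # k = 2 points returned by some estimate of M
print(cd_k(retrieved, q, M_hidden, ellipsoid_dist))  # 5.0 (= 4.0 + 1.0)
```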
Experiment 1: final k-NNs

Ellipse: isosurface for M_hidden. Red points: final k-NNs obtained by the standard deviation method. Green points: final k-NNs obtained by MindReader.
Experiment 1: speed of convergence

x-axis: number of iterations; y-axis: CD-k measure value. Red: standard deviation method; green: MindReader; blue: the best CD-k value possible for the data set.
Experiment 1: changes of isosurfaces

After the 0th and 2nd iterations; after the 4th and 8th iterations.
Experiment 2: query-point movement

Starts from query point (0.5, 0.5). MindReader converges to M_hidden within five iterations.
Experiment 3: real data set

End points of road segments from Montgomery County, MD. The data is normalized to [-1, 1] x [-1, 1]. The query specifies five points along route I-270. Can we estimate a good distance function?
Experiment 3: isosurfaces

After the 0th and 2nd iterations: fast convergence!
Discussion: efficiency

Don't worry about speed! Ellipsoid queries are supported by spatial access methods: Seidl & Kriegel [VLDB 97]; Ankerst, Braunmüller, Kriegel, Seidl [VLDB 98]. For the derived distance, we can efficiently use a spatial index.
Conclusion

MindReader automatically guesses diagonal queries from the given examples and supports multiple levels of scores; it includes "Rocchio" and "MARS" (the standard deviation method) as special cases. We gave a problem formulation and its solution, and evaluated the method experimentally.