Published by Laura Benson. Modified over 9 years ago.
1
FastMap: Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets
2
Abstract: Describe a fast algorithm to map objects into points in some k-dimensional space, such that the dissimilarities are preserved.
3
Abstract: Thus, we can subsequently use fine-tuned spatial access methods (SAMs) to answer queries such as "query by example" or "all pairs query".
4
Introduction: It is not easy to design k feature-extraction functions that map objects to k-dimensional points. For instance, for typed English words, which distance function should we consider for transforming one string into the other?
5
Solutions: Old: Multi-Dimensional Scaling (MDS), which is unsuitable for indexing. Proposed: a fast algorithm, which is much faster and allows indexing.
6
Applications: Image and multimedia databases; medical databases.
7
Applications: String databases, e.g. OCR; time series, e.g. financial data.
8
Applications: Data mining and visualization applications.
9
Desirable types of queries: Query-by-example: search a collection of objects to find the ones that are within a user-defined distance from the query object. All pairs query: find the pairs of objects that are within a given distance of each other.
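The two query types can be sketched as brute-force scans, which is the baseline that a mapping plus a spatial access method is meant to speed up. This is a minimal sketch only; the function names and the threshold parameter `eps` are illustrative assumptions, not taken from the slides.

```python
from itertools import combinations
from math import dist

def query_by_example(objects, query, distance, eps):
    """Return every object within distance eps of the query object."""
    return [o for o in objects if distance(o, query) <= eps]

def all_pairs(objects, distance, eps):
    """Return every pair of objects within distance eps of each other."""
    return [(a, b) for a, b in combinations(objects, 2) if distance(a, b) <= eps]

# Toy data: 2-d points with the Euclidean distance standing in for D(*, *).
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
near = query_by_example(points, (0.0, 0.5), dist, 1.2)
pairs = all_pairs(points, dist, 1.5)
```

Both scans evaluate the distance function on every candidate, so the all pairs query costs O(N^2) distance computations without an index.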
10
Benefits of mapping objects: Accelerates the search for queries by employing SAMs such as R*-trees and z-ordering. Helps with visualization, clustering and data mining.
11
An ideal mapping should: be fast to compute, O(N) or O(N log N) but not O(N^2); preserve distances with little discrepancy; be very fast at mapping a new object.
12
MDS: Used to discover the underlying (spatial) structure of a set of data items from the (dis)similarity information. Maps objects to a k-dimensional space so as to minimize the stress function.
13
MDS: The stress function is the average difference between the distances of the "images" and the actual distances.
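The slide gives stress only in words. One common normalized form, assumed here rather than quoted from the slides, is the square root of the summed squared differences between image distances and actual distances, divided by the summed squared actual distances. A small sketch under that assumption, with images compared by Euclidean distance:

```python
from itertools import combinations
from math import dist, sqrt

def stress(objects, images, distance):
    """Normalized stress: 0 means every pairwise distance is preserved exactly."""
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(objects)), 2):
        d = distance(objects[i], objects[j])   # actual dissimilarity
        d_hat = dist(images[i], images[j])     # distance between the images
        num += (d_hat - d) ** 2
        den += d ** 2
    return sqrt(num / den)

objects = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
perfect = stress(objects, objects, dist)                     # images equal the objects
flattened = stress(objects, [(0.0,), (3.0,), (0.0,)], dist)  # y coordinate dropped
```

Dropping a coordinate can only shrink distances, so `flattened` is strictly positive while `perfect` is zero.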
14
Drawbacks of MDS: Requires O(N^2) time, which is impractical for large databases. Fast retrieval is questionable, as MDS is not designed for "query-by-example" operations.
15
Definitions: The k-d point $P_i$ that corresponds to the object $O_i$ will be called the 'image' of object $O_i$. That is, $P_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,k})$. The k-d space containing the 'images' will be called the target space.
16
Proposed algorithm: Assumption: a domain expert has only provided us with a distance/dissimilarity function D(*, *). For instance, the Euclidean distance between two feature vectors can serve as the distance function between the corresponding objects.
17
Proposed algorithm: Pretend that the objects are indeed points in some unknown n-dimensional space, and try to project these points onto k mutually orthogonal directions. The challenge is to compute these projections from the distance matrix only.
18
Proposed algorithm: Project the objects onto a carefully selected "line". Choose two objects $O_a$ and $O_b$ to be the "pivot objects" that define this line.
19
Proposed algorithm: Compute the distance of each object from the pivot objects using only the information we have, i.e., the distances between objects.
20
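Proposed algorithm: [Figure: triangle with vertices $O_a$, $O_b$ and $O_i$; $x_i$ marks the projection of $O_i$ on the line $O_a O_b$.]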
21
Proposed algorithm: By the cosine law, in any triangle $O_a O_i O_b$: $d_{b,i}^2 = d_{a,i}^2 + d_{a,b}^2 - 2 x_i d_{a,b}$, where $d_{i,j}$ is shorthand for the distance $D(O_i, O_j)$.
22
Proposed algorithm: Solving the cosine law for $x_i$ gives $x_i = \frac{d_{a,i}^2 + d_{a,b}^2 - d_{b,i}^2}{2 d_{a,b}}$. We can thus map objects into points on a line, preserving some of the distance information.
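The projection formula needs only three distances per object. A minimal sketch, with the function name `project` and the toy points chosen purely for illustration:

```python
from math import dist

def project(d_ai, d_bi, d_ab):
    """Coordinate of object O_i on the line through pivots O_a and O_b."""
    return (d_ai ** 2 + d_ab ** 2 - d_bi ** 2) / (2 * d_ab)

# Toy check in the plane: the pivots lie on the x-axis, so the projection
# of O_i = (1, 2) onto the pivot line should be close to x = 1.
o_a, o_b, o_i = (0.0, 0.0), (4.0, 0.0), (1.0, 2.0)
x_i = project(dist(o_a, o_i), dist(o_b, o_i), dist(o_a, o_b))

x_a = project(0.0, 4.0, 4.0)  # pivot O_a itself maps to 0
x_b = project(4.0, 0.0, 4.0)  # pivot O_b maps to d_ab
```

The pivots land at 0 and at $d_{a,b}$, so every other object is placed relative to them along the same axis.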
23
Proposed algorithm: This solves the problem for a single axis (projection onto a line). Next, extend it to higher dimensions.
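The slides only name the extension step. In the FastMap paper listed under Reference, the idea is to project the objects onto the hyperplane perpendicular to the pivot line and to repeat the one-axis procedure there, using a reduced distance between the projections $O_i'$ and $O_j'$. Stated in the notation of the earlier slides (the primes are notation introduced here):

```latex
\[
  D'(O_i', O_j')^2 \;=\; D(O_i, O_j)^2 \;-\; (x_i - x_j)^2
\]
```

Each new axis therefore works with distances from which the contribution of all previously computed coordinates has been subtracted.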
24
Proposed algorithm: Each of the k recursive calls determines the coordinates of the N objects on a new axis. The "pivot objects" of each recursive call are recorded to facilitate queries. Pivot objects are chosen by a heuristic algorithm.
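The steps above can be sketched end to end as follows. This is an illustrative sketch, not the authors' code: the names `fastmap`, `map_query` and `residual_d2` are made up, the recursion is written as a loop, the pivot heuristic is assumed to be "repeatedly jump to the farthest object" with an assumed iteration count `pivot_iters`, and the distance used on later axes is assumed to be the original squared distance minus the squared differences of the coordinates already computed.

```python
from math import dist, sqrt

def residual_d2(d, p, q, col):
    """Squared distance d**2 with the first `col` coordinate differences removed."""
    total = d ** 2
    for c in range(col):
        total -= (p[c] - q[c]) ** 2
    return max(total, 0.0)  # guard against small negative rounding errors

def fastmap(objects, distance, k, pivot_iters=5):
    """Map objects to k-d points; return (coords, pivots)."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]
    pivots = []

    def d2(i, j, col):
        return residual_d2(distance(objects[i], objects[j]), coords[i], coords[j], col)

    for col in range(k):
        # Heuristic pivot choice: start at object 0, then alternate
        # "farthest from b" and "farthest from a" a few times.
        a, b = 0, 0
        for _ in range(pivot_iters):
            a = max(range(n), key=lambda i: d2(i, b, col))
            b = max(range(n), key=lambda i: d2(i, a, col))
        d_ab2 = d2(a, b, col)
        if d_ab2 == 0.0:
            break  # nothing left to separate; remaining coordinates stay 0
        pivots.append((a, b))  # recorded so that queries can be mapped later
        for i in range(n):
            coords[i][col] = (d2(a, i, col) + d_ab2 - d2(b, i, col)) / (2 * sqrt(d_ab2))
    return coords, pivots

def map_query(query, objects, coords, pivots, distance):
    """Map a new object using only its distances to the recorded pivots."""
    point = []
    for col, (a, b) in enumerate(pivots):
        d_ab2 = residual_d2(distance(objects[a], objects[b]), coords[a], coords[b], col)
        d_aq2 = residual_d2(distance(objects[a], query), coords[a], point, col)
        d_bq2 = residual_d2(distance(objects[b], query), coords[b], point, col)
        point.append((d_aq2 + d_ab2 - d_bq2) / (2 * sqrt(d_ab2)))
    return point

objects = [(0.0, 0.0), (4.0, 0.0), (1.0, 2.0), (3.0, 3.0)]
coords, pivots = fastmap(objects, dist, k=2)
mapped_query = map_query((2.0, 1.0), objects, coords, pivots, dist)
```

In this toy case the objects really are 2-d points, so two axes reproduce all pairwise distances up to rounding. Mapping a query needs only two distance computations per axis, which is why recording the pivots matters.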
25
Proposed algorithm: All steps are linear in the number of objects N, so the complexity is O(N k).
26
Experiments: Compare FastMap with MDS in terms of speed and quality. Illustrate the visualization and clustering abilities on real and synthetic datasets.
27
Comparison with MDS: Response time vs. database size
28
Comparison with MDS: Response time vs. number of dimensions k
29
Comparison with MDS: Response time vs. stress
30
Clustering/visualization properties of FastMap
32
Conclusion: A fast algorithm to map objects into points in k-d space. It accelerates searching through highly optimized SAMs, e.g. R-trees and R*-trees. The algorithm applies to multimedia databases, data mining, clustering and document retrieval.
33
Reference: Christos Faloutsos, King-Ip (David) Lin. FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets; Joseph B. Kruskal, Myron Wish. Multidimensional Scaling.