FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets
Abstract Describes a fast algorithm to map objects into points in some k-dimensional space, such that the dissimilarities are preserved.
Abstract Thus, we can subsequently use fine-tuned spatial access methods (SAMs) to answer queries such as “query by example” or “all pairs query”.
Introduction It is not easy to design k feature-extraction functions that map objects to k-dimensional points. For instance, for typed English words, what distance function should we use to measure the cost of transforming one string into the other?
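One common answer for strings is the edit (Levenshtein) distance. The slide does not name a specific function, so the choice here is an assumption made for illustration, and `edit_distance` is a hypothetical helper name. A minimal sketch:

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn s into t.
    (Assumed example of a string distance; not named in the slides.)"""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

With a function like this, words can be compared directly even though no obvious feature vector exists for them.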
Solutions Old: Multi-Dimensional Scaling (MDS); unsuitable for indexing. Proposed: a fast algorithm (FastMap); much faster and allows indexing.
Applications Image and multimedia databases; medical databases
Applications String databases, e.g. OCR; time series, e.g. financial data
Applications Data mining and visualization applications
Desirable types of queries Query-by-example: search a collection of objects to find the ones that are within a user-defined distance from the query object. All-pairs query: find the pairs of objects that are within a given distance ε of each other.
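Both query types can be written down directly against a distance function. The sketch below is the brute-force baseline (no index), which is what the mapping is meant to speed up; the function names and the `eps` parameter are illustrative assumptions.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def query_by_example(objects: Sequence[T], query: T,
                     dist: Callable[[T, T], float], eps: float) -> List[T]:
    """Return every object within distance eps of the query object."""
    return [o for o in objects if dist(query, o) <= eps]

def all_pairs(objects: Sequence[T],
              dist: Callable[[T, T], float], eps: float) -> List[Tuple[T, T]]:
    """Return every unordered pair of objects within distance eps of each other."""
    return [(a, b) for a, b in combinations(objects, 2) if dist(a, b) <= eps]
```

The first scan costs one distance computation per object and the second one per pair, which is why a spatial index over mapped points is attractive.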
Benefit of mapping objects Accelerates the search time for queries, by employing SAMs like R*-trees and z-ordering. Helps with visualization, clustering and data-mining.
Ideal mapping fulfills … Fast to compute: $O(N)$ or $O(N \log N)$, but not $O(N^2)$. Preserves distances with little discrepancy. Very fast to map a new object.
MDS Used to discover the underlying (spatial) structure of a set of data items from the (dis)similarity information. Maps objects to a k-dimensional space so as to minimize the stress function.
MDS Stress function: the average relative difference between the distances of the “images” and the actual distances between the objects.
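As a concrete form, the original FastMap paper defines stress as the square root of the sum of squared differences between image distances and object distances, divided by the sum of squared object distances. The sketch below computes that quantity; `stress`, `objects`, `images` and `dist` are illustrative names, and image distances are assumed to be Euclidean.

```python
import math
from itertools import combinations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def stress(objects: Sequence[T], images: Sequence[Sequence[float]],
           dist: Callable[[T, T], float]) -> float:
    """sqrt( sum (d_hat_ij - d_ij)^2 / sum d_ij^2 ) over all object pairs.
    Assumes at least one pair of objects has a nonzero distance."""
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(objects)), 2):
        d = dist(objects[i], objects[j])          # actual dissimilarity
        d_hat = math.dist(images[i], images[j])   # Euclidean distance of the images
        num += (d_hat - d) ** 2
        den += d ** 2
    return math.sqrt(num / den)
```

A stress of 0 means the images reproduce every pairwise distance exactly.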
Drawbacks of MDS Requires $O(N^2)$ time, which is impractical for large databases. Fast retrieval is questionable, as MDS is not designed for the “query-by-example” operation.
Definitions The k-d point $P_i$ that corresponds to the object $O_i$ will be called the ‘image’ of object $O_i$. That is, $P_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,k})$. The k-d space containing the ‘images’ will be called the target space.
Proposed algorithm Assumption: a domain expert has only provided us with a distance/dissimilarity function $D(*, *)$. For instance, the Euclidean distance between two feature vectors can serve as the distance function between the corresponding objects.
Proposed algorithm Pretend that the objects are indeed points in some unknown n-dimensional space, and try to project these points on k mutually orthogonal directions. The challenge is to compute these projections from the distance matrix only.
Proposed algorithm Project the objects on a carefully selected “line”. Choose two objects $O_a$ and $O_b$ as the “pivot objects”; the line passes through them.
Proposed algorithm Compute the coordinate of each object along the pivot line (its distance from $O_a$ along that line) using only the information we know, i.e., the distances between objects.
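Proposed algorithm [Figure: triangle formed by the pivots $O_a$, $O_b$ and an object $O_i$; $x_i$ is the projection of $O_i$ on the line $O_a O_b$]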
Proposed algorithm By the cosine law, in any triangle $O_a O_i O_b$: $d_{b,i}^2 = d_{a,i}^2 + d_{a,b}^2 - 2 x_i d_{a,b}$, where $d_{i,j}$ is shorthand for the distance $D(O_i, O_j)$.
Proposed algorithm By simple manipulation: $x_i = (d_{a,i}^2 + d_{a,b}^2 - d_{b,i}^2) / (2 d_{a,b})$. We can map objects into points on a line, preserving some of the distance information.
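The projection formula needs only three distances per object. A minimal sketch, with `project_on_line` as an illustrative name and the three distances passed in as plain numbers:

```python
def project_on_line(d_ai: float, d_bi: float, d_ab: float) -> float:
    """Coordinate x_i of object O_i on the line through pivots O_a and O_b,
    from the cosine law: x_i = (d_ai^2 + d_ab^2 - d_bi^2) / (2 * d_ab).
    Requires d_ab > 0, i.e. two distinct pivots."""
    return (d_ai ** 2 + d_ab ** 2 - d_bi ** 2) / (2.0 * d_ab)
```

For example, with pivots at (0, 0) and (4, 0) and an object at (1, 2), the distances are sqrt(5), sqrt(13) and 4, and the formula returns 1, the object's coordinate along the pivot line. Pivot $O_a$ itself maps to 0 and pivot $O_b$ maps to $d_{a,b}$.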
Proposed algorithm Solved 2-d space Extend to higher dimensions
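The slide does not spell out how the extension works. In the original FastMap paper, the objects are projected on a hyperplane perpendicular to the line $O_a O_b$, and the distance between two projections $O_i'$ and $O_j'$ follows from the Pythagorean theorem:

```latex
\bigl(D'(O_i', O_j')\bigr)^2 = \bigl(D(O_i, O_j)\bigr)^2 - (x_i - x_j)^2,
\qquad i, j = 1, \ldots, N
```

The same one-dimensional step is then repeated with $D'$ in place of $D$, yielding one new coordinate per repetition.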
Proposed algorithm Each of the k recursive calls determines the coordinates of the N objects on a new axis. The “pivot objects” of each recursive call are recorded to facilitate queries. Pivot objects are chosen by a heuristic algorithm.
Proposed algorithm All steps are linear in N. Complexity is $O(Nk)$.
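The steps above can be sketched as a loop instead of a recursion. This is an illustrative sketch, not the authors' code: the names `fastmap`, `residual_sq` and `pivot_iterations` are assumptions, the pivot heuristic is the "farthest from the current pivot" rule described in the original paper, and negative residuals (possible with non-Euclidean distances) are clamped to zero as a simplification.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def fastmap(objects: Sequence[T], dist: Callable[[T, T], float], k: int,
            pivot_iterations: int = 5
            ) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Map each object to a k-d point; also return the pivot index pairs."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]
    pivots: List[Tuple[int, int]] = []

    def residual_sq(i: int, j: int, col: int) -> float:
        # Squared distance left after removing the first `col` coordinates.
        d2 = dist(objects[i], objects[j]) ** 2
        for c in range(col):
            d2 -= (coords[i][c] - coords[j][c]) ** 2
        return max(d2, 0.0)  # assumption: clamp negative residuals to zero

    for col in range(k):
        # Heuristic pivot choice: repeatedly jump to the farthest object.
        b = 0
        for _ in range(pivot_iterations):
            a = max(range(n), key=lambda i: residual_sq(i, b, col))
            b = max(range(n), key=lambda i: residual_sq(a, i, col))
        d_ab_sq = residual_sq(a, b, col)
        if d_ab_sq == 0.0:
            break  # no distance left to explain; remaining coordinates stay 0
        pivots.append((a, b))
        d_ab = math.sqrt(d_ab_sq)
        for i in range(n):
            # Cosine-law projection on the line through the two pivots.
            coords[i][col] = (residual_sq(a, i, col) + d_ab_sq
                              - residual_sq(b, i, col)) / (2.0 * d_ab)
    return coords, pivots
```

For points that really are Euclidean and a k equal to their dimension, the images reproduce the pairwise distances up to rounding. Note that this sketch recomputes residual distances from the stored coordinates, which costs extra work per distance as the column index grows.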
Experiments Compare FastMap with MDS in speed and quality. Illustrate the visualization and clustering abilities on real and synthetic datasets.
Comparison with MDS Response time vs. database size N
Comparison with MDS Response time vs. number of dimensions k
Comparison with MDS Response time vs. stress
Clustering/visualization properties of FastMap
Conclusion A fast algorithm to map objects into points in k-d space. Accelerates searching by using highly optimized SAMs, e.g. R-trees, R*-trees etc. Applications of the algorithm include multimedia databases, data-mining, clustering and document retrieval.
Reference Christos Faloutsos and King-Ip (David) Lin, FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets. Joseph B. Kruskal and Myron Wish, Multidimensional Scaling.