Multimedia DBs

Multimedia dbs
A multimedia database stores text, strings, and images. Similarity queries (content-based retrieval): given an image, find the images in the database that are similar (or you can “describe” the query image). Extract features, index them in feature space, and answer similarity queries using GEMINI. Again, average values help!
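The sketch below shows the GEMINI filter-and-refine idea using averages. It is a minimal version that assumes equal-length numeric vectors (for example, time series) compared with Euclidean distance. The scaled average gap $\sqrt{n}\,|\bar{x} - \bar{q}|$ never exceeds $\lVert x - q \rVert$ (Cauchy-Schwarz), so filtering on it causes no false dismissals. The linear scan over features stands in for a real spatial access method, and the function names are hypothetical.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def lower_bound(mean_x, mean_q, n):
    # sqrt(n) * |mean(x) - mean(q)| <= ||x - q|| by Cauchy-Schwarz,
    # so filtering on this value never dismisses a true answer
    return math.sqrt(n) * abs(mean_x - mean_q)


def gemini_range_query(database, query, eps):
    n = len(query)
    mq = mean(query)
    # Filter: cheap 1-d feature (a real system would probe a SAM such as an R-tree)
    candidates = [x for x in database if lower_bound(mean(x), mq, n) <= eps]
    # Refine: remove false alarms using the exact distance
    return [x for x in candidates if euclidean(x, query) <= eps]


db = [[1, 2, 3], [10, 10, 10], [1, 2, 4]]
print(gemini_range_query(db, [1, 2, 3], eps=1.5))  # [[1, 2, 3], [1, 2, 4]]
```

Only the cheap 1-d feature is checked for every object. The exact distance is computed for the few candidates that survive the filter.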

Image Features
Features extracted from an image are based on: color distribution; shapes and structure; …

Images - color
What is an image? A: a 2-D RGB array.

Images - color
Color histograms and a distance function.
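Here is a minimal sketch of a k-element color histogram. It assumes the image is a 2-D array (a list of rows) of 8-bit (r, g, b) tuples. Each channel is quantized into `levels` bins using a hypothetical uniform scheme, so k = levels³. Real systems such as QBIC use their own color quantization, so this one is only illustrative.

```python
def color_histogram(image, levels=4):
    """Normalized color histogram of a 2-D array of (r, g, b) tuples (0..255).

    Each channel is quantized into `levels` equal bins (hypothetical scheme),
    giving k = levels ** 3 histogram elements that sum to 1.
    """
    counts = [0] * (levels ** 3)
    total = 0
    for row in image:
        for r, g, b in row:
            # map 0..255 to bin 0..levels-1 in each channel
            rb, gb, bb = (c * levels // 256 for c in (r, g, b))
            counts[(rb * levels + gb) * levels + bb] += 1
            total += 1
    return [c / total for c in counts]


img = [[(0, 0, 0), (255, 255, 255)],
       [(0, 0, 0), (250, 10, 10)]]
h = color_histogram(img)
print(len(h), h[0], h[63])  # 64 0.5 0.25
```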

Images - color
Mathematically, the distance function between a histogram vector x and a query q is:
$$D(x, q) = (x - q)^T A (x - q) = \sum_{i} \sum_{j} a_{ij} (x_i - q_i)(x_j - q_j)$$
where A is the color-to-color similarity matrix. Could we use A = I?
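The quadratic-form distance can be sketched directly from the double sum, assuming A is given as a list of rows. With A = I it reduces to the squared Euclidean distance. Off-diagonal entries let similar colors partially cancel, and that coupling between features is what the slides call ‘cross-talk’. The 2-color matrix below is a made-up example.

```python
def quadratic_form_distance(x, q, A):
    """D(x, q) = (x - q)^T A (x - q) = sum_i sum_j a_ij (x_i - q_i)(x_j - q_j)."""
    diff = [xi - qi for xi, qi in zip(x, q)]
    n = len(diff)
    return sum(A[i][j] * diff[i] * diff[j] for i in range(n) for j in range(n))


identity = [[1.0, 0.0], [0.0, 1.0]]
similar = [[1.0, 0.5], [0.5, 1.0]]  # made-up: colors 0 and 1 are 50% similar
x, q = [1.0, 0.0], [0.0, 1.0]       # all mass on color 0 vs. all mass on color 1
print(quadratic_form_distance(x, q, identity))  # 2.0 (squared Euclidean)
print(quadratic_form_distance(x, q, similar))   # 1.0 (similar colors count as closer)
```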

Images - color
Problem: ‘cross-talk’. The features are not orthogonal, so SAMs (spatial access methods) will not work properly. Q: What to do? A: It is a feature-extraction question.

Images - color
Possible answer: average red, average green, and average blue. It turns out that the distance on these averages lower-bounds the histogram distance, so there is no cross-talk and SAMs are applicable.
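Extracting the 3-d average-color feature takes a single pass over the pixels. This sketch assumes the image is a 2-D array (a list of rows) of (r, g, b) tuples. It only computes the feature and a Euclidean distance on it. It does not reproduce the exact scaling under which the lower-bounding claim holds. The example also shows why a refinement step is still needed: two different images can map to the same feature point.

```python
import math


def average_rgb(image):
    """3-d feature: mean red, green, blue over a 2-D array of (r, g, b) tuples."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def rgb_distance(f1, f2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


red_blue = [[(255, 0, 0), (0, 0, 255)]]
purple = [[(128, 0, 128), (127, 0, 127)]]
print(average_rgb(red_blue), average_rgb(purple))  # (127.5, 0.0, 127.5) twice
print(rgb_distance(average_rgb(red_blue), average_rgb(purple)))  # 0.0
```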

Images - color
Performance: [Figure: response time vs. selectivity, comparing search with avg RGB against sequential scan]

Images - shapes
Distance function: Euclidean, on the area, the perimeter, and 20 ‘moments’. Q: How to normalize them? A: Divide each feature by its standard deviation.
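This is a minimal sketch of the normalization, assuming feature vectors are lists of floats and using the population standard deviation from Python's `statistics` module. Subtracting the mean is unnecessary because it cancels in differences. The same per-feature scales must be applied to the query. The two-feature data below is hypothetical: without scaling, the second feature would dominate the Euclidean distance.

```python
import statistics


def std_scales(vectors):
    """Per-feature standard deviations over the database (constant features -> 1.0)."""
    d = len(vectors[0])
    sds = [statistics.pstdev([v[j] for v in vectors]) for j in range(d)]
    return [s if s > 0 else 1.0 for s in sds]


def normalize(v, sds):
    # apply the same scaling to database vectors and to queries
    return [x / s for x, s in zip(v, sds)]


shapes = [[1.0, 100.0], [3.0, 300.0]]       # hypothetical two-feature vectors
sds = std_scales(shapes)
print(sds)                                  # [1.0, 100.0]
print([normalize(v, sds) for v in shapes])  # [[1.0, 1.0], [3.0, 3.0]]
```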

Images - shapes
Q: What other ‘features’ or distance functions could we use? A1: turning angle. A2: dilations/erosions. A3: ...
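Here is a minimal sketch of the turning-angle idea, assuming the shape is a simple polygon with vertices listed counterclockwise. The turning function gives the cumulative tangent angle as a function of arc length normalized to 1, so it does not change under translation or scaling. Comparing two shapes is not shown. That is typically an L2 distance between turning functions, minimized over rotation and starting point. The function name is hypothetical.

```python
import math


def turning_function(vertices):
    """(s, theta) breakpoints of a closed polygon's turning function.

    s: normalized arc length at which each edge starts (perimeter scaled to 1)
    theta: cumulative tangent angle along that edge, in radians
    """
    n = len(vertices)
    edges = [(vertices[(i + 1) % n][0] - vertices[i][0],
              vertices[(i + 1) % n][1] - vertices[i][1]) for i in range(n)]
    lengths = [math.hypot(dx, dy) for dx, dy in edges]
    perimeter = sum(lengths)
    theta = math.atan2(edges[0][1], edges[0][0])  # direction of the first edge
    s = 0.0
    out = [(0.0, theta)]
    for i in range(1, n):
        (px, py), (cx, cy) = edges[i - 1], edges[i]
        # signed turning angle between consecutive edges (positive = left turn)
        theta += math.atan2(px * cy - py * cx, px * cx + py * cy)
        s += lengths[i - 1] / perimeter
        out.append((s, theta))
    return out


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print([(s, round(th / math.pi, 3)) for s, th in turning_function(square)])
# [(0.0, 0.0), (0.25, 0.5), (0.5, 1.0), (0.75, 1.5)]
```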

Images - shapes
Q: How to do dimensionality reduction? A: Karhunen-Loeve (= centered PCA/SVD).
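The Karhunen-Loeve transform can be sketched in pure Python as centered PCA. This version uses power iteration with deflation on the covariance matrix instead of a library SVD. The iteration count and the random start are arbitrary choices, and this is an illustration rather than a production implementation. Keeping only the top k components gives the reduced features.

```python
import math
import random


def karhunen_loeve(points, k, iters=200, seed=0):
    """Centered PCA via power iteration with deflation (pure-Python sketch).

    Returns (mean, components, projected); `projected` holds the k-d
    coordinates of each centered point. If the data has fewer than k
    directions of variance, the extra components are arbitrary.
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    X = [[p[j] - mean[j] for j in range(d)] for p in points]  # center the data
    C = [[sum(r[i] * r[j] for r in X) / n for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in this direction
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # deflate: remove the found direction before searching for the next one
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in X]
    return mean, components, projected


pts = [[0, 0], [1, 2], [2, 4], [3, 6]]  # points on a line: 1 component suffices
mean, comps, proj = karhunen_loeve(pts, k=1)
print(mean, abs(proj[0][0] - proj[3][0]))  # distance sqrt(45) is preserved
```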

Images - shapes
Performance: ~10x faster. [Figure: log(# of I/Os) vs. # of features kept, compared with keeping all features]

Dimensionality Reduction
Many problems (like time-series and image similarity) can be expressed as proximity problems in a high-dimensional space: given a query point, we try to find the points that are close to it. But in high-dimensional spaces things are different!

Effects of High-dimensionality
Assume a set of points uniformly distributed in the unit hypercube $[0,1]^d$. Take a query with side length 0.1 in each dimension: in 100-d its selectivity is $0.1^{100} = 10^{-100}$. If we want constant selectivity (0.1), the side length must be $0.1^{1/d}$, which for d = 100 is about 0.977, i.e., ~1!
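The selectivity arithmetic can be checked directly. This assumes uniformly distributed points in $[0,1]^d$ and an axis-aligned cube query. The function names are illustrative only.

```python
def query_selectivity(side, d):
    # fraction of uniform points in [0,1]^d covered by a cube query of this side
    return side ** d


def side_for_selectivity(selectivity, d):
    # side length needed so that the cube covers the requested fraction
    return selectivity ** (1.0 / d)


print(query_selectivity(0.1, 100))                # about 1e-100
print(round(side_for_selectivity(0.1, 100), 3))   # 0.977
```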

Effects of High-dimensionality
Surface is everything! The probability that a point is closer than 0.1 to a (d-1)-dimensional surface (a face of the cube) is $1 - 0.8^d$: 0.36 for d = 2, about 0.89 for d = 10, and practically 1 for d = 100.
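The surface numbers follow from a one-line argument. A point avoids the 0.1-shell around the faces only if every coordinate lies in [0.1, 0.9], which happens with probability 0.8^d. The sketch below evaluates that formula and checks it with a seeded Monte Carlo estimate. The function names are hypothetical.

```python
import random


def prob_near_surface(d, eps=0.1):
    # the point avoids the eps-shell iff every coordinate is in [eps, 1 - eps]
    return 1.0 - (1.0 - 2 * eps) ** d


def prob_near_surface_mc(d, eps=0.1, trials=20_000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # distance to the nearest face is min(x, 1 - x) over all coordinates
        if any(min(x, 1 - x) < eps for x in (rng.random() for _ in range(d))):
            hits += 1
    return hits / trials


print([round(prob_near_surface(d), 3) for d in (2, 10, 100)])  # [0.36, 0.893, 1.0]
```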

Effects of High-dimensionality
Number of grid cells and surfaces: the number of k-dimensional surfaces (faces) of a d-dimensional hypercube is $\binom{d}{k} 2^{d-k}$, and binary partitioning (splitting every dimension once) gives $2^d$ cells. Indexing in high dimensions is extremely difficult: the “curse of dimensionality”.
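A k-dimensional face of the cube $[0,1]^d$ is fixed by two choices. First pick which k coordinates are free, then set each of the other d - k coordinates to 0 or 1. That gives $\binom{d}{k} 2^{d-k}$ faces. A small sketch with hypothetical helper names (Python 3.8+ for `math.comb`):

```python
from math import comb


def num_k_faces(d, k):
    # choose the k free coordinates, fix each of the other d - k to 0 or 1
    return comb(d, k) * 2 ** (d - k)


def num_binary_cells(d):
    # splitting every dimension once doubles the cell count per dimension
    return 2 ** d


print([num_k_faces(3, k) for k in range(4)])  # [8, 12, 6, 1] for an ordinary cube
print(num_binary_cells(20))                   # 1048576
```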

Dimensionality Reduction
The main idea: reduce the dimensionality of the space. Project the d-dimensional points into a k-dimensional space so that k << d and distances are preserved as well as possible, then solve the problem in low dimensions (the GEMINI idea, of course…).

DR requirements
The ideal mapping should:
1. Be fast to compute: $O(N)$ or $O(N \log N)$, but not $O(N^2)$.
2. Preserve distances, leading to small discrepancies.
3. Provide a fast algorithm to map a new query (why?).

MDS (multidimensional scaling)
Input: a set of N items, their pairwise (dis)similarities, and the target dimensionality k.
Optimization criterion:
$$\text{stress} = \left( \frac{\sum_{i,j} \left( D(S_i, S_j) - D(S^k_i, S^k_j) \right)^2}{\sum_{i,j} D(S_i, S_j)^2} \right)^{1/2}$$
where $D(S_i, S_j)$ is the distance between time series $S_i$ and $S_j$, and $D(S^k_i, S^k_j)$ is the Euclidean distance between their k-dimensional representations.
Steepest-descent algorithm: start with an assignment (each time series to a k-dimensional point), then minimize stress by moving the points.
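Here is a minimal steepest-descent sketch of MDS, assuming a full symmetric distance matrix `D` as input. The step size, iteration count, and random initial assignment are arbitrary choices, not part of the method's definition. Each point moves against the gradient of $\sum_{i<j} (D_{ij} - \lVert y_i - y_j \rVert)^2$, which is proportional to the numerator of the squared stress. Every step touches all pairs, which is why each iteration is quadratic in N.

```python
import math
import random


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def stress(D, Y):
    """Normalized stress between target distances D and the embedding Y."""
    n = len(D)
    num = sum((D[i][j] - dist(Y[i], Y[j])) ** 2 for i in range(n) for j in range(n))
    den = sum(D[i][j] ** 2 for i in range(n) for j in range(n))
    return math.sqrt(num / den)


def mds(D, k, steps=500, lr=0.01, seed=0):
    """Steepest descent on sum_{i<j} (D_ij - ||y_i - y_j||)^2 (sketch)."""
    n = len(D)
    rng = random.Random(seed)
    # initial assignment: each item gets a random k-d point
    Y = [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * k for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = max(dist(Y[i], Y[j]), 1e-12)  # avoid division by zero
                coef = -2.0 * (D[i][j] - d) / d
                for t in range(k):
                    grad[i][t] += coef * (Y[i][t] - Y[j][t])
        # move every point a small step against its gradient
        Y = [[Y[i][t] - lr * grad[i][t] for t in range(k)] for i in range(n)]
    return Y


corners = [(0, 0), (1, 0), (1, 1), (0, 1)]  # unit square, embedded into k = 2
D = [[dist(a, b) for b in corners] for a in corners]
Y0 = mds(D, k=2, steps=0)  # the random starting assignment
Y = mds(D, k=2)
print(stress(D, Y0), stress(D, Y))  # stress should drop after descent
```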

MDS
Disadvantages: the running time is $O(N^2)$, and convergence is slow. It also requires $O(N)$ time to insert a new point, which is not practical for queries.

FastMap [Faloutsos and Lin, 1995]
Maps objects to k-dimensional points so that distances are preserved well. It is an approximation of multidimensional scaling, works even when only the distances are known, is efficient, and allows efficient query transformation.

FastMap
Find two objects that are far away from each other, then project all points onto the line the two objects define to get the first coordinate.
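The projection step needs only distances. Let $O_a, O_b$ be the two far-away pivot objects and $d_{a,b}$ their distance. Let $x_i$ be the first coordinate of object $O_i$, measured from $O_a$ along the line. Applying the cosine law to the triangle $(O_a, O_b, O_i)$ gives:

$$d_{b,i}^2 = d_{a,i}^2 + d_{a,b}^2 - 2\,x_i\,d_{a,b}
\quad\Longrightarrow\quad
x_i = \frac{d_{a,i}^2 + d_{a,b}^2 - d_{b,i}^2}{2\,d_{a,b}}$$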

FastMap - next iteration
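The following is a compact FastMap sketch, assuming `dist` is a metric given as a Python function. After each coordinate is computed, the next iteration works in the hyperplane perpendicular to the pivot line. There, distances become $d'(O_i,O_j)^2 = d(O_i,O_j)^2 - (x_i - x_j)^2$. The pivot choice is a simple farthest-object heuristic. This is an illustration, not the authors' code.

```python
import math


def fastmap(objects, dist, k):
    """FastMap sketch: map objects to k-d points using only the distance function."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def d2(i, j, col):
        # squared distance after projecting out the first `col` pivot lines:
        # d'(i, j)^2 = d(i, j)^2 - sum_c (x_ic - x_jc)^2
        s = dist(objects[i], objects[j]) ** 2
        for c in range(col):
            s -= (coords[i][c] - coords[j][c]) ** 2
        return max(s, 0.0)  # clamp small negatives (non-Euclidean input, rounding)

    for col in range(k):
        # pivots: start anywhere, jump to the farthest object,
        # then to the object farthest from that one
        b = max(range(n), key=lambda j: d2(0, j, col))
        a = max(range(n), key=lambda j: d2(b, j, col))
        dab2 = d2(a, b, col)
        if dab2 == 0.0:
            break  # all residual distances are zero; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # cosine law: projection of object i onto the pivot line (a, b)
            coords[i][col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2 * dab)
    return coords


pts = [(0, 0), (3, 0), (0, 4), (3, 4)]
print(fastmap(pts, math.dist, 2))  # 2-d points with the same pairwise distances
```

For Euclidean data whose intrinsic dimensionality is at most k, the residual distances vanish, and the mapping reproduces the original distances up to rounding.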

Results
Documents: cosine similarity -> Euclidean distance (how?)
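One common answer assumes document vectors are normalized to unit length. Then $\lVert u - v \rVert^2 = 2 - 2\cos(u, v)$, so cosine similarity converts to a Euclidean distance that FastMap, or any other method that needs distances, can use. The helper names are hypothetical.

```python
import math


def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    return sum(a * b for a, b in zip(unit(u), unit(v)))


def cosine_to_euclidean(cos_sim):
    # for unit vectors: ||u - v||^2 = 2 - 2 cos(u, v); clamp rounding below zero
    return math.sqrt(max(0.0, 2.0 - 2.0 * cos_sim))


u, v = [3.0, 4.0], [4.0, 3.0]  # hypothetical term-count vectors
print(cosine_to_euclidean(cosine(u, v)), math.dist(unit(u), unit(v)))  # equal values
```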