Spatial and Temporal Data Mining

Spatial and Temporal Data Mining V. Megalooikonomou Generic Multimedia Indexing (slides are based on notes by C. Faloutsos)

General Overview: Multimedia Indexing; Spatial Access Methods (SAMs): k-d trees, point quadtrees, MX-quadtrees, z-ordering, R-trees; Generic Multimedia Indexing.

Multimedia Indexing – Detailed outline. Generic Multimedia Indexing: problem definition; distance function; similarity queries (types); requirements (ideal method); basic idea, lower-bounding; GEMINI approach. Applications: 1-D time sequences; 2-D color images.

Generic Multimedia Indexing – problem. Given a database of multimedia objects, design fast search algorithms that locate objects matching a query object, exactly or approximately. Objects: 1-d time sequences; digitized voice or music; 2-d color images; 2-d or 3-d gray-scale medical images; video clips. E.g.: “Find companies whose stock prices move similarly.”

Generic Multimedia Indexing – problem. 1st step: provide a measure for the distance between two objects. Distance function D(): given two objects OA, OB, the distance (= dis-similarity) of the two objects is denoted by D(OA, OB). E.g., the Euclidean distance (square root of the sum of squared differences) of two equal-length time series.
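As a minimal sketch of such a distance function, the Euclidean distance between two equal-length sequences could be written as follows. The function name and the sample values are illustrative assumptions, not part of the slides.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Two short, made-up sequences that differ in one position only.
s1 = [1.0, 2.0, 3.0, 4.0]
s2 = [1.0, 2.0, 5.0, 4.0]
print(euclidean_distance(s1, s2))  # 2.0
```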

Types of Similarity Queries. Similarity queries are classified into: Whole match queries: given a collection of N objects O1,…, ON and a query object Q, find the data objects that are within distance ε from Q. Sub-pattern match: given a collection of N objects O1,…, ON, a query (sub-)object Q, and a tolerance ε, identify the parts of the data objects that match the query Q.

Types of Similarity Queries. Additional types of queries: K-nearest-neighbor queries: given a collection of N objects O1,…, ON and a query object Q, find the K data objects most similar to Q. All-pairs queries (or ‘spatial joins’): given a collection of N objects O1,…, ON, find all pairs of objects that are within distance ε from each other.
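The four query types can be made concrete with naive sequential-scan versions. This is only a baseline sketch, assuming Euclidean distance and objects stored as lists of numbers; all function names are hypothetical.

```python
import heapq
import math
from itertools import combinations

def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def range_query(objects, q, eps):
    """Whole match: indices of objects within distance eps of q."""
    return [i for i, o in enumerate(objects) if dist(o, q) <= eps]

def subpattern_match(sequence, q, eps):
    """Sub-pattern match: start offsets where a window of len(q) is within eps of q."""
    w = len(q)
    return [s for s in range(len(sequence) - w + 1)
            if dist(sequence[s:s + w], q) <= eps]

def knn_query(objects, q, k):
    """K-nearest-neighbor: indices of the k objects closest to q."""
    return heapq.nsmallest(k, range(len(objects)),
                           key=lambda i: dist(objects[i], q))

def all_pairs(objects, eps):
    """All pairs ('spatial join'): index pairs within distance eps of each other."""
    return [(i, j) for i, j in combinations(range(len(objects)), 2)
            if dist(objects[i], objects[j]) <= eps]

db = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [1.0, 1.0]]
q = [0.0, 0.0]
print(range_query(db, q, 1.0))                              # [0, 1]
print(subpattern_match([0, 1, 2, 3, 2, 1], [2, 3, 2], 0.5)) # [2]
print(knn_query(db, q, 3))                                  # [0, 1, 3]
print(all_pairs(db, 1.0))                                   # [(0, 1), (1, 3)]
```

Every function here touches each object (or each pair), which is exactly the cost an index is meant to avoid.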

Ideal method – requirements. Fast: sequential scanning, with a distance calculation against each and every object, is too slow for large databases. “Correct”: no false dismissals. False alarms are acceptable. Why? They can be discarded afterwards by checking the actual distance. Small space overhead. Dynamic: easy to insert, delete, and update objects.

Approach Outline. Use k feature-extraction functions to map objects into points in k-dimensional space (applying a mapping F()). Use highly fine-tuned database SAMs (Spatial Access Methods) like R-trees to accelerate the search (by pruning large portions of the database that are not promising).

Basic idea. Focus on ‘whole match’ queries: given a collection of N objects O1,…, ON, a distance/dis-similarity function D(Oi, Oj), and a query object Q, find the data objects that are within distance ε from Q. Sequential scanning? May be too slow, for the following reasons: the distance computation is expensive (e.g., editing distance in DNA strings); the database size N may be huge. Faster alternative?

Basic idea. Faster alternative: Step 1: a ‘quick and dirty’ test to quickly discard the vast majority of non-qualifying objects. Step 2: use of SAMs to achieve faster-than-sequential searching. Example: a database of yearly stock price movements, with the Euclidean distance function. Characterize each sequence with a single number (‘feature’), or use two or more features.

Basic idea – illustration. [Figure: time sequences S1, …, Sn (day 1 to 365) are mapped to points F(S1), …, F(Sn) in the feature space (Feature1, Feature2).] A query with tolerance ε becomes a sphere with radius ε.

Basic idea – caution! The mapping F() from objects to k-d points should not distort the distances. D(): distance of two objects. Df(): distance of the corresponding feature vectors. Ideally, perfect preservation of distances. In practice, a guarantee of no false dismissals. How? If the distance in f-space matches or underestimates the distance between the two objects in the original space.

Basic idea – Lower bounding. Let O1, O2 be two objects with distance function D(), and let F(O1), F(O2) be their feature vectors with distance function Df(). To guarantee no false dismissals for whole match queries, the feature extraction function F() should satisfy: Df(F(O1), F(O2)) ≤ D(O1, O2) for every pair of objects O1, O2.

Lower bounding – Proof. Let Q be the query object, O a qualifying object, and ε the tolerance. Prove: if object O qualifies, it will be retrieved by a range query in the f-space. Or, D(Q, O) ≤ ε ⇒ Df(F(Q), F(O)) ≤ ε. By lower-bounding, Df(F(Q), F(O)) ≤ D(Q, O) ≤ ε. What about ‘all-pairs’? (‘Spatial join’ on f-space.) What about ‘nearest-neighbor’ queries?
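One common way to answer the nearest-neighbor question with a lower-bounding feature distance is a two-pass search: take the k nearest neighbors in feature space, use their largest actual distance as a tolerance, and then run a range query with that tolerance. The sketch below assumes this scheme rather than quoting it from the slides; the feature F (the average) and the linear scans standing in for a SAM are illustrative choices.

```python
import heapq
import math

def D(x, y):
    """Actual distance: Euclidean distance between equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def F(x):
    """Illustrative one-number feature: the average of the sequence."""
    return sum(x) / len(x)

def Df(fx, fy):
    """Feature-space distance; |avg(x) - avg(y)| never exceeds D(x, y)."""
    return abs(fx - fy)

def knn_lower_bounding(objects, q, k):
    feats = [F(o) for o in objects]  # would live in a SAM in practice
    fq = F(q)
    # Step 1: k nearest neighbors in feature space (linear scan stands in for the SAM).
    first = heapq.nsmallest(k, range(len(objects)), key=lambda i: Df(feats[i], fq))
    # Step 2: their largest actual distance is a safe search radius.
    eps = max(D(objects[i], q) for i in first)
    # Step 3: range query in feature space; lower-bounding means no true neighbor is missed.
    candidates = [i for i in range(len(objects)) if Df(feats[i], fq) <= eps]
    # Step 4: refine the candidates with the actual distance.
    return heapq.nsmallest(k, candidates, key=lambda i: D(objects[i], q))

db = [[0, 0, 0, 0], [2, -2, 2, -2], [1, 1, 1, 1], [3, 3, 3, 3]]
q = [0, 0, 0, 0]
print(knn_lower_bounding(db, q, 2))  # [0, 2]
```

In this example the feature-space neighbors alone would be objects 0 and 1 (both have average 0), yet object 1 is farther from the query than object 2; the refinement step corrects that.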

GEneric Multimedia object INdexIng (GEMINI) approach: Determine the distance function D(). Find one or more numerical feature-extraction functions (to provide a ‘quick and dirty’ test). Prove that Df() lower-bounds D(), to guarantee no false dismissals. Use a SAM (e.g., R-tree) to store and retrieve the k-d feature vectors. Note: the methodology focuses on the speed of search only, not on the quality of the results, which relies on the distance function.
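The four GEMINI steps can be sketched as a toy filter-and-refine index. This is an illustration under stated assumptions, not the authors' implementation: the feature is the sequence average, and a sorted list searched with `bisect` stands in for the SAM; the class and method names are hypothetical.

```python
import bisect
import math

def D(x, y):
    """Actual distance: Euclidean distance between equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def F(x):
    """One-number feature (the average); |F(x) - F(y)| lower-bounds D(x, y)."""
    return sum(x) / len(x)

class GeminiIndex:
    """Toy index: a sorted list of (feature, id) pairs stands in for the SAM."""

    def __init__(self):
        self.objects = []
        self.keys = []  # sorted (feature value, object id) pairs

    def insert(self, obj):
        oid = len(self.objects)
        self.objects.append(obj)
        bisect.insort(self.keys, (F(obj), oid))
        return oid

    def range_query(self, q, eps):
        fq = F(q)
        lo = bisect.bisect_left(self.keys, (fq - eps, -1))
        hi = bisect.bisect_right(self.keys, (fq + eps, len(self.objects)))
        # Quick-and-dirty filter in feature space.
        candidates = [oid for _, oid in self.keys[lo:hi]]
        # Refinement with the actual distance removes the false alarms.
        hits = [oid for oid in candidates if D(self.objects[oid], q) <= eps]
        return sorted(hits), len(candidates) - len(hits)

index = GeminiIndex()
for s in ([1, 1, 1, 1], [0, 2, 0, 2], [5, 5, 5, 5], [1, 2, 1, 2]):
    index.insert(s)
print(index.range_query([1, 1, 1, 1], 1.5))  # ([0, 3], 1)
```

The second value returned is the number of false alarms: objects that passed the feature test but failed the actual distance test.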

Generic Multimedia Object Indexing. Applications: 1-d time sequences; 2-d color images. Problems to solve: how to apply the lower-bounding lemma; ‘curse of dimensionality’ (time sequences); ‘cross-talk’ of features (color images).

1-D Time Sequences. Distance function: Euclidean distance. Find features that: preserve/lower-bound the distance; carry as much information as possible (to reduce false alarms). If we are allowed to use only one feature, what would it be? The average. Extending it: the average of the 1st half, of the 2nd half, of the 1st quarter, etc.; coefficients of the Fourier transform (DFT), wavelet transform, etc.
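A small numeric check of the averaging idea, as a sketch: segment averages are used as features, and the feature distance is scaled by the square root of the segment length, which keeps it below the actual Euclidean distance by the Cauchy-Schwarz inequality. The scaling factor and the function names are additions for illustration; the unscaled averages also lower-bound the distance, only more loosely.

```python
import math
import random

def D(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def segment_means(x, k):
    """Averages of k equal-length consecutive segments (len(x) must be divisible by k)."""
    m = len(x) // k
    return [sum(x[i * m:(i + 1) * m]) / m for i in range(k)]

def Df(x, y, k):
    """Feature distance on k segment means, scaled by sqrt(segment length)."""
    m = len(x) // k
    return math.sqrt(m) * D(segment_means(x, k), segment_means(y, k))

random.seed(0)
x = [random.gauss(0, 1) for _ in range(16)]
y = [random.gauss(0, 1) for _ in range(16)]
for k in (1, 2, 4, 16):
    # k = 1 is the plain average; k = 16 keeps every point, so Df equals D.
    print(k, round(Df(x, y, k), 4), "<=", round(D(x, y), 4))
```

More segments carry more information and give a tighter bound, which means fewer false alarms at the cost of a higher-dimensional feature space.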

1-D Time Sequences. Show that the distance in feature space lower-bounds the actual distance. What about the DFT? Parseval’s theorem: the DFT (normalized by 1/√n) preserves the energy of the signal as well as the distances between two signals: D(x, y) = D(X, Y), where X and Y are the Fourier transforms of x and y. If we keep the first k ≤ n coefficients of the DFT, we lower-bound the actual distance.
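Parseval's theorem and the truncation argument can be checked numerically. The sketch below uses a naive O(n²) DFT written out by hand so that the 1/√n normalization is explicit; the sample sequences are made up.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) DFT with the 1/sqrt(n) normalization that makes it unitary."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            / math.sqrt(n)
            for f in range(n)]

def D(x, y):
    """Euclidean distance; works for real or complex sequences."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(x, y)))

x = [1.0, 3.0, 2.0, 5.0, 4.0, 4.0, 6.0, 7.0]
y = [2.0, 2.0, 3.0, 3.0, 5.0, 6.0, 5.0, 8.0]
X, Y = dft(x), dft(y)

# Time-domain and frequency-domain distances agree (Parseval).
print(round(D(x, y), 6), round(D(X, Y), 6))
# Keeping only the first k coefficients drops non-negative terms from the sum,
# so the truncated distance can only be smaller.
for k in (1, 2, 3, 8):
    print(k, round(D(X[:k], Y[:k]), 6))
```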

1-D Time Sequences. Response time improves as the transform concentrates more of the signal's energy in the first few coefficients. The DFT concentrates the energy for a large class of signals, the colored noises. Colored noises: skewed energy spectrum that drops as O(f^(-b)). The energy spectrum (or power spectrum) of a signal is the square of the amplitude |X_f| as a function of the frequency f. b = 2: random walks or brown noise (very predictable); b > 2: black noises; b = 1: pink noise; b = 0: white noise (completely unpredictable). Colored noises appear even in images (photographs).
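To illustrate the energy-concentration claim, the following sketch compares how much energy the four lowest non-zero frequencies hold for a white-noise sequence and for the random walk built from it. The sequence length, the seed, and the choice of four coefficients are arbitrary assumptions; the exact fractions depend on the random draw.

```python
import cmath
import math
import random

def energy_spectrum(x):
    """Squared amplitude of each DFT coefficient (naive O(n^2) DFT)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * f * t / n)
                    for t in range(n))) ** 2 / n
            for f in range(n)]

def low_freq_energy_fraction(x, k):
    """Fraction of the energy held by frequencies 1..k (the DC term is skipped)."""
    e = energy_spectrum(x)
    half = e[1:len(x) // 2 + 1]  # positive frequencies only
    return sum(half[:k]) / sum(half)

random.seed(1)
n = 128
white = [random.gauss(0, 1) for _ in range(n)]

# A random walk is the running sum of white-noise steps.
walk = []
level = 0.0
for step in white:
    level += step
    walk.append(level)

print(round(low_freq_energy_fraction(white, 4), 3))
print(round(low_freq_energy_fraction(walk, 4), 3))
```

The random walk should show a much larger fraction than the white noise, which is why a few DFT coefficients make effective features for such signals.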

2-D color images. Image features for Content-Based Image Retrieval (CBIR). Low level: color (color histograms); texture (directionality, granularity, contrast); shape (turning angle, moments of inertia, pattern spectrum); position (2D strings method); etc. Object level: regions.

2-D color images – Color histograms. Each color image: a 2-d array of pixels. Each pixel: 3 color components (R, G, B). h colors, each color denoting a point in the 3-d color space (as many as 2^24 colors). For each image, compute the h-element color histogram: each component is the percentage of pixels that are most similar to that color. The histogram of image I is defined as follows: for a color Ci, H_Ci(I) represents the number of pixels of color Ci in image I; or, after normalizing by the number of pixels, H_Ci(I) represents the probability that any given pixel in image I has color Ci.

2-D color images – Color histograms. Usually, similar colors are clustered together and one representative color is chosen for each ‘color bin’. Most commercial CBIR systems include the color histogram as one of the features (e.g., QBIC of IBM). The histogram carries no spatial information.
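A minimal sketch of computing a normalized color histogram. It assumes uniform quantization of each channel into a fixed number of levels, which is a simplification of the clustering of similar colors described in the slides; the function name and the sample pixels are made up.

```python
def color_histogram(pixels, levels=4):
    """Normalized histogram over levels**3 color bins.

    pixels: list of (R, G, B) tuples with components in 0..255.
    Each channel is cut into `levels` equal ranges, so similar colors share a bin.
    Assumes `levels` divides 256 evenly (e.g., 2, 4, 8).
    """
    hist = [0.0] * (levels ** 3)
    step = 256 // levels
    for r, g, b in pixels:
        bin_index = (r // step) * levels * levels + (g // step) * levels + (b // step)
        hist[bin_index] += 1
    return [h / len(pixels) for h in hist]

image = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 0, 0)]
hist = color_histogram(image)
print(len(hist), hist[48], hist[3], hist[0])  # 64 0.5 0.25 0.25
```

The two reddish pixels fall in the same bin, and shuffling the pixels leaves the histogram unchanged, which is the "no spatial information" limitation.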

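Color histograms – distance. One method to measure the distance between two histograms x and y is the quadratic form $d_h^2(x, y) = (x - y)^T A (x - y) = \sum_{i=1}^{h} \sum_{j=1}^{h} a_{ij} (x_i - y_i)(x_j - y_j)$, where the color-to-color similarity matrix A has entries a_ij that describe the similarity between color i and color j.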
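The effect of the similarity matrix can be seen on a three-bin example. The sketch assumes the quadratic-form distance sqrt((x − y)ᵀ A (x − y)); the bins and the similarity value 0.8 between red and pink are invented for illustration.

```python
import math

def quadratic_form_distance(x, y, A):
    """sqrt((x - y)^T A (x - y)) for histograms x, y and similarity matrix A."""
    d = [a - b for a, b in zip(x, y)]
    total = sum(A[i][j] * d[i] * d[j]
                for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(max(total, 0.0))  # guard against tiny negative rounding

# Three bins: bright red, pink, blue. Red and pink are declared similar (0.8).
A = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

red, pink, blue = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
print(round(quadratic_form_distance(red, pink, A), 4))         # 0.6325
print(round(quadratic_form_distance(red, blue, A), 4))         # 1.4142
print(round(quadratic_form_distance(red, pink, identity), 4))  # 1.4142
```

With the identity matrix (plain Euclidean distance) an all-red image is as far from an all-pink one as from an all-blue one; the cross terms in A bring red and pink closer, at the cost of a computation that is quadratic in the number of bins.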

Color histograms – lower bounding. Two obstacles to using color histograms as feature vectors in GEMINI: ‘Dimensionality curse’ (h is large, e.g., 64 or 128). The distance function is quadratic: it involves all cross terms (‘cross-talk’ among features), which is expensive to compute and precludes the use of SAMs. [Figure: histograms x and q over, e.g., 64 colors, with neighboring bins such as bright red, pink, and orange.]

Color histograms – lower bounding. 1st step: define the distance function between two color images, D() = dh(). 2nd step: find numerical features (one or more) whose Euclidean distance lower-bounds dh(). If we are allowed to use only one numerical feature to describe the color image, what should it be? The average amount of each color component (R, G, B): the average color vector $\bar{x} = (R_{avg}, G_{avg}, B_{avg})^T$, where $R_{avg} = \frac{1}{P} \sum_{p=1}^{P} R(p)$, and similarly for G and B; P is the number of pixels in the image and R(p) is the red component (intensity) of the p-th pixel.

Color histograms – lower bounding. Given the average color vectors $\bar{x}$ and $\bar{y}$ of two images, we define davg() as the Euclidean distance between the 3-d average color vectors. 3rd step: prove that the feature distance davg() lower-bounds the actual distance dh(). Main idea of the approach: first a filtering step using the average (R, G, B) color, then a more accurate matching using the full h-element histogram.
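The filtering stage on average colors might look like the sketch below. It covers the first stage only and does not establish the lower-bounding relation between davg() and dh(); the threshold `eps_avg`, the function names, and the tiny two-pixel "images" are assumptions made for illustration.

```python
import math

def average_color(pixels):
    """3-d average color vector (R_avg, G_avg, B_avg) of a list of (R, G, B) tuples."""
    P = len(pixels)
    return tuple(sum(p[c] for p in pixels) / P for c in range(3))

def d_avg(x_bar, y_bar):
    """Euclidean distance between two average color vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_bar, y_bar)))

def filter_by_average_color(images, query, eps_avg):
    """First stage only: keep images whose average color is within eps_avg of the query's."""
    q_bar = average_color(query)
    return [name for name, pixels in images.items()
            if d_avg(average_color(pixels), q_bar) <= eps_avg]

images = {
    "sunset": [(250, 120, 30), (240, 100, 20)],
    "sea": [(10, 60, 200), (20, 80, 220)],
    "brick": [(200, 80, 60), (220, 90, 50)],
}
query = [(245, 110, 25), (235, 105, 35)]
print(average_color(query))                           # (240.0, 107.5, 30.0)
print(filter_by_average_color(images, query, 50.0))   # ['sunset', 'brick']
```

The surviving candidates would then be compared with the full histogram distance, which is where any false alarms from the 3-d filter are removed.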

Color auto-correlogram. Pick any pixel p1 of color Ci in the image I; at distance k away from p1, pick another pixel p2. What is the probability that p2 is also of color Ci? [Figure: image I with pixels p1 and p2 at distance k.]

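Color auto-correlogram. The auto-correlogram of image I for color Ci and distance k: $\gamma^{(k)}_{C_i}(I) = \Pr[\, p_2 \in I_{C_i} \mid p_1 \in I_{C_i},\ |p_1 - p_2| = k \,]$, where $I_{C_i}$ is the set of pixels of color Ci in I. It integrates both color information and spatial information.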
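A brute-force sketch of the auto-correlogram, assuming the image is a 2-d list of color-bin indices and that the pixel distance is the chessboard (D8) distance. Neighbors that fall outside the image are simply ignored, which is one possible boundary convention among several; all names are hypothetical.

```python
def d8_ring(x, y, k):
    """Coordinates at chessboard (D8) distance exactly k from (x, y)."""
    for dx in range(-k, k + 1):
        for dy in range(-k, k + 1):
            if max(abs(dx), abs(dy)) == k:
                yield x + dx, y + dy

def auto_correlogram(image, num_colors, distances=(1, 3, 5, 7)):
    """Return {k: [gamma for each color]} for a 2-d list of color indices.

    gamma[c] estimates Pr(p2 has color c | p1 has color c, D8(p1, p2) = k).
    """
    height, width = len(image), len(image[0])
    result = {}
    for k in distances:
        same = [0] * num_colors
        total = [0] * num_colors
        for y in range(height):
            for x in range(width):
                c = image[y][x]
                for nx, ny in d8_ring(x, y, k):
                    if 0 <= nx < width and 0 <= ny < height:
                        total[c] += 1
                        same[c] += image[ny][nx] == c
        result[k] = [s / t if t else 0.0 for s, t in zip(same, total)]
    return result

# Two 4x4 images with identical histograms (8 pixels of each color).
blocks = [[0, 0, 1, 1] for _ in range(4)]
checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]
print(auto_correlogram(blocks, 2, distances=(1,)))
print(auto_correlogram(checker, 2, distances=(1,)))
```

The two test images cannot be told apart by their histograms, but the block image has a much higher same-color probability at distance 1 than the checkerboard.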

Implementations. Pixel distance measures: use the D8 distance (also called chessboard distance): $|p_1 - p_2| = \max(|x_1 - x_2|, |y_1 - y_2|)$. Choose distance k = 1, 3, 5, 7. Computation complexity: Histogram: Correlogram:

Implementations. Feature distance measures: D(f(I1) − f(I2)) is small ⇔ I1 and I2 are similar. Example: f(a) = 1000, f(a’) = 1050; f(b) = 100, f(b’) = 150. For histogram: For correlogram:
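The example values suggest why an absolute difference can be misleading: both pairs differ by 50, yet the change from 100 to 150 is relatively much larger. The formulas on this slide did not survive in the transcript, so the sketch below only assumes one plausible remedy, a relative L1 measure that divides each difference by (1 + a + b); it should not be read as the slide's own definition.

```python
def l1_distance(f1, f2):
    """Plain absolute (L1) difference between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(f1, f2))

def relative_l1_distance(f1, f2):
    """Assumed relative measure: each difference is scaled by 1 + a + b."""
    return sum(abs(a - b) / (1 + a + b) for a, b in zip(f1, f2))

print(l1_distance([1000], [1050]), l1_distance([100], [150]))  # 50 50
print(round(relative_l1_distance([1000], [1050]), 4))          # 0.0244
print(round(relative_l1_distance([100], [150]), 4))            # 0.1992
```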

Color Histogram vs Correlogram. If there is no difference between the query and the target images, both methods have good performance. [Figure: query image (512 colors) and the top five results, 1st to 5th, for the correlogram method and for the histogram method.]

Color Histogram vs Correlogram. The correlogram method is more stable under color change than the histogram method. [Figure: query and target images.] Rank of the target: correlogram method, 1st; histogram method, 48th.

Color Histogram vs Correlogram. The correlogram method is more stable under large appearance change than the histogram method. [Figure: query and target images.] Rank of the target: correlogram method, 1st; histogram method, 31st.

Color Histogram vs Correlogram. The correlogram method is more stable under contrast and brightness change than the histogram method. [Figure: four query images and the target.] Rank of the target (C: correlogram, H: histogram): Query 1: C 178th, H 230th; Query 2: C 1st, H 1st; Query 3: C 1st, H 3rd; Query 4: C 5th, H 18th.

Color Histogram vs Correlogram. The color correlogram describes the global distribution of local spatial correlations of colors. It is easy to compute. It is more stable than the color histogram method.

Multimedia Indexing – Conclusions. GEMINI is a popular method for the whole-matching problem. Should pay attention to: distance functions; feature-extraction functions; lower bounding; the particular application. Sub-pattern matching?