1
Multimedia DBs
2
Time Series Data A time series is a collection of observations made sequentially in time. [Figure: an example time series plotted with time on the horizontal axis and the observed value on the vertical axis.]
3
PAA and APCA Feature extraction for GEMINI: Fourier, wavelets. Another approach: segment the time series into equal parts and store the average value for each part. Use an index to store the averages and the segment end points.
4
Feature Spaces [Figure: three feature spaces for a time series X and its approximation X'. DFT (Agrawal, Faloutsos & Swami 1993): first Fourier coefficients. DWT (Chan & Fu 1999): Haar wavelets 0-7. SVD (Korn, Jagadish & Faloutsos 1997): eigenwaves 0-7.]
5
Piecewise Aggregate Approximation (PAA) Original time series (n-dimensional vector): S = {s_1, s_2, ..., s_n}. n'-segment PAA representation (n'-dimensional vector): S' = {sv_1, sv_2, ..., sv_n'}. [Figure: a time series and its 8-segment PAA sv_1 ... sv_8 in time-value space.] The PAA representation satisfies the lower bounding lemma (Keogh, Chakrabarti, Mehrotra and Pazzani, 2000; Yi and Faloutsos, 2000).
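To make the construction concrete, here is a minimal PAA sketch in Python (NumPy assumed; the fractional-overlap branch for the case where n' does not divide n is one common convention, not prescribed by the slide):

```python
import numpy as np

def paa(series, n_segments):
    """Piecewise Aggregate Approximation: split the series into
    n_segments equal-length pieces and keep each piece's mean."""
    s = np.asarray(series, dtype=float)
    n = len(s)
    if n % n_segments != 0:
        # Non-divisible case: let each point contribute fractionally
        # to the segment(s) it overlaps, via an upsample-by-n_segments trick.
        idx = np.arange(n * n_segments) // n        # segment id per sub-unit
        return np.bincount(idx, weights=np.repeat(s, n_segments)) / n
    return s.reshape(n_segments, -1).mean(axis=1)

# Example: an 8-segment PAA of a 64-point series
x = np.sin(np.linspace(0, 4 * np.pi, 64))
print(paa(x, 8))
```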
6
Can we improve upon PAA? Adaptive Piecewise Constant Approximation (APCA) n'-segment PAA representation (n'-dimensional vector): S' = {sv_1, sv_2, ..., sv_n'}. n'/2-segment APCA representation (n'-dimensional vector): S' = {sv_1, sr_1, sv_2, sr_2, ..., sv_M, sr_M}, where M is the number of segments (= n'/2), sv_i is the mean value of the i-th segment and sr_i its right end point. [Figure: the same series approximated by an 8-segment PAA and by a 4-segment APCA of the same total size.]
7
APCA approximates the original signal better than PAA. Improvement factor = (reconstruction error of PAA) / (reconstruction error of APCA); sample improvement factors from the slide: 1.69, 3.02, 1.21, 1.75, 3.77, 1.03. [Figure: the same series reconstructed from PAA and from APCA, with their reconstruction errors.]
8
APCA Representation can be computed efficiently A near-optimal representation can be computed in O(n log n) time; the optimal representation can be computed in O(n^2 M) time (Koudas et al.).
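As a concrete illustration of the O(n^2 M) optimal algorithm the slide cites, here is a dynamic-programming sketch; this is my own formulation under the stated cost bound, not Koudas et al.'s code, and the near-optimal O(n log n) method described in the APCA paper instead uses a truncated Haar wavelet transform:

```python
import numpy as np

def optimal_apca(series, M):
    """Optimal M-segment constant approximation (minimum squared error)
    via dynamic programming, O(n^2 * M).
    Returns the (sv_i, sr_i) pairs: (segment mean, 0-based right endpoint)."""
    s = np.asarray(series, dtype=float)
    n = len(s)
    p1 = np.concatenate(([0.0], np.cumsum(s)))       # prefix sums
    p2 = np.concatenate(([0.0], np.cumsum(s * s)))   # prefix sums of squares

    def sse(i, j):  # squared error of one segment covering s[i..j] inclusive
        tot, ln = p1[j + 1] - p1[i], j - i + 1
        return (p2[j + 1] - p2[i]) - tot * tot / ln

    INF = float('inf')
    err = np.full((M + 1, n), INF)   # err[m][j]: best error for s[0..j], m segments
    cut = np.zeros((M + 1, n), dtype=int)
    for j in range(n):
        err[1][j] = sse(0, j)
    for m in range(2, M + 1):
        for j in range(m - 1, n):
            for i in range(m - 2, j):    # i: right endpoint of segment m-1
                e = err[m - 1][i] + sse(i + 1, j)
                if e < err[m][j]:
                    err[m][j], cut[m][j] = e, i
    # Recover the segments by walking the cut points backwards.
    segs, j = [], n - 1
    for m in range(M, 0, -1):
        i = cut[m][j] if m > 1 else -1
        segs.append(((p1[j + 1] - p1[i + 1]) / (j - i), j))
        j = i
    return segs[::-1]
```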
9
Distance Measure Exact (Euclidean) distance D(Q, S) between a query time series Q and a data time series S. Lower bounding distance D_LB(Q', S) between the query's representation Q' and the APCA representation of S, with D_LB(Q', S) <= D(Q, S). [Figure: Q compared to S under the exact distance, and Q' compared to the APCA of S under the lower bounding distance.]
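A sketch of one such lower bounding distance, following the D_LB definition of Keogh, Chakrabarti, Mehrotra and Pazzani (2000): project Q onto S's segment boundaries and take a length-weighted Euclidean distance between segment means. The function name and the 0-based endpoint convention are mine:

```python
import numpy as np

def d_lb(query, apca):
    """Lower bound on the Euclidean distance D(Q, S), computed from the
    APCA representation of S plus the raw query Q.
    `apca` is a list of (sv_i, sr_i) pairs, sr_i a 0-based right endpoint."""
    q = np.asarray(query, dtype=float)
    total, prev = 0.0, -1
    for sv, sr in apca:
        length = sr - prev                   # number of points in this segment
        qv = q[prev + 1: sr + 1].mean()      # query's mean over the same span
        total += length * (qv - sv) ** 2
        prev = sr
    return total ** 0.5
```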
10
Index on 2M-dimensional APCA space Any feature-based index structure can be used (e.g., R-tree, X-tree, Hybrid Tree). [Figure: APCA points S1-S9 in the 2M-dimensional space, grouped into MBRs R1-R4, with the corresponding tree structure.]
11
k-nearest neighbor Algorithm For any node U of the index structure with MBR R, MINDIST(Q, R) <= D(Q, S) for any data item S under U. [Figure: query Q among MBRs R1-R4 and points S1-S9, with MINDIST(Q, R2), MINDIST(Q, R3), MINDIST(Q, R4).]
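A sketch of the standard best-first k-NN traversal this slide implies (in the style of Hjaltason & Samet); `is_leaf_item`, `children`, `mbr` and the two distance callbacks are hypothetical names, not from the slides:

```python
import heapq

def knn_search(root, query, k, mindist, exact_dist):
    """Best-first k-NN over a hierarchical index (R-tree-like).
    Correctness relies on the slide's invariant:
    MINDIST(Q, R) <= D(Q, S) for every item S under a node with MBR R."""
    heap = [(0.0, 0, root)]      # (priority, tiebreaker, node-or-item)
    counter = 1
    results = []                 # final (distance, item) pairs
    while heap and len(results) < k:
        d, _, entry = heapq.heappop(heap)
        if entry.is_leaf_item:   # a data item popped with its exact distance
            results.append((d, entry))
            continue
        for child in entry.children:
            if child.is_leaf_item:
                key = exact_dist(query, child)
            else:
                key = mindist(query, child.mbr)
            heapq.heappush(heap, (key, counter, child))
            counter += 1
    return results
```

Because entries pop in increasing key order and MINDIST lower-bounds every distance beneath a node, an item that pops with its exact distance cannot be beaten by anything still in the queue.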
12
Index Modification for MINDIST Computation APCA point: S = {sv_1, sr_1, sv_2, sr_2, ..., sv_M, sr_M}. APCA rectangle: S = (L, H), where L = {smin_1, sr_1, smin_2, sr_2, ..., smin_M, sr_M} and H = {smax_1, sr_1, smax_2, sr_2, ..., smax_M, sr_M}, with smin_i and smax_i the minimum and maximum values of the series within the i-th segment. [Figure: a 4-segment APCA with the per-segment bounds smin_1 ... smin_4 and smax_1 ... smax_4.]
13
MBR Representation in time-value space We can view the MBR R = (L, H) of any node U as two APCA-like representations L = {l_1, l_2, ..., l_(N-1), l_N} and H = {h_1, h_2, ..., h_(N-1), h_N}, with N = 2M. [Figure: L = {l_1, ..., l_6} and H = {h_1, ..., h_6} plotted in time-value space, defining REGIONs 1-3.]
14
Regions M regions are associated with each MBR. Boundaries of the i-th region: values between l_(2i-1) and h_(2i-1), time instants between l_(2i-2) + 1 and h_(2i). [Figure: REGIONs 1-3 in time-value space with their l/h boundaries.]
15
Regions (continued) The i-th region is active at time instant t if it spans across t. The value s_t of any time series S under node U at time instant t must lie in one of the regions active at t (Lemma 2). [Figure: time instants t1 and t2, and the regions active at each.]
16
MINDIST Computation For a time instant t: MINDIST(Q, R, t) = min over regions G active at t of MINDIST(Q, G, t). Example: MINDIST(Q, R, t1) = min(MINDIST(Q, Region 1, t1), MINDIST(Q, Region 2, t1)) = min((q_t1 - h_1)^2, (q_t1 - h_3)^2) = (q_t1 - h_1)^2. Overall: MINDIST(Q, R) = sqrt( Σ_t MINDIST(Q, R, t) ). Lemma 3: MINDIST(Q, R) <= D(Q, C) for any time series C under node U.
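Putting Lemmas 2 and 3 together, here is a sketch of the MINDIST computation. Indices are 0-based in the code (the slide's 1-based l/h subscripts are translated in the comments), and I assume the regions of R jointly cover every time instant of the query:

```python
import numpy as np

def mindist(query, L, H):
    """MINDIST(Q, R) for an APCA-space MBR R = (L, H), following the slide:
    MINDIST(Q, R) = sqrt( sum_t  min over regions G active at t  MINDIST(Q, G, t) ).
    L and H are the 2M-dimensional low/high APCA vectors
    (value, right-endpoint, value, right-endpoint, ...), endpoints 0-based."""
    q = np.asarray(query, dtype=float)
    M = len(L) // 2
    total = 0.0
    for t, qt in enumerate(q):
        best = float('inf')
        for i in range(M):                                  # region i+1 on the slide
            lo, hi = L[2 * i], H[2 * i]                     # l_(2i-1), h_(2i-1)
            t_start = (L[2 * i - 1] + 1) if i > 0 else 0    # l_(2i-2) + 1
            t_end = H[2 * i + 1]                            # h_(2i)
            if t_start <= t <= t_end:                       # region is active at t
                if qt < lo:
                    best = min(best, (lo - qt) ** 2)
                elif qt > hi:
                    best = min(best, (qt - hi) ** 2)
                else:
                    best = 0.0
        total += best
    return total ** 0.5
```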
17
Approximate Search A simpler definition of the distance in the feature space is the following: use D_LB(Q', S) directly. But there is one problem... what?
18
Multimedia DBs A multimedia database also stores images. Again, similarity queries (content-based retrieval): extract features, index them in feature space, and answer similarity queries using GEMINI. Again, average values help!
19
Images - color What is an image? A: a 2-d array of pixel values.
20
Images - color Color histograms, and a distance function between them.
21
Images - color Mathematically, the distance function is: d_hist^2(x, y) = (x - y)^T A (x - y), where x and y are the color histograms and A = [a_ij] captures the similarity between colors i and j.
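A one-line sketch of this quadratic-form distance (the function name is mine):

```python
import numpy as np

def d_hist(x, y, A):
    """Quadratic-form ('cross-talk') histogram distance:
    d(x, y) = sqrt( (x - y)^T A (x - y) ),
    where A[i, j] encodes the similarity of color bins i and j."""
    z = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(z @ A @ z))
```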
22
Images - color Problem: 'cross-talk': the features (color bins) are not orthogonal, so SAMs (spatial access methods) will not work properly. Q: what to do? A: it is a feature-extraction question.
23
Images - color Possible answer: average red, average green, average blue. It turns out that this lower-bounds the histogram distance, so there is no cross-talk and SAMs are applicable.
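A minimal sketch of the 3-d feature extraction, assuming an H x W x 3 RGB array; the scaling constant that turns the avg-RGB distance into a true lower bound of the histogram distance (the QBIC filtering bound derived later) is omitted here:

```python
import numpy as np

def avg_rgb(image):
    """3-d feature vector for GEMINI filtering: the mean red, green
    and blue values of the image (an H x W x 3 array)."""
    return np.asarray(image, dtype=float).reshape(-1, 3).mean(axis=0)
```

These 3-d vectors go into any SAM; candidates retrieved by the cheap avg-RGB distance are then verified with the full histogram distance.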
24
Images - color Performance: [Figure: response time vs. selectivity, filtering with avg RGB compared to a sequential scan.]
26
Images - shapes Distance function: Euclidean, on the area, the perimeter, and 20 'moments'. (Q: how to normalize them? A: divide each by its standard deviation.)
28
Images - shapes Distance function: Euclidean, on the area, the perimeter, and 20 'moments'. (Q: other 'features' / distance functions? A1: turning angle. A2: dilations/erosions. A3: ...)
30
Images - shapes Distance function: Euclidean, on the area, the perimeter, and 20 'moments'. Q: how to do dimensionality reduction? A: Karhunen-Loeve (= centered PCA/SVD).
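A sketch combining the two answers above: normalize each feature by its standard deviation, then apply Karhunen-Loeve (centered PCA via SVD). The function name and the objects-by-features matrix layout are my assumptions:

```python
import numpy as np

def kl_features(F, k):
    """Karhunen-Loeve on an (objects x features) matrix F of shape
    features (area, perimeter, 20 moments, ...): normalize each feature
    by its standard deviation, center, and keep the k strongest
    principal components."""
    F = np.asarray(F, dtype=float)
    F = F / F.std(axis=0)           # normalize: divide by standard deviation
    F = F - F.mean(axis=0)          # center (the 'K-L = centered PCA' step)
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    return F @ Vt[:k].T             # project onto the top-k eigenvectors
```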
31
Images - shapes Performance: ~10x faster. [Figure: log(# of I/Os) vs. number of features kept, compared against keeping all features.]
32
Is d(u,v) = sqrt( (u-v)^T A (u-v) ) a metric? Diagonalize A: x^T A x = Σ_ij x_i x_j A_ij = Σ_i λ_i x_i^2, where λ_i is the i-th eigenvalue and x_i is the projection of x along the i-th eigenvector. Then d(u,v) = sqrt( (u-v)^T A (u-v) ) = sqrt( Σ_i λ_i (u_i - v_i)^2 ). If all λ_i >= 0: d(u,v) >= 0, d(u,u) = 0, and d(u,v) = d(v,u). Triangle inequality: d(u,w) <= d(u,v) + d(v,w) holds provided sqrt( Σ_i λ_i (u_i - w_i)^2 ) <= sqrt( Σ_i λ_i (u_i - v_i)^2 ) + sqrt( Σ_i λ_i (v_i - w_i)^2 ), i.e., sqrt( Σ_i (√λ_i u_i - √λ_i w_i)^2 ) <= sqrt( Σ_i (√λ_i u_i - √λ_i v_i)^2 ) + sqrt( Σ_i (√λ_i v_i - √λ_i w_i)^2 ), which is the metric (triangle inequality) condition for the L2 norm in the scaled coordinates √λ_i x_i.
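A quick numerical sanity check of this argument; positive semi-definiteness of A is the assumption that makes the √λ_i coordinates real, so the sketch constructs A as B B^T:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric positive semi-definite A (so all eigenvalues λ_i >= 0).
B = rng.normal(size=(8, 8))
A = B @ B.T

def d(u, v):
    w = u - v
    return np.sqrt(w @ A @ w)

# Empirical check of the triangle inequality d(u,w) <= d(u,v) + d(v,w):
u, v, w = (rng.normal(size=8) for _ in range(3))
assert d(u, w) <= d(u, v) + d(v, w) + 1e-12
```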
33
Filtering in QBIC Histogram column vectors x, y of length n, with Σ x_i = 1 and Σ y_i = 1. Difference z = x - y, so Σ z_i = 0. Contribution of each color bin to a smaller set of colors: V^T = (c_1, c_2, ..., c_n), where each c_i is a column vector of length 3 (the color components of bin i). x_avg = V^T x and y_avg = V^T y are column vectors of length 3 (the average colors).
34
Filtering in QBIC Distances: d_avg^2 = (x_avg - y_avg)^T (x_avg - y_avg) = (V^T z)^T (V^T z) = z^T V V^T z = z^T W z, where W = V V^T. d_hist^2 = z^T A z. Claim: d_hist^2 >= λ_1 d_avg^2, where λ_1 is the smallest eigenvalue of the generalized eigenproblem A'z' = λ W'z' (A' and W' are defined on the next slide).
35
Filtering in QBIC Rewrite z to remove the extra condition Σ z_i = 0: z becomes an (n-1)-dimensional column vector z'. Then z^T A z = z'^T A' z' and z^T W z = z'^T W' z', where A' and W' are (n-1)x(n-1) matrices. It remains to show that z'^T A' z' >= λ_1 z'^T W' z'.
36
Proof of z'^T A' z' >= λ_1 z'^T W' z' Minimize z'^T A' z' with respect to z', subject to the constraint z'^T W' z' = C. This is the same as minimizing, with respect to z', the Lagrangian z'^T A' z' - λ (z'^T W' z' - C). Differentiating with respect to z' and setting to 0 gives A'z' = λ W'z'. So λ and z' must be an eigenvalue and eigenvector, respectively, of the generalized eigenproblem A'z' = λ W'z'.
37
Proof of z'^T A' z' >= λ_1 z'^T W' z' (continued) At such a critical point, z'^T A' z' = λ z'^T W' z' = λC. To minimize z'^T A' z', we must choose the smallest eigenvalue λ_1; the minimum of z'^T A' z' over z', subject to the constraint z'^T W' z' = C, equals λ_1 C. If z'^T W' z' = C > 0, then z'^T A' z' >= λ_1 C. If z'^T W' z' = 0, then z'^T A' z' >= 0, since A' is positive semi-definite.
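A toy numerical check of the filtering bound just proved (SciPy assumed). The reduction matrix P, the toy similarity matrix A, and the tolerance are my choices; W' is singular here (rank at most 3), so the generalized eigenproblem yields some infinite eigenvalues, which the sketch filters out:

```python
import numpy as np
from scipy.linalg import eig, null_space

rng = np.random.default_rng(1)
n = 8                                 # number of color bins (toy size)

V = rng.random((n, 3))                # row i: 3-d color value c_i of bin i
W = V @ V.T                           # d_avg^2 = z^T W z
B = rng.normal(size=(n, n))
A = B @ B.T                           # a PSD 'cross-talk' similarity matrix

P = null_space(np.ones((1, n)))       # basis of {z : sum z_i = 0}, shape (n, n-1)
Ap, Wp = P.T @ A @ P, P.T @ W @ P     # the slides' A' and W'

lam = eig(Ap, Wp, right=False)        # generalized eigenvalues of A'z' = λ W'z'
lam1 = min(l.real for l in lam if np.isfinite(l))

# Empirical check of d_hist^2 >= λ_1 * d_avg^2 on random histogram pairs:
for _ in range(1000):
    x, y = rng.random(n), rng.random(n)
    z = x / x.sum() - y / y.sum()     # histograms sum to 1, so sum z_i = 0
    assert z @ A @ z >= lam1 * (z @ W @ z) - 1e-8
```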