Two heads better than one: pattern discovery in time-evolving multi-aspect data
Jimeng Sun · Charalampos (Babis) E. Tsourakakis · Evan Hoke · Christos Faloutsos · Tina Eliassi-Rad
Motivation: multi-aspect streams
[Figure: light, temperature, voltage, and humidity streams from the Intel Berkeley lab]
Motivation: multi-aspect streams
- Streams have multiple aspects, e.g., time, modality, location
- The time aspect is special
  - Natural ordering
  - Temporal correlations
- The remaining (spatial) aspects exhibit strong but different correlations
  - across different modalities
  - across different locations
- How to spot and track these correlations?
[Figure: temperature, light, humidity, and voltage streams]
Outline
- Motivation: multi-aspect streams
- Problem definition
- 2-heads methods
- Mining case study
- Experimental report
Data model
[Figure: tensor D with axes time, type, and location; time slices D_1 … D_n; example window D(6, 3)]
- Input: a tensor D of size n x N1 x N2 x … x NM, where n increases over time
- Time slice: D_i is the i-th slice of D, of size N1 x N2 x … x NM
- Tensor window: D(n, W) = {D_{n-W+1}, …, D_n}, the last W time slices ending at time n
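The window definition above can be sketched in a few lines of NumPy. This is only an illustration: the function name `tensor_window`, the choice of time as the first array axis, and the toy sizes are assumptions, not part of the slides.

```python
import numpy as np

def tensor_window(D, n, W):
    """Return D(n, W): the W time slices D_{n-W+1}, ..., D_n.

    Assumes time is the first axis of D and that n is 1-based,
    matching the notation on the slide.
    """
    if not 1 <= W <= n <= D.shape[0]:
        raise ValueError("need 1 <= W <= n <= number of time slices")
    # 1-based slices n-W+1 .. n are 0-based rows n-W .. n-1.
    return D[n - W:n]

# Hypothetical stream: 8 time slices, each 2 types x 3 locations.
D = np.arange(8 * 2 * 3, dtype=float).reshape(8, 2, 3)
window = tensor_window(D, n=6, W=3)   # slices D_4, D_5, D_6
print(window.shape)                   # (3, 2, 3)
```

As new slices arrive, n grows while the window size W stays fixed, so the window always covers the most recent W slices.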
Problem 1: static tensor mining
Given a tensor D, find D' = [G; U0, U1, U2] such that
- the space requirement of D' is small
- the reconstruction error e = ||D - D'|| / ||D|| is small
- both spatial and temporal patterns are revealed
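To make the two criteria concrete, here is a minimal sketch that fits a Tucker model with a truncated higher-order SVD and reports e and the number of stored values. Truncated HOSVD is one common way to obtain such a model and is used here as an assumption, not as the method of the slides; the helper names and the synthetic tensor are hypothetical.

```python
import numpy as np

def unfold(T, mode):
    """Matricize T so that `mode` indexes the rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along axis `mode`."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def tucker_hosvd(D, ranks):
    """Truncated HOSVD: leading left singular vectors of each unfolding."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(D, mode), full_matrices=False)
        factors.append(U[:, :r])
    G = D
    for mode, U in enumerate(factors):
        G = mode_product(G, U.T, mode)   # project onto each subspace
    return G, factors

def reconstruct(G, factors):
    """Expand the core back to the original size: D' = [G; U0, U1, U2]."""
    Dp = G
    for mode, U in enumerate(factors):
        Dp = mode_product(Dp, U, mode)
    return Dp

# Synthetic tensor (time x location x modality) with multilinear rank (2, 2, 2).
rng = np.random.default_rng(0)
core = rng.standard_normal((2, 2, 2))
mats = [rng.standard_normal((size, 2)) for size in (20, 6, 4)]
D = reconstruct(core, mats)

G, factors = tucker_hosvd(D, ranks=(2, 2, 2))
e = np.linalg.norm(D - reconstruct(G, factors)) / np.linalg.norm(D)
stored = G.size + sum(U.size for U in factors)
print(f"e = {e:.2e}, stored values: {stored} of {D.size}")
```

Because the synthetic tensor is exactly low-rank, e is at the level of floating-point round-off while the model stores 68 values instead of 480; real data would trade some error for the space saving.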
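Interpreting projection matrices (SVD-wise)
[Figure: a document-to-term matrix (CS and MD documents; terms data, graph, java, brain, lung) written as the product of three matrices: documents to document HCs x strength of each concept x terms to term HCs, where HC = hidden concept]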
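The SVD reading of the figure can be tried on a toy document-to-term matrix. The counts below are made up for illustration (only the term names come from the slide), and the variable names are assumptions.

```python
import numpy as np

terms = ["data", "graph", "java", "brain", "lung"]
# Hypothetical counts: rows 0-3 are CS documents, rows 4-6 are MD documents.
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                       # keep two hidden concepts
doc_to_hc = U[:, :k]        # documents to document HCs
strength = s[:k]            # strength of each concept
term_to_hc = Vt[:k].T       # terms to term HCs

for j in range(k):
    loads = {t: round(abs(v), 2) for t, v in zip(terms, term_to_hc[:, j])}
    print(f"HC {j + 1}: strength {strength[j]:.2f}, term loadings {loads}")
```

In this toy matrix the first hidden concept loads only on data, graph, and java (a "CS" concept) and the second only on brain and lung (an "MD" concept); signs of singular vectors are arbitrary, so the printout shows absolute values.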
Problem 2: dynamic tensor mining
Given the tensor window D(n, W) and the old Tucker model for D(n-1, W), find the new Tucker model D'(n, W) = [G; U0, U1, U2] such that
- the space requirement of D'(n, W) is small
- the reconstruction error e = ||D(n, W) - D'(n, W)|| / ||D(n, W)|| is small
- both spatial and temporal patterns are revealed
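Static 2-heads method
In: D
Out: D' = [G; U0, U1, U2]
- Spatial compression: Tucker decomposition (Tucker-2 over the location and modality modes, giving U1 and U2)
- Temporal compression: wavelet transform along the time mode (U0 is the fixed transform matrix; G holds the wavelet coefficients)
- Sparsify the core tensor G into G'; e^2 = 1 - ||G||^2 / ||D||^2
[Figure: D (time x location x modality) is reduced by Tucker-2 to a core with factors U1 (location) and U2 (modality); the fixed wavelet transform matrix U0 turns the time mode into wavelet coefficients G, which are sparsified into G']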
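The three steps can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes an orthonormal Haar matrix for U0, obtains U1 and U2 from the SVD of the mode unfoldings, and sparsifies by keeping the `keep` largest-magnitude core entries; all function names are hypothetical. Under these assumptions (orthonormal factors) the squared relative error can be read off the norm of the retained core, which the last lines check numerically.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar transform matrix for n a power of two."""
    if n == 1:
        return np.array([[1.0]])
    H = haar_matrix(n // 2)
    smooth = np.kron(H, [1.0, 1.0])                # coarser levels act on pairwise sums
    detail = np.kron(np.eye(n // 2), [1.0, -1.0])  # finest-level pairwise differences
    return np.vstack([smooth, detail]) / np.sqrt(2.0)

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def static_two_heads(D, r1, r2, keep):
    """D is time x location x modality; the time length must be a power of two."""
    U0 = haar_matrix(D.shape[0]).T                               # fixed temporal basis
    U1 = np.linalg.svd(unfold(D, 1), full_matrices=False)[0][:, :r1]
    U2 = np.linalg.svd(unfold(D, 2), full_matrices=False)[0][:, :r2]
    G = D
    for mode, U in enumerate((U0, U1, U2)):
        G = mode_product(G, U.T, mode)
    # Sparsify: zero all but the `keep` largest-magnitude coefficients.
    cutoff = np.sort(np.abs(G), axis=None)[-keep]
    G_sparse = np.where(np.abs(G) >= cutoff, G, 0.0)
    return G_sparse, (U0, U1, U2)

rng = np.random.default_rng(0)
D = rng.standard_normal((8, 5, 4))          # hypothetical data
G_sparse, factors = static_two_heads(D, r1=3, r2=2, keep=20)

D_approx = G_sparse
for mode, U in enumerate(factors):
    D_approx = mode_product(D_approx, U, mode)

e2_direct = np.linalg.norm(D - D_approx) ** 2 / np.linalg.norm(D) ** 2
e2_core = 1.0 - np.linalg.norm(G_sparse) ** 2 / np.linalg.norm(D) ** 2
print(e2_direct, e2_core)                   # the two values agree
```

The practical point is that the error can be monitored from the core alone, without reconstructing D', which makes it cheap to decide how many coefficients to keep.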
Reminder: Wavelets
[Figure, built up over several slides: the signal x0 … x7 is combined pairwise with (+) and (-) to give the level-1 coefficients s1,k and d1,k; the same step applied to the s1,k gives level 2 (s2,0, d2,0), and so on]
Q: map each coefficient on the time-frequency plane
[Figure: scalogram with axes t (time) and f (frequency)]
Reminder: Wavelets
The whole procedure can be rewritten as a matrix-vector multiplication y = Ax, where x = [x0 x1 … x7]^T, y is the vector of resulting wavelet coefficients, and A is the transform matrix [matrix shown as a figure on the slide].
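Since the matrix itself is not reproduced in the text, the sketch below builds one possible A and checks that it matches the pairwise (+)/(-) cascade. It assumes the Haar wavelet with orthonormal scaling (each sum and difference divided by sqrt(2)) and a coarse-to-fine ordering of coefficients; the matrix on the slide may use a different normalization or row order.

```python
import numpy as np

def haar_cascade(x):
    """Repeated pairwise sums (+) and differences (-), coarsest output first."""
    s = np.asarray(x, dtype=float)
    details = []
    while len(s) > 1:
        d = (s[0::2] - s[1::2]) / np.sqrt(2.0)   # detail coefficients of this level
        s = (s[0::2] + s[1::2]) / np.sqrt(2.0)   # smooth coefficients of this level
        details.insert(0, d)                     # coarser levels go in front
    return np.concatenate([s] + details)

def haar_matrix(n):
    """Matrix A such that A @ x equals haar_cascade(x); n must be a power of two."""
    if n == 1:
        return np.array([[1.0]])
    H = haar_matrix(n // 2)
    smooth = np.kron(H, [1.0, 1.0])
    detail = np.kron(np.eye(n // 2), [1.0, -1.0])
    return np.vstack([smooth, detail]) / np.sqrt(2.0)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])   # x0 ... x7
A = haar_matrix(8)
y = A @ x

print(np.allclose(y, haar_cascade(x)))     # True: matrix form equals the cascade
print(np.allclose(A @ A.T, np.eye(8)))     # True: A is orthonormal, so x = A.T @ y
```

Orthonormality is what lets a fixed A play the role of a projection matrix alongside the data-dependent ones.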
Dynamic 2-heads method
In: D(n, W) and the variance matrix C(i) for i = 1, 2
Out: D' = [G; U0, U1, U2]
- Incremental spatial compression
  - Update C(i)
  - Eigen-decompose C(i) to obtain Ui
- Temporal compression: wavelet transform
- Sparsify the core
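The incremental step can be sketched as below. The slides do not spell out the update rule, so this sketch assumes a sliding-window update in which C(i) gains the contribution of the newest slice and loses that of the slice leaving the window; the function names and toy data are hypothetical.

```python
import numpy as np

def unfold(X, mode):
    """Matricize a time slice so that `mode` indexes the rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def slide_covariance(C, new_slice, old_slice, mode):
    """Assumed update: add the newest slice, subtract the one leaving the window."""
    Xn, Xo = unfold(new_slice, mode), unfold(old_slice, mode)
    return C + Xn @ Xn.T - Xo @ Xo.T

def top_eigenvectors(C, r):
    """Eigenvectors of the r largest eigenvalues of the symmetric matrix C."""
    vals, vecs = np.linalg.eigh(C)           # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:r]
    return vecs[:, order]

rng = np.random.default_rng(1)
slices = rng.standard_normal((10, 5, 4))     # 10 time steps, 5 locations, 4 modalities
W = 4

# Axis 0 of a slice is location (gives U1), axis 1 is modality (gives U2).
C = [sum(unfold(S, m) @ unfold(S, m).T for S in slices[:W]) for m in (0, 1)]
for n in range(W, len(slices)):
    for m in (0, 1):
        C[m] = slide_covariance(C[m], slices[n], slices[n - W], m)

U1 = top_eigenvectors(C[0], 3)
U2 = top_eigenvectors(C[1], 2)

# Sanity check against recomputing from scratch on the final window.
batch = sum(unfold(S, 0) @ unfold(S, 0).T for S in slices[-W:])
print(np.allclose(C[0], batch))              # True
```

Each update touches only two slices and matrices of size N_i x N_i, so the cost per time step does not depend on recomputing over the whole window.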
Environmental sensor monitoring
- In: normalized sensor measurements (temperature, light, humidity, voltage)
- Out: projection matrices U1 and U2; core G' (wavelet coefficients)
- Mining guide: U1 and U2 reveal the patterns on location and modality, respectively; G' provides the patterns on time
[Figure: D (time x location x modality) decomposed into U1 (location), U2^T (modality), and core G']
Location patterns
[Figure: the 1st and 2nd hidden concepts of U1 plotted over locations 1 to 54]
- 1st HC (hidden concept): dominant trend, e.g., daily periodicity
- 2nd HC: exceptions (e.g., under AC)
Sensor modality patterns
[Figure: the 1st and 2nd hidden concepts of U2 plotted over the four modalities: volt, humid, temp, light]
- The 1st HC indicates the main sensor modality correlations: temperature and light are positively correlated, while humidity is anti-correlated with the rest
- The 2nd HC indicates an abnormal pattern caused by battery outage in some sensors
Temporal patterns
[Figure: scalograms of the core G']
- The 1st scalogram indicates daily periodicity
- The 2nd scalogram shows an abnormal flat trend due to battery outage
Experimental Report
Main results of our experiments:
- Wavelets vs. 2-heads: same compression ratio, but wavelets do not reveal spatial patterns
- Tucker vs. 2-heads: Tucker has a much worse compression ratio (2-15x worse) and reveals no temporal patterns
- Dynamic vs. static 2-heads: dynamic is much faster, with almost the same accuracy and the same patterns revealed
Related work
- Tensor mining
  - Vision: [Vasilescu'02, Xu'05]
  - Web: [Kolda'05, Sun'05]
  - Text: [Chew'07]
- Wavelets
  - DWT [Daubechies'92]
  - Incremental construction [Gilbert'03]
  - Forecasting [S. Papadimitriou'03]
Conclusion
We focused on multi-aspect streams. Our proposed 2-heads methods have the following properties:
- Spatio-temporal pattern discovery
- Streaming capability
- Error guarantees
- Compression (~10 to 1, for ~99% accuracy)