Jimeng Sun · Charalampos (Babis) E

Two heads better than one: pattern discovery in time-evolving multi-aspect data
Jimeng Sun · Charalampos (Babis) E. Tsourakakis · Evan Hoke · Christos Faloutsos · Tina Eliassi-Rad

Motivation: multi-aspect streams. [Figure: light, temperature, voltage and humidity readings from the sensors in the Intel Berkeley lab.]

Motivation: multi-aspect streams. Streams have multiple aspects, e.g., time, modality and location. The time aspect is special: it has a natural ordering and temporal correlations. The remaining (spatial) aspects exhibit strong but different correlations, both across modalities and across locations. How can we spot and track these correlations? [Figure: temperature, light, humidity and voltage streams.]

Outline: motivation (multi-aspect streams); problem definition; 2-heads methods; mining case study; experimental report.

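Data model. [Figure: tensor $D$ with time, type and location aspects; time slices $D_1, \dots, D_n$; the window $D(6,3)$.] The input is an $n \times N_1 \times N_2 \times \dots \times N_M$ tensor $D$, where $n$ increases over time. Time slice: $D_i$ is the $i$-th slice of $D$, of size $N_1 \times N_2 \times \dots \times N_M$. Tensor window: $D(n,W) = \{D_{n-W+1}, \dots, D_n\}$, the last $W$ time slices ending at time $n$.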
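A minimal sketch of the data model, assuming the stream is held in memory as a NumPy array with time as the first axis. The helper names `time_slice` and `tensor_window` are hypothetical; slices are 1-indexed to match the slide's notation.

```python
import numpy as np

def time_slice(D, i):
    """Return D_i, the i-th time slice (1-indexed, as on the slide)."""
    return D[i - 1]

def tensor_window(D, n, W):
    """Return D(n, W) = {D_{n-W+1}, ..., D_n}: the last W slices ending at time n.

    D is assumed to have shape (n_total, N1, ..., NM), with time first.
    """
    if not 1 <= W <= n <= D.shape[0]:
        raise ValueError("need 1 <= W <= n <= number of time slices")
    return D[n - W:n]

# Hypothetical stream: 10 time ticks, 4 locations, 3 sensor types.
rng = np.random.default_rng(0)
D = rng.normal(size=(10, 4, 3))
window = tensor_window(D, n=6, W=3)   # D(6, 3), as in the figure
print(window.shape)                   # (3, 4, 3)
```

The window holds slices $D_4$, $D_5$ and $D_6$. Slicing returns a view, so no data is copied.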

Problem 1: static tensor mining. Given a tensor $D$, find a Tucker model $D' = [G; U_0, U_1, U_2]$ that meets three goals. The space requirement of $D'$ is small. The reconstruction error $e = \|D - D'\| / \|D\|$ is small. Both spatial and temporal patterns are revealed.
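The two measurable goals can be made concrete with a short sketch. It covers the reconstruction $D' = [G; U_0, U_1, U_2]$, the relative error $e$ under the Frobenius norm, and a simple space count. The helper names are hypothetical. The space count (core entries plus factor entries) is an assumption and may differ from the authors' accounting.

```python
import numpy as np

def mode_product(T, M, mode):
    """n-mode product: multiply every mode-`mode` fiber of T by the matrix M."""
    T_front = np.moveaxis(T, mode, 0)
    X = M @ T_front.reshape(T_front.shape[0], -1)
    return np.moveaxis(X.reshape((M.shape[0],) + T_front.shape[1:]), 0, mode)

def tucker_reconstruct(G, Us):
    """D' = [G; U0, U1, U2], i.e. G multiplied by U_k along each mode k."""
    X = G
    for mode, U in enumerate(Us):
        X = mode_product(X, U, mode)
    return X

def relative_error(D, D_approx):
    """e = ||D - D'|| / ||D|| with the Frobenius norm."""
    return np.linalg.norm(D - D_approx) / np.linalg.norm(D)

def space_cost(G, Us):
    """Stored values: core entries plus factor-matrix entries (assumed accounting)."""
    return G.size + sum(U.size for U in Us)

rng = np.random.default_rng(1)
Us = [np.linalg.qr(rng.normal(size=(N, 2)))[0] for N in (10, 4, 3)]  # orthonormal columns
G = rng.normal(size=(2, 2, 2))
D = tucker_reconstruct(G, Us)                      # exactly rank-(2, 2, 2) by construction
G_hat = tucker_reconstruct(D, [U.T for U in Us])   # project D onto the factors
print(relative_error(D, tucker_reconstruct(G_hat, Us)))   # ~0 up to rounding
print(space_cost(G_hat, Us), "stored values vs", D.size)  # 42 stored values vs 120
```

In the 2-heads setting, $U_0$ is a fixed transform matrix. A tighter space count would therefore exclude $U_0$ and count only the nonzero core entries.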

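Interpreting projection matrices (SVD-wise). [Figure: SVD of a document-to-term matrix (CS documents on the terms data, graph, java; MD documents on the terms brain, lung) into a documents-to-document-HCs matrix, the strength of each concept, and a term-to-term-HCs matrix.]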
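Here is a small sketch of the SVD reading, in the spirit of the figure. The document-to-term counts are invented for illustration. Each hidden concept (HC) comes with a strength (its singular value) and the terms it loads on.

```python
import numpy as np

# Hypothetical toy matrix: rows = documents (3 CS, 2 MD), columns = terms.
terms = ["data", "graph", "java", "brain", "lung"]
A = np.array([
    [1, 1, 1, 0, 0],   # CS documents
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 2, 2],   # MD documents
    [0, 0, 0, 3, 3],
], dtype=float)

# A = U @ diag(s) @ Vt: U maps documents to HCs, s gives the strength of
# each concept, and the rows of Vt map HCs to terms.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = int(np.sum(s > 1e-10))   # number of non-negligible concepts
concepts = []
for c in range(k):
    members = [terms[j] for j in range(len(terms)) if abs(Vt[c, j]) > 0.1]
    concepts.append(members)
    print(f"HC{c + 1}: strength {s[c]:.2f}, terms {members}")
# HC1: strength 5.10, terms ['brain', 'lung']
# HC2: strength 4.24, terms ['data', 'graph', 'java']
```

The projection matrices $U_1$ and $U_2$ in the tensor setting are read the same way. Each column is a hidden concept, and large entries mark the locations or modalities that take part in it.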

Problem 2: dynamic tensor mining. Given the tensor window $D(n,W)$ and the old Tucker model for $D(n-1,W)$, find the new Tucker model $D'(n,W) = [G; U_0, U_1, U_2]$ that meets three goals. The space requirement of $D'(n,W)$ is small. The reconstruction error $e = \|D(n,W) - D'(n,W)\| / \|D(n,W)\|$ is small. Both spatial and temporal patterns are revealed.

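Static 2-heads method. In: $D$. Out: $D' = [G; U_0, U_1, U_2]$. Spatial compression: a Tucker-2 decomposition along the location and modality aspects yields the projection matrices $U_1$ and $U_2$. Temporal compression: a wavelet transform along time, where $U_0$ is the fixed transform matrix and the core $G$ holds wavelet coefficients. Finally, sparsify the core tensor $G$ into $G'$. The squared reconstruction error is $e^2 = 1 - \|G\|^2 / \|D\|^2$. [Figure: pipeline from $D$ (location, modality, time) through Tucker-2 ($U_1$, $U_2^T$) and the fixed wavelet transform matrix $U_0$ to the sparsified core $G'$.]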
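Below is a rough sketch of the static pipeline under several assumptions:

- $D$ has axes (time, location, modality), and the number of time ticks is a power of two.
- $U_1$ and $U_2$ are taken in one pass from the top eigenvectors of each unfolding's Gram matrix. This is an HOSVD-style shortcut, not an iterated Tucker fit.
- The temporal transform is an orthonormal Haar matrix $A$, so $U_0 = A^T$.
- Sparsification keeps the `keep` largest-magnitude core entries.

All function names are hypothetical. Because every factor has orthonormal columns, the error identity can be checked numerically against a direct reconstruction.

```python
import numpy as np

def haar_matrix(T):
    """Orthonormal Haar matrix A (T a power of two), so that y = A @ x."""
    if T == 1:
        return np.array([[1.0]])
    coarse = haar_matrix(T // 2)
    smooth = np.kron(coarse, [1.0, 1.0]) / np.sqrt(2)
    detail = np.kron(np.eye(T // 2), [1.0, -1.0]) / np.sqrt(2)
    return np.vstack([smooth, detail])

def unfold(D, mode):
    """Mode-`mode` matricization: rows index that mode, columns all other modes."""
    return np.moveaxis(D, mode, 0).reshape(D.shape[mode], -1)

def mode_product(T, M, mode):
    """n-mode product: multiply every mode-`mode` fiber of T by the matrix M."""
    X = M @ unfold(T, mode)
    rest = [n for i, n in enumerate(T.shape) if i != mode]
    return np.moveaxis(X.reshape([M.shape[0]] + rest), 0, mode)

def static_two_heads(D, ranks, keep):
    """Sketch of static 2-heads for D with axes (time, location, modality).

    ranks = (r1, r2): hidden concepts kept for location and modality.
    keep: number of largest-magnitude core coefficients kept when sparsifying.
    """
    # Spatial compression: top eigenvectors of each unfolding's Gram matrix.
    Us = []
    for mode, r in zip((1, 2), ranks):
        X = unfold(D, mode)
        _, eigvecs = np.linalg.eigh(X @ X.T)   # eigenvalues in ascending order
        Us.append(eigvecs[:, ::-1][:, :r])     # keep the top-r eigenvectors
    U1, U2 = Us
    # Temporal compression: fixed Haar transform along time (U0 = A^T).
    A = haar_matrix(D.shape[0])
    G = mode_product(mode_product(mode_product(D, A, 0), U1.T, 1), U2.T, 2)
    # Sparsify: zero all but the `keep` largest-magnitude coefficients
    # (ties at the threshold may keep a few more).
    thresh = np.sort(np.abs(G), axis=None)[-keep]
    G_sparse = np.where(np.abs(G) >= thresh, G, 0.0)
    # The slide's e^2 = 1 - ||G||^2 / ||D||^2, evaluated on the sparsified core.
    err2 = 1.0 - np.sum(G_sparse ** 2) / np.sum(D ** 2)
    return G_sparse, (A.T, U1, U2), err2

rng = np.random.default_rng(2)
D = rng.normal(size=(8, 5, 4))   # hypothetical: 8 ticks, 5 locations, 4 modalities
G_s, (U0, U1, U2), err2 = static_two_heads(D, ranks=(3, 2), keep=20)
D_hat = mode_product(mode_product(mode_product(G_s, U0, 0), U1, 1), U2, 2)
direct = np.sum((D - D_hat) ** 2) / np.sum(D ** 2)
print(err2, direct)   # agree up to floating-point rounding
```

The error identity holds here for two reasons. Projecting onto orthonormal factors preserves the energy of the core. Dropping core entries removes exactly their squared magnitude.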

Reminder: Wavelets. Start from the signal $x_0, x_1, \dots, x_7$. Level 1: combine adjacent pairs by sums (+) and differences (−) into smooth coefficients $s_{1,0}, s_{1,1}, \dots$ and detail coefficients $d_{1,0}, d_{1,1}, \dots$. Level 2: apply the same step to the level-1 smooth coefficients, giving $s_{2,0}, d_{2,0}, \dots$; and so on, level by level. Q: how do we map each coefficient onto the time-frequency plane? A: the scalogram, with time $t$ on one axis and frequency $f$ on the other.

Reminder: Wavelets. The whole procedure can be rewritten as a matrix-vector multiplication $y = Ax$. Here $x = [x_0\ x_1\ \dots\ x_7]^T$, $y$ holds the resulting wavelet coefficients, and $A$ is the fixed wavelet transform matrix.
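As one concrete choice of $A$, here is a sketch that builds an orthonormal Haar matrix recursively. The normalization is an assumption and may not match the matrix on the original slide. Rows are ordered $[s_{3,0}, d_{3,0}, d_{2,0}, d_{2,1}, d_{1,0}, \dots, d_{1,3}]$ for $n = 8$.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar matrix A (n a power of two) so that y = A @ x.

    Row order: [s_{L,0}, d_{L,0}, d_{L-1,0}, d_{L-1,1}, ..., d_{1,0}, ..., d_{1,n/2-1}].
    """
    if n == 1:
        return np.array([[1.0]])
    coarse = haar_matrix(n // 2)
    smooth = np.kron(coarse, [1.0, 1.0]) / np.sqrt(2)           # recurse on pairwise sums
    detail = np.kron(np.eye(n // 2), [1.0, -1.0]) / np.sqrt(2)  # level-1 differences
    return np.vstack([smooth, detail])

A = haar_matrix(8)
x = np.arange(1.0, 9.0)   # x = [x0 ... x7] = [1, 2, ..., 8]
y = A @ x                 # wavelet coefficients
x_back = A.T @ y          # A is orthonormal, so its inverse is its transpose
```

Orthonormality is the property the 2-heads error formula relies on: the transform preserves norms, and the fixed matrix never needs to be stored.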

Dynamic 2-heads method. In: the window $D(n,W)$ and the variance matrices $C^{(i)}$ for $i = 1, 2$. Out: $D' = [G; U_0, U_1, U_2]$. Incremental spatial compression: update each $C^{(i)}$, then eigen-decompose it. Temporal compression: wavelet transform. Finally, sparsify the core.
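A sketch of the incremental spatial step follows. The slides only say "update $C^{(i)}$". Here a sliding window is assumed: each new slice's contribution is added, and the contribution of the slice leaving the window is subtracted. Under that assumption, $C^{(1)}$ equals the Gram matrix of the window's location unfolding. The eigenvectors therefore match a from-scratch computation on the window. The class name is hypothetical.

```python
import numpy as np
from collections import deque

class WindowedSpatialModel:
    """Hypothetical helper for the incremental spatial step of dynamic 2-heads.

    Keeps variance matrices C1 (location x location) and C2 (modality x modality)
    over the last W slices; each slice is an N1 x N2 matrix.
    """

    def __init__(self, N1, N2, W):
        self.W = W
        self.window = deque()
        self.C1 = np.zeros((N1, N1))
        self.C2 = np.zeros((N2, N2))

    def update(self, slice_new):
        """Add the newest slice D_n; drop D_{n-W} once the window is full."""
        self.window.append(slice_new)
        self.C1 += slice_new @ slice_new.T
        self.C2 += slice_new.T @ slice_new
        if len(self.window) > self.W:
            old = self.window.popleft()
            self.C1 -= old @ old.T
            self.C2 -= old.T @ old

    def projections(self, r1, r2):
        """Eigen-decompose C1 and C2; keep the top-r eigenvectors as U1 and U2."""
        _, V1 = np.linalg.eigh(self.C1)   # eigenvalues in ascending order
        _, V2 = np.linalg.eigh(self.C2)
        return V1[:, ::-1][:, :r1], V2[:, ::-1][:, :r2]

rng = np.random.default_rng(3)
stream = rng.normal(size=(20, 6, 4))   # hypothetical: 20 slices, 6 locations x 4 modalities
model = WindowedSpatialModel(6, 4, W=8)
for D_t in stream:
    model.update(D_t)
U1, U2 = model.projections(r1=2, r2=2)

# Check: the incremental C1 equals the one recomputed on the last W slices.
recent = stream[-8:]
C1_batch = sum(S @ S.T for S in recent)
```

Each update costs a few small matrix products, independent of $W$. This is one plausible reason an incremental scheme can be much faster than rerunning the static method on every window.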

Environmental sensor monitoring. In: normalized sensor measurements (temperature, light, humidity, voltage). Out: the projection matrices $U_1$ and $U_2$ and the core $G'$ (wavelet coefficients). Mining guide: $U_1$ and $U_2$ reveal the patterns on location and modality, respectively, and $G'$ provides the patterns on time. [Figure: $D$ (location, modality, time) decomposed into $U_1$, $G'$ and $U_2^T$.]

Location patterns. 1st HC (hidden concept): the dominant trend, e.g., daily periodicity. 2nd HC: exceptions (e.g., locations under the AC). [Figure: location patterns in $U_1$ for the 1st hidden concept (daily periodicity) and the 2nd hidden concept (exceptions), over locations 1 to 54.]

Sensor modality patterns. The 1st HC indicates the main correlations between sensor modalities: temperature and light are positively correlated, while humidity is anti-correlated with the rest. The 2nd HC indicates an abnormal pattern, caused by a battery outage in some sensors. [Figure: loadings of volt, humid, temp and light in $U_2$ for the 1st and 2nd hidden concepts.]

Temporal patterns. The 1st scalogram indicates daily periodicity. The 2nd scalogram shows an abnormal flat trend due to the battery outage. [Figure: scalograms for the 1st and 2nd hidden concepts.]

Experimental Report. Main results of our experiments:
- Wavelets vs. 2-heads: same compression ratio, but wavelets do not reveal spatial patterns.
- Tucker vs. 2-heads: Tucker has a much worse compression ratio (2 to 15 times worse) and reveals no temporal patterns.
- Dynamic vs. static 2-heads: dynamic is much faster, with almost the same accuracy, and reveals the same patterns.

Related work. Tensor mining: vision [Vasilescu’02, Xu’05]; web [Kolda’05, Sun’05]; text [Chew’07]. Wavelets: DWT [Daubechies’92]; incremental construction [Gilbert’03]; forecasting [S. Papadimitriou’03].

Conclusion. We focused on multi-aspect streams. Our proposed 2-heads methods offer spatio-temporal pattern discovery, streaming capability, error guarantees, and compression of roughly 10:1 at roughly 99% accuracy.