1
Parsimonious Linear Fingerprinting for Time Series
Lei Li, joint work with B. Aditya Prakash and Christos Faloutsos
School of Computer Science, Carnegie Mellon University
© L. Li, 2010
Machine Learning Lunch, 2010/11/29
2
Motion Capture
Understanding human motion
Robots to assist the disabled
3
Network security
Anomaly detection in computer network traffic
BGP updates in network
4
Data stream monitoring
Monitoring a datacenter with 5000 servers: 1TB data per day, 55 million streams ([Reeves+ 2009])
Temperature in datacenter
5
Find similar motions
SELECT * FROM WHERE data LIKE
6
Motivation
Answering similarity queries in time series databases (TSDB)
SELECT * FROM TSDB WHERE data LIKE “ ”
7
Research Philosophy
Database + Machine learning: “Databased Learning”
Statistically effective: deeper patterns/functional relations (regression, Bayesian networks, SVM/kernels, clustering)
Efficient, scalable (indexing, hashing, query optimization, buffering/caching)
8
Database + Machine learning: “Databased Learning”
Example: find similar motions/time series
9
Beyond similarity queries
Feature extraction
Query & indexing
Similarity function, clustering/classification
Visualization
Forecasting
Compression
11
CMU SCS
Outline
Motivation
Proposed Method: Intuition & Example
Experiments & Results
PLiF: Insight Details
Conclusion
12
Intuition: Goals
G1: Good features/similarity function
G2: Good compression
G3: Ability to forecast
G4: Scalability
13
Intuition: Goals
G1: Good features/similarity function
–(1a) lag independent
–(1b) frequency proximity
–(1c) grouping harmonics
G2: Good compression
G3: Ability to forecast
G4: Scalability
14
Example: synthetic signals
Equations:
(a) sin(2πt/100)
(b) cos(2πt/100)
(c) sin(2πt/98 + π/6)
(d) sin(2πt/110) + 0.2 sin(2πt/30)
(e) cos(2πt/110) + 0.2 sin(2πt/30 + π/4)
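The five signals above can be generated directly, which helps when trying the later steps on a small example. This is a minimal sketch: the function name `synthetic_signals` and the length of 500 samples are assumptions (the slides do not state the length).

```python
import numpy as np

def synthetic_signals(length=500):
    """Return a (5, length) array whose rows are signals (a) through (e)."""
    t = np.arange(length)
    w = 2 * np.pi * t  # shared 2*pi*t factor
    return np.vstack([
        np.sin(w / 100),                                     # (a)
        np.cos(w / 100),                                     # (b)
        np.sin(w / 98 + np.pi / 6),                          # (c)
        np.sin(w / 110) + 0.2 * np.sin(w / 30),              # (d)
        np.cos(w / 110) + 0.2 * np.sin(w / 30 + np.pi / 4),  # (e)
    ])

signals = synthetic_signals()
print(signals.shape)
```

Rows (a) and (b) differ only by a time shift, (c) has a nearby frequency, and (d) and (e) share the same pair of harmonics.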
15
Intuition (1a): time shift
Equations: (a) sin(2πt/100); (b) cos(2πt/100); (c) sin(2πt/98 + π/6); (d) sin(2πt/110) + 0.2 sin(2πt/30); (e) cos(2πt/110) + 0.2 sin(2πt/30 + π/4)
e.g. left-foot-start walking vs. right-foot-start walking
16
Intuition (1b): nearby frequency & time shift
Equations: (a) sin(2πt/100); (b) cos(2πt/100); (c) sin(2πt/98 + π/6); (d) sin(2πt/110) + 0.2 sin(2πt/30); (e) cos(2πt/110) + 0.2 sin(2πt/30 + π/4)
e.g. running vs. fast running
17
Intuition (1c): groups of harmonics
Equations: (a) sin(2πt/100); (b) cos(2πt/100); (c) sin(2πt/98 + π/6); (d) sin(2πt/110) + 0.2 sin(2πt/30); (e) cos(2πt/110) + 0.2 sin(2πt/30 + π/4)
~ human voices
18
Q: only two numbers to represent each!
[Figure: Proposed PLiF; color scale - to +]
19
Intuition: how it works
Find hidden variables/patterns:
HV1: f=1/100
HV2: f=1/110
HV3: f=1/30
20
Intuition: how it works
Find hidden variables/patterns:
HV2: f=1/110
HV3: f=1/30
Co-occur: HV2’ groups HV2 and HV3
21
[Table: columns HV1 and HV2’; extracted entries: 1.0 0 0 0.9 0 0 1.0 0]
22
[Figure: HV1 vs. HV2’; extracted values: 1.00 0 0.90 01.0 0]
23
Why does it work? How to interpret it?
[Figure: Proposed PLiF; color scale - to +]
Harmonics 1/100
Group of harmonics 1/110 & 1/30
24
Basic Idea
Projection to harmonics (a.k.a. frequency)
Pattern/harmonics 1/100: “walking”
Pattern/harmonics 1/110 & 1/30: “running”
25
Why not SVD/PCA?
[Figure: PCA vs. PLiF; color scale - to +]
PCA: no clear grouping. Confused!
26
Outline
Motivation
Proposed Method: Intuition & Example
Experiments & Results
PLiF: Insight Details
Conclusion
27
Experiment: Goals to Verify
G1: Good features (low dimensional)
G2: Good compression
G3: Ability to forecast
G4: Scalability
28
Experiments
Datasets:
BGP: 10 * 103k
Chlorine: 166 * 4k
Mocap: 49 * 100-500
29
Result – Visualization
[Figure: Mocap, first two PLiF “fingerprints”]
With PLiF, very high-dimensional time sequences can now be visualized.
30
Result – Clustering
[Figure: Mocap, first two PLiF “fingerprints”; classes: walking, running]
PLiF + thresholding: accuracy = 46/49
Pred.   walk  run
walk    26    3
run     0     20
PCA + kmeans: accuracy = 25/49
Pred.   walk  run
walk    15    13
run     11    10
31
Result – Clustering
BGP data: PLiF + hierarchical clustering
32
Intuition: Goals
G1: Good features/similarity function
G2: Good compression
G3: Ability to forecast
G4: Scalability
33
Result – Compression
Chlorine: 166 * 4k
Storing only the PLiF features & a sampling of the hidden variables
[Figure: axes compression ratio and error; annotation: Ideal]
34
Result – Compression
Mocap: 93 * 300
Storing only the PLiF features & a sampling of the hidden variables
[Figure: axes compression ratio and error; annotation: Ideal]
35
Intuition: Goals
G1: Good features/similarity function
G2: Good compression
G3: Ability to forecast (later)
G4: Scalability
36
Scalability
Wall clock time grows linearly with sequence length
[Figure: two panels, wall clock time (s) vs. sequence length]
37
Scalability
Optimized algorithm (details later)
[Figure: wall clock time, PLiF-basic vs. PLiF; slope = 1/3]
38
Intuition: Goals
G1: Good features/similarity function
G2: Good compression
G3: Ability to forecast (later)
G4: Scalability
39
Outline
Motivation
Proposed Method: Intuition & Example
Experiments & Results
PLiF: Insight Details
Conclusion
40
Proposed Method: PLiF
S1: Learning Dynamics
S2: Finding Canonical Form
S3: Handling the Lag
S4: Grouping Harmonics
41
Step 1. Learning Dynamics
Use machine learning to find:
–“Transition” of hidden variables (HV): from one time tick to the next
–“Mixing” weights: HVs → observed data
[Figure: time series of hidden variables]
42
Underlying Model: Linear Dynamical Systems (details)
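The model equations on this slide did not survive extraction. As a hedged reconstruction, the standard linear dynamical system can be written with the parameter names from the parameter table on the next slide (µ0, Q0, A, Q, C, R). The symbols z_n for the hidden variables and x_n for the observations are assumptions, not taken from the slides.

```latex
\begin{aligned}
z_1     &= \mu_0 + \omega_0,        & \omega_0      &\sim \mathcal{N}(0, Q_0) \\
z_{n+1} &= A\, z_n + \omega_n,      & \omega_n      &\sim \mathcal{N}(0, Q)   \\
x_n     &= C\, z_n + \varepsilon_n, & \varepsilon_n &\sim \mathcal{N}(0, R)
\end{aligned}
```

The second line is the "transition" of hidden variables and the third line is the "mixing" into observed data.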
43
Linear Dynamical Systems: parameters (details)
Symbol: name; meaning & example
µ0: initial state for hidden variable; e.g. initial position, velocity & acceleration
A: transition matrix; how the states move forward, e.g. a soccer ball flying in the air
C: transmission/projection/output matrix; hidden state → observation, e.g. a camera taking a picture of the soccer ball
Q0: initial covariance
Q: transition covariance; how precise the soccer ball's motion is
R: transmission/projection covariance, i.e. observation noise; e.g. how accurate the camera is
44
Dynamics/Transition in Hidden Variables
HV(t+1) = transition matrix × HV(t)
–enables forecasting
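To illustrate why the transition matrix enables forecasting, the sketch below rolls a hidden state forward and maps it to observations. It is a noise-free toy, not the authors' implementation: the function name `forecast`, the rotation matrix used for A, and the output matrix C are all assumptions chosen so the result is easy to check.

```python
import numpy as np

def forecast(A, C, z_last, steps):
    """Roll the hidden state forward with A and map each state to observations with C."""
    preds = []
    z = z_last
    for _ in range(steps):
        z = A @ z            # HV(t+1) = A * HV(t), ignoring noise
        preds.append(C @ z)  # expected observation at that tick
    return np.array(preds)

# Toy dynamics (assumed): a rotation with period 100, observed through its first coordinate.
theta = 2 * np.pi / 100
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
C = np.array([[1.0, 0.0]])
z0 = np.array([1.0, 0.0])

y = forecast(A, C, z0, steps=100)
print(y.shape)
```

With this choice of A, the forecast traces out cos(2πt/100), so one full period brings the observation back to its starting value.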
45
Mixing Weights
[Figure: mixing/output matrix C; color scale - to +]
46
Learning the Parameters (details)
Expectation-Maximization: maximizing the expected log likelihood
Standard EM: expensive!
Further speed optimization in our PLiF: matrix inversion using the Woodbury matrix identity
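The slide does not say which inverse the Woodbury identity is applied to. As an illustrative sketch only, the code below assumes the common case of inverting R + C P Cᵀ with a diagonal R (m × m) and a small P (h × h, h much smaller than m), so that only an h × h system has to be solved. The function name and the diagonal-R assumption are mine, not from the slides.

```python
import numpy as np

def woodbury_inverse(R_diag, C, P):
    """Invert R + C P C^T, with R = diag(R_diag), C of shape (m, h), P of shape (h, h).

    Uses (R + C P C^T)^{-1} = R^{-1} - R^{-1} C (P^{-1} + C^T R^{-1} C)^{-1} C^T R^{-1}.
    """
    R_inv = 1.0 / R_diag                    # inverse of a diagonal matrix
    RinvC = C * R_inv[:, None]              # R^{-1} C, shape (m, h)
    small = np.linalg.inv(P) + C.T @ RinvC  # only an h x h matrix
    return np.diag(R_inv) - RinvC @ np.linalg.solve(small, RinvC.T)

rng = np.random.default_rng(0)
m, h = 50, 3
R_diag = rng.uniform(0.5, 1.5, size=m)
C = rng.normal(size=(m, h))
B = rng.normal(size=(h, h))
P = B @ B.T + np.eye(h)                     # symmetric positive definite

fast = woodbury_inverse(R_diag, C, P)
direct = np.linalg.inv(np.diag(R_diag) + C @ P @ C.T)
print(np.allclose(fast, direct))
```

The direct inverse is computed here only to check the identity; the saving comes from replacing an m × m inversion with an h × h one.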
47
Step 2: Canonicalization
But hidden variables are
–hard to interpret
–non-unique: many combinations are essentially the same
Intuition:
–make hidden variables compact and “uniquely” identified
48
Canonicalization adds Interpretability
[Figure: time series of HV before and after canonicalization (real part)]
“Harmonics”: f=1/110, f=1/100, f=1/30
Annotations: frequency, scaling (subtle)
49
Step 2: Canonicalization
Again, estimating how each signal is composed of “harmonics”/patterns, but in complex space
[Figure: mixing matrix (complex valued)]
50
Step 3: Handling Lag
Intuition:
–groups emerge
–reducing redundancy
–eliminating phase shift
[Figure: mixing matrix (complex valued); annotation: Conjugate!]
51
Step 3: Handling Lag
Idea:
–only magnitude counts
–removing duplicates
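The two ideas above can be sketched on a toy complex mixing matrix. This is an assumption-laden illustration rather than the authors' code: `lag_invariant_features` is a hypothetical helper, and the rule used to pick one column per conjugate pair (keep eigenvalues with non-negative imaginary part) is one possible convention.

```python
import numpy as np

def lag_invariant_features(C_h, eigvals, tol=1e-8):
    """Keep one column per conjugate pair, then take magnitudes to discard phase."""
    keep = np.imag(eigvals) >= -tol  # drops the conjugate duplicates
    return np.abs(C_h[:, keep])      # only magnitude counts

# Toy example (assumed): sin and cos at the same frequency, written as
# weights on exp(+i*omega*t) and exp(-i*omega*t).
omega = 2 * np.pi / 100
eigvals = np.array([np.exp(1j * omega), np.exp(-1j * omega)])
C_h = np.array([[-0.5j, 0.5j],  # sin(omega t)
                [0.5,   0.5]])  # cos(omega t)

features = lag_invariant_features(C_h, eigvals)
print(features)
```

The sine and cosine rows differ only in phase, so after taking magnitudes they end up with the same feature, which is the lag independence asked for in goal (1a).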
52
Step 3: Handling Lag
Interpretability
[Figure: color scale - to +; labels: harmonics 1/100, harmonics 1/110, harmonics 1/30]
53
Step 4: Grouping Harmonics
Intuition:
–still a little redundancy
Think Minimum Description Length
[Figure: color scale - to +; labels: harmonics 1/100, harmonics 1/110, harmonics 1/30]
54
Step 4: Grouping Harmonics
Dimensionality reduction with SVD/PCA: find U, S, V that solve $\min_{U,S,V} \lVert X - U S V^T \rVert^2$
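A truncated SVD is one way to realize this dimensionality reduction. In the sketch below the matrix `M` is a made-up magnitude matrix (signals × harmonics) loosely modeled on the synthetic example, and taking `U[:, :k] * s[:k]` as the low-dimensional features is an assumption about how the fingerprints are read off.

```python
import numpy as np

def group_harmonics(M, k):
    """Rank-k SVD of M (signals x harmonics); returns k-dim features and the reconstruction."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    fingerprints = U[:, :k] * s[:k]  # one k-dimensional row per signal
    approx = fingerprints @ Vt[:k]   # best rank-k approximation of M
    return fingerprints, approx

# Illustrative values only: columns are harmonics 1/100, 1/110, 1/30.
M = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.9, 0.0, 0.0],
              [0.0, 1.0, 0.2],
              [0.0, 1.0, 0.2]])

fingerprints, approx = group_harmonics(M, k=2)
print(fingerprints.shape)
```

Because harmonics 1/110 and 1/30 always co-occur in this toy matrix, two components are enough to reconstruct it, which mirrors the "only two numbers to represent each" claim.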
55
Step 4: Grouping Harmonics
[Figure: color scale - to +; labels: harmonics 1/100, group of harmonics 1/110 & 1/30]
56
Parsimonious Linear Fingerprinting
PLiF goals:
G1: Good features/similarity function
–(1a) lag independent
–(1b) frequency proximity
–(1c) grouping harmonics
G2: Good compression
G3: Ability to forecast
G4: Scalability
PLiF algorithm steps:
S1: Learning Dynamics
S2: Canonical Form
S3: Handling Lag
S4: Grouping Harmonics
57
Outline
Motivation
Proposed Method: Intuition & Example
Experiments & Results
PLiF: Insight Details
Conclusion
58
Conclusion
Need for finding compact representations of time series data
Intuition & insights of PLiF
Interpretation of PLiF & how it works
Experiments on a diverse set of data
–It really works!
–It is fast & scalable.
59
Take-away message
Need to find good features for time series: similarity function, compression, forecasting
Design the method to meet time series (TS) characteristics
–e.g. phase shift/lag correlation
When to use PLiF
–near-periodic & relatively smooth signals
60
References
Lei Li, B. Aditya Prakash, Christos Faloutsos. Parsimonious Linear Fingerprinting for Time Series. VLDB 2010.
Lei Li, Jim McCann, Christos Faloutsos, Nancy Pollard. DynaMMo: Mining and Summarization of Coevolving Sequences with Missing Values. ACM KDD 2009.
61
Questions? Thanks!
Christos Faloutsos, Lei Li, B. Aditya Prakash
http://www.cs.cmu.edu/~leili/
leili@cs.cmu.edu
62
BACKUP appendix
63
Why not Fourier (DFT)?
1. FT cannot do forecasting
65
Why not Fourier (DFT)?
1. FT cannot do forecasting
2. No arbitrary frequency
[Figure: FT spectrum vs. frequency, with the true freq. marked]
66
Why not Fourier (DFT)?
1. FT cannot do forecasting
2. No arbitrary frequency
3. Nearby frequencies are treated differently; not suited for comparing across signals
[Figure: freq.=5 vs. freq.=5.1]
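Points 2 and 3 can be seen in a small experiment. The sketch below compares the DFT magnitude spectra of two sinusoids with 5 and 5.1 cycles per window. The window of 1000 samples and the helper name `magnitude_spectrum` are assumptions made for illustration.

```python
import numpy as np

def magnitude_spectrum(freq, n=1000):
    """Magnitude of the real FFT of sin(2*pi*freq*t), with n samples of t in [0, 1)."""
    t = np.arange(n) / n
    return np.abs(np.fft.rfft(np.sin(2 * np.pi * freq * t)))

spec_a = magnitude_spectrum(5.0)  # whole number of cycles: a single sharp peak at bin 5
spec_b = magnitude_spectrum(5.1)  # falls between bins: energy leaks into neighboring bins

print(np.round(spec_a[4:8], 1))
print(np.round(spec_b[4:8], 1))
```

The DFT only has bins at whole-number frequencies for this window, so the 5.1 signal is spread over several bins and its spectrum looks quite different from that of the 5.0 signal, even though the two signals are nearly the same.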
67
Handling Missing Values
Lei Li, Jim McCann, Nancy Pollard, Christos Faloutsos. BoLeRO: A Principled Technique for Including Bone Length Constraints in Motion Capture Occlusion Filling. ACM SIGGRAPH / Eurographics Symposium on Computer Animation, 2010.
Lei Li, Jim McCann, Christos Faloutsos, Nancy Pollard. DynaMMo: Mining and Summarization of Coevolving Sequences with Missing Values. ACM KDD 2009.
Lei Li, Jim McCann, Christos Faloutsos, Nancy Pollard. Laziness is a virtue: Motion stitching using effort minimization. Eurographics 2008.
68
Details for Implementation
Read this only if you want to implement it
69
Modelling the data: Linear Dynamical Systems (details)
71
Learning the Dynamics (details)
Expectation-Maximization: maximizing the expected log likelihood
72
Finding Canonical Form (details)
Intuition: find the canonical dynamics
–take the eigenvalue decomposition of the transition matrix A
–compensate C accordingly, giving C_h
–C_h is a projection of the data to the dynamics
but...
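The canonicalization described above can be sketched in a few lines. The slide's formula for the compensation was lost in extraction. The code assumes C_h = C V, where V holds the eigenvectors of A, because this choice keeps the observations unchanged. The function name and the toy matrices are also assumptions.

```python
import numpy as np

def canonical_form(A, C):
    """Diagonalize A = V diag(eigvals) V^{-1} and compensate C so the outputs stay the same."""
    eigvals, V = np.linalg.eig(A)
    C_h = C @ V  # assumed compensation: C_h = C V
    return eigvals, V, C_h

# Toy transition matrix (assumed): a rotation with period 100.
theta = 2 * np.pi / 100
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
C = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])

eigvals, V, C_h = canonical_form(A, C)
freqs = np.abs(np.angle(eigvals)) / (2 * np.pi)  # frequency encoded by each eigenvalue

z = np.array([0.3, -1.2])        # a hidden state in the original coordinates
z_canon = np.linalg.solve(V, z)  # the same state in canonical coordinates
print(freqs)
```

The eigenvalues come out as a complex-conjugate pair whose angle encodes the frequency 1/100, and C_h is complex valued, which is the complex mixing matrix that the lag-handling step works on.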