Signal processing and Networking for Big Data Applications Lecture 19: Tensor Basics Zhu Han University of Houston Thanks for Dr. Hung Nguyen’s Slides
Outline 1. Basic concepts 2. Tensor operations 3. Tensor analysis 4. Applications 5. Tensor Voting 6. Conclusions
1. What is a tensor? A tensor is a multidimensional (n-way) array that generalizes vectors and matrices. A vector is a special case: a 1st-order tensor.
A matrix is a special case: a 2nd-order tensor.
3rd-order tensor: a multidimensional array, as available in most programming languages.
Tensors of arbitrary order
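A minimal sketch of tensor order, assuming NumPy arrays as the representation (the slides do not name a library). The array names and shapes are illustrative only; the order of a tensor is its number of indices, which NumPy reports as `ndim`.

```python
import numpy as np

# Illustrative arrays; shapes are arbitrary choices for this sketch.
vector = np.arange(4)                      # 1st-order tensor, shape (4,)
matrix = np.arange(12).reshape(3, 4)       # 2nd-order tensor, shape (3, 4)
tensor3 = np.arange(24).reshape(2, 3, 4)   # 3rd-order tensor, shape (2, 3, 4)
tensor4 = np.zeros((2, 3, 4, 5))           # 4th-order tensor, shape (2, 3, 4, 5)

# The order (number of modes) is the number of axes.
orders = [a.ndim for a in (vector, matrix, tensor3, tensor4)]
print(orders)
```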
Dynamic Data Model Examples of multi-way data: network traffic indexed by (time, source, destination, port); publications indexed by (time, author, keyword).
Two important points Traditional matrix-based data analysis is inherently two-dimensional, which limits how well it applies to multi-dimensional data.
2. Tensor operations Basic calculus
Vectorization
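A short sketch of vectorization, assuming NumPy and a 2 x 2 x 2 example tensor chosen for illustration. Two orderings are shown because conventions differ: NumPy defaults to row-major (last index varies fastest), while much of the tensor literature uses column-major (first index varies fastest). Which one the lecture intends is not stated, so both are given.

```python
import numpy as np

# Illustrative 2 x 2 x 2 tensor with entries 1..8 (X[i, j, k] = 1 + 4i + 2j + k).
X = np.arange(1, 9).reshape(2, 2, 2)

# Row-major vectorization: the last index varies fastest.
vec_row_major = X.reshape(-1)

# Column-major vectorization: the first index varies fastest.
vec_col_major = X.reshape(-1, order="F")

print(vec_row_major)
print(vec_col_major)
```

Either ordering is a valid vectorization as long as the same convention is used when folding the vector back into a tensor.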
Matricize X(d) Unfold a tensor into a matrix along mode d. [Figure: example tensor with entries 1 to 8]
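A possible implementation of mode-d unfolding, assuming NumPy. The helper names `unfold` and `fold` are hypothetical, and the column ordering (earlier modes vary fastest) is one common convention, assumed here since the slide does not fix one.

```python
import numpy as np

def unfold(X, mode):
    """Mode-`mode` unfolding: rows index the chosen mode, all other
    modes are flattened into columns (earlier modes vary fastest)."""
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order="F")

def fold(M, mode, shape):
    """Inverse of `unfold`: rebuild a tensor of the given shape."""
    rest = [s for i, s in enumerate(shape) if i != mode]
    full = [shape[mode]] + rest
    return np.moveaxis(np.reshape(M, full, order="F"), 0, mode)

# Illustrative 2 x 3 x 4 tensor.
X = np.arange(24).reshape(2, 3, 4)
unfolded_shapes = [unfold(X, d).shape for d in range(3)]
print(unfolded_shapes)
```

Unfolding only rearranges entries, so `fold(unfold(X, d), d, X.shape)` returns the original tensor for every mode d.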
Multiply a tensor with a matrix (mode-n product). [Figure: a source x destination x port tensor multiplied along the source mode by a group x source matrix, giving a group x destination x port tensor]
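A sketch of the tensor-times-matrix (mode-n) product, assuming NumPy. The function name `mode_n_product`, the tensor sizes, and the grouping matrix are hypothetical; the example mirrors the source-to-group idea by summing sources that belong to the same group.

```python
import numpy as np

def mode_n_product(X, U, mode):
    """Multiply tensor X by matrix U along `mode`.
    U has shape (J, X.shape[mode]); the result has size J along `mode`."""
    Y = np.tensordot(U, X, axes=(1, mode))  # U's row axis comes out first
    return np.moveaxis(Y, 0, mode)          # move it back to position `mode`

# Illustrative traffic tensor: 6 sources x 5 destinations x 3 ports.
rng = np.random.default_rng(0)
traffic = rng.random((6, 5, 3))

# Hypothetical grouping matrix (2 groups x 6 sources):
# sources 0-2 belong to group 0, sources 3-5 to group 1.
G = np.zeros((2, 6))
G[0, :3] = 1.0
G[1, 3:] = 1.0

grouped = mode_n_product(traffic, G, mode=0)  # group x destination x port
print(grouped.shape)
```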
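3. Tensor analysis Tensor decomposition: generalizes the concept of low rank from matrices to tensors. Result: the resulting tensor has just a few nonzero columns in each lateral slice. [Figure: term x doc x author tensor]
Reminder: SVD $A = U \Sigma V^T$, where $A$ is $m \times n$.
Reminder: SVD as a sum of rank-1 terms: $A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \cdots$, where $A$ is $m \times n$.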
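The SVD reminder can be checked numerically. A minimal sketch, assuming NumPy and a random 5 x 4 matrix chosen for illustration: the matrix equals the sum of its rank-1 terms, and keeping only the leading terms gives a low-rank approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))          # illustrative m x n matrix

# Thin SVD: A = U diag(s) Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Sum of rank-1 terms sigma_i * u_i * v_i^T reproduces A.
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))

# Keeping the two largest singular values gives a rank-2 approximation.
A_rank2 = (U[:, :2] * s[:2]) @ Vt[:2]
error_rank2 = np.linalg.norm(A - A_rank2)

print(np.allclose(A, A_sum), error_rank2)
```

The approximation error in the Frobenius norm equals the root sum of squares of the discarded singular values.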
Rank of a tensor
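Goal: extension to >= 3 modes. An I x J x K tensor is written as a sum of R rank-1 terms, $\mathcal{X} = \sum_{r=1}^{R} a_r \circ b_r \circ c_r$, or equivalently as an R x R x R (super-diagonal) core combined with factor matrices A (I x R), B (J x R), and C (K x R).
Tucker Decomposition - intuition An I x J x K tensor is expressed through a core G (R x S x T) and factor matrices A (I x R), B (J x S), C (K x T): $\mathcal{X} = \mathcal{G} \times_1 A \times_2 B \times_3 C$. Example: author x keyword x conference. A: author x author-group. B: keyword x keyword-group. C: conf. x conf-group. G: how groups relate to each other.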
Tucker 3 Decomposition
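One way to compute a Tucker decomposition is the truncated higher-order SVD (HOSVD). The slides do not say which algorithm is intended, so this is a sketch under that assumption, using NumPy; all function names and the test tensor are hypothetical. Each factor matrix is taken from the leading left singular vectors of the corresponding unfolding, and the core is the tensor projected onto those factors.

```python
import numpy as np

def unfold(X, mode):
    # Rows index `mode`; column order does not affect the left singular vectors.
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_n_product(X, U, mode):
    # Multiply tensor X by matrix U (J x X.shape[mode]) along `mode`.
    return np.moveaxis(np.tensordot(U, X, axes=(1, mode)), 0, mode)

def hosvd(X, ranks):
    """Truncated HOSVD: returns (core, factors) with X ~ core x_1 A x_2 B x_3 C."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :r])             # leading r left singular vectors
    core = X
    for mode, U in enumerate(factors):
        core = mode_n_product(core, U.T, mode)   # project onto each factor
    return core, factors

def tucker_to_tensor(core, factors):
    X = core
    for mode, U in enumerate(factors):
        X = mode_n_product(X, U, mode)
    return X

# Illustrative tensor with exact multilinear rank (2, 3, 2).
rng = np.random.default_rng(0)
true_core = rng.standard_normal((2, 3, 2))
true_factors = [rng.standard_normal((n, r)) for n, r in [(6, 2), (5, 3), (4, 2)]]
X = tucker_to_tensor(true_core, true_factors)

core, factors = hosvd(X, ranks=(2, 3, 2))
X_hat = tucker_to_tensor(core, factors)
print(core.shape, np.linalg.norm(X - X_hat))
```

For a tensor whose multilinear rank matches the requested ranks, the reconstruction is exact up to rounding; for general data, truncated HOSVD gives a good but not necessarily optimal approximation.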
PARAFAC Decomposition
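A PARAFAC (CP) model for a 3rd-order tensor is often fitted by alternating least squares (ALS). The slides do not specify a fitting method, so this is a sketch assuming ALS with NumPy; the names `cp_als`, `khatri_rao`, and `cp_to_tensor`, the iteration count, and the random test tensor are all hypothetical. Each step fixes two factor matrices and solves a linear least-squares problem for the third.

```python
import numpy as np

def unfold(X, mode):
    # Rows index `mode`; remaining modes are flattened, last one varying fastest.
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(A, B):
    # Column-wise Kronecker product; row index is a * B.shape[0] + b.
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def cp_to_tensor(A, B, C):
    # X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cp_als(X, rank, n_iter=100, seed=0):
    """Fit a rank-`rank` CP model to a 3rd-order tensor by ALS."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((dim, rank)) for dim in X.shape)
    for _ in range(n_iter):
        # Each update is the least-squares solution with the other two fixed.
        A = unfold(X, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(X, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(X, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Illustrative tensor built from two rank-1 terms.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (5, 4, 3))
X = cp_to_tensor(A0, B0, C0)

A, B, C = cp_als(X, rank=2)
rel_error = np.linalg.norm(X - cp_to_tensor(A, B, C)) / np.linalg.norm(X)
print(rel_error)
```

ALS never increases the fitting error from one update to the next, but it can converge slowly or to a local solution, so in practice several random initializations are often tried.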
In the presence of missing data Tensor completion
4. Applications GOAL: to differentiate between left and right hand stimulation
In the presence of missing data Tensor completion
Surveillance: original = low rank + sparse. [Figure: original, low-rank component, sparse component]
Analyzing Publication Data: Doc x Doc x Similarity Representation
Traffic engineering [Figure: traffic tensor with modes IP source, IP destination, and dest. port (125 ... 80)]
5. Tensor Voting Outline: Tensor Voting Framework; Normal space; Tensor inference; Token refinement; Token decomposition; Results and conclusion
Tensor Voting Framework Objective: infer “hidden” objects such as gaps and broken parts. Gestalt principles (proximity, closure, continuity): the presence of each input token implies a hypothesis that a structure passes through it. Viewers are inclined to infer the structures marked by red lines in the figure.
Tensor Voting Framework-Normal Space Normal space: encodes structure information; spanned by normal vectors. Tensor: describes the linear relation of vectors; built from outer products. Stick tensor – surface element; plate tensor – curve element; ball tensor – point element. Structure types in 3D: surfaces, curves, volumes. Figure legend: white arrows are normal vectors; blue regions are the normal space; red shapes are the structure elements.
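The three elementary tensors can be built from outer products of normal vectors. A minimal sketch in 3D, assuming NumPy, unit saliency, and the coordinate axes as an illustrative orthonormal basis; the helper name `tensor_from_normals` is hypothetical. The rank of each tensor equals the dimension of its normal space.

```python
import numpy as np

def tensor_from_normals(normals):
    """Sum of outer products n n^T over orthonormal vectors spanning the normal space."""
    return sum(np.outer(n, n) for n in normals)

e1, e2, e3 = np.eye(3)   # illustrative orthonormal basis

stick = tensor_from_normals([e3])           # surface element: one normal direction
plate = tensor_from_normals([e1, e2])       # curve element: two normals, tangent along e3
ball = tensor_from_normals([e1, e2, e3])    # point element: no preferred direction

ranks = [int(np.linalg.matrix_rank(T)) for T in (stick, plate, ball)]
print(ranks)
```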
Tensor Voting Framework-Tensor Inference Consider a voter point p on a surface. Its normal is a single vector with saliency (magnitude). Information propagates to its neighboring votee point x. The most likely smooth path is the arc of the osculating circle. Tensor (structure information) received at x:
Tensor Voting Framework-Tensor Inference The voter projects structure information via its tensor, weighted by a decay function (DF), to neighboring votees. Each votee sums all the tensors it receives. [Figure: voting procedure in 2D with stick vote; labels θ, s, l, P1, P2, Pk, N_P1, N_P2, N_Pk_P1, N_Pk_P2] Decompose the voter’s tensor into stick tensors. Vote as the fundamental stick vote. Decompose the voted tensors at the votee to reconstruct the same normal space as the voter’s.
Tensor Voting Framework-Token Refinement Problem: there is no prior knowledge of the structure type (normal space) or its saliencies. A token refinement procedure is thus needed: initialize each token with a unit ball tensor (no direction preference); perform tensor voting between tokens; decompose the resulting tensor for each token to reveal its direction preference. [Figure: initial ball tensors in 2D; after token refinement in 2D]
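A sketch of the 2D fundamental stick vote, assuming NumPy. The slide does not give the decay function, so the form used here, exp(-(s² + cκ²)/σ²) with arc length s = θl/sin θ, curvature κ = 2 sin θ/l, the constant c, and the 45-degree cutoff, is taken from the standard tensor voting literature and should be read as an assumption. The function name `stick_vote_2d` is hypothetical. The voted normal is the voter's normal transported along the osculating circle, which rotates it by 2θ.

```python
import numpy as np

def stick_vote_2d(voter, normal, votee, sigma=2.0):
    """Stick vote from `voter` (with the given normal) to `votee`, as a 2x2 tensor."""
    normal = np.asarray(normal, float)
    normal = normal / np.linalg.norm(normal)
    tangent = np.array([normal[1], -normal[0]])
    v = np.asarray(votee, float) - np.asarray(voter, float)
    l = np.linalg.norm(v)                       # distance voter -> votee
    if l == 0.0:
        return np.outer(normal, normal)
    x, y = v @ tangent, v @ normal              # votee in the voter's local frame
    theta = np.arctan2(y, x)                    # signed angle from the tangent
    acute = np.arctan2(abs(y), abs(x))          # angle to the tangent line, in [0, pi/2]
    if acute > np.pi / 4:                       # assumed cutoff: no vote beyond 45 degrees
        return np.zeros((2, 2))
    s = l if acute == 0.0 else acute * l / np.sin(acute)    # arc length
    kappa = 2.0 * np.sin(acute) / l                          # curvature
    c = -16.0 * np.log(0.1) * (sigma - 1.0) / np.pi ** 2     # assumed constant
    decay = np.exp(-(s ** 2 + c * kappa ** 2) / sigma ** 2)  # assumed decay function
    # Normal at the votee on the osculating circle (rotated by 2 * theta).
    n_vote = -np.sin(2 * theta) * tangent + np.cos(2 * theta) * normal
    return decay * np.outer(n_vote, n_vote)

# Each votee sums the tensors received from all voters (illustrative voters).
voters = [((0.0, 0.0), (0.0, 1.0)), ((1.0, 0.0), (0.0, 1.0))]
votee = (2.0, 0.0)
accumulated = sum(stick_vote_2d(p, n, votee) for p, n in voters)
print(accumulated)
```

A votee lying straight ahead along the voter's tangent receives the voter's own normal, attenuated only by distance; a votee off to the side receives a rotated normal with an additional curvature penalty.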
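The decomposition step can be sketched with an eigen-decomposition. This assumes the usual 2D split of a symmetric tensor, T = (λ1 - λ2) e1 e1ᵀ + λ2 (e1 e1ᵀ + e2 e2ᵀ), where λ1 - λ2 is the stick saliency and λ2 the ball saliency; the slide does not state the formula, and the function name `decompose_2d` and the example tensor are hypothetical.

```python
import numpy as np

def decompose_2d(T):
    """Split a symmetric 2x2 tensor into stick and ball saliencies.
    Returns (stick_saliency, ball_saliency, preferred_normal)."""
    eigvals, eigvecs = np.linalg.eigh(T)   # eigenvalues in ascending order
    lam2, lam1 = eigvals                   # lam1 >= lam2
    e1 = eigvecs[:, 1]                     # direction of the largest eigenvalue
    return lam1 - lam2, lam2, e1

# A unit ball tensor has no direction preference.
stick0, ball0, _ = decompose_2d(np.eye(2))

# Illustrative accumulated tensor: strong evidence for normal (0, 1) plus isotropic noise.
n = np.array([0.0, 1.0])
T = 3.0 * np.outer(n, n) + 0.5 * np.eye(2)
stick1, ball1, direction = decompose_2d(T)
print(stick1, ball1, direction)
```

A token whose stick saliency dominates has a clear direction preference; one whose ball saliency dominates remains unoriented.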
Inference Algorithm 1> Input initial trace with missing parts. 2> Perform token refinement. 3> Mark sparse voting region to define potential votees. 4> Perform tensor voting to infer structures. 5> Decompose results and add determined votees to the previous trace. 6> Iterated enough times? If no, re-input the thinned trace and repeat the whole procedure starting with 2>. If yes, continue. 7> Output final trace result.
Results and Conclusions Inference results. [Figure panels: polluted trace as algorithm input; tensor voting with σ=1; tensor voting with σ=2. Plot: number of pixels vs. σ values] Compared with victim method:
Results and Conclusions Provides an efficient approach to inferring a human mobility trace when the observed location data has missing parts. No user instructions are required to identify which parts are to be inferred. Missing positions are discovered and inferred automatically. The sparse per-votee implementation scheme reduces the computational load. Achieves relatively high accuracy.
6. Conclusions A tensor is a multidimensional array. Tensor decomposition (factorization) can be considered a higher-order generalization of matrix SVD or PCA. Wide applications in data reconstruction, cluster analysis, and compression.