1
SPARSE TENSORS DECOMPOSITION SOFTWARE
Papa S. Diaw, Master’s Candidate
Dr. Michael W. Berry, Major Professor
2
Introduction
Large data sets (e.g., telecom call records, Facebook, MySpace)
Nonnegative Matrix Factorization (NMF) did a good job at first and was the main tool
Gives insight into hidden relationships
Requires arranging multi-way data into a matrix
Higher memory and CPU requirements
Captures only linear relationships in the matrix representation
Fails to capture important structural information
Slower or less accurate calculations
3
Introduction (cont'd)
Nonnegative Tensor Factorization (NTF)
A natural way to handle high dimensionality
Preserves the original multi-way structure of the data
Applications: image processing, text mining
4
Tensor Toolbox for MATLAB
Developed at Sandia National Laboratories
Requires MATLAB licenses
Proprietary software
5
Motivation of the PILOT
Python software for NTF
An alternative to the Tensor Toolbox for MATLAB
Incorporation into FutureLens
Exposure to NTF
Interest from the open-source community
6
Tensors
Multi-way array
Order / mode / ways; higher-order tensors
Fibers and slices
Unfolding (matricization or flattening): reordering the elements of an N-th order tensor into a matrix
The unfolding is not unique (one common convention is sketched below)
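As an illustration, a mode-n unfolding can be written in NumPy by moving the chosen mode to the front and flattening the rest; column-ordering conventions differ between references, so this is just one possible matricization:

```python
import numpy as np

def unfold(X, mode):
    """Mode-`mode` unfolding (matricization) of an N-way array.

    The chosen mode becomes the rows; all remaining modes are flattened
    into the columns.  Column ordering conventions vary, so this is only
    one possible layout.
    """
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

# A 2 x 3 x 4 tensor unfolded along mode 1 becomes a 3 x 8 matrix.
X = np.arange(24).reshape(2, 3, 4)
print(unfold(X, 1).shape)   # (3, 8)
```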
7
Tensors (cont’d)
Kronecker product ⊗
Khatri-Rao product ⊙ (column-wise Kronecker product; a NumPy sketch follows):
A ⊙ B = [a1⊗b1  a2⊗b2 …  aJ⊗bJ]
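A minimal NumPy sketch of the Khatri-Rao product for two matrices with the same number of columns:

```python
import numpy as np

def khatri_rao(A, B):
    """Khatri-Rao (column-wise Kronecker) product of A (I x J) and B (K x J).

    Returns an (I*K) x J matrix whose j-th column is kron(A[:, j], B[:, j]).
    """
    I, J = A.shape
    K, J2 = B.shape
    assert J == J2, "A and B must have the same number of columns"
    # For each column j, broadcasting forms the outer product of a_j and b_j,
    # which is then flattened into one column of the result.
    return (A[:, None, :] * B[None, :, :]).reshape(I * K, J)

A = np.arange(6.0).reshape(3, 2)
B = np.arange(8.0).reshape(4, 2)
print(khatri_rao(A, B).shape)   # (12, 2)
```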
8
Tensor Factorization
Proposed by Hitchcock in 1927 and later developed by Cattell and by Tucker (1966)
Rewrite a given tensor as a finite sum of lower-rank tensors
Two main models: Tucker and PARAFAC
Choosing the approximation rank is itself a hard problem
9
PARAFAC
Parallel Factor Analysis
Also known as Canonical Decomposition (CANDECOMP)
Harshman, 1970; Carroll and Chang, 1970
10
PARAFAC (cont’d)
Given a three-way tensor X and an approximation rank R, we define the factor matrices as the combinations of the vectors from the rank-one components (the model is written out below).
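For reference, the standard rank-R PARAFAC model for a three-way tensor (as in the Kolda and Bader review cited in the references) is:

```latex
\mathcal{X} \;\approx\; \sum_{r=1}^{R} a_r \circ b_r \circ c_r,
\qquad
x_{ijk} \;\approx\; \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr},
```

with factor matrices A = [a1 … aR], B = [b1 … bR], and C = [c1 … cR], where ∘ denotes the vector outer product.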
11
PARAFAC (cont’d)
12
PARAFAC (cont’d) Alternating Least Squares (ALS)
ALS cycles “over all the factor matrices and performs a least-square update for one factor matrix while holding all the others constant.” [7]
NTF can be considered an extension of the PARAFAC model with a nonnegativity constraint (one ALS sweep is sketched below)
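A minimal dense NumPy sketch of one ALS sweep for a rank-R, three-way CP/PARAFAC model; the nonnegativity projection by clipping is only one simple option and is an assumption here, not necessarily what the PILOT implements:

```python
import numpy as np

def als_sweep(X, A, B, C, nonneg=True):
    """One ALS sweep for a rank-R CP/PARAFAC model of a 3-way array X.

    Each factor matrix is updated by an MTTKRP followed by a solve with
    the Hadamard product of the other factors' Gram matrices.
    """
    def update(X, M1, M2, modes):
        mttkrp = np.einsum(modes, X, M1, M2)        # MTTKRP for this mode
        gram = (M1.T @ M1) * (M2.T @ M2)            # Hadamard product of Grams
        F = mttkrp @ np.linalg.pinv(gram)
        return np.clip(F, 0, None) if nonneg else F  # crude NTF projection (assumption)

    A = update(X, B, C, 'ijk,jr,kr->ir')
    B = update(X, A, C, 'ijk,ir,kr->jr')
    C = update(X, A, B, 'ijk,ir,jr->kr')
    return A, B, C
```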
13
Python
Object-oriented, interpreted
Runs on all major platforms
Gentle learning curve
Supports object methods (everything is an object in Python)
14
Python (cont’d)
Recent interest from the scientific community
Several scientific computing packages: NumPy, SciPy
Python is extensible
15
Data Structures
A dictionary stores the tensor data
A mutable container that can hold any number of Python objects as key/value pairs
Well suited to the sparseness of our tensors
VAST 2007 contest data: 1,385,205,184 elements, of which only 1,184,139 are nonzero
Stores only the nonzero elements and keeps track of the zeros through the dictionary’s default value (see the sketch below)
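A minimal sketch of the idea, assuming the nonzeros are keyed by their index tuples (illustrative only, not the PILOT's exact layout):

```python
# Sparse storage: only nonzero entries are kept, keyed by index tuples.
nonzeros = {
    (0, 3, 2): 1.0,
    (5, 1, 7): 2.5,
}

def value_at(index, data=nonzeros):
    """Return the stored value, or 0.0 for any index not explicitly stored."""
    return data.get(index, 0.0)

print(value_at((0, 3, 2)))   # 1.0
print(value_at((9, 9, 9)))   # 0.0 (implicit zero)
```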
16
Data Structures (cont’d)
NumPy arrays
The fundamental package for scientific computing in Python
Used for the Khatri-Rao products and tensor multiplications
Chosen for speed
17
Modules
18
Modules (cont’d) SPTENSOR
The most important module
A class built from the subscripts of the nonzeros and their values
Flexible input (NumPy arrays, NumPy matrices, Python lists)
Backed by a dictionary
Keeps a few instance variables: size, number of dimensions, Frobenius norm (Euclidean norm); see the sketch below
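A minimal sketch of what such a class could look like; the names used here are illustrative rather than the PILOT's actual API:

```python
import numpy as np

class SparseTensor:
    """Sparse tensor stored as {index tuple: value}; zeros are implicit."""

    def __init__(self, subs, vals, shape):
        # subs: iterable of index tuples for the nonzero entries
        # vals: matching nonzero values
        self.data = {tuple(s): float(v) for s, v in zip(subs, vals)}
        self.shape = tuple(shape)
        self.ndims = len(self.shape)

    def norm(self):
        """Frobenius (Euclidean) norm: only the nonzeros contribute."""
        return float(np.sqrt(sum(v * v for v in self.data.values())))

T = SparseTensor(subs=[(0, 1, 2), (3, 0, 4)], vals=[2.0, -1.0], shape=(4, 2, 5))
print(T.ndims, T.norm())   # 3 2.23606...
```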
19
Modules (cont’d) PARAFAC
Coordinates the NTF
Implements ALS
Iterates until convergence or until the maximum number of iterations is reached
The factor matrices are then turned into a Kruskal tensor
20
Modules (cont’d)
21
Modules (cont’d)
22
Modules (cont’d) TTV INNERPROD
INNERPROD
Inner product between an SPTENSOR and a KTENSOR
Used by PARAFAC to compute the residual norm
Uses the Kronecker product for matrices
TTV
Multiplies a sparse tensor by a (column) vector and returns a tensor
The workhorse of our software package: most of the computation happens here
Called by the MTTKRP and INNERPROD modules (a sketch is given below)
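A rough sketch of tensor-times-vector on the dictionary representation, assuming the result is returned in the same sparse form with the contracted mode removed; this is illustrative, not the PILOT's exact code:

```python
from collections import defaultdict

def ttv(nonzeros, v, mode):
    """Multiply a sparse tensor {index tuple: value} by vector v along `mode`.

    Each nonzero is scaled by the vector entry at its index in `mode`;
    entries landing on the same remaining index tuple are accumulated,
    so the result has one fewer mode than the input.
    """
    out = defaultdict(float)
    for idx, val in nonzeros.items():
        rest = idx[:mode] + idx[mode + 1:]     # drop the contracted mode
        out[rest] += val * v[idx[mode]]
    return dict(out)

X = {(0, 1): 2.0, (1, 1): 3.0, (1, 0): -1.0}
print(ttv(X, v=[10.0, 1.0], mode=0))   # {(1,): 23.0, (0,): -1.0}
```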
23
Modules (cont’d) MTTKRP Ktensor
MTTKRP
Khatri-Rao product of all factor matrices except the one being updated
Matrix multiplication of the matricized tensor with the Khatri-Rao product obtained above
KTENSOR
Kruskal tensor: the object returned after the factorization is done and the factor matrices are normalized
A class with instance variables such as the norm
The norm of the ktensor plays a big part in determining the residual norm in the PARAFAC module (see the identity below)
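The role of the ktensor norm and of the inner product follows from the usual expansion of the residual norm (standard in the tensor factorization literature):

```latex
\|\mathcal{X} - \mathcal{M}\|^2
  = \|\mathcal{X}\|^2 - 2\,\langle \mathcal{X}, \mathcal{M} \rangle + \|\mathcal{M}\|^2,
\qquad
\|\mathcal{M}\|^2 = \lambda^{\top}\bigl((A^{\top}A) \ast (B^{\top}B) \ast (C^{\top}C)\bigr)\lambda,
```

where M = [[λ; A, B, C]] is the Kruskal tensor returned by PARAFAC and ∗ is the element-wise (Hadamard) product; INNERPROD supplies ⟨X, M⟩ and the ktensor norm supplies ‖M‖.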
24
Performance
The Python profiler was used to measure run-time performance
A tool for detecting bottlenecks
Code optimization brought negligible improvement, and an efficiency loss in some modules (a profiling example is shown below)
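For reference, the standard-library profiler can be wrapped around a single factorization call; the `parafac(X, R)` call here is only a placeholder for whatever function is being timed:

```python
import cProfile
import pstats

# Profile one factorization run; `parafac`, `X`, and `R` are placeholders.
cProfile.run('parafac(X, R)', 'parafac.prof')

# Report the 10 most expensive calls by cumulative time.
pstats.Stats('parafac.prof').sort_stats('cumulative').print_stats(10)
```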
25
Performance (cont’d) Lists and Recursions
26
Performance (cont’d) NumPy Arrays
return_unique ==> traverses the arguments given to the sparse tensor to identify duplicate data points and add up their values
27
Performance (cont’d) After Removing Recursions
myaccumarray is a poor man’s version of MATLAB’s accumarray
It is the only function whose performance I am not happy with (a possible NumPy alternative is sketched below)
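For comparison, an accumarray-style accumulation can also be written with NumPy's `np.add.at`; this is one possible alternative, not the PILOT's `myaccumarray`:

```python
import numpy as np

def accumarray(subs, vals, size):
    """Accumulate `vals` into a length-`size` array at positions `subs`.

    Duplicate subscripts are summed, mimicking MATLAB's accumarray in the
    one-dimensional case.
    """
    out = np.zeros(size)
    np.add.at(out, subs, vals)     # unbuffered in-place addition
    return out

print(accumarray([0, 2, 2, 1], [1.0, 2.0, 3.0, 4.0], size=4))
# [1. 4. 5. 0.]
```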
28
Floating-Point Arithmetic
Binary floating-point
“Binary floating-point cannot exactly represent decimal fractions, so if binary floating-point is used it is not possible to guarantee that results will be the same as those using decimal arithmetic.” [12]
Makes the iterations volatile
The value 0.1, for example, would need an infinitely recurring binary fraction; a decimal number system can represent 0.1 exactly, as one tenth (that is, 10^-1)
Consequently, binary floating-point cannot be used for financial calculations (a small demonstration follows)
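A quick illustration in plain Python, with the standard-library decimal module for comparison:

```python
from decimal import Decimal

# Binary floating-point: 0.1 has no exact binary representation.
print(0.1 + 0.1 + 0.1 == 0.3)        # False
print(f"{0.1:.20f}")                 # 0.10000000000000000555...

# Decimal arithmetic represents one tenth exactly.
print(Decimal("0.1") + Decimal("0.1") + Decimal("0.1") == Decimal("0.3"))   # True
```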
29
Convergence Issues
30
Convergence Issues (cont’d)
31
Convergence Issues (cont’d)
32
Conclusion
There is still work to do after NTF:
Preprocessing of the data
Post-processing of the results, e.g. with FutureLens, and the expertise to extract and identify the hidden components
Future work: a Tucker implementation, C extensions to increase speed, and a GUI
33
Acknowledgments
Mr. Andrey Puretskiy: discussions at all stages of the PILOT, consultancy in text mining, and testing
Tensor Toolbox for MATLAB (Bader and Kolda): aided our understanding of tensor decomposition and PARAFAC
34
References
http://csmr.ca.sandia.gov/~tgkolda/TensorToolbox/
Tamara G. Kolda and Brett W. Bader, “Tensor Decompositions and Applications”, SIAM Review, June 10, 2008.
Andrzej Cichocki, Rafal Zdunek, Anh Huy Phan, and Shun-ichi Amari, “Nonnegative Matrix and Tensor Factorizations”, John Wiley & Sons, Ltd, 2009.
Brett W. Bader, Andrey A. Puretskiy, and Michael W. Berry, “Scenario Discovery Using Nonnegative Tensor Factorization”, in J. Ruiz-Schulcloper and W. G. Kropatsch (Eds.), CIARP 2008, LNCS 5197, 2008.
Tamara G. Kolda, “Multilinear Operators for Higher-Order Decompositions”, Sandia Report, April 2006.
35
QUESTIONS?