SPARSE TENSORS DECOMPOSITION SOFTWARE

Presentation transcript:

SPARSE TENSORS DECOMPOSITION SOFTWARE Papa S. Diaw, Master’s Candidate Dr. Michael W. Berry, Major Professor 12/2/2018

Introduction
Large data sets: telecom records, Facebook, Myspace
Nonnegative Matrix Factorization (NMF)
  Gives insights into the hidden relationships
  Did a good job at first and was the main tool
Arranging multi-way data into a matrix
  Needs more memory and CPU for computation
  Captures only linear relationships in the matrix representation
  Fails to capture important structural information
  Leads to slower or less accurate calculations

Introduction (cont'd)
Nonnegative Tensor Factorization (NTF)
  Natural way to handle high dimensionality
  Keeps the original multi-way structure of the data
  Applications: image processing, text mining

Tensor Toolbox for MATLAB
Sandia National Laboratories
Licenses
Proprietary software

Motivation of the PILOT
Python software for NTF
Alternative to Tensor Toolbox for MATLAB
Incorporation into FutureLens
Exposure to NTF
Interest in the open source community

Tensors
Multi-way array
Order / modes / ways
High-order
Fiber
Slice
Unfolding
  Also called matricization or flattening
  Reorders the elements of an N-th order tensor into a matrix
  Not unique
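
The unfolding described on this slide can be sketched with NumPy. This is a minimal illustration, not the PILOT code: the function name `unfold` is hypothetical, and the column ordering (earliest remaining mode varies fastest) is one common convention among several, which is why the slide notes that unfolding is not unique.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` unfolding: rows index `mode`, columns index all other modes."""
    # Bring the chosen mode to the front, then flatten the remaining modes.
    # Fortran order makes the earliest remaining mode vary fastest.
    moved = np.moveaxis(tensor, mode, 0)
    return np.reshape(moved, (tensor.shape[mode], -1), order="F")

# A small 2 x 3 x 4 tensor filled with 0..23
X = np.arange(24).reshape(2, 3, 4)
X0 = unfold(X, 0)  # 2 x 12 matrix
X1 = unfold(X, 1)  # 3 x 8 matrix
X2 = unfold(X, 2)  # 4 x 6 matrix
```

Choosing C order instead of Fortran order would permute the columns, giving a different but equally valid unfolding.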

Tensors (cont’d)
Kronecker product
Khatri-Rao product: $A \odot B = [a_1 \otimes b_1 \;\; a_2 \otimes b_2 \;\; \cdots \;\; a_J \otimes b_J]$
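
As a small illustration of the Khatri-Rao product, the sketch below builds it as a column-wise Kronecker product with NumPy. The helper name `khatri_rao` and the example matrices are assumptions made for this example.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x J) and B (K x J), giving (I*K) x J."""
    if A.shape[1] != B.shape[1]:
        raise ValueError("A and B need the same number of columns")
    # Entry (i*K + k, j) of the result is A[i, j] * B[k, j].
    stacked = np.einsum("ij,kj->ikj", A, B)
    return stacked.reshape(A.shape[0] * B.shape[0], A.shape[1])

A = np.array([[1, 2], [3, 4]])
B = np.array([[1, 0], [0, 1], [2, 2]])
K = khatri_rao(A, B)  # 6 x 2 matrix

# Each column of K is the Kronecker product of the matching columns of A and B.
first_column_matches = np.array_equal(K[:, 0], np.kron(A[:, 0], B[:, 0]))
```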

Tensor Factorization
Introduced by Hitchcock in 1927 and later developed by Cattell in 1944 and Tucker in 1966
Rewrite a given tensor as a finite sum of lower-rank tensors
Tucker and PARAFAC
Rank approximation is a problem

PARAFAC
Parallel Factor Analysis
Canonical Decomposition (CANDECOMP)
Harshman; Carroll and Chang, 1970

PARAFAC (cont’d)
Given a three-way tensor X and an approximation rank R, we define the factor matrices as the combination of the vectors from the rank-one components.
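
To make the relation between factor matrices and rank-one components concrete, here is a minimal NumPy sketch. It assumes the usual PARAFAC model, in which column r of each factor matrix holds the vector of the r-th rank-one component. The names `cp_to_tensor`, `A`, `B`, `C` and the sizes are illustrative only.

```python
import numpy as np

def cp_to_tensor(A, B, C):
    """Sum over r of the outer products A[:, r] o B[:, r] o C[:, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

rng = np.random.default_rng(0)
R = 2  # approximation rank
A = rng.random((3, R))
B = rng.random((4, R))
C = rng.random((5, R))

X_hat = cp_to_tensor(A, B, C)

# The same tensor, written explicitly as a sum of R rank-one tensors.
explicit = sum(
    np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
    for r in range(R)
)
```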

PARAFAC (cont’d)

PARAFAC (cont’d)
Alternating Least Squares (ALS)
  ALS cycles “over all the factor matrices and performs a least-square update for one factor matrix while holding all the others constant.”[7]
NTF can be considered an extension of the PARAFAC model with the constraint of nonnegativity
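
The ALS cycle quoted above can be sketched for a small dense three-way tensor. This is an assumption-laden illustration rather than the PILOT implementation: it works on dense NumPy arrays, it does not enforce nonnegativity (so it is plain PARAFAC, not NTF), and the names `cp_als`, `unfold` and `khatri_rao` are hypothetical.

```python
import numpy as np

def unfold(X, mode):
    # Mode-n unfolding with the earliest remaining mode varying fastest.
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order="F")

def khatri_rao(A, B):
    # Column-wise Kronecker product.
    return np.einsum("ir,kr->ikr", A, B).reshape(-1, A.shape[1])

def cp_als(X, rank, n_iter=200, tol=1e-10, seed=0):
    """Unconstrained PARAFAC of a dense 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    norm_x = np.linalg.norm(X)
    prev_err = np.inf
    err = np.inf
    for _ in range(n_iter):
        for n in range(3):
            # The two factors held constant, higher mode first.
            hi, lo = [factors[m] for m in reversed(range(3)) if m != n]
            gram = (hi.T @ hi) * (lo.T @ lo)
            # Least-squares update of factor n.
            factors[n] = unfold(X, n) @ khatri_rao(hi, lo) @ np.linalg.pinv(gram)
        approx = np.einsum("ir,jr,kr->ijk", *factors)
        err = np.linalg.norm(X - approx) / norm_x
        # Stop on convergence of the relative error or after n_iter sweeps.
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return factors, err

# Hypothetical test tensor built from known rank-2 factors.
rng = np.random.default_rng(1)
true_factors = [rng.random((dim, 2)) for dim in (3, 4, 5)]
X = np.einsum("ir,jr,kr->ijk", *true_factors)
factors, err = cp_als(X, rank=2)
print("relative error:", err)
```

A nonnegative variant would replace the least-squares update with one that keeps every factor entry at or above zero.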

Python
Object-oriented, interpreted
Runs on all major operating systems
Flat learning curve
Supports object methods (everything is an object in Python)

Python (cont’d)
Recent interest in the scientific community
Several scientific computing packages
  NumPy
  SciPy
Python is extensible

Data Structures
Dictionary
  Stores the tensor data
  Mutable container that can store any number of Python objects
  Pairs of keys and their corresponding values
  Suitable for the sparseness of our tensors
  VAST 2007 contest data: 1,385,205,184 elements, with 1,184,139 nonzeros
  Stores the nonzero elements and keeps track of the zeros by using the default value of the dictionary
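
A minimal sketch of the dictionary idea follows. The tensor shape, the entries and the helper name `get_entry` are made up for illustration. Keys are subscript tuples, and any missing key is read back as zero through the default argument of `dict.get`.

```python
# Hypothetical 4 x 3 x 2 sparse tensor: only the nonzeros are stored.
shape = (4, 3, 2)
data = {
    (0, 0, 0): 1.5,
    (2, 1, 0): 4.0,
    (3, 2, 1): 2.0,
}

def get_entry(data, subs):
    """Return the stored value, or 0.0 for any subscript that is not stored."""
    return data.get(subs, 0.0)

stored = get_entry(data, (2, 1, 0))   # a stored nonzero
missing = get_entry(data, (1, 1, 1))  # an implicit zero

# Fraction of entries that are actually stored.
total_elements = 1
for dim in shape:
    total_elements *= dim
density = len(data) / total_elements
```

For the VAST 2007 figures on the slide, the same ratio is 1,184,139 / 1,385,205,184, which is under 0.1 percent of the entries.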

Data Structures (cont’d)
NumPy arrays
  NumPy is the fundamental package for scientific computing in Python
  Used for Khatri-Rao products and tensor multiplications
  Speed

Modules

Modules (cont’d)
SPTENSOR
  Most important module
  Class built from the subscripts of the nonzeros and their values
  Flexible input (NumPy arrays, NumPy matrices, Python lists)
  Stores the data in a dictionary
  Keeps a few instance variables
    Size
    Number of dimensions
    Frobenius norm (Euclidean norm)
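
The slide lists what the sparse tensor class keeps. A rough stand-in could look like the sketch below. The class name `SparseTensor`, its constructor signature and its attribute names are assumptions for illustration and are not taken from the PILOT source. Duplicate subscripts are not merged in this sketch.

```python
import math

class SparseTensor:
    """Hypothetical stand-in for the SPTENSOR class described on the slide."""

    def __init__(self, subs, vals, size):
        # subs: sequence of subscript rows (lists, tuples or NumPy rows)
        # vals: sequence of the matching nonzero values
        self.size = tuple(size)
        self.ndims = len(self.size)
        self.data = {}
        for sub, val in zip(subs, vals):
            key = tuple(int(s) for s in sub)
            self.data[key] = float(val)

    def norm(self):
        """Frobenius norm: square root of the sum of squared nonzeros."""
        return math.sqrt(sum(v * v for v in self.data.values()))

t = SparseTensor(subs=[[0, 0, 0], [1, 2, 1]], vals=[3.0, 4.0], size=(2, 3, 2))
```

Only the nonzeros contribute to the norm, so it can be computed from the dictionary values alone.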

Modules (cont’d)
PARAFAC
  Coordinates the NTF
  Implementation of ALS
  Runs until convergence or the maximum number of iterations
  Factor matrices are turned into a Kruskal tensor

Modules (cont’d)

Modules (cont’d)

Modules (cont’d)
INNERPROD
  Inner product between an SPTENSOR and a KTENSOR
  Used by PARAFAC to compute the residual norm
  Kronecker product for matrices
TTV
  Product of a sparse tensor with a (column) vector
  Returns a tensor
  Workhorse of our software package: most of the computation happens here
  Called by the MTTKRP and INNERPROD modules
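
The two operations on this slide can be sketched on a dictionary-based sparse tensor. The function names `ttv` and `innerprod` mirror the module names, but the signatures, the plain-dict representation and the way `innerprod` is built from repeated `ttv` calls are assumptions for this illustration.

```python
def ttv(data, shape, vec, mode):
    """Multiply a dict-based sparse tensor by a vector along `mode`.

    Returns the new dict and shape, with that mode removed.
    """
    if len(vec) != shape[mode]:
        raise ValueError("vector length must match the size of the mode")
    out = {}
    for subs, val in data.items():
        key = subs[:mode] + subs[mode + 1:]
        out[key] = out.get(key, 0.0) + val * vec[subs[mode]]
    return out, shape[:mode] + shape[mode + 1:]

def innerprod(data, shape, weights, factors):
    """Inner product of a sparse tensor with a Kruskal tensor."""
    total = 0.0
    for r, weight in enumerate(weights):
        cur, cur_shape = data, shape
        for factor in factors:
            # Contract the current first mode with column r of its factor.
            column = [row[r] for row in factor]
            cur, cur_shape = ttv(cur, cur_shape, column, 0)
        # All modes are contracted, so one scalar is left under the key ().
        total += weight * cur.get((), 0.0)
    return total

# Hypothetical 2 x 2 x 2 example
shape = (2, 2, 2)
data = {(0, 0, 0): 1.0, (0, 1, 0): 2.0, (1, 1, 1): 3.0}
reduced, reduced_shape = ttv(data, shape, [10.0, 100.0], 1)

weights = [2.0]
factors = [[[1.0], [2.0]], [[1.0], [3.0]], [[1.0], [4.0]]]
ip = innerprod(data, shape, weights, factors)
```

Only the stored nonzeros are visited, so the cost of each call grows with the number of nonzeros rather than with the full size of the tensor.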

Modules (cont’d)
MTTKRP
  Khatri-Rao product of all factor matrices except the one being updated
  Matrix multiplication of the matricized tensor with the Khatri-Rao product obtained above
KTENSOR
  Kruskal tensor
  Object returned after the factorization is done and the factor matrices are normalized
  Class with instance variables such as the norm
  The norm of the ktensor plays a big part in determining the residual norm in the PARAFAC module
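
The sketch below illustrates both ideas on a tiny example. Instead of forming the Khatri-Rao product and multiplying, `mttkrp` accumulates the same result one nonzero at a time, which is a different but equivalent formulation. The function names, signatures and test data are assumptions, not the PILOT API.

```python
import numpy as np

def mttkrp(data, shape, factors, mode):
    """Matricized tensor times Khatri-Rao product, for a dict-based sparse tensor."""
    rank = factors[0].shape[1]
    out = np.zeros((shape[mode], rank))
    for subs, val in data.items():
        row = np.full(rank, float(val))
        for m, idx in enumerate(subs):
            if m != mode:
                # Multiply in the matching row of every other factor matrix.
                row *= factors[m][idx]
        out[subs[mode]] += row
    return out

def ktensor_norm(weights, factors):
    """Frobenius norm of a Kruskal tensor without building it densely."""
    weights = np.asarray(weights, dtype=float)
    coef = np.outer(weights, weights)
    for factor in factors:
        coef = coef * (factor.T @ factor)
    return np.sqrt(abs(coef.sum()))

# Hypothetical 2 x 3 x 2 sparse tensor and rank-2 factors
shape = (2, 3, 2)
data = {(0, 0, 0): 1.0, (1, 2, 1): 2.0, (0, 1, 1): 3.0}
rng = np.random.default_rng(0)
factors = [rng.random((dim, 2)) for dim in shape]
weights = np.array([1.5, 0.5])

M0 = mttkrp(data, shape, factors, 0)
k_norm = ktensor_norm(weights, factors)
```

With the sparse tensor X and the Kruskal tensor K, the squared residual norm follows from the identity ||X - K||^2 = ||X||^2 + ||K||^2 - 2<X, K>, which is why the ktensor norm and the inner product are enough to monitor the fit.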

Performance
Python profiler
  Run-time performance
  Tool for detecting bottlenecks
Code optimization
  Negligible improvement
  Efficiency loss in some modules
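
As an illustration of how the standard-library profiler can expose a bottleneck, the sketch below profiles a deliberately slow function. The function `slow_sum` is invented for this example and has nothing to do with the PILOT modules.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """A deliberately naive loop to give the profiler something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
result = profiler.runcall(slow_sum, 100_000)

# Collect the report in a string, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists call counts and times per function, so the functions at the top of the cumulative column are the first candidates for optimization.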

Performance (cont’d)
Lists and recursions

Performance (cont’d)
NumPy arrays
Return_unique traverses the arguments given to the sparse tensor to identify duplicate data points and adds up their values

Performance (cont’d)
After removing recursions
Myaccumarray is a poor man’s version of the MATLAB accumarray. It is the only function whose performance I am not happy with.
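
One vectorized way to merge duplicate subscripts and add up their values, similar in spirit to what accumarray does in MATLAB, is sketched below with NumPy. This is an alternative shown for illustration, not the Myaccumarray or Return_unique code. The function name `accumulate_duplicates` is hypothetical.

```python
import numpy as np

def accumulate_duplicates(subs, vals):
    """Merge repeated subscript rows and sum the values that belong to them."""
    subs = np.asarray(subs)
    vals = np.asarray(vals, dtype=float)
    # unique_subs holds each distinct row once, in sorted order.
    # inverse maps every input row to its position in unique_subs.
    unique_subs, inverse = np.unique(subs, axis=0, return_inverse=True)
    summed = np.bincount(inverse.ravel(), weights=vals, minlength=len(unique_subs))
    return unique_subs, summed

subs = [[0, 1], [2, 2], [0, 1], [1, 0]]
vals = [1.0, 2.0, 3.0, 4.0]
unique_subs, summed = accumulate_duplicates(subs, vals)
```

Here the subscript (0, 1) appears twice, so its two values are added into a single entry.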

Floating-Point Arithmetic
Binary floating-point
  “Binary floating-point cannot exactly represent decimal fractions, so if binary floating-point is used it is not possible to guarantee that results will be the same as those using decimal arithmetic.”[12]
  Makes the iterations volatile
  The value 0.1, for example, would need an infinitely recurring binary fraction. In contrast, a decimal number system can represent 0.1 exactly, as one tenth (that is, $10^{-1}$).
  Consequently, binary floating-point cannot be used for financial calculations
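
The effect can be seen directly in Python. The sketch compares binary floating-point with the standard-library `decimal` module. The variable names are illustrative.

```python
import math
from decimal import Decimal

# Binary floating-point: 0.1 and 0.2 are both stored inexactly.
binary_sum = 0.1 + 0.2
binary_is_exact = (binary_sum == 0.3)

# Decimal arithmetic: the same sum is exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = (decimal_sum == Decimal("0.3"))

# Converting the float 0.1 shows the value that is actually stored.
stored_tenth = Decimal(0.1)

# Comparisons in iterative code are therefore usually made with a tolerance.
close_enough = math.isclose(binary_sum, 0.3)
```

In an iterative method such as ALS, small representation errors of this kind accumulate from sweep to sweep, which is one reason convergence checks rely on tolerances rather than exact equality.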

Convergence Issues

Convergence Issues (cont’d)

Convergence Issues (cont’d)

Conclusion
There is still work to do after NTF
  Preprocessing of data
  Post-processing of results, such as with FutureLens
  Expertise to extract and identify hidden components
Tucker implementation
C extension to increase speed
GUI

Acknowledgments
Mr. Andrey Puretskiy
  Discussions at all stages of the PILOT
  Consultancy in text mining
  Testing
Tensor Toolbox for MATLAB (Bader and Kolda)
  Understanding of tensor decomposition
  PARAFAC

References
[1] http://csmr.ca.sandia.gov/~tgkolda/TensorToolbox/
[2] Tamara G. Kolda, Brett W. Bader, “Tensor Decompositions and Applications”, SIAM Review, June 10, 2008.
[3] Andrzej Cichocki, Rafal Zdunek, Anh Huy Phan, Shun-ichi Amari, “Nonnegative Matrix and Tensor Factorizations”, John Wiley & Sons, Ltd, 2009.
[4] http://docs.python.org/library/profile.html
[5] http://www.mathworks.com/access/helpdesk/help/techdoc
[6] http://www.scipy.org/NumPy_for_Matlab_Users
[7] Brett W. Bader, Andrey A. Puretskiy, Michael W. Berry, “Scenario Discovery Using Nonnegative Tensor Factorization”, J. Ruiz-Shulcloper and W.G. Kropatsch (Eds.): CIARP 2008, LNCS 5197, pp. 791-805, 2008.
[8] http://docs.scipy.org/doc/numpy/user/
[9] http://docs.scipy.org/doc/
[10] http://docs.scipy.org/doc/numpy/user/whatisnumpy.html
[11] Tamara G. Kolda, “Multilinear operators for higher-order decompositions”, SANDIA REPORT, April 2006.
[12] http://speleotrove.com/decimal/decifaq1.html#inexact

QUESTIONS?