Multilinear Algebra for Analyzing Data with Multiple Linkages

Slides:



Advertisements
Similar presentations
TWO STEP EQUATIONS 1. SOLVE FOR X 2. DO THE ADDITION STEP FIRST
Advertisements

You have been given a mission and a code. Use the code to complete the mission and you will save the world from obliteration…
2k-p Fractional Factorial Designs
Using Matrices in Real Life
Advanced Piloting Cruise Plot.
Chapter 1 The Study of Body Function Image PowerPoint
Subspace Embeddings for the L1 norm with Applications Christian Sohler David Woodruff TU Dortmund IBM Almaden.
1 Copyright © 2010, Elsevier Inc. All rights Reserved Fig 2.1 Chapter 2.
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Title Subtitle.
Exit a Customer Chapter 8. Exit a Customer 8-2 Objectives Perform exit summary process consisting of the following steps: Review service records Close.
My Alphabet Book abcdefghijklm nopqrstuvwxyz.
Multiplying binomials You will have 20 seconds to answer each of the following multiplication problems. If you get hung up, go to the next problem when.
DIVIDING INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.
FACTORING ax2 + bx + c Think “unfoil” Work down, Show all steps.
Addition Facts
Year 6 mental test 5 second questions
Points, Vectors, Lines, Spheres and Matrices
Around the World AdditionSubtraction MultiplicationDivision AdditionSubtraction MultiplicationDivision.
Solve Multi-step Equations
Richmond House, Liverpool (1) 26 th January 2004.
CMU SCS : Multimedia Databases and Data Mining Lecture #17: Text - part IV (LSI) C. Faloutsos.
BT Wholesale October Creating your own telephone network WHOLESALE CALLS LINE ASSOCIATED.
Copyright © Cengage Learning. All rights reserved.
Randomized Algorithms Randomized Algorithms CS648 1.
Sparse Matrices sparse … many elements are zero dense … few elements are zero.
ABC Technology Project
Chapter 4 Systems of Linear Equations; Matrices
Chapter 4 Systems of Linear Equations; Matrices
VOORBLAD.
Factor P 16 8(8-5ab) 4(d² + 4) 3rs(2r – s) 15cd(1 + 2cd) 8(4a² + 3b²)
Squares and Square Root WALK. Solve each problem REVIEW:
Introduction to Information Retrieval Outline ❶ Latent semantic indexing ❷ Dimensionality reduction ❸ LSI in information retrieval 1.
Do you have the Maths Factor?. Maths Can you beat this term’s Maths Challenge?
© 2012 National Heart Foundation of Australia. Slide 2.
Lets play bingo!!. Calculate: MEAN Calculate: MEDIAN
©Evergreen Public Schools Arithmetic Sequences Recursive Rules Vocabulary : arithmetic sequence explicit form recursive form 4/11/2011.
Understanding Generalist Practice, 5e, Kirst-Ashman/Hull
6.4 Best Approximation; Least Squares
Chapter 5 Test Review Sections 5-1 through 5-4.
GG Consulting, LLC I-SUITE. Source: TEA SHARS Frequently asked questions 2.
Addition 1’s to 20.
25 seconds left…...
Test B, 100 Subtraction Facts
Januar MDMDFSSMDMDFSSS
Week 1.
We will resume in: 25 Minutes.
©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.
PSSA Preparation.
Beyond Streams and Graphs: Dynamic Tensor Analysis
Maria Ugryumova Direct Solution Techniques in Spectral Methods CASA Seminar, 13 December 2007.
State Variables.
Multilinear Algebra for Analyzing Data with Multiple Linkages Tamara G. Kolda plus: Brett Bader, Danny Dunlavy, Philip Kegelmeyer Sandia National Labs.
CMU SCS : Multimedia Databases and Data Mining Lecture #21: Tensor decompositions C. Faloutsos.
Dimensionality Reduction PCA -- SVD
The Terms that You Have to Know! Basis, Linear independent, Orthogonal Column space, Row space, Rank Linear combination Linear transformation Inner product.
Introduction to tensor, tensor factorization and its applications
CMU SCS KDD '09Faloutsos, Miller, Tsourakakis P5-1 Large Graph Mining: Power Tools and a Practitioner’s guide Task 5: Graphs over time & tensors Faloutsos,
Large Graph Mining: Power Tools and a Practitioner’s guide
Estimation Techniques for High Resolution and Multi-Dimensional Array Signal Processing EMS Group – Fh IIS and TU IL Electronic Measurements and Signal.
15-826: Multimedia Databases and Data Mining
Zhu Han University of Houston Thanks for Dr. Hung Nguyen’s Slides
CIS 5590: Large-Scale Matrix Decomposition Tensors and Applications
SPARSE TENSORS DECOMPOSITION SOFTWARE
15-826: Multimedia Databases and Data Mining
Latent Semantic Analysis
Presentation transcript:

Multilinear Algebra for Analyzing Data with Multiple Linkages Abstract: Tensors (also known as multidimensional arrays or N-way arrays) are used in a variety of applications ranging from chemometrics to psychometrics.  We present an overview of tensor decompositions and the software tools that are available for working with tensors.  We then consider the application of the PARAFAC tensor decomposition to the problem of link analysis.  We propose and test a new methodology that uses a higher-order representation of a web hyperlink graph. We label the edges in the link graph with the anchor text of the hyperlinks so that the associated linear algebra representation is a sparse, three-way tensor. The first two dimensions of the tensor represent the web pages while the third dimension adds the anchor text. We then use the rank-1 factors of the decomposition to automatically identify topics in the collection along with the associated authoritative web pages. This is joint work with Brett Bader, Sandia National Labs. Tamara G. Kolda In collaboration with: Brett Bader, Danny Dunlavy, Philip Kegelmeyer Sandia National Labs MMDS, Stanford, CA, June 21-24, 2006

Linear Algebra plays an important role in Graph Analysis PageRank Brin & Page (1998) Page, Brin, Motwani, Winograd (1999) HITS (hubs and authorities) Kleinberg (1998/99) Latent Semantic Indexing (LSI) Dumais, Furnas, Landauer, Deerwester, and Harshman (1988) Deerwester, Dumais, Landauer, Furnas, and Harshman (1990) Terms Documents car repair service military d1 d2 d3 One Use of LSI: Maps terms and documents to the “same” k-dimensional space.

Multi-Linear Algebra can be used in more complex graph analyses Nodes (one type) connected by multiple types of links Node x Node x Connection Two types of nodes connected by multiple types of links Node A x Node B x Connection Multiple types of nodes connected by a single link Node A x Node B x Node C Multiple types of nodes connected by multiple types of links Node A x Node B x Node C x Connection Etc…

Analyzing Publication Data: Term x Doc x Author 1999-2004 SIAM Journal Data (except SIREV) Terms must appear in at least 3 documents and no more than 10% of all documents. Moreover, it must have at least 2 characters and no more than 30. term doc author Form tensor X as: Element (i,j,k) is nonzero only if author k wrote document j using term i. 6928 terms 4411 documents 6099 authors 464645 nonzeros

A tensor is a multidimensional array An I x J matrix Other names for tensors… Multi-way array N-way array The “order” of a tensor is the number of dimensions Other names for dimension… Mode Way Example The matrix A (at left) has order 2. The tensor X (at left) has order 3 and its 3rd mode is of size K. aij Notation I scalar xijk J I K An I £ J £ K tensor vector matrix Before we can get into multilinear algebra and its applications, we need to talk about multilinear arrays. Note: All my examples will be in 3d, but it all extends to higher dimensions as well. tensor

Tensor “fibers” generalize the concept of rows and columns “Slice” Column Fibers Row Fibers Tube Fibers Note There’s no naming scheme past 3 dimensions; instead, we just say, e.g., the 4th-mode fibers.

Tucker Decomposition K x T C I x J x K I x R J x S = B A R x S x T Proposed by Tucker (1966) Also known as: Three-mode factor analysis, three-mode PCA, orthogonal array decomposition A, B, and C may be orthonormal (generally assume they have full column rank) G is not diagonal Not unique

CANDECOMP/PARAFAC CANDECOMP = Canonical Decomposition (Carroll and Chang, 1970) PARAFAC = Parallel Factors (Harshman, 1970) Columns of A, B, and C are not orthonormal If R is minimal, then R is called the rank of the tensor (Kruskal 1977) Can have rank(X) > min{I,J,K} K x R C I x J x K I x R J x R B = A = + + + … I R x R x R

Combining Tucker and PARAFAC Have: Want: Step 1: Choose orthonormal compression matrices for each dimension: ¼ Step 2: Form reduced tensor (implicitly) Step 3: Compute PARAFAC on reduced tensor ¼ + + Step 4: Convert to PARAFAC of full tensor

Matricize: X(n) The nth-mode fibers are rearranged to be the columns of a matrix 5 7 6 8 1 3 2 4

Tucker and PARAFAC Matrix Representations Fact 1: Fact 2: Khatri-Rao Matrix Product (Columnwise Kronecker Product): Special pseudu-inverse structure:

Implicit Compressed PARAFAC ALS Have: Want: Consider the problem of fixing the 2nd and 3rd factors and solving just for the 1st. with Update columnwise

Back to the Problem: Term x Doc x Author Terms must appear in at least 3 documents and no more than 10% of all documents. Moreover, it must have at least 2 characters and no more than 30. term doc author Form tensor X as: 6928 documents 4411 terms 6099 authors 464645 nonzeros Element (i,j,k) is nonzero only if author k wrote document j using term i.

Original problem is “overly” sparse Result: Resulting tensor has just a few nonzero columns in each lateral slice. term author doc Experimentally, PARAFAC seems to overfit such data and not do a good job of “mixing” different authors.

Compression Matrices & PARAFAC (rank 100) (rank 100) Run rank-100 PARAFAC on compressed tensor. Reassemble results.

Three-Way Fingerprints Each of the Terms, Docs, and Authors has a rank-k (k=100) fingerprint from the PARAFAC approximation All items can be directly compared in “concept space” Thus, we can compare any of the following Term-Term Doc-Doc Term-Doc Author-Author Author-Term Author-Doc The fingerprints can be used as inputs for clustering, classification, etc.

MATLAB Results Go to MATLAB

Dunlavy, Kolda, Kegelmeyer, Tech. Rep. SAND2006-2079 Wrap-Up Higher-order LSI for term-doc-author tensor Tucker-PARAFAC combination for sparse tensors Spasre Tensor Toolbox (release summer 2006) Mathematical manipulations Kolda, Tech. Rep. SAND2006-2081 Thanks to Kevin Boyack for journal data For more info: Tammy Kolda, tgkolda@sandia.gov Dunlavy, Kolda, Kegelmeyer, Tech. Rep. SAND2006-2079 Kolda, Bader, Kenny, ICDM05