15-826: Multimedia Databases and Data Mining

Presentation transcript:

15-826: Multimedia Databases and Data Mining. Lecture #21: Tensor decompositions. C. Faloutsos

Must-read Material: Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. Technical Report SAND2007-6702, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, November 2007.

Outline. Goal: ‘Find similar / interesting things’. Intro to DB; Indexing - similarity search; Data Mining.

Indexing - Detailed outline: primary key indexing; secondary key / multi-key indexing; spatial access methods; fractals; text; Singular Value Decomposition (SVD); …; Tensors; multimedia; ...

Most of the foils are by Dr. Tamara Kolda (Sandia N.L.), csmr.ca.sandia.gov/~tgkolda, and Dr. Jimeng Sun (CMU -> IBM), www.cs.cmu.edu/~jimeng. 3h tutorial: www.cs.cmu.edu/~christos/TALKS/SDM-tut-07/

Outline: Motivation - Definitions; Tensor tools; Case studies.

Motivation 0: Why “matrix”? Why are matrices important?

Examples of Matrices: Graph - social network. [Figure: matrix with people (John, Peter, Mary, Nick, ...) on both axes; sample entries 11, 22, 55, ... and 5, 6, 7]

Examples of Matrices: cloud of n-d points. [Figure: matrix with rows John, Peter, Mary, Nick, ... and columns age, chol#, blood#, ...]

Examples of Matrices: Market basket (as in Association Rules). [Figure: matrix with rows John, Peter, Mary, Nick, ... and columns milk, bread, choc., wine, ...]

Examples of Matrices: Documents and terms. [Figure: matrix with rows Paper#1, Paper#2, Paper#3, Paper#4, ... and columns data, mining, classif., tree, ...]

Examples of Matrices: Authors and terms. [Figure: matrix with rows John, Peter, Mary, Nick, ... and columns data, mining, classif., tree, ...]

Examples of Matrices: sensor-ids and time-ticks. [Figure: matrix with rows t1, t2, t3, t4, ... and columns temp1, temp2, humid., pressure, ...]

Motivation: Why tensors? Q: what is a tensor?

Motivation 2: Why tensor? A: N-D generalization of matrix. [Figure: one author x term matrix for KDD’07; rows John, Peter, Mary, Nick, ...; columns data, mining, classif., tree, ...]

Motivation 2: Why tensor? A: N-D generalization of matrix. [Figure: author x term matrices for KDD’05, KDD’06, KDD’07 stacked into a 3-mode tensor; rows John, Peter, Mary, Nick, ...; columns data, mining, classif., tree, ...]
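The stacking idea on this slide (one author x term matrix per conference) can be sketched in NumPy. This is only an illustration: the counts in `records` are made up, and the names `authors`, `terms`, `conferences`, and `records` are hypothetical labels borrowed from the figure.

```python
import numpy as np

authors = ["John", "Peter", "Mary", "Nick"]
terms = ["data", "mining", "classif.", "tree"]
conferences = ["KDD'05", "KDD'06", "KDD'07"]

# Hypothetical (author, term, conference, count) records; the values are invented.
records = [
    ("John", "data", "KDD'05", 3),
    ("John", "mining", "KDD'07", 2),
    ("Mary", "tree", "KDD'06", 1),
    ("Nick", "classif.", "KDD'07", 4),
]

# Mode 1 = author, mode 2 = term, mode 3 = conference.
X = np.zeros((len(authors), len(terms), len(conferences)))
for author, term, conf, count in records:
    X[authors.index(author), terms.index(term), conferences.index(conf)] += count

# Fixing the third mode gives back an ordinary author x term matrix.
kdd07 = X[:, :, conferences.index("KDD'07")]
```

Each slice along the third mode is one of the matrices from the earlier "Authors and terms" example, so the tensor loses nothing relative to keeping three separate matrices.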

Tensors are useful for 3 or more modes. Terminology: ‘mode’ (or ‘aspect’). [Figure: 3-mode tensor with axes labeled Mode (== aspect) #1, Mode#2, Mode#3; term labels data, mining, classif., tree, ...]

Motivating Applications. Why are matrices important? Why are tensors useful? P1: social networks; P2: web mining.

P1: Social network analysis. Traditionally, people focus on static networks and find community structures. We plan to monitor the change of the community structure over time.

P2: Web graph mining. How to order the importance of web pages? Kleinberg’s algorithm HITS; PageRank; Tensor extension on HITS (TOPHITS): context-sensitive hypergraph analysis.

Outline: Motivation – Definitions; Tensor tools (Tensor Basics; Tucker; PARAFAC); Case studies.

Tensor Basics

Answer to both: tensor factorization. Recall: (SVD) matrix factorization finds blocks. [Figure: N users x M products matrix ~ sum of three blocks: ‘meat-eaters’ x ‘steaks’ + ‘vegetarians’ x ‘plants’ + ‘kids’ x ‘cookies’]
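A small NumPy sketch of "SVD finds blocks", assuming a toy users x products matrix with three clean, non-overlapping blocks. The block sizes and ratings are invented for illustration; real data would be noisy and the blocks would only be approximate.

```python
import numpy as np

# Toy 6 users x 6 products matrix with three blocks (values are made up).
M = np.zeros((6, 6))
M[0:2, 0:2] = 5.0  # 'meat-eaters' x 'steaks'
M[2:4, 2:4] = 3.0  # 'vegetarians' x 'plants'
M[4:6, 4:6] = 1.0  # 'kids' x 'cookies'

U, s, Vt = np.linalg.svd(M)

# Each of the three leading singular triplets is supported on exactly one block:
# the nonzero entries of u_r pick the user group, those of v_r the product group.
blocks = []
for r in range(3):
    users = np.flatnonzero(np.abs(U[:, r]) > 1e-8).tolist()
    products = np.flatnonzero(np.abs(Vt[r, :]) > 1e-8).tolist()
    blocks.append((users, products))

# Sum of the three rank-1 terms, matching the "~ + +" picture on the slide.
M3 = (U[:, :3] * s[:3]) @ Vt[:3, :]
```

Because the three blocks have different strengths, the singular values are distinct and each singular vector pair isolates one (user group, product group) block.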

Answer to both: tensor factorization. PARAFAC decomposition. [Figure: subject x verb x object tensor = sum of three components, labeled politicians, artists, athletes]

Answer: tensor factorization. PARAFAC decomposition: results for who-calls-whom-when, 4M x 15 days. [Figure: caller x callee x time tensor = sum of three components, labeled ??, ??, ??]

Goal: extension to >=3 modes. [Figure: I x J x K tensor ~ sum of rank-1 components (+…+) = factor matrices A (I x R), B (J x R), C (K x R) with an R x R x R core]

Main points: 2 major types of tensor decompositions: PARAFAC and Tucker. Both can be solved with “alternating least squares” (ALS). Details follow.

Specially Structured Tensors

Specially Structured Tensors. Tucker Tensor (our notation): [Figure: I x J x K tensor = R x S x T “core” with U (I x R), V (J x S), W (K x T)]. Kruskal Tensor (our notation): [Figure: I x J x K tensor = u1, v1, w1 +…+ uR, vR, wR = R x R x R “core” with U (I x R), V (J x R), W (K x R)]

Specially Structured Tensors (details). Tucker Tensor, in matrix form: [equation not preserved in transcript]. Kruskal Tensor, in matrix form: [equation not preserved in transcript].
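The matrix-form equations did not survive in this transcript. As a reference point, the standard matricized forms from the Kolda and Bader report are given below; this is a reconstruction under that assumption, not a copy of the slide. Here $X_{(n)}$ is the mode-$n$ unfolding, $\otimes$ the Kronecker product, $\odot$ the Khatri-Rao (column-wise Kronecker) product, and $\Lambda = \mathrm{diag}(\lambda)$ holds the diagonal of the Kruskal core.

```latex
% Tucker tensor: core G (R x S x T), factors U (I x R), V (J x S), W (K x T)
\begin{align*}
\mathcal{X} &= \mathcal{G} \times_1 U \times_2 V \times_3 W \\
X_{(1)} &= U \, G_{(1)} \, (W \otimes V)^{\mathsf{T}} \\
X_{(2)} &= V \, G_{(2)} \, (W \otimes U)^{\mathsf{T}} \\
X_{(3)} &= W \, G_{(3)} \, (V \otimes U)^{\mathsf{T}}
\end{align*}

% Kruskal tensor: factors U (I x R), V (J x R), W (K x R), weights lambda
\begin{align*}
\mathcal{X} &= \sum_{r=1}^{R} \lambda_r \, u_r \circ v_r \circ w_r \\
X_{(1)} &= U \, \Lambda \, (W \odot V)^{\mathsf{T}} \\
X_{(2)} &= V \, \Lambda \, (W \odot U)^{\mathsf{T}} \\
X_{(3)} &= W \, \Lambda \, (V \odot U)^{\mathsf{T}}
\end{align*}
```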

Tensor Decompositions

Tucker Decomposition - intuition. [Figure: I x J x K tensor ~ core G (R x S x T) with A (I x R), B (J x S), C (K x T)] Example: author x keyword x conference. A: author x author-group; B: keyword x keyword-group; C: conf. x conf-group; G: how groups relate to each other. (Needs elaboration!)

Intuition behind core tensor. 2-d case: co-clustering [Dhillon et al., Information-Theoretic Co-clustering, KDD’03]

[Figure: an m x n matrix (e.g., terms x documents) and its co-clustering factors, with dimension labels m, k, l, n]

[Figure: co-clustering example with factors term x term-group, term group x doc. group, and doc x doc group; labels: med. terms, cs terms, common terms, med. doc, cs doc]

Tucker Decomposition. [Figure: I x J x K tensor ~ core (R x S x T) with A (I x R), B (J x S), C (K x T)] Given A, B, C, the optimal core is: [equation not preserved in transcript]. Proposed by Tucker (1966). AKA: three-mode factor analysis, three-mode PCA, orthogonal array decomposition. A, B, and C generally assumed to be orthonormal (generally assume they have full column rank). The core is not diagonal. Not unique. Recall the equations for converting a tensor to a matrix.
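A minimal NumPy sketch of the "optimal core given A, B, C" step, assuming orthonormal factor columns, in which case the core is the tensor multiplied by the transposed factor in each mode. The factors here come from a truncated HOSVD (SVD of each unfolding), which is a simpler stand-in for the ALS procedure the lecture refers to; the function names `unfold`, `hosvd`, and `tucker_to_tensor` are hypothetical.

```python
import numpy as np

def unfold(X, mode):
    # Mode-n matricization. The column ordering differs from some textbooks,
    # but the column space (all that the SVD below needs) is the same.
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def hosvd(X, ranks):
    # Factors: leading left singular vectors of each unfolding (orthonormal columns).
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :r])
    A, B, C = factors
    # Core for orthonormal A, B, C: multiply X by A^T, B^T, C^T in modes 1, 2, 3.
    G = np.einsum("ijk,ir,js,kt->rst", X, A, B, C)
    return G, A, B, C

def tucker_to_tensor(G, A, B, C):
    return np.einsum("rst,ir,js,kt->ijk", G, A, B, C)

# Synthetic tensor with Tucker structure (R, S, T) = (2, 3, 2); values are random.
rng = np.random.default_rng(0)
X = tucker_to_tensor(
    rng.standard_normal((2, 3, 2)),
    rng.standard_normal((5, 2)),
    rng.standard_normal((6, 3)),
    rng.standard_normal((4, 2)),
)
G, A, B, C = hosvd(X, (2, 3, 2))
rel_err = np.linalg.norm(X - tucker_to_tensor(G, A, B, C)) / np.linalg.norm(X)
```

The core `G` is generally dense, and rotating A while counter-rotating G gives the same fit, which is one way to see the "not unique" remark.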

Outline: Motivation – Definitions; Tensor tools (Tensor Basics; Tucker; PARAFAC); Case studies.

CANDECOMP/PARAFAC Decomposition. [Figure: I x J x K tensor ≈ sum of rank-1 components (+…+) = A (I x R), B (J x R), C (K x R) with an R x R x R core] CANDECOMP = Canonical Decomposition (Carroll & Chang, 1970). PARAFAC = Parallel Factors (Harshman, 1970). Core is diagonal (specified by a vector). Columns of A, B, and C are not necessarily orthonormal. If R is minimal, then R is called the rank of the tensor (Kruskal 1977). The rank of the tensor can exceed min{I,J,K}.
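A compact sketch of PARAFAC via alternating least squares for a 3-mode tensor, assuming a dense NumPy array and a fixed iteration count with no convergence check. The function names `khatri_rao`, `unfold`, and `cp_als` are hypothetical, and this is not the Tensor Toolbox implementation; the test tensor is synthetic.

```python
import numpy as np

def khatri_rao(P, Q):
    # Column-wise Kronecker product; row index is p * Q.shape[0] + q.
    return np.einsum("ir,jr->ijr", P, Q).reshape(-1, P.shape[1])

def unfold(X, mode):
    # Mode-n matricization in C order, consistent with khatri_rao above.
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_als(X, R, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, R)) for n in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            P, Q = [factors[m] for m in range(3) if m != mode]
            # Least-squares update of one factor with the other two held fixed.
            gram = (P.T @ P) * (Q.T @ Q)
            factors[mode] = unfold(X, mode) @ khatri_rao(P, Q) @ np.linalg.pinv(gram)
    # Move column norms into a weight vector: the diagonal of the core.
    lam = np.ones(R)
    for m in range(3):
        norms = np.linalg.norm(factors[m], axis=0)
        factors[m] = factors[m] / norms
        lam *= norms
    return lam, factors[0], factors[1], factors[2]

# Synthetic rank-2 tensor built from made-up factors.
rng = np.random.default_rng(1)
A0 = np.linalg.qr(rng.standard_normal((5, 2)))[0]
B0 = np.linalg.qr(rng.standard_normal((4, 2)))[0]
C0 = np.linalg.qr(rng.standard_normal((3, 2)))[0]
X = np.einsum("r,ir,jr,kr->ijk", np.array([3.0, 1.0]), A0, B0, C0)

lam, A, B, C = cp_als(X, R=2)
X_hat = np.einsum("r,ir,jr,kr->ijk", lam, A, B, C)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

Each inner step is an ordinary linear least-squares problem, which is why the same alternating scheme also applies to Tucker. ALS can converge slowly or stall on harder tensors, so a practical version would monitor the fit and try several initializations.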

Tucker vs. PARAFAC Decompositions (IMPORTANT). Tucker: variable transformation in each mode; core G may be dense; A, B, C generally orthonormal; not unique. [Figure: I x J x K tensor ~ R x S x T core with A (I x R), B (J x S), C (K x T)] PARAFAC: sum of rank-1 components; no core, i.e., superdiagonal core; A, B, C may have linearly dependent columns; generally unique. [Figure: I x J x K tensor ~ a1, b1, c1 +…+ aR, bR, cR]

Tensor tools - summary. Two main tools: PARAFAC and Tucker. Both find row-, column-, tube-groups, but in PARAFAC the three groups are identical. To solve: Alternating Least Squares. Toolbox from Tamara Kolda: http://csmr.ca.sandia.gov/~tgkolda/TensorToolbox/

Outline: Motivation - Definitions; Tensor tools; Case studies: P1: web graph mining (‘TOPHITS’); P2: phone-call patterns; P3: N.E.L.L. (never ending language learner).

P1: Web graph mining. How to order the importance of web pages? Kleinberg’s algorithm HITS; PageRank; Tensor extension on HITS (TOPHITS).

P1: Web graph mining. T. G. Kolda, B. W. Bader and J. P. Kenny, Higher-Order Web Link Analysis Using Multilinear Algebra, ICDM 2005, pp. 242-249, November 2005, doi:10.1109/ICDM.2005.77.

Kleinberg’s Hubs and Authorities (the HITS method). Sparse adjacency matrix (from x to) and its SVD. [Figure: adjacency matrix ~ (hub scores for 1st topic) x (authority scores for 1st topic) + (hub scores for 2nd topic) x (authority scores for 2nd topic) + ...] Kleinberg, JACM, 1999.
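A small NumPy sketch of reading hub and authority scores off the SVD of an adjacency matrix, as the slide describes. The five-page graph below is invented for illustration; a real web graph would be stored as a sparse matrix and handled with a sparse solver.

```python
import numpy as np

# Hypothetical link graph: adjacency[i, j] = 1 if page i links to page j.
adjacency = np.array([
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
], dtype=float)

U, s, Vt = np.linalg.svd(adjacency)

# Leading left singular vector: hub scores for the 1st topic (rows = "from").
# Leading right singular vector: authority scores for the 1st topic (columns = "to").
# The sign of a singular vector is arbitrary, so take absolute values.
hubs = np.abs(U[:, 0])
authorities = np.abs(Vt[0, :])

best_authority = int(np.argmax(authorities))
```

In this toy graph page 3 receives the most links and gets the top authority score, while pages 0 and 1, which both point to pages 2 and 3, tie as the strongest hubs. The second singular pair would give the scores for a 2nd topic.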

HITS Authorities on Sample Data. We started our crawl from http://www-neos.mcs.anl.gov/neos, and crawled 4700 pages, resulting in 560 cross-linked hosts. [Figure: from x to matrix ~ (hub scores, authority scores) for 1st topic + (hub scores, authority scores) for 2nd topic + ...]

Three-Dimensional View of the Web. Observe that this tensor is very sparse! Kolda, Bader, Kenny, ICDM05.

Topical HITS (TOPHITS). Main Idea: Extend the idea behind the HITS model to incorporate term (i.e., topical) information. [Figure: from x to x term tensor ~ (hub scores, authority scores, term scores) for 1st topic + (hub scores, authority scores, term scores) for 2nd topic + ...]

TOPHITS Terms & Authorities on Sample Data. TOPHITS uses 3D analysis to find the dominant groupings of web pages and terms. wk = # unique links using term k. Tensor PARAFAC. [Figure: from x to x term tensor ~ (hub scores, authority scores, term scores) for 1st topic + (hub scores, authority scores, term scores) for 2nd topic + ...]

P2: Anomaly detection in time-evolving graphs. Anomalous communities in phone call data: European country, 4M clients, data over 2 weeks. [Figure: 1 caller, 5 receivers, 4 days of activity] ~200 calls to EACH receiver on EACH day!

P2: Anomaly detection in time-evolving graphs. Reference: Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (Comet) Communities. PAKDD 2014, Tainan, Taiwan.

GigaTensor: Scaling Tensor Analysis Up By 100 Times – Algorithms and Discoveries. U Kang, Evangelos Papalexakis, Abhay Harpale, Christos Faloutsos. KDD 2012.

P3: N.E.L.L. analysis. NELL: Never Ending Language Learner. Q1: dominant concepts / topics? Q2: synonyms for a given new phrase? [Figure: NELL tensor with dimension labels (26M), (26M), (48M); nonzeros = 144M; example facts “Eric Clapton plays guitar” and “Barack Obama is the president of U.S.”]

A1: Concept Discovery. Concept Discovery in Knowledge Base.

A1: Concept Discovery

A2: Synonym Discovery. Synonym Discovery in Knowledge Base. [Figure: factor matrix with columns a1, a2, …, aR; rows marked (Given) subject, (Discovered) synonym 1, (Discovered) synonym 2]
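One way to read the figure: each subject is a row of the factor matrix with columns a1 … aR, and rows with similar profiles are reported as synonyms. The sketch below assumes cosine similarity between rows as the comparison, which the transcript does not state; the phrases and factor values are invented, and `top_synonyms` is a hypothetical helper.

```python
import numpy as np

def top_synonyms(A, names, query, k=2):
    # Cosine similarity between the query's row and every other row of A.
    # Assumes no row of A is all zeros.
    unit = A / np.linalg.norm(A, axis=1, keepdims=True)
    q = names.index(query)
    sims = unit @ unit[q]
    order = [int(i) for i in np.argsort(-sims) if i != q]
    return [(names[i], float(sims[i])) for i in order[:k]]

# Made-up subject factor matrix with R = 3 components (one row per phrase).
names = ["phrase_a", "phrase_b", "phrase_c", "phrase_d"]
A = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.8],
])

result = top_synonyms(A, names, "phrase_a", k=1)
```

Two phrases score as similar when they load on the same components, that is, when they appear in the same (verb, object) contexts in the tensor.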

A2: Synonym Discovery

Conclusions. Real data are often in high dimensions with multiple aspects (modes). Matrices and tensors provide elegant theory and algorithms. [Figure: I x J x K tensor ~ a1, b1, c1 +…+ aR, bR, cR]

References
Inderjit S. Dhillon, Subramanyam Mallela, Dharmendra S. Modha. Information-theoretic co-clustering. KDD 2003, pp. 89-98.
T. G. Kolda, B. W. Bader and J. P. Kenny. Higher-Order Web Link Analysis Using Multilinear Algebra. ICDM 2005, pp. 242-249, November 2005.
Jimeng Sun, Spiros Papadimitriou, Philip Yu. Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams. Proc. of the Int. Conf. on Data Mining (ICDM), Hong Kong, China, Dec 2006.