Distributed Methods for High-dimensional and Large-scale Tensor Factorization
Kijung Shin (Seoul National University) and U Kang (KAIST)


Overview  Introduction  Problem definition  Proposed method  Experiments  Conclusion 2 / 24 IntroductionProblem definitionProposed method Experiments Conclusion

Overview  A tensor is a high dimensional array  A tensor is partially observable if it contains missing (or unknown) entries 3 / 24 IntroductionProblem definitionProposed method Experiments Conclusion Mode length Observations A 3-dimensional tensor

Tensor (cont.)  Tensor data have become large and complex  Example: Movie rating data 4 / 24 IntroductionProblem definitionProposed method Experiments Conclusion  Increase in …  Dimension (context information)  Mode length (# users and # movies)  # observations (# reviews) Ann Tom Sam Up Cars Tangled

Tensor Factorization  Given a tensor, decompose the tensor into a core tensor and factor matrices whose product approximates the original tensor 5 / 24 IntroductionProblem definitionProposed method Experiments Conclusion

Tensor Factorization (cont.)
- Factorizing partially observable tensors has been used in many data mining applications
  - Context-aware recommendation (A. Karatzoglou et al., 2010)
  - Social network analysis (D. M. Dunlavy et al., 2011)
  - Personalized web search (J.-T. Sun et al., 2005)
- Given a high-dimensional and large-scale tensor, how can we factorize it efficiently?

Overview  Introduction  Problem definition  Proposed method  Experiments  Conclusion 7 / 24 IntroductionProblem definitionProposed method Experiments Conclusion

CP Decomposition
- CP decomposition (Harshman et al., 1970) is a widely used tensor factorization method
- Given a tensor, CP decomposition factorizes it into a sum of rank-one tensors
[Figure: a CP decomposition, with the 1st and 2nd column sets of the factor matrices highlighted]
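In the same notation, a rank-$R$ CP decomposition approximates an $N$-dimensional tensor by a sum of $R$ rank-one tensors, and for a partially observable tensor the fit is typically measured only over the set $\Omega$ of observed entries with L2 regularization (a standard formulation; the paper's exact objective may differ in details):

$$
\mathcal{X} \;\approx\; \sum_{r=1}^{R} a^{(1)}_{r} \circ a^{(2)}_{r} \circ \cdots \circ a^{(N)}_{r},
\qquad
\min_{A^{(1)},\dots,A^{(N)}}
\sum_{(i_1,\dots,i_N) \in \Omega}
\Big( x_{i_1 \cdots i_N} - \sum_{r=1}^{R} \prod_{n=1}^{N} a^{(n)}_{i_n r} \Big)^{2}
\;+\; \lambda \sum_{n=1}^{N} \lVert A^{(n)} \rVert_F^{2},
$$

where $a^{(n)}_{r}$ is the $r$-th column of the factor matrix $A^{(n)}$, i.e., one member of the $r$-th column set.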

CP Decomposition (cont.)

Overview  Introduction  Problem definition  Proposed method  Experiments  Conclusion 10 / 24 IntroductionProblem definitionProposed method Experiments Conclusion

Proposed methods  We propose two CP decomposition algorithms  CDTF: Coordinate Descent for Tensor Factorization  SALS: Subset Alternating Least Square  They solve higher-rank factorization through a series of lower-rank factorization  They are scalable with all the following factors:  dimension, # observations, mode length, and rank  They are parallelizable in distributed environments 11 / 24 IntroductionProblem definitionProposed method Experiments Conclusion

Coordinate Descent for Tensor Factorization (CDTF)
[Figure: CDTF updates one column set at a time against the residual tensor, keeping the other column sets fixed]
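A minimal single-machine sketch of one CDTF-style pass, assuming the coordinate-list layout from the earlier example; the function name, regularization weight, and loop structure are illustrative, not the authors' implementation:

```python
import numpy as np

def cdtf_update_component(observations, factors, r, lam=0.01):
    """Update the r-th column of every factor matrix by coordinate descent.

    observations: list of (indices, value) pairs, e.g. ((i, j, k), 4.0)
    factors: list of N factor matrices; factors[n] has shape (I_n, R)
    r: index of the column set (rank-one component) being updated
    lam: L2 regularization weight (illustrative value)
    """
    N = len(factors)
    R = factors[0].shape[1]

    def residual_plus_r(idx, value):
        # residual-tensor entry with the r-th component added back
        approx = sum(np.prod([factors[n][idx[n], rp] for n in range(N)])
                     for rp in range(R) if rp != r)
        return value - approx

    for n in range(N):                            # update mode n's column r,
        num = np.zeros(factors[n].shape[0])       # one entry per row index,
        den = np.full(factors[n].shape[0], lam)   # with a closed-form solution
        for idx, value in observations:
            others = np.prod([factors[m][idx[m], r] for m in range(N) if m != n])
            num[idx[n]] += residual_plus_r(idx, value) * others
            den[idx[n]] += others ** 2
        # rows with no observations fall back to 0 in this simplified sketch
        factors[n][:, r] = num / den
    return factors
```

Sweeping this update over the column sets r = 1..R is the "series of lower-rank factorizations" described on the previous slide.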

Subset Alternating Least Squares (SALS)
[Figure: SALS updates a subset of column sets at a time against the residual tensor, keeping the remaining column sets fixed]
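Under the same assumptions, a sketch of one SALS-style pass over a subset `cols` of column sets (|cols| = C); each row of a factor matrix is obtained from a small C x C regularized least-squares solve. Again this is illustrative, not the authors' code:

```python
import numpy as np

def sals_update_subset(observations, factors, cols, lam=0.01):
    """Jointly update the columns listed in `cols` of every factor matrix,
    solving one small C x C least-squares problem per row."""
    N = len(factors)
    C = len(cols)
    R = factors[0].shape[1]

    def residual_plus_cols(idx, value):
        # residual-tensor entry with the `cols` components added back
        approx = sum(np.prod([factors[n][idx[n], rp] for n in range(N)])
                     for rp in range(R) if rp not in cols)
        return value - approx

    for n in range(N):
        I_n = factors[n].shape[0]
        A = np.tile(lam * np.eye(C), (I_n, 1, 1))   # per-row C x C normal matrices
        b = np.zeros((I_n, C))
        for idx, value in observations:
            # products of the other modes' entries, one per column in `cols`
            g = np.array([np.prod([factors[m][idx[m], c] for m in range(N) if m != n])
                          for c in cols])
            A[idx[n]] += np.outer(g, g)
            b[idx[n]] += residual_plus_cols(idx, value) * g
        for i in range(I_n):
            factors[n][i, cols] = np.linalg.solve(A[i], b[i])
    return factors
```

A full rank-R factorization sweeps over the subsets, e.g. `for s in range(0, R, C): sals_update_subset(observations, factors, list(range(s, min(s + C, R))))`. Setting C = 1 recovers the CDTF update above, while updating all R columns at once corresponds to ALS, so SALS interpolates between the two.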

Comparison
- ALS is accurate but not scalable
- CDTF has much better scalability but lower accuracy
- From the optimization viewpoint, CDTF optimizes one column set at a time, while ALS jointly optimizes all column sets
- SALS can enjoy both scalability and accuracy with a properly chosen number of column sets per update

Method            Update unit               Accuracy
ALS               all column sets jointly   High
CDTF (proposed)   one column set            Low
SALS (proposed)   a subset of column sets   High

Parallelization in Distributed Environments
- Both CDTF and SALS can be parallelized in distributed environments without affecting their correctness
- Data distribution: the entries of the tensor are distributed across the machines
[Figure: tensor entries partitioned across Machine 1 through Machine 4]
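A toy illustration of the data-distribution step; the hash-based assignment below is an illustrative placeholder, not necessarily the paper's partitioning scheme:

```python
# Toy sketch: spread the observed tensor entries over the machines.
# Hash-based assignment is only a placeholder for the partitioning used
# in the paper.
def partition(observations, num_machines):
    shards = [[] for _ in range(num_machines)]
    for idx, value in observations:
        shards[hash(idx) % num_machines].append((idx, value))
    return shards

shards = partition(observations, num_machines=4)  # e.g., Machines 1..4 in the figure
```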

Parallelization in Distributed Environments (cont.)
- Work distribution: the factors in each column are distributed and computed simultaneously
- Computed factors are broadcast to the other machines

Overview  Introduction  Problem definition  Proposed method  Experiments  Conclusion 17 / 24 IntroductionProblem definitionProposed method Experiments Conclusion

Experimental Settings  Cluster: a 40-node Hadoop cluster with maximum 8GB heap space per reducer  Competitors: distributed methods that can factorize partially observable tensors  ALS (Y. Zhou et al. 2008), FlexiFaCT (A. Beutel et al. 2014), PSGD (R. McDonald et al. 2008)  Datasets: 18 / 18 IntroductionProblem definitionProposed method Experiments Conclusion Size of synthetic datasets FactorS1S2S3S4 Dimension2345 Mode length300K1M3M10M # observations30M100M300M1B Rank K

Overall Scalability  Increase all factors (dimension, mode length, # observations, and rank) from S1 to S4  Only CDTF and SALS scale to S4, while the others fail  They require several orders of less memory space than their competitors 19 / 18 IntroductionProblem definitionProposed method Experiments Conclusion Running time / iter (min) * M: number of reducers / o.o.m. : out of memory / o.o.t.: out of time Required memory / reducer (MB) Running time Memory requirements

Scalability with Each Factor
- Data scalability: when measuring scalability w.r.t. a factor, that factor is scaled up from S1 to S4 while all other factors are fixed at S2
- Machine scalability: increase the number of reducers from 5 to 40

Method      # observations   Mode length   Rank   Dimension   # machines
CDTF        O                O             O      O           O
SALS        O                O             O      O           O
ALS         O                X             X      O           O
PSGD        O                X             X      O           X
FlexiFaCT   O                O             O      X           X

- ALS and PSGD fail to scale with mode length and rank due to their high memory requirements
- FlexiFaCT fails to scale with dimension and the number of machines due to its rapidly increasing communication cost
- CDTF and SALS are scalable with all the factors

Accuracy
[Figure: test RMSE vs. elapsed time (min) for each method]

Overview  Introduction  Problem definition  Proposed method  Experiments  Conclusion 22 / 24 IntroductionProblem definitionProposed method Experiments Conclusion

Conclusion
- CDTF and SALS
  - Distributed algorithms for tensor factorization
  - Solve a higher-rank factorization through a series of lower-rank factorizations
  - Scalable with dimension, # observations, mode length, rank, and # machines
  - Successfully factorize a 5-dimensional tensor with 10M mode length, 1B observations, and 1K rank

Thank you! Questions?

Backup slides: complexity analysis

Backup slides: scalability with each factor

Backup slides: FlexiFaCT
