Decomposition-by-Normalization (DBN): Leveraging Approximate Functional Dependencies for Efficient Tensor Decomposition Mijung Kim (Arizona State University)




Similar presentations
Partitional Algorithms to Detect Complex Clusters

An Array-Based Algorithm for Simultaneous Multidimensional Aggregates By Yihong Zhao, Prasad M. Desphande and Jeffrey F. Naughton Presented by Kia Hall.
Size-estimation framework with applications to transitive closure and reachability Presented by Maxim Kalaev Edith Cohen AT&T Bell Labs 1996.
Resource Management §A resource can be a logical, such as a shared file, or physical, such as a CPU (a node of the distributed system). One of the functions.
Bahman Bahmani  Fundamental Tradeoffs  Drug Interaction Example [Adapted from Ullman’s slides, 2012]  Technique I: Grouping 
1 Advancing Supercomputer Performance Through Interconnection Topology Synthesis Yi Zhu, Michael Taylor, Scott B. Baden and Chung-Kuan Cheng Department.
Baselines for Recognizing Textual Entailment Ling 541 Final Project Terrence Szymanski.
An Optimal Algorithm of Adjustable Delay Buffer Insertion for Solving Clock Skew Variation Problem Juyeon Kim, Deokjin Joo, Taehan Kim DAC’13.
OpenFOAM on a GPU-based Heterogeneous Cluster
Lecture 21: Spectral Clustering
Communities in Heterogeneous Networks Chapter 4 1 Chapter 4, Community Detection and Mining in Social Media. Lei Tang and Huan Liu, Morgan & Claypool,
DIMENSIONALITY REDUCTION BY RANDOM PROJECTION AND LATENT SEMANTIC INDEXING Jessica Lin and Dimitrios Gunopulos Ângelo Cardoso IST/UTL December
Supporting Queries with Imprecise Constraints Ullas Nambiar Dept. of Computer Science University of California, Davis Subbarao Kambhampati Dept. of Computer.
Balanced Graph Partitioning Konstantin Andreev Harald Räcke.
1 Distributed Databases CS347 Lecture 14 May 30, 2001.
A scalable multilevel algorithm for community structure detection
Job Scheduling Lecture 19: March 19. Job Scheduling: Unrelated Multiple Machines There are n jobs, each job has: a processing time p(i,j) (the time to.
Research at Intel Distributed Localization of Modular Robot Ensembles Robotics: Science and Systems 25 June 2008 Stanislav Funiak, Michael Ashley-Rollman.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract.
Approximation Algorithms for MAX-MIN tiling Authors Piotr Berman, Bhaskar DasGupta, S. Muthukrishman S. Muthukrishman Published on Journal of Algorithms,
Answering Imprecise Queries over Autonomous Web Databases Ullas Nambiar Dept. of Computer Science University of California, Davis Subbarao Kambhampati.
Distributed Constraint Optimization * some slides courtesy of P. Modi
Improved results for a memory allocation problem Rob van Stee University of Karlsruhe Germany Leah Epstein University of Haifa Israel WADS 2007 WAOA 2007.
Building Efficient Time Series Similarity Search Operator Mijung Kim Summer Internship 2013 at HP Labs.
Hashed Samples Selectivity Estimators for Set Similarity Selection Queries.
Introduction to variable selection I Qi Yu. 2 Problems due to poor variable selection: Input dimension is too large; the curse of dimensionality problem.
Network Aware Resource Allocation in Distributed Clouds.
S DTW: COMPUTING DTW DISTANCES USING LOCALLY RELEVANT CONSTRAINTS BASED ON SALIENT FEATURE ALIGNMENTS K. Selçuk Candan Arizona State University Maria Luisa.
1 Converting Categories to Numbers for Approximate Nearest Neighbor Search. Dept. of Computer Science and Information Engineering, National Chiayi University, 郭煌政, 2004/10/20.
A Fast Clustering-Based Feature Subset Selection Algorithm for High- Dimensional Data.
Independent Component Analysis (ICA) A parallel approach.
Pairwise Document Similarity in Large Collections with MapReduce Tamer Elsayed, Jimmy Lin, and Douglas W. Oard Association for Computational Linguistics,
Selective Block Minimization for Faster Convergence of Limited Memory Large-scale Linear Models Kai-Wei Chang and Dan Roth Experiment Settings Block Minimization.
Classification and Ranking Approaches to Discriminative Language Modeling for ASR. Erinç Dikici, Murat Semerci, Murat Saraçlar, Ethem Alpaydın. Presenter: 郝柏翰, 2013/01/28.
A Clustering Algorithm based on Graph Connectivity Balakrishna Thiagarajan Computer Science and Engineering State University of New York at Buffalo.
On Graph Query Optimization in Large Networks Alice Leung ICS 624 4/14/2011.
Mining High Utility Itemset in Big Data
K. Selçuk Candan, Maria Luisa Sapino Xiaolan Wang, Rosaria Rossini
Swarup Acharya Phillip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented By Vinay Hoskere.
Palette: Distributing Tables in Software-Defined Networks Yossi Kanizo (Technion, Israel) Joint work with Isaac Keslassy (Technion, Israel) and David Hay.
Stefan Mutter, Mark Hall, Eibe Frank University of Freiburg, Germany University of Waikato, New Zealand The 17th Australian Joint Conference on Artificial.
The Sweet Spot between Inverted Indices and Metric-Space Indexing for Top-K–List Similarity Search Evica Milchevski , Avishek Anand ★ and Sebastian Michel.
Distributed Database. Introduction A major motivation behind the development of database systems is the desire to integrate the operational data of an.
Andreas Papadopoulos - [DEXA 2015] Clustering Attributed Multi-graphs with Information Ranking 26th International.
An Efficient Linear Time Triple Patterning Solver Haitong Tian Hongbo Zhang Zigang Xiao Martin D.F. Wong ASP-DAC’15.
Real-Time Support for Mobile Robotics K. Ramamritham (+ Li Huan, Prashant Shenoy, Rod Grupen)
Patch Based Prediction Techniques University of Houston By: Paul AMALAMAN From: UH-DMML Lab Director: Dr. Eick.
Data Structures and Algorithms in Parallel Computing Lecture 7.
Practical Message-passing Framework for Large-scale Combinatorial Optimization Inho Cho, Soya Park, Sejun Park, Dongsu Han, and Jinwoo Shin KAIST 2015.
Data Structures and Algorithms in Parallel Computing
Exponential random graphs and dynamic graph algorithms David Eppstein Comp. Sci. Dept., UC Irvine.
Learning Photographic Global Tonal Adjustment with a Database of Input / Output Image Pairs.
 In the previews parts we have seen some kind of segmentation method.  In this lecture we will see graph cut, which is a another segmentation method.
Unsupervised Streaming Feature Selection in Social Media
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
Nawanol Theera-Ampornpunt, Seong Gon Kim, Asish Ghoshal, Saurabh Bagchi, Ananth Grama, and Somali Chaterji Fast Training on Large Genomics Data using Distributed.
1 Double-Patterning Aware DSA Template Guided Cut Redistribution for Advanced 1-D Gridded Designs Zhi-Wen Lin and Yao-Wen Chang National Taiwan University.
1 / 24 Distributed Methods for High-dimensional and Large-scale Tensor Factorization Kijung Shin (Seoul National University) and U Kang (KAIST)
Provable Learning of Noisy-OR Networks
Data Driven Resource Allocation for Distributed Learning
Optimizing Parallel Algorithms for All Pairs Similarity Search
Maximal Planar Subgraph Algorithms
Boosted Augmented Naive Bayes. Efficient discriminative learning of
Efficient Image Classification on Vertically Decomposed Data
Parallel Algorithm Design
Network Flow.
Efficient Image Classification on Vertically Decomposed Data
Randomized Algorithms CS648
Spectral Clustering Eric Xing Lecture 8, August 13, 2010
Jongik Kim1, Dong-Hoon Choi2, and Chen Li3
Presentation transcript:

Decomposition-by-Normalization (DBN): Leveraging Approximate Functional Dependencies for Efficient Tensor Decomposition. Mijung Kim (Arizona State University), K. Selçuk Candan (Arizona State University). This work is supported by NSF Grant # 'MiNC: NSDL Middleware for Network- and Context-aware Recommendations' and NSF Grant # 'RanKloud: Data Partitioning and Resource Allocation Strategies for Scalable Multimedia and Social Media Analysis'.

Tensor Decomposition. A tensor is a multi-dimensional (multi-mode) array. Tensor decomposition is widely used for multi-aspect analysis of multi-dimensional data.
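As a minimal illustration (not part of the paper), a rank-R CP (CANDECOMP/PARAFAC) decomposition expresses a 3-mode tensor as a sum of R outer products of factor-matrix columns. The NumPy sketch below reconstructs a tensor from randomly chosen factors; all sizes are purely illustrative.

```python
import numpy as np

I, J, K, R = 4, 5, 6, 3              # tensor dimensions and rank (illustrative values)
A = np.random.rand(I, R)             # factor matrix for mode 1
B = np.random.rand(J, R)             # factor matrix for mode 2
C = np.random.rand(K, R)             # factor matrix for mode 3

# X_hat[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(X_hat.shape)                   # (4, 5, 6)
```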

High cost of tensor decomposition. Data is commonly high-dimensional and large-scale. For dense tensor decomposition, the cost increases exponentially with the number of modes of the tensor; for example, a dense 5-mode tensor with 1,000 values per mode has 10^15 cells. For sparse tensor decomposition, the cost increases more slowly (linearly with the number of nonzero entries in the tensor), but it can still be very expensive for large data sets, and parallelizing the ALS (alternating least squares) method faces difficulties such as communication cost. How do we tackle this high computational cost of tensor decomposition?

Normalization. Reduce the dimensionality and the size of the input tensor based on functional dependencies (FDs) of the underlying relation (tensor).
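Below is a minimal sketch of this idea on a toy (user, movie, rating, genre) schema assumed for illustration (it matches the example on the next slide): normalizing on the FD movie -> genre splits the 4-attribute relation into two smaller relations.

```python
# Toy relation: (user, movie, rating, genre); genre is functionally determined by movie.
rows = [
    ('u1', 'm1', 5, 'comedy'),
    ('u2', 'm1', 3, 'comedy'),
    ('u1', 'm2', 4, 'drama'),
]

fact = [(u, m, r) for (u, m, r, g) in rows]          # (user, movie, rating)
dim  = sorted({(m, g) for (_, m, _, g) in rows})     # (movie, genre), duplicates removed

print(fact)   # [('u1', 'm1', 5), ('u2', 'm1', 3), ('u1', 'm2', 4)]
print(dim)    # [('m1', 'comedy'), ('m2', 'drama')]
```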

Join-by-Decomposition [Kim and Candan 2011]. Step 1a: decomposition of the (user, movie, rating) relation. Step 1b: decomposition of the (movie, genre) relation. Step 2: combination of the two decompositions into a final decomposition. Find all rank-R1 and rank-R2 decompositions of the two input tensors, where R1 × R2 = R, and choose the pair whose two decompositions are as independent from each other as possible. (M. Kim and K. S. Candan. Approximate tensor decomposition within a tensor-relational algebraic framework. In CIKM, 2011.)
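A small sketch of the rank-pair enumeration step (illustrative only): for a target rank R, join-by-decomposition considers every pair (R1, R2) with R1 × R2 = R.

```python
# Enumerate the (R1, R2) rank pairs with R1 * R2 = R; R = 12 gives the six pairs
# used in the experiments later in the talk.
R = 12
rank_pairs = [(r1, R // r1) for r1 in range(1, R + 1) if R % r1 == 0]
print(rank_pairs)   # [(1, 12), (2, 6), (3, 4), (4, 3), (6, 2), (12, 1)]
```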

Decomposition-by-Normalization (DBN) overview: a high-dimensional data set (a 5-mode tensor) is normalized based on functional dependencies (vertical partitioning) into lower-dimensional data sets (two 3-mode tensors); a tensor decomposition is computed on each vertical partition (sub-tensor); and the results are combined into the decomposition of the original data set (tensor).

Task 1: Normalization Process. Functional and approximate dependencies are discovered with the TANE algorithm. (Y. Huhtala et al. TANE: An efficient algorithm for discovering functional and approximate dependencies. Comput. J., 42(2), 1999.)

Task 2: Find Approximate FDs. Many data sets have no perfect FDs to leverage for normalization. Thus we rely on approximate FDs, each characterized by its support, measured via the minimum fraction of tuples that must be removed for the FD to hold exactly (the fewer tuples that must be removed, the higher the support).
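The following sketch (an illustration, not the TANE algorithm itself) measures how well an approximate FD X -> Y holds, as the largest fraction of tuples that can be kept so that the dependency holds exactly; the toy rows are assumptions.

```python
from collections import Counter, defaultdict

def fd_support(rows, x_idx, y_idx):
    """Fraction of tuples that can be kept so that X -> Y holds exactly."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[x_idx]][row[y_idx]] += 1
    # For each X-value, keep only the tuples carrying its most frequent Y-value.
    kept = sum(c.most_common(1)[0][1] for c in groups.values())
    return kept / len(rows)

rows = [('m1', 'comedy'), ('m1', 'comedy'), ('m1', 'drama'), ('m2', 'drama')]
print(fd_support(rows, 0, 1))   # 0.75: remove 1 of 4 tuples for movie -> genre to hold
```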

Task 3: Partitioning. Partition the data into the two partitions that will lead to the least amount of decomposition error; i.e., find partitions that are as independent from each other as possible: minimize inter-partition (between-partition) pairwise FDs and maximize intra-partition (within-partition) pairwise FDs.

Parallelized DBN. We parallelize the entire DBN operation by assigning each pair of rank decompositions to an individual processor core. For a rank-12 target, the pairs Rank-1 × Rank-12, Rank-2 × Rank-6, Rank-3 × Rank-4, Rank-4 × Rank-3, Rank-6 × Rank-2, and Rank-12 × Rank-1 can each run on a separate core in parallel.
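A minimal sketch of this parallelization pattern: each (r1, r2) rank pair is handed to its own worker. Python's multiprocessing is used here only as a stand-in for the MATLAB Parallel Computing Toolbox used in the paper, and decompose_pair is a hypothetical placeholder for the two CP decompositions run for a pair.

```python
from multiprocessing import Pool

def decompose_pair(pair):
    r1, r2 = pair
    # Placeholder: a real implementation would run the rank-r1 and rank-r2
    # decompositions here and return the combined decomposition and its fit.
    return (pair, r1 * r2)

if __name__ == '__main__':
    R = 12
    pairs = [(r1, R // r1) for r1 in range(1, R + 1) if R % r1 == 0]
    with Pool(processes=len(pairs)) as pool:
        results = pool.map(decompose_pair, pairs)   # one rank pair per worker
    print(results)
```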

Desiderata. The vertical partitioning should be such that the approximate FDs have high support, to prevent over-thinning of the relation R. Case 1: the join attribute X determines only a subset of the attributes of the relation R (|R| = |R2|, |R1| <= |R2|). For dense tensors, the number of attributes in each partition should be balanced; for sparse tensors, the total number of tuples of R1 and R2 should be minimized. Case 2: the join attribute X determines all attributes of the relation R (|R| = |R1| = |R2|). The support of the inter-partition FDs should be minimized (and, for dense tensors, the partitions should be balanced).

Vertical Partitioning Strategies. One partition takes the join attribute together with all the attributes that the join attribute determines with support higher than the support threshold; the remaining attributes (together with the join attribute) form the other partition.
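A minimal sketch of this thresholding step; the attribute names and support values are made up for illustration.

```python
# Support of the FD  movie -> attr  for each non-join attribute (toy values).
support = {'genre': 0.98, 'year': 0.95, 'rating': 0.40, 'user': 0.10}
threshold = 0.9

R1 = ['movie'] + [a for a, s in support.items() if s >= threshold]   # determined attributes
R2 = ['movie'] + [a for a, s in support.items() if s < threshold]    # the rest
print(R1)   # ['movie', 'genre', 'year']
print(R2)   # ['movie', 'rating', 'user']
```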

Vertical Partitioning Strategies (Case 1: the join attribute X determines only a subset of the attributes of the relation R; |R| = |R2|, |R1| <= |R2|). Sparse tensors: the size of R1 (X and all the attributes it determines) can be reduced down to the number of unique values of X by eliminating all duplicate tuples. Dense tensors: promote balanced partitioning by relaxing or tightening the support threshold. If R2 has more attributes than R1, move the attributes of R2 with the highest support into R1 (relaxing); if R1 has more attributes than R2, move the attributes of R1 with the lowest support into R2 (tightening). A sketch of this re-balancing follows below.
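A simplified sketch of the dense-tensor re-balancing, continuing the toy supports above: attributes are moved across the partitions one at a time, standing in for relaxing or tightening the support threshold. The stopping condition (partition sizes within one attribute of each other) is an assumption, not the paper's exact rule.

```python
def rebalance(R1, R2, support):
    # R1/R2 here exclude the join attribute; support maps attribute -> FD support.
    while len(R2) > len(R1) + 1:
        a = max(R2, key=lambda x: support[x])   # relax: pull the best-supported attribute into R1
        R2.remove(a); R1.append(a)
    while len(R1) > len(R2) + 1:
        a = min(R1, key=lambda x: support[x])   # tighten: push the worst-supported attribute into R2
        R1.remove(a); R2.append(a)
    return R1, R2

support = {'genre': 0.98, 'year': 0.95, 'rating': 0.40, 'user': 0.10}
print(rebalance(['genre'], ['year', 'rating', 'user'], support))
# (['genre', 'year'], ['rating', 'user'])
```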

Vertical Partitioning Strategies (Case 2: the join attribute X determines all attributes of the relation R). We formulate interFD-based partitioning as a graph partitioning problem: in the pairwise-FD graph G_pfd(V, E), each vertex represents an attribute and the weight of each edge is the average support of the approximate FDs between the two attributes. The problem is then to locate a cut of G_pfd with minimum average weight (for dense tensors, a balance criterion is also imposed). We use a modified version of a minimum-cut algorithm [Stoer and Wagner 1997] to seek a minimum average cut. (M. Stoer and F. Wagner. A simple min-cut algorithm. J. ACM, 44(4):585-591, 1997.)
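A minimal sketch using the off-the-shelf Stoer-Wagner minimum cut from networkx. Note that the paper uses a modified algorithm that minimizes the average cut weight, whereas this stand-in minimizes the total cut weight, and the edge weights below are made-up average FD supports between attributes.

```python
import networkx as nx

# Pairwise-FD graph: vertices are attributes, edge weights are average FD supports.
G = nx.Graph()
G.add_weighted_edges_from([
    ('A', 'B', 0.90), ('A', 'C', 0.80), ('B', 'C', 0.85),   # strongly dependent group
    ('D', 'E', 0.90),                                        # strongly dependent group
    ('C', 'D', 0.10), ('A', 'E', 0.05),                      # weak inter-group dependencies
])

cut_value, (part1, part2) = nx.stoer_wagner(G)
# The minimum cut severs the two weak edges, separating {A, B, C} from {D, E}.
print(cut_value, sorted(part1), sorted(part2))
```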

Rank Pruning based on Intra-Partition Dependencies. The higher the overall dependency among the attributes in a partition, the smaller the decomposition rank of that partition should be. Thus, we only consider rank pairs (r1, r2) such that r1 < r2 if the intra-partition FD support of R1 is larger than that of R2, and vice versa.
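A minimal sketch of this pruning rule; the intra-partition support values are illustrative.

```python
# Candidate rank pairs for a rank-12 target decomposition.
R = 12
pairs = [(r1, R // r1) for r1 in range(1, R + 1) if R % r1 == 0]

support_R1, support_R2 = 0.9, 0.6   # intra-partition FD supports (assumed values)
if support_R1 > support_R2:
    pruned = [(r1, r2) for (r1, r2) in pairs if r1 < r2]   # R1 gets the smaller rank
else:
    pruned = [(r1, r2) for (r1, r2) in pairs if r1 > r2]   # R2 gets the smaller rank
print(pruned)   # [(1, 12), (2, 6), (3, 4)]
```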

Experimental Setup (Data Sets). Data sets are from the UCI Machine Learning Repository [Frank and Asuncion 2010]. (A. Frank and A. Asuncion. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Sciences, 2010.)

Experimental Setup (Algorithms). NNCP (Non-Negative CP) vs. DBN. Dense tensors [N-way Toolbox 2000]: NNCP-NWAY vs. DBN-NWAY. Sparse tensors [MATLAB Tensor Toolbox 2007]: NNCP-CP vs. DBN-CP. With parallelization: NNCP-NWAY/CP-GRID2,6 [Phan and Cichocki 2011] vs. pp-DBN-NWAY/CP. DBN with intraFD-based rank pruning: DBN2,3 (2-pair or 3-pair selection). (C. A. Andersson and R. Bro. The N-way Toolbox for MATLAB. Chemometr. Intell. Lab., 52(1):1-4, 2000. B. W. Bader and T. G. Kolda. MATLAB Tensor Toolbox Ver. 2.2, 2007. A. H. Phan and A. Cichocki. PARAFAC algorithms for large-scale problems. Neurocomputing, 74(11), 2011.)

Experimental Setup (Rank). Rank-12 decomposition: DBN uses 6 rank combinations (1×12, 2×6, 3×4, 4×3, 6×2, and 12×1).

Experimental Setup (H/W and S/W). H/W: a 6-core Intel(R) Xeon(R) CPU at 2.66 GHz with 24 GB of RAM. S/W: MATLAB (R2010b) 64-bit (glnxa64) for the general implementation, and the MATLAB Parallel Computing Toolbox for the parallel implementations of DBN and NNCP.

Key Results: Running Time (Dense Tensor; Case 1: the join attribute X determines only a subset of the attributes of R, |R|=|R2|, |R1|<=|R2|). NNCP vs. DBN with parallelization.

Key Results: Running Time (Sparse Tensor; Case 1: the join attribute X determines only a subset of the attributes of R, |R|=|R2|, |R1|<=|R2|). NNCP vs. DBN and DBN2,3 (DBN with intraFD-based rank pruning); NNCP vs. DBN with parallelization.

Key Results: Running Time (Case 2: the join attribute X determines all attributes of R, |R|=|R1|=|R2|). Dense tensor: NNCP vs. DBN3 with parallelization; sparse tensor: NNCP vs. DBN3 with parallelization. Note: in both cases, most data points lie below the diagonal, which indicates that DBN outperforms NNCP.

Key Results: Accuracy. Note: higher is better.

InterFD-based vertical partitioning. Note: the higher the value, the closer the result is to the optimal partitioning strategy.

IntraFD-based rank pruning strategy. Note: the higher the value, the better the intraFD-based rank pruning works.

Conclusions. The lifecycle of data requires capture, integration, projection, decomposition, and analysis, and tensor decomposition is a costly operation. We proposed a highly efficient, effective, and easily parallelizable decomposition-by-normalization (DBN) strategy for approximately evaluating tensor decompositions, together with interFD-based partitioning and intraFD-based rank pruning strategies.