Time Complexity and Parallel Speedup to Compute the Gamma Summarization Matrix
Carlos Ordonez, Yiqun Zhang
University of Houston, USA
Parallel architecture
- Shared-nothing, message-passing cluster with N nodes
- Data is partitioned across the nodes before computation
- Examples: parallel DBMSs, HDFS, MapReduce, Spark
Old: separate sufficient statistics
New: generalizing and unifying sufficient statistics: Z = [1, X, Y]
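As a minimal sketch of this unification (assuming the usual conventions for the Γ matrix: the d x n data set X is augmented with a row of 1s and the output row Y, so each column of Z is z_i = (1, x_i, y_i)), the single matrix product Z Z^T packs n, L and Q together:

\Gamma = Z Z^T = \sum_{i=1}^{n} z_i z_i^T =
\begin{bmatrix}
  n          & L^T   & \sum_i y_i   \\
  L          & Q     & X Y^T        \\
  \sum_i y_i & Y X^T & \sum_i y_i^2
\end{bmatrix},
\qquad L = \sum_{i=1}^{n} x_i, \quad Q = X X^T .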
Linear Algebra: Our main result for parallel and scalable computation
2-phase algorithm: Phase 1 summarizes the data set into Γ in one parallel pass; Phase 2 computes the model parameters from Γ (see the sketch below)
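In rough notation (a sketch, not the authors' exact formulation), Phase 1 is the only phase that touches the data, and Phase 2 works entirely on the small (d+2) x (d+2) matrix Γ in RAM:

\text{Phase 1: } \Gamma = Z Z^T = \sum_{i=1}^{n} z_i z_i^T \quad \text{(one pass over the data, in parallel)}

\text{Phase 2: } \Theta = f(\Gamma) \quad \text{(model parameters, computed in RAM)}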
Equivalent model equations, rewritten with projections of Γ (descriptive and predictive models)
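As an illustration (hedged: these are the standard textbook forms, and the paper's exact notation may differ), both descriptive and predictive statistics project out sub-blocks of Γ, so no further pass over X is needed:

\mu = \frac{L}{n}, \qquad V = \frac{Q}{n} - \mu \mu^T \quad \text{(descriptive: mean and covariance)}

\hat{\beta} = (X' X'^T)^{-1} X' Y^T, \quad X' = \begin{bmatrix} \mathbf{1} \\ X \end{bmatrix} \quad \text{(predictive: least-squares linear regression; both factors are sub-blocks of } \Gamma\text{)}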
Fundamental properties: matrix multiplication is non-commutative, but distributive
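A one-line sketch of why these two properties matter for the parallel algorithm (using the partition notation assumed above): the order of the product matters, but the product distributes over a horizontal partition of Z into the N nodes' chunks, so the partial Gammas can be added in any order:

Z Z^T \neq Z^T Z \quad \text{(a } (d+2)\times(d+2) \text{ matrix vs. an } n \times n \text{ matrix)},

Z = [\, Z_1 \mid Z_2 \mid \cdots \mid Z_N \,] \;\Rightarrow\; \Gamma = Z Z^T = \sum_{I=1}^{N} Z_I Z_I^T = \sum_{I=1}^{N} \Gamma_I .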
Parallel Theoretical Guarantees of Γ
Dense matrix algorithm: O(d^2 n)
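A minimal single-node sketch of the dense kernel, under assumed conventions not spelled out in the slides (row i of the input holds x_i, Y holds the output values, Γ is stored row-major, and only the lower triangle is accumulated and then mirrored, in line with the "no need to store X twice" point later in the deck). Function and variable names are illustrative, not the authors' operator:

#include <vector>
#include <cstddef>

// Dense Gamma kernel: Gamma += z_i * z_i^T for every point, with z_i = (1, x_i, y_i).
// The two inner loops cost O(d^2) per point, giving the O(d^2 n) total above.
std::vector<double> computeGammaDense(const std::vector<std::vector<double>>& X,
                                      const std::vector<double>& Y) {
    const std::size_t n = X.size();
    const std::size_t d = n ? X[0].size() : 0;
    const std::size_t p = d + 2;                 // dimension of z_i = (1, x_i, y_i)
    std::vector<double> G(p * p, 0.0);           // Gamma, row-major (p x p)
    std::vector<double> z(p);

    for (std::size_t i = 0; i < n; ++i) {
        z[0] = 1.0;                              // intercept entry
        for (std::size_t a = 0; a < d; ++a) z[a + 1] = X[i][a];
        z[p - 1] = Y[i];                         // output variable
        for (std::size_t a = 0; a < p; ++a)      // lower triangle of z * z^T
            for (std::size_t b = 0; b <= a; ++b)
                G[a * p + b] += z[a] * z[b];
    }
    for (std::size_t a = 0; a < p; ++a)          // mirror to the upper triangle once
        for (std::size_t b = a + 1; b < p; ++b)
            G[a * p + b] = G[b * p + a];
    return G;
}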
Sparse matrix algorithm: O(d n) for a hyper-sparse matrix
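A hedged sketch of the sparse variant under an assumed representation (each point is a list of (index, value) pairs for the non-zero entries of z_i, and Γ is kept as a sparse map of its lower triangle): only non-zero entries multiply each other, so the work per point depends on its non-zero count rather than on d, which is how hyper-sparse data reaches the bound above. Names and types are illustrative:

#include <vector>
#include <map>
#include <utility>
#include <cstddef>

// Sparse point: non-zero entries of z_i = (1, x_i, y_i), already including the
// constant 1 at index 0 and y_i at index d+1.
using SparsePoint = std::vector<std::pair<std::size_t, double>>;

// Sparse Gamma kernel: accumulate z_i * z_i^T over non-zero entries only.
// Per point the cost is O(nnz(z_i)^2), so very sparse data stays cheap.
std::map<std::pair<std::size_t, std::size_t>, double>
computeGammaSparse(const std::vector<SparsePoint>& Z) {
    std::map<std::pair<std::size_t, std::size_t>, double> G;
    for (const SparsePoint& z : Z)
        for (const auto& [a, va] : z)
            for (const auto& [b, vb] : z)
                if (b <= a)                      // keep the lower triangle only
                    G[{a, b}] += va * vb;
    return G;
}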
Pros: algorithm evaluation with physical array operators
- Since x_i fits in one chunk, joins are avoided (a hash or merge join would cost at least 2X the I/O)
- Since x_i * x_i^T can be computed in RAM, we avoid an aggregation (no need to sort points by i)
- No need to store X twice as X and X^T: half the I/O and half the RAM space
- No need to transpose X, a costly reorganization even in RAM, especially if X spans several RAM segments
- C++ compiled code is fast: each vector is accessed once, with direct assignment (bypassing C++ calls)
Running in the cloud
Running in the cloud, 100 nodes
Conclusions
- One-pass summarization matrix operator: parallel and scalable
- The algorithm is compatible with any parallel shared-nothing system, but works best in array systems
- Optimization: the outer matrix multiplication is evaluated as a sum (aggregation) of vector outer products
- Dense and sparse versions of the algorithm are required
- The Γ matrix must fit in RAM, but n is unlimited
- ML methods split into two phases: (1) summarization, (2) computing model parameters
- The summarization matrix can be exploited in many intermediate computations
Future work: theory
- Study Γ in other models such as logistic regression, clustering, factor analysis, HMMs, and Kalman filters
- Clustering model: for frequent itemsets
- Higher-order expected moments, covariates
- Numeric stability with unnormalized sorted data: unlikely