Time Complexity and Parallel Speedup to Compute the Gamma Summarization Matrix
Carlos Ordonez, Yiqun Zhang, University of Houston, USA


1 Time Complexity and Parallel Speedup to Compute the Gamma Summarization Matrix
Carlos Ordonez, Yiqun Zhang, University of Houston, USA

2 Parallel architecture
Shared-nothing, message-passing, N nodes
Data partitioned before computation
Examples: parallel DBMSs, HDFS, MapReduce, Spark

3

4 Old: separate sufficient statistics

5 New: generalizing and unifying sufficient statistics: Z = [1, X, Y]

6 Linear Algebra: Our main result for parallel and scalable computation
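The main result can be sketched in a few lines of NumPy (a minimal sketch on synthetic data; the shapes and names n, d, Z, Gamma follow the slides): Γ = Z Zᵀ for the augmented matrix Z = [1; X; Y], and the full matrix product equals the sum of the n vector outer products z_i z_iᵀ, which is what makes the computation parallel and scalable.

```python
import numpy as np

# Synthetic data set: X is d x n (one point per column), Y is 1 x n.
rng = np.random.default_rng(0)
d, n = 3, 100
X = rng.normal(size=(d, n))
Y = rng.normal(size=(1, n))

# Augmented matrix Z = [1; X; Y], shape (d+2) x n.
Z = np.vstack([np.ones((1, n)), X, Y])

# Gamma = Z Z^T, accumulated as a sum of n rank-1 outer products z_i z_i^T.
Gamma = np.zeros((d + 2, d + 2))
for i in range(n):
    z = Z[:, i:i+1]      # column vector z_i
    Gamma += z @ z.T     # rank-1 update

# Sanity checks: same result as the single matrix product, and the
# classical sufficient statistics appear as sub-blocks of Gamma:
# Gamma[0, 0] = n, Gamma[1:d+1, 0] = L = X 1, Gamma[1:d+1, 1:d+1] = Q = X X^T.
assert np.allclose(Gamma, Z @ Z.T)
```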

7 2-phase algorithm
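A minimal sketch of the 2-phase algorithm, simulating N shared-nothing nodes with array partitions (real deployments would run phase 1 on separate workers; the partitioning helper and node count here are illustrative, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, N = 3, 120, 4
# Z = [1; X; Y]; X and Y are lumped into one random block for brevity.
Z = np.vstack([np.ones((1, n)), rng.normal(size=(d + 1, n))])

# Phase 1: each node computes a partial Gamma on its horizontal partition Z_I.
partitions = np.array_split(Z, N, axis=1)
partial_gammas = [Z_I @ Z_I.T for Z_I in partitions]

# Phase 2: the coordinator (or an all-reduce) adds the N partial matrices.
Gamma = sum(partial_gammas)

# Distributivity: the sum of partial Gammas equals the full Gamma.
assert np.allclose(Gamma, Z @ Z.T)
```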

8 Equivalent equations with projections from Gamma (descriptive, predictive)
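The projections can be illustrated concretely (a sketch on synthetic data; the block indexing assumes Z's rows are ordered 1, X, Y as on the earlier slide): descriptive statistics (mean, covariance) and a predictive model (least-squares regression coefficients) both come from sub-blocks of Γ alone, with no second pass over the data.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 200
X = rng.normal(size=(d, n))
# Y generated from known coefficients so the recovered beta is checkable.
Y = (np.array([[1.0, -2.0, 0.5]]) @ X) + 0.1 * rng.normal(size=(1, n))

Z = np.vstack([np.ones((1, n)), X, Y])
Gamma = Z @ Z.T

# Descriptive: mean and covariance from n, L = X 1, Q = X X^T.
n_ = Gamma[0, 0]
L = Gamma[1:d+1, 0:1]
Q = Gamma[1:d+1, 1:d+1]
mu = L / n_
cov = Q / n_ - mu @ mu.T

# Predictive: normal equations for linear regression from the augmented
# blocks [1; X][1; X]^T and [1; X] Y^T inside Gamma.
A = Gamma[0:d+1, 0:d+1]
b = Gamma[0:d+1, d+1:d+2]
beta = np.linalg.solve(A, b)   # intercept followed by coefficients
```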

9 Fundamental properties: non-commutative but distributive

10 Parallel theoretical guarantees of Γ

11 Dense matrix algorithm: O(d² n)

12 Sparse matrix algorithm: O(d n) for a hyper-sparse matrix
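The sparse variant can be sketched as follows (synthetic hyper-sparse data; the dense array plus an explicit nonzero scan stand in for a real sparse storage format): each rank-1 update touches only the nonzero entries of z_i, so the per-point cost is nnz(z_i)² rather than (d+2)², which approaches the slide's O(d n) bound when each point has very few nonzeros.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 6, 50
Z = np.zeros((d + 2, n))
Z[0, :] = 1.0                        # constant row of Z = [1; X; Y]
for i in range(n):                   # only 2 random nonzeros per point in X
    for j in rng.choice(d, size=2, replace=False):
        Z[1 + j, i] = rng.normal()
Z[d + 1, :] = rng.normal(size=n)     # Y row

Gamma = np.zeros((d + 2, d + 2))
for i in range(n):
    nz = np.flatnonzero(Z[:, i])     # indices of nonzero entries of z_i
    for a in nz:                     # rank-1 update restricted to nonzeros
        for b in nz:
            Gamma[a, b] += Z[a, i] * Z[b, i]

# Skipped entries are exactly zero, so the result matches the dense product.
assert np.allclose(Gamma, Z @ Z.T)
```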

13 Pros: algorithm evaluation with physical array operators
Since x_i fits in one chunk, joins are avoided (at least 2X I/O with a hash or merge join)
Since x_i x_iᵀ can be computed in RAM, we avoid an aggregation (no sorting of points by i)
No need to store X twice (X and Xᵀ): half the I/O, half the RAM space
No need to transpose X, a costly reorganization even in RAM, especially if X spans several RAM segments
C++ compiled code is fast; each vector is accessed once; direct assignment (bypassing C++ calls)

14 Running on the cloud

15 Running in the cloud, 100 nodes

16 Conclusions
One-pass summarization matrix operator: parallel, scalable
Algorithm compatible with any parallel shared-nothing system, but better suited to array systems
Optimization of the outer matrix multiplication as a sum (aggregation) of vector outer products
Dense and sparse versions of the algorithm are required
The Gamma matrix must fit in RAM, but n is unlimited
ML methods split into two phases: (1) summarization, (2) computing the model parameters
The summarization matrix can be exploited in many intermediate computations

17 Future work: theory
Study Gamma in other models: logistic regression, clustering, factor analysis, HMMs, Kalman filters
Clustering model for frequent itemsets
Higher-order expected moments, covariates
Numeric stability with unnormalized sorted data: unlikely

