1
A Cloud System for Machine Learning Exploiting a Parallel Array DBMS
Carlos Ordonez Department of Computer Science University of Houston, USA
2
Our contribution: a cloud analytic system for machine learning
- Shared-nothing architecture backed by a parallel array DBMS; no HDFS, Hadoop, Spark, etc.
- In-DBMS data summarization: orders-of-magnitude performance improvement, with an even wider gap under GPU acceleration
3
System components and data flow
4
2-phase algorithm
5
System components and data flow
Summarization
6
Defining the input data set X
7
Data summarization: finding a compact description of the data set
A very useful technique in machine learning:
- saves space
- saves I/O
- saves execution time
- no sacrifice in accuracy
8
What to summarize? Introducing sufficient statistics
Matrix product => summation of vector outer products: X^T X = Σ_i x_i x_i^T, computable in one pass over the rows of X.
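As a minimal sketch (assuming the Gamma summarization matrix of this line of work, where each point is augmented as z_i = [1, x_i]), the sufficient statistics n, L = Σ x_i and Q = Σ x_i x_i^T can all be accumulated in a single matrix Γ = Σ_i z_i z_i^T in one pass:

```cpp
#include <vector>

// Sketch: Gamma = sum_i z_i * z_i^T, where z_i = [1, x_i] augments each
// d-dimensional point with a leading 1. Gamma packs n (count),
// L = sum x_i, and Q = sum x_i x_i^T into one (d+1)x(d+1) matrix.
std::vector<std::vector<double>> gamma_summary(
        const std::vector<std::vector<double>>& X) {
    const std::size_t d = X.empty() ? 0 : X[0].size();
    std::vector<std::vector<double>> G(d + 1, std::vector<double>(d + 1, 0.0));
    for (const auto& x : X) {
        std::vector<double> z(d + 1, 1.0);        // z[0] = 1 (augmented entry)
        for (std::size_t j = 0; j < d; ++j) z[j + 1] = x[j];
        for (std::size_t a = 0; a <= d; ++a)      // accumulate outer product z z^T
            for (std::size_t b = 0; b <= d; ++b)
                G[a][b] += z[a] * z[b];
    }
    return G;  // G[0][0] = n, G[0][1..d] = L, G[1..d][1..d] = Q
}
```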
10
Dense matrix algorithm: O(d² n)
11
Sparse matrix algorithm: O(d n) for hyper-sparse matrices
12
Array Storage in SciDB: by Chunks
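To illustrate chunked storage (chunk sizes here are made up for the example), each cell of a 2-D array maps to a chunk by integer division of its coordinates; chunks are the unit SciDB stores, reads, and distributes across workers:

```cpp
#include <cstddef>

// Sketch (assumed chunk sizes, for illustration only): a cell (i, j) of a
// 2-D array lives in chunk (i / chunkRows, j / chunkCols). Chunks, not
// cells, are the unit of storage, I/O, and distribution.
struct ChunkCoord { std::size_t row, col; };

ChunkCoord chunk_of(std::size_t i, std::size_t j,
                    std::size_t chunkRows, std::size_t chunkCols) {
    return {i / chunkRows, j / chunkCols};
}
```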
13
Parallel computation
[Diagram: the coordinator and each worker hold a summarization matrix over columns 1…d; each worker computes a partial result on its data partition and sends it to the coordinator.]
14
[Diagram: correct (OK) vs. incorrect (NO!) ways of aggregating the workers' partial results at the coordinator.]
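The parallel scheme can be sketched as a two-phase computation (function names are illustrative, not SciDB's operator API): each worker builds a partial Gamma over its horizontal partition, and the coordinator sums the small partial matrices. Because matrix addition is associative and commutative, the merged result equals the single-node Gamma:

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Phase 1 (worker side): summarize one horizontal partition of X.
Matrix partial_gamma(const std::vector<std::vector<double>>& part,
                     std::size_t d) {
    Matrix G(d + 1, std::vector<double>(d + 1, 0.0));
    for (const auto& x : part) {
        std::vector<double> z(d + 1, 1.0);            // z = [1, x]
        for (std::size_t j = 0; j < d; ++j) z[j + 1] = x[j];
        for (std::size_t a = 0; a <= d; ++a)
            for (std::size_t b = 0; b <= d; ++b)
                G[a][b] += z[a] * z[b];
    }
    return G;
}

// Phase 2 (coordinator side): add the (d+1)x(d+1) worker partials.
Matrix merge_gammas(const std::vector<Matrix>& partials, std::size_t d) {
    Matrix G(d + 1, std::vector<double>(d + 1, 0.0));
    for (const auto& P : partials)
        for (std::size_t a = 0; a <= d; ++a)
            for (std::size_t b = 0; b <= d; ++b)
                G[a][b] += P[a][b];
    return G;
}
```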
15
Linear speed up. Let Tj be the processing time using j nodes, where 1 ≤ j ≤ N. Under our main assumption, and provided Θ fits in main memory, our optimized algorithm achieves close to optimal speedup: T1/TN ≈ N.
16
Space complexity and parallel speedup
17
Benchmark
19
Optimize summarization with GPU
[Diagram: transfer data to the GPU, summarize on the GPU, transfer the result back.]
20
Optimize summarization with GPU
- The C++ operator code is annotated with OpenACC directives to run on the GPU; in the current implementation the CPU only performs I/O.
- Data is transferred from host memory to device (GPU) memory.
- The vector outer products are evaluated and aggregated on the GPU; the result is then transferred back.
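A sketch of what such an OpenACC-annotated kernel could look like (assumed structure, not the actual operator code; compilers without OpenACC support ignore the pragmas and run the loops on the CPU):

```cpp
#include <cstddef>

// Sketch: compute Gamma over a row-major n x d matrix X, writing the
// (d+1)x(d+1) result into G. The copyin clause moves X host->device,
// copyout brings G back; each (a, b) cell is a sum reduction over rows.
void gamma_acc(const double* X, std::size_t n, std::size_t d, double* G) {
    const std::size_t D = d + 1;
    #pragma acc parallel loop collapse(2) copyin(X[0:n*d]) copyout(G[0:D*D])
    for (std::size_t a = 0; a < D; ++a)
        for (std::size_t b = 0; b < D; ++b) {
            double s = 0.0;
            #pragma acc loop reduction(+:s)
            for (std::size_t i = 0; i < n; ++i) {
                double za = (a == 0) ? 1.0 : X[i * d + (a - 1)];  // z = [1, x_i]
                double zb = (b == 0) ? 1.0 : X[i * d + (b - 1)];
                s += za * zb;
            }
            G[a * D + b] = s;
        }
}
```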
22
Time saved by summarizing on GPU
n = 1M, d = 400
23
System components and data flow
Summarization Model
24
Linear regression
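A minimal sketch of the model phase, assuming the summarization pass has already produced Gamma = Z^T Z and c = Z^T y: linear regression then reduces to solving the small (d+1)×(d+1) normal equations Gamma·beta = c, with no further pass over X:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch: solve Gamma * beta = c by Gaussian elimination with partial
// pivoting. The system is only (d+1)x(d+1), so this step is cheap
// compared with the single summarization pass over the data.
std::vector<double> solve_normal_eq(std::vector<std::vector<double>> A,
                                    std::vector<double> c) {
    const std::size_t m = c.size();
    for (std::size_t k = 0; k < m; ++k) {        // forward elimination
        std::size_t p = k;                       // partial pivoting
        for (std::size_t r = k + 1; r < m; ++r)
            if (std::fabs(A[r][k]) > std::fabs(A[p][k])) p = r;
        std::swap(A[k], A[p]);
        std::swap(c[k], c[p]);
        for (std::size_t r = k + 1; r < m; ++r) {
            double f = A[r][k] / A[k][k];
            for (std::size_t j = k; j < m; ++j) A[r][j] -= f * A[k][j];
            c[r] -= f * c[k];
        }
    }
    std::vector<double> beta(m);
    for (std::size_t k = m; k-- > 0;) {          // back substitution
        double s = c[k];
        for (std::size_t j = k + 1; j < m; ++j) s -= A[k][j] * beta[j];
        beta[k] = s / A[k][k];
    }
    return beta;
}
```

For example, three points on the line y = 2x give Gamma = [[3, 6], [6, 14]] and c = [12, 28], yielding beta = (0, 2).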
25
Computing LR, SciDB vs. Spark
26
Future work
- Approach applicable in any parallel DBMS
- Square matrices
- Low-level GPU instructions to parallelize the vector outer products on the GPU for Gamma
- Improve fault tolerance during computation; avoid restarts