1
Performance Benchmarking of the R Programming Environment on Knights Landing
Scott Michael, Indiana University
July 6, 2017
2
Who am I?
Theoretical astrophysicist, NOT a statistician
HPC application optimization and performance tuning
Lead the Research Analytics team in Research Technologies at Indiana University
3
Contributors
IU: Eric Wernert, Jefferson Davis, James McCombs, Esen Tuna
TACC: Bill Barth, Tommy Minyard, David Walling
4
Talk Overview
Targeting productivity languages for Xeon Phi based architecture:
  Motivation and History
  Benchmark results and lessons learned
  The RHPCBenchmark package
  Future directions
  Conclusions
5
IU, The Stampede Supercomputer, and Xeon Phi
IU Research Technologies has a partnership with TACC, collaborating on systems and support
  Stampede: largest XSEDE machine by core count
  Wrangler: data intensive computing and 20 PB out of region replication
  Jetstream: XSEDE production science cloud
IU supports data intensive and “high productivity” languages on Stampede
  Including R, Python, and MATLAB
Large transition between Stampede 1 & 2
6
Evolution of Xeon Phi
Knights Corner: coprocessor only; 1 TF peak (DP); 8GB device memory + system memory
Knights Landing: coprocessor or self-hosted; 3 TF peak (DP); 16GB MCDRAM + system memory
7
R Support on Stampede 1 & 2
Primary support on Stampede 1 for R
  Support several methods for distributed R (pbdR, Rmpi, snow, etc.)
  R built in offload mode
  Configured R to use GPUs in a portion of Stampede via HiPLAR
However, much of the R workload on Stampede didn’t rely on KNC
Stampede 1
  Nodes: 6,400
  Interconnect: FDR IB
  Filesystem: 14 PB Lustre
  Node configuration
    Processor: Dual E “SandyBridge”
    Phi: SE10P
    Memory: 32GB DDR3, 8GB GDDR5
Stampede 2
  Nodes: 4,200
  Interconnect: OmniPath v1
  Node configuration
    Processor: Phi 7250
    Memory: 16GB GDDR4
8
R Performance on KNL
KNL is the sole processor on Stampede 2
  Has shown good performance for large scale HPC codes (MD, climate, astro, etc.)
How does KNL perform with a language like R?
9
KNL Architecture
Intel(R) Xeon Phi(TM) CPU 1.60GHz (68 physical cores)
Features of note for KNL
  Tiled architecture supporting 4 SMT threads per physical core
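The core count and clock on this slide are enough for a back-of-the-envelope peak estimate. The sketch below is only that: the figure of 32 double-precision FLOPs per core per cycle (two 512-bit vector units, 8 doubles each, fused multiply-add) is an assumption that does not appear in the slides, and `peak_gflops` is a hypothetical helper name.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOP/s: cores x clock (GHz) x FLOPs per core per cycle."""
    return cores * clock_ghz * flops_per_cycle


# Core count and clock as reported on the slide.
CORES = 68
CLOCK_GHZ = 1.60

# Assumption (not from the slides): 2 vector units x 8 doubles x 2 ops (FMA).
DP_FLOPS_PER_CYCLE = 2 * 8 * 2

peak = peak_gflops(CORES, CLOCK_GHZ, DP_FLOPS_PER_CYCLE)
print(f"Nominal DP peak: {peak / 1000:.2f} TFLOP/s")
```

Under these assumptions the estimate lands at roughly 3.5 TFLOP/s, the same range as the 3 TF peak quoted for Knights Landing; sustained clocks under heavy vector load are typically lower than the nominal clock, so treat it as an upper bound.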
10
KNL Architecture (cont.)
Features of note for KNL
  16GB of on-package MCDRAM acts as fast memory and can be configured into several modes
11
Benchmarking Strategy
Look at industry standard performance benchmarks for R on KNL and compare to Sandy Bridge (SNB)
Further explore some exemplar workflows in each language and compare to benchmark results
Compare both single node and multinode benchmarks
12
Benchmarking Strategy
R standard benchmark: R-25 benchmark
  Very old, fixed (small) problem sizes, report output challenging to parse
  Reasonable mix of mini-kernels focused on dense matrix operations and linear solvers
R benchmark for scalability focused on similar kernels to R-25
  Built to distribute and for flexibility, currently available on CRAN as RHPCBenchmark
14
R Benchmark Results
Generally R lacks multithreading (some exceptions include mclapply), so we rely on the threading in MKL
Standard profiling/tracing tools are challenging to employ
  Instrumenting the entire R interpreter creates too much overhead
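Because the threading comes from MKL rather than from R itself, the thread count is usually controlled from outside the interpreter. A minimal sketch of that idea, assuming the standard `MKL_NUM_THREADS` and `OMP_NUM_THREADS` environment variables are honored by the R build; `blas_thread_env` and the script name `benchmark.R` are hypothetical and not part of the talk.

```python
import os


def blas_thread_env(num_threads, base_env=None):
    """Return a copy of the environment with MKL/OpenMP thread counts pinned."""
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_NUM_THREADS"] = str(num_threads)
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


# 68 assumes one thread per physical core on the 68-core KNL described earlier.
env = blas_thread_env(68)
print(env["MKL_NUM_THREADS"], env["OMP_NUM_THREADS"])

# To launch R with this setting (hypothetical script name):
#   import subprocess
#   subprocess.run(["Rscript", "benchmark.R"], env=env, check=True)
```

Building the environment as a copy keeps the calling process untouched, which makes it easy to sweep several thread counts from one driver script.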
15
R Benchmark Results
Benchmarks include
  Cholesky decomposition, eigendecomposition, least squares fit, linear solve, QR decomposition, matrix cross product, matrix determinant, matrix-matrix multiplication, matrix-vector multiplication
Multiple threads per core aren’t useful
  Contrast to KNC
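The general shape of such a microbenchmark (build inputs once, time the kernel over several trials, report a robust statistic at each problem size) can be sketched as follows. This is an illustration only, not the harness used in the talk: the kernel is a naive pure-Python matrix-matrix product standing in for an MKL-backed routine, and the sizes are kept tiny so it runs quickly.

```python
import random
import statistics
import time


def random_matrix(n, rng):
    """Build an n x n matrix of uniform random floats as nested lists."""
    return [[rng.random() for _ in range(n)] for _ in range(n)]


def matmul(a, b):
    """Naive dense matrix-matrix product (stand-in kernel for illustration)."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def benchmark_matmul(sizes, trials=3, seed=0):
    """Time the kernel at each size; return {n: (median_seconds, gflops)}."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        # Inputs are generated outside the timed region.
        a, b = random_matrix(n, rng), random_matrix(n, rng)
        times = []
        for _ in range(trials):
            start = time.perf_counter()
            matmul(a, b)
            times.append(time.perf_counter() - start)
        median = statistics.median(times)
        flops = 2 * n ** 3  # about n^3 multiplies and n^3 adds
        gflops = flops / median / 1e9 if median > 0 else float("inf")
        results[n] = (median, gflops)
    return results


results = benchmark_matmul([16, 32, 64])
for n, (seconds, gflops) in results.items():
    print(f"n={n:4d}  median={seconds:.6f}s  {gflops:.4f} GFLOP/s")
```

Reporting GFLOP/s rather than raw seconds makes runs at different matrix sizes, and on different processors, directly comparable.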
16
R Benchmark Results
For some benchmarks, single core KNL outperforms SNB
17
R Benchmark Results
Need large matrices to make full use of all 68 cores
18
R Benchmark Results
For math intensive kernels, R interpreter overhead isn’t bad
20
RHPCBenchmark Package
The initial release of RHPCBenchmark is available on CRAN
Provides a variety of dense matrix, sparse matrix, and machine learning benchmarks
Users can configure the set of benchmarks to run and benchmark parameters
Results are provided in .csv files and a data frame for further analysis
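Since results land in .csv files, post-processing can happen in any tool. The sketch below averages run times per benchmark and problem size; the column names (`benchmark`, `dimension`, `seconds`) and the sample rows are placeholders invented for illustration and are not the package's actual output schema or real measurements.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_runtime_by_benchmark(csv_text):
    """Average the 'seconds' column per (benchmark, dimension) pair."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["benchmark"], int(row["dimension"]))
        grouped[key].append(float(row["seconds"]))
    return {key: mean(values) for key, values in grouped.items()}


# Placeholder rows with hypothetical column names; NOT real measurements.
SAMPLE = """benchmark,dimension,seconds
cholesky,1000,0.50
cholesky,1000,0.75
matmat,1000,1.00
"""

summary = mean_runtime_by_benchmark(SAMPLE)
for (name, dimension), seconds in sorted(summary.items()):
    print(f"{name:10s} n={dimension:6d}  mean={seconds:.3f}s")
```

For a real run, the column names would need to be matched to whatever header the generated files actually contain.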
22
Next Steps for R Performance
Internode performance
Higher level functions
  Many R packages don’t rely on the building blocks tested (e.g. nnet, cluster)
Other classes of functions
  Sparse matrix operations
  Data wrangling operations
24
Conclusions
R performance on KNL is better for dense matrix operations (3x SNB) and close to native C performance
  Performance is best for large matrices
  SNB does perform better for small matrices
New RHPCBenchmark offers flexibility in benchmarking your hardware and R build
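The comparison behind these conclusions amounts to a speedup ratio per matrix size, plus the size at which one processor overtakes the other. A minimal sketch of that calculation follows; the function names are hypothetical and the timings are arbitrary placeholders chosen to exercise the code, not results from the talk.

```python
def speedups(baseline_seconds, candidate_seconds):
    """Return {size: baseline_time / candidate_time} for sizes present in both."""
    return {
        n: baseline_seconds[n] / candidate_seconds[n]
        for n in sorted(baseline_seconds)
        if n in candidate_seconds
    }


def crossover_size(ratios):
    """Smallest size from which the candidate stays faster (ratio > 1), else None."""
    winner = None
    for n in sorted(ratios, reverse=True):
        if ratios[n] > 1.0:
            winner = n
        else:
            break
    return winner


# Arbitrary placeholder timings in seconds; NOT measurements from the talk.
snb = {500: 1.0, 2000: 6.0, 8000: 40.0}
knl = {500: 2.0, 2000: 3.0, 8000: 10.0}

ratios = speedups(snb, knl)
crossover = crossover_size(ratios)
print(ratios, crossover)
```

A ratio below 1 means the baseline is faster at that size, which is the small-matrix regime noted above.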
25
Questions? Suggestions?
Scott Michael, James McCombs
26
Backups: KNL Speedup in R
27
Backups: KNL vs. IvyBridge
28
Backups: KNL Flat vs. Cached