1
Scaling Spark on HPC Systems
Presented by: Jerrod Dixon
2
Outline: HDFS vs Lustre, MapReduce vs Spark, Spark on HPC, Experimental Setup, Results, Conclusions
3
HDFS vs Lustre
4
Hadoop HDFS: a distributed filesystem with multi-node replication.
Clients communicate directly with the NameNode for block locations, then with the DataNodes for the data itself.
5
Lustre: a very popular filesystem for HPC systems. Leverages a
Management Server (MGS), a Metadata Server (MDS), and Object Storage Servers.
6
Lustre provides full POSIX support.
The Metadata Server informs clients where the objects making up a file are located; clients then connect directly to the Object Storage Servers.
7
MapReduce vs Spark
8
MapReduce: the typical method of processing data on HDFS.
The map phase turns file data into key-value pairs; the reduce phase collapses each unique key to a single value (see the sketch below).
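A minimal word-count sketch of the map-to-key-value-pairs / reduce-by-key pattern described above, written against the Spark RDD API rather than Hadoop MapReduce; the input and output paths are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCountSketch"))

    // Map phase: each line of the file becomes (word, 1) key-value pairs.
    val pairs = sc.textFile("hdfs:///data/input.txt")   // hypothetical input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))

    // Reduce phase: collapse to a single value per unique key.
    val counts = pairs.reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///data/output")        // hypothetical output path
    sc.stop()
  }
}
```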
9
Spark: similar in overall methodology to MapReduce,
but it keeps working data in memory and distributes it across global and local scopes.
10
Spark – Vertical Data: reads from disk only when final results are requested; pulls from the filesystem and works against the data in a batch methodology.
11
Spark – Horizontal Data
Distributes work across nodes as data is processed; the distribution is similar to HDFS replication, but the data is force-kept in memory.
12
Spark operates primarily on Resilient Distributed Datasets (RDDs).
Map transformations can be chained but are lazy; a reduce (action) forces processing, and a caching method forces the mapped data into memory. "Lazy" here means that Spark does not execute transformations until the data is actually needed (see the sketch below).
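A small sketch of lazy evaluation and caching with the RDD API, assuming the shared filesystem is mounted at an illustrative POSIX path.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LazyEvalSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LazyEvalSketch"))

    // Transformations are lazy: building this chain does no work yet.
    val lengths = sc.textFile("file:///scratch/data.txt")   // hypothetical Lustre-mounted path
      .filter(_.nonEmpty)
      .map(_.length)

    // cache() asks Spark to keep the computed partitions in memory
    // the first time an action materializes them.
    lengths.cache()

    // Actions force execution of the whole lineage.
    val total = lengths.reduce(_ + _)   // first action: reads from disk, fills the cache
    val lines = lengths.count()         // second action: served from the cached partitions
    println(s"total=$total lines=$lines")

    sc.stop()
  }
}
```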
13
Spark on HPC
14
Spark on HPC: Spark was designed for HDFS and works on data in batches.
It expects part of the data to reside on local disk, executes jobs only when results are requested, and relies on vertical data movement (a configuration sketch follows).
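A hedged sketch of how Spark might be pointed at an HPC node layout where there is no local disk: `spark.local.dir` (a real Spark property) is directed at a memory-backed tmpfs, and input is read from the shared filesystem through its POSIX mount. The paths are illustrative, not the authors' configuration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HpcLaunchSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("SparkOnLustre")
      // Shuffle and spill files go to a memory-backed tmpfs because HPC
      // compute nodes often have no local disk (path is illustrative).
      .set("spark.local.dir", "/dev/shm/spark-scratch")

    val sc = new SparkContext(conf)

    // Input read straight from the Lustre mount via a POSIX path
    // (hypothetical location on the shared scratch filesystem).
    val data = sc.textFile("file:///scratch/myuser/input/*.txt")
    println(s"partitions = ${data.getNumPartitions}")

    sc.stop()
  }
}
```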
15
Experimental Setup
16
Hardware: Edison and Cori, Cray XC supercomputers at NERSC.
Edison has 5,576 compute nodes, each with two 2.4 GHz 12-core Intel "Ivy Bridge" processors. Cori has 1,630 compute nodes, each with two 2.3 GHz 16-core Intel "Haswell" processors.
17
Edison cluster: leverages Lustre in its standard implementation,
with a single MDS and a single MDT.
18
Cori cluster: leverages Lustre plus a BurstBuffer,
which accelerates I/O performance.
19
BurstBuffer Sits between memory and Lustre
Stores frequently accessed files to improve I/O
20
Results
21
Single node: clear bottleneck in communicating with disk.
22
Multi-node file I/O
23
BurstBuffer
24
GroupBy benchmark: 16 nodes (384 cores) on Edison, weak scaling.
Partitions must be exchanged with other partitions in an all-to-all shuffle; "shm" denotes memory-mapped (shared-memory) storage. A benchmark sketch follows.
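A minimal sketch of a GroupBy-style microbenchmark that forces the all-to-all shuffle described above; the key range, data sizes, and partition count are illustrative, not the exact benchmark used in the paper.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Random

object GroupByBenchSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("GroupByBenchSketch"))

    val numPartitions     = 384     // e.g. one per core on 16 Edison nodes (illustrative)
    val pairsPerPartition = 100000  // illustrative size

    // Generate random (key, value) pairs on every partition.
    val pairs = sc.parallelize(0 until numPartitions, numPartitions).flatMap { _ =>
      val rng = new Random()
      (0 until pairsPerPartition).map(_ => (rng.nextInt(1000), rng.nextInt()))
    }

    // groupByKey forces an all-to-all shuffle: every partition exchanges data
    // with every other partition through the shuffle storage (local disk,
    // /dev/shm, or the Lustre/BurstBuffer path under test).
    val t0      = System.nanoTime()
    val keys    = pairs.groupByKey().count()
    val elapsed = (System.nanoTime() - t0) / 1e9
    println(f"grouped $keys%d keys in $elapsed%.2f s")

    sc.stop()
  }
}
```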
25
GroupBy benchmark: Cori-specific results.
26
Impact of the BurstBuffer: an increase in mean time per operation,
but lower variability in access time.
27
Conclusions
28
No mention of .persist() or .cache():
Spark's memory-management methods for preserving processed partitions against eviction. .cache() is simply a mask over .persist() with the basic default parameters (MEMORY_ONLY mode); see the sketch below.
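A short sketch showing that .cache() and .persist(StorageLevel.MEMORY_ONLY) are equivalent, and that other storage levels can spill evicted partitions to disk instead of forcing recomputation; the data here is synthetic and purely illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistVsCacheSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PersistVsCacheSketch"))

    // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY).
    val inMemory = sc.parallelize(1 to 1000000).map(x => x.toLong * x)
    inMemory.cache()                                // same effect as the commented line below
    // inMemory.persist(StorageLevel.MEMORY_ONLY)

    // persist() also exposes levels that spill evicted partitions to disk,
    // avoiding a full recomputation (or re-read from a slow shared filesystem)
    // when executor memory runs out.
    val spillable = sc.parallelize(1 to 1000000).map(x => x.toLong * x)
    spillable.persist(StorageLevel.MEMORY_AND_DISK)

    println(inMemory.count() + spillable.count())   // actions force materialization
    sc.stop()
  }
}
```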
29
Conclusions: clear limitations to using Lustre as the filesystem.
Access times increase and processing throughput decreases; the BurstBuffer helps, but only at certain node counts. There is no discussion of Spark methods to overcome these issues.
30
Issues: weak scaling is covered extensively,
but strong scaling is hardly covered at all. There are no comparisons to equivalent work on an HDFS system; since Spark is designed for HDFS, comparing the HPC results against a standard HDFS implementation seems intuitive.
31
Questions?