1
Accelerating Lustre! with Cray DataWarp
Steve Woods, Solutions Architect
2
Accelerate Your Storage!
The problem: a new storage hierarchy
DataWarp overview
End user perspectives
Use cases
Features
Examples
Configuration considerations
Summary
3
The Problem: Buying Disk for Bandwidth is Expensive
Source: HPCwire, May 1, 2014; attributed to Gary Grider, LANL
4
New Storage Hierarchy
(diagram: the storage hierarchy, from highest effective cost and lowest latency to lowest effective cost and highest latency)
On-node near memory (HBM/HMC)
On-node memory (DRAM)
Far memory (DRAM/NVDIMM)
Off-node near storage, the burst buffer (SSD)
Off-node far storage (HDD)
Traditionally only CPU, DRAM, and HDD storage were present; today the tiers in between fill the gap.
5
New Storage Hierarchy
DataWarp: software defined storage, a high performance storage pool (near storage, SSD) for the bandwidth that is needed.
Sonexion: a scalable, resilient file system (far storage, HDD) for the capacity that is needed.
Problem solved:
Scale bandwidth separately from capacity
Reduce overall solution cost
Improve application run time
6
Blending Flash with Disk for High Performance Lustre
Blended solution: DataWarp satisfies the bandwidth needs and Sonexion satisfies the capacity needs; this drives down the cost of bandwidth ($/GB/s).
Sonexion-only solution: many SSUs are needed just for bandwidth; this drives up the cost of bandwidth ($/GB/s).
7
DataWarp Overview: Hardware, Software, Raw Performance
Hardware: Intel server, block-based SSDs, Aries I/O blade; delivers raw performance.
Software: virtualizes the underlying hardware, a single solution of flash and HDD, automation via policy, an intuitive interface; harnesses the performance.
8
Software Phases of DataWarp
Phase 0 (available 2014):
Statically configured compute node swap
Single server file systems (/flash/)
Phase 1 (fall 2015) [CLE 5.2 UP04 + patches]:
Dynamic allocation and configuration of DataWarp storage to jobs (WLM support)
Application controlled explicit movement of data between DataWarp and the parallel file system (stage_in and stage_out)
DVS striping across DataWarp nodes
Phase 2 (late 2016) [CLE 6.0 UP02]:
DVS client caching
Implicit movement of data between DataWarp and PFS storage (cache); no application changes required
9
DataWarp Hardware
Package: a standard XC I/O blade with SSDs instead of PCIe cables; plugs right into the Aries network.
Capacity: 2 nodes per blade, 2 SSDs per node, 12.6 TB per blade (shown with 3.2 TB SSDs).
Performance: the node processors are already optimized for I/O and the Cray Aries network.
(diagram: DataWarp blades with SSDs alongside LNET router blades connecting to Lustre storage over the Aries network)
10
DataWarp Software
Service layer (DWS, the DataWarp Service): defines the user experience; interfaces with the application, the PFS, and the WLM.
Service layer (DVS, the Data Virtualization Service): virtualizes I/O and handles file presentation.
Distributed file system layer (DWFS): virtualizes the pool of flash, layered on an open source file system, the logical volume manager, and the devices.
11
DataWarp User Perspectives
Transparent: the new user; no change to their experience (e.g. PFS cache).
Active: the experienced user; WLM script commands; common for most use cases.
Optimized: the power user; control via library or CLI (e.g. asynchronous workflows).
12
DataWarp User Perspectives
Workload Manager (WLM) Integration
The researcher or engineer inserts DataWarp commands into the job script:
"I need this much space in the DataWarp pool"
"I need the space in DataWarp to be shared"
"I need the results saved out to the parallel file system"
The job script requests resources via the WLM: DataWarp capacity, compute nodes, files, and file locations.
The WLM automates clean-up after the application completes.
WLM integration is the key to ease of use and dynamic provisioning.
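A minimal sketch of what such directives can look like with phase 1 staging, combining a jobdw request with explicit stage_in/stage_out; the file paths and the capacity here are hypothetical placeholders, not values from this deck:

#DW jobdw type=scratch access_mode=striped capacity=1TiB
#DW stage_in type=file source=/lus/scratch/myuser/input.dat destination=$DW_JOB_STRIPED/input.dat
#DW stage_out type=file source=$DW_JOB_STRIPED/results.dat destination=/lus/scratch/myuser/results.dat

stage_in copies the input from the parallel file system into the DataWarp instance before the application starts; stage_out saves the results back to the parallel file system when the job ends, which is the "results saved out" request described above.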
13
DataWarp User Perspectives
Supported workload managers: SLURM, Moab/Torque, PBS Pro.
(software stack diagram: application and PFS above the WLM, DataWarp Service, Data Virtualization Service, DWFS, XFS, logical volume manager, and devices)
14
Use Cases for DataWarp: Shared Storage, Local Storage, Burst Buffer
Burst Buffer: checkpoint restart
PFS Cache: local cache for the PFS, a transparent user model
Local Storage: private scratch space, swap space
Shared Storage: reference files, file interchange, high performance scratch (we'll focus here)
15
Use Cases for DataWarp: Reference Files
(diagram: Cray HPC compute nodes reading shared storage on the DataWarp nodes)
Reference files: read intensive data commonly used by many compute nodes.
DataWarp: user directed behavior, automated provisioning of resources.
16
Use Cases for DataWarp: File Interchange
(diagram: Cray HPC compute nodes sharing storage on the DataWarp nodes)
File interchange: sharing intermediate work between compute nodes.
DataWarp: user directed behavior, automated provisioning of resources.
17
Use Cases for DataWarp: High Performance Scratch
(diagram: Cray HPC compute nodes with shared storage striped across the DataWarp nodes)
High performance scratch: files are striped across the pool.
DataWarp: user directed behavior, automated provisioning of resources.
18
Use Cases for DataWarp: Shared Storage, Local Storage, Burst Buffer
Burst Buffer: checkpoint restart
PFS Cache: local cache for the PFS, a transparent user model
Local Storage: private scratch space, swap space
Shared Storage: reference files, file interchange, high performance scratch
19
DataWarp Application Flexibility
(diagrams of the four configurations)
Burst Buffer: compute nodes burst to the DataWarp nodes, which trickle data to Sonexion Lustre.
Shared Storage: compute nodes share storage on the DataWarp nodes.
Local Storage: each compute node gets private storage on the DataWarp nodes.
PFS Cache: the DataWarp nodes sit between the compute nodes and Sonexion Lustre as a cache.
20
#DW jobdw ... requests a job DataWarp instance
Lifetime is the same as the batch job; only usable by that batch job.
capacity=<size>: indirect control over server count, based on the pool granularity; it might help to request more space than you need.
type=scratch: selects the DWFS file system.
type=cache: selects the DWCFS file system.
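A minimal sketch of how the capacity request interacts with granularity; dwstat comes from the dws module shown later in this deck, while the pool granularity quoted here is purely hypothetical:

module load dws
dwstat pools    # lists each pool with its free space and allocation granularity
# If the pool granularity were, hypothetically, 200GiB per fragment, then
#   #DW jobdw type=scratch access_mode=striped capacity=790GiB
# would round up to 4 fragments, so the instance could span up to 4 DW server nodes.
# Requesting more space than strictly needed is one way to pull in more servers, and so more bandwidth.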
21
#DW jobdw ... (continued): access_mode
access_mode=striped: all compute nodes see the same file system; files are striped across all allocated DW server nodes and are visible to every compute node using the instance; aggregates both capacity and bandwidth per file.
access_mode=private: each compute node sees a different file system; a file goes to a single DW server node, each compute node keeps using the same DW node, and its files are seen only by that compute node.
access_mode=striped,private: two mount points are created on each compute node; they share the same space.
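A minimal sketch of how the two mount points appear when both modes are requested together; the directive and the DW_JOB_STRIPED / DW_JOB_PRIVATE variables follow the examples later in this deck, while the commands and paths are only illustrative:

#DW jobdw type=scratch access_mode=striped,private capacity=790GiB
aprun -n 1 df -h $DW_JOB_STRIPED $DW_JOB_PRIVATE                      # both mounts exist on every compute node
aprun -n 1 cp /lus/scratch/myuser/shared_input.dat $DW_JOB_STRIPED/   # hypothetical path; striped, visible to all ranks
export TMPDIR=$DW_JOB_PRIVATE                                         # per-node temporaries stay on one DW server node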
22
Simple DataWarp job with Moab
#!/bin/bash
#PBS -l walltime=2:00 -joe -l nodes=8
#DW jobdw type=scratch access_mode=striped capacity=790GiB
. /opt/modules/default/init/bash
module load dws
dwstat most                      # show DW space available and allocated
cd $PBS_O_WORKDIR
aprun -n 1 df -h $DW_JOB_STRIPED # only visible on compute nodes
IOR=/home/users/dpetesch/bin/IOR.XC
aprun -n 32 -N 4 $IOR -F -t 1m -b 2g -o $DW_JOB_STRIPED/IOR_file
23
DataWarp scratch vs. cache
Scratch (phase 1):
#!/bin/bash
#PBS -l walltime=4:00:00 -joe -l nodes=1
#DW jobdw type=scratch access_mode=striped capacity=200GiB
cd $PBS_O_WORKDIR
export TMPDIR=$DW_JOB_PRIVATE
NAST="/msc/nast20131/bin/nast20131 scr=yes bat=no sdir=$TMPDIR"
ccmrun ${NAST} input.dat mem=16gb mode=i8 out=dw_out

Cache (phase 2):
#DW jobdw type=cache access_mode=striped pfs=/lus/scratch/dw_cache capacity=200GiB
export TMPDIR=$DW_JOB_STRIPED_CACHE
ccmrun ${NAST} input.dat mem=16gb mode=i8 out=dw_cache_out
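The practical difference: with type=scratch the application (or the WLM staging directives) must move data between DataWarp and the parallel file system explicitly, whereas with type=cache the pfs= directory is cached implicitly by DataWarp, so no application changes are required, per the phase 2 description earlier in this deck.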
24
DataWarp Bandwidth
The DataWarp bandwidth seen by an application depends on multiple factors:
Transfer size of the I/O requests
Number of active streams (files) per DataWarp server (for file-per-process I/O, this equals the number of processes)
Number of DataWarp server nodes (which is related to the capacity requested)
Other activity on the DW server nodes, including administrative work and other user jobs; it is a shared resource
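A minimal sketch of probing the first two factors with the IOR binary used earlier in this deck; the transfer sizes and process counts are arbitrary illustrations, not tuning recommendations:

IOR=/home/users/dpetesch/bin/IOR.XC
for xfer in 4k 64k 1m 8m; do                                          # vary the transfer size of the I/O requests
  aprun -n 32 -N 4 $IOR -F -t $xfer -b 2g -o $DW_JOB_STRIPED/IOR_$xfer
done
aprun -n 64 -N 8 $IOR -F -t 1m -b 1g -o $DW_JOB_STRIPED/IOR_streams   # more file-per-process streams per DW server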
25
Minimize Compute Residence Time with DataWarp
(chart: node count vs. wall time, comparing a Lustre-only run with a DataWarp run; key elements: initial data load, timestep writes, final data writes, compute nodes vs. idle compute nodes, DW preload I/O time, and DW post-dump I/O time, showing compute wall time shrinking when the timestep writes go to DataWarp)
26
DataWarp with MSC NASTRAN
Cray blog reference: job wall clock reduced by 2x with DataWarp.
(chart: DataWarp vs. Lustre-only wall clock time)
28
DataWarp Considerations
Know your workload: capacity requirement, bandwidth requirement, iteration interval.
Calculate the ratio of DataWarp to spinning disk:
The % of the calculated bandwidth needed by DW vs. HDD
Whether excess bandwidth is needed to sync to HDD
The % of storage capacity needed by DW to maintain performance, with capacity for multiple iterations
Budget
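A back-of-the-envelope sketch of the sizing arithmetic described above; every number is a hypothetical workload assumption, not a figure from this deck:

# Hypothetical workload: a 50 TiB checkpoint every 30 minutes,
# absorbed within 5 minutes of compute time, keeping 3 iterations resident in DataWarp.
ckpt_tib=50; absorb_s=300; interval_s=1800; iters=3
echo "DW burst bandwidth needed:  $(( ckpt_tib*1024/absorb_s )) GiB/s"    # ~170 GiB/s into DataWarp
echo "HDD drain bandwidth needed: $(( ckpt_tib*1024/interval_s )) GiB/s"  # ~28 GiB/s to the parallel file system
echo "DW capacity needed:         $(( iters*ckpt_tib )) TiB"              # 150 TiB of DataWarp space

The ratio of the two bandwidth figures is the "% of calculated bandwidth needed by DW vs. HDD" above, and the capacity figure covers the multiple in-flight iterations.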
29
DataWarp Bottom Line: it is about reducing "Time to Solution"
Returning control to compute sooner
Reducing the cost of "Time to Solution"
30
DataWarp Summary
1. Accelerates performance for faster time to insight
2. Easy to use
3. Dynamic and flexible
31
Questions?