Presentation is loading. Please wait.

Presentation is loading. Please wait.

Ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future.

Similar presentations


Presentation on theme: "Ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future."— Presentation transcript:

1 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 1 Feb 25, 2016 Steve Crusan IME: Breaking free from I/O bottlenecks with Burst Buffer technology

2 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 2 Main challenge in HPC Storage?

3 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 3 Parallel File System Read High client concurrency Random highly threaded to many files metadata intensive aligned Small file Storage Write Large file Sequentia l to one file not aligned IO intensive small IO large IO

4 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 4 Agenda ► IME SW architecture and features review Data Workflow, read and write Major features ► IME Testbed - Benchmark Review summary of the best of our benchmarks synthetic application ► User view – how do I use it? ► Summary

5 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 5 5 IME Architecture

6 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 6 DDN | IME Application I/O Workflow Fast Data NVM & SSD Persistent Data (Disk) Diverse, high concurrency applications Compute Lightweight IME client intercepts application I/O. Places fragments into buffers + parity IME client sends fragments to IME servers IME servers write buffers to NVM and manage internal metadata IME servers write aligned sequential I/O to SFA backend Parallel File system operates at maximum efficiency

7 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 7 DDN | IME Application I/O Workflow Fast Data NVM & SSD Persistent Data (Disk) Diverse, high concurrency applications COMPUTE Lightweight IME client passes fragments to application IME server sends fragments to IME clients IME servers write buffers to NVM and manage internal metadata IME prefetches data based upon scheduler request Parallel File system acts as persistent store for data

8 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 8 8 IME – Deeper Dive

9 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 9 IME Architecture Distributed Hash Table Log Structured Filesystem IME Client application interfaces extreme update rates decentralized fault tolerant scalable POSIX and MPI-IO interfaces place data anywhere dynamic data placement improve flash lifetime creates erasure coding manages data to/from app manages buffers for IO replay to PFS manages fast rebuilds PFS client IME Server

10 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 10 IME Architecture ► No application changes required ► Applications must be recompiled/relinked to utilize IME Distributed Hash Table Log Structured Filesystem IME Client application interfaces extreme update rates decentralized fault tolerant scalable POSIX and MPI-IO interfaces place data anywhere dynamic data placement improve flash lifetime creates erasure coding manages data to/from app manages buffers for IO replay to PFS manages fast rebuilds PFS client IME Server

11 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 11 IME Architecture ► Client buffers IO fragments and sends to IME servers ► also (optionally) creates parity buffers according to RAID scheme chosen ► sends application and data to IME servers such that loss of a server or SSD does not result in data loss ► Heuristics in IME clients can intelligently pre-fetch IME-resident data to applications Distributed Hash Table Log Structured Filesystem IME Client application interfaces extreme update rates decentralized fault tolerant scalable POSIX and MPI-IO interfaces place data anywhere dynamic data placement improve flash lifetime creates erasure coding manages data to/from app manages buffers for IO replay to PFS manages fast rebuilds PFS client IME Server

12 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 12 IME Architecture ► DHT is at the core of IME ► distributed index manages locations of files and objects ► Unique routing and data placement properties differentiate IME DHT from other DHTs ► Fast O(1) routing algorithm built on high- performance, non- cryptographic hash algorithms ► Load is uniformly distributed across DHT nodes Distributed Hash Table Log Structured Filesystem IME Client application interfaces extreme update rates decentralized fault tolerant scalable POSIX and MPI-IO interfaces place data anywhere dynamic data placement improve flash lifetime creates erasure coding manages data to/from app manages buffers for IO replay to PFS manages fast rebuilds PFS client IME Server

13 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 13 IME Architecture ► Data can be placed anywhere within IME – any server, any SSD ► IOPs to SSDs are minimized and optimized for NAND Flash Distributed Hash Table Log Structured Filesystem IME Client application interfaces extreme update rates decentralized fault tolerant scalable POSIX and MPI-IO interfaces place data anywhere dynamic data placement improve flash lifetime creates erasure coding manages data to/from app manages buffers for IO replay to PFS manages fast rebuilds PFS client IME Server

14 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 14 IME Architecture ► Actively reorganize I/Os to coalesce small block I/Os into large chunks and perform full stripe alignment before submission to the back end, Read acceleration is almost the reverse process ► Protect data by disseminating erasure coded chunks across the IME appliance cluster ► Pre-fetch data into IME from PFS using out-of- band commands and APIs Distributed Hash Table Log Structured Filesystem IME Client application interfaces extreme update rates decentralized fault tolerant scalable POSIX and MPI-IO interfaces place data anywhere dynamic data placement improve flash lifetime creates erasure coding manages data to/from app manages buffers for IO replay to PFS manages fast rebuilds PFS client IME Server

15 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 15 IME Dataflow IME server Application IME Client Parallel Filesystem 1. Application issues fragmented, misaligned IO 2. IME clients send fragments to IME servers 3. Fragments to IME servers accessible via DHT to all clients 4. Fragments to be flushed from IME are assembled into PFS stripes 5. PFS receives complete PFS stripe Application IME Client

16 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 16 IME Erasure Coding ► Data protection against IME server or SSD Failure is optional (the lost data is "just cache”) ► Erasure Coding calculated at the Client Great scaling with extremely high client count Servers don't get clogged up ► Erasure coding does reduce useable Client bandwidth and useable IME capacity: 3+1: 56Gb  42Gb 5+1: 56Gb  47Gb 7+1: 56Gb  49Gb 8+1: 56Gb  50Gb COMPUTE NODES IME PFS PARALLEL FILE SYSTEM & STORAGE PFS Data buffer Parity buffer Data buffer

17 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 17 IME Testbed Reviewing the Benchmarks

18 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 18 DDN IME Platforms ► IME SuperMicro Testbed Platform Runs IME server software 2P Ivy Bridge, 2x FDR, 2 RU 24 2.5” SATA SSDs 128 – 256 GB DRAM @1866 10.5 GB/s throughput per server ► IME14K Appliance Two controllers in 4 RU 2P Haswell / Broadwell, FDR / EDR 48 NVMe SSDs (V1.0.0) and 72 SAS SSDs (1H’16) 50 GB/s raw throughput per appliance o Erasure coding topology will impact achievable peak performance All benchmarks run on this or similar so far...

19 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 19 DDN IME Testbed @ TACC (Q’3 2014) IME Scaling Evaluation ► 8 IME Tesbed Servers (installed with the IME Testbed SW release) Deployed on TACC’s Stampede Cluster (~45 TB capacity, ~80 GB/s peak w/ IOR, no data protection) ► IME Testbed Servers attached to Stampede’s Mellanox FDR IB fabric ► PFS results generated using a non-production, non-empty Lustre file system (non-DDN hardware, ~18 GB/s peak w/ IOR) ► Experiments executed over several six hour, 1024 node reservations on Stampede while system was in production IME: 10.3 – 76.0% of peak PFS: 6.7 – 18.3% of peak IME: 78.5 – 91.5% of peak PFS: 12.2 – 96.1% of peak IME: 89.8 – 94.3% of peak PFS: 39.4 – 66.1% of peak

20 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 20 DDN IME Testbed @ ICHEC (Q’3 2014) ELAPSED TIME FOR A SINGLE SHOT FOR DIFFERENT TESTS CASES IN CHARGE BEHAVIOR FOR LUSTRE AND IME USING SMALL TEST CASE ICHEC researchers benchmarked their I/O bound, 3D seismic imaging application against IME and Lustre. Processing a 100 MB seismic image requires 1 TB of data movement. Typical surveys consist of 1000s of images. Therefore, 1 PB of data movement is expected by this application for a typical survey. Left: Single “shot” profiles that include compute, communication, and IO time (normalized against Lustre baseline). Right: Wall clock time for execution of multiple concurrent “shots.”

21 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 21 IOR: testing IME with small IO sizes ► This is the strongest test case for IME high concurrency any IO size, including very small single shared file

22 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 22 IME – How difficult is it to use?

23 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 23 Building the Application Makefile.h using stock MPI-IO and POSIX IO libraries Path to MPI Parallel HDF5 library Parallel netCDF library

24 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 24 Building the Application Enabling MPI-IO support requires building the application and dependent I/O libraries with IME support Path to IME-capable MPI Parallel HDF5 library built with IME-capable MPI Parallel netCDF library built with IME-capable MPI

25 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 25 Choosing the IO Path IME Bypass IME through POSIX mountpoint IME through MPI-IO /mnt/lustre_client is backend file system mount point /ime is IME projection mount point of /mnt/lustre_client backend file system ime:/ prefix routes IO requests into IME ROMIO driver

26 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 26 IME14K Architecture

27 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 27 IME14K Architecture 52.7GB/s IME Interface Virtualization Cache IME Interface Virtualization Cache PCI-E Fabric ► Active/Active IME Controllers 2 x Dual Socket E5-26xx v3 (Haswell) 128GB DDR4 12 x FDR/EDR IB up to 48 NVMe SSDs 4RU Total 6 uplinks per controller IME Server SW parallelized across cores Extreme bandwidth and IOPs to NVMe SSDs

28 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 28 IME Roadmap & What's New

29 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 29 What is Infinite Memory Engine (IME)? Non-Volatile Memory Tier ► What problems does IME solve? Traditional parallel file system (PFS) I/O has numerous application/platform performance limiters. DDN's Infinite Memory Engine (IME) reduces the PFS and disk subsystem bottlenecks, providing a new option in the memory/storage hierarchy between system RAM and disk-backed parallel file systems. Many use cases that are battered by disk-seek or parallel file system constraints can be freed by IME. Client data residing in IME is erasure coded to withstand failure of IME Servers. ► How? IME v1.0 provides a file level buffer cache for an underlying parallel file system. It accelerates checkpoint and application I/O via efficient write and read caching of data within the parallel file system namespace it is placed on.

30 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 30 Integration with HPC job schedulers using IME’s out-of-band, command line data management tools Goal is to ensure input data sets needed by an application are resident in IME before the job reservation begins Additional data can be moved into IME on- demand from the PFS during application execution Important data sets can be exported from IME to the PFS during or after job completion, and temporary applications data sets can be urged from IME Enable stage-in / stage-out capabilities for single applications or managing data sets for workflows of multiple applications How Does IME Integrate into a Workflow?

31 ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. 31 2929 Patrick Henry Drive Santa Clara, CA 95054 1.800.837.2298 1.818.700.4000 company/datadirect-networks @ddn_limitless sales@ddn.com Thank You! Keep in touch with us


Download ppt "Ddn.com © 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future."

Similar presentations


Ads by Google