1
Challenges in Getting Flash Drives Closer to CPU
Myoungsoo Jung (UT-Dallas), Mahmut Kandemir (PSU)
The University of Texas at Dallas
2
Take-away
Leveraging the PCIe bus as a storage interface:
– ≠ conventional memory-system interconnects
– ≠ thin storage interfaces
– Requires a new SSD architecture and storage stack
Motivation: there are few studies focusing on the system characteristics of these emerging PCIe SSD platforms.
Contributions: we quantitatively analyze the challenges faced by PCIe SSDs in getting flash memory closer to the CPU:
1. Memory consumption
2. Computation resource requirements
3. Performance as a shared storage system
4. Latency impact of their storage-level queuing mechanisms
3
Bandwidth Trend
Bandwidth improvement (150 MB/s ~ 600 MB/s)
4
Bandwidth Trend
SSDs have improved their bandwidth by 4x and begin to blur the distinction between block-semantic and memory-semantic devices.
5
Flash Storage Migration
Taking SSDs out of the I/O controller hub, where the interface becomes the bottleneck, and locating them as close to the CPU as possible. The PCIe interface is by far one of the easiest ways to integrate flash memory into the processor-memory complex.
6
Flash Integration
1. Bridge-based PCIe SSD (BSSD)
2. From-scratch PCIe SSD (FSSD)
7
Bridge-based PCIe SSD (BSSD)
Multiple traditional SAS/SATA SSD controllers sit behind a bridge controller, which exposes the aggregated SAS/SATA SSD performance.
RC = Root Complex, CTRL = Controller, EP = Endpoint, HBA = Host Bus Adapter
8
Bridge-based PCIe SSD (BSSD)
Pros: high compatibility, fast development process
Cons: redundant control logic, computational overheads, encoding/decoding overheads
9
From-scratch PCIe SSD (FSSD)
PCIe endpoints (EPs) have upstream and downstream buffers, which control inbound and outbound I/O requests. The PCIe EPs and switch are implemented as native PCIe controllers over a point-to-point PCIe link network. The FSSD is built from the bottom up by directly interconnecting the NAND flash interface and the external PCIe link.
10
From-scratch PCIe SSD (FSSD)
Pros: highly scalable, exposes raw flash performance
Cons: protocol design/implementation effort, SW/HW tailoring, resource competition
11
Flash Software Stack
Host side: Database → File System → Block Storage Layer → HBA Device Driver
Host and storage communicate over a logical block I/O interface.
Storage side: Host Interface Layer (NVMHC) → Flash Software (FTL), which handles the buffer cache, address mapping, and wear-leveling → Hardware Abstraction Layer
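To make the FTL's address-mapping role concrete, below is a minimal page-level mapping sketch in C. It is purely illustrative: the table size, the append-only allocator, and the names ftl_init/ftl_write/ftl_read are assumptions, not the FTL implemented in the evaluated PCIe SSDs.

```c
/* Illustrative page-level FTL mapping sketch (assumption: not the FTL of the
 * devices evaluated in this talk). The FTL translates logical page numbers
 * (LPNs) from the block I/O interface into physical page numbers (PPNs) on
 * flash, writing out-of-place because NAND pages cannot be overwritten. */
#include <stdint.h>
#include <stdio.h>

#define NUM_LOGICAL_PAGES  (1u << 20)   /* hypothetical 4 GB volume, 4 KB pages */
#define INVALID_PPN        UINT32_MAX

static uint32_t l2p[NUM_LOGICAL_PAGES];  /* logical-to-physical mapping table */
static uint32_t next_free_ppn = 0;       /* naive append-only allocator */

static void ftl_init(void)
{
    for (uint32_t i = 0; i < NUM_LOGICAL_PAGES; i++)
        l2p[i] = INVALID_PPN;            /* no logical page mapped yet */
}

/* On a write, allocate a fresh physical page and remap the LPN to it.
 * The previously mapped page becomes stale and must later be reclaimed
 * by garbage collection. */
static uint32_t ftl_write(uint32_t lpn)
{
    uint32_t ppn = next_free_ppn++;
    l2p[lpn] = ppn;
    return ppn;
}

/* On a read, simply look up the current mapping. */
static uint32_t ftl_read(uint32_t lpn)
{
    return l2p[lpn];
}

int main(void)
{
    ftl_init();
    uint32_t ppn = ftl_write(42);        /* write logical page 42 */
    printf("LPN 42 -> PPN %u (read back: %u)\n", ppn, ftl_read(42));
    return 0;
}
```

The state needed for such a mapping table (plus buffer cache) is one reason flash management consumes host memory when it is lifted out of the device and run closer to the CPU.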
12
Experimental Setup
Host configuration:
– Quad-core i7 Sandy Bridge, 3.4 GHz
– Extra external HDD (for logging the footprints)
– 16 GB memory (4 x 4 GB DDR3-1333 DIMMs)
Most performance values observed with FSSD are about 40% better than with BSSD.
13
Tool
Synthesized micro-benchmark workloads with Iometer.
Modified Iometer:
– Time-series evaluation: a script that generates log data every second.
– Memory-usage evaluation: added a module that calls the system API GlobalMemoryStatusEx() from Iometer (a sketch of such a call follows below).
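A minimal sketch of what a per-second memory logger based on GlobalMemoryStatusEx() could look like, written as a standalone Win32 program rather than the actual Iometer patch; the log file name, output format, and run duration are made up for illustration.

```c
/* Minimal sketch (not the authors' Iometer module): log used physical
 * memory once per second via the Win32 GlobalMemoryStatusEx() API. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    FILE *log = fopen("mem_usage.log", "w");    /* hypothetical log file name */
    if (!log)
        return 1;

    /* sample once per second; 3600 samples is an arbitrary illustrative run length */
    for (int sec = 0; sec < 3600; sec++) {
        MEMORYSTATUSEX ms;
        ms.dwLength = sizeof(ms);               /* must be set before the call */
        if (GlobalMemoryStatusEx(&ms)) {
            /* used physical memory = total - available, reported in MiB */
            unsigned long long used_mib =
                (ms.ullTotalPhys - ms.ullAvailPhys) >> 20;
            fprintf(log, "%d %llu\n", sec, used_mib);
            fflush(log);
        }
        Sleep(1000);                            /* one sample per second */
    }

    fclose(log);
    return 0;
}
```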
14
Memory Usage (Overall)
Physical memory consumption for writes and reads across request sizes (1 ~ 512 sectors): FSSD consumes 3x~16x more memory space for one I/O type and 2.5x more for the other, while BSSD stays at about 0.6 GB.
15
Memory Usage (BSSD)
The host submits I/Os whenever the device is available, using a 128-entry queue. BSSD requires only 0.6 GB of memory space regardless of the I/O type and size.
16
Memory Usage (FSSD)
FSSD starts with a 2 GB memory requirement; as the I/O process progresses, memory usage keeps increasing in a logarithmic fashion and reaches 10 GB. Using 10 GB of memory just to manage the underlying SSD may not be acceptable in many applications.
17
CPU Usage (BSSD)
Time series of host-level CPU usage: BSSD consumes 15%~30% of total CPU cycles for handling I/O requests.
18
CPU Usage (FSSD)
FSSD requires much higher CPU usage (50%~90%), spending over 60% of host-side CPU cycles on I/O alone. CPU usage above 60% just for I/O processing can degrade overall system performance. I/O service with queue-mode operation requires 50% more CPU cycles.
19
FSSD Performance (multi-threads)
Latency is 118% worse than with four workers and 289% worse than with a single worker, while throughput is 2.2x better than with a single worker. Overall, FSSD offers very stable and predictable performance.
20
FSSD Resource Usage (multi-threads)
With multiple threads, FSSD requires 134% more memory space and 201% more computation resources; its advantage decreases because of the high memory requirements and CPU usage.
21
BSSD Resource Usage (multi-threads)
BSSD shows similar memory requirements (less than 0.66 GB) and similar CPU usage (less than 30%) irrespective of the number of threads.
22
BSSD Performance (multi-threads)
Latency is 289% worse than with four workers and 708% worse than with a single worker. Throughput shows no difference as the number of workers varies. A write cliff occurs due to the impact of garbage collection.
23
Latency Impact of a Queuing Method
For both FSSD and BSSD, queued requests suffer far higher latency than legacy requests: 86x to 184x worse (106x, 86x, 99x, and 184x in the measured cases).
24
Summary
Design trade-off between performance and resource utilization:
– All-flash arrays
– Data-center/HPC local-node SSDs
Software stack optimization:
– Co-operative approaches
– Unified/direct file systems
– Garbage collection schedulers
– Queue control
We are constructing an environment for automated SSD evaluation at camelab.org.