Dynamic Data Layout Optimization for High Performance Parallel I/O
Everett Rush, Bryan Harris, Nihat Altiparmak (University of Louisville, USA)
Ali Saman Tosun (UT San Antonio, USA)
HiPC 2016, 12/20/2016
Outline
Background: High Performance Parallel I/O; Block Correlations
Dynamic Data Layout Optimization: Monitoring & Analysis; Placement Planning; Data Reorganization
Evaluation
References
High Performance Parallel I/O
[Figure: 25 blocks placed across Disk 0 to Disk 4; one placement needs five parallel disk accesses for a request, another needs only one parallel disk access.]
Static data placement schemes: Disk Modulo [Du ’82], RAID [Patterson ’88], Field-wise Exclusive OR [Kim ’88], Hilbert [Faloutsos ’93], Generalized Fibonacci [Prabhakar ’98], AOPT: Almost Optimal [Atallah ’00], Periodic [Altiparmak ’12].
Static schemes assume that one layout fits all access patterns; dynamic data layout optimization is necessary!
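To make the "five accesses versus one" contrast concrete, here is a minimal Python sketch that counts parallel disk accesses for a query: the cost is the number of blocks the busiest disk must serve. Only the Disk Modulo rule, bucket (i, j) on disk (i + j) mod N [Du ’82], comes from the literature; the 5×5 grid, the `column_layout` baseline, and all function names are hypothetical choices for illustration.

```python
from collections import Counter

def disk_modulo(i, j, num_disks):
    # Disk Modulo [Du '82]: grid bucket (i, j) is stored on disk (i + j) mod N.
    return (i + j) % num_disks

def column_layout(i, j, num_disks):
    # Hypothetical naive layout for contrast: every bucket of column j on disk j mod N.
    return j % num_disks

def parallel_accesses(query, layout, num_disks):
    # A query costs as many parallel accesses as the busiest disk has blocks to read.
    per_disk = Counter(layout(i, j, num_disks) for i, j in query)
    return max(per_disk.values())

N = 5
column_query = [(i, 0) for i in range(N)]         # all 5 blocks of column 0
square_query = [(0, 0), (0, 1), (1, 0), (1, 1)]   # a 2x2 range query

print(parallel_accesses(column_query, column_layout, N))  # 5
print(parallel_accesses(column_query, disk_modulo, N))    # 1
print(parallel_accesses(square_query, disk_modulo, N))    # 2 (the optimum would be 1)
```

The last query shows why a single static scheme cannot be optimal for every access pattern: Disk Modulo spreads rows and columns perfectly but still doubles up on a small square.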
Block Correlations
Blocks are correlated if they are requested together [Li ’04]. Correlations can exist within a single request (intra-request) or across requests (inter-request), and they are commonly encountered in storage workloads.
Correlated blocks should be placed on separate disks!
Dynamic Data Layout Optimization Framework
A generic framework for self-optimizing parallel storage systems.
It can be applied to storage arrays, parallel/distributed file systems, key-value stores, and the internal parallelism of NVM devices for high performance parallel I/O.
It automatically adapts to skewed, changing, and co-existing access patterns.
Monitoring & Analysis Modules
Disk I/O Monitoring: monitors block-level I/O requests, for example with an I/O tracing tool such as blktrace [Axboe ’07], and creates sessions of block IDs that are requested together.
Data Analysis: analyzes the sessions to find block correlations, using Frequent Itemset Mining (FIM) [Borgelt ’12] algorithms to find correlated pairs and their frequencies; a minimum support threshold sets the minimum frequency a pair needs to count as correlated.
[Figure: Monitoring Output and Analysis Output examples]
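The monitoring and analysis pipeline can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the trace is a time-ordered list of (timestamp, block_id) tuples, and a new session starts whenever consecutive requests are more than a hypothetical `gap` apart. Counting pairs per session is a brute-force stand-in for frequent 2-itemset mining; a real deployment would use an FIM algorithm such as those in [Borgelt ’12].

```python
from collections import Counter
from itertools import combinations

def make_sessions(trace, gap):
    """Split a time-ordered trace of (timestamp, block_id) into sessions.
    Hypothetical rule: a new session starts when consecutive requests are more than `gap` apart."""
    sessions, current, last_t = [], set(), None
    for t, block in trace:
        if last_t is not None and t - last_t > gap:
            sessions.append(current)
            current = set()
        current.add(block)
        last_t = t
    if current:
        sessions.append(current)
    return sessions

def correlated_pairs(sessions, min_support):
    """Count how many sessions contain each block pair; keep pairs meeting min_support."""
    counts = Counter()
    for session in sessions:
        for pair in combinations(sorted(session), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

trace = [(0.0, 10), (0.1, 20), (0.2, 30), (5.0, 10), (5.05, 20), (9.0, 40), (9.1, 30)]
sessions = make_sessions(trace, gap=1.0)
print(len(sessions))                              # 3
print(correlated_pairs(sessions, min_support=2))  # {(10, 20): 2}
```

Raising `min_support` trades recall for a smaller, more reliable correlation graph for the planning step.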
Placement Planning Module
The aim is to place correlated blocks on separate disks (parallel storage units).
Basic Layout Optimization Problem (BLOP):
Definition 1: Given a set C of correlated block pairs (i, j) and N disks, plan a placement strategy so that for every block pair (i, j) ∈ C, blocks i and j are stored on different disks.
Theorem 1: BLOP is NP-complete and equivalent to the proper (vertex) k-coloring problem [Jensen ’11] for k = N, with blocks as vertices, correlated pairs as edges, and disks as colors.
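The equivalence to k-coloring can be illustrated with an exact backtracking solver for tiny instances. This is a sketch, not the authors' planner: `plan_placement` and `is_valid` are hypothetical names, and the search is exponential in the worst case, which is consistent with BLOP being NP-complete.

```python
def plan_placement(blocks, pairs, num_disks):
    """Exact BLOP for tiny inputs: backtracking N-coloring of the correlation graph.
    Returns {block: disk} with no correlated pair sharing a disk, or None."""
    neighbors = {b: set() for b in blocks}
    for i, j in pairs:
        neighbors[i].add(j)
        neighbors[j].add(i)
    order = sorted(blocks, key=lambda b: -len(neighbors[b]))  # most-constrained first
    placement = {}

    def assign(k):
        if k == len(order):
            return True
        b = order[k]
        used = {placement[n] for n in neighbors[b] if n in placement}
        for disk in range(num_disks):
            if disk not in used:
                placement[b] = disk
                if assign(k + 1):
                    return True
                del placement[b]  # undo and try the next disk
        return False

    return dict(placement) if assign(0) else None

def is_valid(placement, pairs):
    # A placement solves BLOP exactly when it is a proper coloring.
    return all(placement[i] != placement[j] for i, j in pairs)

triangle = [(1, 2), (2, 3), (1, 3)]
print(plan_placement([1, 2, 3], triangle, num_disks=2))  # None: a triangle is not 2-colorable
placement = plan_placement([1, 2, 3], triangle, num_disks=3)
print(is_valid(placement, triangle))                      # True
```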
Placement Planning Module
BLOP captures the main goal but must be modified to apply in real settings:
A proper (conflict-free) coloring is generally not feasible, since there are far more blocks than disks (|V| ≫ N); instead, use soft coloring techniques that minimize conflicts [Fitzpatrick ’01].
Each disk has a maximum capacity that must not be exceeded; handle this with traditional bin-packing techniques.
Min-Conflict Bin Packing (MCBP) [Khanafer ’12]:
Definition 2: Given a set I of items i of size w_i, N bins of size W, and a conflict graph G = (I, E) where (i, j) ∈ E if items i and j cannot be packed in the same bin, compute the minimum number of conflicts that must occur if the set I is packed into N bins of size W.
Theorem 2: MCBP is NP-complete.
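The MCBP objective from Definition 2 can be written down directly, which makes it clear what the heuristic is trying to minimize. The sketch below, with hypothetical function names, scores one packing (returning None if any bin exceeds W) and finds the true minimum by exhaustive search, which is only viable for toy inputs.

```python
from itertools import product

def mcbp_cost(packing, sizes, capacity, conflicts, num_bins):
    """Conflicts of one packing {item: bin}, or None if some bin exceeds capacity W."""
    loads = [0] * num_bins
    for item, b in packing.items():
        loads[b] += sizes[item]
    if any(load > capacity for load in loads):
        return None  # infeasible packing
    return sum(1 for i, j in conflicts if packing[i] == packing[j])

def min_conflicts(items, sizes, capacity, conflicts, num_bins):
    """Exact MCBP by trying all N^|I| packings (exponential; toy sizes only)."""
    best = None
    for bins in product(range(num_bins), repeat=len(items)):
        cost = mcbp_cost(dict(zip(items, bins)), sizes, capacity, conflicts, num_bins)
        if cost is not None and (best is None or cost < best):
            best = cost
    return best

items = ["a", "b", "c"]
sizes = {"a": 1, "b": 1, "c": 1}
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
print(min_conflicts(items, sizes, 2, triangle, 2))  # 1: three mutually conflicting items, two bins
print(min_conflicts(items, sizes, 1, triangle, 3))  # 0: one item per bin
```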
Placement Planning Heuristic
Start with an initial placement.
Calculate the Total Correlation Frequency (TCF) value of each vertex.
Perform local optimizations in TCF order, using the correlation strengths stored in edge weights when calculating conflicts; if there is more than one candidate color, consider disk capacities.
Repeat the local optimizations until the reduction in conflicts (delta conflicts) falls below ε.
Worst-case time complexity: O(|V| log |V| + |E|)
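The heuristic's loop structure can be sketched as a TCF-ordered local search. This is a minimal sketch, not the authors' implementation, and it rests on labeled assumptions: blocks have unit size (so `capacity` counts blocks per disk), TCF is taken to be the total weight of a vertex's correlation edges, and capacity enters as a hard filter plus a least-loaded tie-break. It does not attempt to match the stated complexity bound.

```python
from collections import defaultdict

def weighted_conflicts(placement, edges):
    # Sum of correlation weights of pairs that ended up on the same disk.
    return sum(w for (i, j), w in edges.items() if placement[i] == placement[j])

def tcf_local_search(placement, edges, num_disks, capacity, eps=1e-9, max_passes=100):
    """placement: {block: disk}; edges: {(i, j): weight}.
    Assumes unit-size blocks, so `capacity` is counted in blocks per disk."""
    placement = dict(placement)
    adj = defaultdict(dict)
    for (i, j), w in edges.items():
        adj[i][j] = adj[i].get(j, 0) + w
        adj[j][i] = adj[j].get(i, 0) + w
    # Assumed TCF definition: total weight of a vertex's correlation edges.
    tcf = {v: sum(adj[v].values()) for v in placement}
    order = sorted(placement, key=lambda v: -tcf[v])
    load = [0] * num_disks
    for d in placement.values():
        load[d] += 1
    cost = weighted_conflicts(placement, edges)
    for _ in range(max_passes):
        for v in order:
            cur = placement[v]
            # Weighted conflict v would have on each disk, given its neighbors' disks.
            conf = [0] * num_disks
            for u, w in adj[v].items():
                conf[placement[u]] += w
            # Only disks with free space (or v's current disk) are candidates.
            candidates = [d for d in range(num_disks) if d == cur or load[d] < capacity]
            # Lowest conflict first; ties go to the least-loaded disk, then to staying put.
            new = min(candidates,
                      key=lambda d: (conf[d], load[d] - (1 if d == cur else 0), d != cur))
            if new != cur:
                load[cur] -= 1
                load[new] += 1
                placement[v] = new
        new_cost = weighted_conflicts(placement, edges)
        delta, cost = cost - new_cost, new_cost
        if delta < eps:  # stop once a full pass no longer reduces conflicts
            break
    return placement, cost

edges = {(0, 1): 5, (1, 2): 3, (2, 3): 1, (0, 2): 2}
start = {0: 0, 1: 0, 2: 1, 3: 1}  # initial placement with 6 units of conflict
placement, cost = tcf_local_search(start, edges, num_disks=2, capacity=3)
print(placement, cost)  # {0: 0, 1: 1, 2: 0, 3: 1} 2
```

Each move lowers or keeps the weighted conflict total, so the loop terminates; here the remaining cost of 2 is the lightest edge of the unavoidable 0-1-2 triangle.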
Data Reorganization Module
Aim: reconsider the color-to-disk mapping so that each color is mapped to a separate disk and the number of block movements is minimized.
Model the problem as a flow network and solve it with min-cost flow techniques [Ford ’62]:
All capacities are set to 1.
Costs are 0 for edges that are not between a color and a disk; for a color-to-disk edge, the cost is the amount of block movement caused by that mapping.
Push C units of flow from s to t, where C is the number of colors (one unit per color).
Worst-case time complexity: O(|E|^(3/2) log(|V| · Max(Cost))) [Goldberg ’15]
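The flow network can be built and solved in a short Python sketch. Assumptions, all labeled: the cost of mapping color c to disk d is the number of blocks of color c not already on disk d; there are at least as many disks as colors; node numbering and function names are hypothetical. The solver is plain successive shortest paths with Bellman-Ford, a textbook min-cost flow method, not the Goldberg et al. algorithm behind the bound above.

```python
INF = float("inf")

def min_cost_flow(n, edges, s, t, need):
    """Successive shortest paths with Bellman-Ford. edges: (u, v, capacity, cost)."""
    graph = [[] for _ in range(n)]  # each entry: [to, residual_cap, cost, index_of_reverse]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
    flow = total = 0
    while flow < need:
        dist, prev = [INF] * n, [None] * n
        dist[s] = 0
        for _ in range(n - 1):  # Bellman-Ford tolerates negative residual costs
            changed = False
            for u in range(n):
                if dist[u] == INF:
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev[v] = dist[u] + cost, (u, k)
                        changed = True
            if not changed:
                break
        if dist[t] == INF:
            break  # no augmenting path left
        push, v = need - flow, t
        while v != s:  # bottleneck capacity along the shortest path
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = t
        while v != s:  # augment along the path and update reverse edges
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        flow += push
        total += push * dist[t]
    return flow, total, graph

def remap_colors(colors, current_disk, num_disks):
    """colors: {block: color}; current_disk: {block: disk}. Returns ({color: disk}, blocks moved)."""
    color_ids = sorted(set(colors.values()))
    C = len(color_ids)
    s, t = 0, 1 + C + num_disks  # nodes: s, colors, disks, t
    edges = [(1 + C + d, t, 1, 0) for d in range(num_disks)]
    for ci, c in enumerate(color_ids):
        edges.append((s, 1 + ci, 1, 0))
        members = [b for b, col in colors.items() if col == c]
        for d in range(num_disks):
            # Cost = blocks of this color that would have to move to disk d.
            moved = sum(1 for b in members if current_disk[b] != d)
            edges.append((1 + ci, 1 + C + d, 1, moved))
    flow, cost, graph = min_cost_flow(t + 1, edges, s, t, C)
    if flow < C:
        raise ValueError("need at least as many disks as colors")
    mapping = {}
    for ci, c in enumerate(color_ids):
        for v, cap, _, _ in graph[1 + ci]:
            if 1 + C <= v < t and cap == 0:  # the saturated color->disk edge is the chosen one
                mapping[c] = v - (1 + C)
    return mapping, cost

colors = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C"}  # block -> color from placement planning
current = {0: 2, 1: 2, 2: 0, 3: 1, 4: 1}            # block -> disk it currently lives on
print(remap_colors(colors, current, num_disks=3))    # ({'A': 2, 'B': 0, 'C': 1}, 1)
```

Because every capacity is 1 and each color receives exactly one unit of flow, the result is a one-to-one color-to-disk assignment, which is why the problem fits the min-cost flow formulation.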
Additional Optimizations
Preserving sequentiality is important for HDDs.
Solution: group sequential blocks from the same HDD and reorganize each group as a unit without breaking its sequentiality.
Create a single vertex in the correlation graph for each group, and update the edge weights of each group vertex according to group membership.
Set an upper limit on the group size in bytes based on the transfer rate of the HDD, since larger groups work against parallel I/O.
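Grouping and edge-weight updating can be sketched as follows. Assumptions, labeled here and in the code: blocks have a uniform `block_size`, "sequential" means consecutive block IDs on the same disk, the byte limit `max_group_bytes` is simply given (its derivation from the HDD transfer rate is not shown), and a group vertex's edge weight is the sum of its members' cross-group weights. All names are hypothetical.

```python
from collections import defaultdict

def group_sequential(blocks_on_disk, block_size, max_group_bytes):
    """blocks_on_disk: {disk: iterable of block IDs}. Returns {block: group_id}."""
    max_blocks = max(1, max_group_bytes // block_size)  # group size cap, in blocks
    group_of, gid = {}, -1
    for disk in sorted(blocks_on_disk):
        prev, run_len = None, 0
        for b in sorted(blocks_on_disk[disk]):
            # Extend the current group only with the next consecutive block on the same disk.
            if prev is not None and b == prev + 1 and run_len < max_blocks:
                run_len += 1
            else:
                gid, run_len = gid + 1, 1
            group_of[b] = gid
            prev = b
    return group_of

def group_edges(edges, group_of):
    """Collapse block-level correlation weights {(i, j): w} onto group vertices."""
    merged = defaultdict(int)
    for (i, j), w in edges.items():
        gi, gj = group_of[i], group_of[j]
        if gi != gj:  # members of one group always share a disk, so intra-group weight is dropped
            merged[(min(gi, gj), max(gi, gj))] += w
    return dict(merged)

groups = group_sequential({0: [10, 11, 12, 13, 20], 1: [14, 15]},
                          block_size=4096, max_group_bytes=3 * 4096)
print(groups)  # {10: 0, 11: 0, 12: 0, 13: 1, 20: 2, 14: 3, 15: 3}
print(group_edges({(10, 14): 2, (11, 15): 3, (10, 11): 4, (13, 20): 1}, groups))  # {(0, 3): 5, (1, 2): 1}
```

Block 13 starts a new group because of the size cap, and blocks 13 and 14 stay apart because they live on different disks.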
Evaluation
Simulations using DiskSim [Bucy ’08] with the SSD patch [Agrawal ’08]:
HDD-based Storage Array (HSA) topology with 100 HDDs.
SSD-based All-Flash Array (AFA) topology with 14 SSDs.
Zipf-like distributions [Breslau ’99] to control the skew in access patterns.
Five publicly available storage workloads from Microsoft [IOTTA], covering a range of behavior in the existence/reoccurrence of block correlations, request size, request arrival time/rate, and R/W ratio.
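Skew control with a Zipf-like distribution can be sketched as below: the probability of the rank-i item is proportional to 1 / i^α, so α = 0 gives uniform accesses and larger α concentrates requests on a few hot blocks. The slides do not say how the skew parameter was set, so the function names, α values, and seed here are hypothetical.

```python
import random

def zipf_like_weights(n, alpha):
    # Zipf-like popularity: P(rank i) is proportional to 1 / i**alpha, for i = 1..n.
    raw = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def sample_block_requests(num_blocks, alpha, count, seed=0):
    # Draw `count` block IDs; block 0 is the most popular, and alpha = 0 is uniform.
    rng = random.Random(seed)
    return rng.choices(range(num_blocks), weights=zipf_like_weights(num_blocks, alpha), k=count)

print(zipf_like_weights(4, 0))  # [0.25, 0.25, 0.25, 0.25]
requests = sample_block_requests(num_blocks=100, alpha=0.8, count=10)
print(requests)
```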
Evaluation: I/O Performance
[Figure: src2 trace on AFA; panels: Read Performance, Write Performance]
Evaluation: I/O Performance
[Figure: wdev trace on HSA; panels: Read Performance, Write Performance]
Evaluation: Migration Cost
[Figure: Migration Amount vs. Overall (R+W) Performance; panels: AFA, HSA]
