Dynamic Data Layout Optimization for High Performance Parallel I/O

Everett Rush, Bryan Harris, Nihat Altiparmak (University of Louisville, USA)
Ali Saman Tosun (UT San Antonio, USA)
HiPC 2016, 12/20/2016

Outline
Background
  High Performance Parallel I/O
  Block Correlations
Dynamic Data Layout Optimization
  Monitoring & Analysis
  Placement Planning
  Data Reorganization
Evaluation
References

High Performance Parallel I/O
[Figure: numbered blocks placed on Disk 0 through Disk 4, with labels "Five Parallel Disk Accesses" and "One Parallel Disk Access"]
Static Data Placement: One Layout Fits All!
  Disk Modulo [Du ’82]
  RAID [Patterson ’88]
  Field-wise Exclusive OR [Kim ’88]
  Hilbert [Faloutsos ’93]
  Generalized Fibonacci [Prabhakar ’98]
  AOPT: Almost Optimal [Atallah ’00]
  Periodic [Altiparmak ’12]
Dynamic Data Layout Optimization is necessary!

Block Correlations
Blocks are correlated if they are requested together [Li ’04].
Correlations can exist within a request (intra) or across requests (inter).
They are commonly encountered in storage workloads.
Correlated blocks should be placed on separate disks!

Dynamic Data Layout Optimization Framework
A generic framework for self-optimizing parallel storage systems.
It can be applied to storage arrays, parallel/distributed file systems, key-value stores, and the internal parallelism of NVM devices for high performance parallel I/O.
It automatically adapts to skewed, changing, and co-existing access patterns.

Monitoring & Analysis Modules
Disk I/O Monitoring
  Monitor block-level I/O requests, for example with an I/O tracing tool such as blktrace [Axboe ’07].
  Create sessions of block IDs that are requested together.
  [Figure: Monitoring Output]
Data Analysis
  Analyze the sessions to find block correlations.
  Use Frequent Itemset Mining (FIM) algorithms [Borgelt ’12] to find correlated pairs and their frequencies.
  Use a minimum support threshold to filter out infrequent pairs.
  [Figure: Analysis Output]
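
The analysis step can be illustrated with a minimal sketch. It counts co-occurring block pairs directly, which is the size-2 special case of frequent itemset mining rather than any specific FIM algorithm from [Borgelt ’12]. The function name `correlated_pairs`, the session format (lists of block IDs), and the sample data are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def correlated_pairs(sessions, min_support):
    """Return {(block_a, block_b): frequency} for pairs seen in at least
    `min_support` sessions. Hypothetical helper, not from the slides."""
    counts = Counter()
    for session in sessions:
        # Deduplicate and sort so (a, b) and (b, a) share one key.
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Made-up sessions of block IDs that were requested together.
sessions = [[10, 11, 42], [10, 42], [7, 11], [10, 42, 7]]
pairs = correlated_pairs(sessions, min_support=2)
print(pairs)
```

Only blocks 10 and 42 appear together in at least two sessions here, so they form the single correlated pair that is passed on to placement planning.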

Placement Planning Module
Aim: place correlated blocks on separate disks (parallel storage units).
Basic Layout Optimization Problem (BLOP):
Definition 1: Given a set C of correlated block pairs (i, j) and N disks, plan a placement strategy so that for every block pair (i, j) ∈ C, blocks i and j are stored on different disks.
Theorem 1: BLOP is NP-complete and equivalent to the proper (vertex) k-coloring problem [Jensen ’11] for k = N.
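
To make the coloring view concrete, the sketch below checks a placement against Definition 1: blocks are vertices, correlated pairs are edges, and disks are colors. The helper name `conflicting_pairs` and the dictionary-based placement format are assumptions for illustration only.

```python
def conflicting_pairs(placement, correlated):
    """Return the correlated pairs whose two blocks share a disk.
    placement: {block: disk}; correlated: iterable of (i, j) pairs."""
    return [(i, j) for (i, j) in correlated if placement[i] == placement[j]]


# Three mutually correlated blocks form a triangle in the correlation graph.
correlated = [(0, 1), (1, 2), (0, 2)]
two_disks = {0: 0, 1: 1, 2: 0}    # N = 2: no proper 2-coloring of a triangle
three_disks = {0: 0, 1: 1, 2: 2}  # N = 3: a proper 3-coloring

print(conflicting_pairs(two_disks, correlated))
print(conflicting_pairs(three_disks, correlated))
```

A placement solves BLOP exactly when this list is empty, which is the same condition as the coloring being proper.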

Placement Planning Module
BLOP outlines the main purpose but must be modified before it can be applied in real settings:
Optimal coloring is generally not feasible (|V| ≫ N).
  Use soft coloring techniques that minimize the conflicts [Fitzpatrick ’01].
Each disk has a maximum capacity that must not be exceeded.
  Use traditional bin-packing techniques.
Min-Conflict Bin Packing (MCBP) [Khanafer ’12]:
Definition 2: Given a set I of items i of size w_i, N bins of size W, and a conflict graph G = (I, E) where (i, j) ∈ E if items i and j cannot be packed in the same bin, compute the minimum number of conflicts that must occur if the set I is packed in N bins of size W.
Theorem 2: MCBP is NP-complete.

Placement Planning Heuristic
Start with an initial placement.
Calculate the Total Correlation Frequency (TCF) value of each vertex.
Perform local optimizations in TCF order.
  Consider the correlation strengths stored in edge weights when calculating conflicts.
  If there is more than one candidate color, consider disk capacities.
Repeat the local optimizations until delta conflicts < ε.
Worst-case time complexity: O(|V| log |V| + |E|)
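
The steps above can be sketched as a weighted min-conflict local search. This is one possible reading of the slide, not the authors' implementation, and it makes no claim to match the stated complexity bound. Assumptions: TCF is taken as the sum of a vertex's incident edge weights, vertices are visited in decreasing TCF order, capacity is counted in blocks, and the name `plan_placement` and all parameters are hypothetical.

```python
from collections import defaultdict


def plan_placement(edges, placement, n_disks, capacity, eps=1e-9, max_rounds=100):
    """edges: {(u, v): weight}; placement: {block: disk}, the initial placement.
    Returns (new placement, remaining weighted conflict)."""
    placement = dict(placement)
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    # Assumed TCF definition: total weight of the edges touching a vertex.
    tcf = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    order = sorted(tcf, key=tcf.get, reverse=True)
    load = [0] * n_disks
    for disk in placement.values():
        load[disk] += 1

    def total_conflict():
        return sum(w for (u, v), w in edges.items()
                   if placement[u] == placement[v])

    current = total_conflict()
    for _ in range(max_rounds):
        for v in order:
            # Weighted conflict that v would have on each disk.
            cost = [0.0] * n_disks
            for u, w in adj[v].items():
                cost[placement[u]] += w
            old = placement[v]
            candidates = [d for d in range(n_disks)
                          if d == old or load[d] < capacity]
            # Ties on conflict are broken in favour of the emptier disk.
            best = min(candidates,
                       key=lambda d: (cost[d], load[d] - (d == old)))
            if cost[best] < cost[old]:
                load[old] -= 1
                load[best] += 1
                placement[v] = best
        updated = total_conflict()
        if current - updated < eps:  # delta conflicts < eps
            break
        current = updated
    return placement, total_conflict()


edges = {(0, 1): 5, (2, 3): 3, (0, 2): 1}
initial = {0: 0, 1: 0, 2: 1, 3: 1}
final, conflict = plan_placement(edges, initial, n_disks=2, capacity=3)
print(final, conflict)
```

Because a vertex moves only when its own weighted conflict strictly decreases, the total conflict never increases from one round to the next, so the loop terminates.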

Data Reorganization Module
Aim: reconsider the color-to-disk mapping so that:
  Each color is mapped to a separate disk.
  The number of block movements is minimized.
Construct the problem as a flow network and solve it using min-cost flow techniques [Ford ’62]:
  Capacities are set to 1.
  Costs are set to 0 if the edge is not between a color and a disk; otherwise, to the amount of block movement caused by that mapping.
  Push C units of flow from s to t.
Worst-case time complexity: O(|E|^(3/2) log(|V| · Max(Cost))) [Goldberg ’15]
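
The flow construction can be sketched as follows. The solver here is a plain successive-shortest-path method with Bellman-Ford, swapped in for brevity; it is not the unit-capacity algorithm of [Goldberg ’15] and does not achieve the bound above. Assumptions: the number of colors equals the number of disks, and the input `blocks_on[c][d]` (the number of blocks of color c currently on disk d) and the name `min_movement_mapping` are hypothetical.

```python
def min_movement_mapping(blocks_on):
    """Return ({color: disk}, total blocks moved) for a square matrix
    blocks_on[c][d] = blocks of color c currently stored on disk d."""
    n = len(blocks_on)
    size = [sum(row) for row in blocks_on]
    s, t = 0, 2 * n + 1  # colors are nodes 1..n, disks are nodes n+1..2n
    graph = [[] for _ in range(2 * n + 2)]  # edge = [to, cap, cost, rev index]

    def add_edge(u, v, cost):
        graph[u].append([v, 1, cost, len(graph[v])])       # unit capacity
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])  # residual edge

    for c in range(n):
        add_edge(s, 1 + c, 0)      # source -> color, cost 0
        add_edge(n + 1 + c, t, 0)  # disk -> sink, cost 0
        for d in range(n):
            # Mapping color c to disk d moves every block of c not on d.
            add_edge(1 + c, n + 1 + d, size[c] - blocks_on[c][d])

    total = 0
    for _unit in range(n):  # push one unit of flow per color
        dist = [float("inf")] * len(graph)
        prev = [None] * len(graph)
        dist[s] = 0
        for _round in range(len(graph) - 1):  # Bellman-Ford relaxation
            changed = False
            for u in range(len(graph)):
                if dist[u] == float("inf"):
                    continue
                for idx, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, idx)
                        changed = True
            if not changed:
                break
        v = t
        while v != s:  # augment along the cheapest path
            u, idx = prev[v]
            graph[u][idx][1] -= 1
            graph[v][graph[u][idx][3]][1] += 1
            v = u
        total += dist[t]

    mapping = {}
    for c in range(n):
        for v, cap, _cost, _rev in graph[1 + c]:
            if n + 1 <= v <= 2 * n and cap == 0:  # saturated color -> disk edge
                mapping[c] = v - (n + 1)
    return mapping, total


# Made-up current layout: rows are colors, columns are disks.
blocks_on = [[1, 5, 0], [4, 0, 2], [0, 1, 6]]
mapping, moved = min_movement_mapping(blocks_on)
print(mapping, moved)
```

In this example, mapping each color to the disk that already holds most of its blocks is feasible, so only the few stray blocks need to migrate.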

Additional Optimizations
Preserving sequentiality is important for HDDs.
Solution: group the sequential blocks from the same HDD and reorganize the groups together without breaking their sequentiality.
  Create a single vertex in the correlation graph for each group.
  Update the edge weights of a group vertex considering group memberships.
  Set an upper limit for the maximum group size in bytes based on the transfer rate of the HDD.
  Larger groups will work against parallel I/O.
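
A minimal sketch of the grouping idea follows. Assumptions: block IDs on one HDD are integers, consecutive IDs are sequential on disk, and edge weights between groups are merged by summation (the slide only says group memberships are considered). The names `group_sequential` and `collapse_edges` are hypothetical.

```python
def group_sequential(block_ids, block_size, max_group_bytes):
    """Split the blocks of one HDD into runs of consecutive IDs,
    each capped at max_group_bytes."""
    max_blocks = max(1, max_group_bytes // block_size)
    groups = []
    for b in sorted(block_ids):
        last = groups[-1] if groups else None
        if last and b == last[-1] + 1 and len(last) < max_blocks:
            last.append(b)      # extend the current sequential run
        else:
            groups.append([b])  # start a new group
    return groups


def collapse_edges(edges, groups):
    """Merge block-level edge weights into group-level edge weights."""
    group_of = {b: g for g, members in enumerate(groups) for b in members}
    merged = {}
    for (u, v), w in edges.items():
        gu, gv = group_of[u], group_of[v]
        if gu != gv:  # correlations inside one group disappear
            key = (min(gu, gv), max(gu, gv))
            merged[key] = merged.get(key, 0) + w
    return merged


groups = group_sequential([7, 3, 4, 5, 6, 20, 21],
                          block_size=4096, max_group_bytes=3 * 4096)
group_edges = collapse_edges({(3, 20): 2, (4, 21): 1, (3, 4): 5, (6, 20): 4},
                             groups)
print(groups, group_edges)
```

The size cap splits the run 3 to 7 into two groups, which keeps each group small enough that it can still be spread across disks.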

Evaluation
Simulations using DiskSim [Bucy ’08] with the SSD patch [Agrawal ’08].
  HDD-based Storage Array (HSA) topology with 100 HDDs.
  SSD-based All-flash Array (AFA) topology with 14 SSDs.
Zipf-like distributions [Breslau ’99] to control the skew in access patterns.
Five publicly available storage workloads from Microsoft [IOTTA], covering the existence/reoccurrence of block correlations, request size, request arrival time/rate, and R/W ratio behavior.

Evaluation: I/O Performance src2 Trace - AFA
[Figure panels: Read Performance; Write Performance]

Evaluation: I/O Performance wdev Trace - HSA
[Figure panels: Read Performance; Write Performance]

Evaluation: Migration Cost
[Figure: Migration Amount vs. Overall (R+W) Performance; panels: AFA, HSA]

References
[Agrawal ’08] N. Agrawal et al., “Design tradeoffs for SSD performance,” in ATC ’08: USENIX Annual Technical Conference, Berkeley, CA, USA, 2008.
[Altiparmak ’12] N. Altiparmak and A. S. Tosun, “Equivalent disk allocations,” IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 3, 2012.
[Atallah ’00] M. J. Atallah and S. Prabhakar, “(Almost) optimal parallel block access for range queries,” in PODS ’00.
[Axboe ’07] J. Axboe, blktrace User Guide, Feb 2007, available:
[Borgelt ’12] C. Borgelt, “Frequent item set mining,” WIREs Data Mining Knowl. Discov., vol. 2, p , Nov
[Breslau ’99] L. Breslau et al., “Web caching and Zipf-like distributions: evidence and implications,” in INFOCOM ’99, vol. 1, Mar 1999, pp. 126–134.
[Bucy ’08] J. S. Bucy et al., “The DiskSim simulation environment version 4.0,” Carnegie Mellon University Parallel Data Lab, Tech. Rep., May 2008.
[Du ’82] H. C. Du and J. S. Sobolewski, “Disk allocation for cartesian product files on multiple-disk systems,” ACM Trans. on Database Systems, 7(1):82–101, March 1982.
[Faloutsos ’93] C. Faloutsos and P. Bhagwat, “Declustering using fractals,” in PDIS ’93.
[Fitzpatrick ’01] S. Fitzpatrick et al., “An experimental assessment of a stochastic, anytime, decentralized, soft colourer for sparse graphs,” in Stochastic Algorithms: Foundations and Applications, 2001, vol. 2264, pp. 49–64.
[Ford ’62] L. R. Ford et al., Flows in Networks. Princeton University Press, 1962.
[Goldberg ’15] A. V. Goldberg et al., “Minimum Cost Flows in Graphs with Unit Capacities,” in STACS ’15, 2015, pp. 406–419.
[IOTTA] SNIA IOTTA Trace Repository, Storage Networking Industry Association, available:
[Jensen ’11] T. Jensen et al., Graph Coloring Problems. John Wiley & Sons, 2011.
[Khanafer ’12] A. Khanafer et al., “The min-conflict packing problem,” Computers & Operations Research, vol. 39, no. 9, pp – 2132, 2012.
[Kim ’88] M. H. Kim and S. Pramanik, “Optimal file distribution for partial match retrieval,” in SIGMOD ’88.
[Li ’04] Z. Li et al., “C-Miner: Mining block correlations in storage systems,” in FAST ’04, Berkeley, CA, USA, 2004, pp. 173–186.
[Patterson ’88] D. A. Patterson, G. Gibson, and R. H. Katz, “A case for redundant arrays of inexpensive disks (RAID),” in SIGMOD ’88, 1988, pp. 109–116.
[Prabhakar ’98] S. Prabhakar, K. Abdel-Ghaffar, D. Agrawal, and A. El Abbadi, “Cyclic allocation of two-dimensional data,” in ICDE ’98.

Thank You! Any Questions?
