Published by Sean Mitchell. Modified over 11 years ago.
1 Towards Automating the Configuration of a Distributed Storage System
Lauro B. Costa, Matei Ripeanu
{lauroc, matei}@ece.ubc.ca
NetSysLab, University of British Columbia
2 Distributed File Systems
– Different workloads: read-only or write-only, high data similarity
– Different optimizations: temporary local storage, deduplication, replication
– No one size fits all: each choice may be optimal for a specific workload, but not for all
3 Configurable Optimizations
– MosaStore [1] and Ursa Minor [2] propose file systems with configurable optimizations
– The user must choose the optimizations
[1] The Case for a Versatile Storage System. S. Al-Kiswany, A. Gharaibeh, M. Ripeanu. SOSP Workshop on Hot Topics in Storage and File Systems (HotStorage '09).
[2] Ursa Minor: Versatile Cluster-based Storage. M. Abd-El-Malek, W. V. Courtright et al. In Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST '05).
4 Tuning a File System
– Identify the system parameters
– Define a target metric
– do
    Define a target value
    Configure the parameters
    Measure and analyze the performance
  while not satisfied
5 Tuning is Hard
– Defining metrics and target values can be complex
– Lack of knowledge of distributed systems, the application, or the application's workloads
– Workload or infrastructure can change
– Tuning is time-consuming
6 Deduplication: Detecting Similarity
Only the first block is different
File A   blocks: X Y Z   hashes: AAAA BBBB CCCC
File B   blocks: W Y Z   hashes: DDDD BBBB CCCC
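The block-hash comparison on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: fixed-size blocks, SHA-1 as the hash, and a tiny 4-byte block size chosen for readability. The function names are hypothetical and are not taken from MosaStore.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; deliberately tiny so the example is readable (assumption)


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return one SHA-1 digest per block."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def similar_blocks(old: bytes, new: bytes) -> int:
    """Count blocks of `new` whose hash already appears among the blocks of `old`."""
    known = set(block_hashes(old))
    return sum(1 for h in block_hashes(new) if h in known)


# Files A and B from the slide: only the first block differs.
file_a = b"XXXXYYYYZZZZ"
file_b = b"WWWWYYYYZZZZ"
print(similar_blocks(file_a, file_b))  # 2
```

Only the blocks whose hashes are not already known would need to be transferred and stored, which is where the space and bandwidth savings come from.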
7 Deduplication for Checkpointing?
– Checkpointing applications write multiple snapshots
– Successive snapshots may have high data similarity
– Similarity depends on a number of factors, e.g.:
  – Process- or application-level checkpointing
  – Frequency of checkpointing
8 How can we configure the file system parameters (optimizations) with minimal human intervention?
9 Agenda
– Motivation: Configurable file systems
– Architecture to automatically configure a FS
– First Case: Checkpointing applications
  – Implementation
  – Evaluation
– Summary and Future Work
10 Requirements
– Be easy to configure
  – Minimal human intervention
– Be able to choose a configuration with satisfactory performance
  – Performance close to the administrator's intention
– Have a reasonable automated configuration cost
  – Overhead small enough to make sense to use
11 Loop for Automated Configuration
12 Controller
– Utility function captures the utility of the metrics
  – It is simple for one target metric, e.g., time
  – It reduces several target metrics to just one dimension
– Predictor estimates how a change affects the target metrics
13 Controller decides the configuration by comparing the utility of current and predicted metrics
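A minimal sketch of that decision step, assuming metrics are `(time, storage)` tuples and that the utility is expressed as a cost to minimize. The function `choose_configuration`, the configuration names, and the numbers are all hypothetical and only illustrate the comparison.

```python
def choose_configuration(current_config, current_metrics, predictions, cost):
    """Return the configuration with the lowest cost.

    `predictions` maps each candidate configuration to its predicted metrics.
    The current configuration is kept unless a candidate is strictly better.
    """
    best_config, best_cost = current_config, cost(current_metrics)
    for config, metrics in predictions.items():
        candidate_cost = cost(metrics)
        if candidate_cost < best_cost:
            best_config, best_cost = config, candidate_cost
    return best_config


def time_only(metrics):
    """Cost that looks only at time, the first element of (time, storage)."""
    return metrics[0]


# Deduplication is predicted to be faster, so the controller switches it on.
print(choose_configuration("dedup_off", (1.0, 100), {"dedup_on": (0.7, 50)}, time_only))
# dedup_on
```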
14 Agenda
– Motivation: Configurable file systems
– Architecture to automatically configure a FS
– First Case: Checkpointing applications
  – Implementation
  – Evaluation
– Conclusions
– Future Work
15 Data Deduplication
– Can save storage space and network bandwidth
– Has a high computational cost to hash data
– Mechanism to choose between two options: data deduplication on or off
16 Control Loop for Deduplication
– Metrics: time spent and storage space
– Keep history for writes:
  – total time
  – number of blocks received
  – number of blocks similar
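One way to picture the write history is a small record of running totals from which a similarity ratio can be derived. The class and field names below are hypothetical, and the ratio `blocks_similar / blocks_received` is an assumption about how the history might be used, not something stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class WriteHistory:
    """Running totals for writes (hypothetical names, mirroring the slide's three counters)."""
    total_time_s: float = 0.0
    blocks_received: int = 0
    blocks_similar: int = 0

    def record(self, time_s: float, received: int, similar: int) -> None:
        """Add the measurements of one write to the running totals."""
        self.total_time_s += time_s
        self.blocks_received += received
        self.blocks_similar += similar

    def similarity(self) -> float:
        """Fraction of received blocks that were already known (0.0 if no writes yet)."""
        if self.blocks_received == 0:
            return 0.0
        return self.blocks_similar / self.blocks_received


history = WriteHistory()
history.record(time_s=2.0, received=64, similar=16)
history.record(time_s=1.5, received=64, similar=48)
print(history.similarity())  # 0.5
```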
17 Utility Function
Administrator gives weights to capture the relative importance of each metric, e.g.:
– 1 x time + 0 x storage
– 0.5 x time + 0.5 x storage
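The weighted combination can be sketched as follows. Treating the result as a cost to minimize, and dividing each metric by a reference value so that seconds and megabytes become comparable, are assumptions made for this illustration. The function and parameter names are hypothetical.

```python
def weighted_cost(time_s, storage_mb, w_time=1.0, w_storage=0.0,
                  time_ref_s=1.0, storage_ref_mb=1.0):
    """Combine time and storage into a single number (lower is better).

    Each metric is divided by a reference value so the two terms are unitless
    before the administrator's weights are applied.
    """
    return w_time * (time_s / time_ref_s) + w_storage * (storage_mb / storage_ref_mb)


# Default case from the slide: 1 x time + 0 x storage.
print(weighted_cost(10.0, 256.0))  # 10.0

# Equal weights, normalized against a 10 s / 256 MB baseline.
print(weighted_cost(10.0, 256.0, w_time=0.5, w_storage=0.5,
                    time_ref_s=10.0, storage_ref_mb=256.0))  # 1.0
```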
18 Predictor
         No deduplication     Deduplication
Space    number of blocks     consider similarity
Time     I/O operations       consider similarity + time for hashing data
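A rough sketch of such a predictor, assuming a linear cost model: every stored block costs a fixed I/O time, every hashed block costs a fixed hashing time, and similar blocks are neither transferred nor stored. The per-block costs, the function name, and the example numbers are made up for illustration and are not measurements from the talk.

```python
def predict(blocks, similarity, io_s_per_block, hash_s_per_block, dedup):
    """Return (predicted_time_s, predicted_blocks_stored) for one write.

    Without deduplication every block is written. With deduplication only
    the non-similar blocks are written, but every block must be hashed.
    """
    if dedup:
        stored = round(blocks * (1.0 - similarity))
        time_s = stored * io_s_per_block + blocks * hash_s_per_block
    else:
        stored = blocks
        time_s = blocks * io_s_per_block
    return time_s, stored


# 100 blocks at 50% similarity with hypothetical per-block costs.
off = predict(100, 0.5, io_s_per_block=0.01, hash_s_per_block=0.002, dedup=False)
on = predict(100, 0.5, io_s_per_block=0.01, hash_s_per_block=0.002, dedup=True)
print(off, on)
```

With these made-up costs, deduplication stores half the blocks and its hashing time is smaller than the I/O time it saves. At zero similarity the same model predicts that deduplication only adds hashing overhead.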
19 Evaluation
– Three aspects:
  – Effort to configure
  – Performance
  – Overhead
– Experimental setup: 10 storage nodes, 1 Gbps NICs
20 Evaluation: Workload
– Synthetic
– Similarity varied
– For each similarity level, write 100 snapshots
– Similar results for several snapshot sizes: 32, 64, 128, 256, 512 MB
– Plots shown for 256 MB
21 Effort to Configure
– Small effort: the administrator specifies the weights for each metric
– No effort in the default case: the system optimizes for time
22 Optimizing for Time
23 Optimizing for Time
Hashing cost is paid off by the savings in I/O operations
24 Optimizing for Time
25 Overhead
– Memory: less than 1 KB
– Computational:
  – Low similarity: within 5% in evaluated cases
  – High similarity: negligible
26 Summary
– Initial study on automatically configuring a file system
– Data deduplication configured properly with low overheads
27 Future Work
– More parameters for similarity detection: variable block boundaries, block sizes, offload to GPU
– Constraints for utility functions, e.g., best time for a maximum storage space
– More optimizations and metrics: replication, buffer size, caching policies; reliability, energy
29 Mixing time and storage space
31 MosaStore Architecture
– Metadata Manager
– Benefactors (storage nodes)
– Client (FS interface)...
32 Prototype in MosaStore
– Deduplication can be turned on and off on the fly
– Write flow collects the measurements
– Monitor and Controller are co-located with the client
33 Utility
– Utility is a measure of relative satisfaction: how happy the administrator is
– Money is a good proxy, but complicated
– Focus on simple cases:
  – 100% time + 0% space
  – 50% time + 50% space
  – Constraint on space, optimize for time
– The function cannot mix different units: normalize
34 MosaStore Architecture
– Storage space aggregated from nodes in a network
– Naming scheme: BYHASH, BYSEQ
– File creation/write
  – Collects metrics
  – Has the option to activate similarity detection