Published by Allen Greer. Modified over 9 years ago.
1
Tivoli Software © 2010 IBM Corporation
Using Machine Learning Techniques to Enhance the Performance of an Automatic Backup and Recovery System
Amir Ronen, Dan Pelleg (Machine Learning Group, HRL)
Eran Raichstein (IBM Software Group)
2
Motivation
IBM’s FastBack
– Automatic backup and recovery system
– Incremental backup of disk volumes to a repository
– Instant restore (IR): allows applications to start working immediately after recovery
– Xpress mount: allows access to backup data without recovering it (e.g. for taking tape dumps)
Goal
– Accelerate IR and mount via machine learning and algorithmic techniques
– Minimum intervention in FastBack’s internals
– Benefits: minimize bugs, easy upgrading, generality, …
3
Outline
The FastBack system
Algorithm for automatic determination of read-ahead
– Basic observations
– The algorithm
– Experiments in the FastBack system
Prefetching
– Theoretical model and observation
– Basic prefetching algorithms
– Frequent pattern based algorithms
– Controlling and combining prefetch algorithms
Summary
4
FastBack’s Instant Restore and Mount
Instant Restore allows users to start using applications on the same disk to which the volume is being restored, while the restore operation is still in progress.
1. Activate Instant Restore
2. Read IOs from un-recovered areas trigger a block fetch from the repository
3. All other reads are performed as usual
From an architectural perspective, mount is somewhat similar.
[Diagram labels: production server, typical production disk, new production server, new production disk, Xpress Restore server, repository]
5
CNF: An Algorithm for Readahead Amount Determination
6
The problem
A block is needed from the repository. Suppose that we are allowed to bring additional subsequent blocks. How many should we bring?
– Too many may slow down the system (in particular if they will not be used)
– Too few will cause high total latency
[Diagram labels: new production server, Xpress Restore server, repository]
7
Simple cost model
T ≈ T1 + n·T2 + noise
– T1: “fixed” latency
– T2: time to bring one block
– n: number of blocks
– noise: assumed zero
Key idea
– Suppose that we choose n such that T1 = n·T2
– The cost never more than doubles
– In many settings n can be large
– The algorithm is 2-competitive
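As a rough illustration of the key idea, the sketch below picks the largest n with n·T2 ≤ T1 and checks that the resulting cost stays within twice the fixed latency, which every request has to pay anyway. The function names and the choice to round down are assumptions made for this example and do not come from the slides; noise is taken as zero, as in the model.

```python
import math


def readahead_blocks(t1: float, t2: float) -> int:
    """Largest block count n with n * t2 <= t1, but never fewer than 1.

    t1 is the fixed latency and t2 the per-block transfer time
    (hypothetical helper; rounding down is an assumption of this sketch).
    """
    if t1 < 0 or t2 <= 0:
        raise ValueError("t1 must be non-negative and t2 positive")
    return max(1, math.floor(t1 / t2))


def request_cost(n: int, t1: float, t2: float) -> float:
    """Cost of one request under the simple model T = T1 + n * T2."""
    return t1 + n * t2


if __name__ == "__main__":
    t1, t2 = 6.5, 3.0  # illustrative values only
    n = readahead_blocks(t1, t2)
    # Transfer time n * t2 is at most t1, so the cost is at most 2 * t1.
    print(n, request_cost(n, t1, t2), 2 * t1)
```

Because any request costs at least T1, a request sized this way costs at most twice what the cheapest possible request would, which is the intuition behind the factor of 2.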
8
Problem 1
– The latency T1 and the block cost T2 are not known
– They may vary over time
Solution
– Hold a window of the last k requests (e.g. 200)
– Use linear regression to estimate T1 and T2
– The update can be done in O(1)
[Plot: example regression fit, latency ~ 6.5, block cost ~ 3]
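One way to get an O(1) update is to keep running sums over the window and adjust them as requests enter and leave. The sketch below is a minimal version of that idea using ordinary least squares; the class name, method names and the degeneracy threshold are assumptions for illustration, not the FastBack implementation.

```python
from collections import deque


class WindowedRegression:
    """Least-squares fit of t = T1 + n * T2 over the last k (n, t) samples."""

    def __init__(self, k: int = 200):
        self.k = k
        self.window = deque()
        # Running sums of n, t, n*n and n*t over the current window.
        self.sn = self.st = self.snn = self.snt = 0.0

    def add(self, n: float, t: float) -> None:
        """Add one observed request; evict the oldest if the window is full."""
        self.window.append((n, t))
        self.sn += n
        self.st += t
        self.snn += n * n
        self.snt += n * t
        if len(self.window) > self.k:
            old_n, old_t = self.window.popleft()
            self.sn -= old_n
            self.st -= old_t
            self.snn -= old_n * old_n
            self.snt -= old_n * old_t

    def estimate(self):
        """Return (T1, T2), or None when the n-values do not vary enough."""
        m = len(self.window)
        denom = m * self.snn - self.sn ** 2
        if m < 2 or denom < 1e-9:
            return None
        t2 = (m * self.snt - self.sn * self.st) / denom
        t1 = (self.st - t2 * self.sn) / m
        return t1, t2
```

Running sums can drift through floating-point rounding, so recomputing them from the stored window every so often is a common precaution.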
9
Problem 2
What if the n-values are so similar that we cannot estimate T1 and T2?
Sampling ideas
– We only need a few samples
– If mean(n) is large, we sample small values
– If mean(n) is small, we sample 2*mean(n)
– Low amortized cost
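The sampling rule can be sketched as a small function. The slide does not say what counts as "large" or which small value is used, so the threshold of 8 blocks and the small value of 1 below are placeholders chosen only for this example.

```python
def sample_readahead(recent_n, large_threshold=8, small_value=1):
    """Pick an exploratory readahead size that differs from the recent ones.

    recent_n: readahead sizes of the recent requests (non-empty).
    large_threshold and small_value are assumed parameters, not from the slides.
    """
    mean_n = sum(recent_n) / len(recent_n)
    if mean_n >= large_threshold:
        # Recent requests were large: probe with a small request.
        return small_value
    # Recent requests were small: probe with twice the mean.
    return max(1, round(2 * mean_n))
```

Either branch produces an n-value away from the recent mean, which is what the regression needs in order to separate the fixed latency from the per-block cost.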
10
The Algorithm
Hold a window of the last k requests
At each step update the linear regression (refresh from time to time)
If regression is possible:
– Estimate T1, T2
– Compute the desired n value
– If the system asked for less, recommend readahead
Otherwise:
– Sample as described
Additional heuristics: unreasonable values, smoothing, mis-estimation, …
11
Impact on FastBack
– Added latency per request
– Outperformed the predetermined values
– Speedup of up to 4x when mounting continuous and fragmented data
12
Comments & open issues
– The algorithm may be applicable elsewhere
– Extensions to more complicated cost models
– Analyzing executions of parallel copies of the algorithm
13
Block Prediction and Prefetching for Enhancing Instant Restore
14
Motivation
– IR needs to fetch blocks from the repository according to its workload
– Ideally, blocks will be predicted and brought before they are needed
Comments
– The network is not preemptive, so prefetching can also be harmful
– Typical workloads are parallel processes, each with some locality of reference
[Diagram labels: new production server, Xpress Restore server, repository]
15
A model for the prefetch problem
The workload is an unknown sequence of events L1, …, Ln. Each Lj is either:
– An access to a block Bj
– A process event
The system is composed of a CPU and a network that can be run in parallel. At each step j the system can do one of the following:
1. Process (Lj is a process event, cost = 1 unit)
2. Access its local memory (if Lj is an access event and Bj is already in the local memory, cost = 1 unit)
3. Fetch a block from the repository (this occupies the network for C time units and can be done in parallel to 1 or 2)
16
A model for the prefetch problem (cont.)
Slowdown
Let L1, …, Ln be a workload. The slowdown of the system on L is the ratio between the total system time and the time to perform the workload locally, i.e. Tsys / n.
[Figure: CPU and network timelines for a workload of process events and accesses to B17 and B18, with C = 2. With the delta rule (Fetch 17 and Fetch 18 overlap the CPU work) the slowdown is ~1; without prefetching, the slowdown is around 2.]
17
Simple prefetch algorithms
Delta rule
– Whenever Bj is accessed, put Bj+1 in a queue
– Whenever the network is idle, prefetch in LIFO order
– Very effective rule, simple to implement
No prefetch
– Can be shown to be 2-competitive!
Order by frequency
– In train time, order blocks by their frequency
OPT
– Hypothetical optimal offline algorithm
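To make the model and the delta rule concrete, here is a small tick-based simulation that compares no prefetching with the delta rule. It is a sketch under several assumptions that the slides leave open: blocks are integers, Bj+1 is read as the block numbered one above Bj, C is a whole number of ticks, a prefetch may start in the same tick as the access that triggered it, and a demand fetch waits for any fetch already on the network. The function name and the `None` encoding of process events are hypothetical.

```python
def slowdown(workload, C, use_delta_rule):
    """Simulate the CPU/network model and return Tsys / n.

    workload: list where None is a process event and an int is a block access.
    C: network time units per fetch (non-preemptive, one fetch at a time).
    """
    local = set()      # blocks already in local memory
    in_flight = None   # (block, finish_time) of the fetch on the network
    stack = []         # prefetch candidates, served in LIFO order
    t = 0              # current tick
    i = 0              # index of the next workload event
    while i < len(workload):
        # A fetch that has finished delivers its block.
        if in_flight is not None and in_flight[1] <= t:
            local.add(in_flight[0])
            in_flight = None
        ev = workload[i]
        runnable = ev is None or ev in local
        if runnable and ev is not None and use_delta_rule:
            stack.append(ev + 1)  # delta rule: queue the following block
        if in_flight is None:
            if not runnable:
                in_flight = (ev, t + C)       # demand fetch
            elif use_delta_rule:
                while stack:                  # idle network: prefetch
                    b = stack.pop()
                    if b not in local:
                        in_flight = (b, t + C)
                        break
        if runnable:
            i += 1  # the CPU executes this event during the tick
        t += 1
    return t / len(workload)


if __name__ == "__main__":
    # Alternating accesses to consecutive blocks and process events, C = 2.
    trace = []
    for b in range(17, 67):
        trace += [b, None]
    print(slowdown(trace, 2, use_delta_rule=False))
    print(slowdown(trace, 2, use_delta_rule=True))
```

On this alternating trace the run without prefetching stalls for a full fetch before every access, while the delta rule hides every fetch except the first behind CPU work.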
18
Frequent pattern mining based algorithms
CMiner (Li et al., FAST 2004)
– Identifies recurring block sub-sequences in train time
– Problematic runtime and space complexity in our settings
[Figure labels: B-tree, hot item, A, E, L, Z]
19
Novel variants of CMiner
CMiner( )
– Identifies generic frequent delta rules
– Efficient runtime and space complexity
CMiner-OBF
– A two-level variant of CMiner
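The slides give no details on how frequent delta rules are found. Purely as an illustration of what such a rule could look like, the sketch below counts address differences between accesses that occur close together in a training trace and keeps the differences that appear often. The function, its `window` and `min_support` parameters, and the counting scheme are all assumptions for this example; this is not the authors' variant.

```python
from collections import Counter


def frequent_deltas(trace, window=4, min_support=0.05):
    """Return block-number deltas that recur within a short lookahead window.

    trace: list of accessed block numbers from a training run.
    window: how many later accesses to pair with each access (assumed).
    min_support: minimum count as a fraction of the trace length (assumed).
    """
    counts = Counter()
    for i, block in enumerate(trace):
        for later in trace[i + 1 : i + 1 + window]:
            counts[later - block] += 1
    threshold = min_support * len(trace)
    # Most frequent first; a delta of 0 would only re-request the same block.
    return [d for d, c in counts.most_common() if c >= threshold and d != 0]


if __name__ == "__main__":
    # Two interleaved sequential streams, as two parallel processes might produce.
    trace = []
    for k in range(10):
        trace += [100 + k, 500 + k]
    print(frequent_deltas(trace, window=2, min_support=0.6))
```

A rule found this way generalizes the simple delta rule: instead of always queuing the next block, a prefetcher could queue the accessed block plus each frequent delta.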
20
Simulations
Setup
– Used traces from OLTP financial transactions and from an SQL stress tool
– Simulated the system under various parameters and measured slowdown at various time points
21
Simulations (cont.)
– Simple delta rules were hard to beat
– CMiner( ) often improves upon them, but not always
– Some schemes are harmful
22
Summary and open issues
Automatic read-ahead determination
– Highly effective
– Can be applicable elsewhere
– Calls for more generalized cost models
Block prediction and prefetch
– Simple delta rules seem hard to beat
– Potential for improvement
– Novel frequent pattern mining based algorithms; might be interesting in other contexts (e.g. caching)