Tivoli Software © 2010 IBM Corporation
Slide 1: Using Machine Learning Techniques to Enhance the Performance of an Automatic Backup and Recovery System
Amir Ronen, Dan Pelleg, Machine Learning Group, HRL; Eran Raichstein (IBM Software Group)

Slide 2: Motivation
IBM's FastBack: an automatic backup and recovery system
- Incremental backup of disk volumes to a repository
- Instant Restore (IR): allows applications to start working immediately after recovery
- Xpress mount: allows access to backup data without recovering it (e.g. for taking tape dumps)
Goal
- Accelerate IR and mount via machine learning and algorithmic techniques
- Minimum intervention in FastBack's internals
  Benefits: fewer bugs, easy upgrading, generality, ...

Slide 3: Outline
- The FastBack system
- Algorithm for automatic determination of read-ahead
  - Basic observations
  - The algorithm
  - Experiments in the FastBack system
- Prefetching
  - Theoretical model and observation
  - Basic prefetching algorithms
  - Frequent pattern based algorithms
  - Controlling and combining prefetch algorithms
- Summary

Slide 4: FastBack's Instant Restore and Mount
Instant Restore allows users to start using applications on the same disk to which the volume is being restored, while the restore operation is still in progress.
1. Activate Instant Restore
2. Read I/Os from un-recovered areas trigger a block fetch from the repository
3. All other reads are performed as usual
[Diagram: production server with a typical production disk; new production server with a new production disk; Xpress Restore Server; repository]
From an architectural perspective, mount is somewhat similar.

Slide 6: The problem
[Diagram: new production server; Xpress Restore Server; repository]
- A block is needed from the repository
- Suppose that we are allowed to bring additional subsequent blocks
- How many should we bring?
  - Too many may slow down the system (in particular if they will not be used)
  - Too few will cause high total latency

Slide 7
Simple cost model: T ≈ T1 + n·T2 + ε
- T1: "fixed" latency
- T2: time to bring one block
- n: number of blocks
- ε: noise (assumed zero)
Key idea
Suppose that we choose n such that T1 = n·T2
- The cost of a request never more than doubles (T1 + n·T2 = 2·T1)
- In many settings n can be large
The algorithm is 2-competitive
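
The key idea above can be illustrated with a short sketch. The function names and the rounding choice (rounding down, with a minimum of one block) are assumptions made for illustration; the slides do not say how n is rounded.

```python
import math


def readahead_blocks(t1: float, t2: float) -> int:
    """Largest n with n*T2 <= T1, but at least 1 (rounding is an assumption)."""
    if t1 < 0 or t2 <= 0:
        raise ValueError("expected T1 >= 0 and T2 > 0")
    return max(1, math.floor(t1 / t2))


def request_cost(t1: float, t2: float, n: int) -> float:
    """Cost model from the slide with zero noise: T = T1 + n*T2."""
    return t1 + n * t2


# Example with the values shown on the next slide (latency 6.5, block cost 3).
n = readahead_blocks(6.5, 3.0)
cost = request_cost(6.5, 3.0, n)
```

Because n·T2 never exceeds T1 in this sketch (whenever T2 ≤ T1), the request cost stays at or below 2·T1, which is the doubling bound the slide refers to.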

Slide 8
Problem 1
- The latency T1 and the block cost T2 are not known
- They may vary over time
Solution
- Hold a window of the last k requests (e.g. 200)
- Use linear regression to estimate T1 and T2
- The update can be done in O(1)
[Plot: example regression fit; latency ≈ 6.5, block cost ≈ 3]
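
One way the O(1) update could work is to keep running sums over the window and adjust them as requests enter and leave. The sketch below assumes ordinary least squares on (n, T) pairs; the class name `WindowedRegression` and its methods are hypothetical and are not taken from FastBack.

```python
from collections import deque


class WindowedRegression:
    """Least-squares fit of T = T1 + n*T2 over the last k requests."""

    def __init__(self, k: int = 200):
        self.k = k
        self.window = deque()
        # Running sums of n, T, n*n and n*T over the window.
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def add(self, n: float, t: float) -> None:
        """Record one request (n blocks, measured time t) in O(1)."""
        self.window.append((n, t))
        self.sx += n
        self.sy += t
        self.sxx += n * n
        self.sxy += n * t
        if len(self.window) > self.k:
            old_n, old_t = self.window.popleft()
            self.sx -= old_n
            self.sy -= old_t
            self.sxx -= old_n * old_n
            self.sxy -= old_n * old_t

    def estimate(self):
        """Return (T1, T2), or None when the n values do not vary enough."""
        m = len(self.window)
        denom = m * self.sxx - self.sx * self.sx
        if m < 2 or denom <= 1e-9:
            return None
        t2 = (m * self.sxy - self.sx * self.sy) / denom
        t1 = (self.sy - t2 * self.sx) / m
        return t1, t2


reg = WindowedRegression(k=200)
for blocks in range(1, 11):
    reg.add(blocks, 6.5 + 3.0 * blocks)  # synthetic, noise-free requests
t1_hat, t2_hat = reg.estimate()
```

Running sums accumulate floating-point error over a long run, so recomputing them from the window occasionally is a reasonable precaution.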

Slide 9
Problem 2
- What if the n values in the window are too similar for the regression to estimate T1 and T2?
Sampling ideas
- We only need a few samples
- If mean(n) is large, we sample small values
- If mean(n) is small, we sample 2*mean(n)
- Low amortized cost

Slide 10: The Algorithm
- Hold a window of the last k requests
- At each step update the linear regression (refresh from time to time)
- If regression is possible:
  - Estimate T1, T2
  - Compute the desired n value
  - If the system asked for less, recommend read-ahead
- Otherwise:
  - Sample as described
Additional heuristics: unreasonable values, smoothing, mis-estimation, ...
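
The decision step of the algorithm might look roughly like the sketch below. It takes the regression result as an input (a `(T1, T2)` tuple, or `None` when regression is not possible). The function name, the thresholds `large_threshold` and `max_blocks`, and the guard against non-positive estimates are all assumptions for illustration; the slides do not specify them, nor how often a sample is taken.

```python
def recommend_readahead(requested, estimate, mean_n,
                        large_threshold=64, small_sample=1, max_blocks=1024):
    """Return the number of blocks to read for one request.

    requested: blocks the system asked for
    estimate:  (T1, T2) from the regression, or None if it is not possible
    mean_n:    mean request size in the current window
    """
    if estimate is not None:
        t1, t2 = estimate
        if t1 <= 0 or t2 <= 0:
            # Heuristic guard (assumed): ignore unreasonable estimates.
            return requested
        desired = min(max_blocks, max(1, int(t1 // t2)))
        # Recommend read-ahead only if the system asked for less.
        return max(requested, desired)
    # Regression not possible: pick a sample size far from the usual one.
    if mean_n >= large_threshold:
        return small_sample
    return max(1, int(2 * mean_n))
```

For example, with the estimate (6.5, 3.0) a request for one block is extended to two blocks, while a request for eight blocks is left unchanged.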

Slide 12: Comments & open issues
- The algorithm may be applicable elsewhere
- Extensions to more complicated cost models
- Analyzing executions of parallel copies of the algorithm

Slide 14: Motivation
[Diagram: new production server; Xpress Restore Server; repository]
- IR needs to fetch blocks from the repository according to its workload
- Ideally, blocks will be predicted and brought before they are needed
Comments
- The network is not preemptive, so prefetching can also be harmful
- Typical workloads are parallel processes, each with some locality of reference

Slide 15: A model for the prefetch problem
The workload is an unknown sequence of events L1, …, Ln. Each Lj is either:
- An access to a block Bj
- A process event
The system is composed of a CPU and a network that can run in parallel. At each step j the system can do one of the following:
1. Process (Lj is a process event, cost = 1 unit)
2. Access its local memory (if Lj is an access event and Bj is already in the local memory, cost = 1 unit)
3. Fetch a block from the repository (this occupies the network for C time units and can be done in parallel with 1 or 2)

Slide 16: A model for the prefetch problem (cont.)
Slowdown
Let L1, …, Ln be a workload. The slowdown of the system on L is the ratio between the total system time and the time to perform the workload locally, i.e. Tsys / n.
[Diagram: CPU and network timelines for a workload of process events and accesses to B17 and B18, with fetches of blocks 17 and 18 (delta rule, C = 2)]
- Slowdown is ~1
- Without prefetching, slowdown is around 2

Slide 17: Simple prefetch algorithms
Delta rule
- Whenever Bj is accessed, put Bj+1 in a queue
- Whenever the network is idle, prefetch in LIFO order
- Very effective rule, simple to implement
No prefetch
- Can be shown to be 2-competitive!
Order by frequency
- At training time, order blocks by their frequency
OPT
- Hypothetical optimal offline algorithm
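
The prefetch model and the delta rule can be tried out in a small tick-based simulation. This is a sketch under stated assumptions, not the authors' simulator: time advances in unit ticks, a demand fetch takes priority over queued prefetches once the network is idle, "Bj+1" is read as the next block number, and the workload encoding (`None` for a process event, an integer for a block access) is hypothetical.

```python
def simulate(workload, fetch_cost, use_delta_rule):
    """Return the total system time (in ticks) for a workload."""
    local = set()      # blocks already in local memory
    candidates = []    # prefetch candidates, served in LIFO order
    fetching = None    # block currently occupying the network
    done_at = 0        # tick at which the current fetch completes
    t = 0
    i = 0
    while i < len(workload):
        # A finished fetch delivers its block and frees the network.
        if fetching is not None and done_at <= t:
            local.add(fetching)
            fetching = None
        block = workload[i]
        can_run = block is None or block in local
        if can_run and block is not None and use_delta_rule:
            candidates.append(block + 1)  # delta rule: queue the next block
        if fetching is None:
            if not can_run:
                fetching = block  # demand fetch for the missing block
            else:
                while candidates and fetching is None:
                    nxt = candidates.pop()  # LIFO order
                    if nxt not in local:
                        fetching = nxt
            if fetching is not None:
                done_at = t + fetch_cost  # network is busy for C ticks
        if can_run:
            i += 1  # the CPU processed or accessed; otherwise it stalls
        t += 1
    return t


# 50 repetitions of "process, then access the next sequential block".
workload = [event for b in range(17, 67) for event in (None, b)]
slowdown_none = simulate(workload, 2, False) / len(workload)
slowdown_delta = simulate(workload, 2, True) / len(workload)
```

On this sequential workload with C = 2, the sketch gives a slowdown of 2.0 without prefetching and about 1.02 with the delta rule, in line with the figures quoted on the model slide.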

Slide 18: Frequent pattern mining based algorithms
CMiner (Li et al., FAST 2004)
- Identifies recurring block sub-sequences at training time
- Problematic runtime and space complexity in our settings
[Diagram labels: B-tree; hot item; example rule A, E, L → Z]

Slide 19: Novel variants of CMiner
CMiner(Δ)
- Identifies generic frequent delta rules
- Efficient runtime and space complexity
CMiner-OBF
- A two-level variant of CMiner
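
The slides do not describe how CMiner(Δ) works internally. Purely as an illustration of what a "frequent delta rule" could mean, the sketch below counts block-number differences between accesses that occur close together in a training trace and keeps the differences seen often enough. The function name, the `window` and `min_support` parameters, and the counting scheme are assumptions, not the authors' algorithm.

```python
from collections import Counter


def frequent_deltas(trace, window=4, min_support=3):
    """Count deltas between each access and the next `window` accesses."""
    counts = Counter()
    for i, block in enumerate(trace):
        for later in trace[i + 1:i + 1 + window]:
            counts[later - block] += 1
    # Keep non-zero deltas that occur at least min_support times.
    return {d: c for d, c in counts.items() if d != 0 and c >= min_support}


# Two interleaved streams: one sequential (10, 11, 12, 13), one at 50, 51, 52.
trace = [10, 11, 50, 12, 51, 13, 52]
rules = frequent_deltas(trace, window=2, min_support=4)
```

Looking a few accesses ahead, rather than only at the immediate successor, lets a delta of +1 surface even when parallel processes interleave their accesses.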

Slide 20: Simulations
Setup
- Used traces of OLTP financial transactions and of an SQL stress tool
- Simulated the system under various parameters and measured slowdown at various points in time

Slide 22: Summary and open issues
Automatic read-ahead determination
- Highly effective
- May be applicable elsewhere
- Calls for more generalized cost models
Block prediction and prefetch
- Simple delta rules seem hard to beat
- Potential for improvement
- Novel frequent pattern mining based algorithms; might be interesting in other contexts (e.g. caching)
