Tivoli Software © 2010 IBM Corporation 1 Using Machine Learning Techniques to Enhance The Performance of an Automatic Backup and Recovery System Amir Ronen,

Presentation transcript:

Tivoli Software © 2010 IBM Corporation
Slide 1. Using Machine Learning Techniques to Enhance the Performance of an Automatic Backup and Recovery System
Amir Ronen, Dan Pelleg (Machine Learning Group, HRL); Eran Raichstein (IBM Software Group)

Slide 2. Motivation
IBM's FastBack: an automatic backup and recovery system
- Incremental backup of disk volumes to a repository
- Instant Restore (IR): lets applications start working immediately, while the restore is still in progress
- Xpress mount: allows access to backed-up data without recovering it (e.g. for taking tape dumps)
Goal
- Accelerate IR and mount via machine learning and algorithmic techniques
- Minimal intervention in FastBack's internals. Benefits: fewer bugs, easy upgrading, generality, ...

Slide 3. Outline
- The FastBack system
- Algorithm for automatic determination of read-ahead
  - Basic observations
  - The algorithm
  - Experiments in the FastBack system
- Prefetching
  - Theoretical model and observations
  - Basic prefetching algorithms
  - Frequent-pattern-based algorithms
  - Controlling and combining prefetch algorithms
- Summary

Slide 4. FastBack's Instant Restore and Mount
Instant Restore allows users to start using applications on the same disk to which the volume is being restored, while the restore operation is still in progress.
1. Activate Instant Restore.
2. Read I/Os from unrecovered areas trigger a block fetch from the repository.
3. All other reads are performed as usual.
From an architectural perspective, mount is somewhat similar.
[Diagram labels: production server, typical production disk, new production server, new production disk, Xpress Restore server, repository]

Slide 5. CNF: An Algorithm for Readahead Amount Determination

Slide 6. The problem
- A block is needed from the repository
- Suppose that we are allowed to bring additional subsequent blocks
- How many to bring?
  - Too many may slow down the system (in particular if they are not used)
  - Too few will cause high total latency
[Diagram labels: new production server, Xpress Restore server, repository]

Slide 7. Simple cost model: T ~ T1 + n*T2 + ε
- T1: "fixed" latency
- T2: time to bring one block
- n: number of blocks
- ε: noise (assumed zero)
Key idea: suppose that we choose n such that T1 = n*T2
- The cost never more than doubles
- In many settings n can be large
The algorithm is 2-competitive
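
The key idea can be checked with a few lines of arithmetic. The sketch below is only an illustration of the cost model on this slide; the function names are hypothetical, and the example values 6.5 and 3 are borrowed from the regression plot on the next slide. With n*T2 <= T1, the request costs at most 2*T1, i.e. at most twice the fixed latency that every request pays anyway.

```python
def readahead_blocks(t1, t2):
    """Largest whole n with n*T2 <= T1, but never fewer than one block.

    t1: fixed latency per request, t2: time to bring one block.
    (Hypothetical helper; the slides do not give an implementation.)
    """
    if t2 <= 0:
        raise ValueError("block cost must be positive")
    return max(1, int(t1 // t2))


def request_cost(t1, t2, n):
    """Cost model from the slide with the noise term assumed zero."""
    return t1 + n * t2


n = readahead_blocks(6.5, 3)        # two blocks balance latency and transfer
cost = request_cost(6.5, 3, n)      # 6.5 + 2*3
single = request_cost(6.5, 3, 1)    # cost of fetching only the needed block
print(n, cost, cost <= 2 * single)
```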

Slide 8. Problem 1
- The latency T1 and the block cost T2 are not known
- They may vary over time
Solution
- Hold a window of the last k requests (e.g. k = 200)
- Use linear regression to estimate T1 and T2
- The update can be done in O(1)
[Plot: example fit with latency ~ 6.5 and block cost ~ 3]
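
One way to get the O(1) update is to keep running sums over the window and to add the newest sample while subtracting the evicted one. The sketch below assumes that approach; the class and attribute names are hypothetical and the slides do not say how FastBack implements it.

```python
from collections import deque


class WindowedRegression:
    """Least-squares fit of t ~ T1 + n*T2 over the last k requests."""

    def __init__(self, k=200):
        self.k = k
        self.window = deque()
        # Running sums of n, t, n*n and n*t over the current window.
        self.sn = self.st = self.snn = self.snt = 0.0

    def _update(self, n, t, sign):
        self.sn += sign * n
        self.st += sign * t
        self.snn += sign * n * n
        self.snt += sign * n * t

    def add(self, n, t):
        """Record a request for n blocks that took time t; O(1) work."""
        self.window.append((n, t))
        self._update(n, t, +1)
        if len(self.window) > self.k:
            old_n, old_t = self.window.popleft()
            self._update(old_n, old_t, -1)

    def estimate(self):
        """Return (T1, T2), or None when the n values do not vary enough."""
        m = len(self.window)
        denom = m * self.snn - self.sn ** 2
        if m < 2 or abs(denom) < 1e-9:
            return None
        t2 = (m * self.snt - self.sn * self.st) / denom
        t1 = (self.st - t2 * self.sn) / m
        return t1, t2


reg = WindowedRegression(k=200)
for n in range(1, 11):
    reg.add(n, 6.5 + 3 * n)   # noise-free samples from T = 6.5 + 3n
print(reg.estimate())
```

Running sums accumulate floating-point drift over long runs, which is one plausible reason to recompute them from the window now and then.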

Slide 9. Problem 2
- What if the n values are so similar that T1 and T2 cannot be estimated?
Sampling ideas
- We only need a few samples
- If mean(n) is large, we sample small values
- If mean(n) is small, we sample 2*mean(n)
- Low amortized cost

Slide 10. The Algorithm
- Hold a window of the last k requests
- At each step, update the linear regression (refresh from time to time)
- If regression is possible:
  - Estimate T1, T2
  - Compute the desired n value
  - If the system asked for less, recommend readahead
- Otherwise:
  - Sample as described
Additional heuristics: unreasonable values, smoothing, mis-estimation, ...
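
The decision step above might look roughly like the following. This is a sketch under stated assumptions, not FastBack's code: the function names, the `large_n` threshold that separates "large" from "small" mean(n), the `small_n` sample value and the `max_n` cap are all hypothetical, and the fit is recomputed from scratch for brevity instead of being updated in O(1).

```python
from statistics import mean


def fit_line(samples):
    """Least-squares fit t ~ T1 + n*T2; None if the n values do not vary."""
    if len(samples) < 2:
        return None
    n_bar = mean(n for n, _ in samples)
    t_bar = mean(t for _, t in samples)
    sxx = sum((n - n_bar) ** 2 for n, _ in samples)
    if sxx == 0:
        return None
    sxy = sum((n - n_bar) * (t - t_bar) for n, t in samples)
    t2 = sxy / sxx
    return t_bar - t2 * n_bar, t2


def recommend_readahead(samples, requested_n, large_n=8, small_n=1, max_n=1024):
    """samples: (n, elapsed_time) pairs for the last k requests."""
    fit = fit_line(samples)
    if fit is not None:
        t1, t2 = fit
        if t1 <= 0 or t2 <= 0:
            return requested_n            # unreasonable estimate: do not interfere
        desired = min(max_n, max(1, round(t1 / t2)))
        return max(requested_n, desired)  # only ever recommend more, never less
    if not samples:
        return requested_n
    # Regression is not possible: pick an n that differs from the usual one.
    avg = mean(n for n, _ in samples)
    return small_n if avg >= large_n else max(1, round(2 * avg))


history = [(n, 6.5 + 3 * n) for n in (1, 2, 4, 8)]
print(recommend_readahead(history, requested_n=1))
```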

Slide 11. Impact on FastBack
- Added latency per request
- Outperformed the predetermined values
- Speedup of up to 4x when mounting contiguous and fragmented data

Slide 12. Comments and open issues
- The algorithm may be applicable elsewhere
- Extensions to more complicated cost models
- Analyzing executions of parallel copies of the algorithm

Slide 13. Block Prediction and Prefetching for Enhancing Instant Restore

Slide 14. Motivation
- IR needs to fetch blocks from the repository according to its workload
- Ideally, blocks will be predicted and brought before they are needed
Comments
- The network is not preemptive, so prefetching can also be harmful
- Typical workloads are parallel processes, each with some locality of reference
[Diagram labels: new production server, Xpress Restore server, repository]

Slide 15. A model for the prefetch problem
The workload is an unknown sequence of events L1, ..., Ln. Each Lj is either:
- An access to a block Bj
- A process event
The system is composed of a CPU and a network that can run in parallel. At each step j the system can do one of the following:
1. Process (Lj is a process event; cost = 1 unit)
2. Access its local memory (if Lj is an access event and Bj is already in local memory; cost = 1 unit)
3. Fetch a block from the repository (this occupies the network for C time units and can be done in parallel with 1 or 2)

Slide 16. A model for the prefetch problem (cont.)
Slowdown: let L = L1, ..., Ln be a workload. The slowdown of the system on L is the ratio between the total system time and the time to perform the workload locally, i.e. Tsys / n.
[Timeline figure, C = 2. Rows: workload, CPU, network. Labels: Process, Access B17, Process, Access B18, ...; Fetch 17, Fetch 18; Delta]
In the pictured example:
- Slowdown is ~1
- Without prefetching, slowdown is around 2
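
The "around 2" figure for the no-prefetch case can be reproduced with a tiny simulation of the model. The sketch below is an illustration only: the function name and the workload encoding (`None` for a process event, an integer for an access to that block) are assumptions made here.

```python
def demand_fetch_slowdown(workload, fetch_time):
    """Slowdown Tsys / n when blocks are fetched only on demand.

    workload: list of events; None is a process event, an int is an
    access to the block with that number.
    fetch_time: C, the time a fetch occupies the network.
    """
    local = set()   # blocks already in local memory
    clock = 0
    for event in workload:
        if event is not None and event not in local:
            clock += fetch_time   # the CPU stalls until the block arrives
            local.add(event)
        clock += 1                # processing or a local access costs 1 unit
    return clock / len(workload)


# Alternating process events and accesses to fresh blocks, with C = 2.
workload = [None, 17, None, 18, None, 19, None, 20]
print(demand_fetch_slowdown(workload, fetch_time=2))
```

Each process/access pair takes 1 + (2 + 1) = 4 time units for 2 events, which gives the slowdown of 2 quoted on the slide.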

Slide 17. Simple prefetch algorithms
Delta rule
- Whenever Bj is accessed, put the next block, B(j+1), in the queue
- Whenever the network is idle, prefetch in LIFO order
- Very effective rule, simple to implement
No prefetch
- Can be shown to be 2-competitive!
Order by frequency
- At training time, order blocks by their frequency
OPT
- Hypothetical optimal offline algorithm
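
A time-stepped simulation makes the delta rule concrete. The sketch below is one possible reading of the rule and not the authors' simulator. It assumes integer time steps, a non-preemptive network that carries one fetch at a time, demand fetches taking priority over prefetches once the network is free, and the next block being queued at the moment an access is first requested. All names are hypothetical.

```python
def delta_rule_slowdown(workload, fetch_time):
    """Slowdown Tsys / n with delta-rule prefetching served in LIFO order.

    workload: None for a process event, an int for an access to that block.
    fetch_time: C, the time one fetch occupies the network.
    """
    local = set()      # blocks already in local memory
    stack = []         # prefetch candidates; the newest is served first
    in_flight = None   # (block, finish_time), or None when the network is idle
    announced = -1     # index of the last access that queued its successor
    clock = 0
    i = 0
    while i < len(workload):
        if in_flight is not None and in_flight[1] <= clock:
            local.add(in_flight[0])          # the fetch has completed
            in_flight = None
        event = workload[i]
        if event is not None and announced != i:
            stack.append(event + 1)          # delta rule: queue the next block
            announced = i
        missing = event is not None and event not in local
        if in_flight is None:
            if missing:
                in_flight = (event, clock + fetch_time)   # demand fetch
            else:
                while stack:
                    candidate = stack.pop()
                    if candidate not in local:
                        in_flight = (candidate, clock + fetch_time)
                        break
        if not missing:
            i += 1                           # the CPU executes the event
        clock += 1                           # one time unit passes
    return clock / len(workload)


workload = []
for block in range(100):
    workload += [None, block]   # process, then access the next sequential block
print(delta_rule_slowdown(workload, fetch_time=2))
```

On this sequential workload only the first access stalls, so the slowdown approaches 1 as the workload grows, in line with the slide's "~1" for C = 2.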

Slide 18. Frequent-pattern-mining-based algorithms
CMiner (Li et al., FAST 2004)
- Identifies recurring block subsequences at training time
- Problematic runtime and space complexity in our settings
[Figure labels: B-tree; hot item; example rule A, E, L → Z]

Slide 19. Novel variants of CMiner
CMiner(Δ)
- Identifies generic frequent delta rules
- Efficient runtime and space complexity
CMiner-OBF
- A two-level variant of CMiner

Slide 20. Simulations
Setup
- Used traces from OLTP financial transactions and from an SQL stress tool
- Simulated the system under various parameters and measured slowdown at various time points

Slide 21. Simulations (cont.)
- Simple delta rules were hard to beat
- CMiner(Δ) often improves upon them, but not always
- Some schemes are harmful

Slide 22. Summary and open issues
Automatic read-ahead determination
- Highly effective
- May be applicable elsewhere
- Calls for more generalized cost models
Block prediction and prefetch
- Simple delta rules seem hard to beat
- Potential for improvement
- Novel frequent-pattern-mining-based algorithms; might be interesting in other contexts (e.g. caching)