ANALYZING STORAGE SYSTEM WORKLOADS. Paul G. Sikalinda, Pieter S. Kritzinger. DNA Research Group, Computer Science Department.


ANALYZING STORAGE SYSTEM WORKLOADS
Paul G. Sikalinda and Pieter S. Kritzinger, DNA Research Group, Computer Science Department, University of Cape Town, and Lourens O. Walters, Mosaic Software, Rondebosch, Cape Town, Republic of South Africa.

Presentation Outline
- Introduction
- Motivation and Objectives
- Storage Systems
- Storage System Workloads
- The Storage System Workload Analyzed
- Statistical Methodology
- Workload Analysis Results
- Conclusions
- Future Work

Introduction
The DNA Group specializes, among other things, in using theory, formal methods and software tools in the:
- specification of …
- design of …
- modelling of …
- building of …
- security of …
- *workload analysis of …
- correctness analysis of …
- performance analysis of …
concurrent computing systems (CCS).

Introduction (cont’d): ANALYZING STORAGE SYSTEM WORKLOADS

Introduction (cont’d)
[Figure labels: RP, RQ, PROCESSOR. Fields of an I/O request: start address, operation type, request size, timestamps, etc.]

Motivation and Objectives
- A lot of effort is being spent on improving the I/O subsystem because it is a bottleneck in current computer systems.
- Workload modelling is an important component in the design, performance evaluation and correctness evaluation of storage systems.
- Two common modelling assumptions are not correct:
  - uniform distribution of start addresses,
  - exponential inter-arrival times.
- Therefore, storage system workloads should be analyzed in order to arrive at correct models.

Motivation and Objectives (cont’d)
- Designing storage systems.
- Designing I/O optimization techniques (read caching, write caching, pre-fetching, I/O parallelism, I/O rescheduling) to improve performance.
- Understanding application behavior and requirements.
- Deciding to pool storage system resources (SSPs).
- Implementing intelligent storage systems.
- etc.

Motivation and Objectives (cont’d)
Our aim was to analyze storage system workloads in terms of the
(a) inter-arrival times,
(b) sizes and
(c) "seek distances"
of I/O requests, and to provide statistics for these parameters that can be used to:
(a) derive models for storage system evaluation and
(b) design optimization techniques (read caching, I/O parallelism, etc.).

Storage Systems
[Figure: Enterprise Storage System (ESS). Components: host/bus adapter, cache, array controller, disk drives. Connections: path to host, path to controller, path to cache, path to disks.]

Storage Systems (cont’d)
ESS are powerful disk storage systems with the following capabilities:
- High performance*
- Large capacity and availability
- Protection against physical drive failure, which can be provided using RAID methods.
*Performance still cannot match processor speeds because of the mechanical processes in the disk drives.

Storage System Workloads
I/O request servicing and workload classification:
- Logical workloads (file system workloads)
- Storage system workloads (physical I/O traffic)
[Figure labels: Operating System, File System, Application Software, Disk System, I/O request.]

Storage System Workloads (cont’d)
Workload parameters:
- Logical volume number
- *Start address (seek distances)
- *Request size
- Operation type (i.e., read or write)
- *Time stamp (inter-arrival times)
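The starred parameters are derived from raw trace records rather than read off directly. A minimal sketch of that derivation follows. The record layout and field names are hypothetical (they are not the SPC trace format), and the seek distance definition used here, the signed gap between the end of one request and the start of the next, is an assumption, since the slides do not define the term.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IORequest:
    """One trace record; field names are illustrative, not the SPC format."""
    volume: int        # logical volume number
    start_block: int   # start address, in blocks
    size_blocks: int   # request size, in blocks
    is_read: bool      # operation type (True = read, False = write)
    timestamp_us: int  # arrival time, in microseconds


def inter_arrival_times(trace: List[IORequest]) -> List[int]:
    """Differences between consecutive arrival timestamps."""
    return [b.timestamp_us - a.timestamp_us for a, b in zip(trace, trace[1:])]


def seek_distances(trace: List[IORequest]) -> List[int]:
    """Signed distance from the end of one request to the start of the next.

    Assumption: the distance is measured from the block just past the
    previous request, so 0 means a purely sequential access.
    """
    return [
        b.start_block - (a.start_block + a.size_blocks)
        for a, b in zip(trace, trace[1:])
    ]


# Made-up records, for illustration only.
trace = [
    IORequest(0, 1000, 16, True, 100),
    IORequest(0, 1016, 16, True, 350),
    IORequest(0, 200, 8, False, 2050),
]
print(inter_arrival_times(trace))  # [250, 1700]
print(seek_distances(trace))       # [0, -832]
```

A real analysis would probably compute seek distances per logical volume; the sketch treats the trace as a single stream for brevity.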

The Storage System Workload Analyzed
We analyzed the inter-arrival times, request sizes and "seek distances" of I/O requests from a system running a web search engine. We obtained the I/O trace files from the Storage Performance Council (SPC).

Statistical Methodology
- Visual techniques:
  - Histograms
  - ECDF graphs
- Key data statistics:
  - Sample mean
  - Variance and standard deviation
  - Coefficients of skew, kurtosis and variation
  - Five-number data summaries (minimum, lower quartile, median, upper quartile, maximum)
  - Lower and upper outlier limits
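The statistics listed above can be computed with the Python standard library alone. The sketch below is one possible way to do so, not the authors' tooling. Two choices are assumptions: quartiles use the "inclusive" interpolation method, and outlier limits use the common rule of 1.5 times the interquartile range beyond the quartiles, since the slides do not state which rules were used.

```python
import math
import statistics
from typing import Dict, List, Tuple


def summarize(data: List[float]) -> Dict[str, object]:
    """Key statistics for one workload parameter (needs at least 2 values)."""
    n = len(data)
    mean = statistics.fmean(data)
    variance = statistics.variance(data)   # sample variance (divides by n - 1)
    std = math.sqrt(variance)
    # Central moments (divide by n), used for the shape coefficients.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "n": n,
        "five_number": (min(data), q1, median, q3, max(data)),
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean if mean else float("nan"),
        "skew": m3 / m2 ** 1.5 if m2 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 else 0.0,  # 3.0 for a normal distribution
        # Assumed rule: limits at 1.5 * IQR beyond the quartiles.
        "outlier_limits": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
    }


def ecdf(data: List[float]) -> List[Tuple[float, float]]:
    """Points (x, F(x)) of the empirical CDF.

    For tied values, the last point with a given x carries F(x).
    """
    ordered = sorted(data)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]
```

Calling `summarize` once each on the inter-arrival times, request sizes and seek distances gives one table per parameter, and the `ecdf` points can be passed to any plotting tool.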

Results 1: Inter-arrival times (µs)
Statistics reported: sample size; five-number summary (126, 242, 1695, 4487, …); sample mean; sample variance; standard deviation; coefficient of variation; coefficient of skew; coefficient of kurtosis; upper outlier limit.

Results 1: Inter-arrival times
- Highly variable data. Range: (126, …) microseconds.
- The coefficient of kurtosis shows that the distribution is heavy-tailed.

Results 2: Request sizes (bytes)
Statistics reported: sample size; five-number summary (512, 8192, 8192, 24580, …); sample mean (15510); sample variance; standard deviation; coefficient of variation; coefficient of skew; coefficient of kurtosis; upper outlier limit.

Results 2: Request sizes
- Distribution peaks: 8192 (60%), 16384 (10%), … (9%) and … (20%).
- Reason: the OS filesystem block size (… bytes).

Results 3: Seek distances (blocks)
Statistics reported: sample size; five-number summary (…, …, 6.4, …, …); sample mean (27.95); sample variance; standard deviation; coefficient of skew (0); coefficient of variation; upper outlier limit; lower outlier limit.

Results 3: Seek distances
- The distribution of seek distances is symmetrical.

Conclusions
(1) Analyzing storage system workloads is necessary in order to model them properly:
- To model Web inter-arrival times, the Weibull, lognormal, beta, gamma and exponential probability density functions should be considered.
- To model Web data sizes and seek distances, a probability mass function is more appropriate.
*We intend to use the models in simulations of ESS.
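A probability mass function for request sizes or seek distances can be estimated directly from observed frequencies and then sampled to drive a simulation. The following is a minimal sketch under that assumption. The function names are illustrative and the sample values are made up, not taken from the analyzed trace.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def empirical_pmf(values: Sequence[int]) -> Dict[int, float]:
    """Relative frequency of each distinct observed value."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in sorted(counts.items())}


def sample_from_pmf(pmf: Dict[int, float], k: int, rng: random.Random) -> List[int]:
    """Draw k synthetic values with probabilities given by the PMF."""
    outcomes = list(pmf)
    weights = [pmf[v] for v in outcomes]
    return rng.choices(outcomes, weights=weights, k=k)


# Made-up request sizes in bytes, for illustration only.
sizes = [8192] * 6 + [16384] + [512] + [32768] * 2
pmf = empirical_pmf(sizes)
synthetic = sample_from_pmf(pmf, k=5, rng=random.Random(42))
```

Because the PMF keeps every observed value as its own outcome, peaks such as a dominant block size are reproduced exactly, which a fitted continuous density would smooth away.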

Conclusions (cont’d)
(2) The analysis results are useful when designing optimization techniques for storage systems. E.g.,
- Cache management block size: 8192 bytes.
- I/O rescheduling and background tasking would be ideal for the workload.
- The storage system handling the workload we analyzed can be optimized to handle the symmetrical behavior*.
*The results are not broadly applicable.

Conclusions (cont’d)
(3) Other conclusions:
- Request sizes are influenced by the filesystem in use.
- Seek distances are not always uniformly distributed.
*In summary, we have provided statistics about the parameters of the storage system workload that we analyzed and have shown how they can be used to derive models and design I/O optimization techniques.

Future Work
- Rigorously find a probability density function matching a given data set of inter-arrival times.
- Analyze the storage system workloads in terms of other parameters (e.g., logical volume numbers and operation types).
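One possible starting point for the first item is to fit each candidate density by maximum likelihood and compare the resulting log-likelihoods. The sketch below does this for only two of the candidates named in the conclusions, the exponential and the lognormal, because both have closed-form estimates. It is an illustration under those assumptions, not the authors' method, and the sample is made up.

```python
import math
from typing import List


def exponential_loglik(data: List[float]) -> float:
    """Log-likelihood under an exponential fitted by maximum likelihood."""
    n = len(data)
    total = sum(data)
    rate = n / total                      # MLE of the rate is 1 / sample mean
    return n * math.log(rate) - rate * total


def lognormal_loglik(data: List[float]) -> float:
    """Log-likelihood under a lognormal fitted by maximum likelihood.

    Requires strictly positive data that are not all identical.
    """
    n = len(data)
    logs = [math.log(x) for x in data]
    mu = sum(logs) / n
    sigma2 = sum((v - mu) ** 2 for v in logs) / n   # MLE of the log-variance
    # Sum of log densities evaluated at the fitted mu and sigma2.
    return -sum(logs) - 0.5 * n * math.log(2 * math.pi * sigma2) - 0.5 * n


# Made-up heavy-tailed inter-arrival times in microseconds.
sample = [126, 150, 242, 300, 900, 1695, 2500, 4487, 20000, 150000]
best = max(
    [("exponential", exponential_loglik(sample)),
     ("lognormal", lognormal_loglik(sample))],
    key=lambda pair: pair[1],
)
print(best[0])  # lognormal
```

A higher log-likelihood alone is not a rigorous selection criterion. The models have different numbers of parameters, so a fuller treatment would add a penalized criterion or a goodness-of-fit test and cover the Weibull, gamma and beta candidates as well.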

THANK YOU FOR YOUR ATTENTION! ?