
Generating Synthetic Workloads Using Iterative Distillation
Zachary Kurmas (Georgia Tech), Kimberly Keeton (HP Labs), Kenneth Mackenzie (Reservoir Labs, Inc.)

Motivation
Storage system designs must be evaluated with respect to many workloads, because a design change may be beneficial to some users and detrimental to others.
[Figure: CDFs of I/O latency (seconds vs. % of I/Os) for a new disk array under three example workloads: a database workload, a server workload, and a file server workload.]

Approach
- Measure the target production workload's high-level characteristics (attribute-values), e.g.:
    Mean request size: 8 KB
    Mean interarrival time: .04 ms
    Read percentage: 78%
    Location distribution: (.01, .02, .0, .09, .14, .03, .12, ...)
- Generate a synthetic workload with the same characteristics. Both the production trace and the synthetic workload are lists of request records such as (R, 1024, 120932, 124) and (W, 8192, 120834, 126).

Goal: the workload trace and the synthetic workload should be interchangeable:
- Both workloads have similar response times.
- Both workloads should lead to similar design decisions.

Two sources for evaluation workloads: real vs. synthetic
- Trace of a real workload: the list of I/O requests made by a production workload. Perfectly accurate, but large, inflexible, and difficult to obtain (due to security concerns).
- Synthetic workload: randomly generated to maintain high-level properties. It has a compact representation, is easily modified, and the compact representation contains no specific (sensitive) data; however, synthetic workloads are rarely accurate.
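The generation step above can be sketched in a few lines. This is a deliberately minimal, hypothetical generator: it reproduces only the poster's example attribute-values (mean request size, mean interarrival time, read percentage) and ignores location patterns entirely, which is exactly the kind of naive synthetic workload the poster argues is rarely accurate.

```python
import random

def synthesize(n, mean_size=8192, mean_iat=0.04, read_pct=0.78,
               max_location=2**20, seed=42):
    """Generate n synthetic I/O requests (op, size, location, time)
    matching a few high-level attribute values.

    Assumptions (not from the poster): constant sizes at the mean,
    uniformly random locations, and Poisson (exponential) arrivals.
    """
    rng = random.Random(seed)
    t = 0.0
    trace = []
    for _ in range(n):
        op = 'R' if rng.random() < read_pct else 'W'
        size = mean_size                       # simplest model: constant
        loc = rng.randrange(max_location)      # uniform: no locality
        t += rng.expovariate(1.0 / mean_iat)   # exponential interarrivals
        trace.append((op, size, loc, round(t, 3)))
    return trace
```

Such a workload "looks" like the target on paper (same read percentage, same mean interarrival time) yet can behave very differently on a real array, which motivates the distillation process below.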
Iterative refinement
Evaluating the synthetic workload against the target performance:
- Initial attribute list: 50% error
- Iteration 1: 25% error
- Iteration 2: 7% error
- Iteration 3: 3% error
As attributes are added, the synthetic workload's performance becomes more similar to the production workload's.

Problem
We don't know in advance which high-level characteristics will lead to representative synthetic workloads; workloads that "look" alike do not necessarily behave alike.

Key observations
- Workload performance is determined by relationships within the sequence of requests and between the parameters of different requests.
- Attributes that measure the same set of parameters describe the same relationships.
- We can test the effect of a relationship by "subtracting" it from the target workload.
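The iterate-until-within-threshold loop can be sketched as a greedy search. This is a simplification of the poster's method (which first picks an attribute *group* via subtracted workloads, then a specific attribute): here `evaluate` is a hypothetical callback that synthesizes a workload from a set of attributes and returns its performance error against the target.

```python
def distill(candidate_attributes, evaluate, threshold=0.05):
    """Greedily add attributes until the synthetic workload's
    error (as reported by evaluate) falls within the threshold.

    evaluate(attrs) -> error is assumed to close over the target
    workload; this greedy selection is a sketch, not the Distiller's
    actual group-then-attribute selection.
    """
    candidates = list(candidate_attributes)   # don't mutate the caller's list
    chosen = []
    error = evaluate(chosen)
    while error > threshold and candidates:
        # Pick the candidate that most reduces the error this iteration.
        best = min(candidates, key=lambda a: evaluate(chosen + [a]))
        chosen.append(best)
        candidates.remove(best)
        error = evaluate(chosen)
    return chosen, error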
Attribute groups and the subtractive method
A trace record has the form (Op, Size, Location, Time). Relationships among the columns drive performance:
- Patterns among locations may produce locality: spatially local locations form a "run".
- Patterns among arrival times may produce burstiness: short interarrival times produce bursts.
- Patterns between location and arrival time may offset burstiness.
Attributes describe these patterns.

Subtractive method
- Permuting the location column destroys all relationships involving location. The resulting difference in performance estimates the combined effect of the location attributes.
- Rotating the location column breaks the relationships between location and the other parameters, but preserves the relationships among locations. The rotated workload maintains all of the original relationships except those involving location.

Results of the subtractive method
- The Distiller cannot accurately synthesize the target workload using only empirical distributions of the individual I/O request parameters.
- The difference between the performance lines for location indicates that a location attribute is needed; the similarity of the request-size lines indicates that no request-size attribute is needed.
- A Markov model is able to generate a representative list of location values, and results in a slightly more accurate synthetic workload.
- The attributes chosen in later iterations produce a very accurate synthetic workload.

High-level approach: iteratively add attributes until the synthetic workload's performance is within a threshold of the target's.
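The two "subtraction" transformations are simple column operations on the trace. A minimal sketch, assuming a trace of (op, size, location, time) tuples: permuting shuffles the location column independently of everything else, while rotating shifts it so locations keep their mutual ordering (and hence their runs) but lose their alignment with the other columns.

```python
import random

def permute_locations(trace, seed=0):
    """Destroy every relationship involving location: shuffle the
    location column independently of the other columns."""
    locs = [r[2] for r in trace]
    random.Random(seed).shuffle(locs)
    return [(op, size, loc, t)
            for (op, size, _, t), loc in zip(trace, locs)]

def rotate_locations(trace, k=1):
    """Break the relationships between location and the other
    parameters, but preserve the ordering among locations (runs
    survive the rotation)."""
    locs = [r[2] for r in trace]
    locs = locs[k:] + locs[:k]
    return [(op, size, loc, t)
            for (op, size, _, t), loc in zip(trace, locs)]
```

Comparing the target workload's performance against the permuted workload estimates how much location patterns matter; the rotated workload then serves as the baseline when testing specific location attributes.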
The Distiller's loop
1. Start with an initial attribute list and evaluate the synthetic workload it produces.
2. If the synthetic workload's error is within the threshold, stop.
3. Otherwise, choose an attribute group, choose a specific attribute from the library, add the new attribute to the list, and re-evaluate.

Choosing a specific attribute
To test a specific location attribute, we generate a synthetic workload using that attribute and compare it to the "rotated"-location workload, because in both workloads the relationships between location and the other parameters are broken. For example, a location column generated by an attribute that measures runs preserves the runs while choosing the other locations randomly.

Library of attributes, by attribute group
- Op type: read percentage; Markov model.
- Request size: distribution of request sizes; Markov model of request sizes.
- Location: distribution of locations; LRU stack distance; jump distance; run count.
- Arrival time: distribution of interarrival times; Markov model of interarrival times; clustering.
- Location + op type: distribution of read locations; distribution of write locations; joint distribution.
- Location + request size: joint distribution; request size conditioned upon the chosen location.
- Other groups: op type + arrival time; op type + arrival time + request size; request size + arrival time.

Problem
- Testing every attribute in the library takes too long.
- Some attributes are redundant or incompatible, and many attributes are not useful.

Solution
1. Partition the attributes into groups, such that each group of attributes measures the same set of request parameters and therefore describes the same relationships.
2. Evaluate all attributes in an attribute group using only two workloads: one that maintains the relationship under test, and one that does not.
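One of the location attributes named in the library is the LRU stack distance. As a minimal sketch (the Distiller's exact formulation may differ), the stack distance of a reference is how many distinct locations were touched since the last access to the same location; first references have no distance:

```python
def lru_stack_distances(locations):
    """For each reference, report its LRU stack distance: the number
    of distinct locations accessed since the previous reference to
    this location, or None for a first reference."""
    stack = []   # most recently used location at the front
    dists = []
    for loc in locations:
        if loc in stack:
            d = stack.index(loc)   # depth in the LRU stack
            stack.pop(d)
            dists.append(d)
        else:
            dists.append(None)     # cold (first) reference
        stack.insert(0, loc)       # loc becomes most recently used
    return dists
```

A workload with strong temporal locality produces many small stack distances, so the distribution of these values is a compact attribute that a generator can try to reproduce.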

Trace format
Each record in the trace of the production workload has four fields, (Op, Size, Location, Time), e.g. (W, 8192, 120834, 126):
- Operation type: read (R) or write (W).
- Request size: the number of bytes accessed.
- Location: identifies the location of the data on disk.
- Arrival time: the time the request was made, in seconds from the beginning of the trace (interarrival times, IATs, can be derived from it).

The trace of the production workload maintains all of the relationships among these parameters.
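Reading such records is straightforward. A sketch of a parser for the record format shown on the poster; the sample lines in the test are hypothetical records composed in that format, not taken from a real trace.

```python
from collections import namedtuple

Request = namedtuple('Request', 'op size location time')

def parse_trace(lines):
    """Parse records like '(W, 1024, 120834, .111)' into typed
    Request tuples: op (str), size in bytes (int), disk location
    (int), arrival time in seconds (float)."""
    trace = []
    for line in lines:
        fields = line.strip('()\n ').split(',')
        op, size, loc, t = (f.strip() for f in fields)
        trace.append(Request(op, int(size), int(loc), float(t)))
    return trace
```

Keeping the parsed trace as typed tuples makes the column operations used by the subtractive method (permuting or rotating one field) easy to express.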