Andy Wang CIS 5930 Computer Systems Performance Analysis

Slides:



Advertisements
Similar presentations
Topics to be discussed Introduction Performance Factors Methodology Test Process Tools Conclusion Abu Bakr Siddiq.
Advertisements

1 CS533 Modeling and Performance Evaluation of Network and Computer Systems Capacity Planning and Benchmarking (Chapter 9)
Scalable Multi-Cache Simulation Using GPUs Michael Moeng Sangyeun Cho Rami Melhem University of Pittsburgh.
Lecture 19 Page 1 CS 111 Online Protecting Operating Systems Resources How do we use these various tools to protect actual OS resources? Memory? Files?
G. Alonso, D. Kossmann Systems Group
Silberschatz, Galvin and Gagne  2002 Modified for CSCI 399, Royden, Operating System Concepts Operating Systems Lecture 19 Scheduling IV.
OS Fall ’ 02 Performance Evaluation Operating Systems Fall 2002.
Performance Evaluation
Copyright © 1998 Wanda Kunkle Computer Organization 1 Chapter 2.1 Introduction.
OS Fall ’ 02 Performance Evaluation Operating Systems Fall 2002.
Principle of Functional Verification Chapter 1~3 Presenter : Fu-Ching Yang.
Analysis of Simulation Results Andy Wang CIS Computer Systems Performance Analysis.
1 Validation & Verification Chapter VALIDATION & VERIFICATION Very Difficult Very Important Conceptually distinct, but performed simultaneously.
Bottlenecks: Automated Design Configuration Evaluation and Tune.
Lecture 2b: Performance Metrics. Performance Metrics Measurable characteristics of a computer system: Count of an event Duration of a time interval Size.
Computing and the Web Operating Systems. Overview n What is an Operating System n Booting the Computer n User Interfaces n Files and File Management n.
Other Regression Models Andy Wang CIS Computer Systems Performance Analysis.
Test Loads Andy Wang CIS Computer Systems Performance Analysis.
Generic Approaches to Model Validation Presented at Growth Model User’s Group August 10, 2005 David K. Walters.
CSC 213 – Large Scale Programming Lecture 37: External Caching & (a,b)-Trees.
ICOM 6115: Computer Systems Performance Measurement and Evaluation August 11, 2006.
Task Analysis Methods IST 331. March 16 th
Lecture 11 Page 1 CS 111 Online Virtual Memory A generalization of what demand paging allows A form of memory where the system provides a useful abstraction.
Rassul Ayani 1 Performance of parallel and distributed systems  What is the purpose of measurement?  To evaluate a system (or an architecture)  To compare.
Modeling Virtualized Environments in Simalytic ® Models by Computing Missing Service Demand Parameters CMG2009 Paper 9103, December 11, 2009 Dr. Tim R.
OPERATING SYSTEMS CS 3530 Summer 2014 Systems and Models Chapter 03.
Understanding Performance in Operating Systems Andy Wang COP 5611 Advanced Operating Systems.
© 2003, Carla Ellis Vague idea “groping around” experiences Hypothesis Model Initial observations Experiment Data, analysis, interpretation Results & final.
Vague idea “groping around” experiences Hypothesis Model Initial observations Experiment Data, analysis, interpretation Results & final Presentation Experimental.
Page 1 Monitoring, Optimization, and Troubleshooting Lecture 10 Hassan Shuja 11/30/2004.
ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Evaluation – Metrics, Simulation, and Workloads Copyright 2004 Daniel.
Ch. 31 Q and A IS 333 Spring 2016 Victor Norman. SNMP, MIBs, and ASN.1 SNMP defines the protocol used to send requests and get responses. MIBs are like.
Test Loads Andy Wang CIS Computer Systems Performance Analysis.
UNIX U.Y: 1435/1436 H Operating System Concept. What is an Operating System?  The operating system (OS) is the program which starts up when you turn.
Installing Windows 7 Lesson 2.
OPERATING SYSTEMS CS 3502 Fall 2017
Jonathan Walpole Computer Science Portland State University
Debugging Intermittent Issues
Software Architecture in Practice
Network Performance and Quality of Service
Reasoning in Psychology Using Statistics
Outline What does the OS protect? Authentication for operating systems
CHAPTER 4 Designing Studies
Debugging Intermittent Issues
Understanding Results
Outline Introduction Characteristics of intrusion detection systems
Introduction to Operating System (OS)
So far we have covered … Basic visualization algorithms
Outline What does the OS protect? Authentication for operating systems
Programmable Logic Controllers (PLCs) An Overview.
Anne Pratoomtong ECE734, Spring2002
4.2 Experiments.
Understanding Randomness
GEOMATIKA UNIVERSITY COLLEGE CHAPTER 2 OPERATING SYSTEM PRINCIPLES
Page Replacement.
Experimental Design.
Experimental Design.
Multiprocessor and Real-Time Scheduling
Andy Wang CIS 5930 Computer Systems Performance Analysis
Understanding Performance in Operating Systems
Dynamic Program Analysis
CSE451 Virtual Memory Paging Autumn 2002
Reasoning in Psychology Using Statistics
General Test-Taking Strategies
CHAPTER 6 ELECTRONIC DATA PROCESSING SYSTEMS
Andy Wang CIS Computer Systems Performance Analysis
MECH 3550 : Simulation & Visualization
Performance.
Reasoning in Psychology Using Statistics
Presentation transcript:

Andy Wang CIS 5930 Computer Systems Performance Analysis Test Loads Andy Wang CIS 5930 Computer Systems Performance Analysis

Applying Test Loads to Systems Designing test loads Tools for applying test loads

Test Load Design Most experiments require applying test loads to system General characteristics of test loads already discussed How do we design test loads?

Types of Test Loads Real users Traces Load-generation programs

Loads Caused by Real Users Put real people in front of your system Two choices: Have them run pre-arranged set of tasks Have them do what they’d normally do Always a difficult approach Labor-intensive Impossible to reproduce given load Load is subject to many external influences But highly realistic

Traces Collect set of commands/accesses issued to system under test Replay against your system Some traces of common activities available from others (e.g., file accesses) But often don’t contain everything you need

Issues in Using Traces May be hard to alter or extend (e.g., scaling) Accuracy of trace may depend on behavior of system Hard to match hardware settings (e.g., number of disks/CPUs) Only truly representative of traced system and execution

Running Traces Need process that reads trace and issues commands from trace Process must be reasonably accurate in timing But must have little performance impact How about lengthy traces? If trace is large, can’t keep it all in main memory But be careful of disk overheads

Load-Generation Programs Create model for load you want to apply Write program implementing that model Program issues commands & requests synthesized from model E.g., if model says open file, program builds appropriate open() command

Building the Model Tradeoff between ease of creation and use of model vs. its accuracy (e.g., IPCs, causal ordering, etc.) Base model on everything you can find out about the real system behavior Which may include examining traces Consider whether model can be memoryless, or requires keeping track of what’s already happened (Markov)

Using the Model May require creation of test files or processes or network connections Model should include how they should be created Shoot for minimum performance impact

Applying Test Loads Most experiments will need multiple repetitions Details covered later in course More accurate if repetitions run in identical conditions Test software should work hard to duplicate conditions on each run

Example of Applying Test Loads For the Ficus example, want to measure update propagation for multiple replicas Test load is set of benchmarks involving file access & other activities Must apply test load for varying numbers of replicas

Factors in Designing This Experiment Setting up volumes and replicas Network traffic Other load on test machines Caching effects Automation of experiment Very painful to start each run by hand

Experiment Setup Need volumes to access, and replicas of each volume on various machines What is a representative volume? Must be sure that setup completes before running experiment Watch out for asynchronous writes

Network Traffic Issues If experiment is distributed, how is it affected by other traffic on network? Is network traffic used in test similar to real-world traffic? If not, do you need to run on isolated network? And/or generate appropriate background network load?

Controlling Other Load Generally, want control over most if not all processes running on test machines Ideally, use dedicated machines But also be careful about background and periodic jobs In Unix context, check carefully on cron and network-related daemons Watch out for screen savers

Caching Effects Many jobs run much faster if cached Other things also change Is caching effect part of what you’re measuring? If not, clean out caches between runs Or arrange experiment so caching doesn’t help But sometimes you should measure caching

Automating Experiments For all but very small experiments, it pays to automate Don’t want to start each run by hand Automation must be done with care Make sure previous run is really complete Make sure you completely reset your state Make sure the data is really collected!

Common Mistakes in Benchmarking Many people have made these You will make some of them, too But watch for them, so you don’t make too many

Only Testing Average Behavior Test workload should usually include divergence from average workload Few workloads remain at their average Behavior at extreme points can be different Particularly bad if only average behavior is used

Ignoring Skewness of Device Demands More generally, not including skewness of any component E.g., distribution of file accesses among set of users Leads to unrealistic conclusions about how system behaves

Loading Levels Controlled Inappropriately Not all methods of controlling load are equivalent (e.g., scaling a workload) More users (more cache misses) Less think time (fewer disk flushes) Increased resource demands per user Another example, PostMark, ISP-like workload Can increase the number of files, number of transactions, update size, file size

Loading Levels Controlled Inappropriately Choose methods that capture effect you are testing for Prefer methods allowing more flexibility in control over those allowing less

Caching Effects Ignored Caching occurs many places in modern systems Performance on given request usually very different depending on cache hit or miss Must understand how cache works Must design experiment to use it realistically

Inappropriate Buffer Sizes Slight changes in buffer sizes can greatly affect performance in many systems Match reality

Sampling Inaccuracies Ignored Remember that your samples are random events Use statistical methods to analyze them Beware of sampling techniques whose periodicity interacts with your measurements Best to randomize experiment order

Ignoring Monitoring Overhead Primarily important in design phase If possible, must minimize overhead to point where it is not relevant But also important to consider it in analysis

Not Validating Measurements Just because your measurement says something is so, it isn’t necessarily true Extremely easy to make mistakes in experimentation Check whatever you can Treat surprising measurements especially carefully

Not Ensuring Constant Initial Conditions Repeated runs are only comparable if initial conditions are the same Not always easy to undo everything previous run did E.g., level of storage/memory fragmentation But do your best Understand where you don’t have control in important cases

Not Measuring Transient Performance Many systems behave differently at steady state than at startup (or shutdown) That’s not always everything we care about Understand whether you should care If you should, measure transients too Not all transients due to startup/shutdown

Performance Comparison Using Device Utilizations Sometimes this is right thing to do But only if device utilization is metric of interest Remember that faster processors will have lower utilization on same load Which isn’t bad

Lots of Data, Little Analysis The data isn’t the product! The analysis is! So design experiment to leave time for sufficient analysis If things go wrong, alter experiments to still leave analysis time

White Slide