1 CS533 Modeling and Performance Evaluation of Network and Computer Systems The Art of Data Presentation (Chapters 10 and 11)

Slides:



Advertisements
Similar presentations
Week 2 Vocabulary Review Quiz Friday, September 3 rd.
Advertisements

TABLES and FIGURES BIOL 4001.
Frequency Distributions Quantitative Methods in HPELS 440:210.
Excursions in Modern Mathematics, 7e: Copyright © 2010 Pearson Education, Inc. 14 Descriptive Statistics 14.1Graphical Descriptions of Data 14.2Variables.
1 CS533 Modeling and Performance Evaluation of Network and Computer Systems The Art of Data Presentation.
Evaluation of Speech Detection Algorithm Project 1b Due October 11.
Project 1b Evaluation of Speech Detection Due: February 17 th, at the beginning of class.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Lecture Slides Elementary Statistics Eleventh Edition and the Triola.
1 Normal Probability Distributions. 2 Review relative frequency histogram 1/10 2/10 4/10 2/10 1/10 Values of a variable, say test scores In.
Introduction to Excel 2007 Part 2: Bar Graphs and Histograms February 5, 2008.
1.1 Displaying and Describing Categorical & Quantitative Data.
Graphic representations in statistics (part II). Statistics graph Data recorded in surveys are displayed by a statistical graph. There are some specific.
Chapter 5: Understanding and Comparing Distributions
Reading Graphs and Charts are more attractive and easy to understand than tables enable the reader to ‘see’ patterns in the data are easy to use for comparisons.
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 2-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
10-1 ©2006 Raj Jain The Art of Data Presentation.
Types of Data Displays Based on the 2008 AZ State Mathematics Standard.
LSP 120: Quantitative Reasoning and Technological Literacy Section 903 Özlem Elgün.
Chapter 2 Graphs, Charts, and Tables – Describing Your Data
Chapter 2 Describing Data Sets
Summarizing Measured Data Part I Visualization (Chap 10) Part II Data Summary (Chap 12)
Presenting information
Introduction to Excel 2007 Bar Graphs & Histograms Psych 209 February 1st, 2011.
CS1100: Computer Science and Its Applications Creating Graphs and Charts in Excel.
Week 4 LSP 120 Joanna Deszcz. 3 Types of Graphs used in QR  Pie Charts Very limited use Category sets must make a whole  XY Graphs or Line Graphs Use.
Introduction to Excel 2007 Part 3: Bar Graphs and Histograms Psych 209.
LSP 120: Quantitative Reasoning and Technological Literacy Özlem Elgün Prepared by Ozlem Elgun1.
Charts and Graphs V
Bell Ringer Get out your notebook and prepare to take notes on Chapter 9 Convert your height to inches and be prepared to write this value on the whiteboard.
Exploratory Data Analysis. Computing Science, University of Aberdeen2 Introduction Applying data mining (InfoVis as well) techniques requires gaining.
Let’s Review for… AP Statistics!!! Chapter 1 Review Frank Cerros Xinlei Du Claire Dubois Ryan Hoshi.
Secondary National Strategy Handling Data Graphs and charts Created by J Lageu, KS3 ICT Consultant – Coventry Based on the Framework for teaching mathematics.
©The McGraw-Hill Companies, Inc. 2008McGraw-Hill/Irwin Describing Data: Frequency Tables, Frequency Distributions, and Graphic Presentation Chapter 2.
CPE 619 The Art of Data Presentation
Graphical Analysis. Why Graph Data? Graphical methods Require very little training Easy to use Massive amounts of data can be presented more readily Can.
Variable  An item of data  Examples: –gender –test scores –weight  Value varies from one observation to another.
Ratio Games and Designing Experiments Andy Wang CIS Computer Systems Performance Analysis.
Quantitative Skills 1: Graphing
Integrating Graphics, Charts, Tables Into your technical writing documents.
The Scientific Method Honors Biology Laboratory Skills.
 Frequency Distribution is a statistical technique to explore the underlying patterns of raw data.  Preparing frequency distribution tables, we can.
Chapter 10 The Art of Data Presentation. Overview 2 Types of Variables Guidelines for Preparing Good Charts Common Mistakes in Preparing Charts Pictorial.
Graphs An Introduction. What is a graph?  A graph is a visual representation of a relationship between, but not restricted to, two variables.  A graph.
Chapter 2 Describing Data.
Graphing Data: Introduction to Basic Graphs Grade 8 M.Cacciotti.
ICOM 6115: Computer Systems Performance Measurement and Evaluation August 11, 2006.
© 1998, Geoff Kuenning Common Mistakes in Graphics Excess information Multiple scales Using symbols in place of text Poor scales Using lines incorrectly.
Unit 4 Statistical Analysis Data Representations.
GrowingKnowing.com © Frequency distribution Given a 1000 rows of data, most people cannot see any useful information, just rows and rows of data.
1.1 example these are prices for Internet service packages find the mean, median and mode determine what type of data this is create a suitable frequency.
Grade 8 Math Project Kate D. & Dannielle C.. Information needed to create the graph: The extremes The median Lower quartile Upper quartile Any outliers.
Lecture PowerPoint Slides Basic Practice of Statistics 7 th Edition.
GRAPHICS GUIDELINES MUSE/CE 11B Anagnos/Williamson From Pfeiffer, W.S Technical Writing: A Practical Approach. 5th Edition. Prentice Hall. New Jersey.
Highline Class, BI 348 Basic Business Analytics using Excel Chapter 03: Data Visualization: Tables & Charts 1.
Chapter 2: Frequency Distributions. Frequency Distributions After collecting data, the first task for a researcher is to organize and simplify the data.
Statistical Fundamentals: Using Microsoft Excel for Univariate and Bivariate Analysis Alfred P. Rovai Charts Overview PowerPoint Prepared by Alfred P.
Class Two Before Class Two Chapter 8: 34, 36, 38, 44, 46 Chapter 9: 28, 48 Chapter 10: 32, 36 Read Chapters 1 & 2 For Class Three: Chapter 1: 24, 30, 32,
MATH & SCIENCE.  Pre-Algebra  Elementary algebra  Intermediate algebra  Coordinate geometry  Plane geometry  Trigonometry.
Graphical Presentation Dr. Amjad El-Shanti MD, PMH,Dr PH University of Palestine 2016.
Describing Data Week 1 The W’s (Where do the Numbers come from?) Who: Who was measured? By Whom: Who did the measuring What: What was measured? Where:
Integrating Graphics, Illustrations, Figures, Charts.
Descriptive Statistics
Welcome to Week 02 Tues MAT135 Statistics
Probability & Statistics Displays of Quantitative Data
Frequency Distributions
Unit 4 Statistical Analysis Data Representations
Ms jorgensen Unit 1: Statistics and Graphical Representations
CHAPTER 1: Picturing Distributions with Graphs
More on Data Presentation CS 239 Experimental Methodologies for System Software Peter Reiher May 24, 2007.
Presentation transcript:

1 CS533 Modeling and Performance Evaluation of Network and Computer Systems The Art of Data Presentation (Chapters 10 and 11)

2 Introduction An analysis whose results cannot be understood is as good as one that is never performed. General techniques –Line charts, bar charts, pie charts, histograms Some specific techniques –Gantt charts, Kiviat graphs … A picture is worth a thousand words –Plus, easier to look at, more interesting It’s not what you say, but how you say it. – A. Putt

3 Outline Types of Variables Guidelines Common Mistakes Pictorial Games Special Purpose Charts Decision Maker’s Games Ratio Games

4 Types of Variables Qualitative (Categorical) variables –Have states or subclasses –Can be ordered or unordered Ex: PC, minicomputer, supercomputer  ordered Ex: scientific, engineering, educational  unordered Quantitative variables –Numeric levels –Discrete or continuous Ex: number of processors, disk blocks, etc. is discrete Ex: weight of a portable computer is continuous Variables Qualitative OrderedUnorderedDiscreteContinuous Quantitative

5 Outline Types of Variables Guidelines Common Mistakes Pictorial Games Special Purpose Charts Decision Maker’s Games Ratio Games

6 Guidelines for Good Graphs (1 of 5) Again, “art” not “rules”. Learn with experience. Recognize good/bad when see it. Require minimum effort from reader –Perhaps most important metric –Given two, can pick one that takes less reader effort a b c Direct Labeling a b c Legend Box Ex:

7 Guidelines for Good Graphs (2 of 5) Maximize information –Make self-sufficient –Key words in place of symbols Ex: “PIII, 850 MHz” and not “System A” Ex: “Daily CPU Usage” not “CPU Usage” –Axis labels as informative as possible Ex: “Response Time in seconds” not “Response Time” –Can help by using captions, too Ex: “Transaction response time in seconds versus offered load in transactions per second.”

8 Guidelines for Good Graphs (3 of 5) Minimize ink –Maximize information-to-ink ratio –Too much unnecessary ink makes chart cluttered, hard to read Ex: no gridlines unless needed to help read –Chart that gives easier-to-read for same data is preferred 1 Availability.1 Unavailability Same data Unavail = 1 – avail Right better

9 Guidelines for Good Graphs (4 of 5) Use commonly accepted practices –Present what people expect –Ex: origin at (0,0) –Ex: independent (cause) on x-axis, dependent (effect) on y-axis –Ex: x-axis scale is linear –Ex: increase left to right, bottom to top –Ex: scale divisions equal Departures are permitted, but require extra effort from reader so use sparingly

10 Guidelines for Good Graphs (5 of 5) Avoid ambiguity –Show coordinate axes –Show origin –Identify individual curves and bars –Do not plot multiple variables on same chart

11 Guidelines for Good Graphs (Summary) Checklist in Jain, Box 10.1, p. 143 The more “yes” answers, the better –But, again, may consciously decide not to follow these guidelines if better without them In practice, takes several trials before arriving at “best” graph Want to present the message the most: accurately, simply, concisely, logically

12 Outline Types of Variables Guidelines Common Mistakes Pictorial Games Special Purpose Charts Decision Maker’s Games Ratio Games

13 Common Mistakes (1 of 6) Presenting too many alternatives on one chart Guidelines –More than 5 to 7 messages is too many (Maybe related to the limit of human short- term memory?) –Line chart with 6 curves or less –Column chart with 10 bars –Pie chart with 8 components –Each cell in histogram should have 5+ values

14 Common Mistakes (2 of 6) Presenting many y-variables on a single chart –Better to make separate graphs –Plotting many y-variables saves space, but better to requires reader to figure out relationship Space constraints for journal/conf! throughpututilization Response time

15 Common Mistakes (3 of 6) Using symbols in place of text More difficult to read symbols than text Reader must flip through report to see symbol mapping to text –Even if “save” writers time, really “wastes” it since reader is likely to skip! Y=1 Y=3 Y=5   1 job/sec 3 jobs/sec 5 jobs/sec Arrival rate Service rate

16 Common Mistakes (4 of 6) Placing extraneous information on the chart –Goal is to convey particular message, so extra information is distracting –Ex: using gridlines only when exact values are expected to be read –Ex: “per-system” data when average data is only part of message required

17 Common Mistakes (5 of 6) Selecting scale ranges improperly –Most are prepared by automatic programs (excel, gnuplot) with built-in rules Give good first-guess –But May include outlying data points, shrinking body May have endpoints hard to read since on axis May place too many (or too few) tics –In practice, almost always over-ride scale values

18 Common Mistakes (6 of 6) Using a Line Chart instead of Column Chart –Lines joining successive points signify that they can be approximately interpolated –If don’t have meaning, should not use line chart MIPS - No linear relationship between processor types! - Instead, use column chart

19 Outline Types of Variables Guidelines Common Mistakes Pictorial Games Special Purpose Charts Decision Maker’s Games Ratio Games

20 Pictorial Games Can deceive as easily as can convey meaning Note, not always a question of bad practice but should be aware of techniques when reading performance evaluation

21 Non-Zero Origins to Emphasize (1 of 2) Normally, both axes meet at origin By moving and scaling, can magnify (or reduce!) difference MINE YOURS MINE YOURS Which graph is better?

22 Non-Zero Origins to Emphasize (2 of 2) Choose scale so that vertical height of highest point is at least ¾ of the horizontal offset of right-most point –Three-quarters rule (And represent origin as 0,0) MINE YOURS

23 Using Double-Whammy Graph Two curves can have twice as much impact –But if two metrics are related, knowing one predicts other … so use one! Response Time Goodput Number of Users

24 Plotting Quantities without Confidence Intervals When random quantification, representing mean (or median) alone (or single data point!) not enough MINE YOURS MINE YOURS (Worse) (Better)

25 Pictograms Scaled by Height If scaling pictograms, do by area not height since eye drawn to area –Ex: twice as good  doubling height quadruples area MINEYOURSMINEYOURS (Worse) (Better)

26 Using Inappropriate Cell Size in Histogram Getting cell size “right” always takes more than one attempt –If too large, all points in same cell –If too small, lacks smoothness Frequency Frequency Same data. Left is “normal” and right is “exponential”

27 Using Broken Scales in Column Charts By breaking scale in middle, can exaggerate differences –May be trivial, but then looks significant –Similar to “zero origin” problem System A-F

28 Outline Types of Variables Guidelines Common Mistakes Pictorial Games Special Purpose Charts Decision Maker’s Games Ratio Games

29 Scatter Plot (1 of 2) Useful in statistical analysis Also excellent for huge quantities of data –Can show patterns otherwise invisible –(Another example next) (Geoff Kuenning, 1998)

30 Scatter Plot (2 of 2)

31 Box and Whisker’s Plot Shows (range, median, quartiles) all in one: Variations: minimummaximumquartile median (Geoff Kuenning, 1998)

32 Stem and Leaf Display “Histogram-lite” for analysis w/out software Scores: 34, 81, 75, 51, 82, 96, 55, 66, 95, 87, 82, 88, 99, 50, 85, 72 9| | | 5 2 6| 6 5| | 3| 4

33 Gantt Charts (1 of 2) Resource too high is bottleneck Resource too low could be underutilization Want mix of jobs with significant overlap –Show with Gantt Chart In general, represents Boolean condition … on or off. Length of lines represent busy. CPU I/O Network (Example 10.1 Page 151 Next)

34 Gantt Charts (2 of 2) - Example ABCDTime ABCDTime (Jain, Example 10.1 Page 151) Pattern is A and not-A first Rest are not-R and R

35 Kiviat Graphs (1 of 2) Also called “star charts” or “radar plots” ½ are HB, ½ are LB Note, don’t have to have all at 100% can be “10% busy”, say Useful for looking at balance between HB and LB metrics (“Star” is best) (Geoff Kuenning, 1998) HB LB HB LB

36 Kiviat Graphs (2 of 2) Commonly occurring shapes can be useful to characterize system –“CPU keelboat” (CPU bound) (fig 10.19) (A shallow, covered riverboat for freight) –“I/O wedge” (I/O bound) (fig 10.20) –“I/O arrow” (CPU + I/O) (fig 10.21) Most for data processing, but can be applied to other systems. Ex: network HB MetricsLB Metrics App throughputApp response time Link utilizationLink overhead Router utilizationRouter overhead % packets arrive# duplicates % implicit acks% packets with error

37 Outline Types of Variables Guidelines Common Mistakes Pictorial Games Special Purpose Charts Decision Maker’s Games Ratio Games

38 Decision Maker’s Games Even if perf analysis is correctly done, may not convince decision makers (boss, conference referees, thesis advisor…) –Box 10.2, p. 162 has list of reasons Most common: 1) “More analysis.” This is always true. Does not mean analysis done is not valuable. 2) “Alternate workload”. Since based on past, can always be questioned as good future workload Lead to endless discussion (“rat holes”). Can “head off” criticism by stating this.

39 Outline Types of Variable Guidelines Common Mistakes Pictorial Games Special Purpose Charts Decision Maker’s Games Ratio Games

40 Ratio Games (Ch 11) A common way to play games with competitors Two ratios with different bases cannot be compared or averaged –Doing so is called “ratio game” Knowledge of “ratio games” will help protect ourselves, avoid doing If you can’t convince them, confuse them. – Truman’s Law

41 Games with Base System Beware! –Normalize each system’s performance for each workload by system A and average ratios –Normalize each system’s performance for each workload by system B and average ratiosWork- Systemload 1load 2Average A B102015Work- Systemload 1load 2Average A B111

42 Games with Ratio Metrics Choose a metric that is ratio of two other metrics. Power = thrput/respTime NetworkThrputRespTimePower A1025 B414 Suggests that A is better. But maybe it should be: power = thrput/respTime 2  Power A = 2.5, Power B = 4

43 Games with Relative Performance Metric may be specified but can still get ratio game if two are on different machines MFLOPS, System X-Y, accelerators A-B AlternativeWithoutWithRatio A on X B on Y (Base systems are different)

44 Games with Percentages (1 of 2) Percentages are really ratios, but disguised –So can play games A is worse under both tests  but it looks better in Total!

45 Games with Percentages (2 of 2) Percentages –Have bigger psychological impact 1000% sounds bigger than 10-fold –Are great when both original and final performance are lousy Ex: payment was $40 per week, is now $80 When used, base should be initial, not final value –Ex: Price was $400, now $100 Drop of 400%! But that makes no sense

46 Strategies for Winning Ratio Game (1 of 2) (Again, don’t do these, just be aware of them so no-one does them to you) If one system is better by all measures, a ratio game won’t (usually) work –Although, remember percent-passes example! –And selecting the base also lets you change the magnitude of the difference If each system wins on some measures, ratio games might be possible –May have to try all bases

47 Strategies for Winning Ratio Game (2 of 2) Work- Systemload 1 load 2Base BBase A A B For LB metrics, use your system as the base –Ex: response time For HB metrics, use the other system as a base –Ex: throughput If possible, adjust lengths of benchmarks –Run longer when your system performs best –Run short when your system is worst –This gives greater weight to your strengths

48 Extra Credit for Next Class Bring in one either notoriously bad or exceptionally good example of data presentation –The bad ones may be more fun From proceedings, technical documentation, newspaper … Make copies before class or send to me and I’ll make copies We’ll discuss why good/bad