CPE 619 The Art of Data Presentation


CPE 619 The Art of Data Presentation Aleksandar Milenković The LaCASA Laboratory Electrical and Computer Engineering Department The University of Alabama in Huntsville http://www.ece.uah.edu/~milenka http://www.ece.uah.edu/~lacasa

Overview Types of Variables Guidelines for Preparing Good Charts Common Mistakes in Preparing Charts Pictorial Games Special Charts for Computer Performance Gantt Charts Kiviat Graphs Schumacher Charts Decision Maker’s Games

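Types of Variables Qualitative, ordered: type of computer (supercomputer, minicomputer, microcomputer) Qualitative, unordered: type of workload (scientific, engineering, educational) Quantitative, discrete: number of processors Quantitative, continuous: response time of the system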

Guidelines for Preparing Good Charts 1) Require minimum effort from the reader: direct labeling vs. a legend box 2) Maximize information: words in place of symbols; clearly label the axes

Guidelines (cont’d) 3) Minimize ink: e.g., drop unnecessary grid lines and show more detail 4) Use commonly accepted practices: origin at (0,0); independent variable (cause) along the x axis; dependent variable (effect) along the y axis; linear scales; increasing scales; equal divisions 5) Avoid ambiguity: show the coordinate axes, scale divisions, and origin; identify individual curves and bars

Checklist for Good Graphics Are both coordinate axes shown and labeled? Are the axis labels self-explanatory and concise? Are the scales and divisions shown on both axes? Are the minimum and maximum of the ranges shown on the axes appropriate to present maximum information? Is the number of curves reasonably small? Do all graphs use the same scale? Is there no curve that can be removed without reducing information? Are the curves on a line chart individually labeled? Are the cells in a bar chart individually labeled? Are all symbols on the graph accompanied by appropriate textual explanations? If the curves cross, are the line patterns different to avoid confusion? Are the units of measurement indicated? Is the horizontal scale increasing from left to right? Is the vertical scale increasing from bottom to top? Are the grid lines aiding in reading the curves? Does this whole chart add to the information available to the reader? Are the scales contiguous? Is the order of bars in a bar chart systematic? If the vertical axis represents a random quantity, are confidence intervals shown? Are there no curves, symbols, or text on the graph that can be removed without affecting the information? Is there a title for the whole chart? Is the chart title self-explanatory and concise? For bar charts with unequal class intervals, are the area and width representative of the frequency and interval? Do the variables plotted on this chart give more information than other alternatives? Does the chart clearly bring out the intended message? Is the figure referenced and discussed in the text of the report?

Common Mistakes in Preparing Charts Presenting too many alternatives on a single chart: a reader can take in at most 5 to 7 messages, so use at most 6 curves in a line chart, no more than 10 bars in a bar chart, and at most 8 components in a pie chart Presenting many y variables on a single chart
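The size limits above are easy to check mechanically before a chart goes into a report. A minimal sketch, assuming the chart kind is known as a string; `MAX_ITEMS` and `too_crowded` are hypothetical names, and the thresholds are the rule-of-thumb values from the slide, not hard limits.

```python
# Rule-of-thumb maxima taken from the slide ("max 5 to 7 messages").
MAX_ITEMS = {"line": 6, "bar": 10, "pie": 8}


def too_crowded(chart_kind: str, n_items: int) -> bool:
    """True if the chart shows more curves, bars, or slices than the guideline allows."""
    return n_items > MAX_ITEMS[chart_kind]


print(too_crowded("line", 9))  # True: 9 curves exceed the limit of 6
print(too_crowded("pie", 5))   # False: 5 slices are within the limit of 8
```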

Common Mistakes in Charts (cont’d) Using symbols in place of text Placing extraneous information on the chart E.g., grid lines, granularity of the grid lines Selecting scale ranges improperly Automatic selection by programs may not be appropriate

Common Mistakes in Charts (cont’d) Using a line chart in place of a column chart: a line implies continuity [Figure: MIPS (vertical axis) for CPU types 8000, 8100, 8200, and 8300 (horizontal axis)]
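Because a line implies continuity, the choice between a line and a column chart can be driven by the type of the x-axis variable. The sketch below encodes a simplified rule (an assumption, not a complete guideline): use a line only when x is quantitative and continuous, otherwise use columns. `VarType`, `chart_for_x`, and the example variables are hypothetical names chosen for illustration.

```python
from enum import Enum


class VarType(Enum):
    QUALITATIVE_UNORDERED = "qualitative, unordered"
    QUALITATIVE_ORDERED = "qualitative, ordered"
    QUANTITATIVE_DISCRETE = "quantitative, discrete"
    QUANTITATIVE_CONTINUOUS = "quantitative, continuous"


def chart_for_x(var_type: VarType) -> str:
    """Suggest a chart type given the type of the x-axis variable.

    Simplified rule (assumption): a line implies continuity between points,
    so use it only for a continuous x variable; otherwise use columns.
    """
    if var_type is VarType.QUANTITATIVE_CONTINUOUS:
        return "line"
    return "column"


# Hypothetical x-axis variables and their assumed types.
examples = {
    "CPU type (8000, 8100, 8200, 8300)": VarType.QUALITATIVE_UNORDERED,
    "elapsed time (s)": VarType.QUANTITATIVE_CONTINUOUS,
}
for name, var_type in examples.items():
    print(f"{name}: {chart_for_x(var_type)} chart")
```

With this rule the MIPS-per-CPU-type data above maps to a column chart, since the CPU types are categories with nothing in between them.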

Pictorial Games Using non-zero origins to emphasize the difference: the same data can be drawn to say "Mine is much better than yours" (emphasize the difference) or "Mine and yours are almost the same" (conceal the difference) Three-quarter-high rule: the height of the highest point should be at least ¾ of the horizontal offset of the rightmost point (height/width ≥ 3/4)

Pictorial Games (cont’d) Using a double-whammy graph for dramatization: plotting two related metrics on the same chart

Pictorial Games (cont’d) Plotting random quantities without showing confidence intervals: for the means of two random variables, means alone are not enough. Overlapping confidence intervals usually mean that the two random quantities are statistically indistinguishable.
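A minimal sketch of the check implied above: compute a confidence interval for each mean and test whether the intervals overlap. It assumes a reasonably large sample so the normal quantile (about 1.96 at 95%) is acceptable; for small samples a Student t quantile would be more appropriate. The sample values and the names `mean_ci` and `intervals_overlap` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(samples, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation, assumed large n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    m = mean(samples)
    half_width = z * stdev(samples) / sqrt(len(samples))
    return (m - half_width, m + half_width)


def intervals_overlap(ci_a, ci_b):
    """True if two (low, high) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Hypothetical response times (seconds) measured on two systems.
mine = [10.1, 9.8, 10.4, 10.0, 9.9, 10.3, 10.2, 9.7]
yours = [10.4, 10.0, 10.6, 10.2, 10.5, 9.9, 10.3, 10.1]

ci_mine, ci_yours = mean_ci(mine), mean_ci(yours)
print(ci_mine, ci_yours)
print(intervals_overlap(ci_mine, ci_yours))
```

Here the means differ (10.05 vs. 10.25), but the intervals overlap, so plotting only the two means would overstate the difference.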

Pictorial Games (cont’d) Pictograms scaled by height: with Mine performance = 2 and Yours performance = 1, doubling the height of the picture also doubles its width, so Area(Mine) = 4 × Area(Yours) and a 2:1 difference looks like 4:1
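The distortion follows from the fact that area grows with the square of the linear scale factor. The sketch below computes that area ratio and, as one commonly suggested alternative that the slide does not itself propose, the per-side scale factor that would make the area proportional to performance. Both function names are hypothetical.

```python
import math


def area_ratio_if_scaled_by_height(perf_ratio: float) -> float:
    """Scaling a picture's height by k (keeping its aspect ratio) scales its area by k**2."""
    return perf_ratio ** 2


def side_scale_for_honest_area(perf_ratio: float) -> float:
    """Per-side scale factor so that the area, not the height, matches perf_ratio."""
    return math.sqrt(perf_ratio)


mine, yours = 2, 1
print(area_ratio_if_scaled_by_height(mine / yours))  # 4.0
print(side_scale_for_honest_area(mine / yours))      # about 1.414
```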

Pictorial Games (cont’d) Using inappropriate cell size in histograms [Figure: two histograms of frequency vs. response time; the panel with cells [0,2), [2,4), [4,6), [6,8), [8,10), [10,12) is titled "Normal distribution", and the panel with cells [0,6), [6,12) is titled "Exponential distribution"]
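To see how the cell size alone can change the apparent shape, the sketch below bins one set of hypothetical response times with cells of width 2 and of width 6, matching the two cell layouts in the figure. The data and the `histogram` helper are illustrative assumptions, not the data behind the original figure.

```python
from collections import Counter


def histogram(data, cell, lo=0, hi=12):
    """Count observations in cells [lo, lo+cell), [lo+cell, lo+2*cell), ... up to hi."""
    counts = Counter(int((x - lo) // cell) for x in data if lo <= x < hi)
    return [counts[i] for i in range(int((hi - lo) // cell))]


# Hypothetical response times (seconds), one value per observation.
data = [1] * 1 + [3] * 4 + [5] * 10 + [7] * 6 + [9] * 2 + [11] * 1

print(histogram(data, cell=2))  # [1, 4, 10, 6, 2, 1]: rises then falls
print(histogram(data, cell=6))  # [15, 9]: looks like a decaying distribution
```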

Pictorial Games (cont’d) Using broken scales in column charts to amplify differences [Figure: response time of systems A to F drawn twice, once on a full vertical scale from 0 to 12 and once on a broken scale from 9 to 12]
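The amplification is simple arithmetic: a column's drawn height is its value minus the axis origin, so moving the origin up inflates the ratio of heights. A minimal sketch with two hypothetical response times inside the 9 to 12 range of the figure; `apparent_ratio` is a hypothetical name.

```python
def apparent_ratio(a: float, b: float, axis_origin: float = 0.0) -> float:
    """Ratio of the drawn column heights when the vertical axis starts at axis_origin."""
    return (a - axis_origin) / (b - axis_origin)


# Hypothetical response times of two systems.
resp_c, resp_d = 11.0, 10.0
print(apparent_ratio(resp_c, resp_d))                   # 1.1: full scale from 0
print(apparent_ratio(resp_c, resp_d, axis_origin=9.0))  # 2.0: broken scale from 9
```

A 10% difference is drawn as a 100% difference once the scale starts at 9, which is the same effect as the non-zero origin game.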

Special Charts for Computer Performance Gantt charts Kiviat Graphs Schumacher's charts

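Gantt Charts Show the relative duration of a number of conditions [Figure: Gantt chart with one bar per resource (CPU, I/O channel, network) on a utilization axis from 0% to 100%; segment labels read 60 (CPU), 20 and 20 (I/O channel), and 30, 10, 5, and 15 (network)]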

Example: Data for Gantt Chart

Draft of the Gantt Chart

Final Gantt Chart
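A Gantt chart is built from the fraction of time spent in each combination of busy and idle resources. The sketch below uses a hypothetical state table (`STATES`, `utilizations`, and `overlap` are illustrative names, and the percentages are invented) to compute each resource's utilization, which sets the bar lengths, and the overlap between two resources, which sets the shared segments.

```python
# Hypothetical measurement: percentage of time spent in each combination of
# busy (1) / idle (0) states, ordered as (CPU, I/O channel, network).
STATES = {
    (1, 1, 1): 5,
    (1, 1, 0): 15,
    (1, 0, 1): 30,
    (1, 0, 0): 10,
    (0, 1, 1): 10,
    (0, 1, 0): 10,
    (0, 0, 1): 15,
    (0, 0, 0): 5,
}
RESOURCES = ("CPU", "I/O channel", "network")


def utilizations(states):
    """Utilization of each resource = total time of all states in which it is busy."""
    util = {name: 0 for name in RESOURCES}
    for flags, pct in states.items():
        for name, busy in zip(RESOURCES, flags):
            if busy:
                util[name] += pct
    return util


def overlap(states, i, j):
    """Percentage of time resources i and j are busy simultaneously."""
    return sum(pct for flags, pct in states.items() if flags[i] and flags[j])


assert sum(STATES.values()) == 100  # the states must cover all of the time
print(utilizations(STATES))  # {'CPU': 60, 'I/O channel': 40, 'network': 60}
print(overlap(STATES, 0, 1))  # 20: CPU and I/O channel busy together
```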

Kiviat Graphs A radial chart with an even number of metrics; higher-is-better (HB) and lower-is-better (LB) metrics alternate; the ideal shape is a star [Figure: Kiviat graph with axes CPU Busy, CPU in Supervisor State, CPU in Problem State, CPU Wait, Any Channel Busy, Channel Only Busy, CPU/Channel Overlap, CPU Only Busy]

Kiviat Graph for a Balanced System [Figure: Kiviat graph over the same eight metrics: CPU Busy, CPU in Supervisor State, CPU in Problem State, CPU Wait, Any Channel Busy, Channel Only Busy, CPU/Channel Overlap, CPU Only Busy] Problem: the metrics are inter-related: CPU busy = problem state + supervisor state; CPU wait = 100 - CPU busy; channel only busy = any channel busy - CPU/channel overlap; CPU only busy = CPU busy - CPU/channel overlap
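Because of these relationships, only four of the eight values are independent. The sketch below derives all eight axes (in percent) from four measured quantities using exactly the equations above. The function name and the input values are hypothetical, and the HB/LB tags assume the alternation on the slide, starting with CPU busy as HB.

```python
def kiviat_metrics(problem_state, supervisor_state, any_channel_busy, cpu_channel_overlap):
    """Derive the eight Kiviat metrics (in %) from four measured quantities.

    Axis order follows the slide; HB and LB metrics alternate around the circle.
    """
    cpu_busy = problem_state + supervisor_state
    return {
        "CPU busy (HB)": cpu_busy,
        "CPU in supervisor state (LB)": supervisor_state,
        "CPU in problem state (HB)": problem_state,
        "CPU wait (LB)": 100 - cpu_busy,
        "any channel busy (HB)": any_channel_busy,
        "channel only busy (LB)": any_channel_busy - cpu_channel_overlap,
        "CPU/channel overlap (HB)": cpu_channel_overlap,
        "CPU only busy (LB)": cpu_busy - cpu_channel_overlap,
    }


# Hypothetical measurements (percent of elapsed time).
for name, value in kiviat_metrics(60, 10, 50, 40).items():
    print(f"{name}: {value}%")
```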

Shapes of Kiviat Graphs A CPU-bound system produces a keel-boat shape, an I/O-bound system produces a wedge, and a CPU- and I/O-bound system produces an arrow

Merrill’s Figure of Merit (FoM) Performance = $\{x_1, x_2, x_3, \ldots, x_{2n}\}$ Odd-indexed values are HB and even-indexed values are LB $x_{2n+1}$ is the same as $x_1$ Average FoM = 50%

Example: FoM System A:

FoM Example (Cont) System B: System B has a higher figure of merit and it is better.

Figure of Merit: Known Problems All axes are considered equal. Extreme values are assumed to be better. Utility is not a linear function of FoM. Two systems with the same FoM are not necessarily equally good. A system with a slightly lower FoM may be better.

Kiviat Graphs for Other Systems Kiviat graphs can also be used for networks, with metrics such as application throughput, link overhead, packets with errors, link utilization, implicit acknowledgements, and duplicate packets

Schumacher Charts Performance metrics are plotted in a tabular manner. Values are normalized with respect to their long-term means and standard deviations. Any observation beyond the mean ± one standard deviation needs to be explained. See Figure 10.25 in the book.
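The normalization behind a Schumacher chart can be sketched as a z-score per metric, flagging anything outside the mean ± one standard deviation band for explanation. The metric names, long-term statistics, and the `schumacher_flags` helper below are hypothetical.

```python
def schumacher_flags(current, long_term_mean, long_term_sd):
    """Normalize each metric and flag those outside mean ± one standard deviation."""
    flagged = {}
    for name, value in current.items():
        z = (value - long_term_mean[name]) / long_term_sd[name]
        flagged[name] = (round(z, 2), abs(z) > 1)
    return flagged


# Hypothetical current observations and long-term statistics.
current = {"CPU busy": 85, "disk I/O rate": 40}
means = {"CPU busy": 70, "disk I/O rate": 42}
sds = {"CPU busy": 10, "disk I/O rate": 5}

for name, (z, needs_explanation) in schumacher_flags(current, means, sds).items():
    print(f"{name}: z = {z}, needs explanation: {needs_explanation}")
```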

Performance Analysis Rat Holes Workload Metrics Configuration Details

Reasons for not Accepting an Analysis This needs more analysis. You need a better understanding of the workload. It improves performance only for long IOs/packets/jobs/files, and most of the IOs/packets/jobs/files are short. It improves performance only for short IOs/packets/jobs/files, but who cares about the performance of short IOs/packets/jobs/files; it's the long ones that impact the system. It needs too much memory/CPU/bandwidth, and memory/CPU/bandwidth isn't free. It only saves us memory/CPU/bandwidth, and memory/CPU/bandwidth is cheap. See Box 10.2 on page 162 of the book for a complete list.

Examples

Summary Variables are qualitative/quantitative, ordered/unordered, discrete/continuous Good charts should require minimum effort from the reader and provide maximum information with minimum ink Use no more than 5-6 curves, select ranges properly, and follow the three-quarter-high rule Gantt charts show utilizations of various components Kiviat graphs show HB and LB metrics alternately on a circular graph Schumacher charts show means and standard deviations Workload, metrics, configuration, and details can always be challenged, so they should be carefully selected

Exercise 10.1 What type of chart (line or bar) would you use to plot: CPU usage for 12 months of the year CPU usage as a function of time in months Number of I/O's to three disk drives: A, B, and C Number of I/O's as a function of number of disk drives in a system

Exercise 10.2 List the problems with the following charts

Exercise 10.3 A system consists of three resources, called A, B, and C. The measured utilizations are shown in the following table. A zero in a column indicates that the resource is not utilized. Draw a Gantt chart showing the utilization profiles.

Exercise 10.4 The measured values of the eight performance metrics listed in Example 10.2 for a system are: 70%, 10%, 60%, 20%, 80%, 30%, 50%, and 20%. Draw the Kiviat graph and compute its figure of merit.

Exercise 10.5 For a computer system of your choice, list a number of HB and LB metrics and draw a typical Kiviat graph using data values of your choice.

Ratio Games

Overview Ratio Game Examples Using an Appropriate Ratio Metric Using Relative Performance Enhancement Ratio Games with Percentages Ratio Games Guidelines Numerical Conditions for Ratio Games

Case Study 11.1: 6502 vs. 8080 1. Ratio of Totals Conclusion: 6502 is worse. It takes 4.7% more time than 8080.

6502 vs. 8080 (Cont) 1. Ratio of totals: 6502 is worse; it takes 4.7% more time than 8080. 2. 6502 as the base: 6502 is better; it takes 1% less time than 8080. 3. 8080 as the base: 6502 is worse; it takes 6% more time.
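The three conclusions come from three different ways of combining the same per-benchmark times. The sketch below reproduces the pattern with hypothetical execution times for two systems X and Y on two benchmarks (lower is better); the numbers are invented so that the three methods disagree, and they are not the 6502/8080 data.

```python
from statistics import mean

# Hypothetical execution times (seconds) on two benchmarks; lower is better.
x = [10, 40]
y = [20, 25]


def ratio_of_totals(system, other):
    """Total time of `system` divided by total time of `other`."""
    return sum(system) / sum(other)


def mean_normalized(system, base):
    """Average of per-benchmark times of `system` normalized to `base`."""
    return mean(s / b for s, b in zip(system, base))


print(f"Raw totals: X/Y = {ratio_of_totals(x, y):.3f}")     # > 1: X looks worse
print(f"X as base: Y = {mean_normalized(y, x):.4f} x X")    # > 1: X looks better
print(f"Y as base: X = {mean_normalized(x, y):.4f} x Y")    # > 1: X looks worse
```

With raw totals X takes about 11% more time, with X as the base Y looks 31% slower, and with Y as the base X looks 5% slower: the same data support three different conclusions, just as in the case study.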

Case Study 11.2: RISC vs. CISC Conclusion: RISC-I has the largest code size. The second processor Z8002 requires 9% less code than RISC-I.

RISC vs. CISC (Cont) [Table of normalized code sizes] Conclusion: Z8002 has the largest code size, and it takes 18% more code than RISC-I. [Peterson and Sequin 1982]

Using an Appropriate Ratio Metric Example: Throughput: A is better. Response time: A is worse. Power (throughput divided by response time): A is better.
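Power is itself a ratio of two metrics, so it can favor a system that loses on one of its components. A minimal sketch with hypothetical throughput and response-time values chosen to reproduce the three conclusions above; `systems` and `power` are illustrative names.

```python
# Hypothetical measurements for two systems.
systems = {
    "A": {"throughput": 10.0, "response_time": 2.0},
    "B": {"throughput": 4.0, "response_time": 1.0},
}


def power(metrics):
    """Power = throughput / response time (higher is better)."""
    return metrics["throughput"] / metrics["response_time"]


for name, m in systems.items():
    print(f"{name}: throughput={m['throughput']}, "
          f"response time={m['response_time']}, power={power(m)}")
```

A has the higher throughput (better) and the longer response time (worse), yet its power is 5.0 against 4.0 for B, so a report that quotes only power would declare A the winner.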

Using Relative Performance Enhancement Example: two floating-point accelerators, each measured on a different machine. Problem: incomparable bases; both accelerators need to be tried on the same machine.

Ratio Games with Percentages Example: tests on two systems [Tables: test results for System A and System B] 1. System B is better on both tests. 2. System A is better overall.
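The reversal happens because the overall percentage weights each test by how many runs it has, and the two systems ran the tests in very different proportions. The sketch below reproduces the effect with hypothetical (passed, run) counts; `results`, `pct`, and `overall` are illustrative names.

```python
# Hypothetical (passed, run) counts per test for each system.
results = {
    "A": {"test 1": (80, 100), "test 2": (2, 10)},
    "B": {"test 1": (9, 10), "test 2": (30, 100)},
}


def pct(passed, run):
    """Pass rate in percent."""
    return 100 * passed / run


def overall(per_test):
    """Pass rate over all runs combined (weights each test by its number of runs)."""
    passed = sum(p for p, _ in per_test.values())
    run = sum(n for _, n in per_test.values())
    return pct(passed, run)


for system, per_test in results.items():
    rates = {test: pct(*counts) for test, counts in per_test.items()}
    print(system, rates, f"overall={overall(per_test):.1f}%")
```

B wins both tests (90% vs. 80%, 30% vs. 20%), yet A wins overall (about 74.5% vs. 35.5%).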

Percentages (Cont) Other misuses of percentages: A 1000% improvement sounds more impressive than an 11-fold improvement, particularly if the performance before and after the improvement is small. Small sample sizes can be disguised in percentages. The base for a percentage change should be the initial value; a "400% reduction in prices" is only possible by using the final value as the base.
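Both misuses come down to the choice of base. A minimal sketch with a hypothetical price drop from 100 to 20; `pct_change` and `times_from_increase` are illustrative names.

```python
def pct_change(before, after, base):
    """Percentage change relative to the chosen base value."""
    return 100 * (after - before) / base


def times_from_increase(pct_increase):
    """A p% increase makes the new value (1 + p/100) times the old one."""
    return 1 + pct_increase / 100


print(times_from_increase(1000))          # 11.0: a 1000% increase is 11 times
before, after = 100, 20                   # hypothetical prices
print(pct_change(before, after, before))  # -80.0: base = initial value (correct)
print(pct_change(before, after, after))   # -400.0: base = final value (misleading)
```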

Ratio Games Guidelines If one system is better on all benchmarks, contradictory conclusions cannot be drawn by any ratio game technique

Guidelines (cont) Even if one system is better than the other on all benchmarks, a better relative performance can be shown by selecting an appropriate base. In the previous example, System A is 40% better than System B using raw data, 43% better using System A as the base, and 42% better using System B as the base. If a system is better on some benchmarks and worse on others, contradictory conclusions can be drawn in some cases, but not in all cases. If the performance metric is an LB metric, it is better to use your system as the base. If the performance metric is an HB metric, it is better to use your opponent as the base. Benchmarks that perform better on your system should be elongated, and those that perform worse should be shortened.

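Numerical Conditions for Ratio Games Consider two benchmarks i and j, with response times $a_i, a_j$ on system A and $b_i, b_j$ on system B (lower is better). Raw data: A is better than B iff $a_i + a_j < b_i + b_j$. With A as the base: A is better than B iff $\frac{1}{2}\left(\frac{b_i}{a_i} + \frac{b_j}{a_j}\right) > 1$.

Numerical Conditions (Cont) With B as the base: A is better than B iff $\frac{1}{2}\left(\frac{a_i}{b_i} + \frac{a_j}{b_j}\right) < 1$.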

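Numerical Conditions (Cont) [Figure: the plane of the ratio of B/A response on benchmark i (horizontal axis) vs. on benchmark j (vertical axis), ticks at 1, 2, and 3; boundaries for raw data, base A, and base B separate a region where A is better using all three methods from a region where B is better using all three]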

Summary Ratio games arise from use of incomparable bases Ratios may be part of the metric Relative performance enhancements Percentages are ratios For HB metrics, it is better to use opponent as the base

Exercise 11.1 The following table shows execution times of three benchmarks I, J, and K on three systems A, B, and C. Use ratio game techniques to show the superiority of various systems.

Exercise 11.2 Derive conditions necessary for you to be able to use the technique of combined percentages to your advantage.

Homework Read Chapters 10 and 11