© 1998, Geoff Kuenning

Common Mistakes in Graphics
- Excess information
- Multiple scales
- Using symbols in place of text
- Poor scales
- Using lines incorrectly

Excess Information
- Sneaky trick to meet length limits
- Rules of thumb (upper limits):
  – 6 curves on line chart
  – 10 bars on bar chart
  – 8 slices on pie chart
- Extract essence, don’t cram things in

Way Too Much Information

What’s Important About That Chart?
- Times for cp and rcp rise with number of replicas
- Most other benchmarks are near constant
- Exactly constant for rm

The Right Amount of Information

True Confessions

Multiple Scales
- Another way to meet length limits
- Basically, two graphs overlaid on each other
- Confuses reader (which line goes with which scale?)
- Misstates relationships
  – Implies equality of magnitude that doesn’t exist

Some Especially Bad Multiple Scales

Using Symbols in Place of Text
- Graphics should be self-explanatory
  – Remember that the graphs often draw the reader in
- So use explanatory text, not symbols
- This means no Greek letters!
  – Unless your conference is in Athens...

It’s All Greek To Me...

Explanation is Easy

Poor Scales
- Plotting programs love non-zero origins
  – But people are used to zero
- Fiddle with axis ranges (and logarithms) to get your message across
  – But don’t lie or cheat
- Sometimes trimming off high ends makes things clearer
  – Brings out low-end detail

Nonzero Origins (Chosen by Microsoft)

Proper Origins

A Poor Axis Range

A Logarithmic Range

A Truncated Range

Using Lines Incorrectly
- Don’t connect points unless interpolation is meaningful
- Don’t smooth lines that are based on samples
  – Exception: fitted non-linear curves

Incorrect Line Usage

Pictorial Games
- Non-zero origins and broken scales
- Double-whammy graphs
- Omitting confidence intervals
- Scaling by height, not area
- Poor histogram cell size

Non-Zero Origins and Broken Scales
- People expect (0,0) origins
  – Subconsciously
- So non-zero origins are a great way to lie
- More common than not in popular press
- Also very common to cheat by omitting part of scale
  – “Really, Your Honor, I included (0,0)”

Non-Zero Origins

The Three-Quarters Rule
- Highest point should be 3/4 of scale or more
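
The rule can be checked mechanically before a chart is drawn. The following is a minimal sketch, assuming the scale runs from `axis_min` (0 by default) to `axis_max`; both function names are hypothetical and do not come from the slides.

```python
def satisfies_three_quarters(values, axis_max, axis_min=0.0):
    """True if the highest data point reaches at least 3/4 of the axis span."""
    span = axis_max - axis_min
    if span <= 0:
        raise ValueError("axis_max must exceed axis_min")
    return (max(values) - axis_min) / span >= 0.75


def largest_allowed_axis_max(values, axis_min=0.0):
    """Largest axis maximum that still keeps the top point at 3/4 of the scale."""
    return axis_min + (max(values) - axis_min) / 0.75
```

For example, with a highest value of 75 and a zero origin, any axis maximum up to 100 satisfies the rule; a larger maximum leaves too much empty space above the data.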

Double-Whammy Graphs
- Put two related measures on same graph
  – One is (almost) function of other
- Hits reader twice with same information
  – And thus overstates impact

Omitting Confidence Intervals
- Statistical data is inherently fuzzy
- But means appear precise
- Giving confidence intervals can make it clear there’s no real difference
  – So liars and fools leave them out

Graph Without Confidence Intervals

Graph With Confidence Intervals

Confidence Intervals
- Sample mean value is only an estimate of the true population mean
- Bounds $c_1$ and $c_2$ such that there is a high probability, $1-\alpha$, that the population mean $\mu$ is in the interval $(c_1, c_2)$:
  $\mathrm{Prob}\{c_1 < \mu < c_2\} = 1 - \alpha$
  where $\alpha$ is the significance level and $100(1-\alpha)$ is the confidence level
- Overlapping confidence intervals are interpreted as “not statistically different”
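
As an illustration of the bounds and the overlap test, here is a minimal sketch. It assumes a normal approximation for the quantile, which is reasonable only for larger samples; for small samples a Student t quantile would be the usual choice. The function names and the run times are hypothetical, not data from the slides.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (c1, c2) for the population mean, using a normal approximation."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    alpha = 1.0 - confidence                     # significance level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided quantile
    half_width = z * stdev(sample) / sqrt(n)
    m = mean(sample)
    return (m - half_width, m + half_width)


def intervals_overlap(a, b):
    """True if intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical run times for two systems (illustrative values only).
run_a = [10.1, 9.8, 10.4, 10.0, 9.9, 10.2]
run_b = [10.3, 10.0, 10.6, 9.7, 10.5, 10.1]
ci_a = mean_confidence_interval(run_a)
ci_b = mean_confidence_interval(run_b)
same = intervals_overlap(ci_a, ci_b)  # overlapping: "not statistically different"
```

With these made-up numbers the means differ, yet the intervals overlap, which is exactly the situation a graph of bare means would hide.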

Graph With Confidence Intervals

Reporting Only One Run (tell-tale sign)
- Probably a fluke
- It’s likely that with multiple trials this would go away

Scaling by Height Instead of Area
- Clip art is popular with illustrators:
  [Figure: Women in the Workforce]

The Trouble with Height Scaling
- Previous graph had heights of 2:1
- But people perceive areas, not heights
  – So areas should be what’s proportional to data
- Tufte defines a lie factor: size of effect in graphic divided by size of effect in data
  – Not limited to area scaling
  – But especially insidious there (quadratic effect)
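
The quadratic effect can be made concrete with a small calculation. This sketch assumes that "size of effect" means relative change from the first value to the second, and that a clip-art figure scaled by height keeps its aspect ratio, so its area grows with the square of the height. The function names are hypothetical.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Size of effect in the graphic divided by size of effect in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


# Data in a 2:1 ratio. Doubling the height of a picture quadruples its area.
height_scaled = lie_factor(graphic_first=1.0, graphic_second=4.0,
                           data_first=1.0, data_second=2.0)
# Scaling the area in proportion to the data instead.
area_scaled = lie_factor(graphic_first=1.0, graphic_second=2.0,
                         data_first=1.0, data_second=2.0)
```

Under these assumptions the height-scaled picture has a lie factor of 3, while the area-scaled one has a lie factor of 1.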

Scaling by Area
- Here’s the same graph with 2:1 area:
  [Figure: Women in the Workforce]

Histogram Cell Size
- Picking bucket size is always a problem
- Prefer 5 or more observations per bucket
- Choice of bucket size can affect results:
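
The guideline of 5 or more observations per bucket is easy to test for a candidate bucket width. The sketch below assumes fixed-width buckets starting at `origin`; the function names and the data are hypothetical.

```python
from collections import Counter


def bucket_counts(data, width, origin=0.0):
    """Count observations per fixed-width bucket, keyed by bucket lower edge."""
    counts = Counter()
    for x in data:
        index = int((x - origin) // width)
        counts[origin + index * width] += 1
    return dict(sorted(counts.items()))


def sparse_buckets(counts, minimum=5):
    """Lower edges of buckets holding fewer than `minimum` observations."""
    return [edge for edge, n in counts.items() if n < minimum]


# Hypothetical observations (illustrative only).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
wide = bucket_counts(data, width=6, origin=1)    # two buckets of 6 observations
narrow = bucket_counts(data, width=2, origin=1)  # six buckets of 2 observations
```

Here the wide buckets meet the guideline and the narrow ones do not, so the same data would produce two quite different-looking histograms.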

Don’t Quote Data Out of Context

The Same Data in Context

Tell the Whole Truth

Special-Purpose Charts
- Histograms
- Scatter plots
- Gantt charts
- Kiviat graphs

Tukey’s Box Plot
- Shows range, median, quartiles all in one:
  [Figure labels: minimum, maximum, quartile, median]
- Variations:
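
The values a box plot displays can be computed directly. This is a minimal sketch using Python's standard `statistics.quantiles`; the choice of the "inclusive" quartile method is an assumption, since quartile conventions differ between tools, and the function name is hypothetical.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


# Illustrative data only.
summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
```

For the nine values above, the summary is (1, 3.0, 5.0, 7.0, 9).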

Histograms

Scatter Plots
- Useful in statistical analysis
- Also excellent for huge quantities of data
  – Can show patterns otherwise invisible

Better Scatter Plots
- Again, Tufte improves the standard
  – But it can be a pain with automated tools
- Can use modified Tukey box plot for axes

Gantt Charts
- Shows relative duration of Boolean conditions
- Arranged to make lines continuous
  – Each level after first follows FTTF pattern
[Figure: Gantt chart with segments labeled T and F at each level]

Kiviat Graphs
- Also called “star charts” or “radar plots”
- Useful for looking at balance between HB (higher is better) and LB (lower is better) metrics

Useful Reference Works
- Edward R. Tufte, The Visual Display of Quantitative Information, Graphics Press, Cheshire, Connecticut
- Edward R. Tufte, Envisioning Information, Graphics Press, Cheshire, Connecticut
- Edward R. Tufte, Visual Explanations, Graphics Press, Cheshire, Connecticut
- Darrell Huff, How to Lie With Statistics, W.W. Norton & Co., New York, 1954

Ratio Games
- Choosing a Base System
- Using Ratio Metrics
- Relative Performance Enhancement
- Ratio Games with Percentages
- Strategies for Winning a Ratio Game
- Correct Analysis of Ratios

Choosing a Base System
- Run workloads on two systems
- Normalize performance to chosen system
- Take average of ratios
- Presto: you control what’s best
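
The effect of the base choice shows up with even two workloads. The sketch below uses hypothetical run times (lower is better); the system names, workload names, and numbers are invented for illustration and are not the values from the slides.

```python
def mean_normalized(times, base, other):
    """Average over workloads of the ratio other/base."""
    ratios = [times[w][other] / times[w][base] for w in times]
    return sum(ratios) / len(ratios)


# Hypothetical run times in seconds; not data from the original slides.
times = {
    "A": {"sys1": 10.0, "sys2": 20.0},
    "B": {"sys1": 40.0, "sys2": 10.0},
}

sys2_vs_sys1 = mean_normalized(times, base="sys1", other="sys2")  # 1.125
sys1_vs_sys2 = mean_normalized(times, base="sys2", other="sys1")  # 2.25

# Cross-check on total run time, which does not depend on the base.
total_ratio = (sum(t["sys2"] for t in times.values())
               / sum(t["sys1"] for t in times.values()))           # 0.6
```

With sys1 as the base, sys2 appears 12.5% slower on average; with sys2 as the base, sys1 appears 125% slower. Whichever system is chosen as the base looks better, even though the total run time favors sys2.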

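Code Size Example
[Table: code size by program (F-bit, Acker, Towers, Puzzle, Sum, Average) for RISC-1 and Z8002, with ratio columns R/R and Z/R; values not preserved. The Average row ends with “or .67?”]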

Simple Example
[Table: programs A and B on systems 1 and 2, with ratio columns 1/2 and 2/1 and rows Sum and Ave; values not preserved]

Using Ratio Metrics
- Pick a metric that is itself a ratio
  – power = throughput ÷ response time
  – cost / performance
  – improvement ratio
- Handy because division is “hidden”

Relative Performance Enhancement
- Compare systems with incomparable bases
- Turn into ratios
- Example: compare Ficus 1 vs. 2 replicas with UFS vs. NFS (1 run on chosen day):
- “Proves” adding Ficus replica costs less than going from UFS to NFS

Ratio Games with Percentages
- Percentages are inherently ratios
  – But disguised
  – So great for ratio games
- Example: passing tests
  – A is worse, but looks better in total line!
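
A total line can reverse the per-test picture when the test suites differ in size. The counts below are hypothetical, chosen only to reproduce the effect; they are not the figures from the original slide, and the names are invented.

```python
def pass_rate(passed, attempted):
    """Percentage of attempted tests that passed."""
    return 100.0 * passed / attempted


# Hypothetical (passed, attempted) counts per test suite.
results = {
    "A": {"suite1": (60, 100), "suite2": (1, 10)},
    "B": {"suite1": (7, 10), "suite2": (20, 100)},
}


def suite_rate(system, suite):
    passed, attempted = results[system][suite]
    return pass_rate(passed, attempted)


def total_rate(system):
    passed = sum(p for p, _ in results[system].values())
    attempted = sum(a for _, a in results[system].values())
    return pass_rate(passed, attempted)
```

In this made-up case A passes 60% and 10% of the two suites against 70% and 20% for B, so A is worse on both. Because A ran most of its tests on the easier suite, its total is about 55% against about 25% for B.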

More on Percentages
- Psychological impact
  – 1000% sounds bigger than 10-fold (or 11-fold)
  – Great when both original and final performance are lousy
  – E.g., salary went from $40 to $80 per week
- Small sample sizes generate big lies
- Base should be initial, not final value
  – E.g., price can’t drop 400%
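
The arithmetic behind these points is short enough to spell out. This sketch takes the initial value as the base, as the slide recommends; the function names are hypothetical.

```python
def percent_change(initial, final):
    """Change relative to the initial value, in percent."""
    return 100.0 * (final - initial) / initial


def fold_change(initial, final):
    """How many times the initial value the final value is."""
    return final / initial


salary = percent_change(40.0, 80.0)     # 100.0: a doubling is a 100% increase
big = percent_change(1.0, 11.0)         # 1000.0: a 1000% increase is 11-fold
floor = percent_change(50.0, 0.0)       # -100.0: the largest possible drop
```

A 1000% increase corresponds to an 11-fold value, and a non-negative price cannot fall by more than 100% of its initial value, which is why a 400% drop is impossible with the initial value as the base.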

True Confessions
Sequential page placement normalized to random placement for static policies -- SPEC

True Confessions
Power state policies with random placement normalized to all active memory -- SPEC

Strategies for Winning a Ratio Game
- Can you win?
- How to win

Can You Win the Ratio Game?
- If one system is better by all measures, a ratio game won’t work
  – But recall percent-passes example
  – And selecting the base lets you change the magnitude of the difference
- If each system wins on some measures, ratio games might be possible (but no promises)
  – May have to try all bases

How to Win Your Ratio Game
- For LB metrics, use your system as the base
- For HB metrics, use the other as a base
- If possible, adjust lengths of benchmarks
  – Elongate when your system performs best
  – Shorten when your system is worst
  – This gives greater weight to your strengths

For Discussion Next Tuesday
- Bring in one example of data presentation from your proceedings, either notoriously bad or exceptionally good. The bad ones are more fun.
- Or if you find something just really different, please show it.