Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecture 13 Page 1 CS 239, Spring 2007 Data Presentation CS 239 Experimental Methodologies for System Software Peter Reiher May 22, 2007.

Similar presentations


Presentation on theme: "Lecture 13 Page 1 CS 239, Spring 2007 Data Presentation CS 239 Experimental Methodologies for System Software Peter Reiher May 22, 2007."— Presentation transcript:

1 Lecture 13 Page 1 CS 239, Spring 2007 Data Presentation CS 239 Experimental Methodologies for System Software Peter Reiher May 22, 2007

2 Lecture 13 Page 2 CS 239, Spring 2007 Outline General issues of data presentation Common methods of deceiving with data Graphical data presentation

3 Lecture 13 Page 3 CS 239, Spring 2007 Data Presentation Experiments are usually not done just for your own edification Generally, you need to describe the results to others How do you effectively present your data?

4 Lecture 13 Page 4 CS 239, Spring 2007 The Key Point You aren’t presenting numbers or graphs You are presenting insight –Backed up by solid evidence Think about what you want your audience to learn first Worry about your numbers and graphs later –And choose numbers and graphs that explain and defend your insights

5 Lecture 13 Page 5 CS 239, Spring 2007 A Corollary Present as little data as you need to present –But no less Don’t print every possible graph Don’t show redundant graphs that simply repeat the same point Make sure each piece of data you present: –Says something important –Says something the others don’t

6 Lecture 13 Page 6 CS 239, Spring 2007 Another Consideration Know your audience Is it likely to believe your results? Will it be skeptical? Does it have the expertise to understand complex issues? Tailor your presentation to the audience

7 Lecture 13 Page 7 CS 239, Spring 2007 Methods of Presenting Data In written form In live presentations

8 Lecture 13 Page 8 CS 239, Spring 2007 Presenting Data in Written Form Simple in-line description –“System x was 12% faster” –Only good for short, simple statements –Especially in summaries/conclusions Tables –Useful if seeing individual numbers is helpful –Also good if reader will make many comparisons across table entries –But big tables are hard to read –So minimize number of entries –Avoid big tables with lots of rows and columns Graphics –Often best way to explain large, complex data sets

9 Lecture 13 Page 9 CS 239, Spring 2007 Presenting Data in Talks If it’s important, make sure it’s written down –On your slides, handouts, or somewhere else they can go to later Generally want it short and snappy –Tables usually not appropriate –Brief mentions and graphics better

10 Lecture 13 Page 10 CS 239, Spring 2007 What About the Full Data? You should always save all the data When possible, make it available to everyone –Not possible if sensitive –Or legally prohibited –Or potential liability –Or a company asset Otherwise, put it in a suitable repository Or put it on the web yourself Make sure you include information on data’s format and basic meaning

11 Lecture 13 Page 11 CS 239, Spring 2007 Using Data Deceptively Well-known techniques for using real data in deceptive ways Often called “games” –Benchmarking games –Ratio games

12 Lecture 13 Page 12 CS 239, Spring 2007 Benchmarking Games Techniques for producing data likely to be deceptive Methods of designing experiments that lead to results that are “advantageous” –Assuming experimenter cares how experiment turns out –As he often does Not always intentional

13 Lecture 13 Page 13 CS 239, Spring 2007 Differing Configurations Use different configurations for the same workload on two systems E.g., a Linux vs. Windows comparison where one system has less memory Be suspicious when all conditions aren’t fully mentioned –Particularly if data comes from different sources

14 Lecture 13 Page 14 CS 239, Spring 2007 Differing Compilation Options One version of the code is compiled with optimization The other isn’t True performance difference, but... Again, if data gathered from different sources, might be hard to tell

15 Lecture 13 Page 15 CS 239, Spring 2007 Biased Testing Specifications Experiment designed to favor situations good for one alternative E.g., a system that makes heavy use of disk is tested on a machine with a fast, expensive drive Or specifying a 10 Mbps incoming link for testing a firewall that runs poorly above 11 Mbps

16 Lecture 13 Page 16 CS 239, Spring 2007 Playing With the Workload Design workload so it fits known cycles of one system E.g., design workload to hide page faults Or choose arbitrary workload that isn’t representative –But does well on preferred system

17 Lecture 13 Page 17 CS 239, Spring 2007 Very Small Benchmarks Set benchmark sizes small to hide scaling effects E.g., hide poor disk performance by working out of cache Or conceal poor scaling of key data structure by limiting number of entries in workload

18 Lecture 13 Page 18 CS 239, Spring 2007 Tweaking Benchmarks Can alter aspects of benchmarks specifically to play to system strengths E.g., removing all images from web benchmark if system handles them poorly Increasing proportions of graphics operations in a benchmark to favor machine with special graphics hardware

19 Lecture 13 Page 19 CS 239, Spring 2007 Ratios offer special possibility to conceal information Sometimes used specifically to hide what you don’t want to show How do you do that? How do you tell when others do that? Ratio Games

20 Lecture 13 Page 20 CS 239, Spring 2007 Choosing a Base System A simple game method using ratios Run workloads on two systems Normalize performance to system you want to look good Take average of ratios Presto: you control what’s best

21 Lecture 13 Page 21 CS 239, Spring 2007 Example of Choosing a Base System A processor comparison (lower is better) Raw Data: System BenchmarkAB Block41.1651.5 Sieve63.1748.08 Sum104.3399.58 Average52.1749.79

22 Lecture 13 Page 22 CS 239, Spring 2007 Making System A Look Better Calculate ratio with A as base System BenchmarkAB Block1.001.25 Sieve1.000.76 Sum2.002.01 Average1.001.01

23 Lecture 13 Page 23 CS 239, Spring 2007 Making System B Look Better Use B as a base System BenchmarkAB Block.801.00 Sieve1.311.00 Sum2.112.00 Average1.061.00

24 Lecture 13 Page 24 CS 239, Spring 2007 Lying With Ratio Metrics Pick a metric that is itself a ratio –E.g., power = throughput  response time –Or cost/performance Handy because division is “hidden”

25 Lecture 13 Page 25 CS 239, Spring 2007 Relative Performance Enhancement Compare systems with incomparable bases Turn into ratios Example: compare Ficus 1 vs. 2 replicas with UFS vs. NFS (1 run on chosen day): “Proves” adding Ficus replica costs less than going from UFS to NFS

26 Lecture 13 Page 26 CS 239, Spring 2007 Ratio Games with Percentages Percentages are inherently ratios –But disguised –So they’re great for ratio games Example: Passing tests A is worse, but looks better in total line!

27 Lecture 13 Page 27 CS 239, Spring 2007 More on Percentages Psychological impact –1000% sounds bigger than 10-fold (or 11-fold) –Great when both original and final performance are lousy E.g., salary went from $40 to $80 per week Base should be initial, not final value –E.g., price can’t drop 400%

28 Lecture 13 Page 28 CS 239, Spring 2007 Strategies for Winning a Ratio Game Can you win? How to win

29 Lecture 13 Page 29 CS 239, Spring 2007 Can You Win the Ratio Game? If one system is better by all measures, a ratio game won’t work –But recall percent-passes example –And selecting the base lets you change the magnitude of the difference If each system wins on some measures, ratio games might be possible (but no promises) –May have to try all bases

30 Lecture 13 Page 30 CS 239, Spring 2007 How to Win Your Ratio Game For LB metrics, use your system as the base For HB metrics, use the other as a base If possible, adjust lengths of benchmarks –Elongate when your system performs best –Short when your system is worst –This gives greater weight to your strengths

31 Lecture 13 Page 31 CS 239, Spring 2007 Correct Analysis of Ratios Discussed in detail in previous lectures Generally, harmonic or geometric means are appropriate –Or use only the raw data If someone’s doing something else with ratios, be suspicious

32 Lecture 13 Page 32 CS 239, Spring 2007 Reference works Types of variables Guidelines for good graphics charts Common mistakes in graphics Pictorial games Special-purpose charts Graphical Data Presentation

33 Lecture 13 Page 33 CS 239, Spring 2007 Edward R. Tufte, The Visual Display of Quantitative Information, Graphics Press, Cheshire, Connecticut, 1983. Edward R. Tufte, Envisioning Information, Graphics Press, Cheshire, Connecticut, 1990. Darrell Huff, How to Lie With Statistics, W.W. Norton & Co., New York, 1954 Useful References Works

34 Lecture 13 Page 34 CS 239, Spring 2007 Qualitative –Ordered (e.g., modem, Ethernet, satellite) –Unordered (e.g., CS, math, literature) Quantitative –Discrete (e.g., number of terminals) –Continuous (e.g., time) Types of Variables

35 Lecture 13 Page 35 CS 239, Spring 2007 Charting Based on Variable Types Qualitative variables usually work best with bar charts or Kiviat graphs –If ordered, use bar charts to show order Quantitative variables work well in X-Y graphs –Use points if discrete, lines if continuous –Bar charts sometimes work well for discrete

36 Lecture 13 Page 36 CS 239, Spring 2007 Principles of graphical excellence Principles of good graphics Specific hints for specific situations Aesthetics Friendliness Much of this based on Tufte’s principles of graphics design Guidelines for Good Graphics Presentation

37 Lecture 13 Page 37 CS 239, Spring 2007 Principles of Graphical Excellence Graphical excellence is the well- designed presentation of interesting data: –Substance –Statistics –Design

38 Lecture 13 Page 38 CS 239, Spring 2007 Graphical Excellence (2) Complex ideas get communicated with: –Clarity –Precision –Efficiency

39 Lecture 13 Page 39 CS 239, Spring 2007 Graphical Excellence (3) Viewer gets: –Greatest number of ideas –In the shortest time –With the least ink –In the smallest space Requires telling truth about data

40 Lecture 13 Page 40 CS 239, Spring 2007 Principles of Good Graphics Above all else show the data Maximize the data-ink ratio Erase non-data ink Erase redundant data ink Revise and edit

41 Lecture 13 Page 41 CS 239, Spring 2007 An Important Caveat “The principles should not be applied rigidly or in a peevish spirit; they are not logical or mathematically certain; and it is better to violate any principle than to place graceless or inelegant marks on paper.” Edward Tufte, “The Visual Display of Quantitative Information”

42 Lecture 13 Page 42 CS 239, Spring 2007 Above All Else Show the Data Where’s the data?

43 Lecture 13 Page 43 CS 239, Spring 2007 Above All Else Show the Data Limitations of linear model much clearer now

44 Lecture 13 Page 44 CS 239, Spring 2007 Maximize the Data-Ink Ratio

45 Lecture 13 Page 45 CS 239, Spring 2007 Maximize the Data-Ink Ratio Important features become much clearer

46 Lecture 13 Page 46 CS 239, Spring 2007 Erase Non-Data Ink Note disconcerting vibration effects

47 Lecture 13 Page 47 CS 239, Spring 2007 Erase Non-Data Ink EastWestNorth

48 Lecture 13 Page 48 CS 239, Spring 2007 Erase Redundant Data Ink EastWestNorth

49 Lecture 13 Page 49 CS 239, Spring 2007 Erase Redundant Data Ink EastWestNorth

50 Lecture 13 Page 50 CS 239, Spring 2007 Revise and Edit

51 Lecture 13 Page 51 CS 239, Spring 2007 Revise and Edit

52 Lecture 13 Page 52 CS 239, Spring 2007 Revise and Edit

53 Lecture 13 Page 53 CS 239, Spring 2007 Revise and Edit

54 Lecture 13 Page 54 CS 239, Spring 2007 Revise and Edit

55 Lecture 13 Page 55 CS 239, Spring 2007 Revise and Edit

56 Lecture 13 Page 56 CS 239, Spring 2007 Revise and Edit

57 Lecture 13 Page 57 CS 239, Spring 2007 Specific Things to Do Give information the reader needs Limit complexity and confusion Have a point Show statistics graphically Don’t always use graphics Discuss it in the text

58 Lecture 13 Page 58 CS 239, Spring 2007 Give Information the Reader Needs Show informative axes –Use axes to indicate range Label things fully and intelligently Highlight important points on the graph

59 Lecture 13 Page 59 CS 239, Spring 2007 Giving Information the Reader Needs What does any of this mean?

60 Lecture 13 Page 60 CS 239, Spring 2007 Giving Information the Reader Needs

61 Lecture 13 Page 61 CS 239, Spring 2007 Limit Complexity and Confusion Not too many curves Single scale for all curves No “extra” curves No pointless decoration (“ducks”)

62 Lecture 13 Page 62 CS 239, Spring 2007 Limiting Complexity and Confusion

63 Lecture 13 Page 63 CS 239, Spring 2007 Limiting Complexity and Confusion

64 Lecture 13 Page 64 CS 239, Spring 2007 Have a Point Graphs should add information not otherwise available to reader Don’t plot data just because you collected it Know what you’re trying to show, and make sure the graph shows it

65 Lecture 13 Page 65 CS 239, Spring 2007 Having a Point Sales were up 15% this quarter:

66 Lecture 13 Page 66 CS 239, Spring 2007 Having a Point

67 Lecture 13 Page 67 CS 239, Spring 2007 Having a Point

68 Lecture 13 Page 68 CS 239, Spring 2007 Having a Point

69 Lecture 13 Page 69 CS 239, Spring 2007 Show Statistics Graphically Put bars in a reasonable order –Geographical –Best to worst –Even alphabetic Make bar widths reflect interval widths –Hard to do with most graphing software Show confidence intervals on the graph –Examples will be shown later

70 Lecture 13 Page 70 CS 239, Spring 2007 Don’t Always Use Graphics Tables are best for small sets of numbers –Tufte says 20 or fewer Also best for certain arrangements of data –E.g., 10 graphs of 3 points each Sometimes a simple sentence will do Always ask whether the chart is the best way to present the information –And whether it brings out your message

71 Lecture 13 Page 71 CS 239, Spring 2007 Discuss It in the Text Figures should be self-explanatory –Many people scan papers, just look at graphs –Good graphs build interest, “hook” readers But text should highlight and aid figures –Tell readers when to look at figures –Point out what figure is telling them –Expand on what figure has to say

72 Lecture 13 Page 72 CS 239, Spring 2007 Aesthetics Not everyone is an artist –But figures should be visually pleasing Elegance is found in –Simplicity of design –Complexity of data

73 Lecture 13 Page 73 CS 239, Spring 2007 Principles of Aesthetics Use appropriate format and design Use words, numbers, drawings together Reflect balance, proportion, relevant scale Keep detail and complexity accessible Have a story about the data (narrative quality) Do a professional job of drawing Avoid decoration and chartjunk

74 Lecture 13 Page 74 CS 239, Spring 2007 Use Appropriate Format and Design Don’t automatically draw a graph –We’ve covered this before Choose graphical format carefully Sometimes a “text graphic” works best –Use text placement to communicate numbers –Very close to being a table

75 Lecture 13 Page 75 CS 239, Spring 2007 GNP: +3.8IPG: +5.8CPI: +7.7Profits: +13.3 CEA: +4.7 DR: +4.5 NABE: +4.5 WEF: +4.5 CBO: +4.4 CB: +4.2 IBM: +4.1 CE: +2.9 NABE: +6.2 IBM: +5.9 CB: +5.5 DR: +5.2 WEF: +4.8 IBM: +6.6 NABE: +6.5 CB: +6.2 WEF: +21 DR: +10.5 IBM: +10.4 CE: +6.5 WEF: 6.8 CB: 6.7 NABE: 6.7 IBM: 6.6 DR: 6.5 CBO: 6.3 CEA: 6.3 Unempl: 6.0 About a year ago, eight forecasters were asked for their predictions on some key economic indicators. Here’s how the forecasts stack up against the probable 1978 results (shown in the black panel). (New York Times, Jan. 2, 1979) Using Text as a Graphic

76 Lecture 13 Page 76 CS 239, Spring 2007 Choosing a Graphical Format Many options, more being invented all the time –Examples will be given later –See Jain for some commonly useful ones –Tufte shows ways to get creative Choose a format that reflects your data –Or that helps you analyze it yourself

77 Lecture 13 Page 77 CS 239, Spring 2007 Use Words, Numbers, Drawings Together Put graphics near or in text that discusses them –Even if you have to murder your word processor Integrate text into graphics Tufte: “Data graphics are paragraphs about data and should be treated as such”

78 Lecture 13 Page 78 CS 239, Spring 2007 Reflect Balance, Proportion, Relevant Scale Much of this boils down to “artistic sense” Make sure things are big enough to read –Tiny type is OK only for young people! Keep lines thin –But use heavier lines to indicate important information Keep horizontal larger than vertical –About 50% larger works well

79 Lecture 13 Page 79 CS 239, Spring 2007 Poor Balance and Proportion Sales in the North and West districts were steady through all quarters North sales varied widely, significantly outperforming the other districts in the third quarter

80 Lecture 13 Page 80 CS 239, Spring 2007 Better Proportion Sales in the North and West districts were steady through all quarters North sales varied widely, significantly outperforming the other districts in the third quarter

81 Lecture 13 Page 81 CS 239, Spring 2007 Keep Detail and Complexity Accessible Make your graphics friendly: –Avoid abbreviations and encodings –Run words left-to-right –Explain data with little messages –Label graphic, don’t use elaborate shadings and a complex legend –Avoid red/green distinctions –Use clean, serif fonts in mixed case

82 Lecture 13 Page 82 CS 239, Spring 2007 An Unfriendly Graph

83 Lecture 13 Page 83 CS 239, Spring 2007 A Friendly Version

84 Lecture 13 Page 84 CS 239, Spring 2007 Even Friendlier

85 Lecture 13 Page 85 CS 239, Spring 2007 Have a Story About the Data (Narrative Quality) May be difficult in technical papers But think about why you are drawing Example: –Performance is controlled by network speed –But it tops out at the high end –And that’s because we hit a CPU bottleneck

86 Lecture 13 Page 86 CS 239, Spring 2007 Showing a Story About the Data

87 Lecture 13 Page 87 CS 239, Spring 2007 Do a Professional Job of Drawing This is easy with modern tools –But take the time to do it right Align things carefully Check the final version in the format you will use –I.e., print the PDF one last time before submission –Or look at your slides on the projection screen From the machine you’ll use Preferably with the projector you’ll use

88 Lecture 13 Page 88 CS 239, Spring 2007 Avoid Decoration and Chartjunk Powerpoint, etc. make chartjunk easy Avoid clip art, automatic backgrounds, etc. Remember: the data is the story –Statistics aren’t boring –Uninterested readers aren’t drawn by cartoons –Interested readers are distracted Does removing it change the message? –If not, leave it out

89 Lecture 13 Page 89 CS 239, Spring 2007 Examples of Chartjunk Gridlines! Vibration Pointless Fake 3-D Effects Filled “Floor”Clip Art In or out? Filled “Walls” Borders and Fills Galore Unintentional Heavy Lines Filled Labels


Download ppt "Lecture 13 Page 1 CS 239, Spring 2007 Data Presentation CS 239 Experimental Methodologies for System Software Peter Reiher May 22, 2007."

Similar presentations


Ads by Google