INTRODUCTION TO CLINICAL RESEARCH How To Make A Bad Plot Karen Bandeen-Roche, Ph.D. July 13, 2010.


How to display data badly Karl W Broman Department of Biostatistics and Medical Informatics University of Wisconsin

Using Microsoft Excel to obscure your data and annoy your readers Karl W Broman Department of Biostatistics

4 Inspiration
This lecture was inspired by H Wainer (1984) How to display data badly. American Statistician 38(2): Dr. Wainer was the first to elucidate the principles of the bad display of data.
The now widespread use of Microsoft Excel has resulted in remarkable advances in the field.

5 General principles
The aim of good data graphics: display data accurately and clearly.
Some rules for displaying data badly:
–Display as little information as possible.
–Obscure what you do show (with chart junk).
–Use pseudo-3d and color gratuitously.
–Label badly.
–Use a poorly chosen scale.
–Ignore significant figures.

6 Displaying data well
Be accurate and clear.
Let the data speak.
–Show as much information as possible, taking care not to obscure the message.
Science not sales.
–Avoid unnecessary frills, especially gratuitous 3d.
In tables, every digit should be meaningful. Don’t drop ending 0’s.
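The advice on digits in tables can be sketched in a few lines. This is only an illustration: the group means below are invented, and the choice of one decimal place is an assumption about what precision is meaningful for such data.

```python
# Hypothetical group means, invented for illustration (not from the slides).
means = {"Males": 84.5, "Females": 88.0}

# Fix the number of decimals so every row shows the same precision.
# The ".1f" format keeps the ending 0 in 88.0 instead of dropping it.
rows = [f"{group:<8}{value:>6.1f}%" for group, value in means.items()]

for row in rows:
    print(row)
```

Printing the raw floats would also show 88.0 here, but a fixed format makes the precision an explicit decision rather than an accident of the value.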

7 Displaying data well
Show “typical”, “average” values
Convey extent of “spread”, “variability” in values
Compare groups clearly
Label explicitly

8 Supersaturation of Bile Data Set
Einarsson K, et al (NEJM 313:277, 1985; reprinted in D-S & T, p st ed)
Supersaturation of bile with cholesterol is necessary for cholesterol gall stones
Female gender and increasing age are risk factors for gall stones
Is either gender or age associated with percentage cholesterol saturation of bile?
Cross-sectional data on 60 healthy Swedish subjects (31 men, 29 women) who were not obese

9 Bile Data Set

10 Measures of the “Average”
“Average” -- typical or representative value; where the distribution is “centered”
Different measures of the center -- usually all the same for symmetric distributions (ones that look the same on the right and left of center)
Median -- value such that half the observations are less than it and half are greater than it (50th percentile)
Males: 86%   Females: 84%
Mode -- value where the distribution achieves its maximum -- most likely value
Males: 80-90% (85%)   Females: 80-90% (85%)
Mean -- sum of values divided by the number of values, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Males: 84.5%   Females: 88.5%
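As a minimal sketch of the three measures of center, the snippet below uses Python's standard `statistics` module. The values are invented for illustration and are not the bile data; they are mildly right-skewed so that the mean and median differ.

```python
import statistics

# Hypothetical % saturation values, invented for illustration.
values = [58, 65, 72, 80, 85, 85, 90, 110, 137]

mean = statistics.mean(values)      # sum of values / number of values
median = statistics.median(values)  # middle value (50th percentile)
mode = statistics.mode(values)      # most frequent value

print(f"mean={mean:.1f}  median={median}  mode={mode}")
```

With a long right tail the mean is pulled above the median, which is one reason to report more than one measure of center.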

11 Measures of Spread
Spread -- variability among the observations
Different measures of spread, like different averages, represent distinct aspects of the distribution
Interquartile range
–75th minus 25th percentile -- width of the range of values that contains the middle 50% of data
Men: 40.0%   Women: 40.5%
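The interquartile range can be sketched with `statistics.quantiles`. The data are again invented, and the `method="inclusive"` setting is an assumption: software packages use several slightly different quantile conventions, so quartiles from another tool may not match exactly.

```python
import statistics

# Hypothetical % saturation values, invented for illustration.
values = [58, 65, 72, 80, 85, 85, 90, 110, 137]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

iqr = q3 - q1                            # width of the middle 50% of the data
full_range = max(values) - min(values)   # sensitive to the extreme values

print(f"Q1={q1}  Q3={q3}  IQR={iqr}  range={full_range}")
```

Unlike the full range, the interquartile range is unaffected by the single large value 137.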

12 Measures of Spread (cont’d)
Variance = (standard deviation)^2 = mean squared deviation from the mean
$\text{variance} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Standard deviation = square root of variance
Men: (24.0%)^2 = 574 (%^2)   Women: (26.6%)^2 = 761 (%^2)
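A worked version of the variance calculation, step by step. This sketch assumes the usual sample variance with divisor n - 1 (dividing by n instead gives the population form, `statistics.pvariance`); the data are invented for illustration.

```python
import math
import statistics

# Hypothetical % saturation values, invented for illustration.
values = [58, 65, 72, 80, 85, 85, 90, 110, 137]

n = len(values)
xbar = sum(values) / n                              # the mean
sum_sq_dev = sum((x - xbar) ** 2 for x in values)   # squared deviations from the mean

variance = sum_sq_dev / (n - 1)   # sample variance, in squared units (%^2)
sd = math.sqrt(variance)          # standard deviation, back in the original units (%)

print(f"variance={variance:.1f} (%^2)  sd={sd:.1f}%")
```

The standard deviation is usually the one to report, since it is in the same units as the data.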

13 Some common data displays
Displays for continuous data
–Histograms / Stem and leaf plots
–Boxplots
Displays for categorical data: tables
Displays for relationships of two variables (on same “people”) to each other
–Continuous data: scatterplots
–Categorical data: cross-tabulations

14 Stem and Leaf Plot: % Saturation, Men
[Stem and leaf display, stems 4* to 13*]
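A stem and leaf display is simple enough to build by hand. The sketch below is one possible implementation, not the one used for the slide: the function name `stem_and_leaf` is hypothetical, the values are invented, and it assumes non-negative whole numbers with the tens as stems and the units as leaves.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return lines such as '8* 055': tens digit(s) as stem, units as leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        leaves[stem].append(str(leaf))
    lo, hi = min(leaves), max(leaves)
    # Print every stem between the smallest and largest, even empty ones,
    # so that gaps in the distribution remain visible.
    return [f"{stem}* {''.join(leaves[stem])}".rstrip() for stem in range(lo, hi + 1)]

# Hypothetical % saturation values, invented for illustration.
values = [58, 65, 72, 80, 85, 85, 90, 110, 137]
lines = stem_and_leaf(values)
print("\n".join(lines))
```

Each row works like a histogram bar turned on its side, except that the individual values can still be read off.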

15 Boxplots of Bile Data
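The numbers a boxplot draws can be computed directly. This sketch assumes the common convention of whiskers reaching to the most extreme points within 1.5 interquartile ranges of the box; plotting software may use other rules, and the data are invented for illustration.

```python
import statistics

# Hypothetical % saturation values, invented for illustration.
values = [58, 65, 72, 80, 85, 85, 90, 110, 137]

# Box: lower quartile, median, upper quartile.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Fences: points beyond these are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = [v for v in values if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(f"box: {q1} / {median} / {q3}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```

Computing these for each group side by side gives the comparison that a pair of boxplots shows graphically.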

16 Scatterplot: SBP vs DBP
[Scatterplot, axes: SBP and DBP]

Some really bad plots

18 Example 1

19 Example 2
Distribution of genotypes
AA        21%
AB        48%
BB        22%
missing    9%

20 Example 3

21 Example 4

22 Example 5

23 Example 6

24 Example 7

25 Example 8

26 Main points once again
Be accurate and clear.
Let the data speak.
–Show as much data as possible, taking care not to obscure the message.
Science not sales.
–Avoid unnecessary frills
–Go for the cleanest display that conveys the necessary info
In tables, every digit should be meaningful. Don’t drop ending 0’s.

27 Displaying data well
Show “typical”, “average” values
Convey extent of “spread”, “variability” in values
Compare groups clearly
Label explicitly

28 Further reading
ER Tufte (1983) The visual display of quantitative information. Graphics Press.
ER Tufte (1990) Envisioning information. Graphics Press.
ER Tufte (1997) Visual explanations. Graphics Press.
WS Cleveland (1993) Visualizing data. Hobart Press.
WS Cleveland (1994) The elements of graphing data. CRC Press.