Plot type specific considerations

Slides:



Advertisements
Similar presentations
Ethics of data representation v2.0. Collect Raw Data Process and Filter Data Clean Dataset Exploratory Analysis Generate Conclusion Generate Visualisation.
Advertisements

Chapter 5: Understanding and Comparing Distributions
Jan Shapes of distributions… “Statistics” for one quantitative variable… Mean and median Percentiles Standard deviations Transforming data… Rescale:
Understanding and Comparing Distributions
© 2005 The McGraw-Hill Companies, Inc., All Rights Reserved. Chapter 12 Describing Data.
Programming in R Describing Univariate and Multivariate data.
Univariate EDA (Exploratory Data Analysis). EDA John Tukey (1970s) data –two components: smooth + rough patterned behaviour + random variation resistant.
ITEC6310 Research Methods in Information Technology Instructor: Prof. Z. Yang Course Website: c6310.htm Office:
Quantitative Skills 1: Graphing
The introduction to SPSS Ⅱ.Tables and Graphs for one variable ---Descriptive Statistics & Graphs.
What is Statistics? Statistics is the science of collecting, analyzing, and drawing conclusions from data –Descriptive Statistics Organizing and summarizing.
StatisticsStatistics Graphic distributions. What is Statistics? Statistics is a collection of methods for planning experiments, obtaining data, and then.
Section 1 Topic 31 Summarising metric data: Median, IQR, and boxplots.
Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 4 Describing Numerical Data.
Categorical vs. Quantitative…
Displaying Quantitative Data Graphically and Describing It Numerically AP Statistics Chapters 4 & 5.
Bellwork 1. If a distribution is skewed to the right, which of the following is true? a) the mean must be less than the.
Unit 4 Statistical Analysis Data Representations.
Statistics Chapter 1: Exploring Data. 1.1 Displaying Distributions with Graphs Individuals Objects that are described by a set of data Variables Any characteristic.
Descriptive statistics Petter Mostad Goal: Reduce data amount, keep ”information” Two uses: Data exploration: What you do for yourself when.
Lecture PowerPoint Slides Basic Practice of Statistics 7 th Edition.
MMSI – SATURDAY SESSION with Mr. Flynn. Describing patterns and departures from patterns (20%–30% of exam) Exploratory analysis of data makes use of graphical.
Understanding and Comparing Distributions
Copyright © 2011 Pearson Education, Inc. Describing Numerical Data Chapter 4.
Plan for Today: Chapter 11: Displaying Distributions with Graphs Chapter 12: Describing Distributions with Numbers.
Chapter 5: Organizing and Displaying Data. Learning Objectives Demonstrate techniques for showing data in graphical presentation formats Choose the best.
1 By maintaining a good heart at every moment, every day is a good day. If we always have good thoughts, then any time, any thing or any location is auspicious.
Describing Data Week 1 The W’s (Where do the Numbers come from?) Who: Who was measured? By Whom: Who did the measuring What: What was measured? Where:
Stem-and-Leaf Plots …are a quick way to arrange a set of data and view its shape or distribution A key in the top corner shows how the numbers are split.
AP Statistics. Chapter 1 Think – Where are you going, and why? Show – Calculate and display. Tell – What have you learned? Without this step, you’re never.
Statistics Unit 6.
UNIT ONE REVIEW Exploring Data.
Prof. Eric A. Suess Chapter 3
Thursday, May 12, 2016 Report at 11:30 to Prairieview
Exploratory Data Analysis
Quantitative Techniques – Class I
EHS 655 Lecture 4: Descriptive statistics, censored data
MATH-138 Elementary Statistics
EXPLORATORY DATA ANALYSIS and DESCRIPTIVE STATISTICS
BAE 5333 Applied Water Resources Statistics
Review 1. Describing variables.
1.3 Measuring Center & Spread, The Five Number Summary & Boxplots
Unit 4 Statistical Analysis Data Representations
Understanding and Comparing Distributions
Understanding and Comparing Distributions
Objective: Given a data set, compute measures of center and spread.
Describing Distributions Numerically
Quantitative Skills : Graphing
1st Semester Final Review Day 1: Exploratory Data Analysis
How could data be used in an EPQ?
Description of Data (Summary and Variability measures)
Laugh, and the world laughs with you. Weep and you weep alone
Understanding and Comparing Distributions
Displaying Distributions with Graphs
CHAPTER 1: Picturing Distributions with Graphs
Unit 7: Statistics Key Terms
Statistics Unit 6.
Topic 5: Exploring Quantitative data
Scientific Figure Design
Describing Distributions of Data
Displaying Distributions with Graphs
Organizing Data AP Stats Chapter 1.
Exploratory Data Analysis
Understanding and Comparing Distributions
Honors Statistics Review Chapters 4 - 5
Ten things about Descriptive Statistics
Probability and Statistics
Displaying data Seminar 2.
Advanced Algebra Unit 1 Vocabulary
Lesson Plan Day 1 Lesson Plan Day 2 Lesson Plan Day 3
Presentation transcript:

Plot type specific considerations V2017-06 Anne Segonds-Pichon Simon Andrews Phil Ewels anne.segonds-pichon@babraham.ac.uk simon.andrews@babraham.ac.uk phil.ewels@scilifelab.se

Types of plot Things you can illustrate

Plot types – Distribution/Exploration Histograms Very good for exploring data. Better on big dataset. Rules: Number of intervals ≈√N and Interval width ≈ Range ÷√N Histograms are great but careful with the resolution (= number of bins) as it affects the shape of the distribution.

Plot types – Distribution/Exploration Histograms Be careful with the resolution … Too few categories Too many categories … and the type of data you are dealing with. Histograms are great but careful with discrete data.

Boxplots and Bean plots Plot types – Distribution/Exploration Boxplots and Bean plots Cutoff = Q1 – 1.5*IQR Median Maximum Interquartile Range (IQR): 50% of the data Lower Quartile (Q1) 25th percentile (1st quartile) Outlier Upper Quartile (Q3) 75th percentile (3rd quartile) Minimum

Boxplots and Bean plots Plot types – Distribution/Exploration Boxplots and Bean plots Scatterplot shows individual data A bean= a ‘batch’ of data Bimodal Uniform Normal Distributions Data density mirrored by the shape of the polygon Very good for exploring data. Better on medium size dataset. Boxplots are great but be careful with underlying distribution.

Plot types – Exploration/Comparison Stripcharts/Scatterplots Very good for exploring data. Better on small/medium dataset (up to 100-ish). Very informative: exploration AND comparison. Very hard to cheat with these. Stripcharts are great but they don’t work so well with big samples.

Plot types – Comparisons Barcharts (not for exploration!) Standard error Standard deviation Very good for presenting results and emphasizing differences. Effectiveness: most important info with the most effective channel. Barcharts are great but higher bars get more attention.

Plot types – Comparisons Alternative to Barcharts Confidence interval Star wars

Plot types – Comparisons Barcharts Be careful with the scale when plotting ratio Very good for presenting results and emphasizing differences. Barcharts are great but after data exploration and the y-axis needs to be chosen wisely.

Plot types – Relationship/Comparison Line graphs 5 experiments Except for exploration … Very good for presenting results of matched/paired/repeated data. Linecharts are great but careful with the axes.

Plot types – Relationships Two conditions - Scatterplot Very good for understanding the relationship between quantitative variables.

Plot types – Relationships Scatterplots Problem: very big dataset Solution: smoothed densities colour representation Scatterplots are great but big data can be tricky.

Plot types – Composition Stack charts/Pie charts Very good for presenting categorical data. Stack /pie charts are great but keep an eye on the sample size.

Plot types – Relationships Many conditions - Heatmaps Normalisation Colouring Filtering Clustering

Plot types – Relationships Heatmaps A heatmap is basically a table that has colors in place of numbers.

Plot types – Relationships Heatmaps Colour scheme for grouping. Clustering

Clustering Heatmaps

Plot types – Relationships Heatmaps Euclidian distance: Correlation: differences between values Pattern 0 0 - Difference =1.1 =0.5 =19.6 =1.2 consistency of pattern Heatmaps are great but plot data that are changing.

Plot types – Relationships Heatmaps Don’t forget to remove unchanging points to focus on differences. Heatmaps are great but careful with clustering plot data that are changing.

Simon Andrews, Anne Segonds-Pichon Ethics of data representation v2017-06 Simon Andrews, Anne Segonds-Pichon simon.andrews@babraham.ac.uk anne.segonds-pichon@babraham.ac.uk

Data Visualisation Process Collect Raw Data Process and Filter Data Clean Dataset Exploratory Analysis Two parts of the process where visualisation is important. They have different requirements and will need different visualisations. Generate Visualisation Generate Conclusion

when it comes to data visualisation? What is Ethics when it comes to data visualisation? The figure/graph/image should show what is actually happening and not what you want to happen. Different ways of being unethical: not exploring/getting to know the data well enough, misusing your chosen graphical representation. deliberately showing the data in a misleading manner, choosing the ‘most representative’ image/experiment.

Not exploring/getting to know the data well enough Example 1 One experiment: change in the variable of interest between CondA to CondB. Data plotted as a bar chart.

Not exploring/getting to know the data well enough Example 2 Five experiments: change in the variable of interest between 3 treatments and a control. Data plotted as a bar chart. Comparisons: Treatments vs. Control p=0.001 Exp3 Exp4 Exp1 Exp5 Exp2 p=0.04 p=0.32

Choosing the wrong graph to present the data Four experiments: Before-After treatment effect on a variable of interest. Hypothesis: Applying a treatment will decrease the levels of the variable of interest. Data plotted as a bar chart. Exp2 Exp1 Exp3 Exp4 The fiction The reality

Choosing the wrong axis/scale Example: increase in salary in the last term.

Choosing the y-axis/scale Be careful with Linear vs. logarithmic scale.

Choosing the y-axis/scale For cheating, a bar graph using a log axis is a great tool, as it lets you either exaggerate differences between groups or minimize them. Linear scale Logarithmic scale

Choosing the y-axis/scale Logarithmic axis should be used for: Logarithmically spaced values Lognormal data

Manipulating images: Western blot Simply Cheating: Manipulating images: Western blot ‘Playing’ too much with contrast Presenting bands out of context Original Brightness and Contrast Adjusted Brightness and Contrast Adjusted Too Much: Oversaturation ‘Rebuilding’ a Western blot from several cuts

Is my plot ethical? Would a reader come to a different conclusion if they could see the details of the data which were omitted from the plot?