Ethics of data representation v2.0

Data Visualisation Process: Collect Raw Data; Process and Filter Data; Clean Dataset; Exploratory Analysis; Generate Conclusion; Generate Visualisation.

What is ethics when it comes to data visualisation?
The figure/graph/image should show what is actually happening, not what you want to happen.
Different ways of being unethical:
– knowingly: deliberately showing the data in a misleading manner, or choosing the ‘most representative’ image/experiment.
– unknowingly: not exploring/getting to know the data well enough, or misusing your chosen graphical representation.

Cheating knowingly: Choice of graph
Hypothesis (what you want to see): applying a treatment will decrease the levels of a variable.
You know what is actually going on [figure: Exp1, Exp2, Exp3, Exp4], but you choose to plot your data like that [figure].

Cheating knowingly: Choice of axis/scale
You want to show an increase in salary in the last term.
You know what is actually going on [figure], but you choose to plot your data like that [figure].

Cheating knowingly: Choice of axis/scale
Be careful with linear vs. logarithmic scales.

Cheating knowingly: Choice of axis/scale
If you want to cheat, a bar graph using a log axis is a great tool, as it lets you either exaggerate differences between groups or minimize them.
[Figure panels: Linear scale; Logarithmic scale]
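One way to see why a log-axis bar graph can both exaggerate and minimize a difference is to compare the drawn bar lengths. The sketch below is only an illustration: the two group means, the baselines and the function names are assumptions, not values from the slides. It relies on the fact that a log axis has no zero, so the point where the bars start is an arbitrary choice.

```python
import math


def linear_length_ratio(a, b):
    """Ratio of bar lengths (b relative to a) on a linear axis starting at 0."""
    return b / a


def log_length_ratio(a, b, baseline):
    """Ratio of bar lengths on a log10 axis whose bars start at `baseline`.

    The drawn length of a bar is log10(value / baseline), so the ratio
    depends on the arbitrary baseline as well as on the data.
    """
    return math.log10(b / baseline) / math.log10(a / baseline)


# Hypothetical group means, chosen only for illustration.
control, treated = 100.0, 200.0

print("linear axis from 0:", linear_length_ratio(control, treated))
print("log axis from 1:   ", round(log_length_ratio(control, treated, 1.0), 2))
print("log axis from 90:  ", round(log_length_ratio(control, treated, 90.0), 2))
```

With the same two values, a low baseline makes the bars look almost equal, while a baseline just below the smaller value makes the second bar look several times longer than the first.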

Cheating knowingly: Choice of axis/scale
A logarithmic axis should be used for:
– lognormal data
– logarithmically spaced values

Cheating knowingly: Manipulating images: Western blot
– Presenting bands out of context
– ‘Playing’ too much with contrast
– ‘Rebuilding’ a Western blot from several cuts
[Figure panels: Original; Brightness and Contrast Adjusted; Brightness and Contrast Adjusted Too Much: Oversaturation]

Cheating unknowingly: Not exploring/getting to know the data well enough
Hypothesis: an increase from CondA to CondB.
You run the experiment once and choose to plot the data as a bar chart.

Cheating unknowingly: Not exploring/getting to know the data well enough
[Figure: Comparisons: Treatments vs. Control, for Exp1 to Exp5; p-values shown: p=0.04, p=0.32, p=0.001]

Types of plot: Things you can illustrate

Plot types – Distribution/Exploration: Histograms
Very good for exploring data. Better on big datasets.
Rules: number of intervals ≈ √N and interval width ≈ range ÷ √N.
Histograms are great, but be careful with the resolution (= number of bins), as it affects the shape of the distribution.
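The square-root rule above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the function name and the sample data are assumptions, and the number of intervals is rounded to a whole number so that the width becomes range ÷ (number of intervals).

```python
import math


def sqrt_rule_histogram(values):
    """Bin `values` using number of intervals ~ sqrt(N), width ~ range / sqrt(N)."""
    n = len(values)
    n_bins = max(1, round(math.sqrt(n)))   # number of intervals, at least 1
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_bins    # interval width

    counts = [0] * n_bins
    for v in values:
        if width == 0:
            index = 0                      # all values identical
        else:
            # The maximum falls exactly on the last edge; keep it in the last bin.
            index = min(int((v - lowest) / width), n_bins - 1)
        counts[index] += 1
    return n_bins, width, counts


# Hypothetical sample: the integers 0..99.
n_bins, width, counts = sqrt_rule_histogram(list(range(100)))
print(n_bins, width, counts)
```

Re-running the binning with a different number of intervals on the same data is a quick way to check how much the apparent shape of the distribution depends on the resolution.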

Plot types – Distribution/Exploration: Histograms
Be careful with the resolution and with the type of data you are dealing with.
Histograms are great, but be careful with discrete data.

Plot types – Distribution/Exploration: Boxplots and Bean plots
Elements of a boxplot:
– Maximum
– Upper Quartile (Q3): 75th percentile (3rd quartile)
– Median
– Lower Quartile (Q1): 25th percentile (1st quartile)
– Minimum
– Interquartile Range (IQR): 50% of the data
– Cutoff = Q1 – 1.5*IQR
– Outlier
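The quantities labelled on the boxplot can be computed directly. The sketch below is an illustration under stated assumptions: the function name and sample data are hypothetical, quartiles are computed with the "inclusive" method of Python's `statistics.quantiles` (other quartile conventions give slightly different values), and the upper cutoff Q3 + 1.5*IQR is the usual counterpart of the lower cutoff shown on the slide rather than something the slide states.

```python
import statistics


def boxplot_summary(values):
    """Return the quartiles, IQR, cutoffs and outliers for a boxplot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                      # spans the middle 50% of the data
    lower_cutoff = q1 - 1.5 * iqr      # cutoff from the slide
    upper_cutoff = q3 + 1.5 * iqr      # assumption: symmetric counterpart
    outliers = [x for x in data if x < lower_cutoff or x > upper_cutoff]
    return {
        "minimum": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "maximum": data[-1],
        "iqr": iqr,
        "lower_cutoff": lower_cutoff,
        "upper_cutoff": upper_cutoff,
        "outliers": outliers,
    }


# Hypothetical sample with one extreme value.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
for name, value in summary.items():
    print(f"{name}: {value}")
```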

Plot types – Distribution/Exploration: Boxplots and Bean plots
[Figure: bimodal, uniform and normal distributions]
A bean = a ‘batch’ of data. Data density is mirrored by the shape of the polygon. The scatterplot shows individual data.
Very good for exploring data. Better on medium-size datasets.
Boxplots are great, but be careful with the underlying distribution.

Plot types – Exploration/Comparison: Stripcharts/Scatterplots
Very good for exploring data. Better on small/medium datasets.
Very informative: exploration AND comparison. Very hard to cheat with these.
Stripcharts are great, but they don’t work so well with big samples.

Plot types – Comparisons: Barcharts
Error bars can show: standard deviation, standard error, confidence interval.
[Figure: Star wars (cool graph!)]
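The three kinds of error bar named above describe different things and differ in length for the same data. The sketch below computes all three; the function name and sample values are hypothetical, and the confidence interval uses a normal approximation (roughly 1.96 standard errors for 95%), which is an assumption made to keep the example in the standard library. For small samples a t-based interval would be wider.

```python
import math
import statistics


def error_bars(values, confidence=0.95):
    """Return mean, SD, standard error and CI half-width for one group."""
    n = len(values)
    sd = statistics.stdev(values)       # sample SD: spread of the data
    sem = sd / math.sqrt(n)             # standard error: precision of the mean
    # Normal approximation to the critical value (assumption, see lead-in).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return {
        "mean": statistics.mean(values),
        "sd": sd,
        "sem": sem,
        "ci_half_width": z * sem,
    }


# Hypothetical measurements for a single group.
bars = error_bars([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in bars.items():
    print(f"{name}: {value:.3f}")
```

Because the same bar chart looks very different depending on which of these is drawn, the figure legend should always state which one the error bars represent.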

Plot types – Comparisons: Barcharts
Be careful with the scale when plotting ratios.
Very good for presenting results and emphasizing differences.
Effectiveness: convey the most important information through the most effective channel.
Barcharts are great, but only after data exploration, and the y-axis needs to be chosen wisely.
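The warning about ratios can be made concrete with a small calculation. In this sketch (the function name and the example ratios are assumptions for illustration), a halving and a doubling are changes of the same magnitude, yet on a linear axis they sit at unequal distances from the "no change" line at 1, whereas on a log2 axis they are symmetric.

```python
import math


def offset_from_no_change(ratio, log_scale):
    """Distance of a plotted ratio from the 'no change' reference line.

    Linear axis: the reference is 1, so the offset is ratio - 1.
    Log2 axis:   the reference is log2(1) = 0, so the offset is log2(ratio).
    """
    return math.log2(ratio) if log_scale else ratio - 1.0


# Hypothetical ratios: a halving and a doubling.
for label, ratio in [("halved", 0.5), ("doubled", 2.0)]:
    print(
        label,
        "linear offset:", offset_from_no_change(ratio, log_scale=False),
        "log2 offset:", offset_from_no_change(ratio, log_scale=True),
    )
```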

Plot types – Relationship/Comparison: Line graphs
Except for exploration … [figure: 5 experiments]
Very good for presenting results of matched/paired/repeated data.
Linecharts are great, but be careful with the axes.

Plot types – Relationships: Scatterplots
Very good for understanding the relationship between quantitative variables.

Plot types – Relationships: Scatterplots
Scatterplots are great, but big data can be tricky.
Solution: smoothed-density colour representation.

Plot types – Relationships: Heatmaps
Great for big datasets; they allow a third quantitative value to be plotted, with a colour scheme for grouping.
Choices to make: Euclidean distance or correlation; colour scheme.
Heatmaps are great, but plot data that are changing.
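The choice between Euclidean distance and correlation matters because the two can disagree about which rows of a heatmap are similar. The sketch below uses three hypothetical profiles, and it takes 1 minus the Pearson correlation as the correlation-based distance, which is a common convention assumed here rather than something stated on the slide.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance: sensitive to the magnitude of the values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_distance(x, y):
    """1 - Pearson r: sensitive to the shape of the profile, not its magnitude."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - covariance / (spread_x * spread_y)


# Hypothetical profiles across four conditions.
reference = [1, 2, 3, 4]
same_shape_bigger = [10, 20, 30, 40]   # same trend, ten times the magnitude
opposite_shape = [4, 3, 2, 1]          # similar magnitude, reversed trend

for name, profile in [("same shape", same_shape_bigger), ("opposite", opposite_shape)]:
    print(
        name,
        "euclidean:", round(euclidean_distance(reference, profile), 2),
        "correlation:", round(correlation_distance(reference, profile), 2),
    )
```

With these profiles, Euclidean distance groups the reference with the reversed profile, while the correlation-based distance groups it with the profile that follows the same trend.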

A heatmap is basically a table that has colours in place of numbers.
[Figure: Simon’s data, from simple numbers to correlation]

Plot types – Composition: Stack charts/Pie charts
Stack/pie charts are great, but keep an eye on the sample size.