ENV Envisioning Information
Lecture 2: Simple Graphs and Charts
Ken Brodlie
School of Computing, University of Leeds
Lecture Outline
  Preliminaries
    – Definitions
    – Datatypes
  Simple Data Presentation
    – Graphs and charts
Fundamentals
  Basic datatypes correspond to different levels of measurement
  Data can be:
    – Categorical: labels
    – Numerical: numbers
  Categorical
    – Nominal: no sense of order (apples, oranges, …)
    – Ordinal: ordered in sequence (January, February, …)
  Numerical
    – Continuous: real numbers (height of students in class)
    – Discrete: typically whole numbers (marks in an exam)
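The nominal/ordinal distinction can be illustrated with a minimal Python sketch. The month subset and the helper name `sort_ordinal` are assumptions made for illustration, not part of the lecture material. Sorting labels alphabetically treats them as nominal; an ordinal variable needs its sequence supplied explicitly.

```python
# Ordinal data: the labels carry an order that alphabetical sorting knows nothing about.
MONTH_ORDER = ["January", "February", "March", "April"]  # illustrative subset


def sort_ordinal(labels, order):
    """Sort labels by their position in an explicit ordering (hypothetical helper)."""
    rank = {label: i for i, label in enumerate(order)}
    return sorted(labels, key=rank.__getitem__)


observed = ["March", "January", "April", "February"]
alphabetical = sorted(observed)                     # treats months as nominal labels
by_sequence = sort_ordinal(observed, MONTH_ORDER)   # respects the ordinal sequence

print(alphabetical)
print(by_sequence)
```

The same consideration applies when choosing the order of categories along a chart axis: ordinal categories should appear in sequence, while nominal ones can be arranged freely.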
Question
  Give an example for each class in which numbers are involved…
    – Categorical, nominal
    – Categorical, ordinal
    – Numerical, continuous
    – Numerical, discrete
Exploratory Data Analysis
  Pioneering figure is John Tukey
  New approach to data analysis, heavily based on visualization, as an alternative to classical data analysis
  See Wikipedia
  Two-stage process:
    – Exploratory: search for evidence using all tools available
    – Confirmatory: evaluate strength of evidence using classical data analysis
Simple Data Presentation
  Simple data tables are often presented as line graphs, bar graphs, pie charts, dot graphs, histograms…
  Which should we use and when?
Line Graph
  Fundamental technique of data presentation
  Used to compare two variables
    – X-axis is often the control variable
    – Y-axis is the response variable
  Good at:
    – Showing specific values
    – Trends
    – Trends in groups (using multiple line graphs)
  [Figures: Students participating in sporting activities; Mobile phone use]
  Note: graph labelling is fundamental
  Any critical comments here?
Simple Representations – Bar Graph
  Bar graph
    – Presents categorical variables
    – Height of bar indicates value
    – Double bar graph allows comparison
    – Note spacing between bars
    – Can be horizontal (when would you use this?)
  [Figures: Internet use at a school; Number of police officers]
  Note more space for labels
Dot Graph
  Very simple but effective…
  Horizontal to give more space for labelling
Pie Chart
  Pie chart summarises a set of categorical/nominal data
  But use with care…
  … too many segments are harder to compare than in a bar chart
  [Figures: Should we have a long lecture?; Favourite movie genres]
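As a rough illustration of how a pie chart encodes data, each segment's angle is proportional to its share of the total. The sketch below assumes made-up genre counts (they are not the data from the slide) and a hypothetical helper `pie_angles`.

```python
def pie_angles(counts):
    """Return the angle in degrees of each segment, proportional to its count."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {label: 360.0 * n / total for label, n in counts.items()}


# Made-up survey counts, for illustration only.
genres = {"Comedy": 4, "Action": 5, "Romance": 6, "Drama": 1, "SciFi": 4}
angles = pie_angles(genres)

for label, angle in angles.items():
    print(f"{label:8s} {angle:6.1f} degrees")
```

With these counts, neighbouring categories differ by only 18 degrees, which gives a feel for why similar-sized segments can be hard to tell apart by eye.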
Histograms
  Histograms summarise discrete or continuous data that are measured on an interval scale
  No gaps if variable is continuous
  [Figure: Distribution of salaries in a company]
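The counting behind a histogram can be sketched as follows. This is a minimal version assuming equal-width bins; the salary values, the bin range and the helper name `histogram` are illustrative assumptions, not the data from the slide.

```python
def histogram(values, lo, hi, n_bins):
    """Count values into n_bins equal-width bins covering [lo, hi]."""
    if n_bins <= 0 or hi <= lo:
        raise ValueError("need n_bins > 0 and hi > lo")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # A value equal to the top edge hi goes in the last bin rather than being dropped.
            i = min(int((v - lo) / width), n_bins - 1)
            counts[i] += 1
    return counts


# Made-up salaries in thousands, for illustration only.
salaries = [18, 22, 25, 27, 31, 33, 34, 38, 41, 45, 52, 60]
counts = histogram(salaries, lo=10, hi=60, n_bins=5)

# Text rendering: adjacent bins share an edge, so there are no gaps between bars.
for i, c in enumerate(counts):
    left = 10 + i * 10
    print(f"{left:2d}-{left + 10:2d} | " + "#" * c)
```

The choice of bin width is a design decision: too few bins hide the shape of the distribution, too many leave most bins nearly empty.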
Scatter Plot
  Used to present measurements of two variables
  Effective if a relationship exists between the two variables
  [Figure: Car ownership by household income]
  Example taken from NIST Handbook – evidence of strong positive correlation
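The strength of a linear relationship seen in a scatter plot is commonly summarised by the Pearson correlation coefficient. The sketch below is one way to compute it; the income and car-count values are made up for illustration and are not the data behind the figure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up data: household income (thousands) and number of cars owned.
income = [10, 20, 30, 40, 50]
cars = [0, 1, 1, 2, 2]
r = pearson_r(income, cars)
print(f"r = {r:.3f}")
```

A value of r close to +1 corresponds to the kind of strong positive correlation described on the slide; values near 0 indicate little linear relationship.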
Scatter Plots in Excel
  The scatter plot is a fundamental tool in Excel
  Chart type XY (Scatter) and subtype Unconnected Points
Regression Line
  Excel allows you to add a linear regression line (trend line)
  Remember: correlation does not imply causality, i.e. a relationship exists but one variable is not necessarily causing the other; there may be a third factor
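A linear trend line is conventionally fitted by ordinary least squares; the sketch below assumes that method and uses made-up data. The helper name `fit_line` is hypothetical. It also computes the residuals, the vertical distances between each point and the fitted line.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Made-up data, for illustration only.
xs = [10, 20, 30, 40, 50]
ys = [0, 1, 1, 2, 2]
slope, intercept = fit_line(xs, ys)

# Residual = observed value minus value predicted by the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(f"y = {intercept:.2f} + {slope:.3f} x")
print([round(e, 2) for e in residuals])
```

A fitted line describes the association only; it says nothing about which variable, if either, drives the other.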
Tukey Sum-Difference Plot
  Better understanding of residuals …
Box Plots
  In some situations we have, not a single data value at a point, but a number of data values, or even a probability distribution
  When might this occur?
  Tukey proposed the idea of a boxplot to visualize the distribution of values
  For explanation and some history, see: WhiskerPlot.html
    M – median
    Q1, Q3 – quartiles
    Whiskers – 1.5 * interquartile range
    Dots – outliers
  [Figure: Darwin's plant study]
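The quantities drawn in a box plot can be computed as in the sketch below. Two assumptions are made here: quartiles are taken as the medians of the lower and upper halves of the sorted data (one of several conventions in use), and whiskers extend to the most extreme data values lying within 1.5 times the interquartile range of the quartiles. The data are made up, not Darwin's, and `box_plot_stats` is a hypothetical helper.

```python
from statistics import median


def box_plot_stats(values):
    """Median, quartiles, whisker ends and outliers for a box plot."""
    data = sorted(values)
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 values")
    half = n // 2
    lower = data[:half]
    # For odd n the middle value is left out of both halves (one convention among several).
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    q1, m, q3 = median(lower), median(data), median(upper)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": m,
        "q3": q3,
        "whisker_low": inside[0],     # smallest value within the fences
        "whisker_high": inside[-1],   # largest value within the fences
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }


# Made-up measurements, for illustration only.
stats = box_plot_stats([2, 4, 5, 6, 7, 8, 9, 10, 30])
print(stats)
```

Plotting packages may use a different quartile convention, so box edges can differ slightly between tools for the same data.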
Acknowledgement
  Thanks to Statistics Canada, an excellent web site for simple data presentation
Exercise for next week
  Understand a bit more about the merits of pie charts and bar graphs
  Create a dataset with roughly equal numbers in each class
  Which is best if the task is to discriminate?
Exercise for next week
  Over the next week look for examples of basic graphs
    – In newspapers, magazines or other print media
    – On news web sites or other electronic media
  Analyse two examples
    – One should be an example where you think the use of graphics is good
    – One should be an example where you think it is bad
  Be ready next week to present these results to the class…
Envisioning Information: Practical Work
  Gnuplot
  R
  Excel