Graphical Display 1: Pictures of Data
Edward Tufte, The Visual Display of Quantitative Information
Using graphics for:
  – Data
  – Concepts
  – Instruction
  – Finding patterns and answering questions
Simple Plots
One variable, or one variable plus a grouping variable
Reveal the shape of the distribution
Show the distribution of cases over a categorical variable
Provide evidence of whether a metric variable follows a normal distribution
Tufte’s Rules
Above all else, show the data
Maximize the data-ink ratio
Erase non-data-ink
Erase redundant data-ink
Revise and edit
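Tufte defines the data-ink ratio as the share of a graphic's ink that carries non-redundant data information. Written as a formula:

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
```

Erasing non-data-ink and redundant data-ink both push this ratio toward 1.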
Pie Charts
One variable, nominal scale, percentages
A few categories (3-6)
Slices can be exploded for emphasis
Best in groups (2-4)
Usually better replaced by other graphic displays
Rcmdr pie chart of language family in the CAIndians data set
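A minimal sketch of a pie chart whose slices are labelled with percentages. The CAIndians data are not bundled with R, so this assumes R's built-in chickwts data set (six feed types) as a stand-in, and it uses base R's hcl.colors() for the palette. It is an illustration, not the command Rcmdr generates.

```r
# Stand-in data (assumption): feed type for the 71 chicks in R's built-in chickwts data
counts <- table(chickwts$feed)

# Convert counts to percentages so each slice label can show its share
pct <- round(100 * prop.table(counts), 1)
slice_labels <- paste0(names(counts), " (", pct, "%)")

# Six slices, inside the 3-6 category range suggested for pie charts
pie(counts, labels = slice_labels,
    col = hcl.colors(length(counts), "Set 2"),
    main = "Chick feed types")
```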
3D Pies
Strongly discouraged
The plotrix package provides them:
  – install.packages("plotrix")
  – library(plotrix)
  – pie3D(table(CAIndians$Language), radius=.85, labels = levels(CAIndians$Language), explode=.1)
Bar Charts
One variable, nominal scale, counts or percentages
Handle more categories than pie charts (4-15)
Bars can be colored or shaded
Can be stacked, set side by side, or rotated (horizontal bars)
Can compare 2-3 groups
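barplot
Rcmdr uses this command:
  – barplot(table(CAIndians$Language), xlab="Language", ylab="Frequency")
To get it in color, insert the following just before the closing (right) parenthesis:
  – , col=rainbow_hcl(6)
rainbow_hcl() is in the colorspace package (library(colorspace)); the 6 should match the number of categories so each bar gets its own color.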
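A sketch of a colored bar chart and its rotated (horizontal) version. It assumes the built-in chickwts data as a stand-in for CAIndians, and swaps in base R's hcl.colors() (R 3.6.0 or later) for colorspace's rainbow_hcl() so no extra package is needed; the palette name "Set 2" is an illustrative choice.

```r
# Stand-in data (assumption): feed type counts from the built-in chickwts data
counts <- table(chickwts$feed)

# One color per category; hcl.colors() substitutes for rainbow_hcl() here
bar_cols <- hcl.colors(length(counts), "Set 2")

par(mfrow = c(1, 2))
# Vertical bars; barplot() invisibly returns the x positions of the bar midpoints
mids <- barplot(counts, col = bar_cols, xlab = "Feed", ylab = "Frequency")
# Rotated bars: horiz = TRUE, with las = 1 keeping the category labels horizontal
barplot(counts, col = bar_cols, horiz = TRUE, las = 1, xlab = "Frequency")
par(mfrow = c(1, 1))
```

Horizontal bars are mainly useful when category names are long, since the labels no longer compete for space along the x-axis.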
Barplots with 2 Groups
Stacked or side-by-side bar plots make it possible to compare two categorical variables at once
Load MissFauna, extract the Lilbourn samples (rows 5-7), and convert each row to percentages:
  – Lilbourn <- as.matrix(MissFauna[5:7,])
  – LilbournPct <- prop.table(Lilbourn, 1)*100
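Lilbourn Barplots
    par(mfrow=c(2,2))                 # 2 x 2 grid of plots
    barplot(LilbournPct)              # stacked: one bar per column, segments are samples
    barplot(t(LilbournPct))           # stacked: one bar per sample, each totals 100%
    barplot(LilbournPct, beside=TRUE, legend.text=TRUE)      # side by side, grouped by column
    barplot(t(LilbournPct), beside=TRUE, legend.text=TRUE)   # side by side, grouped by sample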
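The MissFauna data are not part of base R, so this sketch assumes the built-in VADeaths matrix (5 age groups by 4 population groups) as a stand-in. It shows what prop.table(..., 1) does, rescaling each row to sum to 100, and then draws the stacked and side-by-side versions.

```r
# Stand-in data (assumption): VADeaths, a 5 x 4 matrix built into R
deaths <- VADeaths

# Row percentages: margin 1 divides each cell by its row total
deathsPct <- prop.table(deaths, 1) * 100

par(mfrow = c(1, 2))
# t() makes each bar one row (age group); segments are the population groups
barplot(t(deathsPct), legend.text = TRUE, ylab = "Percent")
# beside = TRUE places the segments next to each other instead of stacking them
barplot(t(deathsPct), beside = TRUE, legend.text = TRUE, ylab = "Percent")
par(mfrow = c(1, 1))
```

Because every row sums to 100, the stacked bars all reach the same height, so the plot compares composition rather than totals.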
Dot Chart
Dot charts plot the amount in each group along a common scale, so the groups are easier to compare:
  – dotchart(as.vector(table(CAIndians$Language)), pch=16, labels = levels(CAIndians$Language), xlab = "Frequency")
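Sorting the counts before plotting makes the ranking of groups easy to read from a dot chart. This sketch assumes the built-in chickwts data as a stand-in for CAIndians.

```r
# Stand-in data (assumption): feed type counts from the built-in chickwts data
counts <- table(chickwts$feed)

# Ascending sort; dotchart() draws the first value at the bottom,
# so the largest group ends up at the top
sorted <- sort(counts)
dotchart(as.vector(sorted), labels = names(sorted), pch = 16, xlab = "Frequency")
```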
Stem and Leaf Plot
Proposed by Tukey as a way of looking at the distribution of a numeric variable
With a small sample, it preserves the original data values while showing the shape of the distribution
stem.leaf() is in the aplpack package:
    > library(aplpack)
    > stem.leaf(DartPoints$Length, unit=1, na.rm=TRUE)
[Stem-and-leaf display of DartPoints$Length; legend "1 | 2: represents 12", leaf unit: 1]
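If aplpack is not installed, base R's stem() gives a simpler stem-and-leaf display. This sketch uses the ten-value vector x from the Example slide rather than DartPoints, and captures the printed display as text.

```r
# Ten values from the Example slide
x <- c(53, 59, 62, 63, 65, 67, 69, 71, 72, 77)

# stem() prints to the console; capture.output() keeps the lines as a character vector
display <- capture.output(stem(x))
cat(display, sep = "\n")
```

Unlike stem.leaf(), stem() does not print depths or a leaf-unit legend; it states where the decimal point falls relative to the "|".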
Histograms
Like a bar plot, but the x-axis is a continuous measurement divided into intervals (bins)
The shape of the data distribution is shown, but changing the number of bars can change the apparent shape
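To see how the number of bars changes the picture, this sketch bins the ten Example values two ways. hist(..., plot = FALSE) returns the counts without drawing; by default R's bins are closed on the right, with the lowest break included.

```r
x <- c(53, 59, 62, 63, 65, 67, 69, 71, 72, 77)

# Same data, two sets of bin boundaries
wide   <- hist(x, breaks = seq(50, 80, by = 10), plot = FALSE)
narrow <- hist(x, breaks = seq(50, 80, by = 5),  plot = FALSE)
wide$counts    # 2 5 3: one clear peak in the middle
narrow$counts  # 1 1 3 2 2 1: a flatter, more ragged shape

par(mfrow = c(1, 2))
plot(wide, main = "Width 10")
plot(narrow, main = "Width 5")
par(mfrow = c(1, 1))
```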
Boxplot
Also called box-and-whisker plots; they show the median, the quartiles, and outlying points
Multiple boxplots let you compare groups
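boxplot() draws its box from Tukey's hinges (computed by fivenum()), which can differ slightly from the quartiles that quantile() reports. boxplot.stats() returns the numbers behind one box. This sketch uses the ten Example values.

```r
x <- c(53, 59, 62, 63, 65, 67, 69, 71, 72, 77)

s <- boxplot.stats(x)
s$stats                     # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out                       # points more than 1.5 box-lengths beyond the hinges: none here
quantile(x, c(0.25, 0.75))  # default (type 7) quartiles: 62.25 and 70.50
```

Here the hinges are 62 and 71 while quantile() gives 62.25 and 70.5, so the box is not exactly the interquartile range from quantile().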
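Stripchart
A stripchart plots each actual value along a single axis (the y-axis with vertical=TRUE; R's stripchart() is horizontal by default)
It hides less information than a boxplot or histogram, but is less familiar to many people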
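A common compromise is to overlay a stripchart on grouped boxplots, so readers see both the summary and every observation. This sketch assumes R's built-in InsectSprays data (insect counts for six sprays) as example data.

```r
# Grouped boxplots; outline = FALSE avoids drawing outliers twice,
# since the stripchart below plots every point anyway
b <- boxplot(count ~ spray, data = InsectSprays, col = "lightgray",
             outline = FALSE, ylab = "Insect count")

# Overlay all observations; method = "jitter" spreads tied values sideways
stripchart(count ~ spray, data = InsectSprays, vertical = TRUE,
           method = "jitter", pch = 16, add = TRUE)
```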
Kernel Density Plot
Uses the data to approximate a smooth distribution; the result varies with the bandwidth
A normal (or other) kernel is centered on each point, and the kernels are summed and divided by the number of points
    plot(density(DartPoints$Length))
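The "place a normal curve on each point and sum" description can be checked directly. This sketch computes a Gaussian kernel density estimate by hand for the ten Example values, assuming a bandwidth of 3 for illustration, and compares it with density() on the same grid.

```r
x <- c(53, 59, 62, 63, 65, 67, 69, 71, 72, 77)
bw <- 3                        # kernel standard deviation (the bandwidth), assumed
grid <- seq(45, 90, by = 0.5)  # points at which to evaluate the density

# Manual estimate: the average of one normal curve centered on each observation
manual <- sapply(grid, function(g) mean(dnorm(g, mean = x, sd = bw)))

# density() with the same Gaussian kernel and bandwidth, on the same grid
d <- density(x, bw = bw, kernel = "gaussian", from = 45, to = 90,
             n = length(grid))

# Differences are tiny: density() uses a binned FFT approximation
max(abs(d$y - manual))
```

Averaging (rather than simply adding) the n curves is what makes the estimate integrate to 1.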
Example
    x <- c(53, 59, 62, 63, 65, 67, 69, 71, 72, 77)
    hist(x, col="blue", las=1, cex.axis=1.5, cex.lab=1.5)
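    plot(density(x, bw=3), main="Kernel Density Plot")   # bw=3 matches the kernels drawn below
    rug(x)                                               # tick mark at each observation
    xi <- seq(45, 90, .5)
    for (i in 1:10) lines(xi, dnorm(xi, mean=x[i], sd=3)/10)   # each point's kernel, scaled by 1/n
    d <- dnorm(x[1], mean=x[1], sd=3)/10                       # peak height of one scaled kernel
    matlines(rbind(x, x), rbind(rep(0,10), rep(d, 10)), lty=3, lwd=2, col="darkgray")   # dotted line at each kernel's center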
    # Density plots
    # Load DartPoints.RData
    par(mfrow=c(2, 2))
    # Vary bandwidth
    plot(density(DartPoints$Length), cex.main=.8)
    plot(density(DartPoints$Length, bw=2), cex.main=.8)
    plot(density(DartPoints$Length, bw=6), cex.main=.8)
    plot(density(DartPoints$Length, bw=8), cex.main=.8)
    # To compare two distributions
    par(mfrow=c(1,1))
    a <- density(DartPoints$Length[DartPoints$Name=="Darl"])
    b <- density(DartPoints$Length[DartPoints$Name=="Pedernales"])
    plot(a, main="Dart Points", xlab="Length", ylab="Density",
         xlim=c(min(a$x, b$x), max(a$x, b$x)), ylim=c(0, max(a$y, b$y)), col="red")
    lines(b, col="blue")
    # Name the two plotted types explicitly so the labels match the colors
    legend("topright", c("Darl", "Pedernales"), lty=1, col=c("red", "blue"))
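The bandwidth panels can also be compared numerically. This sketch uses the ten Example values: bw.nrd0() is the rule-of-thumb bandwidth density() uses by default, and wider bandwidths spread each kernel further, so the tallest point of the curve drops.

```r
x <- c(53, 59, 62, 63, 65, 67, 69, 71, 72, 77)

bw.nrd0(x)  # default bandwidth used by density(), about 3.5 for these data

# Peak height of the estimated density for three bandwidths
bws <- c(2, 6, 8)
peaks <- sapply(bws, function(b) max(density(x, bw = b)$y))
peaks       # decreasing: smoother curves are flatter
```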
Violin Plot
Combines a box plot and a kernel density plot (package vioplot)
    library(vioplot)
    with(DartPoints, vioplot(Length[Name=="Darl"], Length[Name=="Pedernales"], names=c("Darl", "Pedernales")))
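To show what a violin plot is made of, this base-R sketch mirrors a density curve on both sides of a center line and draws a narrow box plot on top. It uses the ten Example values and an arbitrary maximum half-width of 0.4; it is a hand-built illustration of the idea, not how vioplot itself draws.

```r
x <- c(53, 59, 62, 63, 65, 67, 69, 71, 72, 77)
d <- density(x)

# Scale the density so the widest part of the violin is 0.4 units from the center line
half_width <- 0.4 * d$y / max(d$y)

plot(NA, type = "n", xlim = c(0.5, 1.5), ylim = range(d$x),
     xaxt = "n", xlab = "", ylab = "Value")
# Mirror the density left and right of x = 1 to form the violin outline
polygon(c(1 - half_width, rev(1 + half_width)), c(d$x, rev(d$x)),
        col = "lightgray")
# A narrow box plot on top supplies the median and hinges
boxplot(x, at = 1, add = TRUE, boxwex = 0.1, col = "white", axes = FALSE)
```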
Beanplot
Adds a strip plot (a short line for each observation) to a violin plot (package beanplot)
    library(beanplot)
    with(DartPoints, beanplot(Length[Name=="Darl"], Length[Name=="Pedernales"], names=c("Darl", "Pedernales")))