1 Introduction to Graphics in R 3/12/2014

2 First, let’s get some data Load the Duncan dataset It’s in the car package. Remember how to get it? – library(car) – data(Duncan)
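A quick way to confirm the data actually loaded is to peek at it. This is a minimal sketch, assuming the car package is installed; the inspection calls are additions, not part of the original slide.

```r
# Load the package and the dataset, as on the slide
library(car)
data(Duncan)

# Inspect what we got
str(Duncan)              # column names and types
head(Duncan)             # first six occupations
summary(Duncan$income)   # five-number summary plus the mean
```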

3 Getting started Okay, now plot income levels: – plot(Duncan$income) What is this graph? Can you make it a line plot instead? – plot(Duncan$income, type="l")

4 Histogram The X axis is useless (it is just the row index). Wouldn’t a histogram be more informative? Make a histogram. If you’re stuck, use Google. – hist(Duncan$income)

5 Fix the title ‘Histogram of Duncan$income’ is not a good title Change it to ‘Income Distribution in Duncan Dataset’ – hist(Duncan$income, main="Income Distribution in Duncan Dataset")

6 Another option There’s another way to set the title. Maybe some of you will have done this (my crystal ball is murky): – hist(Duncan$income) – title("Income Distribution in Duncan Dataset") But wait. That looks awful: the new title is drawn on top of the default one. We need to not print the title as part of the hist() call. How do we do that? – hist(Duncan$income, main="") – title("Income Distribution in Duncan Dataset")
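The two ways of setting the title can be put side by side. This sketch assumes the car package is installed. Capturing the return value of hist() in h is an addition for illustration: hist() invisibly returns the bins it drew.

```r
library(car)
data(Duncan)

# Option 1: set the title inside the hist() call
hist(Duncan$income, main = "Income Distribution in Duncan Dataset")

# Option 2: suppress the default title, then add one with title()
h <- hist(Duncan$income, main = "")
title("Income Distribution in Duncan Dataset")

# The returned object holds the bin counts; they add up to the row count
sum(h$counts) == nrow(Duncan)
```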

7 Scatterplot Okay, let’s look at income vs. prestige. Make a scatterplot comparing income (x-axis) to prestige (y-axis) – plot(Duncan$income, Duncan$prestige) Did you get the x- and y-axes right? Add a title: Income vs. Prestige – title("Income vs. Prestige")

8 Scatterplot: Axis labels The axis labels display the variable names. Can we do better than that? Label the X axis “Income” and the Y axis “Prestige” – plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige")

9 Scatterplot: Axis range How come income doesn’t have ticks at 0 and 100 but prestige does? Make both axes run from 0 to 100 – plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige", xlim=c(0,100))
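One way to answer the question about the ticks: by default, base graphics pads the data range by 4% on each side (the "r" axis style) and then picks round tick values inside that padded range. The helper below, padded_range, is a made-up name for illustration; it only reproduces that arithmetic so the two variables can be compared. Setting ylim alongside xlim is also an addition, for when both ranges should be fixed explicitly rather than left to the defaults.

```r
library(car)
data(Duncan)

# Hypothetical helper: the interval a default ("r" style) axis covers
padded_range <- function(x) {
  r <- range(x)
  r + c(-1, 1) * 0.04 * diff(r)
}

padded_range(Duncan$income)     # does this interval include 0 and 100?
padded_range(Duncan$prestige)   # and this one?

# Fix both limits explicitly (ylim is an addition to the slide's code)
plot(Duncan$income, Duncan$prestige,
     xlab = "Income", ylab = "Prestige",
     xlim = c(0, 100), ylim = c(0, 100))
```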

10 Scatterplot Axis Tick Marks Actually, your collaborator wants tick marks every 5 points on the X axis. DO IT Caveat: this is trickier: – plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige", xlim=c(0,100), xaxt="n") – axis(1, at=seq(0,100, by=5))

11 Axis labels sideways Your collaborator still isn’t happy. Turn the x labels sideways. – plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige", xlim=c(0,100), xaxt="n") – axis(1, las=2, at=seq(0,100, by=5))
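For reference, the pieces of the axis() call: the first argument picks the side (1 = bottom, 2 = left, 3 = top, 4 = right), at gives the tick positions, and las sets the label orientation (2 means perpendicular to the axis). A sketch putting the last few slides together; storing the positions in a variable named ticks is just for readability.

```r
library(car)
data(Duncan)

# Tick positions: every 5 points from 0 to 100
ticks <- seq(0, 100, by = 5)

# Draw the plot without an x axis (xaxt = "n"), then add a custom one
plot(Duncan$income, Duncan$prestige,
     xlab = "Income", ylab = "Prestige",
     xlim = c(0, 100), xaxt = "n")
axis(1, at = ticks, las = 2)   # side 1 = bottom, labels turned sideways
```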

12 More columns Now your collaborator wants to see how education affects this relationship. Create a dichotomous variable named ‘high_education’ categorizing education > 50 as TRUE and <= 50 as FALSE – Duncan$high_education <- Duncan$education > 50

13 High education: sanity check How many high and low education jobs are there? – table(Duncan$high_education) Plot education (y-axis) by high_education (x-axis) – plot(Duncan$high_education, Duncan$education) Does it look right?
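A numeric cross-check can go along with the plot. The sketch below builds the variable as the comparison education > 50 and then looks at the range of education inside each group; the tapply() call is an addition, not something from the slides.

```r
library(car)
data(Duncan)

# TRUE when education is above 50, FALSE otherwise
Duncan$high_education <- Duncan$education > 50

# How many jobs fall in each group
table(Duncan$high_education)

# Range of education within each group:
# the FALSE group should top out at 50 or below
tapply(Duncan$education, Duncan$high_education, range)
```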

14 Adding color Okay, now color your income/prestige graph so high-education jobs are blue and low-education jobs are red. This is a little tricky – colors <- as.numeric(Duncan$high_education)+1 – plot(Duncan$income, Duncan$prestige, col=c("red", "blue")[colors], xlab="Income", ylab="Prestige", xlim=c(0,100), xaxt="n") – axis(1, at=seq(0,100, by=5))
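Why the trick works: as.numeric() turns FALSE into 0 and TRUE into 1, so adding 1 gives 1 or 2, which then indexes into c("red", "blue"). The sketch below spells that out and compares it with an ifelse() version. The names point_col and point_col2, and the legend, are additions for illustration.

```r
library(car)
data(Duncan)
Duncan$high_education <- Duncan$education > 50

# FALSE -> 0 -> 1 ("red"), TRUE -> 1 -> 2 ("blue")
colors <- as.numeric(Duncan$high_education) + 1
point_col <- c("red", "blue")[colors]

# Same mapping, spelled out
point_col2 <- ifelse(Duncan$high_education, "blue", "red")
identical(point_col, point_col2)

plot(Duncan$income, Duncan$prestige, col = point_col,
     xlab = "Income", ylab = "Prestige", xlim = c(0, 100), xaxt = "n")
axis(1, at = seq(0, 100, by = 5))

# A legend so the reader does not have to guess the coding
legend("topleft", legend = c("education <= 50", "education > 50"),
       col = c("red", "blue"), pch = 1)
```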

15 Box plot Okay, now run this code: – plot(Duncan$type, Duncan$income) What happened? Why didn't we get a scatterplot? Can you get one? – plot(as.numeric(Duncan$type), Duncan$income)
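What happened is that plot() looks at the class of its first argument: type is a factor, so you get one box per level instead of points. A sketch of both versions follows; the explicit boxplot() call and the relabelled axis are additions, and codes is just an illustrative name.

```r
library(car)
data(Duncan)

class(Duncan$type)     # a factor, so plot() draws box plots
levels(Duncan$type)

# Roughly the same picture, requested explicitly
boxplot(income ~ type, data = Duncan, ylab = "Income")

# Scatterplot on the numeric codes, with the x axis relabelled by level
codes <- as.numeric(Duncan$type)
plot(codes, Duncan$income, xaxt = "n", xlab = "Type", ylab = "Income")
axis(1, at = seq_along(levels(Duncan$type)), labels = levels(Duncan$type))
```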

16 More than one plot at a time Now your collaborator wants your scatterplot and histogram side-by-side. (Don’t worry about color if you don't want to) – opar <- par(no.readonly=TRUE) – par(mfrow=c(1,2)) – hist(Duncan$income, main="Income Distribution in Duncan Dataset") – plot(Duncan$income, Duncan$prestige, xlab="Income", ylab="Prestige", xlim=c(0,100), xaxt="n") – axis(1, at=seq(0,100, by=5)) – par(opar)
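If you end up doing this often, one option is to wrap it in a function and let on.exit() restore the graphics settings even if something fails halfway. This is only a sketch: side_by_side is a hypothetical name, and saving the settings with par(no.readonly = TRUE) is a choice made here to avoid the warnings R gives when you try to restore read-only parameters.

```r
library(car)
data(Duncan)

# Hypothetical helper: histogram and scatterplot in one row
side_by_side <- function(d) {
  opar <- par(no.readonly = TRUE)   # save only the settable parameters
  on.exit(par(opar))                # restore them when the function exits
  par(mfrow = c(1, 2))              # 1 row, 2 columns of plots
  hist(d$income, main = "Income Distribution in Duncan Dataset")
  plot(d$income, d$prestige, xlab = "Income", ylab = "Prestige",
       xlim = c(0, 100), xaxt = "n")
  axis(1, at = seq(0, 100, by = 5))
  invisible(NULL)
}

side_by_side(Duncan)
par("mfrow")   # back to a single panel after the call
```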

17 ggplot ggplot is a whole different beast from base graphics ggplot is like R itself – some work to get oriented, but powerful once you do You don't have to know ggplot to be successful using R – But you do have to experiment with it for this class

18 Load the ggplot library Hint: the package name, confusingly, is ggplot2

19 Plot income vs. prestige It will be easiest to start using qplot. qplot mimics plot(), but uses the ggplot layout engine. – qplot(Duncan$income, Duncan$prestige)

20 ggplot qplot is the training wheels version of ggplot ggplot's syntax takes some getting used to. Try this: – ggplot(Duncan) + aes(x=income, y=prestige) + geom_point() Huh? What are the pluses about?

21 ggplot syntax ggplot objects are weird. You execute them (like a command) to draw their plot. But you construct them by adding options to them. Options specify data source, data columns, etc., resulting in code like this: – p <- ggplot(Duncan) – p <- p + aes(x=income, y=prestige) – p + geom_point()
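To make the "build it, then execute it" idea concrete, here is a sketch that stores every step in p and only draws at the end. It assumes both car and ggplot2 are installed. The explicit print() is an addition: at the console, typing p is enough, but inside a function, loop, or sourced script nothing is drawn unless you print the object.

```r
library(car)
library(ggplot2)
data(Duncan)

p <- ggplot(Duncan)                       # data source only; nothing drawn yet
p <- p + aes(x = income, y = prestige)    # which columns go on which axis
p <- p + geom_point()                     # how to draw them

class(p)    # p is an ordinary R object until it is printed
print(p)    # this is the step that actually draws the plot
```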

22 Where ggplot shines In my opinion, it's harder to think about doing simple plots in ggplot. But when I want to do something multi-faceted (e.g. with different colors, sizes, etc.), ggplot makes it really easy. I use it a lot to understand 3+-way relationships in data
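As a small illustration with the data already in hand (not the example from the slides), here is a sketch that shows four Duncan variables at once by mapping type to color and education to point size. The choice of mappings and the label text are assumptions made for this example.

```r
library(car)
library(ggplot2)
data(Duncan)

# Income vs. prestige, colored by occupation type, sized by education
p4 <- ggplot(Duncan) +
  aes(x = income, y = prestige, color = type, size = education) +
  geom_point() +
  labs(x = "Income", y = "Prestige",
       color = "Occupation type", size = "Education")

print(p4)
```

Doing the same in base graphics would mean computing the colors and sizes by hand and building the legend yourself, which is the kind of bookkeeping ggplot takes over.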

23 ggplot example (one of many)

24 ggplot code for that example ggplot(data=nycnames) + aes(x=as.factor(race), y=n1_013002p, color=as.factor(nbhdarkwalk)) + geom_point(position="jitter") + scale_x_discrete(breaks=1:7, limits=1:7, name="Subject Race", labels=c('Asian', 'Black', 'First\nPeoples', 'Pacific\nIslander', 'Non-Hispanic\nWhite', 'Other', 'Hispanic')) + scale_color_discrete(breaks=1:4, limits=1:4, name="Neighborhood Safe After Dark", labels=c('Strongly Agree', 'Somewhat Agree', 'Somewhat disagree', 'Strongly Disagree')) + scale_y_continuous(name="Neighborhood percent white (1km buffer)")

25 Exercises

