Download presentation
Presentation is loading. Please wait.
Published byThomasine Owen Modified over 9 years ago
1
7/3/2015330 Lecture 21 STATS 330: Lecture 2
2
7/3/2015330 Lecture 22 Housekeeping matters STATS 762 –An extra test (details to be provided). An extra assignment Class rep – ???? Office hours –Alan: 10:30 – 12:00 Tuesday and Thursday in Rm 265, Building 303S –Tutors: TBA Assignment 1: Due 2 August
3
7/3/2015330 Lecture 23 Today’s lecture: Exploratory graphics Aim of the lecture: To give you a quick overview of the kinds of graphs that can be helpful in exploring data. Some of this material has been covered in 201/8. (We will discuss the R code used to make the plots in Lecture 4)
4
7/3/2015330 Lecture 24 Exploratory Graphics: topics Exploratory Graphics f or a single variable: aim is to show distribution of values –Histograms –Kernel density estimators –Qq plots For 2 variables: aim is to show relationships –Both continuous: scatter plots –One of each: side by side boxplots –Both categorical: mosaic plots – see Ch 5 3 variables –Pairs plot –Rotating plot –coplots –3D plots, contour plots
5
7/3/2015330 Lecture 25 Single variable: Exchange rate data The data of interest: daily changes in log(exchange rate) for the US$/Kiwi. –Monthly date from June 1986 to may 2012 –Source: Reserve Bank Questions: –What is the distribution of the daily changes in the logged exchange rate? –Is it normal? If not, how is it different?
6
7/3/2015330 Lecture 26 y t = exchange rate at time t Difference in logs is log(y t ) – log(y t-1 ) = log(y t /y t-1 )
7
7/3/2015330 Lecture 27 Data Analysis Suppose we have the data (3374 differences in the logs), in an R vector, Diff.in.Logs hist(Diff.in.Logs,nclass=100,freq=FALSE) # add density estimate lines(density(Diff.in.Logs),col="blue", lwd=2) # add fitted normal density xvec = seq(-0.1, 0.1, length=100) lines(xvec,dnorm(xvec,mean=mean(Diff.in.Logs), sd=sd(Diff.in.Logs)),col="red", lwd=2)
8
7/3/2015330 Lecture 28
9
7/3/2015330 Lecture 29
10
7/3/2015330 Lecture 210 Normal plot > qqnorm(Diff.in.Logs) Normal data? No – QQ plot indicates that the differences have longer tails than normal, since the plotted points are lower than the line for small values and higher for big ones
11
7/3/2015330 Lecture 211 Two Variables: Rats! Of interest: growth rates of 16 rats i.e. relationship between weight and time Want to explore the relationship graphically. Each rat was measured (roughly) every week for 11 weeks For weeks 1-6, all rats were on a fixed diet. Diet was changed after week 6.
12
7/3/2015330 Lecture 212 Two Variables: Rats! Data set rats.df has variables –rat (1-16) –growth (weight in grams) –day (day since start of study, 11 values, at approximately weekly intervals –group (litter, one of 3) –change (has values 1 or 2 - diet was changed after 6 weeks, diet 1 for weeks 1-6, diet 2 for weeks 7-11
13
7/3/2015330 Lecture 213 Rats: the data > rats.df growth group rat change day 1 240 1 1 1 1 2 250 1 1 1 8 3 255 1 1 1 15 4 260 1 1 1 22 5 262 1 1 1 29 6 258 1 1 1 36 7 266 1 1 2 43 8 266 1 1 2 44 9 265 1 1 2 50 10 272 1 1 2 57 11 278 1 1 2 64 12 225 1 2 1 1 12 230 1 2 1 8... More data
14
7/3/2015330 Lecture 214 Rats (cont) Could plot weight (i.e. the variable growth ) versus the variable day : plot(day,growth) BUT….
15
7/3/2015330 Lecture 215
16
7/3/2015330 Lecture 216 Criticisms Can’t tell which points belong to which rat Seem to be 2 groups of points In actual fact, the rats came from 3 different litters, is this relevant? Could do better
17
7/3/2015330 Lecture 217 More rats: improvements Join points representing the same rat with a line Use different colours (or different line types e.g. dashed or dotted) for the different litters Use a legend
18
7/3/2015330 Lecture 218
19
7/3/2015330 Lecture 219 More improvements Plot is too cluttered Could plot each rat on a different graph – important to use same scales (axes) for each graph This leads to the idea of “Trellis graphics”
20
7/3/2015330 Lecture 220
21
7/3/2015330 Lecture 221
22
7/3/2015330 Lecture 222 Two variables: one continuous, one categorical Insurance data: data on 14,000 insurance claims. Want to explore relationship between the amount of the claim (a continuous variable) and the type of car (a categorical variable). Use side-by side boxplots.
23
7/3/2015330 Lecture 223 Car Group Loess smooth
24
7/3/2015330 Lecture 224 More than 2 variables: If all variables are continuous, we can explore the relationships between them using a pairs plot If we have 3 variables, a rotating plot is a very useful tool
25
7/3/2015330 Lecture 225 Example: Cherry trees > cherry.df diameter height volume 1 8.3 70 10.3 2 8.6 65 10.3 3 8.8 63 10.2 4 10.5 72 16.4 5 10.7 81 18.8 6 10.8 83 19.7 7 11.0 66 15.6 8 11.0 75 18.2 9 11.1 80 22.6 10 11.2 75 19.9... more data – 31 trees in all
26
7/3/2015330 Lecture 226 Cherry trees: pairs plots > pairs(cherry.df)
27
3-d Rotating plots The challenge: to represent a 3- dimensional object on a 2-dimensional surface (a TV screen, computer screen etc) Traditional method uses projection, perspective A powerful idea is to use motion, looking at the 3-d scene from different angles 7/3/2015330 Lecture 227
28
7/3/2015330 Lecture 228 Perspective
29
7/3/2015330 Lecture 229 Diameter - volume view Diameter - height view Volume - height view Arbitrary view Projection
30
7/3/2015330 Lecture 230 Cherry trees: rotating plot
31
7/3/2015330 Lecture 231 Dynamic motion By dynamically changing the angle of view, we get a better impression of the 3- dimensional structure of the data “Dynamic graphics” is a very powerful tool
32
7/3/2015330 Lecture 232 A powerful idea: Coplots Coplot shows relationship between x and y for selected values of z (usually a narrow range of z’s) By showing separate plots for different z ranges, we can see how the relationship between x and y changes as z changes Coplot: conditioning plot, shows relationship between x and y conditional on z (ie for fixed z)
33
7/3/2015330 Lecture 233 Cherry trees: coplots To show the relationship between height and volume for different values of diameter: Divide the range of diameter (8.3 to 20.6) up into 6 subranges 8-11, 10.5 -11.5 etc Draw 6 plots, the first using all data whose diameter is between 8 and 11, the second using all data whose diameter is between 10.5 and 11.5, and so on
34
7/3/2015330 Lecture 234
35
7/3/2015330 Lecture 235 Interpretation Note that the lines are not of the same slope This implies that the point configuration is not “planar”
36
Other 3-d graphs 7/3/2015330 Lecture 236 plot of surface3-d scatter plot Both can be rotated
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.