Presentation is loading. Please wait.

Presentation is loading. Please wait.

7/3/2015330 Lecture 21 STATS 330: Lecture 2. 7/3/2015330 Lecture 22 Housekeeping matters STATS 762 –An extra test (details to be provided). An extra assignment.

Similar presentations


Presentation on theme: "7/3/2015330 Lecture 21 STATS 330: Lecture 2. 7/3/2015330 Lecture 22 Housekeeping matters STATS 762 –An extra test (details to be provided). An extra assignment."— Presentation transcript:

1 7/3/2015330 Lecture 21 STATS 330: Lecture 2

2 7/3/2015330 Lecture 22 Housekeeping matters STATS 762 –An extra test (details to be provided). An extra assignment Class rep – ???? Office hours –Alan: 10:30 – 12:00 Tuesday and Thursday in Rm 265, Building 303S –Tutors: TBA Assignment 1: Due 2 August

3 7/3/2015330 Lecture 23 Today’s lecture: Exploratory graphics Aim of the lecture: To give you a quick overview of the kinds of graphs that can be helpful in exploring data. Some of this material has been covered in 201/8. (We will discuss the R code used to make the plots in Lecture 4)

4 7/3/2015330 Lecture 24 Exploratory Graphics: topics Exploratory Graphics f or a single variable: aim is to show distribution of values –Histograms –Kernel density estimators –Qq plots For 2 variables: aim is to show relationships –Both continuous: scatter plots –One of each: side by side boxplots –Both categorical: mosaic plots – see Ch 5 3 variables –Pairs plot –Rotating plot –coplots –3D plots, contour plots

5 7/3/2015330 Lecture 25 Single variable: Exchange rate data The data of interest: daily changes in log(exchange rate) for the US$/Kiwi. –Monthly date from June 1986 to may 2012 –Source: Reserve Bank Questions: –What is the distribution of the daily changes in the logged exchange rate? –Is it normal? If not, how is it different?

6 7/3/2015330 Lecture 26 y t = exchange rate at time t Difference in logs is log(y t ) – log(y t-1 ) = log(y t /y t-1 )

7 7/3/2015330 Lecture 27 Data Analysis Suppose we have the data (3374 differences in the logs), in an R vector, Diff.in.Logs hist(Diff.in.Logs,nclass=100,freq=FALSE) # add density estimate lines(density(Diff.in.Logs),col="blue", lwd=2) # add fitted normal density xvec = seq(-0.1, 0.1, length=100) lines(xvec,dnorm(xvec,mean=mean(Diff.in.Logs), sd=sd(Diff.in.Logs)),col="red", lwd=2)

8 7/3/2015330 Lecture 28

9 7/3/2015330 Lecture 29

10 7/3/2015330 Lecture 210 Normal plot > qqnorm(Diff.in.Logs) Normal data? No – QQ plot indicates that the differences have longer tails than normal, since the plotted points are lower than the line for small values and higher for big ones

11 7/3/2015330 Lecture 211 Two Variables: Rats! Of interest: growth rates of 16 rats i.e. relationship between weight and time Want to explore the relationship graphically. Each rat was measured (roughly) every week for 11 weeks For weeks 1-6, all rats were on a fixed diet. Diet was changed after week 6.

12 7/3/2015330 Lecture 212 Two Variables: Rats! Data set rats.df has variables –rat (1-16) –growth (weight in grams) –day (day since start of study, 11 values, at approximately weekly intervals –group (litter, one of 3) –change (has values 1 or 2 - diet was changed after 6 weeks, diet 1 for weeks 1-6, diet 2 for weeks 7-11

13 7/3/2015330 Lecture 213 Rats: the data > rats.df growth group rat change day 1 240 1 1 1 1 2 250 1 1 1 8 3 255 1 1 1 15 4 260 1 1 1 22 5 262 1 1 1 29 6 258 1 1 1 36 7 266 1 1 2 43 8 266 1 1 2 44 9 265 1 1 2 50 10 272 1 1 2 57 11 278 1 1 2 64 12 225 1 2 1 1 12 230 1 2 1 8... More data

14 7/3/2015330 Lecture 214 Rats (cont) Could plot weight (i.e. the variable growth ) versus the variable day : plot(day,growth) BUT….

15 7/3/2015330 Lecture 215

16 7/3/2015330 Lecture 216 Criticisms Can’t tell which points belong to which rat Seem to be 2 groups of points In actual fact, the rats came from 3 different litters, is this relevant? Could do better

17 7/3/2015330 Lecture 217 More rats: improvements Join points representing the same rat with a line Use different colours (or different line types e.g. dashed or dotted) for the different litters Use a legend

18 7/3/2015330 Lecture 218

19 7/3/2015330 Lecture 219 More improvements Plot is too cluttered Could plot each rat on a different graph – important to use same scales (axes) for each graph This leads to the idea of “Trellis graphics”

20 7/3/2015330 Lecture 220

21 7/3/2015330 Lecture 221

22 7/3/2015330 Lecture 222 Two variables: one continuous, one categorical Insurance data: data on 14,000 insurance claims. Want to explore relationship between the amount of the claim (a continuous variable) and the type of car (a categorical variable). Use side-by side boxplots.

23 7/3/2015330 Lecture 223 Car Group Loess smooth

24 7/3/2015330 Lecture 224 More than 2 variables: If all variables are continuous, we can explore the relationships between them using a pairs plot If we have 3 variables, a rotating plot is a very useful tool

25 7/3/2015330 Lecture 225 Example: Cherry trees > cherry.df diameter height volume 1 8.3 70 10.3 2 8.6 65 10.3 3 8.8 63 10.2 4 10.5 72 16.4 5 10.7 81 18.8 6 10.8 83 19.7 7 11.0 66 15.6 8 11.0 75 18.2 9 11.1 80 22.6 10 11.2 75 19.9... more data – 31 trees in all

26 7/3/2015330 Lecture 226 Cherry trees: pairs plots > pairs(cherry.df)

27 3-d Rotating plots The challenge: to represent a 3- dimensional object on a 2-dimensional surface (a TV screen, computer screen etc) Traditional method uses projection, perspective A powerful idea is to use motion, looking at the 3-d scene from different angles 7/3/2015330 Lecture 227

28 7/3/2015330 Lecture 228 Perspective

29 7/3/2015330 Lecture 229 Diameter - volume view Diameter - height view Volume - height view Arbitrary view Projection

30 7/3/2015330 Lecture 230 Cherry trees: rotating plot

31 7/3/2015330 Lecture 231 Dynamic motion By dynamically changing the angle of view, we get a better impression of the 3- dimensional structure of the data “Dynamic graphics” is a very powerful tool

32 7/3/2015330 Lecture 232 A powerful idea: Coplots Coplot shows relationship between x and y for selected values of z (usually a narrow range of z’s) By showing separate plots for different z ranges, we can see how the relationship between x and y changes as z changes Coplot: conditioning plot, shows relationship between x and y conditional on z (ie for fixed z)

33 7/3/2015330 Lecture 233 Cherry trees: coplots To show the relationship between height and volume for different values of diameter: Divide the range of diameter (8.3 to 20.6) up into 6 subranges 8-11, 10.5 -11.5 etc Draw 6 plots, the first using all data whose diameter is between 8 and 11, the second using all data whose diameter is between 10.5 and 11.5, and so on

34 7/3/2015330 Lecture 234

35 7/3/2015330 Lecture 235 Interpretation Note that the lines are not of the same slope This implies that the point configuration is not “planar”

36 Other 3-d graphs 7/3/2015330 Lecture 236 plot of surface3-d scatter plot Both can be rotated


Download ppt "7/3/2015330 Lecture 21 STATS 330: Lecture 2. 7/3/2015330 Lecture 22 Housekeeping matters STATS 762 –An extra test (details to be provided). An extra assignment."

Similar presentations


Ads by Google