Introductory Data Analysis F73DA2
Contact Times (Spring Term 2008)
Monday 4: : Lecture in LT3
Tuesday 2: : Lecture in LT3
Wednesday : Lecture in LT3
Group 1 Tuesday : Practical in SRG12/13
Group 2 Tuesday : Practical in SRG12/13
The web pages for this module are linked from John Phillips' home page.
John Phillips, Office: CM S06
Aims
This module aims to develop students' abilities in understanding and solving practical statistical problems, and to teach them how to choose appropriate techniques, analyse data and present results.
The module will consist of a mixture of lectures and practical work. Lectures will focus on statistical modelling, including the selection of appropriate models, the analysis and interpretation of results, and diagnostics. Exploratory and graphical techniques will be considered, as well as more formal statistical procedures.
Both parametric procedures (e.g. linear and generalized linear models) and nonparametric methods will be discussed, as will modern robust techniques. There will be considerable emphasis on examples, applications, and case studies, especially for continuous response variables. Computing facilities, especially R, will be used extensively.
Assessments
The module is assessed through two practical assignments, each to be handed in by a specified time during the term.
Installing R on PC Caledonia
Simply double-click on the “Installer”, then select the “R” icon. This creates a shortcut to R, which should be available every time you log on.
Installing R on your own PC
Download R free of charge from the Comprehensive R Archive Network (CRAN).
R screen
Type commands at the prompt; what you type appears in red.
The arrow keys on the keyboard are very useful. Pressing the up arrow repeatedly retrieves the commands you entered previously.
Many keys and function names are very much as you would expect.
> 6+4
[1] 10
> 18*3
[1] 54
> log(100)
[1] 4.60517
> pi
[1] 3.141593
> sin(pi)
[1] 1.224647e-16
> cos(pi)
[1] -1
> x=7
> y=10
> x+y
[1] 17
> sqrt(x*x+7*x*y-2*y*y)
[1] 18.41195
Note that log is the natural logarithm, and that sin(pi) is not exactly zero because of rounding error; its last few digits can differ between computers.
Example: A survey recorded the salaries of 200 individuals:
Graphical Representation
Histogram
Stem-and-Leaf
Boxplot
Frequency Polygon
> hist(salaries)
> hist(salaries, nclass=5)
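The effect of asking for a given number of classes can be inspected without drawing anything. The sketch below is only illustrative: `salaries_sim` is a hypothetical, simulated stand-in for the survey data (which is not reproduced here), and it uses `breaks`, the documented argument for which `nclass` is an older spelling.

```r
# Hypothetical data: 200 simulated salaries, NOT the survey data from the example
set.seed(1)
salaries_sim <- round(rnorm(200, mean = 20000, sd = 2500))

# plot = FALSE returns the classes instead of drawing the histogram
h <- hist(salaries_sim, breaks = 5, plot = FALSE)

h$breaks   # class boundaries chosen by R (the 5 is only a suggestion)
h$counts   # number of salaries falling in each class
```

Dropping `plot = FALSE` draws the histogram itself. R treats a single number of classes as a hint and picks round boundaries, so the number of bars may not be exactly 5.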
> stem(salaries)

  The decimal point is 3 digit(s) to the right of the |

[Stem-and-leaf display of the salaries, with stems starting at 14]
> boxplot(salaries)
Summary Statistics
> mean(salaries)
> median(salaries)
[1] 20020
> sd(salaries)
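These functions are easy to check by hand on a small vector. The values in `v` below are made up for illustration and have nothing to do with the salary data.

```r
# Hypothetical small data set, chosen so the answers can be checked by hand
v <- c(2, 4, 4, 4, 5, 5, 7, 9)

mean(v)      # arithmetic mean
median(v)    # middle value (average of the two middle values here)
sd(v)        # sample standard deviation, using the n - 1 divisor

# The same standard deviation computed from its definition
sd_by_hand <- sqrt(sum((v - mean(v))^2) / (length(v) - 1))
sd_by_hand

summary(v)   # minimum, quartiles, median, mean and maximum in one call
```

Note that `sd` divides by n - 1 rather than n, which is why the by-hand version uses `length(v) - 1`.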
Scatter Diagrams
[Table of paired x and y values]
> plot(x,y)
> plot(y~x)
> abline(lm(y~x))
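The line added by `abline(lm(y~x))` is the least-squares regression line of y on x. As a rough sketch, with made-up `x` and `y` values (the original data table is not reproduced here), the fitted intercept and slope can be inspected directly:

```r
# Hypothetical paired data, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit <- lm(y ~ x)   # least-squares fit of y on x
coef(fit)          # intercept and slope of the fitted line

# The slope from the usual formula Sxy / Sxx, for comparison
slope_by_hand <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
slope_by_hand

# plot(y ~ x) followed by abline(fit) would draw the points and this line
```

Storing the fit in a variable (here the assumed name `fit`) avoids refitting the model when the same line is needed for both the plot and the coefficients.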
Pie Chart Example
> television=scan( )
1: [26 values typed in here]
Read 26 items
> barplot(table(television))
> television.counts=table(television)
> names(television.counts)=c("BBC1","BBC2","ITV1","CH4","Other")
> pie(television.counts,col=c("purple","green2","cyan","yellow","white"))
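The step from coded answers to labelled counts can be sketched on a smaller example. The codes in `tv_codes` are hypothetical (ten made-up answers coded 1 to 5), not the 26 survey values.

```r
# Hypothetical answers: channel codes 1..5 for ten viewers
tv_codes <- c(1, 3, 1, 2, 5, 3, 1, 4, 3, 1)

tv_counts <- table(tv_codes)   # counts, in increasing order of code
tv_counts

# Labels must be given in the same (sorted) order as the codes
names(tv_counts) <- c("BBC1", "BBC2", "ITV1", "CH4", "Other")
tv_counts

# Percentages, which is what the slices of a pie chart represent
round(100 * prop.table(tv_counts), 1)
```

Because `table` sorts the codes, this relabelling is only safe when every code occurs at least once; a missing code would shift the labels.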
Binomial Distribution
Calculating a series of binomial probabilities by hand takes a long time.
If n = 5, a = 0.2 and x runs from 0 to 5:

$$p(0) = \frac{5!}{0!\,5!}\,(0.2)^0\,(0.8)^5 = 0.32768$$

$$p(1) = \frac{5!}{1!\,4!}\,(0.2)^1\,(0.8)^4 = 0.4096$$

$$p(2) = \frac{5!}{2!\,3!}\,(0.2)^2\,(0.8)^3 = 0.2048$$

…and so on.
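The hand calculation can be written out in R for all values of x at once. This is only a sketch of the formula p(x) = n!/(x!(n-x)!) a^x (1-a)^(n-x); the variable names are chosen here for illustration.

```r
n <- 5       # number of trials
a <- 0.2     # probability of success on each trial
x <- 0:n     # all possible numbers of successes

# Binomial probabilities straight from the formula
p_by_hand <- factorial(n) / (factorial(x) * factorial(n - x)) * a^x * (1 - a)^(n - x)
p_by_hand

# choose(n, x) gives the same binomial coefficient more compactly
p_choose <- choose(n, x) * a^x * (1 - a)^(n - x)
all.equal(p_by_hand, p_choose)
```

R's built-in `dbinom` returns the same probabilities without any of this arithmetic.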
Using R
> dbinom(0:5,5,0.2)
[1] 0.32768 0.40960 0.20480 0.05120 0.00640 0.00032
> pf=dbinom(0:5,5,0.2)
> pf
[1] 0.32768 0.40960 0.20480 0.05120 0.00640 0.00032
> barplot(pf)
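Once the probabilities are stored in `pf`, a few quick checks are possible. The sketch below assumes the same n = 5 and a = 0.2 as above; the names `x`, `mean_x` and `cdf` are introduced here for illustration.

```r
x  <- 0:5
pf <- dbinom(x, size = 5, prob = 0.2)

sum(pf)                   # the probabilities add up to 1

mean_x <- sum(x * pf)     # mean of the distribution, equal to n * a
mean_x

cdf <- cumsum(pf)         # cumulative probabilities P(X <= x)
cdf
pbinom(x, size = 5, prob = 0.2)   # the same values from R's built-in function

names(pf) <- x            # label each probability with its x value
pf
```

With the names set, `barplot(pf)` labels each bar with the corresponding value of x.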
R Packages
R is built from packages of datasets and functions. The base and stats packages, among others, are loaded by default and contain everything necessary for basic statistical analysis (stats has absorbed the older ctest package). Other packages may be loaded on demand, either via the Packages menu or via the R function library.
Once a package is loaded, the functions within it are automatically available. To make available a dataset from within a package, use the function data. Of particular interest to advanced statistical users is the package MASS, which contains the functions and datasets from the book Modern Applied Statistics with S by W N Venables and B D Ripley. This package can be loaded with
> library(MASS)
To make available the dataset chem from within MASS, use additionally
> data(chem)
Documentation on any package is available via the R help system. Missing or further packages may usually be obtained from CRAN.
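Put together, the two steps look like this. The sketch assumes MASS is installed, which is normally the case because it is distributed with R.

```r
library(MASS)    # attach the package (assumes MASS is installed)
data(chem)       # make the chem dataset available

length(chem)     # number of observations
summary(chem)    # five-number summary and mean
```

`help(chem)` then shows the documentation for the dataset.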
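Some data sets are already in R when you open it.
> data(iris)
> iris
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa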
Notice, though, that in the R session shown here, if you haven't used the data command, R does not know that iris exists. (Recent versions of R make the built-in datasets available without calling data.)

Type `demo()' for some demos, `help()' for on-line help, or
`help.start()' for a HTML browser interface to help.
Type `q()' to quit R.

[Previously saved workspace restored]

> iris
Error: Object "iris" not found
Similarly, if you ask for a dataset from a package without first loading that package with the library command, R will not know that the dataset exists.

[Previously saved workspace restored]

> data(chem)
Warning message:
Data set `chem' not found in: data(chem)
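When this warning appears, it can help to check which packages are currently attached. The sketch below also shows one alternative, the `package` argument of `data`, which fetches a dataset without attaching the whole package; it assumes MASS is installed.

```r
search()                        # packages and environments currently attached
"package:MASS" %in% search()    # TRUE only after library(MASS) has been called

# Load chem straight from MASS without attaching the package
data(chem, package = "MASS")
exists("chem")                  # the dataset is now available
```

Calling `library(MASS)` first, as described above, remains the simplest route when several functions or datasets from the package are needed.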