Published by Sidney Bail. Modified over 9 years ago.
1
R: A Statistics Program for Teaching & Research
Josué Guzmán
11 Nov. 2007
JGuzmanPhD@Gmail.Com
2
© J. Guzmán, 2007. R: Stat. Prog. for Teach. & Res.
Some Useful R Links
R Home Page: www.r-project.org
CRAN: http://cran.r-project.org
Precompiled Binary Distributions: Windows (95 and later)
R Manuals
3
R Installation
R: Statistical Analysis & Graphics
Freely Available Under GPL
Binary Distributions
Installation – Standard Steps
6
Running R
7
Statistical Programming with R
Learn Language Basics
Learn Documentation / Help System
Learn Data Manipulation & Graphics
Perform Basic Statistical Analysis
8
First Steps: Interacting with R
Type a Command & Press Enter
R Executes (printing the result if relevant)
R waits for more input
9
Some Examples
    2 * 2
    [1] 4
    exp(-2)
    [1] 0.1353353
    rdmnorm = rnorm(1000)
10
R Functions
exp, log and rnorm are functions
Function calls are indicated by the presence of parentheses
Example:
    hist(rdmnorm, col = "magenta")
11
Variables and Assignments
The = operator assigns a value to a name; the <- operator also works
    x = 2.2
    y = x + 3.5
    sqrt(x)
    y
    x ^ y
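A minimal sketch of the two assignment operators, reusing the values from the slide. The name z is introduced here only for illustration and is not part of the original deck.

```r
# Both operators bind a value to a name at the top level.
x = 2.2
z <- 2.2          # z is a hypothetical name, not from the slides
y = x + 3.5

identical(x, z)   # the two forms of assignment store the same value
x ^ y             # exponentiation, as on the slide
```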
12
Variables and Assignments
Variable names cannot start with a digit
Names are case-sensitive
Some common names are already used by R and should be avoided
Examples: c, q, t, C, D, F, I, T
13
Vectorized Arithmetic
Elementary data types in R are all vectors
The c(...) construct is used to create vectors
Example: Bolstad, 2004, exercise 13.2, page 253
    fertilizer = c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5)
    fertilizer
14
Vectorized Arithmetic [cont.]
Arithmetic operations (+, -, *, /, ^) and mathematical functions (sin, cos, log, …) work element-wise on vectors
    yield = c(25, 31, 27, 28, 36, 35, 32, 34)
    log(yield)
15
Vectorized Arithmetic [cont.]
    sum.yield = sum(yield)
    sum.yield
    n = length(yield)
    n
    avg.yield = sum.yield/n
    avg.yield
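As a quick check of the manual average above, the following sketch compares it with the built-in mean function and shows one more element-wise operation. The name dev is an assumption added for illustration.

```r
yield = c(25, 31, 27, 28, 36, 35, 32, 34)

# Manual average, as on the slide
avg.yield = sum(yield) / length(yield)

# The built-in mean() should agree with the manual computation
all.equal(avg.yield, mean(yield))

# Element-wise subtraction: the scalar is recycled across the vector
dev = yield - avg.yield   # dev is a hypothetical name
sum(dev)                  # deviations from the mean sum to zero (up to rounding)
```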
16
Graphics
The plot(x, y) function is a simple way to produce R graphics:
    plot(fertilizer, log(yield), main = "Fertilizer vs. Yield")
17
Getting Help
help.start( )
    Starts a browser window with an HTML help interface. Links to the manual An Introduction to R, as well as topic-wise listings.
help(topic)
    Help page for a particular topic or function. Every R function has a help page.
help.search("search string")
    Subject/keyword search
18
Getting Help [cont.]
Short-cut: question mark (?)
    help(plot)
    ? plot
To find help on a specific subject, use the help.search function. Example:
    help.search("logarithm")
19
apropos( )
The apropos function lists the object names that partially match its argument:
    apropos("plot")[1:10]
     [1] ".__C__recordedplot" "biplot"
     [3] "interaction.plot"   "lag.plot"
     [5] "monthplot"          "plot.TukeyHSD"
     [7] "plot.density"       "plot.ecdf"
     [9] "plot.lm"            "plot.mlm"
20
R Packages
R makes use of a system of packages
Each package is a collection of routines with a common theme
The core of R itself is a package called base
A collection of packages is called a library
Some packages are already loaded when R starts up
Other packages need to be loaded using the library function
21
R Packages [cont.]
Several packages come pre-installed with R:
    installed.packages( )[, 1]
     [1] "ISwR"     "KernSmooth" "MASS"    "base"
     [5] "boot"     "class"      "cluster" "foreign"
     [9] "graphics" "grid"       "lattice" "methods"
    [13] "mgcv"     "nlme"       "nnet"    "rpart"
    [17] "spatial"  "splines"    "stats"   "stats4"
    [21] "survival" "tcltk"      "tools"   "utils"
22
Contributed Packages
Many packages are available from CRAN
Some packages are already loaded when R starts up
To list the currently loaded packages, use search:
    search( )
    [1] ".GlobalEnv"    "package:tools"    "package:methods"
    [4] "package:stats" "package:graphics" "package:utils"
    [7] "Autoloads"     "package:base"
23
R Packages
Packages can be loaded by the user. Example: the UsingR package
    library(UsingR)
New packages are downloaded using the install.packages function:
    install.packages("UsingR")
    library(help = UsingR)
24
Data Types
vector – Set of elements in a specified order
matrix – Two-dimensional array of elements of the same mode
factor – Vector of categorical data
data frame – Two-dimensional array whose columns may represent data of different modes
list – Set of components that can be any other object type
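The five data types listed above could be created as in the following sketch. All object names and values here are invented for illustration and do not come from the slides.

```r
# One small example of each data type (names and values are hypothetical)
v  = c(2.5, 3.1, 4.7)                          # vector: ordered elements
m  = matrix(1:6, nrow = 2)                     # matrix: 2 rows, 3 columns, one mode
f  = factor(c("low", "high", "low"))           # factor: categorical data
df = data.frame(id = 1:3, score = v, grp = f)  # data frame: columns of different modes
l  = list(numbers = v, table = df, note = "any object")  # list: mixed components

# Check what each object is
is.matrix(m)
levels(f)
is.data.frame(df)
str(l)
```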
25
Editing Data Sets
Can create and modify data sets on the command line
    xx = seq(from = 1, to = 5)
    xx
    x2 = 1 : 5
    x2
    yy = scan( )
    5 8 10 4 2 6 20 11 21 32 43 55
    yy
Can edit a data set once it is created
    mydata = edit(mydata)   # edit returns the modified copy, so assign it
    data.entry(mydata)
26
Built-in Data
Data from a library:
    library(UsingR)
    attach(cfb)   # Consumer Finances Survey
    cfb$INCOME
    cfb$EDUC
    educ.fac = factor(EDUC)
    plot(INCOME ~ educ.fac, xlab = "EDUCATION", ylab = "INCOME")
    detach(cfb)
27
Data Modes
logical – Binary mode, values represented as TRUE or FALSE
numeric – Numeric mode [integer, single, & double precision]
complex – Complex numeric values
character – Character values represented as strings
28
Data Frames
read.table( ) – Reads in data from an external file
    read.table("data.txt", header = TRUE)
    read.table(file = file.choose( ), header = TRUE)
data.frame – Binds R objects of various kinds
29
read.table Function
Reads an ASCII file and creates a data frame
Data are laid out in tables of rows and columns
If the first line contains column labels, use the argument header = TRUE
The default field separator is white space
Also read.csv and read.csv2, which assume comma and semicolon separators, respectively
Treats character columns as factors (the default before R 4.0.0)
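The reading functions above can be tried without an existing data file. This sketch writes a small temporary file first; the file contents and the names tmp, tmp.csv, dat and dat2 are assumptions made for illustration.

```r
# Write a small whitespace-separated file to a temporary location
tmp = tempfile(fileext = ".txt")
writeLines(c("fertilizer yield", "1.0 25", "1.5 31", "2.0 27"), tmp)

# First line holds the column labels, so header = TRUE
dat = read.table(tmp, header = TRUE)
str(dat)

# Round trip through a comma-separated file
tmp.csv = tempfile(fileext = ".csv")
write.csv(dat, tmp.csv, row.names = FALSE)
dat2 = read.csv(tmp.csv)   # read.csv assumes a header line and comma separators
all.equal(dat, dat2)
```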
30
save( ) and load( )
Used for R functions and objects
Files written by save are in a format readable only by load
    x = 23
    y = 44
    save(x, y, file = "xy.Rdata")
    load("xy.Rdata")
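A short round trip shows that load restores the saved objects under their original names. This is a sketch that uses a temporary file instead of "xy.Rdata"; the names f and restored are hypothetical.

```r
x = 23
y = 44

f = tempfile(fileext = ".RData")   # temporary file, used instead of "xy.Rdata"
save(x, y, file = f)

rm(x, y)               # remove the objects from the workspace
restored = load(f)     # load() returns the names of the restored objects
restored
x + y                  # x and y are available again
```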
31
Comparison Operators
!=    Not Equal To
<     Less Than
<=    Less Than or Equal To
==    Exactly Equal To
>     Greater Than
>=    Greater Than or Equal To
32
Some Logical Operators
!    Not
|    Or (For Calculating Vectors and Arrays of Logicals)
&    And (For Calculating Vectors and Arrays of Logicals)
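Comparison and logical operators work element-wise and are commonly used to select elements of a vector. The sketch below reuses the yield data from the earlier slides; the names high and mid and the cut-off values are assumptions.

```r
yield = c(25, 31, 27, 28, 36, 35, 32, 34)

high = yield >= 32             # logical vector, one TRUE/FALSE per element
high
sum(high)                      # TRUE counts as 1, so this counts the matches

mid = yield > 26 & yield < 35  # element-wise AND of two comparisons
yield[mid]                     # keep only the elements where mid is TRUE

yield[!high | yield == 36]     # NOT and OR combined
```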
33
Some Mathematical Functions
abs              Absolute Value
ceiling          Next Larger Integer
floor            Next Smaller Integer
cos, sin, tan    Trigonometric Functions
exp(x)           e^x [e = 2.71828 …]
log              Natural Logarithm
log10            Logarithm Base 10
sqrt             Square Root
34
Statistical Summary Functions
length      Length of Object
max         Maximum Value
mean        Arithmetic Mean
median      Median
min         Minimum Value
prod        Product of Values
quantile    Empirical Quantiles
sum         Sum
var         Variance - Covariance
sd          Standard Deviation
cor         Correlation Between Vectors or Matrices
35
Sorting and Other Functions
rev        Put Values of Vectors in Reverse Order
sort       Sort Values of Vector
order      Permutation of Elements to Produce Sorted Order
rank       Ranks of Values in Vector
match      Detect Occurrences in a Vector
cumsum     Cumulative Sums of Values in Vector
cumprod    Cumulative Products
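The difference between sort, order and rank is easiest to see on a small vector. This sketch reuses the fertilizer and yield data from the earlier slides; the name o is an assumption.

```r
fertilizer = c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5)
yield = c(25, 31, 27, 28, 36, 35, 32, 34)

sort(yield)        # the values themselves, in increasing order
o = order(yield)   # the positions that would sort the vector
o
yield[o]           # indexing by o gives the same result as sort(yield)
fertilizer[o]      # reorder a second vector to match

rank(yield)        # the rank of each value, in the original positions
cumsum(yield)      # running total
```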
36
Plotting Functions Useful for One-Dimensional Data
barplot     Bar plot
boxplot     Box & Whisker plot
hist        Histogram
dotchart    Dot plot
pie         Pie chart
37
Plotting Functions Useful for Two-Dimensional Data
plot      Creates a scatter plot: plot(x, y)
qqnorm    Quantile-quantile plot, sample vs. N(0, 1): qqnorm(x)
qqplot    Quantile-quantile plot for two samples: qqplot(x, y)
pairs     Creates a pairs or scatter plot matrix:
    attach(babies)
    pairs(babies[, c("gestation", "wt", "age", "inc")])
38
Three-Dimensional Plotting Functions
contour    Contour plot
persp      Perspective plot
image      Image plot
39
Probability Distributions Using R
Pseudo-random sampling
    sample(0:20, 5)                   # select 5 without replacement (WOR)
    sample(0:20, 5, replace = TRUE)   # select 5 with replacement (WR)
Coin toss simulation [0 = tail; 1 = head], 20 tosses:
    sample(c(0, 1), 20, replace = TRUE)
40
For Any Probability Distribution
Replace dist with the distribution's short name (for example binom, norm, pois, t):
ddist    density or probability
pdist    cumulative probability
qdist    quantiles [percentiles]
rdist    pseudo-random selection
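The d/p/q/r naming pattern can be illustrated with the normal and binomial families. This is a sketch with arbitrarily chosen arguments; the seed and the names p and q are assumptions.

```r
# Normal distribution: norm
p = pnorm(1.96)     # cumulative probability P(Z <= 1.96)
q = qnorm(p)        # the quantile function inverts pnorm
all.equal(q, 1.96)

dnorm(0)            # density at 0, equal to 1/sqrt(2*pi)

set.seed(1)         # arbitrary seed so the draws can be reproduced
rnorm(3)            # three pseudo-random N(0, 1) values

# The same prefixes apply to other distributions, e.g. binom
pbinom(2, size = 5, prob = 0.5)          # P(X <= 2)
sum(dbinom(0:2, size = 5, prob = 0.5))   # the same probability, summed point by point
```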
41
Binomial Distribution
X ~ Binomial(n, p); x = 0, 1, …, n
dbinom(x, n, p)    Density or point probability
pbinom(x, n, p)    Cumulative distribution
qbinom(q, n, p)    Quantiles [0 < q < 1]
rbinom(m, n, p)    Pseudo-random numbers
42
Binomial Distribution
Coin toss simulation:
    x = 0:20                               # num. of heads in 20 tosses
    px = dbinom(x, size = 20, prob = 0.5)
    plot(x, px, type = "h")                # graph display
    curve(dnorm(x, 10, sqrt(20*.5*.5)), col = 2, add = TRUE)
44
Normal Distribution
X ~ Normal(µ, σ)
dnorm(x, µ, σ)    Density
pnorm(x, µ, σ)    Cumulative probability
qnorm(q, µ, σ)    Quantiles
rnorm(m, µ, σ)    Random numbers
45
Standard Normal
    x = seq(-3.5, 3.5, 0.1)   # x ~ N(0,1)
    prx = dnorm(x)            # M = 0, SD = 1
    plot(x, prx, type = "l")
Or using:
    curve(dnorm(x), from = -3.5, to = 3.5)
46
Cumulative Normal & Quantiles
    curve(pnorm(x), from = -3.5, to = 3.5)
    qnorm(.25)                                   # 25th percentile, x ~ N(0,1)
    qnorm(.75, mean = 50, sd = 2)                # M = 50, SD = 2
    qnorm(c(.1, .3, .7, .9), mean = 65, sd = 3)
47
Poisson Distribution
X ~ Poisson(λ); X = 0, 1, 2, 3, …
    x = 0:20   # Suppose λ = 3.5
    prx = dpois(x, lambda = 3.5)
    plot(x, prx, type = "h", main = "Poisson Distribution")
    text(10, .10, "Lambda = 3.5")
49
Sampling Distributions
    n = 25; curve(dnorm(x, 0, 1/sqrt(n)), -3, 3, xlab = "Mean", ylab = "Densities of Sample Mean", bty = "l")
    n = 5; curve(dnorm(x, 0, 1/sqrt(n)), add = TRUE)
    n = 1; curve(dnorm(x, 0, 1/sqrt(n)), add = TRUE)
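The curves above use the fact that the mean of n independent N(0, 1) values has standard deviation 1/sqrt(n). A small simulation can make this concrete. This is a sketch only: the seed, the number of replications and the name means are arbitrary choices not taken from the slides.

```r
set.seed(2007)    # arbitrary seed, for reproducibility only
n = 25            # sample size, as in the first curve on the slide

# Draw 5000 samples of size n from N(0, 1) and keep each sample mean
means = replicate(5000, mean(rnorm(n)))

sd(means)         # empirical standard deviation of the sample means
1 / sqrt(n)       # theoretical value used in dnorm(x, 0, 1/sqrt(n))
```

Repeating this with n = 5 and n = 1 should show the spread growing, matching the wider curves on the slide.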
51
t-Distribution as df Increases
    curve(dnorm(x), -4, 4, main = "Normal & t Distributions", ylab = "Densities")
    k = 3; curve(dt(x, df = k), lty = k, add = TRUE)
    k = 5; curve(dt(x, df = k), lty = k, add = TRUE)
    k = 15; curve(dt(x, df = k), lty = k, add = TRUE)
    k = 100; curve(dt(x, df = k), lty = k, add = TRUE)
53
Binomial-Normal Approximation
Coin toss example: n = 100, p = .5
P(X ≤ 40)?
Using Larget’s prob.R file:
    source(file.choose( ))
    gbinom(100, .5, b = 40)
Normal approximation: µ = 50, σ = 5
    gnorm(50, 5, b = 40.5)
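If Larget's prob.R file is not at hand, the same two probabilities can be computed with base R alone. This sketch is an alternative to gbinom and gnorm, not a reproduction of them; the names p.exact and p.normal are assumptions.

```r
n = 100
p = 0.5

# Exact binomial probability P(X <= 40)
p.exact = pbinom(40, size = n, prob = p)

# Normal approximation with continuity correction (b = 40.5 on the slide)
mu = n * p                      # 50
sigma = sqrt(n * p * (1 - p))   # 5
p.normal = pnorm(40.5, mean = mu, sd = sigma)

c(exact = p.exact, normal = p.normal)   # the two values should be close
```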
56
One-Sample t-test
Ho: µ = µ0    Null Hypothesis
Ha: µ ≠ µ0    Two-sided
Ha: µ > µ0    One-sided
Ha: µ < µ0    One-sided
57
R One-Sample t.test
    x = c(x1, x2, …, xn)           # data set
    t.test(x, mu = Mo)             # two-sided
    t.test(x, mu = Mo, alt = "g")  # one-sided
    t.test(x, mu = Mo, alt = "l")  # one-sided
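To see what t.test computes, the statistic can be reproduced by hand. The sketch below uses simulated data, since the slide's x1, …, xn are placeholders; the seed, the sample and the hypothesized mean of 50 are all invented for illustration.

```r
set.seed(11)                        # arbitrary seed
x = rnorm(30, mean = 52, sd = 10)   # simulated data, not from the slides

res = t.test(x, mu = 50)            # two-sided test of Ho: mu = 50
res$statistic                       # t statistic
res$parameter                       # degrees of freedom, n - 1
res$p.value
res$conf.int                        # 95% confidence interval for the mean

# The same statistic computed from its definition
t.manual = (mean(x) - 50) / (sd(x) / sqrt(length(x)))
all.equal(unname(res$statistic), t.manual)
```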
58
R One-Sample t.test [cont.]
Example: Text, Problem 8.11, page 226
    library(UsingR)
    attach(stud.recs)
    x = sat.m            # Math SAT Scores
    hist(x)              # Visual display
    qqnorm(x)            # Normal quantile plot
    qqline(x, col = 2)   # Add reference line
    t.test(x, mu = 500)
    detach(stud.recs)
59
Normality Test
Shapiro-Wilk test:
Ho: X ~ Normal
Ha: X !~ Normal
Command:
    shapiro.test(x)   # Examine p-value
60
Normality Test [cont.]
Example: On Base %
    data(OBP)
    summary(OBP)
    boxplot(OBP)
61
Normality Test [cont.]
    qqnorm(OBP)
    qqline(OBP, col = 2)
    shapiro.test(OBP)
    wilcox.test(OBP, mu = .330)
62
One-Sample Proportion Test
x: total successes; n: sample size
    prop.test(x, n, p = Po)              # two-sided
    prop.test(x, n, p = Po, alt = "g")
    prop.test(x, n, p = Po, alt = "l")
63
Or Using Binomial “Exact” Test
    binom.test(x, n, p = Po)
    binom.test(x, n, p = Po, alt = "g")
    binom.test(x, n, p = Po, alt = "l")
64
Proportion Test
Text, Example 8.3: Survey US Poverty Rate
Ho: P = 0.113   # Year 2000 Rate
Ha: P > 0.113   # Year 2001 Rate Increased
    x = 5850    # Sample people UPL
    n = 50000   # Sample size
    prop.test(x, n, p = 0.113, alt = "g")
    binom.test(x, n, p = 0.113, alt = "g")
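The proportion test above is closely related to the familiar z statistic. The sketch below computes z by hand and compares it with prop.test run without the continuity correction (correct = FALSE, which differs from the default used on the slide). The names p0, p.hat, z, p.value and res are assumptions.

```r
x = 5850      # sample count from the slide
n = 50000     # sample size from the slide
p0 = 0.113    # hypothesized proportion under Ho

p.hat = x / n                                # sample proportion
z = (p.hat - p0) / sqrt(p0 * (1 - p0) / n)   # z statistic under Ho
p.value = pnorm(z, lower.tail = FALSE)       # one-sided, Ha: P > p0

# prop.test reports X-squared; without continuity correction it equals z^2
res = prop.test(x, n, p = p0, alt = "g", correct = FALSE)
all.equal(unname(res$statistic), z^2)
all.equal(res$p.value, p.value)
```

With the default continuity correction, as on the slide, the statistic and p-value would differ slightly from these hand-computed values.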
65
Some Modeling Functions/Packages
Linear Models:    anova, car, lm, glm
Graphics:         graphics, grid, lattice
Multivariate:     mva, cluster
Survey:           survey
SQC:              qcc
Time Series:      tseries
Bayesian:         BRugs, MCMCpack, …
Simulation:       boot, bootstrap, Zelig
66
You Perform An Experiment In Order To Learn, Not To Prove.
W. Edwards Deming