R: A Statistics Program For Teaching & Research Josué Guzmán 11 Nov. 2007



© J. Guzmán, 2007. R: Stat. Prog. for Teach. & Res.

Slide 2: Some Useful R Links
R Home Page
CRAN
Precompiled Binary Distributions: Windows (95 and later)
R Manuals

Slide 3: R Installation
R: Statistical Analysis & Graphics
Freely Available Under GPL
Binary Distributions
Installation – Standard Steps


Slide 6: Running R

Slide 7: Statistical Programming with R
Learn Language Basics
Learn Documentation / Help System
Learn Data Manipulation & Graphics
Perform Basic Statistical Analysis

Slide 8: First Steps: Interacting with R
Type a command and press Enter
R executes it (printing the result if relevant)
R waits for more input

Slide 9: Some Examples
> 2 * 2
[1] 4
> exp(-2)
[1] 0.1353353
> rdmnorm = rnorm(1000)

Slide 10: R Functions
exp, log and rnorm are functions
Function calls are indicated by the presence of parentheses
Example:
> hist(rdmnorm, col = "magenta")

Slide 11: Variables and Assignments
The = operator assigns a value; the <- operator also works
> x = 2.2
> y = x
> sqrt(x)
> y
> x ^ y

Slide 12: Variables and Assignments
Variable names cannot start with a digit
Names are case-sensitive
Some common names are already used by R and should be avoided
Examples: c, q, t, C, D, F, I, T
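A short sketch of why these naming rules matter. The variable names total, Total and v are made up for illustration and do not come from the slides.

```r
# Names are case-sensitive: 'total' and 'Total' are two different objects.
total <- 10
Total <- 20
same_object <- identical(total, Total)   # FALSE

# T and F are ordinary variables that start out as TRUE and FALSE,
# so an assignment can silently change what they mean.
T <- 0
t_still_true <- isTRUE(T)                # FALSE: T is now the number 0
rm(T)                                    # drop our copy; T is TRUE again

# A variable called c does not break c(...): when a name is called as a
# function, R skips non-function objects while looking it up.
c <- 5
v <- c(1, 2, c)                          # the vector 1, 2, 5
rm(c)
```

Reusing such names is legal, but it makes code harder to read and, in the case of T and F, can change results without any warning.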

Slide 13: Vectorized Arithmetic
Elementary data types in R are all vectors
The c(...) construct is used to create vectors
Bolstad, 2004, exercise 13.2, page 253:
> fertilizer = c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5)
> fertilizer

Slide 14: Vectorized Arithmetic [cont.]
Arithmetic operations (+, -, *, /, ^) and mathematical functions (sin, cos, log, …) work element-wise on vectors
> yield = c(25, 31, 27, 28, 36, 35, 32, 34)
> log(yield)

Slide 15: Vectorized Arithmetic [cont.]
> sum.yield = sum(yield)
> sum.yield
> n = length(yield)
> n
> avg.yield = sum.yield/n
> avg.yield

Slide 16: Graphics
The plot(x, y) function is a simple way to produce R graphics:
> plot(fertilizer, log(yield), main = "Fertilizer vs. Yield")

Slide 17: Getting Help
help.start( )
  Starts a browser window with an HTML help interface. Links to the manual An Introduction to R, as well as topic-wise listings.
help(topic)
  Help page for a particular topic or function. Every R function has a help page.
help.search("search string")
  Subject/keyword search

Slide 18: Getting Help [cont.]
Short-cut: question mark (?)
> help(plot)
> ? plot
To find out about a specific subject, use the help.search function. Example:
> help.search("logarithm")

Slide 19: apropos( )
The apropos function lists the topics that partially match its argument:
> apropos("plot")[1:10]
 [1] ".__C__recordedplot" "biplot"
 [3] "interaction.plot"   "lag.plot"
 [5] "monthplot"          "plot.TukeyHSD"
 [7] "plot.density"       "plot.ecdf"
 [9] "plot.lm"            "plot.mlm"

Slide 20: R Packages
R makes use of a system of packages
Each package is a collection of routines with a common theme
The core of R itself is a package called base
A collection of packages is called a library
Some packages are already loaded when R starts up
Other packages need to be loaded using the library function

Slide 21: R Packages [cont.]
Several packages come pre-installed with R:
> installed.packages( )[, 1]
 [1] "ISwR"     "KernSmooth" "MASS"    "base"
 [5] "boot"     "class"      "cluster" "foreign"
 [9] "graphics" "grid"       "lattice" "methods"
[13] "mgcv"     "nlme"       "nnet"    "rpart"
[17] "spatial"  "splines"    "stats"   "stats4"
[21] "survival" "tcltk"      "tools"   "utils"

Slide 22: Contributed Packages
Many packages are available from CRAN
Some packages are already loaded when R starts up
To list the currently loaded packages, use search:
> search( )
[1] ".GlobalEnv"    "package:tools"    "package:methods"
[4] "package:stats" "package:graphics" "package:utils"
[7] "Autoloads"     "package:base"

Slide 23: R Packages
Installed packages can be loaded by the user. Example: UsingR package
> library(UsingR)
New packages are downloaded using the install.packages function:
> install.packages("UsingR")
> library(help = UsingR)

Slide 24: Data Types
vector – Set of elements in a specified order
matrix – Two-dimensional array of elements of the same mode
factor – Vector of categorical data
data frame – Two-dimensional array whose columns may represent data of different modes
list – Set of components that can be any other object type
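As an illustration of the five data types, here is a minimal sketch that builds one object of each kind. All object names and values are invented for the example.

```r
v  <- c(2.5, 3.1, 4.0)                        # vector: ordered set of elements
m  <- matrix(1:6, nrow = 2)                   # matrix: 2 x 3, filled column by column
f  <- factor(c("low", "high", "low"))         # factor: categorical data
df <- data.frame(id = 1:3, grp = f, val = v)  # data frame: columns of different modes
l  <- list(vec = v, mat = m, frame = df)      # list: components of any type

str(l)   # compact display of the structure of every component
```

The str function is a convenient way to check what kind of object a function has returned.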

Slide 25: Editing Data Sets
Data sets can be created and modified on the command line:
> xx = seq(from = 1, to = 5)
> xx
> x2 = 1 : 5
> x2
> yy = scan( )
> yy
A data set can be edited once it is created:
> edit(mydata)
> data.entry(mydata)

Slide 26: Built-in Data
Data from a library:
> library(UsingR)
> attach(cfb) # Consumer-Finances Survey
> cfb$INCOME
> cfb$EDUC
> educ.fac = factor(EDUC)
> plot(INCOME ~ educ.fac, xlab = "EDUCATION", ylab = "INCOME")
> detach(cfb)

Slide 27: Data Modes
logical – Binary mode, values represented as TRUE or FALSE
numeric – Numeric mode [integer, single, & double precision]
complex – Complex numeric values
character – Character values represented as strings
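The mode of a value can be checked with the mode function, as in this small sketch. The typeof call is added here only to show how integers are stored; the slides do not mention it.

```r
m_logical   <- mode(TRUE)      # "logical"
m_double    <- mode(3.14)      # "numeric"
m_integer   <- mode(2L)        # "numeric": integers share the numeric mode
t_integer   <- typeof(2L)      # "integer": the underlying storage type
m_complex   <- mode(1 + 2i)    # "complex"
m_character <- mode("R")       # "character"
```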

Slide 28: Data Frames
read.table( ) – Reads in data from an external file
> read.table("data.txt", header = T)
> read.table(file = file.choose( ), header = T)
data.frame – Binds R objects of various kinds

Slide 29: read.table Function
Reads an ASCII file and creates a data frame
Data are in tables of rows and columns
If the first line contains column labels, use the argument header = T
The default field separator is white space
Also read.csv and read.csv2, which assume "," and ";" separators, respectively
Treats character columns as factors (the default before R 4.0.0)
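A self-contained sketch of reading data, assuming a small temporary file written just for the demonstration. The file contents reuse the first three fertilizer and yield values from the earlier slides; the CSV text with names and scores is invented.

```r
# Write a small whitespace-separated file with a header line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("fertilizer yield",
             "1.0 25",
             "1.5 31",
             "2.0 27"), tmp)

# header = TRUE takes the column names from the first line.
dat <- read.table(tmp, header = TRUE)
str(dat)

# read.csv assumes comma-separated fields. stringsAsFactors = TRUE asks
# explicitly for character columns to be converted to factors.
csv <- read.csv(text = "name,score\nAna,90\nLuis,85", stringsAsFactors = TRUE)
str(csv)

unlink(tmp)   # remove the temporary file
```

Writing header = TRUE rather than header = T avoids surprises if a variable named T has been created.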

Slide 30: save( ) and load( )
Used for R functions and objects
Saved files can be read only by load( )
> x = 23
> y = 44
> save(x, y, file = "xy.Rdata")
> load("xy.Rdata")

Slide 31: Comparison Operators
!=   Not Equal To
<    Less Than
<=   Less Than or Equal To
==   Exactly Equal To
>    Greater Than
>=   Greater Than or Equal To

Slide 32: Some Logical Operators
!   Not
|   Or (for calculating vectors and arrays of logicals)
&   And (for calculating vectors and arrays of logicals)
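The comparison and logical operators work element-wise, just like the arithmetic operators. A brief sketch using the yield vector from the vectorized arithmetic slides; the cut-off values are arbitrary.

```r
yield <- c(25, 31, 27, 28, 36, 35, 32, 34)

above_30 <- yield > 30                    # FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE
mid      <- yield >= 30 & yield <= 34     # TRUE only for 31, 32 and 34
extremes <- yield[yield < 27 | yield > 35]  # logical indexing: 25 and 36
n_not_31 <- sum(yield != 31)              # TRUE counts as 1, so this is 7
not_25   <- !(yield == 25)                # FALSE for the first element only
```

Because TRUE and FALSE behave as 1 and 0 in arithmetic, sum and mean applied to a comparison give counts and proportions.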

Slide 33: Some Mathematical Functions
abs             Absolute Value
ceiling         Next Larger Integer
floor           Next Smaller Integer
cos, sin, tan   Trigonometric Functions
exp(x)          e^x [e = …]
log             Natural Logarithm
log10           Logarithm Base 10
sqrt            Square Root
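A few calls showing what these functions return. The input values are arbitrary.

```r
a  <- abs(-3.7)       # 3.7
c1 <- ceiling(3.2)    # 4
f1 <- floor(3.8)      # 3
f2 <- floor(-3.2)     # -4: floor always rounds toward minus infinity
e  <- exp(1)          # the constant e, about 2.718282
l1 <- log(exp(2))     # 2: log is the natural logarithm, the inverse of exp
l2 <- log10(1000)     # 3
s  <- sqrt(16)        # 4
```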

Slide 34: Statistical Summary Functions
length     Length of Object
max        Maximum Value
mean       Arithmetic Mean
median     Median
min        Minimum Value
prod       Product of Values
quantile   Empirical Quantiles
sum        Sum
var        Variance - Covariance
sd         Standard Deviation
cor        Correlation Between Vectors or Matrices
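The summary functions can be tried on the fertilizer and yield vectors defined in the vectorized arithmetic slides. This is only a sketch; the result names are invented.

```r
fertilizer <- c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5)
yield      <- c(25, 31, 27, 28, 36, 35, 32, 34)

n_obs  <- length(yield)       # 8
total  <- sum(yield)          # 248
avg    <- mean(yield)         # 31
med    <- median(yield)       # 31.5
rng    <- c(min(yield), max(yield))    # 25 and 36
s2     <- var(yield)          # 16 (sample variance, divisor n - 1)
s      <- sd(yield)           # 4
q      <- quantile(yield, c(0.25, 0.75))   # 27.75 and 34.25 with the default method
r      <- cor(fertilizer, yield)           # about 0.73
```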

Slide 35: Sorting and Other Functions
rev       Put Values of Vectors in Reverse Order
sort      Sort Values of Vector
order     Permutation of Elements to Produce Sorted Order
rank      Ranks of Values in Vector
match     Detect Occurrences in a Vector
cumsum    Cumulative Sums of Values in Vector
cumprod   Cumulative Products
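The difference between sort, order and rank is easiest to see on a small vector. The sketch below reuses the yield values from the earlier slides.

```r
yield <- c(25, 31, 27, 28, 36, 35, 32, 34)

sorted   <- sort(yield)      # 25 27 28 31 32 34 35 36
ord      <- order(yield)     # 1 3 4 2 7 8 6 5: positions that would sort the vector
rnk      <- rank(yield)      # 1 4 2 3 8 7 5 6: rank of each value in place
reversed <- rev(yield)       # 34 32 35 36 28 27 31 25
running  <- cumsum(yield)    # 25 56 83 111 147 182 214 248
pos_36   <- match(36, yield) # 5: position of the first occurrence of 36

same <- identical(yield[ord], sorted)   # indexing by order() reproduces sort()
```

order is the function to use when several vectors, or the rows of a data frame, must be rearranged together.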

Slide 36: Plotting Functions Useful for One-Dimensional Data
barplot    Bar plot
boxplot    Box & Whisker plot
hist       Histogram
dotchart   Dot plot
pie        Pie chart

Slide 37: Plotting Functions Useful for Two-Dimensional Data
plot – Creates a scatter plot:
> plot(x, y)
qqnorm – Quantile-quantile plot, sample vs. N(0, 1):
> qqnorm(x)
qqplot – Quantile-quantile plot for two samples:
> qqplot(x, y)
pairs – Creates a pairs or scatter plot matrix:
> attach(babies)
> pairs(babies[, c("gestation", "wt", "age", "inc")])

Slide 38: Three-Dimensional Plotting Functions
contour   Contour plot
persp     Perspective plot
image     Image plot

Slide 39: Probability Distributions Using R
Pseudo-random sampling:
> sample(0:20, 5) # select 5 without replacement (WOR)
> sample(0:20, 5, replace = T) # select 5 with replacement (WR)
Coin toss simulation [0 = tail; 1 = head], 20 tosses:
> sample(c(0, 1), 20, replace = T)

Slide 40: For Any Probability Distribution
ddist   density or probability
pdist   cumulative probability
qdist   quantiles [percentiles]
rdist   pseudo-random selection
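To make the four prefixes concrete, here is a sketch that replaces dist with norm, the standard normal distribution. The evaluation points and the seed are arbitrary choices.

```r
dens <- dnorm(0)        # density at 0: 1 / sqrt(2 * pi), about 0.3989
prob <- pnorm(1.96)     # P(X <= 1.96), about 0.975
quan <- qnorm(0.975)    # 97.5th percentile, about 1.96

set.seed(1)             # arbitrary seed so that the draws are reproducible
draws <- rnorm(3)       # three pseudo-random values

# The p and q functions are inverses of each other.
round_trip <- pnorm(qnorm(0.9))   # 0.9
```

The same pattern applies to other distributions, for example dbinom, ppois or qt.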

Slide 41: Binomial Distribution
X ~ Binomial(n, p); x = 0, 1, …, n
dbinom(x, n, p)   Density or point probability
pbinom(x, n, p)   Cumulative distribution
qbinom(q, n, p)   Quantiles [0 < q < 1]
rbinom(m, n, p)   Pseudo-random numbers
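A small numeric sketch of the four binomial functions, assuming 20 tosses of a fair coin. The seed and the number of simulated values are arbitrary.

```r
n <- 20; p <- 0.5

p_exactly_10 <- dbinom(10, n, p)   # P(X = 10) = choose(20, 10) / 2^20, about 0.176
p_at_most_10 <- pbinom(10, n, p)   # P(X <= 10), about 0.588
med          <- qbinom(0.5, n, p)  # smallest x with P(X <= x) >= 0.5, which is 10

set.seed(42)                       # arbitrary seed
sims <- rbinom(5, n, p)            # five simulated head counts, each between 0 and 20

# The cumulative probability is the sum of the point probabilities.
check <- sum(dbinom(0:10, n, p))
```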

Slide 42: Binomial Distribution
Coin toss simulation:
> x = 0:20 # num. of heads in 20 tosses
> px = dbinom(x, size = 20, prob = 0.5)
> plot(x, px, type = "h") # graph display
> curve(dnorm(x, 10, sqrt(20*.5*.5)), col = 2, add = T)


Slide 44: Normal Distribution
X ~ Normal(µ, σ)
dnorm(x, µ, σ)   Density
pnorm(x, µ, σ)   Cumulative probability
qnorm(q, µ, σ)   Quantiles
rnorm(m, µ, σ)   Random numbers

Slide 45: Standard Normal
> x = seq(-3.5, 3.5, 0.1) # x ~ N(0,1)
> prx = dnorm(x) # M = 0, SD = 1
> plot(x, prx, type = "l")
Or using:
> curve(dnorm(x), from = -3.5, to = 3.5)

Slide 46: Cumulative Normal & Quantiles
> curve(pnorm(x), from = -3.5, to = 3.5)
> qnorm(.25) # 25th percentile, x ~ N(0,1)
> qnorm(.75, mean = 50, sd = 2) # M = 50, SD = 2
> qnorm(c(.1, .3, .7, .9), mean = 65, sd = 3)

Slide 47: Poisson Distribution
X ~ Poisson(λ); X = 0, 1, 2, 3, …
> x = 0:20 # Suppose λ = 3.5
> prx = dpois(x, lambda = 3.5)
> plot(x, prx, type = "h", main = "Poisson Distribution")
> text(10, .10, "Lambda = 3.5")


Slide 49: Sampling Distributions
> n = 25; curve(dnorm(x, 0, 1/sqrt(n)), -3, 3, xlab = "Mean", ylab = "Densities of Sample Mean", bty = "l")
> n = 5; curve(dnorm(x, 0, 1/sqrt(n)), add = T)
> n = 1; curve(dnorm(x, 0, 1/sqrt(n)), add = T)
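The curves on this slide use the fact that the mean of n independent N(0, 1) values has standard deviation 1/sqrt(n). A simulation sketch can check this numerically. The seed and the number of replications are arbitrary assumptions, not values from the slides.

```r
set.seed(2007)   # arbitrary seed for reproducibility
n    <- 25       # sample size, as in the first curve on the slide
reps <- 10000    # number of simulated samples (an arbitrary choice)

# Draw 'reps' samples of size n from N(0, 1) and keep each sample mean.
means <- replicate(reps, mean(rnorm(n)))

simulated_sd   <- sd(means)      # should be close to the theoretical value
theoretical_sd <- 1 / sqrt(n)    # 0.2
```

A histogram of means, for example hist(means, freq = FALSE), can be overlaid with curve(dnorm(x, 0, 1/sqrt(n)), add = TRUE) to compare the simulation with the theoretical density.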


Slide 51: t-Distribution as df Increases
> curve(dnorm(x), -4, 4, main = "Normal & t Distributions", ylab = "Densities")
> k = 3; curve(dt(x, df = k), lty = k, add = T)
> k = 5; curve(dt(x, df = k), lty = k, add = T)
> k = 15; curve(dt(x, df = k), lty = k, add = T)
> k = 100; curve(dt(x, df = k), lty = k, add = T)


Slide 53: Binomial-Normal Approximation
Coin toss example: n = 100, p = .5
P(X ≤ 40)?
Using Larget's prob.R file:
> source(file.choose( ))
> gbinom(100, .5, b = 40)
Normal approximation: µ = 50, σ = 5
> gnorm(50, 5, b = 40.5)
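The gbinom and gnorm functions come from Larget's prob.R file, which is not part of base R. As a rough sketch, the two probabilities in this example can also be computed with the built-in pbinom and pnorm functions. This is an alternative route to the numbers, not a reproduction of what prob.R displays.

```r
n <- 100; p <- 0.5

# Exact binomial probability P(X <= 40).
exact <- pbinom(40, size = n, prob = p)

# Normal approximation with continuity correction: P(Y <= 40.5),
# where Y ~ Normal(mu, sigma), mu = n * p and sigma = sqrt(n * p * (1 - p)).
mu    <- n * p                    # 50
sigma <- sqrt(n * p * (1 - p))    # 5
normal_approx <- pnorm(40.5, mean = mu, sd = sigma)

round(c(exact = exact, normal = normal_approx), 4)   # both are about 0.028
```

The value 40.5 rather than 40 is the continuity correction: the discrete outcome 40 is treated as the interval from 39.5 to 40.5.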


Slide 56: One-Sample t-test
Ho: µ = µ0   Null Hypothesis
Ha: µ ≠ µ0   Two-sided
Ha: µ > µ0   One-sided
Ha: µ < µ0   One-sided

Slide 57: R One-Sample t.test
> x = c(x1, x2, …, xn) # data set
> t.test(x, mu = Mo) # two-sided
> t.test(x, mu = Mo, alt = "g") # one-sided
> t.test(x, mu = Mo, alt = "l") # one-sided
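A worked sketch of the calls above, using the yield vector from the earlier slides as the data set. The hypothesized mean mu0 = 30 is a made-up value chosen only to have something to test.

```r
yield <- c(25, 31, 27, 28, 36, 35, 32, 34)
mu0   <- 30   # hypothetical null value, not from the slides

two_sided <- t.test(yield, mu = mu0)                           # Ha: mu != mu0
greater   <- t.test(yield, mu = mu0, alternative = "greater")  # Ha: mu > mu0

t_stat <- unname(two_sided$statistic)   # (31 - 30) / (4 / sqrt(8)), about 0.707
df     <- unname(two_sided$parameter)   # n - 1 = 7
p_two  <- two_sided$p.value
p_one  <- greater$p.value               # half of p_two, because t is positive
```

The slides abbreviate the argument as alt = "g"; R accepts this through partial matching, but the full form alternative = "greater" is easier to read.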

Slide 58: R One-Sample t.test [cont.]
Example: Text, Problem 8.11, page 226
> library(UsingR)
> attach(stud.recs)
> x = sat.m # Math SAT Scores
> hist(x) # Visual display
> qqnorm(x) # Normal quantile plot
> qqline(x, col = 2) # Add reference line through the quartiles
> t.test(x, mu = 500)
> detach(stud.recs)

Slide 59: Normality Test
Shapiro-Wilk test:
Ho: X ~ Normal
Ha: X !~ Normal
Command:
> shapiro.test(x) # Examine p-value

Slide 60: Normality Test [cont.]
Example: On Base %
> data(OBP)
> summary(OBP)
> boxplot(OBP)

Slide 61: Normality Test [cont.]
> qqnorm(OBP)
> qqline(OBP, col = 2)
> shapiro.test(OBP)
> wilcox.test(OBP, mu = .330)

Slide 62: One-Sample Proportion Test
x = total successes; n = sample size
> prop.test(x, n, p = Po) # two-sided
> prop.test(x, n, p = Po, alt = "g")
> prop.test(x, n, p = Po, alt = "l")

Slide 63: Or Using Binomial “Exact” Test
> binom.test(x, n, p = Po)
> binom.test(x, n, p = Po, alt = "g")
> binom.test(x, n, p = Po, alt = "l")
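To compare the two tests side by side, here is a sketch with invented counts: 58 successes in 100 trials, tested against a null proportion of 0.5. None of these numbers come from the slides.

```r
x  <- 58    # hypothetical number of successes
n  <- 100   # hypothetical sample size
p0 <- 0.5   # hypothetical null proportion

approx_test <- prop.test(x, n, p = p0, alternative = "greater")   # normal approximation
exact_test  <- binom.test(x, n, p = p0, alternative = "greater")  # exact binomial

p_values <- c(approximate = approx_test$p.value, exact = exact_test$p.value)

# The exact one-sided p-value is P(X >= 58) under Binomial(100, 0.5).
by_hand <- pbinom(x - 1, n, p0, lower.tail = FALSE)
```

With large samples the two p-values are usually close; with small samples the exact test is the safer choice.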

Slide 64: Proportion Test
Text, Example 8.3: Survey US Poverty Rate
Ho: P = 0.113 # Year 2000 Rate
Ha: P > 0.113 # Year 2001 Rate Increased
> x = 5850 # People in sample under the poverty line (UPL)
> n = # Sample size
> prop.test(x, n, p = 0.113, alt = "g")
> binom.test(x, n, p = 0.113, alt = "g")

Slide 65: Some Modeling Functions/Packages
Linear Models: anova, car, lm, glm
Graphics: graphics, grid, lattice
Multivariate: mva, cluster
Survey: survey
SQC: qcc
Time Series: tseries
Bayesian: BRugs, MCMCpack, …
Simulation: boot, bootstrap, Zelig

You Perform An Experiment In Order To Learn, Not To Prove. W Edwards Deming