A quick introduction to R programming. 淡江統計 陳景祥 (Steve Chen)

Major features
(1) Constants, simple variables, and vector and array variables
(2) Input and output instructions
(3) Conditional expressions (e.g., if, else)
(4) Loop structures (e.g., for, while, repeat)
(5) User-defined program modules (functions)
Programming => Freedom!
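The five features above can be sketched in one short R program (a minimal illustration; the variable and function names are invented for this example):

```r
# (1) constants and variables: a scalar and a vector
n <- 5
x <- c(2, 4, 6, 8, 10)

# (3) conditional expression
if (n > 3) {
  msg <- "n is large"
} else {
  msg <- "n is small"
}

# (4) loop structure: sum the elements of x
total <- 0
for (v in x) {
  total <- total + v
}

# (5) user-defined function (a program module)
double_sum <- function(vec) {
  sum(vec) * 2
}

# (2) output
cat(msg, "- total =", total, "- doubled =", double_sum(x), "\n")
```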

R Output Function
Function output in R is stored in a LIST variable; you can inspect the value of any component of that list.
> x = rnorm(20); y = rnorm(20)   # generate 20 samples from the standard normal distribution
> lm.result = lm(y ~ x)          # a simple regression
> lm.result                      # note that lm.result is a variable name
Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x

> names(lm.result)
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"
> lm.result$coefficients
(Intercept)           x

Simple prog. examples
x = c(1,2,3)
y = c(11,12,13)

for (i in 1:3) {
  x[i] = x[i] + 100
}
x

# User-defined function: f1
f1 = function(a, b) {
  c = a + b
  return(c)
}
z = f1(x, y)   # now a = x, b = y

if (is.vector(z)) {
  cat("z is a vector! \n")
} else {
  cat("z is NOT a vector! \n")
}

fit = lm(y ~ x)   # simple regression
summary(fit)

R Example 1-(1)
> # the scores and sexes of 6 students, stored in the vector variables scores and gender
> scores = c(60,55,64,66,55,56)
> gender = c("male","male","female","female","male","female")
> scores[2]              # the 2nd element of scores
[1] 55
> scores[2:4]            # 2:4 = c(2,3,4), the 2nd, 3rd and 4th elements
[1] 55 64 66
> scores[c(1,3,4,5)]     # the 1st, 3rd, 4th and 5th elements of scores
[1] 60 64 66 55
> scores[scores > 60]
[1] 64 66
> # compute the average
> mean(scores)
[1] 59.33333
> # compute the standard deviation
> sd(scores)
[1] 4.802777

R Example 1-(2)
> # compute the variance
> var(scores)
[1] 23.06667
> # compute the median
> median(scores)
[1] 58
> # compute the 25th percentile
> quantile(scores, 0.25)
  25%
55.25
> scores[gender == "male"]
[1] 60 55 55
> mean(scores[gender == "male"])
[1] 56.66667
> table(gender, scores)
        scores
gender   55 56 60 64 66
  female  0  1  0  1  1
  male    2  0  1  0  0
> hist(scores)   # histogram

R Example 2: Regression
> ( IQ = round(rnorm(6,110,15)) )        # generate 6 samples from N(110, 15) and round them
> ( scores = *IQ + rnorm(6,0,2) )        # course score = slope * IQ + random error (N(0,2))
> summary( lm(scores ~ IQ) )
Call:
lm(formula = scores ~ IQ)

Residuals:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
IQ                                           ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:  on 4 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic:  on 1 and 4 DF, p-value:
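A reproducible variant of the slide's simulation (the intercept 20 and slope 0.5 are invented for illustration, since the original coefficients did not survive the transcript; set.seed makes the random draw repeatable):

```r
set.seed(1)                               # make the random draws reproducible
IQ <- round(rnorm(6, 110, 15))            # 6 samples from N(110, 15), rounded
scores <- 20 + 0.5 * IQ + rnorm(6, 0, 2)  # hypothetical intercept 20, slope 0.5, N(0,2) error
fit <- lm(scores ~ IQ)                    # simple linear regression
summary(fit)                              # estimates, t tests, R-squared
```

With only 6 observations the estimated slope can be far from 0.5; increasing the sample size tightens the estimate around the true coefficient.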

R Input function
Read a vector:
X = scan("c:/dir1/sample.txt")
Read a data frame:
read.table("c:/dir1/d.txt", header=T)
read.csv("c:/dir1/d.csv", header=T)

R Output function
Output a vector:
cat(scores, file="c:/dir2/scores.txt")
write(scores, file="c:/dir2/scores2.txt")
Output a data frame:
write.table(X, "c:/dir2/data.txt", row.names = FALSE, sep=" ")
write.csv(X, "c:/dir2/data.csv", row.names = FALSE)
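The input and output functions above combine into a round trip; this sketch uses tempfile() so it runs anywhere, instead of the c:/dir paths on the slides:

```r
scores <- c(60, 55, 64, 66, 55, 56)

# write the vector to a temporary file, then read it back with scan()
f <- tempfile(fileext = ".txt")
write(scores, file = f)
back <- scan(f)
cat("read back:", back, "\n")

# write a data frame as CSV, then read it back with read.csv()
X <- data.frame(scores = scores,
                gender = c("male","male","female","female","male","female"))
g <- tempfile(fileext = ".csv")
write.csv(X, g, row.names = FALSE)
X2 <- read.csv(g, header = TRUE)
str(X2)
```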

R special operations (1)
Vector operations
> x = c(1,2,3,4,5)       # c stands for concatenate
> y = c(10,20,30,40,50)
> x + y
[1] 11 22 33 44 55
> x - y
[1]  -9 -18 -27 -36 -45
> x * y
[1]  10  40  90 160 250
> y ^ x
[1]        10       400     27000   2560000 312500000
> x
[1] 1 2 3 4 5

R special operations (2)
Data filtering using vectors, matrices and index vectors
> x = c(1,2,3,4,5)
> y = c(10,20,30,40,50)
> x >= 3
[1] FALSE FALSE  TRUE  TRUE  TRUE
> ( x.index = x >= 3 )
[1] FALSE FALSE  TRUE  TRUE  TRUE
> x[x >= 3]        # or x[ x.index ]
[1] 3 4 5
> x[c(F,F,T,T,T)]
[1] 3 4 5
> x[c(3,4,5)]
[1] 3 4 5
> x[y >= 20]
[1] 2 3 4 5
> z = c("boy","girl","boy","boy","girl")
> y[z == "boy"]
[1] 10 30 40

R special operations (3)
LIST variables: store function output
> x = rnorm(20); y = rnorm(20)   # two N(0,1) samples
> lm.result = lm(y ~ x)          # simple regression; the output is stored in lm.result
> lm.result
Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x

> names(lm.result)               # the components of the output
 [1] "coefficients" "residuals" "effects" "rank" ...
> lm.result$coefficients
(Intercept)           x
> lm.result$coefficients[2]
x

R special operations (4)
Factor variables: for nominal (categorical) data, used in classification
> gender = c("male","male","female","female","male","female")
> gender2 = as.factor(gender)
> gender2
[1] male   male   female female male   female
Levels: female male
> levels(gender2)
[1] "female" "male"

R special operations (5)
Data-frame variables
> iris
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1            ...         ...          ...         ...    setosa
2            ...         ...          ...         ...    setosa
...
149          ...         ...          ...         ... virginica
150          ...         ...          ...         ... virginica
> iris$Sepal.Length
  [1] ...
[145] ...
> iris$Species
  [1] setosa    setosa    setosa    setosa    setosa    setosa    ...
[145] virginica virginica virginica virginica virginica virginica
Levels: setosa versicolor virginica
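Data-frame columns combine naturally with the filtering operations from earlier slides; a quick sketch using the built-in iris data:

```r
# select the rows of one species and summarise a column
setosa_rows <- iris[iris$Species == "setosa", ]
cat("setosa rows:", nrow(setosa_rows), "\n")
cat("mean setosa sepal length:", mean(setosa_rows$Sepal.Length), "\n")

# the same kind of filter via subset()
big <- subset(iris, Sepal.Length > 7)
table(big$Species)
```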

R special operations (6)
Functions, and a function as an argument
f1 = function(x, f0, ...) {
  result = f0(x, ...)
  return(result)
}

y = rnorm(100)
f1(y, mean, trim=0.05, na.rm=T)   # now f0(x,...) = mean(x, trim=0.05, na.rm=T)
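Passing a function as an argument, as f1 does above, is also the basis of R's apply family; a brief sketch:

```r
# sapply applies a function to each element of a list
nums <- list(a = c(1, 2, 3), b = c(10, 20, 30))
sapply(nums, mean)                          # named vector of the two means

# an anonymous function works the same way
sapply(nums, function(v) max(v) - min(v))   # range of each element
```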

R special operations (7)
Use cat() in a function to print an explanation
> f2 = function(x) {
+   mean(x)
+   var(x) }
> f2(y)                 # only the last expression, var(x), is returned
[1] ...
> f2 = function(x) {
+   cat("mean of X = ", mean(x), "\n")
+   var(x)
+ }
> f2(y)                 # now the mean is printed as well
mean of X =  ...
[1] ...