The R Book Chapter 2: Essentials of the R Language Session 4.

Slides:



Advertisements
Similar presentations
CS 100: Roadmap to Computing Fall 2014 Lecture 0.
Advertisements

String and Lists Dr. Benito Mendoza. 2 Outline What is a string String operations Traversing strings String slices What is a list Traversing a list List.
Week 10 Recap CSE 115 Spring For-each loop When we have a collection and want to do something to all elements of that collection we use the for-each.
Lecture 4 Sept 8 Complete Chapter 3 exercises Chapter 4.
© Janice Regan, CMPT 102, Sept CMPT 102 Introduction to Scientific Computer Programming Expressions and Operators Program Style.
Elementary Data Types Scalar Data Types Numerical Data Types Other
JavaScript, Fourth Edition
Lecture 4 Sept 7 Chapter 4. Chapter 4 – arrays, collections and indexing This chapter discusses the basic calculations involving rectangular collections.
The Data Element. 2 Data type: A description of the set of values and the basic set of operations that can be applied to values of the type. Strong typing:
Spreadsheets Objective 6.02
2 Explain advanced spreadsheet concepts and functions Advanced Calculations 1 Sabbir Saleh_Lecture_17_Computer Application_BBA.
1 Week 12 Arrays, vectors, matrices and cubes. Introduction to Scientific & Engineering Computing 2 Array subscript expressions n Each subscript in an.
The Data Element. 2 Data type: A description of the set of values and the basic set of operations that can be applied to values of the type. Strong typing:
Flow of Control. 2 Control Structures Control structure: An instruction that determines the order in which other instructions in a program are executed.
1.2 – Evaluate and Simplify Algebraic Expressions A numerical expression consists of numbers, operations, and grouping symbols. An expression formed by.
Piotr Wolski Introduction to R. Topics What is R? Sample session How to install R? Minimum you have to know to work in R Data objects in R and how to.
20-753: Fundamentals of Web Programming 1 Lecture 12: Javascript I Fundamentals of Web Programming Lecture 12: Introduction to Javascript.
PHP Logic. Review: Variables Variables: a symbol or name that stands for a value – Data types ( Similar to C++ or Java): Int, Float, Boolean, String,
Programming for Beginners Martin Nelson Elizabeth FitzGerald Lecture 2: Variables & Data Types.
1 Program Development  The creation of software involves four basic activities: establishing the requirements creating a design implementing the code.
1 An Introduction to R © 2009 Dan Nettleton. 2 Preliminaries Throughout these slides, red text indicates text that is typed at the R prompt or text that.
Introduction to Programming Python Lab 3: Arithmetic 22 January PythonLab3 lecture slides.ppt Ping Brennan
Chapter 1: Real Numbers and Equations Section 1.3: Variables and Expressions.
String and Lists Dr. José M. Reyes Álamo. 2 Outline What is a string String operations Traversing strings String slices What is a list Traversing a list.
IST 210: PHP Logic IST 210: Organization of Data IST2101.
7 - Programming 7J, K, L, M, N, O – Handling Data.
String and Lists Dr. José M. Reyes Álamo.
The R Book Chapter 2: Essentials of the R Language Session 7.
Programming in R Intro, data and programming structures
BASIC ELEMENTS OF A COMPUTER PROGRAM
Ch.4 Data Manipulation.
Stats Lab #1 TA: Kyle Davis
Fundamental of Java Programming Basics of Java Programming
Unit 42 : Spreadsheet Modelling
Chapter 19 and Material Adapted from Fluency Text book
Chapter 5 Structures.
Introduction to Python
Copyright 2012, 2008, 2004, 2000 Pearson Education, Inc.
Week 8 - Programming II Today – more features: Loop control
Algorithm design and Analysis
Arithmetic operations, decisions and looping
Introduction to Python
Chapter 8 JavaScript: Control Statements, Part 2
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
L1-6 Notes: Algebra: Variables and Expressions
Vectorized Code, Logical Indexing
C Operators, Operands, Expressions & Statements
CS 100: Roadmap to Computing
String and Lists Dr. José M. Reyes Álamo.
Fourth Year – Software Engineering
Topics Sequences Introduction to Lists List Slicing
Introduction to Matlab
Spreadsheets 2 Explain advanced spreadsheet concepts and functions
Vectors and Matrices Chapter 2 Attaway MATLAB 4E.
Chapter 3 Operators and Expressions
Flow of Control.
The Data Element.
Spreadsheets Objective 6.02
R Course 1st Lecture.
Topics Sequences Introduction to Lists List Slicing
Winter 2019 CISC101 4/28/2019 CISC101 Reminders
Data analysis with R and the tidyverse
Spreadsheets Objective 6.02
The Data Element.
OPERATORS in C Programming
To Start: 15 Points Evaluate: * 6 – 2 3(6 +2) – 2 3{6 +(3 * 4)}
DATA TYPES AND OPERATIONS
CS 100: Roadmap to Computing
Class code for pythonroom.com cchsp2cs
OPERATORS in C Programming
Presentation transcript:

The R Book Chapter 2: Essentials of the R Language Session 4

Questions last week Rounding floating point values > x<-4.654583732637648594387372632 > round(x,digits=5) [1] 4.65458 > round(x,3) [1] 4.655 Practical situation of a list … I have no real example but : Data frames are lists as well, but they have a few restrictions: - you can't use the same name for two different variables - all elements of a data frame are vectors - all elements of a data frame have an equal length. lists are suitable for executing functions to global datasets, without the need of using iterations or loops. -powerful, fast but should be used with care. At our level, we will use dataframes.

Addresses within Vectors, length(), seq() > y<-c(8,3,5,7,6,6,8,9,2,3,9,4,10,4,11) > length(y) [1] 15 > length(y[y>5]) [1] 9 Extract every nth element from a vector with seq() > xv<-rnorm(1000,100,10) random numbers, length 1000 mean 100 standard deviation 10 > xv[seq(25,length(xv),25)] [1] 78.55738 102.43888 98.35262 90.35767 108.45821 105.87016 103.74105 [8] 88.86585 107.17449 107.34596 90.55796 106.00437 119.99036 101.28521 [15] 92.50904 92.89058 105.71306 84.45659 105.17407 100.62878 110.44917 [22] 85.21301 103.60246 85.07047 98.14852 99.96667 78.37900 115.25886 [29] 109.51843 98.88434 83.30707 105.62302 97.21630 90.99227 106.60842 [36] 74.92409 92.88968 90.89482 96.53100 91.82529 every 25th value from, to (try 1000) by

Finding Closest Values find the value of xv that is closest to 108.0: which(abs(xv-108)==min(abs(xv-108))) [1] 813 > xv[813] [1] 107.9925 write a function : closest value to a specified value sv > closest<-function(xv,cb){ + xv[which(abs(xv-cb)==min(abs(xv-cb)))] } > closest(xv,108) logical argument absolute minimum - |abs| - which() tests a logical operation

Finding Closest Values find the value of xv that is closest to 108.0: which(abs(xv-108)==min(abs(xv-108))) [1] 813 > xv[813] [1] 107.9925 write a function : closest value to a specified value sv > which.closest<-function(xv,sv){ + which(abs(xv-sv)==min(abs(xv-sv))) } > which.closest(xv,108) logical argument absolute minimum - |abs| - which() tests a logical operation

Trimming Vectors Using Negative Subscripts > x<- c(5,8,6,7,1,5,3) > x [1] 5 8 6 7 1 5 3 Take out first element > z<-x[-1] > z [1] 8 6 7 1 5 3 Take out smallest and largest value - sort → remove first element : x[-1] → remove last element : x[-length(x)] - concatenation of both: -c(1,length(x)) > sort(x) [1] 1 3 5 5 6 7 8 > sort(x)[-c(1,length(x))] [1] 3 5 5 6 7

Trimming Vectors Using Negative Subscripts > x<- c(5,8,6,7,1,5,3) > x [1] 5 8 6 7 1 5 3 Take out first element > z<-x[-1] > z [1] 8 6 7 1 5 3 Take out smallest and largest value > sort(x)[-c(1,length(x))] [1] 3 5 5 6 7 Calculate mean of trimmed vector > trim.mean <- function(x) mean(sort(x)[-c(1,length(x))]) > trim.mean(x) [1] 5.2

Taking out multiple values omitting all the multiples of seven (7, 14, 21, etc.) > vec<-1:50 > (multiples<-floor(length(vec)/7)) [1] 7 > (subscripts<-7*(1:multiples)) [1] 7 14 21 28 35 42 49 > vec[-subscripts] [1] 1 2 3 4 5 6 8 9 10 11 12 13 15 16 17 18 19 20 22 23 24 25 26 27 29 [26] 30 31 32 33 34 36 37 38 39 40 41 43 44 45 46 47 48 50

Taking out multiple values omitting all the multiples of seven (7, 14, 21, etc.) vec<-1:50 vec[-(1:50*(1:50%%7==0))] or use the modulo > vec[-(1:50)] Integer(0) > 1:50%%7==0 [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE [13] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE [25] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE [37] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE [49] TRUE FALSE > 1:50*(1:50%%7==0) [1] 0 0 0 0 0 0 7 0 0 0 0 0 0 14 0 0 0 0 0 0 21 0 0 0 0 [26] 0 0 28 0 0 0 0 0 0 35 0 0 0 0 0 0 42 0 0 0 0 0 0 49 0 > vec[-(1:50*(1:50%%7==0))] [1] 1 2 3 4 5 6 8 9 10 11 12 13 15 16 17 18 19 20 22 23 24 25 26 27 29 [26] 30 31 32 33 34 36 37 38 39 40 41 43 44 45 46 47 48 50

Logical Arithmetic logical expressions evaluate to either true or false. R can coerce TRUE or FALSE into numerical values: 1 or 0 x<-0:6 x<4 [1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE logical functions « all » and « any » all(x>0) [1] FALSE any(x<0) sum(x<4) [1] 4 (x<4)*runif(7) [1] 0.9433433 0.9382651 0.6248691 0.9786844 0.0000000 0.0000000 0.0000000 random number (0-1)

Logical Arithmetic Useful to generate simplified factor levels (treatment<-letters[1:5]) [1] "a" "b" "c" "d" "e" (t2<-factor(1+(treatment=="b")+2*(treatment=="c")+2*(treatment=="d"))) [1] 1 2 3 3 1 Levels: 1 2 3 x <- y x is assigned the value of y (x gets the values of y); x=y in a function or a list x is set to y unless you specify otherwise; x == y produces TRUE if x is exactly equal to y and FALSE otherwise.

Evaluation of combinations of TRUE and FALSE x <- c(NA, FALSE, TRUE) names(x) <- as.character(x) to assign the values as > outer(x, x, "&") <NA> FALSE TRUE <NA> NA FALSE NA FALSE FALSE FALSE FALSE TRUE NA FALSE TRUE Only TRUE & TRUE evaluates to TRUE. > outer(x, x, "|") <NA> FALSE TRUE <NA> NA NA TRUE FALSE NA FALSE TRUE TRUE TRUE TRUE TRUE Only FALSE & FALSE evaluates to FALSE.

Repeats > rep(9,5) [1] 9 9 9 9 9 > rep(1:4,2) [1] 1 2 3 4 1 2 3 4 > rep(1:4,each=2) [1] 1 1 2 2 3 3 4 4 > rep(1:4,1:4) [1] 1 2 2 3 3 3 4 4 4 4 > rep(1:4,c(4,1,4,2)) [1] 1 1 1 1 2 3 3 3 3 4 4

Generate Factor Levels gl(‘up to’, ‘with repeats of’, ‘to total length’) > gl(4,3) [1] 1 1 1 2 2 2 3 3 3 4 4 4 Levels: 1 2 3 4 > gl(4,3,24) [1] 1 1 1 2 2 2 3 3 3 4 4 4 1 1 1 2 2 2 3 3 3 4 4 4 > gl(4,3,20) [1] 1 1 1 2 2 2 3 3 3 4 4 4 1 1 1 2 2 2 3 3 > gl(4,1,20) [1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 > gl(3,2,24,labels=c("A","B","C")) [1] A A B B C C A A B B C C A A B B C C A A B B C C Levels: A B C

Generating Regular Sequences of Numbers > 10:18 [1] 10 11 12 13 14 15 16 17 18 > 18:10 [1] 18 17 16 15 14 13 12 11 10 > -0.5:8.5 [1] -0.5 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 > seq(0,1.5,0.2) [1] 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 not > 1.5 > seq(1.5,0,-0.2) [1] 1.5 1.3 1.1 0.9 0.7 0.5 0.3 0.1 not < 0 To generate an array of values between min(x) and max(x) x.values<-seq(min(x),max(x),(max(x)-min(x))/100)

Generating Regular Sequences of Numbers To generate an array of values between min(x) and max(x) x.values<-seq(min(x),max(x),(max(x)-min(x))/100) Generate a sequence of values with the same length as vector x > x<-rnorm(18,10,2) > seq(88,50,along=x) [1] 88.00000 85.76471 83.52941 81.29412 79.05882 76.82353 74.58824 72.35294 [9] 70.11765 67.88235 65.64706 63.41176 61.17647 58.94118 56.70588 54.47059 [17] 52.23529 50.00000 sequence(x), to produce a vector consisting of sequences > sequence(5) [1] 1 2 3 4 5 > sequence(5:1) [1] 1 2 3 4 5 1 2 3 4 1 2 3 1 2 1 > sequence(c(2,4,1)) [1] 1 2 1 2 3 4 1

Variable Names • Case-sensitive, x is not X. • Should not begin with numbers (e.g. 1x) or symbols (e.g. %x). • Should not contain blank spaces: back.pay (not back pay).

Sorting, Ranking and Ordering > houses<-read.table("/home/.../houses.txt",header=T) > attach(houses) > ranks<-rank(Price) > sorted<-sort(Price) > ordered<-order(Price) > view<-data.frame(Price,ranks,sorted,ordered) > view Price ranks sorted ordered 1 325 12.0 95 9 2 201 10.0 101 6 3 157 5.0 117 10 4 162 6.0 121 12 5 164 7.0 157 3 6 101 2.0 162 4 7 211 11.0 164 5 8 188 8.5 188 8 9 95 1.0 188 11 10 117 3.0 201 2 11 188 8.5 211 7 12 121 4.0 325 1

Sorting, Ranking and Ordering - Using order with subscripts is a much safer option than using sort ! - Values of the response variable and the explanatory variables could be uncoupled with sort > names(houses) [1] "Location" "Price" > Location[order(Price)] [1] Reading Staines Winkfield Newbury Bracknell Camberley [7] Bagshot Maidenhead Warfield Sunninghill Windsor Ascot > Location[rev(order(Price))] [1] Ascot Windsor Sunninghill Warfield Maidenhead Bagshot [7] Camberley Bracknell Newbury Winkfield Staines Reading