Programming For Big Data Darren Redmond
Programming with R
R is a programming language and an integrated suite of software facilities for data manipulation, calculation and graphical display.
R is widely used for statistical software development and data analysis.
R is based on the S programming language.
Try R: http://tryr.codeschool.com/
Introduction
R runs on a variety of platforms, including Windows, Mac OS and Linux.
It contains advanced statistical routines.
It has a large, coherent, integrated collection of intermediate tools for data analysis.
It is highly extensible through the use of packages.
It has powerful graphics capabilities.
Hello World
A Hello World! program is very simple to implement in R:
    print("Hello World")
R Expressions & Operators
An expression is usually a combination of operators and operands, but it may also be a single literal or variable.
Assignment
In R we assign a value to a variable using one of the assignment operators: <-, = or ->
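The three assignment operators can be sketched as follows. The variable names and values are made up for illustration only.

```r
# Three ways to bind a value to a name.
x <- 10      # leftward assignment, the most common style in R code
y = 20       # equals sign, also valid at the top level
30 -> z      # rightward assignment: the value is on the left

# An expression can combine variables with operators...
total <- x + y + z
print(total)

# ...or be just a single variable or literal.
x
42
```

In practice `<-` is the form most R style guides recommend, since `=` is also used for naming function arguments.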
R Operators
Operator   Description                     Example    Result
+          Addition                        2 + 3      5
-          Subtraction (binary operator)   3 - 2      1
-          Negation (unary operator)       -3         -3
*          Multiplication                  7 * 5      35
/          Division                        5 / 3      1.666667
%%         Modulus                         34 %% 2    0
%/%        Integer division                6 %/% 4    1
^ or **    Exponentiation                  2^3        8
Precedence and Conversions
Operator precedence in R, from highest to lowest:
    ^         exponentiation (right to left)
    - +       unary minus and plus
    %any%     special operators (including %% and %/%)
    * /       multiply, divide
    + -       (binary) add, subtract
Data Type Conversions
Use is.foo to test for data type foo; it returns TRUE or FALSE.
Use as.foo to explicitly convert a value to type foo.
Test functions: is.numeric(), is.character(), is.vector(), etc.
Conversion functions: as.numeric(), as.character(), as.vector(), etc.
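A short sketch of how the precedence rules and the is./as. functions behave in practice. The variable names and the sample string "42" are assumptions chosen for illustration.

```r
# Exponentiation binds tighter than unary minus, so this is -(2^2).
a <- -2^2            # -4
b <- (-2)^2          # 4, parentheses override precedence

# ^ groups right to left: 2^(3^2) = 2^9
p <- 2^3^2           # 512

# %% binds tighter than *, so this is 2 * (7 %% 4)
m <- 2 * 7 %% 4      # 6

# Type tests and explicit conversion
value <- "42"
is.character(value)          # TRUE
is.numeric(value)            # FALSE
number <- as.numeric(value)  # convert the string to a number
is.numeric(number)           # TRUE
text <- as.character(number + 1)
print(text)                  # "43"
```

When in doubt about precedence, adding parentheses costs nothing and makes the intent explicit.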
Statements
Sequence Statements
Instructions are given in a sequence.
They have a beginning and an end point.
In-class example: convert from Celsius to Fahrenheit
    celsiusInput <- readline("Please enter a Celsius value: ")
    celsiusNumeric <- as.numeric(celsiusInput)
    fahrResults <- ((9/5) * celsiusNumeric) + 32
    fahrString <- sprintf("%.2f", fahrResults)
    message <- paste("The result is", fahrString, sep = " ")
    print(message)
The simple if statement
R syntax: if (condition) expr
Program control: if the condition is true, expr is executed; otherwise expr is skipped.
Python vs. R example
The if-else statement
R syntax: if (condition) expr1 else expr2
Program control: if the condition is true, expr1 is executed; otherwise expr2 is executed.
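A minimal sketch of both forms in R. The variable `temperature` and its value are hypothetical, chosen only to make the branches visible.

```r
temperature <- 25   # hypothetical input value

# Simple if: the body runs only when the condition is TRUE.
label <- "not warm"
if (temperature > 20) {
  label <- "warm"
}
print(label)

# if-else: exactly one of the two branches runs.
if (temperature %% 2 == 0) {
  parity <- "even"
} else {
  parity <- "odd"
}
print(parity)
```

Note that in R the `else` keyword must follow the closing brace of the if block on the same line when the statement is typed at the top level.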
Repetition Statements
A repetition statement tells the program to execute a block of statements repeatedly.
The while statement
R syntax: while (condition) expr
Program control: while the loop-continuation condition is true, expr is executed.
The for statement
A counter-controlled loop: you know how many times the loop will execute.
    for (count in c(1, 2, 3)) {
      print(count)
    }
Program control: expr is repeated for each item in the sequence.
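The two loop forms can be compared in a small sketch. The task (summing 1 to 5 and collecting squares) and all variable names are assumptions for illustration.

```r
# while: repeat as long as the condition holds.
count <- 1
total <- 0
while (count <= 5) {
  total <- total + count   # accumulate the running sum
  count <- count + 1       # update the counter so the loop terminates
}
print(total)               # 15

# for: iterate once per item in a sequence.
squares <- c()
for (i in 1:3) {
  squares <- c(squares, i^2)   # append the square of each item
}
print(squares)             # 1 4 9
```

The while loop needs an explicit counter update, whereas the for loop advances through the sequence on its own.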
Datatypes in R
Atomic types: numeric, integer, complex, logical, character
Vectors: a sequence of data elements of the same basic type; mutable (i.e., their values can change once created)
List: a generic vector containing other objects
Factor: a vector augmented with information about the possible categories, called the levels of the factor
data.frame: a table-like structure; experimental results are often collected in this form
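Each of these types can be created in a line or two. The following is a sketch with made-up names and values, not data from the course.

```r
# Atomic types, checked with class()
class(3.14)      # "numeric"
class(2L)        # "integer" (the L suffix marks an integer literal)
class(1 + 2i)    # "complex"
class(TRUE)      # "logical"
class("text")    # "character"

# Vector: elements share one basic type and can be reassigned.
scores <- c(70, 85, 90)
scores[2] <- 88

# List: a generic vector that can hold objects of different types.
record <- list(name = "Ann", scores = scores, passed = TRUE)

# Factor: a vector plus the set of allowed categories (levels).
grade <- factor(c("low", "high", "low"), levels = c("low", "high"))
levels(grade)

# data.frame: a table with named columns of equal length.
results <- data.frame(city = c("Dublin", "Cork"), temp = c(12.5, 13.1))
nrow(results)
```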
Functions
A function is a sequence of reusable statements that performs a desired operation.
It contains a header and a body.
R syntax without a return value:
    functionName <- function(list of parameters) {
      functionStatementBlock
    }
R syntax with a return value:
    functionName <- function(list of parameters) {
      functionStatementBlock
      return(Expression)
    }
Function Implementation
    factorial <- function(n) {
      if (n == 0) {
        return(1)
      } else {
        return(n * factorial(n - 1))
      }
    }

    average <- function(number1, number2) {
      return((number1 + number2) / 2)
    }
http://www.johnmyleswhite.com/notebook/2010/08/17/unit-testing-in-r-the-bare-minimum/
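One lightweight way to check functions like these is base R's stopifnot(), sketched below. This is an assumption about how one might test them and is not necessarily the approach described at the linked page. The functions are redefined here so the block runs on its own.

```r
# Recursive factorial: n! = n * (n - 1)!, with 0! = 1 as the base case.
factorial <- function(n) {
  if (n == 0) {
    return(1)
  }
  return(n * factorial(n - 1))
}

# Mean of two numbers.
average <- function(number1, number2) {
  return((number1 + number2) / 2)
}

# stopifnot() raises an error if any condition is not TRUE,
# and stays silent when every check passes.
stopifnot(factorial(0) == 1)
stopifnot(factorial(1) == 1)
stopifnot(factorial(5) == 120)
stopifnot(average(5, 6) == 5.5)
print("all checks passed")
```

Defining a function named `factorial` masks the built-in one for the rest of the session, which is fine for a classroom exercise but worth knowing.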
Function Calling
    > average(5, 6)
    [1] 5.5
    > factorial(5)
    [1] 120
Local vs. global variables
Variables declared inside a function are local variables and cannot be used outside the function.
Global variables can be accessed anywhere in the program, including inside functions.
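The difference between local and global variables can be seen in a small sketch. The names `rate`, `scaleValue` and `doubled` are hypothetical and exist only for this illustration.

```r
rate <- 2   # global variable, defined at the top level

scaleValue <- function(value) {
  # 'doubled' is local: it exists only while the function runs.
  # 'rate' is not defined here, so R looks it up in the global environment.
  doubled <- value * rate
  return(doubled)
}

result <- scaleValue(5)
print(result)

# The local variable is not visible outside the function.
localVisible <- exists("doubled")
print(localVisible)
```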
File Input / Output
File open
    myfile = file("cities.txt", open = "w")
    myfile
File close
    close(myfile)
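A complete open, write, close, reopen and read cycle might look like the sketch below. It writes to a temporary file rather than cities.txt so it leaves nothing behind, and the city names are made up for illustration.

```r
# Use a temporary path so the example does not touch the working directory.
path <- tempfile(fileext = ".txt")

# Open for writing in text mode, write three lines, then close.
myfile <- file(path, open = "w")
writeLines(c("Dublin", "Cork", "Galway"), myfile)
close(myfile)

# Reopen for reading, read all lines back, then close.
myfile <- file(path, open = "r")
cities <- readLines(myfile)
close(myfile)

print(cities)

# Remove the temporary file.
unlink(path)
```

Closing the connection after writing matters: until close() is called, buffered output may not have reached the file.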
File Modes
Open Mode       Description
"r" or "rt"     Opens a file for reading in text mode
"w" or "wt"     Opens a file for writing in text mode
"a" or "at"     Opens a file for appending data at the end of the file
"rb"            Opens a file for reading binary data
"wb"            Opens a file for writing binary data
"r+"            Opens a file for reading and writing in text mode
Summary
Introduction
Expressions
Statements
Data types & Structures
Functions
File Input / Output
22/04/2019