Basics of R INSTRUCTOR: AMANDA MCGOUGH TUESDAY, MARCH 29, 2016.


About LISA
o LISA is the source for statistical collaboration and consulting services at VT, free of charge to all students and faculty currently
o LISA provides three types of services: Collaboration, Walk-in Consulting, and Short Courses
o Collaboration: In-depth statistical advice from LISA collaborators, which includes meetings about your specific research questions. It is best to meet with LISA before collecting your data. Request a meeting here:
o Walk-in Consulting: Answers to short research questions. Schedule available here:
o Short Courses: Workshops on a variety of topics. Schedule available here:

Outline
o What is R?
o Why use R?
o Download R and RStudio
o Basic Commands in R
o Getting Started using R Scripts
o Prices Data Set
o Variables in R
o Exploratory Data Analysis
o Plotting your Data
o Practice on your own

What is R?
o R is a well-developed, simple, and effective programming language that is free. Many scientists, statisticians, analysts, students, and others use R for statistical analysis and data visualization.
o Data analysis is done in R by writing code that calls built-in functions of the R language. The R environment is equipped with common methods as well as recent cutting-edge techniques.

Why use R?
o R is FREE and open to use.
o R provides a wide variety of statistical and graphical techniques.
o R is an easy programming language to learn. It is more than just point and click.
o If you have a question about R, GOOGLE it. There are plenty of resources online that can help answer your question.

How to Download R for your computer?
o Download R and RStudio
o R can run on Unix, Windows, or Mac OS X operating systems.
o The R software can be downloaded from CRAN here:
o Once you have clicked on the version for your operating system, the download window should appear.
o For Windows users, click on Install R for the first time and then download R for Windows.
o For Mac users, click on the package that fits your operating system.

How to Download RStudio for your computer?
o After installing the R software, download RStudio, which provides an easy-to-use Graphical User Interface (GUI).
o Download RStudio here:
o Download RStudio Desktop.
o Install RStudio.

What does RStudio look like?

Using R as a Calculator
o The simplest thing that R can do is calculate basic arithmetic expressions.
o In the console, type any arithmetic expression and then hit the Enter key.
o Try typing in several arithmetic expressions yourself!
    3*(2 + 2)
    3^2
    sqrt(2)
o What can go wrong here?
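As a rough sketch of the calculator examples, along with two ways they can go wrong: the failure cases below (a negative square root and a mistyped function name) are illustrative assumptions, not taken from the slides.

```r
# Basic arithmetic typed at the console
3 * (2 + 2)   # multiplication with parentheses
3^2           # exponentiation
sqrt(2)       # square root

# Illustrative problem 1: the square root of a negative number is NaN
# (R also prints a warning, suppressed here to keep the output clean)
result <- suppressWarnings(sqrt(-1))
is.nan(result)

# Illustrative problem 2: R is case sensitive, so Sqrt() is not sqrt().
# tryCatch() captures the error message instead of stopping the script.
err <- tryCatch(Sqrt(2), error = function(e) conditionMessage(e))
err
```

Another common surprise is an unfinished expression such as `3*(2 + 2`: the console shows a `+` prompt and waits for the rest of the input. Press Esc to cancel.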

R Script Files
o For your project, you may want to save the R commands that you use in your analysis.
o You can get a new script by doing: File -> New File -> R Script
o As you are working on a script, you should add comments to it in order to remind yourself what the script is doing. A comment is added in a script by using the # sign. Comments are written to the right of the # sign and should show up in green.
    x <- sqrt(3) # x is the square root of 3
o Save your script file by clicking on the floppy disk or by using: File -> Save

Running your Script
Once you have added text to your script, you can run it in several ways:
1. One line at a time
2. Sections at a time (by selecting what you want)
3. The whole thing
You can either click the Run button in RStudio, or use the keyboard shortcuts below:
PC: Ctrl+Enter in RStudio (Ctrl+R in the basic R GUI for Windows)
Mac: Command+Enter

Creating Variables
o You can store values in variables by assigning a value to a name, using either the = or <- operator.
    > x = 5
    > x <- 2
    > y = x + 1
    > w = abs(-5)
o Storing variables is very helpful when you have a lot of code and you refer back to the same values throughout your analysis. These stored variables will show up in the environment.
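A minimal sketch of the assignments above, written with the `<-` operator throughout. The call to `ls()` is an addition for illustration; it lists the objects currently stored in the environment.

```r
x <- 5          # assign 5 to x
x <- 2          # assigning again overwrites the old value
y <- x + 1      # y uses the current value of x
w <- abs(-5)    # the result of a function call can be stored too

x
y
w
ls()            # names of the objects stored in the environment
```

Note that `x` ends up as 2, not 5, because the second assignment replaces the first.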

Packages in R
o Packages are collections of R functions, data, and documentation. The place where installed packages are stored is called a library.
o library() With no arguments, this command lists all of the packages installed on your computer.
o sessionInfo() This command lists the packages that are in use during your session.
o You can also look in the panel on the bottom right and see which packages are loaded by seeing which squares are checked. If you need to load a different package, click on the square next to its name.

Naming your objects
o We have named several basic objects earlier.
o How should you name your objects? Here are some tips to consider:
1. Only use upper and lower case letters, numbers, underscores (_), and periods (.).
2. Begin with either an upper or lower case letter or a period.
3. R is case sensitive.
4. Do not use one of R's reserved words. The command help(reserved) will give you a list of these words.
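The naming tips can be checked with a short sketch. The object names and example strings below are hypothetical; `make.names()` is a base R function that converts arbitrary strings into syntactically valid names.

```r
# R is case sensitive: these are two different objects
total <- 10
Total <- 20
total == Total   # FALSE

# make.names() repairs strings that break the naming rules:
# a space becomes a period, and a leading digit gets an "X" prefix
fixed <- make.names(c("my value", "2nd_try", "ok_name.1"))
fixed
```

The third string already follows the rules, so it comes back unchanged.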

Vectors
o A vector is an object that holds several data values of the same type, which are arranged in a particular order. To create vectors, we use the c() function.
o Suppose we have data on whale beaching per year in Texas starting in 1990.
o Create a new object called whales written in vector form:
    whales <- c(74, 122, 235, 111, 292, 111, 211, 133, 156, 79)

Vectors cont’d
o Different commands can be used with vectors, such as the following:
    whales + 1
    whales^2
    mean(whales)
    var(whales)
    exp(whales)
    length(whales)
    whales[3]
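A short sketch of these commands applied to the whales vector. The last line, which selects values with a logical condition, is an extra illustration that is not in the slides.

```r
whales <- c(74, 122, 235, 111, 292, 111, 211, 133, 156, 79)

whales + 1        # adds 1 to every element
whales^2          # squares every element
mean(whales)      # 152.4
length(whales)    # 10
whales[3]         # third element: 235

# Extra illustration: keep only the years with more than 200 beachings
whales[whales > 200]   # 235 292 211
```

Arithmetic on a vector is applied element by element, while functions such as `mean()` and `length()` summarize the whole vector in one value.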

Vectors cont’d
o You can also combine data vectors into one vector. Suppose we have:
    temp1 <- c(3, 3.76, -0.35)
    temp2 <- c(1, 2.5, -5)
    temp <- c(temp1, temp2)
o All values in a vector must be of the same type, so the vectors you combine should be of the same type. If they are not, R converts every value to a common type (for example, numbers become character strings).
o So far, all of the vectors that we have created are numeric. One example of a character vector is:
    pets <- c('dog', 'cat', 'parrot', 'snake')
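The following sketch shows what happens when vectors of different types are combined. The `mixed` object is hypothetical and added only to illustrate how `c()` converts values to a common type.

```r
temp1 <- c(3, 3.76, -0.35)
temp2 <- c(1, 2.5, -5)
temp  <- c(temp1, temp2)
class(temp)     # "numeric"

pets <- c('dog', 'cat', 'parrot', 'snake')
class(pets)     # "character"

# Combining numeric and character values does not fail;
# the numbers are silently converted to character strings
mixed <- c(temp1, pets)
class(mixed)    # "character"
mixed[1]        # "3"
```

Because the numbers are now strings, arithmetic such as `mixed[1] + 1` would produce an error, which is why keeping one type per vector matters.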

Sequences and Repeating Values
o The : operator creates a sequence that increments/decrements by 1:
    1:5
    5:1
o The seq() command creates a sequence of values by specifying either the length or the step size:
    seq(0,5,length=15)
    seq(1,10,by=2)
o The rep() command repeats values or sets of values:
    rep(1,5)
    rep(c(1,2,3),5)
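A brief sketch of sequences and repetition. The variable names are hypothetical, and the `each` argument of `rep()` is an extra illustration beyond the slides.

```r
up   <- 1:5                          # 1 2 3 4 5
down <- 5:1                          # 5 4 3 2 1

# Caution: seq() given a single vector counts along it,
# so seq(5:1) returns 1 2 3 4 5 rather than a descending sequence
seq(5:1)

odds <- seq(1, 10, by = 2)           # 1 3 5 7 9
grid <- seq(0, 5, length.out = 15)   # 15 evenly spaced values from 0 to 5

rep(1, 5)                            # 1 1 1 1 1
rep(c(1, 2, 3), 5)                   # the whole set repeated 5 times
rep(c(1, 2, 3), each = 2)            # 1 1 2 2 3 3
```

`length=15` works as shorthand because R matches it to the full argument name `length.out`.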

Data Frames
o A data frame stores a data table as a list of vectors of equal length, which are displayed vertically as columns and arranged side by side.
o All of the values in the same column must be of the same type, but each column can hold a different type of data (e.g. pets, temperature, age, gender).
o This helps us to store data sets with each column representing a variable and each row representing an observation.
o First, we will work with some data sets available in R, and later you can use your own data set!
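As a small sketch of this structure, the data frame below is built by hand. The `animals` object and its values are made up for illustration.

```r
# Each argument becomes one column; all columns have the same length
animals <- data.frame(
  pet    = c("dog", "cat", "parrot"),   # character column
  age    = c(3, 7, 12),                 # numeric column
  indoor = c(TRUE, TRUE, FALSE)         # logical column
)

animals          # prints the table: one row per observation
str(animals)     # shows the type of each column
nrow(animals)    # number of observations
ncol(animals)    # number of variables
```

Each column keeps its own type, which is what distinguishes a data frame from a single vector.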

Importing .csv Files
o In a CSV file, the data values are arranged with one observation per line.
o Data values are separated by commas within each line.
o You can import a CSV file using:
    read.csv('folder/filename.csv')
o We will name our data prices, so we have:
    prices <- read.csv('/Users/amandamcgough/Desktop/prices.csv')
o BE AWARE: Make sure that your slashes are all forward slashes, as shown in the examples above. When copying the location over, the computer may use backward slashes instead of forward slashes.
o Another option to import your data set is under Tools in RStudio.
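To try the import step without the workshop file, this sketch writes a tiny CSV to a temporary location and reads it back. The rows and the `demo` and `csv_path` names are made up for illustration; only the column names follow the prices data set.

```r
# Create a small CSV file in a temporary location (values are made up)
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("PRICE,SQFT,AGE,NE",
    "2050,2650,13,1",
    "2080,2600,NA,1",
    "2150,2664,6,1"),
  csv_path
)

# read.csv() uses the first line as the column names by default
demo <- read.csv(csv_path)
demo

# On Windows, write paths with forward slashes, for example:
# read.csv("C:/Users/yourname/Desktop/prices.csv")
```

The missing AGE value in the second row is read in as `NA`.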

Prices Data Set (prices.csv)
o The prices data set is a random sample of records of resales of homes from Feb 15 to Apr 30 in 1993 from the files maintained by the Albuquerque Board of Realtors. This type of data is collected by multiple listing agencies in many cities and is used by realtors as an information base.
o Number of cases: 65
o Variable names:
    - PRICE = Selling price in hundreds of dollars
    - SQFT = Square feet of living space
    - AGE = Age of home in years
    - NE = Located in northeast sector of city (1) or not (0)

Investigating the Prices Data Set
o Once the data set is stored in our environment, we can quickly view the data by clicking on prices over in the environment.
o Each row corresponds to a particular house.
o Each column represents a variable that was measured (price, sqft, age, ne).
o Using bracket notation [row,column] we can find particular pieces of our data:
    prices[1,1]
    prices[10,]
    prices[,2]
o Also, you can use dollar sign notation. This only works for the variable names.
    prices$SQFT

More Investigating
o You can use the minus sign to exclude part of the data set:
    prices[-1]             # This excludes the first column
    prices[-1,]            # This excludes the first row
o You can use sequences and vectors like before in the bracket notation:
    prices[1:5,]           # This returns the first five rows or observations (houses)
    prices[c(1,2,4,8),]    # This returns rows 1, 2, 4, and 8
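The bracket, dollar sign, and minus sign notation can be tried on a small stand-in table. The `prices_demo` data frame and its values are hypothetical; only the column names match the prices data set.

```r
# A four-row stand-in for the prices data (values are made up)
prices_demo <- data.frame(
  PRICE = c(2050, 2080, 2150, 1999),
  SQFT  = c(2650, 2600, 2664, 2921),
  AGE   = c(13, 9, 6, 3),
  NE    = c(1, 1, 1, 0)
)

prices_demo[1, 1]         # row 1, column 1: 2050
prices_demo[2, ]          # all of row 2
prices_demo[, 2]          # all of column 2, returned as a vector
prices_demo$SQFT          # the same column, selected by name

prices_demo[-1]           # every column except the first
prices_demo[-1, ]         # every row except the first
prices_demo[c(1, 3), ]    # rows 1 and 3
```

The comma matters: `prices_demo[-1]` drops a column, while `prices_demo[-1, ]` drops a row.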

Other Commands for Data Sets
o The function names(prices) gives the names of the variables inside the data set.
o The function head(prices) gives the first six observations of the data set.
o The function tail(prices) gives the last six observations of the data set.
o The function dim(prices) gives the number of rows first (how many total observations) and the number of columns second (how many variables).
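These functions can be tried on any data frame. The sketch below uses `mtcars`, a data set that ships with R, as a stand-in for the prices data; it is not part of the workshop material.

```r
names(mtcars)      # variable names
head(mtcars)       # first six observations
tail(mtcars, 3)    # last three observations (the default is six)
dim(mtcars)        # 32 rows (observations) and 11 columns (variables)
```

Both `head()` and `tail()` accept a second argument for the number of rows to show.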

Variable Classifications
o Every variable in a dataset has a class. The class describes the type of data the variable contains.
o To determine the class of a variable use:
    class(dataset$variable)
    class(prices$SQFT)
o To check all of the classes at the same time use:
    sapply(dataset, class)
    sapply(prices, class)
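A quick sketch of both commands, using the built-in `iris` data set as a stand-in because it mixes numeric and factor columns. It is not part of the workshop material.

```r
class(iris$Species)        # "factor"
class(iris$Sepal.Length)   # "numeric"

# sapply() applies class() to every column and returns a named vector
classes <- sapply(iris, class)
classes
```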

Types of Variables
o There are five different types of variables that can be found in a data frame:
1) numeric: contains real numbers; can be positive or negative; with or without decimals; missing values are represented as NA
2) integer: can be positive or negative; NO decimals; if a fractional part is included then an integer variable is automatically converted to a numeric variable
3) factor: used for categorical data; values can either be character strings or numbers (representing categories)
4) Date and POSIXlt: contain dates in a special format
5) character: contains character strings; suitable for any data that does not belong in one of the other types of variables above

Changing the Type of a Variable
o Change to factor:
    dataset$variable <- as.factor(dataset$variable)
    prices$NE <- as.factor(prices$NE)
o Change to numeric:
    dataset$variable <- as.numeric(dataset$variable)
    prices$SQFT <- as.numeric(prices$SQFT)
o Change to character:
    dataset$variable <- as.character(dataset$variable)
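One caveat when converting types, sketched below with made-up values: calling `as.numeric()` directly on a factor returns the internal level codes rather than the original numbers. Converting to character first recovers the values. The `ne` and `sizes` objects are hypothetical.

```r
# Converting a 0/1 indicator to a factor
ne <- as.factor(c(1, 0, 1, 1))
levels(ne)                         # "0" "1"

# A factor whose labels look like numbers
sizes <- factor(c("10", "20", "10"))
as.numeric(sizes)                  # 1 2 1  (level codes, not the values)
as.numeric(as.character(sizes))    # 10 20 10 (the values themselves)
```

Checking `class()` before and after a conversion helps catch this kind of surprise early.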

Exploratory Data Analysis
o You can produce a summary for all of the variables in a dataset, or calculate summary statistics one at a time.
    summary(prices)
    mean(prices$SQFT)
o Note: To save typing, you can first store the column under its own name:
    SQFT <- prices$SQFT
    mean(SQFT)
    median(SQFT)
    sd(SQFT)
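A minimal sketch of these summary functions on a made-up vector. The `sqft_demo` values are hypothetical, and the missing value is included to show the `na.rm` argument, which the slides do not cover.

```r
# Hypothetical square-footage values, one of them missing
sqft_demo <- c(1000, 1500, 2000, NA)

mean(sqft_demo)                    # NA, because one value is missing
mean(sqft_demo, na.rm = TRUE)      # 1500
median(sqft_demo, na.rm = TRUE)    # 1500
sd(sqft_demo, na.rm = TRUE)        # 500
summary(sqft_demo)                 # min, quartiles, mean, max, and NA count
```

If a statistic comes back as `NA`, checking the column for missing values is a good first step.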

Plotting your Data
o The first thing that you should do when conducting a statistical analysis is PLOT YOUR DATA.
o Plots help you display your data and results in a way that others can understand, and they allow you to spot features of the data such as outliers and the shape of the distribution.

The Simplest Plot
o For the most basic plot of a continuous variable against the observation number, use:
    plot(variable, type="p")
    plot(SQFT)
o There are many different options that you can modify for your plot. To change the titles on your plot, the axis labels, and more, go to Help on the right and type in plot. You can also type ?plot into the console.

Other Plots
o A histogram helps to show the distribution of a continuous variable. The data is divided into equal-length intervals, and the number of observations that fall into each interval is counted.
    hist(variable, breaks=10)
    hist(SQFT, breaks=10)
o A scatter plot helps to show the relationship between two continuous variables. The order matters, as the first variable gets displayed on the y-axis and the second variable gets displayed on the x-axis.
    plot(variable1~variable2, dataset)
    plot(PRICE~SQFT, prices, pch=8)
o To connect the dots in a scatter plot, you can use type='l' in the command.

Other Plots
o A boxplot can be used to display the summary statistics of a dataset.
    boxplot(variable)
    boxplot(SQFT)
o Remember that you can add a title and axis labels to any plot. There is a variety of different options in order to make your plot look nice for a paper or presentation.
o R makes it easy to export plots by clicking on the Export tab above your plot. Here you can save your image and add it to your paper or presentation!
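The three plot types can be sketched together as follows. The `sqft_demo` and `price_demo` vectors are made-up values standing in for the prices data, and the titles and axis labels are illustrative choices.

```r
# Hypothetical stand-ins for SQFT and PRICE
sqft_demo  <- c(1200, 1500, 1650, 1800, 2100, 2400, 2650, 3000)
price_demo <- c(900, 1050, 1100, 1250, 1400, 1600, 1750, 2100)

# Histogram: hist() also returns the interval counts it plotted
h <- hist(sqft_demo, breaks = 10,
          main = "Living space", xlab = "Square feet")
sum(h$counts)    # every observation falls in exactly one interval

# Scatter plot: y ~ x, so price is on the y-axis and square feet on the x-axis
plot(price_demo ~ sqft_demo, pch = 8,
     main = "Price vs. living space",
     xlab = "Square feet", ylab = "Price (hundreds of dollars)")

# Boxplot: boxplot() returns the five summary statistics it draws
b <- boxplot(sqft_demo, main = "Living space", ylab = "Square feet")
b$stats[3, 1]    # the median line of the box
```

The `main`, `xlab`, and `ylab` arguments set the title and axis labels and work the same way across these plotting functions.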

Practice – National Longitudinal Mortality Survey
1. Read the data file NLMS.csv into R. It may take a few minutes to load.
2. Delete column 14 of the data set and store this revised data set. Note: This will take a while to load again, so name it before you run it.
3. Determine the class of variables in the data set.
4. Obtain summary statistics for the variable povpct.
5. Create a histogram and boxplot for the variable povpct.

Questions
o If you have any basic R questions, first try to GOOGLE it.
o If you have no luck with that, then shoot me an email at:
o Make sure you signed in on the sign-in sheet and complete the survey that will be sent to you by email.
THANK YOU!!!