Getting your data into R


Getting your data into R
Yesterday we covered how to:
* Start RStudio and set up a project
* Prepare and run R script files
* Describe what an R package is, and install and load one
* Load a simple data file
* Explain how (and why) R uses data frames and vectors
* Prepare simple tables
* Produce good-quality graphs and save them
* Locate and perform some statistical tests on your data
* Prepare a simple function

Getting your data into R
Download -> CSV -> data.frame -> data cleaning
Later: EDA
At the end of this session, you will be able to:
* Upload, download, create or import data into R
* Manipulate large datasets as tables in R
* Explore datasets using multiple approaches
* Test for missing, partial or inconsistent data
* Summarise and compare datasets with R

CSV data
* First line = names of variables, separated by commas
* Variables = proper numbers or plain text, with no spaces, funny characters or punctuation
* Data = a mix of numbers and grouping variables; a time sequence is OK, dates and times are not
* Missing data = represented by the two letters NA and nothing else: no dashes, no 999, 77, 88 or any other code
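
A minimal sketch of reading a file that follows these rules, using base R's read.csv(). The file contents, the column names and the stray 999 code are made up for illustration, and the text= argument stands in for a file path so that the example runs on its own.

```r
# Hypothetical CSV content: header row, plain variable names, NA for missing values
csv_text <- "id,sex,weight,height
1,m,77,182
2,f,NA,161
3,f,53,999"

# text = is used here instead of a file path so the example is self-contained
dat <- read.csv(text = csv_text, na.strings = "NA", stringsAsFactors = FALSE)

str(dat)              # check that the numeric columns were read as numbers
colSums(is.na(dat))   # count missing values per column

# A stray code such as 999 is read as a real number, so it has to be recoded by hand
dat$height[dat$height == 999] <- NA
colSums(is.na(dat))
```

With a real file, the same call would take the file name as its first argument in place of text=.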


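Data input and output
R comes with several pre-packaged datasets.
You can access these datasets with the data() function, e.g. Davis:
Davis (1990), PMID 2241138, "Body image and weight preoccupation: A comparison between exercising and non-exercising women"
# Davis is supplied by the carData package, so load that first: library(carData)
View(Davis)      # open the data in a spreadsheet-style viewer
head(Davis)      # first six rows
str(Davis)       # structure: variable names and types
glimpse(Davis)   # compact overview (glimpse() comes from dplyr)
summary(Davis)   # summary statistics for each variable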
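
The same inspection steps can be tried on a dataset that ships with base R, so no extra package is needed. This is only a sketch: the choice of women (a small height and weight table in the built-in datasets package) is an assumption made for illustration and is not part of the slides.

```r
data(women)       # load a dataset from the built-in 'datasets' package
head(women)       # first six rows
str(women)        # variable names and types
summary(women)    # summary statistics for each variable
```

View() is left out here because it opens an interactive window.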

Data input and output
Davis
table(Davis$weight)
table(Davis$height)
data.frame(Davis$height, Davis$weight)          # create a data frame from heights and weights only
data.frame(Davis$height, Davis$weight)[1:5, ]   # look at the data frame for heights and weights, rows 1-5

dplyr -> Data cleaning + EDA => manipulating data:
# each %>% must end a line, not start one, or R stops reading the pipeline early
Davis %>%
  filter(height < quantile(height, 0.5)) %>%   # keep the rows below the median height
  arrange(desc(height)) %>%                     # sort those rows, tallest first
  select(sex, weight, repwt) %>%                # select the columns we want
  mutate(weight_diff = weight - repwt) %>%      # create new variables, e.g. "weight_diff"
  group_by(sex) %>%
  summarise(mean = mean(weight))                # summarise details of interest
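
To see what each step of the pipeline does, it can help to write the same steps out in base R. The sketch below is not dplyr and does not use Davis: toy is a made-up data frame that only borrows the Davis column names, so every value in it is hypothetical.

```r
# Hypothetical toy data with the same column names as Davis
toy <- data.frame(
  sex    = c("F", "M", "F", "M", "F", "M"),
  weight = c(55, 80, 60, 90, 50, 70),
  height = c(160, 162, 165, 185, 155, 175),
  repwt  = c(54, 82, NA, 88, 51, 70)
)

# filter(): keep the rows below the median height
shorter <- toy[toy$height < quantile(toy$height, 0.5), ]

# arrange(desc(height)): sort those rows, tallest first
shorter <- shorter[order(-shorter$height), ]

# select(): keep only the columns we want
shorter <- shorter[, c("sex", "weight", "repwt")]

# mutate(): add a new variable
shorter$weight_diff <- shorter$weight - shorter$repwt

# group_by() + summarise(): mean weight for each sex
by_sex <- aggregate(weight ~ sex, data = shorter, FUN = mean)
by_sex
```

Each intermediate result is kept in shorter, so it can be printed after any step to check what changed.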

dplyr -> Data cleaning + EDA => manipulating data:
# plot Davis heights vs weights using ggplot2 and assign the plot to x
x <- ggplot(Davis, aes(x = weight, y = height, colour = sex)) +
  geom_jitter() +
  geom_line() +
  stat_smooth(span = 0.5) +   # higher spans = smoother
  ggtitle('heights v weights') +
  xlab('weight') +            # axis labels match the aesthetics: weight on x, height on y
  ylab('height')
# save it to a file (this way avoids errors)
ggsave(filename = 'heights-weights-plot1.png', plot = x, dpi = 1200)

dplyr -> Data cleaning + EDA => manipulating data:
# plot Davis heights vs weights using ggplot2 and assign the plot to x2
x2 <- ggplot(Davis, aes(x = weight, y = height, colour = sex)) +
  geom_boxplot() +
  coord_flip() +
  facet_wrap(~sex) +
  ggtitle('heights v weights')
# save it to a file (NB not all plot types are sensible)
ggsave(filename = 'heights-weights-plot2.png', plot = x2, dpi = 1200)

dplyr -> Data cleaning + EDA => manipulating data:
# create a function to scale heights to nautical miles
scale_h <- function(height) {
  return(height / 185200)   # Davis heights are in cm; 1 nautical mile = 1852 m = 185,200 cm
}
Davis$height_nm <- scale_h(Davis$height)   # assign a new variable with heights in nautical miles
Davis$height_nm[1:5]                        # always check the output
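
One way to check the output, sketched here, is to feed the function values whose answer is known in advance. The assumption is that heights are recorded in centimetres, as they are in Davis, so 185,200 cm should come back as exactly 1 nautical mile. The test values are chosen for illustration.

```r
# Same function as on the slide, repeated so this example runs on its own
scale_h <- function(height) {
  return(height / 185200)   # 1 nautical mile = 1852 m = 185,200 cm
}

known <- scale_h(c(185200, 92600))   # inputs chosen so the answers are easy to verify
known

# stopifnot() raises an error if a check fails, and is silent otherwise
stopifnot(isTRUE(all.equal(known, c(1, 0.5))))
```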