R: Packages & Data

Presented here are a number of ways to accomplish a task; some are redundant or may not be the best way to accomplish it. However, some “quick & dirty” commands are useful to know for when all the “better” options aren’t working.

R Packages
What is an R package?
–A bundle of functions (often with data and documentation) packaged together
Once installed, a copy of the package lives on the computer and doesn’t need to be reinstalled
Updating R
–Packages must be reinstalled
–Packages that aren’t kept updated may be lost
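To see where installed packages live and which version you have, base R offers a few one-liners. This is a minimal sketch that uses the stats package, which ships with R. The update.packages() call is left commented out because it needs an internet connection.

```r
.libPaths()               # folder(s) where installed packages are stored
packageVersion("stats")   # version of an installed package
R.version.string          # the R version currently running
# After upgrading R, this rebuilds packages that were built under the old version
# (needs an internet connection, so it is left commented out here):
# update.packages(checkBuilt = TRUE, ask = FALSE)
```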

Packages -> Install Package

Choose a Mirror Site

Choose Package

Loading Package/Contents
To load a package
–library(packageName)
Contents of a package
–library(help = packageName)
For additional documentation
–On the CRAN website: Packages -> package name -> Downloads: Reference manual
Note: Some packages mask (hide) functions of the same name in another package. When this happens, R prints a message listing the masked objects.
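Below is a short sketch of loading a package and of masking. It assumes only packages that ship with R (tools and stats). The `median` defined in the workspace is a deliberately hypothetical stand-in that hides stats::median() until it is removed.

```r
library(tools)                  # tools ships with R, so no download is needed
head(ls("package:tools"))       # a few of the functions the package provides
# library(help = "tools")       # full package index (opens in a viewer)
search()                        # attached packages, in the order R searches them

# Masking: objects found earlier on the search path hide later ones.
# The workspace comes first, so this hypothetical median() hides stats::median().
median <- function(x) "masked!"
median(c(1, 5, 3))              # "masked!"
stats::median(c(1, 5, 3))       # 3: the :: operator picks the package version
rm(median)                      # remove the mask; median() is stats::median() again
```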

Advanced: Loading Packages
To find out what packages are already installed on a computer
–installed.packages()
To check whether a given package is installed
–is.installed <- function(mypkg) is.element(mypkg, installed.packages()[,1])
To install a package without clicking through windows
–install.packages("packageName")
These last two commands are particularly helpful when writing functions for other users
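The slide’s is.installed() helper can be combined with install.packages() so that a script installs a package only when it is missing. ensure_package() is a hypothetical name used for illustration. The install branch needs internet access, so this sketch only calls the helper on a package that is already present.

```r
# TRUE if a package is installed on this computer (the helper from the slide)
is.installed <- function(mypkg) is.element(mypkg, installed.packages()[, 1])

# Hypothetical helper: install a package only if it is missing, then load it.
# install.packages() needs an internet connection and a CRAN mirror.
ensure_package <- function(pkg) {
  if (!is.installed(pkg)) install.packages(pkg)
  library(pkg, character.only = TRUE)  # pkg is a string, so use character.only
}

is.installed("stats")                  # TRUE: stats ships with R
is.installed("notARealPackageName")    # FALSE (a made-up name)
requireNamespace("stats", quietly = TRUE)  # TRUE if the package can be loaded
ensure_package("stats")                # already installed, so it is only loaded
```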

Functions within a Package
To get help
–?FunctionName
–??"topic of interest"
To see the source code
–FunctionName (typed without parentheses)
To see an example
–example(FunctionName)
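Here is a quick sketch of inspecting a function, using sd() and median() from the stats package as examples. The help operators from the slide are only shown as a comment, because they open a help viewer.

```r
args(sd)          # shows the arguments: function (x, na.rm = FALSE)
sd                # typing a name without () prints its source code
body(sd)          # only the body of the function
example(median)   # runs the examples from median's help page
# ?sd and ??"standard deviation" open help pages, as listed on the slide
```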

Getting Started: Loading Files
help(topic), ?topic, help.search("topic"), ??topic, str(), ls(), dir(), history(), library(), library(help = ), rm(), rm(list = ls()), example(), setwd(), source(), function
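This minimal sketch ties several of these commands together. It works in tempdir() so your own folders are untouched. The script name my_script.R and its contents are made up for illustration.

```r
old_wd <- setwd(tempdir())     # setwd() returns the previous directory
getwd()                        # current working directory

# Write a tiny script, then run every line of it with source()
writeLines(c("y <- 1:5", "square <- function(v) v^2"), "my_script.R")
source("my_script.R")

ls()                           # objects in the workspace, now including y and square
str(y)                         # compact structure: int [1:5] 1 2 3 4 5
"my_script.R" %in% dir()       # TRUE: dir() lists files in the working directory
rm(y)                          # remove one object
# rm(list = ls()) would remove every object in the workspace
setwd(old_wd)                  # restore the original working directory
```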

Data Manipulation: Data Entry
Types of data
–Numerical, categorical (character or factor), logical
–mode(variable) reports how a variable is stored
Formats of data
–Scalar, vector/array, matrix, data frame, list
Ways to enter data
–Manually
–read.csv(), read.table(), scan()
–library(foreign)
–library(Hmisc)
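This small sketch shows the basic types and how mode() and class() report them, using made-up values. mode() reports a factor as "numeric" because R stores it as integer codes, so class() is the better check for factors.

```r
x_num <- c(3.61, 5.17)                     # numerical
x_chr <- c("m", "f")                       # categorical, stored as character
x_lgl <- c(TRUE, FALSE)                    # logical
x_fac <- factor(c("low", "high", "low"))   # categorical, stored as a factor
types <- list(x_num, x_chr, x_lgl, x_fac)

sapply(types, mode)    # "numeric" "character" "logical" "numeric"
sapply(types, class)   # "numeric" "character" "logical" "factor"
levels(x_fac)          # "high" "low" (sorted alphabetically by default)
length(42)             # 1: a "scalar" is just a vector of length 1
```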

Importing from SAS
Option One:
–In SAS: proc export DATA=file DBMS=CSV OUTFILE="destination\name.csv"; run;
–In R: read.csv() (inside R strings, write the path with / or \\ rather than a single \)

Syntax
–read.csv(file, header = TRUE, sep = ",", dec = ".", fill = TRUE, ...)
file: the name of the file from which the data are read. Each row of the table appears as one line of the file. If it is not an absolute path, the file name is relative to the current working directory, getwd(). The file can also be a complete URL.
header: a logical value indicating whether the file contains the names of the variables as its first line. read.csv defaults to TRUE. In read.table, if header is missing it is set to TRUE if and only if the first row contains one fewer field than the number of columns.
sep: the field separator character. Values on each line of the file are separated by this character. If sep = "" (the default for read.table), the separator is ‘white space’, that is, one or more spaces, tabs, newlines or carriage returns.
dec: the character used in the file for decimal points.
fill: logical. If TRUE, then when rows have unequal length, blank fields are implicitly added (see ‘Details’ in ?read.table).
Additional options are available; see the documentation.
Note: If you’re desperate to read in an unusual data format, see scan().
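This sketch writes a small, made-up file that uses semicolons as separators and commas as decimal points, then reads it back by overriding sep and dec. The last row is deliberately one field short to show fill. read.csv2() is a shortcut that uses these same two settings.

```r
# Write a small example file (values are invented for illustration)
path <- tempfile(fileext = ".csv")
writeLines(c("id;weight;diseased",
             "1;3,61;TRUE",
             "2;5,17;FALSE",
             "3;4,20"), path)          # last row is one field short

# sep and dec override read.csv's defaults; fill = TRUE pads the short row
dat <- read.csv(path, header = TRUE, sep = ";", dec = ",", fill = TRUE)
str(dat)
dat$diseased                           # TRUE FALSE NA

# scan() reads raw values when the input is not a tidy table
scan(text = "3.61 5.17 4.20")
```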

.RData
The extension .RData is a way to store objects created in R.
Store objects using save(object1, object2, file = "Storage.RData")
Load them later using load("Storage.RData")
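Here is a minimal round trip through save() and load(), written to tempdir() with made-up objects. saveRDS() and readRDS() are shown as an alternative for a single object.

```r
object1 <- 1:10
object2 <- data.frame(x = c("a", "b", "c"))
path <- file.path(tempdir(), "Storage.RData")

save(object1, object2, file = path)  # or: save(list = c("object1", "object2"), file = path)
rm(object1, object2)                 # delete them from the workspace
restored <- load(path)               # brings both back and returns their names
restored

# For a single object, saveRDS()/readRDS() let you pick the name when reading
rds_path <- file.path(tempdir(), "object1.rds")
saveRDS(object1, rds_path)
copy_of_object1 <- readRDS(rds_path)
```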

Advanced: Reading Data directly from SAS or STATA
SAS Option Two:
–In SAS: libname library xport "destination\name.xpt"; data library.data; set data; run;
–In R: library(Hmisc); data <- sasxport.get("destination/name.xpt")
STATA
–library(foreign)
–data.stata <- read.dta("file.dta")
NOTE: The package foreign can handle multiple file types, including SAS transport files (read.xport()). read.dta() reads Stata files up to version 12.
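No real .dta file is available here, so this sketch first writes one with foreign::write.dta() and then reads it back with read.dta(). The data are invented. The Hmisc route for SAS transport files is not run because Hmisc is not part of a standard R install.

```r
library(foreign)   # recommended package that normally ships with R

# Create a small Stata file so the example is self-contained (made-up data)
path <- file.path(tempdir(), "file.dta")
write.dta(data.frame(id = 1:3, weight = c(3.61, 5.17, 4.2)), path)

data.stata <- read.dta(path)
str(data.stata)
```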

Data Entry
c(…), seq(from, to), rep(x, times), data.frame(), list(), matrix(), read.dta(), sasxport.get(), read.csv(), data(), data(RDataSet), help(RDataSet), load()
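This sketch exercises the data-entry functions above with made-up values. mtcars is one of the data sets bundled with R.

```r
v  <- c(1, 4, 9)                        # combine values into a vector
s  <- seq(from = 2, to = 10, by = 2)    # 2 4 6 8 10
r  <- rep(c("a", "b"), times = 3)       # "a" "b" "a" "b" "a" "b"
m  <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column
df <- data.frame(id = 1:3, value = v)
l  <- list(numbers = v, table = df)     # a list can hold objects of any type

# data()               # lists the data sets that ship with R
data(mtcars)           # load one of them into the workspace
head(mtcars, 3)
# help(mtcars)         # its documentation
```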

Data Information
mode(), is.character(), is.numeric(), is.logical(), is.factor(), class(), is.matrix(), is.data.frame(), names(), head(), tail(), length(), dim(), nrow(), ncol(), is.na(), dimnames(), rownames(), colnames(), unique(), describe() (from Hmisc), levels()
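Here the information functions are applied to a small, made-up data frame.

```r
df <- data.frame(sex = factor(c("m", "f", "m")),
                 weight = c(3.61, NA, 5.17))   # invented values
class(df)              # "data.frame"
is.data.frame(df)      # TRUE
dim(df)                # 3 2  (nrow() and ncol() give each part)
names(df)              # "sex" "weight"
head(df, 2)            # first rows; tail() gives the last rows
is.numeric(df$weight)  # TRUE
is.na(df$weight)       # FALSE TRUE FALSE
levels(df$sex)         # "f" "m"
unique(df$sex)         # the distinct values, in order of appearance
# describe(df) needs the Hmisc package: library(Hmisc)
```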

Data Manipulation
It is possible to access subsets of a data item using bracketed commands (e.g. x[n]).
Options include the “everything but” command (x[-n]) and multiple selections (x[1:n] or x[c(1,2,3)]).
Logical conditions can also be used (x[x > 3 & x < 5]).
Lists use a double-bracket structure (x[[n]]).
Data frame columns can be selected in two ways:
–x[["name"]]
–x$name
Anything with rows and columns is indexed with two indices (x[i, j]).
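The following applies the indexing rules above to made-up objects. The fish data frame is hypothetical.

```r
x <- c(2, 4, 6, 8, 10)
x[2]                  # 4
x[-2]                 # everything but the 2nd element: 2 6 8 10
x[1:3]                # 2 4 6
x[c(1, 3, 5)]         # 2 6 10
x[x > 3 & x < 9]      # 4 6 8

lst <- list(a = 1:3, b = "text")
lst[[1]]              # the element itself: 1 2 3
lst[1]                # a shorter list that still contains element a

fish <- data.frame(weight = c(12, 25, 18),
                   condition = c("good", "poor", "good"))
fish[["weight"]]      # 12 25 18
fish$weight           # the same column
fish[2, 1]            # row 2, column 1: 25
fish[fish$weight < 20 & fish$condition == "good", ]  # rows 1 and 3
```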

Data Manipulation
as.numeric(), as.logical(), as.character(), as.array(), as.data.frame(), as.matrix(), factor(), ordered(), t(), reshape(), cat(), rbind(), cbind(), merge(), sort(), order(), library(reshape), rownames() <- c(), colnames() <- c(), na.omit(), cut()
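This sketch runs common combining and cleaning steps on two small, made-up data frames. It also shows the classic pitfall where as.numeric() applied to a factor returns the level codes instead of the numbers.

```r
a <- data.frame(id = c(3, 1, 2), weight = c(5.2, NA, 3.6))
b <- data.frame(id = c(1, 2, 3), sex = c("f", "m", "f"))

ab <- merge(a, b, by = "id")               # join on id (result is sorted by id)
ab[order(ab$weight, decreasing = TRUE), ]  # rows ordered by weight, NA last
na.omit(ab)                                # drop rows containing NA
ab$size <- cut(ab$weight, breaks = c(0, 4, Inf),
               labels = c("small", "large"))
rbind(a, data.frame(id = 4, weight = 4.1)) # append a row
cbind(a, checked = TRUE)                   # append a column
t(as.matrix(a))                            # transpose

# Converting a factor of numbers: go through as.character() first
f <- factor(c("10", "20", "10"))
as.numeric(f)                  # 1 2 1, the internal level codes
as.numeric(as.character(f))    # 10 20 10
```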

Character & Time Based Data
nchar(), substr(), tolower(), toupper(), chartr(), grep(), match(), %in%, pmatch(), charmatch(), sub(), strsplit(), paste(), Sys.time(), Sys.Date(), date(), as.Date(), as.POSIXct()
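Here a handful of the character and date functions above are applied to an invented string and invented dates.

```r
s <- "Fish Weight Data"
nchar(s)                            # 16
substr(s, 1, 4)                     # "Fish"
toupper(s); tolower(s)
chartr("aei", "AEI", s)             # swap individual characters
strsplit(s, " ")[[1]]               # "Fish" "Weight" "Data"
grep("a", c("cat", "dog", "bat"))   # 1 3: positions that match
sub("Data", "Table", s)             # replace the first match
paste("id", 1:3, sep = "_")         # "id_1" "id_2" "id_3"
match("dog", c("cat", "dog"))       # 2
"dog" %in% c("cat", "dog")          # TRUE

Sys.Date(); Sys.time()              # today's date and the current time
d <- as.Date("2011-02-14")
d + 30                              # "2011-03-16": date arithmetic is in days
t1 <- as.POSIXct("2011-02-14 13:05:00", tz = "UTC")
t2 <- as.POSIXct("2011-02-14 15:35:00", tz = "UTC")
difftime(t2, t1, units = "mins")    # Time difference of 150 mins
```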

Symbol  Meaning
%d      Day of the month as a number (01-31)
%a      Abbreviated weekday (Mon)
%H      Hours as a decimal number (00-23)
%I      Hours as a decimal number (01-12)
%w      Weekday as a decimal number (0-6, Sunday is 0)
%W      Week of the year as a decimal number (00-53), using Monday as the first day of the week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
%x      Date, locale-specific
%X      Time, locale-specific
%z      Offset from UTC in hours and minutes (e.g. -0800 is 8 hours behind UTC)
%j      Day of the year as a decimal number (001-366)
%M      Minute as a decimal number (00-59)
%p      AM/PM indicator in the locale (used in conjunction with %I, not with %H)
%S      Second as a decimal number (00-61), allowing for up to two leap seconds
%U      Week of the year as a decimal number (00-53), using Sunday as the first day of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.
%A      Unabbreviated weekday (Monday)
%c      Date and time, locale-specific
%m      Month as a decimal number (01-12), e.g. 02
%b      Abbreviated month (Feb)
%B      Unabbreviated month (February)
%y      Two-digit year (11)
%Y      Four-digit year (2011)
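These codes are used both to print dates with format() and to parse text with as.Date() or as.POSIXct(). The sketch below uses an invented date. Names such as "Monday" or "PM" depend on your locale, so only locale-independent results are noted.

```r
d <- as.Date("2011-02-14")
format(d, "%d/%m/%Y")            # "14/02/2011"
format(d, "%j")                  # "045", day 45 of the year
format(d, "%a %d %B %y")         # weekday and month names follow the locale
format(d, "%U %W")               # week numbers, Sunday- vs Monday-based

t1 <- as.POSIXct("2011-02-14 13:05:09", tz = "UTC")
format(t1, "%H:%M:%S")           # "13:05:09"
format(t1, "%I:%M %p")           # 12-hour clock with an AM/PM indicator
format(t1, "%Y-%m-%d %H:%M %z")  # ends in +0000, the offset for UTC

as.Date("14/02/2011", format = "%d/%m/%Y")  # parsing uses the same codes
```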

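Data Export
ftable(), format(), paste(), xtable(), write.table(data, "clipboard", sep = "\t", col.names = NA), write.csv(), write.foreign(), write.dta(), sink(), save(), print(), save.image()
Note: the write.table(..., "clipboard", ...) call copies a table for pasting into a spreadsheet; it works on Windows.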
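This sketch writes results out to tempdir() using made-up data. The clipboard line is commented out because it only works on Windows.

```r
df <- data.frame(id = 1:3, weight = c(3.61, 5.17, 4.2))   # invented data

out_csv <- file.path(tempdir(), "fish.csv")
write.csv(df, out_csv, row.names = FALSE)  # omit the row-number column
readLines(out_csv)

out_txt <- file.path(tempdir(), "summary.txt")
sink(out_txt)             # send printed output to a file instead of the console
print(summary(df))
sink()                    # stop diverting output

ftable(Titanic, row.vars = 1:2)   # flat version of a multi-way table
cat("Mean weight:", format(mean(df$weight), digits = 3), "\n")
# write.table(df, "clipboard", sep = "\t", col.names = NA)  # Windows only
```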

format()
Syntax
–format(x, trim = FALSE, digits = NULL, nsmall = 0L, justify = c("left", "right", "centre", "none"), width = NULL, na.encode = TRUE, scientific = NA, big.mark = "", big.interval = 3L, small.mark = "", small.interval = 5L, decimal.mark = ".", zero.print = NULL, drop0trailing = FALSE, ...)
–x: any R object
–trim: logical; if FALSE, numbers are right-justified to a common width; if TRUE, the leading blanks used for justification are suppressed
–digits: how many significant digits should be used
–nsmall: the minimum number of digits to the right of the decimal point
–justify: whether a character vector should be left-justified, right-justified, centred, or left alone
–big.mark: the character inserted between groups of digits to the left of the decimal point (e.g. "," gives 1,234,567)
–See also format.Date (methods for dates) and format.POSIXct (date-times)
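The calls below show the effect of the main format() arguments on invented numbers and strings.

```r
format(3.14159, digits = 3)                 # "3.14"
format(2, nsmall = 2)                       # "2.00"
format(1234567, big.mark = ",")             # "1,234,567"
format(c(1, 10, 100))                       # "  1" " 10" "100" (common width)
format(c(1, 10, 100), trim = TRUE)          # "1" "10" "100"
format(c("a", "bbb"), width = 5)            # "a    " "bbb  "
format(c("a", "bbb"), justify = "right")    # "  a" "bbb"
format(1e-10, scientific = FALSE)           # "0.0000000001"
format(as.Date("2011-02-14"), "%d/%m/%Y")   # "14/02/2011", via format.Date
```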

Extra Resources

Advanced packages to try
–gtools
–reshape
Journal of Statistical Computing –
Journal of Statistical Software –