Data Management Module: Concatenating, Stacking, Merging and Recoding

Programming in R

Data Management Module topics:
Importing and Exporting
Inputting data directly into R
Creating, Adding and Dropping Variables
Assigning objects
Subsetting and Formatting
Working with SAS Files
Using SQL in R
Concatenating, Stacking and Merging
Replacing values

Data Management: SAS Files in R
You have likely noticed that SAS data files have a .sas7bdat extension. One way to get these files into R:
1. Convert them into SAS transport (XPORT) files in SAS.
2. Install the Hmisc package in R (or use the foreign package, which ships with R).
3. Use Hmisc's sasxport.get function (or foreign's read.xport) to read the SAS data into an R data frame.
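A minimal sketch of the reading step, assuming the data has already been exported from SAS to a transport file. The helper name read_sas_xport and the file name mydata.xpt are hypothetical. The helper prefers Hmisc::sasxport.get, which also converts SAS dates and attaches variable labels, and falls back to foreign::read.xport when Hmisc is not installed.

```r
# Hypothetical helper: read a SAS transport (XPORT) file into R.
# Uses Hmisc::sasxport.get when Hmisc is installed,
# otherwise falls back to foreign::read.xport (foreign ships with R).
read_sas_xport <- function(path) {
  if (requireNamespace("Hmisc", quietly = TRUE)) {
    Hmisc::sasxport.get(path)
  } else {
    foreign::read.xport(path)
  }
}

# Usage (hypothetical file name; run once the .xpt file exists):
# mydata <- read_sas_xport("mydata.xpt")
# str(mydata)
```

If the transport file holds a single dataset, both functions return a data frame; a file holding several datasets comes back as a list of data frames.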

Data Management: Concatenating
To “concatenate” basically means to bring together columns (vectors) of data side by side. In R, this is accomplished through the function cbind:
Newdata <- cbind(data1, data2)
The result has as many columns as data1 and data2 combined. Note that a “matchkey” is not needed: rows are paired by position, so data1 and data2 should have the same number of rows, in the same order.
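A small illustration of cbind on two made-up data frames (the column names and values are hypothetical). It shows that the column counts add up and that rows are simply paired by position.

```r
# Two hypothetical data frames describing the same three people,
# with rows already in the same order
data1 <- data.frame(id = c(1, 2, 3), age = c(34, 29, 41))
data2 <- data.frame(height = c(170, 182, 165), weight = c(68, 80, 59))

# cbind pairs rows by position; no key column is consulted
Newdata <- cbind(data1, data2)

ncol(Newdata)   # 4 (2 columns + 2 columns)
nrow(Newdata)   # 3
names(Newdata)  # "id" "age" "height" "weight"
```

If the rows of data2 were in a different order, cbind would silently pair mismatched rows; in that case merge on a key column is the safer choice.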

Data Management: Stacking
To “stack” basically means to bring together rows of data, one set on top of the other. In R, this is accomplished through the function rbind:
Newdata <- rbind(data1, data2)
The result has as many rows as data1 and data2 combined. Note that data1 and data2 MUST have the same column names; for data frames, rbind matches columns by name, so their order may differ. Note that a “matchkey” is not needed.
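A short sketch of stacking two hypothetical batches of survey data. The second batch lists its columns in a different order to show that rbind lines columns up by name, and the last lines show that a mismatched column name stops with an error.

```r
# Two hypothetical batches of the same survey; the columns
# appear in a different order in the second batch
data1 <- data.frame(id = c(1, 2), score = c(88, 92))
data2 <- data.frame(score = c(75, 81, 90), id = c(3, 4, 5))

# rbind on data frames matches columns by name, not by position
Newdata <- rbind(data1, data2)

nrow(Newdata)   # 5 (2 rows + 3 rows)
Newdata$id      # 1 2 3 4 5
Newdata$score   # 88 92 75 81 90

# A column named "points" instead of "score" does not match,
# so this call stops with an error
bad <- tryCatch(rbind(data1, data.frame(id = 6, points = 70)),
                error = function(e) e)
inherits(bad, "error")  # TRUE
```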

Data Management: Merging
To “merge” basically means to bring together data frames by matching rows on a shared key. In R, this is accomplished through the function merge:
Newdata <- merge(data1, data2, by = "PrimaryKey", all = TRUE)
Note that all = TRUE keeps every row from both data1 and data2, filling unmatched cells with NA; this is essentially a full outer join. all = FALSE (the default) keeps only the rows whose key appears in both data1 and data2; this is essentially an inner join. In either case the result contains the columns of both data frames. Note that a “matchkey” IS needed: the by column (here PrimaryKey) must exist in both data frames.
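The difference between all = FALSE and all = TRUE is easiest to see on two tiny tables whose keys only partly overlap. The table contents below are hypothetical; the key column reuses the slide's name, PrimaryKey.

```r
# Hypothetical tables sharing the key column "PrimaryKey";
# keys 2 and 3 appear in both, key 1 only in data1, key 4 only in data2
data1 <- data.frame(PrimaryKey = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
data2 <- data.frame(PrimaryKey = c(2, 3, 4), dept = c("HR", "IT", "Ops"))

# Inner join: only keys found in both tables
inner <- merge(data1, data2, by = "PrimaryKey", all = FALSE)

# Full outer join: every key from either table;
# cells with no match are filled with NA
outer <- merge(data1, data2, by = "PrimaryKey", all = TRUE)

nrow(inner)   # 2
nrow(outer)   # 4
outer
```

In the outer result, key 1 has NA for dept and key 4 has NA for name, because each key exists in only one of the two tables.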

Data Management: Recoding values
Frequently when you receive data, either a) the values are incorrect or b) they are “coded” with designated values, such as 999. These values need to be replaced. This process is called recoding or imputation, depending on the logic behind the replacement. These notes cover recoding (simply replacing one value with another); later notes will cover statistical imputation methods.
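One simple way to do this kind of recoding is logical indexing: select the elements equal to the code and assign a replacement. The sketch below assumes, hypothetically, that 999 means "no answer" and should become R's missing value NA.

```r
# Hypothetical survey column where 999 was used to code "no answer"
age <- c(34, 999, 41, 29, 999)

# Replace every 999 with R's missing-value marker NA
age[age == 999] <- NA

age                      # 34 NA 41 29 NA
mean(age, na.rm = TRUE)  # 34.66667
```

Leaving the 999s in place would have pulled the mean far upward, which is why the codes must be replaced before any analysis.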

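Data Management: Recoding values
At this point, let's recode values using the same logic you would use in Excel:
IF(condition, value if true, value if false)
In R:
newvariable <- ifelse(condition, value if true, value if false)
Here the condition is a logical test on the old variable (for example, oldvariable == 999). ifelse works element by element, so newvariable has one value for every element of oldvariable.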
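A sketch of the ifelse pattern applied to a column of a hypothetical data frame (the names df, income, income_clean and high are illustrative). The first call recodes 999 to NA; the second builds a new yes/no variable from the cleaned one, and rows where the test is NA stay NA.

```r
# Hypothetical data frame; 999 marks a missing income
df <- data.frame(income = c(52000, 999, 61000, 999))

# Same logic as Excel's IF(condition, value if true, value if false),
# but applied to the whole column at once
df$income_clean <- ifelse(df$income == 999, NA, df$income)

# A second recode: flag high earners (a missing income gives NA)
df$high <- ifelse(df$income_clean > 55000, "yes", "no")

df
```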