Data Management Module: Subset, Sort, and Format data

Programming in R. Data Management Module: Subset, Sort, and Format data

Data Management Module topics: Importing and Exporting; Inputting data directly into R; Creating, Adding and Dropping Variables; Assigning objects; Subsetting and Formatting; Working with SAS Files; Using SQL in R.

Subset, Sort, and Format data
In this session, I will introduce these topics: subsetting the observations in a data frame, sorting a data frame, and formatting values in a data frame.

Data Management: Subset
There are different ways to subset a data frame, that is, to select rows based on some criterion. First, check the values of the variable:
pricedata$region == 3
This line of R code returns a logical vector with one element per row: TRUE where the value of region is 3 and FALSE otherwise.

Data Management: Subset
Then select all the rows of the data frame for which the criterion is true:
pricedata[pricedata$region == 3, ]
This line of R code returns all rows of pricedata where region is 3. Leaving the position after the comma empty keeps every column. Using ‘<-’ we can assign the result to a new object:
Newdata <- pricedata[pricedata$region == 3, ]
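The two steps described so far can be sketched as follows. The small pricedata frame here is a hypothetical stand-in (the values are invented for illustration), and the names is_region3 and region3_rows are not from the slides.

```r
# Hypothetical stand-in for pricedata; the values are invented for illustration.
pricedata <- data.frame(
  region = c(1, 3, 2, 3, 1),
  line   = c(4, 4, 2, 1, 3),
  cost   = c(120, 95, 210, 80, 150)
)

# Step 1: the comparison produces one TRUE/FALSE value per row.
is_region3 <- pricedata$region == 3
is_region3

# Step 2: put that logical vector in the row position. The empty
# column position after the comma keeps every column.
region3_rows <- pricedata[is_region3, ]
region3_rows
```

Storing the logical vector in its own object is optional, but it makes the criterion easy to inspect (for example with sum(is_region3) to count matching rows) before it is used for subsetting.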

Data Management: Subset
We can also use the function subset(data.frame, criteria):
subset(pricedata, region == 3)
This line of R code selects the same rows as the bracket form, with one difference: subset() drops rows where region is NA, while bracket indexing returns them as rows filled with NA. Again, we can assign the result to a new object:
newpricedata <- subset(pricedata, region == 3)
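Bracket indexing and subset() behave differently when the variable has missing values, and a short sketch may make that concrete. The data below are hypothetical, with one NA placed in region on purpose.

```r
# Hypothetical data with a missing region value in the third row.
pricedata <- data.frame(
  region = c(1, 3, NA, 3),
  cost   = c(120, 95, 210, 80)
)

# Bracket indexing: the NA comparison yields an extra row filled with NA.
by_bracket <- pricedata[pricedata$region == 3, ]
nrow(by_bracket)  # 3

# subset(): rows where the criterion is NA are dropped.
by_subset <- subset(pricedata, region == 3)
nrow(by_subset)   # 2

# Wrapping the criterion in which() gives the bracket form the same behavior,
# because which() returns only the positions that are TRUE.
by_which <- pricedata[which(pricedata$region == 3), ]
nrow(by_which)    # 2
```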

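Data Management: Subset
We can also check multiple criteria using the and/or operators: & is “and” and | is “or”. We can select records where region == 3 and line == 4. Only rows that meet both conditions are returned. This is a logical AND applied to the rows of one data frame; it is not an inner join, which combines two tables.
subset(pricedata, region == 3 & line == 4)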

Data Management: Subset
We can also select records where region is 3 or line is 4. This criterion returns at least as many records as the “and” version, since a row needs to meet only one of the two conditions. This is a logical OR on the rows of one data frame; it is not an outer join, which combines two tables.
subset(pricedata, region == 3 | line == 4)
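A minimal sketch comparing the two operators on the same data follows. The pricedata values are hypothetical, and the object names both and either are chosen only for this example.

```r
# Hypothetical stand-in for pricedata; the values are invented for illustration.
pricedata <- data.frame(
  region = c(1, 3, 2, 3, 1),
  line   = c(4, 4, 2, 1, 3),
  cost   = c(120, 95, 210, 80, 150)
)

# "and": a row must have region 3 AND line 4.
both <- subset(pricedata, region == 3 & line == 4)
nrow(both)    # 1

# "or": a row needs region 3 OR line 4 (or both).
either <- subset(pricedata, region == 3 | line == 4)
nrow(either)  # 3
```

Note that the single-character forms & and | work element by element across whole columns, which is what row selection needs. The doubled forms && and || are meant for single TRUE/FALSE values, as in an if() condition.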

Data Management: Sorting
If you are new to R, the natural function to consider is sort(). However, sort() only returns the sorted values of a single vector, so it cannot reorder the rows of a data frame. The function to use is order(), which returns the row positions in sorted order:
order(variable(s), decreasing = FALSE)
To sort the data frame, place the call to order() in the row index:
data.frame[order(variables), ]

Data Management: Sorting
To sort the price data by the cost of the devices, use
pricedata[order(pricedata$cost), ]
We can also sort by multiple variables. Later variables break ties in the earlier ones:
pricedata[order(pricedata$region, pricedata$cost), ]
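The sketch below contrasts sort() with order() and shows descending and mixed-direction sorts. The data are hypothetical, and negating a column inside order() is a common idiom for numeric columns rather than something taken from the slides.

```r
# Hypothetical stand-in for pricedata; the values are invented for illustration.
pricedata <- data.frame(
  region = c(2, 1, 2, 1),
  cost   = c(210, 150, 95, 120)
)

# sort() returns only the sorted values of one vector.
sort(pricedata$cost)   # 95 120 150 210

# order() returns the row positions that would sort the vector.
order(pricedata$cost)  # 3 4 2 1

# Ascending by cost.
by_cost <- pricedata[order(pricedata$cost), ]

# Descending by cost.
by_cost_desc <- pricedata[order(pricedata$cost, decreasing = TRUE), ]

# Ascending by region, then descending by cost within each region.
# The minus sign reverses the direction for a numeric column.
by_region_cost <- pricedata[order(pricedata$region, -pricedata$cost), ]
by_region_cost
```

The sorted data frames keep their original row names, so the row labels printed on the left show where each row came from.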

Data Management: Labels
A label gives a meaningful description to a categorical variable that is coded with numbers. For instance, the data may have region coded as 1. The value 1 really means “East”, so we want to label the value 1 as East.

Data Management: Labels
Labels are stored as part of an object's attributes. To check the attributes, use the function attributes(). Use the function str() to see the variable’s structure and how it is stored.
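As a rough illustration of what these two functions report, the sketch below inspects a plain numeric vector and then a labelled version of it. It assumes the labels are attached with factor(); only the label "East" for the value 1 comes from the slides, and "West" and "North" are hypothetical.

```r
# A region variable stored as plain numeric codes.
region <- c(1, 3, 2, 3)

str(region)         # shows a numeric vector
attributes(region)  # NULL: a plain vector carries no attributes

# Attach labels with factor(). "West" and "North" are hypothetical labels.
region_f <- factor(region,
                   levels = c(1, 2, 3),
                   labels = c("East", "West", "North"))

# The labels now appear in the attributes as the levels of the factor.
attributes(region_f)
str(region_f)
```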

Data Management: Labels
In R, there are two functions to use. If the data is nominal, use factor(variable, levels = values, labels = labels). If the data is ordinal, use ordered(variable, levels = values, labels = labels); the order in which the levels are listed then defines their ranking.
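A minimal sketch of both functions follows. Apart from "East" for region 1, every label is hypothetical, and the satisfaction variable is invented here only to have an ordinal example.

```r
# Nominal: region codes with labels. Only "East" comes from the slides.
region <- c(1, 3, 2, 3, 1)
region_f <- factor(region,
                   levels = c(1, 2, 3),
                   labels = c("East", "West", "North"))
table(region_f)  # counts are reported by label instead of by code

# Ordinal: a hypothetical satisfaction score where 1 < 2 < 3.
satisfaction <- c(2, 1, 3, 2)
satisfaction_o <- ordered(satisfaction,
                          levels = c(1, 2, 3),
                          labels = c("Low", "Medium", "High"))

# Ordered factors support comparisons based on the level order.
satisfaction_o < "High"  # TRUE TRUE FALSE TRUE
```

The same comparison on a factor created with factor() is not meaningful and returns NA with a warning, which is one practical reason to choose ordered() for ordinal data.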