

Programming in R: Subset, Sort, and Format Data

In this session, I will introduce the following topics:
– Subsetting the observations in a data frame.
– Sorting a data frame.
– Formatting values in a data frame.

Subset
There are different ways to subset a data frame, that is, to select rows based on some criteria. First, check the values of the variable.
> pricedata$region == 3
This line of R code returns a logical vector with one element per row: TRUE where the value of region is 3 and FALSE otherwise.

Subset
Then select all the rows from the data frame for which the criterion is TRUE.
> pricedata[pricedata$region == 3, ]
This line of R code returns all rows of pricedata where region is 3. The empty position after the comma keeps every column. Using ‘<-’ we can assign the result to a new object.
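The two steps above can be sketched as follows. The data frame here is a made-up stand-in for pricedata: only the column names region, line, and cost come from the slides, and every value is invented for illustration.

```r
# Hypothetical stand-in for pricedata; all values are invented.
pricedata <- data.frame(
  region = c(1, 3, 2, 3, 1),
  line   = c(4, 4, 2, 1, 3),
  cost   = c(250, 120, 300, 180, 90)
)

# Step 1: build a logical vector with one element per row.
is_region3 <- pricedata$region == 3

# Step 2: use it as the row index; the empty column index keeps all columns.
region3 <- pricedata[is_region3, ]

print(is_region3)
print(region3)
```

Storing the logical vector in its own object (is_region3 is a name chosen for this sketch) makes it easy to inspect the condition before applying it.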

Subset
We can also use the function subset(data.frame, criteria).
> subset(pricedata, region == 3)
This line of R code selects the same rows as the bracket form described previously. Inside subset(), column names can be written without the pricedata$ prefix. Again, we can assign the result to a new object with
> newpricedata <- subset(pricedata, region == 3)
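One practical difference between the two forms shows up when the variable contains missing values. The sketch below uses a small invented data frame (the values are hypothetical) to compare them: bracket indexing with a logical vector that contains NA produces a row of NAs, while subset() treats NA as FALSE and drops the row.

```r
# Hypothetical data with one missing region; values are invented.
pricedata <- data.frame(
  region = c(1, 3, NA, 3),
  cost   = c(250, 120, 300, 180)
)

# Bracket indexing: the NA in the condition yields an all-NA row.
by_bracket <- pricedata[pricedata$region == 3, ]

# subset(): rows where the condition is NA are dropped.
by_subset <- subset(pricedata, region == 3)

# which() keeps only the positions that are TRUE, so it also drops the NA.
by_which <- pricedata[which(pricedata$region == 3), ]

print(by_bracket)
print(by_subset)
print(by_which)
```

If the data may contain missing values, subset() or which() is usually the safer choice for this kind of selection.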

Subset
We can also check for multiple criteria using and/or operators. & is “and.” | is “or.” We can select records when region == 3 and line == 4. This keeps only the rows that satisfy both conditions, that is, the intersection of the two sets of rows.
> subset(pricedata, region == 3 & line == 4)

Subset
We can also select records when region is 3 or line is 4.
– This selection returns at least as many records as the “and” version, since it does not require both criteria to be true at the same time.
– It keeps the rows that satisfy either condition, that is, the union of the two sets of rows.
> subset(pricedata, region == 3 | line == 4)
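The "and" and "or" selections can be compared side by side. This is a minimal sketch on an invented data frame; only the column names come from the slides, and the object names both and either are chosen for illustration.

```r
# Hypothetical stand-in for pricedata; all values are invented.
pricedata <- data.frame(
  region = c(1, 3, 2, 3, 1),
  line   = c(4, 4, 2, 1, 3),
  cost   = c(250, 120, 300, 180, 90)
)

# Rows where both conditions hold.
both <- subset(pricedata, region == 3 & line == 4)

# Rows where at least one condition holds.
either <- subset(pricedata, region == 3 | line == 4)

print(both)
print(either)

# Every row selected with & is also selected with |.
print(all(rownames(both) %in% rownames(either)))
```

Note that & and | work element by element on whole columns. The doubled forms && and || are meant for single values and are not suitable for selecting rows.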

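Sorting
If you are new to R, the natural function to consider is sort(). However, sort() only returns the sorted values of a single vector, so it cannot reorder the rows of a data frame. The better function to use is order(), which returns the row positions that put the data in sorted order:
order(variable(s), decreasing = FALSE)
To sort the data frame, place the call to order() in the row index:
data.frame[order(variables), ]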

Sorting
If we want to sort the price data by the cost of the devices, we would use
> pricedata[order(pricedata$cost), ]
We can also sort by multiple variables. Rows are ordered by the first variable, and ties are broken by the next one.
> pricedata[order(pricedata$region, pricedata$cost), ]
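The difference between sort() and order(), and the sorting calls above, can be sketched on an invented data frame. All values are hypothetical, and the object names are chosen for this example.

```r
# Hypothetical stand-in for pricedata; all values are invented.
pricedata <- data.frame(
  region = c(1, 3, 2, 3, 1),
  line   = c(4, 4, 2, 1, 3),
  cost   = c(250, 120, 300, 180, 90)
)

# sort() returns the sorted values; order() returns the row positions.
sorted_costs <- sort(pricedata$cost)
cost_positions <- order(pricedata$cost)

# Ascending by cost.
by_cost <- pricedata[order(pricedata$cost), ]

# Descending by cost.
by_cost_desc <- pricedata[order(pricedata$cost, decreasing = TRUE), ]

# By region first, then by cost within each region.
by_region_cost <- pricedata[order(pricedata$region, pricedata$cost), ]

print(sorted_costs)
print(cost_positions)
print(by_region_cost)
```

The sorted data frames keep their original row names, which makes it easy to see where each row came from.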

Labels
A label can be used to give a categorical variable that is coded with numbers a meaningful description. For instance, the data may have region coded as 1. The value 1 really means “East,” so we want to label the value 1 as East.

Labels
Labels are stored as part of a variable's attributes. To check the attributes, use the function attributes(). Use the function str() to see how the variable is stored.

Labels
In R, there are two functions to use. If the data are nominal, use
> factor(variable, levels = values, labels = labels)
If the data are ordinal, use
> ordered(variable, levels = values, labels = labels)
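A minimal sketch of both functions follows. Only the mapping 1 = "East" comes from the slides. The other region labels, and the satisfaction variable with its codes and labels, are hypothetical and invented for illustration.

```r
# Nominal data: region codes. Only 1 = "East" is from the slides;
# "Central" and "West" are assumed labels.
region_code <- c(1, 3, 2, 3, 1)
region <- factor(region_code,
                 levels = c(1, 2, 3),
                 labels = c("East", "Central", "West"))

# Inspect how the labeled variable is stored.
str(region)
print(attributes(region))

# Ordinal data: a hypothetical satisfaction rating coded 1 to 3.
satisfaction_code <- c(2, 1, 3, 2)
satisfaction <- ordered(satisfaction_code,
                        levels = c(1, 2, 3),
                        labels = c("Low", "Medium", "High"))

# Ordered factors support comparisons between levels.
below_high <- satisfaction < "High"
print(satisfaction)
print(below_high)
```

The levels argument lists the codes as they appear in the data, and labels gives the description for each code in the same order.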