Programming in R: Describing multivariate data



In this session I will explain:
– How to describe two or more categorical variables with tables and stacked bar charts
– How to use scatterplots to summarize two numeric variables

Two categorical variables

Contingency table with counts: Sex (Female, Male, Grand Total) by Tattoo (No, Yes, Grand Total). The counts themselves are missing from this transcript.

Contingency table with row percents (Count of Tattoo):

Sex            Tattoo: No    Tattoo: Yes    Grand Total
Female         86.86%        13.14%         100.00%
Male           80.88%        19.12%         100.00%
Grand Total    84.88%        15.12%         100.00%

Two categorical variables
Contingency tables in R do not have the polished look of the tables on the previous slide.
– The function table() creates the counts.
– The function prop.table() turns the counts into proportions (multiply by 100 to get percentages).
– The function margin.table() calculates the row or column totals.
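As a rough sketch of how these three functions fit together, the example below builds a sex-by-tattoo table. The data frame `toy` and its values are made up for illustration and are not the survey data behind the slides.

```r
# Toy data, invented for illustration (not the survey from the slides)
toy <- data.frame(
  Sex    = c("Female", "Female", "Female", "Male", "Male", "Male", "Male", "Female"),
  Tattoo = c("No", "No", "Yes", "No", "Yes", "No", "No", "No")
)

# Counts: rows are Sex, columns are Tattoo
counts <- table(toy$Sex, toy$Tattoo, dnn = c("Sex", "Tattoo"))
print(counts)

# Row percents: margin = 1 divides each row by its row total
row_pct <- round(100 * prop.table(counts, margin = 1), 2)
print(row_pct)

# Row totals (margin = 1) and column totals (margin = 2)
row_totals <- margin.table(counts, margin = 1)
col_totals <- margin.table(counts, margin = 2)
print(row_totals)
print(col_totals)

# addmargins() appends the total row and column in one step
print(addmargins(counts))
```

Using margin = 2 in prop.table() would give column percents instead, and omitting margin gives each cell as a share of the whole table.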

Two categorical variables
There is a package called prettyR. It contains a function called xtab(), which produces better-looking output. The basic call is:
– xtab(y ~ x, dataframe)
– where x and y are the variables you want to relate, written in model (formula) notation
– Usually x is the independent variable
– and y is the dependent variable
Note that R is case-sensitive, so the function name must be typed in lower case.

Two categorical variables
A basic bar chart can be produced in R using the function barplot().
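A minimal sketch of such a chart, assuming a made-up data frame `toy` (the slide's own data and plotting options are not shown in the transcript). When barplot() is given a two-way table, each column becomes a bar and the rows are stacked within it.

```r
# Toy data, invented for illustration
toy <- data.frame(
  Sex    = c("Female", "Female", "Female", "Male", "Male", "Male", "Male", "Female"),
  Tattoo = c("No", "No", "Yes", "No", "Yes", "No", "No", "No")
)

# Rows (Tattoo) become stacked segments; columns (Sex) become bars
counts <- table(toy$Tattoo, toy$Sex)

# barplot() returns the x positions of the bar midpoints
mids <- barplot(
  counts,
  main        = "Tattoo by sex",
  xlab        = "Sex",
  ylab        = "Count",
  col         = c("grey80", "grey40"),
  legend.text = rownames(counts)
)
```

Adding beside = TRUE would draw the segments side by side rather than stacked.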

Two categorical variables
Nicer graphics can be produced using the ggplot2 package.
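The transcript does not include the ggplot2 code behind the slide, so the following is only one plausible way to draw a comparable chart. It assumes the ggplot2 package is installed and uses a made-up data frame `toy`.

```r
library(ggplot2)

# Toy data, invented for illustration
toy <- data.frame(
  Sex    = c("Female", "Female", "Female", "Male", "Male", "Male", "Male", "Female"),
  Tattoo = c("No", "No", "Yes", "No", "Yes", "No", "No", "No")
)

# One bar per sex, filled by tattoo status;
# position = "fill" rescales every bar to 100% so the bars show row percents
p <- ggplot(toy, aes(x = Sex, fill = Tattoo)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = function(x) paste0(100 * x, "%")) +
  labs(title = "Tattoo by sex", x = "Sex", y = "Percent")

print(p)
```

Changing the option to position = "stack" shows raw counts, and position = "dodge" places the bars side by side.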

Two categorical variables
Another graphic from the ggplot2 package, using slightly different options.

Three categorical variables

Row percents of Tattoo by Anypierces (Count of Tattoo), shown for all respondents and then separately by sex.

Sex: (All)
Tattoo         Anypierces: No    Anypierces: Yes    Grand Total
No             31.03%            68.97%             100.00%
Yes            12.90%            87.10%             100.00%
Grand Total    28.29%            71.71%             100.00%

Sex: Female
Tattoo         Anypierces: No    Anypierces: Yes    Grand Total
No             5.04%             94.96%             100.00%
Yes            0.00%             100.00%            100.00%
Grand Total    4.38%             95.62%             100.00%

Sex: Male
Tattoo         Anypierces: No    Anypierces: Yes    Grand Total
No             87.27%            12.73%             100.00%
Yes            30.77%            69.23%             100.00%
Grand Total    76.47%            23.53%             100.00%

Three categorical variables
R can create counts and percentages for three or more variables using the functions:
– table()
– prop.table()
– margin.table()
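To illustrate, here is a sketch of the same three functions applied to a three-way table. The data frame `toy3` is made up; only the variable names Sex, Tattoo, and Anypierces come from the slides. The call to ftable() is an addition of mine for flatter printing.

```r
# Toy data, invented for illustration
toy3 <- data.frame(
  Sex        = rep(c("Female", "Male"), each = 6),
  Tattoo     = c("No", "No", "No", "Yes", "Yes", "No",
                 "No", "No", "No", "Yes", "Yes", "No"),
  Anypierces = c("Yes", "Yes", "No", "Yes", "Yes", "Yes",
                 "No", "No", "Yes", "Yes", "No", "No")
)

# Three-way counts: Tattoo (rows) by Anypierces (columns), one layer per Sex
counts3 <- table(toy3$Tattoo, toy3$Anypierces, toy3$Sex,
                 dnn = c("Tattoo", "Anypierces", "Sex"))

# Row proportions within each Sex layer: hold dimensions 1 and 3 fixed
pct3 <- prop.table(counts3, margin = c(1, 3))

# ftable() flattens the layers into a single printed table
print(ftable(round(100 * pct3, 1), row.vars = c("Sex", "Tattoo")))

# Collapse over Sex to get the "(All)" table of Tattoo by Anypierces
all_sex <- margin.table(counts3, margin = c(1, 2))
print(all_sex)
```

The margin argument names the dimensions to keep, so c(1, 3) means percentages are computed across Anypierces for each combination of Tattoo and Sex.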

Three categorical variables
This graphic uses the ggplot2 package. It is a stacked bar chart that is also grouped by gender.
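The slide's code is not in the transcript. One way to get a stacked bar chart grouped by gender is to facet by Sex, as in this sketch. Faceting is my assumption about how the grouping was done, the data frame `toy3` is made up, and ggplot2 must be installed.

```r
library(ggplot2)

# Toy data, invented for illustration
toy3 <- data.frame(
  Sex        = rep(c("Female", "Male"), each = 6),
  Tattoo     = c("No", "No", "No", "Yes", "Yes", "No",
                 "No", "No", "No", "Yes", "Yes", "No"),
  Anypierces = c("Yes", "Yes", "No", "Yes", "Yes", "Yes",
                 "No", "No", "Yes", "Yes", "No", "No")
)

# Stacked bars of Anypierces within Tattoo, one panel per Sex
p3 <- ggplot(toy3, aes(x = Tattoo, fill = Anypierces)) +
  geom_bar(position = "stack") +
  facet_wrap(~ Sex) +
  labs(title = "Piercings by tattoo status and sex",
       x = "Tattoo", y = "Count")

print(p3)
```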

Two numeric variables
A simple scatter plot can be drawn with the plot() function available in R:

    plot(Penn$Height, Penn$HtChoice,
         main = "Actual Height versus Preferred Height",
         xlab = "Actual", ylab = "Preferred")

One numeric and one categorical variable
There are many different ways to “group” by a variable and summarize a second variable, including:
– aggregate()
– tapply()

    > tapply(Penn$Height, Penn$Sex, mean)

– The first argument is the variable to summarize
– The second is the “group by” variable
– The third is the function to apply
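Both functions can be compared side by side in a short sketch. The data frame `toy` and its heights are made up for illustration. Only the column names Sex and Height follow the slides.

```r
# Toy data, invented for illustration
toy <- data.frame(
  Sex    = c("Female", "Female", "Female", "Male", "Male", "Male"),
  Height = c(64, 66, 62, 70, 72, 68)
)

# tapply(): variable to summarize, grouping variable, function to apply
means_tapply <- tapply(toy$Height, toy$Sex, mean)
print(means_tapply)

# aggregate(): the same summary in formula notation, returned as a data frame
means_agg <- aggregate(Height ~ Sex, data = toy, FUN = mean)
print(means_agg)
```

tapply() returns a named array, which is handy for quick lookups, while aggregate() returns a data frame that is easier to merge or plot.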

Histograms
Histograms can be created for each group by subsetting the data and passing each subset to the hist() function.
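A sketch of that approach, assuming a made-up data frame `toy` with columns Sex and Height (the slide's actual code and data are not in the transcript):

```r
# Toy data, invented for illustration
toy <- data.frame(
  Sex    = rep(c("Female", "Male"), each = 8),
  Height = c(62, 63, 64, 64, 65, 66, 67, 68,
             66, 68, 69, 70, 70, 71, 72, 74)
)

# Two equivalent ways to subset: logical indexing and subset()
female_heights <- toy$Height[toy$Sex == "Female"]
male_heights   <- subset(toy, Sex == "Male")$Height

# Draw the two histograms side by side on a common x-axis range
op <- par(mfrow = c(1, 2))
h_f <- hist(female_heights, main = "Female", xlab = "Height",
            xlim = range(toy$Height))
h_m <- hist(male_heights, main = "Male", xlab = "Height",
            xlim = range(toy$Height))
par(op)  # restore the previous plotting layout
```

Sharing xlim across the panels makes the two distributions directly comparable.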