Summary Statistics in R Commander

Slides:



Advertisements
Similar presentations
Summary Statistics/Simple Graphs in SAS/EXCEL/JMP.
Advertisements

Lab # 03- SS Basic Graphic Commands. Lab Objectives: To understand M-files principle. To plot multiple plots on a single graph. To use different parameters.
Plotting with ggplot2: Part 1
Introduction to FX Stat 3. Getting Started When you open FX Stat you will see three separate areas.
Overview Revising z-scores Revising z-scores Interpreting z-scores Interpreting z-scores Dangers of z-scores Dangers of z-scores Sample and populations.
Stat 112: Lecture 15 Notes Finish Chapter 6: –Review on Checking Assumptions (Section ) –Outliers and Influential Points (Section 6.7) Homework.
Jan Shapes of distributions… “Statistics” for one quantitative variable… Mean and median Percentiles Standard deviations Transforming data… Rescale:
Example 16.3 Estimating Total Cost for Several Products.
8/10/2015Slide 1 The relationship between two quantitative variables is pictured with a scatterplot. The dependent variable is plotted on the vertical.
Use of Quantile Functions in Data Analysis. In general, Quantile Functions (sometimes referred to as Inverse Density Functions or Percent Point Functions)
Baburao Kamble (Ph.D) University of Nebraska-Lincoln
Tutor: Prof. A. Taleb-Bendiab Contact: Telephone: +44 (0) CMPDLLM002 Research Methods Lecture 9: Quantitative.
FW364 Ecological Problem Solving Lab 4: Blue Whale Population Variation [Ramas Lab]
R-Graphics Day 2 Stephen Opiyo. Basic Graphs One of the main reasons data analysts turn to R is for its strong graphic capabilities. R generates publication-ready.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 1 PROBABILITIES FOR CONTINUOUS RANDOM VARIABLES THE NORMAL DISTRIBUTION CHAPTER 8_B.
STAT E100 Section Week 3 - Regression. Review  Descriptive Statistics versus Hypothesis Testing  Outliers  Sample vs. Population  Residual Plots.
Key Data Management Tasks in Stata
Matlab Workshop 1/10/07 Lesson 1: Matlab as a graphing calculator.
I❤RI❤R Kin Wong (Sam) Game Plan Intro R Import SPSS file Descriptive Statistics Inferential Statistics GraphsQ&A.
Learning R hands on. Organization of Folders: Class Data folder has datasets (end in.csv or.rda) Rcode has the scripts of R commands that can cut and.
The following slides will lead you through the process of creating assignments in Engrade. Engrade’s Help section of Assignments Engrade’s Help section.
June 11, 2008Stat Lecture 10 - Review1 Midterm review Chapters 1-5 Statistics Lecture 10.
R-Graphics Stephen Opiyo. Basic Graphs One of the main reasons data analysts turn to R is for its strong graphic capabilities. R generates publication-ready.
Introduction to R Introductions What is R? RStudio Layout Summary Statistics Your First R Graph 17 September 2014 Sherubtse Training.
Assumptions 5.4 Data Screening. Assumptions Parametric tests based on the normal distribution assume: – Independence – Additivity and linearity – Normality.
Ggplot2 A cool way for creating plots in R Maria Novosolov.
Data Science and Big Data Analytics Chap 3: Data Analytics Using R
Introduction to Graphics in R 3/12/2014. First, let’s get some data Load the Duncan dataset It’s in the car package. Remember how to get it? – library(car)
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 12 More About Regression 12.1 Inference for.
R objects  All R entities exist as objects  They can all be operated on as data  We will cover:  Vectors  Factors  Lists  Data frames  Tables 
Data & Graphing vectors data frames importing data contingency tables barplots 18 September 2014 Sherubtse Training.
Copyright © 2009 Pearson Education, Inc. Slide 4- 1 Practice – Ch4 #26: A meteorologist preparing a talk about global warming compiled a list of weekly.
Frequency Distributions Chapter 2. Descriptive Statistics Distributions are part of descriptive statistics…we are learning how to describe some data by.
AP Statistics Review Day 1 Chapters 1-4. AP Exam Exploring Data accounts for 20%-30% of the material covered on the AP Exam. “Exploratory analysis of.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 12 More About Regression 12.1 Inference for.
Statistical Programming Using the R Language Lecture 2 Basic Concepts II Darren J. Fitzpatrick, Ph.D April 2016.
HW 19 Key. 22:37 Diamond Rings. The data table contains the listed prices and weights of the diamonds in 48 rings offered for sale in The Singapore Times.
Project 1 to 3. Project 1 (10 pts) (use the word document to enter results and answers – Save this file as Lname_BUA350_Cohort#_projects#.doc Go to Total.
CHAPTER 12 More About Regression
Exploring Data: Summary Statistics and Visualizations
Ggplot2 Wu Shaohuan.
Modeling in R Sanna Härkönen.
CHAPTER 12 More About Regression
QM222 Class 13 Section D1 Omitted variable bias (Chapter 13.)
ggplot2 Merrill Rudd TAs: Brooke Davis and Megsie Siple
Jonathan W. Duggins; James Blum NC State University; UNC Wilmington
Introduction to R Commander
DEPARTMENT OF COMPUTER SCIENCE
CHAPTER 12 More About Regression
R Assignment #4: Making Plots with R (Due – by ) BIOL
Ggplot2 I EPID 799C Mon Sep
Edexcel: Large Data Set Activities
Univariate Data Exploration
Lab 2 Data Manipulation and Descriptive Stats in R
CHAPTER 12 More About Regression
Simple and Multiple Regression
Independent Analysis Project
Stata Basic Course Lab 2.
Lecture 7 – Delivering Results with R
Amos Introduction In this tutorial, you will be briefly introduced to the student version of the SEM software known as Amos. You should download the current.
Project Assignment – Instructions
R course 6th lecture.
Stat 251 (2009, Summer) Lab 2 TA: Yu, Chi Wai.
CHAPTER 12 More About Regression
Getting started – Example 1
PowerPoint is for making high quality presentations
Displaying data Seminar 2.
Starter Activities GCSE Python.
DATA VISUALISATION (QUANTITATIVE).
Presentation transcript:

Summary Statistics in R Commander R Assignment #3: Summary Statistics in R Commander (Due Oct 3 – by email) BIOL 4090 - 6090 David Hyrenbach

Tasks For Week 5 Practice data summaries with RCmdr Install ggplot2 function Learn how to access data from a package Learn how to make simple plots with qplot

Assignment for Week 5 Clear Workspace and History. Install and run the Rcmdr package. Import datset “age.xlxs” file. Install and run the ggplot2 package. 5. Import datasets from ggplot2 package. 6. Perform summary stats / create figures. 7. Email me your assignment by Oct 3. (this exercise is worth 2.5 points)

Instructions Answer the questions in the spaces provided Save and import the figures you create into ppt; DO NOT take screen shots Once you are done, change the title of this file, by adding your name (e.g., “R_Assignment3_Hyrenbach.ppt”) 4. Email this file to the instructor at khyrenba@gmail.com, using the following title: “BIOL4090 – 6090: “R Assignment #3”

Questions / Answers 1a. Report the command that calculates kurtosis in R (make sure you include and explain all of the arguments used with this command) (+0.125):

Questions / Answers 1b. Use the R help to find out how the three kurtosis types (1, 2 and 3) are calculated in R. Which one is the default type used, unless specified? Copy and paste the meta-data for each type below (+0.125):

Questions / Answers 1c. Report the command that calculates skewness in R (make sure you include and explain all of the arguments used with this command) (+0.125):

Questions / Answers 1d. Use the R help to find out how the three kurtosis types (1, 2 and 3) are calculated in R. Which one is the default type used, unless specified? Copy and paste the meta-data for each type below (+0.125):

Questions / Answers 1e. Import the file age.xlsx and calculate the following summary statistics of “age” using R and RCmdr Summary Statistic (calculated using R) Value Skewness (type 1) Skewness (type 2) Skewness (type 3) Kurtosis (type 1) Kurtosis (type 2) Kurtosis (type 3) Summary Statistic (calculated with Rcmdr) Value Skew Kurtosis Report which Skewness type does RCmdr use: _____ Report which Kurtosis type does RCmdr use: _____ (+0.125)

Questions / Answers 2a. Using RCmdr, calculate the following summary statistics for the “age” dataset: Mean, SD, SE, IQR, CV, and 5 Quantiles (0, 25, 50, 75, 100) (+0.125).

Questions / Answers 2b. Using RCmdr make a Histogram and a Density plot of “age” (use gaussian kernel) and default settings (+0.125). Paste both figures below.

Questions / Answers 2c. Do the Histogram and Density plot of the “age” data set look like normal distributions? Explain Why / Why not? (+0.125).

Questions / Answers 3a. Install package ggplot2. List all the data sets attached to the ggplot2 package. Paste a screen capture of your console, showing the list of available data sets (+0.125).

Questions / Answers 3b. View the data sets attached to package ggplot2. Read the data from data set “diamonds”. Report how large is the “diamond” data set (how many rows and columns) (+0.125).

Questions / Answers 3c. Create a simple scatterplot of the diamond data set using the quickplot command (qplot). First of all, use qplot to create a scatterplot showing the relationship between the price and carats (weight) of a diamond, with this command: > qplot(carat, price, data = diamonds) Paste the scatterplot below. (+0.125).

Questions / Answers 3d. The plot shows a strong correlation with notable outliers. The relationship looks exponential, though, so the first thing we would like to do is to transform the variables. Because qplot() accepts functions of variables as arguments, plot log(price) vs. log(carat) with the command: > qplot(log(carat), log(price), data = diamonds) Paste the scatterplot below. (+0.125).

Questions / Answers 3e. Note that arguments can also be combinations of existing variables. If we are curious about the relationship between the volume of the diamond (approximated by x × y × z) and its weight, we could use the following command: > qplot(carat, x * y * z, data = diamonds) Paste the scatterplot below. (+0.125).

Questions / Answers 3f. Lets add some aesthetic attributes to the plot, like specific colors and symbols. Note that qplot can do this and will automatically provide a legend that maps the displayed attributes to the data values. This makes it easy to include additional data on the plot. Next, lets augment the plot of carat and price with new information about diamond color and cut, from the data set “dsmall”, using these commands: > qplot(carat, price, data = diamonds, colour = color) > qplot(carat, price, data = diamonds, shape = cut)

Questions / Answers 3f. Paste the two scatterplots on this slide. (+0.125).

Questions / Answers 3g. Report the command that you would use to plot the diamond shape data by symbol colour, rather than symbol shape, and paste the revised scatterplot, below (+0.125).

Questions / Answers 3h. You can also manually set the aesthetics attributes using I(). For example, to make a semi-transparent colour you can use the alpha aesthetic, which takes a value between 0 (completely transparent) and 1 (complete opaque). It’s often useful to specify the transparency as a fraction, e.g., 1/10 or 1/20, as the denominator specifies the number of points that must overplot to get a completely opaque colour. To start, make a default plot, using this command: > qplot(carat, price, data = diamonds, alpha = I(1))

Questions / Answers 3h. Try these two commands: > qplot(carat, price, data = diamonds, alpha = I(1/10)) > qplot(carat, price, data = diamonds, alpha = I(1/100)) Paste both scatterplots, below and explain what happens as you change I from 1/10 to 1/100 (+0.125).

Questions / Answers 3i. You can also manually set other aesthetics attributes using I(), e.g., colour = I("red") or size = I(2). To start, make a default plot, using this command: > qplot(carat, price, data = diamonds) Then, create the same scatterplot with large red symbols. Report the command you used here (+0.125):

Questions / Answers 3i. Paste the resulting scatterplot below (+0.125).

Questions / Answers 3j. qplot is not limited to scatterplots, but can produce almost any kind of plot by varying the geometric object (or geom), which describes the type of object being used to display the data. For example, the geom “smooth” enables you to investigate two-dimensional relationships: geom "smooth" fits a smoother to the data and displays the smooth and its standard error Notice that you can combined multiple geoms by creating a vector of geom names created with command c(). The geoms will be overlaid in the order in which they are listed.

Questions / Answers 3j. Try this command: > qplot(carat, price, data = diamonds, geom = c("point", "smooth")) Report the smoothing method used, as reported by R: Paste the resulting plot below (+0.125):

Questions / Answers 3k. Paste the help documentation for the geom_smooth, command. Report how many smoothing methods are available, and list their names (+0.125).