Intro to R What is covered in the Intro-R lab: Basic Math Data Types


1 Intro to R
What is covered in the Intro-R lab:
- Basic math
- Data types
- Vectors (and briefly, other data structures)
- Importing data (mostly read.csv, and briefly, other methods)
The “lecture” is not primarily these slides, but rather the Intro-R Project that you will build in lab.

2 Intro-R.zip You will find a link to this zip file on both the syllabus and on Sakai.

3 Arithmetic in R
Operator             Description                                        Example
x + y                y added to x                                       2 + 3 = 5
x - y                y subtracted from x                                8 - 2 = 6
x * y                x multiplied by y                                  3 * 2 = 6
x / y                x divided by y                                     10 / 5 = 2
x ^ y (or x ** y)    x raised to the power y                            2 ^ 5 = 32
x %% y               remainder of x divided by y (x mod y)              7 %% 3 = 1
x %/% y              x divided by y, rounded down (integer division)    7 %/% 3 = 2
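The operators in the table can be tried directly at the console. The sketch below is only an illustration: the variable names are made up here, and the last two lines add a negative-number case that is not on the slide, to show that %/% rounds down rather than toward zero.

```r
# Each line evaluates one operator from the arithmetic table.
sum_result   <- 2 + 3      # 5
diff_result  <- 8 - 2      # 6
prod_result  <- 3 * 2      # 6
quot_result  <- 10 / 5     # 2
pow_result   <- 2 ^ 5      # 32
pow_result2  <- 2 ** 5     # 32; ** is accepted as another spelling of ^
mod_result   <- 7 %% 3     # 1
idiv_result  <- 7 %/% 3    # 2

# Extra case (not on the slide): with a negative dividend,
# %/% still rounds down and %% takes the sign of the divisor.
neg_idiv <- (-7) %/% 3     # -3
neg_mod  <- (-7) %% 3      # 2
```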

4 Logical comparisons and operations
Operator     Description
<            less than
<=           less than or equal to
>            greater than
>=           greater than or equal to
==           exactly equal to
!=           not equal to
!x           not x
x | y        x OR y
x & y        x AND y
isTRUE(x)    test whether x is TRUE
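A short sketch of how these operators behave, using made-up values. One point worth seeing in action is that the comparison operators work element by element on vectors, whereas isTRUE() answers TRUE only for a single TRUE value.

```r
x <- 5
y <- 10

lt   <- x < y                  # TRUE
eq   <- x == y                 # FALSE
neq  <- x != y                 # TRUE
neg  <- !(x < y)               # FALSE
either <- (x < y) | (x > y)    # TRUE: at least one side is TRUE
both   <- (x < y) & (x > y)    # FALSE: both sides must be TRUE

# Comparisons are vectorised: one result per element.
v <- c(1, 5, 10)
at_least_five <- v >= 5        # FALSE TRUE TRUE

# isTRUE() is TRUE only for a single, non-missing TRUE value.
single_check <- isTRUE(x < y)          # TRUE
vector_check <- isTRUE(at_least_five)  # FALSE, because the input has length 3
```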

5 Constants
- Integer: use “L” after the number to ensure integer arithmetic, e.g. 3L
- Logical: TRUE and FALSE (can be shortened to T or F, but that is a bad idea, as T and F are really just variables!)
- Numeric
- Complex
- String: single or double quotes, and use escape characters for the usual suspects
- Special: NA, NULL, Inf, NaN
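The constant types can be inspected with class(), and the warning about T and F is easy to demonstrate. This is a small sketch with illustrative names; the T reassignment happens inside local() so that it does not leak into the rest of a session.

```r
# class() reports the type of each kind of constant.
int_class  <- class(3L)       # "integer"
num_class  <- class(3)        # "numeric"
cplx_class <- class(2 + 1i)   # "complex"
chr_class  <- class('text')   # "character"; single and double quotes are equivalent
lgl_class  <- class(TRUE)     # "logical"

# Escape characters: "\t" is one tab character, so this string has 3 characters.
tab_length <- nchar("a\tb")

# T is an ordinary variable and can be overwritten; TRUE cannot.
t_overwritten <- local({
  T <- FALSE
  isTRUE(T)                   # FALSE inside this block
})

# The special values each have their own test function.
special_checks <- c(is.na(NA), is.null(NULL), is.infinite(1 / 0), is.nan(0 / 0))
```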

6 Variables
my_var <- 4
Examples from:
- You can assign the value 4 to a variable my_var with the command my_var <- 4
- Within a data.frame, you use the format myDF$varName
- Or, if you attach(myDF), you can reference names without qualifying them

    # Three examples for doing the same computations (from
    mydata$sum <- mydata$x1 + mydata$x2
    mydata$mean <- (mydata$x1 + mydata$x2)/2

    attach(mydata)
    mydata$sum <- x1 + x2
    mydata$mean <- (x1 + x2)/2
    detach(mydata)

    mydata <- transform(mydata, sum = x1 + x2, mean = (x1 + x2)/2)
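To run the slide's examples you need a data frame with columns x1 and x2. The sketch below invents a tiny mydata for that purpose (the values are hypothetical) and checks that the $ form and the transform() form give the same result.

```r
# Hypothetical data: any data frame with numeric columns x1 and x2 would do.
mydata <- data.frame(x1 = c(1, 2, 3), x2 = c(10, 20, 30))

# Approach 1: qualify every column with the data frame name.
by_dollar <- mydata
by_dollar$sum  <- by_dollar$x1 + by_dollar$x2
by_dollar$mean <- (by_dollar$x1 + by_dollar$x2) / 2

# Approach 2: transform() evaluates the expressions inside the data frame.
by_transform <- transform(mydata, sum = x1 + x2, mean = (x1 + x2) / 2)

# Both approaches should produce the same columns and values.
same_result <- isTRUE(all.equal(by_dollar, by_transform))
```

The attach()/detach() form is left out of this sketch because it changes the search path of the whole session, which makes small examples harder to reason about.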

7 Getting Data Into R
- Use the Import Dataset button on the Environment pane
- The scan() function, for a quick copy and paste of a snippet
  - Copy from somewhere (e.g., an Excel file)
  - Type varName = scan()
  - In the console, press CTRL-V, then Enter on a blank line to end the input
  - For characters, use characters = scan(what = "character") (the default is double)
- fread() in the data.table package (install the package, then library(data.table))
  - Good speed
  - Good for large datasets
  - Improved csv importing, with good default settings
  - Output is a data.frame or a data.table (the default)
  - You can set data.table = FALSE
- And good old read.csv(). We’ll be using that most of the time
  - Remember to setwd() or it won’t find your data (or specify the entire path)
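A minimal sketch of the read.csv() and scan() routes. The file and its contents are invented for the example and written to a temporary location, so nothing here depends on your working directory. scan() is given its input through the text argument instead of the clipboard so that the example can run unattended.

```r
# Write a small, hypothetical CSV file to a temporary path.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("sex,weight", "F,55.2", "M,58.1", "F,54.7"), csv_path)

# read.csv() with a full path works regardless of the working directory.
weights <- read.csv(csv_path)

# scan() reads doubles by default; `what` switches it to character.
nums  <- scan(text = "1.5 2.5 4", quiet = TRUE)
chars <- scan(text = "a b c", what = "character", quiet = TRUE)

# Clean up the temporary file.
unlink(csv_path)
```

With a file in your project folder, the equivalent would be setwd() to that folder followed by read.csv() with just the file name.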

8 Intro to Graphics in R
Graphics classified by package
- R base graphics: built into R
  - Tools for drawing primitives and entire plots
  - Usually fast, but limited in scope
- grid
  - grid “grobs” (graphical objects) can be represented independently of the plot and modified later
- lattice
  - Based on grid, adding functionality such as legends, plot details, and conditional plotting
  - Lacks a formal model, so it is difficult to extend
- ggplot2: including qplot and ggplot (we will focus on this widely used package)
  - Also uses grid, but includes a formal model, so it is very extensible
  - Relies upon a “grammar of graphics”
- ggvis: intended as a successor to ggplot2, for web and interactive graphics
- htmlwidgets: a framework for interactive, web-based graphics in R
- And many, many other packages...

9 Intro to Graphics in R
Graphics classified by variable number and type
(This is one of the ways we’ll organize the graphics in our examples.)
Explore individual variable distributions:
- Area
- Density
- Dot plot
- Histograms
- Q-Q plots*
- Box plots
- Strip charts
- Violin plot
Univariate distributions (some “faceted” by another variable)
Some from Kassambara, Guide to Create Beautiful Graphics in R, sthda.com
* Q-Q plots are a special case. They plot two different distributions against each other. Often, however, they are used to plot one variable’s distribution against a theoretical one, e.g., normal.
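Both uses of a Q-Q plot mentioned in the footnote exist in base R: qqnorm() compares one sample against the normal distribution, and qqplot() compares two samples with each other. The data below are randomly generated stand-ins, and the pdf(NULL) line is only there so the sketch also runs without a screen; leave it (and dev.off()) out in RStudio to see the plots.

```r
# Hypothetical samples for illustration.
set.seed(1)
x <- rnorm(200, mean = 55, sd = 5)
y <- rexp(200)

pdf(NULL)                    # draw to a null device; omit when working interactively

# One sample against a theoretical normal distribution.
qq_normal <- qqnorm(x)
qqline(x)                    # reference line through the quartiles

# Two samples against each other.
qq_two <- qqplot(x, y)

invisible(dev.off())
```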

10 Intro to Graphics in R Graphics classified by variable number and type
Bivariate (and multi-variate) data

11 Intro to Graphics in R
Graphics classified by application
- Correlation
- Regression
- Support Vector Machines
- Neural Networks
- Naïve Bayes
- K-Means
- Apriori
- CART (gini)
- And many more…
We will not be focusing on these algorithms, although they may be part of examples for the purpose of visualization.

12 Intro to Graphics in R
Graphics classified by features
(We will demonstrate these things as part of other graphs.)
- Titles and legends
- Color
- Fonts
- Rotate text
- Margins
- Faceting
- Themes
- Annotation

13 Graphic Examples in Rbase
Examples in the Intro-R Project.
- Yes, you can do practically anything in Rbase
  - It can just be a little more complicated
- Our examples begin with one-variable plots
  - First, just very plain ones
  - Then, we add color and “facet” by other variables
  - Then, we look at an example that motivates using the more “formal” language of ggplot2

14 Some basic plots in Rbase
- hist: histograms
- plot: line and scatter plots
- barplot
- boxplot
Nice cheat sheet on Rbase Graphics
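A sketch of the four base functions on randomly generated data (the weight and sex vectors are made up and are not the lab's dataset). As above, pdf(NULL) only keeps the example runnable without a display and can be dropped in RStudio.

```r
# Hypothetical data: 50 weights per group, in kg.
set.seed(42)
weight <- c(rnorm(50, mean = 55, sd = 5), rnorm(50, mean = 58, sd = 5))
sex    <- factor(rep(c("F", "M"), each = 50))

pdf(NULL)                                   # null device; omit when working interactively

h <- hist(weight, main = "Histogram of weight", xlab = "Weight (kg)")
plot(weight, type = "l")                    # line plot of the values in order
plot(seq_along(weight), weight)             # scatter plot (points are the default type)
barplot(table(sex))                         # bar heights are the group counts
b <- boxplot(weight ~ sex)                  # one box per level of sex

invisible(dev.off())
```

hist() and boxplot() also return the numbers behind the picture (bin counts, group sizes, quartiles), which is handy for checking what was drawn.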

15 ggplot2 components
Aesthetics (aes)
  The aesthetics map your data to the graph, telling it what role each variable will play. Some variables will map to an axis; some will determine the color, shape, or size of a point in a scatterplot. Different groups might have differently shaped or colored points. The size or color of a point might reflect the magnitude of a third variable. Other variables might determine how to fill the bars of a bar chart with colors or patterns so that, for example, you can see the number of males and females within each bar.
Geoms
  Short for geometric objects; geoms determine the objects that will represent the data values. Possible geoms include: bar, boxplot, error-bar, histogram, jitter, line, path, point, smooth, and text.
Statistics
  Provide functions for features like adding regression lines to a scatterplot, or dividing a variable up into bins to form a histogram.
Scales
  Match your data to the aesthetic features, for example in a legend that tells us that triangles represent males and circles represent females.
Coordinate system
  For most plots this is the usual rectangular Cartesian coordinate system. However, for pie charts, it is the circular polar coordinate system.
Facets
  These describe how to repeat your plot for each subgroup, perhaps creating a separate scatterplot for males and females. A helpful feature with facets is that they standardize the axes on each plot, making comparisons across groups much easier.
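The components can be seen together in one plot specification. This is a sketch, not the lab's code: it assumes ggplot2 is installed, and it builds its own stand-in for the wdata dataset (sex and weight in kg) from random numbers. The colors are arbitrary.

```r
library(ggplot2)

# Assumption: a randomly generated stand-in for the lab's wdata.
set.seed(1234)
wdata <- data.frame(
  sex    = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, mean = 55, sd = 5), rnorm(200, mean = 58, sd = 5))
)

p <- ggplot(wdata, aes(x = weight, fill = sex)) +   # data and aesthetics
  geom_histogram(bins = 30) +                       # geom; its default stat bins the weights
  scale_fill_manual(values = c(F = "tomato", M = "steelblue")) +  # scale for the fill aesthetic
  coord_cartesian() +                               # coordinate system (the default, made explicit)
  facet_wrap(~ sex)                                 # facets: one panel per sex, shared axes
```

Typing p at the console (or calling print(p)) draws the plot.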

16 ggplot syntax: most basic components
- Data (example from the wdata dataset)
  - We randomly generated a dataset of M and F weights, in kg
  - Two variables: sex and weight
- Start with the function call: ggplot()
- Specify the dataset, wdata: ggplot(wdata) or ggplot(data=wdata)
- You must have an aesthetic (aes), so let’s plot the weight: ggplot(wdata, aes(x=weight))
  - For 2-dim plots, you must also specify the y
  - “aesthetic mappings between variables in the dataset and visual properties”
- At least one layer, which describes how to render each observation. These layers are created with a geom function
- If you want to assign this to a variable, you may. Or not.
  - Using a variable allows you to easily layer different geoms on top of this data definition
  - a = ggplot(wdata, aes(x=weight))
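The steps above can be sketched as follows. The wdata here is a randomly generated stand-in (an assumption, since the lab's own generation code is not shown on the slide), and ggplot2 is assumed to be installed.

```r
library(ggplot2)

# Assumption: a stand-in for wdata with the two variables named on the slide.
set.seed(1234)
wdata <- data.frame(
  sex    = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, mean = 55, sd = 5), rnorm(200, mean = 58, sd = 5))
)

# Data plus aesthetic mapping, stored so that it can be reused.
a <- ggplot(wdata, aes(x = weight))

# The same definition with two different geom layers on top.
density_plot   <- a + geom_density()
histogram_plot <- a + geom_histogram(bins = 30)
```

Printing a on its own gives an empty panel with a weight axis, because no layer has been added yet; printing density_plot or histogram_plot draws the data.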

17 ggplot syntax: adding geoms (shapes that are plotted)
Recall that: a = ggplot(wdata, aes(x=weight))
- Some geom functions require only an x dimension. Some, x and y.
- Some geoms work fine with no further arguments, under certain conditions
  - Most have default arguments (of stat, of color, scale, etc.)
  - a + geom_density() is sufficient
- Most have a default “stat” associated with them
  - For instance, geom_bar assumes the y-axis should be “count”. Change to “identity” to plot values, not counts
  - ggplot(wdata, aes(x=sex)) + geom_bar()
    This works fine: it displays the frequencies of M and F
  - ggplot(wdata, aes(sex, weight)) + geom_bar(stat="identity")
    This plots sex on the x axis, and the values (not counts) of weight (it displays the sums, actually)
  - Usually, you can specify overrides to the defaults
- Some geoms work with continuous data, some with discrete:
  - ggplot(wdata, aes(x=sex)) + geom_bar()
    This works fine.
  - ggplot(wdata, aes(x=weight)) + geom_bar()
    This doesn’t give a useful chart: every distinct weight gets its own bar. Bar charts do frequencies!!
- To further confuse: you can often specify the same chart in two different ways.
- But here is a great cheat sheet!
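The difference between the default “count” stat and “identity” can be checked on the computed layer data. As before, this sketch assumes ggplot2 is installed and uses a randomly generated stand-in for wdata.

```r
library(ggplot2)

# Assumption: a stand-in for wdata, 200 rows per sex.
set.seed(1234)
wdata <- data.frame(
  sex    = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, mean = 55, sd = 5), rnorm(200, mean = 58, sd = 5))
)

# Default stat = "count": one bar per sex, bar height = number of rows.
count_plot <- ggplot(wdata, aes(x = sex)) + geom_bar()
bar_counts <- layer_data(count_plot)$count

# stat = "identity": every row contributes its own weight, and the
# pieces are stacked, so each bar ends up as tall as the sum of weights.
sum_plot <- ggplot(wdata, aes(sex, weight)) + geom_bar(stat = "identity")
n_pieces <- nrow(layer_data(sum_plot))
```

As an instance of specifying the same chart in two ways, geom_col() is ggplot2's shorthand for geom_bar(stat = "identity").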

18 Some packages that we will be using
- plyr: an R package that makes it simple to split data apart, do stuff to it, and mash it back together. This is a common data-manipulation step. Importantly, plyr makes it easy to control the input and output data format from a syntactically consistent set of functions.
- gridExtra: we’ll use this to get more than one ggplot graph on a page
- HistData: a package with good sample datasets
- RColorBrewer: for color palettes
- data.table: to read in a text file
- SASxport: to read a SAS file and convert it to an R data file

19 Our “Textbook” website
Book: Kassambara, Alboukadel, “Guide to Create Beautiful Graphics in R”, sthda.com, 2013.
Online website, with practically the entire book:
And similar support: practical-guide-to-be-highly-effective-r-software-and-data-visualization

20 Links and Things to Google
- Linetypes
- Themes
- Shapes
- histogram-and-density-plots
- documentation
- Stack Overflow for all kinds of examples and debugging

21 In-Class Assignment—Intro R
Link on Sakai and/or syllabus. To be completed as homework or as in-class assignment, as announced in class and posted on Sakai.

