Intro to R & MS Data Science Tools


1 Intro to R & MS Data Science Tools
Code Like a Pirate … Intro to R & MS Data Science Tools Jamey Johnston, Data Scientist

2 Jamey Johnston, Data Scientist
Data Scientist for an O&G Company
20+ years DBA Experience
TAMU MS in Analytics
LSU BS in Spatial Analysis
Semi-Pro Photographer
( Blog at /jameyj @STATCowboy

3 Agenda
By the end of this presentation, attendees should possess the necessary skills to begin their journey in R!
Intro to R
Why R?
R, RStudio & R Tools for Visual Studio
Basics
Objects in R
Packages
Control Flows
MS and R – Azure ML, R Server, SQL 2016 & PBI
Resources
Put it all together – Regression Demo

4 Why R?
R is Free!!!!
Graphics and Data Viz
I typically use Power BI, Spotfire or SAS JMP
Flexible Statistical Analysis Toolkit
Very Powerful Open Source Community
Microsoft is investing in R (Revolution Analytics)

5 R, RStudio & R Tools for Visual Studio
R Project for Statistical Computing
RStudio
R Tools for Visual Studio
MS R Client

6 RStudio
Run Options
Ctrl+Enter (run the current line or selection)
Ctrl+Alt+R (run the whole script)
Built-In Docs
Version Control
Projects

7 RStudio Debugging
Breakpoints (Shift+F9)
R Functions: browser(), debugonce()
Environment Pane
Traceback (Call Stack)

8 RStudio Debugging Console
Step into function (Shift+F4)
Finish Function (Shift+F6)
Continue Running (Shift+F5)
Stop Debugging (Shift+F8)

9 Basics
Comment
> # Basics
Variable Creation
> m <- 3 * 5
> m
[1] 15
Help
> help("lm") # lm is the function for Fitting Linear Models
> ?lm

10 Objects in R
Variables, Values, Commands, Functions … Everything in R is an Object
Typical Data in R is stored in:
Vectors (one row, same data type)
Matrices (multiple rows, same data type)
Data Frames (multiple rows, multiple data types). It's like a Table!
Lists (collection of objects)
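The four containers listed above can be sketched in a few lines. This is a minimal illustration, not part of the original deck; the variable names and sample values are made up for the example.

```r
# A vector: every element has the same type
v <- c(2, 3, 1.5)

# A matrix: rows and columns, every element has the same type
m <- matrix(1:6, nrow = 2, ncol = 3)

# A data frame: columns may differ in type, like a table
df <- data.frame(name = c("a", "b", "c"), value = v, stringsAsFactors = FALSE)

# A list: a collection of arbitrary objects
l <- list(vec = v, mat = m, frame = df)

# Type checks confirm what kind of object each one is
print(is.numeric(v))      # TRUE
print(is.matrix(m))       # TRUE
print(is.data.frame(df))  # TRUE
print(is.list(l))         # TRUE
```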

11 Vector
Building Blocks for data objects in R (index starts at 1!)
c (combine) function creates a Vector
v <- c(2, 3, 1.5, 3.1, 49)
seq function generates numeric sequences
s <- seq(from = 0, to = 100, by = .1)

12 Vector
rep function replicates values
r <- rep(c(1,4), times = 4)
: creates a number sequence incremented by 1 or -1
colon <- 1:10
length(var) returns the length of a vector
length(colon)
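Because indexing starts at 1, element access looks slightly different from most other languages. The sketch below reuses the vector from the slides; the result variable names are only illustrative.

```r
v <- c(2, 3, 1.5, 3.1, 49)

first    <- v[1]          # 2: the first element is at index 1, not 0
middle   <- v[2:4]        # 3.0 1.5 3.1: a range of positions
no_first <- v[-1]         # a negative index drops that position
last     <- v[length(v)]  # 49: length() gives the index of the last element

print(first)
print(middle)
print(no_first)
print(last)
```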

13 Matrix
matrix function builds a matrix
rbind (row bind) and cbind (column bind) combine matrices by row or column
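A small sketch of matrix, rbind and cbind, using made-up values, shows how the dimensions change:

```r
# Fill a 2 x 3 matrix column by column (the default)
m <- matrix(1:6, nrow = 2, ncol = 3)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# byrow = TRUE fills row by row instead
m_by_row <- matrix(1:6, nrow = 2, byrow = TRUE)

# rbind adds a row: the result is 3 x 3
with_row <- rbind(m, c(7L, 8L, 9L))

# cbind adds a column: the result is 2 x 4
with_col <- cbind(m, c(10L, 11L))

print(dim(with_row))  # 3 3
print(dim(with_col))  # 2 4
```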

14 Data Frame
It is like a table!
rownames – extract row labels
colnames – extract column labels
names – set names of column headers
read.table, read.csv, readxl, RODBC – different ways to create data frames
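The functions on this slide can be tried on a small, hypothetical data frame. The column names and values below are invented for illustration, and read.csv is given inline text instead of a file path so the sketch runs without any external file.

```r
# Build a data frame directly; each column can hold a different type
df <- data.frame(
  well  = c("A-1", "B-2", "C-3"),   # hypothetical sample data
  depth = c(1200, 950, 1430),
  stringsAsFactors = FALSE
)

print(rownames(df))  # "1" "2" "3"
print(colnames(df))  # "well" "depth"

# Assigning to names() renames the column headers
names(df) <- c("well_id", "depth_ft")

# read.csv normally takes a file path; text = lets the sketch run without one
csv_df <- read.csv(text = "id,score\n1,0.5\n2,0.9")
print(nrow(csv_df))  # 2
```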

15 List
Combine multiple object types into one object: vectors, matrices, data frames, lists, functions
Great for splitting data frames to reduce for loops (demo later!)
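One way to split a data frame into a list and avoid an explicit for loop is split() followed by sapply(). This is a sketch using the built-in mtcars data, not necessarily the approach shown in the live demo.

```r
# split() breaks a data frame into a list of data frames, one per group
by_cyl <- split(mtcars, mtcars$cyl)
print(names(by_cyl))  # "4" "6" "8"

# sapply() applies a function to each list element, replacing a for loop
mean_mpg <- sapply(by_cyl, function(d) mean(d$mpg))
print(round(mean_mpg, 1))
```

Each element of by_cyl is itself a data frame, so any function that accepts a data frame can be applied to it.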

16 Missing Data NA is used to represent Missing Data The is.na and which functions are used to manage NA

17 Missing Data
> x <- c(1.3,2.3,3.4,NA)
> print(x)
[1] 1.3 2.3 3.4  NA
> # Returns integer location of values (not the values)
> n <- which(is.na(x))
> v <- which(!is.na(x))
> print(n)
[1] 4
> print(v)
[1] 1 2 3

18 Missing Data
> # y will be set to the values that are not NA
> y <- x[!is.na(x)]
> print(y)
[1] 1.3 2.3 3.4

19 Packages
Add-ons for R
library() – list packages already installed
install.packages(c("dplyr", "ggplot2")) – install new packages
library(dplyr) – load a package to be used in R
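A common pattern is to install a package only when it is missing and then load it. The helper below is a hypothetical sketch (ensure_package is not a base R function); it assumes a CRAN mirror and network access are available whenever an install is actually needed.

```r
# Hypothetical helper: load a package, installing it first only if it is missing
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs a configured CRAN mirror and network access
  }
  # character.only = TRUE because pkg is a string, not a bare name
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call never triggers an install
ensure_package("stats")
```

The same call with "dplyr" or "ggplot2" would install the package on first use.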

20 Conditional Operators
Comparisons return a logical vector
> x <- 2
> x > 1
[1] TRUE
> 1:10 == 2
[1] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> 1:10 != 2
[1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

21 Conditional Operators
> 1:10 > 2
[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> 1:10 >= 2
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> 1:10 < 2
[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> 1:10 <= 2
[1] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

22 Logical Operations
> x <- 1:4
> x
[1] 1 2 3 4
> (x > 2) | (x <= 3)
[1] TRUE TRUE TRUE TRUE
> (x > 2) & (x <= 3)
[1] FALSE FALSE TRUE FALSE

23 Logical Operations
> xor((x > 2), (x < 4))
[1] TRUE TRUE FALSE TRUE
> 0:5 %in% x
[1] FALSE TRUE TRUE TRUE TRUE FALSE
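Logical vectors like these are most often used to subset data. The sketch below, with illustrative variable names, shows a logical vector inside square brackets keeping only the positions that are TRUE.

```r
x <- 1:10

# A logical vector inside [ ] keeps only the TRUE positions
big    <- x[x > 7]                  # 8 9 10
mid    <- x[x > 2 & x <= 5]         # 3 4 5
picked <- x[x %in% c(2, 4, 11)]     # 2 4 (11 is not in x)

# sum() counts the TRUE values in a logical vector
n_even <- sum(x %% 2 == 0)          # 5

print(big)
print(mid)
print(picked)
print(n_even)
```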

24 Control Flows
IF … ELSE
x <- 4
if (x < 3) print("true") else print("false")
ifelse(x < 3, "true", "false") # vectorized form; returns the chosen value
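The difference between if/else and ifelse() is easiest to see on a vector with more than one element. A minimal sketch with made-up values:

```r
x <- c(1, 4, 2, 7)

# ifelse() is vectorized: it returns one result per element
labels <- ifelse(x < 3, "small", "large")
print(labels)  # "small" "large" "small" "large"

# if/else expects a single TRUE or FALSE, so test one element at a time
first_label <- if (x[1] < 3) "small" else "large"
print(first_label)  # "small"
```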

25 Control Flows
FOR Loops
for (i in 1:10) print(1:i)
for (i in seq_len(nrow(df))) print(df[i,]) # seq_len is safe when df has zero rows
break and next …
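break and next can be sketched with a short loop: next skips to the following iteration, while break leaves the loop entirely. The example values are arbitrary.

```r
evens <- c()
for (i in 1:10) {
  if (i %% 2 == 1) next   # skip odd numbers and move to the next iteration
  if (i > 8) break        # leave the loop entirely once i passes 8
  evens <- c(evens, i)
}
print(evens)  # 2 4 6 8

# seq_len() yields an empty sequence for zero rows, so the body never runs;
# 1:nrow(empty) would instead loop over 1 and 0
empty <- data.frame()
iterations <- 0
for (i in seq_len(nrow(empty))) iterations <- iterations + 1
print(iterations)  # 0
```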

26 Control Flows
WHILE Loops
i <- 1
while (i <= 10) {
  print(i)
  i <- i + 1
}

27 R in MS Tools Demos

28 Azure ML Demo Azure Machine Learning R Integration

29 MS R Server Demo
Enterprise Class R
Built on Revolution Analytics Acquisition
SQL Server 2016 R Support via R Server
Source: Microsoft Website (URL above)

30 SQL 2016 and R
Leverages the MS R Server
Setup and Installation
Set up SQL Server R Services (In-Database)
Upgrade and Installation FAQ (SQL Server R Services)
Differences in R Features between Editions of SQL Server

31 SQL 2016 and R
SQL Server R Services Tutorials
DEMO - iris-sepal-example.sql
sp_execute_external_script (Transact-SQL)

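32 SQL 2016 and R
sp_execute_external_script
    @language = N'language',
    @script = N'script'
    [ , @input_data_1 = 'input_data_1' ]
    [ , @input_data_1_name = N'input_data_1_name' ]
    [ , @output_data_1_name = 'output_data_1_name' ]
    [ WITH <execute_option> [ ,...n ] ]
[;]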
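The R code passed to sp_execute_external_script typically reads its input from one data frame and assigns its result to another; by default these are named InputDataSet and OutputDataSet. The sketch below simulates that contract outside SQL Server with the built-in iris data. It is an assumption about the general shape of such a script, not the contents of iris-sepal-example.sql.

```r
# Inside SQL Server the query result arrives as InputDataSet (the default name).
# Here the built-in iris data stands in for it so the sketch runs anywhere.
InputDataSet <- iris[, c("Sepal.Length", "Sepal.Width", "Species")]

# The lines below are the kind of code that would go in the script argument.
# Whatever is assigned to OutputDataSet (the default name) is returned to SQL Server.
OutputDataSet <- aggregate(
  cbind(Sepal.Length, Sepal.Width) ~ Species,
  data = InputDataSet,
  FUN = mean
)

print(OutputDataSet)
```

Keeping the script body free of SQL-specific calls makes it possible to develop and test it in an ordinary R session first.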

33 Power BI Demo
Running R Scripts in Power BI Desktop
Use your own R IDE now!
Demo – mtcars.pbix
Options Needed

34 Resources
UCLA idre
R-Bloggers (sign up for daily email)
Quick-R
R in Action (book to go with website)

35 Resources
Hadley Wickham
Data Camp
Coursera
MS Academy

36 R in Action Regression Demo
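The demo itself is not included in the transcript. As a stand-in, here is a minimal regression sketch with lm() on the built-in mtcars data; the choice of variables (mpg against wt) and the new_car value are assumptions made for illustration.

```r
# Fit a simple linear model: fuel economy as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)

print(coef(fit))               # intercept and slope
print(summary(fit)$r.squared)  # share of variance explained

# Predict mpg for a hypothetical car weighing 3,000 lb (wt is in 1,000 lb units)
new_car <- data.frame(wt = 3)
print(predict(fit, newdata = new_car))
```

summary(fit) prints the full coefficient table, and plot(fit) draws the standard diagnostic plots.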

37 Questions? Thank you for attending!

38 Thank You Learn more from Jamey Johnston
@STATCowboy

