Introduction to R Las Vegas 2015 James McCaffrey Microsoft Research, Advanced Development Tuesday, October 27, 2015 2:15 - 3:30 PM devintersection.com.

Agenda
  What is R?
  Why consider learning R?
  Three R Development Environments
  Examples of R vs. C#
  Summary, Resources, Q&A

What is R?
  R is a scripting language, plus an interactive shell environment, plus a large library of math functions.
  R is open source and has strong support from key players in industry, research, government, and academia.

What is R - The Hello World of R
  > setwd("C:\\IntroToR")
  >
  > t <- read.table("Income.txt", header=TRUE, sep=",")
  >
  > head(t, n=3)
    Occupation Age Tech Income
  1 Developer
  2 Developer
  3 Developer
  >
  > m <- lm(t$Income ~ (t$Occupation + t$Age + t$Tech))
  >
  > summary(m)
  Call:
  lm(formula = t$Income ~ (t$Occupation + t$Age + t$Tech))
  Residuals:
  Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
  (Intercept)
  t$OccupationManager
  t$OccupationQuality                                      *
  t$Age
  t$Tech                                                   *
  ---
  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  Residual standard error: on 3 degrees of freedom
  Multiple R-squared: , Adjusted R-squared:
  F-statistic: 20.6 on 4 and 3 DF, p-value:
  >
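
The same kind of model can be written more compactly by passing the data frame through the `data` argument of lm() instead of repeating the `t$` prefix. The following is a minimal sketch, not the talk's code: the `income` data frame and every value in it are invented stand-ins for Income.txt, chosen only so that the model has the same shape as on the slide (eight rows, five coefficients).

```r
# Hypothetical stand-in for Income.txt; all values are invented for illustration.
income <- data.frame(
  Occupation = factor(c("Developer", "Developer", "Developer", "Manager",
                        "Manager", "Quality", "Quality", "Quality")),
  Age    = c(28, 35, 41, 45, 52, 30, 38, 47),
  Tech   = c(8, 7, 9, 4, 3, 6, 5, 7),
  Income = c(72, 80, 95, 98, 110, 60, 66, 78)
)

# Same model shape as the slide, using data= instead of t$ prefixes.
model <- lm(Income ~ Occupation + Age + Tech, data = income)

print(head(income, n = 3))  # first three rows, as on the slide
print(summary(model))       # coefficient table, R-squared, F-statistic
```

With `data = income`, the coefficient names come out as `OccupationManager`, `Age`, and so on, rather than carrying the `t$` prefix seen in the slide output.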

Why R?
  The most common language for Data Science
  Microsoft acquisition of Revolution Analytics (Revolution R)
  Microsoft Azure ML and ML Studio
  Consulting aspect
  Big Data and little data (and IoT?)
  Relatively easy to learn*
  R Consortium

Installing the Base R Environment

Launching R – Start Menu (Rgui.exe)

Launching R – File Explorer (Rterm.exe)

The RStudio Environment

The Revolution R (Microsoft) Environment

R vs. C# (the t-Test)
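
The slide content for this comparison did not survive in the transcript. As a rough illustration of the R side only, a two-sample t-test is a single call to the built-in t.test() function. This is a sketch under assumptions: the two score vectors are hypothetical and are not the data used in the talk.

```r
# Hypothetical scores for two groups; invented for illustration only.
group_a <- c(78, 87, 79, 82, 87, 89, 88)
group_b <- c(81, 69, 73, 78, 74, 75, 69)

# Two-sample t-test. By default t.test() runs the Welch variant,
# which does not assume the two groups have equal variances.
result <- t.test(group_a, group_b)

cat("t statistic:", result$statistic, "\n")
cat("p-value:    ", result$p.value, "\n")
```

The returned object also carries the confidence interval (`result$conf.int`) and the group means (`result$estimate`), so no further computation is needed to report them.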

R vs. C# (LDA Analysis)
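
Assuming "LDA" here means linear discriminant analysis, the R side of this comparison could look like the sketch below. It is not taken from the talk: it uses lda() from the MASS package (normally bundled with R) and the built-in iris data set as a stand-in for whatever data the slide used.

```r
library(MASS)  # provides lda(); MASS normally ships with R

# Fit LDA on the built-in iris data: predict Species from the four measurements.
fit <- lda(Species ~ ., data = iris)

# Classify the training rows and tabulate predicted vs. actual species.
pred <- predict(fit, iris)$class
print(table(predicted = pred, actual = iris$Species))
```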

R vs. C# (Graphing)
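
The graph from this slide is not in the transcript. As a small illustration of how little code a basic graph takes in R, the sketch below draws a sine curve with base graphics. The function being plotted and the temporary output file are assumptions made for the example, not details from the talk.

```r
# Points along a sine curve.
x <- seq(-pi, pi, length.out = 100)
y <- sin(x)

# Write the plot to a temporary PDF (hypothetical output location).
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 4)
plot(x, y, type = "l", col = "blue", lwd = 2,
     main = "y = sin(x)", xlab = "x", ylab = "y")
abline(h = 0, lty = 2)   # dashed horizontal reference line at y = 0
invisible(dev.off())     # close the device so the file is written

cat("Plot written to", out_file, "\n")
```

In an interactive session the pdf() and dev.off() calls can be dropped, and plot() will draw directly to the screen.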

Programming using R and OOP
  # file CarClass.R
  require("R6")
  Car <- R6Class("Car",
    public = list(
      make = NULL,
      price = NULL,
      initialize = function(ma, pr) {
        self$make <- ma
        self$price <- pr
      },
      setMake = function(ma) {
        self$make <- ma
      },
      # setPrice = function(pr) { self$price <- pr },
      display = function() {
        cat("Make = ", self$make, " Price = ", self$price, "\n")
      }
    )
  )
  > source("CarClass.R")
  >
  > myCar <- Car$new("Audi", 40000)
  >
  > myCar$display()
  Make = Audi Price =
  >
  > myCar$setMake("BMW")
  > myCar$price =
  >
  > print(myCar)
  Public:
    display: function
    initialize: function
    make: BMW
    price:
    setMake: function
  >

R vs. C# (Packages, Libraries, Scripts)
  An R package is a collection of one or more files that contain R functions
  An R library is 1.) R terminology for the location of a package, or 2.) a DLL (on Windows)
  The install.packages() command installs an R package
  The library() command loads an R package for use
  An R script is a set of R commands
  R has basic control structures (if – else, for, while, repeat) and four different OOP paradigms
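
The points above can be sketched as a short R script. This is illustrative only: the `classify` helper and the loop values are hypothetical, and the package lines are commented out so the script runs without network access.

```r
# Install a package once (needs network access), then load it in each session.
# install.packages("R6")
# library(R6)

# if / else inside a small helper function (hypothetical example).
classify <- function(n) {
  if (n %% 2 == 0) "even" else "odd"
}

# for loop over a vector of integers.
for (i in 1:3) {
  cat(i, "is", classify(i), "\n")
}

# while loop: sum the integers 1..10.
total <- 0
k <- 1
while (k <= 10) {
  total <- total + k
  k <- k + 1
}

# repeat loop: runs until an explicit break.
count <- 0
repeat {
  count <- count + 1
  if (count >= 5) break
}

cat("total =", total, " count =", count, "\n")
```

Saved to a file, a script like this can be run from the R console with source(), in the same way the slide on OOP loads CarClass.R.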

Alternatives to R
  MATLAB – very pricey
  Mathematica – pricey
  Scilab, Octave – open source alternatives to MATLAB
  SAS – very pricey
  SPSS (IBM) – very pricey
  Python – general purpose (with SciPy library)

Your Four Possible Roles with R
  Use R in interactive mode for ad hoc data analysis
  Act as a data expert to help an R consultant
  Write R scripts to automate recurring data analysis
  Write R code to create custom data analysis

Summary
  R is the deeply entrenched default language for “Data Science”
  RStudio is the most common optional environment
  Understanding statistics* is the key to R
  C# is general purpose, R is domain specific
  Best examples for R are chaotic Web pages

Resources
  McCaffrey, J., “Introduction to R for C# Programmers”, Microsoft MSDN Magazine, July 2015 (vol. 30, no. 7)
  McCaffrey, J., “Introduction to R for .NET Developers”, Visual Studio Magazine, December 2015 (vol. 25, no. 12)

Thank You!