Common Linear & Classification Models for Machine Learning using Microsoft R

Common Linear & Classification Models for Machine Learning using Microsoft R
Venus Lin (Xiuqing Lin)
Lin.xiuqing@outlook.com | @azssugsqlpass
Assistant: Fred Benardella

Content
- Opening talks
- Machine learning
- RevoScaleR package: linear and classification
- iris example: linear model, prediction
- Visualization: what can we do in SSRS
- Decision tree and clustering quick demo
- Q & A

Yearn for the sea

Machine Learning
- What do you do with unlimited data?
- What is the right question to ask about this data?
- Accuracy of prediction
- Human-understandable formats
- Structured and unstructured data (text, image, & media)
- Example: the Netflix Prize awarded $1M in 2009 to the recommendation algorithm that beat Netflix's own Cinematch by 10%

RevoScaleR Package for Microsoft R
Scalable, distributed, and parallel computation. Available with Microsoft R Server and with in-database R Services. Functions carry the prefix rx.

Name       Description
rxLinMod   Linear regression
rxDTree    Decision tree
rxKmeans   K-means clustering
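
To see the rx-prefixed API at a glance, a minimal sketch (assuming RevoScaleR is installed, e.g. with Microsoft R Server or Microsoft R Client):

library(RevoScaleR)
rx_funs <- ls("package:RevoScaleR", pattern = "^rx")   # exported names starting with "rx"
length(rx_funs)                                        # how many rx functions are available
head(rx_funs)                                          # peek at the first few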

More about RevoScaleR

Linear Regression

In RevoScaleR Package   In Base R   Description
rxLinMod                lm          Linear regression
rxDTree                 rpart       Decision tree
rxKmeans                kmeans      Clustering

Work Flow
1. Develop in an R IDE – RStudio (a connectivity sketch follows below)
2. Create a stored procedure in SQL Server 2016
3. Visualize in SSRS
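
For step 1, a minimal sketch of developing against SQL Server data from RStudio with RevoScaleR's SQL Server data source; the connection string, table, and column names here are illustrative placeholders, not from the original demo:

library(RevoScaleR)
# Hypothetical connection string and table name; adjust to your environment
connStr <- "Driver=SQL Server;Server=localhost;Database=IrisDB;Trusted_Connection=Yes"
irisSql <- RxSqlServerData(connectionString = connStr, table = "dbo.Iris")
# rx* functions accept the SQL data source directly, so the same modelling code
# can later be wrapped in a SQL Server 2016 stored procedure
sqlModel <- rxLinMod(Petal_Length ~ Petal_Width, data = irisSql)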

Iris – Data Frame Study

A Typical Analytics Project Process
1. Problem definition: understand iris species via petal and sepal measurements
2. Data exploration: scatterplot & summary statistics (see the sketch below)
3. Data preparation: N/A for this dataset
4. Modelling: linear regression
5. Validation
6. Prediction
7. Implementation and tracking
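
A minimal base R sketch for the data exploration step, using only the built-in iris data frame:

# Structure and summary statistics of the built-in iris data
str(iris)        # 150 observations: 4 numeric measurements plus the Species factor
summary(iris)    # per-column quartiles and means, plus Species counts
pairs(iris[, 1:4], col = iris$Species,
      main = "Pairwise scatterplots of the iris measurements")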

Iris – Data Frame Study

plot(iris)

# iris[,3] is Petal.Length; iris[,4] is Petal.Width
plot(iris[, 4], iris[, 3], type = "p", pch = 16, cex = 1,
     xlab = "Petal.Width", ylab = "Petal.Length",
     main = "Iris", col = iris$Species)

Linear Regression: Some Basic Statistical Elements
- Dependent variable: iris[, 3] (Petal.Length)
- Independent variable (predictor): iris[, 4] (Petal.Width)
- Formula: Petal.Length = a + b * Petal.Width

How well does the model fit the data?

PPModel1 <- rxLinMod(Petal.Length ~ Petal.Width, data = iris, covCoef = TRUE)
PPModel1
summary(PPModel1)
names(PPModel1)

- R-squared: the percentage of the variance explained by the model
- Significance of the model (p-value)
- AIC / VIF / ANOVA / residual plot (a base R cross-check sketch follows below)
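
summary() on the rxLinMod object reports the coefficients and R-squared; for the remaining diagnostics in the list above, a minimal cross-check sketch with base R lm() (an illustrative addition, not part of the original demo):

# Fit the same model with base R lm() to pull the usual diagnostics
lmModel1 <- lm(Petal.Length ~ Petal.Width, data = iris)
summary(lmModel1)$r.squared      # proportion of variance explained
summary(lmModel1)$coefficients   # estimates, standard errors, t values, p-values
AIC(lmModel1)                    # Akaike information criterion
anova(lmModel1)                  # ANOVA table for the regression
par(mfrow = c(2, 2)); plot(lmModel1)   # residual and Q-Q diagnostic plots
# VIF needs more than one predictor (e.g. car::vif), so it is omitted here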

Linear Regression - rxLinMod

Petal.Length = a + b * Petal.Width
Petal.Length = 1.08356 + 2.22994 * Petal.Width

plot(iris[, 4], iris[, 3], type = "p", pch = 16, cex = 1,
     xlab = "Petal.Width", ylab = "Petal.Length",
     main = "Iris", col = iris$Species)
abline(PPModel1, lwd = 5, col = "red")
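
As a quick sanity check on the fitted line (the 1.5 cm input is just an illustrative value): a flower with Petal.Width = 1.5 is predicted to have Petal.Length of about 1.08356 + 2.22994 * 1.5 ≈ 4.43.

# The same check using the stored coefficients (intercept first, then the slope)
PPModel1$coefficients[1] + PPModel1$coefficients[2] * 1.5   # about 4.43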

Prediction

new1 <- data.frame(iris[, 3:4])
ppModel1_Predict <- rxPredict(PPModel1, data = new1, outData = new1,
                              computeResiduals = TRUE, interval = "prediction",
                              writeModelVars = TRUE, computeStdErrors = TRUE)
names(ppModel1_Predict)
# [1] "Petal.Length"        "Petal.Width"         "Petal.Length_Pred"
# [4] "Petal.Length_StdErr" "Petal.Length_Lower"  "Petal.Length_Upper"
# [7] "Petal.Length_Resid"

plot(iris[, 4], iris[, 3], type = "p", pch = 16, cex = 1,
     xlab = "Petal.Width", ylab = "Petal.Length",
     main = "Iris", col = iris$Species)
abline(PPModel1, lwd = 5, col = "red")
points(ppModel1_Predict[, 2], ppModel1_Predict[, 5], col = "green", type = "l", lwd = 4)
points(ppModel1_Predict[, 2], ppModel1_Predict[, 6], col = "green", type = "l", lwd = 4)
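
A small follow-up (not in the original demo) that turns the residual column shown above into a single accuracy number:

# Root-mean-square error from the residuals written by rxPredict
rmse <- sqrt(mean(ppModel1_Predict$Petal.Length_Resid^2))
rmse   # average prediction error, in the same units as Petal.Length (cm)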

Create a stored procedure in SQL Server Management Studio

SSRS Report

In RevoScaleR Package   In Base R   Description
rxLinMod                lm          Linear regression
rxDTree                 rpart       Decision tree
rxKmeans                kmeans      Clustering

Common Classification Model
rxDTree (from the RevoScaleR package) fits classification and regression trees.

Demo:
library(RevoScaleR)
library(RevoTreeView)
IrisTreeexample <- rxDTree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                           data = iris)
plot(createTreeView(IrisTreeexample))
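
For readers without RevoScaleR and RevoTreeView, a rough equivalent with rpart (the base/CRAN counterpart listed in the table above); an illustrative substitute, not part of the original demo:

library(rpart)
# Fit the same classification tree; "." means all remaining columns as predictors
irisRpart <- rpart(Species ~ ., data = iris, method = "class")
print(irisRpart)                  # text rules for each split
plot(irisRpart, margin = 0.1)     # tree skeleton
text(irisRpart, use.n = TRUE)     # split labels and class counts at the leaves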

In RevoScaleR Package   In Base R   Description
rxLinMod                lm          Linear regression
rxDTree                 rpart       Decision tree
rxKmeans                kmeans      Clustering

K-Means Clustering

iris_rxKmeans_3 <- rxKmeans(formula = ~ Petal.Length + Petal.Width, data = iris,
                            numClusters = 3, outFile = "iriscluster.xdf",
                            outColName = "Cluster")
iris_rxKmeans_3$cluster <- as.factor(iris_rxKmeans_3$cluster)
library(ggplot2)
ggplot(iris, aes(Petal.Width, Petal.Length, color = iris_rxKmeans_3$cluster)) +
  geom_point(size = 5) + ggtitle("After k-means with 3 clusters")
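
A small follow-up (not in the original deck) comparing the three clusters with the known species labels:

# Cross-tabulate cluster assignments against the true species
table(iris$Species, iris_rxKmeans_3$cluster)
# setosa typically ends up in its own cluster; versicolor and virginica overlap slightly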

plot(iris) – Sepal.Length / Sepal.Width

iris_rxKmeans <- rxKmeans(formula = ~ Sepal.Length + Sepal.Width, data = iris,
                          numClusters = 3, outFile = "iriscluster1.xdf",
                          outColName = "Cluster")
iris_rxKmeans$cluster <- as.factor(iris_rxKmeans$cluster)
library(ggplot2)
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = iris_rxKmeans$cluster)) +
  geom_point(size = 5) + ggtitle("After k-means with 3 clusters")
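
For comparison with base R (the kmeans counterpart from the table, and the theme of the Compare_kmeans_rxKmeans repository referenced below); the seed value is arbitrary:

# Base R k-means on the same two sepal columns
set.seed(42)                                        # arbitrary seed for reproducibility
iris_kmeans <- kmeans(iris[, c("Sepal.Length", "Sepal.Width")], centers = 3)
table(iris_kmeans$cluster, iris_rxKmeans$cluster)   # how the two assignments line up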

numClusters = 5
GitHub: tomaztk/Compare_kmeans_rxKmeans
Is it possible to use the RevoScaleR package in Power BI?
(From https://tomaztsql.wordpress.com/tag/revoscaler/)

Thank you, Sponsors!

Thank you!