Common Linear & Classification for Machine Learning using Microsoft R
Venus Lin (Xiuqing Lin)
Lin.xiuqing@outlook.com
@azssugsqlpass
Assistant: Fred Benardella
Content
- Opening talks
- Machine learning
- RevoScaleR Package
- Linear and classification
- iris Example
- Linear model
- Prediction
- Visualization: what can we do in SSRS
- Decision Tree and Clustering quick demo
- Q & A
Yearn for the sea
Machine Learning
- What do you do with unlimited data?
- What is the right question to ask about this data?
- Accuracy of prediction
- Human-understandable formats
- Structured and unstructured data (text, image, and media)
- The Netflix Prize: in 2009, $1M was awarded to the algorithm that beat Netflix's Cinematch recommender by 10%
RevoScaleR Package for Microsoft R
- Scalable, distributed, and parallel computation
- Available with Microsoft R Server and in-database R Services
- Function names carry the prefix rx
Name       Description
rxLinMod   Linear regression
rxDTree    Decision tree
rxKmeans   K-means clustering
More about RevoScaleR
Linear Regression
Name in RevoScaleR package   Name in base R   Description
rxLinMod                     lm               Linear regression
rxDTree                      rpart            Decision tree
rxKmeans                     kmeans           Clustering
Work Flow
- Develop in an R IDE (RStudio)
- Create a stored procedure in SQL Server 2016
- Visualize in SSRS
Iris – Data Frame Study
A Normal Analytics Project Process
- Problem definition: understand iris species by petal and sepal
- Data exploration: scatterplot and summary statistics
- Data preparation: N/A
- Modelling: linear regression
- Validation of data
- Prediction
- Implementation and tracking
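The data exploration step above can be sketched in a few lines. This is a minimal example using only base R and the built-in iris data frame; the variable names species_means and petal_cor are made up for illustration and do not come from the talk.

```r
# Base R only; iris ships with every R installation.
data(iris)

str(iris)      # 150 observations of 5 variables
summary(iris)  # per-column summary statistics and counts per species

# Mean of each measurement by species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)

# Correlation between the two petal measurements used in the linear model
petal_cor <- cor(iris$Petal.Length, iris$Petal.Width)
print(petal_cor)
```

A strong correlation here is what suggests a straight-line model of Petal.Length against Petal.Width is worth trying.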
plot(iris)
iris[,3] >>> Petal.Length
iris[,4] >>> Petal.Width
plot(iris[,4], iris[,3], type="p", pch=16, cex=1, xlab="Petal.Width", ylab="Petal.Length", main="Iris", col=iris$Species)
Linear Regression - Some basic statistical elements
- Dependent variable: iris[,3] >>> Petal.Length
- Independent variable (predictor): iris[,4] >>> Petal.Width
- Formula: Petal.Length = a + b*Petal.Width
- How well does the model fit the data?
PPModel1 <- rxLinMod(Petal.Length ~ Petal.Width, data = iris, covCoef = TRUE)
PPModel1
summary(PPModel1)
names(PPModel1)
  - R squared: proportion of the variance in Petal.Length explained by the model
  - Significance of the model (p-value)
  - AIC / VIF / ANOVA / residual plot
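To make a and b concrete, the least-squares estimates for a single predictor can be computed by hand: the slope is the covariance of the two variables divided by the variance of the predictor, and the intercept makes the line pass through the means. A minimal sketch in base R (the names x, y, a and b are illustrative only):

```r
x <- iris$Petal.Width    # independent variable (predictor)
y <- iris$Petal.Length   # dependent variable

# Ordinary least squares for one predictor
b <- cov(x, y) / var(x)          # slope
a <- mean(y) - b * mean(x)       # intercept

print(c(intercept = a, slope = b))
```

These should agree with the coefficients a model-fitting function reports for the same formula.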
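For readers without RevoScaleR installed, the same fit-quality checks can be tried with base R's lm, which the deck lists as the counterpart of rxLinMod. This is a sketch under that assumption, not the talk's own code; fit, r2, p_slope and aic are illustrative names.

```r
# Base R stand-in for the rxLinMod call
fit <- lm(Petal.Length ~ Petal.Width, data = iris)
s <- summary(fit)

r2 <- s$r.squared                                # share of variance explained
p_slope <- coef(s)["Petal.Width", "Pr(>|t|)"]    # significance of the slope
aic <- AIC(fit)                                  # for comparing candidate models

print(c(r.squared = r2, p.value = p_slope, AIC = aic))
print(anova(fit))                                # ANOVA table for the model

# Residuals should scatter around zero with no obvious pattern
print(summary(resid(fit)))
```

VIF needs at least two predictors, so it does not apply to this one-variable model.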
Linear Regression - rxLinMod
Petal.Length = a + b*Petal.Width
Petal.Length = 1.08356 + 2.22994*Petal.Width
plot(iris[,4], iris[,3], type="p", pch=16, cex=1, xlab="Petal.Width", ylab="Petal.Length", main="Iris", col=iris$Species)
abline(PPModel1, lwd=5, col="red")
Prediction
> new1 <- data.frame(iris[,3:4])
> ppModel1_Predict <- rxPredict(PPModel1, data = new1, outData = new1, computeResiduals = TRUE, interval = "prediction", writeModelVars = TRUE, computeStdErrors = TRUE)
> names(ppModel1_Predict)
[1] "Petal.Length" "Petal.Width" "Petal.Length_Pred"
[4] "Petal.Length_StdErr" "Petal.Length_Lower" "Petal.Length_Upper"
[7] "Petal.Length_Resid"
> plot(iris[,4], iris[,3], type="p", pch=16, cex=1, xlab="Petal.Width", ylab="Petal.Length", main="Iris", col=iris$Species)
> abline(PPModel1, lwd=5, col="red")
> points(ppModel1_Predict[,2], ppModel1_Predict[,5], col="green", type="l", lwd=4)
> points(ppModel1_Predict[,2], ppModel1_Predict[,6], col="green", type="l", lwd=4)
Create a stored procedure in SQL Server Management Studio
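The same idea can be tried in base R with predict and interval = "prediction". This sketch assumes lm as a stand-in for rxLinMod, and the three petal widths in new_widths are arbitrary values chosen for illustration.

```r
fit <- lm(Petal.Length ~ Petal.Width, data = iris)

# Hypothetical new flowers to score
new_widths <- data.frame(Petal.Width = c(0.5, 1.5, 2.5))

# Returns a matrix with columns fit (point prediction), lwr and upr
# (bounds of the 95% prediction interval)
pred <- predict(fit, newdata = new_widths, interval = "prediction", level = 0.95)

print(cbind(new_widths, pred))
```

The lwr and upr columns play the role of Petal.Length_Lower and Petal.Length_Upper in the rxPredict output.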
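When R runs inside SQL Server through sp_execute_external_script, the query result arrives as a data frame named InputDataSet and the script hands its result back as OutputDataSet (these are the default names). The sketch below shows only the R body such a procedure might wrap; it uses lm instead of rxLinMod so that it runs anywhere, and the first line is a local stand-in for the data SQL Server would supply.

```r
# Local stand-in: inside SQL Server this data frame is provided by the
# input query, so this line would not be part of the script.
InputDataSet <- iris[, c("Petal.Length", "Petal.Width")]

# --- body that would be passed as the R script ---
model <- lm(Petal.Length ~ Petal.Width, data = InputDataSet)

OutputDataSet <- data.frame(
  Petal.Width       = InputDataSet$Petal.Width,
  Petal.Length      = InputDataSet$Petal.Length,
  Petal.Length_Pred = as.numeric(predict(model, newdata = InputDataSet))
)

print(head(OutputDataSet))
```

Returning a plain data frame with named columns keeps the result easy to consume as a dataset in an SSRS report.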
SSRS Report
Common Classification Model
rxDTree {from RevoScaleR package}
Fit classification and regression trees
Demo
library(RevoScaleR)
library(RevoTreeView)
IrisTreeexample <- rxDTree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)
plot(createTreeView(IrisTreeexample))
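A comparable tree can be grown with rpart, which the deck lists as the counterpart of rxDTree. This is a sketch with default rpart settings, not a reproduction of the demo; tree_fit, confusion and accuracy are illustrative names.

```r
library(rpart)  # ships with standard R distributions

# Classification tree on all four measurements
tree_fit <- rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                  data = iris, method = "class")
print(tree_fit)  # text listing of the splits

# Score the training data and compare with the true species
pred_species <- predict(tree_fit, newdata = iris, type = "class")
confusion <- table(actual = iris$Species, predicted = pred_species)
print(confusion)

accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

Accuracy measured on the training data is optimistic; a held-out sample would give a fairer estimate.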
K-means clustering
iris_rxKmeans_3 <- rxKmeans(formula = ~Petal.Length + Petal.Width, data = iris, numClusters = 3, outFile = "iriscluster.xdf", outColName = "Cluster")
iris_rxKmeans_3$cluster <- as.factor(iris_rxKmeans_3$cluster)
library(ggplot2)
ggplot(iris, aes(Petal.Width, Petal.Length, color = iris_rxKmeans_3$cluster)) + geom_point(size = 5) + ggtitle("after K Means 3 cluster")
plot(iris) – Sepal.Length / Sepal.Width
iris_rxKmeans <- rxKmeans(formula = ~Sepal.Length + Sepal.Width, data = iris, numClusters = 3, outFile = "iriscluster1.xdf", outColName = "Cluster")
iris_rxKmeans$cluster <- as.factor(iris_rxKmeans$cluster)
library(ggplot2)
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = iris_rxKmeans$cluster)) + geom_point(size = 5) + ggtitle("after K Means 3 cluster")
numClusters = 5
GitHub: tomaztk/Compare_kmeans_rxKmeans
Is it possible to use RevoScaleR package in Power BI?
From https://tomaztsql.wordpress.com/tag/revoscaler/
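The clustering can also be tried with base R's kmeans, which the deck lists as the counterpart of rxKmeans. In this sketch the seed and nstart = 20 are arbitrary choices made for repeatable results; they are not taken from the talk.

```r
set.seed(42)  # k-means starts from random centers, so fix the seed

petals <- iris[, c("Petal.Length", "Petal.Width")]
km <- kmeans(petals, centers = 3, nstart = 20)

cluster <- as.factor(km$cluster)

# Cross-tabulate clusters against the known species
print(table(species = iris$Species, cluster = cluster))
print(km$centers)
```

Cluster numbers are arbitrary labels, so the table shows how well the groups line up with species rather than which number maps to which species.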
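One common way to judge whether 3 or 5 clusters is more reasonable is to compare the total within-cluster sum of squares across several values of k. The sketch below uses base R's kmeans on the sepal measurements; the range 2 to 6, the seed, and nstart are assumptions made for illustration.

```r
set.seed(42)

sepals <- iris[, c("Sepal.Length", "Sepal.Width")]
ks <- 2:6

# Total within-cluster sum of squares for each candidate k
wss <- sapply(ks, function(k) {
  kmeans(sepals, centers = k, nstart = 20)$tot.withinss
})

print(data.frame(k = ks, tot.withinss = wss))
```

The value always shrinks as k grows, so the useful signal is where the improvement levels off rather than where it is smallest.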
Thank you Sponsors!
Thank you!