Presentation is loading. Please wait.

Presentation is loading. Please wait.

All about Revolution R Enterprise

Similar presentations


Presentation on theme: "All about Revolution R Enterprise"— Presentation transcript:

1

2 All about Revolution R Enterprise
Hong Ooi DAT335B

3 Why Revolution R? Revolution R architecture DeployR Short demos Roadmap

4 Why Revolution R?

5 R usage continues to grow
Implications for talent acquisition/large analytics organisations R Usage Growth Rexer Data Miner Survey, 70% of data miners report using R 24% use R as primary tool Source:

6 R usage continues to grow
Implications for talent acquisition/large analytics organisations

7 Limitations of open source R
R requires skilled resource to scale out computations across a cluster R needs data in memory R is single threaded R is supported by the open source community – Revolution provides commercial support Revolution R Enterprise solves these problems!

8 Typical Revolution Analytics users

9 Why customers choose Revo
R TALENT WAVE R skills coming from Universities / commercial R adopters in volume MULTI-PLATFORM Code Once. Deploy Anywhere – workstation to server to cluster to Hadoop and Teradata Microsoft and Red Hat Linux INNOVATION AT SPEED Faster model development Much faster production “run” Access all R innovation VALUE v COST Technical Support Less software cost Less hardware More flexible commercials

10 Revolution R vs open source R
No RAM Limits Open source R exceeds RAM and fails RRE scales linearly well beyond RAM limits Faster Algorithms As data grows RRE optimization becomes apparent File Name Compressed File Size (MB) No. Rows Open Source R (secs) Revolution R Tiny 0.3 1,235 0.00 0.05 V. Small 0.4 12,353 0.21 Small 1.3 123,534 0.03 Medium 10.7 1,235,349 1.94 0.08 Large 104.5 12,353,496 60.69 0.42 Big (full) 12,960.0 123,534,969 Memory! 4.89 V.Big 25,919.7 247,069,938 9.49 Huge 51,840.2 494,139,876 18.92 Public US Flight Data Linear Regression on Arrival Delay Run on 4 core laptop, 16GB RAM and 500GB SSD

11 Revolution R vs SAS Independent consulting firm set up cluster environment – 5 x 4 core machines running CentOS SAS 9.4: Base SAS, SAS/STAT, Grid Mgr Revolution R Enterprise ScaleR, with IBM Platform LSF, Platform MPI Release 9 Data set: 591 columns and 5,000,000 rows Tests run sequentially with no other operations running on the cluster Data preparation steps were of similar performance Scoring test on 50m rows Linear Regression model was 6 x faster

12 Revolution R architecture

13 The Revo product suite Revolution R Open Revolution R Enterprise
Free and open source R distribution Enhanced and distributed by Revolution Analytics Revolution R Open Secure, Scalable and Supported Distribution of R With proprietary components created by Revolution Analytics Revolution R Enterprise

14 Revolution R Enterprise (RRE)
High-performance open source R plus: Data source connectivity to big-data objects Big-data advanced analytics Multi-platform environment support In-Hadoop and in-Teradata predictive modeling Development and production environment support IDE for data scientist developers Secure, Scalable R Deployment Technical support, training and services R+CRAN Revolution R Open DistributedR DeployR DevelopR ScaleR ConnectR

15 The RRE platform ConnectR R+CRAN DistributedR RevoR ScaleR
High-speed & direct connectors Available for: High-performance XDF SAS, SPSS, delimited & fixed format text data files Hadoop HDFS (text & XDF) Teradata Database & Aster EDWs and ADWs ODBC R+CRAN Open source R interpreter R 3.2.2 Freely-available huge range of R algorithms Algorithms callable by RevoR Embeddable in R scripts 100% Compatible with existing R scripts, functions and packages ConnectR R+CRAN RevoR DevelopR DeployR ScaleR DistributedR DistributedR Distributed computing framework Delivers cross-platform portability Available on: Windows Servers Red Hat and SuSE Linux Servers Teradata Database Cloudera Hadoop Hortonworks Hadoop MapR Hadoop RevoR Performance enhanced R interpreter Based on open source R Adds high-performance math library to speed up linear algebra functions ScaleR Range of predictive functions Ready-to-Use high- performance big data big analytics User tools for distributing customized R algorithms across nodes Fully-parallelized analytics Wide data sets supported – thousands of variables Data prep & data distillation Descriptive statistics & statistical tests

16 Your choice of frontends

17 RRE: scaling to big data
Data “chunking” alleviates memory limits Volume limited only by storage capacity

18 RRE: scaling to big data
In an Xdf file (local)

19 RRE: scaling to big data
In Teradata VAMPs Teradata Database ODBC Revolution R Enterprise Data Segments Database Nodes Hybrid Storage Parse Engine External Stored Procedure Table Operator Desktops & Servers

20 RRE: scaling to big data
In Hadoop Master node Job tracker Slave node Task tracker Slave node Task tracker Slave node Task tracker

21 RRE: distributed compute
No data movement Setting the compute context determines where processing takes place VAMPs Teradata Database ODBC Revolution R Enterprise Data Segments Database Nodes Hybrid Storage Parse Engine External Stored Procedure Table Operator Desktops & Servers

22 Local compute context ### LOCAL COMPUTE CONTEXT ###
rxSetComputeContext("local") ### CREATE DIRECTORY AND FILE OBJECTS ### AirlineDatabase <-file.path("datasets","AirlineDemoSmall") AirlineDataSet <- RxXdfData(file.path(AirlineDatabase,"AirlineDemoSmall.xdf")) ### ANALYTICAL PROCESSING ### ### Statistical Summary of the data rxSummary(~ArrDelay+DayOfWeek, data= AirlineDataSet, reportProgress=1) ### CrossTab the data rxCrossTabs(ArrDelay ~ DayOfWeek, data= AirlineDataSet, means=T) ### Linear Model and plot arrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0 , data = AirlineDataSet) plot(arrLateLinMod$coefficients)

23 Remote compute: Teradata
### SETUP TERADATA ENVIRONMENT VARIABLES ### dbConnStr <- "Driver=Teradata; Server=dbHostName; Database=RevoDb; Uid=xxxx; pwd=xxxx" myTeradataCC <- RxInTeradata(connectionString = dbConnStr, shareDir = "/tmp",     remoteShareDir = "/tmp/revoJobs", revoPath = "/usr/lib64/Revo-7.0/R-3.0.2/lib64/R") ### TERADATA COMPUTE CONTEXT ### rxSetComputeContext(myTeradataCC) ### CREATE TERADATA DATA SOURCE ### AirlineDemoQuery <- "SELECT * FROM AirlineDemoSmall;" AirlineDataSet <- RxTeradata(connectionString = dbConnStr, sqlQuery = AirlineDemoQuery) ### ANALYTICAL PROCESSING ### ### Statistical Summary of the data rxSummary(~ArrDelay+DayOfWeek, data= AirlineDataSet, reportProgress=1) ### CrossTab the data rxCrossTabs(ArrDelay ~ DayOfWeek, data= AirlineDataSet, means=T) ### Linear Model and plot arrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0 , data = AirlineDataSet) plot(arrLateLinMod$coefficients)

24 Remote compute: Hadoop
### SETUP HADOOP ENVIRONMENT VARIABLES ### myNameNode <- "master" myUser <- "root" myPort <- 8020 myHadoopCluster <- RxHadoopMR(sshUsername = myUser, sshHostname = myNameNode, port = myPort) ### HADOOP COMPUTE CONTEXT USING HDFS ### rxSetComputeContext(myHadoopCluster) ### CREATE HDFS, DIRECTORY AND FILE OBJECTS ### hdfsFS <- RxHdfsFileSystem(hostName=myNameNode, port=myPort) AirlineDatabase <-file.path("datasets","AirlineDemoSmall") AirlineDataSet <- RxXdfData(file.path(AirlineDatabase,"AirlineDemoSmall.xdf"), fileSystem = hdfsFS) ### ANALYTICAL PROCESSING ### ### Statistical Summary of the data rxSummary(~ArrDelay+DayOfWeek, data= AirlineDataSet, reportProgress=1) ### CrossTab the data rxCrossTabs(ArrDelay ~ DayOfWeek, data= AirlineDataSet, means=T) ### Linear Model and plot arrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0 , data = AirlineDataSet) plot(arrLateLinMod$coefficients)

25 Remote context: SQL Server*
### SETUP SQL SERVER ENVIRONMENT VARIABLES ### dbConnStr <- "Driver=SQL Server; Server=dbHostName; Database=RevoDb; Uid=xxxx; pwd=xxxx" mySqlServerCC <- RxInSqlServer(connectionString = dbConnStr, consoleOutput = TRUE) ### SQL SERVER COMPUTE CONTEXT ### rxSetComputeContext(mySqlServerCC) ### CREATE SQL SERVER DATA SOURCE ### AirlineDemoQuery <- "SELECT * FROM AirlineDemoSmall;" AirlineDataSet <- RxSqlServer(connectionString = dbConnStr, sqlQuery = AirlineDemoQuery) ### ANALYTICAL PROCESSING ### ### Statistical Summary of the data rxSummary(~ArrDelay+DayOfWeek, data= AirlineDataSet, reportProgress=1) ### CrossTab the data rxCrossTabs(ArrDelay ~ DayOfWeek, data= AirlineDataSet, means=T) ### Linear Model and plot arrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0 , data = AirlineDataSet) plot(arrLateLinMod$coefficients) * In 2016

26 ScaleR functions and algorithms
Data step Statistical tests Variable selection Data import – delimited, fixed, SAS, SPSS, ODBC Chi square test Stepwise regression Variable creation & transformation Kendall rank correlation Simulation Recode variables Fisher’s exact test Simulation (eg Monte Carlo) Factor variables Student’s t-test Parallel random number generation Missing value handling Sampling Cluster analysis Sort, merge, split Subsample (observations & variables) Aggregate by category (means, sums) K-means clustering Random sampling Descriptive statistics Classification Predictive models Min / Max, Mean, Median (approx.) Decision forests (random forests) Quantiles (approx.) Decision trees Standard deviation Multiple linear regression Gradient boosted decision trees Variance Generalized linear models (GLM) exponential family distributions: binomial, Gaussian, inverse Gaussian, Poisson, Tweedie. Standard link functions: cauchit, identity, log, logit, probit. User defined distributions & link functions. Naïve Bayes Correlation Combination Covariance Sum of squares (cross product matrix for set variables) PEMA API rxDataStep Pairwise cross tabs Covariance & correlation matrices rxExec Risk ratio & odds ratio Logistic regression Crosstabulation of data (standard tables & long form) Classification & regression trees Predictions/scoring for models Marginal summaries of crosstabulations Residuals for all models

27 Demos

28 DeployR

29 DeployR Framework for R as a service for BI/web apps

30 DeployR architecture

31 Roadmap/licensing

32 Revolution roadmap with Microsoft
Continued development: Windows Linux Hadoop Teradata New platforms and integrations: Azure Marketplace Azure ML Azure HDInsight Sql Server 2016 Azure SQL Frontend tooling/BI integration

33 Complete your session evaluation on My Ignite for your chance to win one of many daily prizes.

34 Continue your Ignite learning path
Microsoft Ignite 2015 4/25/2017 6:09 PM Continue your Ignite learning path Visit Microsoft Virtual Academy for free online training visit Visit Channel 9 to access a wide range of Microsoft training and event recordings Head to the TechNet Eval Centre to download trials of the latest Microsoft products The purpose of this slide is to ensure delegates consider their next steps after your presentation. The learning should not end on 20th November 2015  Option to use this slide in the current generic format or for you to recommend 1 (or more) Microsoft Virtual Academy Course or Channel 9 video that is relevant next steps from your presentation. Thanks © 2015 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

35


Download ppt "All about Revolution R Enterprise"

Similar presentations


Ads by Google