1
70-773-Analyzing Big Data with Microsoft R
BRK3172 Analyzing Big Data with Microsoft R
Derek McCrae Norton, Lead Data Scientist
6/11/2018 10:39 AM
© Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
2
Session objectives and takeaways
Tech Ready 15 6/11/2018
At the end of this session, you should be better able to…
Use Microsoft R Server
Efficiently work with “Big Data”
Build machine learning models
Take (and pass) exam
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
3
Microsoft Certification - General Information
4
Microsoft Certified Solutions Associate
Background
A specialist who knows “how” to work with an advanced product
For those with access to hands-on practical experience
Someone with 1-2 years’ experience
Simplified Tracks
Requires passing 2 exams
Only one more exam to reach MCSE (Microsoft Certified Solutions Expert)
MCSA certification is required in order to become an MCSE.
5
Exam Basics
40-60 questions
1-3 hours to complete the exam
Can review questions (for most question types)
Can include up to 20 different types of questions
700 is passing
700 <> 70% (a score of 700 is not the same as 70% correct)
6
Read and Explore Big Data
7
Read data with R Server
Read data files such as text files, SAS, and SPSS
Convert data to XDF format
Identify trade-offs between XDF and flat text files
Read data through Open Database Connectivity (ODBC) data sources
Use an internal data frame as a data source
8
Summarize data
Compute crosstabs and univariate statistics
Choose when to use rxCrossTabs versus rxCube
Integrate with open source using packages such as dplyrXdf
Use group by functionality
Create formulae to perform multiple tasks in one pass
Extract quantiles with rxQuantile
9
Visualize data
Visualize in-memory data with base and ggplot2 plots
Create custom visualizations with rxSummary and rxCube
Visualize data with rxHistogram and rxLinePlot, including faceted plots
10
Explore Big Data - Code 6/11/2018 10:39 AM
11
Process Big Data
12
Process data with rxDataStep
Subset rows of data
Modify and create columns by using the transforms argument
Choose when to use on-the-fly transformations versus in-data transforms
Handle missing values through filtering or replacement
Generate a data frame or an XDF file
Process dates (POSIXct, POSIXlt)
13
Perform complex transforms
Define a transform function (transformFunc)
Reshape data using a transform function
Use open source packages, such as lubridate
Pass in values by using transformVars and transformEnvir
Use internal .rx variables and functions, including cross-chunk communication
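The cross-chunk communication item above can be illustrated outside R Server. The following is a minimal Python sketch of the general idea only, not RevoScaleR code: data is processed one chunk at a time, and a transform carries a value forward so that a later chunk can continue where the previous one stopped. The names `iter_chunks`, `cumsum_chunk`, `process`, and the `state` dictionary are hypothetical and exist only for this illustration.

```python
from typing import Dict, Iterator, List, Tuple


def iter_chunks(values: List[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield consecutive slices of `values`, mimicking block-wise reads of a large file."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def cumsum_chunk(chunk: List[float], state: Dict[str, float]) -> List[float]:
    """Transform one chunk into cumulative sums.

    `state["carry"]` holds the running total left by earlier chunks, so the
    result matches what a single pass over the whole data set would give.
    """
    running = state["carry"]
    result = []
    for value in chunk:
        running += value
        result.append(running)
    # Hand the running total to the next chunk.
    state["carry"] = running
    return result


def process(values: List[float], chunk_size: int) -> Tuple[List[float], Dict[str, float]]:
    """Apply the chunk transform to every chunk in order and collect the output."""
    state = {"carry": 0.0}
    output: List[float] = []
    for chunk in iter_chunks(values, chunk_size):
        output.extend(cumsum_chunk(chunk, state))
    return output, state
```

Without the shared `state`, each chunk would restart its cumulative sum at zero, which is the kind of problem that cross-chunk communication is meant to avoid.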
14
Manage data sets
Sort data in various orders
Use rxSort deduplication to remove duplicate values
Merge data sources using rxMerge
Merge options and types
Identify when alternatives to rxSort and rxMerge should be used
15
Process text using MicrosoftML package
Create features using MML functions, such as featurizeText
Create indicator variables and arrays using MML functions, such as categorical and categoricalHash
Perform feature selection using MML functions
16
Process Big Data - Code
17
Build Models with RevoScaleR
18
Estimate linear models
Use rxLinMod, rxGlm, and rxLogit for linear models
Set the family for a glm with functions such as rxTweedie
Process data on the fly by using arguments and functions, e.g. the F function and the transforms argument
Weight observations via frequency or probability weights
Perform automatic variable selections, such as greedy searches, repeated scoring, and byproduct of training
Identify the impact of missing values during automatic variable selection
19
Build and use partitioning models
Use rxDTree, rxDForest, and rxBTrees to build partitioning models
Adjust the weighting of false positives and misses by using loss
Select parameters that affect bias and variance, e.g. pruning, learning rate, and tree depth
Use as.rpart to interact with open source functionality
20
Generate predictions and residuals
Use rxPredict to generate predictions
Perform parallel scoring using rxExec
Generate different types of predictions, e.g. link and response scores for GLM; response, probability, and vote for rxDForest
Generate different types of residuals, e.g. Usual, Pearson, and DBM
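The difference between link and response scores, and between usual and Pearson residuals, can be shown with a small worked example. This is a Python sketch for a logistic model under stated assumptions, not the rxPredict implementation: the coefficients are made up, and the function names `link_score`, `response_score`, `usual_residual`, and `pearson_residual` are hypothetical.

```python
import math
from typing import Sequence


def link_score(intercept: float, coefs: Sequence[float], features: Sequence[float]) -> float:
    """Linear predictor: the score on the link (log-odds) scale."""
    return intercept + sum(c * x for c, x in zip(coefs, features))


def response_score(link: float) -> float:
    """Inverse logit: maps the link score to a probability on the response scale."""
    return 1.0 / (1.0 + math.exp(-link))


def usual_residual(observed: float, probability: float) -> float:
    """Observed outcome (0 or 1) minus the predicted probability."""
    return observed - probability


def pearson_residual(observed: float, probability: float) -> float:
    """Usual residual scaled by the binomial standard deviation sqrt(p * (1 - p))."""
    return (observed - probability) / math.sqrt(probability * (1.0 - probability))


# Made-up coefficients and one observation, for illustration only.
link = link_score(0.0, [1.0, -1.0], [2.0, 2.0])
prob = response_score(link)
```

For other GLM families the inverse link and the variance function differ, so the two helper formulas above apply to the logistic case only.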
21
Evaluate models and tuning parameters
Summarize estimated models
Evaluate tree models with RevoTreeView and rxVarImpPlot
Calculate model evaluation metrics by using built-in functions
Calculate model evaluation metrics and visualizations by using custom code, e.g. mean absolute percentage error
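As an example of a custom evaluation metric, mean absolute percentage error (MAPE) is the average of |actual - predicted| / |actual|, expressed as a percentage. The sketch below is in Python rather than R and is an illustration only; the function name `mape` and the choice to skip rows whose actual value is zero are assumptions made here, not something the slide specifies.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Rows with an actual value of zero are skipped (an assumption of this
    sketch), because the percentage error is undefined for them.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no rows with a non-zero actual value")
    total = sum(abs((a - p) / a) for a, p in pairs)
    return 100.0 * total / len(pairs)


# Two predictions that are each off by 10% of the actual value.
score = mape([100.0, 200.0], [110.0, 180.0])
```

Because every row contributes a ratio rather than a raw difference, MAPE is scale-free, but it penalizes errors on small actual values heavily.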
22
Create models using MicrosoftML package
Build and use a One-Class Support Vector Machine
Build and use linear and logistic regressions with L1 and L2 regularization
Build and use a decision tree with FastTree
Use FastTree as a recommender with ranking loss (NDCG)
Build and use a simple feed-forward neural network
23
Building Models - Code
24
Use R Server In Different Environments
25
Use different compute contexts to run R Server
Change the compute context: RxHadoopMR, RxSpark, RxLocalSeq, and RxLocalParallel
Identify which compute context to use for different tasks
Use different data source objects on different compute contexts, e.g. RxOdbcData and RxTextData, on HDFS and SQL Server
Identify use cases for RevoPemaR
26
Optimize tasks by using local compute contexts
Identify and execute tasks that can be run only in the local compute context
Identify tasks that are more efficient to run in the local compute context
Choose between RxLocalSeq and RxLocalParallel
Profile across different compute contexts
27
Perform in-database analytics using SQL Server
Choose when to perform in-database versus out-of-database computations
Identify limitations of in-database computations
Use in-database versus out-of-database compute contexts appropriately
Use stored procedures for data processing steps
Serialize objects and write back to binary fields in a table
Write tables
Configure R to optimize SQL Server (chunksize, numtasks, and computecontext)
Effectively communicate performance properties to SQL administrators and architects (SQL Server Profiler)
28
Implement analysis workflows in Hadoop and Spark
Use appropriate R Server functions in Spark
Integrate with Hive, Pig, and Hadoop MapReduce
Integrate with the Spark ecosystem of tools, such as SparklyR and SparkR
Profile and tune across different compute contexts
Use doRSR for parallelizing code that was written using open source foreach
29
Deploy predictive models
Deploy predictive models to SQL Server as a stored procedure
Deploy an arbitrary function to Azure Machine Learning by using the AzureML R package
Identify when to use DeployR
30
Different Environments - Code
31
Wrap up
32
In review: session objectives and takeaways
Feel comfortable using Microsoft R Server
Feel comfortable working with “Big Data”
Feel comfortable building machine learning models
Take (and pass) exam!
33
Session resources
http://learnanalytics.microsoft.com/
34
Session resources 2
35
Please evaluate this session
From your
PC or tablet: visit MyIgnite
Phone: download and use the Microsoft Ignite mobile app
Your input is important!