Azure Machine Learning Studio: Four Tips from the Pros

Presentation transcript:

Azure Machine Learning Studio: Four Tips from the Pros October 20th, 2018 Brad Llewellyn Senior Analytics Associate – Data Science Syntelli Solutions

About Me: Brad Llewellyn, Senior Analytics Associate – Data Science, Syntelli Solutions. Certifications: MCSE: Data Management and Analytics; MCSE: Cloud Platform and Infrastructure; MCSA: SQL Server 2012/2014; MCSA: Cloud Platform; MCSA: Machine Learning; MCSA: SQL 2016 BI Development. Education: M.S. Statistics, University of South Carolina. Experience: Analytics Consultant – 6 Years; Syntelli Solutions – <1 Year. Contact: @BreakingBI, https://www.linkedin.com/in/bradllewellyn, http://breaking-bi.blogspot.com, Llewellyn.wb@gmail.com. Charlotte BI Group Organizer - http://charbigroup.com. PASS Member and Speaker - https://www.pass.org

Sponsors: A quick comment about sponsors. SQL Saturdays cannot take place without the funding provided by sponsors. The speakers are not paid. The organizers and other folks running around making sure this event runs smoothly are all volunteers. However, this facility, the food, and the other expenses that go into putting on an event of this magnitude require money. Sponsors provide that money. So, show your appreciation by saying hi and thank you when you stop by the sponsor tables to stuff your raffle ticket into the box. You might even take a couple of minutes to ask about their products and services. You may learn something valuable that you can bring back to your work, or that might become a career opportunity. It's all part of the very important networking you should be doing while you are here.

What is Data Science? Data Science Advanced Analytics

Scenario Contoso Technologies, Inc. (CTI) is an online technology retailer. When users sign into their site, they fill out a form with basic demographic information. CTI wants to use this information to predict the user’s income. This information will be used to determine which products the user should be offered on the site.

Agenda: 1. Let the Data Decide; 2. Tune Model Hyperparameters; 3. Postpone the Feature Engineering; 4. Extending with SQL, R and Python

Let the Data Decide

Regression Algorithms: Linear Regression, Simple Regression, Ordinary Least Squares Regression, Polynomial Regression, Charizard Regression, General Linear Model, Generalized Linear Model, Discrete Choice Regression, Logistic Regression, Multinomial Logit Regression, Mixed Logit Regression, Probit Regression, Multinomial Probit Regression, Ordered Logit Regression, Ordered Probit Regression, Poisson Regression, Multilevel Regression Model, Fixed Effects Regression, Random Effects Regression, Mixed Model Regression, Nonlinear Regression, Nonparametric Regression, Semiparametric Regression, Robust Regression, Arceus Estimation, Quantile Regression, Isotonic Regression, Principal Components Regression, Least Angle Regression, Local Regression, Segmented Regression, Errors-in-Variables Regression, Least Squares Estimation, Delcatty Residual, Ordinary Least Squares Estimation, Linear Estimation, Partial Estimation, Total Estimation, Generalized Estimation, Weighted Estimation, Non-Linear Estimation, Non-Negative Estimation, Iteratively Reweighted Estimation, Ridge Regression, Least Absolute Deviations Estimation, Rowlet Validation, Bayesian Estimation, Bayesian Multivariate Estimation, Regression Model Validation, Mean and Predicted Response, Errors and Residuals, Goodness of Fit, Studentized Residual, Gauss-Markov Theorem

Regression Algorithms (same list as the previous slide). Highlighted: Charizard Regression, Arceus Estimation, Rowlet Validation

Traditional Data Scientist

Data Scientist with Azure Machine Learning

Data Scientist of the Future
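One way to read "Let the Data Decide" is: fit several candidate algorithms and let a held-out error metric pick the winner, instead of choosing from the long list by hand. The following is a minimal sketch of that idea in plain Python, not the Studio workflow itself. The three candidate models, the function names, and the age/income numbers are all assumptions made up for illustration.

```python
import math

def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Simple least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_nearest(xs, ys):
    """1-nearest-neighbour: predict the target of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def rmse(model, xs, ys):
    """Root mean squared error of a fitted model on (xs, ys)."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def let_the_data_decide(candidates, train, valid):
    """Fit every candidate on train, score it on valid, return (best_name, scores)."""
    scores = {name: rmse(fit(*train), *valid) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical data: age vs. income in thousands (invented for this sketch).
train = ([22, 30, 38, 46, 54], [28.0, 41.0, 52.0, 66.0, 77.0])
valid = ([26, 42, 50], [34.0, 59.0, 71.0])

candidates = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}
best_name, scores = let_the_data_decide(candidates, train, valid)
print(best_name, scores)
```

Adding another algorithm only means adding another entry to `candidates`; the selection rule stays the same.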

Tune Model Hyperparameters

Why Does Modeling Take So Much Time? Hundreds of Model Types × Thousands of Hyperparameters × Dozens of Cleansing Methods = WORK

What If… The computer did all this work for us?
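Letting the computer do this work usually means a parameter sweep: try many hyperparameter combinations automatically and keep the one with the lowest validation error. The sketch below shows a full grid sweep in plain Python. It is only an illustration of the idea, not the Studio module; the k-nearest-neighbour model, the `sweep` helper, the grid values, and the data are assumptions chosen to keep the example small.

```python
import itertools
import math

def knn_predict(train, x, k, weighting):
    """Predict y at x from the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    if weighting == "uniform":
        return sum(y for _, y in nearest) / len(nearest)
    # "distance": closer neighbours get larger weights
    weights = [1.0 / (abs(px - x) + 1e-9) for px, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

def rmse(train, valid, k, weighting):
    """Validation RMSE for one hyperparameter combination."""
    errors = [(knn_predict(train, x, k, weighting) - y) ** 2 for x, y in valid]
    return math.sqrt(sum(errors) / len(errors))

def sweep(train, valid, grid):
    """Score every combination in the grid; return (best_params, best_score, results)."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((rmse(train, valid, **params), params))
    best_score, best_params = min(results, key=lambda r: r[0])
    return best_params, best_score, results

# Hypothetical (age, income in thousands) pairs, invented for this sketch.
train = [(20, 25.0), (25, 31.0), (30, 38.0), (35, 44.0),
         (40, 52.0), (45, 57.0), (50, 65.0)]
valid = [(23, 29.0), (33, 42.0), (47, 60.0)]

grid = {"k": [1, 2, 3], "weighting": ["uniform", "distance"]}
best_params, best_score, results = sweep(train, valid, grid)
print(best_params, round(best_score, 3))
```

The number of combinations is the product of the grid sizes, which is why a hand search over thousands of hyperparameters is impractical and a random or partial sweep is often used instead of the full grid.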

Demo!!!

Postpone the Feature Engineering

What is Feature Engineering? Goal: Improve Model Accuracy Using Existing Data. Creating New Fields; Aggregating Existing Fields; Combining Existing Fields
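The three kinds of feature engineering named on this slide can be sketched in a few lines of plain Python. The records, field names (`birth_year`, `city`, `hours_per_week`, `weeks_per_year`), and the `add_features` helper are hypothetical, made up to resemble the sign-up form in the scenario.

```python
from collections import defaultdict

# Hypothetical sign-up records; all field names are assumptions for illustration.
users = [
    {"user_id": 1, "birth_year": 1980, "city": "Charlotte",
     "hours_per_week": 40, "weeks_per_year": 50},
    {"user_id": 2, "birth_year": 1992, "city": "Charlotte",
     "hours_per_week": 20, "weeks_per_year": 48},
    {"user_id": 3, "birth_year": 1975, "city": "Raleigh",
     "hours_per_week": 45, "weeks_per_year": 50},
]

def add_features(rows, as_of_year):
    """Return copies of rows with one created, one aggregated, and one combined field."""
    # Aggregate step: average hours per week within each city.
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        totals[row["city"]][0] += row["hours_per_week"]
        totals[row["city"]][1] += 1
    city_avg = {city: total / count for city, (total, count) in totals.items()}

    engineered = []
    for row in rows:
        new_row = dict(row)
        # Creating a new field from an existing one.
        new_row["age"] = as_of_year - row["birth_year"]
        # Aggregating an existing field across rows.
        new_row["city_avg_hours"] = city_avg[row["city"]]
        # Combining two existing fields.
        new_row["hours_per_year"] = row["hours_per_week"] * row["weeks_per_year"]
        engineered.append(new_row)
    return engineered

engineered = add_features(users, as_of_year=2018)
for row in engineered:
    print(row["user_id"], row["age"], row["city_avg_hours"], row["hours_per_year"])
```

None of these fields add outside information; each is derived only from data already collected, which matches the stated goal.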

Traditional Data Science Cycle: Data Collection → Data Cleansing → Feature Engineering → Model Creation → Model Evaluation

Agile Data Science Cycle: Data Collection → Model Creation → Model Evaluation. As Needed: Data Cleansing, Feature Engineering
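The agile cycle can be sketched as a loop: build and evaluate a model on the raw data first, and reach for a feature engineering step only while the evaluation says the model is not yet good enough. This is one possible reading of the slide, not the presenter's code. The `agile_cycle` helper, the target error, the candidate steps, and the data (which follow y = x² exactly) are assumptions for illustration.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope

def validation_rmse(transform, train, valid):
    """Fit on the transformed training feature, score on the validation set."""
    (train_x, train_y), (valid_x, valid_y) = train, valid
    intercept, slope = fit_line([transform(x) for x in train_x], train_y)
    squared = [(intercept + slope * transform(x) - y) ** 2
               for x, y in zip(valid_x, valid_y)]
    return math.sqrt(sum(squared) / len(squared))

def agile_cycle(train, valid, feature_steps, target_rmse):
    """Model the raw data first; try the next step only while error is above target."""
    best = validation_rmse(lambda x: x, train, valid)
    history = [("raw", best)]
    for name, step in feature_steps:
        if best <= target_rmse:
            break  # good enough: postpone any further feature engineering
        trial = validation_rmse(step, train, valid)
        history.append((name, trial))
        if trial < best:
            best = trial  # keep the step only if it actually helps
    return best, history

# Hypothetical data that follows y = x**2 exactly (invented for this sketch).
train = ([1, 2, 3, 4, 5, 6], [1.0, 4.0, 9.0, 16.0, 25.0, 36.0])
valid = ([1.5, 3.5, 5.5], [2.25, 12.25, 30.25])

steps = [("squared", lambda x: x * x), ("log", math.log)]
best, history = agile_cycle(train, valid, steps, target_rmse=0.1)
print(history)
```

Here the raw model misses the target, the squared feature fixes it, and the log feature is never built, which is the time saving the agile cycle is after.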

Demo!!!

Extending with SQL, R and Python

Data Science Space (diagram labels): Data Science; Azure Machine Learning; SQL, R and Python

What Can We Do With SQL, R and Python? SQL: Data Manipulation; Feature Engineering, Especially Cross-Dataset and Aggregate Features; *It’s not T-SQL, it’s SQLite!* R and Python: Especially Time-Series and Rolling Features; Model Building; Import Additional Libraries through Studio
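Both halves of this slide can be tried locally with Python's standard library: `sqlite3` speaks the same SQLite dialect the slide mentions, and a rolling feature needs only a few lines of Python. The sketch below is an assumption-laden illustration, not code from the demo. The `users` and `orders` tables, their columns, and the `rolling_mean` helper are hypothetical.

```python
import sqlite3
from collections import deque

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE orders (user_id INTEGER, order_day INTEGER, amount REAL);
    INSERT INTO users  VALUES (1, 34), (2, 51);
    INSERT INTO orders VALUES (1, 1, 20.0), (1, 2, 40.0), (1, 3, 30.0), (2, 1, 100.0);
""")

# Cross-dataset and aggregate features: join the two tables, summarise per user.
# LIMIT is SQLite syntax; T-SQL would use TOP instead.
rows = conn.execute("""
    SELECT u.user_id, u.age,
           COUNT(o.amount) AS order_count,
           AVG(o.amount)   AS avg_amount
    FROM users AS u
    LEFT JOIN orders AS o ON o.user_id = u.user_id
    GROUP BY u.user_id, u.age
    ORDER BY u.user_id
    LIMIT 10
""").fetchall()

def rolling_mean(values, window):
    """Mean of the current value and up to window - 1 previous values."""
    recent = deque(maxlen=window)
    out = []
    for value in values:
        recent.append(value)
        out.append(sum(recent) / len(recent))
    return out

# Rolling (time-series) feature: 2-order moving average for user 1.
user1_amounts = [amount for (amount,) in conn.execute(
    "SELECT amount FROM orders WHERE user_id = 1 ORDER BY order_day")]
rolling = rolling_mean(user1_amounts, window=2)

print(rows)
print(rolling)
```

The split mirrors the slide: set-based joins and aggregates are natural in SQL, while order-dependent features such as moving averages are often easier to express in R or Python.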

Demo!!!

Thank You! Brad Llewellyn, Senior Analytics Associate – Data Science, Syntelli Solutions. @BreakingBI, https://www.linkedin.com/in/bradllewellyn, http://breaking-bi.blogspot.com, Llewellyn.wb@gmail.com. Other Presentations: Four Paths to Data Science Success using Microsoft Azure; Azure Machine Learning Studio: Making Data Science Easy(er); What is a Data Scientist and How Do I Become One?