Mickey Gousset Principal Consultant / Team Lead

You want to see smart computers? You need to check out machine learning!
Mickey Gousset Principal Consultant / Team Lead Infront Consulting Group Mickeygousset.com @mickey_gousset

What is Tupelo, MS “World-Famous” For?

Agenda
What is Machine Learning
DEMO
Our First Machine Learning Model
Deploying The Model
Testing The Model
Using Python For Data Analysis

What is Machine Learning?
Computer "learns" from data in order to perform predictive analytics: credit-card fraud detection, online shopping recommendations, self-driving cars, and more. Supervised learning. Unsupervised learning.
Machine learning finds patterns in large volumes of data and uses those patterns to perform predictive analysis. Microsoft offers Azure Machine Learning, while Amazon offers Amazon Machine Learning and Google offers the Google Prediction API. Software products such as MATLAB support traditional, non-cloud-based ML modeling. Machine learning models fall into two broad categories: supervised and unsupervised. In supervised learning, the model is "trained" with a large volume of data, and algorithms are then used to predict an outcome from future inputs. Most supervised learning models use regression algorithms to compute an outcome from a continuous set of possible outcomes (for example, your score on a test), or classification algorithms to compute the probability of an outcome from a finite set of possible outcomes (for example, the probability that an email is spam or a credit-card transaction is fraudulent). In unsupervised learning, the computer isn't trained, but is presented with a set of data and challenged to find relationships in it. K-Means Clustering is a common unsupervised learning algorithm. For a great explanation of how it works, see

Azure Machine Learning
Fully managed cloud service for building and operationalizing ML models
Fully managed: No software to install, no hardware to manage, and one portal to view and update.
Integrated: Simple drag, drop, and connect interface for Data Science. No need for programming for common tasks.
Best in Class Algorithms + R: Built-in collection of best of breed algorithms. Support for R and popular CRAN packages.
Deploy in minutes: Operationalize models with a single click. Monetize in Machine Learning Marketplace.
Azure Machine Learning is a cloud-based predictive-analytics service that offers a streamlined experience for data scientists of all skill levels. It's accompanied by the Azure Machine Learning Studio (ML Studio), which is a browser-based tool that provides an easy to use, drag-and-drop interface for building machine-learning models. It comes with a library of time-saving experiments and features best-in-class algorithms developed and tested in the real world by Microsoft businesses such as Bing. And its built-in support for R and Python means you can build custom scripts to customize your model. Once you've built and trained your model in the ML Studio, you can easily expose it as a Web service that is consumable from a variety of programming languages, or share it with the community by placing it in the Cortana Intelligence Gallery.

Machine Learning Process

WOULD YOU HAVE SURVIVED THE TITANIC?

WOULD BRIAN HAVE SURVIVED THE TITANIC?
I prebuilt some of this in ML_Testing.
Would you have survived on the Titanic? This is a classic example that people use to teach machine learning, so I thought it would work well here. Let’s predict whether my good friend Brian Randell would have survived on the Titanic.
Step 1: Get Access to Azure Machine Learning Studio
ML Studio is part of Microsoft Azure. You can sign up as a guest for access, or get access using your Azure subscription. I’m going to use my Azure subscription. New | Data + Analytics | Machine Learning Workspace Studio. Explain the different options. Launch the Machine Learning Workspace.
Let’s Create Our Titanic Experiment
Click New | Blank Experiment. Change the title.
The Data
Open the train data and explain the columns:
Survived – 1 = yes, 0 = no
Pclass – socioeconomic status: 1 = upper, 2 = middle, 3 = lower
SibSp – siblings/spouses aboard
Parch – parents/children aboard
New | DataSet | From Local File: show how to do this, but don’t actually do it. We will use the preloaded file. Drag the data onto the experiment. Columns are called “features”. Notice how you can drag things around and zoom in and out. Right-click and visualize. One of the keys is to know your data. When you click a column, you can see stats to the right. We have some rows that are missing some data, and some data may not be needed.
Manipulating the Data
Some columns are not meaningful in predicting what we care about. Women and children were evacuated first, so age and gender are important. PassengerId is just a primary key; it doesn’t matter. Neither does Name. Let’s pare down our dataset some. Use the search functionality to find the Select Columns in Dataset task and drag it to the workspace. Connect the dataset to the new task, and notice the red and green color coding. This task lets us specify the columns we think are significant for the prediction.
We also need to select the column we are going to predict, “Survived”. Notice how the task has a red exclamation point, meaning it needs to be addressed. Select the task and click Launch Column Selector. You can select columns by name or by rules; I’ll select by column name: Survived, Pclass, Sex, Age, Parch, SibSp. At this point I can run my model and look at the output of the select. Run and view. Best practice: do this every so often, and save as well.
Visualize the dataset. Notice that there is still data missing for Age. We want to pull out those rows. Select Survived. Notice the graph. Because a passenger either dies or doesn’t, there are no middle values. However, because this is a numeric value, ML doesn’t know that. We need to make this column (as well as Pclass, SibSp, and Parch) categorical. A categorical value is a value that can take on one of a limited (usually fixed) number of possible values.
Add Edit Metadata to the page. Connect Select Columns in Dataset to Edit Metadata. Select: Survived, Pclass, SibSp, Parch. We can use this task to change column names and do other data modifications. Select Make Categorical. Save and test.
Add a Clean Missing Data task. Connect Edit Metadata to it. Select the Age column and Remove Entire Row. Notice the other options. Save and run.
Split Data for Training vs Testing
Whenever we execute ML experiments, we use some of the data to train the model and some to test it. The Split Data task allows us to divide up the data: some to try to find patterns, and some to test whether the model we created was successful. Do an 80/20 split and test.
Creating the Model
So far we have done a bunch of data manipulation. Now we can create the actual model. Add a Train Model task. Connect the Split Data task to the Train Model task. Select the Survived column to tell our model what we are training for. We need a model to train.
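The data-preparation steps in these notes (Select Columns in Dataset, Clean Missing Data on Age, and an 80/20 Split Data) can also be sketched outside ML Studio. The following is a minimal plain-Python illustration, not what ML Studio runs internally. The sample rows, the `prepare` and `split` helper names, and the fixed shuffle seed are all assumptions made for the example.

```python
import csv
import io
import random

# Tiny made-up sample shaped like the Titanic training file (not real rows).
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch
1,0,3,Passenger A,male,22,1,0
2,1,1,Passenger B,female,38,1,0
3,1,3,Passenger C,female,,0,0
4,0,2,Passenger D,male,35,0,0
5,1,2,Passenger E,female,27,0,2
6,0,3,Passenger F,male,,0,0
7,0,1,Passenger G,male,54,0,0
"""

# Columns kept for the prediction; PassengerId and Name are dropped.
KEEP = ["Survived", "Pclass", "Sex", "Age", "Parch", "SibSp"]


def prepare(rows):
    """Keep only the chosen columns, then drop rows with a missing Age."""
    selected = [{name: row[name] for name in KEEP} for row in rows]
    # Values stay as strings, so Survived, Pclass, SibSp, and Parch are
    # treated as labels rather than numbers (loosely like Make Categorical).
    return [row for row in selected if row["Age"] != ""]


def split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows, then cut it at the given fraction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
clean = prepare(rows)
train, test = split(clean)
print(len(rows), len(clean), len(train), len(test))
```

Shuffling before the cut matters when the file is sorted in some way; a fixed seed just keeps the illustration repeatable.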
At this point, if you wanted, you could import your own R or Python code, but we are going to try some of the built-in standard algorithms. Different types of ML use different algorithms. Since we are trying to predict whether an output has one of two values, we want to use a two-class algorithm to train our model. Two-class algorithms are used to predict outcomes that can only have two possible values. Search for “two-class” in the task bar. There are a number of different algorithms. ML Algorithm Cheat Sheet: Open Cheat Sheet. Select Two-Class Decision Forest and connect it to the Train Model task. Take the default values.
After the model is trained, we need to see how well it predicts survival, so we need to score it against the remaining 20% of our data. Add a Score Model task and link it up. Finally, we add an Evaluate Model task to evaluate the results.
Run and Interpret The Results
Run the experiment. Visualize the results of the run. First, the graph. The closer the graph is to a straight diagonal line, the more your model is guessing randomly. You want to get your line as close to the upper-left corner as possible. You want as large an Area Under the Curve (AUC) as possible. (ROC – Receiver Operating Characteristic.) Scroll down for detailed results. The closer AUC is to 1, the better the model is at making predictions.
Confusion Matrix
True Positive – how often the model correctly predicted someone would survive.
False Positive – how often the model predicted someone would survive when they did not (i.e. the model was incorrect).
True Negative – how often the model correctly predicted a passenger would not survive.
False Negative – how often the model predicted someone died when they actually survived (i.e. the model was incorrect).
You want high values for True Positive and True Negative, and low values for False Positive and False Negative.
Accuracy – the sum of all the correct predictions divided by the total number of predictions. A good measure when the outcomes are evenly distributed, but a poor one for skewed data.
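To make the confusion-matrix counts concrete, here is a small sketch that tallies them from two lists of labels and derives accuracy, along with the precision, recall, and F1 ratios that the notes define next. The labels are made up for illustration, the function names are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, tn, fn) for labels where 1 = survived, 0 = did not."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1
        elif p == 1 and a == 0:
            fp += 1
        elif p == 0 and a == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (sketch convention)."""
    return numerator / denominator if denominator else 0.0


def metrics(actual, predicted):
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Made-up, evenly distributed labels.
balanced = metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])

# Made-up, skewed labels: 5 survivors out of 100, and a "model" that
# always predicts 0. Accuracy looks good even though it finds no survivors.
skewed = metrics([1] * 5 + [0] * 95, [0] * 100)

print(balanced)
print(skewed)
```

The skewed case shows why accuracy alone can mislead: the always-0 predictor scores 0.95 on accuracy while its recall is 0.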
Precision – TP / (TP + FP). This metric is concerned with the number of correct positive predictions: “Of those predicted to be positive, how many were actually positive?”
Recall – also called the True Positive Rate: TP / (TP + FN). Concerned with the number of correctly predicted positive events: “Of those positive events, how many were predicted correctly?”
F1 Score – 2 * (precision * recall) / (precision + recall). The harmonic mean of precision and recall, and a good way to summarize the evaluation of the algorithm in a single number.
Precision vs recall (breast-cancer diagnosis example): If precision or recall is high, it indicates patients with breast cancer are diagnosed correctly. If precision is low, more patients without breast cancer are diagnosed as having cancer. If recall is low, more patients with breast cancer are diagnosed as not having cancer.
At this point we have a working experiment and I’m happy with the accuracy. If I weren’t, I could try other models or more data. Now I want to use this to predict whether Brian would survive the Titanic. I need to take my model and deploy it as a web service. Then I can have a website or app call it and see what happens.
CREATING A WEB SERVICE
Convert the training experiment into a predictive experiment. This gets the model ready to be deployed as a web service. Run my model one more time. Click Set Up Web Service | Predictive Web Service. This creates a new predictive experiment for your web service. The predictive model doesn’t have as many components as your original experiment. You don’t need a dataset, because when someone calls the web service they will pass in data. You still need to identify the columns. Your algorithm and Train Model tasks have become a single trained model. No need to evaluate; just score at the end.
Web Service Input and Output. Run the experiment; this can be slow, so be prepared to talk. Click Deploy Web Service (New) Preview. Click Test Web Service. Batch test using the test.csv file.
TESTING THE WEB SERVICE
Go to Consume. Open Excel, copy some data in, and test. Select Python 3 code. Back in the workspace, create a new notebook (Python 3). Copy in the code and run it. Need to look up more on Jupyter notebooks.
Let’s create a website where we can test this. Go to the portal and create an Azure ML Web App: WouldBrianSurvive.azurewebsites.net. Configure the site, then demo it.
Let’s go back and add another model to see if we can get a better result, and also change it so that it only asks for the values we need. Add Two-Class Logistic Regression, Train Model, Score Model. We can see the second model has a lower accuracy. Let’s see if Brian survives using that model. Change the web service input to Edit Metadata.
Talk about the Cortana Intelligence Gallery:
Let’s Look At Another Example That Also Uses Some Python
Flight Delays and Jupyter Notebooks. Execute the notebook and look at the Python output.
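For the "Select Python 3 code" step in the notes above, the code pasted into the notebook is essentially an authenticated HTTP POST with a JSON body. The following is a hedged sketch of that shape using only the standard library. The URL and API key are placeholders, the `"input1"` name and the payload layout are assumptions modeled on the kind of sample the Consume page generates, and the passenger values are made up. The sample on the Consume page is the authoritative version for a given service.

```python
import json
import urllib.request

# Hypothetical placeholders: copy the real values from the Consume page.
SERVICE_URL = "https://example.invalid/titanic/execute"
API_KEY = "replace-with-your-api-key"


def build_request(passenger, url=SERVICE_URL, api_key=API_KEY):
    """Build (but do not send) a POST request carrying one passenger record."""
    payload = {
        # "input1" is assumed to be the name of the Web Service Input.
        "Inputs": {"input1": [passenger]},
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def score(passenger):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(passenger)) as response:
        return json.loads(response.read().decode("utf-8"))


# Made-up feature values for illustration only.
example_passenger = {
    "Pclass": "2",
    "Sex": "male",
    "Age": "40",
    "SibSp": "0",
    "Parch": "0",
}
request = build_request(example_passenger)
```

Calling `score(example_passenger)` only works once the placeholders are replaced with a real endpoint and key. Separating `build_request` from `score` makes it possible to inspect the request without touching the network.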

Wrap-up
Machine Learning is awesome, go try it!
Thank you for attending my talk!

Mickey Gousset Principal Consultant / Team Lead
Infront Consulting Group
Microsoft MVP – Visual Studio Tools and Development Technologies – 13 Years
Author, Speaker, Father (2 beautiful daughters), Geek
Lives in Tupelo, Mississippi, USA
