Published by Rose Andrews. Modified over 8 years ago.
1
Azure Machine Learning My first Data Science experiment Using Azure Machine Learning
2
Our Main Sponsors:
3
Speaker Florian Eiden @fleid_bi / fleid.fr Cellenza 156, bd Haussmann 75008 Paris, France http://www.cellenza.com http://blog.cellenza.com
4
Who is this for? Machine Learning: from Beginner to Experienced. Azure ML: from Beginner to Experienced.
5
Agenda A quick word on Azure ML, then two experiments: (1) I own a flat/condo in Paris and I want to sell it! (2) I’m in marketing and I want my promotional emails to reach their targets through anti-spam software.
6
Azure ML in one diagram. From Business Need to Business Value: Modeling, then Deployment. Storage space: Cloud (HDInsight, SQL Server VM, SQL DB, Blobs & Tables) and Local (Local Files, Excel Files, …). ML Studio: IDE for Machine Learning. API: publication as a web service. Microsoft Azure Marketplace: API monetization. Web.
7
First: enable ML Studio in the Azure portal, with an Azure account http://manage.windowsazure.com
8
Azure ML Studio http://studio.azureml.com
9
Before that: my 1st experiment. I want to sell my flat: Paris, France; 2 bedrooms; 55 m²; … But at what price?
10
How to answer that?
11
[Figure: price (€) plotted against surface (m²); "My flat" is placed on the trend to read off "A fair price!"]
12
But how to generalize? Thousands of price points (facts). Often hundreds of features (dimension attributes): surface; number of rooms; storage area; parking; exact location in town; floor (correlated to the presence of a lift); age of the building; empty or equipped; distance to metro / public transportation; distance to shops; … Machine Learning!
13
Machine Learning: building a system that learns from the existing data, detecting patterns and trends, so that it can predict a continuous value! Supervised Learning >> Regression
14
Linear Regression (1 feature) [Figure: price (€) plotted against surface (m²), showing "My flat" and the "Market price" line] $y = ax + b$, where $y$ is the price and $x$ is the surface.
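As a minimal sketch of what fitting y = ax + b means, the snippet below computes the least-squares slope and intercept by hand. The listings are invented for illustration (they lie exactly on a line so the result is easy to check), and `fit_line` is a hypothetical helper name, not something from the talk or from Azure ML.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing the squared error of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    # The fitted line always passes through the point of means
    b = mean_y - a * mean_x
    return a, b


# Hypothetical listings (invented values): surface in m², price in euros.
surfaces = [30, 45, 60, 80]
prices = [270_000, 390_000, 510_000, 670_000]

a, b = fit_line(surfaces, prices)
my_flat_estimate = a * 55 + b  # estimate for a 55 m² flat
```

With real listings the points would scatter around the line, and the fit would give the line that is closest to them on average rather than one passing through every point.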
15
My ML System. Input $x$: my surface. Machine Learning: $y = ax + b$. Output $y$: a good price estimate. [Figure: price (€) plotted against surface (m²)]
16
My ML System. Input $x$: my surface. Output $y$: an estimate of price. $h$: the hypothesis, $y = h(x)$, here $y = ax + b$. [Figure: price (€) plotted against surface (m²)]
17
Input $x$: my surface. Output $y$: an estimate of price. $h$: the hypothesis, $y = h(x)$. [Figure: the line $y = \theta_1 x + \theta_0$, with intercept $\theta_0$ on the $y$ axis] $h(x) = h_\theta(x) = \theta_0 + \theta_1 x$
18
Parameters ranking: Cost Function. $J(\theta_i)$: a function of the thetas that calculates the total distance between my model and the training set. [Figure: two plots of the line $y = \theta_1 x + \theta_0$ against the training points] Model A: $\theta_0 = 1$, $\theta_1 = 0$. Model B: $\theta_0 = 1$, $\theta_1 = 0.25$.
19
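Parameters ranking: Cost Function. $J(\theta_i)$: a function of the thetas that calculates the total distance between my model and the training set. [Figure: two plots of the line $y = \theta_1 x + \theta_0$ against the training points] Model A: $\theta_0 = 1$, $\theta_1 = 0$. Model B: $\theta_0 = 1$, $\theta_1 = 0.25$. Costs shown: $J(\theta_0, \theta_1) = 25$ and $J(\theta_0, \theta_1) = 5$.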
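A small sketch of ranking two parameter sets with a cost function. The slide only says "total distance"; the sum of squared vertical distances used here is an assumption, and the training set is invented, so the costs differ from the values shown on the slide. The names `h` and `cost` are illustrative.

```python
def h(theta0, theta1, x):
    """Hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Sum of squared vertical distances between the line and the points."""
    return sum((h(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))


# Invented training set (not the points plotted on the slide).
xs = [0, 4, 8]
ys = [1, 2, 3]

cost_a = cost(1, 0, xs, ys)     # Model A: flat line y = 1
cost_b = cost(1, 0.25, xs, ys)  # Model B: y = 1 + 0.25 x

# The model with the lower cost fits the training set better.
best = "A" if cost_a < cost_b else "B"
```

On this toy data Model B passes through every point, so its cost is zero and it ranks first.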
20
The last piece of the puzzle: Training Set, Model type, Cost / Error Function, … ? $y = h(x)$, $h(x) = h_\theta(x) = \theta_0 + \theta_1 x$
21
The last piece of the puzzle: Training Set, Model type, Cost / Error Function, Optimization Method. $y = h(x)$, $h(x) = h_\theta(x) = \theta_0 + \theta_1 x$
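The slide does not name the optimization method. Gradient descent is one common choice, sketched below under that assumption: it repeatedly nudges the thetas in the direction that lowers the cost. The data, learning rate and step count are invented for illustration.

```python
def gradient_descent(xs, ys, learning_rate=0.01, steps=5000):
    """Minimize the mean squared error of h(x) = theta0 + theta1 * x."""
    theta0, theta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of (1 / 2n) * sum(error^2)
        grad0 = sum(errors) / n
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n
        # Update both parameters together, using the same errors
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Invented training set lying on the line y = 1 + 0.25 x.
xs = [0, 4, 8]
ys = [1, 2, 3]
theta0, theta1 = gradient_descent(xs, ys)
```

The four pieces from the slide all appear here: the training set (`xs`, `ys`), the model type (a line), the cost (mean squared error) and the optimization method (the update loop).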
22
My ML System. Input $x$: my surface. Output $y$: an estimate of price. $h$: the hypothesis, $y = h(x)$, here $y = ax + b$, obtained with a Cost Function and an Optimization Method. [Figure: price (€) plotted against surface (m²)]
23
Demo 1
24
Variance and Bias http://scott.fortmann-roe.com/docs/BiasVariance.html Underfit Overfit
25
My 2nd experiment. As a [spammer] marketing professional, how can I be sure that my [ads] high-value content [optimizes the ROI] gets maximum viewing on the prospect lists I got from that shady company? (Bracketed words are crossed out on the slide.) In short: I want to know whether my messages are going to be flagged as spam before I send them.
26
Exposing the API to users in Excel SPAM!
27
To get there…
28
Machine Learning: building a system that learns from the existing data, detecting patterns and trends, so that it can predict a category! Supervised Learning >> Classification
29
What features for my classification? 1st experiment: surface, location, floor… Now? 1 line = 1 message
Label | Attribute 0 | Attribute 1 | Attribute 2 | …
Spam21 Ham4 31 Spam1 Ham12
30
Intuition
Label | offer | new | service | revolutionize | …
Spam | 1 | 1 | 1 | 1
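The intuition can be sketched as turning each message into a row of 0/1 flags, one per word of interest. The message text and the helper name `word_indicators` are invented for illustration; only the four column words come from the slide.

```python
def word_indicators(message, vocabulary):
    """Return 1 for each vocabulary word present in the message, else 0."""
    words = set(message.lower().split())
    return [1 if word in words else 0 for word in vocabulary]


vocabulary = ["offer", "new", "service", "revolutionize"]

# Invented example message that would produce the row shown on the slide.
message = "Special offer our new service will revolutionize your inbox"
row = word_indicators(message, vocabulary)
```

A message that uses none of these words gets a row of zeros, which is what lets a classifier tell the two kinds of message apart.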
31
The data set SpamAssassin: 6000+ emails, unstructured text
32
Standard approach: Normalization (url > #url, email > #email, $,£,€ > #devise). Removal of numbers, punctuation, stopwords, HTML tags. Lower case. Length from 3 to 10 max. Stemming.
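A rough sketch of these normalization steps using only the standard library. The regular expressions and the tiny stopword list are illustrative assumptions, "Length from 3 to 10" is read here as a limit on word length, and stemming is left out because the standard library has no stemmer.

```python
import re

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "you", "our", "are", "with"}


def normalize(text):
    """Return the list of normalized tokens for one message."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)            # remove HTML tags
    text = re.sub(r"https?://\S+", " #url ", text)  # url > #url
    text = re.sub(r"\S+@\S+", " #email ", text)     # email > #email
    text = re.sub(r"[$£€]", " #devise ", text)      # currency > #devise
    text = re.sub(r"\d+", " ", text)                # remove numbers
    tokens = re.findall(r"#?[a-z]+", text)          # also drops punctuation
    return [
        t for t in tokens
        if t.startswith("#") or (3 <= len(t) <= 10 and t not in STOPWORDS)
    ]


tokens = normalize("Visit http://example.com NOW for $100 off, mail bob@example.com!")
```

The placeholder tokens keep the information that a message contains a link, an address or a price, without creating one feature per distinct URL or amount.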
33
Generation of the training corpus: 6000 emails > N reference words. We keep the top 10,000 by frequency of use. A set of 6000 lines, 10,000 columns:
Label | Word 0 | Word 1 | Word 2 | Word 3 | … | Word 10000
Spam211 Ham411 31 Spam12 Ham121 … Spam1
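A minimal sketch of building such a matrix, assuming the emails are already tokenized: count every word across the corpus, keep the most frequent ones as columns, then count those words in each email. The three toy emails and the name `build_matrix` are invented, and the vocabulary is cut to 2 words instead of 10,000 so the result stays readable.

```python
from collections import Counter


def build_matrix(tokenized_mails, vocabulary_size):
    """Return (vocabulary, rows) with one row of word counts per mail."""
    totals = Counter(token for mail in tokenized_mails for token in mail)
    # Keep only the most frequent words as columns
    vocabulary = [word for word, _ in totals.most_common(vocabulary_size)]
    rows = []
    for mail in tokenized_mails:
        counts = Counter(mail)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


# Invented, already-normalized emails.
mails = [
    ["offer", "new", "offer", "#url"],
    ["lunch", "tomorrow"],
    ["new", "offer"],
]
vocabulary, rows = build_matrix(mails, vocabulary_size=2)
```

Most cells are zero because any single email uses only a small part of the vocabulary, which is why the table on the slide looks so sparse.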
34
Implementation in Azure ML. The Hate: no module for normalizing; it has to be done beforehand, in an ETL-like data pipeline. The Love: we don’t need to do a full normalization! Feature Hashing using Vowpal-Wabbit.
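To illustrate why feature hashing removes the need for a reference vocabulary, here is a sketch of the general hashing trick: each token is hashed straight to a column index in a fixed-size vector. This is not the Azure ML module or Vowpal Wabbit's own hash function; MD5 is used only as a stand-in because it gives the same result on every run, and the bucket count is arbitrary.

```python
import hashlib


def hash_features(tokens, n_buckets=16):
    """Map tokens to a fixed-length count vector without a vocabulary."""
    vector = [0] * n_buckets
    for token in tokens:
        # Stable hash of the token (Python's built-in hash() varies per run)
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % n_buckets
        vector[index] += 1
    return vector


vector = hash_features(["offer", "new", "offer", "#url"])
```

Two different words can land in the same bucket. That collision risk is the price paid for skipping the vocabulary step, and it shrinks as the number of buckets grows.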
35
Demo 2
Sources
SpamAssassin: http://spamassassin.apache.org/publiccorpus/
Coursera: Machine Learning by Andrew Ng (ex 6 – spam detection with SVM)
Classifying Emails as Spam or Ham using RTextTools, Dennis Lee (blog)
AzureML Web Service Scoring with Excel and Power Query, Rui Quintino (blog)
36
To go further
For everyone: 1 month free trial http://azure.com
For MSDN subscribers: activate your Azure benefits http://aka.ms/azurepourmsdn
Download now (included in almost all Office licences): http://www.microsoft.com/en-us/powerBI/support/default.aspx
NB: Power BI in Excel is not hosted at PowerBI.com; keep this in mind when you try to download it
37
To go further : the communities sqlpass.org sqlport.com guss.pro