Intro to Machine Learning

1 Intro to Machine Learning
Jared Zagelbaum

2 October 30th Through November 3rd
Join the brightest data professionals focused on the Microsoft Data Platform!
October 30th through November 3rd
Pre-conference sessions: Monday/Tuesday
Conference: Wednesday through Friday

3 SQLSaturday #682 – After Party
4th Floor of Mall of America at 6:30 PM Sponsored By:

4 Thank you Sponsors! Platinum Sponsor: Gold Sponsors:

5 PASSMN – News/Info
Sponsors: Thanks to all our sponsors of 2017! We need sponsors for 2018! Special thanks to our annual sponsor:
Board Member Elections: 3 spots available for term. Your chance to help out the MN SQL community!

6 First, Data Science

7 Microsoft Team Data Science Process

8 5 Types of Data Science Questions
How much or how many? (regression)
Which category? (classification)
Which group? (clustering)
Is this weird? (anomaly detection)
Which option should be taken? (recommendation)

9 Define SMART success metrics
Specific, Measurable, Achievable, Relevant, Time-bound
For example: Achieve customer churn prediction accuracy of X% by the end of this 3-month project, so that we can offer promotions to reduce churn.

10 How to do Data Science Brandon Rohrer, Senior Data Scientist at Microsoft

11 Modeling Feature Engineering, Model Fitting, and Model Evaluation

12 Feature Engineering
Adding calculated fields and/or additional labels to your data set. Removing fields is called “Feature Selection”.
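A minimal sketch of both ideas in plain Python, not taken from the deck. The customer records, the field names, the reference date, and the 180-day churn rule are all hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical customer records; the field names are illustrative only.
rows = [
    {"customer_id": 1, "last_order": date(2017, 10, 1), "orders": 12, "spend": 600.0},
    {"customer_id": 2, "last_order": date(2017, 2, 1), "orders": 3, "spend": 90.0},
]


def add_features(row, today=date(2017, 11, 1)):
    """Feature engineering: return a copy of the row with calculated fields added."""
    out = dict(row)
    out["avg_order_value"] = row["spend"] / row["orders"] if row["orders"] else 0.0
    out["days_since_last_order"] = (today - row["last_order"]).days
    # An additional label derived from the data (assumed rule: inactive 180+ days).
    out["churned"] = out["days_since_last_order"] >= 180
    return out


def select_features(row, keep):
    """Feature selection: keep only the named fields and drop the rest."""
    return {name: row[name] for name in keep}


engineered = [add_features(r) for r in rows]
selected = [
    select_features(r, ["avg_order_value", "days_since_last_order", "churned"])
    for r in engineered
]
```

Here `engineered` carries the original fields plus three new ones, while `selected` keeps only the fields a model would be trained on.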

13 Common tasks in pre-processing / feature engineering
Data cleaning: Fill in missing values; detect and remove noisy data and outliers.
Data transformation: Normalize data to reduce dimensions and noise.
Data reduction: Sample data records or attributes for easier data handling.
Data discretization: Convert continuous attributes to categorical attributes for ease of use with certain machine learning methods.
Text cleaning: Remove embedded characters that may cause data misalignment, e.g., embedded tabs in a tab-separated data file or embedded new lines that may break records.
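The pre-processing tasks listed above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the deck's implementation: mean imputation, min-max scaling, fixed bin edges, and whitespace collapsing are each just one of several reasonable choices, and every function name here is hypothetical.

```python
import random
from statistics import mean


def fill_missing(values):
    """Data cleaning: replace None with the mean of the observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def sample_records(records, k, seed=0):
    """Data reduction: draw k records at random, without replacement."""
    return random.Random(seed).sample(records, k)


def discretize(value, edges, labels):
    """Data discretization: map a continuous value to a category.

    edges holds the exclusive upper bound of every bin except the last.
    """
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


def clean_text(field):
    """Text cleaning: collapse tabs, newlines, and repeated spaces into one space."""
    return " ".join(field.split())


ages = fill_missing([20, None, 40, 60])
scaled = min_max_normalize(ages)
bands = [discretize(a, [30, 50], ["young", "middle", "senior"]) for a in ages]
subset = sample_records(list(range(100)), 10)
note = clean_text("line one\tstill\nsame record")
```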

14 Model Fitting

15 Model Training
Split the input data randomly into a training data set and a test data set.
Build the models using the training data set.
Evaluate, on both the training and test data sets, a series of competing machine learning algorithms along with their associated tuning parameters (known as a parameter sweep), geared toward answering the question of interest with the current data.
Determine the “best” solution to answer the question by comparing the success metric between alternative methods.
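The training steps above can be sketched end to end. As an assumption for illustration only, the competing models are k-nearest-neighbour classifiers on a one-dimensional toy data set, the swept parameter is k, and the success metric is accuracy; none of these choices come from the deck.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, x, k):
    """Predict a 0/1 label by majority vote among the k nearest training rows."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, rows, k):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in rows)
    return hits / len(rows)


# Toy data: x runs from 0.0 to 9.5, and the label is 1 when x >= 5.
data = [(x / 2, int(x / 2 >= 5.0)) for x in range(20)]
train, test = train_test_split(data)

# Parameter sweep: record (training accuracy, test accuracy) for each k.
sweep = {k: (accuracy(train, train, k), accuracy(train, test, k)) for k in (1, 3, 5)}

# Pick the "best" model by the success metric on the held-out test set.
best_k = max(sweep, key=lambda k: sweep[k][1])
```

With k = 1 the training accuracy is always perfect because each training row is its own nearest neighbour, which is why the choice is made on the test-set score.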

16 Model Evaluation
Some common descriptive statistics:
Regression: Coefficient of Determination (R Squared), typically from 0 to 1; Relative Absolute Error, Relative Squared Error, Root Mean Squared Error, and Mean Absolute Error
Classification: ROC curve, confusion matrix, accuracy, precision, recall, F1
Recommendation: NDCG
Clustering: average distance to cluster center, average distance to other center, maximal distance to cluster center
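For reference, the regression and classification statistics named above can be computed directly from actual and predicted values. The sketch below uses the standard textbook definitions; the function names and the sample data are hypothetical.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Return MAE, RMSE, relative absolute/squared error, and R squared."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": abs_err / n,
        "rmse": sqrt(sq_err / n),
        "rae": abs_err / abs_dev,  # error relative to always predicting the mean
        "rse": sq_err / sq_dev,
        "r2": 1 - sq_err / sq_dev,
    }


def classification_metrics(actual, predicted, positive=1):
    """Return the confusion matrix counts plus accuracy, precision, recall, F1."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


reg = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 4.0, 3.0])
clf = classification_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
```

Note that R squared computed this way drops below 0 when a model predicts worse than the mean of the actual values, which can happen on held-out data.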

17 Cross Validation
Leverages smaller data sets where a 70 / 30 split might not be feasible
Helps avoid overfitting
Gives a more accurate estimate of model performance
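A minimal k-fold sketch shows the mechanics: every row is used for testing exactly once and for training in the other folds, and the per-fold scores are averaged. The `fit` and `score` callables and the mean-predictor example are assumptions for illustration; in practice the rows would usually be shuffled before folding.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs; each row lands in exactly one test fold."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(rows, k, fit, score):
    """Fit on each training fold, score on the held-out fold, return (mean, scores)."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(rows), k):
        model = fit([rows[i] for i in train_idx])
        scores.append(score(model, [rows[i] for i in test_idx]))
    return sum(scores) / len(scores), scores


# Hypothetical model: always predict the mean of the training values.
def fit_mean(train):
    return sum(train) / len(train)


# Hypothetical metric: mean absolute error on the held-out values.
def mean_abs_error(model, test):
    return sum(abs(v - model) for v in test) / len(test)


values = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
avg_score, fold_scores = cross_validate(values, 3, fit_mean, mean_abs_error)
```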

18 Deployment Where Data Science Becomes ML

19 Microsoft ML Platforms
Azure Machine Learning
Microsoft Machine Learning Services in SQL Server
Microsoft Machine Learning Server
Data Science Virtual Machine
Spark MLlib in HDInsight
Batch AI Training Service
Microsoft Cognitive Toolkit
Microsoft Cognitive Services

20 Demo

