Intro to Machine Learning


Intro to Machine Learning
Jared Zagelbaum

October 30th through November 3rd
Join the brightest data professionals focused on the Microsoft Data Platform!
Pre-Conference Sessions: Monday/Tuesday
Conference: Wednesday through Friday

SQLSaturday #682 After Party
4th floor of Mall of America at 6:30 PM
Sponsored By:

Thank you Sponsors! Platinum Sponsor: Gold Sponsors:

PASSMN News/Info
Sponsors: Thanks to all our sponsors of 2017! We need sponsors for 2018! Special thanks to our annual sponsor:
Board Member Elections: 3 spots available for the 2018-2019 term. Your chance to help out the MN SQL community!

First, Data Science

Microsoft Team Data Science Process

5 Types of Data Science Questions
How much or how many? (regression)
Which category? (classification)
Which group? (clustering)
Is this weird? (anomaly detection)
Which option should be taken? (recommendation)

Define SMART success metrics
Specific, Measurable, Achievable, Relevant, Time-bound
For example: Achieve customer churn prediction accuracy of X% by the end of this 3-month project, so that we can offer promotions to reduce churn.

How to do Data Science Brandon Rohrer, Senior Data Scientist at Microsoft

Modeling: Feature Engineering, Model Fitting, and Model Evaluation

Feature Engineering
Adding calculated fields and/or additional labels to your data set.
Removing fields is called “Feature Selection”.

Common tasks in pre-processing / feature engineering
Data cleaning: Fill in missing values; detect and remove noisy data and outliers.
Data transformation: Normalize data to reduce dimensions and noise.
Data reduction: Sample data records or attributes for easier data handling.
Data discretization: Convert continuous attributes to categorical attributes for ease of use with certain machine learning methods.
Text cleaning: Remove embedded characters that may cause data misalignment, e.g., embedded tabs in a tab-separated data file or embedded new lines that may break records.
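A few of the pre-processing tasks above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names (fill_missing, min_max_normalize, discretize, clean_text), the median fill strategy, the bin edges, and the sample values are all hypothetical and do not come from the slides.

```python
import statistics

def fill_missing(values):
    """Data cleaning: replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def min_max_normalize(values):
    """Data transformation: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]

def discretize(values, edges, labels):
    """Data discretization: map each value to the label of its bin.

    edges holds the inclusive upper bound of every bin except the last,
    so labels must contain one more entry than edges.
    """
    result = []
    for v in values:
        for edge, label in zip(edges, labels):
            if v <= edge:
                result.append(label)
                break
        else:
            result.append(labels[-1])  # above every edge: last bin
    return result

def clean_text(field):
    """Text cleaning: collapse tabs, new lines, and repeated spaces into one space."""
    return " ".join(field.split())

# Made-up sample column with one missing value (illustrative only).
ages = [23, None, 35, 61, 48]
filled = fill_missing(ages)
scaled = min_max_normalize(filled)
groups = discretize(filled, edges=[30, 50], labels=["young", "middle", "senior"])
comment = clean_text("late\tpayment\nreported")
```

Median fill is just one possible choice here; whether to fill, drop, or flag missing values depends on the data set and the question being asked.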

Model Fitting

Model Training
Split the input data randomly for modeling into a training data set and a test data set.
Build the models using the training data set.
Evaluate (on the training and test data sets) a series of competing machine learning algorithms, along with the various associated tuning parameters (known as a parameter sweep), that are geared toward answering the question of interest with the current data.
Determine the “best” solution to answer the question by comparing the success metric between alternative methods.
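The training steps above might look roughly like this in plain Python. It is a minimal sketch, not the presenter's demo: the 1-D k-nearest-neighbour model, the synthetic data, the 70/30 split, and every function name are assumptions chosen to keep the example self-contained.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, rows, k):
    """Fraction of rows whose label the k-NN model predicts correctly."""
    hits = sum(knn_predict(train, x, k) == label for x, label in rows)
    return hits / len(rows)

# Synthetic data (illustrative only): the label is 1 when x > 5.
rng = random.Random(42)
data = [(x, int(x > 5)) for x in (rng.uniform(0, 10) for _ in range(200))]

# Step 1: random split into training and test sets.
train, test = train_test_split(data, test_fraction=0.3, seed=1)

# Steps 2 and 3: "fit" each candidate (k-NN simply stores the training set)
# and record (training accuracy, test accuracy) for each value of k.
results = {k: (accuracy(train, train, k), accuracy(train, test, k))
           for k in (1, 3, 5, 7)}

# Step 4: pick the candidate with the best success metric on the test set.
best_k = max(results, key=lambda k: results[k][1])
```

Here the sweep varies a single tuning parameter of one algorithm; comparing several algorithms follows the same pattern with one entry in results per candidate.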

Model Evaluation
Some common descriptive statistics:
Regression: coefficient of determination (R squared), from 0 to 1; relative absolute, relative squared, root mean squared, and mean absolute error
Classification: ROC curve, confusion matrix, accuracy, precision, recall, F1
Recommendation: NDCG
Clustering: average distance to cluster center, average distance to other center, maximal distance to cluster center
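As a rough illustration of the regression and classification statistics listed above, the following sketch computes them from lists of actual and predicted values. The function names and the sample numbers are made up for the example; the formulas are the standard textbook definitions.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, RMSE, relative absolute/squared error, and R squared.

    Assumes actual is not constant (otherwise the relative errors divide by zero).
    """
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae": abs_err / abs_dev,   # relative absolute error
        "rse": sq_err / sq_dev,     # relative squared error
        "r2": 1 - sq_err / sq_dev,  # coefficient of determination
    }

def classification_metrics(actual, predicted, positive=1):
    """Return confusion-matrix counts plus accuracy, precision, recall, and F1."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}

# Made-up values, for illustration only.
reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
clf = classification_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0, 1, 0])
```

Note that R squared computed this way equals one minus the relative squared error, so the two statistics carry the same information.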

Cross Validation
Leverages smaller data sets where a 70 / 30 train/test split might not be feasible
Helps avoid overfitting
Gives a more accurate estimate of model performance
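One common form of cross validation is k-fold: the data is cut into k parts, and each part serves once as the test set while the rest is used for training. The sketch below assumes that variant; the helper names, the toy threshold model, and the 20-row data set are hypothetical and only serve to make the example runnable.

```python
import random
import statistics

def k_fold_indices(n_rows, k, seed=0):
    """Yield (train_idx, test_idx) pairs so that each row is tested exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

def cross_validate(rows, k, fit, score, seed=0):
    """Fit on k-1 folds, score on the held-out fold, and return all k scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(rows), k, seed):
        model = fit([rows[j] for j in train_idx])
        scores.append(score(model, [rows[j] for j in test_idx]))
    return scores

def fit_threshold(train):
    """Toy model (hypothetical): a threshold halfway between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2

def accuracy(threshold, rows):
    """Fraction of rows where (x > threshold) matches the label."""
    return sum(int(x > threshold) == y for x, y in rows) / len(rows)

# Small made-up data set: 20 rows, label is 1 when x >= 10.
rows = [(float(x), int(x >= 10)) for x in range(20)]
scores = cross_validate(rows, k=5, fit=fit_threshold, score=accuracy)
mean_score = statistics.mean(scores)
```

Averaging the k scores uses every row for testing once, which is why the estimate tends to be steadier than a single split on a small data set.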

Deployment: Where Data Science Becomes ML

Microsoft ML Platforms
Azure Machine Learning
Microsoft Machine Learning Services in SQL Server
Microsoft Machine Learning Server
Data Science Virtual Machine
Spark MLLib in HDInsight
Batch AI Training Service
Microsoft Cognitive Toolkit
Microsoft Cognitive Services

Demo