Intro to Machine Learning

5 Types of Data Science Questions
- How much or how many? (regression)
- Which category? (classification)
- Which group? (clustering)
- Is this weird? (anomaly detection)
- Which option should be taken? (recommendation)

Define SMART Success Metrics
- Specific
- Measurable
- Achievable
- Relevant
- Time-bound
For example: achieve customer churn prediction accuracy of X% by the end of this 3-month project, so that we can offer promotions to reduce churn.

How to Do Data Science
Brandon Rohrer, Senior Data Scientist at Microsoft

Microsoft Team Data Science Process

Feature Engineering
Adding calculated fields and/or additional labels to your data set. Removing fields is called “Feature Selection”.
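A minimal sketch in Python, assuming records held as plain dictionaries; the field names (`customer_id`, `monthly_charges`, `tenure_months`, `total_charges`) are hypothetical. The first function adds a calculated field (feature engineering), and the second drops fields that should not reach the model (feature selection):

```python
def add_total_charges(rows):
    """Feature engineering: add a calculated field to every record (hypothetical field names)."""
    return [{**row, "total_charges": row["monthly_charges"] * row["tenure_months"]} for row in rows]


def select_features(rows, drop):
    """Feature selection: keep every field except those listed in `drop`."""
    return [{key: value for key, value in row.items() if key not in drop} for row in rows]


# Hypothetical toy records.
customers = [
    {"customer_id": "A1", "monthly_charges": 20.0, "tenure_months": 12},
    {"customer_id": "B2", "monthly_charges": 35.5, "tenure_months": 2},
]

engineered = add_total_charges(customers)
features = select_features(engineered, drop={"customer_id"})  # an ID carries no predictive signal
print(features)
```

Both steps return new lists rather than mutating the input, so the raw data stays available if a different set of features is tried later.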

Common Tasks in Pre-processing / Feature Engineering
- Data cleaning: fill in missing values; detect and remove noisy data and outliers.
- Data transformation: normalize (rescale) data so attributes are on comparable scales, and apply transformations that reduce dimensionality and noise.
- Data reduction: sample data records or attributes for easier data handling.
- Data discretization: convert continuous attributes to categorical attributes for ease of use with certain machine learning methods.
- Text cleaning: remove embedded characters that may cause data misalignment, e.g., embedded tabs in a tab-separated data file or embedded newlines that may break records.
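The tasks above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the method from the slides: missing values are `None` and are filled with the mean (one common choice among several), normalization is min-max scaling to [0, 1], discretization uses equal-width bins, and reduction is random sampling without replacement. All function names are hypothetical.

```python
import random
import re
from statistics import mean


def fill_missing(values):
    """Data cleaning: replace missing entries (None) with the mean of the observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def sample_records(rows, n, seed=0):
    """Data reduction: keep a random subset of n records (sampling without replacement)."""
    return random.Random(seed).sample(rows, n)


def discretize(values, bins, labels):
    """Data discretization: map each value to the label of its equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0
    # min(..., bins - 1) puts the maximum value into the last bin instead of past it
    return [labels[min(int((v - lo) / width), bins - 1)] for v in values]


def clean_text(field):
    """Text cleaning: collapse embedded tabs/newlines that would break a tab-separated record."""
    return re.sub(r"[\t\r\n]+", " ", field).strip()


ages = [25, None, 35, 45]
filled = fill_missing(ages)            # [25, 35, 35, 45]
scaled = min_max_normalize(filled)     # [0.0, 0.5, 0.5, 1.0]
groups = discretize(filled, 2, ["young", "older"])
subset = sample_records(list(range(100)), 10)
comment = clean_text("late\tpayment\nagain")  # "late payment again"
```

In practice, statistics such as the fill value and the min/max should be computed on the training data only and then reused on the test data, so that no test information leaks into training.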

Model Fitting

Model Training
1. Randomly split the input data into a training data set and a test data set.
2. Build the models using the training data set.
3. Evaluate, on the training and test data sets, a series of competing machine learning algorithms along with their associated tuning parameters (a "parameter sweep"), each geared toward answering the question of interest with the current data.
4. Determine the “best” solution to the question by comparing the success metric across the alternative methods.
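The four steps can be sketched in plain Python. As an assumption for illustration, the "competing algorithms" are reduced to a k-nearest-neighbors classifier on one numeric feature, the parameter sweep tries k = 1, 3, 5, and the success metric is accuracy; the data and function names are hypothetical. Following the slide, the sweep is scored on the test set; many practitioners instead select parameters on a separate validation set or with cross-validation so the final test score stays unbiased.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Step 1: shuffle a copy of the rows, then hold out `test_fraction` of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, x, k):
    """Step 2: 'build' the model; k-NN simply predicts the majority label of the k nearest rows."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k):
    """Success metric: fraction of test rows whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


# Hypothetical toy data: (feature, label) pairs; the label flips at feature value 5.
data = [(x / 2, "low" if x < 10 else "high") for x in range(20)]
train, test = train_test_split(data)

# Step 3: parameter sweep over k; step 4: keep the setting with the best metric.
scores = {k: accuracy(train, test, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```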

Model Evaluation
- Regression: coefficient of determination (R squared), usually between 0 and 1 (it can be negative when a model does worse than always predicting the mean); relative absolute error, relative squared error, root mean squared error, and mean absolute error
- Classification: ROC curve, confusion matrix, accuracy, precision, recall, F1 score
- Recommendation: NDCG (normalized discounted cumulative gain)
- Clustering: average distance to cluster center, average distance to other centers, maximal distance to cluster center
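A sketch of the regression and classification metrics listed above, assuming the common textbook definitions: relative absolute and relative squared error compare the model's total error with that of always predicting the mean, so R squared equals 1 minus the relative squared error. The confusion-matrix layout `[[TN, FP], [FN, TP]]` is a chosen convention, and the ROC curve, NDCG, and clustering metrics are omitted for brevity.

```python
import math


def regression_metrics(y_true, y_pred):
    """Regression metrics; RAE and RSE are relative to a predict-the-mean baseline."""
    n = len(y_true)
    y_mean = sum(y_true) / n
    abs_err = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    abs_dev = sum(abs(t - y_mean) for t in y_true)  # total absolute error of the baseline
    sq_dev = sum((t - y_mean) ** 2 for t in y_true)  # total squared error of the baseline
    return {
        "mean_absolute_error": abs_err / n,
        "root_mean_squared_error": math.sqrt(sq_err / n),
        "relative_absolute_error": abs_err / abs_dev,
        "relative_squared_error": sq_err / sq_dev,
        "r_squared": 1 - sq_err / sq_dev,
    }


def classification_metrics(y_true, y_pred, positive=1):
    """Binary classification metrics; confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


reg = regression_metrics([3.0, 5.0, 7.0], [2.5, 5.0, 8.0])  # r_squared == 0.84375
clf = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])  # accuracy == 0.6
```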

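Cross Validation
- Makes better use of smaller data sets, where holding out 30% of the data in a 70/30 train/test split might not be feasible
- Helps detect, and therefore avoid, overfitting
- Gives a more reliable estimate of model performance than a single split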
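A minimal k-fold cross-validation sketch in plain Python. Every row is held out exactly once, so even a small data set contributes all of its rows to both training and testing, and the average over folds is a steadier performance estimate than one split. The least-squares line and mean-absolute-error scorer are hypothetical stand-ins for any model and metric; the folds here are contiguous, so real data should usually be shuffled first.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs; each of the k folds is held out exactly once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


def cross_validate(xs, ys, fit, score, k=5):
    """Fit on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return sum(scores) / len(scores), scores


def fit_line(xs, ys):
    """Hypothetical model: ordinary least-squares line y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def mae(model, xs, ys):
    """Hypothetical scorer: mean absolute error of the fitted line."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


xs = list(range(10))
ys = [2 * x + 1 for x in xs]  # toy data that lies exactly on a line
mean_mae, fold_scores = cross_validate(xs, ys, fit_line, mae, k=5)
```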

Deployment
- SQL Server 2017 Machine Learning Services: R / Python (in-database)
- Azure ML