Presentation transcript:

Allan Mitchell – SQL Server MVP – Konesans Limited – www.SQLIS.com

Who am I?
– SQL Server MVP
– SQL Server consultant
– Joint author on the Wrox Professional SSIS book
– Worked with SQL Server since version …

Today’s Schedule
Mostly demos
Data Mining Add-In for Excel 2007
– Added XL Functions
– Visualisation Methods

Today’s Schedule
Added XL Functions – not a lot of people know these exist
– DMPREDICT
– DMPREDICTTABLEROW
– DMCONTENTQUERY
– Only exist after the add-in is installed

Today’s Schedule
Visualisation Methods
– Accuracy Charts
– Classification Matrix
– Profit Charts
– Folding (X-Validation)
– Calculator (if we get time)

Excel Functions – DMPREDICT
– Can take a variable number of arguments; the minimum is 3.
– The first parameter is the Analysis Services connection to be used. An empty string refers to the current (active) connection.
– The second parameter is the name of the mining model that will execute the prediction.
– The third parameter is the requested predicted entity (a predictable column in general, but it could also be any prediction function).
– The function may also take up to 32 pairs of arguments. Each pair contains the value and the name of an input, in that order (value followed by name).
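A minimal sketch of what a call can look like, assuming a hypothetical mining model named BikeBuyer with a predictable Bike Buyer column and Age/Education inputs (none of these names come from the deck):

=DMPREDICT("", "BikeBuyer", "Predict([Bike Buyer])", 35, "Age", "Graduate Degree", "Education")

The empty string uses the active connection, and the trailing arguments are value/name pairs as described above.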

Excel Functions – DMPREDICTTABLEROW
– The first parameter is the Analysis Services connection to be used. An empty string refers to the current (active) connection.
– The second parameter is the name of the mining model that will execute the prediction.
– The third parameter is the requested predicted entity (a predictable column in general, but it could also be any prediction function).
– The fourth parameter is a range of cells to be passed as inputs.
– The fifth parameter (optional) is a comma-separated list of column names to be used as names for the inputs.
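Again a sketch only, using the same hypothetical BikeBuyer model, with A2:C2 holding one row of input values (the cell references and column names are illustrative):

=DMPREDICTTABLEROW("", "BikeBuyer", "Predict([Bike Buyer])", A2:C2, "Age,Education,Income")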

Excel Functions – DMPREDICTTABLEROW
– If the range of cells comes from an XL List Object, the column headers are taken from the list
– The 5th parameter is then not necessary
– Unless a column name != the model column name
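For example, if A2:C2 is a row inside an XL List Object whose headers already match the model's column names (a hypothetical set-up, not one from the deck), the fifth parameter can simply be dropped:

=DMPREDICTTABLEROW("", "BikeBuyer", "Predict([Bike Buyer])", A2:C2)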

Excel Functions – DMCONTENTQUERY
– The first parameter is the Analysis Services connection to be used. An empty string refers to the current (active) connection.
– The second parameter is the name of the mining model whose content will be queried.
– The third parameter is the requested content column.
– The fourth parameter is a WHERE clause to be appended to the content query.
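A sketch of a content query against the same hypothetical BikeBuyer model; NODE_CAPTION and NODE_TYPE are standard columns of the mining model content schema, but the filter value is purely illustrative:

=DMCONTENTQUERY("", "BikeBuyer", "NODE_CAPTION", "NODE_TYPE = 2")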

DEMO Data Mining Excel functions

Excel Add-In
– Great way of visualising data mining
– Takes away some of the mystery
– Easy to use
– Some wizards
– Freedom vs. flexibility

Accuracy Charts
Compare 1-n models against
– Another model
– The best model
– A “thumb in the air” model / no model / chance

Accuracy Charts
Interpreting
– How does a model compare with other models?
– What is the cumulative gain?
– Lift (roughly, the model’s response rate in the targeted population divided by the baseline response rate)
The real thing we want to see is…
– By how much do we beat the “chance” model?

DEMO Accuracy Charts

Classification Matrix
What are we interested in?
– How well did my model predict outcomes?
– False Positives
– False Negatives
– True Positives
– True Negatives

Classification Matrix

                 Predicted TRUE                    Predicted FALSE
Actual TRUE      True Positive                     False Negative (type 2 error)
Actual FALSE     False Positive (type 1 error)     True Negative
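As a quick sketch of how the four cells roll up into an overall accuracy figure in Excel, assuming purely illustrative cell references (B2 = true positives, C2 = false negatives, B3 = false positives, C3 = true negatives):

=(B2 + C3) / (B2 + C2 + B3 + C3)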

Classification Matrix
A misclassification is not always a bad thing. Consider:
– Predicted possibility of disease
– Extra care/treatment given
– Real result is “no disease”
– An example of a false positive
– Is it such a bad thing?

DEMO Classification Matrix

Profit Charts
– Closely follows the lift/cumulative gain chart
– Apply costs to efforts

Profit Charts
Apply costs to
– Initial/fixed outlay
– Cost per case
– Return per case
Also specify the target predictable column, the target outcome, and the count of cases to use (a worked example of the arithmetic follows below).
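A rough sketch of the profit arithmetic behind one point on the curve, with made-up figures (not from the deck): a fixed outlay of 5,000, a cost per case of 3, a return per case of 50, 1,000 cases targeted and 200 of them responding:

=(200 * 50) - (5000 + 1000 * 3)

That is 10,000 of return against 8,000 of cost, i.e. a profit of 2,000 at that point; the profit chart plots this figure as the number of cases targeted grows.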

DEMO Profit Chart

X-Validation/Folding/Rotation Estimation
– Validates your model
– Tests whether the model is generally applicable
– Large variations in results between partitions mean the model is not generally applicable and may need tuning

X-Validation/Folding/Rotation Estimation
Stratified K-fold cross-validation
– Creates K folds (representative partitions)
– Holds one partition out
– Trains the model with the others
– Tests with the holdout partition
– Repeat (with a different holdout/test partition) × K

DEMO X-Validation/Folding/Rotation Estimation

Prediction Calculator
– Set costs and profits associated with getting the prediction right and getting it wrong
– See profit curves
– See profit threshold scores
– A pad for entering new data

Prediction Calculator
– Cloud version available
– Print version available for later data entry
– Easy to use
– Easy to understand

DEMO Prediction Calculator

Thank you…