Allan Mitchell SQL Server MVP Konesans Limited www.SQLIS.com.


1 Allan Mitchell SQL Server MVP Konesans Limited www.SQLIS.com

2 Who am I
SQL Server MVP
SQL Server Consultant
Joint author on Wrox Professional SSIS book
Worked with SQL Server since version 6.5
www.SQLDTS.com and www.SQLIS.com

3 Today’s Schedule
Mostly Demos
Data Mining Add-In for Excel 2007
  – Added XL Functions
  – Visualisation Methods

4 Today’s Schedule
Added XL Functions - Not a lot of people know these exist
  – DMPREDICT
  – DMPREDICTTABLEROW
  – DMCONTENTQUERY
  – Only exist after add-in installed

5 Today’s Schedule
Visualisation Methods
  – Accuracy Charts
  – Classification Matrix
  – Profit Charts
  – Folding (X-Validation)
  – Calculator (if we get time)

6 Excel Functions
DMPREDICT
  – Takes a variable number of arguments, with a minimum of 3.
  – The first parameter is the Analysis Services connection to be used. An empty string refers to the current (active) connection.
  – The second parameter is the name of the mining model that will execute the prediction.
  – The third parameter is the requested predicted entity (in general a predictable column, but it could also be any prediction function).
  – The function may also take up to 32 pairs of arguments. Each pair contains the value and the name of an input, in that order (value followed by name).
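
The argument order described on this slide can be sketched with a small Python helper that assembles a DMPREDICT formula string. This is only an illustration of the ordering (connection, model, predicted entity, then value/name pairs). The helper name, the model name `BikeBuyerModel`, the column names and the cell references are all hypothetical, and the comma argument separator is an assumption that depends on Excel's regional settings.

```python
def dmpredict_formula(model, target, inputs, connection=""):
    """Build a DMPREDICT formula string (illustrative helper, not part of the add-in).

    inputs: list of (cell_reference, input_name) pairs, e.g. [("A2", "Age")].
    """
    if len(inputs) > 32:
        raise ValueError("DMPREDICT accepts at most 32 value/name pairs")
    # The three mandatory arguments: connection, model name, predicted entity.
    args = [f'"{connection}"', f'"{model}"', f'"{target}"']
    for value, name in inputs:
        args.append(str(value))     # value first (a cell reference here)
        args.append(f'"{name}"')    # then the name of the model input
    return "=DMPREDICT(" + ", ".join(args) + ")"


# Hypothetical model and columns; the empty connection string means
# "use the current (active) connection".
formula = dmpredict_formula(
    "BikeBuyerModel", "BikeBuyer", [("A2", "Age"), ("B2", "Gender")]
)
print(formula)
```

The resulting string is what would be typed into a worksheet cell once the add-in is installed.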

7 Excel Functions
DMPREDICTTABLEROW
  – The first parameter is the Analysis Services connection to be used. An empty string refers to the current (active) connection.
  – The second parameter is the name of the mining model that will execute the prediction.
  – The third parameter is the requested predicted entity (in general a predictable column, but it could also be any prediction function).
  – The fourth parameter is a range of cells to be passed as inputs.
  – The fifth parameter (optional) is a comma-separated list of column names to be used as names for the inputs.

8 Excel Functions
DMPREDICTTABLEROW
  – If the range of cells is from an XL List Object, the column headers are taken from the list
  – The 5th parameter is then not necessary, unless Column Name != Model Column Name

9 Excel Functions
DMCONTENTQUERY
  – The first parameter is the Analysis Services connection to be used. An empty string refers to the current (active) connection.
  – The second parameter is the name of the mining model whose content is queried.
  – The third parameter is the requested content column.
  – The fourth parameter is a WHERE clause to be appended to the content query.

10 DEMO Data Mining Excel functions

11 Excel Add-In
Great way of visualising Data Mining
Takes away some of the mystery
Easy to use
Some wizards
Freedom vs. flexibility

12 Accuracy Charts
Compare 1-n models against
  – Another model
  – Best model
  – Thumb in the air model/no model/chance

13 Accuracy Charts
Interpreting
  – How does a model compare with other models
  – What is the cumulative gain
  – Lift
The real thing we want to see is...
  – By how much do we beat the “chance” model
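
As a rough illustration of cumulative gain and lift, the sketch below ranks cases by predicted score, measures the share of all positive cases captured in the top fraction, and compares that with the chance model (which, on average, captures a share equal to the fraction of cases selected). The function name and the sample scores are invented for the example; this is not how the add-in computes its charts internally.

```python
def cumulative_gain(scores, actuals, fraction):
    """Share of all positives found in the top `fraction` of cases, ranked by score."""
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    top_n = round(len(ranked) * fraction)
    captured = sum(actual for _, actual in ranked[:top_n])
    return captured / sum(actuals)


# Made-up predicted probabilities and actual outcomes (1 = positive case).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actuals = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

fraction = 0.3
gain = cumulative_gain(scores, actuals, fraction)  # model: share of positives in top 30%
lift = gain / fraction                             # chance model captures ~30%, so lift 1.0
print(f"gain={gain:.2f} lift={lift:.2f}")
```

A lift above 1 at a given fraction means the model beats the chance model at that point on the chart.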

14 DEMO Accuracy Charts

15 Classification Matrix
What are we interested in
  – How well did my model predict outcomes
  – False Positive
  – False Negative
  – True Positive
  – True Negative

16 Classification Matrix
                Predicted TRUE                   Predicted FALSE
Actual TRUE     True Positive                    False Negative (type 2 error)
Actual FALSE    False Positive (type 1 error)    True Negative
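
The four cells of the matrix can be counted directly from paired actual and predicted outcomes. The following is a minimal sketch with invented data; the function name is an assumption made for the example.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Count the four outcome combinations of actual vs. predicted values."""
    counts = Counter(zip(actual, predicted))
    return {
        "TP": counts[(True, True)],    # actual TRUE, predicted TRUE
        "FN": counts[(True, False)],   # actual TRUE, predicted FALSE (type 2 error)
        "FP": counts[(False, True)],   # actual FALSE, predicted TRUE (type 1 error)
        "TN": counts[(False, False)],  # actual FALSE, predicted FALSE
    }


# Invented outcomes for eight cases.
actual = [True, True, True, False, False, False, False, True]
predicted = [True, False, True, True, False, False, False, True]

matrix = classification_matrix(actual, predicted)
print(matrix)
```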

17 Classification Matrix
A misclassification is not always a bad thing
Consider
  – Predicted possibility of disease
  – Extra care/treatment given
  – Real result is “No disease”
  – Example of false positive
  – Is it such a bad thing?

18 DEMO Classification Matrix

19 Profit Charts
Closely follows lift/cumulative gain chart
Apply costs to efforts

20 Profit Charts
Apply costs to
  – Initial/Fixed outlay
  – Cost per case
  – Return per case
Target predictable column
Target Outcome
Count of cases to use
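
One way to picture how these inputs combine: rank the cases by predicted score, then for each possible number of cases contacted, compute returns from the hits minus the per-case cost and the fixed outlay. The sketch below assumes this simple profit formula and uses invented numbers; the add-in's own calculation may differ in detail.

```python
def profit_curve(scores, actuals, fixed_cost, cost_per_case, return_per_case):
    """Profit after contacting the top 1, 2, ..., N cases ranked by score."""
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    profits = []
    hits = 0
    for contacted, (_, actual) in enumerate(ranked, start=1):
        hits += actual  # 1 when the case really has the target outcome
        profits.append(hits * return_per_case - contacted * cost_per_case - fixed_cost)
    return profits


# Invented scores and outcomes (1 = target outcome).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
actuals = [1, 1, 0, 1, 0, 0]

profits = profit_curve(scores, actuals, fixed_cost=5, cost_per_case=2, return_per_case=10)
best_count = profits.index(max(profits)) + 1  # number of cases giving peak profit
print(profits, best_count)
```

The peak of the curve suggests how many of the highest-scoring cases are worth acting on under the stated costs.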

21 DEMO Profit Chart

22 X-Validation/Folding/Rotation Estimation
Validates your model
Tests whether model generally applicable
Large variations in results between partitions
  – Model not generally applicable
  – May need tuning

23 X-Validation/Folding/Rotation Estimation
Stratified K-Fold Cross Validation
Creates K folds
  – Representative partitions
Holds one partition out
Trains model with others
Tests with holdout partition
Repeats K times, each time with a different holdout/test partition
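
The procedure on this slide can be sketched in plain Python. Cases are dealt round-robin into K folds within each class so that every fold keeps roughly the same class proportions, and each fold then takes one turn as the holdout partition. The function names and labels are invented for the example, and no shuffling is done, which a real implementation would normally add.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign case indices to k folds, keeping class proportions similar in each."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)  # deal round-robin within each class
    return folds


def cross_validation_splits(labels, k):
    """Yield (train_indices, test_indices), holding out each fold once."""
    folds = stratified_folds(labels, k)
    for holdout in range(k):
        train = [i for f, fold in enumerate(folds) if f != holdout for i in fold]
        yield train, folds[holdout]


# Invented labels: 3 "yes" and 6 "no" cases.
labels = ["yes", "no", "no", "yes", "no", "no", "yes", "no", "no"]
splits = list(cross_validation_splits(labels, k=3))
for train, test in splits:
    print(sorted(train), sorted(test))
```

A model would be trained on each `train` set and scored on the matching `test` set. Large differences between the K scores are the warning sign described on the previous slide.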

24 DEMO X-Validation/Folding/Rotation Estimation

25 Prediction Calculator
Set costs and profits associated with
  – Getting the prediction right
  – Getting the prediction wrong
See profit curves
See profit threshold scores
Pad for entering new data
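
To make the idea of a profit threshold concrete, the sketch below tries each observed score as a cut-off, totals the profit and cost of the resulting right and wrong predictions, and keeps the cut-off with the highest total. It is a generic threshold search written for illustration, not the Prediction Calculator's own scoring method, and the function name, scores and cost figures are invented.

```python
def best_threshold(scores, actuals, tp_profit, fp_cost, fn_cost, tn_profit):
    """Return (threshold, total_profit) for the most profitable score cut-off."""
    best = None
    for threshold in sorted(set(scores)):
        total = 0
        for score, actual in zip(scores, actuals):
            predicted = score >= threshold
            if predicted and actual:
                total += tp_profit   # right: true positive
            elif predicted:
                total -= fp_cost     # wrong: false positive
            elif actual:
                total -= fn_cost     # wrong: false negative
            else:
                total += tn_profit   # right: true negative
        if best is None or total > best[1]:
            best = (threshold, total)
    return best


# Invented scores and outcomes (1 = positive case).
scores = [0.9, 0.7, 0.6, 0.4, 0.2]
actuals = [1, 1, 0, 1, 0]

threshold, total = best_threshold(
    scores, actuals, tp_profit=10, fp_cost=5, fn_cost=10, tn_profit=0
)
print(threshold, total)
```

Changing the cost of a false positive or false negative moves the best cut-off, which is the trade-off the calculator lets you explore.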

26 Prediction Calculator
Cloud Version available
Print version available for later data entry
Easy to use
Easy to understand

27 DEMO Prediction Calculator

28 Thank you… allan.mitchell@konesans.com

