Module 14: Performing Predictive Analysis with Data Mining
Course 10778A

Module Overview
  Overview of Data Mining
  Creating a Data Mining Solution
  Validating a Data Mining Model
  Consuming Data Mining Data

Lesson 1: Overview of Data Mining
  What Is the Purpose of Data Mining?
  Components of an Analysis Services Data Mining Solution
  Analysis Services Data Mining Algorithms
  Data Mining Add-ins for Excel

What Is the Purpose of Data Mining?
  Analysis of large data sets to reveal hidden patterns and trends
  Data mining algorithms perform different types of statistical analyses for different scenarios
  Data mining has a wide range of applications, for example:
    Sales forecasting
    Targeted advertising
    Providing online recommendations
    Risk assessment

Components of an Analysis Services Data Mining Solution
  Data Mining Structure
    Contains data source view
    Contains case table and mining structure columns
    Contains data mining models
    Specifies training set and testing set
  Case Table
    Stores source data for data mining models
    Columns have defined data types and content type
  Data Mining Model
    Uses a single data mining algorithm
    Includes columns from data mining structure
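
The training set and testing set that a mining structure specifies can be pictured as a random partition of the case table rows. The sketch below is a minimal Python illustration, not Analysis Services code: the function name split_cases, the 30 percent holdout, the seed, and the sample rows are all assumptions made for the example.

```python
import random


def split_cases(cases, holdout_percent=30, seed=42):
    """Partition case rows into a training set and a testing set.

    holdout_percent and seed are illustrative assumptions, not values
    taken from the course material.
    """
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    n_test = len(shuffled) * holdout_percent // 100
    # The first n_test shuffled rows are held out for testing.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical case rows: one row per customer.
cases = [{"CustomerKey": k, "BikeBuyer": k % 2} for k in range(1, 101)]
training, testing = split_cases(cases)
print(len(training), len(testing))
```

Models are trained on the first set only, so the held-out rows can later be used to check how well the predictions generalize.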

Analysis Services Data Mining Algorithms
  Classification algorithms
    Microsoft Decision Trees
    Microsoft Neural Network
    Microsoft Naive Bayes
  Regression algorithms
    Microsoft Time Series
    Microsoft Linear Regression
    Microsoft Logistic Regression
  Segmentation or clustering algorithms
    Microsoft Clustering
  Association algorithms
    Microsoft Association
  Sequence analysis algorithms
    Microsoft Sequence Clustering

Data Mining Add-ins for Excel
  Data Mining Client for Excel
    Build, validate, and manage data mining models
    Browse and query data mining models
  Table Analysis Tools for Excel
    Perform a range of table analyses
    No knowledge of data mining required
You may also want to draw students’ attention to the new data mining cloud functionality. For more information, see the BILabs blog at http://go.microsoft.com/fwlink/?LinkID=248850.

Demonstration: Performing Table Analysis in Excel
In this demonstration, you will see how to:
  Create a connection to Analysis Services
  Use the Excel table analysis tools
Task 1: Create a connection to Analysis Services
  Ensure that the MIA-DC1 and MIA-SQLBI virtual machines are both running, and then log on to MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd. Then in the D:\10778A\Demofiles\Mod14 folder, run Setup.cmd as Administrator.
  In the D:\10778A\Demofiles\Mod14 folder, double-click Table Analysis.xlsx to open it in Excel.
  On the ribbon, in the Table Tools group, click the Analyze tab, and then click <No Connection>.
  In the Analysis Services Connections dialog box, click New, in the Connect to Analysis Services dialog box, in the Server name field, type (local), in the Catalog name drop-down list, click DMAddinsDB, and then click OK.
  In the Analysis Services Connections dialog box, click Close.
Task 2: Use the Excel table analysis tools
  On the ribbon, click Analyze Key Influencers.
  In the SQL Server Data Mining - Analyze Key Influencers dialog box, in the Column Selection drop-down list, click Purchased Bike, and then click Run.
  In the SQL Server Data Mining - Discrimination based on key influencers dialog box, in the Compare Value 1 drop-down list, click Yes, and then in the to Value 2 drop-down list, click No. Click Add Report, and then click Close.
  Review the Key Influencers Report for ‘Purchased Bike’. Note the values that most strongly correlate with a customer purchasing a bike.
  Close Excel, and do not save changes to Table Analysis.xlsx.

Lesson 2: Creating a Data Mining Solution
  Creating Data Mining Solutions
  Editing Data Mining Structures and Models

Creating Data Mining Solutions
  SQL Server Data Tools
    Data Mining Wizard
    Data Mining Designer
  Data Mining Client for Excel

Demonstration: Using the Data Mining Wizard
In this demonstration, you will see how to:
  Create a data mining project in SQL Server Data Tools
  Create a data mining structure and a data mining model
Task 1: Create a data mining project in SQL Server Data Tools
  Ensure that you have completed the previous demonstration.
  Click Start, click All Programs, click Microsoft SQL Server 2012, and then click SQL Server Data Tools.
  In SQL Server Data Tools, on the File menu, click New, and click Project.
  In the New Project dialog box, click Analysis Services Multidimensional and Data Mining Project, in the Name field, type Mine AW, in the Location field, browse to D:\10778A\Demofiles\Mod14, and then click OK.
  In Solution Explorer, right-click Data Sources, and then click New Data Source.
  In the Data Source Wizard, on the Welcome to the Data Source Wizard page, click Next, and then on the Select how to define the connection page, click New.
  In the Connection Manager dialog box, in the Server name field, type localhost, in the Select or enter a database name drop-down list, click AdventureWorksDW, and then click OK.
  In the Data Source Wizard, on the Select how to define the connection page, click Next, on the Impersonation Information page, click Use a specific Windows user name and password, in the User name field, type ADVENTUREWORKS\Student, in the Password field, type Pa$$w0rd, click Next, and then on the Completing the wizard page, click Finish.
  In Solution Explorer, right-click Data Source Views, and then click New Data Source View.
  In the Data Source View Wizard, on the Welcome to the Data Source View Wizard page, click Next, on the Select a Data Source page, ensure that Adventure Works DW is selected, and then click Next.
  On the Select Tables and Views page, in the Available objects list, click ProspectiveBuyer (dbo), hold the Ctrl key and click vTargetMail (dbo), and click the > button to move the selected objects to the Included objects list. Then click Next.
  On the Completing the Wizard page, in the Name field, type AW DW View, and then click Finish.
Task 2: Create a data mining structure and a data mining model
  In Solution Explorer, right-click Mining Structures, click New Mining Structure, and then in the Data Mining Wizard, on the Welcome to the Data Mining Wizard page, click Next.
  On the Select the Definition Method page, ensure that From existing relational database or data warehouse is selected, and then click Next.
  On the Create the Data Mining Structure page, ensure that the Create mining structure with mining model radio button is selected, in the Which data mining technique do you want to use? drop-down list, ensure that Microsoft Decision Trees is selected, and then click Next.
  On the Select a Data Source View page, select AW DW View and click Next.
  On the Specify Table Types page, in the vTargetMail row, select the check box in the Case column, and then click Next.

Editing Data Mining Structures and Models
  SQL Server Data Tools
    Data Mining Designer
  Data Mining Client for Excel

Demonstration: Modifying a Data Mining Structure
In this demonstration, you will see how to:
  Connect to a data mining model
  Add a model to a data mining structure
Task 1: Connect to a data mining model
  Ensure that you have completed the previous demonstrations in this module.
  Click Start, click All Programs, click Microsoft Office, and then click Microsoft Excel 2010.
  On the ribbon, click the Data Mining tab, in the Connection area, click DMAddinsDB ((local)), and then in the Analysis Services Connections dialog box, click New.
  In the Connect to Analysis Services dialog box, in the Server name field, type (local), in the Catalog name drop-down list, select Mine AW, and then click OK.
  In the Analysis Services Connections dialog box, click Close.
Task 2: Add a model to a data mining structure
  On the ribbon, in the Data Modeling area, click Advanced, and then click Add Model to Structure.
  In the Add Model to Structure Wizard, on the Getting Started with the Add Model to Structure Wizard page, click Next, on the Select Structure or Model page, ensure that the Purchase Prediction structure is selected, and then click Next.
  On the Select Mining Algorithm page, in the Algorithm drop-down list, click Microsoft Naive Bayes, and then click Next.
  On the Select Columns page, in the Bike Buyer row, in the Usage column, click Predict Only, in the Name Style row, in the Usage column, click Do not use, and then click Next.
  On the Finish page, clear the Browse Model check box and click Finish.
  On the ribbon, in the Management section, click Manage Models to verify that the model has been added to the data mining structure, and then click Close.
  Close Excel without saving the workbook.

Lesson 3: Validating a Data Mining Model
  Overview of Data Mining Validation
  Criteria for Validating Data Mining Models
  Tools for Validating Data Mining Models

Overview of Data Mining Validation
  Validate data mining models to assess accuracy, reliability, and usefulness
  Use prediction queries against a testing data set
  Choose the most suitable data mining model for any given scenario

Criteria for Validating Data Mining Models
  Accuracy
    Strength of correlations between inputs and predictable outcomes
    Accurate data mining models are not always reliable or useful
  Reliability
    Consistency of data mining model when used with multiple data sets
    Reliable data mining models are not always useful
  Usefulness
    Does the data mining model produce information that is useful to the business?
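
Reliability, in the sense of consistent behavior across multiple data sets, is what a cross-validation report measures: the data is cut into folds and the same kind of model is scored on each held-out fold. The sketch below illustrates the idea in Python under loud assumptions: the labels are made up, the "model" is a trivial majority-class predictor standing in for a real mining model, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def make_folds(labels, fold_count):
    """Deal the labels round-robin into fold_count partitions."""
    return [labels[i::fold_count] for i in range(fold_count)]


def fold_accuracies(labels, fold_count):
    """Score a majority-class stand-in model on each held-out fold."""
    folds = make_folds(labels, fold_count)
    scores = []
    for held_out_index, held_out in enumerate(folds):
        training = [
            label
            for i, fold in enumerate(folds)
            if i != held_out_index
            for label in fold
        ]
        # "Train": remember the most common label in the training folds.
        prediction = Counter(training).most_common(1)[0][0]
        correct = sum(1 for label in held_out if label == prediction)
        scores.append(correct / len(held_out))
    return scores


# Hypothetical Bike Buyer flags (1 = bought a bike).
labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
scores = fold_accuracies(labels, fold_count=5)
print(scores, mean(scores), pstdev(scores))
```

A small spread between the per-fold scores suggests the model behaves consistently; a large spread suggests its accuracy depends heavily on which rows it happened to see.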

Tools for Validating Data Mining Models
  Validation tools are available in:
    SQL Server Data Tools
    Data Mining Add-Ins for Excel
    SQL Server Management Studio
  Lift chart: Compare model accuracy
  Profit chart: Estimated profit gained by using a model
  Classification matrix: Count true and false positives
  Cross-validation report: Analyze data set by partitions
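
To make the classification matrix concrete, the following Python sketch counts true and false positives and negatives from paired actual and predicted values. It is an illustration only: the function name and the eight sample cases are assumptions, and the validation tools compute these counts for you.

```python
from collections import Counter


def classification_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for one target state."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "TN", "FN")}


# Hypothetical test cases: actual Bike Buyer value vs. model prediction.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

matrix = classification_matrix(actual, predicted)
accuracy = (matrix["TP"] + matrix["TN"]) / len(actual)
print(matrix, accuracy)
```

Comparing these counts across models shows not only which model is right more often, but also which kind of mistake each model tends to make.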

Demonstration: Validating Data Mining Models
In this demonstration, you will see how to:
  View a lift chart
  View a profit chart
  View a classification matrix
  View a cross-validation report
Task 1: Create a lift chart
  Ensure that you have completed the previous demonstrations in this module.
  Start SQL Server Management Studio, and when prompted, connect to the localhost instance of Analysis Services by using Windows authentication.
  In Object Explorer, expand Databases, expand Mine AW, and expand Mining Structures. Then right-click the Purchase Prediction mining structure and click View Lift Chart.
  On the Input Selection tab, note that both mining models are selected with the Bike Buyer column as predictable, and that the test cases defined in the models themselves will be used for the validation.
  Click the Lift Chart tab, and view the lift chart.
  Review the scores in the Mining Legend tab to see which of your models is the most accurate for this test data.
Task 2: Create a profit chart
  In the Chart type drop-down list, select Profit Chart.
  In the Profit Chart Settings dialog box, enter the following values, and then click OK: Population: 20000, Fixed cost: 1000, Individual cost: 3, Revenue per individual: 10.
  Review the chart and the Mining Legend tab to evaluate which mining model is likely to generate the most profitable marketing campaign based on the test data.
Task 3: Create a classification matrix
  Click the Classification Matrix tab.
  Review the matrix.
Task 4: Create a cross-validation report
  Click the Cross Validation tab.
  Enter the following values, and click Get Results: Fold Count: 5, Max Cases: 5, Target Attribute: Bike Buyer, Target State: 1, Target Threshold: 0.1.
  View the resulting report, and note that for each mining model, the results include classifications for true positives, false positives, true negatives, and false negatives; and the likely lift gained by using the model.
  Minimize SQL Server Management Studio; you will use it in the next demonstration.
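
The profit chart settings used in Task 2 can be read as inputs to a simple campaign calculation. The Python sketch below shows one common way to combine them: revenue from responders, minus the fixed cost, minus the per-individual cost of everyone contacted. Treat the formula as an assumption about how such a chart is typically derived rather than a statement of the exact Analysis Services calculation; the buyers_found figures are invented for illustration.

```python
POPULATION = 20000           # from the Profit Chart Settings dialog box
FIXED_COST = 1000
INDIVIDUAL_COST = 3
REVENUE_PER_INDIVIDUAL = 10


def campaign_profit(contacted, responders):
    """Estimated profit when `contacted` people are mailed and `responders` buy."""
    revenue = REVENUE_PER_INDIVIDUAL * responders
    cost = FIXED_COST + INDIVIDUAL_COST * contacted
    return revenue - cost


# Hypothetical model output: percent of population contacted (highest
# predicted probability first) -> number of buyers found among them.
buyers_found = {10: 1500, 20: 2600, 30: 3400, 40: 3900, 50: 4400}

profits = {
    percent: campaign_profit(POPULATION * percent // 100, buyers)
    for percent, buyers in buyers_found.items()
}
best_percent = max(profits, key=profits.get)
print(profits, best_percent)
```

With these made-up figures the profit peaks when 30 percent of the population is contacted; mailing further down the ranked list costs more than the extra buyers bring in.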

Lesson 4: Consuming Data Mining Data
  Viewing Data Mining Results
  Introduction to DMX
  Using Data Mining Data in Reporting Services Reports

Viewing Data Mining Results
  Mining Model Viewer in SQL Server Data Tools
  Data Mining Client for Excel

Introduction to DMX
  Data Definition Language statements
    CREATE
    ALTER
    DROP
    IMPORT
    EXPORT
    SELECT INTO
  Data Manipulation Language statements
    SELECT
    PREDICTION JOIN
    INSERT INTO

Demonstration: Querying Data Mining Models
In this demonstration, you will see how to:
  Browse a data mining model
  Query a data mining model
Task 1: Browse a data mining model
  Ensure that you have completed the previous demonstrations in this module.
  Maximize SQL Server Management Studio, and in Object Explorer, right-click the Purchase Prediction data mining structure and click Browse.
  In the Mining Model drop-down list, ensure that Purchase - Bayes is selected, and on the Dependency Network tab, move the slider gradually from All Links to Strongest Links.
  On the Attribute Profiles tab, view the color-coded indicators of the values for each column when compared to customers with a Bike Buyer value of 0 or 1.
  On the Attribute Characteristics tab, in the Attribute drop-down list, ensure that Bike Buyer is selected, and in the Value drop-down list, select 1. Then view the probability for each other column value when the Bike Buyer value is 1.
  On the Attribute Discrimination tab, in the Attribute drop-down list, ensure that Bike Buyer is selected, in the Value drop-down list, select 1, and in the Value 2 drop-down list, select 0. Then note how values for each other column favor a Bike Buyer value of 1 or 0.
  In the Mining Model drop-down list, select Purchase Decision tree, and on the Decision Tree tab, in the Background drop-down list, select 1. Then view the decision tree to see how the other column values influence a value of 1 for Bike Buyer.
Task 2: Query a data mining model
  In Object Explorer, right-click the Purchase Prediction mining structure and click Build Prediction Query.
  In Query Designer, in the Mining Model pane, click Select Model, in the Select Mining Model dialog box, expand Purchase Prediction, click Purchase - Bayes, and then click OK.
  In the Select Input Table(s) pane, click Select Case Table, in the Select Table dialog box, click ProspectiveBuyer (dbo), and then click OK.
  Under the Mining Model pane, in the Source column, click ProspectiveBuyer table, and then in the Field column, click EmailAddress.
  Under the row you just added, in the Source column, click Purchase - Bayes mining model, in the Field column, click Bike Buyer, and in the Criteria/Argument column, type =1.
  Under the row you just added, in the Source column, click Prediction Function, in the Field column, click PredictProbability, in the Alias column, type Purchase Probability, and then drag the Bike Buyer column from the Purchase - Bayes model in the Mining Model pane to the Criteria/Argument column so that it contains the value [Purchase - Bayes].[Bike Buyer].
  On the Mining Model menu, click Query to view the DMX code that has been generated.
  On the Mining Model menu, click Result to view the query results.
  Close SQL Server Management Studio without saving any changes.
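
For orientation, the DMX that the designer generates in Task 2 is likely to have the general shape of a SELECT with a PREDICTION JOIN. The Python sketch below only assembles such a query as text so that its parts are easy to see; it does not connect to Analysis Services. The helper name, the two mapped input columns (Marital Status and Gender), and the exact layout are assumptions for illustration, and the statement the designer actually produces may map more columns or differ in detail.

```python
def build_prediction_query(model, data_source, table, output_column,
                           column_map, target, target_value):
    """Assemble the text of a DMX prediction query (illustrative only).

    column_map maps model column names to source table column names.
    """
    source_columns = [output_column] + list(column_map.values())
    select_list = ", ".join(f"[{c}]" for c in source_columns)
    on_clause = " AND\n  ".join(
        f"[{model}].[{m}] = t.[{s}]" for m, s in column_map.items()
    )
    return (
        f"SELECT t.[{output_column}],\n"
        f"  PredictProbability([{model}].[{target}]) AS [Purchase Probability]\n"
        f"FROM [{model}]\n"
        f"PREDICTION JOIN\n"
        f"  OPENQUERY([{data_source}], "
        f"'SELECT {select_list} FROM [dbo].[{table}]') AS t\n"
        f"ON {on_clause}\n"
        f"WHERE [{model}].[{target}] = {target_value}"
    )


query = build_prediction_query(
    model="Purchase - Bayes",
    data_source="Adventure Works DW",
    table="ProspectiveBuyer",
    output_column="EmailAddress",
    # Hypothetical subset of the input column mappings.
    column_map={"Marital Status": "MaritalStatus", "Gender": "Gender"},
    target="Bike Buyer",
    target_value=1,
)
print(query)
```

The three rows added in the designer correspond to the three visible parts of the query: the EmailAddress output column, the Bike Buyer = 1 filter, and the PredictProbability call aliased as Purchase Probability.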

Using Data Mining Data in Reporting Services Reports
  Easy to understand even for non-technical users
  Wide range of options for formatting and displaying data
  Include the results from multiple models in a single report

Lab Scenario
Students will perform the lab in the role of a data analyst in the Adventure Works Cycles company, and:
  Use Excel table analysis tools to find key influencers for purchasing a bike.
  Use SQL Server Data Tools to create a data mining model for predicting potential bike buyers.
  Use Excel to create a data mining structure.
  Validate the data mining structures using a variety of validation techniques.
  Create a report that uses the data mining model to return a list of potential bike buyers.
Point out that the instructions in the lab are deliberately designed to be high-level so that students need to think carefully about what they are trying to accomplish and work out how best to proceed for themselves. Encourage students to read the scenario information carefully and collaborate with each other to meet the scenario requirements. Remind students that if they find a particular task or exercise too challenging, they can find step-by-step instructions in the lab answer key.
The marketing department at Adventure Works Cycles is planning a direct mail campaign. In order to maximize the effectiveness of the campaign, you have been asked to create a report that uses data mining techniques to identify the subset of potential customers who are most likely to purchase a bike.

Lab: Using Data Mining to Support a Marketing Campaign
  Exercise 1: Using Table Analysis Tools
  Exercise 2: Creating a Data Mining Model
  Exercise 3: Using the Data Mining Add-in for Excel to Modify the Data Mining Structure
  Exercise 4: Validating Data Mining Models
  Exercise 5: Using a Data Mining Model in a Report
In this lab, students will create data mining models, use the Data Mining Add-in for Excel, validate data mining models, and create a Reporting Services report that contains data mining data.
  Exercise 1: In this exercise, students use the table analysis tools in the Data Mining Add-in for Excel.
  Exercise 2: In this exercise, students use SQL Server Data Tools to create a data mining model, and then deploy the model to the MIA-SQLBI Analysis Services instance.
  Exercise 3: In this exercise, students use the Data Mining Add-in for Excel to create a new model and to review it.
  Exercise 4: In this exercise, students use the Data Mining Add-in for Excel to validate the models.
  Exercise 5: In this exercise, students create a Reporting Services report that uses data from the AW Data Mining database and then format the report.
Logon information
  Virtual machine: MIA-SQLBI
  User name: ADVENTUREWORKS\Student
  Password: Pa$$w0rd
Estimated time: 75 minutes

Module Review and Takeaways
Review Questions
Point the students to the appropriate section in the course so that they are able to answer the questions presented in this section. Some guidance for discussing the answers to the questions is included below.
  What is the purpose of data mining?
    The purpose of data mining is to reveal patterns and trends that cannot be discovered by using standard analysis techniques.
  What are the major components of an Analysis Services data mining solution?
    Data Mining Structure, Data Mining Model, case table.
  What are the main criteria for validating data mining models?
    Accuracy, reliability, and usefulness.

Course Evaluation
Remind students to complete the course evaluation.