Data Mining in SQL Server 2005
Agenda
- What is Data Mining and how can you use it?
- Data Mining Process
- SQL Server 2005 Data Mining
- Data Mining integration
- Data Mining programmability
- Data Mining with Office 2007
Types of analysis
- Ad-hoc query/Reporting/Analysis: “What happened?” Simple reports, Key Performance Indicators, OLAP cubes (slice & dice)
- Realtime: “What happens now?” Events/Triggers
- Data Mining: “What will happen?” “How/why did it happen?”
What is Data Mining?
- “Data mining is the semi-automatic extraction of patterns, changes, associations, anomalies, and other statistically significant structures from large data sets.” - R. Grossman
- “The nontrivial extraction of implicit, previously unknown, and potentially useful information from data” - W. Frawley et al., 1992
- “The science of extracting useful information from large data sets or databases” - D. Hand et al., 2001
What is Data Mining?
Also known as: Machine Learning, Predictive Analytics
What does Data Mining Do?
- Explores your data
- Finds patterns
- Performs predictions
Data Mining Tasks
- Classification
- Regression
- Segmentation
- Association
- Forecasting
- Text Analysis
- Advanced Data Exploration
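Association, one of the tasks above, is commonly scored with support (how often items occur together) and confidence (how often the right-hand side appears when the left-hand side does). The following Python sketch computes both for one rule over a hypothetical basket list; the data and the function names are illustrative only and say nothing about how the Microsoft Association Rules algorithm works internally.

```python
from typing import FrozenSet, List, Set

# Hypothetical market-basket data: each transaction is a set of items.
TRANSACTIONS: List[Set[str]] = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "butter"},
]


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs: FrozenSet[str], rhs: FrozenSet[str],
               transactions: List[Set[str]]) -> float:
    """Estimated P(rhs | lhs): support of lhs and rhs together divided by support of lhs."""
    base = support(lhs, transactions)
    return support(lhs | rhs, transactions) / base if base else 0.0


if __name__ == "__main__":
    bread, milk = frozenset({"bread"}), frozenset({"milk"})
    print(support(bread | milk, TRANSACTIONS))      # 0.5
    print(confidence(bread, milk, TRANSACTIONS))    # about 0.667
```

For the rule bread → milk, two of four baskets contain both items (support 0.5), and two of the three baskets with bread also contain milk (confidence 2/3).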
Mining Process
[Diagram: the DM Engine trains a mining model from training data, then applies the mining model to the data to be predicted, producing data with predictions]
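The flow in the diagram (train a model from training data, then apply it to new data) can be sketched in a few lines of Python. The "engine" here is a deliberately trivial stand-in: for one input column it remembers the most frequent target value. Column names, data and functions are hypothetical; a real Analysis Services model is trained with INSERT INTO and queried with a prediction join rather than with these functions.

```python
from collections import Counter, defaultdict
from typing import Dict, List

Case = Dict[str, str]

# Hypothetical training data: region of residence and whether the customer bought a bike.
TRAINING: List[Case] = [
    {"region": "North", "bought_bike": "Yes"},
    {"region": "North", "bought_bike": "Yes"},
    {"region": "North", "bought_bike": "No"},
    {"region": "South", "bought_bike": "No"},
]


def train(cases: List[Case], input_col: str, target_col: str) -> Dict[str, str]:
    """Training step: the 'model' maps each input value to its most frequent target value."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for case in cases:
        counts[case[input_col]][case[target_col]] += 1
    return {value: tally.most_common(1)[0][0] for value, tally in counts.items()}


def predict(model: Dict[str, str], cases: List[Case], input_col: str,
            target_col: str, default: str) -> List[Case]:
    """Prediction step: copy each new case and fill in the predicted target column."""
    return [{**case, target_col: model.get(case[input_col], default)} for case in cases]


if __name__ == "__main__":
    model = train(TRAINING, "region", "bought_bike")
    new_cases = [{"region": "South"}, {"region": "East"}]
    print(predict(model, new_cases, "region", "bought_bike", default="Unknown"))
```

The separation matters more than the toy model: training produces an artifact once, and the same artifact is then applied to any number of new cases, including values it never saw (here mapped to a default).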
Data Mining Process: CRISP-DM (Cross Industry Standard Process for Data Mining)
[Diagram: the six CRISP-DM phases (Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment) arranged around the data, with the labels “Doing Data Mining” and “Putting Data Mining to Work” (Deployment)]

Business Understanding: This initial phase focuses on understanding the project objectives and requirements from a business perspective, and then converting this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives.

Data Understanding: The data understanding phase starts with an initial data collection and proceeds with activities to get familiar with the data, to identify data quality problems, to discover first insights into the data, or to detect interesting subsets to form hypotheses about hidden information.

Data Preparation: The data preparation phase covers all activities needed to construct the final dataset (the data that will be fed into the modeling tool(s)) from the initial raw data. Data preparation tasks are likely to be performed multiple times, and not in any prescribed order. Tasks include table, record, and attribute selection as well as transformation and cleaning of data for modeling tools.

Modeling: In this phase, various modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Typically, there are several techniques for the same data mining problem type. Some techniques have specific requirements on the form of data. Therefore, stepping back to the data preparation phase is often needed.

Evaluation: At this stage in the project you have built a model (or models) that appears to have high quality from a data analysis perspective. Before proceeding to final deployment of the model, it is important to evaluate the model more thoroughly and review the steps executed to construct it, to be certain it properly achieves the business objectives. A key objective is to determine whether some important business issue has not been sufficiently considered. At the end of this phase, a decision on the use of the data mining results should be reached.

Deployment: Creation of the model is generally not the end of the project. Even if the purpose of the model is to increase knowledge of the data, the knowledge gained will need to be organized and presented in a way that the customer can use. Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process. In many cases it will be the customer, not the data analyst, who carries out the deployment steps. However, even if the analyst does not carry out the deployment effort, it is important for the customer to understand up front what actions will need to be carried out in order to actually make use of the created models.
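One simple, partial way to carry out the Evaluation phase is a holdout test: keep some cases out of training and measure how often the model's predictions match them. The Python sketch below assumes the model is any function from a case to a label; holdout_split and accuracy are hypothetical helpers, and accuracy on one split covers only the data analysis side of evaluation, not the business review described above.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(cases: Sequence[T], test_fraction: float = 0.3,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the cases (reproducibly) and split it into (training, test)."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict: Callable[[T], str], test_cases: Sequence[T],
             actual: Callable[[T], str]) -> float:
    """Fraction of test cases whose predicted label equals the actual label."""
    if not test_cases:
        raise ValueError("test set is empty")
    correct = sum(1 for case in test_cases if predict(case) == actual(case))
    return correct / len(test_cases)


if __name__ == "__main__":
    # Hypothetical cases: (case id, actual risk label).
    cases = [(i, "Good" if i % 2 else "Bad") for i in range(10)]
    train_set, test_set = holdout_split(cases)
    always_good = lambda case: "Good"  # a naive baseline "model"
    print(len(train_set), len(test_set), accuracy(always_good, cases, lambda c: c[1]))
```

Comparing a candidate model against a naive baseline like always_good is a cheap sanity check: a model that cannot beat it on held-out cases is not ready for deployment.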
Data Mining Process in SQL Server: CRISP-DM
- Business Understanding / Data Understanding: SSAS (OLAP), DSV (data source view)
- Data Preparation: SSIS, SSAS (OLAP)
- Modeling / Evaluation: SSAS (Data Mining)
- Deployment: SSIS, SSAS (OLAP), SSRS, flexible APIs
Data Mining in SQL Server 2005
- Nine algorithms developed in conjunction with Microsoft Research
- Data mining is made accessible and easy to use through an integrated user interface, cross-product integration and familiar, standard APIs
- Complete framework for building and deploying intelligent applications
Value of Data Mining
[Chart: relative business value (business knowledge) against usability (simple to complicated) for SQL Server 2005 Data Mining, OLAP, Reports (ad hoc) and Reports (static)]
Complete set of algorithms
- Decision Trees and Clustering (introduced in SQL Server 2000)
- Time Series
- Sequence Clustering
- Association Rules
- Naïve Bayes
- Linear Regression
- Logistic Regression
- Neural Network

All data mining tools, including Microsoft SQL Server 2005 Analysis Services, use multiple algorithms. Analysis Services is extensible: third-party ISVs can develop algorithms that plug seamlessly into the Analysis Services data mining framework. Depending on the data and the goals, different algorithms are preferred, and each algorithm can be used for multiple problems. For example:
- Analytical problem: Classification, assigning cases to predefined classes such as “Good” vs “Bad”
- Examples: credit risk analysis, churn analysis, customer retention
- Microsoft algorithms: Decision Trees, Naïve Bayes, Neural Nets
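To make the classification example concrete, here is a textbook categorical Naïve Bayes classifier in Python with add-one (Laplace) smoothing, trained on a hypothetical five-case credit-risk table. It illustrates the general technique only; it is not the Microsoft Naive Bayes implementation, whose internals and parameters are not described here.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Case = Dict[str, str]

# Hypothetical training cases and their "Good"/"Bad" credit-risk labels.
CASES: List[Case] = [
    {"income": "High", "debt": "Low"},
    {"income": "High", "debt": "High"},
    {"income": "Low", "debt": "High"},
    {"income": "Low", "debt": "High"},
    {"income": "Low", "debt": "Low"},
]
LABELS: List[str] = ["Good", "Good", "Bad", "Bad", "Good"]


class NaiveBayes:
    """Textbook categorical Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.total = 0
        # value_counts[(attribute, class)][value] -> number of training cases
        self.value_counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
        # values[attribute] -> distinct values seen during training
        self.values: Dict[str, Set[str]] = defaultdict(set)

    def fit(self, cases: List[Case], labels: List[str]) -> "NaiveBayes":
        for case, label in zip(cases, labels):
            self.class_counts[label] += 1
            self.total += 1
            for attr, value in case.items():
                self.value_counts[(attr, label)][value] += 1
                self.values[attr].add(value)
        return self

    def log_score(self, case: Case, label: str) -> float:
        """log P(label) plus the sum of smoothed log P(value | label) over attributes."""
        n_label = self.class_counts[label]
        score = math.log(n_label / self.total)
        for attr, value in case.items():
            count = self.value_counts.get((attr, label), Counter())[value]
            n_values = len(self.values.get(attr, ()))
            score += math.log((count + 1) / (n_label + n_values))
        return score

    def predict(self, case: Case) -> str:
        """Return the class with the highest posterior score."""
        return max(self.class_counts, key=lambda label: self.log_score(case, label))


if __name__ == "__main__":
    model = NaiveBayes().fit(CASES, LABELS)
    print(model.predict({"income": "Low", "debt": "High"}))  # Bad
    print(model.predict({"income": "High", "debt": "Low"}))  # Good
```

Working in log space avoids underflow when many attributes are multiplied, and the +1 smoothing keeps a value never seen with a class from forcing that class's probability to zero.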
Data Mining User Interface
- SQL Server BI Development Studio
  - Environment for creation and data exploration
  - Data Mining projects in Visual Studio solutions, tightly integrated
  - Source control integration
- SQL Server Management Studio
  - One tool for all administrative tasks
  - Manage, view and query mining models
BI Integration
- Integration Services: data mining processing and results integrate directly in the IS pipeline
- OLAP: processing of mining models directly from cubes; use of mining results as dimensions
- Reporting Services: embed data mining results directly in Reporting Services reports
Embedded Data Mining
- Make decisions without coding: learn business rules directly from data
- Client customization: learn logic customized for each client
- Automatic update: data mining application logic is updated by model re-processing, so applications do not need to be rewritten, recompiled or re-deployed
Server Mining Architecture
[Diagram: BI Dev Studio (Visual Studio) deploys to the Analysis Services server, which holds the mining model, the data mining algorithm and the data source; your application, with its app data, connects to the server through OLE DB / ADOMD / XMLA]
Data Mining Extensions (DMX)
- OLE DB for Data Mining specification, now part of the XML/A specification
- See for XML/A details
- Connect to Analysis Server: OLE DB, ADO, ADO.NET, ADOMD.NET, XMLA

' ADOMD.NET (Microsoft.AnalysisServices.AdomdClient); conn is an open AdomdConnection
Dim cmd As New AdomdCommand()
Dim reader As AdomdDataReader
cmd.Connection = conn
cmd.CommandText = "Select Predict(Gender)…"
reader = cmd.ExecuteReader()
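Because DMX can travel inside XML/A, a client without ADOMD.NET can in principle send the same statement as an XMLA Execute request over SOAP/HTTP. This Python sketch only builds the request body with the standard library; the catalog name and the DMX text in the usage are hypothetical, and the endpoint URL, authentication and response parsing are left out.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"


def build_execute_request(statement: str, catalog: str) -> str:
    """Wrap a DMX statement in a SOAP envelope carrying an XMLA Execute command."""
    ET.register_namespace("soap", SOAP_NS)
    ET.register_namespace("xmla", XMLA_NS)

    def q(tag: str) -> str:
        # Qualified XMLA element name in ElementTree's {namespace}tag form.
        return f"{{{XMLA_NS}}}{tag}"

    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    execute = ET.SubElement(body, q("Execute"))
    command = ET.SubElement(execute, q("Command"))
    ET.SubElement(command, q("Statement")).text = statement  # escaped on serialization
    properties = ET.SubElement(execute, q("Properties"))
    property_list = ET.SubElement(properties, q("PropertyList"))
    ET.SubElement(property_list, q("Catalog")).text = catalog
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_execute_request("SELECT Predict(Gender) FROM [CustomerModel]", "DemoCatalog"))
```

Building the envelope with ElementTree rather than string concatenation means characters such as `<` or `&` inside a DMX statement are escaped correctly.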
DMX Commands
- Definition (DDL)
  - CREATE: make a new model
  - SELECT INTO: create a model by copying an existing one
  - EXPORT: save a model as an .abf file
  - IMPORT: retrieve a model from an .abf file
- Manipulation (DML)
  - INSERT INTO: train a model
  - UPDATE: change the content of a model
  - DELETE: clear content
  - SELECT: browse a model
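The DDL and DML commands above map onto a model's lifecycle: CREATE defines it, INSERT INTO trains it, SELECT browses its content and DELETE clears that content. The Python helper below only assembles those DMX statements as strings so the syntax is visible in one place; the model, column, data source and query names in the usage are hypothetical, and the statements would still have to be sent to a server (for example through ADOMD.NET or XMLA).

```python
from typing import List, Tuple


def bracket(name: str) -> str:
    """Quote a DMX identifier, doubling any closing bracket inside the name."""
    return "[" + name.replace("]", "]]") + "]"


def lifecycle_statements(model: str, columns: List[Tuple[str, str]], algorithm: str,
                         data_source: str, source_sql: str) -> List[str]:
    """Return DMX to create (DDL), then train, browse and clear (DML) one mining model.

    columns holds (column name, DMX column definition) pairs,
    e.g. ("Gender", "TEXT DISCRETE") or ("BikeBuyer", "LONG DISCRETE PREDICT").
    """
    m = bracket(model)
    col_defs = ", ".join(f"{bracket(name)} {definition}" for name, definition in columns)
    col_list = ", ".join(bracket(name) for name, _ in columns)
    escaped_sql = source_sql.replace("'", "''")  # single quotes are doubled inside OPENQUERY text
    return [
        f"CREATE MINING MODEL {m} ({col_defs}) USING {algorithm}",
        f"INSERT INTO {m} ({col_list}) OPENQUERY({bracket(data_source)}, '{escaped_sql}')",
        f"SELECT * FROM {m}.CONTENT",
        f"DELETE FROM {m}",
    ]


if __name__ == "__main__":
    for statement in lifecycle_statements(
        "BikeBuyerModel",
        [("CustomerKey", "LONG KEY"), ("Gender", "TEXT DISCRETE"),
         ("BikeBuyer", "LONG DISCRETE PREDICT")],
        "Microsoft_Decision_Trees",
        "MyDataSource",
        "SELECT CustomerKey, Gender, BikeBuyer FROM dbo.Customers",
    ):
        print(statement)
```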
DMX SELECT Elements
SELECT [FLATTENED] [TOP <n>] <columns>
FROM <model> PREDICTION JOIN <table> ON <mapping>
WHERE <filter>
ORDER BY <sort expression>
Use the query builder to create the SELECT statement.
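The SELECT elements combine in the order shown. This Python sketch assembles a prediction query from those parts, aliasing the input rowset as t; prediction_query is a hypothetical helper, the model, column and OPENQUERY source in the usage are made up, and in practice the query builder produces equivalent text for you.

```python
from typing import Optional, Sequence, Tuple


def bracket(name: str) -> str:
    """Quote a DMX identifier, doubling any closing bracket inside the name."""
    return "[" + name.replace("]", "]]") + "]"


def prediction_query(model: str, select_exprs: Sequence[str], source_query: str,
                     mapping: Sequence[Tuple[str, str]], where: Optional[str] = None,
                     order_by: Optional[str] = None, top: Optional[int] = None,
                     flattened: bool = False) -> str:
    """Assemble SELECT ... FROM <model> PREDICTION JOIN <source> AS t ON ... [WHERE] [ORDER BY].

    mapping holds (model column, input column) pairs for the ON clause.
    """
    if not mapping:
        raise ValueError("at least one ON mapping is required")
    m = bracket(model)
    head = ["SELECT"]
    if flattened:
        head.append("FLATTENED")
    if top is not None:
        head.append(f"TOP {top}")
    head.append(", ".join(select_exprs))
    on = " AND ".join(f"{m}.{bracket(mc)} = t.{bracket(ic)}" for mc, ic in mapping)
    lines = [" ".join(head), f"FROM {m}", f"PREDICTION JOIN {source_query} AS t", f"ON {on}"]
    if where:
        lines.append(f"WHERE {where}")
    if order_by:
        lines.append(f"ORDER BY {order_by}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(prediction_query(
        "BikeBuyerModel",
        ["t.[CustomerKey]", "Predict([BikeBuyer])", "PredictProbability([BikeBuyer])"],
        "OPENQUERY([MyDataSource], 'SELECT CustomerKey, Gender FROM dbo.NewCustomers')",
        [("Gender", "Gender")],
        where="Predict([BikeBuyer]) = 1",
        order_by="PredictProbability([BikeBuyer]) DESC",
        top=10,
    ))
```

Keeping optional clauses out of the string unless they are supplied mirrors the bracketed, optional parts of the syntax above.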
“Data Mining is Hard”
Only “white-coats” need apply. How do you:
- Define the problem?
- Select data?
- Choose inputs?
- Choose outputs?
- Interpret results?
- Validate results?
© 2006 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
Table Analysis Tools for Office Excel 2007
- Highlight exceptions
- Find categories
- Key factors analysis
- Forecasting
- What if
- Goal seeking
- Fill from example
Data Mining Client for Office Excel 2007
- Prepare data
- Create models from Office Excel data
- Test models
- Explore models
- Manage models
- Predict from Office Excel data
- Import prediction results into Office Excel
Data Mining Templates for Office Visio 2007
- Render data mining graphical views as Visio diagrams
- Interaction
- Annotation
- Publishing
Data Mining Session Summary
- Nine algorithms + viewers
- BI Dev Studio for developers and analysts
- Integration with SSIS, SSAS, and SSRS
- New world of “smart applications”
- Complete platform for all levels of data mining experience
Additional Resources: http://www.sqlserverdatamining.com
(German)