1 Data Mining: Concepts and Techniques: Slides for Textbook, Appendix A
©Jiawei Han and Micheline Kamber
Slides contributed by Jian Pei
Intelligent Database Systems Research Lab, School of Computing Science, Simon Fraser University, Canada
November 21, 2018 Data Mining: Concepts and Techniques

2 Appendix A: An Introduction to Microsoft’s OLE DB for Data Mining
Outline:
- Overview and design philosophy
- Basic components
- Data set components
- Data mining models
- Operations on data models
- Concluding remarks

3 Why OLE DB for Data Mining?
- An industry standard is critical for data mining development, usage, interoperability, and exchange
- OLE DB for DM is a natural evolution of OLE DB and OLE DB for OLAP
- Building mining applications over relational databases is nontrivial:
  - Different customized data mining algorithms and methods are needed
  - Significant work is required on the part of application builders
- Goal: ease the burden of developing mining applications in large relational databases

4 Motivation of OLE DB for DM
- Facilitate deployment of data mining models:
  - Generate data mining models
  - Store, maintain, and refresh models as data is updated
  - Programmatically apply a model to other data sets
  - Browse models
- Enable enterprise application developers to participate in building data mining solutions

5 Features of OLE DB for DM
- Independent of any provider or software
- Not specialized to any specific mining model
- Structured to cater to all well-known mining models
- Part of the upcoming release of Microsoft SQL Server 2000

6 Overview
- The core relational engine exposes OLE DB in a language-based API
- The Analysis Server exposes OLE DB for OLAP and OLE DB for DM
- Maintain the SQL metaphor
- Reuse existing notions
[Figure: architecture layers, from applications down to the relational engine: data mining applications; OLE DB OLAP/DM; Analysis Server; OLE DB; RDB engine]

7 Key Operations to Support Data Mining Models
- Define a mining model:
  - Attributes to be predicted
  - Attributes to be used for prediction
  - Algorithm used to build the model
- Populate a mining model from training data
- Predict attributes for new data
- Browse a mining model for reporting and visualization

8 DMM As Analogous to A Table in SQL
- Create a data mining model object: CREATE MINING MODEL [model_name]
- Insert training data into the model and train it: INSERT INTO [model_name]
- Use the data mining model: SELECT relation_name.[id], [model_name].[predict_attr]
  - Consult the DMM content to make predictions and to browse statistics obtained by the model
- Use DELETE to empty/reset the model
- Predictions on data sets: a prediction join between a model and a data set (tables)
- Deploy a DMM just by writing SQL queries!
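To make the table analogy concrete, here is a minimal Python sketch of the same lifecycle. `ToyMiningModel` and its method names are hypothetical, and its "algorithm" is a simple most-frequent-value counter chosen only for illustration, not what any OLE DB for DM provider does. Like a DMM, it keeps learned statistics rather than the training rows.

```python
from collections import Counter, defaultdict


class ToyMiningModel:
    """Hypothetical stand-in for a DMM: stores learned counts, not the training rows."""

    def __init__(self, name, input_col, predict_col):
        # Analogue of CREATE MINING MODEL: declare the input and predictable columns
        self.name = name
        self.input_col = input_col
        self.predict_col = predict_col
        self.counts = defaultdict(Counter)

    def insert_into(self, cases):
        # Analogue of INSERT INTO: consume each training case and update the counts
        for case in cases:
            self.counts[case[self.input_col]][case[self.predict_col]] += 1

    def predict(self, case):
        # Analogue of SELECT ... PREDICTION JOIN: most frequent value seen for this input
        seen = self.counts.get(case[self.input_col])
        return seen.most_common(1)[0][0] if seen else None

    def delete_from(self):
        # Analogue of DELETE FROM: empty the learned content, keep the definition
        self.counts.clear()


model = ToyMiningModel("Age Group Prediction", "Gender", "AgeGroup")
model.insert_into([
    {"Gender": "Male", "AgeGroup": "30-39"},
    {"Gender": "Male", "AgeGroup": "30-39"},
    {"Gender": "Male", "AgeGroup": "20-29"},
    {"Gender": "Female", "AgeGroup": "20-29"},
])
print(model.predict({"Gender": "Male"}))    # 30-39
print(model.predict({"Gender": "Female"}))  # 20-29
model.delete_from()
print(model.predict({"Gender": "Male"}))    # None
```

After `delete_from()` the model object still exists with its column definitions, which mirrors the difference between DELETE FROM (reset) and DROP (remove the model entirely).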

9 Two Basic Components
- Cases/caseset: the input data
  - A table or nested tables (for hierarchical data)
- Data mining model (DMM): a special type of table
  - A caseset, together with meta-information, is associated with a DMM when the DMM is created
  - The DMM saves the mining algorithm and the resulting abstraction instead of the data itself
- Fundamental operations: CREATE, INSERT INTO, PREDICTION JOIN, SELECT, DELETE FROM, and DROP

10 Flattened Representation of Caseset
Source tables:
- Customers: Customer ID, Gender, Hair Color, Age, Age Prob
- Product Purchases: Customer ID, Product Name, Quantity, Product Type
- Car Ownership: Customer ID, Car, Car Prob
Flattened caseset:
CID | Gend | Hair  | Age | Age prob | Prod | Quan | Type | Car | Car prob
1   | Male | Black | 35  | 100%     | TV   |      | Elec | Van | 50%
1   | Male | Black | 35  | 100%     | VCR  |      | Elec | Van | 50%
1   | Male | Black | 35  | 100%     | Ham  | 6    | Food | Van | 50%
Problem: Lots of replication!
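The replication comes from joining the customer row with every combination of its nested rows. Below is a minimal Python sketch of that flattening, using the values shown on the slide (the TV and VCR quantities are not given there, so they are `None`). The function name `flatten_case` is hypothetical.

```python
from itertools import product

customer = {"CID": 1, "Gend": "Male", "Hair": "Black", "Age": 35, "Age prob": 1.0}
purchases = [("TV", None, "Elec"), ("VCR", None, "Elec"), ("Ham", 6, "Food")]
cars = [("Van", 0.5)]


def flatten_case(customer, purchases, cars):
    """Join one customer with every (purchase, car) pair, as a flat table must."""
    rows = []
    for (prod, quan, ptype), (car, car_prob) in product(purchases, cars):
        # The customer's own attributes are copied into every flattened row
        rows.append({**customer, "Prod": prod, "Quan": quan, "Type": ptype,
                     "Car": car, "Car prob": car_prob})
    return rows


rows = flatten_case(customer, purchases, cars)
print(len(rows))  # 3 rows = 3 purchases x 1 car
```

Because the two nested tables are independent, the row count is the product of their sizes: a customer with 3 purchases and 2 cars would need 6 rows, each repeating the customer's attributes.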

11 Logical Nested Table Representation of Caseset
- Use the Data Shaping Service to generate a hierarchical rowset
  - Part of the Microsoft Data Access Components (MDAC) products
CID | Gend | Hair  | Age | Age prob | Product Purchases  | Car Ownership
    |      |       |     |          | Prod | Quan | Type | Car | Car prob
1   | Male | Black | 35  | 100%     | TV   |      | Elec | Van | 50%
    |      |       |     |          | VCR  |      | Elec |     |
    |      |       |     |          | Ham  | 6    | Food |     |

12 More About Nested Tables
- The storage subsystem does not need to support nested records
- Cases are instantiated as nested rowsets only when a data mining model is trained or used for prediction
- The same physical data may be used to generate different casesets
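As a sketch of the last point, the same flat sales rows can be grouped into different casesets depending on which column identifies a case. The rows and the helper `build_caseset` below are hypothetical.

```python
from collections import defaultdict

# Hypothetical physical Sales rows: (CustID, Product Name, Quantity)
sales = [
    (1, "Bread", 2),
    (1, "Milk", 1),
    (2, "Bread", 1),
]


def build_caseset(rows, key_index):
    """Group flat rows into nested cases keyed by the column at key_index."""
    cases = defaultdict(list)
    for row in rows:
        # The case key is lifted out; the remaining columns become the nested row
        nested = tuple(v for i, v in enumerate(row) if i != key_index)
        cases[row[key_index]].append(nested)
    return dict(cases)


by_customer = build_caseset(sales, 0)  # one case per customer
by_product = build_caseset(sales, 1)   # one case per product
print(by_customer)  # {1: [('Bread', 2), ('Milk', 1)], 2: [('Bread', 1)]}
print(by_product)   # {'Bread': [(1, 2), (2, 1)], 'Milk': [(1, 1)]}
```

Nothing about the stored rows changes; only the choice of case key does, which is why nesting can be deferred until a model actually consumes the cases.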

13 Defining A Data Mining Model
- The name of the model
- The algorithm and its parameters
- The columns of the caseset and the relationships among columns
- “Source columns” and “prediction columns”

14 Example
CREATE MINING MODEL [Age Prediction]                            -- name of model
(
    [Customer ID]       LONG KEY,                               -- source column
    [Gender]            TEXT DISCRETE,                          -- source column
    [Age]               DOUBLE DISCRETIZED() PREDICT,           -- prediction column
    [Product Purchases] TABLE                                   -- source column
    (
        [Product Name]  TEXT KEY,                               -- source column
        [Quantity]      DOUBLE NORMAL CONTINUOUS,               -- source column
        [Product Type]  TEXT DISCRETE RELATED TO [Product Name] -- source column
    )
)
USING [Decision_Trees_101]                                      -- mining algorithm used
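One way to see how the definition separates source columns from prediction columns is to model it as plain data. This Python sketch mirrors the [Age Prediction] example. The `Column` and `ModelDefinition` classes are hypothetical and do not correspond to any OLE DB interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Column:
    name: str
    spec: str                       # data type and content type, e.g. "TEXT DISCRETE"
    predict: bool = False           # True for a prediction column
    related_to: Optional[str] = None
    nested: List[Column] = field(default_factory=list)  # columns of a TABLE column


@dataclass
class ModelDefinition:
    name: str
    algorithm: str
    columns: List[Column]

    def all_columns(self) -> Iterator[Column]:
        # Walk top-level columns and the columns of any nested table
        for col in self.columns:
            yield col
            yield from col.nested

    def prediction_columns(self) -> List[str]:
        return [c.name for c in self.all_columns() if c.predict]

    def source_columns(self) -> List[str]:
        return [c.name for c in self.all_columns() if not c.predict]


age_prediction = ModelDefinition(
    name="Age Prediction",
    algorithm="Decision_Trees_101",
    columns=[
        Column("Customer ID", "LONG KEY"),
        Column("Gender", "TEXT DISCRETE"),
        Column("Age", "DOUBLE DISCRETIZED()", predict=True),
        Column("Product Purchases", "TABLE", nested=[
            Column("Product Name", "TEXT KEY"),
            Column("Quantity", "DOUBLE NORMAL CONTINUOUS"),
            Column("Product Type", "TEXT DISCRETE", related_to="Product Name"),
        ]),
    ],
)
print(age_prediction.prediction_columns())  # ['Age']
```

Only the PREDICT flag distinguishes a prediction column here; everything else, including the nested TABLE column, counts as a source column, matching the comments in the example.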

15 Column Specifiers
- KEY
- ATTRIBUTE
- RELATION (RELATED TO clause)
- QUALIFIER (OF clause)
  - PROBABILITY: [0, 1]
  - VARIANCE
  - SUPPORT
  - PROBABILITY-VARIANCE
  - ORDER
- TABLE

16 Attribute Types
- DISCRETE
- ORDERED
- CYCLICAL
- CONTINUOUS
- DISCRETIZED
- SEQUENCE_TIME
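DISCRETIZED marks a continuous attribute that is binned into discrete ranges. The slides do not say how the bins are chosen; the Python sketch below assumes simple equal-width binning with a hypothetical bucket count, just to show the idea.

```python
def discretize_equal_width(values, buckets):
    """Map each value to a bucket index 0..buckets-1 using equal-width ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets

    def bucket(v):
        if width == 0:
            return 0  # all values identical: a single bucket
        # Clamp so the maximum value falls in the last bucket, not one past it
        return min(int((v - lo) / width), buckets - 1)

    return [bucket(v) for v in values]


ages = [18, 25, 35, 47, 62, 70]
print(discretize_equal_width(ages, 4))  # [0, 0, 1, 2, 3, 3]
```

Equal-width bins are the simplest choice; a provider could equally use equal-frequency or clustering-based bins, which would give different bucket boundaries for the same ages.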

17 Populating A DMM
- Use the INSERT INTO statement
  - The data mining model consumes each case to train itself
- Use the SHAPE statement to create the nested table from the input data

18 Example: Populating a DMM
INSERT INTO [Age Prediction]
(
    [Customer ID], [Gender], [Age],
    [Product Purchases](SKIP, [Product Name], [Quantity], [Product Type])
)
SHAPE
    {SELECT [Customer ID], [Gender], [Age] FROM Customers ORDER BY [Customer ID]}
APPEND
    ({SELECT [CustID], [Product Name], [Quantity], [Product Type] FROM Sales ORDER BY [CustID]}
     RELATE [Customer ID] TO [CustID]) AS [Product Purchases]
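Because both SHAPE queries are sorted by the relating key, the parent and child rowsets can be combined in a single merge pass. Here is a minimal Python sketch of the nested result that SHAPE ... APPEND ... RELATE produces, assuming already-sorted inputs. `shape_append` and the sample rows are hypothetical, and this is not a description of how the Data Shaping Service is implemented.

```python
def shape_append(parents, children, parent_key, child_key, alias):
    """Nest each parent's matching child rows under `alias` (inputs sorted by key)."""
    result = []
    j = 0
    for parent in parents:
        key = parent[parent_key]
        # Skip child rows whose key sorts before this parent (no matching parent)
        while j < len(children) and children[j][child_key] < key:
            j += 1
        nested = []
        while j < len(children) and children[j][child_key] == key:
            nested.append(children[j])
            j += 1
        result.append({**parent, alias: nested})
    return result


customers = [
    {"Customer ID": 1, "Gender": "Male", "Age": 35},
    {"Customer ID": 2, "Gender": "Female", "Age": 28},
]
sales = [
    {"CustID": 1, "Product Name": "Bread", "Quantity": 2, "Product Type": "Food"},
    {"CustID": 1, "Product Name": "Milk", "Quantity": 1, "Product Type": "Food"},
    {"CustID": 2, "Product Name": "Cheese", "Quantity": 1, "Product Type": "Food"},
]
cases = shape_append(customers, sales, "Customer ID", "CustID", "Product Purchases")
print([len(c["Product Purchases"]) for c in cases])  # [2, 1]
```

The nested rows still carry `CustID`; in the INSERT INTO column list, SKIP tells the model to ignore that column because it only links child rows to their parent.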

19 Using Data Model to Predict
- Prediction join: prediction on data set D using DMM M
  - Different from an equi-join
  - The DMM acts as a “truth table”
- The SELECT statement associated with PREDICTION JOIN specifies the values extracted from the DMM
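One way to read the "truth table" view: conceptually, a trained model supplies one predicted value for every combination of input values, so every input case finds exactly one matching model row, whereas an equi-join against stored training rows would miss combinations that never occurred in training. In the Python sketch below, the input domains, the rule standing in for a trained model, and the name `prediction_join` are all hypothetical.

```python
from itertools import product

# Hypothetical input attributes and their possible values
DOMAINS = {"Gender": ["Male", "Female"], "BoughtBeer": [True, False]}


def learned_rule(gender, bought_beer):
    # Hypothetical rule standing in for whatever the trained algorithm learned
    if bought_beer:
        return "21-35"
    return "36-60" if gender == "Male" else "21-35"


# Materialize the model as a "truth table": one row per combination of inputs
TRUTH_TABLE = {combo: learned_rule(*combo) for combo in product(*DOMAINS.values())}


def prediction_join(cases, truth_table, on):
    """Attach the model's prediction to every case by looking up its input values."""
    joined = []
    for case in cases:
        key = tuple(case[col] for col in on)
        joined.append({**case, "Predicted Age": truth_table[key]})
    return joined


new_cases = [
    {"Customer ID": 7, "Gender": "Male", "BoughtBeer": False},
    {"Customer ID": 8, "Gender": "Female", "BoughtBeer": True},
]
for row in prediction_join(new_cases, TRUTH_TABLE, on=["Gender", "BoughtBeer"]):
    print(row["Customer ID"], row["Predicted Age"])
```

Real models are not materialized this way; the table is only a conceptual device, which is why the ON clause of a PREDICTION JOIN maps model columns to input columns rather than matching stored rows.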

20 Example: Using a DMM in Prediction
SELECT t.[Customer ID], [Age Prediction].[Age]
FROM [Age Prediction]
PREDICTION JOIN
    (SHAPE
        {SELECT [Customer ID], [Gender] FROM Customers ORDER BY [Customer ID]}
     APPEND
        ({SELECT [CustID], [Product Name], [Quantity] FROM Sales ORDER BY [CustID]}
         RELATE [Customer ID] TO [CustID]) AS [Product Purchases]
    ) AS t
ON  [Age Prediction].[Gender] = t.[Gender]
AND [Age Prediction].[Product Purchases].[Product Name] = t.[Product Purchases].[Product Name]
AND [Age Prediction].[Product Purchases].[Quantity] = t.[Product Purchases].[Quantity]

21 Browsing DMM
- What is in a DMM? Rules, formulas, trees, etc.
- Browsing a DMM
- Visualization
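For a tree-based model, browsing often means reading the tree back as rules. Here is a minimal Python sketch, assuming a hypothetical nested-tuple encoding of the tree content; this encoding is for illustration only and is not a provider's model-content format.

```python
# Hypothetical tree content: an internal node is (attribute, {value: subtree});
# a leaf is the predicted value.
TREE = ("Gender", {
    "Male": ("BoughtBeer", {True: "21-35", False: "36-60"}),
    "Female": "21-35",
})


def extract_rules(node, conditions=()):
    """Turn every root-to-leaf path into an IF ... THEN ... rule."""
    if not isinstance(node, tuple):
        cond = " AND ".join(f"{attr} = {val}" for attr, val in conditions) or "TRUE"
        return [f"IF {cond} THEN Age = {node}"]
    attribute, branches = node
    rules = []
    for value, subtree in branches.items():
        rules.extend(extract_rules(subtree, conditions + ((attribute, value),)))
    return rules


for rule in extract_rules(TREE):
    print(rule)
```

Each leaf yields one rule, so the rule list is a lossless, human-readable view of the tree that a reporting or visualization tool could present.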

22 Concluding Remarks
- OLE DB for DM integrates data mining and database systems
- A good standard for mining application builders
- How can we be involved?
  - Provide association/sequential pattern mining modules for OLE DB for DM?
  - Design more concrete language primitives?

23 Thank you!

