DAT204 Introduction to Data Mining with SQL Server 2000 ZhaoHui Tang Program Manager SQL Server Analysis Services Microsoft Corporation.


Agenda
What is Data Mining
The Data Mining Market
OLE DB for Data Mining
Overview of the Data Mining Features in SQL Server 2000
Demo
Q&A

What Is Data Mining?

What is DM?
A process of data exploration and analysis using automatic or semi-automatic means
– Techniques originate from machine learning, statistics and databases
– “Exploring data”: scanning samples of known facts about “cases”
– “Knowledge”: clusters, rules, decision trees, equations, association rules…
Once the “knowledge” is extracted it:
– Can be browsed, which provides useful insight into the behavior of the cases
– Can be used to predict values of other cases, so it can serve as a key element in closed-loop analysis

What drives high school students to attend college?

The deciding factors for high school students to attend college are…
All Students: Attend College 55% Yes, 45% No (split on IQ)
– IQ = High: Attend College 79% Yes, 21% No (split on Wealth)
  – Wealth = True: Attend College 94% Yes, 6% No
  – Wealth = False: Attend College 69% Yes, 21% No
– IQ = Low: Attend College 45% Yes, 55% No (split on Parents Encourage)
  – Parents Encourage = Yes: Attend College 70% Yes, 30% No
  – Parents Encourage = No: Attend College 31% Yes, 69% No
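To show how such a tree is used for prediction, here is a minimal Python sketch that encodes it as nested dictionaries and walks it for one student. It assumes a particular reading of the slide layout: the Wealth split sits under IQ = High, the Parents Encourage split sits under IQ = Low, and encouraged students are the 70% leaf. Each leaf stores only the share of students who attend college. The names `TREE` and `p_attend` are hypothetical.

```python
# Hypothetical encoding of the example tree. Assumption: Wealth splits the
# IQ=High branch and Parents Encourage splits the IQ=Low branch.
# Each leaf holds the fraction of students in that node who attend college.
TREE = {
    "split": "IQ",
    "branches": {
        "High": {
            "split": "Wealth",
            "branches": {
                True: {"p_yes": 0.94},
                False: {"p_yes": 0.69},
            },
        },
        "Low": {
            "split": "ParentsEncourage",
            "branches": {
                "Yes": {"p_yes": 0.70},
                "No": {"p_yes": 0.31},
            },
        },
    },
}


def p_attend(tree: dict, student: dict) -> float:
    """Follow the branch matching the student's value at each split until a leaf."""
    node = tree
    while "split" in node:
        node = node["branches"][student[node["split"]]]
    return node["p_yes"]


print(p_attend(TREE, {"IQ": "High", "Wealth": True}))            # 0.94
print(p_attend(TREE, {"IQ": "Low", "ParentsEncourage": "Yes"}))  # 0.7
```

Only the attributes on the path being followed are needed; a student with IQ = High is never asked about parental encouragement.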

Business Oriented DM Problems
Targeted ads
– “What banner should I display to this visitor?”
Cross sells
– “What other products is this customer likely to buy?”
Fraud detection
– “Is this insurance claim a fraud?”
Churn analysis
– “Which customers are likely to churn?”
Risk management
– “Should I approve the loan to this customer?”
…

Mining Process - Illustrated
(diagram: Training Data → DM Engine → Mining Model; Data To Predict + Mining Model → DM Engine → Predicted Data)
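The two-phase flow (train on data with known outcomes, then predict for new data) can be sketched in Python. This is a toy stand-in, not any algorithm shipped with SQL Server: `CountingModel` simply tallies outcomes per combination of input values, and the class name and sample rows are hypothetical.

```python
from collections import Counter, defaultdict


class CountingModel:
    """Toy stand-in for a DM engine (not a real mining algorithm): training
    tallies target values per combination of input values; prediction looks
    the combination up and returns the most frequent value and its share."""

    def __init__(self, inputs, target):
        self.inputs = list(inputs)
        self.target = target
        self.counts = defaultdict(Counter)  # input tuple -> Counter of target values

    def _key(self, case):
        return tuple(case[c] for c in self.inputs)

    def train(self, cases):
        # Training data: the target value is known for every case.
        for case in cases:
            self.counts[self._key(case)][case[self.target]] += 1

    def predict(self, case):
        # Data to predict: the target value is missing and is estimated.
        dist = self.counts.get(self._key(case))
        if not dist:
            return None, 0.0  # combination never seen during training
        value, n = dist.most_common(1)[0]
        return value, n / sum(dist.values())


training_data = [
    {"Gender": "Male", "Encouragement": "Encouraged", "CollegePlans": "Yes"},
    {"Gender": "Male", "Encouragement": "Encouraged", "CollegePlans": "Yes"},
    {"Gender": "Male", "Encouragement": "Encouraged", "CollegePlans": "No"},
    {"Gender": "Female", "Encouragement": "Not Encouraged", "CollegePlans": "No"},
]
model = CountingModel(["Gender", "Encouragement"], "CollegePlans")
model.train(training_data)
print(model.predict({"Gender": "Male", "Encouragement": "Encouraged"}))
```

Here the encouraged male students are predicted to have college plans with probability 2/3, because two of the three matching training cases said "Yes".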

The Data Mining Market

The $$$: Market Size
DM Tools Market:
– 1999: $341.3M
– 2000: $455.1M
– 2001: $449.5M
Source: IDC

The Players
Leading vendors
– SAS
– SPSS
– IBM
– Angoss
– Hundreds of smaller vendors offering DM algorithms…
Oracle: Thinking Machines acquisition

The Products
End-to-end horizontal DM tools
– Extraction, cleansing, loading, modeling, algorithms (dozens), analyst's workbench, reporting, charting…
The customer is the power analyst
– A PhD in statistics is usually required…
Closed tools: no standard API
– Total vendor lock-in
– Limited integration with applications
DM is an “outsider” in the data warehouse
Extensive consulting required
Skyrocketing prices
– $60K+ for a single-user license

What the analysts say…
“Stand-alone Data Mining Is Dead” – Forrester
“The demise of [stand alone] data mining” – Gartner

The Microsoft Approach

DataPro Users Survey: “Data mining will be the fastest-growing BI technology…”

Market Size of BI (source: IDC)

SQL Server: The Analysis Platform
SQL Server 2000 provides a complete analysis platform
– Not an isolated, stand-alone DM product
Platform means:
– Standards-based DM APIs (OLE DB for DM) for application development
– An integrated vision for all technologies and tools
– Extensible
– Scalable

Data Flow
(diagram components: OLTP, DW, OLAP, DM, Apps, Reports & Analysis)

Analysis Services 2000 – Components
(diagram components: Manager UI, DSO, Analysis Server, Client, OLE DB, OLAP Engine, OLAP Engine (local), DM Engine, DM Engine (local), DM, DMM, DM Wizards, DM DTS Task, Tree View Control, Cluster View Control, Lift Chart Control, Sample Query Tool)

OLE DB for Data Mining…

Why OLE DB for DM?
Make DM a mass-market technology by:
– Leveraging existing technologies and knowledge: SQL and OLE DB
– Establishing common industry-wide concepts and data presentation
– Changing the DM market perception from “proprietary” to “open”
– Increasing the number of players:
  – Reduce the cost and risk of becoming a consumer: one tool works with multiple providers
  – Reduce the cost and risk of becoming a provider: focus on expertise and find many partners to complement the offering

Integration With RDBMS
Customers would like to:
– Build DM models from within their RDBMS
– Train the models directly off their relational tables
– Perform predictions as relational queries (tables in, tables out)
– Feel that DM is a native part of their database
Therefore:
– Data mining models are relational objects
– All operations on the models are relational
– The language used is SQL (with extensions)
The effect: every DBA and VB developer can become a DM developer

Creating a Data Mining Model (DMM)

Identifying the “Cases”
DM algorithms analyze “cases”
The “case” is the entity being categorized and classified
Examples:
– Customer credit risk analysis: Case = Customer
– Product profitability analysis: Case = Product
– Promotion success analysis: Case = Promotion
Each case encapsulates all we know about the entity

A Simple Set of Cases
StudentID | Gender | ParentIncome | IQ | Encouragement | CollegePlans
1 | Male | … | … | Not Encouraged | No
2 | Female | … | … | Encouraged | Yes
3 | Male | … | … | … | Yes

More Complicated Cases
Cust ID | Age | Marital Status | IQ | Favorite Movies (Title: Score)
1 | 35 | M | 2 | Star Wars: 8, Toy Story: 9, Terminator: 7
2 | 20 | S | 3 | Star Wars: 7, Braveheart: 7, The Matrix: …
… | … | M | 2 | Sixth Sense: 9, Casablanca: 10
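A case with a nested table can be modeled in Python as one record per customer holding a list of movie ratings. This sketch uses only the first two customers with fully legible values and leaves out the IQ column; the class names `CustomerCase` and `MovieRating` are hypothetical and are not part of any Microsoft API.

```python
from dataclasses import dataclass, field


@dataclass
class MovieRating:
    title: str
    score: int


@dataclass
class CustomerCase:
    """One case per customer; `movies` plays the role of the nested table."""
    cust_id: int
    age: int
    marital_status: str
    movies: list[MovieRating] = field(default_factory=list)


cases = [
    CustomerCase(1, 35, "M", [MovieRating("Star Wars", 8),
                              MovieRating("Toy Story", 9),
                              MovieRating("Terminator", 7)]),
    CustomerCase(2, 20, "S", [MovieRating("Star Wars", 7),
                              MovieRating("Braveheart", 7)]),
]

# Flattening the nested table gives one row per (customer, movie) pair, so a
# single flat table would repeat each customer's attributes once per movie.
flat_rows = [(c.cust_id, c.age, m.title, m.score) for c in cases for m in c.movies]

print(len(cases), "cases")          # 2 cases
print(len(flat_rows), "flat rows")  # 5 flat rows
```

Keeping the movies nested preserves the rule that each case encapsulates everything known about one entity, while the flat form would turn two customers into five rows.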

A DMM is a Table!
A DMM structure is defined as a table
– Training a DMM means inserting data (patterns) into the table
– Predicting from a DMM means querying the table
All information describing the case is contained in columns

Creating a Mining Model
CREATE MINING MODEL [Plans Prediction]
(
    StudentID      LONG   KEY,
    Gender         TEXT   DISCRETE,
    ParentIncome   LONG   CONTINUOUS,
    IQ             DOUBLE CONTINUOUS,
    Encouragement  TEXT   DISCRETE,
    CollegePlans   TEXT   DISCRETE PREDICT
)
USING Microsoft_Decision_Trees

Creating a Mining Model With a Nested Table
CREATE MINING MODEL MoviePrediction
(
    CustomerId  LONG KEY,
    Age         LONG CONTINUOUS,
    Gender      TEXT DISCRETE,
    Education   TEXT DISCRETE,
    MovieList   TABLE PREDICT
    (
        MovieName TEXT KEY
    )
)
USING Microsoft_Decision_Trees

Training a DMM

Training a DMM means passing it data for which the attributes to be predicted are known
– Multiple passes are handled internally by the provider!
Use an INSERT INTO statement
The DMM will not persist the inserted data
Instead, it analyzes the given cases and builds the DMM content (decision tree, segmentation model, association rules)
INSERT [INTO] <mining model name> [(columns list)] <source data query>

INSERT INTO Plans Prediction
INSERT INTO [Plans Prediction]
    (StudentID, Gender, ParentIncome, IQ, Encouragement, CollegePlans)
SELECT [StudentID], [Gender], [ParentIncome], [IQ], [Encouragement], [CollegePlans]
FROM [Students]
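The point that training consumes rows but keeps only derived content can be illustrated with plain SQLite. This is a rough analogy only: it is not DMX and no mining algorithm runs. The "content" here is just outcome counts per input combination, and the table `PlansContent` and the sample student rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (
        StudentID INTEGER PRIMARY KEY,
        Gender TEXT, Encouragement TEXT, CollegePlans TEXT
    );
    INSERT INTO Students VALUES
        (1, 'Male',   'Not Encouraged', 'No'),
        (2, 'Female', 'Encouraged',     'Yes'),
        (3, 'Male',   'Encouraged',     'Yes'),
        (4, 'Male',   'Encouraged',     'No'),
        (5, 'Male',   'Encouraged',     'Yes');

    -- Analogy for model content: only aggregated statistics are stored,
    -- not the individual training cases.
    CREATE TABLE PlansContent AS
    SELECT Gender, Encouragement, CollegePlans, COUNT(*) AS Support
    FROM Students
    GROUP BY Gender, Encouragement, CollegePlans;
""")

for row in conn.execute(
    "SELECT * FROM PlansContent ORDER BY Gender, Encouragement, CollegePlans"
):
    print(row)
```

Five student rows go in, and four aggregate rows come out; no StudentID survives in the content, which mirrors the slide's statement that the DMM does not persist the inserted data.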

When INSERT INTO Is Done…
The DMM is trained
– The model can be retrained
– Content (rules, trees, formulas) can be explored through:
  – OLE DB schema rowsets
  – SELECT * FROM <mining model>.CONTENT
  – An XML string (PMML)
Prediction queries can be executed

Predictions

What are Predictions?
Predictions apply the rules of a trained model to a new set of data in order to estimate missing attributes or values
Predictions = queries
– The syntax is SQL-like
– The output is a rowset
In order to predict you need:
– An input data set
– A trained DMM
– Binding (mapping) information between the input data and the DMM
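The third ingredient, binding, says which input column feeds which model column; in a prediction query this is what the ON clause expresses. A minimal Python sketch, assuming hypothetical input column names (`Sex`, `IQScore`, `ParentsEncourage`) that differ from the model's names:

```python
# Hypothetical binding: model column name -> column name in the input data.
BINDING = {"Gender": "Sex", "IQ": "IQScore", "Encouragement": "ParentsEncourage"}


def bind(input_row: dict, binding: dict) -> dict:
    """Map an input row onto the model's columns; raise if a bound column is absent."""
    missing = [src for src in binding.values() if src not in input_row]
    if missing:
        raise KeyError(f"input row lacks bound columns: {missing}")
    return {model_col: input_row[src] for model_col, src in binding.items()}


new_student = {"StudentID": 7, "Sex": "Female", "IQScore": 118,
               "ParentsEncourage": "Encouraged"}
print(bind(new_student, BINDING))
# {'Gender': 'Female', 'IQ': 118, 'Encouragement': 'Encouraged'}
```

Columns that are not bound, such as `StudentID` here, are not passed to the model; they can still be selected from the input side of the query for display.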

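The Truth Table Concept
Gender | ParentIncome | IQ | Encouragement | College Plans | Probability
Male | … | … | Not Encouraged | No | 85%
Male | … | … | … | Yes | 15%
Male | … | … | Encouraged | No | 60%
Male | … | … | Encouraged | Yes | 40%
Male | … | … | … | No | 80%
Male | … | … | … | Yes | 20%
Male | … | … | Encouraged | No | 58%
…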

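Prediction
Gender | ParentIncome | IQ | Encouragement | College Plans | Probability
Male | … | … | Not Encouraged | No | 85%
Male | … | … | … | Yes | 15%
Male | … | … | Encouraged | No | 60%
Male | … | … | Encouraged | Yes | 40%
Male | … | … | … | No | 80%
Male | … | … | … | Yes | 20%
Male | … | … | Encouraged | No | 58%
Male | … | … | Encouraged | Yes | 42%
Male | … | … | … | No | 78%
Male | … | … | … | Yes | 22%
Male | … | … | Encouraged | No | 45%
It’s a JOIN!
StudentID | Gender | ParentIncome | IQ | Encouragement
1 | Male | … | … | Not Encouraged
2 | Male | … | … | …
3 | Female | … | … | Encouraged
4 | Male | … | … | Encouraged
5 | Female | … | … | …
… | Female | … | … | …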

The Prediction Query Syntax
SELECT <columns>
FROM <mining model> PREDICTION JOIN <source data query>
ON <mining model column> = <source data column> AND …

Example
SELECT [New Students].[StudentID],
       [Plans Prediction].[CollegePlans],
       PredictProbability([CollegePlans])
FROM [Plans Prediction] PREDICTION JOIN [New Students]
ON  [Plans Prediction].[Gender] = [New Students].[Gender]
AND [Plans Prediction].[IQ] = [New Students].[IQ]
AND ...
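The “it’s a JOIN” idea can be tried out with plain SQLite: join the new students to a truth table on the input attributes, then keep the most probable outcome per student, which is roughly what selecting the predicted column together with PredictProbability returns. This is an analogy, not DMX; the tables `TruthTable` and `NewStudents` are hypothetical, ParentIncome and IQ are omitted for brevity, and the probabilities are the first four rows of the truth table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical truth table (ParentIncome and IQ columns omitted for brevity).
    CREATE TABLE TruthTable (
        Gender TEXT, Encouragement TEXT, CollegePlans TEXT, Probability REAL
    );
    INSERT INTO TruthTable VALUES
        ('Male', 'Not Encouraged', 'No',  0.85),
        ('Male', 'Not Encouraged', 'Yes', 0.15),
        ('Male', 'Encouraged',     'No',  0.60),
        ('Male', 'Encouraged',     'Yes', 0.40);

    -- Hypothetical input cases to predict.
    CREATE TABLE NewStudents (StudentID INTEGER, Gender TEXT, Encouragement TEXT);
    INSERT INTO NewStudents VALUES
        (1, 'Male', 'Not Encouraged'),
        (4, 'Male', 'Encouraged');
""")

# The ON clause plays the role of the binding; the WHERE clause keeps only the
# most probable outcome for each student.
predictions = conn.execute("""
    SELECT s.StudentID, t.CollegePlans, t.Probability
    FROM NewStudents AS s
    JOIN TruthTable AS t
      ON t.Gender = s.Gender AND t.Encouragement = s.Encouragement
    WHERE t.Probability = (
        SELECT MAX(t2.Probability)
        FROM TruthTable AS t2
        WHERE t2.Gender = s.Gender AND t2.Encouragement = s.Encouragement
    )
    ORDER BY s.StudentID
""").fetchall()

for row in predictions:
    print(row)  # (StudentID, predicted CollegePlans, its probability)
```

A real mining model does not store an explicit row for every input combination; the truth table is the conceptual view that makes the prediction query read like an ordinary join.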

Demo

OLE DB for Data Mining Defines the API
(diagram: consumers and providers connected through OLE DB for DM (API); data sources shown: RDBMS, Cube, Misc. Data Source, accessed via OLE DB)
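The benefit of a shared API, that one consumer works with many providers, can be sketched with a Python abstract base class. This is only an analogy: it does not reproduce the actual OLE DB for DM COM interfaces, and `MiningProvider`, the two toy providers, and `run_consumer` are all hypothetical names.

```python
from abc import ABC, abstractmethod
from collections import Counter


class MiningProvider(ABC):
    """Hypothetical provider contract: every provider exposes the same two calls."""

    @abstractmethod
    def train(self, cases: list) -> None: ...

    @abstractmethod
    def predict(self, case: dict): ...


class MajorityProvider(MiningProvider):
    """Toy provider: predicts the most frequent target value seen in training."""

    def __init__(self, target: str):
        self.target = target
        self.best = None

    def train(self, cases):
        self.best = Counter(c[self.target] for c in cases).most_common(1)[0][0]

    def predict(self, case):
        return self.best


class LookupProvider(MiningProvider):
    """Toy provider: remembers the last target value seen for each value of one input."""

    def __init__(self, key: str, target: str):
        self.key, self.target = key, target
        self.table = {}

    def train(self, cases):
        for c in cases:
            self.table[c[self.key]] = c[self.target]

    def predict(self, case):
        return self.table.get(case[self.key])


def run_consumer(provider: MiningProvider, training: list, new_cases: list) -> list:
    """A consumer coded only against the shared interface, so any provider plugs in."""
    provider.train(training)
    return [provider.predict(c) for c in new_cases]


training = [
    {"Gender": "Male", "CollegePlans": "No"},
    {"Gender": "Female", "CollegePlans": "Yes"},
    {"Gender": "Male", "CollegePlans": "No"},
]
new_cases = [{"Gender": "Female"}, {"Gender": "Male"}]
print(run_consumer(MajorityProvider("CollegePlans"), training, new_cases))
print(run_consumer(LookupProvider("Gender", "CollegePlans"), training, new_cases))
```

`run_consumer` never changes when the provider is swapped, which is the cost and risk reduction for consumers that the “Why OLE DB for DM?” slide describes.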

OLE DB for DM Configuration Options (Demo)
Consumers: MS Analysis Manager, ANGOSS Controls
OLE DB for DM providers: MS DM Provider, ANGOSS DM Provider

Demo on OLE DB for DM API using Angoss Controls and Provider

For more info…
– DM URL:
– Newsgroup: Microsoft.public.SQLserver.datamining
– Communities.msn.com/AnalysisServicesDataMining
White papers:
– Performance paper:

Questions?