Data Mining (and Machine Learning) With Microsoft Tools Michael Lisin, Plaster Group May 8, 2014.



Why Reinvent a Toilet?

Definitions

Concept: Definition / Solution For
Data Mining: Algorithms to discover unknown data patterns
Machine Learning: Algorithms to predict based on data patterns
Statistics: Branch of mathematics, methods of data collection and interpretation
Data Science: All of the above + Data Visualization

What Do You Think?

Is Linear Regression:
- Data Mining
- Machine Learning
- Statistics
- All of the above

Linear Regression is a straight line describing how variable Y responds to changes in variable X.
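To make the "straight line" definition concrete, here is a minimal sketch of an ordinary least squares fit in plain Python. The function name `fit_line` and the sample numbers are assumptions made for illustration; they do not come from the slides or from any Microsoft tool.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of X around its mean, and how X and Y move together
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: X could be ad spend, Y could be units sold
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(xs, ys)
print(f"Y = {intercept} + {slope} * X")
```

The slope answers the question on the slide: how much Y changes when X changes by one unit.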

MS DM Environment

- SQL Server
- Excel Data Mining Add-Ins (optional, recommended)

Interact with: Excel (add-ins), SQL Server Management Studio, SQL Server Data Tools (SSDT), Custom Code

Components and capabilities by SQL Server edition (Enterprise, BI, Standard):
- SSIS: Text Mining
- SSAS: DM basic
- SSAS: DM advanced (CV, prediction queries, …)
- SSDT
- Custom Code

Start With a Question

Many Potential Questions

MS DM Capabilities

Generic question: What are the data patterns?

Best if more specific and directed at a problem, for example:
- How do we combine our products to increase profits?
- How do we predict the demand for a product / service?
- Why are customers buying from us?
- Where can we best cut costs?
- What are the opportunities to reduce risks?
- Who are our best customers?
- …

Approach

1. Define problem / questions
2. Prepare data
3. Build model
4. Validate model
5. Implement predictions
6. Automate model refresh
7. Extend / custom applications

More Technical (toward the later steps)
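The prepare / build / validate steps can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the SQL Server workflow: the rows, the one-rule "model", and the helper names (`split`, `build_model`, `predict`, `validate`) are all hypothetical.

```python
# Hypothetical rows: (commute_miles, bought_bike)
rows = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 0),
        (7, 0), (8, 0), (9, 0), (10, 0), (2, 1), (9, 0)]


def split(rows, test_every=4):
    """Prepare data: hold out every 4th row for validation."""
    train = [r for i, r in enumerate(rows) if i % test_every != 0]
    test = [r for i, r in enumerate(rows) if i % test_every == 0]
    return train, test


def build_model(train):
    """Build model: pick the commute threshold that gets the most training rows right."""
    best_threshold, best_correct = None, -1
    for threshold in sorted({x for x, _ in train}):
        correct = sum((x <= threshold) == bool(y) for x, y in train)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def predict(threshold, x):
    """Implement predictions: short commutes are predicted to buy."""
    return 1 if x <= threshold else 0


def validate(threshold, test):
    """Validate model: share of held-out rows predicted correctly."""
    hits = sum(predict(threshold, x) == y for x, y in test)
    return hits / len(test)


train, test = split(rows)
threshold = build_model(train)
accuracy = validate(threshold, test)
print(threshold, accuracy)
```

The held-out rows are never shown to `build_model`, so the accuracy on them is an estimate of how the model behaves on new data, and it can be lower than the accuracy on the training rows.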

SQL DM Algorithms Summary

Categories: Discrete, Continuous, Sequence, Common Group, Similar Group, TXT Semantic

Algorithms:
- Decision Trees [Classify, Estimate]
- Linear Regression [Advanced]
- Time Series [Forecast (T), Forecast]
- Clustering [Detect Categories (T), Except, Cluster]
- Sequence Clustering [Advanced]
- Neural Network [Advanced]
- Logistic Regression [Fill From Sample (T), Scenario Analysis (T), Prediction Calculator (T)]
- Association Rules [Shopping Basket (T), Associate]
- Naive Bayes [Analyze Key Influencers (T)]
- Text Mining (matching, grouping, extracting)

Predict Using Models

DMX = Data Mining Extensions to query models for predictions

DMX Query:

    SELECT Model.[Bike Buyer],
           PredictProbability(Model.[Bike Buyer]),
           NewData.
    FROM [Model]
    NATURAL PREDICTION JOIN
        (SELECT Age, [Commute Distance], FROM … ) AS NewData

Output: …

Demo

Questions

Appendix

SQL Server Data Mining Algorithms

- Decision Tree
- Linear Regression
- Clustering
- Sequence Clustering
- Association
- Naive Bayes
- Neural Network
- Time Series
- Text Mining: Fuzzy Grouping, Term Extraction, Term Lookup

Key SQL Server Algorithms - 1

Decision Tree makes predictions based on the relationships between input columns in a dataset. It uses the values of those columns to predict the predictable column, following how strongly each input tends toward a particular outcome. Example: predict which customers are likely to be satisfied with a company, based on some input variables (# purchases, avg. transaction size).

Linear Regression is a variation of the Decision Trees algorithm that calculates a linear relationship between a dependent and an independent variable, and then uses that relationship for prediction. The algorithm is most applicable to predicting a continuous attribute. Example: product demand, price, site visitors.

Clustering is a segmentation algorithm that uses iterative techniques to group cases in a dataset into clusters that contain similar characteristics. These groupings are useful for exploring data, identifying anomalies in the data, and customer segmentation.
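The "iterative techniques" behind clustering can be illustrated with plain k-means on one-dimensional data. This is a stand-in chosen for brevity; the slides do not say which method the Microsoft Clustering algorithm uses internally, and the function name `kmeans_1d`, the spend values, and the starting centers are assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on numbers: assign each value to the nearest center, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical monthly spend per customer
spend = [20, 25, 30, 200, 220, 240]
centers, clusters = kmeans_1d(spend, centers=[20, 240])
print(centers)   # one center per customer segment
print(clusters)
```

A value that sits far from every center would be a candidate anomaly, which is the "identifying anomalies" use mentioned above.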

Key SQL Server Algorithms - 2

Sequence Clustering is similar to the Clustering algorithm; however, instead of finding clusters of cases that contain similar attributes, this algorithm finds clusters of cases that contain similar paths in a sequence. It is used to explore data that contains events that can be linked by following paths, or sequences. For example: the click paths that are created when users navigate a Web site; the order in which a user follows a process.

Association is useful for recommending products to customers (recommendation engine) based on items they have already bought, or in which they have indicated an interest. Example: market basket analysis.

Naive Bayes is a classification algorithm. It uses Bayes' theorem but does not take into account dependencies that may exist between attributes, so its assumptions are said to be naive. It can be used for initial explorations of data; the results can later be applied to create additional mining models with other, more computationally intense and more accurate algorithms. Example: send mailers only to those customers who are likely to respond.
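Market basket analysis usually rests on two measures, support and confidence. The sketch below computes both in plain Python; the baskets and the helper names `support` and `confidence` are hypothetical and are not taken from the slides or from the Microsoft Association algorithm.

```python
# Hypothetical purchase baskets
baskets = [
    {"bike", "helmet"},
    {"bike", "helmet", "gloves"},
    {"bike", "gloves"},
    {"helmet"},
]


def support(itemset, baskets):
    """Share of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


rule_support = support({"bike", "helmet"}, baskets)
rule_confidence = confidence({"bike"}, {"helmet"}, baskets)
print(rule_support, rule_confidence)
```

A rule such as "bike buyers also buy helmets" becomes a recommendation candidate when both numbers are high enough for the business case.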

Key SQL Server Algorithms - 3

Neural Network algorithm combines each possible state of the input attribute with each possible state of the predictable attribute, and uses the data to calculate probabilities. It is useful for analyzing complex input data, such as from a manufacturing or commercial process, or business problems for which a significant quantity of data is available but for which rules cannot be easily derived by using other algorithms.

Time Series algorithm provides regression algorithms that are optimized for the forecasting of continuous values, such as product sales, over time. Whereas other Microsoft algorithms, such as decision trees, require additional columns of new information as input to predict a trend, a time series model does not.

Text Mining algorithm analyzes unstructured text data. This allows companies to analyze unstructured data such as a "comments" section on a customer satisfaction survey. This algorithm is available in SQL Server Integration Services.
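The point that a time series model needs no extra input columns can be shown with a first-order autoregressive model, which predicts each value from the previous value of the same series. This is a deliberately simple stand-in, not the Microsoft Time Series algorithm; `fit_ar1`, `forecast`, and the sales figures are assumptions.

```python
def fit_ar1(series):
    """Least squares fit of next_value = intercept + slope * current_value."""
    current, following = series[:-1], series[1:]
    n = len(current)
    mean_c = sum(current) / n
    mean_f = sum(following) / n
    scc = sum((c - mean_c) ** 2 for c in current)
    scf = sum((c - mean_c) * (f - mean_f) for c, f in zip(current, following))
    slope = scf / scc
    intercept = mean_f - slope * mean_c
    return intercept, slope


def forecast(series, steps):
    """Roll the fitted relationship forward, feeding each forecast back in."""
    intercept, slope = fit_ar1(series)
    last, predictions = series[-1], []
    for _ in range(steps):
        last = intercept + slope * last
        predictions.append(last)
    return predictions


# Hypothetical sales that double every period
sales = [1, 2, 4, 8, 16]
predictions = forecast(sales, steps=2)
print(predictions)
```

The only input is the history of the series itself, which is the contrast the slide draws with algorithms such as decision trees.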

SQL Text Mining

Term Extraction Transformation
- Creates (extracts) a list of terms discovered in the source
- Writes the terms (+ score) to a transformation output column
- Limitations: English only; nouns or noun phrases only

Term Lookup Transformation
- Matches terms extracted from text in an input with terms in a reference table
- Counts the number of times a term in the lookup table occurs in the input data set, and writes the count together with the term from the reference table to columns in the transformation output

Fuzzy Grouping Transformation
- Selects a canonical row and identifies fuzzy (to exact) text fragment matches
- Output: UID, Group ID, Similarity Score 0..1

Supplemental
- Sampling (training and test sets, uniform representation): Row (Quantity) Sampling Transformation, Percentage Sampling Transformation
- Sort Transformation
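The counting behavior described for Term Lookup can be approximated in plain Python. This is only a sketch of the idea, assuming case-insensitive whole-word matching; the function `term_lookup`, the comments, and the reference terms are hypothetical, and the actual SSIS transformation has its own matching rules.

```python
import re
from collections import Counter


def term_lookup(texts, reference_terms):
    """Count how often each reference term occurs across the input texts."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for term in reference_terms:
            # Whole-word, case-insensitive match (an assumption of this sketch)
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            counts[term] += len(re.findall(pattern, lowered))
    return counts


# Hypothetical survey comments and reference table
comments = [
    "The delivery was late and the delivery fee was high.",
    "Great customer service, slow delivery.",
]
reference_terms = ["delivery", "customer service", "refund"]
counts = term_lookup(comments, reference_terms)
for term in reference_terms:
    print(term, counts[term])
```

Each output row pairs a term from the reference table with its count, mirroring the two output columns described above.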


Useful Terms