Data Mining With SQL Server Data Tools: Mining Data Using Tools You Already Have

Introductions
- Annelies Beaty, Manager, Enterprise Data Strategy at US Xpress
- Played many roles at US Xpress over the years
- My current role is to architect how enterprise-level data is managed and presented to the organization as a whole.
  - Development practices and guidelines
  - Tool evaluation
  - Step in and get my hands ‘dirty’ whenever possible or needed.

Data Mining (or Data Science)
- What do we mean by Data Mining?
  - The process by which large sets of data can be analyzed for actionable information.
  - Look for answers to questions over data sets so large you can get lost in them.
  - Find relationships that are too complex to be seen.
- Types of Data Mining Scenarios
  - Forecasting – Predict future outcomes based on past experience.
  - Risk and Probability – Based on past results, what factors lead to the results we want?
  - Recommendations – Based on ‘experience’, what else do we think goes with this set?
  - Finding Sequences – What are the frequent paths or steps taken through a system of possible steps?
  - Grouping – Separating the dataset into clusters of ‘like’ objects; determining affinity.

High Level Data Management
[Diagram: New Business Question is asked → Answer is Delivered]
THIS TAKES TIME AND RESOURCES

Challenge with ‘Best Practice’ EDW
- EDW development is methodical. It is designed to answer a specific, related set of questions around a business process.
  - Time to deliver results
  - Sometimes an ‘overkill’ solution.
  - Sometimes an incomplete solution.
- Interesting fact from a TDWI conference on Operational Intelligence I attended about 2 years ago: “50% of traditional data warehouses are not used in daily decision making”
- What if the lifespan of the current ‘question of the day’ is very short? Or what if you don’t even know the question?

So A Real Challenge
Data mining over (potentially) incomplete data, fast enough to get the results the business needed yesterday to make a strategic business decision.
And: can we do it without significant investment in new tools?

Gartner Magic Quadrant - Microsoft
- Business Intelligence
  - Leader Quadrant for Completeness of Vision and Ability To Execute
  - SQL Server/SSIS/SSRS is a complete solution.
  - Product quality, availability of skills, low implementation costs, alignment with existing infrastructure
  - Lacking a true Metadata Management solution, and visualizations are not as good as some other vendors’ (but improving)
- Advanced Analytics
  - Perhaps not so much – still a Niche Player
  - Product quality, availability of skills, low implementation costs, alignment with existing infrastructure
  - Availability of analytics gives MS great reach into organizations, which can serve as a springboard for future development
  - SSAS still lacks depth, breadth, and usability when compared to the leaders.
  - However, MS is expected to put a lot of energy into this space and has the means to do so.
Source: Gartner Research, early/mid 2014

Demo 1 – Setting up a project
- Availability of skills
- Don’t need to wait for SSAS
- Easily works with existing infrastructure
- Cost – If you have a SQL Server installation, you can do this now.
- Set up and briefly explore a Decision Tree model. Show the new objects on the backend SSAS server and a single predictive query.

Predictive Algorithms
- Decision Tree:
  - Presents the data as a series of ‘decisions’ used to reach the conclusion. A new branch is added when a significant correlation is found between the input and predicted variables.
- Clustering:
  - Presents the input data as groups of entities with a high correlation of common attributes. ** Can be used to simply profile the data. Prediction is optional **
- Naïve Bayes:
  - Quick method to analyze relationships between input and predictable columns. Less intense, but also less accurate. However, it can be used to help define inputs for more accurate, but costly, solutions.
- Neural Network:
  - Complex algorithm that evaluates every possible combination of inputs and outcome(s).
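
To make the Decision Tree bullet concrete, here is a minimal Python sketch of how a tree might decide which input to branch on. It uses information gain as a stand-in for the "significant correlation" test; that choice, the toy rows, and the column names `commute` and `owns_car` are all assumptions for illustration and are not taken from the talk or from how SSAS scores splits.

```python
import math
from collections import Counter

# Hypothetical training rows: (commute, owns_car, is_buyer). Not from the talk.
ROWS = [
    ("short", "no", True),
    ("short", "no", True),
    ("short", "yes", True),
    ("long", "no", False),
    ("long", "yes", False),
    ("long", "yes", True),
]
COLUMNS = {"commute": 0, "owns_car": 1}  # input column name -> tuple index
TARGET = 2                               # index of the predicted column


def entropy(rows):
    """Shannon entropy (in bits) of the predicted column over the given rows."""
    counts = Counter(row[TARGET] for row in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, column_index):
    """Entropy reduction obtained by splitting the rows on one input column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column_index], []).append(row)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder


def best_split(rows):
    """Name of the input column whose split gives the largest gain."""
    return max(COLUMNS, key=lambda name: information_gain(rows, COLUMNS[name]))


if __name__ == "__main__":
    for name, index in COLUMNS.items():
        print(name, round(information_gain(ROWS, index), 4))
    print("branch on:", best_split(ROWS))
```

In this toy data every short-commute case is a buyer, so splitting on `commute` separates the outcomes far better than splitting on `owns_car`, and that is the column the sketch would branch on first.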

Results – Singleton Query
[Table: one result block per model (Decision Tree, Cluster, Naïve Bayes, Neural Network), each with the columns Buyer, % Chance, Support, plus the Query Inputs. Values not captured in the transcript.]
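
The shape of a singleton result (a predicted Buyer value, a % chance, and a support count) can be sketched in plain Python with a small Naïve Bayes scorer. Everything here is hypothetical: the training cases, the column names, the use of add-one smoothing, and the simplification that "support" is just the number of training cases with the predicted outcome. A real mining model may compute these figures differently.

```python
from collections import Counter, defaultdict

# Hypothetical training cases: inputs plus a known Buyer outcome. Not from the talk.
CASES = [
    ({"commute": "short", "owns_car": "no"}, "yes"),
    ({"commute": "short", "owns_car": "no"}, "yes"),
    ({"commute": "short", "owns_car": "yes"}, "yes"),
    ({"commute": "long", "owns_car": "no"}, "no"),
    ({"commute": "long", "owns_car": "yes"}, "no"),
    ({"commute": "long", "owns_car": "yes"}, "yes"),
]


def train(cases):
    """Count outcomes, and input values per outcome, from the training cases."""
    class_counts = Counter(outcome for _, outcome in cases)
    value_counts = defaultdict(Counter)  # (column, outcome) -> value counts
    values_seen = defaultdict(set)       # column -> distinct values
    for inputs, outcome in cases:
        for column, value in inputs.items():
            value_counts[(column, outcome)][value] += 1
            values_seen[column].add(value)
    return class_counts, value_counts, values_seen


def predict(model, query):
    """Score a single case; return (outcome, probability, support)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for outcome, n in class_counts.items():
        score = n / total  # prior probability of this outcome
        for column, value in query.items():
            # Add-one (Laplace) smoothing avoids multiplying by zero.
            seen = value_counts[(column, outcome)][value]
            score *= (seen + 1) / (n + len(values_seen[column]))
        scores[outcome] = score
    best = max(scores, key=scores.get)
    probability = scores[best] / sum(scores.values())
    # Simplified support: training cases that share the predicted outcome.
    return best, probability, class_counts[best]


if __name__ == "__main__":
    model = train(CASES)
    buyer, chance, support = predict(model, {"commute": "short", "owns_car": "no"})
    print(f"Buyer={buyer}  % Chance={chance:.1%}  Support={support}")
```

Running the same single set of inputs through several models, as the slide does, is then a matter of comparing these three values side by side.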

Demo
- Exploration of 4 predictive data mining models.

Strengths of the predictive models
- Naïve Bayes – Simplest computationally. May be used up front to start the analysis since it processes faster. Use the results to refine the criteria for additional analysis with more complex tools. ** Cannot use continuous data as an input **
- Decision Tree – Used to predict outcomes, both discrete and continuous, based on past data.
- Clustering – Used to segment the dataset. A predictable outcome is not required, which makes it useful for detecting anomalies in the data.
- Neural Network – Most complex. Can detect rules and relationships other methods can’t. Good use cases include those with a large number of inputs and relatively few outputs: text mining, (stock) market analysis, manufacturing processes.
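
Because Naïve Bayes cannot take continuous inputs, a continuous column has to be turned into discrete buckets before it can be used. The sketch below shows one simple way to do that, equal-width bucketing, in plain Python. The function name, the bucket count, and the sample ages are assumptions for illustration; the talk does not say which discretization method was used.

```python
def equal_width_buckets(values, bucket_count):
    """Map each continuous value to a bucket index from 0 to bucket_count - 1."""
    low, high = min(values), max(values)
    width = (high - low) / bucket_count
    if width == 0:
        # All values are identical, so everything falls into one bucket.
        return [0 for _ in values]
    # The maximum value would land one past the end, so clamp it into the last bucket.
    return [min(int((v - low) / width), bucket_count - 1) for v in values]


if __name__ == "__main__":
    ages = [22, 25, 31, 38, 44, 52, 60]  # hypothetical continuous input
    print(equal_width_buckets(ages, 3))
```

The bucket index can then be fed to the model as an ordinary discrete attribute, at the cost of losing detail inside each bucket.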

Other Data Mining Algorithms
- Time Series: Allows us to use historical data to extrapolate a likely value at some point in the future.
  - Example: Predict expected sales by region.
- Association Algorithm: Used to detect associations between items or events. The more frequently items or events occur together, the higher the correlation and the probability that if one occurs, the other will too.
  - Example: Customers who bought this also bought that (Amazon).
- Sequence Algorithm: Clusters sequences of events. Similar to the clustering algorithm.
  - Example: Common paths through a website or application.
- Linear Regression Algorithm: Allows us to explore the linear relationship between variables. Variation on Decision Trees.
  - Example: Compute a trend line over sales and marketing data.
- Logistic Regression: Variation of Neural Network used to model binary outcomes.
  - Example: Use demographics to determine the likelihood of a predicted outcome, such as disease.
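
For the Association bullet, the "probability that if one occurs, the other will too" is usually expressed as two numbers: support (how often the items appear together) and confidence (how often the second item appears given the first). A minimal Python sketch follows; the baskets and item names are hypothetical and not from the talk.

```python
# Hypothetical market baskets. Not from the talk.
BASKETS = [
    {"bike", "helmet"},
    {"bike", "helmet", "gloves"},
    {"bike", "gloves"},
    {"helmet"},
    {"bike", "helmet"},
]


def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated probability that `consequent` is present given `antecedent` is."""
    both = support(baskets, set(antecedent) | set(consequent))
    return both / support(baskets, antecedent)


if __name__ == "__main__":
    print("support(bike, helmet):", support(BASKETS, {"bike", "helmet"}))
    print("confidence(bike -> helmet):", confidence(BASKETS, {"bike"}, {"helmet"}))
```

A rule such as "bike, therefore helmet" is only interesting when both numbers are high enough: high confidence with very low support usually means the pattern rests on a handful of cases.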

Resources
- MSDN has substantial documentation and tutorials to bring you up to speed on each algorithm.
- SQL Server Central (a Red Gate community site) has a step-by-step Data Mining series of articles that takes you all the way through the MSDN tutorials on basic data mining and then shows how to leverage them…
  - SSIS packages to build them, exploration via the Excel data mining tools, and the Power BI suite.