Drew Minkin ◦ Past  Analytics Architect at Zilliant  Senior Consultant, Fujitsu  6+ years Microsoft Services  Escalation.

Presentation transcript:

Drew Minkin

◦ Past
  ▪ Analytics Architect at Zilliant
  ▪ Senior Consultant, Fujitsu
  ▪ 6+ years Microsoft Services
    - Escalation Engineer
    - Dedicated Field Engineer (“Alliance”)
  ▪ Local speaker for SQL and BI
  ▪ OLAP Lecturer, SMU’s BI Graduate Certificate Program
◦ Present
  ▪ Business Intelligence Architect at FiServ ISV
  ▪ Part-time data miner for hire

• Data Mining Intro
• DM Methodology
• Data Concepts
• Validating and Testing Models
• Applying Output with Scorecards

• Methodology
• Architecture
• Information Flow
• Technologies

• Problem Definition
• Data Modeling
• Data Discovery
• Analytics Modeling
• Applied Analytics
• Model Validation

• Business case and non-technical details of the predictive analytics inquiry
  ◦ Business objectives and success criteria
  ◦ Requirements, assumptions, and constraints
  ◦ Project plan, risks, and contingencies
  ◦ Data mining goals and success criteria
  ◦ Terminology, tools, and techniques

• Analysis of source data for structural and content gaps
  ◦ Data collection report
  ◦ Data description report
  ◦ Data exploration report
  ◦ Data quality report

• Selection and manipulation of source data into a conformed entity input, ready for formal exploration
  ◦ Dataset, data dictionary, and rationale
  ◦ Data cleansing report
  ◦ Derived attributes
  ◦ Generated, merged, and reformatted data

• Research and analysis of patterns, and creation of data mining models
  ◦ Model
  ◦ Modeling technique
  ◦ Modeling assumptions
  ◦ Test design
  ◦ Parameter settings
  ◦ Model description

• Testing data mining models using different algorithms, and validation of statistical significance
  ◦ Revised parameter settings
  ◦ Model validation plan
  ◦ Model assessment

• Integration of models with new data
  ◦ Deployment plan
  ◦ Monitoring and maintenance plan
  ◦ Final report
  ◦ Final presentation
  ◦ Experience documentation

• Case: the set of columns you want to analyze
  ◦ Age, Gender, Region, Annual Spending
• Case Key: the unique ID of a case
• A column has:
  ◦ Data Type
  ◦ Content Type
  ◦ And optionally:
    ▪ Distribution
    ▪ Discretization
    ▪ Related Columns
    ▪ Flags (e.g. NOT NULL)

• We don’t care about detailed low-level types
• DM only uses:
  ◦ Text
  ◦ Long
  ◦ Boolean
  ◦ Double
  ◦ Date
  ◦ and, in some third-party algorithms:
    ▪ Time and Sequence

• Common:
  ◦ DISCRETE
    ▪ Red, Blue
  ◦ CONTINUOUS
    ▪ $6,
  ◦ DISCRETIZED
    ▪ 1-5, 6-20, 21+
• Denotes a key:
  ◦ KEY
• For special purposes:
  ◦ KEY SEQUENCE
  ◦ KEY TIME
  ◦ ORDERED
  ◦ CYCLICAL

• Algorithms interpret column usage in slightly different ways, but in general a column is one of:
• Input
  ◦ Used for predicting another column
• PREDICT
  ◦ Predicted, and also used as an input for predicting other columns
• PREDICT_ONLY
  ◦ Predicted, but not used as an input
• A column can be an input, predictable, or both
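
The column attributes and usage flags above can be sketched as a small data structure. This is only an illustration in Python, not the API of any data mining product: the `Column` and `Usage` names, the `split_roles` helper, the `CustomerId` key, and the choice of which columns are predictable are all hypothetical. Leaving KEY columns out of both roles is also an assumption made for the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Usage(Enum):
    INPUT = "INPUT"                # used only to predict other columns
    PREDICT = "PREDICT"            # predicted, and also an input for others
    PREDICT_ONLY = "PREDICT_ONLY"  # predicted, never used as an input


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str     # e.g. "Text", "Long", "Boolean", "Double", "Date"
    content_type: str  # e.g. "KEY", "DISCRETE", "CONTINUOUS", "DISCRETIZED"
    usage: Usage = Usage.INPUT


def split_roles(columns):
    """Return (input names, predictable names).

    Assumption for this sketch: KEY columns identify the case and are
    placed in neither list.
    """
    non_key = [c for c in columns if c.content_type != "KEY"]
    inputs = [c.name for c in non_key
              if c.usage in (Usage.INPUT, Usage.PREDICT)]
    targets = [c.name for c in non_key
               if c.usage in (Usage.PREDICT, Usage.PREDICT_ONLY)]
    return inputs, targets


# Hypothetical case definition built from the example columns in the slides.
CASE = [
    Column("CustomerId", "Long", "KEY"),
    Column("Age", "Long", "CONTINUOUS"),
    Column("Gender", "Text", "DISCRETE"),
    Column("Region", "Text", "DISCRETE", Usage.PREDICT),
    Column("AnnualSpending", "Double", "CONTINUOUS", Usage.PREDICT_ONLY),
]

inputs, targets = split_roles(CASE)
print("inputs: ", inputs)
print("targets:", targets)
```

Here `Region` lands in both lists because it is flagged PREDICT, while `AnnualSpending` appears only among the targets.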

• Use when you don’t need to analyze the full continuous range
• DM automatically converts the data into buckets
  ◦ By default, into 5
• Techniques:
  ◦ AUTOMATIC
  ◦ CLUSTERS
  ◦ EQUAL_AREAS
  ◦ THRESHOLDS
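
To make the idea of bucketing concrete, here is a minimal Python sketch of equal-frequency bucketing, which is the general idea behind an "equal areas" technique. It is an assumption-laden illustration, not the algorithm any specific product uses: `equal_area_edges`, `bucket_of`, and the sample ages are hypothetical, and ties between values would make the buckets uneven.

```python
def equal_area_edges(values, buckets=5):
    """Return the upper edge of every bucket except the last one."""
    ordered = sorted(values)
    n = len(ordered)
    if n < buckets:
        raise ValueError("need at least one value per bucket")
    # The i-th edge is the largest value in the i-th slice of the sorted data.
    return [ordered[(i * n) // buckets - 1] for i in range(1, buckets)]


def bucket_of(value, edges):
    """Index of the first bucket whose upper edge is >= value."""
    for index, edge in enumerate(edges):
        if value <= edge:
            return index
    return len(edges)  # everything above the last edge goes in the final bucket


# Hypothetical sample of a continuous column.
ages = [18, 21, 22, 25, 29, 31, 34, 38, 41, 47, 52, 58, 63, 70, 84]

edges = equal_area_edges(ages)  # five buckets, matching the default above
counts = [0] * (len(edges) + 1)
for age in ages:
    counts[bucket_of(age, edges)] += 1

print("edges: ", edges)
print("counts:", counts)
```

With fifteen distinct values and five buckets, each bucket receives three cases, which is what "equal areas" is meant to convey.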

• If you know the distribution of your data (you should), indicate it:
  ◦ NORMAL
    ▪ Typical Gaussian bell curve
  ◦ LOG NORMAL
    ▪ Most values at the “beginning” of the scale
  ◦ UNIFORM
    ▪ Flat line: all values equally likely, or perfectly random
• Other distributions can exist, but you cannot indicate them; the algorithm will still work
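
A quick way to see the difference between these three shapes is to draw samples and check how much of each sample sits in the lower half of its observed range. The Python sketch below uses only the standard library; the parameters (mean 50, standard deviation 10, and so on), the seed, and the `share_in_lower_half` helper are arbitrary choices for illustration.

```python
import random


def share_in_lower_half(samples):
    """Fraction of samples below the midpoint of the observed range."""
    midpoint = (min(samples) + max(samples)) / 2
    return sum(1 for x in samples if x < midpoint) / len(samples)


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10_000

normal = [rng.gauss(50, 10) for _ in range(n)]             # bell curve
log_normal = [rng.lognormvariate(0, 1) for _ in range(n)]  # long right tail
uniform = [rng.uniform(0, 100) for _ in range(n)]          # flat

for name, samples in [("NORMAL", normal),
                      ("LOG NORMAL", log_normal),
                      ("UNIFORM", uniform)]:
    print(f"{name:10s} share in lower half: {share_in_lower_half(samples):.2f}")
```

The log-normal sample should report a share close to 1, which is what "most values at the beginning of the scale" means in practice, while the normal and uniform samples stay near one half.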

• Nested Case: a case containing a table column
  ◦ Purchases of a Customer
• Used for analyzing patterns in a relationship
• It has a Nested Key
  ◦ Not a “relational” foreign key!
  ◦ Normally, the Nested Key is a column you want to analyze
    ▪ E.g.: Product Name or Model
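
A nested case can be pictured as a record that carries its own small table. The Python sketch below is a hypothetical illustration: the `CustomerCase` class, the sample customers, and the products are invented, and the pair counting is just one simple way to look for patterns in the nested rows, not a specific mining algorithm.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class CustomerCase:
    customer_id: int  # case key
    region: str
    # Nested table: one row per purchase; the product name is the nested key.
    purchases: list = field(default_factory=list)


# Hypothetical cases, each with its own nested purchases table.
cases = [
    CustomerCase(1, "North", ["Bike", "Helmet", "Gloves"]),
    CustomerCase(2, "South", ["Bike", "Helmet"]),
    CustomerCase(3, "North", ["Helmet", "Gloves"]),
]

# Count how often two products appear together in one customer's nested table.
pair_counts = Counter()
for case in cases:
    for pair in combinations(sorted(set(case.purchases)), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.items():
    print(pair, count)
```

Note that the product name is what gets analyzed here, which is why it serves as the nested key rather than a foreign key back to the customer.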

Algorithms and Use Cases
• Use cases: Classification, Estimation, Segmentation, Association, Forecasting, Text Analysis, Advanced Data Exploration
• Algorithms: Time Series, Sequence Clustering, Neural Nets, Naïve Bayes, Logistic Regression, Linear Regression, Decision Trees, Clustering, Association Rules

Algorithm | Drillthrough | PMML | DM Dimension
Association | Yes | No | Yes
Clustering | Yes
Decision Trees | Yes
Linear Regression | Yes | No
Logistic Regression | No
Naive Bayes | Yes | No
Neural Network | No
Sequence Clustering | Yes | No | Yes
Time Series | Yes | No

• AVGGIFT: Average dollar amount of gifts to date
• INCOME: Household income
• LASTGIFT: Last donation amount
• MAXRAMNT: Dollar amount of largest gift to date
• MINRAMNT: Dollar amount of smallest gift to date
• RAMNTALL: Dollar amount of lifetime gifts to date
• WEALTH1: Wealth rating
• WEALTH2: Wealth rating
• STATE: State abbreviation (a nominal/symbolic field)

• Donor Rank
• DOMAIN/Cluster code: a nominal or symbolic field that can be broken down by bytes as follows
  ◦ 1st byte = Urbanicity level of the donor's neighborhood
    ▪ U = Urban
    ▪ C = City
    ▪ S = Suburban
    ▪ T = Town
    ▪ R = Rural
  ◦ 2nd byte = Socio-economic status (SES) of the neighborhood
    ▪ 1 = Highest SES
    ▪ 2 = Average SES
    ▪ 3 = Lowest SES
    ▪ Except for Urban communities, where 1 = Highest SES, 2 = Above average SES, 3 = Below average SES, 4 = Lowest SES
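
The byte layout of the DOMAIN code can be turned into two separate attributes with a small lookup. The Python sketch below follows the mapping listed above; the function name `decode_domain` and the decision to reject unknown codes with `ValueError` are assumptions made for illustration.

```python
URBANICITY = {"U": "Urban", "C": "City", "S": "Suburban",
              "T": "Town", "R": "Rural"}

# Socio-economic status labels; Urban neighborhoods use a four-level scale.
SES_DEFAULT = {"1": "Highest SES", "2": "Average SES", "3": "Lowest SES"}
SES_URBAN = {"1": "Highest SES", "2": "Above average SES",
             "3": "Below average SES", "4": "Lowest SES"}


def decode_domain(code):
    """Split a two-character DOMAIN code into (urbanicity, SES) labels."""
    if len(code) != 2:
        raise ValueError(f"expected two characters, got {code!r}")
    urban_byte, ses_byte = code[0].upper(), code[1]
    if urban_byte not in URBANICITY:
        raise ValueError(f"unknown urbanicity byte in {code!r}")
    ses_table = SES_URBAN if urban_byte == "U" else SES_DEFAULT
    if ses_byte not in ses_table:
        raise ValueError(f"unknown SES byte in {code!r}")
    return URBANICITY[urban_byte], ses_table[ses_byte]


for code in ["U4", "S2", "R1"]:
    print(code, decode_domain(code))
```

Splitting the code this way gives a model two discrete inputs instead of one combined symbol, which may or may not help depending on the algorithm.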

• Masao Okada
• Rafal Lukawiecki
• Eugene A. Asahara

Data Mining in Action: A Case Study
Drew Minkin (madmanminkin)
Evaluation Links