Forecast Anything! The Seven Data Mining Models. Andy Cheung, ISV Developer Evangelist, Microsoft Hong Kong

Agenda: Announcement; Overview; Microsoft Mining Model Algorithms; Lucky Draw!!!

Announcement: Learn Microsoft Technologies and Win a Prize! To make it easier for you to learn Microsoft technologies, we have changed the way we deliver seminar content by offering Offline Webcast CDs. 3 CDs in 6 months: 3 topics and an assessment. If you meet the assessment criteria, you will receive a $150 Park’n Shop cash coupon! Since this is a trial offer, the number of participants is limited to 50 (first come, first served). Register now by sending to Microsoft Macau Team at

Data Mining Overview Microsoft Data Mining Algorithms

Microsoft Mining Model Algorithms: Decision Trees, Naive Bayes, Cluster Analysis, Sequence Clustering, Association Rules, Time Series, Neural Networks

Decision Trees: Classify each case into one of a few discrete, broad categories of a selected (predictable) attribute. The tree is built by recursive partitioning: the data is split into partitions, and each partition is then split further. Initially, all cases are in one big box.

Decision Trees: The algorithm tries all possible breaks using all possible values of each input attribute, then selects the split that partitions the data into the purest classes of the predictable (searched) variable. Several measures of purity can be used. It then repeats the splitting for each new class, again testing all possible breaks. Branches that are not useful can be pruned, either while the tree grows (pre-pruning) or afterwards (post-pruning).
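The split search can be illustrated with a short Python sketch. It is not the Microsoft Decision Trees implementation: it assumes categorical inputs, tries every binary "attribute == value" split, and uses Gini impurity as the purity measure (just one of several possible measures). The function names and toy customer data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure partition, larger when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, target):
    """Try every (attribute == value) binary split and return the purest one.

    rows: list of dicts mapping attribute names to categorical values.
    target: name of the predictable attribute.
    Returns (weighted_impurity, attribute, value), or None if no split exists.
    """
    best = None
    n = len(rows)
    for attr in rows[0]:
        if attr == target:
            continue
        for value in sorted({r[attr] for r in rows}):  # sorted for a deterministic order
            left = [r[target] for r in rows if r[attr] == value]
            right = [r[target] for r in rows if r[attr] != value]
            if not left or not right:
                continue  # this value does not actually partition the data
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, attr, value)
    return best

# Hypothetical churn data: the "plan" attribute separates the classes perfectly
customers = [
    {"age": "young", "plan": "prepaid", "churn": "yes"},
    {"age": "young", "plan": "contract", "churn": "no"},
    {"age": "old", "plan": "prepaid", "churn": "yes"},
    {"age": "old", "plan": "contract", "churn": "no"},
]
print(best_split(customers, "churn"))  # → (0.0, 'plan', 'contract')
```

Recursive partitioning would then call `best_split` again on each side of the chosen split, stopping when a partition is pure or too small.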

Decision Trees: Decision trees are used for classification and prediction. Typical questions: Which customers will leave? How should mailing and promotion campaigns be targeted? What are the reasons behind a decision? What movies do young female customers like to buy?

Microsoft Mining Models

Naïve Bayes: A classification and prediction model. It calculates the probability of each possible state of each input attribute, given each state of the predictable attribute.
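The counting behind Naïve Bayes can be sketched in a few lines of Python. This is a minimal illustration, not Microsoft's implementation: it assumes categorical inputs and uses add-one (Laplace) smoothing, which is our choice here; the function names and the toy loan data are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, target):
    """Count class frequencies and, for each class, how often each input state occurs."""
    class_counts = Counter(r[target] for r in rows)
    cond = defaultdict(Counter)   # (input attribute, class) -> Counter of input states
    states = defaultdict(set)     # input attribute -> all observed states
    for r in rows:
        for attr, value in r.items():
            if attr != target:
                cond[(attr, r[target])][value] += 1
                states[attr].add(value)
    return class_counts, cond, states

def classify(model, case):
    """Return P(class | case) for each class, treating inputs as independent given the class."""
    class_counts, cond, states = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        p = n_cls / total  # prior P(class)
        for attr, value in case.items():
            # P(state | class) with add-one smoothing so unseen states never give zero
            p *= (cond[(attr, cls)][value] + 1) / (n_cls + len(states[attr]))
        scores[cls] = p
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}

# Hypothetical loan applications
loans = [
    {"income": "high", "history": "good", "approved": "yes"},
    {"income": "high", "history": "bad", "approved": "yes"},
    {"income": "low", "history": "good", "approved": "yes"},
    {"income": "low", "history": "bad", "approved": "no"},
    {"income": "low", "history": "bad", "approved": "no"},
]
model = train_naive_bayes(loans, "approved")
posterior = classify(model, {"income": "low", "history": "bad"})
print(posterior)  # "no" ≈ 0.70, "yes" ≈ 0.30
```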

Naïve Bayes: Used for classification, that is, assigning new cases to predefined classes. Some typical questions: categorizing bank loan applications; determining which home telephone lines are used for Internet access; assigning customers to predefined segments; quickly gaining a basic understanding of the data.

Cluster Analysis: Groups data into clusters. Objects within a cluster are highly similar, based on their attribute values. The class label of each object is not known in advance. There are several techniques: partitioning methods, hierarchical methods, density-based methods, model-based methods, and more.
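As an example of a partitioning method, here is a minimal k-means sketch on numeric attributes. k-means is only one clustering technique and this is not presented as the Microsoft Clustering implementation; the number of clusters, the initial centers and the toy points are assumptions for illustration.

```python
import math

def kmeans(points, centers, iterations=10):
    """Assign each point to its nearest center, then move each center to its cluster's mean."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # New center = mean of the members; an empty cluster keeps its old center
        centers = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers, clusters

# Two well-separated hypothetical groups of customers (two numeric attributes each)
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8)]
centers, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centers[0], len(clusters[0]))  # → (1.5, 1.5) 3
```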

Cluster Analysis: Segments a heterogeneous population into a number of more homogeneous subgroups, or clusters. Some typical questions: discovering distinct groups of customers; identifying groups of houses in a city; deriving animal and plant taxonomies in biology.

Sequence Clustering: Analyzes sequence-oriented data that contains discrete-valued series. The sequence attribute in a series holds a set of events in a specific order, which can be considered as a model. It is typically used for Web customer analysis, but it can be used for any other sequential data.

Sequence Clustering: Click-Stream Analysis
User  Sequence
1     frontpage news travel travel
2     news news news news news
3     frontpage news frontpage news frontpage
4     news news
5     frontpage news news travel travel travel
6     news weather weather weather weather
7     news health health business business business
8     frontpage sports sports sports weather
9     weather
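A common way to model click streams like these is a Markov chain: the probability of the next page category given the current one. The sketch below only estimates first-order transition probabilities for one group of sequences; it does not perform the clustering step and is not presented as Microsoft's implementation. The function name is hypothetical; the data is the click-stream table above.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    """Estimate P(next page | current page) from adjacent pairs in each click stream."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        page: {nxt: n / sum(following.values()) for nxt, n in following.items()}
        for page, following in counts.items()
    }

clicks = [
    "frontpage news travel travel",
    "news news news news news",
    "frontpage news frontpage news frontpage",
    "news news",
    "frontpage news news travel travel travel",
    "news weather weather weather weather",
    "news health health business business business",
    "frontpage sports sports sports weather",
    "weather",
]
probs = transition_probabilities([s.split() for s in clicks])
print(probs["frontpage"])  # → {'news': 0.8, 'sports': 0.2}
```

A sequence clustering model would keep one such transition table per cluster and assign each user to the cluster whose table best explains their sequence.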

Association Rules: Used for market basket analysis, for example to identify cross-selling opportunities and to arrange attractive packages. The algorithm considers each attribute/value pair as an item; an item set is a combination of items in a single transaction. It scans through the dataset to find item sets that tend to appear in many transactions.
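The search for frequent item sets can be sketched as a level-wise (Apriori-style) scan: count single items, keep those that reach a minimum count, and only combine surviving sets into larger candidates. This is an illustration under assumptions (a minimum support count of 2, a hypothetical function name, five toy transactions), not Microsoft's implementation.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count):
    """Level-wise search: a candidate is counted only if all its smaller subsets are frequent."""
    baskets = [frozenset(t) for t in transactions]
    candidates = [frozenset([i]) for i in sorted({i for b in baskets for i in b})]
    frequent = {}
    size = 1
    while candidates:
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        size += 1
        # Join frequent sets into larger candidates, pruning any with an infrequent subset
        candidates = {
            a | b for a in level for b in level
            if len(a | b) == size
            and all(frozenset(s) in level for s in combinations(a | b, size - 1))
        }
    return frequent

transactions = [
    {"frozen pizza", "cola", "milk"},
    {"milk", "potato chips"},
    {"cola", "frozen pizza"},
    {"milk", "pretzels"},
    {"cola", "pretzels"},
]
frequent = frequent_itemsets(transactions, min_count=2)
print(frequent[frozenset({"cola", "frozen pizza"})])  # → 2
```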

Association Rules – Support: Support is the percentage of rows containing the item combination, compared with the total number of rows:
Transaction 1: Frozen pizza, cola, milk
Transaction 2: Milk, potato chips
Transaction 3: Cola, frozen pizza
Transaction 4: Milk, pretzels
Transaction 5: Cola, pretzels
The support for the rule “If a customer purchases Cola, then they will purchase Frozen Pizza” is 40%, because cola and frozen pizza appear together in 2 of the 5 transactions (1 and 3).
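The support calculation can be checked with a few lines of Python. The function name is hypothetical, and each transaction is modeled as a set of item names.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

transactions = [
    {"frozen pizza", "cola", "milk"},
    {"milk", "potato chips"},
    {"cola", "frozen pizza"},
    {"milk", "pretzels"},
    {"cola", "pretzels"},
]
print(support(transactions, {"cola", "frozen pizza"}))  # → 0.4
```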

Association Rules – Confidence: Support alone can be misleading: what if 100% of customers buy milk, but only 20% of them buy potato chips? The confidence of an association rule is the support for the combination divided by the support for the condition. In the transactions above, milk appears in 3 of 5 transactions (60%) and milk together with potato chips in 1 of 5 (20%). This gives a confidence for the rule “If a customer purchases Milk, they will purchase Potato Chips” of 20% / 60% = 33%.
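Confidence follows directly from two support values. The sketch below, with hypothetical function names, reproduces the 33% figure for the rule “Milk → Potato Chips” using the same five transactions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, condition, result):
    """Support of (condition + result) divided by support of the condition alone."""
    both = set(condition) | set(result)
    return support(transactions, both) / support(transactions, condition)

transactions = [
    {"frozen pizza", "cola", "milk"},
    {"milk", "potato chips"},
    {"cola", "frozen pizza"},
    {"milk", "pretzels"},
    {"cola", "pretzels"},
]
print(round(confidence(transactions, {"milk"}, {"potato chips"}), 2))  # → 0.33
```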

Time Series: Predicts continuous columns, such as product sales or stock performance, in a forecasting scenario. The model is built in two stages: the first stage creates a list of optimal candidate input columns; the second stage investigates each candidate input column and determines whether it improves the model.
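The two-stage idea can be illustrated with a simplified sketch; it is not the Microsoft Time Series algorithm itself. Here we assume the candidate inputs are past values of the series at lags 1 to 4. Stage 1 ranks them by autocorrelation; stage 2 adds each candidate in that order and keeps it only if it reduces the in-sample error of a linear autoregression fitted with NumPy least squares. The thresholds (`min_abs_corr`, `min_gain`), function names and toy series are all assumptions.

```python
import numpy as np

def lagged_design(series, lags, max_lag):
    """Design matrix (intercept + one column per lag) for time steps max_lag..n-1."""
    y = np.asarray(series, dtype=float)
    steps = range(max_lag, len(y))
    cols = [np.ones(len(steps))] + [np.array([y[t - lag] for t in steps]) for lag in lags]
    return np.column_stack(cols), y[max_lag:]

def fit_mse(series, lags, max_lag):
    """Least-squares autoregression on the given lags; returns (coefficients, in-sample MSE)."""
    A, target = lagged_design(series, lags, max_lag)
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef, float(np.mean((A @ coef - target) ** 2))

def rank_candidates(series, max_lag, min_abs_corr=0.3):
    """Stage 1: keep lags whose autocorrelation is strong enough, strongest first."""
    y = np.asarray(series, dtype=float)
    scored = []
    for lag in range(1, max_lag + 1):
        r = np.corrcoef(y[lag:], y[:-lag])[0, 1]
        if abs(r) >= min_abs_corr:
            scored.append((abs(r), lag))
    return [lag for _, lag in sorted(scored, reverse=True)]

def select_inputs(series, max_lag=4, min_gain=1e-3):
    """Stage 2: add each candidate lag only if it lowers the error by more than min_gain."""
    selected = []
    _, best_mse = fit_mse(series, selected, max_lag)
    for lag in rank_candidates(series, max_lag):
        _, mse = fit_mse(series, selected + [lag], max_lag)
        if best_mse - mse > min_gain:
            selected.append(lag)
            best_mse = mse
    return selected

def forecast_next(series, lags):
    """One-step-ahead forecast from an autoregression on the selected lags."""
    coef, _ = fit_mse(series, lags, max(lags, default=0))
    y = np.asarray(series, dtype=float)
    return float(coef[0] + sum(c * y[-lag] for c, lag in zip(coef[1:], lags)))

sales = [1, 2, 3, 4] * 5  # toy seasonal series with period 4
lags = select_inputs(sales)
print(lags, round(forecast_next(sales, lags), 3))  # → [4] 1.0
```

Because in-sample error never increases when a column is added, stage 2 needs a minimum-gain threshold; a held-out validation period would be a more robust test of whether a candidate really improves the model.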

Neural Network: A data modeling tool that is able to capture and represent complex input/output relationships. Neural networks resemble the human brain in two ways: a neural network acquires knowledge through learning, and a neural network's knowledge is stored within inter-neuron connection strengths known as synaptic weights. It can capture a wide range of relationships in the data, but training is slow.

Back-Propagation: Training a neural network means finding the best weights on the inputs of each unit. The back-propagation process: (1) take a training example and calculate the outputs; (2) calculate the error, that is, the difference between the calculated and the expected (known) result; (3) adjust the weights to reduce the error. These steps are repeated over the training examples many times.
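The three back-propagation steps can be sketched with NumPy for a network with one hidden layer of sigmoid units, trained by full-batch gradient descent on mean squared error. This is an illustration, not Microsoft's implementation; the layer size, learning rate, epoch count, random seed and the logical-OR training set are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(X, y, hidden=4, lr=0.5, epochs=10000, seed=0):
    """Train a one-hidden-layer sigmoid network with full-batch back-propagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # 1. Forward pass: calculate outputs for the training examples
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # 2. Error: difference between calculated and expected (known) results
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # 3. Propagate the error backwards and adjust the weights to reduce it
        d_out = err * out * (1 - out)        # error signal at the output unit
        d_h = (d_out @ W2.T) * h * (1 - h)   # error signal at the hidden units
        W2 -= lr * h.T @ d_out / len(X)
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X.T @ d_h / len(X)
        b1 -= lr * d_h.mean(axis=0)
    return (W1, b1, W2, b2), losses

def predict(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)

# Logical OR as a tiny training set with known answers
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [1]], dtype=float)
params, losses = train_backprop(X, y)
print(losses[0], losses[-1])
print(predict(params, X).round(2))
```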

Conclusion: When To Use What
Analytical problem | Examples | Algorithms
Classification: assign cases to predefined classes | Credit risk analysis, churn analysis, customer retention | Decision Trees, Naive Bayes, Neural Nets
Segmentation: taxonomy for grouping similar cases | Customer profile analysis, mailing campaign | Clustering, Sequence Clustering
Association: advanced counting for correlations | Market basket analysis, advanced data exploration | Decision Trees, Association
Time series forecasting: predict the future | Forecast sales, predict stock prices | Time Series
Prediction: predict a value for a new case based on values for similar cases | Quote insurance rates, predict customer income | All
Deviation analysis: discover how a case or segment differs from others | Credit card fraud detection, network intrusion analysis | All

© 2004 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.