Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto.

Extraction Of Knowledge From Data

DSS Architecture: Learning and Predicting Courtesy: Tim Graettinger

Data Mining: Definitions
Data mining = the process of discovering and modeling hidden patterns in a large volume of data
Related terms = knowledge discovery in databases (KDD), intelligent data analysis (IDA), decision support system (DSS)
The pattern should be novel and useful. Example of a trivial (not useful) pattern: “unemployed people don’t earn income from work”
The data mining process is data-driven and must be automatic or semi-automatic.

Example: Nonlinear Model

Basic Fields of Data Mining: Machine Learning, Databases, Statistics

Human-Centered Process

Watson Jeopardy

Core Algorithms in Data Mining
Supervised Learning:
  ◦ Classification
  ◦ Prediction
Unsupervised Learning:
  ◦ Association Rules
  ◦ Clustering
  ◦ Data Reduction (Principal Component Analysis)
  ◦ Data Exploration and Visualization

Supervised Learning
Supervised: there are clear examples from past cases that can be used to train (supervise) the machine.
Goal: predict a single “target” or “outcome” variable
Train on data where the target value is known
Score data where the target value is not known
Methods: Classification and Prediction

Unsupervised Learning
Unsupervised: there are no clear examples to supervise the machine
Goal: segment data into meaningful segments; detect patterns
There is no target (outcome) variable to predict or classify
Methods: Association rules, clustering, data reduction & exploration, visualization

Example of Supervised Learning: Classification
Goal: predict categorical target (outcome) variable
Examples: Purchase/no purchase, fraud/no fraud, creditworthy/not creditworthy…
Each row is a case (customer, tax return, applicant)
Each column is a variable
Target variable is often binary (yes/no)

Example of Supervised Learning: Prediction
Goal: predict numerical target (outcome) variable
Examples: sales, revenue, performance
As in classification:
  ◦ Each row is a case (customer, tax return, applicant)
  ◦ Each column is a variable
Taken together, classification and prediction constitute “predictive analytics”

Example of Unsupervised Learning: Association Rules
Goal: produce rules that define “what goes with what”
Example: “If X was purchased, Y was also purchased”
Rows are transactions
Used in recommender systems: “Our records show you bought X, you may also like Y”
Also called “affinity analysis”

The Process of Data Mining

Steps in Data Mining
1. Define/understand purpose
2. Obtain data (may involve random sampling)
3. Explore, clean, pre-process data
4. Reduce the data; if supervised DM, partition it
5. Specify task (classification, clustering, etc.)
6. Choose the techniques (regression, CART, neural networks, etc.)
7. Iterative implementation and “tuning”
8. Assess results and compare models
9. Deploy best model

Preprocessing Data: Eliminating Outliers

Handling Missing Data
Most algorithms will not process records with missing values. The default is to drop those records.
Solution 1: Omission
  ◦ If a small number of records have missing values, can omit them
  ◦ If many records are missing values on a small set of variables, can drop those variables (or use proxies)
  ◦ If many records have missing values, omission is not practical
Solution 2: Imputation
  ◦ Replace missing values with reasonable substitutes
  ◦ Lets you keep the record and use the rest of its (non-missing) information
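The two solutions can be sketched in a few lines of Python. This is only an illustration: the records, the field names, and the choice of the column median as the "reasonable substitute" are assumptions, not taken from the slides.

```python
from statistics import median

# Hypothetical records; None marks a missing value.
records = [
    {"income": 52.0, "age": 34},
    {"income": None, "age": 45},
    {"income": 61.0, "age": None},
    {"income": 48.0, "age": 29},
]

def omit_incomplete(rows):
    """Solution 1 (omission): keep only rows with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute_median(rows):
    """Solution 2 (imputation): fill each gap with the column median (an assumed choice)."""
    columns = list(rows[0].keys())
    medians = {
        c: median(r[c] for r in rows if r[c] is not None) for c in columns
    }
    return [
        {c: (r[c] if r[c] is not None else medians[c]) for c in columns}
        for r in rows
    ]

complete = omit_incomplete(records)  # only fully populated records survive
filled = impute_median(records)      # every record is kept
```

Omission loses half of this tiny sample, while imputation keeps every record at the cost of introducing estimated values.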

Common Problem: Overfitting Statistical models can produce highly complex explanations of relationships between variables The “fit” may be excellent When used with new data, models of great complexity do not do so well.

100% fit: not useful for new data
Consequence: Deployed model will not work as well as expected with completely new data.

Learning and Testing
Problem: How well will our model perform with new data?
Solution: Separate data into two parts
  ◦ Training partition to develop the model
  ◦ Validation partition to apply the model and evaluate its performance on “new” data
Addresses the issue of overfitting
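A random partition can be sketched as follows. The 60/40 ratio, the seed, and the function name `partition` are illustrative assumptions; the slides do not specify them.

```python
import random

def partition(rows, train_fraction=0.6, seed=1):
    """Randomly split rows into a training and a validation partition."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(10))          # stand-in for ten records
train, valid = partition(rows)  # develop the model on train, evaluate on valid
```

The model never sees the validation rows during fitting, so its performance on them approximates performance on new data.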

Algorithms
For classification/prediction tasks:
  ◦ k-Nearest Neighbor
  ◦ Naïve Bayes
  ◦ CART
  ◦ Discriminant Analysis
  ◦ Neural Networks
Unsupervised learning:
  ◦ Association Rules
  ◦ Cluster Analysis

K-Nearest Neighbor: The idea
How to classify: Find the k closest records to the one to be classified, and let them “vote”.
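The voting idea can be sketched as below. The training records, the labels, and the use of Euclidean distance are assumptions made for illustration.

```python
from collections import Counter
from math import dist

def knn_classify(train, new_point, k=3):
    """train: list of (features, label). Return the majority label of the k nearest records."""
    nearest = sorted(train, key=lambda rec: dist(rec[0], new_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical records: (income, lot size) -> class label
train = [
    ((60.0, 18.4), "owner"),
    ((85.5, 16.8), "owner"),
    ((64.8, 21.6), "owner"),
    ((43.2, 20.4), "non-owner"),
    ((49.2, 17.6), "non-owner"),
    ((47.4, 16.4), "non-owner"),
]

label_a = knn_classify(train, (50.0, 17.0), k=3)
label_b = knn_classify(train, (70.0, 19.0), k=3)
```

In practice the predictors are usually rescaled first so that no single variable dominates the distance.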

Example

Naïve Bayes: Basic Idea
Basic idea similar to k-nearest neighbor: To classify an observation, find all similar observations (in terms of predictors) in the training set
Uses only categorical predictors (numerical predictors can be binned)
Basic idea equivalent to looking at pivot tables

The “Primitive” Idea: Example
Y = personal loan acceptance (0/1)
Two predictors: CreditCard (0/1), Online (0/1)
What is the probability of acceptance for customers with CreditCard=1, Online=1?
50/(461+50) = 0.0978
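The arithmetic is a simple proportion read off a pivot table. The sketch below assumes, from the form of the fraction, that 50 is the count of acceptors and 461 the count of non-acceptors among customers with CreditCard=1 and Online=1; the variable names are illustrative.

```python
# Counts for the cell CreditCard=1, Online=1 (interpretation assumed from the slide's fraction)
accepted = 50
not_accepted = 461

# Estimated probability of acceptance within that cell
p_accept = accepted / (accepted + not_accepted)
print(round(p_accept, 4))  # 0.0978
```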

Conditional Probability - Refresher
A = the event “customer accepts loan” (Loan=1)
B = the event “customer has credit card” (CC=1)
$P(A \mid B)$ = probability of A given B (the conditional probability that A occurs given that B occurred)
If $P(B) > 0$: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$

A classic: Microsoft’s Paperclip

Classification and Regression Trees (CART): Trees and Rules
Goal: Classify or predict an outcome based on a set of predictors
The output is a set of rules
Example:
  ◦ Goal: classify a record as “will accept credit card offer” or “will not accept”
  ◦ Rule might be “IF (Income > 92.5) AND (Education < 1.5) AND (Family <= 2.5) THEN Class = 0 (nonacceptor)”
Also called CART, Decision Trees, or just Trees
Rules are represented by tree diagrams

Key Ideas
Recursive partitioning: Repeatedly split the records into two parts so as to achieve maximum homogeneity within the new parts
Pruning the tree: Simplify the tree by pruning peripheral branches to avoid overfitting
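One step of recursive partitioning can be sketched as a search for the split that leaves the two parts as homogeneous as possible. The slides do not name a homogeneity measure; Gini impurity is assumed here, and the data and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0 for a pure group, larger for a mixed one."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best binary split on one predictor."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v < threshold]
        right = [lab for v, lab in pairs if v >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical predictor (income) and binary class
income = [40, 50, 60, 80, 90, 100]
owner = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(income, owner)
```

A full tree repeats this search on each resulting part, over every predictor, until a stopping rule applies; pruning then removes splits that do not help on validation data.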

The first split: Lot Size = 19,000
Second split: Income = $84,000

After All Splits

Neural Networks: Basic Idea
Combine input information in a complex & flexible neural net “model”
Model “coefficients” are continually tweaked in an iterative process
The network’s interim performance in classification and prediction informs successive tweaks
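The iterative tweaking can be illustrated with a single neuron rather than a full network. This is a deliberately reduced sketch: the logistic activation, the learning rate, the number of passes, and the toy data (a logical OR) are all assumptions, not details from the slides.

```python
from math import exp

def predict(weights, bias, x):
    """Output of one logistic neuron for input vector x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 / (1 + exp(-z))

def train_neuron(rows, targets, epochs=1000, rate=0.5):
    """Repeatedly nudge the coefficients in the direction that reduces the error."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, targets):
            error = y - predict(weights, bias, x)  # interim performance on this record
            bias += rate * error                   # the error informs the next tweak
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
    return weights, bias

# Toy data: output is 1 when either input is 1
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]
weights, bias = train_neuron(rows, targets)
predictions = [round(predict(weights, bias, x)) for x in rows]
```

A real network stacks many such units in layers and propagates the error backward through them, but the loop of predicting, measuring error, and adjusting is the same.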

Architecture

Discriminant Analysis
A classical statistical technique
Used for classification long before data mining
  ◦ Classifying organisms into species
  ◦ Classifying skulls
  ◦ Fingerprint analysis
And also used for business data mining (loans, customer types, etc.)
Can also be used to highlight aspects that distinguish classes (profiling)

Can we manually draw a line that separates owners from non-owners?
LDA: To classify a new record, measure its distance from the center of each class
Then, classify the record to the closest class
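The "closest class center" rule can be sketched as below. Note the simplification: plain Euclidean distance is used here as an assumption, whereas discriminant analysis proper uses a statistical distance that accounts for the variances and correlations of the predictors. The records and labels are hypothetical.

```python
from math import dist

def class_centers(records):
    """records: list of (features, label). Return {label: mean feature vector}."""
    grouped = {}
    for features, label in records:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }

def classify(centers, new_record):
    """Assign the record to the class whose center is nearest (Euclidean, assumed)."""
    return min(centers, key=lambda label: dist(centers[label], new_record))

# Hypothetical records: (income, lot size) -> class label
records = [
    ((80.0, 20.0), "owner"),
    ((90.0, 22.0), "owner"),
    ((40.0, 16.0), "non-owner"),
    ((50.0, 18.0), "non-owner"),
]
centers = class_centers(records)
result = classify(centers, (70.0, 19.0))
```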

Loan Acceptance
In the real world, there will be more records, more predictors, and less clear separation

Association Rules (market basket analysis)
Study of “what goes with what”
  ◦ “Customers who bought X also bought Y”
  ◦ What symptoms go with what diagnosis
Transaction-based or event-based
Also called “market basket analysis” and “affinity analysis”
Originated with study of customer transactions databases to determine associations among items purchased

Lore
A famous story about association rule mining is the "beer and diaper" story: {diaper} → {beer}
It is an example of how unexpected association rules might be found from everyday data.
In 1992, Thomas Blischok of Teradata analyzed 1.2 million market baskets from 25 Osco Drug stores.
The analysis "did discover that between 5:00 and 7:00 p.m. that consumers bought beer and diapers".
Osco managers did NOT exploit the beer and diapers relationship by moving the products closer together on the shelves.

Used in many recommender systems

Terms
“IF” part = antecedent (item 1)
“THEN” part = consequent (item 2)
“Item set” = the items (e.g., products) comprising the antecedent or consequent
Antecedent and consequent are disjoint (i.e., have no items in common)
Support: Item 1 comes together with Item 2 in X% of all transactions
Confidence: Item 2 comes together with Item 1 in Y% of the transactions that contain Item 1

Plate color purchase

Lift ratio shows how important the rule is
  ◦ Lift = Support(a ∪ c) / (Support(a) × Support(c))
Confidence shows the rate at which consequents will be found (useful in learning costs of promotion)
Support measures overall impact
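The three measures can be computed directly from a list of transactions. The transactions below are invented for illustration (loosely in the spirit of plate colors), and Support(a ∪ c) is read as the share of transactions containing every item of both the antecedent and the consequent.

```python
# Hypothetical transactions; each is a set of purchased items
transactions = [
    {"red", "white"},
    {"white", "blue"},
    {"red", "white", "blue"},
    {"red"},
]

def support(itemset):
    """Share of all transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Share of antecedent transactions that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Support of the rule relative to what independence would predict."""
    return support(antecedent | consequent) / (
        support(antecedent) * support(consequent)
    )

a, c = {"red"}, {"white"}
rule_support = support(a | c)
rule_confidence = confidence(a, c)
rule_lift = lift(a, c)
```

A lift above 1 suggests the items occur together more often than chance would predict; in this toy data the lift is below 1, so the rule would not be considered useful.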

Application is not always easy
Wal-Mart knows that customers who buy Barbie dolls have a 60% likelihood of buying one of three types of candy bars.
What does Wal-Mart do with information like that? 'I don't have a clue,' says Wal-Mart's chief of merchandising, Lee Scott.

Cluster Analysis
Goal: Form groups (clusters) of similar records
Used for segmenting markets into groups of similar customers
Example: Claritas segmented US neighborhoods based on demographics & income: “Furs & station wagons,” “Money & Brains”, …

Example: Public Utilities
Goal: find clusters of similar utilities
Example of 3 rough clusters using 2 variables:
  ◦ Low fuel cost, low sales
  ◦ High fuel cost, low sales
  ◦ Low fuel cost, high sales

Hierarchical Cluster

Clustering
Cluster analysis is an exploratory tool. It is useful only when it produces meaningful clusters
Hierarchical clustering gives a visual representation of different levels of clustering
  ◦ On the other hand, due to its non-iterative nature, it can be unstable, can vary highly depending on settings, and is computationally expensive
Non-hierarchical clustering is computationally cheap and more stable; it requires the user to set k
Can use both methods
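The slides do not name a specific non-hierarchical method. As an illustration, the sketch below assumes k-means, a common choice, where k is fixed by the number of starting centers the user supplies; the points and starting centers are hypothetical.

```python
from math import dist

def k_means(points, centers, iterations=10):
    """Plain k-means; the number of starting centers is the user-chosen k."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical two-variable data with two visible groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = k_means(points, centers=[(1, 1), (8, 8)])
```

Because the result depends on k and on the starting centers, it is common to try several values and keep the solution whose clusters are easiest to interpret.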