IBM SPSS Modeler 14.2 Data Mining Concepts
Introduction to Undirected Data Mining: Association Analysis
Prepared by David Douglas, University of Arkansas
Hosted by the University of Arkansas

Association Analysis
◦ Also referred to as affinity analysis or market basket analysis (MBA)
◦ For MBA, this basically means finding out what is being purchased together
◦ Association rules represent patterns without a specific target, so this is undirected (unsupervised) data mining
◦ It fits in the exploratory category of data mining

Association Rules: Other Potential Uses
◦ Items purchased on a credit card give insight into the next product or service purchased
◦ Help telecoms determine service bundles
◦ Help bankers identify customers for other services
◦ Unusual combinations of things such as insurance claims may need further investigation
◦ Medical histories may give indications of complications or helpful combinations for patients

Defining MBA
MBA data
◦ Customers
◦ Purchases (baskets or item sets)
◦ Items
Figure 9-3: set of tables
◦ The purchase (order) is the fundamental data structure
  ▪ Individual items are line items
  ▪ Product: descriptive information
  ▪ Customer information can be helpful
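
The three-level structure of MBA data (customer, purchase/order, line item) can be sketched as nested record types. This is a minimal illustration. The class and field names (`Customer`, `Order`, `LineItem`, `basket()`) are hypothetical and are not the schema from Figure 9-3 or from any SPSS Modeler data source.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical record types for the three levels of MBA data:
# customer -> orders (baskets) -> line items (individual products).


@dataclass
class LineItem:
    product: str       # product identifier; descriptive info would live in a product table
    quantity: int = 1


@dataclass
class Order:
    order_id: int
    customer_id: int
    items: list[LineItem] = field(default_factory=list)

    def basket(self) -> frozenset[str]:
        """The order viewed as a basket: the set of distinct products, ignoring quantity."""
        return frozenset(item.product for item in self.items)


@dataclass
class Customer:
    customer_id: int
    orders: list[Order] = field(default_factory=list)


alice = Customer(1)
alice.orders.append(Order(101, 1, [LineItem("orange juice", 2), LineItem("soda")]))
print(sorted(alice.orders[0].basket()))  # ['orange juice', 'soda']
```

Keeping the order as the central record mirrors the slide's point that the purchase is the fundamental structure. Customer and product details hang off it.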

Levels of Data
[Figure: levels of data, adapted from Berry & Linoff]

MBA
The three levels of data are important for MBA. They can be used to answer a number of questions:
◦ Average number of baskets per customer per time unit
◦ Average number of unique items per customer
◦ Average number of items per basket
◦ For a given product, what proportion of customers have ever purchased it?
◦ For a given product, what is the average number of baskets per customer that include it?
◦ For a given product, what is the average quantity purchased in an order when the product is purchased?
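
Several of these questions can be answered by simple counting over line-item records. The sketch below uses a hypothetical flat list of `(customer_id, order_id, product, quantity)` tuples with made-up values. It treats "items per basket" as distinct products per order and ignores the time-unit dimension.

```python
from collections import defaultdict

# Hypothetical line-item records (made-up values): (customer_id, order_id, product, quantity)
line_items = [
    (1, 101, "orange juice", 2),
    (1, 101, "soda", 1),
    (1, 102, "milk", 1),
    (2, 201, "orange juice", 1),
    (2, 201, "detergent", 1),
    (3, 301, "milk", 3),
]

baskets = defaultdict(set)             # order_id -> distinct products in that order
orders_by_customer = defaultdict(set)  # customer_id -> that customer's order_ids
items_by_customer = defaultdict(set)   # customer_id -> distinct products ever bought
for customer, order, product, _quantity in line_items:
    baskets[order].add(product)
    orders_by_customer[customer].add(order)
    items_by_customer[customer].add(product)

n_customers = len(orders_by_customer)
avg_baskets_per_customer = len(baskets) / n_customers
avg_unique_items_per_customer = sum(len(s) for s in items_by_customer.values()) / n_customers
avg_items_per_basket = sum(len(s) for s in baskets.values()) / len(baskets)


def share_of_customers_buying(product):
    """Proportion of customers who have ever purchased the product."""
    return sum(product in s for s in items_by_customer.values()) / n_customers


def avg_quantity_when_purchased(product):
    """Average quantity of the product per order, over orders that include it."""
    qty_per_order = defaultdict(int)
    for _customer, order, p, quantity in line_items:
        if p == product:
            qty_per_order[order] += quantity
    return sum(qty_per_order.values()) / len(qty_per_order)


print(avg_baskets_per_customer, avg_unique_items_per_customer, avg_items_per_basket)
print(share_of_customers_buying("orange juice"), avg_quantity_when_purchased("orange juice"))
```

Each answer draws on a different level: baskets per customer joins the customer and order levels, while average quantity reads the line-item level.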

Item Popularity
◦ Most common item in one-item baskets
◦ Most common item in multi-item baskets
◦ Most common items among repeat customers
◦ Change in buying patterns of an item over time
◦ Buying pattern for an item by region
Time and geography are two of the most important attributes of MBA data.
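
The first two popularity questions reduce to counting items within a filtered set of baskets. This is a minimal sketch. It assumes baskets are Python sets of item names, and both the baskets and the helper `most_common_item` are hypothetical.

```python
from collections import Counter

# Hypothetical baskets (made-up data), each a set of distinct item names.
baskets = [
    {"milk"},
    {"milk"},
    {"bread"},
    {"orange juice", "soda"},
    {"orange juice", "detergent"},
    {"milk", "orange juice", "window cleaner"},
]


def most_common_item(baskets):
    """Return the item appearing in the most baskets, or None if there are no items."""
    counts = Counter(item for basket in baskets for item in basket)
    return counts.most_common(1)[0][0] if counts else None


one_item = [b for b in baskets if len(b) == 1]
multi_item = [b for b in baskets if len(b) > 1]
print(most_common_item(one_item))    # milk
print(most_common_item(multi_item))  # orange juice
```

The same helper applied to baskets grouped by month or by region would address the time and geography questions.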

Tracking Market Interventions
[Figure: tracking market interventions, adapted from Berry & Linoff]

Association Rules
◦ Actionable rules: Wal-Mart customers who purchase Barbie dolls have a 60 percent likelihood of also purchasing one of three types of candy bars
◦ Trivial rules: customers who purchase maintenance agreements are very likely to purchase a large appliance
◦ Inexplicable rules: when a new hardware store opens, one of the most commonly sold items is toilet cleaners
(Adapted from Berry & Linoff)

What Exactly Is an Association Rule?
◦ Of the form: IF antecedent THEN consequent
◦ Example: IF (orange juice, milk) THEN (bread, bacon)
◦ Rules include measures of support and confidence
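
An association rule can be represented as a pair of item sets. The sketch below uses a hypothetical `AssociationRule` class to show the IF/THEN form with the slide's example rule. A basket triggers the rule when it contains the antecedent, and it supports the rule only when it contains both sides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociationRule:
    """Hypothetical rule type: IF antecedent THEN consequent."""
    antecedent: frozenset
    consequent: frozenset

    def applies_to(self, basket):
        """True when the basket contains every antecedent item (the IF part)."""
        return self.antecedent <= basket

    def holds_in(self, basket):
        """True when the basket contains both the antecedent and the consequent."""
        return (self.antecedent | self.consequent) <= basket

    def __str__(self):
        return f"IF {sorted(self.antecedent)} THEN {sorted(self.consequent)}"


rule = AssociationRule(frozenset({"orange juice", "milk"}), frozenset({"bread", "bacon"}))
basket = {"orange juice", "milk", "bread"}
print(rule)
print(rule.applies_to(basket), rule.holds_in(basket))  # True False
```

Counting the baskets for which `applies_to` and `holds_in` are true gives the inputs to confidence and support.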

How Good Is an Association Rule?
◦ Transactions can be converted to co-occurrence matrices
◦ Co-occurrence tables highlight simple patterns
◦ Confidence and support can be determined directly from a co-occurrence table, or by counting via SQL, etc.
◦ Data mining software makes the presentation easy
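
As a sketch of counting via SQL, the five example transactions used on the co-occurrence slides can be loaded into an in-memory SQLite table, and a self-join then yields pair counts. The table and column names (`line_items`, `txn`, `item`) are assumptions. Python's standard `sqlite3` module keeps the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line_items (txn INTEGER, item TEXT)")
rows = [
    (1, "orange juice"), (1, "soda"),
    (2, "milk"), (2, "orange juice"), (2, "window cleaner"),
    (3, "orange juice"), (3, "detergent"),
    (4, "orange juice"), (4, "detergent"), (4, "soda"),
    (5, "window cleaner"), (5, "milk"),
]
conn.executemany("INSERT INTO line_items VALUES (?, ?)", rows)

# Self-join on the transaction id; a.item < b.item counts each unordered pair once.
query = """
    SELECT a.item, b.item, COUNT(DISTINCT a.txn)
    FROM line_items AS a
    JOIN line_items AS b ON a.txn = b.txn AND a.item < b.item
    GROUP BY a.item, b.item
    ORDER BY a.item, b.item
"""
pair_counts = {(a, b): n for a, b, n in conn.execute(query)}
for (a, b), n in pair_counts.items():
    print(f"{a} & {b}: {n}")
```

Pairs that never occur together, such as milk and soda, simply do not appear in the result.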

Co-Occurrence Table
Customer  Items
1         Orange juice, soda
2         Milk, orange juice, window cleaner
3         Orange juice, detergent
4         Orange juice, detergent, soda
5         Window cleaner, milk

        OJ   WC   Milk  Soda  Det
OJ      4    1    1     2     2
WC      -    2    2     0     0
Milk    -    -    2     0     0
Soda    -    -    -     2     1
Det     -    -    -     -     2

OJ = orange juice, WC = window cleaner, Det = detergent. Diagonal cells count the transactions containing each item. Off-diagonal cells count the transactions containing both items.
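
The co-occurrence table can be built programmatically from the five transactions. This is a minimal sketch that uses the table's abbreviations and fills only the upper triangle, as the slide does.

```python
from collections import Counter
from itertools import combinations

transactions = [
    {"OJ", "Soda"},
    {"Milk", "OJ", "WC"},
    {"OJ", "Det"},
    {"OJ", "Det", "Soda"},
    {"WC", "Milk"},
]
items = ["OJ", "WC", "Milk", "Soda", "Det"]  # row/column order of the table

counts = Counter()
for t in transactions:
    for item in t:
        counts[(item, item)] += 1  # diagonal: transactions containing the item
    # Order each transaction's items by table position so pairs land in the upper triangle.
    for a, b in combinations(sorted(t, key=items.index), 2):
        counts[(a, b)] += 1        # off-diagonal: transactions containing both items

for i, row in enumerate(items):
    cells = ["-"] * i + [str(counts[(row, col)]) for col in items[i:]]
    print(f"{row:<5}" + "".join(f"{c:>5}" for c in cells))
```

Because `Counter` returns 0 for missing keys, pairs that never co-occur print as 0 without special handling.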

Confidence, Support and Lift
$$\text{Support} = \frac{\#\,\text{records with both antecedent and consequent}}{\text{total }\#\,\text{records}}$$
$$\text{Confidence} = \frac{\#\,\text{records with both antecedent and consequent}}{\#\,\text{records with the antecedent}}$$
$$\text{Expected confidence} = \frac{\#\,\text{records with the consequent}}{\text{total }\#\,\text{records}}$$
$$\text{Lift} = \frac{\text{Confidence}}{\text{Expected confidence}}$$
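
The four formulas translate directly into code that works from raw counts. This is a minimal sketch, and the function name `rule_measures` is hypothetical. It assumes both the antecedent and the consequent occur at least once, since otherwise a division by zero occurs.

```python
def rule_measures(n_both, n_antecedent, n_consequent, n_total):
    """Support, confidence, expected confidence and lift of IF antecedent THEN consequent."""
    support = n_both / n_total
    confidence = n_both / n_antecedent
    expected_confidence = n_consequent / n_total
    lift = confidence / expected_confidence
    return {
        "support": support,
        "confidence": confidence,
        "expected_confidence": expected_confidence,
        "lift": lift,
    }


# IF soda THEN orange juice: 2 records with both, soda in 2, orange juice in 4, 5 records total.
print(rule_measures(n_both=2, n_antecedent=2, n_consequent=4, n_total=5))
```

With the counts from the co-occurrence example, this should reproduce the 40% support, 100% confidence and 1.25 lift figures, up to floating-point rounding.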

Confidence and Support
Rule: IF soda THEN orange juice
◦ From the co-occurrence table, soda and orange juice occur together 2 times (out of 5 total transactions), so support for the rule is 2/5, or 40%
◦ Confidence: soda occurs 2 times, so the confidence of orange juice given soda is 2/2, or 100%
◦ Lift = confidence / expected confidence. Confidence = 100% and expected confidence = 4/5 = 80% (orange juice occurs in 4 of the 5 transactions), so lift = 1.0/0.8 = 1.25
Rule: IF orange juice THEN soda
◦ Support for the rule is the same: 40%
◦ Orange juice occurs 4 times, so the confidence of soda given orange juice is 2/4, or 50%
◦ Expected confidence = 2/5 = 40% (soda occurs in 2 of the 5 transactions), so lift = 0.5/0.4 = 1.25, the same lift as the reverse rule
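
The worked example can be checked by computing each measure straight from the five transactions. This is a minimal sketch that assumes transactions are Python sets. The helper `measures` is hypothetical and returns (support, confidence, expected confidence, lift).

```python
transactions = [
    {"orange juice", "soda"},
    {"milk", "orange juice", "window cleaner"},
    {"orange juice", "detergent"},
    {"orange juice", "detergent", "soda"},
    {"window cleaner", "milk"},
]


def measures(antecedent, consequent, transactions):
    """Support, confidence, expected confidence and lift, counted from the transactions."""
    n = len(transactions)
    n_antecedent = sum(antecedent <= t for t in transactions)
    n_consequent = sum(consequent <= t for t in transactions)
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    confidence = n_both / n_antecedent
    expected_confidence = n_consequent / n
    return n_both / n, confidence, expected_confidence, confidence / expected_confidence


soda, oj = {"soda"}, {"orange juice"}
print(measures(soda, oj, transactions))  # IF soda THEN orange juice
print(measures(oj, soda, transactions))  # IF orange juice THEN soda
```

Both rules come out with a lift of 1.25, up to floating-point rounding. Lift equals (n_both × n) / (n_antecedent × n_consequent), which is symmetric in antecedent and consequent, whereas confidence is not.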

Building Association Rules
[Figure: building association rules, adapted from Berry & Linoff]
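
Rule building can be illustrated with a deliberately simplified procedure. First count item pairs, then keep pairs that meet a minimum support, then keep the directed rules that meet a minimum confidence. This sketch is restricted to one-item antecedents and consequents. It is not the Apriori algorithm or any other algorithm that data mining tools use for larger itemsets, and the function name and thresholds are assumptions.

```python
from collections import Counter
from itertools import combinations, permutations

transactions = [
    {"orange juice", "soda"},
    {"milk", "orange juice", "window cleaner"},
    {"orange juice", "detergent"},
    {"orange juice", "detergent", "soda"},
    {"window cleaner", "milk"},
]


def pairwise_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) for single-item rules."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(frozenset(p) for t in transactions for p in combinations(sorted(t), 2))
    rules = []
    for pair, n_both in pair_counts.items():
        if n_both / n < min_support:
            continue  # prune infrequent pairs before forming any rules
        for antecedent, consequent in permutations(pair, 2):  # both directions of the pair
            confidence = n_both / item_counts[antecedent]
            if confidence >= min_confidence:
                lift = confidence / (item_counts[consequent] / n)
                rules.append((antecedent, consequent, n_both / n, confidence, lift))
    return sorted(rules)


rules = pairwise_rules(transactions, min_support=0.4, min_confidence=0.6)
for antecedent, consequent, support, confidence, lift in rules:
    print(f"IF {antecedent} THEN {consequent}: "
          f"support={support:.0%} confidence={confidence:.0%} lift={lift:.2f}")
```

With these thresholds, the sketch keeps IF soda THEN orange juice but drops IF orange juice THEN soda, whose confidence is only 50%.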

Product Hierarchies
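
Product hierarchies let rules be generated at a more general level by replacing each item with its category before counting. This is a minimal sketch. It applies a hypothetical one-level hierarchy, with category names made up for illustration, to the five example transactions.

```python
from collections import Counter
from itertools import combinations

transactions = [
    {"orange juice", "soda"},
    {"milk", "orange juice", "window cleaner"},
    {"orange juice", "detergent"},
    {"orange juice", "detergent", "soda"},
    {"window cleaner", "milk"},
]

# Hypothetical product hierarchy: item -> category one level up.
hierarchy = {
    "orange juice": "beverages",
    "soda": "beverages",
    "milk": "dairy",
    "window cleaner": "household",
    "detergent": "household",
}


def roll_up(transaction, hierarchy):
    """Replace each item by its category; items missing from the hierarchy are kept as-is."""
    return {hierarchy.get(item, item) for item in transaction}


generalized = [roll_up(t, hierarchy) for t in transactions]
category_counts = Counter(c for t in generalized for c in t)
pair_counts = Counter(p for t in generalized for p in combinations(sorted(t), 2))
print(category_counts)
print(pair_counts)
```

Rolling up merges sparse item-level patterns into denser category-level ones. For example, beverages and household products co-occur in 3 of the 5 transactions, although no single item pair co-occurs more than twice.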

Lessons Learned
◦ MBA is complex, and no one technique is powerful enough to provide all the answers
◦ Three levels of data: order (basket), line items, and customer
◦ MBA can answer a number of questions
◦ Association rules are the most common technique for MBA
◦ Generate rules and evaluate them with support, confidence, and lift