Data Mining with Clementine


Data Mining with Clementine Girish Punj Professor of Marketing School of Business University of Connecticut

Agenda How to introduce data mining to students Why Clementine? Clementine features and capabilities A typical data mining class Useful teaching resources Questions?

Introduce Data Mining to Students
“Data mining chosen as one of top 10 emerging technologies...” (MIT Technology Review)
“Data mining expertise is most sought after...” (Information Week Survey)
Data mining skills are an important part of the “toolkit” needed by managers in a complex business world
Data mining for job advancement and as career insurance during good and bad economic times

Introduce Data Mining to Students “When I looked at what companies were doing with analytics I found it had moved from the back room to the board room…a number of companies weren’t just using analytics, they were now competing on analytics -- they had made analytics the central strategy of their business.” (Tom Davenport, author of ‘Competing on Analytics’) “We are drowning in information but starved for knowledge.” (John Naisbitt author of ‘Megatrends’)

Applications: Retail Use data mining to understand customers’ wants, needs, and preferences Based on this information, deliver timely, personalized promotional offers

Applications: Insurance Leverage data and text mining to speed claims processing and help reduce fraud

Applications: Manufacturing Model historical production and quality data to reduce development time and improve quality of production processes

Applications: Telecom Use data mining to identify appropriate customer segments for new marketing initiatives Predict likelihood of customer churn and target those likely to leave with retention campaigns

Metaphor: Data Mining and Gold Mining

Data Mining and Knowledge Discovery Data mining is the process of discovery of interesting, meaningful and actionable patterns hidden in large amounts of data (Han and Kamber 2006) Knowledge Discovery (KD) as a more inclusive term Knowledge Discovery using a combination of artificial and human intelligence Data → Information → Knowledge

Data Mining and Statistics
Data Mining: No hypotheses are needed. Can find patterns in very large amounts of data. Uses all the data available. Terminology used: field, record, supervised learning, unsupervised learning.
Statistics: Uses hypothesis testing. Techniques are not suitable for large datasets. Relies on sampling. Terminology used: variable, observation, analysis of dependence, analysis of interdependence.

Deal with Numerophobia
Emphasize differences between Statistics and Data Mining to advantage (no probability distributions)
Use a math primer for numerically challenged students http://www.youtube.com/watch?v=nRKzseCLja8
Copyright 2003-4, SPSS Inc.

Introduce Software to Students Clementine 12.0: Student Version (Clementine GradPack) is of enterprise strength Student License extends for about eight months beyond course completion date Directly address cost concerns by discussing value of “investment”

Who was Clementine? Daughter of a miner during the 1849 California Gold Rush who developed a reputation… “In a cavern, in a canyon, Excavating for a mine Dwelt a miner, forty niner, And his daughter Clementine…” http://www.empire.k12.ca.us/capistrano/mike/capmusic/the_wild_west/gold_rush/clemtine.mid

Introduce Software to Students Visual approach makes model building an art form Concept of “data flow” enables building of multiple models Point-and-click model building (no manual coding) Comprehensive portfolio of models for the Business Analyst as well as the Technical Expert

Clementine Basics: Building a Model

Clementine Basics: Select a Data Source Adding a node in Clementine is relatively simple: just select the node you want from the palette menus and drag and drop it on to the canvas.

Clementine Basics: Select a Data File

Clementine Basics: Select a Data File

Clementine Basics: Read a Data File

Clementine Basics: Select Fields

Clementine Basics: Define Field Types

Clementine Basics: Visualize Data Create tables and charts for means, ranges, and correlations of all variables

Clementine Basics: Visualize Data Examine associations among variables using visual displays

Clementine Basics: Select Target and Predictors

Clementine Basics: Execute Model

Clementine Basics: Review Model Results

Building Models in Clementine
Models:
Up sell / Cross sell: Creating business rules for up sell and cross sell
Customer Churn: Identify and target likely churn candidates, and create retention offerings to decrease their likelihood to churn
Propensity to respond/purchase: Develop models on desired purchase behavior, and target candidates that are most likely to respond

A Typical Clementine Model

Modeling Approaches Can use auto “c.h.d” settings (beginning user) But can also use expert capabilities (advanced user)

Data Mining Procedures Estimation Prediction Classification Clustering Affinity/Association

Specific Methodologies Available Estimation & Prediction: - Neural networks Classification: - Decision trees (2 types)

Specific Methodologies Available Clustering: - K-means - Kohonen networks Affinity/Association: - Association rules (2 types)

Positioning the Course Business Applications Theory and Concepts Clementine Models Focus of the Course

A Typical Class
Discuss business applications of methodology based on brief articles from the business press (30 minutes)
Present theory and concepts (30 minutes)
Build a Clementine model for students (30 minutes)
Ask students to build a Clementine model (30 minutes)
Discuss homework assignment (15 minutes)
Students complete a homework assignment after class (requires three hours)

Discuss Business Applications “Wal-Mart's next competitive weapon is advanced data mining, which it will use to forecast, replenish and merchandise on a micro scale. By analyzing years' worth of sales data--and then cranking in variables such as the weather and school schedules--the system could predict the optimal number of cases of Gatorade, in what flavors and sizes, a store in Laredo, Texas, should have on hand the Friday before Labor Day. Then, if the weather forecast suddenly called for temperatures 5° hotter than last year, the delivery truck would automatically show up with more.” From: “Can Wal-Mart Get Any Bigger,” Time, 13 January 2003

Present Theory and Concepts
Are window cleaning products also purchased when detergents and orange juice are bought together?
Where should detergents be placed in the store to maximize their sales?
Is soda typically purchased with bananas? Does the brand of soda make a difference?
How are the demographics of the neighborhood affecting what customers are buying?
From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff

Present Theory and Concepts Start with a record of past purchase transactions that link items purchased together From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff

Present Theory and Concepts Create a co-occurrence matrix that pairs items purchased together in the form of a table The co-occurrence matrix shows the number of times the “row” item was purchased with the “column” item (note that the matrix is symmetrical) From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff
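As a rough illustration outside Clementine (which needs no manual coding), a co-occurrence matrix can be sketched in a few lines of Python. The five transactions below are assumed to be the five-customer example used in this deck; the function name `cooccurrence` is hypothetical and not part of any Clementine feature.

```python
from collections import Counter
from itertools import combinations

# Assumed example data: the five-customer purchase list used in this deck.
transactions = [
    {"OJ", "soda"},
    {"milk", "OJ", "window cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "soda"},
    {"window cleaner", "soda"},
]

def cooccurrence(transactions):
    """Count how often each pair of items appears in the same transaction."""
    items = sorted({item for t in transactions for item in t})
    counts = Counter()
    for t in transactions:
        # Diagonal cell: how many transactions contain the item at all.
        for item in t:
            counts[(item, item)] += 1
        # Off-diagonal cells: record each pair in both orders,
        # which is what makes the matrix symmetrical.
        for a, b in combinations(sorted(t), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return items, counts

items, counts = cooccurrence(transactions)
for row in items:
    print(row.ljust(15), [counts[(row, col)] for col in items])
```

Each printed row is the "row" item, and each number is how many times it was bought with the corresponding "column" item. The diagonal holds the item's own purchase count.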

Present Theory and Concepts
Customer   Items Purchased
1          OJ, soda
2          Milk, OJ, window cleaner
3          OJ, detergent
4          OJ, detergent, soda
5          Window cleaner, soda
Rule Support = Percentage of transactions with both the items of interest
What is the Support for the rule “If Soda, then OJ”? OJ and Soda are purchased together in 2 out of 5 transactions. Hence Support is 40%.
What is the Support for the rule “If OJ, then Soda”? Still 40%.
From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff
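The rule support arithmetic on this slide can be checked with a short Python sketch. It is only an illustration of the definition given here, not Clementine's implementation; the helper name `rule_support` is an assumption.

```python
# The five transactions from the slide, one set of items per customer.
transactions = [
    {"OJ", "soda"},
    {"milk", "OJ", "window cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "soda"},
    {"window cleaner", "soda"},
]

def rule_support(transactions, antecedent, consequent):
    """Fraction of all transactions that contain every item in the rule."""
    rule_items = set(antecedent) | set(consequent)
    hits = sum(1 for t in transactions if rule_items <= t)
    return hits / len(transactions)

support_soda_oj = rule_support(transactions, {"soda"}, {"OJ"})
support_oj_soda = rule_support(transactions, {"OJ"}, {"soda"})
print(f"If Soda, then OJ: {support_soda_oj:.0%}")
print(f"If OJ, then Soda: {support_oj_soda:.0%}")
```

Because rule support only counts transactions containing both items, swapping the "if" and "then" sides does not change the result.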

Present Theory and Concepts
Customer   Items Purchased
1          OJ, soda
2          Milk, OJ, window cleaner
3          OJ, detergent
4          OJ, detergent, soda
5          Window cleaner, soda
Confidence = Ratio of the number of transactions with both the items of interest to the number of transactions with the “If” items
What is the Confidence for “If Soda, then OJ”? 2 out of 3 soda purchase transactions also include OJ. Hence Confidence is 66.67%.
What is the Confidence for “If OJ, then Soda”? 2 out of 4 OJ purchase transactions also include soda. Hence Confidence is 50%.
From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff
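Unlike rule support, confidence depends on the direction of the rule. The following Python sketch applies the slide's definition to the same five transactions; the function name `confidence` is hypothetical and the code is only an illustration, not how Clementine computes it internally.

```python
# The five transactions from the slide, one set of items per customer.
transactions = [
    {"OJ", "soda"},
    {"milk", "OJ", "window cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "soda"},
    {"window cleaner", "soda"},
]

def confidence(transactions, antecedent, consequent):
    """Share of transactions with the "If" items that also hold the "then" items."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_if = [t for t in transactions if antecedent <= t]
    if not with_if:
        raise ValueError("the antecedent never occurs in the data")
    with_both = [t for t in with_if if consequent <= t]
    return len(with_both) / len(with_if)

conf_soda_oj = confidence(transactions, {"soda"}, {"OJ"})
conf_oj_soda = confidence(transactions, {"OJ"}, {"soda"})
print(f"If Soda, then OJ: {conf_soda_oj:.2%}")
print(f"If OJ, then Soda: {conf_oj_soda:.2%}")
```

The denominator changes with the "If" side (3 soda transactions versus 4 OJ transactions), which is why the two rules get different confidence values even though their rule support is identical.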

Present Theory and Concepts Support (Prevalence): Percentage of records in the dataset that match the antecedent Support = p (antecedent) From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff

Present Theory and Concepts Confidence (Predictability): Percentage of records matching the antecedent that also match the consequent. Confidence = p (antecedent and consequent) / p (antecedent) From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff

Present Theory and Concepts Lift (Improvement): How much better is a rule at predicting the consequent than chance alone? Lift = confidence / p (consequent). A rule is only useful if Lift is > 1. From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff
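Lift can be sketched the same way, dividing a rule's confidence by the overall probability of the consequent. The Python below is a minimal illustration on the five-customer example from this deck (data assumed to match the earlier slides); `prob` and `lift` are hypothetical helper names.

```python
# Assumed example data: the five-customer purchase list used in this deck.
transactions = [
    {"OJ", "soda"},
    {"milk", "OJ", "window cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "soda"},
    {"window cleaner", "soda"},
]

def prob(transactions, items):
    """Fraction of transactions that contain all of the given items."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def lift(transactions, antecedent, consequent):
    """Lift = confidence / p(consequent)."""
    p_antecedent = prob(transactions, antecedent)
    p_consequent = prob(transactions, consequent)
    p_both = prob(transactions, set(antecedent) | set(consequent))
    rule_confidence = p_both / p_antecedent
    return rule_confidence / p_consequent

lift_soda_oj = lift(transactions, {"soda"}, {"OJ"})
lift_detergent_oj = lift(transactions, {"detergent"}, {"OJ"})
print(f"If Soda, then OJ: lift = {lift_soda_oj:.2f}")
print(f"If Detergent, then OJ: lift = {lift_detergent_oj:.2f}")
```

In this tiny dataset the soda rule has a lift below 1 because OJ already appears in 4 of 5 transactions, so knowing that soda was bought does not improve on chance. The detergent rule comes out above 1.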

Build a Clementine Model

Homework Assignment Conduct a Market Basket Analysis on the dataset using both the Apriori and GRI modeling nodes in Clementine. Reconcile the association rules discovered as a result of the Apriori and GRI modeling nodes. Provide a narrative description that attempts to explain the convergence (or lack thereof) between the results obtained from the two modeling nodes.  Select those association rules discovered during your Market Basket Analysis that would make the most intuitive sense to the category managers involved and create demographic profiles of shoppers who appear to fit those rules.

Instructor’s Laptop Screen

Student’s Laptop Screen

Resources
“Data Mining Techniques” by Michael J. A. Berry and Gordon S. Linoff (second edition), Wiley, 2004
“Discovering Knowledge in Data” by Daniel T. Larose, Wiley, 2005
“Making Sense of Statistics” by Fred Pyrczak (fourth edition), Pyrczak Publishing, 2006
Recent articles from the business press identified using the “Factiva” database and “data mining” and “predictive analytics” as search keywords
www.kdnuggets.com

Thank you for your time and participation Questions? Additional Information: Please see my syllabus at http://www.spss.com/academic/educator/curriculum/index.htm?tab=1 Comments and suggestions are welcome. Please send them to: Girish.Punj@business.uconn.edu