EXAM REVIEW
MIS2502 Data Analytics

Exam
- What Tool to Use?
- Evaluating Decision Trees
- Association Rules
- Clustering

1. Using the Right Data Mining Technique
You run an e-store for xtreme sporting goods and have collected a year of sales data from your website.
- You're looking to identify customer segments to create better targeted marketing.
- On your e-store you want to implement a feature that recommends other items to purchase, based on the items other customers purchased along with them.
- You've had a coupon program in place for the past 6 months. You are now redesigning the program to target only those people who are likely to use the coupon.

Understanding Descriptive Statistics
- What is the mean for DemAge?
- What would it mean if the minimum of DemAge was 0?
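One way to reason about these two questions outside the exam software is to compute the statistics directly. The sketch below uses Python's standard `statistics` module on a short list of invented DemAge values (not the course data set; `dem_age`, `suspect` and `valid` are hypothetical names). It flags zeros because an age of 0 is implausible for a donor and most likely marks a missing or miscoded entry.

```python
from statistics import mean

# Invented DemAge values for illustration only; not the data set used in the exam.
dem_age = [34, 51, 0, 62, 45, 0, 38]

print("mean:", round(mean(dem_age), 2))
print("min:", min(dem_age))

# A minimum of 0 is implausible for a person's age, so treat zeros as suspect
# (likely missing or miscoded) and see how much they pull the mean down.
suspect = [age for age in dem_age if age <= 0]
valid = [age for age in dem_age if age > 0]
print("suspect values:", len(suspect))
print("mean without zeros:", mean(valid))
```

Comparing the two means shows why a minimum of 0 deserves a closer look before the mean is reported.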

Navigating the Tree
- How many leaves are there?
- Which group is more likely to give a gift: those who've made < 2.5 gifts in the last month, or those who have made 2.5 or more?
- Describe the donors least likely to give a gift.
- Why is there no "Time Since Last Gift" split under "Gift Amount Last < 7.5"?
- What's the probability that someone who has made fewer than 1 gift in the last 36 months and whose median home value is 75K will make a gift? Read the Validation amount; '1' is a positive outcome.
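The tree these questions refer to was an image that did not survive, so the sketch below uses a small hypothetical tree instead. Its feature names loosely echo the slide's variables, but the split points and leaf counts are invented. It illustrates two mechanics the questions rely on: counting leaves, and reading a leaf's probability as the share of its records with the positive outcome '1'. The `Node` class and helper functions are illustrative names, not part of any tool used in the course.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Internal nodes test `record[feature] < threshold`; leaves store outcome counts.
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch taken when value < threshold
    right: Optional["Node"] = None  # branch taken when value >= threshold
    positives: int = 0              # records in this leaf with target '1'
    total: int = 0                  # all records in this leaf

    def is_leaf(self) -> bool:
        return self.feature is None


def count_leaves(node: Node) -> int:
    if node.is_leaf():
        return 1
    return count_leaves(node.left) + count_leaves(node.right)


def probability_of_gift(node: Node, record: dict) -> float:
    # Follow the splits from the root until a leaf is reached.
    while not node.is_leaf():
        node = node.left if record[node.feature] < node.threshold else node.right
    # Leaf probability = share of the leaf's records with the positive outcome '1'.
    return node.positives / node.total


# Hypothetical tree: feature names, thresholds and counts are invented for illustration.
tree = Node(
    feature="gift_count_36_months", threshold=1,
    left=Node(feature="median_home_value", threshold=75_000,
              left=Node(positives=30, total=100),
              right=Node(positives=45, total=100)),
    right=Node(positives=60, total=100),
)

print(count_leaves(tree))  # 3
# A home value of exactly 75,000 is not "< 75,000", so it follows the right branch.
print(probability_of_gift(tree, {"gift_count_36_months": 0, "median_home_value": 75_000}))  # 0.45
```

The boundary case in the last line is worth checking on the real tree too: which side of a split a value lands on depends on whether the split is written as "<" or ">=".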

What is Association Mining?
- Discovering interesting relationships between variables in large databases
- Find out which items predict the occurrence of other items
- Also known as "affinity analysis" or "market basket" analysis

Core idea: The itemset
- Itemset: a group of items of interest, e.g., {Milk, Beer, Diapers}
- This itemset is a "3-itemset" because it contains... 3 items!
- An association rule expresses related itemsets: X → Y, where X and Y are two itemsets
- {Milk, Diapers} → {Beer} means "when you have milk and diapers, you also have beer"

Basket  Items
1       Bread, Milk
2       Bread, Diapers, Beer, Eggs
3       Milk, Diapers, Beer, Coke
4       Bread, Milk, Diapers, Beer
5       Bread, Milk, Diapers, Coke

Support
- Support count (σ): frequency of occurrence of an itemset
  σ({Milk, Beer, Diapers}) = 2 (i.e., it's in baskets 3 and 4)
- Support (s): fraction of transactions that contain all items in the relationship X → Y, where |T| is the total number of transactions
  $s(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{|T|}$
  s({Milk, Diapers, Beer}) = 2/5 = 0.4 (2 baskets have milk, beer, and diapers; 5 baskets total)
- You can calculate support for X and Y separately. With X = {Milk, Diapers} and Y = {Beer}: support for X = 3/5 = 0.6; support for Y = 3/5 = 0.6
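The support definitions can be checked against the five example baskets with a few lines of Python. This is a minimal sketch that stores each basket as a set; `BASKETS`, `support_count` and `support` are hypothetical helper names, not part of any library or the course software.

```python
# The five example baskets, one set of items per basket.
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]


def support_count(itemset):
    """sigma(itemset): number of baskets that contain every item in the itemset."""
    return sum(1 for basket in BASKETS if itemset <= basket)


def support(itemset):
    """s(itemset): fraction of all baskets that contain the itemset."""
    return support_count(itemset) / len(BASKETS)


x, y = {"Milk", "Diapers"}, {"Beer"}
# Which baskets (numbered from 1) hold every item of X and Y?
print([n for n, basket in enumerate(BASKETS, start=1) if (x | y) <= basket])  # [3, 4]
print(support_count(x | y), support(x | y))  # 2 0.4
print(support(x), support(y))                # 0.6 0.6
```

Set containment (`<=`) does the work of "contains all items", which is why sets are a natural fit for baskets.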

Confidence
- Confidence is the strength of the association
- Measures how often items in Y appear in transactions that contain X
  $c(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}$
  c({Milk, Diapers} → {Beer}) = 2/3 ≈ 0.67
- This says 67% of the time when you have milk and diapers in the itemset, you also have beer!
- c must be between 0 and 1: 1 is a complete association; 0 is no association
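Confidence can be computed the same way, as a ratio of two support counts over the example baskets. This is a minimal sketch; `support_count` and `confidence` are hypothetical helper names.

```python
# The five example baskets, one set of items per basket.
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]


def support_count(itemset):
    """sigma(itemset): number of baskets containing every item in the itemset."""
    return sum(1 for basket in BASKETS if itemset <= basket)


def confidence(x, y):
    """c(X -> Y) = sigma(X u Y) / sigma(X): share of the baskets holding X that also hold Y."""
    return support_count(x | y) / support_count(x)


c = confidence({"Milk", "Diapers"}, {"Beer"})
print(f"{c:.2f}")  # 0.67
```

Because the numerator counts a subset of the baskets in the denominator, the result always falls between 0 and 1.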

Lift
- Takes into account how co-occurrence differs from what is expected by chance, i.e., if items were selected independently from one another
- Based on the support metric: support for the total itemset X and Y, divided by support for X times support for Y
  $\mathrm{lift}(X \rightarrow Y) = \frac{s(X \rightarrow Y)}{s(X) \times s(Y)}$

Independent Events
- Two events are independent if the occurrence of one does not change the probability of the other occurring.
- If events are independent, then the probability of them both occurring is the product of the probabilities of each occurring.
- Specific Multiplication Rule (only valid for independent events): P(A and B) = P(A) × P(B)
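Applied to the running example, lift compares the observed support of {Milk, Diapers, Beer} with the support the multiplication rule would predict if the two sides were independent. This is a minimal sketch with hypothetical helper names (`support`, `lift`).

```python
# The five example baskets, one set of items per basket.
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]


def support(itemset):
    """Fraction of baskets containing every item in the itemset."""
    return sum(1 for basket in BASKETS if itemset <= basket) / len(BASKETS)


def lift(x, y):
    """Observed support of X and Y together over the support expected under independence."""
    expected = support(x) * support(y)  # multiplication rule: P(A and B) = P(A) * P(B)
    return support(x | y) / expected


value = lift({"Milk", "Diapers"}, {"Beer"})
print(f"{value:.2f}")  # 1.11, i.e. 0.4 / (0.6 * 0.6)
```

A lift above 1 means the items appear together more often than independence would predict, a lift of 1 is consistent with independence, and a lift below 1 means they appear together less often than expected.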

Process
1. Set rule thresholds
2. Define itemsets
3. Read through the itemsets and create a list of all possible association rules (X → Y) for them
4. Compute support, confidence, and lift for each rule
5. Drop those that don't meet the thresholds
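The five steps can be sketched end to end on the example baskets. This version simply enumerates every candidate itemset (brute force), which is fine for a toy data set. Real tools typically use a smarter algorithm such as Apriori, so this is not meant to mirror what the course software does internally. The thresholds (0.4 support, 0.6 confidence), the `max_size` cap and the helper names are arbitrary choices for illustration.

```python
from itertools import combinations

# The five example baskets, one set of items per basket.
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]


def support(itemset):
    """Fraction of baskets containing every item in `itemset`."""
    return sum(1 for basket in BASKETS if itemset <= basket) / len(BASKETS)


def mine_rules(min_support, min_confidence, max_size=3):
    """Brute-force rule mining: enumerate itemsets, split them into X -> Y, filter."""
    items = sorted(set().union(*BASKETS))
    rules = []
    # Step 2: candidate itemsets are all combinations of 2..max_size items.
    for size in range(2, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s_xy = support(itemset)
            # Step 5, applied early: itemsets below the support threshold cannot yield
            # a qualifying rule, and zero-support itemsets would cause division by zero.
            if s_xy == 0 or s_xy < min_support:
                continue
            # Step 3: every way to split the itemset into a non-empty X and Y.
            for k in range(1, size):
                for lhs in combinations(combo, k):
                    x = frozenset(lhs)
                    y = itemset - x
                    # Step 4: compute the metrics for this rule.
                    conf = s_xy / support(x)
                    lift = s_xy / (support(x) * support(y))
                    # Step 5: drop rules that miss the confidence threshold.
                    if conf >= min_confidence:
                        rules.append((set(x), set(y), s_xy, conf, lift))
    return rules


# Step 1: thresholds (arbitrary values chosen for this example).
rules = mine_rules(min_support=0.4, min_confidence=0.6)
for x, y, s, c, lift_value in sorted(rules, key=lambda r: r[4], reverse=True):
    print(sorted(x), "->", sorted(y), f"s={s:.2f} c={c:.2f} lift={lift_value:.2f}")
```

{Milk, Diapers} → {Beer} survives both thresholds here, while {Diapers} → {Coke} is dropped because its confidence is only 0.5.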

Evaluating Association Rules Output
- What product is most likely to be bought if someone buys a Pencil?
- What statistic did you use to answer this?
- Why might the item with the highest confidence not have the highest lift?
- What might you recommend to a store manager given that the lift for PhotoProcessing>Magazine is 1.17?
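The confidence/lift question can be explored on the five example baskets rather than the exam's output, which is not reproduced here. Bread appears in 4 of the 5 baskets, so a rule predicting Bread earns high confidence almost automatically. Its lift, however, shows that Milk does not raise the chance of Bread above that baseline. This is a minimal sketch; the helper names are hypothetical, and lift is computed from counts so the results are exact.

```python
# The five example baskets, one set of items per basket.
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]
N = len(BASKETS)


def count(itemset):
    """Number of baskets containing every item in `itemset`."""
    return sum(1 for basket in BASKETS if itemset <= basket)


def confidence(x, y):
    return count(x | y) / count(x)


def lift(x, y):
    # s(X u Y) / (s(X) * s(Y)), rewritten with counts: sigma(X u Y) * N / (sigma(X) * sigma(Y))
    return count(x | y) * N / (count(x) * count(y))


for x, y in [({"Milk"}, {"Bread"}), ({"Diapers"}, {"Coke"})]:
    print(f"{x} -> {y}: confidence={confidence(x, y):.2f}, lift={lift(x, y):.2f}")
# {'Milk'} -> {'Bread'}: confidence=0.75, lift=0.94
# {'Diapers'} -> {'Coke'}: confidence=0.50, lift=1.25
```

The first rule wins on confidence but loses on lift because its consequent is already very common. This is one reason to check lift before acting on a high-confidence rule.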

What is Cluster Analysis?
- Grouping data so that elements in a group will be:
  - similar (or related) to one another
  - different (or unrelated) from elements in other groups
- Distance within clusters is minimized
- Distance between clusters is maximized

Applications
- Understanding
  - Group related documents for browsing
  - Create groups of similar customers
  - Discover which stocks have similar price fluctuations
- Summarization
  - Reduce the size of large data sets
  - Those similar groups can be treated as a single data point

Process
1. Choose K clusters
2. Select K points as initial centroids
3. Assign all points to clusters based on distance
4. Recompute the centroid of each cluster
5. Did the centers change? If yes, go back to step 3; if no, you're done.

The K-means algorithm is one method for doing partitional clustering.
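The loop in the process above maps directly onto a short K-means sketch. The six 2-D points are invented so that the two groups are easy to see. The initial centroids are drawn at random from the data points, which is one common choice but an assumption here, and `kmeans`, `seed` and `max_iter` are hypothetical names.

```python
import random
from math import dist
from statistics import mean


def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 2: pick K of the data points as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Step 3: assign every point to its nearest centroid (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 4: recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(mean(axis) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        # Step 5: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Invented points forming two well-separated groups.
POINTS = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, clusters = kmeans(POINTS, k=2)
print(sorted(centroids))  # [(1.5, 1.5), (8.5, 8.5)]
```

With less tidy data, different random starting centroids can converge to different clusterings, which is why K-means is often run several times with different seeds.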

Clustering Output

Some Clustering Questions
1. Using the Mean Statistics, which cluster (identified by segment ID) has the highest cohesion? (Write the segment number: 1, 2, 3, or 4.)
2. Using the Segment Profile plot, is the Original Jeans Sales of segment 2 (the first one) lower or higher than the average over the entire population? (Write "lower" or "higher".)
3. Using the Segment Profile plot, is the Leisure Jeans Sales of segment 4 (the second one) lower or higher than the average over the entire population? (Write "lower" or "higher".)
4. As a general rule, if we increase the number of clusters, is the cohesion within clusters likely to increase or decrease? (Write "increase" or "decrease".)
5. As a general rule, if we increase the number of clusters, is the separation between clusters likely to increase or decrease? (Write "increase" or "decrease".)
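The last two questions can be explored numerically on a toy example. The sketch below measures cohesion with the within-cluster sum of squared errors (SSE, lower means more cohesive) and separation with the distance between the two closest centroids. Both are common measures, but they are assumptions here and not necessarily the statistics shown in the course output. The partitions of the six invented points into K = 1, 2 and 3 clusters are hand-picked rather than produced by K-means, and all names are hypothetical.

```python
from math import dist
from statistics import mean

POINTS_A = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
POINTS_B = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

# Hand-picked partitions of the same six points (invented for illustration).
PARTITIONS = {
    1: [POINTS_A + POINTS_B],
    2: [POINTS_A, POINTS_B],
    3: [POINTS_A, POINTS_B[:1], POINTS_B[1:]],
}


def centroid(cluster):
    return tuple(mean(axis) for axis in zip(*cluster))


def sse(clusters):
    """Within-cluster sum of squared distances to each centroid: lower means higher cohesion."""
    return sum(dist(p, centroid(c)) ** 2 for c in clusters for p in c)


def min_separation(clusters):
    """Distance between the two closest centroids: higher means better separation."""
    cents = [centroid(c) for c in clusters]
    return min(dist(a, b) for i, a in enumerate(cents) for b in cents[i + 1:])


for k, clusters in PARTITIONS.items():
    line = f"K={k}: SSE={sse(clusters):.2f}"
    if k > 1:
        line += f", closest centroids {min_separation(clusters):.2f} apart"
    print(line)
```

On this data, SSE shrinks as K grows because each point sits closer to its own centroid. The closest pair of centroids also moves much nearer once a natural group is split, which illustrates the general tendency the questions ask about.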