Information Systems Data Analysis – Association Mining Prof. Les Sztandera.


Data Analysis - Association Mining
Association rule mining:
- Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transactional databases, relational databases, and other information repositories.
Applications:
- Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification.
Examples:
- Rule form: “Body => Head [support, confidence]”.
- buys(x, “diapers”) => buys(x, “beer”) [0.5%, 60%]
- major(x, “Business”) ^ takes(x, “MIS”) => grade(x, “A”) [1%, 75%]

Association Rule: Basic Concepts
Given: (1) a database of transactions, where (2) each transaction is a list of items (purchased by a customer in a visit).
Find: all rules that correlate the presence of one set of items with that of another set of items.
- E.g., 98% of people who purchase tires and auto accessories also get automotive services done (confidence = 98%).
Applications:
- * => Maintenance Agreement (What should the store do to boost Maintenance Agreement sales?)
- Home Electronics => * (What other products should the store stock up on?)
- Attached mailing in direct marketing
- Detecting “ping-ponging” of patients, faulty “collisions”

Loss-Leader Strategy (1)
A loss-leader strategy is one in which a business offers a product or service at an unprofitable price, either to sell another product or service at a greater profit or to attract new customers. This is a common practice when a business first enters a market. A loss leader introduces new customers to a service or product in the hope of building a customer base and securing future recurring revenue. The strategy is more than a business trick: it can succeed if executed properly.

Loss-Leader Strategy (2)
A classic example is that of razor blades. Companies like Gillette essentially give their razor units away for free, knowing that customers will have to buy replacement blades, which is where the company makes all of its profit. Another example is Microsoft's Xbox video game system, which was sold at a loss of more than $100 per unit to create more potential to profit from the sale of higher-margin video games.

Cross-marketing (1)
Cross-marketing, or marketing cooperation, is a partnership of at least two companies at the marketing stage of the value chain, with the objective of tapping the full potential of a market by bundling specific competences or resources.

Cross-marketing (2)
An example of cross-marketing: Apple Inc. and Nike Inc. have formed a long-term partnership to jointly develop and sell “Nike+iPod” products. The “Nike + iPod Sport Kit” links Nike+ products with Apple's MP3 player, the iPod nano, so that performance data such as distance, pace, or calories burned can be displayed on the MP3 player's interface.

Classification and Clustering
Classification: a process of finding models that describe classes or concepts, for the purpose of predicting the class of objects.
Clustering: a process of finding models that describe clusters or concepts when the class labels are unknown.

Basket Data Analysis (1)

Basket Analysis (2)
Every extracted rule has support and confidence coefficients associated with it.
Support (A => B) = (# of cases containing both A and B) / (total # of cases)
Confidence (A => B) = (# of cases containing both A and B) / (# of cases containing A)
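
The two ratios above can be sketched in a few lines of Python. The transaction list is hypothetical (it does not come from the slides), and the function names are illustrative only.

```python
# Hypothetical transactions (an assumption, not data from the slides).
# Each case is the set of items bought in one visit.
TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]


def count_containing(transactions, items):
    """Number of cases that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)


def support(transactions, a, b):
    """Support(A => B): cases containing both A and B / total cases."""
    return count_containing(transactions, set(a) | set(b)) / len(transactions)


def confidence(transactions, a, b):
    """Confidence(A => B): cases containing both A and B / cases containing A."""
    both = count_containing(transactions, set(a) | set(b))
    return both / count_containing(transactions, a)


print(support(TRANSACTIONS, {"A"}, {"C"}))     # share of all cases with A and C
print(confidence(TRANSACTIONS, {"A"}, {"C"}))  # share of A-cases that also have C
```

Note that support is symmetric in A and B, while confidence is not: swapping the two sides changes the denominator.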

Rule Measures: Support and Confidence
Let the minimum support be 50% and the minimum confidence be 50%. Then we have:
- A => C (50%, 66.6%)
- C => A (50%, 100%)
[Figure: Venn diagram of customers who buy diapers, customers who buy beer, and customers who buy both.]

Association Rule Mining
Boolean vs. quantitative associations (based on the types of values handled):
- buys(x, “SQLServer”) ^ buys(x, “DMBook”) => buys(x, “DBMiner”) [0.2%, 60%]
- age(x, “30..39”) ^ income(x, “42..48K”) => buys(x, “PC”) [1%, 75%]
Single-dimensional vs. multi-dimensional associations (see the examples above).
Single-level vs. multiple-level analysis:
- What brands of beers are associated with what brands of diapers?
Various extensions:
- Correlation and causality analysis. Association does not necessarily imply correlation or causality.
- Max-patterns and closed itemsets.
- Constraints enforced. E.g., do small sales (sum < 100) trigger big buys (sum > 1,000)?

Mining Association Rules
Min. support 50%, min. confidence 50%.
For rule A => C:
- support = support({A, C}) = 50%
- confidence = support({A, C}) / support({A}) = 66.6%
The Apriori principle: any subset of a frequent itemset must be frequent.
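
As a rough illustration of the Apriori principle, the sketch below enumerates every itemset in a small hypothetical database by brute force and checks that each frequent itemset has only frequent subsets. The transactions are an assumption chosen so that {A, C} has 50% support and A => C has 66.6% confidence, as in the rule above; they are not taken from the slides.

```python
from itertools import combinations

# Hypothetical database (an assumption, not data from the slides).
TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]
MIN_SUPPORT = 0.5


def itemset_support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t) / len(TRANSACTIONS)


def is_frequent(itemset):
    return itemset_support(itemset) >= MIN_SUPPORT


# Brute force: every non-empty itemset over the items that occur in the database.
items = sorted(set().union(*TRANSACTIONS))
all_itemsets = [
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
]
frequent = [s for s in all_itemsets if is_frequent(s)]


def proper_subsets_all_frequent(itemset):
    """True if every non-empty proper subset of `itemset` is frequent."""
    return all(
        is_frequent(frozenset(sub))
        for k in range(1, len(itemset))
        for sub in combinations(itemset, k)
    )


# The principle says this holds for every frequent itemset.
principle_holds = all(proper_subsets_all_frequent(s) for s in frequent)
print(sorted(sorted(s) for s in frequent), principle_holds)
```

The contrapositive is what makes the principle useful in practice: once an itemset is known to be infrequent, none of its supersets needs to be counted.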

The Apriori Algorithm: Example
[Figure: database D is scanned to build candidate set C1 and frequent set L1, then C2 and L2, then C3 and L3.]

Is Apriori Fast Enough? Performance Bottlenecks
The core of the Apriori algorithm:
- Use frequent (k - 1)-itemsets to generate candidate frequent k-itemsets.
- Use database scans and pattern matching to collect counts for the candidate itemsets.
The bottleneck of Apriori: candidate generation.
- Huge candidate sets: 10^4 frequent 1-itemsets will generate 10^7 candidate 2-itemsets. To discover a frequent pattern of size 100, e.g., {a1, a2, ..., a100}, one needs to generate 2^100 ≈ 10^30 candidates.
- Multiple scans of the database: (n + 1) scans are needed, where n is the length of the longest pattern.
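
The two core steps can be sketched as a short level-wise loop. This is a minimal, unoptimized illustration under stated assumptions, not the implementation used in the course: the function name `apriori`, its signature, and the sample transactions are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (level-wise search)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_among(candidates):
        # One pass over the database: count each candidate, keep the frequent ones.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    items = {item for t in transactions for item in t}
    current = frequent_among({frozenset([i]) for i in items})
    result = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = frequent_among(candidates)
        result.update(current)
        k += 1
    return result


# Hypothetical database (an assumption, not data from the slides).
db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
for itemset, sup in sorted(apriori(db, 0.5).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), sup)
```

Each call to `frequent_among` corresponds to one database scan, which is why the number of scans grows with the length of the longest pattern. For the candidate-set size quoted above, `math.comb(10**4, 2)` evaluates to 49,995,000, which is on the order of 10^7.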

Presentation of Association Rules (Table Form)

Visualization of Association Rules Using a Plane Graph

Visualization of Association Rules Using a Rule Graph

Multiple-Level Association Rules
- Items often form a hierarchy.
- Items at the lower level are expected to have lower support.
- Rules regarding itemsets at appropriate levels could be quite useful.
- The transaction database can be encoded based on dimensions and levels.
- We can explore shared multi-level mining.
[Figure: item hierarchy with the node labels Food, bread, milk, skim, 2%, white, wheat, Giant, Acme.]

Mining Multi-Level Associations
A top-down, progressive deepening approach:
- First find high-level strong rules: milk => bread [20%, 60%].
- Then find their lower-level “weaker” rules: 2% milk => wheat bread [6%, 50%].
Variations on mining multiple-level association rules:
- Level-crossed association rules: 2% milk => Wonder wheat bread
- Association rules with multiple, alternative hierarchies: 2% milk => Wonder bread

Multi-level Association: Uniform Support vs. Reduced Support
Uniform support: the same minimum support for all levels.
- (+) One minimum support threshold. No need to examine itemsets containing any item whose ancestors do not have minimum support.
- (-) Lower-level items do not occur as frequently. If the support threshold is too high, low-level associations are missed; if it is too low, too many high-level associations are generated.
Reduced support: reduced minimum support at lower levels.
- There are 4 search strategies:
  - Level-by-level independent
  - Level-cross filtering by k-itemset
  - Level-cross filtering by single item
  - Controlled level-cross filtering by single item

Uniform Support
Multi-level mining with uniform support:
- Level 1 (min_sup = 5%): Milk [support = 10%]
- Level 2 (min_sup = 5%): 2% Milk [support = 6%], Skim Milk [support = 4%]

Reduced Support
Multi-level mining with reduced support:
- Level 1 (min_sup = 5%): Milk [support = 10%]
- Level 2 (min_sup = 3%): 2% Milk [support = 6%], Skim Milk [support = 4%]
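
The difference between the two threshold schemes can be checked with a small sketch that uses the support values from the two slides above. The dictionary layout and the function name `frequent_items` are illustrative assumptions; only the percentages come from the slides.

```python
# Support values and hierarchy taken from the Milk example; the data
# structures themselves are an illustrative assumption.
HIERARCHY = {"Milk": ["2% Milk", "Skim Milk"]}
SUPPORT = {"Milk": 0.10, "2% Milk": 0.06, "Skim Milk": 0.04}


def frequent_items(min_sup_by_level):
    """Top-down search: children are examined only if their parent is frequent."""
    frequent = []
    for parent, children in HIERARCHY.items():
        if SUPPORT[parent] < min_sup_by_level[1]:
            continue  # infrequent ancestor, so its descendants are skipped
        frequent.append(parent)
        frequent.extend(c for c in children if SUPPORT[c] >= min_sup_by_level[2])
    return frequent


uniform = frequent_items({1: 0.05, 2: 0.05})  # same threshold at both levels
reduced = frequent_items({1: 0.05, 2: 0.03})  # lower threshold at level 2
print("uniform:", uniform)
print("reduced:", reduced)
```

With a uniform 5% threshold, Skim Milk (4%) is dropped; lowering the level-2 threshold to 3% keeps it.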

Multi-level Association: Redundancy Filtering
Some rules may be redundant due to “ancestor” relationships between items.
Example:
- milk => wheat bread [support = 8%, confidence = 70%]
- 2% milk => wheat bread [support = 2%, confidence = 72%]
We say the first rule is an ancestor of the second rule.
A rule is redundant if its support is close to the “expected” value, based on the rule's ancestor.
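
One way to read the “expected” value is the ancestor rule's support scaled by the share of the ancestor item's sales that the descendant item accounts for. The sketch below follows that reading. The 25% share of 2% milk among all milk sales and the tolerance are assumptions made for illustration; neither figure appears in the slides.

```python
def expected_support(ancestor_support, item_share):
    """Support the descendant rule would have if it simply mirrored its ancestor."""
    return ancestor_support * item_share


def is_redundant(actual_support, expected, tolerance=0.005):
    """A rule is treated as redundant when its support is close to the expected value."""
    return abs(actual_support - expected) <= tolerance


# Rule supports from the example above.
ancestor_support = 0.08    # milk => wheat bread
descendant_support = 0.02  # 2% milk => wheat bread

# Assumption: 2% milk makes up one quarter of all milk sales.
share_of_2pct_milk = 0.25

expected = expected_support(ancestor_support, share_of_2pct_milk)
print(expected, is_redundant(descendant_support, expected))
```

Under that assumed share, the descendant rule carries no information beyond its ancestor and could be filtered out. With a different share, the same rule might be worth keeping.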

Criticism of Support and Confidence
Example: among 5000 students,
- 3000 play basketball
- 3750 eat cereal
- 2000 both play basketball and eat cereal
Then:
- play basketball => eat cereal [40%, 66.7%] is misleading, because the overall percentage of students eating cereal is 75%, which is higher than 66.7%.
- play basketball => not eat cereal [20%, 33.3%] is far more accurate, although with lower support and confidence: 33.3% exceeds the 25% of all students who do not eat cereal.
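
The comparison against the overall percentage can be made explicit by dividing a rule's confidence by the baseline frequency of its right-hand side. This ratio is commonly called lift; the slide does not name it, so treat it here as an added illustration. The counts are the ones from the example above.

```python
# Counts from the example above.
total = 5000
basketball = 3000
cereal = 3750
both = 2000

# play basketball => eat cereal
support_pos = both / total
confidence_pos = both / basketball
lift_pos = confidence_pos / (cereal / total)

# play basketball => not eat cereal
basketball_no_cereal = basketball - both
support_neg = basketball_no_cereal / total
confidence_neg = basketball_no_cereal / basketball
lift_neg = confidence_neg / ((total - cereal) / total)

print(f"eat cereal:     support={support_pos:.1%} confidence={confidence_pos:.1%} lift={lift_pos:.2f}")
print(f"not eat cereal: support={support_neg:.1%} confidence={confidence_neg:.1%} lift={lift_neg:.2f}")
```

A ratio below 1 means the left-hand side makes the right-hand side less likely than it is overall, which is the case for the first rule; the second rule's ratio is above 1.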