Download presentation
Presentation is loading. Please wait.
Published byNorman Osborne Modified over 9 years ago
1
Information Systems Data Analysis – Association Mining Prof. Les Sztandera
2
Information Systems Data Analysis - Association Mining Association rule mining: – Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transactional databases, relational databases, and other information repositories. Applications: – Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification. Examples. – Rule form: “Body ead [support, confidence]”. – buys(x, “diapers”) buys(x, “beer”) [0.5%, 60%] – major(x, “Business”) ^ takes(x, “MIS”) grade(x, “A”) [1%, 75%]
3
Information Systems Association Rule: Basic Concepts Given: (1) database of transactions, (2) each transaction is a list of items (purchased by a customer in a visit) Find: All rules that correlate the presence of one set of items with that of another set of items –E.g., 98% of people who purchase tires and auto accessories also get automotive services done (confidence = 98%) Applications – Maintenance Agreement (What the store should do to boost Maintenance Agreement sales) –Home Electronics (What other products should the store stock up?) –Attached mailing in direct marketing –Detecting “ping-ponging” of patients, faulty “collisions”
4
Information Systems Loss-Leader Strategy (1) A business strategy in which a business offers a product or service at a price that is not profitable for the sake of offering another product/service at a greater profit or to attract new customers. This is a common practice when a business first enters a market. A loss leader introduces new customers to a service or product in the hope of building a customer base and securing future recurring revenue. The loss leader strategy is more than just a nifty business trick - it is a successful strategy if executed properly.
5
Information Systems Loss-Leader Strategy (2) A classic example is that of razor blades. Companies like Gillette essentially give their razor units away for free, knowing that customers will have to buy their replacement blades, which is where the company makes all of its profit. Another example is Microsoft's Xbox video game system, which was sold at a loss of more than $100 per unit to create more potential to profit from the sale of higher- margin video games.
6
Information Systems Cross-marketing (1) A cross-marketing or marketing cooperation is a partnership of at least two companies on the value chain level of marketing with the objective to tap the full potential of a market by bundling specific competences or resources.partnershipvalue chain marketingmarket competences
7
Information Systems Cross-marketing (2) An example of cross-marketing: Apple Inc. and Nike Inc. have formed a long term partnership to jointly develop and sell “Nike+iPod” products. The "Nike + iPod Sport Kit" links Nike+ products with Apples MP3- Player iPod nano, so that performance data such as distance, pace or burned calories can be displayed on the MP3-Player’s interface.Apple Inc.Nike Inc.Nike+iPod
8
Information Systems Classification and Clustering Classification – a process of finding models that describe classes or concepts, for the purpose of predicting a class of objects. Clustering - a process of finding models that describe clusters or concepts when the label of the class is unknown.
9
Information Systems Basket Data Analysis (1)
10
Information Systems Basket Analysis (2) Every extracted rule has Support and Confidence coefficients associated with it. Support (A => B) = (# of cases containing both A and B) / (total # of cases) Confidence (A => B) = (# of cases containing both A and B) / (# of cases containing A)
11
Information Systems Rule Measures: Support and Confidence Let the minimum support be 50%, and the minimum confidence 50%, then we have –A C (50%, 66.6%) –C A (50%, 100%) Customer buys diapers Customer buys both Customer buys beer
12
Information Systems Association Rule Mining Boolean vs. quantitative associations (Based on the types of values handled) – buys(x, “SQLServer”) ^ buys(x, “DMBook”) buys(x, “DBMiner”) [0.2%, 60%] – age(x, “30..39”) ^ income(x, “42..48K”) buys(x, “PC”) [1%, 75%] Single dimension vs. multiple dimensional associations (see ex. above) Single level vs. multiple-level analysis – What brands of beers are associated with what brands of diapers? Various extensions – Correlation, causality analysis Association does not necessarily imply correlation or causality – Maxpatterns and closed itemsets – Constraints enforced E.g., small sales (sum 1,000)?
13
Information Systems Mining Association Rules For rule A C: support = support({A C}) = 50% confidence = support({A C})/support({A}) = 66.6% The A priori principle: Any subset of a frequent itemset must be frequent Min. support 50% Min. confidence 50%
14
Information Systems The a priori Algorithm — Example Database D Scan D C1C1 L1L1 L2L2 C2C2 C2C2 C3C3 L3L3
15
Information Systems Is a priori Fast Enough? — Performance Bottlenecks The core of the A priori algorithm: –Use frequent (k – 1)-item sets to generate candidate frequent k- item sets –Use database scan and pattern matching to collect counts for the candidate item sets The bottleneck of Apriori: candidate generation –Huge candidate sets: 10 4 frequent 1-itemset will generate 10 7 candidate 2-item sets To discover a frequent pattern of size 100, e.g., {a 1, a 2, …, a 100 }, one needs to generate 2 100 10 30 candidates. –Multiple scans of database: Needs (n +1 ) scans, n is the length of the longest pattern
16
Information Systems Presentation of Association Rules (Table Form )
17
Information Systems Visualization of Association Rule Using Plane Graph
18
Information Systems Visualization of Association Rule Using Rule Graph
19
Information Systems Multiple-Level Association Rules Items often form hierarchy Items at the lower level are expected to have lower support. Rules regarding itemsets at appropriate levels could be quite useful. Transaction database can be encoded based on dimensions and levels We can explore shared multi-level mining Food bread milk skim GiantAcme 2%white wheat
20
Information Systems Mining Multi-Level Associations A top_down, progressive deepening approach: – First find high-level strong rules: milk bread [20%, 60%]. – Then find their lower-level “weaker” rules: 2% milk wheat bread [6%, 50%]. Variations at mining multiple-level association rules. –Level-crossed association rules: 2% milk Wonder wheat bread –Association rules with multiple, alternative hierarchies: 2% milk Wonder bread
21
Information Systems Multi-level Association: Uniform Support vs. Reduced Support Uniform Support: the same minimum support for all levels –+ One minimum support threshold. No need to examine itemsets containing any item whose ancestors do not have minimum support. –– Lower level items do not occur as frequently. If support threshold too high miss low level associations too low generate too many high level associations Reduced Support: reduced minimum support at lower levels –There are 4 search strategies: Level-by-level independent Level-cross filtering by k-itemset Level-cross filtering by single item Controlled level-cross filtering by single item
22
Information Systems Uniform Support Multi-level mining with uniform support Milk [support = 10%] 2% Milk [support = 6%] Skim Milk [support = 4%] Level 1 min_sup = 5% Level 2 min_sup = 5% Back
23
Information Systems Reduced Support Multi-level mining with reduced support 2% Milk [support = 6%] Skim Milk [support = 4%] Level 1 min_sup = 5% Level 2 min_sup = 3% Back Milk [support = 10%]
24
Information Systems Multi-level Association: Redundancy Filtering Some rules may be redundant due to “ancestor” relationships between items. Example –milk wheat bread [support = 8%, confidence = 70%] –2% milk wheat bread [support = 2%, confidence = 72%] We say the first rule is an ancestor of the second rule. A rule is redundant if its support is close to the “expected” value, based on the rule’s ancestor.
25
Information Systems Criticism to Support and Confidence Example: Among 5000 students 3000 play basketball 3750 eat cereal 2000 both play basket ball and eat cereal –play basketball eat cereal [40%, 66.7%] is misleading because the overall percentage of students eating cereal is 75% which is higher than 66.7%. –play basketball not eat cereal [20%, 33.3%] is far more accurate, although with lower support and confidence
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.