Pattern Recognition Lecture 20: Data Mining 3 Dr. Richard Spillman Pacific Lutheran University
Class Topics: Introduction – Decision Functions – Cluster Analysis – Statistical Decision Theory – Feature Selection – Machine Learning – Neural Nets – Data Mining – Midterm One – Midterm Two – Project Presentations
Review: Data Mining Example – Preprocessing Data – Preprocessing Tasks
Review – What is Data Mining? Data mining (also known as Knowledge Discovery in Databases, Data Archeology, or Data Dredging) is a method to get beyond the “tip of the iceberg”: the information readily available from a database.
Review – Data Preprocessing Data preparation is a big issue for both warehousing and mining. Data preparation includes: –Data cleaning and data integration –Data reduction and feature selection –Discretization Many methods have been developed, but this is still an active area of research.
OUTLINE Frequent Pattern Mining Association Rule Mining Algorithms
Frequent Pattern Mining
What is Frequent Pattern Mining? What is a frequent pattern? –A pattern (a set of items, a sequence, etc.) that occurs frequently in a database Frequent pattern: an important form of regularity –What products are often purchased together? (beers and diapers!) –What are the consequences of a hurricane? –What is the next purchase after buying a PC?
Applications Market Basket Analysis –Maintenance Agreement: what should the store do to boost Maintenance Agreement sales? –Home Electronics: what other products should the store stock up on if it has a sale on Home Electronics? Attached mailing in direct marketing Detecting “ping-pong”ing of patients –transaction: a patient –item: a doctor/clinic visited by the patient –support of a rule: the number of common patients
Frequent Pattern Mining Methods Association analysis –Basket data analysis, cross-marketing, catalog design, loss-leader analysis, text database analysis –Correlation or causality analysis Clustering Classification –Association-based classification analysis Sequential pattern analysis –Web log sequences, DNA analysis, etc.
Association Rule Mining
Given: –A database of customer transactions –Each transaction is a list of items (purchased by a customer in a visit) Find all rules that correlate the presence of one set of items with that of another set of items –Example: 98% of people who purchase tires and auto accessories also get automotive services done –Any number of items may appear in the consequent/antecedent of a rule –It is possible to specify constraints on rules (e.g., find only rules involving Home Laundry Appliances).
Basic Concepts Rule form: “A ⇒ B [support s, confidence c]”. Support: usefulness of discovered rules. Confidence: certainty of the detected association. Rules that satisfy both min_sup and min_conf are called strong. Examples: –buys(x, “diapers”) ⇒ buys(x, “beers”) [0.5%, 60%] –age(x, “30-34”) ^ income(x, “42K-48K”) ⇒ buys(x, “high resolution TV”) [2%, 60%] –major(x, “CS”) ^ takes(x, “DB”) ⇒ grade(x, “A”) [1%, 75%]
Rule Measures Find all rules X & Y ⇒ Z with minimum confidence and support –support, s: probability that a transaction contains {X, Y, Z} –confidence, c: conditional probability that a transaction having {X, Y} also contains Z (Venn diagram: customers who buy diapers, customers who buy beer, and the overlap of customers who buy both)
Example: Support Given the following database of four transactions (TID: items), two of which are 2000: {A, B, C} and 1000: {A, C}: for the rule A ⇒ C, support is the probability that a transaction contains both A and C. 2 out of the 4 transactions contain both A and C, so the support is 50%.
Example: Confidence Given the same database: for the rule A ⇒ C, confidence is the conditional probability that a transaction which contains A also contains C. 2 out of the 3 transactions that contain A also contain C, so the confidence is 66%.
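To make both calculations concrete, here is a minimal Python sketch (not from the slides). The two remaining transactions are filled in as {A, D} and {B, E, F}; this is an assumption, chosen only so the counts match the slide (4 transactions, 3 containing A, 2 containing both A and C).

```python
# Support and confidence for the rule A => C over the slide's database.
transactions = [
    {"A", "B", "C"},   # TID 2000
    {"A", "C"},        # TID 1000
    {"A", "D"},        # assumed row
    {"B", "E", "F"},   # assumed row
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

s = support({"A", "C"}, transactions)                                  # P(A and C)
c = support({"A", "C"}, transactions) / support({"A"}, transactions)   # P(C | A)
print(f"support = {s:.0%}, confidence = {c:.1%}")
# support = 50%, confidence = 66.7%
```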
Algorithms
Apriori Algorithm The Apriori method: –Proposed by Agrawal & Srikant 1994 –A similar level-wise algorithm by Mannila et al. 1994 Major idea: –A subset of a frequent itemset must be frequent E.g., if {beer, diaper, nuts} is frequent, {beer, diaper} must be. If any subset is infrequent, its superset cannot be frequent! –A powerful, scalable candidate set pruning technique: it reduces candidate k-itemsets dramatically (for k > 2)
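This downward-closure property is what makes pruning cheap. As an illustrative sketch (not from the slides), the test below rejects a candidate k-itemset as soon as one of its (k-1)-subsets is missing from the frequent itemsets:

```python
from itertools import combinations

def all_subsets_frequent(candidate, frequent_prev):
    """True iff every (k-1)-subset of the candidate k-itemset is frequent."""
    k = len(candidate)
    return all(frozenset(sub) in frequent_prev
               for sub in combinations(candidate, k - 1))

# If {beer, diaper} is not among the frequent 2-itemsets, then
# {beer, diaper, nuts} can be discarded without counting its support.
frequent_2 = {frozenset({"beer", "nuts"}), frozenset({"diaper", "nuts"})}
print(all_subsets_frequent({"beer", "diaper", "nuts"}, frequent_2))  # False
```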
Example Given: a transaction database D (shown below), with min. support 50% and min. confidence 50%.
Apriori Process 1. Find the frequent itemsets: the sets of items that have minimum support (Apriori) –A subset of a frequent itemset must also be a frequent itemset; i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets –Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets) 2. Use the frequent itemsets to generate association rules.
Apriori Algorithm Join Step: C_k is generated by joining L_{k-1} with itself. Prune Step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed. (C_k: candidate itemsets of size k; L_k: frequent itemsets of size k)
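A minimal Python sketch of these two steps, assuming frequent itemsets are stored as sorted tuples (the name apriori_gen follows Agrawal & Srikant's paper; the code itself is illustrative):

```python
from itertools import combinations

def apriori_gen(L_prev):
    """Generate candidate k-itemsets C_k from the frequent (k-1)-itemsets L_prev."""
    k_minus_1 = len(next(iter(L_prev)))
    L_sorted = sorted(L_prev)
    candidates = set()
    for i, a in enumerate(L_sorted):
        for b in L_sorted[i + 1:]:
            if a[:-1] == b[:-1]:           # Join step: first k-2 items agree
                cand = a + (b[-1],)        # merged k-itemset, still sorted
                # Prune step: every (k-1)-subset must itself be frequent
                if all(sub in L_prev for sub in combinations(cand, k_minus_1)):
                    candidates.add(cand)
    return candidates
```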
Example Database D, min. support 50%, min. confidence 50%. (Diagram: scan D to count the candidate 1-itemsets C_1 and keep the frequent ones as L_1; join L_1 with itself to form C_2 and scan D to get L_2; join L_2 to form C_3 and scan D to get L_3.)
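Here is a runnable Python sketch of the same level-wise pass. The rows of D are an assumption (the classic four-transaction example), chosen because they reproduce the rule confidences computed two slides below. The explicit prune step is omitted for brevity; the database scan removes infrequent candidates anyway.

```python
# Level-wise Apriori pass over an assumed database D (min. support 50% = 2 of 4).
D = [
    {1, 3, 4},      # TID 100 (assumed row)
    {2, 3, 5},      # TID 200 (assumed row)
    {1, 2, 3, 5},   # TID 300 (assumed row)
    {2, 5},         # TID 400 (assumed row)
]
min_count = 2

def count_frequent(candidates, D, min_count):
    """Scan D once and keep the candidates meeting the minimum support count."""
    return {c for c in candidates
            if sum(1 for t in D if set(c) <= t) >= min_count}

items = sorted({i for t in D for i in t})
L = count_frequent([(i,) for i in items], D, min_count)   # C_1 -> L_1
all_frequent = set(L)
while L:
    # Join L_{k-1} with itself to form C_k, then scan D for L_k
    C = {a + (b[-1],) for a in L for b in L if a[:-1] == b[:-1] and a < b}
    L = count_frequent(C, D, min_count)
    all_frequent |= L

print(sorted(all_frequent))
# [(1,), (1, 3), (2,), (2, 3), (2, 3, 5), (2, 5), (3,), (3, 5), (5,)]
```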
Generating the Candidate Set In the example, how do you go from L_2 to C_3? More generally, if L_3 = {abc, abd, acd, ace, bcd}: Self-joining (L_3 * L_3): abcd from abc and abd; acde from acd and ace Pruning: acde is removed because ade is not in L_3 Result: C_4 = {abcd}
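Feeding this example into the apriori_gen sketch above reproduces the result:

```python
# Using the apriori_gen sketch from the join/prune slide:
L3 = {tuple(s) for s in ["abc", "abd", "acd", "ace", "bcd"]}
print(apriori_gen(L3))   # {('a', 'b', 'c', 'd')} -- acde is pruned away
```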
Generating Strong Association Rules confidence(A ⇒ B) = P(B | A) = support(A ∪ B) / support(A) Example: from database D, L_3 = {2, 3, 5}. Possible rules:
2 and 3 => 5, confidence 2/2 = 100%
2 and 5 => 3, confidence 2/3 = 66%
3 and 5 => 2, confidence 2/2 = 100%
2 => 3 and 5, confidence 2/3 = 66%
3 => 2 and 5, confidence 2/3 = 66%
5 => 3 and 2, confidence 2/3 = 66%
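A minimal Python sketch of this rule-generation step, reusing the assumed database D from the level-wise sketch above and enumerating every non-empty proper subset of {2, 3, 5} as an antecedent:

```python
from itertools import combinations

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]   # assumed rows, as above

def supp_count(itemset):
    """Number of transactions in D containing the itemset."""
    return sum(1 for t in D if itemset <= t)

itemset = frozenset({2, 3, 5})
min_conf = 0.5
for r in range(1, len(itemset)):
    for ante in combinations(sorted(itemset), r):
        A = frozenset(ante)
        conf = supp_count(itemset) / supp_count(A)   # support(A u B) / support(A)
        if conf >= min_conf:
            print(f"{set(A)} => {set(itemset - A)}  confidence {conf:.0%}")
```

All six rules clear min. confidence 50%: {2, 3} => {5} and {3, 5} => {2} at 100%, the other four at 67% (the slide rounds 2/3 down to 66%).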
Possible Quiz What is a frequent pattern? Define support and confidence. What is the basic principle of the Apriori algorithm?