Pattern Recognition Lecture 20: Data Mining 3 Dr. Richard Spillman Pacific Lutheran University.

Class Topics: Introduction; Decision Functions; Midterm One; Midterm Two; Data Mining; Project Presentations; Cluster Analysis; Statistical Decision Theory; Feature Selection; Machine Learning; Neural Nets

Review Data Mining Example Preprocessing Data Preprocessing Tasks

Review – What is Data Mining?
It is a method to get beyond the “tip of the iceberg”
Data Mining / Knowledge Discovery in Databases / Data Archeology / Data Dredging
Figure label: Information available from a database

Review – Data Preprocessing
Data preparation is a big issue for both warehousing and mining
Data preparation includes
– Data cleaning and data integration
– Data reduction and feature selection
– Discretization
Many methods have been developed, but this is still an active area of research

OUTLINE
Frequent Pattern Mining
Association Rule Mining
Algorithms

Frequent Pattern Mining

What is Frequent Pattern Mining?
What is a frequent pattern?
– A pattern (set of items, sequence, etc.) that occurs frequently in a database
Frequent pattern: an important form of regularity
– What products were often purchased together? Beers and diapers!
– What are the consequences of a hurricane?
– What is the next target after buying a PC?

Applications
Market Basket Analysis
– * => Maintenance Agreement: what the store should do to boost Maintenance Agreement sales
– Home Electronics => *: what other products the store should stock up on if it has a sale on Home Electronics
Attached mailing in direct marketing
Detecting “ping-pong”ing of patients
– transaction: patient
– item: doctor/clinic visited by a patient
– support of a rule: number of common patients

Frequent Pattern Mining Methods
Association analysis
– Basket data analysis, cross-marketing, catalog design, loss-leader analysis, text database analysis
– Correlation or causality analysis
Clustering
Classification
– Association-based classification analysis
Sequential pattern analysis
– Web log sequence, DNA analysis, etc.

Association Rule Mining

Given
– A database of customer transactions
– Each transaction is a list of items (purchased by a customer in a visit)
Find all rules that correlate the presence of one set of items with that of another set of items
– Example: 98% of people who purchase tires and auto accessories also get automotive services done
– Any number of items may appear in the antecedent or consequent of a rule
– It is possible to specify constraints on rules (e.g., find only rules involving Home Laundry Appliances)

Basic Concepts
Rule form: “A => B [support s, confidence c]”
Support: usefulness of discovered rules
Confidence: certainty of the detected association
Rules that satisfy both min_sup and min_conf are called strong.
Examples:
– buys(x, “diapers”) => buys(x, “beers”) [0.5%, 60%]
– age(x, “30-34”) ^ income(x, “42K-48K”) => buys(x, “high resolution TV”) [2%, 60%]
– major(x, “CS”) ^ takes(x, “DB”) => grade(x, “A”) [1%, 75%]

Rule Measures
Find all the rules X & Y => Z with minimum confidence and support
– support, s: probability that a transaction contains {X, Y, Z}
– confidence, c: conditional probability that a transaction having {X, Y} also contains Z
Figure labels: Customer buys diaper; Customer buys beer; Customer buys both

Example: Support
Given the following database:
For the rule A => C, support is the probability that a transaction contains both A and C
– Transaction 2000: A, B, C
– Transaction 1000: A, C
2 out of 4 transactions contain both A and C, so the support is 50%

Example: Confidence
Given the same database:
For the rule A => C, confidence is the conditional probability that a transaction which contains A also contains C
– Transaction 2000: A, B, C
– Transaction 1000: A, C
2 out of the 3 transactions which contain A also contain C, so the confidence is 2/3, or about 67%
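
The two examples above can be checked with a few lines of Python. This is only a sketch: the transcript shows just transactions 2000 and 1000, so rows 4000 and 5000 below are assumptions, chosen so that the database has 4 transactions, 3 of which contain A. The function names `support` and `confidence` are illustrative, not from the lecture.

```python
# Rows 2000 and 1000 appear on the slide; rows 4000 and 5000 are assumed
# so that the counts match "2 out of 4" and "2 out of the 3".
transactions = {
    2000: {"A", "B", "C"},
    1000: {"A", "C"},
    4000: {"A", "D"},       # assumed row
    5000: {"B", "E", "F"},  # assumed row
}

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= items for items in db.values()) / len(db)

def confidence(antecedent, consequent, db):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, db) / support(antecedent, db)

print(support({"A", "C"}, transactions))       # support of A => C
print(confidence({"A"}, {"C"}, transactions))  # confidence of A => C
```

Under these assumed rows, the support of A => C comes out as 0.5 and the confidence as 2/3, matching the slide's figures.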

Algorithms

Apriori Algorithm
The Apriori method:
– Proposed by Agrawal & Srikant 1994
– A similar level-wise algorithm by Mannila et al
Major idea:
– A subset of a frequent itemset must be frequent
– E.g., if {beer, diaper, nuts} is frequent, {beer, diaper} must be. If any itemset is infrequent, its supersets cannot be frequent!
– A powerful, scalable candidate set pruning technique: it reduces candidate k-itemsets dramatically (for k > 2)

Example
Given: min. support 50%, min. confidence 50%

Apriori Process
1. Find the frequent itemsets: the sets of items that have minimum support (Apriori)
– A subset of a frequent itemset must also be a frequent itemset, i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets
– Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)
2. Use the frequent itemsets to generate association rules.

Apriori Algorithm
Join Step: C_k is generated by joining L_{k-1} with itself
Prune Step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence any candidate in C_k that contains it should be removed.
(C_k: candidate itemsets of size k)
(L_k: frequent itemsets of size k)
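
The level-wise loop built from the join and prune steps might look like the following Python sketch. It is not the lecture's code: the function name `apriori` and the four transactions are hypothetical, and the join is done by taking unions of pairs of frequent (k-1)-itemsets, a simpler variant than the prefix-based self-join usually described for Apriori.

```python
from itertools import combinations

def apriori(db, min_support):
    """Return {frozenset: support count} for every frequent itemset in `db`."""
    min_count = min_support * len(db)
    # L_1: count single items and keep the frequent ones.
    counts = {}
    for items in db:
        for item in items:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: unite pairs of frequent (k-1)-itemsets that give a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Scan the database to count the surviving candidates (C_k -> L_k).
        level = {}
        for c in candidates:
            count = sum(c <= items for items in db)
            if count >= min_count:
                level[c] = count
        frequent.update(level)
        k += 1
    return frequent

# Hypothetical transactions, not taken from the slides.
db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
result = apriori(db, min_support=0.5)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a 50% minimum support on this toy database, the frequent itemsets are {A}, {B}, {C}, and {A, C}; the loop stops once a level produces no frequent itemsets.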

Example
Given: min. support 50%, min. confidence 50%
[Figure: scans of database D producing C_1, L_1, C_2, L_2, C_3, and L_3]

Generating the Candidate Set
In the example, how do you go from L_2 to C_3?
For example, if L_3 = {abc, abd, acd, ace, bcd}
Self-joining: L_3 * L_3
– abcd from abc and abd
– acde from acd and ace
Pruning:
– acde is removed because ade is not in L_3
C_4 = {abcd}
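
The L_3 to C_4 example can be reproduced with a short Python sketch of the self-join and prune steps. The function name `apriori_gen` and the representation of itemsets as strings of single-letter items are assumptions made for illustration.

```python
from itertools import combinations

def apriori_gen(prev_frequent):
    """Build candidate k-itemsets from a non-empty list of frequent (k-1)-itemsets."""
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    prev_set = set(prev)
    k = len(prev[0]) + 1
    candidates = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            # Self-join: merge two itemsets that share their first k-2 items.
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                # Prune: every (k-1)-subset of the candidate must be frequent.
                if all(sub in prev_set for sub in combinations(cand, k - 1)):
                    candidates.append("".join(cand))
    return candidates

L3 = ["abc", "abd", "acd", "ace", "bcd"]
print(apriori_gen(L3))  # ['abcd']
```

Here abc and abd join to abcd, which survives because abc, abd, acd, and bcd are all in L_3; acd and ace join to acde, which is pruned because ade (and cde) is not in L_3.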

Generating Strong Association Rules
Confidence(A => B) = Prob(B|A) = support(A ∪ B)/support(A)
Example (database D, L_3). Possible rules:
– 2 and 3 => 5: confidence 2/2 = 100%
– 2 and 5 => 3: confidence 2/3 ≈ 67%
– 3 and 5 => 2: confidence 2/2 = 100%
– 2 => 3 and 5: confidence 2/3 ≈ 67%
– 3 => 2 and 5: confidence 2/3 ≈ 67%
– 5 => 3 and 2: confidence 2/3 ≈ 67%
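
Rule generation from one frequent itemset can be sketched in Python as below. The transcript does not reproduce database D, so the support counts are read off the numerators and denominators of the confidence fractions listed on this slide; the function name `rules_from` is hypothetical.

```python
from itertools import combinations

# Support counts inferred from the confidence fractions on the slide
# (the database D itself is not reproduced in the transcript).
support_count = {
    frozenset({2}): 3, frozenset({3}): 3, frozenset({5}): 3,
    frozenset({2, 3}): 2, frozenset({2, 5}): 3, frozenset({3, 5}): 2,
    frozenset({2, 3, 5}): 2,
}

def rules_from(itemset, counts, min_conf):
    """Yield (antecedent, consequent, confidence) for rules meeting min_conf."""
    itemset = frozenset(itemset)
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            antecedent = frozenset(antecedent)
            # confidence = support(itemset) / support(antecedent)
            conf = counts[itemset] / counts[antecedent]
            if conf >= min_conf:
                yield antecedent, itemset - antecedent, conf

for a, c, conf in rules_from({2, 3, 5}, support_count, min_conf=0.5):
    print(sorted(a), "=>", sorted(c), f"{conf:.0%}")
```

With a minimum confidence of 50%, all six rules on the slide qualify; raising `min_conf` to, say, 0.7 would keep only the two rules with 100% confidence.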

Possible Quiz
What is a frequent pattern?
Define support and confidence.
What is the basic principle of the Apriori algorithm?