
Association Analysis (Data Engineering)

Types of attributes in association analysis
Association rule mining assumes the input data consists of binary attributes called items.
– The presence of an item in a transaction is also assumed to be more important than its absence.
– As a result, an item is treated as an asymmetric binary attribute.
We now extend the formulation to data sets with symmetric binary, categorical, and continuous attributes.

Types of attributes
Symmetric binary attributes:
– Gender
– Computer at Home
– Chat Online
– Shop Online
– Privacy Concerns
Nominal attributes:
– Level of Education
– State
Example of a rule: {Shop Online = Yes} → {Privacy Concerns = Yes}. This rule suggests that most Internet users who shop online are concerned about their personal privacy.

Transforming attributes into asymmetric binary attributes
Create a new item for each distinct attribute-value pair. E.g., the nominal attribute Level of Education can be replaced by three binary items:
– Education = College
– Education = Graduate
– Education = High School
Binary attributes such as Gender are converted into a pair of binary items:
– Male
– Female
A sketch of this transformation is given below.
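A minimal sketch of the binarization in Python using pandas; the column names and sample records are hypothetical:

    import pandas as pd

    # Hypothetical survey data with binary and nominal attributes.
    df = pd.DataFrame({
        "Gender": ["Male", "Female", "Female"],
        "Education": ["College", "Graduate", "High School"],
        "Shop Online": ["Yes", "No", "Yes"],
    })

    # One item per distinct attribute-value pair; each row becomes a
    # transaction of asymmetric binary items (1 = present, 0 = absent).
    items = pd.get_dummies(df).astype(int)
    print(items.columns.tolist())
    # ['Gender_Female', 'Gender_Male', 'Education_College', ...]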

Data after binarizing attributes into “items”

Handling continuous attributes
Solution: discretize. Examples of rules:
– Age ∈ [21,35) ∧ Salary ∈ [70k,120k) → Buy
– Salary ∈ [70k,120k) ∧ Buy → Age: μ=28, σ=4
Of course, discretization isn't always easy.
– If the intervals are too large, a rule may not have enough confidence:
Age ∈ [12,36) → Chat Online = Yes (s = 30%, c = 57.7%) fails for minconf = 60%.
– If the intervals are too small, a rule may not have enough support:
Age ∈ [16,20) → Chat Online = Yes (s = 4.4%, c = 84.6%) fails for minsup = 15%.
A discretization sketch follows.
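A minimal sketch of interval-based discretization with pandas; the bin edges and sample ages are assumptions for illustration:

    import pandas as pd

    ages = pd.DataFrame({"Age": [18, 24, 31, 45, 60]})

    # Replace the continuous attribute by half-open intervals such as [21, 35).
    bins = [12, 21, 35, 50, 65]
    ages["AgeInterval"] = pd.cut(ages["Age"], bins=bins, right=False)

    # Each interval then becomes one asymmetric binary item.
    age_items = pd.get_dummies(ages["AgeInterval"], prefix="Age").astype(int)
    print(age_items)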

Statistics-based quantitative association rules
Rules such as Salary ∈ [70k,120k) ∧ Buy → Age: μ=28, σ=4 are generated as follows:
– Specify the target attribute (e.g., Age).
– Withhold the target attribute, and "itemize" the remaining attributes.
– Apply algorithms such as Apriori or FP-growth to extract frequent itemsets from the itemized data. Each frequent itemset identifies an interesting segment of the population.
– Derive a rule for each frequent itemset. E.g., the preceding rule is obtained by averaging the age of the customers who support the frequent itemset {Salary ∈ [70k,120k), Buy}.
Remark: the notion of confidence is not applicable to such rules. A sketch of the procedure appears below.
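A minimal sketch of the procedure, with made-up itemized records and a naive frequent-itemset enumeration standing in for Apriori or FP-growth:

    from itertools import combinations
    from statistics import mean, pstdev

    # Hypothetical itemized records: the items per row, plus the withheld
    # target attribute (Age).
    rows = [
        ({"Salary=[70k,120k)", "Buy"}, 25),
        ({"Salary=[70k,120k)", "Buy"}, 31),
        ({"Salary=[70k,120k)"}, 40),
        ({"Buy"}, 22),
    ]
    minsup = 2  # absolute support count

    # Naive enumeration of frequent itemsets (stand-in for Apriori/FP-growth).
    universe = sorted(set().union(*(items for items, _ in rows)))
    for k in range(1, len(universe) + 1):
        for cand in combinations(universe, k):
            ages = [age for items, age in rows if set(cand) <= items]
            if len(ages) >= minsup:
                # Derive a statistics-based rule for this population segment.
                print(f"{set(cand)} -> Age: mu={mean(ages):.1f}, "
                      f"sigma={pstdev(ages):.1f}")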

Concept Hierarchies

Multi-level association rules
Why should we incorporate a concept hierarchy?
– Rules at lower levels may not have enough support to appear in any frequent itemsets.
– Rules at lower levels of the hierarchy are overly specific. E.g., skim milk → white bread, 2% milk → wheat bread, skim milk → wheat bread, etc. are all indicative of an association between milk and bread.

Multi-level association rules
How do support and confidence vary as we traverse the concept hierarchy?
– If X is the parent item of both X1 and X2, and they are its only children, then σ(X) ≤ σ(X1) + σ(X2). Why only an inequality? Because X1 and X2 might appear in the same transactions.
– If σ(X1 ∪ Y1) ≥ minsup, and X is the parent of X1 and Y the parent of Y1, then σ(X ∪ Y1) ≥ minsup, σ(X1 ∪ Y) ≥ minsup, and σ(X ∪ Y) ≥ minsup.
– If conf(X1 → Y1) ≥ minconf, then conf(X1 → Y) ≥ minconf.
These facts can be checked on a toy example; see the sketch below.
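A minimal sketch checking the first inequality on toy transactions; the items and hierarchy are made up for illustration:

    # Toy transactions over leaf items; "skim milk" and "2% milk" are the
    # only children of "milk".
    transactions = [
        {"skim milk", "white bread"},
        {"2% milk", "white bread"},
        {"skim milk", "2% milk"},  # both children co-occur here
    ]

    def support(itemset):
        return sum(itemset <= t for t in transactions) / len(transactions)

    # sigma(milk) counts transactions containing skim milk OR 2% milk.
    milk = sum(bool(t & {"skim milk", "2% milk"})
               for t in transactions) / len(transactions)

    # sigma(X) <= sigma(X1) + sigma(X2); strict whenever the children co-occur.
    print(milk, support({"skim milk"}) + support({"2% milk"}))  # 1.0 vs 1.33
    assert milk <= support({"skim milk"}) + support({"2% milk"})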

Multi-level association rules: Approach 1
Extend the current association rule formulation by augmenting each transaction with higher-level items.
– Original transaction: {skim milk, wheat bread}
– Augmented transaction: {skim milk, wheat bread, milk, bread, food}
Issue: items that reside at higher levels have much higher support counts; if the support threshold is low, we get too many frequent patterns involving items from the higher levels. A sketch of the augmentation step follows.
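A minimal sketch of the augmentation step, assuming the hierarchy is given as a child-to-parent map (the map below is illustrative):

    # Illustrative child -> parent map encoding the concept hierarchy.
    parent = {
        "skim milk": "milk", "2% milk": "milk",
        "wheat bread": "bread", "white bread": "bread",
        "milk": "food", "bread": "food",
    }

    def augment(transaction):
        """Add every ancestor of every item to the transaction."""
        augmented = set(transaction)
        for item in transaction:
            while item in parent:
                item = parent[item]
                augmented.add(item)
        return augmented

    print(augment({"skim milk", "wheat bread"}))
    # {'skim milk', 'wheat bread', 'milk', 'bread', 'food'}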

Multi-level association rules: Approach 2
Generate frequent patterns at the highest level first; then generate frequent patterns at the next level down, and so on.
Issues:
– May miss some potentially interesting cross-level association patterns. E.g., skim milk → white bread and 2% milk → white bread might not survive because of low support, but milk → white bread could. However, we never generate a cross-level itemset such as {milk, white bread}.

Mining word associations (on the Web)
An "itemset" here is a collection of words; the "transactions" are the documents. Example: W1 and W2 tend to appear together in the same documents.
Potential solution for mining frequent itemsets: convert the data into a 0/1 matrix and then apply the existing algorithms.
– OK, but this loses the word-frequency information.
Document-term matrix (frequency of each word in each document), e.g.:

TID  W1  W2  W3  W4  W5
D1    2   2   0   0   1
D2    0   0   1   2   2
D3    2   3   0   0   0
D4    0   0   1   0   1
D5    1   1   1   0   2

Normalize first
How do we determine the support of a word? First, normalize the word vectors: divide each word's column by its column sum, so that each word has a support equal to 1.0.
Reason for normalization: ensure that the data is on the same scale, so that sets of words that vary in the same way have similar support values.
Normalized document-term matrix:

TID    W1    W2    W3    W4    W5
D1   0.40  0.33  0.00  0.00  0.17
D2   0.00  0.00  0.33  1.00  0.33
D3   0.40  0.50  0.00  0.00  0.00
D4   0.00  0.00  0.33  0.00  0.17
D5   0.20  0.17  0.33  0.00  0.33
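A minimal sketch of the normalization step, using the counts from the document-term matrix above:

    import numpy as np

    # Document-term matrix: rows are documents D1..D5, columns words W1..W5.
    counts = np.array([
        [2, 2, 0, 0, 1],
        [0, 0, 1, 2, 2],
        [2, 3, 0, 0, 0],
        [0, 0, 1, 0, 1],
        [1, 1, 1, 0, 2],
    ], dtype=float)

    # Divide each column by its sum so every word's support becomes 1.0.
    normalized = counts / counts.sum(axis=0)
    print(normalized.round(2))
    print(normalized.sum(axis=0))  # [1. 1. 1. 1. 1.]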

Association between words
How do we compute a "meaningful" normalized support for {W1, W2}? One might think to sum up, document by document, the average of the normalized supports of W1 and W2:
s({W1,W2}) = (0.40+0.33)/2 + (0.40+0.50)/2 + (0.20+0.17)/2 = 1
This result is by no means an accident. Since every normalized column sums to 1, the document-wise average of any two columns always sums to (1+1)/2 = 1, no matter which words are involved. Averaging is therefore useless here.

Min-Apriori
Use instead the minimum of the normalized supports (frequencies): for each document, take the smallest normalized frequency among the words in the itemset, then sum over all documents.
Examples:
s({W1,W2}) = min{0.40, 0.33} + min{0.40, 0.50} + min{0.20, 0.17} = 0.33 + 0.40 + 0.17 = 0.90
s({W1,W2,W3}) = 0 + 0 + 0 + 0 + 0.17 = 0.17
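A minimal sketch of the min-based support on the normalized matrix above (0-based column indices, so W1 is column 0):

    import numpy as np

    normalized = np.array([
        [0.40, 0.33, 0.00, 0.00, 0.17],
        [0.00, 0.00, 0.33, 1.00, 0.33],
        [0.40, 0.50, 0.00, 0.00, 0.00],
        [0.00, 0.00, 0.33, 0.00, 0.17],
        [0.20, 0.17, 0.33, 0.00, 0.33],
    ])

    def min_support(cols):
        # Per document, take the minimum normalized frequency over the
        # itemset's words; then sum over all documents.
        return normalized[:, cols].min(axis=1).sum()

    print(min_support([0]))        # s({W1})         = 1.0
    print(min_support([0, 1]))     # s({W1, W2})     = 0.9
    print(min_support([0, 1, 2]))  # s({W1, W2, W3}) = 0.17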

Anti-monotone property of support
Example:
s({W1}) = 0.40 + 0.40 + 0.20 = 1.0
s({W1, W2}) = 0.33 + 0.40 + 0.17 = 0.9
s({W1, W2, W3}) = 0.17
Adding a word to an itemset can only decrease each document's minimum, so support never increases as itemsets grow. Hence the standard Apriori algorithm can be applied.