William Norris Professor and Head, Department of Computer Science

Slides:



Advertisements
Similar presentations
Association Rule Mining
Advertisements

Mining Association Rules in Large Databases
Huffman Codes and Asssociation Rules (II) Prof. Sin-Min Lee Department of Computer Science.
Association Analysis (2). Example TIDList of item ID’s T1I1, I2, I5 T2I2, I4 T3I2, I3 T4I1, I2, I4 T5I1, I3 T6I2, I3 T7I1, I3 T8I1, I2, I3, I5 T9I1, I2,
Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.
Data Mining Techniques Association Rule
Association Rule Mining. 2 The Task Two ways of defining the task General –Input: A collection of instances –Output: rules to predict the values of any.
10 -1 Lecture 10 Association Rules Mining Topics –Basics –Mining Frequent Patterns –Mining Frequent Sequential Patterns –Applications.
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Rakesh Agrawal Ramakrishnan Srikant
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Analysis: Basic Concepts and Algorithms.
Data Mining Association Analysis: Basic Concepts and Algorithms
6/23/2015CSE591: Data Mining by H. Liu1 Association Rules Transactional data Algorithm Applications.
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Fast Algorithms for Association Rule Mining
SEG Tutorial 2 – Frequent Pattern Mining.
Pattern Recognition Lecture 20: Data Mining 3 Dr. Richard Spillman Pacific Lutheran University.
Association Discovery from Databases Association rules are a simple formalism for expressing positive connections between columns in a 0/1 matrix. A classical.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
ASSOCIATION RULE DISCOVERY (MARKET BASKET-ANALYSIS) MIS2502 Data Analytics Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining.
Data Mining Association Analysis Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
Data & Text Mining1 Introduction to Association Analysis Zhangxi Lin ISQS 3358 Texas Tech University.
Frequent Item Mining. What is data mining? =Pattern Mining? What patterns? Why are they useful?
CS 8751 ML & KDDSupport Vector Machines1 Mining Association Rules KDD from a DBMS point of view –The importance of efficiency Market basket analysis Association.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
Association Rule Mining Data Mining and Knowledge Discovery Prof. Carolina Ruiz and Weiyang Lin Department of Computer Science Worcester Polytechnic Institute.
1 What is Association Analysis: l Association analysis uses a set of transactions to discover rules that indicate the likely occurrence of an item based.
Data Mining Find information from data data ? information.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.
Data Mining  Association Rule  Classification  Clustering.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Chapter 8 Association Rules. Data Warehouse and Data Mining Chapter 10 2 Content Association rule mining Mining single-dimensional Boolean association.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
Introduction to Data Mining Mining Association Rules Reference: Tan et al: Introduction to data mining. Some slides are adopted from Tan et al.
Stats 202: Statistical Aspects of Data Mining Professor Rajan Patel
Data Mining – Association Rules
Data Mining Find information from data data ? information.
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Association rule mining
Techniques for Finding Patterns in Large Amounts of Data: Applications in Biology Vipin Kumar William Norris Professor and Head, Department of Computer.
Frequent Pattern Mining
Association Rules.
Frequent Itemsets Association Rules
Waikato Environment for Knowledge Analysis
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Analysis: Basic Concepts and Algorithms
Data Mining for Finding Connections of Disease and Medical and Genomic Characteristics Vipin Kumar William Norris Professor and Head, Department of Computer.
Department of Computer Science National Tsing Hua University
Lecture 11 (Market Basket Analysis)
Mining Association Rules in Large Databases
Association Analysis: Basic Concepts
Presentation transcript:

William Norris Professor and Head, Department of Computer Science Association Analysis for Finding Patterns in Large Amounts of Biological Data Vipin Kumar William Norris Professor and Head, Department of Computer Science kumar@cs.umn.edu www.cs.umn.edu/~kumar

Association Analysis Given a set of records, find dependency rules which will predict occurrence of an item based on occurrences of other items in the record Applications Marketing and Sales Promotion Supermarket shelf management Traffic pattern analysis (e.g., rules such as "high congestion on Intersection 58 implies high accident rates for left turning traffic") Rules Discovered: {Milk} --> {Coke} (s=0.6, c=0.75) {Diaper, Milk} --> {Beer} (s=0.4, c=0.67)

Association Rule Mining Task Given a set of transactions T, the goal of association rule mining is to find all rules having support ≥ minsup threshold confidence ≥ minconf threshold Brute-force approach: Two Steps Frequent Itemset Generation Generate all itemsets whose support  minsup Rule Generation Generate high confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset Frequent itemset generation is computationally expensive

Efficient Pruning Strategy (Ref: Agrawal & Srikant 1994) Found to be Infrequent Pruned supersets If an itemset is infrequent, then all of its supersets must also be infrequent

Illustrating Apriori Principle Items (1-itemsets) Pairs (2-itemsets) (No need to generate candidates involving Coke or Eggs) Minimum Support = 3 Triplets (3-itemsets) If every subset is considered, 6C1 + 6C2 + 6C3 = 41 With support-based pruning, 6 + 6 + 1 = 13

Counting Candidates A B C D A C E B C D A B D E B C E B D Candidates Frequent Itemsets are found by counting candidates. Simple way: Search for each candidate in each transaction. Expensive!!! Candidates Count Transactions 1 A D A E A B C D E A B D E B C D A B E B D B C A C A B 2 1 A D A E A B C D E A B D E B C D A B E B D B C A C A B A B A C A D A E B C B D A B E B C D A B D E A B C D E Reduce the number of comparisons (NM) by using hash tables to store the candidate itemsets 2 A D A E A B C D E A B D E B C D A B E B D B C A C A B 1 4 3 Naïve approach requires O(NM) comparisons A B C D A C E B C D A B D E B C E B D M N

Where are the parts located? How many roles can these play? How flexible and adaptable are they mechanically? What are the shared parts (bolt, nut, washer, spring, bearing), unique parts (cogs, levers)? What are the common parts -- types of parts (nuts & washers)? Where are the parts located? Which parts interact? © Mark Gerstein, Yale

Create a data set that records the presence and absence of Association Analysis for Finding Connections of Disease and Medical and Genomic Characteristics Create a data set that records the presence and absence of Phenotypic characteristics Genetic characteristics (SNPs) Disease Apply association analysis to find groups of phenotypic and genetic characteristics that are highly associated with disease Uses characteristics of the patterns to prune the search space Clustering and classification can also be applied

The Need for Error-Tolerant Itemsets An error-tolerant itemset (ETI) can have a fraction  of the items missing in each transaction. Example: see the data in the table Let  = 1/4. In other words, each transaction needs to have 3/4 (75%) of the items. X = {i1, i2, i3, i4} and Y = {i5, i6, i7, i8} are both ETIs with a support of 4. Algorithms to find ETIs are still in development You can think of these ETIs as blocks in the data matrix

ETIs in For Finding Patterns in Phenotypic and Genomic Data ETIs consist of A set of patients and A set of attributes such that The block is relatively dense These blocks identify sets of patients that are highly associated with certain sets of attributes and vice-versa If most of these patients share a disease, then these attributes (genetic and/or phenotypic) are candidate markers for the disease X: Set of patients Y: Set of attributes, i.e., SNPs, medical characteristics