CS 590M Fall 2001: Security Issues in Data Mining Lecture 5: Association Rules, Sequential Associations.

Why Association Rules?
• Understand attributes, not entities
• Discover relationships that
  – show some dependency between attributes
  – are “interesting”
• Give an understanding of the data space

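Formal Definition
• Data:
  – Items $I = \{i_1, \ldots, i_n\}$
  – Transactions $T = \{t_1, \ldots, t_m\}$, where $t_i = \{i_{j_1}, \ldots, i_{j_k}\} \subseteq I$
• Support: given $A \subseteq I$, $\mathrm{supp}(A) = |\{t \in T \mid t \supseteq A\}| / |T|$
• Goal: find rules $A \Rightarrow B$ with support $\ge s$ and confidence $\ge c$, where
  – $A, B \subseteq I$, $A \cap B = \emptyset$
  – the rule's support is $\mathrm{supp}(A \cup B)$ and its confidence is $\mathrm{supp}(A \cup B) / \mathrm{supp}(A)$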
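As a quick check of these definitions, here is a minimal Python sketch of supp and confidence over transactions stored as frozensets. The function names (`supp`, `confidence`, `is_valid_rule`) and the four toy transactions are hypothetical, chosen only for illustration; the thresholds s and c are passed as fractions, as in the definition above.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]

def supp(itemset: Itemset, transactions: List[Itemset]) -> float:
    """supp(A) = |{t in T : t contains A}| / |T|."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(a: Itemset, b: Itemset, transactions: List[Itemset]) -> float:
    """conf(A => B) = supp(A ∪ B) / supp(A); 0.0 if A never occurs."""
    supp_a = supp(a, transactions)
    return supp(a | b, transactions) / supp_a if supp_a > 0 else 0.0

def is_valid_rule(a: Itemset, b: Itemset, transactions: List[Itemset],
                  s: float, c: float) -> bool:
    """A => B qualifies if A ∩ B = ∅, supp(A ∪ B) >= s and confidence >= c."""
    if a & b:
        return False
    return supp(a | b, transactions) >= s and confidence(a, b, transactions) >= c

# Hypothetical toy data, reusing item names from the sample table.
T = [frozenset(t) for t in (
    {"Hardware", "Auto"},
    {"Hardware", "Auto", "Grocery"},
    {"Grocery"},
    {"Hardware", "Grocery"},
)]
print(supp(frozenset({"Hardware"}), T))  # 0.75
print(is_valid_rule(frozenset({"Hardware"}), frozenset({"Auto"}), T, s=0.5, c=0.6))
```

Here Hardware ⇒ Auto has support 2/4 and confidence (2/4)/(3/4) = 2/3, so it passes with c = 0.6 but would fail with c = 0.7.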

Sample: Market Basket
[Table: transactions t0, …, t9 in $T$ (rows) × items in $I$: Hardware, Auto, Clothing, Furnishings, Paper goods, Grocery (columns)]

Types of associations
• Machine-learning based: classification / decision rules
  – Entities independent, unordered
  – Find rules leading to a target class
  – To get rule sets, re-run with each class as the target
• Market basket
  – Collection of related entities with the same key
  – Can be modeled as independent entities; sparse data
• Sequential
  – Like market basket, but grouped by distance rather than by the same key
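A minimal sketch of the market-basket grouping, assuming rows arrive as hypothetical `(key, item)` pairs: related rows that share a key become one basket. Each basket lists only the items actually present, which keeps the representation small on sparse data. The function name `baskets_by_key` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

def baskets_by_key(records: Iterable[Tuple[Hashable, str]]) -> Dict[Hashable, Set[str]]:
    """Collect (key, item) records into one basket (set of items) per key."""
    baskets: Dict[Hashable, Set[str]] = defaultdict(set)
    for key, item in records:
        baskets[key].add(item)  # repeated items in the same basket collapse
    return dict(baskets)

# Hypothetical point-of-sale rows: (transaction id, item bought).
rows = [(1, "Auto"), (1, "Grocery"), (2, "Grocery"), (1, "Auto"), (3, "Clothing")]
baskets = baskets_by_key(rows)
print(baskets)
```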

Historical Association Rule Learning
• Decision trees converted to rules
  – ID3, as discussed in the previous lecture
• Direct production of decision rules
  – CN2, others
• Problem: these algorithms don’t scale well to many practical problems

Database community contribution: Market Basket Association Rules
• Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, Washington, D.C., May 1993.
• Rakesh Agrawal and Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994.

Database community contribution: Market Basket Association Rules
• Practical problems often have sparse data
  – Many attributes, few items per transaction
• Goal is typically a search for high support
  – High support = broad impact
  – High confidence not crucial (as opposed to classification)
• Very large data sets (main-memory algorithms impractical)

A-Priori Algorithm
• Observation: if $A$ has support $s$, then
  – $\forall i \in A,\ \mathrm{supp}(\{i\}) \ge s$
• Gives a bottom-up algorithm
  – Find single items with support $\ge s$
  – To find pairs, look only at the parts of transactions made of those items
  – Recurse
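The two lowest levels of this bottom-up search can be sketched as follows, assuming the threshold is given as an absolute transaction count `min_count` rather than a fraction (multiply s by |T| to convert). The name `first_two_passes` is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set

def first_two_passes(transactions: List[Set[str]], min_count: int):
    """Return (frequent items, frequent pairs) using the A-Priori observation."""
    # Pass 1: support count of every single item.
    item_counts = Counter(item for t in transactions for item in t)
    frequent_items = {i for i, n in item_counts.items() if n >= min_count}
    # Pass 2: a pair can only be frequent if both of its items are,
    # so each transaction is cut down to its frequent items before counting pairs.
    pair_counts: Counter = Counter()
    for t in transactions:
        kept = sorted(frequent_items & set(t))
        pair_counts.update(combinations(kept, 2))
    frequent_pairs = {p for p, n in pair_counts.items() if n >= min_count}
    return frequent_items, frequent_pairs

# Hypothetical toy transactions.
T = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
items, pairs = first_two_passes(T, min_count=2)
```

Item "d" occurs once, so no pair containing it is ever counted; ("b", "c") is counted but occurs only once and is dropped.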

A-Priori Algorithm
• First, generate all large itemsets
  – Sets $X \subseteq I$ such that $\mathrm{supp}(X) \ge s$ (threshold)
  – Captures the “$\mathrm{supp}(A \cup B) \ge s$” part of the problem
• Second, for each large itemset $X$, find high-confidence rules $A \Rightarrow B$ with $A \cup B = X$
  – $B \subset X$, $A = X - B$
  – To find confidence, need $\mathrm{supp}(A)$
  – But $A \subseteq X$ is itself a large itemset, so its support is already known: no need to go back to the database!
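A minimal sketch of the second phase, assuming the first phase produced a hypothetical table `large` mapping every large itemset (as a frozenset) to its support. Because the table contains every subset of every large itemset, `large[a]` always exists for $A \subset X$, so confidence is computed without rescanning the transactions. The function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]

def rules_from_large_itemsets(large: Dict[Itemset, float], min_conf: float
                              ) -> List[Tuple[Itemset, Itemset, float, float]]:
    """Return (A, B, support, confidence) for rules A => B with A ∪ B = X, X large."""
    rules = []
    for x, supp_x in large.items():
        if len(x) < 2:
            continue  # both A and B must be nonempty
        for r in range(1, len(x)):
            for b_items in combinations(sorted(x), r):
                b = frozenset(b_items)
                a = x - b
                # A is a subset of X, hence large, hence already in the table:
                # no database scan is needed for supp(A).
                conf = supp_x / large[a]
                if conf >= min_conf:
                    rules.append((a, b, supp_x, conf))
    return rules

# Hypothetical output of the first phase (supports as fractions).
large = {frozenset({"a"}): 0.75, frozenset({"b"}): 0.75, frozenset({"a", "b"}): 0.5}
rules = rules_from_large_itemsets(large, min_conf=0.6)
```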

A-Priori Algorithm
L_1 = {large 1-itemsets};
for (k = 2; L_{k-1} ≠ ∅; k++) {
    C_k = select p.i_1, p.i_2, …, p.i_{k-1}, q.i_{k-1}
          from L_{k-1} p, L_{k-1} q
          where p.i_1 = q.i_1, …, p.i_{k-2} = q.i_{k-2}, p.i_{k-1} < q.i_{k-1};
    forall transactions t ∈ T {
        C_t = subset(C_k, t);   // candidates contained in t
        forall candidates c ∈ C_t: c.count++;
    }
    L_k = {c ∈ C_k | c.count ≥ minsup};
}
Answer = ∪_k L_k;
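A runnable Python rendering of the pseudocode, offered as a sketch rather than the original implementation: itemsets are sorted tuples so the join condition can compare last items, and `minsup` is an absolute count (assumed ≥ 1). The prune step inside `apriori_gen` follows the candidate-generation function of the VLDB 1994 paper and is not on the slide; dropping it gives the same answer, only with more candidates to count. The subset check is a plain loop rather than the hash tree used in the paper.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Itemset = Tuple[str, ...]  # items kept in sorted order, as the join assumes

def apriori_gen(prev_large: Iterable[Itemset], k: int) -> List[Itemset]:
    """Candidate k-itemsets from the large (k-1)-itemsets: join, then prune."""
    prev = sorted(prev_large)
    prev_set = set(prev)
    candidates = []
    for p in prev:
        for q in prev:
            # Join: same first k-2 items, and p's last item < q's last item.
            if p[:k - 2] == q[:k - 2] and p[k - 2] < q[k - 2]:
                c = p + (q[k - 2],)
                # Prune: every (k-1)-subset of a large k-itemset must itself be large.
                if all(s in prev_set for s in combinations(c, k - 1)):
                    candidates.append(c)
    return candidates

def apriori(transactions: List[Set[str]], minsup: int) -> Dict[Itemset, int]:
    """All itemsets contained in at least minsup transactions, with their counts."""
    transactions = [set(t) for t in transactions]
    counts = Counter(item for t in transactions for item in t)
    large = {(i,): n for i, n in counts.items() if n >= minsup}  # L_1
    answer = dict(large)
    k = 2
    while large:  # L_{k-1} is nonempty
        c_k = apriori_gen(large.keys(), k)
        c_count: Counter = Counter()
        for t in transactions:
            for c in c_k:  # C_t = subset(C_k, t)
                if t.issuperset(c):
                    c_count[c] += 1
        large = {c: n for c, n in c_count.items() if n >= minsup}  # L_k
        answer.update(large)
        k += 1
    return answer

# Hypothetical toy transactions.
T = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
print(apriori(T, minsup=2))
```

On this data the join produces ("a", "b", "c") at k = 3, but the prune step discards it because ("b", "c") is not large, so the loop stops without another counting pass.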

Frequent episodes for sequential associations
• Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. Discovering Frequent Episodes in Sequences. In First International Conference on Knowledge Discovery and Data Mining (KDD'95), Montreal, Canada, August 1995. AAAI Press.
• Instead of transactions, items are grouped by a sliding window in time
• Same basic idea as A-Priori
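Grouping by a sliding window can be sketched by turning an event sequence into one item set per window and then reusing the support idea from A-Priori. This assumes integer time stamps and starts one window of width `width` at every integer time from the first to the last event; that choice of window set is a simplification for illustration, and the names `sliding_windows` and `window_frequency` are hypothetical.

```python
from typing import List, Set, Tuple

Event = Tuple[str, int]  # (event type, integer time stamp)

def sliding_windows(events: List[Event], width: int) -> List[Set[str]]:
    """One 'transaction' per window [start, start + width), start = first..last time."""
    if not events:
        return []
    times = [t for _, t in events]
    return [
        {e for e, t in events if start <= t < start + width}
        for start in range(min(times), max(times) + 1)
    ]

def window_frequency(episode: Set[str], windows: List[Set[str]]) -> float:
    """Fraction of windows containing every event type of an unordered episode."""
    if not windows:
        return 0.0
    return sum(1 for w in windows if episode <= w) / len(windows)

# Hypothetical event sequence.
events = [("A", 1), ("B", 2), ("C", 5), ("A", 6)]
ws = sliding_windows(events, width=2)
print(window_frequency({"A"}, ws))  # 0.5
```

With windows acting as transactions, a set of event types can be frequent only if each of its subsets is, so the same level-wise candidate generation applies.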

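Frequent Episodes: Definition
• Event types $E$
• Event $(A, t)$ where $A \in E$ and $t$ is a time
• Sequence $S = ((A_1, t_1), \ldots, (A_n, t_n))$
• Frequent episode $F = (A_i, \ldots, A_j)$ where
  – occurrences $((A_i, t_l), \ldots, (A_j, t_m))$ in $S$ satisfy $t_1 \le t_l < \cdots < t_m \le t_n$ and $t_m - t_l \le \mathit{window}$
  – $\mathrm{count}\big(((A_i, t_l), \ldots, (A_j, t_m))\big) \ge \mathit{support}$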
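One literal reading of this definition, sketched below: count every occurrence of F's event types in order, with strictly increasing times and span t_m − t_l ≤ window, and call F frequent if the count reaches the support threshold. Counting every occurrence, overlapping ones included, is an assumption; the Mannila et al. paper instead measures frequency as the fraction of windows that contain the episode. The names are hypothetical, and the recursive search is exponential in the worst case, so it is meant only for small examples.

```python
from typing import List, Sequence, Tuple

Event = Tuple[str, float]  # (event type A, time t)

def count_occurrences(sequence: List[Event], episode: Sequence[str], window: float) -> int:
    """Number of occurrences ((A_i, t_l), ..., (A_j, t_m)) of a nonempty episode
    with t_l < ... < t_m and t_m - t_l <= window."""
    events = sorted(sequence, key=lambda e: e[1])

    def extend(pos: int, idx: int, t_first: float, t_prev: float) -> int:
        # Count the ways to match episode[idx:] using events[pos:].
        if idx == len(episode):
            return 1
        total = 0
        for j in range(pos, len(events)):
            etype, t = events[j]
            if t - t_first > window:
                break  # events are time-sorted, so every later one is too far as well
            if etype == episode[idx] and t > t_prev:
                total += extend(j + 1, idx + 1, t_first, t)
        return total

    # Assumption: each matching start event begins its own occurrences, even if they overlap.
    return sum(extend(i + 1, 1, t, t)
               for i, (etype, t) in enumerate(events) if etype == episode[0])

def is_frequent(sequence: List[Event], episode: Sequence[str],
                window: float, support: int) -> bool:
    return count_occurrences(sequence, episode, window) >= support

# Hypothetical event sequence.
S = [("A", 1), ("B", 2), ("A", 3), ("C", 4), ("B", 5)]
print(count_occurrences(S, ("A", "B"), window=2))  # 2
```

With window 2, the pairs (A,1),(B,2) and (A,3),(B,5) qualify, while (A,1),(B,5) spans 4 time units and does not.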

Applications/Issues in Security
• Frequent episodes in intrusion detection data
  – What does this tell us?
• Preventing the discovery of associations
  – Known items to protect
  – What if we don’t know what we want to protect?