Frequent Pattern Mining
Toon Calders, Bart Goethals
ADReM research group

2 Outline
What is data mining?
- Definition
- Local patterns vs. global models
- Supervised vs. unsupervised
- What do we do?
Frequent set mining
More complex data types

3 What is data mining?
[Figure: Data → Information → $ $ $]
"The use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets."

4 Supervised vs. Unsupervised
Supervised:
- the data has been annotated
- well-defined task: learn to annotate new data
E.g., examples of good and bad customers
Unsupervised:
- only the data is given
- no annotation
- "find knowledge"

5 Local vs. Global
Local pattern:
- tells something about a small subset of the data
E.g., "90% of the customers who purchase beer also buy chips"
Global model:
- fits a global model to the data: a summary of all of it
E.g., there is a linear relationship between the amount customers spend and their income

6 What do we do?
Pattern mining:
- local
- unsupervised
Useful for:
- large datasets
- exploration: "what is this data like?"
Less suitable for:
- well-studied and well-understood problem domains

7 Outline
What is data mining?
Frequent set mining
- Market basket analysis
- Association rules
- Interestingness measures
- Numerical attributes
More complex data types

8 Market Basket Analysis
Data: a collection of customer transactions, each one the set of products a customer bought
Goal: find sets of products that frequently occur together
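"Frequently occurring together" is usually made precise with support: the fraction of transactions that contain every product of a set. The minimal sketch below computes support and naively counts all product pairs; the transactions are hypothetical toy data, and the function names `support` and `frequent_pairs` are illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transaction database: each transaction is one customer's basket.
transactions = [
    {"beer", "chips", "nuts"},
    {"beer", "chips"},
    {"beer", "chips", "nuts", "bread"},
    {"bread", "milk"},
    {"beer", "nuts"},
]


def support(itemset, db):
    """Fraction of transactions in `db` that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in db if itemset <= t) / len(db)


def frequent_pairs(db, min_support):
    """Count each 2-item subset of each transaction; keep those with enough support."""
    counts = Counter()
    for t in db:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {pair: c / len(db) for pair, c in counts.items() if c / len(db) >= min_support}


print(support({"beer", "chips"}, transactions))  # 0.6
print(frequent_pairs(transactions, 0.4))
```

Counting only pairs keeps the example small; larger sets need the smarter search strategies the next slide alludes to.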

9 Applications
Supermarket:
- product placement
- special promotions
Web search:
- which keywords often occur together in web pages?
Health care:
- frequent sets of symptoms for a disease

10 Applications
Basically works for all data that can be represented as a set of examples/objects having certain properties:
- patients / symptoms
- movies / ratings
- web pages / keywords
- baskets / products
- …

11 Algorithms
Computationally a very hard problem:
- with n products, there are 2^n possible sets of products
Hundreds of algorithms have been proposed, tailored to:
- sparse or dense data
- many rows or many columns
- data that does or does not fit in memory
- …
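One classic way to avoid enumerating all 2^n sets is the level-wise Apriori strategy: a set can only be frequent if all of its subsets are, so candidates of size k are built only from frequent sets of size k-1. The sketch below is a simplified Apriori-style miner on hypothetical toy data; it is not necessarily the algorithm these slides have in mind, and it makes no attempt at the memory or I/O optimizations real implementations use.

```python
from itertools import combinations


def apriori(db, min_count):
    """Return {frozenset: count} for every itemset contained in at least
    `min_count` transactions, found level by level (Apriori-style)."""
    db = [frozenset(t) for t in db]
    # Level 1: count single items.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join: combine frequent (k-1)-itemsets into k-itemset candidates.
        prev = list(frequent)
        candidates = {a | b for i, a in enumerate(prev) for b in prev[i + 1:]
                      if len(a | b) == k}
        # Prune: a k-itemset can only be frequent if all its (k-1)-subsets are.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Count the surviving candidates against the data.
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy transactions.
transactions = [
    {"beer", "chips", "nuts"},
    {"beer", "chips"},
    {"beer", "chips", "nuts", "bread"},
    {"bread", "milk"},
    {"beer", "nuts"},
]
found = apriori(transactions, 2)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With five products there are 32 possible sets, but the pruning step means only a handful are ever counted.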

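12 Association Rules
Conditional probabilities: X ⇒ Y (c%): if X is in a transaction, then there is a probability of c% that Y is in it as well.
Based on the frequent sets, association rules can be computed easily: c% = support(X ∪ Y) / support(X), and both supports are already known once X ∪ Y is frequent, because every subset of a frequent set is frequent too.
- { Beer, Chips } ⇒ { Snack nuts } 75%
- { adrem.html, cnts.html } ⇒ { islab.html } 80%
- { rain } ⇒ { overcast } 100%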
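A minimal sketch of turning one frequent set into rules, assuming the support counts of the set and all of its subsets are available (as they are after frequent-set mining). The function name `rules_from_itemset` and the counts are hypothetical.

```python
from itertools import combinations


def rules_from_itemset(itemset, support_count, min_conf):
    """Generate rules X => Y with X ∪ Y == itemset and confidence >= min_conf.
    `support_count` maps frozensets to counts and must contain every
    non-empty subset of `itemset`."""
    itemset = frozenset(itemset)
    rules = []
    for r in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), r):
            lhs = frozenset(lhs)
            # confidence = support(X ∪ Y) / support(X)
            conf = support_count[itemset] / support_count[lhs]
            if conf >= min_conf:
                rules.append((lhs, itemset - lhs, conf))
    return rules


# Hypothetical support counts.
counts = {
    frozenset({"beer"}): 4,
    frozenset({"chips"}): 3,
    frozenset({"beer", "chips"}): 3,
}
for lhs, rhs, conf in rules_from_itemset({"beer", "chips"}, counts, 0.7):
    print(sorted(lhs), "=>", sorted(rhs), f"{conf:.0%}")
```

No extra pass over the data is needed: every confidence is a ratio of two counts that the miner already produced.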

13 Interestingness Measures
Not all association rules are interesting:
- Domain knowledge: pregnant ⇒ female, rain ⇒ overcast
- Redundancy: if A ⇒ B (100%), then also AC ⇒ B, AD ⇒ B, …
- Independence: if 70% of customers buy product A, then X ⇒ A (70%) and Y ⇒ A (70%) say nothing about X or Y
- Too many rules
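The independence problem can be quantified by comparing a rule's confidence with the overall frequency of its right-hand side. The sketch below uses lift for this; lift is a common measure that the slide does not name, so treat it as one possible choice. The counts are hypothetical and mirror the 70% situation on the slide.

```python
def confidence(n_xy, n_x):
    """conf(X => Y): fraction of transactions containing X that also contain Y."""
    return n_xy / n_x


def lift(n_xy, n_x, n_y, n):
    """conf(X => Y) divided by the overall frequency of Y.
    lift == 1: X and Y co-occur exactly as often as independence predicts;
    lift > 1: positive association; lift < 1: negative association."""
    return confidence(n_xy, n_x) / (n_y / n)


# Hypothetical counts: 1000 customers, 700 buy A, 200 buy X, 140 buy both.
print(confidence(140, 200))        # 0.7
print(lift(140, 200, 700, 1000))   # 1.0
```

A 70% confidence looks strong in isolation, but a lift of 1.0 shows that buying X does not change the chance of buying A.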

14 Interestingness Measures
Incorporating background knowledge:
- e.g., via a Bayesian network
- only produce rules that deviate from the background knowledge
Redundancies:
- condensed representations: produce only a non-redundant subset of the patterns
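Closed itemsets are one well-known condensed representation (the slide does not say which ones it means): a frequent set is closed if no proper superset has the same support. Keeping only the closed sets loses nothing, because the support of any frequent set equals the largest support among its closed supersets. A minimal sketch, with hypothetical support counts:

```python
def closed_itemsets(frequent):
    """Keep only itemsets with no proper superset of equal support.
    `frequent` maps frozensets to support counts and must contain all frequent sets."""
    return {s: c for s, c in frequent.items()
            if not any(s < t and c == frequent[t] for t in frequent)}


def support_from_closed(itemset, closed):
    """Recover the support of a frequent itemset from the closed sets alone."""
    itemset = frozenset(itemset)
    return max(c for s, c in closed.items() if itemset <= s)


# Hypothetical output of a frequent-itemset miner (support counts).
frequent = {
    frozenset({"beer"}): 4,
    frozenset({"chips"}): 3,
    frozenset({"nuts"}): 3,
    frozenset({"bread"}): 2,
    frozenset({"beer", "chips"}): 3,
    frozenset({"beer", "nuts"}): 3,
    frozenset({"chips", "nuts"}): 2,
    frozenset({"beer", "chips", "nuts"}): 2,
}
closed = closed_itemsets(frequent)
print(len(frequent), "frequent sets,", len(closed), "closed sets")
print(support_from_closed({"chips"}, closed))
```

`support_from_closed` only makes sense for itemsets that are actually frequent; for any other set there may be no closed superset and `max` raises `ValueError`.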

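15 Interestingness Measures
Independence:
- statistical significance tests (χ²)
- careful with conclusions!! With 1000 tests at significance level 0.05, about 50 rules are expected to pass by chance alone, even if none reflects a real dependency (hence the Bonferroni correction: test each rule at 0.05 / 1000)
Too many rules:
- constraints
- top-k mining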
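For a rule X ⇒ Y, the χ² test compares the observed 2×2 table of "contains X / contains Y" counts with the table expected under independence. The sketch below computes Pearson's χ² statistic, its p-value for one degree of freedom (the survival function equals erfc(sqrt(χ²/2))), and a Bonferroni filter. The function names and counts are hypothetical; a statistics library would normally be used instead.

```python
import math


def chi2_2x2(n_xy, n_x, n_y, n):
    """Pearson chi-square statistic for independence of X and Y, from counts."""
    observed = [
        [n_xy, n_x - n_xy],                  # X present: Y present / absent
        [n_y - n_xy, n - n_x - n_y + n_xy],  # X absent:  Y present / absent
    ]
    row = [n_x, n - n_x]
    col = [n_y, n - n_y]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def p_value_1df(chi2):
    """P(chi-square with 1 degree of freedom > chi2)."""
    return math.erfc(math.sqrt(chi2 / 2))


def bonferroni_significant(tests, alpha=0.05):
    """With m tests, call a result significant only if its p-value < alpha / m."""
    m = len(tests)
    return [name for name, p in tests if p < alpha / m]


# Hypothetical counts: 1000 customers, 700 buy A, 200 buy X, 140 buy both.
print(chi2_2x2(140, 200, 700, 1000))  # 0.0
print(bonferroni_significant([("r1", 1e-6), ("r2", 0.01), ("r3", 0.04)]))  # ['r1', 'r2']
```

Without the correction, r3 would also pass at 0.05; with three tests the threshold drops to about 0.0167.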

16 Numerical Attributes
Association rule mining is also possible for numerical attributes:
- discretization: make continuous attributes ordinal
  - information loss
  - not appropriate if the order between the values is important
- other methods: a recent method based on rank correlation measures
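A minimal sketch of discretization, assuming equal-width binning (one simple scheme among many; the slide does not prescribe one). Each numeric value becomes an item such as "age=bin0" that a frequent-set miner can treat like a product. The attribute name and values are hypothetical.

```python
def equal_width_bins(values, k):
    """Map each numeric value to a bin index 0..k-1 of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def to_items(attribute, values, k):
    """Turn a numeric column into items like 'age=bin2' for itemset mining."""
    return [f"{attribute}=bin{b}" for b in equal_width_bins(values, k)]


print(to_items("age", [20, 25, 40, 60], 4))
# ['age=bin0', 'age=bin0', 'age=bin2', 'age=bin3']
```

The output illustrates the information loss mentioned on the slide: 20 and 25 become the same item, and the miner no longer knows that bin0 lies below bin2.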

17 Complex Patterns
Sets, sequences, graphs, relational structures
Generating and counting such patterns becomes much more complex too!

18 Sequences CGATGGGCCAGTCGATACGTCGATGCCGATGTCACGA

19 Patterns in Sequences
- Substrings
- Regular expressions, e.g. (bb|[^b]{2})
- Partial orders
- Directed acyclic graphs
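The simplest of these pattern types, substrings, can be mined by counting every length-k substring. The sketch below applies this to the DNA sequence from the previous slide; counting overlapping occurrences and the name `frequent_substrings` are assumptions made for illustration.

```python
from collections import Counter


def frequent_substrings(seq, k, min_count):
    """Count every length-k substring (overlapping occurrences) and keep the frequent ones."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {s: c for s, c in counts.items() if c >= min_count}


dna = "CGATGGGCCAGTCGATACGTCGATGCCGATGTCACGA"
print(frequent_substrings(dna, 3, 4))  # {'CGA': 5, 'GAT': 4}
```

Richer pattern languages such as regular expressions or partial orders need far more elaborate candidate generation and counting than this single pass.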

20 Graphs

21 Patterns in Graphs

22 Rules
[Figure: graph patterns annotated with their frequencies, e.g. f: 5, f: 8, f: 4, f: 7]

23 Relational Databases

24 Patterns in RDBs
Queries
Query 1:
  SELECT L.drinker, V.bar
  FROM Likes L, Visits V
  WHERE V.drinker = L.drinker
    AND L.beer = 'Duvel'

25 Patterns in RDBs
Query 2:
  SELECT L.drinker, V.bar
  FROM Likes L, Visits V, Serves S
  WHERE V.drinker = L.drinker
    AND L.beer = 'Duvel'
    AND S.bar = V.bar
    AND S.beer = 'Duvel'

26 Patterns in RDBs
Association rule: Query 1 ⇒ Query 2
"If a person who likes Duvel visits a bar, then that bar serves Duvel."
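The confidence of a query rule can be computed the same way as for itemsets: the fraction of answers to the body query that are also answers to the head query. The sketch below runs both queries against a small in-memory SQLite database; the table contents are hypothetical, and only the schema (Likes, Visits, Serves) is taken from the slides.

```python
import sqlite3

# Hypothetical toy database using the schema from the queries above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Likes  (drinker TEXT, beer TEXT);
    CREATE TABLE Visits (drinker TEXT, bar TEXT);
    CREATE TABLE Serves (bar TEXT, beer TEXT);
    INSERT INTO Likes  VALUES ('Alice', 'Duvel'), ('Bob', 'Duvel'), ('Carol', 'Orval');
    INSERT INTO Visits VALUES ('Alice', 'Bar1'), ('Alice', 'Bar2'), ('Bob', 'Bar1'), ('Carol', 'Bar2');
    INSERT INTO Serves VALUES ('Bar1', 'Duvel'), ('Bar2', 'Orval');
""")

QUERY_1 = """
    SELECT L.drinker, V.bar
    FROM Likes L, Visits V
    WHERE V.drinker = L.drinker AND L.beer = 'Duvel'
"""
QUERY_2 = """
    SELECT L.drinker, V.bar
    FROM Likes L, Visits V, Serves S
    WHERE V.drinker = L.drinker AND L.beer = 'Duvel'
      AND S.bar = V.bar AND S.beer = 'Duvel'
"""


def rule_confidence(conn, body, head):
    """Confidence of body => head: share of distinct body answers that are also head answers."""
    body_rows = set(conn.execute(body).fetchall())
    head_rows = set(conn.execute(head).fetchall())
    return len(body_rows & head_rows) / len(body_rows)


print(rule_confidence(conn, QUERY_1, QUERY_2))
```

Here Alice also visits Bar2, which does not serve Duvel, so only two of the three (drinker, bar) pairs satisfy the rule and its confidence is 2/3.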
