G54DMT – Data Mining Techniques and Applications Dr. Jaume Bacardit


G54DMT – Data Mining Techniques and Applications Dr. Jaume Bacardit Topic 3: Data Mining Lecture 5: Regression and Association Rules Some slides from chapter 5 of Data Mining: Concepts and Techniques by Han & Kamber

Outline of the lecture Regression – Definition – Representations Association rules – Definition – Methods Resources

Regression Regression problems are supervised problems where the output variable is continuous Many techniques with different names are included in this category – Regression – Function approximation – Modelling – Curve-fitting Given an input vector X and a corresponding output y, we want to find a function f such that y’=f(X) is as close as possible to the true y

Evaluating regression Supervised learning: we know the true outputs, so we check how different they are from the predicted ones – Mean Absolute Error – Mean Squared Error – Root Mean Squared Error
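
To make the three metrics concrete, here is a minimal sketch in Python (my own illustration, not code from the lecture; the function name regression_errors is made up):

import numpy as np

def regression_errors(y_true, y_pred):
    # Compare the known true outputs against the predicted ones
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))   # Mean Absolute Error
    mse = np.mean(errors ** 2)      # Mean Squared Error
    rmse = np.sqrt(mse)             # Root Mean Squared Error
    return mae, mse, rmse

# Example: predictions slightly off from the true outputs
print(regression_errors([1.0, 2.0, 3.0], [1.1, 1.9, 3.4]))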

Linear Regression Most classic (and widespread in statistics) type of regression f(X) is modelled as – y' = w0 + w1·x1 + w2·x2 + … + wn·xn

Linear regression Simple but limited in expressive power – The same model would apply to these four datasets

Linear regression How to find W? – Many mathematical methods available: Least squares, Ridge regression, Lasso, etc. – We can also use some kind of metaheuristic (e.g. a Genetic Algorithm)
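
As an illustration of the least-squares option, a minimal sketch with NumPy (my own example, not the lecture's code; the bias term w0 is handled by appending a column of ones):

import numpy as np

# Toy dataset generated from y = 2 + 3*x1 - 1*x2 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = 2 + 3 * X[:, 0] - 1 * X[:, 1] + rng.normal(0, 0.05, size=100)

# Append a column of ones so that w0 becomes part of the weight vector W
X1 = np.hstack([np.ones((X.shape[0], 1)), X])

# Ordinary least squares: minimise ||X1·W - y||^2
w, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(w)  # approximately [2, 3, -1]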

Polynomial regression More complex and sophisticated functions – y = w0 + w1·x + w2·x² + … – y = w0 + w1·x1 + w2·x2 + w3·x1·x2 + … Now the job is twofold – Choosing the correct function (human inspection may help) – Adjusting the weights of the model Still, would a single mathematical function fit any type of data?
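
A hedged sketch of the first polynomial form: expanding the input into polynomial features turns the problem back into a linear one in the weights, so the same least-squares machinery applies (toy data and names are my own):

import numpy as np

# Toy 1-D dataset generated from y = 1 - 2x + 0.5x^2 plus noise
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=80)
y = 1 - 2 * x + 0.5 * x ** 2 + rng.normal(0, 0.1, size=80)

# Expand the single input into the features [1, x, x^2];
# the model is still linear in w0, w1, w2
Phi = np.column_stack([np.ones_like(x), x, x ** 2])
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)  # approximately [1, -2, 0.5]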

Piece-wise regression The input space is partitioned into regions A local regression model is generated from the training examples that fall inside each region – Approximating a sine function with linear regressions (Butz, 2010)
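
A minimal sketch of the idea behind the sine example: split the input range into fixed intervals and fit an independent linear model in each (real systems such as XCSF or LWPR also learn the partition itself; here it is fixed by hand for illustration):

import numpy as np

# Approximate sin(x) on [0, 2*pi] with one linear model per region
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 2 * np.pi, size=400))
y = np.sin(x)

n_regions = 8
edges = np.linspace(0, 2 * np.pi, n_regions + 1)
y_hat = np.empty_like(y)

for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (x >= lo) & (x <= hi)
    # Local linear regression on the training examples in this region
    A = np.column_stack([np.ones(mask.sum()), x[mask]])
    w, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
    y_hat[mask] = A @ w

print("RMSE of the piece-wise fit:", np.sqrt(np.mean((y - y_hat) ** 2)))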

Piece-wise regression How to partition the input space – Using a series of rules With a (hyper)rectangular condition (XCSF) With a (hyper)ellipsoidal condition (XCSF, LWPR) With a neural condition (XCSF) – Using a tree-like structure (CART, M5) How to perform the regression process for each local approximation – Pick any of the functions discussed before – Plus some truly non-linear methods (SVR)

Piece-wise approximation with hyperellipsoids Using XCSF (Wilson, 2002) with hyperellipsoid conditions (Butz et al., 2008) – Figure: test function and XCSF's population (Stalph et al., 2010)

Other regression methods Neural networks – An MLP is natively a regression method Classification is done by discretising the output of the network – It is proven that an MLP with enough hidden nodes can approximate any function Support Vector Regression – As in SVM, depending on the kernel we get linear or non-linear regression – The margin specifies a tube around the approximated function. All points inside the tube have their errors ignored – Support Vectors are the points that lie outside the tube
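
The "tube" of Support Vector Regression corresponds to the epsilon-insensitive loss; a tiny sketch of that loss (my own illustration, not an SVR implementation):

import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, eps=0.1):
    # Errors smaller than eps (inside the tube) cost nothing;
    # only points outside the tube contribute to the loss
    residual = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.maximum(0.0, residual - eps)

print(epsilon_insensitive_loss([1.0, 2.0, 3.0], [1.05, 2.5, 3.0]))
# -> [0.  0.4 0. ]  (only the middle point lies outside the tube)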

Association Rules Association rule mining tries to find patterns of items that frequently appear together in the dataset It can use the class label but it does not have to, so we can consider it an unsupervised learning paradigm Two types of elements are generated – Association rules: They have an antecedent and a consequent – Frequent itemsets: They just have an antecedent Both antecedent and consequent are logic predicates (generally of conjunctive form)

Association rule mining example (Witten and Frank, 2005)

Origin of Association Rules These methods were originally employed to analyse shopping carts The database is specified as a set of transactions, each of which includes one or more items from a set of items A frequent itemset is a set of items that appears in many transactions These databases are extremely sparse

Tid | Items
10  | A, C, D
20  | B, C, E
30  | A, B, C, E
40  | B, E
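
The transaction database above can be represented directly as sets; a small sketch (my own code) showing how to count in how many transactions an itemset appears:

# The example transaction database from the slide, as Python sets
transactions = {
    10: {"A", "C", "D"},
    20: {"B", "C", "E"},
    30: {"A", "B", "C", "E"},
    40: {"B", "E"},
}

def count_itemset(itemset, db):
    # Number of transactions that contain every item of the itemset
    itemset = set(itemset)
    return sum(1 for items in db.values() if itemset <= items)

# {B, E} appears in 3 of the 4 transactions
print(count_itemset({"B", "E"}, transactions))  # 3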

Beers and diapers An urban myth about association rules says that, when they were applied to analyze a very large volume of shopping carts, they discovered a very simple pattern – “Customers that buy beer also tend to buy diapers” This story has changed over time. You can find an article about it here It is a good example of data mining, as it was able to find an unexpected pattern

Why Is Freq. Pattern Mining Important? Discloses an intrinsic and important property of data sets Forms the foundation for many essential data mining tasks – Association, correlation, and causality analysis – Sequential, structural (e.g., sub-graph) patterns – Pattern analysis in spatiotemporal, multimedia, time-series, and stream data – Classification: associative classification – Cluster analysis: frequent pattern-based clustering – Data warehousing: iceberg cube and cube-gradient – Semantic data compression: fascicles – Broad applications

Evaluation of association rules Support – Percentage of examples covered by the predicate in the antecedent – Applies to both association rules and frequent itemsets Confidence – Percentage of the examples matched by the antecedent that also match the consequent – Only applies to association rules Typically, the user specifies a minimum support and confidence and the algorithm finds all rules above these thresholds
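
On the small transaction database used earlier, support and confidence can be computed as follows (a minimal sketch of my own, using the standard ratio definition of confidence):

transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]

def support(itemset, db):
    # Fraction of transactions that contain the whole itemset
    itemset = set(itemset)
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(antecedent, consequent, db):
    # Of the transactions matched by the antecedent, the fraction
    # that also match the consequent
    return support(set(antecedent) | set(consequent), db) / support(antecedent, db)

# Rule {B} -> {E}: support of {B, E} is 3/4, confidence is 1.0
print(support({"B", "E"}, transactions), confidence({"B"}, {"E"}, transactions))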

Scalable Methods for Mining Frequent Patterns The downward closure property of frequent patterns – Any subset of a frequent itemset must be frequent – If {beer, diaper, nuts} is frequent, so is {beer, diaper} – i.e., every transaction having {beer, diaper, nuts} also contains {beer, diaper} Scalable mining methods: Three major approaches – Apriori (Agrawal & Srikant, VLDB'94) – Frequent pattern growth (FP-growth: Han, Pei & Yin, SIGMOD'00) – Vertical data format approach (Charm: Zaki & Hsiao, SDM'02)

Apriori: A Candidate Generation-and-Test Approach Apriori pruning principle: If there is any itemset which is infrequent, its superset should not be generated/tested! (Agrawal & Srikant, VLDB'94; Mannila et al., KDD'94) Method: – Initially, scan the DB once to get the frequent 1-itemsets – Generate length (k+1) candidate itemsets from length k frequent itemsets – Test the candidates against the DB – Terminate when no frequent or candidate set can be generated

The Apriori Algorithm – An Example (Sup_min = 2)

Database TDB:
Tid | Items
10  | A, C, D
20  | B, C, E
30  | A, B, C, E
40  | B, E

1st scan → C1 (candidate 1-itemsets with support): {A}: 2, {B}: 3, {C}: 3, {D}: 1, {E}: 3
L1 (frequent 1-itemsets): {A}: 2, {B}: 3, {C}: 3, {E}: 3

C2 (candidate 2-itemsets): {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}
2nd scan → C2 with supports: {A,B}: 1, {A,C}: 2, {A,E}: 1, {B,C}: 2, {B,E}: 3, {C,E}: 2
L2 (frequent 2-itemsets): {A,C}: 2, {B,C}: 2, {B,E}: 3, {C,E}: 2

C3 (candidate 3-itemsets): {B,C,E}
3rd scan → L3 (frequent 3-itemsets): {B,C,E}: 2

The Apriori Algorithm Pseudo-code:

Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ⋃k Lk;
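
A compact runnable Python version of this pseudo-code (my own sketch rather than the original authors' implementation); on the worked example above with min_support = 2 it reproduces L1, L2 and L3:

from itertools import combinations

def apriori(transactions, min_support):
    # Return all frequent itemsets (as frozensets) with their support counts
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}   # candidate 1-itemsets (C1)
    frequent = {}
    k = 1
    while current:
        # Scan the database to count each candidate
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}  # Lk
        frequent.update(level)
        # Generate (k+1)-candidates by joining frequent k-itemsets and
        # prune those with an infrequent k-subset (downward closure)
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
for itemset, count in sorted(apriori(db, 2).items(),
                             key=lambda e: (len(e[0]), sorted(e[0]))):
    print(sorted(itemset), count)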

Resources “The Elements of Statistical Learning” by Hastie et al. contains a lot of detail about statistical regression Lists of regression and association rule methods are available in KEEL Weka also contains both kinds of methods Chapter 5 of the Han and Kamber book is all about association rules (Han created the FP-growth method) A review of evolutionary algorithms for association rule mining is also available

Questions?