A Research Oriented Study Report By :- Akash Saxena

Slides:



Advertisements
Similar presentations
Association Rules Evgueni Smirnov.
Advertisements

Association Rule and Sequential Pattern Mining for Episode Extraction Jonathan Yip.
Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.
DATA MINING Association Rule Discovery. AR Definition aka Affinity Grouping Common example: Discovery of which items are frequently sold together at a.
Association Rules Spring Data Mining: What is it?  Two definitions:  The first one, classic and well-known, says that data mining is the nontrivial.
LOGO Association Rule Lecturer: Dr. Bo Yuan
10 -1 Lecture 10 Association Rules Mining Topics –Basics –Mining Frequent Patterns –Mining Frequent Sequential Patterns –Applications.
Nadia Andreani Dwiyono DESIGN AND MAKE OF DATA MINING MARKET BASKET ANALYSIS APLICATION AT DE JOGLO RESTAURANT.
Data Mining Techniques Cluster Analysis Induction Neural Networks OLAP Data Visualization.
Rakesh Agrawal Ramakrishnan Srikant
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
732A02 Data Mining - Clustering and Association Analysis ………………… Jose M. Peña Association rules Apriori algorithm FP grow algorithm.
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rules Presented by: Anilkumar Panicker Presented by: Anilkumar Panicker.
6/23/2015CSE591: Data Mining by H. Liu1 Association Rules Transactional data Algorithm Applications.
DATA MINING -ASSOCIATION RULES-
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Fast Algorithms for Association Rule Mining
Lecture14: Association Rules
Mining Association Rules
Pattern Recognition Lecture 20: Data Mining 3 Dr. Richard Spillman Pacific Lutheran University.
Data Mining By Andrie Suherman. Agenda Introduction Major Elements Steps/ Processes Tools used for data mining Advantages and Disadvantages.
Bayesian Decision Theory Making Decisions Under uncertainty 1.
『 Data Mining 』 By Jung, hae-sun. 1.Introduction 2.Definition 3.Data Mining Applications 4.Data Mining Tasks 5. Overview of the System 6. Data Mining.
Mining Association Rules between Sets of Items in Large Databases presented by Zhuang Wang.
1 Apriori Algorithm Review for Finals. SE 157B, Spring Semester 2007 Professor Lee By Gaurang Negandhi.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Association Rules Mining in Distributed Environments By: Shamila Mafazi Supervised by: Dr. Abrar Haider.
Introduction of Data Mining and Association Rules cs157 Spring 2009 Instructor: Dr. Sin-Min Lee Student: Dongyi Jia.
CS 8751 ML & KDDSupport Vector Machines1 Mining Association Rules KDD from a DBMS point of view –The importance of efficiency Market basket analysis Association.
Association Rule Mining Data Mining and Knowledge Discovery Prof. Carolina Ruiz and Weiyang Lin Department of Computer Science Worcester Polytechnic Institute.
Part II - Association Rules © Prentice Hall1 DATA MINING Introductory and Advanced Topics Part II – Association Rules Margaret H. Dunham Department of.
Association Rule Mining
DATA MINING By Cecilia Parng CS 157B.
CS Data Mining1 Data Mining The Extraction of useful information from data The automated extraction of hidden predictive information from (large)
CURE Clustering Using Representatives Handles outliers well. Hierarchical, partition First a constant number of points c, are chosen from each cluster.
DATA MINING PREPARED BY RAJNIKANT MODI REFERENCE:DOUG ALEXANDER.
M. Sulaiman Khan Dept. of Computer Science University of Liverpool 2009 COMP527: Data Mining ARM: Improvements March 10, 2009 Slide.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Association Rules Carissa Wang February 23, 2010.
Chap 6: Association Rules. Rule Rules!  Motivation ~ recent progress in data mining + warehousing have made it possible to collect HUGE amount of data.
Chapter 3 Data Mining: Classification & Association Chapter 4 in the text box Section: 4.3 (4.3.1),
Data Resource Management – MGMT An overview of where we are right now SQL Developer OLAP CUBE 1 Sales Cube Data Warehouse Denormalized Historical.
Data Mining Find information from data data ? information.
TITLE What should be in Objective, Method and Significant
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Association rule mining
Association Rules Repoussis Panagiotis.
William Norris Professor and Head, Department of Computer Science
Adrian Tuhtan CS157A Section1
Market Basket Analysis and Association Rules
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining II: Association Rule mining & Classification
Data Mining Association Rules Assoc.Prof.Songül Varlı Albayrak
Association Rule Mining
Transactional data Algorithm Applications
Data Mining Association Analysis: Basic Concepts and Algorithms
I don’t need a title slide for a lecture
Association Rule Mining
Farzaneh Mirzazadeh Fall 2007
Unit 3 MINING FREQUENT PATTERNS ASSOCIATION AND CORRELATIONS
Market Basket Analysis and Association Rules
©Jiawei Han and Micheline Kamber
Association Rule Mining
Association Analysis: Basic Concepts
Presentation transcript:

A Research Oriented Study Report By :- Akash Saxena 121030931 Data Mining A Research Oriented Study Report By :- Akash Saxena 121030931

Organization of the Presentation General Overview of Data Mining Concept A detailed study of Association Rule Mining Concepts and Algorithms

Basic Data Mining Introduction General Overview Basic Data Mining Introduction

A View Into The World of Data Mining Introduction What is Data Mining? Why Data Mining? Various Techniques of Data Mining

INTRODUCTION Data Abundance Services for Customer Importance of Knowledge Discovery

What is Data Mining? “ It is defined as the process of discovering patterns in data. The process must be automatic or semiautomatic. The pattern discovered must be meaningful, in that they lead to some advantage, usually economic advantage. The data is invariably present in substantial quantities.”

In easier terms…. Data mining is about extracting useful knowledge from large databases. Further simply, Data miner analyzes historical data, discovers some patterns in them and these help human analyst (semi-automatic) or automated decision making tool to predict an apt outcome for a future situation.

Why Data Mining? Automation has beaten Manual Human Analyst efforts Increasing Competition in the market Accurate, Fast, Unexpected predictions having enormous effect on economy.

Data Mining Application Areas Business Transactions E-commerce Scientific Study Health Care Study Web Study Crime Detection Loan Delinquency Banking

Data Mining Techniques Association Rule Mining Cluster Analysis Classification Rule Mining Frequent Episodes Deviation Detection/ Outlier Analysis Genetic Algorithms Rough Set Techniques Support Vector Machines

Association Rule Mining Going Specific

What will I cover… Basic terminology in Association DM Market Basket Analysis Algorithms Apriori Algorithm Partition Algorithm Pincer Search Algorithm Dynamic Itemset Counting Algorithm FP Tree growth Algorithm

Basic Association DM Terms Support It is the percentage of records containing an item combination compared to total number of records. Confidence It is the support of an item combination divided by support for a condition. We actually measure how confident can we be, given that a customer has purchased one product, that he will also purchase another product. Association Rule It is a rule of the form X=›Y showing an association between X and Y that if X occurs then Y will occur. It is accompanied by a confidence % in the rule.

Basic Association DM Terms Itemset It is a set of items in a transaction. K-itemset is a set of ‘k’ number of items. Frequent Itemset It is an itemset whose support in a transaction database is more than the minimum support specified. Maximal Frequent Itemset It is an itemset which is frequent and no superset of it is frequent. Border set It is an itemset if it is not frequent but all its proper subsets are frequent.

Basic Association DM Terms Downward Closure Property Any subset of a frequent itemset is frequent. Upward Closure Property Any superset of an infrequent itemset is infrequent.

Market Basket Analysis It is an analysis conducted to determine which products customers purchase together. Knowing this pattern of purchasing traits of customer can be very useful to a retail store or company. This can be very useful as once it is known that customers’ who buy product A are likely to buy product B, then company can market both A and B together. Thus making purchase of one product target prospects of another.

Market Basket Analysis For Example consider the following database: Transaction Products 1 Burger, Coke, Juice 2 Juice, Potato Chips 3 Coke, Burger 4 Juice, Ground Nuts 5 Coke, Ground Nuts Seeing this, there is no visible obvious rule or relationship between items in the buying patterns of the customers. Move ahead to see mining of relationships…

Example explanation Burger Juice Coke P chips G Nuts Burger 2 1 2 0 0 Above table shows how many times was one item purchased with other item. Central diagonal shows how items were purchased with themselves so we ignore it.

Giving view of terms in this example Association Rule: “If a customer purchases Coke, then he will probably purchase a burger” is an association rule associating Coke and Burger. Support: This rule has a support of 40%. records in which both burger and coke occur together = 2 total number of records = 5 support = 2/5 * 100 = 40% Confidence: Above rule has a confidence of 66% support for combination (Coke+Burger) is 40% support for condition (Coke) is 60% confidence = 40/60 * 100 = 66%

Giving view of terms in this example Itemset: {Coke, Burger} is a 2-itemset containing 2 items. {Coke} is a 1-itemset. Frequent Itemset: If Minimum support be 2 then {Coke}, {Juice}, {Burger}, {G Nuts}, {Coke, Burger} are frequent itemsets. Maximal Frequent Itemset: {Coke, Burger} is a maximal frequent itemset. This is confirmed by the fact that both its subsets viz. {Coke}, {Burger} are frequent too.

Apriori Algorithm This is a level wise algorithm developed by Dr R Aggarwal & Dr R Srikant. A set of frequent 1-itemsets is found. Then it is used to generate frequent 2-itemsets and these 2-itemsets are used to generate 3-itemsets and so on. It has two parts: -Joint Step Candidate Generation Process -Pruning Process

Apriori Algorithm continued… Input: Database D of transactions, Minimum Support Threshold σ Output: L, set of all Frequent itemsets in D. Initialize: k=1, C1=all 1-itemsets Read Database D to count support of C1 to determine L1 L1= {frequent 1-itemsets} k=2 // k represents pass number

While (Lk-1≠ Ø) do Begin Ck=Ø For all itemsets l1 є Lk-1do For all itemsets l2 є Lk-1do if (l1[1] = l2[1]) ^ (l1[2] = l2[2]) ^ …………. ^ (l1[k-2] = l2[k-2]) ^ (l1[k-1] < l2[k-1]) then c= l1[1], l1[2], l1[3],….., l1[k-1], l2[k-1] Ck= Ck U {c} for all c є Ck for all (k-1)-subsets of d of c do if d έ Lk-1 then Ck= Ck / {c}

For all transactions t є D do Increment count of all candidates in Ck that are contained in t. Lk = All candidates in Ck with minimum support K=k+1 End Answer: Uk Lk

Pincer Search Algorithm This algorithm is one of the fastest Apriori based algorithm which implement horizontal mining. It was developed by Dao I Lin and Z M Kedem. It uses the Apriori Method but makes it more efficient with the use of concept of Maximal Frequent Itemset thus combining both bottom-up (for frequent itemset generation) and top-down approach (for searching MFS).

L0= Ø, k=1, C1= {{i} | i є I}; S0= Ø MFCS = {all items}; MFS = Ø Do until Ck= Ø and Sk-1=Ø read the database and count support for Ck & MFCS MFS = MFS U {frequent itemsets in MFCS} Sk= {infrequent itemsets in Ck} if Sk≠ Ø for all itemsets s є Sk for all itemsets m є MFCS if s is a subset of m MFCS = MFCS \{m} for all items e є s if m \ {e} is not a subset of any itemset in MFCS MFCS = MFCS U {m \ {e}}

For all itemsets c in Lk If c is a subset of any itemset in current MFS Delete c from Lk Generate candidates from Ck+1 from Ck If any frequent itemset in Ck is removed from Lk Then for all itemsets l є Lk for all itemsets m є MFS if first k-1 items in l are also in m for i from k+1 to |m| Ck+1 = Ck+1 U {{l.item1 ,…., l.itemk, m.itemk} For all itemsets in Ck+1 If c I not a subset of any itemset in current MFCS Delete c from Ck+1 K=k+1 Answer : MFS

Thank You