Information Retrieval from Data Bases for Decisions
Dr. Gábor SZŰCS, Ph.D., Assistant Professor, BUTE, Department of Information and Knowledge Management



Contents
- Aims
- General steps in the procedure
- Market basket analysis
- Frequent itemsets
- Conclusion

Aims
- Search for hidden relationships in existing databases (DBs)
- Help to reach well-grounded decisions
Data mining techniques are able to find such relationships:
- they provide the ability to optimize decision-making
- they are among the most powerful tools for retrieving important information

Steps of data mining
1. Declare the key variable and the predictor variables to be analysed (sampling from a large amount of data).
2. Modify the variables: examine whether some variables should be integrated (large DBs always contain some errors) and execute any transformations that are needed.

Additional steps of data mining
3. Modelling with data mining techniques: neural networks, decision trees, regression procedures, cluster analysis, factor analysis, discriminant analysis, etc.
4. Comparison of the data mining models built on the same DB, so that the best model can be selected.
The procedure can be repeated cyclically. After the whole procedure, the hidden relationships between different aspects can be shown.

Market Basket Analysis
Market basket analysis is used for finding groups of items that tend to occur together. The models give the likelihood of different products being purchased together.
Market basket analysis is useful for finding:
1. items that occur together
2. items that occur in a particular sequence

Table of Co-Occurrence of Products
Rows and columns: Product 1, Product 2, Product 3, Product 4, Product 5 (cell values missing from the transcript)

Procedure of the market basket analysis
1. Choose the right level of the product hierarchy for the items.
2. Probabilities and joint probabilities of the items are calculated.
3. Determine the association rules.

Example
Items                                              Count
Bicycle (A)                                        140
Hand tools for bicycle (B)                         100
Tool rack (C)                                      61
Bicycle and hand tool (A & B)                      50
Bicycle and tool rack (A & C)                      7
Hand tool and tool rack (B & C)                    45
Bicycle and hand tool and tool rack (A & B & C)    5

Table of probabilities and joint probabilities of items
Items        Probability
A            14 %
B            10 %
C            6.1 %
A & B        5 %
A & C        0.7 %
B & C        4.5 %
A & B & C    0.5 %
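The probabilities in the table can be reproduced from the purchase counts of the example. The following is a minimal sketch, assuming a total of 1,000 transactions (the figure implied by 140 bicycle purchases corresponding to 14 %); the names `counts`, `TOTAL_TRANSACTIONS` and `probability` are illustrative and do not come from the slides.

```python
# Purchase counts taken from the example; keys are sets of item letters.
counts = {
    frozenset("A"): 140,
    frozenset("B"): 100,
    frozenset("C"): 61,
    frozenset("AB"): 50,
    frozenset("AC"): 7,
    frozenset("BC"): 45,
    frozenset("ABC"): 5,
}

# Assumption: 1,000 transactions in total (implied by 140 purchases = 14 %).
TOTAL_TRANSACTIONS = 1000


def probability(items):
    """Relative frequency of transactions containing all the given items."""
    return counts[frozenset(items)] / TOTAL_TRANSACTIONS


# Print every probability and joint probability as a percentage.
for itemset in counts:
    label = " & ".join(sorted(itemset))
    print(f"{label}: {probability(itemset):.1%}")
```

Keying the counts by `frozenset` makes the lookup independent of the order in which the items are named, so `probability("BC")` and `probability("CB")` return the same value.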

Association rules
The rules (A → B) consist of two parts:
1. condition
2. consequence
A confidence can be defined for the rules:
P(A → B) = P(A & B) / P(A)

Example
P(A → B) = 5 / 14 = 0.357
P((A&B) → C) = 0.005 / 0.05 = 0.1
P((A&C) → B) = 0.005 / 0.007 = 0.714
P((B&C) → A) = 0.005 / 0.045 = 0.111
Can this association rule help us? If we offer product A to everybody, then 14 % of the people will purchase it. If we offer A only to those who bought B and C, then 11 % of them will purchase it.
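The confidence values of this example can be checked with a few lines of code. This is a minimal sketch, assuming that the confidence of a rule is the joint probability of condition and consequence divided by the probability of the condition (which matches 5 / 14 for A → B); the dictionary `p` and the helpers `key` and `confidence` are illustrative names.

```python
# Probabilities from the table of probabilities, written as fractions.
p = {
    "A": 0.14, "B": 0.10, "C": 0.061,
    "AB": 0.05, "AC": 0.007, "BC": 0.045,
    "ABC": 0.005,
}


def key(items):
    """Normalise a string of item letters to the sorted key used in p."""
    return "".join(sorted(set(items)))


def confidence(condition, consequence):
    """Confidence of condition -> consequence: P(both) / P(condition)."""
    return p[key(condition + consequence)] / p[key(condition)]


# The four rules evaluated in the example.
for condition, consequence in [("A", "B"), ("AB", "C"), ("AC", "B"), ("BC", "A")]:
    value = confidence(condition, consequence)
    print(f"P({condition} -> {consequence}) = {value:.3f}")
```

The rule (B&C) → A comes out at roughly 0.111, which is the 11 % quoted in the example.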

Improvement
Improvement(condition → consequence) = P(condition → consequence) / P(consequence)
This helps us decide whether the association rule is useful or not.

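In our example
Improvement((B&C) → A) = 0.111 / 0.14 = 0.79
Improvement((A&B) → C) = 0.1 / 0.061 = 1.64
The value of improvement shows the usefulness of the rule:
a) improvement > 1: the rule predicts the consequence better than chance
b) improvement < 1: the rule predicts the consequence worse than chance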
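The improvement values can be computed in the same way. This is a minimal sketch, assuming that improvement is the confidence of the rule divided by the probability of the consequence, which is equivalent to the joint probability divided by the product of the two separate probabilities; the names `p`, `key` and `improvement` are illustrative.

```python
# Probabilities from the table of probabilities, written as fractions.
p = {
    "A": 0.14, "B": 0.10, "C": 0.061,
    "AB": 0.05, "AC": 0.007, "BC": 0.045,
    "ABC": 0.005,
}


def key(items):
    """Normalise a string of item letters to the sorted key used in p."""
    return "".join(sorted(set(items)))


def improvement(condition, consequence):
    """P(condition & consequence) / (P(condition) * P(consequence))."""
    joint = p[key(condition + consequence)]
    return joint / (p[key(condition)] * p[key(consequence)])


# The two rules evaluated in the example.
for condition, consequence in [("BC", "A"), ("AB", "C")]:
    value = improvement(condition, consequence)
    verdict = "better than chance" if value > 1 else "not better than chance"
    print(f"Improvement({condition} -> {consequence}) = {value:.2f} ({verdict})")
```

With these figures the rule (B&C) → A stays below 1, while (A&B) → C is above 1.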

Dissociation rules
- similar to association rules
- they also count the inverse of the original item (for example, "not A")
- to do this, modify each transaction: a transaction includes an inverse item if, and only if, it does not contain the original item
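The modification of the transactions can be sketched as follows. The function name `add_inverse_items`, the "not X" labelling and the sample baskets are hypothetical and chosen only for illustration.

```python
def add_inverse_items(transactions, items):
    """Extend each transaction with 'not X' for every item X it lacks."""
    extended = []
    for transaction in transactions:
        # An inverse item is added if, and only if, the original is absent.
        inverse = {f"not {item}" for item in items if item not in transaction}
        extended.append(set(transaction) | inverse)
    return extended


# Hypothetical baskets over the items A, B and C.
baskets = [{"A", "B"}, {"C"}]
for basket in add_inverse_items(baskets, ["A", "B", "C"]):
    print(sorted(basket))
```

After this step the ordinary association rule procedure can be run on the extended transactions, and rules containing a "not X" item describe dissociations.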

Time series
The transactions must have two additional features:
- time information (e.g. time sequence or time stamp)
- identifying information (e.g. customer ID, account number in a bank)

Frequent itemsets
- Problem: find the itemsets that appear in at least a fixed ratio of the transactions.
- A-priori trick: if a set of items S is frequent, then every subset of S is also frequent.
- The procedure is built from the lower level to the upper level (frequent items, frequent pairs, etc.).

A-Priori Algorithm
1. Define a threshold for relative frequency. All items are examined. The set of the frequent items: L1.
2. Pairs of items in L1 become the candidates (C2). These are compared with the threshold limit. L2 contains the frequent pairs.

A-Priori Algorithm (cont.)
3. The candidate triples (C3) are those sets {A,B,C} such that all of their two-item subsets are in L2. L3 will contain the frequent triples.
4. In general, Li is the set of frequent itemsets of size i, and C(i+1) is the set of candidates of size i+1. The steps are repeated until the sets become empty.
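The level-by-level procedure can be sketched in a few lines. This is a simplified, unoptimised illustration rather than a reference implementation: it rescans all transactions for every candidate, and the function name `apriori`, the parameter `min_support` and the sample baskets are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return [L1, L2, ...], the frequent itemsets grouped by size."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def frequent(candidates):
        # Keep the candidates whose relative frequency reaches the threshold.
        return {c for c in candidates
                if sum(c <= t for t in transactions) / total >= min_support}

    # L1: frequent single items.
    level = frequent({frozenset([item]) for t in transactions for item in t})
    levels = []
    size = 1
    while level:
        levels.append(level)
        size += 1
        # C(size): unions of two frequent sets that have exactly `size` items...
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # ...and whose subsets of size `size - 1` are all frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, size - 1))}
        level = frequent(candidates)
    return levels


# Hypothetical baskets; the threshold is a relative frequency of 50 %.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
for i, itemsets in enumerate(apriori(baskets, 0.5), start=1):
    print(f"L{i}:", sorted(sorted(s) for s in itemsets))
```

In this small example the triple {bread, butter, milk} never becomes a candidate, because the pair {butter, milk} is not frequent; this pruning is exactly what the a-priori trick provides.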

Criticism of A-Priori Algorithm
- good if we would like to know only the frequent pairs
- when searching for maximal frequent itemsets, too many steps may be needed
- limited by the physical capacity of computers

Market Basket Mining with High Correlation Analysis
The data are organised in a matrix. The cells contain Boolean values (1: yes, 0: no). This matrix is very sparse. We want to find the highly correlated pairs of columns.
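One way to look for such pairs is sketched below. The slides do not say which correlation measure is used, so this example assumes Jaccard similarity (rows where both columns contain 1, divided by rows where at least one does) purely for illustration. Because the matrix is sparse, each column is stored as the set of row indices that contain a 1; the names `jaccard` and `similar_column_pairs` and the sample data are hypothetical.

```python
from itertools import combinations


def jaccard(rows_a, rows_b):
    """Share of rows with a 1 in both columns among rows with a 1 in either."""
    union = rows_a | rows_b
    return len(rows_a & rows_b) / len(union) if union else 0.0


def similar_column_pairs(columns, threshold):
    """Return the column pairs whose similarity reaches the threshold."""
    return [(a, b) for a, b in combinations(sorted(columns), 2)
            if jaccard(columns[a], columns[b]) >= threshold]


# Hypothetical sparse matrix: column name -> set of rows that contain a 1.
columns = {
    "bicycle": {0, 1, 2, 3},
    "hand tools": {0, 1, 2},
    "tool rack": {4},
}
print(similar_column_pairs(columns, 0.7))
```

Comparing every pair of columns is only practical for small matrices; the sketch is meant to show the measure, not a scalable search.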

Applications of High Correlation Mining
1. Rows are the documents, columns are the words. The highly correlated pairs of columns give the words that almost always appear together.
2. Rows and columns are Web pages. A cell contains 1 if the page of the row links to the page of the column. Result: pages about the same topic.
3. The page of the column links to the page of the row. Result: the mirror pages.

Conclusion
- Planning store layout
- Bundling products
- Offering coupons

Future
Further development:
- hierarchical association rules
- association rules maintenance
- sequential pattern mining
- functional dependency mining

Thank you! The floor is open for discussion.
