

gR2002, Vienna, September 13th
Paolo Giudici, Faculty of Economics, University of Pavia
Research carried out within the laboratory: Statistical models for data mining (SMDM)

A small sample of web clickstream data (from a logfile)

Analysis of web clickstream data
1. In data matrix form (Giudici and Castelo, 2001; Blanc and Giudici, 2001):
   - association measures
   - association models (graphical association models)
2. In transactional data form (in this talk):
   - association and sequence rules
   - statistical models for sequences

Association measures and models
Based on data arranged in contingency-table form, for instance:
- odds ratios
- graphical loglinear models
- recursive logistic regression models
For a review, see Giudici, Applied Data Mining, Wiley, 2003.
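The odds ratio is the simplest of the association measures listed above. Below is a minimal sketch, assuming a 2×2 contingency table of two binary page-visit indicators. The function name `odds_ratio` and the counts are hypothetical and do not come from the SAS data.

```python
def odds_ratio(n11: int, n10: int, n01: int, n00: int) -> float:
    """Odds ratio of a 2x2 contingency table for two binary variables X and Y.

    n11: X=1, Y=1   n10: X=1, Y=0
    n01: X=0, Y=1   n00: X=0, Y=0
    """
    if n10 == 0 or n01 == 0:
        raise ValueError("odds ratio is undefined when an off-diagonal cell is zero")
    # Cross-product ratio; it equals 1 when X and Y are independent.
    return (n11 * n00) / (n10 * n01)


# Hypothetical counts for two pages over 100 sessions.
print(odds_ratio(n11=30, n10=10, n01=20, n00=40))  # 6.0
```

Values above 1 indicate that sessions visiting one page are more likely to visit the other.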

Association and sequence rules
Implemented in the main data mining software packages. Based on transactional databases, which arise for instance in:
- market basket analysis (order does not matter)
- web clickstream analysis (order matters)
Aim: search for itemsets (groups of events) that occur simultaneously with a high frequency.

$A_1, \dots, A_p$: $p$ binary random variables.
Itemset: a logical expression such as $A = (A_{j_1} = 1, \dots, A_{j_k} = 1)$, with $k < p$.
Association rule: a logical relationship between two itemsets, e.g. "if $A$, then $B$". Example: $A$ = (Milk, Coffee), $B$ = (Bread, Biscuits).
Sequence rule: the relationship is determined by a temporal order. Example: $A$ = (Home, Register), $B$ = (P_info).
Formally:
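The difference between the two kinds of rule shows up in how a single session is checked. Below is a minimal sketch with these assumptions:
- each session is an ordered list of page names;
- the pages within the antecedent of a sequence rule must also appear in the listed order, with gaps allowed.

The function names and the page "Catalog" are hypothetical.

```python
from typing import Sequence


def supports_association(session: Sequence[str], a: set[str], b: set[str]) -> bool:
    """Association rule view: order is ignored, only co-occurrence matters."""
    return (a | b) <= set(session)


def supports_sequence(session: Sequence[str], a: Sequence[str], b: Sequence[str]) -> bool:
    """Sequence rule view: the pages of A, then those of B, must appear in this
    order (not necessarily adjacent)."""
    remaining = iter(session)
    # `page in remaining` advances the iterator, so each page is searched for
    # only after the position where the previous one was found.
    return all(page in remaining for page in [*a, *b])


forward = ["Home", "Register", "Catalog", "P_info"]
backward = ["P_info", "Register", "Home"]
print(supports_association(backward, {"Home", "Register"}, {"P_info"}))  # True
print(supports_sequence(forward, ["Home", "Register"], ["P_info"]))      # True
print(supports_sequence(backward, ["Home", "Register"], ["P_info"]))     # False
```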

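Interestingness of a rule

$$\mathrm{Support}(A \rightarrow B) = \frac{N_{A \wedge B}}{N}$$

$$\mathrm{Confidence}(A \rightarrow B) = \frac{\mathrm{Support}(A \rightarrow B)}{\mathrm{Support}(A)} = \frac{N_{A \wedge B}}{N_A}$$

$$\mathrm{Lift}(A \rightarrow B) = \frac{\mathrm{Confidence}(A \rightarrow B)}{\mathrm{Support}(B)}$$

Here $N$ is the number of transactions, $N_A$ the number containing $A$, and $N_{A \wedge B}$ the number containing both $A$ and $B$ (for a sequence rule, $A$ followed by $B$).
Apriori search algorithm (Agrawal et al., 1995): based on the support.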
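The three measures, and the support-based pruning behind Apriori, fit in a few lines. This is a minimal textbook-style illustration with these assumptions:
- transactions are Python sets of items, so order is ignored, as in market basket analysis;
- all names are hypothetical.

It is not the implementation used in the talk or in any data mining package.

```python
from itertools import combinations


def support(transactions, itemset):
    """Share of transactions containing every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, a, b):
    """Support(A -> B) / Support(A); assumes Support(A) > 0."""
    return support(transactions, set(a) | set(b)) / support(transactions, a)


def lift(transactions, a, b):
    """Confidence(A -> B) / Support(B)."""
    return confidence(transactions, a, b) / support(transactions, b)


def apriori(transactions, min_support):
    """Level-wise search for all itemsets with support >= min_support.

    A k-itemset can only be frequent if all of its (k-1)-subsets are, so the
    candidates at each level are built from the previous level only.
    """
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if support(transactions, [i]) >= min_support}
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = support(transactions, s)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets with exactly k items.
        candidates = {x | y for x in level for y in level if len(x | y) == k}
        # Prune: discard candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {c for c in candidates if support(transactions, c) >= min_support}
    return frequent


baskets = [{"Milk", "Coffee", "Bread"}, {"Milk", "Coffee", "Bread", "Biscuits"},
           {"Milk", "Bread"}, {"Coffee", "Biscuits"}]
print(confidence(baskets, {"Milk", "Coffee"}, {"Bread"}))  # 1.0
print(lift(baskets, {"Milk", "Coffee"}, {"Bread"}))        # about 1.33
print(len(apriori(baskets, min_support=0.5)))               # 9
```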

Application to real data
Data set from the logfile of an e-commerce site, kindly supplied by SAS. It contains the user id (C_VALUE), the time of connection (C_TIME) and the page viewed (C_CALLER). Number of clicks: 21,889; number of visitors (sessions): 1,240.
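Turning the raw logfile into transactional form means grouping clicks by visitor and ordering them in time. Below is a minimal sketch with these assumptions:
- each record is a dict with the three fields named on the slide;
- C_TIME is an ISO-8601 timestamp string (the real format is not given);
- as on the slide, each visitor corresponds to one session.

The records are made up.

```python
from collections import defaultdict
from datetime import datetime


def build_sessions(records):
    """Return {C_VALUE: [C_CALLER, ...]} with each visitor's pages in time order."""
    clicks = defaultdict(list)
    for r in records:
        clicks[r["C_VALUE"]].append((datetime.fromisoformat(r["C_TIME"]), r["C_CALLER"]))
    # Sort on the timestamp only; Python's sort is stable, so ties keep log order.
    return {user: [page for _, page in sorted(seq, key=lambda c: c[0])]
            for user, seq in clicks.items()}


log = [
    {"C_VALUE": "u1", "C_TIME": "2001-03-01T10:02:00", "C_CALLER": "Program"},
    {"C_VALUE": "u2", "C_TIME": "2001-03-01T10:00:30", "C_CALLER": "Home"},
    {"C_VALUE": "u1", "C_TIME": "2001-03-01T10:00:00", "C_CALLER": "Home"},
    {"C_VALUE": "u1", "C_TIME": "2001-03-01T10:05:00", "C_CALLER": "Product"},
]
print(build_sessions(log))  # {'u1': ['Home', 'Program', 'Product'], 'u2': ['Home']}
```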

Exploratory step (data selected from a cluster of visitors, no. 3)
[Table: for each cluster, the number of observations and the cluster mean versus the overall mean of the variables CLICKS, LENGTH, start and %PURCH]

Remark
The data could have been transformed from transactional to data-matrix format, but doing so would have lost the information on the order in which pages were visited. Data matrix format for the considered data:
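The loss of order is easy to see in code. Below is a minimal sketch, assuming sessions are ordered lists of pages keyed by session id. Each session becomes a row of 0/1 indicators, one column per page. Two sessions that visit the same pages in opposite order therefore become identical rows. The function name is hypothetical.

```python
def to_data_matrix(sessions):
    """Convert {session_id: [page, ...]} into (columns, {session_id: 0/1 row})."""
    columns = sorted({page for pages in sessions.values() for page in pages})
    matrix = {}
    for sid, pages in sessions.items():
        visited = set(pages)  # order and repeated visits are discarded here
        matrix[sid] = [int(col in visited) for col in columns]
    return columns, matrix


cols, rows = to_data_matrix({"s1": ["Home", "Program", "Product"],
                             "s2": ["Product", "Program", "Home"]})
print(cols)                      # ['Home', 'Product', 'Program']
print(rows["s1"] == rows["s2"])  # True
```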

Application of the Apriori algorithm: most frequent indirect sequences of order 2
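An indirect sequence of order 2 counts a session whenever page A is visited at some point before page B, with any number of pages in between. Below is a minimal sketch with these assumptions:
- sessions are ordered page lists;
- each pair is counted at most once per session;
- repeat visits to the same page are not paired with themselves.

It counts all ordered pairs directly instead of generating Apriori candidates. For pairs this yields the same frequent set, since a pair can only reach the threshold if both its pages do.

```python
from collections import Counter
from itertools import combinations


def indirect_pairs(session):
    """Ordered pairs (a, b) with a visited before b, not necessarily adjacent."""
    # combinations() keeps positional order, so a always precedes b in the session.
    return {(a, b) for a, b in combinations(session, 2) if a != b}


def frequent_indirect_sequences(sessions, min_support):
    """[((a, b), support)] for pairs whose session share is >= min_support."""
    counts = Counter()
    for s in sessions:
        counts.update(indirect_pairs(s))
    n = len(sessions)
    result = [(pair, c / n) for pair, c in counts.items() if c / n >= min_support]
    return sorted(result, key=lambda r: (-r[1], r[0]))


sessions = [["Home", "Program", "Product"], ["Home", "Product"], ["Program", "Home"]]
print(frequent_indirect_sequences(sessions, min_support=0.5))
# [(('Home', 'Product'), 0.6666666666666666)]
```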

Most frequent indirect sequences of any order

Proposal: direct sequences
Only immediately consecutive visits are considered. We have inserted two fictitious (deterministic) pages: start_session and end_session.
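With the two fictitious pages added, direct sequences of order 2 are just adjacent pairs in the padded session. Below is a minimal sketch with these assumptions:
- sessions are ordered page lists;
- support is the share of sessions containing the transition;
- confidence is that count divided by the number of sessions visiting the first page;
- all names are hypothetical.

```python
from collections import Counter

START, END = "start_session", "end_session"


def direct_sequence_rules(sessions, min_support):
    """[(a, b, support, confidence)] for transitions a -> b (b right after a)."""
    n = len(sessions)
    pair_counts, page_counts = Counter(), Counter()
    for s in sessions:
        path = [START, *s, END]                       # pad with the fictitious pages
        page_counts.update(set(path))                 # sessions visiting each page
        pair_counts.update(set(zip(path, path[1:])))  # adjacent pairs, once per session
    rules = [(a, b, c / n, c / page_counts[a])
             for (a, b), c in pair_counts.items() if c / n >= min_support]
    return sorted(rules, key=lambda r: (-r[2], r[0], r[1]))


sessions = [["Home", "Program"], ["Home", "Product"], ["Program"]]
for rule in direct_sequence_rules(sessions, min_support=0.5):
    print(rule)
# ('Program', 'end_session', 0.6666666666666666, 1.0)
# ('start_session', 'Home', 0.6666666666666666, 0.6666666666666666)
```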

Most frequent direct sequences of order 2

Towards a global model: graphical representation of direct association rules

Link analysis representation

Global models for web mining
Sequence rules are an instance of a local model (or pattern; see Hand et al., 2001) of data mining. A local model draws statistical conclusions on parts of the dataset rather than on the whole. Link analysis is an example of a global descriptive model. We have considered two global inferential models:
- probabilistic expert systems
- Markov chains

Probabilistic expert systems
Graphical models that describe (recursive) dependencies between (binary) random variables. They can be described by a directed conditional independence graph, which specifies the factorisation of the joint probability distribution. They ARE NOT directly comparable with sequence rules, which are local indices for studying dependencies between events (itemsets). They are built from contingency-table data and thus DO NOT model the order in which pages are visited.
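The factorisation implied by a directed graph can be written out explicitly. The joint probability of a full assignment is the product of each variable's probability given its parents. Below is a minimal sketch with a hypothetical three-node graph over page-visit indicators (Home → Program → Product) and made-up conditional probabilities. In practice, both the graph and the probabilities would come from the structural and quantitative learning steps.

```python
from itertools import product

# Hypothetical DAG: Home -> Program -> Product (each node is a 0/1 visit indicator).
PARENTS = {"Home": (), "Program": ("Home",), "Product": ("Program",)}
# P(node = 1 | parent values); the numbers are illustrative only.
CPT = {
    "Home": {(): 0.6},
    "Program": {(0,): 0.2, (1,): 0.5},
    "Product": {(0,): 0.1, (1,): 0.7},
}


def joint(assignment):
    """P(x) = product over nodes v of P(x_v | x_pa(v))."""
    p = 1.0
    for node, parents in PARENTS.items():
        p_one = CPT[node][tuple(assignment[q] for q in parents)]
        p *= p_one if assignment[node] == 1 else 1.0 - p_one
    return p


nodes = list(PARENTS)
total = sum(joint(dict(zip(nodes, values))) for values in product((0, 1), repeat=len(nodes)))
print(round(total, 10))                                           # 1.0
print(round(joint({"Home": 1, "Program": 1, "Product": 1}), 10))  # 0.21
```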

Probabilistic expert systems: structural learning

Probabilistic expert systems: quantitative learning

Markov chains for web mining
Markov chains are well suited to modelling dependencies between successive events, and the order of the chain parallels the order of a sequence rule. The data have been structured in the following form:
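For a first-order chain, the transition probabilities can be estimated by maximum likelihood as relative frequencies of the observed transitions. Below is a minimal sketch with these assumptions:
- sessions are ordered page lists, padded with the start_session and end_session pages used for direct sequences;
- unlike the rule measures, every transition is counted, including repeats within a session;
- names are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions, start="start_session", end="end_session"):
    """ML estimates P(b | a) = n(a -> b) / n(a -> any page) for a first-order chain."""
    counts = defaultdict(Counter)
    for s in sessions:
        path = [start, *s, end]
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()}


sessions = [["Home", "Program"], ["Home", "Product"], ["Program"]]
P_hat = transition_probabilities(sessions)
print(P_hat["start_session"])  # {'Home': 0.6666666666666666, 'Program': 0.3333333333333333}
print(P_hat["Home"])           # {'Program': 0.5, 'Product': 0.5}
```

end_session never has an outgoing transition here, so it gets no row. A higher-order chain would condition on the last two or more pages, paralleling longer sequence rules.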

Results from Markov chains (entrance to the site: start_session)

Exit from the site (end_session)

Most likely paths
[Figure: most likely paths; nodes Start_session, Home, Program, Product and P_info; edge probabilities shown include 45.81%, 17.80%, 70.18% and 26.73%]
Markov chains ARE DIRECTLY comparable with direct sequence rules. For example, for the most likely path: from start_session, the highest confidence is with home (45.81%), then program (20.39%), product (78.09%) and addcart (28.79%). There are small differences because the Apriori algorithm considers only rules with support above a fixed threshold (e.g. 5%).
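The path described above is obtained by choosing, at each step, the page with the highest transition probability (or confidence). Below is a minimal sketch of that greedy walk, assuming a transition table shaped like {page: {next_page: probability}}. The probabilities are made up and are not the figures from the slide. A greedy walk need not give the single most probable complete path.

```python
def greedy_path(P, start="start_session", end="end_session", max_steps=20):
    """From `start`, repeatedly move to the most probable next page until `end`
    is reached, a page has no recorded successors, or max_steps is used up."""
    path, state = [start], start
    for _ in range(max_steps):
        if state == end or not P.get(state):
            break
        state = max(P[state], key=P[state].get)
        path.append(state)
    return path


P = {
    "start_session": {"Home": 0.6, "Program": 0.4},
    "Home": {"Program": 0.7, "end_session": 0.3},
    "Program": {"Product": 0.8, "end_session": 0.2},
    "Product": {"end_session": 1.0},
}
print(greedy_path(P))
# ['start_session', 'Home', 'Program', 'Product', 'end_session']
```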

Essential references
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H. and Verkamo, A.I. (1995) Fast discovery of association rules. In: Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, Cambridge.
Giudici, P. (2003) Applied Data Mining. Wiley, London.
Giudici, P. and Castelo, R. (2001) Association models for web mining. Journal of Knowledge Discovery and Data Mining, 5, pp
Hastie, T., Tibshirani, R. and Friedman, J. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag.
Hand, D.J., Mannila, H. and Smyth, P. (2001) Principles of Data Mining. MIT Press, New York.

Thanks for your attention! Comments to: