Mining Path Traversal Patterns with User Interaction for Query Recommendation 龚赛赛 2013-05-28.


Contents
- Background and motivation
- Idea of approach
- Problem statement
- Related work
- Sketch of our solution
- Discussion

Background
- View an entity and navigate between entities: http://ws.nju.edu.cn/sview2/
- View and filter a collection, and navigate between collections: http://ws.nju.edu.cn/sview

Background: navigation via links for better understanding (dbpedia:Nanjing)
- Fig. a. Historical incident of Nanjing from SView
- Fig. b. New version adding the combatant information
- Fig. b helps us understand Nanjing better.

Background: information overload

Motivation: query recommendation
- Show some information in advance to enhance the user's viewing experience: explore fewer neighbors while getting to know the current entity better
- Help the user find the information they are interested in: filtering, further navigation

Idea of approach
- Relieve the user's burden of query construction
- Combine usage mining and user interaction: mine navigation patterns from logs, then formulate queries according to the patterns
- Leverage user interaction in mining and query formulation, e.g. save, hide, or modify recommended queries
- Currently: mining path traversal patterns combined with user interaction

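Problem statement
[Figure: example graph with entities e1 to e7, collections C1, C3, C4, and the labels P11, P12, P21, P22, P31, P32, L12, L32, T31, (L31, v31), (L41, v41). Legend: e: entity; C: collection; T: class; P, L: property; (L, v): filter.]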

Problem statement
Given an entity or a collection, recommend some path-based queries starting from that entity or collection.
Challenges:
- Data preparation: session and user identification, sequence generation
- Time and storage requirements
- Usefulness of queries, e.g. interestingness and user preference
- Reasoning
- How to combine mining and user interaction

Related work: frequent path traversal pattern mining
- Apriori-like: candidate generation and checking; BFS: join and pruning; support and confidence
- Reducing database scans and candidates: FP-growth, hashing, partition, sampling, ……

Related work: Chen et al. 98
- Determine maximal forward references from the logs (maximal forward reference: DFS)
- Determine large reference sequences from the set of maximal forward references (join if contains or contains)
- Determine maximal reference sequences from the large reference sequences
- Chen et al. Efficient Data Mining for Path Traversal Patterns. IEEE Transactions on Knowledge and Data Engineering, 1998.

Related work: Chen et al. 98
[Figure: example maximal forward references ABCD, ABEGH, ABEGW, AOU, AOV.]
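
The first step of Chen et al., cutting a traversal log into maximal forward references, can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, and the input path "ABCDCBEGHGWAOUOV" is an assumed click sequence chosen because it yields exactly the five references listed on the slide. A page that already lies on the current forward path is treated as a backward reference, which closes the current forward path.

```python
def maximal_forward_references(pages):
    """Split one traversal path into its maximal forward references."""
    path = []        # current forward path, starting at the first page
    forward = True   # True while the previous move was a forward reference
    result = []
    for page in pages:
        if page in path:
            # Backward reference: emit the forward path once, then back up.
            if forward:
                result.append("".join(path))
            path = path[: path.index(page) + 1]
            forward = False
        else:
            path.append(page)
            forward = True
    # The path still open at the end of the log is also maximal.
    if forward and path:
        result.append("".join(path))
    return result


# Assumed input: one page visit per character.
refs = maximal_forward_references("ABCDCBEGHGWAOUOV")
print(refs)
```

Each returned string is one forward walk from the start page to the point where the user first turned back.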

Related work: El-Sayed et al.
- FS-tree: mining frequent patterns without candidate generation by using a prefix tree (FP-growth)
- El-Sayed et al. FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs. WIDM ’04.

Related work: multiple-level mining (Srikant and Agrawal 95)
- Basic idea: give a support at each level; add the ancestors of each item to the original data; use an adapted Apriori
- Srikant R, Agrawal R. Mining generalized association rules. IBM Research Division, 1995.
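
The "add ancestors" step can be illustrated with a small sketch. The taxonomy, the item names, and the helper `extend_with_ancestors` are hypothetical and only show the idea: once every transaction also carries the ancestors of its items, an ordinary support count covers all levels of the hierarchy. The sketch assumes the taxonomy is a tree (each item has at most one parent and there are no cycles).

```python
from collections import Counter


def extend_with_ancestors(transaction, parent):
    """Return the transaction plus every ancestor of each of its items."""
    extended = set(transaction)
    for item in transaction:
        node = parent.get(item)
        while node is not None:      # walk up to the root of the taxonomy
            extended.add(node)
            node = parent.get(node)
    return extended


# Hypothetical taxonomy: child -> parent.
parent = {
    "jacket": "outerwear",
    "ski_pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
}
transactions = [{"jacket"}, {"ski_pants", "shirt"}, {"shirt"}]

extended = [extend_with_ancestors(t, parent) for t in transactions]
# Support = number of transactions containing the item (at any level).
support = Counter(item for t in extended for item in t)
print(support)
```

Using a set per transaction ensures that an ancestor shared by several items is counted once per transaction.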

Sketch of our solution
The following is a draft of the solution.
- For each user, mine frequent path traversal patterns with an adapted version of Chen's method (personalization)
- Data preparation: identify the data of each session from the logs; within each session, simplify each filter to its property by ignoring the value in the filter
- Apriori support: the number of sessions containing the specified (sub)path pattern
- Reasoning is discarded at present
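
A minimal sketch of this per-user mining step, under several assumptions: a session is a list of steps, a step is either a property name or a `(property, value)` filter, and a path pattern is a run of consecutive steps. All function names and the sample data are hypothetical. Candidates are enumerated directly from the sessions and pruned with the Apriori property, instead of being produced by the explicit join step of Chen's method.

```python
from collections import Counter


def simplify(step):
    """Reduce a (property, value) filter to its property."""
    return step[0] if isinstance(step, tuple) else step


def contiguous_subpaths(session, k):
    """All distinct runs of k consecutive steps in one session."""
    return {tuple(session[i:i + k]) for i in range(len(session) - k + 1)}


def frequent_paths(sessions, min_support):
    """Level-wise mining; support = number of sessions containing the path."""
    sessions = [[simplify(s) for s in sess] for sess in sessions]
    frequent, prev, k = {}, None, 1
    while True:
        counts = Counter()
        for sess in sessions:
            for path in contiguous_subpaths(sess, k):
                # Apriori pruning: both (k-1)-step sub-paths must be frequent.
                if prev is None or (path[:-1] in prev and path[1:] in prev):
                    counts[path] += 1
        level = {p: c for p, c in counts.items() if c >= min_support}
        if not level:
            return frequent
        frequent.update(level)
        prev, k = level, k + 1


# Hypothetical sessions of one user; tuples are filters.
sessions = [
    ["P11", "P21", ("L31", "v31")],
    ["P11", "P21", ("L31", "v99")],
    ["P11", "P22"],
]
frequent = frequent_paths(sessions, min_support=2)
print(frequent)
```

Because the filter values are dropped, the first two sessions support the same three-step pattern even though their filter values differ.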

Sketch of our solution (continued)
For each user, mine frequent path traversal patterns with user interaction.
- Three types of interaction: bookmark a query (possibly giving it a name); hide the query of the relevant path pattern (it is not shown again); add a tail to the path pattern (the tail may not be logged)
- Bookmarking and adding a tail: add frequent path patterns, which are in turn leveraged in candidate generation
- Hiding a query: add infrequent path patterns, which are in turn leveraged in pruning
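
One possible reading of how the feedback enters the mining result is sketched below. It assumes that path patterns are tuples of property names and that support counts are already available; the function names and sample data are hypothetical. Bookmarked or tail-extended patterns are treated as frequent regardless of their count, and a hidden pattern is treated as infrequent, so every pattern containing it as a contiguous sub-path is pruned too. If a pattern is both bookmarked and hidden, hiding wins in this sketch, which is a design assumption rather than something the slides specify.

```python
def contains(path, sub):
    """True if `sub` occurs in `path` as a run of consecutive steps."""
    n = len(sub)
    return any(path[i:i + n] == sub for i in range(len(path) - n + 1))


def apply_feedback(counts, min_support, bookmarked=(), hidden=()):
    """Combine support counts with explicit user feedback."""
    frequent = {p for p, c in counts.items() if c >= min_support}
    # Bookmarked (or tail-extended) patterns are forced to be frequent.
    frequent |= set(bookmarked)
    # Hidden patterns are forced to be infrequent; prune their super-paths.
    return {p for p in frequent if not any(contains(p, h) for h in hidden)}


# Hypothetical support counts for one user.
counts = {
    ("P11",): 3,
    ("P21",): 2,
    ("P11", "P21"): 2,
    ("P11", "P22"): 1,
    ("P21", "L31"): 2,
}
result = apply_feedback(
    counts,
    min_support=2,
    bookmarked=[("P11", "P22")],
    hidden=[("P21", "L31")],
)
print(sorted(result))
```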

Sketch of our solution
Mining frequent path traversal patterns from all users’ navigation behavior:
- Candidate patterns: the frequent path patterns of all users
- A candidate path pattern is frequent if enough users have the pattern
- Given a path traversal pattern, it is trivial to construct a relevant query
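
The cross-user step and the query construction can be sketched as follows. The slides do not name a query language, so the SPARQL output is an assumption, as are the function names, the example.org IRIs, and the choice to follow every property in the forward direction.

```python
from collections import Counter


def globally_frequent(per_user_patterns, min_users):
    """Keep the patterns that are frequent for at least `min_users` users."""
    counts = Counter()
    for patterns in per_user_patterns.values():
        counts.update(set(patterns))   # each user counts once per pattern
    return {p for p, n in counts.items() if n >= min_users}


def path_to_sparql(start_iri, path):
    """Chain the properties of a path pattern into one graph pattern."""
    lines = []
    subject = f"<{start_iri}>"
    for i, prop in enumerate(path, start=1):
        lines.append(f"  {subject} <{prop}> ?v{i} .")
        subject = f"?v{i}"             # the next hop starts at this variable
    return "SELECT * WHERE {\n" + "\n".join(lines) + "\n}"


# Hypothetical per-user results of the personalized mining step.
p1, p2, p3 = (f"http://example.org/p{i}" for i in (1, 2, 3))
per_user = {
    "u1": {(p1,), (p1, p2)},
    "u2": {(p1, p2)},
    "u3": {(p3,)},
}
shared = globally_frequent(per_user, min_users=2)
query = path_to_sparql("http://example.org/e1", (p1, p2))
print(shared)
print(query)
```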

Discussion
Restrictions:
- Mining from all users’ navigation behavior appears simple
- The value in a filter is discarded
- The order of consecutive filters on the same collection may not be important
- Class information on the vertices is discarded
- No reasoning