Mining Path Traversal Patterns with User Interaction for Query Recommendation 龚赛赛 2013-05-28
Contents
  Background and motivation
  Idea of approach
  Problem statement
  Related work
  Sketch of our solution
  Discussion
Background
  View an entity and navigate between entities: http://ws.nju.edu.cn/sview2/
  View and filter a collection, and navigate between collections: http://ws.nju.edu.cn/sview
Background
  Navigation via links for better understanding of dbpedia:Nanjing
  Fig. a. Historical incident of Nanjing from SView
  Fig. b. New version adding the combatant information
  Fig. b helps us know Nanjing better!
Background Information Overload
Motivation
  Query recommendation
  Show some information in advance to enhance the user's viewing experience
    Explore fewer neighbors while knowing the current entity better
  Help the user find information of interest
    Filter
    Further navigation
Idea of approach
  Relieve the user's burden of query construction
  Combine usage mining and user interaction
    Mine navigation patterns from logs, then formulate queries according to the patterns
    Leverage user interaction in mining and query formulation, e.g. save/hide/modify recommended queries
  Currently: mining path traversal patterns combined with user interaction
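Problem statement
  [Figure: example navigation graph with entities e1 to e7, collections C1, C3, C4, class T31, properties such as P11, P12, P21, P22, P31, P32, L12, L32, and filters (L31, v31), (L41, v41)]
  Legend: e: entity; C: collection; T: class; P, L: property; (L, v): filter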
Problem statement
  Given an entity or a collection, recommend some path-based queries starting from that entity or collection
  Challenges
    Data preparation: session identification, user identification, sequence generation
    Time and storage requirements
    Usefulness of queries, e.g. interestingness and user preference
    Reasoning
    How to combine mining and user interaction
Related Work: Frequent Path Traversal Patterns Mining
  Apriori-like
    Candidate generation and checking
    BFS: join and pruning
    Support and confidence
  Reduce database scans and candidates
    FP-growth
    Hashing
    Partition
    Sampling
    ...
Related Work: Chen et al. 98
  Determine maximal forward references from logs
    Maximal forward reference: DFS
  Determine large reference sequences from the set of maximal forward references
    Join if ... contains ... or ... contains ...
  Determine maximal reference sequences from large reference sequences
  Chen et al. Efficient Data Mining for Path Traversal Patterns. IEEE Transactions on Knowledge and Data Engineering, 1998.
Related Work: Chen et al. 98
  ABCD, ABEGH, ABEGW, AOU, AOV
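The five strings on this slide read like maximal forward references. As a minimal sketch of that first step, the function below walks one traversal sequence and emits a forward path each time the user moves back to an already visited node. The input string "ABCDCBEGHGWAOUOV" is an assumption, chosen because it reproduces the five strings above; the function name and the use of single-letter node labels are illustrative only.

```python
def maximal_forward_references(visits):
    """Split one traversal sequence into its maximal forward references."""
    path = []               # current forward path from the start node
    refs = []
    moving_forward = False  # True while the last step extended the path
    for node in visits:
        if node in path:
            # Backward reference: the user returned to an earlier node.
            # A forward run has just ended, so the current path is maximal.
            if moving_forward:
                refs.append(tuple(path))
            # Cut the path back to the revisited node.
            del path[path.index(node) + 1:]
            moving_forward = False
        else:
            path.append(node)
            moving_forward = True
    if moving_forward:
        refs.append(tuple(path))  # the traversal ended on a forward run
    return refs


# Assumed traversal sequence (not given on the slide).
for ref in maximal_forward_references("ABCDCBEGHGWAOUOV"):
    print("".join(ref))
```

Returning tuples rather than joined strings keeps the sketch usable when nodes are entity or property identifiers instead of single letters.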
Related Work: El-Sayed et al.
  FS-tree
  Mining frequent patterns without candidate generation by using a prefix tree (FP-growth)
  El-Sayed et al. FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs. WIDM'04.
Related Work El-Sayed et al. FS-tree
Related Work: Multiple Level Mining
  Srikant R 95
  Basic idea:
    Give support at each level
    Add ancestors of each item into the original data
    Use adapted Apriori
  Srikant R, Agrawal R. Mining generalized association rules. IBM Research Division, 1995.
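The step "add ancestors of each item into the original data" can be sketched as follows. This is only an illustration under assumptions: the taxonomy is represented as a child-to-parent dictionary (`parent_of`, a hypothetical name), it is assumed to be a tree without cycles, and the item names are made up.

```python
def ancestors(item, parent_of):
    """Return all ancestors of `item`, nearest first."""
    result = []
    while item in parent_of:
        item = parent_of[item]
        result.append(item)
    return result


def extend_with_ancestors(transaction, parent_of):
    """Return the transaction's items plus every ancestor of each item."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent_of))
    return extended


# Hypothetical two-level taxonomy, for illustration only.
parent_of = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
}
print(sorted(extend_with_ancestors(["jacket", "shoes"], parent_of)))
```

Once every transaction is extended this way, an Apriori-style miner can count support for items at any level of the taxonomy without further changes to the counting step.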
Sketch of our solution
  Following is a draft of the solution.
  For each user, mine frequent path traversal patterns by an adapted version of Chen's method (personalization)
    Data preparation
      Identify each session from the logs
      In each session, simplify each filter to a property by ignoring the value in the filter
    Apriori
      Support: number of sessions containing the specified (sub)path pattern
    Reasoning is discarded at present
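The per-user mining step can be sketched as a level-wise (Apriori-like) search. This is a minimal sketch under assumptions, not the adapted method itself: a session is assumed to be a list of steps, where a step is either a property name or a (property, value) filter tuple; "containing" is assumed to mean containing the pattern as a consecutive subsequence; all function names are hypothetical.

```python
def strip_filter_values(session):
    """Simplify each (property, value) filter to its property."""
    return [step[0] if isinstance(step, tuple) else step for step in session]


def contains_path(sequence, pattern):
    """True if `pattern` occurs in `sequence` as a consecutive run."""
    n = len(pattern)
    return any(tuple(sequence[i:i + n]) == tuple(pattern)
               for i in range(len(sequence) - n + 1))


def support(sessions, pattern):
    """Number of sessions containing the pattern."""
    return sum(1 for s in sessions if contains_path(s, pattern))


def mine_frequent_paths(sessions, min_support):
    """Return {pattern: support} for all frequent consecutive path patterns."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    sessions = [strip_filter_values(s) for s in sessions]
    items = {step for s in sessions for step in s}
    level = {(i,) for i in items if support(sessions, (i,)) >= min_support}
    frequent = {}
    while level:
        for pattern in level:
            frequent[pattern] = support(sessions, pattern)
        # Join: p and q overlap on all but p's first and q's last step.
        candidates = {p + q[-1:] for p in level for q in level
                      if p[1:] == q[:-1]}
        level = {c for c in candidates if support(sessions, c) >= min_support}
    return frequent


# Hypothetical sessions using labels in the style of the problem statement.
sessions = [
    ["P11", ("L31", "v31"), "P21"],
    ["P11", ("L31", "v99"), "P22"],
    ["P12", "P21"],
]
print(mine_frequent_paths(sessions, min_support=2))
```

Because filter values are dropped before counting, the two sessions that filter on L31 with different values both support the pattern (P11, L31).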
Sketch of our solution (continued)
  For each user, mine frequent path traversal patterns with user interaction
    Three types of interaction: bookmark a query (optionally giving it a name), hide the query of the relevant path pattern (it is not shown again), add the tail of the path pattern (the tail may not be logged)
    Bookmarking and adding a tail: add frequent path patterns that will in turn be leveraged in candidate generation
    Hiding the query: add infrequent path patterns that will in turn be leveraged in pruning
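One possible reading of how the three interactions enter the mining loop is sketched below. The assumptions are loud ones: patterns from bookmarking or tail adding (`promoted`, a hypothetical name) are treated as frequent regardless of their logged support and are seeded into the level matching their length; hidden patterns (`hidden`) are treated as infrequent, and any candidate containing one is pruned. Sessions are assumed to be lists of property names.

```python
def contains_path(sequence, pattern):
    """True if `pattern` occurs in `sequence` as a consecutive run."""
    n = len(pattern)
    return any(tuple(sequence[i:i + n]) == tuple(pattern)
               for i in range(len(sequence) - n + 1))


def mine_with_feedback(sessions, min_support, promoted=(), hidden=()):
    """Level-wise mining where user feedback overrides logged support."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    promoted = {tuple(p) for p in promoted}
    hidden = {tuple(h) for h in hidden}

    def is_frequent(pattern):
        # Pruning: a pattern containing a hidden pattern is never frequent.
        if any(contains_path(pattern, h) for h in hidden):
            return False
        # Bookmarked or tail-extended patterns count as frequent.
        if pattern in promoted:
            return True
        count = sum(1 for s in sessions if contains_path(s, pattern))
        return count >= min_support

    max_promoted = max((len(p) for p in promoted), default=0)
    candidates = {(step,) for s in sessions for step in s}
    result = set()
    k = 1
    while True:
        seeded = {p for p in promoted if len(p) == k}
        level = {c for c in candidates | seeded if is_frequent(c)}
        if not level and k >= max_promoted:
            break
        result |= level
        # Candidate generation uses promoted patterns like any other.
        candidates = {p + q[-1:] for p in level for q in level
                      if p[1:] == q[:-1]}
        k += 1
    return result


sessions = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]]
found = mine_with_feedback(sessions, 2,
                           promoted=[("a", "b", "d")],
                           hidden=[("b", "c")])
print(sorted(found, key=lambda p: (len(p), p)))
```

In this toy run the path (a, b, c) is supported by two sessions but is never reported, because it contains the hidden pattern (b, c); the path (a, b, d) is reported although only one session supports it, because the user promoted it.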
Sketch of our solution
  Mining frequent path traversal patterns from all users' navigation behavior
    Candidate patterns: all users' frequent path patterns
    A candidate path pattern is frequent if enough users have the pattern
  With a path traversal pattern, it is trivial to construct a relevant query
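Both steps on this slide can be sketched briefly. The first function counts, for each candidate pattern, how many users have it among their frequent patterns. The second turns a property path into a SPARQL query string. The threshold name `min_users`, the variable naming scheme, and the placeholder properties `ex:p1` and `ex:p2` are assumptions; prefix declarations are omitted, and the pattern is assumed to be non-empty.

```python
from collections import Counter


def frequent_across_users(patterns_by_user, min_users):
    """Keep the patterns that are frequent for at least `min_users` users."""
    counts = Counter(pattern
                     for patterns in patterns_by_user.values()
                     for pattern in set(patterns))
    return {pattern for pattern, n in counts.items() if n >= min_users}


def path_to_sparql(start, pattern):
    """Build a SELECT query that follows `pattern` step by step from `start`."""
    lines = []
    subject = start
    for i, prop in enumerate(pattern, 1):
        obj = f"?v{i}"                       # fresh variable for each hop
        lines.append(f"  {subject} {prop} {obj} .")
        subject = obj
    return ("SELECT DISTINCT " + subject + " WHERE {\n"
            + "\n".join(lines) + "\n}")


# Hypothetical per-user results; ex:p1 and ex:p2 are placeholder properties.
patterns_by_user = {
    "u1": [("ex:p1", "ex:p2"), ("ex:p3",)],
    "u2": [("ex:p1", "ex:p2")],
    "u3": [("ex:p3",), ("ex:p1", "ex:p2")],
}
for pattern in frequent_across_users(patterns_by_user, min_users=3):
    print(path_to_sparql("dbpedia:Nanjing", pattern))
```

Filters that were simplified to properties during mining would need their values restored or chosen by the user before the query is run; the sketch does not cover that.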
Discussion
  Restrictions:
    Mining from all users' navigation behavior appears simple
    The value in a filter is discarded
    The order of consecutive filters on the same collection may not be important
    Class information on the vertices is discarded
    No reasoning