Published by Norma Rice. Modified over 9 years ago.
1
Discovery of Aggregate Usage Profiles for Web Personalization
Bamshad Mobasher, Honghua Dai, Tao Luo, Miki Nakagawa, Yuqing Sun, Jim Wiltshire
WebKDD 2000
2
System Architecture
3
Data Abstractions
Drafts from the W3C Web Characterization Activity (WCA)
TERM: DEFINITION
– user: A single individual who is accessing files from one or more Web servers through a browser.
– pageview: Every file that contributes to the display on a user's browser at one time. It is usually associated with a single user action.
– clickstream: A sequential series of pageview requests.
– user session: The clickstream of pageviews for a single user across the entire Web.
– server session: The set of pageviews in a user session for a particular Web site.
– episode: Any semantically meaningful subset of a user or server session.
4
Typical Web Usage Mining Preprocessing
5
Example
[Figure: diagram of pages A through T]
USER1: A B F O G A D
USER2: A B C J
USER3: L R
6
Usage Mining
After preprocessing, we will have
– A set of n pageview records, $P = \{p_1, p_2, \ldots, p_n\}$
– A set of m user transactions, $T = \{t_1, t_2, \ldots, t_m\}$
Each transaction can be viewed as an n-dimensional vector $t = \langle w(p_1, t), w(p_2, t), \ldots, w(p_n, t) \rangle$, where $w(p_i, t)$ is the weight of pageview $p_i$ in transaction $t$.
Goal of Usage Mining
– Aggregate usage profiles representing groups of different user behaviors.
– Each item in a usage profile is a URL representing a relevant pageview object, and can have an associated weight representing its significance within the profile.
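To make the vector view concrete, here is a minimal sketch that turns the three sessions from the Example slide into n-dimensional vectors. The slide does not say how w(p, t) is computed, so this sketch assumes binary weights (1 if the page was visited, 0 otherwise); the restricted page list and the names `pageviews`, `transactions`, and `to_vector` are hypothetical.

```python
# Pageview set P, restricted to the pages that occur in the example sessions.
pageviews = ["A", "B", "C", "D", "F", "G", "J", "L", "O", "R"]

# User transactions T, taken from the Example slide.
transactions = {
    "USER1": ["A", "B", "F", "O", "G", "A", "D"],
    "USER2": ["A", "B", "C", "J"],
    "USER3": ["L", "R"],
}

def to_vector(visited, pageviews):
    """Return the n-dimensional vector for one transaction.

    Assumption: binary weights, w(p, t) = 1 if p occurs in t, else 0.
    """
    seen = set(visited)
    return [1 if p in seen else 0 for p in pageviews]

vectors = {user: to_vector(pages, pageviews)
           for user, pages in transactions.items()}

for user, vec in vectors.items():
    print(user, vec)
```

Repeated visits (USER1 sees A twice) collapse to a single 1 under this weighting; a duration-based or frequency-based weight would keep that information.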
7
Transaction Clustering
Use the k-means algorithm to partition this pageview space into different clusters.
PACT (Profile Aggregations on Clustering Transactions)
Given a transaction cluster c, construct a usage profile $pr_c$:
$pr_c = \{ \langle p, \mathrm{weight}(p, pr_c) \rangle \mid p \in P,\ \mathrm{weight}(p, pr_c) \ge \mu \}$
$\mathrm{weight}(p, pr_c) = \frac{1}{|c|} \sum_{t \in c} w(p, t)$
where $\mu$ is a minimum-weight threshold.
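The profile construction can be sketched as follows, taking the clustering step as already done. The weight of a pageview is its mean weight over the transactions in the cluster; the cutoff `mu` and its value 0.5, the toy cluster, and the function name `pact_profile` are illustrative assumptions, not values from the slides.

```python
def pact_profile(cluster, pageviews, mu=0.5):
    """Build a usage profile from one transaction cluster.

    cluster:   list of transaction vectors, each a list of weights w(p, t)
    pageviews: pageview identifiers, in the same order as the vector entries
    mu:        hypothetical minimum mean weight for a pageview to be kept
    """
    size = len(cluster)
    profile = {}
    for i, p in enumerate(pageviews):
        # weight(p, pr_c) = (1 / |c|) * sum of w(p, t) over t in c
        weight = sum(t[i] for t in cluster) / size
        if weight >= mu:
            profile[p] = weight
    return profile

# Hypothetical cluster of four binary transaction vectors over pages A..D.
pageviews = ["A", "B", "C", "D"]
cluster = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
profile = pact_profile(cluster, pageviews, mu=0.5)
print(profile)
```

Here A appears in every transaction and B in three of four, so both survive the cutoff, while C and D (one transaction each) are dropped.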
8
Pageview Clustering (1/2)
Use the Apriori algorithm to find frequent itemsets.
Use Association Rule Hypergraph Partitioning (ARHP) to find aggregate profiles.
Hypergraph $H = (V, E)$
V: pageview set
E: weighted frequent itemsets
[Figure: hypergraph over pageviews A through R; hyperedges are weighted by average confidence, with example weights 0.6, 0.4, 0.7, 0.6]
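As a rough illustration of how a hyperedge weight could be derived, the sketch below computes the average confidence for one frequent itemset. It assumes the average is taken over the rules with a single item on the right-hand side, (itemset minus x) -> x, which is one common reading of "average confidence"; the slides do not spell out which rules are averaged. The transactions and function names are hypothetical, and frequent-itemset mining itself (Apriori) is not shown.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def average_confidence(itemset, transactions):
    """Mean confidence of the rules (itemset - {x}) -> x, one per item x.

    Assumes the itemset is frequent, so every antecedent has support > 0.
    """
    whole = support(itemset, transactions)
    confidences = []
    for x in itemset:
        antecedent = set(itemset) - {x}
        # confidence(antecedent -> x) = support(itemset) / support(antecedent)
        confidences.append(whole / support(antecedent, transactions))
    return sum(confidences) / len(confidences)

# Hypothetical transactions over pageviews A, B, C.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
weight = average_confidence(["A", "B", "C"], transactions)
print(round(weight, 3))
```

In this toy data the full itemset has support 2/5 and each two-item antecedent has support 3/5, so every rule has confidence 2/3 and so does the hyperedge.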
9
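Pageview Clustering (2/2)
[Figure: partitioned hypergraph over pageviews A through R, with hyperedge weights 0.6, 0.4, 0.7, 0.6]
$\mathrm{Fitness}(C) = \frac{\sum_{e \subseteq C} \mathrm{Weight}(e)}{\sum_{|e \cap C| > 0} \mathrm{Weight}(e)}$
$\mathrm{Connectivity}(v) = \frac{|\{e \mid e \subseteq C,\ v \in e\}|}{|\{e \mid e \subseteq C\}|}$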
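The two measures can be sketched directly from their definitions. Fitness compares the weight of hyperedges lying entirely inside a cluster with the weight of all hyperedges that touch it; connectivity is the share of the cluster's internal hyperedges that contain a given vertex. The hyperedges below reuse the weights shown in the figure, but which pages each edge connects is a hypothetical choice, as are the function names.

```python
def fitness(cluster, hyperedges):
    """Weight of edges fully inside the cluster over weight of edges touching it."""
    inside = sum(w for e, w in hyperedges if e <= cluster)
    touching = sum(w for e, w in hyperedges if e & cluster)
    return inside / touching

def connectivity(vertex, cluster, hyperedges):
    """Share of the cluster's internal edges that contain the vertex."""
    internal = [e for e, _ in hyperedges if e <= cluster]
    return sum(1 for e in internal if vertex in e) / len(internal)

# Hypothetical weighted hyperedges: (set of pageviews, weight).
hyperedges = [
    (frozenset({"A", "B", "C"}), 0.6),
    (frozenset({"A", "B"}), 0.4),
    (frozenset({"C", "D"}), 0.7),
    (frozenset({"D", "E"}), 0.6),
]
cluster = frozenset({"A", "B", "C"})

print(round(fitness(cluster, hyperedges), 3))
print(connectivity("A", cluster, hyperedges))
print(connectivity("C", cluster, hyperedges))
```

Both functions assume the cluster has at least one internal hyperedge; a cluster with none would need an explicit guard against division by zero.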
10
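Recommendation
Given a usage profile C, we can represent C as a vector $C = \langle w_1^C, w_2^C, \ldots, w_n^C \rangle$, where
$w_i^C = \begin{cases} \mathrm{weight}(p_i, C), & \text{if } p_i \in C \\ 0, & \text{otherwise} \end{cases}$
Given the current active session S, represented as a vector $S = \langle s_1, s_2, \ldots, s_n \rangle$:
$\mathrm{match}(S, C) = \frac{\sum_k w_k^C \, s_k}{\sqrt{\sum_k (s_k)^2 \times \sum_k (w_k^C)^2}}$
$\mathrm{Rec}(S, p) = \sqrt{\mathrm{weight}(p, C) \cdot \mathrm{match}(S, C)}$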
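A minimal sketch of the scoring step, under stated assumptions: the match score is computed as the cosine similarity between the session vector and the profile vector, and the recommendation score takes the square root of weight(p, C) times match(S, C). The square root, the decision to skip pages already in the session, and the names `match`, `recommend`, and the sample vectors are assumptions of this sketch.

```python
import math

def match(session, profile):
    """Cosine similarity between session vector S and profile vector C."""
    dot = sum(w * s for w, s in zip(profile, session))
    norm = math.sqrt(sum(s * s for s in session) * sum(w * w for w in profile))
    return dot / norm if norm else 0.0

def recommend(session, profile, pageviews):
    """Score each profile page not yet visited in the active session.

    Assumption: Rec(S, p) = sqrt(weight(p, C) * match(S, C)).
    """
    m = match(session, profile)
    return {
        p: math.sqrt(w * m)
        for p, w, s in zip(pageviews, profile, session)
        if w > 0 and s == 0  # skip pages outside the profile or already seen
    }

# Hypothetical profile over pages A..D and an active session that has seen A.
pageviews = ["A", "B", "C", "D"]
profile = [0.6, 0.0, 0.0, 0.8]
session = [1, 0, 0, 0]

m = match(session, profile)
scores = recommend(session, profile, pageviews)
print(round(m, 3), scores)
```

With several profiles, the same scoring would be run against each one and the highest-scoring pages kept as recommendations.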