Item Selection By “Hub-Authority” Profit Ranking Presented by: Thomas Su
Agenda
Introduction
Overview of HITS
Our Approach – Hub-Authority Profit Ranking
Estimate Profitability
Empirical Study
Experiment Results
Introduction
Difficulty of the item selection/ranking problem: the “cross-selling effect”.
Size-constrained selection
Cost-constrained selection
[Figure labels: # of items selected, Optimal cutoff, Estimated profit, Selection cost]
Web Page Ranking Algorithm – HITS (Hyperlink-Induced Topic Search)
Ranks the relevance of web pages on a given topic
The mutually reinforcing relationship
–Hub pages: pages that link to many good authorities
–Authority pages: pages that are linked to by many good hubs
Starts by finding a set of candidate pages for the given topic
Computing Hub/Authority Weights for Web Pages
Each page i is associated with a non-negative hub weight h(i) and a non-negative authority weight a(i).
Before each iteration, the weights are normalized so that
–$\sum_i a(i)^2 = 1$ and $\sum_i h(i)^2 = 1$
In each iteration, the weights are updated as follows:
–$a(i) = \sum_{j:\, j \to i} h(j)$, summing over all pages j that have a link to i
–$h(i) = \sum_{j:\, i \to j} a(j)$, summing over all pages j that i has a link to
The iteration continues until a(i) and h(i) converge to stable values.
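The update rules above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `hits`, the edge-list input format, the fixed iteration count, and the toy graph are all assumptions made for the example. It normalizes at the end of each iteration instead of at the start, which yields the same ranking.

```python
import math

def hits(links, iterations=50):
    """Plain HITS on a directed graph given as (source, target) pairs."""
    pages = sorted({p for edge in links for p in edge})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # a(i): sum of the hub weights of all pages j that link to i.
        auth = {i: sum(hub[s] for (s, t) in links if t == i) for i in pages}
        # h(i): sum of the authority weights of all pages j that i links to.
        hub = {i: sum(auth[t] for (s, t) in links if s == i) for i in pages}
        # Normalize so that the squared weights sum to 1.
        a_norm = math.sqrt(sum(v * v for v in auth.values()))
        h_norm = math.sqrt(sum(v * v for v in hub.values()))
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical toy graph: p1 and p2 both link to p3; p1 also links to p4.
links = [("p1", "p3"), ("p2", "p3"), ("p1", "p4")]
hub, auth = hits(links)
best_authority = max(auth, key=auth.get)
best_hub = max(hub, key=hub.get)
print(best_authority, best_hub)
```

In this toy graph p3 receives the most links and p1 points to the most linked-to pages, so they end up as the top authority and top hub respectively.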
Hub-Authority Profit Ranking
Find frequent items and frequent 2-itemsets.
Each item is a node in the graph.
conf(i → j) forms the link from item i to item j
–The larger conf(i → j), the stronger the link
–Every frequent item i has a link to itself, since conf(i → i) = 100%
Item j is a good authority if it is necessary for many other items i.
Item i is a good hub if it implies buying many other items j.
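As a rough sketch of how the link strengths might be derived, the snippet below counts items and pairs in a list of transactions and computes conf(i → j) = supp({i, j}) / supp({i}) for frequent pairs. The function name `pair_confidences`, the `min_support` parameter, and the four sample transactions are hypothetical; a real system would more likely use a dedicated frequent-itemset miner.

```python
from collections import Counter
from itertools import permutations

def pair_confidences(transactions, min_support):
    """Return conf(i -> j) for frequent pairs, plus self-links for frequent items."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = set(t)
        item_count.update(items)
        # Count each ordered pair once per transaction; (i, j) and (j, i)
        # receive the same count because both items co-occur.
        pair_count.update(permutations(sorted(items), 2))
    conf = {}
    for (i, j), c in pair_count.items():
        if c / n >= min_support:
            conf[(i, j)] = c / item_count[i]
    # Every frequent item links to itself with confidence 100%.
    for i, c in item_count.items():
        if c / n >= min_support:
            conf[(i, i)] = 1.0
    return conf

# Hypothetical transactions, for illustration only.
transactions = [["A", "B"], ["A", "B", "C"], ["A"], ["B", "C"]]
conf = pair_confidences(transactions, min_support=0.5)
print(sorted(conf.items()))
```

Note that confidence is asymmetric: here C appears only together with B, so conf(C → B) is 1.0, while conf(B → C) is lower.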
Model the Individual Profit
Individual profit: the recorded profit of item i in all transactions.
We can’t treat the individual profit as the initial authority weight
–since the values the iteration converges to are independent of the initial weights.
Model the Individual Profit (cont’d)
Solution: incorporate the individual profit into the links.
In each iteration,
–Updating authority weights: $a(i) = \sum_{j:\, j \to i} \mathrm{prof}(j) \cdot \mathrm{conf}(j \to i) \cdot h(j)$
–Updating hub weights: $h(i) = \sum_{j:\, i \to j} \mathrm{prof}(i) \cdot \mathrm{conf}(i \to j) \cdot a(j)$
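The two updates can also be written compactly in matrix form. This is only a restatement under an assumed notation: the link-weight matrix $W$ and the vectors $\mathbf{a}$, $\mathbf{h}$ are symbols introduced here, not taken from the slides, and $W_{ij}$ is taken to be zero when there is no link from i to j.

```latex
W_{ij} = \mathrm{prof}(i)\cdot\mathrm{conf}(i \to j),
\qquad
\mathbf{a} \leftarrow W^{\top}\mathbf{h},
\qquad
\mathbf{h} \leftarrow W\,\mathbf{a}
```

Row i of $W$ holds the outgoing links of item i, so the authority update sums down column i and the hub update sums across row i. Because the profit sits inside $W$ rather than in the starting vectors, it affects the weights the iteration converges to.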
Computing the Weights – Iterative Algorithm
Initialize hub/authority weights to (1, 1, 1, ..., 1)
For each iteration 1, 2, ..., k
–Update the authority weights
–Update the hub weights
–Normalize authority weights
–Normalize hub weights
Return hub/authority weights
Example
Suppose we have frequent items X, Y, and Z
–prof(X) = $5
–prof(Y) = $1
–prof(Z) = $0.1
–conf(X → Y) = 0.2, conf(Y → X) = 0.06
–conf(X → Z) = 0.8, conf(Z → X) = 0.2
–conf(Y → Z) = 0.5, conf(Z → Y) = 0.375
Example (cont’d)
[Graph over nodes X, Y, Z; links labeled with profit × confidence]
–prof(X) × conf(X → X) = 5.0
–prof(Y) × conf(Y → X) = 0.06
–prof(X) × conf(X → Y) = 1.0
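Example (cont’d) – updating a(X)
[Graph over nodes X, Y, Z]
$a(X) = \mathrm{prof}(X) \cdot \mathrm{conf}(X \to X) \cdot h(X) + \mathrm{prof}(Y) \cdot \mathrm{conf}(Y \to X) \cdot h(Y) + \mathrm{prof}(Z) \cdot \mathrm{conf}(Z \to X) \cdot h(Z)$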
Example (cont’d)
prof(X) = $5, prof(Y) = $1, prof(Z) = $0.1
a(X) = 0.767, a(Y) = 0.166, a(Z) = 
The ranking based on authority weights differs from the ranking by individual profit
–The cross-selling effect increases the importance of Z
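The iterative algorithm can be sketched on the X, Y, Z example as follows. This is an illustrative sketch, not the authors' implementation: the function names `normalize` and `profit_rank`, the dictionary-based data layout, and the choice of 50 iterations are assumptions. Self-links with confidence 1 are added as described earlier in the deck. Under these assumptions the sketch gives a(X) ≈ 0.767 and a(Y) ≈ 0.166, in line with the values quoted above, and ranks Z above Y.

```python
import math

def normalize(w):
    """Scale the weights so that the sum of their squares is 1."""
    norm = math.sqrt(sum(v * v for v in w.values()))
    return {k: v / norm for k, v in w.items()}

def profit_rank(prof, conf, iterations=50):
    """Hub-authority profit ranking.

    prof: item -> individual profit
    conf: (i, j) -> conf(i -> j) for frequent pairs with i != j
    """
    items = list(prof)
    # Link weight from i to j is prof(i) * conf(i -> j).
    weight = {(i, j): prof[i] * c for (i, j), c in conf.items()}
    for i in items:
        weight[(i, i)] = prof[i] * 1.0  # self-link, conf(i -> i) = 100%

    hub = {i: 1.0 for i in items}
    auth = {i: 1.0 for i in items}
    for _ in range(iterations):
        # Authority update: sum over incoming links j -> i.
        auth = {i: sum(weight.get((j, i), 0.0) * hub[j] for j in items)
                for i in items}
        # Hub update: sum over outgoing links i -> j.
        hub = {i: sum(weight.get((i, j), 0.0) * auth[j] for j in items)
               for i in items}
        auth = normalize(auth)
        hub = normalize(hub)
    return hub, auth

prof = {"X": 5.0, "Y": 1.0, "Z": 0.1}
conf = {
    ("X", "Y"): 0.2, ("Y", "X"): 0.06,
    ("X", "Z"): 0.8, ("Z", "X"): 0.2,
    ("Y", "Z"): 0.5, ("Z", "Y"): 0.375,
}
hub, auth = profit_rank(prof, conf)
ranking = sorted(auth, key=auth.get, reverse=True)
print(ranking, {k: round(v, 3) for k, v in auth.items()})
```

Z has the lowest individual profit, yet it is bought with high confidence alongside the profitable item X, which is why its authority weight overtakes that of Y.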
Estimate Profitability
Estimate the profit generated by the selected items.
Consider the transaction (A, B, C)
–All items are selected
–None of the items are selected
–Only some of the items are selected
Compute the loss based on the statistics.
Empirical Study
                 Drug Store       Synthetic
Transaction #    193,995          10,000
Item #           26,128           1,000
Total profit     $1,006,970       $317,579
minsupp          0.1%, 0.05%      0.5%, 0.1%
Freq. items
Freq. pairs
Experiment Results *PROFSET[4]
Experiment Results (cont’d)
References
[1] R. Agrawal. IBM synthetic data generator. In IBM
[2] R. Agrawal, T. Imielinski, and A. N. Swami. “Mining association rules between sets of items in large databases.” In SIGMOD.
[3] R. Agrawal and R. Srikant. “Fast algorithms for mining association rules.” In VLDB, September.
[4] T. Brijs, B. Goethals, G. Swinnen, K. Vanhoof, and G. Wets. “A Data Mining Framework for Optimal Product Selection in Retail Supermarket Data: The Generalized PROFSET Model.” In ACM SIGKDD, August 2000.
[5] T. Brijs, G. Swinnen, K. Vanhoof, and G. Wets. “Using Association Rules for Product Assortment Decisions: A Case Study.” In KDD-99, August.
[6] ILOG CPLEX:
[7] G. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press.
[8] J. M. Kleinberg. “Authoritative sources in a hyperlinked environment.” In Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms. ACM.
[9] MINILAB: