Item Selection By “Hub-Authority” Profit Ranking Presented by: Thomas Su.


Agenda
– Introduction
– Overview of HITS
– Our Approach: Hub-Authority Profit Ranking
– Estimate Profitability
– Empirical Study
– Experiment Results

Introduction
– Difficulty of the item selection/ranking problem: the “cross-selling effect”.
– Size-constrained selection
– Cost-constrained selection
[Figure labels: # of items selected, optimal cutoff, estimated profit, selection cost]

Web Page Ranking Algorithm: HITS (Hyperlink-Induced Topic Search)
– Ranks the relevance of web pages on a given topic
– The mutually reinforcing relationship
  – Hub pages
  – Authority pages
– Starts by finding a set of candidates for a given topic

Computing Hub/Authority Weights for Web Pages
– Each page i is associated with a non-negative hub weight h(i) and a non-negative authority weight a(i).
– Before each iteration, the weights are normalized so that $\sum_i a(i)^2 = 1$ and $\sum_i h(i)^2 = 1$.
– In each iteration, the weights are updated as follows:
  – $a(i) = \sum_{j \to i} h(j)$, summing over all pages j that have a link to i
  – $h(i) = \sum_{i \to j} a(j)$, summing over all pages j that i has a link to
– The iteration continues until a(i) and h(i) converge to stable values.
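The update and normalization steps above can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the function name `hits`, the `links` dictionary layout, the fixed iteration count, and the toy graph are all assumptions made for the example.

```python
import math

def hits(links, iterations=50):
    """links maps each page to the list of pages it links to (assumed layout)."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # a(i) = sum of h(j) over all pages j that have a link to i
        auth = {p: 0.0 for p in pages}
        for j, targets in links.items():
            for i in targets:
                auth[i] += hub[j]
        # h(i) = sum of a(j) over all pages j that i has a link to
        hub = {p: sum(auth[j] for j in links.get(p, [])) for p in pages}
        # normalize so that the squared weights sum to 1
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical toy graph: p1 and p2 act as hubs, a1 and a2 as authorities.
toy_links = {"p1": ["a1", "a2"], "p2": ["a1"], "a1": [], "a2": []}
hub, auth = hits(toy_links)
print(sorted(auth, key=auth.get, reverse=True))
```

In this toy graph a1 is linked from both hubs, so it should end up with the largest authority weight, and p1 (which links to both targets) with the largest hub weight.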

Hub-Authority Profit Ranking
– Find frequent items and 2-itemsets.
– Each item is a node in the graph.
– conf(i → j) forms the link between items i and j.
  – The larger conf(i → j), the stronger the link.
  – Every frequent item i has a link to itself, since conf(i → i) = 100%.
– Item j is a good authority if it is necessary for many other items i.
– Item i is a good hub if it implies buying many other items j.
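As a rough sketch of how the links could be derived from transaction data, the snippet below counts frequent items and frequent 2-itemsets and computes conf(i → j) as count(i, j) / count(i). The function name `confidence_links`, the `minsupp` handling, and the sample transactions are assumptions for illustration; the slides do not specify an implementation.

```python
from collections import Counter
from itertools import permutations

def confidence_links(transactions, minsupp):
    """Return {(i, j): conf(i -> j)} for frequent items and frequent pairs."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = set(t)
        item_count.update(items)
        # ordered pairs (i, j) and (j, i), each counted once per transaction
        pair_count.update(permutations(sorted(items), 2))
    frequent = {i for i, c in item_count.items() if c / n >= minsupp}
    # every frequent item links to itself with confidence 100%
    conf = {(i, i): 1.0 for i in frequent}
    for (i, j), c in pair_count.items():
        if i in frequent and j in frequent and c / n >= minsupp:
            conf[(i, j)] = c / item_count[i]
    return conf

# Hypothetical transactions, not data from the study.
transactions = [["X", "Y"], ["X"], ["X", "Y", "Z"], ["Z"]]
conf = confidence_links(transactions, minsupp=0.5)
print(conf[("X", "Y")], conf[("Y", "X")])
```

Here Y always appears with X, so conf(Y → X) is 1.0 while conf(X → Y) is 2/3, which shows that the links are directional.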

Model the Individual Profit
– Individual profit: the recorded profit of item i in all transactions.
– Individual profit cannot simply be used as the initial authority weights, since the converged weights are independent of the initial weights.

Model the Individual Profit (cont'd)
– Solution: incorporate the individual profit into the links.
– In each iteration:
  – Updating authority weights: $a(i) = \sum_{j \to i} \mathrm{prof}(j) \times \mathrm{conf}(j \to i) \times h(j)$
  – Updating hub weights: $h(i) = \sum_{i \to j} \mathrm{prof}(i) \times \mathrm{conf}(i \to j) \times a(j)$
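One compact way to read the two updates is as plain HITS on a graph whose links carry profit-weighted confidences. The symbols $w_{ij}$ and $W$ below are notation introduced here for illustration and do not appear in the slides; the hub update sums over the authority weights of the items that i points to, as in the HITS update.

```latex
w_{ij} = \mathrm{prof}(i)\,\mathrm{conf}(i \to j), \qquad
a(i) = \sum_{j \to i} w_{ji}\, h(j), \qquad
h(i) = \sum_{i \to j} w_{ij}\, a(j)
% equivalently, with W = (w_{ij}) and column vectors a and h:
a = W^{\top} h, \qquad h = W a
```

Written this way, the profit of an item scales every link leaving it, so a profitable item passes more weight to the items it implies.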

Computing the Weights: Iterative Algorithm
– Initialize the hub and authority weights to (1, 1, 1, ..., 1)
– For i = 1, 2, ..., k:
  – Update the authority weights
  – Update the hub weights
  – Normalize the authority weights
  – Normalize the hub weights
– Return the hub and authority weights
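The loop above might look as follows in Python. This is a sketch under stated assumptions rather than the authors' implementation: the name `profit_rank`, the dictionary-based inputs, the default k, and the two-item input are all hypothetical, and the hub update is assumed to use the freshly updated (not yet normalized) authority weights.

```python
import math

def profit_rank(prof, conf, k=20):
    """prof: {item: profit}; conf: {(i, j): conf(i -> j)} (assumed layouts)."""
    items = sorted(prof)
    # weight of link i -> j is prof(i) * conf(i -> j)
    weight = {(i, j): prof[i] * c for (i, j), c in conf.items()}
    for i in items:
        weight.setdefault((i, i), prof[i])  # self-link, conf(i -> i) = 1
    hub = {i: 1.0 for i in items}
    auth = {i: 1.0 for i in items}
    for _ in range(k):
        # update the authority weights from the current hub weights
        new_auth = {i: 0.0 for i in items}
        for (j, i), w in weight.items():
            new_auth[i] += w * hub[j]
        # update the hub weights from the new authority weights
        new_hub = {i: 0.0 for i in items}
        for (i, j), w in weight.items():
            new_hub[i] += w * new_auth[j]
        # normalize both vectors to unit length
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {i: v / a_norm for i, v in new_auth.items()}
        hub = {i: v / h_norm for i, v in new_hub.items()}
    return hub, auth

# Hypothetical two-item input, not taken from the slides.
hub, auth = profit_rank({"A": 2.0, "B": 1.0}, {("A", "B"): 0.5, ("B", "A"): 0.25})
print(auth)
```

A fixed k keeps the sketch short; a convergence check on the change in weights between iterations would be a natural alternative stopping rule.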

Example
Suppose we have frequent items X, Y, and Z:
– prof(X) = $5
– prof(Y) = $1
– prof(Z) = $0.1
– conf(X → Y) = 0.2, conf(Y → X) = 0.06
– conf(X → Z) = 0.8, conf(Z → X) = 0.2
– conf(Y → Z) = 0.5, conf(Z → Y) = 0.375

Example (cont'd)
[Figure: graph with nodes X, Y, and Z; the labeled links include:]
– prof(X) × conf(X → X) = 5.0
– prof(Y) × conf(Y → X) = 0.06
– prof(X) × conf(X → Y) = 1.0

Example (cont'd): updating a(X)
[Figure: graph with nodes X, Y, and Z]
a(X) = prof(X) × conf(X → X) × h(X) + prof(Y) × conf(Y → X) × h(Y) + prof(Z) × conf(Z → X) × h(Z)
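As a worked instance of this formula, assume the first iteration, where every hub weight still has its initial value of 1 and nothing has been normalized yet. Plugging in the profits and confidences from the example gives:

```latex
a(X) = \underbrace{5 \times 1 \times 1}_{X \to X}
     + \underbrace{1 \times 0.06 \times 1}_{Y \to X}
     + \underbrace{0.1 \times 0.2 \times 1}_{Z \to X}
     = 5 + 0.06 + 0.02 = 5.08
```

The self-link dominates here because X has by far the largest individual profit; the contributions from Y and Z are small but nonzero.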

Example (cont'd)
– prof(X) = $5, prof(Y) = $1, prof(Z) = $0.1
– a(X) = 0.767, a(Y) = 0.166, a(Z) = [value missing]
– The ranking based on authority weights differs from the ranking based on individual profit.
  – The cross-selling effect increases the importance of Z.
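The example can be re-run with a short script. The sketch below assumes the update rules and unit-length normalization described earlier, a self-link confidence of 1 for every item, and 50 iterations (an arbitrary choice); the variable names are illustrative only.

```python
import math

items = ["X", "Y", "Z"]
prof = {"X": 5.0, "Y": 1.0, "Z": 0.1}
conf = {
    ("X", "X"): 1.0, ("Y", "Y"): 1.0, ("Z", "Z"): 1.0,  # self-links
    ("X", "Y"): 0.2, ("Y", "X"): 0.06,
    ("X", "Z"): 0.8, ("Z", "X"): 0.2,
    ("Y", "Z"): 0.5, ("Z", "Y"): 0.375,
}

def unit(v):
    """Scale a weight dictionary so that its squared values sum to 1."""
    norm = math.sqrt(sum(x * x for x in v.values()))
    return {i: x / norm for i, x in v.items()}

hub = {i: 1.0 for i in items}
auth = {i: 1.0 for i in items}
for _ in range(50):
    # a(i) = sum over j of prof(j) * conf(j -> i) * h(j)
    auth = {i: sum(prof[j] * conf[(j, i)] * hub[j] for j in items) for i in items}
    # h(i) = sum over j of prof(i) * conf(i -> j) * a(j)
    hub = {i: sum(prof[i] * conf[(i, j)] * auth[j] for j in items) for i in items}
    auth, hub = unit(auth), unit(hub)

by_authority = sorted(items, key=auth.get, reverse=True)
by_profit = sorted(items, key=prof.get, reverse=True)
print({i: round(auth[i], 3) for i in items})
print("by authority:", by_authority, "| by profit:", by_profit)
```

Under these assumptions the script should reproduce a(X) ≈ 0.767 and a(Y) ≈ 0.166, and it places Z above Y in the authority ranking even though Y has the higher individual profit. The unit-length constraint together with those two values implies an a(Z) of roughly 0.62.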

Estimate Profitability
– Estimate the profit generated by the selected items.
– Consider the transaction (A, B, C):
  – All items are selected
  – None of the items are selected
  – Only some of the items are selected
– Compute the loss based on the statistics.

Empirical Study

                 Drug Store      Synthetic
Transaction #    193,995         10,000
Item #           26,128          1,000
Total profit     $1,006,970      $317,579
minsupp          0.1%, 0.05%     0.5%, 0.1%
Freq. items
Freq. pairs

Experiment Results
[Figure not preserved; legend note: *PROFSET [4]]

Experiment Results (cont'd)

References
[1] R. Agrawal. IBM synthetic data generator. In IBM
[2] R. Agrawal, T. Imielinski, and A. N. Swami. “Mining association rules between sets of items in large databases.” In SIGMOD.
[3] R. Agrawal and R. Srikant. “Fast algorithms for mining association rules.” In VLDB, September.
[4] T. Brijs, B. Goethals, G. Swinnen, K. Vanhoof, and G. Wets. “A Data Mining Framework for Optimal Product Selection in Retail Supermarket Data: The Generalized PROFSET Model.” In ACM SIGKDD, August 2000.

References (cont'd)
[5] T. Brijs, G. Swinnen, K. Vanhoof, and G. Wets. “Using Association Rules for Product Assortment Decisions: A Case Study.” In KDD-99, August.
[6] ILOG CPLEX:
[7] G. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press.
[8] J. M. Kleinberg. “Authoritative sources in a hyperlinked environment.” In Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms. ACM.
[9] MINILAB: