Item Selection By “Hub-Authority” Profit Ranking Ke Wang Ming-Yen Thomas Su Simon Fraser University.

Presentation transcript:

Item Selection By “Hub-Authority” Profit Ranking Ke Wang Ming-Yen Thomas Su Simon Fraser University


Ranking in an Inter-related World
Web pages
Social networks
Cross-selling

Item Ranking with Cross-selling Effect
What are the most profitable items?
[Figure: graph of items labeled with profits ($1.5, $10, $3, $0.5, $5, $3, $15, $2, $8) and links labeled with percentages (100%, 30%, 50%, 35%, 60%)]

The Hub/Authority Modeling
Hubs i: “introductory” for sales of other items j (i → j).
Authorities j: “necessary” for sales of other items i (i → j).
Solution: model the mutual reinforcement of hub and authority weights through links.
  – Challenges: incorporate the individual profits of items and the strength of links, and ensure that the hub/authority weights converge

Selecting Most Profitable Items
Size-constrained selection
  – given a size s, find s items that produce the most profit as a whole
  – solution: select the s items at the top of the ranking
Cost-constrained selection
  – given the cost for selecting each item, find a collection of items that produces the most profit as a whole
  – solution: the same as above for uniform cost

Solution to cost-constrained selection
[Figure: estimated profit and selection cost plotted against # of items selected, with the optimal cutoff marked]
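The slide names only the quantities involved, so the following is a minimal sketch under stated assumptions rather than the authors' procedure. It assumes the optimal cutoff is the number of top-ranked items k that maximizes estimated profit minus cumulative selection cost. The function name `optimal_cutoff` and both input lists are hypothetical; how the estimated profit of the top k items is obtained is not specified here.

```python
from itertools import accumulate


def optimal_cutoff(estimated_profit, item_costs):
    """Pick the cutoff k that maximizes net profit (assumed criterion).

    estimated_profit[k] is the estimated profit of selecting the top k
    ranked items, for k = 0..n (hypothetical input, length n + 1).
    item_costs[k] is the selection cost of the item at rank k + 1
    (hypothetical input, length n).
    """
    # Cumulative selection cost of the top k items, for k = 0..n.
    cumulative_cost = [0.0] + list(accumulate(item_costs))
    net = [p - c for p, c in zip(estimated_profit, cumulative_cost)]
    # On ties, max() keeps the smallest k.
    best_k = max(range(len(net)), key=net.__getitem__)
    return best_k, net[best_k]
```

With uniform costs this reduces to walking down the ranking until the marginal profit of the next item no longer covers its cost, provided the marginal profit is non-increasing.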

Web Page Ranking Algorithm – HITS (Hyperlink-Induced Topic Search)
Mutually reinforcing relationship
  – Hub weight: \( h(i) = \sum_{j:\, i \to j} a(j) \), summed over all pages j that i links to
  – Authority weight: \( a(i) = \sum_{j:\, j \to i} h(j) \), summed over all pages j that link to i
a and h converge if normalized before each iteration
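The HITS updates can be sketched in a few lines of plain Python. This is an illustrative sketch, not a reference implementation: the function names, the dictionary-based graph representation, the fixed iteration count, and the choice of the Euclidean norm for normalization are all assumptions.

```python
from math import sqrt


def _normalize(weights):
    """Scale weights to unit Euclidean length (assumed norm)."""
    norm = sqrt(sum(v * v for v in weights.values())) or 1.0
    return {k: v / norm for k, v in weights.items()}


def hits(links, iterations=50):
    """links maps each page to the set of pages it links to."""
    pages = set(links) | {j for targets in links.values() for j in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # a(i) = sum of h(j) over all pages j that link to i
        new_auth = {p: 0.0 for p in pages}
        for j, targets in links.items():
            for i in targets:
                new_auth[i] += hub[j]
        # h(i) = sum of a(j) over all pages j that i links to
        new_hub = {p: sum(new_auth[j] for j in links.get(p, ()))
                   for p in pages}
        # Normalizing every round keeps the weights bounded.
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth
```

For example, `hits({"A": {"C"}, "B": {"C"}, "C": set()})` gives C all of the authority weight and splits the hub weight equally between A and B.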

The Cross-Selling Graph
Find frequent items and 2-itemsets
Create a link i → j if Conf(i → j) is above a specified value (i and j may be the same)
“Quality” of link i → j: prof(i) × conf(i → j). Intuitively, it is the credit of j due to its influence on i
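The graph construction can be sketched as follows, assuming transactions are given as collections of item identifiers. The function name `cross_selling_links`, the parameters `min_support` and `min_conf`, and the use of `>=` for both thresholds are assumptions; self-links get confidence 1 because conf(i → i) is trivially 1.

```python
from collections import Counter
from itertools import permutations


def cross_selling_links(transactions, min_support, min_conf):
    """Return {(i, j): conf(i -> j)} for links passing both thresholds."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = set(t)
        item_count.update(items)
        # Ordered pairs (i, j) with i != j that co-occur in this transaction.
        pair_count.update(permutations(sorted(items), 2))
    links = {}
    for (i, j), count in pair_count.items():
        if count / n >= min_support:          # {i, j} is a frequent 2-itemset
            conf = count / item_count[i]      # conf(i -> j) = supp(i, j) / supp(i)
            if conf >= min_conf:
                links[(i, j)] = conf
    for i, count in item_count.items():
        if count / n >= min_support and 1.0 >= min_conf:
            links[(i, i)] = 1.0               # self-link of a frequent item
    return links
```

The quality of a link is then `profit[i] * links[(i, j)]`, where `profit` is a hypothetical mapping from item to its individual profit.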

Computing Weights in HAP
For each iteration,
  – Authority weights: \( a(i) = \sum_{j \to i} \mathrm{prof}(j) \times \mathrm{conf}(j \to i) \times h(j) \)
  – Hub weights: \( h(i) = \sum_{i \to j} \mathrm{prof}(i) \times \mathrm{conf}(i \to j) \times a(j) \)
Cross-selling matrix B
  – B[i, j] = prof(i) × conf(i → j) for link i → j
  – B[i, j] = 0 if there is no link i → j (i.e. (i, j) is not a frequent set)
Compute weights iteratively or use eigen-analysis
Rank items using their authority weights
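The iterative computation can be sketched in plain Python as below. This is a sketch under assumptions, not the authors' code: the function names, the dictionary inputs, the fixed iteration count, and normalization to unit Euclidean length are all illustrative choices.

```python
from math import sqrt


def _unit(weights):
    """Scale weights to unit Euclidean length (assumed norm)."""
    norm = sqrt(sum(v * v for v in weights.values())) or 1.0
    return {k: v / norm for k, v in weights.items()}


def hap_weights(profit, conf, iterations=100):
    """profit: {item: prof(item)}; conf: {(i, j): conf(i -> j)} for links."""
    items = sorted(profit)
    # Cross-selling matrix: B[i][j] = prof(i) * conf(i -> j), 0 if no link.
    B = {i: {j: profit[i] * conf.get((i, j), 0.0) for j in items}
         for i in items}
    a = {i: 1.0 for i in items}
    h = {i: 1.0 for i in items}
    for _ in range(iterations):
        # a(i) = sum over links j -> i of B[j][i] * h(j)
        a = _unit({i: sum(B[j][i] * h[j] for j in items) for i in items})
        # h(i) = sum over links i -> j of B[i][j] * a(j)
        h = _unit({i: sum(B[i][j] * a[j] for j in items) for i in items})
    return a, h


def rank_items(authority):
    """Items ordered by decreasing authority weight."""
    return sorted(authority, key=authority.get, reverse=True)
```

In matrix form the two updates are a = Bᵀh and h = Ba, so the iteration is a power method whose fixed point is a principal eigenvector of BᵀB; that is the eigen-analysis alternative the slide mentions.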

Example
Given frequent items X, Y, and Z and the table
  prof(X) = $5      conf(X → Y) = 0.2    conf(Y → X) = 0.06
  prof(Y) = $1      conf(X → Z) = 0.8    conf(Z → X) = 0.2
  prof(Z) = $0.1    conf(Y → Z) = 0.5    conf(Z → Y) =
We get the cross-selling matrix B, with rows and columns indexed by X, Y, Z
e.g. B[X, Y] = prof(X) × conf(X → Y) = 5 × 0.2 = 1

Example (cont’d)
prof(X) = $5, prof(Y) = $1, prof(Z) = $0.1
a(X) = 0.767, a(Y) = 0.166, a(Z) =
The HAP ranking differs from the ranking by individual profit
  – The cross-selling effect increases the profitability of Z

Empirical Study
Conduct experiments on two datasets
Compare 3 selection methods: HAP, PROFSET [4, 5], and Naïve.
HAP generates the highest estimated profit in most cases.

Empirical Study
                      Drug Store          Synthetic
Transaction #         193,995             10,000
Item #                26,128              1,000
Avg. trans. length
Total profit          $1,006,970          $317,579
minsupp               0.1%     0.05%      0.5%     0.1%
Freq. items
Freq. pairs

Experiment Results
*PROFSET [4]

Experiment Results (cont’d)