Published by Leona Rich. Modified over 9 years ago.
1
Trust and Profit Sensitive Ranking for Web Databases and On-line Advertisements Raju Balakrishnan (Arizona State University)
2
Agenda: Trust and Relevance based Ranking of Web Databases for the Deep Web. Ad-Ranking Considering Mutual Influences. Optimal Ad Ranking for Profit Maximization.
3
Deep Web Integration Problem. Millions of databases containing structured tuples. Uncontrolled collection of redundant information. [Figure: a mediator forwards the user query to web databases in the deep web and receives answer tuples in return.]
4
Source Selection in Deep Web. Given a user query, select a subset of sources that provides the most relevant and trustworthy answers. Trustworthiness: degree of belief in the correctness of the data. Relevance: degree to which the data satisfies the information needs of the user. Search results must be trustworthy and relevant. Surface web search combines hyperlink-based PageRank with relevance to assure the trust and relevance of results.
5
Source Agreement. Observations: Many sources return answers to the same query. The structure of the tuples makes it possible to compare the semantics of the answers. Idea: compare the agreement of answers returned by different sources to assess the reputation of sources! Agreement-based relevance and trust assessment may be intuitively understood as a meta-reviewer assessing the quality of a paper based on agreement between primary reviews. Reviewers agreed upon by other reviewers are likely to be relevant and trustworthy.
6
Agreement Implies Trust & Relevance. The probability of agreement of two independently selected irrelevant/false tuples is compared with the probability of agreement of two independently picked relevant and true tuples. [Formulas not preserved in this transcript.]
7
Computing Agreement between Sources. Closely related to the record linkage problem for integration of databases without common domains (Cohen 98). We used greedy matching between tuples using Jaro-Winkler similarity with SoftTF-IDF, since this measure performs best for named entity matching (Cohen et al. 03). Agreement is computed using the top-5 answer tuples to sample queries (200 queries for each domain). The computational complexity is $O(V^2 k^2)$, where V is the number of data sources and k is the number of top answers used per source.
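The greedy matching step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the similarity function is a stand-in from Python's standard library (difflib), not Jaro-Winkler with SoftTF-IDF, and the names greedy_agreement and threshold are hypothetical. Comparing every tuple of one top-k result set with every tuple of another costs k^2 similarity computations per pair of sources.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in similarity in [0, 1]; NOT Jaro-Winkler / SoftTF-IDF.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def greedy_agreement(r1, r2, threshold=0.8):
    """Greedily pair tuples of r1 with still-unmatched tuples of r2.

    Returns the summed similarity of the matched pairs. The threshold
    is an assumed cut-off below which two tuples do not count as agreeing.
    """
    # All k^2 candidate pairs, most similar first.
    pairs = sorted(
        ((similarity(t1, t2), i, j)
         for i, t1 in enumerate(r1)
         for j, t2 in enumerate(r2)),
        reverse=True,
    )
    used1, used2, total = set(), set(), 0.0
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs are even less similar
        if i in used1 or j in used2:
            continue  # each tuple is matched at most once
        used1.add(i)
        used2.add(j)
        total += score
    return total
```

For example, greedy_agreement(["The Godfather 1972", "Casablanca 1942"], ["the godfather 1972", "Vertigo 1958"]) matches only the first pair of tuples.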
8
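Representation: Agreement Graph. Link semantics: a link from S_i to S_j with weight w means that S_i acknowledges a fraction w of the tuples in S_j. [Figure: sample agreement graph for the book sources.] The link weight is $w(S_1 \to S_2) = \beta + (1 - \beta)\,\frac{A(R_1, R_2)}{|R_2|}$, where $\beta$ induces the smoothing links to account for the unseen samples, $A(R_1, R_2)$ is the agreement between the result sets, and R_1, R_2 are the result sets of S_1, S_2.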
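A sketch of how link weights with smoothing might be assembled into a graph. The weight form used here, beta + (1 - beta) * agreement / |R_j|, is an assumption chosen to match the slide's description (a fraction of acknowledged tuples plus a smoothing term). The parameter name beta, its default of 0.1, and both function names are hypothetical.

```python
def edge_weight(agreement: float, size_j: int, beta: float = 0.1) -> float:
    """Weight of the link S_i -> S_j (assumed form, see lead-in).

    agreement: agreement mass between the result sets of S_i and S_j
    size_j:    number of tuples in the result set of S_j
    beta:      smoothing term, so every link keeps a small positive weight
    """
    if size_j == 0:
        return beta  # nothing to acknowledge; only the smoothing link remains
    return beta + (1.0 - beta) * agreement / size_j


def agreement_graph(agreements, sizes, beta=0.1):
    """Build a dict {(i, j): weight} over all ordered pairs of distinct sources.

    agreements: {(i, j): agreement mass}; missing pairs count as 0
    sizes:      {source: result set size}
    """
    sources = sorted(sizes)
    return {
        (i, j): edge_weight(agreements.get((i, j), 0.0), sizes[j], beta)
        for i in sources
        for j in sources
        if i != j
    }
```

With agreements = {("A", "B"): 2.0} and sizes = {"A": 5, "B": 4}, the link A -> B gets weight 0.1 + 0.9 * 0.5, while B -> A keeps only the smoothing weight.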
9
Calculating SourceRank. How does a searcher search using the agreement graph? 1. Start on a random node. 2. If the searcher likes the node, randomly traverse a link, with a probability proportional to its weight, to search an agreed database. 3. If the searcher does not like the node, restart the search by traversing a smoothing link. This is a weighted Markov random walk. The SourceRank of a database is equal to the stationary visit probability of the random walk on the database vertex.
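The stationary visit probability of such a weighted random walk can be approximated by power iteration. The sketch below assumes that every node has at least one outgoing link with positive weight (which smoothing links would guarantee) and that the walk is aperiodic. The function name source_rank and the iteration count are illustrative choices, not taken from the slides.

```python
def source_rank(weights, iterations=200):
    """Approximate stationary visit probabilities of a weighted random walk.

    weights: {(i, j): w} with w > 0 the weight of the link i -> j.
    Assumes every node has positive total outgoing weight.
    """
    nodes = sorted({n for edge in weights for n in edge})
    # Total outgoing weight per node, used to turn weights into probabilities.
    out_total = {n: 0.0 for n in nodes}
    for (i, _), w in weights.items():
        out_total[i] += w
    # Start from the uniform distribution ("start on a random node").
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for (i, j), w in weights.items():
            # Move from i to j with probability proportional to the link weight.
            nxt[j] += rank[i] * w / out_total[i]
        rank = nxt
    return rank
```

On a three-node graph in which both A and C link to B with weight 0.9, B ends up with the highest visit probability, matching the intuition that a source agreed upon by others ranks higher.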
10
Combining Coverage and SourceRank. Coverage of a set of tuples T w.r.t. a query q: [formula not preserved in this transcript]. Coverage is calculated using sample queries, and we used Jaro-Winkler with SoftTF-IDF similarity between the query and the tuple as the relevance measure. We combine Coverage and SourceRank into a single Score [formula and parameter value not preserved in this transcript], and databases are ranked based on this Score.
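One common way to combine two such measures is a weighted sum. The sketch below assumes that form, with a hypothetical mixing weight alpha and with both inputs already scaled to [0, 1]. The transcript does not preserve the actual combination formula, so this is only an illustration of ranking by a combined score.

```python
def combined_score(coverage: float, source_rank: float, alpha: float = 0.5) -> float:
    # Assumed linear combination; alpha is a hypothetical mixing weight.
    return alpha * coverage + (1.0 - alpha) * source_rank


def rank_sources(stats, alpha=0.5):
    """stats: {name: (coverage, source_rank)} -> names by descending score."""
    return sorted(
        stats,
        key=lambda name: combined_score(*stats[name], alpha=alpha),
        reverse=True,
    )
```

With alpha = 1.0 the ranking reduces to coverage alone, and with alpha = 0.0 to SourceRank alone, which makes it easy to compare the two baselines against a mixture.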
11
Evaluations and Results. Evaluated on web databases in the movies and books domains listed in the UIUC TEL-8 repository, twenty-two from each domain. Evaluation metrics: 1. Ability to remove closely related out-of-domain sources. 2. Top-5 precision (relevance evaluation). 3. Ability to remove corrupted sources (trustworthiness). 4. Time to compute the agreement graph.
12
1. Ranks of Out of Domain Sources
13
2. Top-5 Precision: Movies. [Figure panels: Movies Top-4 Source Selection; Movies Top-8 Source Selection. Annotated values: 36%, 40%.]
14
2. Top-5 Precision: Books. [Figure panels: Top-4 Source Selection; Top-8 Source Selection.]
15
3. Trustworthiness of Source Selection. [Figure panels: Trustworthiness, Movies; Trustworthiness, Books.]
16
4. Time to Compute Agreement Graph. [Figure panels: Time vs. number of sources; Time vs. top-k tuples.]
17
System Implementation. Implemented as a web application that searches real online books and movies web databases: http://rakaposhi.eas.asu.edu/scuba [Figure: System Architecture.]
18
Agenda: Trust and Relevance based Ranking of Web Databases for the Deep Web. Ad-Ranking Considering Mutual Influences. Optimal Ad Ranking for Profit Maximization.
19
Ad Ranking: State of the Art. Current practice: sort by Bid Amount, or sort by Bid Amount x Relevance. Ads are considered in isolation, ignoring mutual influences. Our approach: we consider ads as a set, and ranking is based on the user's browsing model.
20
Mutual Influences (Optimal Ad Ranking for Profit Maximization). Three manifestations of mutual influences on an ad are: 1. Similar ads placed above: these reduce the user's residual relevance of the ad. 2. Relevance of other ads placed above: the user may click on an ad above and may not view the ad. 3. Abandonment probability of other ads placed above: the user may abandon the search and not view the ad.
21
User's Browsing Model. The user browses down, starting at the first ad. At every ad the user may: click the ad with relevance probability $R(a)$; abandon browsing with probability $\gamma(a)$; or go down to the next ad with probability $1 - (R(a) + \gamma(a))$. The process repeats for the ads below with a reduced probability. If an ad is similar to an ad placed above it, its residual relevance goes down and the abandonment probability goes up.
22
Expected Profit Considering Ad Similarities. Considering bid amounts ($\$(a_i)$), residual relevance ($R(a_i)$), abandonment probability ($\gamma(a_i)$), and similarities, the expected profit from a set of n ads is: Expected Profit = $\sum_{i=1}^{n} \$(a_i)\, R(a_i) \prod_{j=1}^{i-1} \left[1 - \left(R(a_j) + \gamma(a_j)\right)\right]$. THEOREM: Optimal ad placement considering similarities between the ads is NP-hard. The proof is a reduction of the independent set problem to choosing the top k ads considering similarities.
23
Expected Profit Considering the Other Two Mutual Influences (2 and 3). Dropping similarity, hence replacing residual relevance ($R(a_i)$) by absolute relevance ($C(a_i)$): Expected Profit = $\sum_{i=1}^{n} \$(a_i)\, C(a_i) \prod_{j=1}^{i-1} \left[1 - \left(C(a_j) + \gamma(a_j)\right)\right]$. Ranking to maximize this expected profit is a sorting problem.
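The expected profit under this browsing model can be computed in one pass down the ranking. The sketch assumes that the user views the first ad with probability 1, clicks an ad with its relevance probability, abandons with its abandonment probability, and otherwise moves on to the next ad. The Ad class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    bid: float        # amount paid per click
    relevance: float  # click probability given the ad is viewed
    abandon: float    # abandonment probability given the ad is viewed
    # Assumption: relevance + abandon <= 1 for every ad.


def expected_profit(ads):
    """Expected profit of showing `ads` in the given top-to-bottom order."""
    view = 1.0    # probability that the user reaches the current position
    profit = 0.0
    for ad in ads:
        profit += view * ad.bid * ad.relevance
        # The user continues only if there was neither a click nor an abandonment.
        view *= 1.0 - (ad.relevance + ad.abandon)
    return profit
```

For two ads Ad(2.0, 0.5, 0.25) and Ad(4.0, 0.5, 0.0), showing the first on top yields 1.5, while the reverse order yields 2.5, so the order of the same set of ads changes the expected profit.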
24
Optimal Ranking. Rank ads in descending order of: $RF(a) = \frac{\$(a)\, C(a)}{C(a) + \gamma(a)}$. The physical meaning: RF is the profit generated per unit of view probability consumed by the ad. Ads placed higher have more view probability, so placing ads that produce more profit per consumed view probability higher up is intuitively justifiable. (Refer to Balakrishnan & Kambhampati (WebDB 08) for the proof of optimality.)
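The sorting claim can be checked numerically on a small example by comparing a ranking-function sort against brute force over all orderings. The sketch assumes the ranking function is bid times relevance divided by the sum of relevance and abandonment probability, and that an ad consumes view probability equal to that sum. Ads are plain (bid, relevance, abandonment) tuples, and all function names are illustrative.

```python
from itertools import permutations


def expected_profit(ads):
    """ads: sequence of (bid, relevance, abandonment) in display order."""
    view, profit = 1.0, 0.0
    for bid, rel, gamma in ads:
        profit += view * bid * rel
        view *= 1.0 - (rel + gamma)  # probability of reaching the next ad
    return profit


def rf(ad):
    # Assumed ranking function: profit per unit of consumed view probability.
    bid, rel, gamma = ad
    consumed = rel + gamma
    return bid * rel / consumed if consumed > 0 else 0.0


def rank_by_rf(ads):
    return sorted(ads, key=rf, reverse=True)


def best_by_brute_force(ads):
    # Exhaustive search over all orderings; feasible only for a handful of ads.
    return max(expected_profit(order) for order in permutations(ads))
```

On small inputs, expected_profit(rank_by_rf(ads)) matches best_by_brute_force(ads) up to floating-point error, which is consistent with an adjacent-swap argument: placing ad a above ad b is at least as good exactly when rf(a) >= rf(b).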
25
Comparison to Yahoo and Google. Yahoo!: assume the abandonment probability is zero, $\gamma(a) = 0$, so that $RF(a) = \$(a)$. This assumes that the user has infinite patience to go down the results until he finds the ad he wants. Google: assume $\gamma(a) = k - C(a)$, where $k$ is a constant for all ads, so that $RF(a) = \$(a)\, C(a) / k$. This assumes that the abandonment probability decreases linearly with relevance.
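The two special cases can be illustrated with a few made-up ads. The sketch assumes the ranking function bid * relevance / (relevance + abandonment). The ad values and the constant k are arbitrary illustration data, not taken from the slides.

```python
def rf(bid, rel, gamma):
    # Assumed ranking function: bid * relevance / (relevance + abandonment).
    return bid * rel / (rel + gamma)


# Hypothetical (bid, relevance) pairs.
ads = [(2.0, 0.5), (4.0, 0.2), (3.0, 0.4)]
k = 0.9  # hypothetical constant with k >= relevance for every ad

# Case 1: zero abandonment. RF collapses to the bid amount.
by_rf_zero_abandon = sorted(ads, key=lambda a: rf(a[0], a[1], 0.0), reverse=True)
by_bid = sorted(ads, key=lambda a: a[0], reverse=True)

# Case 2: abandonment = k - relevance. RF collapses to bid * relevance / k.
by_rf_linear = sorted(ads, key=lambda a: rf(a[0], a[1], k - a[1]), reverse=True)
by_bid_times_relevance = sorted(ads, key=lambda a: a[0] * a[1], reverse=True)
```

In both cases the two orderings coincide, so ranking by bid only and ranking by bid times relevance can each be read as the general ranking function under an additional assumption about abandonment.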
26
Quantifying Expected Profit. Simulation setup: Number of Clicks: Zipf random with exponent 1.5. Abandonment Probability: uniform random. Relevance: uniform random. Bid Amounts: uniform random. [Figure: expected profit of the proposed RF strategy versus competing strategies; annotated differences: 45.7%, 35.9%.] The proposed strategy gives maximum profit for the entire range. The difference in profit between RF and the competing strategies is significant. The Bid Amount Only strategy becomes optimal when the abandonment probability is zero.
28
Contributions. SourceRank: Agreement-based computation of relevance and trust of deep web sources. System implementation to search the deep web, and formal evaluation. Ad-Ranking: Extending the expected profit model of ads based on the browsing model, considering mutual influences. Optimal ad ranking considering mutual influences other than ad similarities. Thank You!
29
Deep Web Integration Roadmap