1
Neighborhood-based Tag Prediction
Adriana Budura
Joint work with: Sebastian Michel, Philippe Cudré-Mauroux, Karl Aberer
Adriana Budura, “Neighborhood-based Tag Prediction”, ESWC’09
2
Outline
Motivation
Principles of Tag Propagation
Scoring Model
Top-k Tag Inference
Experimental Results
Conclusions
3
Motivation
Tagging portals («Web 2.0»): users attach keywords (tags) to resources, e.g. flickr, del.icio.us, citeulike, …
Tags:
  unstructured textual information
  reflect the meaning of resources for users
  a powerful tool to improve search
BUT: we need many tags, and users are lazy.
Therefore: Automatic Tag Inference
4
Neighborhood-based Tag Prediction
IDEA: copy tags from other resources
Semantically related resources -> related tags
How to discover semantically similar resources?
  Resources are connected via links (e.g., HTML links, citations)
  The neighborhood of a resource captures its context (e.g., citations in “Related Work”)
  Propagate tags along the edges of the graph
How relevant is a tag found in the neighborhood?
5
Computational Model
3 concepts:
  Documents: the resources for which we infer tags; uniquely identifiable; in our scenario: scientific publications, Web pages
  Tags: keywords attached to the resources
  Document neighborhoods: documents connected by users (graph)
6
How relevant is a tag found in the neighborhood?
Neighborhood defines context (far away -> less related): Tag Distance
Enough support in the neighborhood: Tag Occurrence
Some tags are more likely to occur together: Tag Co-Occurrence
Similar documents are likely to share the same tags: Document-Document Similarity
7
Principles of Tag Propagation (e.g., citation graph of publications)
[Figure: citation graph around d_init; neighboring documents carry tags such as TopK, distributed, IR, ranking, PageRank, P2P; annotations mark Tag Occurrence, Doc-Doc Similarity, Tag Co-Occurrence, and Tag Distance]
8
Overview
Motivation
Principles of Tag Propagation
Scoring Model
Top-k Tag Inference
Experimental Results
Conclusions
9
(1) Tag Co-Occurrence
How relevant is a tag t for d_init, based on the tags already assigned to d_init?
Conditional probability of t given a tag already assigned to d_init
d_init can have more than one initial tag => we aggregate over the set of tags T(d_init)
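The slide's formula did not survive extraction, so the following is only a minimal sketch of one way to estimate the conditional probability from co-occurrence counts. The function names, the toy corpus, and the choice of averaging over T(d_init) are assumptions for illustration, not the authors' definition.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_model(tagged_docs):
    """Count single tags and unordered tag pairs over a corpus of tag sets."""
    tag_count = Counter()
    pair_count = Counter()
    for tags in tagged_docs:
        tags = set(tags)
        tag_count.update(tags)
        for a, b in combinations(sorted(tags), 2):
            pair_count[(a, b)] += 1
    return tag_count, pair_count

def cond_prob(t, given, tag_count, pair_count):
    """Estimate P(t | given) as count(t and given) / count(given)."""
    if tag_count[given] == 0:
        return 0.0
    key = tuple(sorted((t, given)))
    return pair_count[key] / tag_count[given]

def cooccurrence_score(t, init_tags, tag_count, pair_count):
    """Aggregate over T(d_init). Averaging is an assumption of this sketch."""
    others = [s for s in init_tags if s != t]
    if not others:
        return 0.0
    total = sum(cond_prob(t, s, tag_count, pair_count) for s in others)
    return total / len(others)

# Hypothetical toy corpus: each entry is the tag set of one document.
corpus = [
    {"p2p", "search"},
    {"p2p", "search", "ir"},
    {"p2p", "tagging"},
    {"ir", "ranking"},
]
tag_count, pair_count = cooccurrence_model(corpus)
score_ir = cooccurrence_score("ir", {"p2p", "search"}, tag_count, pair_count)
```

Other aggregations (for example the maximum or a product over T(d_init)) would fit the same interface.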
10
(2) Doc-Doc Similarity
How relevant is a tag t (coming from a document d) for d_init, based on the similarity between d and d_init?
Vector space model
For documents that are several hops away, we aggregate
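As an illustration of the vector space model, the sketch below computes cosine similarity between sparse term-weight vectors. How the talk aggregates similarity over several hops is not recoverable from the slide; multiplying the similarities along the path is an assumption made here, and all names and vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def path_similarity(path, vectors):
    """Aggregate similarity along a path of documents.

    Assumption: the product of the similarities of consecutive documents,
    so that documents further away contribute less.
    """
    sim = 1.0
    for a, b in zip(path, path[1:]):
        sim *= cosine(vectors[a], vectors[b])
    return sim

# Hypothetical term vectors for three documents.
vectors = {
    "d_init": {"tag": 1.0, "p2p": 1.0},
    "d1": {"tag": 1.0},
    "d2": {"tag": 1.0, "search": 1.0},
}
two_hop = path_similarity(["d_init", "d1", "d2"], vectors)
```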
11
(3) Tag Distance / (4) Tag Occurrence
Tag Distance: the distance between the documents d_init and d with tag t (shortest path)
Tag Occurrence: the popularity (support) of a tag in the neighborhood, expressed as a sum over all scores for a tag t
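The shortest-path notion of tag distance can be sketched with a breadth-first search over the document graph. The graph, the tag assignments, and the function name below are hypothetical; the sketch only shows how a hop count to the nearest document carrying t could be obtained.

```python
from collections import deque

def tag_distance(graph, tags, d_init, t):
    """Number of hops from d_init to the nearest other document tagged t.

    graph: dict mapping a document to its linked documents.
    tags:  dict mapping a document to its set of tags.
    Returns None if no reachable document carries t.
    """
    seen = {d_init}
    queue = deque([(d_init, 0)])
    while queue:
        doc, dist = queue.popleft()
        if doc != d_init and t in tags.get(doc, ()):
            return dist
        for nb in graph.get(doc, ()):
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None

# Hypothetical citation graph and tag assignments.
graph = {"d_init": ["a", "b"], "a": ["c"], "b": [], "c": []}
tags = {"a": {"ir"}, "b": {"p2p"}, "c": {"p2p", "ranking"}}
```

Because breadth-first search visits documents in order of increasing hop count, the first match is the closest one.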
12
Putting it All Together
Combined Scoring Function: sum of partial scores for each occurrence of a tag t in the neighborhood of d_init
Components: Tag Occurrence; Doc-Doc Similarity, Tag Distance; Tag Co-Occurrence
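The exact combined formula is not legible in the transcript. The sketch below shows one plausible shape only: each occurrence of t contributes a partial score built from the doc-doc similarity and a distance decay, weighted by the co-occurrence score, and the occurrences are summed. The multiplicative form, the `decay` parameter, and all names are assumptions.

```python
def combined_score(occurrences, cooc, decay=0.5):
    """Sum of partial scores over all occurrences of a tag t.

    occurrences: list of (similarity, distance) pairs, one per neighboring
                 document that carries t (distance in hops, >= 1).
    cooc:        co-occurrence score of t with the tags of d_init.
    decay:       hypothetical per-hop damping factor.
    """
    return sum(cooc * sim * decay ** (dist - 1) for sim, dist in occurrences)

# Two hypothetical occurrences: one direct neighbor, one at two hops.
example = combined_score([(0.8, 1), (0.5, 2)], cooc=0.5)
```

With these inputs the sum is 0.5 * 0.8 + 0.5 * 0.5 * 0.5 = 0.525, so a tag gains score both from being frequent in the neighborhood and from occurring close to d_init.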
13
Overview
Motivation
Principles of Tag Propagation
Scoring Model
Top-k Tag Inference
Experimental Results
Conclusions
14
Inferring tags for a document
Traverse the graph of documents and gather tags for the initial document
Do not visit the whole neighborhood: need smart graph traversal
The scoring model can compute a score for “every” tag, but top-k tags are enough
… when should we stop?
15
Graph Traversal
Precomputed: tags + scores for each document
Doc 1: P2P, 0.3; Tag, 0.28; Social, 0.25; Paper, 0.2; 2009, 0.1
Doc 2: Social, 0.4; Search, 0.33; Budura, 0.25; Tag, 0.2; Paper, 0.2
Starting at d_init, select the next document based on the doc-doc similarity
16
Top-K Graph Traversal
List of all neighbors, sorted by doc-doc similarity
Select the best document Doc x and mark Doc x as visited
Tag lists of the visited documents:
  P2P, 0.3; Tag, 0.28; Social, 0.25; Paper, 0.2; 2009, 0.1
  Social, 0.4; Search, 0.33; Budura, 0.25; Tag, 0.2; Paper, 0.2
Aggregated at d_init (top-k): Social, 0.65; Paper, 0.4; Tag, 0.48; P2P, 0.3; ....
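The traversal can be sketched as a best-first search that always expands the most similar unvisited neighbor and adds its precomputed tag scores to a running total. The tag scores below are the ones shown on the slide; the similarity values (0.9, 0.7), the `budget` parameter, and the function name are assumptions for illustration.

```python
import heapq

def traverse(d_init, neighbors, doc_tags, budget):
    """Visit up to `budget` documents, most similar first, summing tag scores.

    neighbors: dict mapping a document to a list of (neighbor, similarity).
    doc_tags:  dict mapping a document to its precomputed {tag: score}.
    """
    totals = {}
    visited = {d_init}
    # heapq is a min-heap, so similarities are negated to pop the best first.
    heap = [(-sim, doc) for doc, sim in neighbors.get(d_init, [])]
    heapq.heapify(heap)
    order = []
    while heap and len(order) < budget:
        _, doc = heapq.heappop(heap)
        if doc in visited:
            continue
        visited.add(doc)
        order.append(doc)
        for tag, score in doc_tags.get(doc, {}).items():
            totals[tag] = totals.get(tag, 0.0) + score
        for nb, sim in neighbors.get(doc, []):
            if nb not in visited:
                heapq.heappush(heap, (-sim, nb))
    return order, totals

# Tag scores from the slide; similarities are made up.
doc_tags = {
    "doc1": {"P2P": 0.3, "Tag": 0.28, "Social": 0.25, "Paper": 0.2, "2009": 0.1},
    "doc2": {"Social": 0.4, "Search": 0.33, "Budura": 0.25, "Tag": 0.2, "Paper": 0.2},
}
neighbors = {"d_init": [("doc1", 0.9), ("doc2", 0.7)]}
order, totals = traverse("d_init", neighbors, doc_tags, budget=2)
```

After both documents are visited, the totals match the aggregated list on the slide (Social 0.65, Tag 0.48, Paper 0.4, P2P 0.3).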
17
Top-k Tag Inference
Fagin et al.: NRA algorithm
For each candidate tag:
  worst_score (w) = actual score
  best_score (b) = worst_score + best_to_come_score
Prune a tag when best_score < score of the tag currently at rank k
Stop when k tags have been seen and no candidate tags are left
[Figure: score intervals from w to b for the tag at top-k position k, a candidate tag, and an expelled tag; labels: (m-m‘) * , unknown final “score” mass for each tag]
Consider ONLY m occurrences for each tag
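The bound bookkeeping can be sketched as follows. The slide shows the best-to-come term only as "(m-m‘) *" with the second factor lost, so the per-occurrence upper bound `high` used here is an assumption, as are the function names and the choice to count the first m occurrences seen. The sketch also ignores tags that have not been encountered at all, which a full NRA-style implementation would have to bound as well.

```python
def bounds(seen_scores, m, high):
    """Worst and best score of a tag when only m occurrences are counted.

    seen_scores: partial scores seen so far for the tag (m' of them).
    high:        hypothetical upper bound on any single unseen partial score.
    """
    used = seen_scores[:m]
    worst = sum(used)
    best = worst + (m - len(used)) * high
    return worst, best

def nra_step(candidates, k, m, high):
    """Return (top_k, open_tags, done) for the current candidate state."""
    b = {t: bounds(s, m, high) for t, s in candidates.items()}
    ranked = sorted(b, key=lambda t: b[t][0], reverse=True)
    top_k, rest = ranked[:k], ranked[k:]
    if len(top_k) < k:
        return top_k, rest, False
    threshold = b[top_k[-1]][0]  # worst score of the tag at rank k
    # A tag is pruned when its best score falls below the threshold.
    open_tags = [t for t in rest if b[t][1] >= threshold]
    return top_k, open_tags, not open_tags

# Partial scores taken from the traversal slide; k, m and high are made up.
candidates = {
    "Social": [0.25, 0.4],
    "Tag": [0.28, 0.2],
    "Paper": [0.2, 0.2],
    "P2P": [0.3],
    "2009": [0.1],
}
tight = nra_step(candidates, k=2, m=2, high=0.15)
loose = nra_step(candidates, k=2, m=2, high=0.3)
```

With the tighter bound no candidate can still overtake the tag at rank 2, so the traversal may stop; with the looser bound P2P remains a candidate and more neighbors must be visited.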
18
Overview
Motivation
Principles of Tag Propagation
Scoring Model
Top-k Tag Inference
Experimental Results
Conclusions
19
Experimental Setup
Datasets:
  del.icio.us (120K bookmarks)
  CiteULike/CiteSeer (2200 crawled PDFs)
Measures of interest:
  Precision (user study)
  Relative precision (computed based on already assigned tags)
  Cost (number of visited neighbors)
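The slide does not spell out how relative precision is computed. One reading, assumed here, is the fraction of inferred tags that also appear among the tags users had already assigned to the document. The function name, the case-insensitive matching, and the example tags are hypothetical.

```python
def relative_precision(predicted, assigned):
    """Fraction of predicted tags that users had already assigned.

    predicted: list of inferred tags (e.g. the top-k result).
    assigned:  collection of tags already attached to the document.
    Matching is case-insensitive, which is an assumption of this sketch.
    """
    if not predicted:
        return 0.0
    known = {t.lower() for t in assigned}
    hits = sum(1 for t in predicted if t.lower() in known)
    return hits / len(predicted)

# Hypothetical example: two of the three inferred tags were already assigned.
rp = relative_precision(["social", "tag", "p2p"], {"Social", "P2P", "search"})
```

Such a measure can only undercount: an inferred tag that is correct but was never assigned by a user counts as a miss, which is one reason to complement it with a user study.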
20
Experimental Results: CiteULike
30 initial documents; manual precision evaluation (user study)
m k Precision Neighbors
3 0.73 41 5 0.65 93 7 0.55 74 0.7 124 153 0.57 247 0.72 243 257 356
21
Experimental Results: Del.icio.us
120 initial documents; relative precision evaluation
m k Precision Neighbors
3 0.5 1.65 5 0.42 7 0.36 1.67
22
Conclusions
Tag inference over edges of resource graphs
4 principles of tag propagation
Scoring model
Top-k tag inference with modest access to the resource graph