Searching and Browsing Using Tags
Nikos Sarkas
Social Information Systems Seminar
DCS, University of Toronto, Winter 2007

Social Resource Sharing
- The del.icio.us paradigm: users store links to web pages of interest on a server, along with arbitrary, user-specified tags.
- The model is independent of the resource being shared:
  - Music (Last.fm)
  - Photos (Flickr)
  - Publications (CiteULike)
  - …
Part I: Searching
Ranking Web Search Results
Two prevalent models:
- Ranking based on query-document similarity
  - TF/IDF
  - Metadata extraction
  - Link analysis
- Query-independent static ranking
  - PageRank
  - “Quality” based
Similarity Ranking, Take I
- Query q = {q_1, q_2, …, q_n}.
- Tags of URL p: T(p) = {t_1, t_2, …, t_m}.
- Define similarity as |q ∩ T(p)| / |T(p)|.
- Problems
  - Synonymy (according to the authors)
  - Others?
- Synonymy example: Linux, Ubuntu and Gnome
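The Take I measure is small enough to sketch directly. The function name `tag_similarity` and the lower-casing of tags are assumptions made for illustration; the slide only gives the ratio |q ∩ T(p)| / |T(p)|.

```python
def tag_similarity(query_terms, url_tags):
    """Fraction of a URL's distinct tags that also appear in the query."""
    # Assumption: tags and query terms are compared case-insensitively.
    q = {t.lower() for t in query_terms}
    tags = {t.lower() for t in url_tags}
    if not tags:
        return 0.0  # an untagged URL matches nothing
    return len(q & tags) / len(tags)


# One of four tags matches the query.
score_exact = tag_similarity(["linux"], ["linux", "ubuntu", "gnome", "desktop"])

# Synonymy problem: a page tagged only "ubuntu" and "gnome" scores zero
# for the query "linux", although it is clearly related.
score_synonym = tag_similarity(["linux"], ["ubuntu", "gnome"])
```

The second call shows why an exact-match measure is not enough: related tags contribute nothing unless they are literally the same string as a query term.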
Similarity Ranking, Take II
- Use tags with “similar” meaning to enrich the query.
- Create 3 matrices
  - M_TP, tag-URL count matrix
  - S_T, tag-tag similarity matrix
  - S_P, URL-URL similarity matrix
Similarity Ranking, Take II
- Iterate: update S_T and, similarly, S_P, until convergence.
- Then the similarity between a query q and a URL p is computed from the tag-tag similarities in S_T.
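The slide gives no update formula, so the following is only a sketch of a SimRank-style mutual reinforcement between tags and URLs, not the authors' exact rule. It assumes an unweighted tag-URL relation (the counts in M_TP are ignored), a hypothetical decay factor, a fixed number of rounds in place of a convergence test, and a query-URL score that sums S_T over all pairs of query terms and URL tags.

```python
from itertools import product


def bipartite_simrank(tag_urls, decay=0.8, iterations=10):
    """tag_urls: dict tag -> set of URLs. Returns (S_T, S_P) as dicts keyed by pairs."""
    tags = sorted(tag_urls)
    urls = sorted({u for us in tag_urls.values() for u in us})
    url_tags = {u: {t for t in tags if u in tag_urls[t]} for u in urls}

    # Start from the identity: every item is similar only to itself.
    s_t = {(a, b): 1.0 if a == b else 0.0 for a, b in product(tags, tags)}
    s_p = {(a, b): 1.0 if a == b else 0.0 for a, b in product(urls, urls)}

    def update(items, neighbours, other_sim):
        # Two items are similar if their neighbours on the other side are similar.
        new = {}
        for a, b in product(items, items):
            if a == b:
                new[(a, b)] = 1.0
                continue
            na, nb = neighbours[a], neighbours[b]
            if not na or not nb:
                new[(a, b)] = 0.0
                continue
            total = sum(other_sim[(x, y)] for x in na for y in nb)
            new[(a, b)] = decay * total / (len(na) * len(nb))
        return new

    for _ in range(iterations):  # assumption: fixed rounds, no convergence check
        new_t = update(tags, tag_urls, s_p)
        new_p = update(urls, url_tags, s_t)
        s_t, s_p = new_t, new_p
    return s_t, s_p


def query_url_similarity(query_terms, tags_of_url, s_t):
    # Assumption: sum tag-tag similarity over all (query term, URL tag) pairs.
    return sum(s_t.get((q, t), 0.0) for q in query_terms for t in tags_of_url)


tag_urls = {"linux": {"p1", "p2"}, "ubuntu": {"p1", "p2"}, "cooking": {"p3"}}
s_t, s_p = bipartite_simrank(tag_urls)
score = query_url_similarity(["linux"], {"ubuntu"}, s_t)
```

In this toy data "linux" and "ubuntu" annotate the same pages, so they end up similar, and a query for "linux" now gives a non-zero score to a page tagged only "ubuntu".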
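Social PageRank
- “Popular web pages are tagged by many up-to-date users, using hot tags”.
- Transfer popularity between entities (pages, users, tags).
- Define matrices M_PU (page-user), M_UT (user-tag), M_TP (tag-page).
- Iterate over the three matrices until convergence.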
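The iteration itself is not spelled out on the slide. One plausible reading, sketched below, passes page scores to users, then to tags, then back to pages, and then runs the same chain in reverse. The function names, the per-round normalization, and the toy matrices are assumptions made for illustration.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def normalize(v):
    total = sum(v)
    return [x / total for x in v] if total else v


def social_pagerank(m_pu, m_ut, m_tp, iterations=50, tol=1e-9):
    """m_pu: pages x users, m_ut: users x tags, m_tp: tags x pages (count matrices)."""
    n_pages = len(m_pu)
    p = [1.0 / n_pages] * n_pages  # assumption: uniform starting scores
    for _ in range(iterations):
        u = mat_vec(transpose(m_pu), p)       # pages -> users
        t = mat_vec(transpose(m_ut), u)       # users -> tags
        p_mid = mat_vec(transpose(m_tp), t)   # tags  -> pages
        t_back = mat_vec(m_tp, p_mid)         # pages -> tags
        u_back = mat_vec(m_ut, t_back)        # tags  -> users
        # Assumption: rescale each round so the scores sum to 1.
        p_next = normalize(mat_vec(m_pu, u_back))  # users -> pages
        if max(abs(a - b) for a, b in zip(p, p_next)) < tol:
            return p_next
        p = p_next
    return p


# Toy data: 3 pages, 2 users, 2 tags.
m_pu = [[1, 1], [1, 0], [0, 1]]
m_ut = [[2, 1], [1, 1]]
m_tp = [[2, 1, 0], [1, 0, 1]]
scores = social_pagerank(m_pu, m_ut, m_tp)
```

Page 0 is annotated by both users and carries both tags, so it receives the highest score in this example.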
Putting It All Together
- Train a ranking function (RankSVM) using the following features
  - BM25 similarity between query and URL content
  - Simple query-URL tags similarity measure
  - Complex query-URL tags similarity measure
  - PageRank
  - Social PageRank
- Results
  - Metrics: precision and NDCG at k
  - Small improvement over BM25; up to 25% for NDCG and synthetic queries
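For reference, the two evaluation metrics can be sketched as follows. Several DCG variants exist; this one assumes a gain of 2^rel - 1 with a log2 position discount, and it computes the ideal ordering from the returned list only, which may differ from the setup used in the reported experiments.

```python
import math


def precision_at_k(relevances, k):
    """Share of the top-k results with a non-zero relevance grade."""
    top = relevances[:k]
    return sum(1 for r in top if r > 0) / k


def dcg_at_k(relevances, k):
    # Assumption: gain 2^rel - 1, discounted by log2(position + 1).
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    # Assumption: the ideal ranking is the same list sorted by relevance.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal else 0.0


# Relevance grades of the returned results, in ranked order.
perfect = ndcg_at_k([3, 2, 1], 3)
reversed_order = ndcg_at_k([1, 2, 3], 3)
half_relevant = precision_at_k([1, 0, 2, 0], 4)
```

A ranking that already lists results in decreasing relevance gets an NDCG of 1, and any other ordering of the same results scores lower.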
Part II: Browsing
Tag Assisted Browsing
- Currently two methods for tag-driven browsing
  - Keyword search
  - Clouds of popular tags
- We would like to support
  - Semantic browsing: also present URLs annotated with similar tags
  - Hierarchical browsing: browse in a top-down fashion
Semantic Browsing
- Define similarity between tags:
- Synonymic tags: tags whose similarity is above a threshold.
- A tag together with its synonymic tags defines its semantic concept.
- Given that the user has selected L tags, which define the semantic concepts S_c = {C_1, …, C_L}, the related URLs are:
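Neither the tag similarity nor the related-URL formula appears on the slide, so the sketch below fills both with stand-ins: cosine similarity over tag-URL count vectors, a hypothetical threshold of 0.5, and related URLs defined as those carrying at least one tag from every selected concept. All three choices are assumptions, not the authors' definitions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict URL -> count)."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_concept(tag, tag_vectors, threshold=0.5):
    # The tag itself plus every tag whose similarity to it is above the threshold.
    return {tag} | {t for t, vec in tag_vectors.items()
                    if t != tag and cosine(tag_vectors[tag], vec) > threshold}


def related_urls(selected_tags, tag_vectors, threshold=0.5):
    # Assumption: a URL is related if each selected concept contributes a tag to it.
    result = None
    for tag in selected_tags:
        concept = semantic_concept(tag, tag_vectors, threshold)
        urls = set().union(*(tag_vectors[t].keys() for t in concept))
        result = urls if result is None else result & urls
    return result or set()


# Toy data: for each tag, how often it was applied to each URL.
tag_vectors = {
    "linux": {"p1": 3, "p2": 1},
    "ubuntu": {"p1": 2, "p2": 2, "p4": 1},
    "recipes": {"p3": 4},
    "tutorial": {"p2": 1, "p3": 1},
}
linux_concept = semantic_concept("linux", tag_vectors)
linux_urls = related_urls(["linux"], tag_vectors)
both_urls = related_urls(["linux", "tutorial"], tag_vectors)
```

Selecting "linux" alone also surfaces p4, which is tagged only "ubuntu"; adding "tutorial" narrows the result to the pages shared by both concepts.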
Hierarchical Browsing
Observations
- No neat tree structure
- Multiple ways to reach a target resource
- URLs associated with different categories
- Dynamic structure: leaves can become inner nodes
Hierarchical Browsing
- Generating sub-tags
  - Train a classifier to identify which of the tags in the semantic concept are sub-tags
  - Features used: ratio of tag counts, intersection size, etc.
- Clustering sub-tags
  - Rank tags based on a complex formula
  - Greedy clustering technique