Efficient Top-k Querying over Social-Tagging Networks
Ralf Schenkel, Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Xavier Parreira, Gerhard Weikum
Max-Planck-Institut für Informatik, Saarbrücken, Germany
École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
SIGIR’08, 2008, Singapore
2009. 1. 8
Summarized and presented by Seungseok Kang, IDS Lab.

Copyright 2008 by CEBT

Outline
• Introduction
• Social Network Model
    Relations Among Nodes
• Social Scoring Model
• Query Processing
    Preprocessing
    Algorithmic Framework
• Experiments
• Conclusion

Introduction
• Online communities have become popular for publishing and searching content
    Social tags and derived user-specific scores can be leveraged to search for relevant content
    – “wisdom of the crowds”
    Queries for tags or keywords that compute and rank the top-k results face a large variety of options
• Problems
    The fast growth of communities and the very high rate of content production and tagging call for highly efficient and scalable methods
    Existing algorithms do not consider user relationships or the assets from social tagging
• Goals
    Developing an incremental top-k algorithm with two-dimensional expansions
    – Social expansion: strength of relations
    – Semantic expansion: relatedness of different tags

Social Network Model
• Common graph model
    Users, tags, and documents (nodes)
    Various relationships between them (edges)
    – Between nodes of the same type
    – Between nodes of different types

Social Network Model (cont’d)
• Relations among nodes of the same type
    Friendship(User1, User2, FriendshipStrength)
    – The trust of one user in another user
    – Transitive connection
    TagSimilarity(Tag1, Tag2, TagSim)
    – Different tag words may be synonyms
    Linkage(Document1, Document2, Weight)
    – Hyperlinks between documents (Web documents)
    – Geographic proximity of different content items (photos)

Social Network Model (cont’d)
• Relations among nodes of different types
    DocContent(Document, Tag, ContentScore)
    – Created by annotating a document with a tag
    – ContentScore reflects how well the tag describes the document
    Tagging(User, Tag, TagScore)
    – Captures the topics that the user is interested in
    – TagScore reflects how intensively a tag has been used by one user
    Rating(User, Document, RatingScore)
    – An explicit rating of the document by the user
    – Authorship of a content item (e.g. bookmarks)
• Top-k query processing focuses on the Friendship and TagSimilarity scores
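
The six relations above can all be held in one weighted, typed edge store. The following is a minimal sketch under that assumption; the class name SocialTagGraph, its methods, and the sample users and tags are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    relation: str   # e.g. "Friendship", "TagSimilarity", "Tagging"
    source: str
    target: str
    weight: float


class SocialTagGraph:
    """Hypothetical container: one adjacency list per (relation, source) pair."""

    def __init__(self):
        self._out = defaultdict(list)

    def add(self, relation, source, target, weight):
        self._out[(relation, source)].append(Edge(relation, source, target, weight))

    def neighbors(self, relation, source):
        # .get avoids creating empty entries for unknown keys
        return [(e.target, e.weight) for e in self._out.get((relation, source), [])]


g = SocialTagGraph()
g.add("Friendship", "alice", "bob", 0.8)          # same-type edge (user, user)
g.add("TagSimilarity", "car", "automobile", 0.9)  # same-type edge (tag, tag)
g.add("Tagging", "alice", "car", 0.5)             # different-type edge (user, tag)
```

Keying the adjacency lists by relation name keeps same-type and different-type edges in one structure while still letting a query read only the Friendship or TagSimilarity edges it needs.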

Social Scoring Model
• Basic notation
    Query Q(u, q_1 … q_n): query initiator u and a set of tags q_1 … q_n
    tf_u(d,t): number of times user u used tag t for document d
    F_u(u’): friendship similarity function for users u and u’
    sf_u(d,t): social frequency function for tag t, document d, and user u
• The social scoring model extends the traditional IR scoring model with
    A measure of the importance of users, relative to the querying user
    A context-specific tag frequency, relative to the querying user, that reflects the relative importance of users
    The expansion of query tags with related tags

Social Scoring Model (cont’d)
• Friendship similarity F_u(u’)
    The importance of a user, relative to the querying user
    The probability that a random document tagged by u’ will be interesting to u
    Overlap-based similarity (direct friends)
    Aggregating similarities along each path (indirect friends)
    Final definition
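
The exact formulas are not reproduced here, so the following is only a sketch of how similarities might be aggregated along paths to indirect friends. It assumes (and this is an assumption, not the paper's stated definition) that the strength of a path is the product of its direct similarities and that the best path wins. The function name path_similarity and the input layout are hypothetical.

```python
import heapq


def path_similarity(direct, source):
    """Best-path similarity from `source` to every reachable user.

    direct: dict user -> dict neighbor -> direct similarity in (0, 1].
    ASSUMPTION: path strength = product of edge similarities, best path = max.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap via negated similarity
    while heap:
        neg_sim, user = heapq.heappop(heap)
        sim = -neg_sim
        if sim < best.get(user, 0.0):
            continue  # stale heap entry, a better path was already found
        for neighbor, weight in direct.get(user, {}).items():
            candidate = sim * weight
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    del best[source]
    return best


direct = {"u": {"a": 0.8, "b": 0.2}, "a": {"b": 0.5}}
similarities = path_similarity(direct, "u")
```

Because every similarity is at most 1, extending a path can never increase its strength, which is what makes this Dijkstra-style expansion valid. In the example, user b is reached more strongly through a than directly.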

Social Scoring Model (cont’d)
• Social frequency sf_u(d,t)
    Reflects the similarity of the users who tagged a document that may be of interest to the querying user
    Replaces tf in the traditional IR scoring model
    – Takes friendship similarities into account
    tf_u(d,t): number of times user u used tag t for document d
    Combination of tf_u(d,t) and friendship similarity
    Quantitative ratings or user feedback
    Weighted global term frequency
    Depends on the friendship strengths of the querying user
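
One way to read "combination of tf_u(d,t) and friendship similarity" is as a weighted sum: each user's tag frequency counts in proportion to that user's friendship similarity to the querying user. The sketch below assumes exactly that form; the function name and the dictionary layout are hypothetical.

```python
def social_frequency(friend_sim, user_tf, doc, tag):
    """Weighted tag frequency of (doc, tag) as seen by one querying user.

    friend_sim: dict other_user -> friendship similarity to the querying user.
    user_tf:    dict other_user -> dict (doc, tag) -> tf of that user.
    ASSUMPTION: sf is the similarity-weighted sum of per-user frequencies.
    """
    return sum(
        sim * user_tf.get(other, {}).get((doc, tag), 0)
        for other, sim in friend_sim.items()
    )


friend_sim = {"a": 0.8, "b": 0.4}
user_tf = {
    "a": {("d1", "car"): 2},
    "b": {("d1", "car"): 1, ("d2", "car"): 3},
}
sf_d1 = social_frequency(friend_sim, user_tf, "d1", "car")  # 0.8*2 + 0.4*1
sf_d2 = social_frequency(friend_sim, user_tf, "d2", "car")  # 0.4*3
```

Here d2 has the higher raw frequency for user b alone, but d1 ends up with the higher social frequency because the closer friend a tagged it.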

Social Scoring Model (cont’d)
• Social score for tags s_u(d,t)
    Score of a document d with respect to a single tag t, relative to the querying user u
• Tag expansion
    Careful expansion of query tags to “semantically” related tags
• Social score for queries s_u(d,q)
    The score for an entire query with multiple tags
    where k_1 is a tunable coefficient and idf(t) is the inverse document frequency of tag t
    where df(t) denotes the number of documents that were tagged with t by at least one user
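
The mention of a tunable coefficient k_1 and of idf(t) suggests a BM25-style score in which the social frequency takes the place of tf. The sketch below assumes that form, a common BM25 idf variant, and a plain sum over query tags. All three are assumptions for illustration, and the function names are hypothetical.

```python
import math


def idf(num_docs, df):
    """ASSUMED BM25-style idf; df = number of documents tagged with the tag."""
    return math.log((num_docs - df + 0.5) / (df + 0.5))


def tag_score(sf, idf_t, k1=1.2):
    """ASSUMED saturating score for one tag; requires k1 > 0 and sf >= 0."""
    return (k1 + 1) * sf / (k1 + sf) * idf_t


def query_score(sf_by_tag, idf_by_tag, k1=1.2):
    """ASSUMED aggregation: sum of per-tag scores over all query tags."""
    return sum(tag_score(sf, idf_by_tag[t], k1) for t, sf in sf_by_tag.items())


idf_by_tag = {"car": idf(1000, 50), "red": idf(1000, 200)}
score = query_score({"car": 2.0, "red": 0.5}, idf_by_tag)
```

With this shape the per-tag score grows with the social frequency but never exceeds (k_1 + 1) · idf(t), so a single heavily tagged document cannot dominate a multi-tag query.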

Query Processing
• ContextMerge algorithm
    Builds on the well-established threshold algorithms over impact-ordered inverted lists for efficient top-k query processing
    Per-tag scores cannot be precomputed for each document and each user
    – The social score depends on the querying user
    Basic steps
    – 1. Builds social frequencies incrementally by considering users related to the querying user in descending order of friendship similarity
    – 2. Computes upper and lower bounds for the social score
    – 3. Stops execution as soon as it can guarantee that the best k documents have been identified
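
The three basic steps can be sketched for a single query tag as follows. This is a simplified illustration rather than the ContextMerge implementation: it assumes the score is the raw similarity-weighted sum of tag frequencies, that users outside the friends list contribute nothing, and that the global frequency of each document is known up front. The function name and argument layout are hypothetical.

```python
import heapq


def top_k_by_friends(friends, userdocs, docs, k):
    """Return (top-k documents, number of friends scanned).

    friends:  list of (user, similarity), sorted by descending similarity.
    userdocs: dict user -> dict doc -> tf of that user for the query tag.
    docs:     dict doc -> global tf over all users for the query tag.
    """
    worst = {}    # doc -> lower bound: score accumulated so far
    seen_tf = {}  # doc -> tf already attributed to scanned users
    for i, (user, sim) in enumerate(friends):
        # Step 1: extend the social frequencies with the next-closest user.
        for doc, tf in userdocs.get(user, {}).items():
            worst[doc] = worst.get(doc, 0.0) + sim * tf
            seen_tf[doc] = seen_tf.get(doc, 0) + tf
        next_sim = friends[i + 1][1] if i + 1 < len(friends) else 0.0
        top = heapq.nlargest(k, worst, key=worst.get)
        if len(top) == k:
            threshold = worst[top[-1]]
            # Step 2: upper bound = lower bound + remaining tf weighted by
            # the largest similarity any unscanned user can still have.
            # Step 3: stop once no other document can beat the k-th result.
            if all(
                worst.get(d, 0.0) + next_sim * (docs[d] - seen_tf.get(d, 0)) <= threshold
                for d in docs
                if d not in top
            ):
                return top, i + 1
    return heapq.nlargest(k, worst, key=worst.get), len(friends)


friends = [("a", 0.9), ("b", 0.5), ("c", 0.1)]
userdocs = {"a": {"d1": 3}, "b": {"d2": 1}, "c": {"d2": 1, "d3": 1}}
docs = {"d1": 3, "d2": 2, "d3": 1}
top1, scanned1 = top_k_by_friends(friends, userdocs, docs, 1)
top2, scanned2 = top_k_by_friends(friends, userdocs, docs, 2)
```

In the example the top-1 result is settled after scanning a single friend, because the remaining users are too weakly related for any other document to catch up. The returned set is guaranteed under the stated assumptions, but the accumulated scores are lower bounds, not final scores.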

Query Processing (cont’d)
• Preprocessing
    ContextMerge uses four kinds of preprocessed index lists that are built at indexing time
    – DOCS(t): global tag frequencies tf(d,t)
    – USERDOCS(u,t): user-specific tag frequencies tf_u(d,t)
    – FRIENDS(u): all related users u’ and their similarity P_u(u’)
    – SIMTAGS(t): all similar tags t’ and their similarity tsim(t,t’)
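
As an illustration of the first two lists, the sketch below derives DOCS(t) and USERDOCS(u,t) from raw tagging events and keeps each list in descending frequency order, which is the order a threshold algorithm would scan. The event format and the function name build_indexes are assumptions; FRIENDS and SIMTAGS would be built analogously from the similarity relations.

```python
from collections import Counter, defaultdict


def build_indexes(taggings):
    """taggings: iterable of (user, doc, tag) events (hypothetical input format)."""
    global_tf = defaultdict(Counter)  # tag -> doc -> tf(d,t)
    user_tf = defaultdict(Counter)    # (user, tag) -> doc -> tf_u(d,t)
    for user, doc, tag in taggings:
        global_tf[tag][doc] += 1
        user_tf[(user, tag)][doc] += 1
    # most_common() yields (doc, frequency) pairs in descending frequency order
    docs = {tag: counts.most_common() for tag, counts in global_tf.items()}
    userdocs = {key: counts.most_common() for key, counts in user_tf.items()}
    return docs, userdocs


events = [("a", "d1", "car"), ("b", "d1", "car"), ("b", "d2", "car")]
docs_index, userdocs_index = build_indexes(events)
```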

Query Processing (cont’d)
• Algorithmic framework
    Sequential scan (sequential access, SA)
    – Scans the FRIENDS and DOCS lists
    – Computes upper and lower bounds of the social score for each document seen in DOCS and USERDOCS
    – Adds the upper bound as bestscore(d) and the lower bound as worstscore(d) to the candidate queue
    Candidate management and termination test
    – Terminates when neither the maximum bestscore in the candidate queue nor the bestscore of any unseen document is larger than the minimum worstscore among the current top-k results
    Random access (RA)
    – Looks up missing scores of candidates in DOCS and USERDOCS
    – Much more expensive than SA
    Tag expansion
    – Generates a new DOCS[i][j] and FRIENDS[i][j]
    – The final bestscore and worstscore are multiplied by the tag similarity tsim(q,t)
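
The tag-expansion step can be sketched as scaling each expanded tag's bounds by its similarity to the query tag and keeping the strongest result per document. Taking the maximum over expansions is an assumption made for this sketch, as is treating a missing entry as a known score of zero; the function name and data layout are hypothetical.

```python
def expand_bounds(bounds_by_tag, simtags, query_tag):
    """Merge score bounds of a query tag and its similar tags.

    bounds_by_tag: dict tag -> dict doc -> (worstscore, bestscore), scores >= 0.
    simtags:       dict tag -> list of (similar_tag, tsim) with tsim in [0, 1].
    ASSUMPTION: the expanded score is the maximum of tsim * score over all
    expansions, so the same maximum is applied to both bounds.
    """
    expansions = [(query_tag, 1.0)] + list(simtags.get(query_tag, []))
    merged = {}
    for tag, tsim in expansions:
        for doc, (worst, best) in bounds_by_tag.get(tag, {}).items():
            old_worst, old_best = merged.get(doc, (0.0, 0.0))
            merged[doc] = (max(old_worst, tsim * worst), max(old_best, tsim * best))
    return merged


bounds_by_tag = {
    "car": {"d1": (1.0, 2.0)},
    "automobile": {"d1": (2.0, 3.0), "d2": (1.0, 1.0)},
}
simtags = {"car": [("automobile", 0.5)]}
merged = expand_bounds(bounds_by_tag, simtags, "car")
```

Under the maximum rule a document that matches only a related tag, such as d2 here, still enters the candidate queue, but with its bounds discounted by the tag similarity.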

Experiments
• Data collections
    Del.icio.us
    – 12,389 users, 175,754 bookmarks, 2,781,096 tags, 152,306 relations
    Flickr
    – 52,347 users, about 10,000,000 images, 29,111,183 tags, 1,293,777 friendship connections
    – Two users are friends if one of them commented on the other’s picture
    LibraryThing
    – 9,986 users, 6,435,605 books, 14,295,693 tags, 17,317 relations
    – Two users are friends if they marked the same libraries as interesting
• Benchmarking queries
    User-specific ground truth
    User study with manual relevance assessments

Conclusion
• Social search is a promising direction for increasing user-perceived query result quality
• Social network and scoring model
    Effective scoring model for user-centric searches
• ContextMerge algorithm
    Algorithm for efficiently evaluating queries in social networks
    Combines a top-k algorithm with dynamic tag expansion
    Up to an order of magnitude faster in measured runtime, and cheaper in abstract cost, than the standard baseline method
