2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations
WWW 2007
Rui Li, Shenghua Bao, Yong Yu, Zhong Su, and Ben Fei
Shanghai JiaoTong University, IBM China Research Lab
Advisor: Hsin-Hsi Chen
Reporter: Y.H. Chang
Outline
–Introduction
–ELSABer overview
–Components of ELSABer
–Enhanced models
–Experimental results
–Conclusion
Introduction
Today, many services (e.g., Del.icio.us, Flickr) help users manage and share their favorite URLs and photos through social annotations.
How to effectively find desired resources in large annotation data is a new problem.
In this paper, we propose a novel algorithm, the Effective Large Scale Annotation Browser (ELSABer), for browsing large-scale social annotation data.
Introduction
ELSABer helps users browse a huge number of annotations in a semantic, hierarchical, and efficient way.
By incorporating personal and time information, ELSABer can be further extended to personalized and time-related browsing.
The prototype system based on ELSABer
[Screenshot: the prototype interface, showing the sub-tags (sub-categories) of “programming” and a set of pages related to the current annotation “programming”.]
ELSABer overview
Input: an empty concept set S_C
Step 1: Output the initial view of annotations
–Generate the top 100 tags from the 2000 most frequently annotated URLs and the 2000 most frequent tags.
–These tags are the roots of the hierarchical browsing.
Loop: the user selects a tag T_i
Step 2: Concept matching
–Add tag T_i to the set S_C
–Calculate the related tag set and URL set
Step 3 (optional): Sample the URL set and the tag set
Step 4: Hierarchical browsing
–4-1 Calculate candidate sub-tags
–4-2 Rank the sub-tags by Infor-score
If the termination condition is satisfied, return; otherwise, loop.
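The control flow of the overview can be sketched as a small loop. This is only an illustration under stated assumptions: the names browse, choose_tag, related, and rank_sub_tags are hypothetical callbacks standing in for the user's click, Step 2, and Step 4; the optional Step 3 is omitted; and the termination condition (no click, or no sub-tags left) is assumed, since the slide does not define it.

```python
def browse(root_tags, choose_tag, related, rank_sub_tags, max_clicks=10):
    """Sketch of the browsing loop; every callback is a hypothetical stand-in."""
    concept_set = []                  # Input: S_C starts empty
    view = list(root_tags)            # Step 1: initial view (root tags)
    for _ in range(max_clicks):
        tag = choose_tag(view)        # Loop: the user selects a tag
        if tag is None:               # assumed termination: user stops clicking
            break
        concept_set.append(tag)                 # Step 2: add the tag to S_C
        urls, tags = related(concept_set)       # Step 2: related URL and tag sets
        view = rank_sub_tags(tag, urls, tags)   # Step 4: ranked sub-tags
        if not view:                  # assumed termination: nothing left to show
            break
    return concept_set, view


# Toy run with a hard-coded hierarchy (hypothetical data).
hierarchy = {"programming": ["java", "python"], "java": []}
clicks = iter(["programming", "java"])
path, view = browse(
    ["programming", "music"],
    choose_tag=lambda current_view: next(clicks, None),
    related=lambda concepts: (set(), set()),
    rank_sub_tags=lambda tag, urls, tags: hierarchy.get(tag, []),
)
print(path, view)
```

The callbacks keep the loop independent of how similarity, sub-tag prediction, and ranking are actually computed.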
Components of ELSABer
Data setup and representation
Semantic Browsing
–a. Annotation Similarity Estimation
–b. Generating the Semantic Concept
Hierarchical Browsing
–c. Sub-Tag Generation
–d. Sub-Tag Clustering
Efficient Browsing
Data setup and representation
Data source: Del.icio.us (May 2006)
We define an annotation as a quadruple:
–(User, URL, Tag, Time)
Associated matrix M_{m×n}:
–m and n are the total numbers of tags and URLs, respectively.
–|URL(t_i)| is the number of URLs annotated with tag t_i.
–C_ij is the number of users who annotate the j-th URL with the i-th tag.
–The weighting is analogous to TF-IDF in IR.
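The exact entry formula for M is not reproduced in the slide text. As a rough sketch only, the code below assumes a TF-IDF-style weighting M_ij = C_ij × log(n / |URL(t_i)|), built from the quantities defined above. The function name association_matrix and the toy counts are hypothetical.

```python
import math


def association_matrix(counts, n_urls):
    """counts[i][j] = number of users who annotated URL j with tag i (C_ij).

    Assumed weighting (not confirmed by the slide):
    M_ij = C_ij * log(n / |URL(t_i)|).
    """
    matrix = []
    for row in counts:
        urls_with_tag = sum(1 for c in row if c > 0)   # |URL(t_i)|
        idf = math.log(n_urls / urls_with_tag) if urls_with_tag else 0.0
        matrix.append([c * idf for c in row])
    return matrix


# Toy data: 2 tags x 3 URLs.
counts = [
    [3, 0, 1],   # tag 0 appears on URLs 0 and 2
    [2, 2, 2],   # tag 1 appears on every URL
]
M = association_matrix(counts, n_urls=3)
print(M)
```

Under this assumption, a tag that appears on every URL gets zero weight everywhere, just as a term that occurs in every document does under TF-IDF.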
Data setup and representation
Given the associated matrix M_{m×n}, with rows T_1, T_2, …, T_m and columns U_1, U_2, …, U_n:
–a tag can be represented as a row vector T_i (U_1, U_2, …, U_n) of M
–a URL can be represented as a column vector U_i (t_1, t_2, …, t_m) of M
Semantic Browsing
a. Annotation Similarity Estimation
Similarity is estimated between pairs of tags, with two special cases:
–Special case 1 (stemming): e.g., “Programs” and “Programming” => add 0.1 weight
–Special case 2 (punctuation): e.g., “Web-dev” and “WebDev” => add 0.08 weight
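The similarity formula itself is not reproduced in the slide text. The sketch below makes three assumptions: the base similarity is the cosine between two tags' row vectors of M, "add weight" means an additive bonus on top of that score, and the stemming case can be detected with a crude suffix stripper. naive_stem is a hypothetical stand-in for a real stemmer, and all function names are illustrative.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumed base measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def naive_stem(tag):
    """Hypothetical stand-in for a stemmer: strips a few common suffixes."""
    tag = tag.lower()
    for suffix in ("ming", "ing", "s"):
        if tag.endswith(suffix) and len(tag) > len(suffix) + 2:
            return tag[: -len(suffix)]
    return tag


def strip_punct(tag):
    """Lower-case the tag and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", tag.lower())


def tag_similarity(name_a, vec_a, name_b, vec_b):
    score = cosine(vec_a, vec_b)
    if strip_punct(name_a) == strip_punct(name_b):
        score += 0.08            # special case 2: punctuation variants
    elif naive_stem(name_a) == naive_stem(name_b):
        score += 0.1             # special case 1: same stem
    return score


s_punct = tag_similarity("Web-dev", [1.0, 0.0], "WebDev", [1.0, 0.0])
s_stem = tag_similarity("Programs", [1.0, 0.0], "Programming", [0.0, 1.0])
print(s_punct, s_stem)
```

Whether the boosted score is capped at 1 is not stated in the slide, so the sketch leaves it uncapped.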
Semantic Browsing
b. Generating the Semantic Concept
Given the selected tag t_i, we choose the tag set ST_i that is most related to t_i by the following rules:
–1. t_j should be among the N tags most similar to t_i.
–2. The similarity should be larger than a threshold θ.
–N = 4, θ = 0.7
Semantic concept: C_i = ST_i ∪ {t_i}
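The two selection rules translate directly into a short function. This is a minimal sketch: the function name semantic_concept and the similarity values in the example are hypothetical, while N = 4 and θ = 0.7 come from the slide.

```python
def semantic_concept(tag, similarity, n=4, theta=0.7):
    """Build C_i = ST_i ∪ {t_i}.

    similarity maps every other tag t_j to its similarity with `tag`.
    """
    ranked = sorted(similarity.items(), key=lambda kv: kv[1], reverse=True)
    # Rule 1: only the N most similar tags; rule 2: similarity above theta.
    related = {t for t, s in ranked[:n] if s > theta}
    return related | {tag}


# Hypothetical similarities to the selected tag "programming".
sims = {
    "coding": 0.90,
    "development": 0.80,
    "software": 0.75,
    "code": 0.72,
    "java": 0.71,    # above theta, but only fifth most similar
    "music": 0.10,
}
concept = semantic_concept("programming", sims)
print(sorted(concept))
```

In this example "java" passes the threshold but is excluded by the top-N rule, which shows that both rules must hold at once.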
Semantic Browsing
b. Generating the Semantic Concept
The path of the user's clicks t_1, t_2, …, t_L yields a sequence of concepts C_1, C_2, …, C_L. Let the concept set be S_C = {C_1, C_2, …, C_L}.
The related URLs:
–ReURL(S_C) = {u | ∀C ∈ S_C, T(u) ∩ C ≠ ∅}
–T(u) is the set of annotations given to URL u.
The related tags are defined as all the tags given to the URLs in ReURL(S_C):
–ReTag(S_C) = {t | u ∈ ReURL(S_C), t ∈ T(u)}
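The two set definitions can be written almost verbatim as set comprehensions. In this sketch, tags_of is a hypothetical dictionary playing the role of T(u), and the URLs and tags are toy data.

```python
def related_urls(concept_set, tags_of):
    """ReURL(S_C): URLs whose tag set T(u) intersects every concept in S_C."""
    return {
        u for u, tags in tags_of.items()
        if all(tags & concept for concept in concept_set)
    }


def related_tags(concept_set, tags_of):
    """ReTag(S_C): all tags given to the URLs in ReURL(S_C)."""
    result = set()
    for u in related_urls(concept_set, tags_of):
        result |= tags_of[u]
    return result


# T(u) for four toy URLs.
tags_of = {
    "u1": {"programming", "java"},
    "u2": {"coding", "python"},
    "u3": {"music"},
    "u4": {"java"},
}
# Two clicks, hence two concepts.
concepts = [{"programming", "coding"}, {"java", "python"}]
re_url = related_urls(concepts, tags_of)
re_tag = related_tags(concepts, tags_of)
print(sorted(re_url), sorted(re_tag))
```

Each additional concept can only shrink ReURL, which matches the intuition that every click narrows the result set.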
Hierarchical Browsing
c. Sub-Tag Generation
If the intersection of the URL sets of t_i and t_j makes up most of the URLs of t_i but only a small part of the URLs of t_j, we can infer that t_i is a sub-tag of t_j.
[Figure: 40 related tags of “google”]
Hierarchical Browsing
c. Sub-Tag Generation
Features:
–Coverage of Tags
–ICR
–Intersection Rate
–IR’
–IRR (by IR rank): top 1~30 = 1, top 30~60 = 2, top 60~ = 3
U(t_i) denotes the number of URLs tagged with t_i.
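The feature formulas are not reproduced in the slide text. The sketch below therefore assumes one plausible definition of the intersection rate, namely the share of one tag's URLs that also carry the other tag, together with the rank bucketing listed above. The bucket boundaries at ranks 30 and 60 are ambiguous on the slide and are resolved here as "≤ 30" and "≤ 60". The tags and URL ids are hypothetical.

```python
def intersection_rate(urls_a, urls_b):
    """Assumed definition: |URLs(a) ∩ URLs(b)| / |URLs(a)|."""
    return len(urls_a & urls_b) / len(urls_a) if urls_a else 0.0


def irr_bucket(rank):
    """Map a 1-based IR rank to the IRR feature value (boundaries assumed)."""
    if rank <= 30:
        return 1
    if rank <= 60:
        return 2
    return 3


# Hypothetical URL id sets for two tags.
urls = {
    "gmail": {1, 2, 3, 4},
    "google": {1, 2, 3} | set(range(5, 22)),   # 20 URLs in total
}
ir_child = intersection_rate(urls["gmail"], urls["google"])
ir_parent = intersection_rate(urls["google"], urls["gmail"])
print(ir_child, ir_parent, irr_bucket(1), irr_bucket(45), irr_bucket(61))
```

The asymmetry between the two rates (high from the candidate sub-tag's side, low from the parent's side) is the signal described for inferring a sub-tag relation.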
Hierarchical Browsing
c. Sub-Tag Generation
Given the features above, each related tag is represented as a feature vector.
A decision tree is trained with C4.5 on a manually labeled data set to predict the sub-tag relations.
Hierarchical Browsing
d. Sub-Tag Clustering
Hierarchical Browsing
d. Sub-Tag Clustering
Infor(t) = w_1 TFIDF(t) + w_2 ICS(t) + w_3 TE(t)
–Intra-Cluster Similarity (ICS): o_t denotes the centroid of all the URLs associated with the tag t.
–Tag Entropy (TE)
In our experiment, the weights w_1, w_2, and w_3 are 0.58, 0.27, and 0.13, respectively.
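Ranking sub-tags by the Infor score amounts to a weighted sum followed by a sort. In this sketch the weights are the ones reported above, while the three component scores per tag are hypothetical numbers, since the ICS and TE formulas are not reproduced in the slide text.

```python
WEIGHTS = (0.58, 0.27, 0.13)   # w_1, w_2, w_3 as reported on the slide


def infor(tfidf, ics, te, weights=WEIGHTS):
    """Infor(t) = w_1*TFIDF(t) + w_2*ICS(t) + w_3*TE(t)."""
    w1, w2, w3 = weights
    return w1 * tfidf + w2 * ics + w3 * te


def rank_sub_tags(features):
    """Sort tags by descending Infor score; features maps tag -> (TFIDF, ICS, TE)."""
    return sorted(features, key=lambda t: infor(*features[t]), reverse=True)


# Hypothetical component scores for three candidate sub-tags.
features = {
    "python": (0.6, 0.8, 0.7),
    "java": (0.9, 0.5, 0.4),
    "misc": (0.2, 0.1, 0.9),
}
ranking = rank_sub_tags(features)
print(ranking)
```

With these weights the TF-IDF term dominates, so a tag with a high tag entropy but low TF-IDF, such as "misc" here, still ranks last.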
Efficient Browsing
Observation: people use popular tags to annotate URLs, and the popular URLs are annotated by the majority of tags.
Efficient Browsing
So we can get good results efficiently by running our algorithm in a small tagging sub-space.
In our experiment, we sample the 2000 most frequently annotated URLs and the 2000 most frequently used tags, so the size of M is 2000 × 2000.
After a sequence of clicks, the user's intention becomes more specific, which decreases the number of related URLs and related tags.
When the number is less than 2000, all the related tags and URLs are included in the calculation.
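The sampling step can be sketched with a frequency count over the (User, URL, Tag, Time) quadruples. The function name sample_top and the toy annotations are hypothetical, and "frequency" is assumed to mean the number of annotations in which a URL or tag appears.

```python
from collections import Counter


def sample_top(annotations, k=2000):
    """Return the k most frequently annotated URLs and k most frequent tags.

    annotations: iterable of (user, url, tag, time) quadruples.
    If fewer than k distinct URLs or tags exist, all of them are returned.
    """
    annotations = list(annotations)
    url_freq = Counter(url for _, url, _, _ in annotations)
    tag_freq = Counter(tag for _, _, tag, _ in annotations)
    top_urls = [u for u, _ in url_freq.most_common(k)]
    top_tags = [t for t, _ in tag_freq.most_common(k)]
    return top_urls, top_tags


# Toy annotations (hypothetical data).
annotations = [
    ("a", "u1", "python", 1),
    ("b", "u1", "python", 2),
    ("c", "u1", "code", 3),
    ("a", "u2", "python", 4),
    ("b", "u2", "code", 5),
    ("a", "u3", "music", 6),
]
top_urls, top_tags = sample_top(annotations, k=2)
print(top_urls, top_tags)
```

Because most_common(k) returns every entry when fewer than k exist, the same function also covers the case where the related sets have shrunk below the sampling size.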
Enhanced Models
User's profile: the annotations and resources that interest a user can be found as follows:
R_i denotes the vector representation of a resource, and T_i denotes the vector representation of A_i.
Adjust the sampling and ranking algorithms according to the user's preference:
–Infor(t, U) = α × Infor(t) + β × UI(t | P(U))
Enhanced Models
Given the user-required time interval TI = [t_s, t_e], we define the match between a URL's time sequence TS and TI as follows:
θ = 0.5
Experiment results
The scale of the dataset:
–Del.icio.us (May 2006)
–1,736,268 web pages
–269,566 different annotations
Machine: Intel Pentium IV 3.0 GHz, 1 GB memory, 2 processors
The Java Lucene API is also used to build the URL and tag indexes.
Experiment results
Red tag: owned by the user
Orange tag: recommended
Experiment results
Conclusion
Our main contributions:
–The proposal of an effective algorithm, ELSABer, based on an analysis of the characteristics of social annotations.
–The proposal of enhanced models for personalized and time-related browsing.
Future work
–Conduct more user studies.
–Emphasize how to find more qualified URL resources.
–Utilize existing hierarchical structures such as ODP and WordNet to help construct more meaningful hierarchical structures for social annotations.
Thank you!!