1
Why We Tag and How We Tag:
Understanding User Tagging Behavior on a Chinese Social Sharing Site
Li He
2
What is Tagging?
Collaborative social tagging: assigning free-form descriptive terms, also known as tags, to resources.
Social aspect of tagging: publicly tagging one’s own content and browsing the annotated and categorized content of other people.
Personal aspect of tagging: managing one’s resources with one’s own vocabulary and categorization methods.
3
What is Tagging? Collaborative tagging offers –
An alternative to having an authority, such as a librarian, perform the categorizing and indexing, making users the ones who shape how the whole community perceives the content.
A new way to personalize the process of information organization and information exchange. Users have much more freedom to handle and make sense of their own information as well as that of others.
4
Tagging in China: Chinese websites have been following the collaborative tagging trend. Although the adoption rate of tagging among Chinese Internet users was reported to be as low as 2.3% at the beginning of 2007, more and more Chinese blog service providers, social bookmarking sites and social sharing sites similar to Flickr and YouTube have incorporated tagging, and it has now become almost a standard feature on social sites. Tagging can be expected to become a more common practice for Chinese Internet users.
5
Purpose of Study: This study analyzes the Chinese social site Douban.com, the most influential Chinese book, music and movie recommendation community. The research focuses on tagging activity on Douban and tries to find out:
Usage patterns and behaviors of annotation and tagging on Douban.
Douban users' motivations for tagging content in this system.
How Chinese taggers compare with Western taggers on the two aspects above.
6
Related Work 1: Golder & Huberman
Analyzed the structure of Del.icio.us and examined how it evolves over time with two data sets, URLs and a random sample of 229 users, collected in a 5-day timeframe.
Major findings:
The number of bookmarks a user has created and the number of tags they have used are only weakly related.
Users' tag lists gradually grow as they discover new interests and add new tags to categorize and describe them.
The relative proportions of tags within a given URL are stable.
Tags can be categorized into 7 possible kinds.
7
Related Work 2: Marlow et al.
Described a model of tagging systems that consists of three individual elements: resources, users and tags.
Provided a 7-dimension taxonomy of tagging systems for design.
Categorized user tagging incentives into two high-level practices and six potential motivations.
Findings from studying Flickr with the above frameworks:
The number of bookmarks a user has created and the number of tags they have used are only weakly related.
The interaction between user, tag, and usage varied a lot.
8
Other Related Works:
Ames & Naaman: interviewed Flickr users; offered another taxonomy of motivations for tagging along the two dimensions of sociality and function; suggest that social incentives appear to be particularly important in motivating users to tag.
Kipp: compared tagging on Del.icio.us and CiteULike to traditional cataloging; suggests that users employ a wide variety of conventions in constructing tags, which extend beyond the traditional objectives of subject access and express a dynamic relationship between document and user, and between subject and task.
Zollers: studied Amazon.com and Last.fm; identifies 3 emergent social motivations for tagging: expression, performance, and activism.
9
Douban.com: “Douban” literally means one individual pea in a peapod, but it is actually the name of a Hutong (a narrow alley, a traditional architectural form unique to Beijing) where the site founder lives. The main purpose of Douban, as stated on the site, is to: "… help you to find the people who share your interests in movie, music, and books, and discover more good stuff through them." [1]
[1] Translated from Chinese from:
10
Douban.com: A resource page on Douban
Douban's system aggregates the bibliographic information of books, music albums and movies from around the world [2] from various databases (Amazon.com, IMDb, etc.), and then presents each of them as a resource on Douban with a unique page. The content of such a page is created by both the system and the users. On one hand, the system displays the harvested bibliographic information and generates a recommendation list of 10 related resources. It also displays the users who have saved the resource and the top 6 interest groups they belong to. On the other hand, users collaboratively contribute the ratings, tags, and reviews of the resource.
[2] Douban recently began aggregating blogs as well, but this function is comparatively new and much less used, so it is not taken into account in this study.
11
Douban.com: A user’s (me) main page on Douban
Once registered, a user is allowed to save, tag, rate and comment on resources. In this way, s/he can build his/her own collection of books, movies and music, which is displayed on the user’s own pages. On a user’s main page, one can browse through all types of resources the user has saved, his/her contact list and the interest groups s/he belongs to, and read the reviews the user has written.
12
Douban.com: A user’s (me) book page on Douban
By clicking through to one of the user’s pages, e.g. the book page, one can see the books saved by the user, along with the tags s/he gave to each book and a complete tag list for the book collection. The music and movie pages follow the same layout.
13
Tagging on Douban.com
Douban has the following dimensions according to the taxonomy of tagging systems suggested by Marlow et al:
Tags on Douban are classified in the same way as the resources in the system, so there are movie tags, music tags and book tags. The tag "China", for example, is treated by the system as a different tag depending on whether it is assigned to a book or to a movie. If the user clicks on this tag among the book tags, the results will include only the books that have been assigned "China". This principle also applies to a user's tags, which means each user has three independent pools of tags. By selecting a tag from the book tag pool, the user or a visitor browsing the page can filter the book collection so that only the books with that tag are displayed. The user can manage his/her tag pools, including renaming or deleting tags, and can change (add or remove) the tags previously given to a saved resource. Tags are searchable, but only within the same kind of tags.
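The three independent tag pools described above can be pictured with a small data structure. This is only an illustrative sketch, not Douban's actual implementation: the class name `TagPools`, its methods, and the resource ids are all hypothetical.

```python
from collections import defaultdict

# Resource types named in the slides; everything else here is hypothetical.
RESOURCE_TYPES = ("book", "music", "movie")


class TagPools:
    """One independent tag pool per resource type."""

    def __init__(self):
        # pools[resource_type][tag] -> set of resource ids carrying that tag
        self.pools = {rtype: defaultdict(set) for rtype in RESOURCE_TYPES}

    def assign(self, rtype, resource_id, tag):
        """Attach a tag to a resource inside the pool for its type."""
        self.pools[rtype][tag].add(resource_id)

    def filter(self, rtype, tag):
        """Return resources of one type with the tag; other pools are ignored."""
        return sorted(self.pools[rtype].get(tag, set()))

    def rename(self, rtype, old, new):
        """Rename a tag within a single pool, merging if `new` already exists."""
        pool = self.pools[rtype]
        if old in pool:
            moved = pool.pop(old)
            pool[new] |= moved


pools = TagPools()
pools.assign("book", "book-1", "China")
pools.assign("movie", "movie-7", "China")
```

Because "China" lives in two separate pools, `pools.filter("book", "China")` returns only the book, which mirrors the behavior described for clicking a tag among the book tags.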
14
Tagging on Douban.com Suggestive tagging interface:
Tagging on Douban takes place in this fashion: when the user saves a resource, e.g. a book, an in-page interface pops up for the user to rate the book and assign tags. The user's 50 most frequently used book tags and the 10 tags most frequently assigned to this book are shown as suggestions.
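The suggestion lists described above amount to two "most frequent" rankings. A minimal sketch follows, assuming each input is simply a list with one entry per tag use; the function name `suggest_tags` and the sample tags are hypothetical, and Douban's real ranking logic is not documented here.

```python
from collections import Counter


def suggest_tags(user_tag_uses, resource_tag_uses, user_n=50, resource_n=10):
    """Return (user's most used tags, resource's most used tags).

    Both inputs are lists with one entry per tag use. The defaults of 50
    and 10 follow the sizes mentioned in the slide.
    """
    user_top = [tag for tag, _ in Counter(user_tag_uses).most_common(user_n)]
    resource_top = [
        tag for tag, _ in Counter(resource_tag_uses).most_common(resource_n)
    ]
    return user_top, resource_top


# Hypothetical sample data.
my_book_tags = ["novel", "novel", "novel", "history", "history", "poetry"]
this_book_tags = ["Murakami", "Murakami", "Japan"]
mine, others = suggest_tags(my_book_tags, this_book_tags, user_n=2, resource_n=1)
```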
15
Tagging on Douban.com
Douban manifests the social and personal characteristics of collaborative tagging systems found on similar English sites such as Del.icio.us and Flickr. The design decisions of the system imply that tags and tagging have three meanings on Douban:
They can be used for Personal Information Management as users interact with their tag pools to categorize and retrieve their own collections.
They help support interest discovery and sharing by linking users to one another through tags assigned to resources.
They provide additional access points for searching and navigation, since the system’s default search for a resource is not full-text based and will only match the search terms against the bibliographic information harvested and indexed by the system.
16
Methodology: 1. Data Collection
Three sets of Douban data were collected over a 30-day timeframe, from October 13th to November 12th.
Data Set 1: 7 Douban users. For each user, the following data were collected on a daily basis: the total number of tags in the tag pools; the total number of distinct tags (tags that have been used only once); and the total number of saved resources. On the last day of the data collection period, all of the 7 users' tags were collected for a content analysis.
The 7 users were selected using an accidental sampling technique. The selection criteria were that the user should have more than 500 resources in his/her whole collection and that the total number of tags across the three pools should exceed 200. This sampling is clearly biased, but it provides a basis for further discussion of possible behavior patterns that may be common beyond the current participants.
17
Methodology: 1. Data Collection
Data Set 2: Tags. The 5 most frequently used tags of each resource type were collected, and their total usage counts were recorded on a daily basis. The total number of tags across all three tag types, defined as the total number of tags [1] in the system, and the total number of Douban users were recorded at the same frequency. A one-time collection of the number of tags falling into different usage ranges (e.g. used 1 time, used 1,000 – 4,999 times) was conducted for movie, book and music tags.
[1] The actual total number of tags tends to be smaller than the figure reported here because the same tag can appear in more than one tag type. For example, the tag "America" can be found among both music tags and movie tags.
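Counting tags per usage range is a simple binning step. The sketch below assumes a dictionary of usage counts per tag; the range boundaries in `RANGE_STARTS` are hypothetical, since the slides name only "used 1 time" and "used 1,000 – 4,999 times" as examples.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical lower bounds of the usage ranges (assumption, not from the study).
RANGE_STARTS = [1, 2, 10, 100, 1000, 5000]


def range_label(index):
    """Human-readable label for the range starting at RANGE_STARTS[index]."""
    low = RANGE_STARTS[index]
    if index + 1 == len(RANGE_STARTS):
        return f"{low}+"
    high = RANGE_STARTS[index + 1] - 1
    return str(low) if low == high else f"{low}-{high}"


def usage_histogram(tag_usage):
    """Map each usage range label to the number of tags falling in it."""
    histogram = Counter()
    for uses in tag_usage.values():
        if uses < 1:
            continue  # a tag that was never used belongs to no range
        index = bisect_right(RANGE_STARTS, uses) - 1
        histogram[range_label(index)] += 1
    return dict(histogram)


# Hypothetical usage counts for five tags.
sample = {"a": 1, "b": 1, "c": 3, "d": 1500, "e": 7000}
result = usage_histogram(sample)
```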
18
Methodology: 1. Data Collection
Data Set 3: Resources.
1. The following data were collected on a daily basis for 5 books, 5 music albums and 5 movies: the total number of tags assigned to the resource; the total number of times each of the 8 most popular tags displayed on the resource's page was assigned to the resource; and the total number of users who have saved the resource.
2. The resources saved by the 7 users during the data collection period, and the 8 most popular tags assigned to them.
For each resource type, 4 of the 5 items were the top 4 cumulatively popular items, under the assumption that a popular resource would be highly tagged; the remaining 1 was the currently most popular item, on the expectation that it would show a more obvious evolving pattern.
19
Methodology: 2. Content Analysis
Coding schema: The content analysis, which aims to examine the term usage patterns of tags, was conducted using a coding schema. The schema was primarily developed from Marlow's and Golder's categorizations of tag functions and the Dublin Core Metadata Element Set. It was refined after an initial pass over 60 resources (20 books, 20 music albums and 20 movies) and an examination of the 50 most popular tags for each resource type. The table shows the coding schema, which has two levels. The first level includes Descriptor, Keyword and Personal. Tags coded as Descriptor are assumed to be nouns that describe the hard facts of the resource. Keyword tags presumably reflect the various relevant facets of the resource. Personal tags, on the other hand, express the relationship the tagger has built with the resource; they can be highly subjective and may even be regarded as irrelevant to the essence of the resource. With this schema, the 8 most frequently used tags of the collected resources and all the tags of the 7 users were coded and analyzed.
20
Methodology: 3. Interview
The invitation for an interview was first sent to the 7 users who had been observed for one month. Only 3 were willing to accept the invitation. The invitation was then sent to another group of users who met the original selection criteria. In the end, a total of 8 users (all mainland Chinese) were interviewed independently using instant messengers. The interview sessions lasted from minutes. All interview questions were open-ended. Interviewees were asked about their tagging habits and their opinions about tags, tagging and the tagging system. The conversations were transcribed and then translated into English for further analysis. It should be noted that the sampling of users is not scientific, so the outcome of these interviews should be considered only as opinions and anecdotes rather than representative views. Nevertheless, it could be useful for those interested in the phenomenon.
21
Results & Discussions: 1. General Tag Usage
As might be expected, the quantity of tags on Douban grows steadily as the numbers of users and system resources grow. Many factors may contribute to the increase in tags. A user may continue to add novel tags to describe the resources s/he has just found and saved, and a new resource in the system attracts users to describe it. But how is this large number of tags actually used? The table on the right shows the number of movie tags in different usage frequency ranges. A similar distribution pattern is found for music and book tags. Over half of the tags are used only once, whereas on average 64 tags for each kind of resource are used much more heavily than the others.
22
Results & Discussions: 1. General Tag Usage
The left graph shows the growth of the 15 most popular tags of the three resource types during one month. The average daily growth rate is 0.05%. The more popular a tag is, the more likely it is to be used, which is consistent with the frequency distribution discussed above. But is this simply the snowball effect of popularity, or do these tags have better descriptive power or provide more helpful categorizations? A coding of the tags was therefore conducted. As the table on the right shows, Keyword tags appear more often, but the mean rank of Category tags (9.125) and Subject tags (9) is lower than that of Region tags (5), a kind of Descriptor tag. All the Region tags are movie tags except “Taiwan”, which is a music tag. The popular tags all have low specificity. This is understandable because such tags are easier to think of when people tag.
V=movie tag; B=book tag; M=music tag
23
Results & Discussions: 2. Tag Usage for Resources
Does a resource continue to receive novel tags as more people save it? Will the number of tags for a resource grow without end? The data presented here suggest a positive relationship between the number of people who have saved a resource and its number of tags. Since the length of time a resource has been available in the system influences tag growth, the items chosen for each resource type range in age from about 2 months and 2 years to around 10 years. Take three of the books as an example: B1 (“The Da Vinci Code”, translated version) has been published for about 2 years; B3 (“Norwegian Wood”, translated version) was published 9 years ago; whereas B4 is a recently published bestseller. It can be seen that the tags for B4 have a stronger growth tendency, but people still come up with new tags to describe B3 despite the already large vocabulary it has acquired. The same pattern can be found for movie tags. For example, V3 (“The Shawshank Redemption”) has an average of almost 4 new tags added every day to its pool of over 2,000 tags. Note that the nature of the resource also affects the total number of tags and tag growth. As the flatness and closeness of M3, M4 and M5 suggest, music seems to be more difficult for people to tag. In later interviews, several users confirmed this assumption. “An album may have several subjects, and I don’t know how to categorize it if I’m not familiar with the music genre”, said a respondent. The graph shows the growth of the total number of tags of 9 resources. Each point on the graph shows the total number of tags (Y-axis) against the number of people who have saved the resource (X-axis), sampled daily. In general, the quantity continues to increase at a steady rate.
24
Results & Discussions: 2. Tag Usage for Resources
Golder and Huberman found the same stability in their study and attributed it to user imitation and shared knowledge. These two explanations can be applied to the Douban environment. Douban’s tagging interface, as introduced above, shows users the tags most commonly used by others to tag the resource. Users are thus tempted to follow the main trend. Moreover, as the later content analysis of tags suggests, the most popular tags generally reflect the basic levels of the resource and hence easily gain more agreement.
*note*: The proportion is calculated as the number of times a tag is used divided by the number of times the 8 tags are used.
The left graph shows the change in proportion of each of the 8 most popular tags for one resource. Each tag’s use frequency is clearly a nearly fixed proportion. The most popular tag tends to grow a little more, but the less used ones remain quite stable. This conclusion is confirmed again when we look at the change in proportion of the most popular tag of each of the 15 resources. The lines are all fairly flat.
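The proportion defined in the note can be written out directly. The sketch below follows that definition (uses of one tag divided by total uses of the tags shown); the helper `max_drift` for comparing two days, and the sample counts, are hypothetical additions for illustration.

```python
def tag_proportions(top_tag_counts):
    """Share of each tag among the total uses of the displayed top tags."""
    total = sum(top_tag_counts.values())
    if total == 0:
        return {tag: 0.0 for tag in top_tag_counts}
    return {tag: count / total for tag, count in top_tag_counts.items()}


def max_drift(day_a, day_b):
    """Largest absolute change in any tag's proportion between two days."""
    before, after = tag_proportions(day_a), tag_proportions(day_b)
    tags = set(before) | set(after)
    return max(
        (abs(before.get(t, 0.0) - after.get(t, 0.0)) for t in tags),
        default=0.0,
    )


# Hypothetical counts: usage doubles, but the relative shares stay fixed.
day_1 = {"novel": 50, "Japan": 30, "love": 20}
day_30 = {"novel": 100, "Japan": 60, "love": 40}
shares = tag_proportions(day_1)
drift = max_drift(day_1, day_30)
```

A drift near zero corresponds to the flat lines described for the graphs, even while the raw counts keep growing.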
25
Results & Discussions: 2. Tag Usage for Resources
Besides the fact that creator/contributor names may be among the first things that come to a tagger’s mind, the large share of Name tags can also be explained by the greater variation of names, especially non-Chinese names. During the coding, it was found that non-Chinese resources usually have more than one Name tag among the top 8 that actually refers to the same person (such as “Sylvia_Plath” and “SylviaPlath”), and the different translations of a name also add to the variation. As to Chinese names, what is interesting is that the nickname of a person, especially a celebrity, very often co-exists with the real name, which suggests that people are showing affection for the person.
*note*: The percentage is calculated by weight, that is, the number of times used.
The graph shows the coding results of the 8 most popular tags for each of the 210 resources that were saved by the 7 users during the data collection period: 37 books, 52 music albums and 121 movies. Some obvious patterns include:
The most popular tags for a resource generally include the title of the resource, its creator/contributor, and where it was produced, which is quite similar to the index terms a professional indexer would use.
Name tags and Category tags are usually the most frequently used tags; Subject tags are much more heavily used for movies and books.
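The format variants mentioned above (“Sylvia_Plath” versus “SylviaPlath”) can be detected with a simple normalization. This is a rough sketch only: the functions `normalize_name` and `group_variants` are hypothetical, they are not part of the study's coding procedure, and they do not handle different translations or nicknames of the same person.

```python
import re
from collections import defaultdict


def normalize_name(tag):
    """Drop spaces, underscores, hyphens and dots, and ignore letter case."""
    return re.sub(r"[\s_\-.]+", "", tag).casefold()


def group_variants(tags):
    """Group tags that normalize to the same key; keep only real duplicates."""
    groups = defaultdict(list)
    for tag in tags:
        groups[normalize_name(tag)].append(tag)
    return {key: names for key, names in groups.items() if len(names) > 1}


variants = group_variants(["Sylvia_Plath", "SylviaPlath", "Ted_Hughes"])
```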
26
Results & Discussions: 2. Tag Usage for Resources
As the above comparison suggests, Descriptor tags with general meanings are likely to be used by many taggers. Assigning Name tags, Region tags or Title tags requires less mental effort because they are very evident aspects of a resource. The prevalence of Category tags, on the other hand, seems to imply that categorization is an important motivation for tagging. However, during the coding, it was again found that these popular Category tags do not have high specificity. Another interesting observation is that the tag “movie” is among the top 8 for almost all the movie resources examined, even though movie is already one of the system's default classifications for all resources and for a user’s collection. Can we thus assume that this is due to users’ limited understanding of tagging? The graph shows the proportion of each kind of tag for movie resources of different popularity. Opinion tags and Self Reference tags gradually recede as more people save the resource. The proportions of Name tags and Category tags remain almost the same, while Title tags exhibit an apparent increase.
27
Results & Discussions: 3.1 User Activities
As previous studies on English users’ tagging behavior have found, people vary greatly in their tagging practices. The same phenomenon is found among Douban users. Though the writer didn’t have a big enough user sample, the 7 observed users’ resource and tag collection sizes show quite a degree of variance, which is consistent with Golder and Huberman’s Del.icio.us study. For example, User5 has 1810 resources but only 250 tags, while User6 has 625 resources along with 634 tags. The percentage of distinct tags in a user’s tag pool is not consistent either. Take User4 as an example: s/he has 540 tags and 2904 resources in total, and 69% of the tags have been used more than once. On the other hand, 62% of User7’s 801 tags are never used twice across a collection of 2004 resources. In general, the tag pool size of the 7 observed users grows over time as they add more resources. But their resource and tag collection sizes, and the relationship between the two, show quite a degree of variance. Some users have a large resource collection but a small tag pool, while others build their tag pools faster than they save resources.
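The variance described above can be made concrete with two small ratios. The sketch below uses the counts reported for User5 and User6; the function names and the sample tag pool passed to `used_once_share` are hypothetical.

```python
def tags_per_resource(n_tags, n_resources):
    """Ratio of tag pool size to collection size."""
    return n_tags / n_resources


def used_once_share(tag_uses):
    """Fraction of tags in a pool that have been used exactly once.

    `tag_uses` maps each tag to its number of uses.
    """
    if not tag_uses:
        return 0.0
    once = sum(1 for uses in tag_uses.values() if uses == 1)
    return once / len(tag_uses)


# Counts reported in the slide for two of the observed users.
user5_ratio = tags_per_resource(250, 1810)
user6_ratio = tags_per_resource(634, 625)

# Hypothetical tag pool: two of its four tags were used only once.
sample_share = used_once_share({"a": 1, "b": 1, "c": 4, "d": 2})
```

User5 ends up with roughly one tag for every seven resources, while User6 has slightly more tags than resources, which is the contrast the paragraph points out.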
28
Results & Discussions: 3.1 User Activities
The graph shows the coding results for all the tags of the 7 users. The left part shows the percentage of each kind of tag by weight (times of use); the right part shows the percentage by the number of tags of that kind. The composition of a user’s tag pool is quite similar to that of the most popular tags for a resource. Again, Name tags take up the largest part and are frequently used. The two kinds of Keyword tags, Category and Subject tags, are also very often assigned to resources, followed by Region tags. Opinion tags and Self Reference tags are used much less by comparison. Two users don’t have any Personal tags at all.
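The difference between counting by weight and counting by quantity can be sketched as follows. The input format (a list of `(tag, kind, times_used)` triples), the function name `kind_shares`, and the sample tags are assumptions made for illustration; only the two definitions come from the slide.

```python
from collections import defaultdict


def kind_shares(coded_tags):
    """Return (percent by quantity, percent by weight) for each kind of tag.

    `coded_tags` is a list of (tag, kind, times_used) triples.
    Quantity counts each tag once; weight counts every use of the tag.
    """
    by_quantity = defaultdict(int)
    by_weight = defaultdict(int)
    for _tag, kind, uses in coded_tags:
        by_quantity[kind] += 1
        by_weight[kind] += uses
    n_tags = sum(by_quantity.values())
    n_uses = sum(by_weight.values())
    quantity_pct = {k: 100 * v / n_tags for k, v in by_quantity.items()} if n_tags else {}
    weight_pct = {k: 100 * v / n_uses for k, v in by_weight.items()} if n_uses else {}
    return quantity_pct, weight_pct


# Hypothetical pool: two rarely used Name tags and one heavily used Category tag.
coded = [
    ("Haruki_Murakami", "Name", 2),
    ("HarukiMurakami", "Name", 1),
    ("novel", "Category", 9),
]
quantity_pct, weight_pct = kind_shares(coded)
```

In this made-up pool, Name tags are two thirds of the tags by quantity but only a quarter by weight, which shows how the two views of the same pool can diverge.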
29
Results & Discussions: 3.1 User Activities
The percentages of Keyword tags in the tag pool generally increase when calculated by weight, while that of Name tags decreases. This is especially obvious for User3. Besides the name variations discussed above, it is possible that Name tags serve more as additional access points for others, while Keyword tags, especially Category tags, are used for retrieval.
30
Results & Discussions: 3.2 User Opinions
Tagging Motivations:
1. Personal Information Management. All respondents stated that their primary purpose in tagging is to better manage their collection for future retrieval.
2. Interest Discovery. Only 1 respondent mentioned this motivation; the others agreed in a follow-up question that it could be one of the motivations, but they did not consciously tag for this purpose.
Surprisingly, other motivations suggested by the English-language literature, such as Attract Attention, Self Presentation, Performance and Communication, were not recognized by the respondents. 3 respondents said that using tagging for self-expression is “silly” and “meaningless”. “You could write a blogpost or attach a review instead”, commented a respondent. Another respondent said such highly Personal tags would “create confusion” and are “rubbish tags” that hinder retrieval. However, when the writer deliberately searched for some “weird” tags and traced the taggers from these tags, a number of unconventional taggers did emerge. Another interesting finding is that people either have no personalized tags at all, or have tags that are all personalized and would make little sense to others.
31
Results & Discussions: 3.2 User Opinions
Tagging Habits: 1. Random or systematic?
Only 1 respondent replied that he always tags randomly. The others stated that they have a clear idea of what to tag and how to make use of tags. A summary of their replies suggests:
They become much more systematic after tagging for a period of time, and follow certain rules when they tag, such as using a consistent format for non-Chinese names, controlling the number of Category tags, and assigning the same group of tags (Name, Region, Category, etc.) according to the resource type (Movie, Book or Music).
They try to be as exact as possible when they tag a resource, whereas comprehensiveness is less often pursued because they are more concerned with tagging the facets of the resource that interest them.
They show a willingness to contribute tags to resources they are interested in, so as to create access points for others.
32
Results & Discussions: 3.2 User Opinions
Tagging Habits: 2. Tag preference
The respondents generally consider good tags to be those that help with faster retrieval and with finding their special interests, such as non-mainstream music/movie genres or less popular persons. Name tags and Category tags are highly preferred. While most respondents expressed much less interest in Personal tags, 2 people stated that some tags are too general to help with retrieval and that they intended to use more personalized tags in the future, such as more specific categorizations. “Tagging after all, is a personal way of organizing one’s information. If you just do as the others, it would lose the flavor of this fun stuff”, said one respondent. “I found some of the tags are over-used, and they become meaningless as they umbrella more and more resources. So I want to use more unique tags in the future”, said another respondent.
33
Results & Discussions: 3.2 User Opinions
Tagging Habits: 3. Interactions with the tagging system
All respondents said they use the tags suggested by the system if they find the tags appropriate. Respondents seldom or never specifically search for a tag. The preference for a tag cloud or a tag list for tag display varied by person. Those who like the tag cloud said it looks fun and gives a direct impression of the person’s interests, while those who prefer the tag list think it is clearer and better ordered.
34
Results & Discussions: 3.2 User Opinions
Tagging Habits: 4. Tag management
Several respondents said that they were clueless when they first learned about tagging, having no idea what tagging could do for them, and that they just followed others. As they gained more experience, they developed more systematic tagging habits. They would then notice a certain number of repetitive (functionally or semantically) tags and tags that they have little use for. But they find it “difficult”, “tiring” and “very time-consuming” to edit them, and would do so only when they are really idle. The main tag management activities include:
Fixing factual errors, such as correcting spelling mistakes and inconsistent name formats.
Combining or breaking down Category tags.
Refining Subject tags, such as reducing synonyms.
The difficulties users experience in managing their tags should be taken into account by system designers. If users have more efficient and satisfactory interaction with their tags, it may improve the quality of social tagging and thus ultimately benefit the whole community, given the social aspect of tagging.
35
Results & Discussions: 3.3 Possible Types of Tagger
Imitating Tagger: these taggers have recently discovered tagging and are not yet sure about its use, so they simply follow others. It seems that most people go through this stage and will probably evolve into one of the next two types of taggers, or abandon tagging.
Serious Tagger: these taggers act more like professional catalogers or indexers when they tag. They rationally create and make use of tags for information organization and retrieval by establishing certain rules (either consciously or unconsciously) and by taking a long-range approach (such as better defining their Category tags).
Playful Tagger: these taggers think of tagging as a random and fun thing to do rather than a formal way to categorize one’s information, and are less concerned with tag quality.
36
Conclusion: 1. Similarities with Western Taggers
In general, Chinese users exhibit usage patterns of social tagging systems similar to those of Western users. The same regularities are found in user activity, tag frequencies, the kinds of tags used, and the stability of the relative proportions of tags for a given tagged resource. Most people use tagging for Personal Information Management, but tags also help them find things they are interested in.
37
Conclusion: 2. Differences with Western Taggers
Comparatively, when Chinese people tag, they tend to be more “conservative” and conventional in their preference of terms and kinds of tags, and they consider self-expression in tagging less important. This may be attributed to cultural differences and to Internet censorship in China:
Individualism is not highly esteemed in Chinese culture, and personal opinions and self-expression are less appreciated. With a collectivistic mindset, people are more concerned about the effects of unconventional action and self-presentation. Being unique to draw attention is more likely to be seen as “showing off” than as “being cool”.
Under the government’s Internet censorship and surveillance system, the Great Firewall of China, people tend to be rather cautious in their online activities.
Taggers who create highly personalized tags do exist, but they are not the majority.
38
Future Work: How can the system better support different types of taggers?
System support for tagging could become intrusive, because too much machine interference may undermine the individuality and “democratic” nature of tagging. The environment in which tagging takes place would be an important factor in the design of system support.
What can catalogers or indexers learn from social tagging practices for the future organization of information services? “Traditional taggers” (librarians, indexers, etc.) may benefit from looking at the social tagging practices of their patrons. It has been recognized that the distance between the standard language and popular public language, or the semantic gap, impedes resource discovery and hence the degree to which audiences engage with the information repository. Social tagging and folksonomy could therefore help institutions gain a better understanding of users’ perceptions of the information domain, by offering them an opportunity to connect with individuals. Of course, it would make little sense to observe the highly divergent users of Del.icio.us, but enterprises or organizations with specific user groups may find this approach worth taking.
39
Reference
Ames, M., and Naaman, M. (2007). Why We Tag: Motivations for Annotation in Mobile and Online Media. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, San Jose: ACM Press.
Golder, S., and Huberman, B. (2006). Usage patterns of collaborative tagging systems. Journal of Information Science, 32(2).
Hammond, T., Hannay, T., Lund, B., and Scott, J. (2005). Social Bookmarking Tools – A General Overview. D-Lib Magazine 11, 4 (April 2005).
Kipp, Margaret E. I., and Campbell, D. Grant (2006a). Patterns and Inconsistencies in Collaborative Tagging Systems: An Examination of Tagging Practices. In Proceedings Annual General Meeting of the ASIST, Austin, Texas (US).
Kipp, M. (2006). Complementary or Discrete Contexts in Online Indexing: A Comparison of User, Creator, and Intermediary Keywords. Canadian Journal of Information and Library Science.
Kipp, Margaret E. I. and Cool: Tagging for Time, Task and Emotion. In: Proc. Information Architecture Summit Las Vegas.
Marlow, C., Naaman, M., Boyd, D., and Davis, M. (2006). HT06, Tagging Paper, Taxonomy, Flickr, Academic Article, ToRead. Proceedings of Hypertext 2006, New York: ACM Press.
Voß, J. (2007). Tagging, Folksonomy & Co - Renaissance of Manual Indexing? In: Osswald, A.; Stempfhuber, M.; Wolff, C. (Eds.): Open Innovation. Proc. 10th International Symposium for Information Science. Constance: UVK,
Zollers, A. (2007). Emerging Motivations for Tagging: Expression, Performance, and Activism. 16th International World Wide Web Conference (WWW2007), Retrieved on October 7, 2007, from