Some Thoughts on Tagging Marti Hearst UC Berkeley.


1 Some Thoughts on Tagging Marti Hearst UC Berkeley

2 Marti Hearst, MIT HCI ‘07 Outline  What are Tags?  Organizing Tags for Navigation  Facets and faceted navigation  How to (semi)automatically create facet hierarchies  What’s up with Tag Clouds?

3 Marti Hearst, MIT HCI ‘07 Social Tagging  Metadata assignment without all the bother  Spontaneous, easy, and tends towards single terms  Usually used in the context of social media

4 Marti Hearst, MIT HCI ‘07 Example from del.icio.us

5 Marti Hearst, MIT HCI ‘07 The Tagging Opportunity  At last! Content-oriented metadata in the large!  Attempts at metadata standardization always end up with something like the Dublin Core  author, date, publisher, … yaaawwwwnnn.  I’ve always thought the action was in the subject metadata, and have focused on how to navigate collections given such data.

6 Marti Hearst, MIT HCI ‘07 The Tagging Opportunity  Tags are inherently faceted!  It is assumed that multiple labels will be assigned to each item  Rather than placing them into a folder  Rather than placing them into a hierarchy  Concepts are assigned from many different content categories  Helps alleviate the metadata wars:  Allows for both splitters and lumpers  Is this a bird or a robin?  Doesn’t matter, you can do both!  Allows for differing organizational views  Does NASCAR go under sports or entertainment?  Doesn’t matter, you can do both!

7 Marti Hearst, MIT HCI ‘07 Tagging Problems  Tags aren’t organized  Thorough coverage isn’t controlled for  The haphazard assignments lead to problems with  Synonymy  Homonymy  See how this author attempts to compensate:

8 Marti Hearst, MIT HCI ‘07 Tagging Problems / Opportunities  Some tags are fleeting in meaning or too personal  toread, todo  Tags are not “professional”  (I personally don’t think this matters)  Great example from Trant:  “Anecdotal evidence also shows that ‘professional’ cataloguers find the basic description of visual elements surprisingly difficult: a curator exhibited significant discomfort during this description task. When asked what was wrong, he blurted out ‘everything I know isn’t in the picture’.” “Investigating social tagging and folksonomy in the art museum with steve.museum”, J. Trant, B. Wyman, WWW 2006 Collaborative Tagging Workshop

9 Marti Hearst, MIT HCI ‘07 “Investigating social tagging and folksonomy in the art museum with steve.museum”, J. Trant, B. Wyman, WWW 2006 Collaborative Tagging Workshop

10 Marti Hearst, MIT HCI ‘07 What about Browsing?  I think tags need some organization  Currently most tags are used as a direct index into items  Click on tag, see items assigned to it, end of story  Co-occurring tags are not shown  Grouping into small hierarchies is not usually done  del.icio.us now has bundles, but navigation isn’t good  IBM’s dogear and RawSugar come the closest  I think the solution is to organize tags into faceted hierarchies and do browsing in the standard way
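
A minimal sketch of what "showing co-occurring tags" could mean in practice. The bookmark data and the function names are hypothetical, made up for illustration; none of the sites mentioned on the slide are known to work this way.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bookmarks: each item carries a set of tags.
ITEMS = {
    "url1": {"python", "tutorial", "programming"},
    "url2": {"python", "programming"},
    "url3": {"recipes", "thai"},
    "url4": {"python", "tutorial"},
}

def cooccurrence_counts(items):
    """Count how often each unordered pair of tags appears on the same item."""
    counts = Counter()
    for tags in items.values():
        for a, b in combinations(sorted(tags), 2):
            counts[(a, b)] += 1
    return counts

def related_tags(tag, items, top_n=3):
    """Return (tag, count) pairs that co-occur with `tag`, most frequent first."""
    related = Counter()
    for (a, b), n in cooccurrence_counts(items).items():
        if a == tag:
            related[b] += n
        elif b == tag:
            related[a] += n
    # Sort by count (descending), then by name so the order is deterministic.
    return sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

print(related_tags("python", ITEMS))
```

A browsing interface could show this list next to the items for a clicked tag, so the click is no longer "end of story".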

11 Faceted Navigation and Flamenco

12 Marti Hearst, MIT HCI ‘07 The Problem With Hierarchy  Most things can be classified in more than one way.  Most organizational systems do not handle this well.  Example: Animal Classification otter penguin robin salmon wolf cobra bat Skin Covering Locomotion Diet robin bat wolf penguin otter, seal salmon robin bat salmon wolf cobra otter penguin seal robin penguin salmon cobra bat otter wolf

13 Marti Hearst, MIT HCI ‘07 The Problem with Hierarchy  Inflexible  Force the user to start with a particular category  What if I don’t know the animal’s diet, but the interface makes me start with that category?  Wasteful  Have to repeat combinations of categories  Makes for extra clicking and extra coding  Difficult to modify  To add a new category type, must duplicate it everywhere or change things everywhere

14 Marti Hearst, MIT HCI ‘07 The Problem With Hierarchy [Diagram: a single hierarchy rooted at “start” that branches first by Covering (feathers, fur, scales), then by Locomotion (swim, fly, run, slither), then by Diet (fish, rodents, insects), with the lower levels repeated under every upper-level label. Items shown: salmon, bat, robin, wolf, otter.]
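
The "wasteful" point can be made concrete with a little arithmetic. The sketch below uses the three category types named on the slide and compares the number of nodes in one fixed hierarchy (every lower level repeated under every upper label) with the number of labels needed when the types are kept as independent facets. The function names are illustrative only.

```python
# Category types and labels as named on the slide.
FACETS = {
    "covering": ["feathers", "fur", "scales"],
    "locomotion": ["swim", "fly", "run", "slither"],
    "diet": ["fish", "rodents", "insects"],
}

def hierarchy_nodes(facets):
    """Nodes in one fixed hierarchy that nests the facets in the given order,
    repeating each lower level under every label of the level above."""
    total, width = 0, 1
    for labels in facets.values():
        width *= len(labels)   # labels are duplicated under each parent
        total += width
    return total

def facet_labels(facets):
    """Labels needed when each facet is kept as its own independent list."""
    return sum(len(labels) for labels in facets.values())

print(hierarchy_nodes(FACETS))  # 3 + 3*4 + 3*4*3 = 51
print(facet_labels(FACETS))     # 3 + 4 + 3 = 10
```

Adding a fourth category type multiplies the hierarchy's bottom level again, while the faceted version only grows by the number of new labels.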

15 Marti Hearst, MIT HCI ‘07 The Idea of Facets  Facets are a way of labeling data  A kind of Metadata (data about data)  Can be thought of as properties of items  Facets vs. Categories  Items are placed INTO a category system  Multiple facet labels are ASSIGNED TO items

16 Marti Hearst, MIT HCI ‘07 The Idea of Facets  Create INDEPENDENT categories (facets)  Each facet has labels (sometimes arranged in a hierarchy)  Assign labels from the facets to every item  Example: recipe collection Course Main Course Cooking Method Stir-fry Cuisine Thai Ingredient Bell Pepper Curry Chicken

17 Marti Hearst, MIT HCI ‘07 The Idea of Facets  Break out all the important concepts into their own facets  Sometimes the facets are hierarchical  Assign labels to items from any level of the hierarchy Preparation Method Fry Saute Boil Bake Broil Freeze Desserts Cakes Cookies Dairy Ice Cream Sorbet Flan Fruits Cherries Berries Blueberries Strawberries Bananas Pineapple

18 Marti Hearst, MIT HCI ‘07 Using Facets  Now there are multiple ways to get to each item Preparation Method Fry Saute Boil Bake Broil Freeze Desserts Cakes Cookies Dairy Ice Cream Sherbet Flan Fruits Cherries Berries Blueberries Strawberries Bananas Pineapple Fruit > Pineapple Dessert > Cake Preparation > Bake Dessert > Dairy > Sherbet Fruit > Berries > Strawberries Preparation > Freeze
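
A small sketch of "multiple ways to get to each item". The facet paths are the ones listed on the slide; the two item names and the helper functions are hypothetical, added only for illustration.

```python
# Each item is assigned several facet paths (root label first).
ITEMS = {
    "pineapple upside-down cake": [        # hypothetical item name
        ("Fruit", "Pineapple"),
        ("Dessert", "Cake"),
        ("Preparation", "Bake"),
    ],
    "strawberry sherbet": [                # hypothetical item name
        ("Dessert", "Dairy", "Sherbet"),
        ("Fruit", "Berries", "Strawberries"),
        ("Preparation", "Freeze"),
    ],
}

def matches(label_paths, query):
    """True if any of the item's facet paths starts with the query path."""
    return any(path[:len(query)] == query for path in label_paths)

def browse(items, *queries):
    """Items whose labels satisfy every selected facet path (a conjunction)."""
    return sorted(
        name for name, paths in items.items()
        if all(matches(paths, q) for q in queries)
    )

print(browse(ITEMS, ("Fruit",)))                                # both items
print(browse(ITEMS, ("Dessert", "Dairy")))                      # the sherbet
print(browse(ITEMS, ("Fruit", "Berries"), ("Preparation", "Freeze")))
```

Because a query matches any prefix of a path, an item labeled at a deep level is also reachable from every ancestor label, and the user can start from whichever facet they happen to know.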

19 The Flamenco Interface Fine Arts Museum Example

31 Advantages of the Approach  Systematically integrates search results:  reflect the structure of the info architecture  retain the context of previous interactions  Gives users control and flexibility  Over order of metadata use  Over when to navigate vs. when to search  Allows integration with advanced methods  Collaborative filtering, predicting users’ preferences

32 Marti Hearst, MIT HCI ‘07 Advantages of Facets  Can’t end up with empty result sets  (except with keyword search)  Helps avoid feelings of being lost.  Easier to explore the collection.  Helps users infer what kinds of things are in the collection.  Evokes a feeling of “browsing the shelves”  Is preferred over standard search for collection browsing in usability studies.  (Interface must be designed properly)
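
One way an interface can guarantee non-empty results is to offer only those labels that still match at least one item under the current selection, together with a count. The slides do not describe the implementation, so the sketch below is an assumption about the general mechanism; the recipe data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical recipe collection: one label per facet for each item.
ITEMS = {
    "green curry": {"Cuisine": "Thai", "Course": "Main Course", "Method": "Stir-fry"},
    "pad thai": {"Cuisine": "Thai", "Course": "Main Course", "Method": "Stir-fry"},
    "flan": {"Cuisine": "Mexican", "Course": "Dessert", "Method": "Bake"},
}

def select(items, selection):
    """Items whose facet labels match every (facet, label) in the selection."""
    return {
        name: facets for name, facets in items.items()
        if all(facets.get(f) == label for f, label in selection.items())
    }

def previews(items, selection):
    """For each facet not yet selected, count the remaining items per label.
    Labels with no remaining items never appear, so any offered click
    leads to at least one result."""
    counts = {}
    for facets in select(items, selection).values():
        for facet, label in facets.items():
            if facet not in selection:
                counts.setdefault(facet, Counter())[label] += 1
    return counts

print(previews(ITEMS, {"Cuisine": "Thai"}))
```

After choosing Cuisine = Thai, "Dessert" is simply not offered under Course, so the user cannot navigate into an empty set. Free-text keyword search has no such guard, which is the exception the slide notes.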

33 Marti Hearst, MIT HCI ‘07 Related Work: Automated Tag Organization  Some efforts are on tag prediction:  Mishne ’06:  Uses IR techniques to find the closest tagged documents, uses their tags to assign new tags. Measures how well the new tags are predicted  Xu et al. ’06:  Use tags that have already been predicted for a document to predict which to show to a new user who is tagging the document  Some efforts on tag organization:  Brooks & Montanez ’06:  Tries to see if tags can predict document clusters, which in my book aren’t really categories  After clustering based on text they try to induce a tag hierarchy by agglomerative clustering of the text. Results not described in detail  Begelman et al. ’06:  Use clustering and tag co-occurrence to find associated tags. Not clear what the organizational goal is

34 Marti Hearst, MIT HCI ‘07 RawSugar  A company/website that organizes tags from blogs into facets  They are undergoing a revamp, will move to channels  However, nothing published on this  (presumably, patents filed)

38 How to Create Facet Hierarchies? Our Approach: Castanet (Stoica & Hearst, to appear at HLT-NAACL ’07)

39 Marti Hearst, MIT HCI ‘07 Example: Recipes (3500 docs)

40 Marti Hearst, MIT HCI ‘07 Castanet Output (shown in Flamenco)

41 Marti Hearst, MIT HCI ‘07 Castanet Output (shown in Flamenco)

42 Marti Hearst, MIT HCI ‘07 Castanet Output (shown in Flamenco)

43 Marti Hearst, MIT HCI ‘07 Example: Biology Journal Titles Castanet Output (shown in Flamenco)

44 Marti Hearst, MIT HCI ‘07 Castanet Algorithm  Leverage the structure of WordNet Documents WordNet Get hypernym paths Select terms Build tree Compress tree Divide into facets

45 Marti Hearst, MIT HCI ‘07 1. Select Terms red blue  Select well distributed terms from collection Documents WordNet Get hypernym paths Select terms Build tree Comp. tree

46 Marti Hearst, MIT HCI ‘07 2. Get Hypernym Path red blue chromatic color abstraction property visual property color red, redness abstraction property visual property color blue, blueness chromatic color Documents WordNet Get hypernym paths Select terms Build tree Comp. tree

47 Marti Hearst, MIT HCI ‘07 3. Build Tree red blue chromatic color abstraction property visual property color red, redness abstraction property visual property color blue, blueness chromatic color red blue abstraction property visual property color red, redness chromatic color blue, blueness Documents WordNet Get hypernym paths Select terms Build tree Comp. tree

48 Marti Hearst, MIT HCI ‘07 4. Compress Tree Documents WordNet Get hypernym paths Select terms Build tree Comp. tree red, redness color red chromatic color blue, blueness blue green, greenness green red color chromatic color blue

49 Marti Hearst, MIT HCI ‘07 4. Compress Tree (cont.) red color chromatic color blue green color red blue green Documents WordNet Get hypernym paths Select terms Build tree Comp. tree

50 Marti Hearst, MIT HCI ‘07 5. Divide into Facets Divide into facets
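
The build-tree and compress-tree steps can be sketched on the color example from the slides. This is a simplified illustration, not the published Castanet algorithm: the hypernym paths are typed in by hand from the slides rather than looked up in WordNet, and the two compression rules below are assumptions chosen to reproduce the pictured result. Term selection and the final division into facets are left out.

```python
# Hypernym paths (root first), hand-copied from the slide example.
UPPER = ["abstraction", "property", "visual property", "color", "chromatic color"]
PATHS = {
    "red": UPPER + ["red, redness", "red"],
    "blue": UPPER + ["blue, blueness", "blue"],
    "green": UPPER + ["green, greenness", "green"],
}

def build_tree(paths):
    """Merge all paths into one tree of nested dicts, sharing common prefixes."""
    root = {}
    for path in paths.values():
        node = root
        for name in path:
            node = node.setdefault(name, {})
    return root

def compress(tree):
    """Simplified compression (assumed rules, applied bottom-up):
    A. splice out an interior child whose name contains its parent's name
       (e.g. 'chromatic color' under 'color');
    B. replace a node that has exactly one child by that child
       (e.g. 'red, redness' -> 'red')."""
    out = {}
    for name, children in tree.items():
        kids = {}
        for kid, sub in compress(children).items():
            if name in kid and sub:      # rule A: lift the grandchildren
                kids.update(sub)
            else:
                kids[kid] = sub
        if len(kids) == 1:               # rule B: skip this single-child node
            out.update(kids)
        else:
            out[name] = kids
    return out

print(compress(build_tree(PATHS)))
```

On this input the result is a single `color` node with `red`, `blue`, and `green` directly beneath it, which matches the last tree drawn on the "Compress Tree (cont.)" slide.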

51 Marti Hearst, MIT HCI ‘07 Disambiguation  Ambiguity in:  Word senses  Paths up the hypernym tree Sense 1 for word “tuna”: organism, being => plant, flora => vascular plant => succulent => cactus => tuna. Sense 2 for word “tuna”: organism, being => fish => food fish => tuna; also => bony fish => spiny-finned fish => percoid fish => tuna. (2 paths for same word; 2 paths for same sense)

52 Marti Hearst, MIT HCI ‘07 How to Select the Right Senses and Paths?  First: build core tree  (1) Create paths for words with only one sense  (2) Use Domains  WordNet has 212 Domains  medicine, mathematics, biology, chemistry, linguistics, soccer, etc.  Automatically scan the collection to see which domains apply  The user selects which of the suggested domains to use or may add own  Paths for terms that match the selected domains are added to the core tree  Then: add remaining terms to the core tree.
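
A rough sketch of the first two steps of core-tree construction. The sense inventory, the domain labels, and the function name are all hypothetical; requiring exactly one in-domain sense is also an assumption. The slide does not say how the remaining terms are attached, so they are only collected here.

```python
# Hypothetical sense inventory: word -> list of (sense label, domain or None).
SENSES = {
    "saute": [("saute (cook in fat)", "cookery")],
    "tuna": [("tuna (cactus)", "botany"), ("tuna (fish)", "cookery")],
    "bass": [("bass (fish)", None), ("bass (voice)", None)],
}

def core_senses(senses, selected_domains):
    """Pick senses for the core tree.
    (1) Words with a single sense are accepted as-is.
    (2) Ambiguous words are accepted when exactly one sense carries a
        domain the user selected (an assumption made for this sketch).
    Everything else is returned for a later step."""
    core, remaining = {}, []
    for word, options in senses.items():
        if len(options) == 1:
            core[word] = options[0][0]
            continue
        in_domain = [label for label, domain in options if domain in selected_domains]
        if len(in_domain) == 1:
            core[word] = in_domain[0]
        else:
            remaining.append(word)
    return core, remaining

core, remaining = core_senses(SENSES, {"cookery"})
print(core)
print(remaining)
```

With the "cookery" domain selected for a recipe collection, "tuna" resolves to the fish sense, while "bass" stays unresolved and is left for the "add remaining terms" step.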

53 Marti Hearst, MIT HCI ‘07 Castanet Evaluation Method  Information architects assessed the category systems  For each of 2 systems’ output:  Examined and commented on top-level  Examined and commented on two sub-levels  Also compared to a baseline system  Then comment on overall properties  Meaningful?  Systematic?  Likely to use in your work?

54 Marti Hearst, MIT HCI ‘07 Castanet Evaluation Results  Results on recipes collection for “Would you use this system in your work?”  # “Yes in some cases” or “yes, definitely”:  Castanet: 29/34  LDA: 0/18  Subsumption: 6/16  Baseline: 25/34  Average response to questions about quality (4 = “strongly agree”)

55 Marti Hearst, MIT HCI ‘07 Will Castanet Work on Tags?  Class project by Simon King and Jeff Towle, 2004  1650 captions captured from mobile phones  “Blocks with Grandpa”, “Weezer”, “A veterans day tour of berkeley in front of south hall.”, “Bad photo”, “Kitchen”, “Jgj ”  Wanted to organize them.  Use the Castanet WordNet-based facet-hierarchy creation algorithm  by Stoica & Hearst, to appear at HLT-NAACL ’07  Had to first remove proper names

56 Marti Hearst, MIT HCI ‘07 Example Photos & Captions (King & Towle): “very scary x-mas tree”; “Hp presentation”; “chasing a cat in the dark”; “My cat”

57 Marti Hearst, MIT HCI ‘07  instrumentality (112)  vehicle (26)  car (9)  bike (8)  vessel, watercraft (4)  mayflower (2)  ferry (1)  gig (1)  truck (3)  airplane (2)  device (20)  machine (7)  computer (4)  laptop (1)  sander (1)  container (16)  vessel (7)  bottle (5)  water_bottle (2)  jug (1)  pill_bottle (1)  bath (2)  bowl (1)  can (2)  backpack (1)  bumper (1)  empty (1)  salt_shaker (1)  furniture, piece of furniture, article of furniture (12)  seat (8)  bench (2)  chair (2)  couch (2)  lounge (1)  bed (4)  desk (1)

58 Marti Hearst, MIT HCI ‘07 Research Questions for Tags & Search  The role of interface on tag convergence  There seems to be a big effect  Would be really interesting to experiment with this  Also, for facet grouping  Anchor text vs. tags?  How are they the same; how do they differ?  How to get tag expertise?  Right now, in many cases it is least-common-denominator  ESP-game

59 What’s up with Tag Clouds? What does a typical tag cloud look like?

60 Marti Hearst, MIT HCI ‘07 Definition Tag Cloud: A visual representation of social tags, organized into paragraph-style layout, usually in alphabetical order, where the relative size and weight of the font for each tag corresponds to the relative frequency of its use.
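
The definition translates almost directly into code. The sketch below orders tags alphabetically and scales font size linearly with frequency; the pixel range and the linear scaling are assumptions (real sites may use logarithmic or bucketed scaling), and the function name is illustrative.

```python
def tag_cloud(freqs, min_px=10, max_px=32):
    """Return (tag, font size in px) pairs in alphabetical order, with size
    scaled linearly between min_px and max_px by relative frequency."""
    lo, hi = min(freqs.values()), max(freqs.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return [
        (tag, round(min_px + (freqs[tag] - lo) / span * (max_px - min_px)))
        for tag in sorted(freqs)
    ]

# Hypothetical tag counts.
for tag, px in tag_cloud({"web": 10, "art": 1, "python": 4}):
    print(f'<span style="font-size:{px}px">{tag}</span>')
```

Emitting the spans one after another in a wrapping container gives the paragraph-style layout the definition describes.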

62 Marti Hearst, MIT HCI ‘07 flickr’s tag cloud

63 Marti Hearst, MIT HCI ‘07 del.icio.us

64 Marti Hearst, MIT HCI ‘07 del.icio.us

65 Marti Hearst, MIT HCI ‘07 blogs

66 Marti Hearst, MIT HCI ‘07 ma.gnolia.com

67 Marti Hearst, MIT HCI ‘07 NYTimes.com: tags from most frequent search terms

68 Marti Hearst, MIT HCI ‘07 IBM’s manyeyes project

69 Marti Hearst, MIT HCI ‘07 Amazon.com: Tag clouds on term frequencies

70 Marti Hearst, MIT HCI ‘07 Alternative: “Semantic” Layout  “Improving Tag-Clouds as Visual Information Retrieval Interfaces”, Yusef Hassan-Montero and Víctor Herrero-Solana, InSciT2006  Tags grouped by “similarity, based on clustering techniques and co-occurrence analysis”

71 Marti Hearst, MIT HCI ‘07 I was puzzled by the questions:  What are designers and authors’ intentions in creating or using tag clouds?  How do they expect their readers to use them?

72 Marti Hearst, MIT HCI ‘07 On the positive side:  Compact  Draws the eye towards the most frequent (important?) tags  You get three dimensions simultaneously!  alphabetical order  size indicating importance  the tags themselves

73 Marti Hearst, MIT HCI ‘07 Weirdnesses  Initial encounters unencouraging  Some reports from industry:  Is the computer broken?  Is this a ransom note?

74 Marti Hearst, MIT HCI ‘07 Weirdnesses  Violates principles of perceptual design  Longer words grab more attention than shorter  Length of tag is conflated with its size  White space implies meaning when there is none intended  Ascenders and descenders can also affect focus  Eye moves around erratically, no flow or guides for visual focus  Proximity does not hold meaning  The paragraph-style layout makes it quite arbitrary which terms are above, below, and otherwise near which other terms  Position within paragraph has saliency effects  Visual comparisons difficult (see Tufte)

75 Marti Hearst, MIT HCI ‘07 Weirdnesses  Meaningful associations are lost  Where are the different country names in this tag cloud?

76 Marti Hearst, MIT HCI ‘07 Weirdnesses Which operating systems are mentioned?

77 Marti Hearst, MIT HCI ‘07 Tag Cloud Study (1)  First part compared tag cloud layouts  Independent Variables:  Tag size  Tag proximity to a large font  Tag quadrant position  Task: recall after a distractor task  13 participants; effects for size and quadrant  Second part compared tag clouds to lists  11 participants  Tested recognition (from a set of like words) and impression formation  Alphabetical lists were best for the latter; no differences for the former  “Getting our head in the clouds: Toward evaluation studies of tagclouds”, Walkyria Rivadeneira, Daniel M. Gruen, Michael J. Muller, David R. Millen, CHI 2007 note

78 Marti Hearst, MIT HCI ‘07 Tag Cloud Study (2)  62 participants did a selection task  (find this country out of a list of 10 countries)  Independent Variables:  Horizontal list  Horizontal list, alphabetical  Vertical list  Vertical list, alphabetical  Spatial tag cloud  Spatial tag cloud, alphabetical  Order for non-alphabetical not described  Alphabetical fastest in all cases, lists faster than spatial  May have used poor clouds (some people couldn’t “see” larger font answers)  An Assessment of Tag Presentation Techniques; Martin Halvey, Mark Keane, poster at WWW 2007.

79 Marti Hearst, MIT HCI ‘07 A Justifying Claim  You get three dimensions simultaneously!  alphabetical order  size indicating importance  the tags themselves … but is this really a conscious design decision?

80 Marti Hearst, MIT HCI ‘07 Solution: Celebrity Interviews  I was really confused about tag clouds, so I decided to ask the people behind the puffs  15 interviews, conducted at foocamp’06  Several web 2.0 leaders  5 more interviews at Google and Berkeley

81 Marti Hearst, MIT HCI ‘07 A Surprise  7 interviewees DID NOT REALIZE that alphabetical ordering is standard.  2 of these people were in charge of such sites but had had others write the code  What was the answer given to “what order are tags shown in?”  hadn’t thought about it  don’t think about tag clouds that way  random order  ordered by semantic similarity  Suggests that perhaps people are too distracted by the layout to use the alphabetical ordering

82 Marti Hearst, MIT HCI ‘07 Suggested main purposes:  To signal the presence of tags on the site  A good way to get the gist of the site  An inviting and fun way to get people interacting with the site  To show what kinds of information are on the site  Some of these said they are good for navigation  Easy to implement

83 Marti Hearst, MIT HCI ‘07 Tag Clouds as Self-Descriptions  Several noted that a tag cloud showing one’s own tags can be evocative  A good summary of what one is thinking and reading about  Useful for self-reflection  Useful for showing others one’s thoughts  One example: comparing someone else’s tags to one’s own to see what you have in common, and what special interests differentiate you  Useful for tracking changes in friends’ lives  Oh, a new girl’s name has gotten larger; he must have a new girlfriend!

84 Marti Hearst, MIT HCI ‘07 Tag Clouds as showing “Trends”  Several people used this term, that tag clouds show trends in someone’s behavior  Trends are usually patterns across time, which are not inherently visible in tag clouds  To note a trend using a tag cloud, one must remember what was there at an earlier time, and what changed  tracking the girls’ names example  This suggests a reason for the importance of the large tags: they draw one’s attention to what is big now versus what used to be large.  Suggests also why it doesn’t matter that you can’t see small tags.

85 Marti Hearst, MIT HCI ‘07 New Perspective: Tag Clouds are Social!  It’s not about the “information”!  Not surprising in retrospect; tagging is in large part about the social aspect  Seems to work mainly when the tags can be seen by many  Even better when items can be tagged by many and seen by many  What does this mean though when tag clouds are applied to non-social information?

86 Marti Hearst, MIT HCI ‘07 Follow-up Study  Informed by the interview results, we searched for, read, and coded web pages that mentioned tag clouds.  Looked at about 140 discussions  Developed 21 codes  Looked at another 90 discussions  Used web queries: “tag clouds”, usability tag clouds, etc  Sampled every 10th url  58% personal blogs  20% commercial blogs  10% commercial web pages  rest from group blogs and discussion lists  Doesn’t tell us what people who don’t write about tag clouds think.

87 Marti Hearst, MIT HCI ‘07 The Role of Popularity  Popularity in the sense that tag clouds (and tagging) are trendy and popular.  Some people liked the visualization, but their popularity made them less appealing  Famous post: “Tag clouds are the new mullets”  Led to self-consciousness about liking them  Many complained about unaesthetic cloud designs  Little consensus on if they are a fad or have staying power  Popularity also in the sense of the large font size for more popular tags  Many people like the prominence of large tags, but several commented on the tyranny of the popular

88 Marti Hearst, MIT HCI ‘07 The Role of Navigation  Opinions vary  Many simply state they are useful for navigation, but with no support for this claim  Some claim the compactness makes navigation easier than a vertical list  Some object to the varying font size on scannability  Others object to the lack of organization  Overall, there is no evidence either way that we could find in the blog community

89 Marti Hearst, MIT HCI ‘07 Aesthetic Considerations  Disagreement on the aesthetic and emotional appeal, especially for lay users.  Those who like them find them fun and appealing  Those who don’t find them messy, strange, like a ransom note  Informal reports with first time users who are not in the Web 2.0 community are negative

90 Marti Hearst, MIT HCI ‘07 Trends again  As in the interviews, the benefit of “trends” was mentioned many times.  There is another sense of “trend” as “tendency or inclination,” and this might be what people mean.

91 Marti Hearst, MIT HCI ‘07 Summary of Stated Reasons for Tag Clouds (Note: some refuted by studies)

92 Marti Hearst, MIT HCI ‘07 Tag Clouds as Social Information  An emphasis that tag clouds are meant to show human behavior.  We found reports of people commenting on other uses that were invalid because they did not reflect live user input:  One blogger noted the incongruity of an online library using keyword frequencies in a tag cloud rather than having it reflect patrons’ usage of the collection.  An online community noticed one site’s cloud didn’t change over time and realized the sizes were decided by marketing. This was greeted with derision.

93 Marti Hearst, MIT HCI ‘07 Implications  Assume tag clouds are meant to reflect human mental activity (individual or group)  Then what might seem design flaws from an information conveyance perspective may not be  A large part of the appeal is the fun and liveliness.  The informality of the layout reflects the human activity beneath it.

94 Marti Hearst, MIT HCI ‘07 Judith Donath, CACM 45(4), 2002 “Traditional data visualization focuses on making abstract numbers and relationships into concrete, spatialized images; the goal is to highlight important patterns while also representing the data accurately. This is a fine approach for social scientists studying the dynamics of online interactions. Yet for our purpose it is also important that the visualization evoke an appropriate intuitive response representing the feel of the conversation as well as depicting its dynamics”

95 Marti Hearst, MIT HCI ‘07 Judith Donath, CACM 45(4), 2002 “[O]ne argument for deliberately designing evocative visualizations for online social environments is the existing default textual interfaces are themselves evocative, they simply evoke an aura of business-like monotony rather than the lively social scene that actually exists.”

96 Marti Hearst, MIT HCI ‘07 Tag Cloud Alternatives Provided by Martin Wattenberg

97 Marti Hearst, MIT HCI ‘07 Conclusions  Social tagging is, in my view, a terrific way to get good content metadata.  I think automated techniques can do a lot to help clean them up and organize them.  They are an inherently social phenomenon, part of social media, which is a really exciting area.  The socialness of social media can yield surprises, like tag clouds.

