Thoughts on Tagging & Search. Marti Hearst, UC Berkeley.

Presentation transcript:

Thoughts on Tagging & Search
Marti Hearst, UC Berkeley

Marti Hearst, Future of Search ’07
Talk Outline: Two Main Points
1. Massive user behavior is aiding search algorithms in interesting ways.
2. Going deeper: an examination of social tagging:
  • The controversy
  • Research questions
  • Our work on automating creation of metadata structure

User-contributed content is exploding!

Social Information & Search
• Trend: human behavioral information is getting “baked in” to search algorithms.
• In many cases, the actions of the many are more useful than the actions of the individual.
• Three examples follow.

Actions of the Many vs. the Individual
1. Anchor text for improved ranking
  • vs. author-supplied meta-tags

Actions of the Many vs. the Individual
2. “Clickthrough” to improve ranking
  • vs. an individual’s prior clicks
  • Joachims et al. and Agichtein et al. found that human selections of links from search results could improve rankings for popular queries.
  • Some surprising rules:
    • Assign negative weight to an unclicked link that appears above and below a clicked link
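The clickthrough rule above can be sketched in a few lines. This is only one possible reading of the slide, not the method of the cited papers: a clicked link gets +1, and an unclicked link sitting directly above or directly below a clicked link gets -1, summed over many users' sessions. The function name `click_weights` and the session format are assumptions made for illustration.

```python
from collections import defaultdict


def click_weights(sessions):
    """Sum per-URL weights over many sessions.

    Each session is (ranked_urls, clicked_urls). A clicked URL earns +1.
    An unclicked URL directly above or below a clicked URL earns -1,
    at most once per session (assumed reading of the slide's rule).
    """
    weights = defaultdict(int)
    for ranked, clicked in sessions:
        clicked = set(clicked)
        penalized = set()
        for i, url in enumerate(ranked):
            if url not in clicked:
                continue
            weights[url] += 1
            for j in (i - 1, i + 1):
                if 0 <= j < len(ranked) and ranked[j] not in clicked:
                    penalized.add(ranked[j])
        for url in penalized:
            weights[url] -= 1
    return dict(weights)


# Two hypothetical sessions for the same query and ranking.
sessions = [
    (["a", "b", "c", "d"], ["b"]),
    (["a", "b", "c", "d"], ["b", "d"]),
]
weights = click_weights(sessions)
```

Because the weights are summed across sessions, a single user's odd click matters little, which is the "actions of the many" point.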

Actions of the Many vs. the Individual
3. Query auto-suggest based on other users’ queries
  • vs. based on one’s prior queries alone
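A minimal sketch of auto-suggest driven by other users' queries, assuming a plain list of logged query strings: completions for a typed prefix are ranked by how often everyone issued them. The names `build_suggester` and `suggest` are hypothetical, and real systems would add much more (spelling, freshness, personalization).

```python
from collections import Counter


def build_suggester(query_log):
    """Return a suggest(prefix, k) function backed by pooled query counts."""
    counts = Counter(q.strip().lower() for q in query_log if q.strip())

    def suggest(prefix, k=3):
        prefix = prefix.strip().lower()
        matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
        # Most frequent first; ties broken alphabetically for stable output.
        matches.sort(key=lambda pair: (-pair[1], pair[0]))
        return [q for q, _ in matches[:k]]

    return suggest


# Hypothetical log pooled from many users.
log = [
    "social tagging", "social tagging", "social media",
    "search engines", "Social Tagging", "social bookmarking",
]
suggest = build_suggester(log)
top = suggest("soc")
```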

Social Tagging
• Metadata assignment without all the bother
• Spontaneous, easy, low cognitive overhead
• Usually used in the context of social media

Popular pages on del.icio.us

Visitor tagging at Powerhouse Museum

Tagging is Controversial!
• Sloppy!
• Disorganized!
• Incorrect!
• Power to the people!
• Easy!
• Cheap!

“Investigating social tagging and folksonomy in the art museum with steve.museum”, J. Trant, B. Wyman, WWW 2006 Collaborative Tagging Workshop
Professional Cataloguer: “Everything I know isn't in the picture!”

The Tagging Opportunity
• At last! Content-oriented metadata in the large!
• Attempts at metadata standardization always end up with something like the Dublin Core
  • author, date, publisher, ....
• I think the action is in the subject metadata, and have focused on how to navigate collections given such data.

The Tagging Opportunity
• Tags are inherently faceted!
  • Multiple labels are assigned to each item
    • Rather than placing them into a folder
    • Rather than placing them into a hierarchy
  • Concepts are assigned from many different content categories

Tagging Problems
• The haphazard assignments lead to problems with
  • Synonymy
  • Homonymy
  • Unpredictability
See how this author attempts to compensate:

Tagging Problems
• Some tags are fleeting in meaning or too personal
  • toread, todo
• Tags don’t “cover” all the concepts
• Tags are disorganized
• Tags are not “professional”
  • (I personally don’t think this matters)

Research Questions for Tags & Search
• How to improve tag convergence?
• How to group tags meaningfully? How to eliminate uninteresting tags?
• What is the role of user interface on tag convergence?
  • Preliminary evidence suggests there is a big effect
  • There are some good ideas out there
  • More experimentation is needed.
• What algorithms can we use to clean up the tags after they are assigned?
  • There is some work here, much more can be done.
  • TagAssist: Automatic Tag Suggestion for Blog Posts, Sood et al., ICWSM 2007

Interface for adding tags on del.icio.us

Effects of Interface
On the Structure, Properties and Utility of Internal Corporate Blogs, Kolari et al., ICWSM 2007

Research Questions for Tags & Search
How to get tag expertise?
Example tags: office, desk, plants, windows, shadows
Who will identify the plant species in this image?

Research Questions for Tags & Search
• What is the relationship of social tags to automated content extraction?
• Are tags more informative, or differently informative, than other labeling methods?

Research Questions for Tags & Society
• What motivates people to tag?
• Who owns the tags?
• Privacy and sharing of tags?

Research Questions for Tags & Search
• How to use tags for browsing / navigation?
  • Currently most tags are used as a direct index into items
    • Click on tag, see items assigned to it, end of story
  • Grouping into small hierarchies is not usually done
    • del.icio.us now has bundles, but navigation isn’t good
    • IBM’s dogear and RawSugar come the closest
• One solution: organize tags into faceted hierarchies, use faceted navigation.

Faceted Metadata & Navigation

The Idea of Faceted Metadata
• Create INDEPENDENT categories (facets)
  • Each facet has labels (sometimes arranged in a hierarchy)
• Assign labels from the facets to every item
  • Example: recipe collection
    Course: Main Course
    Cooking Method: Stir-fry
    Cuisine: Thai
    Ingredient: Bell Pepper, Curry, Chicken
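One way to picture the recipe example as data, as a sketch only: each item maps every facet to the set of labels assigned from it. The recipe names, the second record, and the helper `matching` are hypothetical; only the facet names and the first record's labels come from the slide.

```python
# Hypothetical recipe records: item name -> {facet name -> set of labels}.
RECIPES = {
    "Thai chicken stir-fry": {
        "Course": {"Main Course"},
        "Cooking Method": {"Stir-fry"},
        "Cuisine": {"Thai"},
        "Ingredient": {"Bell Pepper", "Curry", "Chicken"},
    },
    "Grilled pepper salad": {
        "Course": {"Salad"},
        "Cooking Method": {"Grill"},
        "Cuisine": {"Italian"},
        "Ingredient": {"Bell Pepper"},
    },
}


def matching(items, selections):
    """Return sorted names of items carrying every selected facet label.

    `selections` maps facet name -> one required label. Facets are
    independent, so constraints from different facets simply combine.
    """
    return sorted(
        name
        for name, facets in items.items()
        if all(
            label in facets.get(facet, set())
            for facet, label in selections.items()
        )
    )


by_ingredient = matching(RECIPES, {"Ingredient": "Bell Pepper"})
thai_with_pepper = matching(RECIPES, {"Ingredient": "Bell Pepper", "Cuisine": "Thai"})
```

An item can be reached through any facet in any order, which is what distinguishes this from filing it in a single folder.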

Faceted Navigation
• A flexible, dynamic way to allow everyday users to browse & search large information collections.
• We’ve been investigating and promoting this at UCB since
• It’s now widely used for e-commerce sites;
• Digital libraries, image collections, etc., are following.
• Search verticals as well
  • Google co-op
• More info: flamenco.berkeley.edu
Next Generation Web Search: Setting Our Sites, M. Hearst, IEEE Data Engineering Bulletin, Special issue on Next Generation Web Search, Sept. 2000

Faceted Nav in eBay Express

Faceted Nav in Digital Libraries
• NCSU has a start at it

Faceted Nav in eBay Express

Facets in Google Co-op

Advantages of Faceted Navigation
• Gives users control and flexibility
• Can’t end up with empty result sets
  • (except with keyword search)
• Helps avoid feelings of being lost.
• Easier to explore the collection.
  • Helps users infer what kinds of things are in the collection.
  • Evokes a feeling of “browsing the shelves”
• Is preferred over standard search for collection browsing in usability studies.
  • (Interface must be designed properly)
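The "can't end up with empty result sets" property follows from how refinement links are generated, sketched below under assumed data: the interface only offers labels that occur among the items currently in view, each with a count. The item records and the names `current_results` and `refinements` are hypothetical and not taken from Flamenco.

```python
from collections import Counter

# Hypothetical items: name -> {facet name -> set of labels}.
ITEMS = {
    "item-1": {"Cuisine": {"Thai"}, "Ingredient": {"Chicken", "Bell Pepper"}},
    "item-2": {"Cuisine": {"Thai"}, "Ingredient": {"Tofu"}},
    "item-3": {"Cuisine": {"Italian"}, "Ingredient": {"Bell Pepper"}},
}


def current_results(items, selections):
    """Items that carry every selected (facet, label) pair."""
    return {
        name: facets
        for name, facets in items.items()
        if all(label in facets.get(facet, set()) for facet, label in selections)
    }


def refinements(items, selections):
    """Count the not-yet-selected labels among the current results.

    Only labels with a nonzero count appear, so following any offered
    refinement always leads to at least one item.
    """
    counts = Counter()
    for facets in current_results(items, selections).values():
        for facet, labels in facets.items():
            for label in labels:
                if (facet, label) not in selections:
                    counts[(facet, label)] += 1
    return counts


selected = {("Cuisine", "Thai")}
offered = refinements(ITEMS, selected)
```

Free-text keyword search has no such guarantee, which matches the slide's exception.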

Advantages of Faceted Metadata
• Helps alleviate the metadata wars:
  • Allows for both splitters and lumpers
    • Is this a bird or a robin?
    • Doesn’t matter, you can do both!
  • Allows for differing organizational views
    • Does NASCAR go under sports or entertainment?
    • Doesn’t matter, you can do both!

How to Create Facet Hierarchies?
Our Approach: Castanet (Stoica, Hearst, & Richardson, HLT-NAACL ’07)

Example: Recipes (3500 docs)

Castanet Output (shown in Flamenco)

Example: Biology Journal Titles
Castanet Output (shown in Flamenco)

Castanet Algorithm
• Leverage the structure of WordNet
1. Select terms from the documents
2. Get hypernym paths from WordNet
3. Build tree
4. Compress tree
5. Divide into facets
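The pipeline can be illustrated with a toy sketch. It is a simplification under stated assumptions, not the published Castanet implementation: the hypernym paths are hand-written stand-ins for WordNet lookups, term selection is taken as given, compression only splices out non-term nodes with a single child, and the facets are simply the subtrees under the shared root. All function names are hypothetical.

```python
# Hand-written hypernym paths standing in for WordNet lookups (assumption).
HYPERNYM_PATHS = {
    "car": ["instrumentality", "conveyance", "vehicle", "car"],
    "bike": ["instrumentality", "conveyance", "vehicle", "bike"],
    "computer": ["instrumentality", "device", "machine", "computer"],
    "laptop": ["instrumentality", "device", "machine", "computer", "laptop"],
    "chair": ["instrumentality", "furniture", "seat", "chair"],
    "bench": ["instrumentality", "furniture", "seat", "bench"],
}


def build_tree(paths):
    """Merge root-to-term paths into one nested dict (name -> children)."""
    tree = {}
    for path in paths:
        node = tree
        for name in path:
            node = node.setdefault(name, {})
    return tree


def compress(tree, terms):
    """Splice out any non-term node that has exactly one child."""
    out = {}
    for name, children in tree.items():
        children = compress(children, terms)
        if name not in terms and len(children) == 1:
            out.update(children)  # promote the only child one level up
        else:
            out[name] = children
    return out


def divide_into_facets(tree):
    """Drop the shared top level; each remaining subtree is one facet."""
    return {name: sub for root in tree.values() for name, sub in root.items()}


terms = set(HYPERNYM_PATHS)                       # 1. selected terms (given)
paths = [HYPERNYM_PATHS[t] for t in sorted(terms)]  # 2. hypernym paths
tree = build_tree(paths)                          # 3. build tree
compact = compress(tree, terms)                   # 4. compress tree
facets = divide_into_facets(compact)              # 5. divide into facets
```

With this toy input, intermediate nodes such as "conveyance" and "device" disappear because they add a level without adding a choice, while "computer" survives because it is itself a selected term.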

Will Castanet Work on Tags?
• Class project by Simon King and Jeff Towle, 2004
• 1650 captions captured from mobile phones
  • Wanted to organize them.
• Used the Castanet algorithm
  • Had to first remove proper names

Example Photos & Captions (King & Towle)
• very scary x-mas tree
• Hp presentation
• chasing a cat in the dark
• My cat

• instrumentality (112)
• vehicle (26)
• car (9)
• bike (8)
• vessel, watercraft (4)
• mayflower (2)
• ferry (1)
• gig (1)
• truck (3)
• airplane (2)
• device (20)
• machine (7)
• computer (4)
• laptop (1)
• sander (1)
• container (16)
• vessel (7)
• bottle (5)
• water_bottle (2)
• jug (1)
• pill_bottle (1)
• bath (2)
• bowl (1)
• can (2)
• backpack (1)
• bumper (1)
• empty (1)
• salt_shaker (1)
• furniture, piece of furniture, article of furniture (12)
• seat (8)
• bench (2)
• chair (2)
• couch (2)
• lounge (1)
• bed (4)
• desk (1)

Conclusions
• The actions of the many are a boon for improving search algorithms.
• Social tagging is, in my view, a terrific way to get good content metadata.
  • I think automated techniques can do a lot to help clean up and organize tags.
  • Tags are an inherently social phenomenon, part of social media, which is a really exciting area.

Related Work: Automated Tag Organization
• Some efforts are on tag prediction:
  • Mishne ’06:
    • Uses IR techniques to find the closest tagged documents, then uses their tags to assign new tags. Measures how well the new tags are predicted.
  • Xu et al. ’06:
    • Use tags that have already been predicted for a document to predict which to show to a new user who is tagging the document
• Some efforts on tag organization:
  • Brooks & Montanez ’06:
    • Tries to see if tags can predict document clusters, which in my book aren’t really categories
    • After clustering based on text they try to induce a tag hierarchy by agglomerative clustering the text. Results not described in detail
  • Begelman et al. ’06:
    • Use clustering and tag co-occurrence to find associated tags. Not clear what the organizational goal is
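To make "tag co-occurrence" concrete, here is a minimal sketch, assuming each item carries a list of tags: count how often two tags land on the same item and report a tag's frequent companions. It illustrates the general idea only and is not the procedure of any paper listed above; the function names, the threshold, and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(tagged_items):
    """Count, for each unordered tag pair, the items carrying both tags."""
    pairs = Counter()
    for tags in tagged_items.values():
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


def related(tag, pairs, min_count=2):
    """Tags that co-occur with `tag` on at least `min_count` items."""
    out = []
    for (a, b), n in pairs.items():
        if n >= min_count and tag in (a, b):
            out.append((b if a == tag else a, n))
    # Strongest association first; ties broken alphabetically.
    return sorted(out, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical tagged photos.
photos = {
    "p1": ["cat", "pet", "cute"],
    "p2": ["cat", "pet"],
    "p3": ["cat", "cute"],
    "p4": ["dog", "pet"],
}
pairs = cooccurrence(photos)
cat_related = related("cat", pairs)
```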

Thank you! For more information: flamenco.berkeley.edu