A metadata-based approach Marti Hearst Associate Professor BT Visit August 18, 2005.

Presentation transcript:

BT Visit, Aug 18, 2005. Marti Hearst: UC Berkeley SIMS
The Problem: How to help people navigate and organize the world’s information?

The SIMS Solution
Focus on METADATA
– System Support for Structured Search: Cheshire
– Search User Interfaces: Flamenco
– Community-based Metadata Creation: MMM
– Content Analysis for Metadata Creation: Mamba

Example: Search and Navigation of Large Collections
– Image Collections
– E-Government Sites
– Shopping Sites
– Digital Libraries
Example: the University of California Library Catalog

What do we want done differently?
– Organization of results
– Hints of where to go next
– Flexible ways to move around
– …
How to structure the information?

How to Structure Information for Search and Browsing?
– Hierarchy is too rigid
– KL-One is too complex
– Hierarchical faceted metadata: a useful middle ground

What are facets?
Sets of categories, each of which describes a different aspect of the objects in the collection.
Each of these can be hierarchical.
(Not necessarily mutually exclusive or exhaustive, but often that is a goal.)
Example facets: Time/Date, Topic, Role, GeoRegion

Facet example: Recipes
– Course: Main Course
– Cooking Method: Stir-fry
– Cuisine: Thai
– Ingredient: Red Bell Pepper, Curry, Chicken
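
One way to picture the recipe example is as a record that carries a list of labels under each facet. The sketch below is only an illustration, not the Flamenco data model: the `Item` class, the `has_label` helper, and the recipe title are hypothetical, and treating Red Bell Pepper, Curry, and Chicken as three separate Ingredient values is an assumption about how the slide groups them. Each label is stored as a path from the facet root so that a facet can be hierarchical.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A collection item described by several independent facets."""
    title: str
    # facet name -> list of paths; each path runs from the facet root to a label
    facets: dict[str, list[tuple[str, ...]]] = field(default_factory=dict)


# Hypothetical record built from the labels on the slide.
recipe = Item(
    title="Thai chicken stir-fry",  # illustrative title, not from the source
    facets={
        "Course": [("Main Course",)],
        "Cooking Method": [("Stir-fry",)],
        "Cuisine": [("Thai",)],
        "Ingredient": [("Red Bell Pepper",), ("Curry",), ("Chicken",)],
    },
)


def has_label(item: Item, facet: str, label: str) -> bool:
    """True if the label appears anywhere on one of the item's paths in that facet."""
    return any(label in path for path in item.facets.get(facet, []))


print(has_label(recipe, "Cuisine", "Thai"))
```

Because the facets are independent, the same item can be reached by starting from any of them, which is what distinguishes this from a single rigid hierarchy.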

How to Put In an Interface?
Some Challenges:
– Users don’t like new search interfaces.
– How to show lots of information without overwhelming or confusing?

A Solution (The Flamenco Project)
Use proper HCI methods.
Organize search results according to the faceted metadata so navigation looks similar throughout
– Easy to see where to go next and where you’ve been
– Avoids empty result sets
– Integrates seamlessly with keyword search
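
The "avoids empty result sets" point can be sketched as follows: if the interface only offers facet labels that occur in the current result set, every refinement link leads to at least one item. This is a minimal sketch under that assumption, not the Flamenco implementation; the toy `ITEMS` collection and the `matches` and `refine_counts` functions are hypothetical.

```python
from collections import Counter

# Hypothetical toy collection: each item maps a facet name to a set of labels.
ITEMS = [
    {"Cuisine": {"Thai"}, "Course": {"Main Course"}},
    {"Cuisine": {"Thai"}, "Course": {"Dessert"}},
    {"Cuisine": {"Italian"}, "Course": {"Main Course"}},
]


def matches(item, selections):
    """An item matches if it carries every selected (facet, label) pair."""
    return all(label in item.get(facet, set()) for facet, label in selections)


def refine_counts(items, selections):
    """Return the matching items and, per facet, a count of each label among them.

    Only labels that occur in the results get a count, so any label shown
    to the user is guaranteed to lead to a non-empty result set.
    """
    results = [item for item in items if matches(item, selections)]
    counts = {}
    for item in results:
        for facet, labels in item.items():
            counts.setdefault(facet, Counter()).update(labels)
    return results, counts


results, counts = refine_counts(ITEMS, [("Cuisine", "Thai")])
for facet, label_counts in counts.items():
    print(facet, dict(label_counts))
```

After selecting Cuisine: Thai, the label Italian no longer appears among the choices, so the user cannot click through to zero results.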

Art History Images Collection

Usability Studies
Usability studies done on 3 collections:
– Recipes: 13,000 items
– Architecture Images: 40,000 items
– Fine Arts Images: 35,000 items
Conclusions:
– Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks
– Very positive results, in contrast with studies on earlier iterations.

Post-Test Comparison
Columns: Faceted, Baseline (values not present in this transcript)
Which Interface Preferable For:
– Find images of roses
– Find all works from a given period
– Find pictures by 2 artists in same media
Overall Assessment:
– More useful for your tasks
– Easiest to use
– Most flexible
– More likely to result in dead ends
– Helped you learn more
– Overall preference

Cheshire: System Support for Metadata-based Search
– Cheshire is an XML/SGML Information Retrieval system using probabilistic relevance ranking
– Cheshire3 includes Grid-based data storage and processing support, permitting very large-scale databases and high efficiency while providing effective relevance-ranked results

Cheshire
The system is currently in production use for many JISC-funded national information services and projects in the UK, including:
– The Archives Hub
– MerseyLibraries
– Resource Discovery Network (RDN)
– National Centre for Text Mining (NaCTeM)

Mamba: Creating Classifications from Data
Most approaches are associational
– AKA clustering, LSA, LDA, etc.
– This leads to poor results when applied to text
To derive facets, need a different angle
– We have a simple approach based on WordNet

Example: Recipes (3500 docs)

Stoica & Hearst ’04 WordNet-based

Our Approach
Leverage the structure of WordNet
Inputs: Documents, WordNet
– Get hypernym paths
– Select terms
– Build tree
– Compress tree
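
The steps named on this slide can be sketched on toy data. This is a rough illustration under stated assumptions, not the authors' algorithm: the `HYPERNYM_PATHS` table is a hand-written stand-in for WordNet lookups, term selection is reduced to "words in the documents that have a known path", and the compression rule (splice out any internal node with exactly one child) is one plausible choice, since the slide does not say which rule is used. All function names are hypothetical.

```python
# Hand-written stand-in for WordNet: term -> hypernym path from root to term.
# A real system would look these paths up in WordNet instead.
HYPERNYM_PATHS = {
    "vanilla": ["substance", "food", "flavoring", "vanilla"],
    "cinnamon": ["substance", "food", "flavoring", "spice", "cinnamon"],
    "chicken": ["substance", "food", "meat", "poultry", "chicken"],
}


def select_terms(docs, paths):
    """Keep words that occur in the documents and have a known hypernym path."""
    vocab = {word for doc in docs for word in doc.lower().split()}
    return sorted(vocab & set(paths))


def build_tree(terms, paths):
    """Merge the hypernym paths of all terms into one nested-dict tree."""
    tree = {}
    for term in terms:
        node = tree
        for label in paths[term]:
            node = node.setdefault(label, {})
    return tree


def compress(tree, keep):
    """Splice out internal nodes that have exactly one child (assumed rule).

    Labels in `keep` (the selected terms) are never removed.
    """
    out = {}
    for label, children in tree.items():
        sub = compress(children, keep)
        if len(sub) == 1 and label not in keep:
            out.update(sub)  # promote the only child in place of this node
        else:
            out[label] = sub
    return out


docs = ["vanilla cinnamon cake", "roast chicken"]  # hypothetical documents
terms = select_terms(docs, HYPERNYM_PATHS)
hierarchy = compress(build_tree(terms, HYPERNYM_PATHS), set(terms))
print(hierarchy)
```

On this toy input the long chains collapse: "substance", "spice", "meat", and "poultry" each have a single child and are spliced out, leaving "food" with a "flavoring" branch (vanilla, cinnamon) and "chicken" directly beneath it.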

A New Opportunity
Tagging, folksonomies (Flickr, del.icio.us)
– People are creating facets in a decentralized manner
– They are assigning multiple facets to items
– This is done on a massive scale
– This leads naturally to meaningful associations
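
One simple reading of "meaningful associations" is tag co-occurrence: tags that many people assign to the same items are probably related. The sketch below counts co-occurring tag pairs over a toy set of tagged photos. It is an assumption about what such an analysis could look like, not a method described in the talk; the `TAGGED` data and the `cooccurrence` function are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tag assignments: item id -> set of user-supplied tags.
TAGGED = {
    "photo1": {"rose", "flower", "red"},
    "photo2": {"rose", "flower"},
    "photo3": {"rose", "garden"},
}


def cooccurrence(tagged):
    """Count how often each unordered pair of tags is assigned to the same item."""
    pairs = Counter()
    for tags in tagged.values():
        # Sorting gives every pair a canonical (alphabetical) order.
        pairs.update(combinations(sorted(tags), 2))
    return pairs


pairs = cooccurrence(TAGGED)
print(pairs.most_common(3))
```

Pairs with high counts are candidates for grouping under a shared facet, although deciding which facet they belong to still needs a further step.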

Recap
Organizing and navigating information is a huge IT opportunity
Several research projects at SIMS tackle this with a special perspective: METADATA
– System support for efficient search over structured information
– User interfaces using hierarchical faceted metadata
– Community-based metadata creation
– Automated analysis algorithms for metadata creation
Thank you!