Bringing Order to the Web: Automatically Categorizing Search Results Hao Chen, CS Division, UC Berkeley Susan Dumais, Microsoft Research ACM:CHI April.

2 Bringing Order to the Web: Automatically Categorizing Search Results Hao Chen, CS Division, UC Berkeley Susan Dumais, Microsoft Research ACM:CHI April 4, 2000

3 Organizing Search Results: List Organization vs. Category Org (SWISH). Query: jaguar

4 Outline. Background: using category structure to organize information. SWISH System (Searching With Information Structured Hierarchically): text classification; user interface. User Study. Future Work.

5 Using Category Structure. To Organize Information: Superbook, Cat-a-Cone, etc. To Help Web Search: Yahoo!, Northern Light. What's New in SWISH? Automatic categorization of new documents; a user interface that tightly couples hierarchical category structure with search results; a user study for the new user interface.

6 SWISH System. Combines the advantages of: a manually crafted and easily understood directory structure; broad coverage from search engines. System Components: text classification models; user interface.

7 Text Classification. Assign documents to one or more of a predefined set of categories, e.g., news feeds, email (spam/no-spam), Web data. Can be done manually or automatically. Inductive Learning for Classification: Training set: manually classify a set of documents. Learning: learn classification models from the training set. Classification: use the models to automatically classify new documents.
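The three steps on this slide (training set, learning, classification) can be sketched as follows. This is a minimal illustration only: the documents are invented, the function names are hypothetical, and the learner is a simple per-category word-frequency model standing in for whatever learner is actually used.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def learn(training_set):
    """Learning step: turn labeled documents into per-category word weights."""
    counts = defaultdict(Counter)
    for text, category in training_set:
        counts[category].update(tokenize(text))
    model = {}
    for category, word_counts in counts.items():
        total = sum(word_counts.values())
        # Weight of a word = its relative frequency within the category.
        model[category] = {w: n / total for w, n in word_counts.items()}
    return model


def classify(model, text):
    """Classification step: pick the category whose weights best match the text."""
    words = tokenize(text)
    scores = {
        category: sum(weights.get(w, 0.0) for w in words)
        for category, weights in model.items()
    }
    return max(scores, key=scores.get)


# Training set: a handful of manually labeled documents (invented examples).
TRAINING_SET = [
    ("used car parts and auto repair", "Automotive"),
    ("honda motorcycle dealer", "Automotive"),
    ("windows software downloads", "Computers & Internet"),
    ("pc hosting provider", "Computers & Internet"),
]

MODEL = learn(TRAINING_SET)
print(classify(MODEL, "cheap honda car parts"))
print(classify(MODEL, "windows pc software"))
```

The model is learned once from the labeled set and then applied to documents it has never seen, which is the point of the inductive approach.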

8 Category Structure (spring 99): 13 top-level categories; 150 second-level categories. Training Set: LookSmart Web Directory; ~50k web pages, chosen randomly from all categories. Top-level Categories: Automotive; Business & Finance; Computers & Internet; Entertainment & Media; Health & Fitness; Hobbies & Interests; Home & Family; People & Chat; Reference & Education; Shopping & Services; Society & Politics; Sports & Recreation; Travel & Vacations.

9 Learning & Classification. Support Vector Machine (SVM): accurate and efficient for text classification (Dumais et al., Joachims). Model = weighted vector of words. “Automobile” = motorcycle, vehicle, parts, automobile, harley, car, auto, honda, porsche … “Computers & Internet” = rfc, software, provider, windows, user, users, pc, hosting, os, downloads... Hierarchical Models: 1 model for N top-level categories; N models for second-level categories. Very useful in conjunction w/ user interaction.
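A rough sketch of how a "weighted vector of words" model and the two-level hierarchy might fit together is given below. Every weight, bias, and second-level category name here is made up for illustration (the slides do not list them), and the function names are hypothetical. Each model scores a document as the sum of its word weights plus a bias, as a linear classifier does. The top-level models choose a category first, and only that category's second-level models are consulted afterwards.

```python
def score(weights, bias, words):
    """Linear decision value: sum of the weights of the words present, plus a bias."""
    return sum(weights.get(w, 0.0) for w in words) + bias


# One set of top-level models (illustrative weights, not learned values).
TOP_LEVEL = {
    "Automotive": ({"car": 1.2, "auto": 1.0, "honda": 0.8, "parts": 0.6}, -0.5),
    "Computers & Internet": (
        {"software": 1.1, "windows": 0.9, "pc": 0.8, "hosting": 0.7},
        -0.5,
    ),
}

# One set of second-level models per top-level category.
# The subcategory names are hypothetical placeholders.
SECOND_LEVEL = {
    "Automotive": {
        "Parts (hypothetical)": ({"parts": 1.0}, -0.5),
        "Motorcycles (hypothetical)": ({"motorcycle": 1.0, "harley": 1.0}, -0.5),
    },
    "Computers & Internet": {
        "Software (hypothetical)": ({"software": 1.0, "windows": 0.5}, -0.5),
        "Web Hosting (hypothetical)": ({"hosting": 1.0, "provider": 1.0}, -0.5),
    },
}


def best(models, words):
    """Return the name of the highest-scoring model in a set."""
    return max(models, key=lambda name: score(*models[name], words))


def classify_hierarchically(text):
    """Pick a top-level category, then a second-level category beneath it."""
    words = text.lower().split()
    top = best(TOP_LEVEL, words)
    sub = best(SECOND_LEVEL[top], words)
    return top, sub


print(classify_hierarchically("harley motorcycle parts"))
print(classify_hierarchically("windows software hosting"))
```

Because the second-level models only need to separate siblings under one parent, each can stay small.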

10 SWISH Architecture. Train (offline): manually classified web pages -> SVM model. Classify (online): web search results, local search results, ...

11 Interface Characteristics. Problems: large amount of information to display (search results, category structure); limited screen real estate. Solutions: information overlay; distilled information display.

12 Information Overlay. Use tooltips to show: summaries of web pages; category hierarchy.

13 Expansion of Category Structure

14 Expansion of Web Page List

15 User Study - Conditions: Category Interface; List Interface

16 User Study

17 Participants: 18 intermediate Web users. Tasks: 30 search tasks, e.g., “Find home page for Seattle Art Museum”; search terms are fixed for each task. Experimental Design: Category/List, within subjects (15 search tasks with each interface); Order (Category/List first), counterbalanced between subjects. Both subjective and objective measures.
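The design on this slide can be sketched as a schedule. How tasks were split between the two interfaces and how participants were assigned to an order is not stated in the slides, so the alternating assignment and the fixed first-half/second-half task split below are assumptions, and `build_schedule` is a hypothetical name.

```python
def build_schedule(num_participants=18, num_tasks=30):
    """Give every participant both interfaces, alternating which one comes first."""
    tasks = list(range(1, num_tasks + 1))
    half = num_tasks // 2
    schedule = []
    for index in range(num_participants):
        # Assumption: even-indexed participants start with Category, odd with List.
        if index % 2 == 0:
            order = ("Category", "List")
        else:
            order = ("List", "Category")
        schedule.append(
            {
                "participant": index + 1,
                # Assumption: the first 15 tasks go with whichever interface is first.
                "blocks": [(order[0], tasks[:half]), (order[1], tasks[half:])],
            }
        )
    return schedule


SCHEDULE = build_schedule()
for row in SCHEDULE[:2]:
    first, second = row["blocks"]
    print(row["participant"], first[0], len(first[1]), second[0], len(second[1]))
```

With 18 participants this yields nine people per order, so any effect of seeing one interface first is balanced across the two conditions.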

18 Subjective Results 7-point rating scale (1=disagree; 7=agree) Questions:

19 Use of Interface Features Average Number of Uses of Feature per Task

20 Search Time. Category: 56 secs; List: 85 secs; p < .002. 50% faster with Category interface.
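As a quick check of the arithmetic behind these numbers (an added illustration, not part of the original slides): 85 secs is about 52% more than 56 secs, and 56 secs is about 34% less than 85 secs. The slide's rounded "50% faster" is closest to the first reading. The variable names below are hypothetical.

```python
category_secs = 56
list_secs = 85

# How much longer the List interface took, relative to Category.
extra_time_with_list = (list_secs - category_secs) / category_secs

# How much time the Category interface saved, relative to List.
time_saved_with_category = (list_secs - category_secs) / list_secs

print(f"List took {extra_time_with_list:.0%} longer than Category")
print(f"Category took {time_saved_with_category:.0%} less time than List")
```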

21 Search Time by Query Difficulty. Top20: 57 secs; NotTop20: 98 secs. No reliable interaction between query difficulty and interface condition. Category interface is helpful for both easy and difficult queries.

22 Summary. Text Classification: organize search results; use hierarchical category models; classify new web pages on-the-fly. User Interface: tightly couple search results with category structure; allow manipulation of presentation of category structure. User Study: suggests strong preference and performance advantages for categorically organized presentation of search results.

23 Open Issues. Improve accuracy of classification algorithms. Enhance user interface: heuristics for selecting categories and pages to display; Query_Match: rank of page, and sometimes match score; Categ_Match: p(category for each page); integration with non-content information. Conduct end-to-end user study. More info: http://research.microsoft.com/~sdumais

24 Searching With Information Structured Hierarchically SWISH

