Scatter/Gather : A Cluster Based Approach to Large Document Collections Alyssa Katz LIS 551 March 23, 2003.

1 Scatter/Gather : A Cluster Based Approach to Large Document Collections Alyssa Katz LIS 551 March 23, 2003

2 Introduction
- Alternate uses for document clustering
- Give document clustering a second chance!

3 Old Approach
- Compare document clustering with vector space (VS) models
  - Cluster searches are, for the most part, inferior to VS searches
  - Document clustering algorithms are SLOW
- CONCLUSION: Document clustering should be used only to accelerate VS searches

4 New Approach
- Document clustering is not bad, just misunderstood
- The REAL question is: How can clustering be effective in its own right?
- THE ANSWER: The “Scatter/Gather Method”

5 Searching vs. Browsing
Searching
- Specific information need
- User has a good idea of keywords or search terms
- Faster, more pointed
Browsing
- User wants more general info
- User is not familiar with the vocabulary, or doesn’t want to commit to a specific set of words
- User will sift through info to find what he wants

6 Solution
- Use clustering to browse a system the way one would browse a table of contents
- Let the user alternate between browsing and searching

7 Scatter/Gather
- User is presented with short summaries of a small number of document groups
- User selects one or more groups for further study
- The selected groups are gathered and re-clustered (scattered) into new groups, and the process repeats down to the individual document level
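The browsing loop on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `summarize`, `cluster_by_first_word`, and `scatter_gather` are hypothetical names, the "clusterer" is a toy that groups documents by their first word, and the user's selection is modeled as a callback. None of this comes from the presentation itself.

```python
from collections import Counter, defaultdict

def summarize(group, n_terms=3):
    """Stand-in summary: the most frequent words in a group of documents."""
    counts = Counter(word for doc in group for word in doc.lower().split())
    return [word for word, _ in counts.most_common(n_terms)]

def cluster_by_first_word(docs, k):
    """Hypothetical toy clusterer: group non-empty documents by first word.

    If more than k groups result, the extras are folded into the last group.
    """
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc.split()[0].lower()].append(doc)
    groups = list(buckets.values())
    head, tail = groups[:k - 1], groups[k - 1:]
    return head + ([[d for g in tail for d in g]] if tail else [])

def scatter_gather(docs, cluster, choose, k=3, stop_at=2):
    """Alternate scatter (cluster) and gather (merge the chosen groups)."""
    while len(docs) > stop_at:
        groups = cluster(docs, k)                      # scatter
        summaries = [summarize(g) for g in groups]     # what the user sees
        picked = choose(summaries)                     # indices of chosen groups
        gathered = [d for i in picked for d in groups[i]]  # gather
        if not gathered or len(gathered) == len(docs):
            break                                      # selection no longer narrows
        docs = gathered
    return docs

docs = [
    "oil prices rise",
    "sports final score",
    "oil market kuwait",
    "sports team wins",
    "oil kuwait invasion",
]
# Simulated user: always pick groups whose summary mentions "oil".
pick_oil = lambda summaries: [i for i, s in enumerate(summaries) if "oil" in s]
result = scatter_gather(docs, cluster_by_first_word, pick_oil)
```

In a real system the callback would be an interactive selection and the clusterer would be a proper document clustering algorithm. The loop structure is the point here.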

8 Example
- 5000 articles in the NYT News Service
  - International News
    - Kuwait and Germany and Oil
      - Articles about effect of invasion on oil market, U.S. military deployment in Kuwait
        - Document

9 Requirements
- New algorithms
  - One that can appropriately cluster large document collections
  - One that can generate adequate summaries of these document collections

10 Solution
- Buckshot algorithm for the first requirement
  - Employs a random sample of documents
- Fractionation for the second requirement
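As a rough illustration of the sampling idea behind Buckshot, the sketch below follows the way the algorithm is usually described: cluster a random sample of about sqrt(k·n) documents with a slower agglomerative method, then use the resulting cluster centers to assign every document. The function names, the bag-of-words vectors, and the centroid-based merge rule are simplifying assumptions for this example, not details taken from the slides.

```python
import math
import random
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (a simplifying assumption)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Sum of term counts; cosine similarity ignores the scale."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

def agglomerate(vectors, k):
    """Naive agglomerative clustering: merge the two clusters whose
    centroids are most similar until only k clusters remain."""
    clusters = [[v] for v in vectors]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        merged = clusters.pop(j)      # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters

def buckshot(docs, k, seed=0):
    """Sample about sqrt(k*n) documents, cluster the sample,
    then assign every document to its most similar center."""
    vectors = [vectorize(d) for d in docs]
    n = len(vectors)
    sample_size = min(n, max(k, math.ceil(math.sqrt(k * n))))
    sample = random.Random(seed).sample(vectors, sample_size)
    centers = [centroid(c) for c in agglomerate(sample, k)]
    groups = [[] for _ in centers]
    for doc, vec in zip(docs, vectors):
        best = max(range(len(centers)), key=lambda i: cosine(vec, centers[i]))
        groups[best].append(doc)
    return groups
```

The expensive pairwise merging runs only on the small sample, which is what keeps the overall cost low. A fuller version would typically refine the assignment with a few more passes.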

11 Application to Scatter/Gather
- Clustering is done beforehand, so real-time searches do not cluster from scratch
- Real-time searches just refine what already exists

