Published by Willa Ellis. Modified over 9 years ago.
1
Clustering Top-Ranking Sentences for Information Access
Anastasios Tombros, Joemon Jose, Ian Ruthven
University of Glasgow & University of Strathclyde
Glasgow, Scotland
2
Some Background & Motivation
- Challenge: how to provide effective access to information
- Approach: combine clustering & top-ranking sentences (TRS)
  - clustering has been used extensively at the document level
  - TRS are based on single-document summaries
- Overall aim of the work
  - to create a personalised information space
  - to use information from users’ interaction
3
Top-Ranking Sentences
- Assume a user with a query:
  - the query is sent to an IR system
  - consider only the top retrieved documents, e.g. 30
  - apply a query-biased sentence extraction model to each of these documents
  - construct a sentence extract of max. 4 sentences per document
  - the set of these sentences for the 30 documents is the set of TRS
- TRS can be ranked by their query-biased scores
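The TRS construction steps on this slide can be sketched roughly as follows. The slide does not say which query-biased extraction model was used, so `score_sentence` below is a placeholder that simply counts query-term occurrences; the sentence splitter and all function names are likewise illustrative assumptions, not the authors' implementation.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence, query_terms):
    # Placeholder query-biased score (assumption): count of query-term occurrences.
    words = re.findall(r"\w+", sentence.lower())
    return sum(1 for w in words if w in query_terms)


def top_ranking_sentences(documents, query, max_per_doc=4):
    """Return (score, doc_index, sentence) tuples, best-scoring first."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    trs = []
    for doc_index, text in enumerate(documents):
        scored = [
            (score_sentence(s, query_terms), doc_index, s)
            for s in split_sentences(text)
        ]
        # Keep at most `max_per_doc` best sentences from each document.
        scored.sort(key=lambda item: item[0], reverse=True)
        trs.extend(scored[:max_per_doc])
    # Rank the pooled sentences by their query-biased score.
    trs.sort(key=lambda item: item[0], reverse=True)
    return trs
```

In this sketch, `documents` would hold the text of the top retrieved documents (e.g. 30), and the returned list plays the role of the ranked set of TRS.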
4
Top-Ranking Sentences (contd.)
- TRS have been shown to be effective in interactive IR on the Web
  - they provide effective access to the retrieved information
- They can be seen as a level of abstraction over the set of retrieved documents
- We introduce an extra layer of abstraction by clustering the set of TRS
5
Clustering Top-Ranking Sentences
- An attempt to create a personalised information space
  - sentences give local contexts in which query terms occur
  - sentences discussing query terms in similar contexts should cluster together
  - this structure should facilitate more intuitive and effective access to information
- Similarities and differences to document clustering
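To illustrate how sentences that use query terms in similar contexts might end up in the same cluster, here is a minimal sketch. The slides do not name the similarity measure or the clustering method, so the bag-of-words cosine similarity, the single-pass grouping, and the `threshold` value are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    # Term-frequency vector over lowercased word tokens.
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_sentences(sentences, threshold=0.3):
    """Single-pass grouping (assumption): a sentence joins the first cluster
    whose seed sentence is similar enough, otherwise it starts a new cluster.
    Returns a list of clusters, each a list of sentence indices."""
    vectors = [bag_of_words(s) for s in sentences]
    clusters = []
    for index, vector in enumerate(vectors):
        for cluster in clusters:
            seed = vectors[cluster[0]]
            if cosine(vector, seed) >= threshold:
                cluster.append(index)
                break
        else:
            clusters.append([index])
    return clusters
```

For example, two sentences about "jaguar cars" would share most of their terms and group together, while a sentence about jaguars in the jungle shares only the query term and would start its own cluster.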
6
Comparing TRS and Document Clustering
- We used 4 searchers with a total of 16 queries
  - each searcher assessed the utility of the top 30 documents on a scale of 1-10
- For each query:
  - we downloaded the top 30 retrieved documents
  - we extracted the set of TRS
  - we clustered the 30 documents and the set of TRS
  - we assigned scores to document & TRS clusters
    - cluster score = sum of the document (sentence) scores divided by the number of documents (sentences) in the cluster
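The cluster score described on this slide is the mean of the member scores. A small sketch, with hypothetical names (`clusters`, `item_scores`) and without assuming how sentence scores were obtained, since the slide does not say:

```python
def cluster_score(member_scores):
    # Sum of the member scores divided by the number of members.
    if not member_scores:
        raise ValueError("cannot score an empty cluster")
    return sum(member_scores) / len(member_scores)


def score_clusters(clusters, item_scores):
    """clusters: dict mapping cluster id -> list of item ids (documents or sentences).
    item_scores: dict mapping item id -> utility score (e.g. 1-10 assessments)."""
    return {
        cluster_id: cluster_score([item_scores[item] for item in members])
        for cluster_id, members in clusters.items()
    }


def best_cluster(clusters, item_scores):
    # The cluster with the highest average score.
    scores = score_clusters(clusters, item_scores)
    return max(scores, key=scores.get)
```

The same functions apply to both document clusters and TRS clusters; only the items and their scores differ.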
7
Some Results
- Scores of TRS clusters were significantly higher than those of document clusters
  - best cluster averages: 4.78 (documents) vs. 5.82 (TRS)
  - overall averages: 3.2 (documents) vs. 3.73 (TRS)
- Average precision and recall were higher for TRS clusters
  - P & R are defined based on documents with scores ≥ 7
  - average P: 0.38 (documents) vs. 0.49 (TRS)
  - average R: 0.73 (documents) vs. 0.77 (TRS)
- Cluster sizes were comparable
  - 5 docs per cluster vs. 5.3 sentences per cluster
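As an illustration of the precision and recall definition used here, the sketch below treats documents with a score of at least 7 as relevant and evaluates a single cluster against them. How TRS clusters were mapped back to documents is not stated on the slide, so this covers only the document-level case, and the function and parameter names are hypothetical.

```python
def precision_recall(cluster_docs, doc_scores, threshold=7):
    """cluster_docs: iterable of document ids in one cluster.
    doc_scores: dict mapping every assessed document id -> utility score.
    A document counts as relevant when its score is >= threshold."""
    relevant = {doc for doc, score in doc_scores.items() if score >= threshold}
    in_cluster = set(cluster_docs)
    hits = len(in_cluster & relevant)
    # Precision: share of the cluster that is relevant.
    precision = hits / len(in_cluster) if in_cluster else 0.0
    # Recall: share of all relevant documents found in the cluster.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

With four assessed documents scored 9, 7, 5 and 2, a cluster holding the first and third would have one relevant document out of two in the cluster and one out of two relevant overall.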
8
Conclusions & Future Plans
- TRS clusters have the potential to offer more effective information access
  - only one aspect of their expected utility
- Integrate TRS clustering in interactive web searching
  - investigate its utility in user-based studies on the live Internet
- We have extended the reported work
  - more searchers & queries, different clustering methods
  - inter-sentence similarities, structure of information space