1
Partitioning Search-Engine Returned Citations for Proper-Noun Queries
Reema Al-Kamha
Supported by NSF
2
The Problem
- Search engines return too many citations
  - Example: for the query “Bonnie Lake”, Google returns around 800 citations
  - Citations are ranked best first
  - Many citations refer to the same object
- Can we partition the citations so that each partition refers to one object?
- Proper-noun queries:
  - Discard citations that are not of the right kind
  - Partition the rest by object
  - Retain the best-first ranking
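The three goals for proper-noun queries (discard, partition, keep best-first order) can be illustrated with a small Python sketch. It is not the author's implementation: the classifier `is_chosen_kind`, the grouping `partitions`, and the helper name `order_partitions` are all hypothetical inputs and names chosen for illustration.

```python
from typing import Callable, Dict, Iterable, List


def order_partitions(
    ranked_ids: List[int],
    is_chosen_kind: Callable[[int], bool],
    partitions: Iterable[Iterable[int]],
) -> List[List[int]]:
    """Drop citations not of the chosen kind, then order members and partitions best-first.

    ranked_ids lists citation ids in the search engine's order (best first).
    """
    rank: Dict[int, int] = {cid: pos for pos, cid in enumerate(ranked_ids)}
    kept: List[List[int]] = []
    for group in partitions:
        # Keep only citations of the chosen kind, sorted by their original rank.
        members = sorted((c for c in group if is_chosen_kind(c)), key=rank.__getitem__)
        if members:
            kept.append(members)
    # Each partition is placed according to its best-ranked member.
    kept.sort(key=lambda members: rank[members[0]])
    return kept


# Eight ranked citations, an illustrative grouping, and citation 6 classified as the wrong kind.
print(order_partitions(list(range(1, 9)), lambda c: c != 6, [[3, 5, 7, 8], [6], [2], [1, 4]]))
# [[1, 4], [2], [3, 5, 7, 8]]
```

Placing each partition by its best-ranked member means the top citation of the original result list still heads the first partition.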
3
“Bonnie Lake” Query to Google
4
The Interface
5
“Bonnie Lake” Query Result
6
Solution
- Classification
  - Group 1: citations of the chosen kind
  - Group 2: citations not of the chosen kind
- Partition
  - Three facets: Attributes, Links, Page Similarity
  - Sub-facets for each facet
  - A confidence matrix for each sub-facet
  - A (weighted) mean for each facet
  - A final confidence matrix
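The "(weighted) mean for each facet" step can be sketched as a cell-by-cell weighted mean of confidence matrices. The weights and matrix values below are hypothetical, and combining the facet matrices into the final matrix with the same weighted mean is an assumption; the slide does not say how the final matrix is formed.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def weighted_mean(matrices: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Cell-by-cell weighted mean of equally sized n x n confidence matrices."""
    if not matrices or len(matrices) != len(weights):
        raise ValueError("need at least one matrix and exactly one weight per matrix")
    total = float(sum(weights))
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 2 x 2 sub-facet matrices for the Links facet (citations 1 and 2).
same_host = [[1.0, 1.0], [1.0, 1.0]]
same_url = [[1.0, 0.0], [0.0, 1.0]]
links_facet = weighted_mean([same_host, same_url], [1, 3])
print(links_facet)  # [[1.0, 0.25], [0.25, 1.0]]

# Assumption: the facet matrices are combined into the final matrix the same way.
similarity_facet = [[1.0, 0.75], [0.75, 1.0]]
final = weighted_mean([links_facet, similarity_facet], [1, 1])
print(final)  # [[1.0, 0.5], [0.5, 1.0]]
```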
7
Attributes
- One-to-one attribute(s): latitude and longitude
- Single attribute, functional determination: province with a lake’s name
- Multiple attributes, functional determination: campground name and highway with a lake’s name
- Attributes with nonfunctional determination: country with a lake’s name
- Distinguishing attribute: state for a lake
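One way to turn these attribute kinds into a pairwise confidence is sketched below. The slide names the kinds of attributes but gives no numbers, so the `LakeFacts` schema, the confidence constants, the coordinate tolerance, and the order of the checks are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical confidence values: the slide names the attribute kinds, not the numbers.
SAME_POSITION = 1.0     # one-to-one: latitude/longitude
SAME_DETERMINANT = 0.9  # functional determination: province, or campground + highway
SAME_COUNTRY = 0.6      # nonfunctional determination: many lakes share a country
DIFFERENT = 0.0
NO_EVIDENCE = 0.5


@dataclass
class LakeFacts:
    """Attribute values extracted from one citation's page (hypothetical schema)."""
    latlon: Optional[Tuple[float, float]] = None
    province: Optional[str] = None
    campground: Optional[str] = None
    highway: Optional[str] = None
    country: Optional[str] = None
    state: Optional[str] = None


def attribute_confidence(a: LakeFacts, b: LakeFacts, tol: float = 0.01) -> float:
    """Confidence that two pages about the same lake name describe the same lake."""
    if a.latlon is not None and b.latlon is not None:
        # One-to-one: a position (within tol degrees) identifies exactly one lake.
        close = all(abs(x - y) <= tol for x, y in zip(a.latlon, b.latlon))
        return SAME_POSITION if close else DIFFERENT
    if a.state and b.state and a.state != b.state:
        # Distinguishing attribute: different states are taken to mean different lakes.
        return DIFFERENT
    if a.province and b.province:
        # Functional determination: a province plus the lake name picks out one lake.
        return SAME_DETERMINANT if a.province == b.province else DIFFERENT
    if a.campground and b.campground and a.highway and b.highway:
        # Functional determination over several attributes; a mismatch falls through.
        if (a.campground, a.highway) == (b.campground, b.highway):
            return SAME_DETERMINANT
    if a.country and b.country:
        # Nonfunctional determination: a shared country is only weak evidence.
        return SAME_COUNTRY if a.country == b.country else DIFFERENT
    return NO_EVIDENCE


print(attribute_confidence(LakeFacts(latlon=(49.0, -120.0)), LakeFacts(latlon=(49.005, -120.0))))  # 1.0
print(attribute_confidence(LakeFacts(state="State A"), LakeFacts(state="State B")))  # 0.0
```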
8
Links
- Returned citations that link together
- Returned citations whose URLs have parts in common: the same host, the same file name, and the same URL
  - Same host: http://www.cs.byu.edu/info/dwembley.html and http://www.cs.byu.edu/info/directory.php
  - Same file name: http://sunsite.unc.edu/javafaq/oldnews.html and http://helios.oit.unc.edu/javafaq/oldnews.html
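The URL comparisons can be sketched with Python's standard `urllib.parse.urlsplit`, checked against the two example pairs above. The function name `shared_url_parts` is hypothetical, and treating "file name" as the whole URL path (rather than only the last segment) is an assumption.

```python
from typing import Dict
from urllib.parse import urlsplit


def shared_url_parts(u1: str, u2: str) -> Dict[str, bool]:
    """Report which URL parts two returned citations have in common."""
    p1, p2 = urlsplit(u1), urlsplit(u2)
    return {
        # Same host name, e.g. two pages under www.cs.byu.edu.
        "same_host": p1.hostname is not None and p1.hostname == p2.hostname,
        # "File name" taken here as the whole path (an assumption), ignoring a bare "/".
        "same_file": p1.path not in ("", "/") and p1.path == p2.path,
        # Exact string match; no normalization (case, trailing slash) is attempted.
        "same_url": u1 == u2,
    }


print(shared_url_parts("http://www.cs.byu.edu/info/dwembley.html",
                       "http://www.cs.byu.edu/info/directory.php"))
# {'same_host': True, 'same_file': False, 'same_url': False}
print(shared_url_parts("http://sunsite.unc.edu/javafaq/oldnews.html",
                       "http://helios.oit.unc.edu/javafaq/oldnews.html"))
# {'same_host': False, 'same_file': True, 'same_url': False}
```

Comparing the whole path avoids treating every `index.html` on the Web as the same file, at the cost of missing copies stored under different directories.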
9
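[Table: Confidence Matrix for Returned Citations that Link Together. 8 × 8 matrix over citations 1 to 8 with 1 on the diagonal; the only other legible entries, in row 1, are .50, .89, and .50 (their column positions were lost in extraction).]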
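A link-together matrix like this one could be built by collecting each page's outgoing links with Python's standard `html.parser`. This is a sketch under stated assumptions: the slide does not explain how values such as .50 or .89 are derived, so a single hypothetical `link_conf` is used for any link in either direction, and `None` marks a cell with no link evidence.

```python
from html.parser import HTMLParser
from typing import List, Optional, Set, Tuple
from urllib.parse import urldefrag, urljoin


class _LinkCollector(HTMLParser):
    """Collects absolute, fragment-free targets of <a href> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links and drop #fragments before comparing.
                    self.links.add(urldefrag(urljoin(self.base_url, value))[0])


def outgoing_links(url: str, html: str) -> Set[str]:
    parser = _LinkCollector(url)
    parser.feed(html)
    parser.close()
    return parser.links


def link_matrix(pages: List[Tuple[str, str]], link_conf: float = 0.5) -> List[List[Optional[float]]]:
    """pages: (url, html) pairs in rank order; cell (i, j) is link_conf if either page links to the other."""
    urls = [urldefrag(u)[0] for u, _ in pages]
    out = [outgoing_links(u, h) for u, h in pages]
    n = len(pages)
    m: List[List[Optional[float]]] = [[None] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0
        for j in range(i + 1, n):
            if urls[j] in out[i] or urls[i] in out[j]:
                m[i][j] = m[j][i] = link_conf
    return m


pages = [
    ("http://a.example/x/p.html", '<a href="q.html#top">Q</a>'),
    ("http://a.example/x/q.html", "<p>no links</p>"),
    ("http://b.example/r.html", ""),
]
print(link_matrix(pages))
# [[1.0, 0.5, None], [0.5, 1.0, None], [None, None, 1.0]]
```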
10
Page Similarity
- Similarity between each pair of returned citations
- Similarity between each pair of citation-referenced documents
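The slide does not name a similarity measure, so the sketch below swaps in a common one, cosine similarity over term-frequency vectors. The same function could be applied to citation snippets or to the full text of the referenced documents; the tokenizer and the function names are hypothetical choices.

```python
import math
import re
from collections import Counter
from typing import List


def _terms(text: str) -> Counter:
    # Lowercased alphanumeric tokens; no stemming or stop-word removal.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: str, b: str) -> float:
    """Cosine similarity of the term-frequency vectors of two texts."""
    ta, tb = _terms(a), _terms(b)
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(texts: List[str]) -> List[List[float]]:
    """Pairwise confidence matrix with 1.0 on the diagonal."""
    n = len(texts)
    return [[1.0 if i == j else cosine(texts[i], texts[j]) for j in range(n)] for i in range(n)]


print(round(cosine("bonnie lake campground", "bonnie lake trail"), 3))  # 0.667
```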
11
[Table: Confidence Matrix for Similarity between two Citation-Referenced Documents. 8 × 8 matrix over citations 1 to 8 with 1 on the diagonal; rows 1 and 4 both read 1 0 0 1 0 0 0 0, so citations 1 and 4 have confidence 1 with each other; most other cells were garbled in extraction.]
12
[Table: Modified Confidence Matrix for Similarity between two Citation-Referenced Documents. 8 × 8 upper-triangular matrix over citations 1 to 8 with 1 on the diagonal; cell values garbled in extraction.]
13
Final Matrix
[Table: final 8 × 8 upper-triangular confidence matrix over citations 1 to 8 with 1 on the diagonal; cell values garbled in extraction.]
Pairs grouped together: 1,4; 3,5; 5,8; 7,8
Resulting partitions: {1,4}, {2}, {3,5,7,8}, {6}
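Merging the pairs 1,4; 3,5; 5,8; 7,8 into {1,4}, {2}, {3,5,7,8}, {6} is consistent with taking the transitive closure of the pairs. The sketch below does that with union-find; picking pairs from the final matrix by a fixed `threshold` is an assumption, since the slide does not state the selection rule.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def pairs_above(matrix: Sequence[Sequence[Optional[float]]], threshold: float) -> List[Tuple[int, int]]:
    """1-based citation pairs (i, j), i < j, whose confidence is at least threshold (hypothetical rule)."""
    n = len(matrix)
    return [(i + 1, j + 1) for i in range(n) for j in range(i + 1, n)
            if matrix[i][j] is not None and matrix[i][j] >= threshold]


def partition(n: int, pairs: List[Tuple[int, int]]) -> List[List[int]]:
    """Transitive closure of the pairs over citations 1..n, using union-find."""
    parent = list(range(n + 1))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups: Dict[int, List[int]] = {}
    for c in range(1, n + 1):  # ascending ids, so each group comes out sorted
        groups.setdefault(find(c), []).append(c)
    return sorted(groups.values(), key=lambda g: g[0])


print(partition(8, [(1, 4), (3, 5), (5, 8), (7, 8)]))
# [[1, 4], [2], [3, 5, 7, 8], [6]]
```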
14
“Bonnie Lake”—Results
15
Measurements
- Classification (percent correctly classified)
- Number of partitions (precision and recall)
- Each partition (precision and recall)
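The slide lists the measures without defining them. The sketch below shows one possible reading, not the author's definitions: percent correctly classified; precision and recall on the number of partitions, counting only exact matches; and per-partition precision and recall against the gold partition with the largest overlap. The gold partition in the example is made up for illustration and is not a reported result.

```python
from typing import List, Sequence, Set, Tuple


def percent_correct(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Percent of citations whose chosen-kind label matches the gold label."""
    return 100.0 * sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def count_scores(predicted: List[Set[int]], gold: List[Set[int]]) -> Tuple[float, float]:
    """Precision and recall over partitions, counting only exact matches (one possible definition)."""
    exact = sum(1 for p in predicted if p in gold)
    return exact / len(predicted), exact / len(gold)


def partition_scores(predicted: List[Set[int]], gold: List[Set[int]]) -> List[Tuple[float, float]]:
    """Per predicted partition: precision and recall against the most-overlapping gold partition."""
    scores = []
    for p in predicted:
        best = max(gold, key=lambda g: len(p & g))
        overlap = len(p & best)
        scores.append((overlap / len(p), overlap / len(best)))
    return scores


predicted = [{1, 4}, {2}, {3, 5, 7, 8}, {6}]
gold = [{1, 4}, {2, 6}, {3, 5, 7, 8}]  # hypothetical gold standard
print(count_scores(predicted, gold))      # (0.5, 0.6666666666666666)
print(partition_scores(predicted, gold))  # [(1.0, 1.0), (1.0, 0.5), (1.0, 1.0), (1.0, 0.5)]
```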
16
Current Implementation Status
- Interface
- Google connection
- Citation retrieval
- Page retrieval
17
Contribution
- Solve one type of object-identity problem
- Provide an additional tool for search-engine queries