Partitioning Search-Engine Returned Citations for Proper-Noun Queries Reema Al-Kamha Supported by NSF.


1 Partitioning Search-Engine Returned Citations for Proper-Noun Queries Reema Al-Kamha Supported by NSF

2 The Problem
- Search engines return too many citations
  - Example: “Bonnie Lake”
  - Google returns around 800 citations
  - Citations ranked best first
  - Many refer to the same object
- Can we partition by same object?
- Proper Noun Queries
  - Discard citations not of the right kind
  - Partition the rest by same object
  - Retain the best-first ranking

3 “Bonnie Lake” Query to Google

4 The Interface

5 “Bonnie Lake” Query Result

6 Solution
- Classification
  - Group 1: those of the chosen kind
  - Group 2: those not of the chosen kind
- Partition
  - Three facets
    - Attributes
    - Links
    - Page Similarity
  - Sub-facets for each facet
    - Confidence Matrix for each sub-facet
    - (Weighted) Mean for each facet
  - Final Confidence Matrix
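The "(Weighted) Mean for each facet" step can be illustrated with a minimal sketch. The slide does not say how weights are chosen or how the facet matrices are merged into the final matrix, so the function name `weighted_mean_matrix`, the weights, and the sample values below are all hypothetical.

```python
def weighted_mean_matrix(matrices, weights):
    """Combine equally sized square matrices cell by cell with a weighted mean.

    Assumption: every sub-facet matrix has the same dimensions and
    every weight is non-negative. Neither detail is stated on the slide.
    """
    if not matrices or len(matrices) != len(weights):
        raise ValueError("need one weight per matrix")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(matrices[0])
    return [
        [
            sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical sub-facet matrices for two citations.
sub_facet_a = [[1.0, 0.5], [0.5, 1.0]]
sub_facet_b = [[1.0, 0.0], [0.0, 1.0]]
facet = weighted_mean_matrix([sub_facet_a, sub_facet_b], [3, 1])
print(facet[0][1])
```

Setting all weights to the same value reduces this to a plain mean, which matches the parenthesized "(Weighted)" on the slide.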

7 Attributes
- Attribute(s) (One-to-One): latitude and longitude
- Single Attribute (Functional Determination): province with a lake’s name
- Multiple Attributes (Functional Determination): campground name and highway with a lake’s name
- Attributes (Nonfunctional Determination): country with a lake’s name
- Distinguishing Attribute: state for a lake

8 Links
- Returned citations that link together
- Returned citations that have a common URL prefix: same host, same file name, and same URL
  - Example of same host:
    http://www.cs.byu.edu/info/dwembley.html
    http://www.cs.byu.edu/info/directory.php
  - Example of same file:
    http://sunsite.unc.edu/javafaq/oldnews.html
    http://helios.oit.unc.edu/javafaq/oldnews.html
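The three URL relations on this slide (same host, same file name, same URL) can be sketched with Python's standard `urllib.parse`. This is only an illustration: the function name `url_relation` is hypothetical, and treating "same file" as an identical path on different hosts is an assumption drawn from the slide's example, not a stated rule.

```python
from urllib.parse import urlsplit


def url_relation(url_a, url_b):
    """Return 'same URL', 'same host', 'same file', or None.

    Assumptions: hosts are compared case-insensitively, and
    'same file' means the full path matches on different hosts.
    """
    a, b = urlsplit(url_a), urlsplit(url_b)
    host_a, host_b = a.netloc.lower(), b.netloc.lower()
    if (host_a, a.path, a.query) == (host_b, b.path, b.query):
        return "same URL"
    if host_a == host_b:
        return "same host"
    if a.path not in ("", "/") and a.path == b.path:
        return "same file"
    return None


# The slide's own examples.
print(url_relation("http://www.cs.byu.edu/info/dwembley.html",
                   "http://www.cs.byu.edu/info/directory.php"))
print(url_relation("http://sunsite.unc.edu/javafaq/oldnews.html",
                   "http://helios.oit.unc.edu/javafaq/oldnews.html"))
```

The confidence value assigned to each relation is not given on the slide, so the sketch only names the relation.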

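9 Confidence Matrix for Returned Citations that Link Together
(columns 1 to 8; blank cells are not preserved, so each row lists its surviving entries in order)
Row 1: 1 .50 .89 .50
Row 2: 1
Row 3: 1
Row 4: 1
Row 5: 1
Row 6: 1
Row 7: 1
Row 8: 1
14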

10 Page Similarity
- Similarity between each pair of returned citations
- Similarity between two citation-referenced documents

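11 Confidence Matrix for Similarity between two Citation-Referenced Documents
(columns 1 to 8; some cells are not preserved, so each row lists its surviving entries in order)
Row 1: 1 0 0 1 0 0 0 0
Row 2: .00 1 .22 .00 .36 .01 .00 .41
Row 3: .00 1 .99 .00
Row 4: 1 0 0 1 0 0 0 0
Row 5: 0 .99 0 1 .00
Row 6: .33 .00 .29 .00 .22 1 .00 .56
Row 7: .00 .01 .00 .01 .00 1 .99
Row 8: .00 .99 .00 1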

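12 Modified Confidence Matrix for Similarity between two Citation-Referenced Documents
(columns 1 to 8; blank cells are not preserved, so each row lists its surviving entries in order)
Row 1: 1 .00 1 0 .17 .00
Row 2: 1 .11 .00 .18 .01 .00 .21
Row 3: 1 .00 1.00 .15 .01 .00
Row 4: 1 0
Row 5: 1 .11 .01 .50
Row 6: 1 .00 .08
Row 7: 1 .50
Row 8: 1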

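13 Final Matrix
(columns 1 to 8; blank cells are not preserved, so each row lists its surviving entries in order)
Row 1: 1 .25 .95 .25 .34 .25
Row 2: 1 .30 .25 .34 .26 .25 .36
Row 3: 1 .25 .74 .36 .26 .25
Row 4: 1
Row 5: 1 .30 .26 .50
Row 6: 1 .25 .29
Row 7: 1 .50
Row 8: 1
Pairs: 1,4 3,5 5,8 7,8
Partitions: {3,5,7,8} {6} {2} {1,4}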
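The pairs and groups on the Final Matrix slide are consistent with merging citations that are connected through any chain of listed pairs. The sketch below reproduces that grouping with a small union-find. Whether the original system uses this exact procedure is an assumption, and the function name `partition` is hypothetical.

```python
def partition(count, pairs):
    """Group citations 1..count into connected components of the given pairs."""
    parent = list(range(count + 1))  # index 0 is unused

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the best-ranked (lowest-numbered) citation as the root.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = {}
    for citation in range(1, count + 1):
        groups.setdefault(find(citation), []).append(citation)
    # Sorting by first member keeps the best-first ranking across groups.
    return sorted(groups.values())


# Pairs exactly as listed on the slide.
pairs = [(1, 4), (3, 5), (5, 8), (7, 8)]
print(partition(8, pairs))  # [[1, 4], [2], [3, 5, 7, 8], [6]]
```

Because citation numbers reflect the search engine's ranking, ordering each group by its lowest-numbered member is one simple way to retain the best-first ranking.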

14 “Bonnie Lake”—Results

15 Measurements
- Classification (percent correctly classified)
- Number of Partitions (precision and recall)
- Each Partition (precision and recall)
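Two of these measurements can be sketched as follows. The slide does not define them precisely, so this uses the usual set-based definitions of precision and recall. The function names and the sample data are hypothetical and are not results from the presentation.

```python
def percent_correct(predicted, actual):
    """Percentage of citations whose predicted class matches the true class.

    Both arguments map a citation number to True (chosen kind) or False.
    """
    hits = sum(1 for citation, label in actual.items()
               if predicted.get(citation) == label)
    return 100.0 * hits / len(actual)


def precision_recall(found, expected):
    """Precision and recall of one computed partition against the true one."""
    found, expected = set(found), set(expected)
    overlap = len(found & expected)
    precision = overlap / len(found) if found else 0.0
    recall = overlap / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical data for illustration only.
predicted = {1: True, 2: False, 3: True, 4: True}
actual = {1: True, 2: False, 3: False, 4: True}
print(percent_correct(predicted, actual))
print(precision_recall({3, 5, 7, 8}, {3, 5, 7}))
```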

16 Current Implementation Status
- Interface
- Google connection
- Citations retrieval
- Page retrieval

17 Contribution
- Solve one type of object-identity problem
- Provide an additional tool for search engine queries

