Grouping Search-Engine Returned Citations for Person Name Queries Reema Al-Kamha Research Supported by NSF.


1 Grouping Search-Engine Returned Citations for Person Name Queries Reema Al-Kamha Research Supported by NSF

2 The Problem Search engines return too many citations. Example: “Christopher Young”. Google returns around 26,500 citations. Many people are named “Christopher Young”. It would help to group the citations by person. How do we group them?

3 “Christopher Young” Query to Google

4 “Christopher Young” Query Results for Our System

5 Our Solution Three facets: Attributes, Links, Page Similarity. Confidence matrix for each facet. Final confidence matrix.

6 Attributes Email Address, Phone, City, State, Zip Code.

7 Confidence Matrix for Attributes Facet

      D0    D1    D2    D3    D4    D5    D6    D7    D8    D9
D0    1     0     0     0     0     0     0     0     0     0
D1          1     0     0     0     0.49  0     0     0     0.49
D2                1     0     0     0     0     0     0     0
D3                      1     0     0     0     0     0     0
D4                            1     0     0     0     0     0.86
D5                                  1     0     0     0     0
D6                                        1     0     0     0
D7                                              1     0     0
D8                                                    1     0
D9                                                          1

D1 & D5 have the same State. D1 & D9 have the same State. D4 & D9 have the same City.

8 Links Returned citations that have the same host: www.cs.byu.edu/info/dwembley.html www.cs.byu.edu/info/dwembley.html www.cs.byu.edu/info/directory.php. One returned citation links to another returned citation.
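The same-host test on this slide could be sketched as follows. This is a minimal illustration, not the presenter's implementation: the helper names `host_of` and `same_host` are hypothetical, and the handling of scheme-less URLs (as they appear on the slide) is an assumption.

```python
from urllib.parse import urlparse

def host_of(url):
    """Return the lowercase host part of a URL, with or without a scheme."""
    # urlparse only fills in netloc when the URL has a scheme or starts with "//",
    # so scheme-less citations such as "www.cs.byu.edu/..." get a "//" prefix.
    if "://" not in url:
        url = "//" + url
    return urlparse(url).netloc.lower()

def same_host(url_a, url_b):
    """True when both citations are served from the same host."""
    return host_of(url_a) == host_of(url_b)

print(same_host("www.cs.byu.edu/info/dwembley.html",
                "www.cs.byu.edu/info/directory.php"))
```

How a host match is turned into a confidence value is not stated on the slide, so the sketch stops at the boolean test.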

9 Confidence Matrix for Links Facet

      D0    D1    D2    D3    D4    D5    D6    D7    D8    D9
D0    1     0.99  0     0     0           0     0     0     0
D1          1     0     0     0     0     0     0     0     0
D2                1     0     0     0     0     0     0     0
D3                      1     0     0     0     0     0     0
D4                            1     0     0     0     0     0
D5                                  1     0     0     0     0
D6                                        1     0     0     0
D7                                              1     0     0
D8                                                    1     0
D9                                                          1

Related pairs: (D5, D0), (D1, D0)

10 Page Similarity Similarity between two documents to which the two returned citations link The number of shared pairs of adjacent capitalized words
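One way to count shared pairs of adjacent capitalized words is sketched below. The tokenization (alphabetic runs only, with punctuation ignored) and the function names are assumptions made for illustration; the slide does not say how pages are tokenized or how the count is converted into a confidence value.

```python
import re

def capitalized_pairs(text):
    """Set of (word, next_word) pairs where both words start with an uppercase letter."""
    # Assumption: a "word" is a run of ASCII letters; punctuation between words is ignored.
    tokens = re.findall(r"[A-Za-z]+", text)
    return {
        (a, b)
        for a, b in zip(tokens, tokens[1:])
        if a[0].isupper() and b[0].isupper()
    }

def shared_pair_count(page_a, page_b):
    """Number of adjacent capitalized-word pairs that occur in both pages."""
    return len(capitalized_pairs(page_a) & capitalized_pairs(page_b))

page_a = "Christopher Young teaches at Brigham Young University in Provo."
page_b = "Contact Christopher Young, Brigham Young University."
print(shared_pair_count(page_a, page_b))
```

In this made-up example the shared pairs are (Christopher, Young), (Brigham, Young), and (Young, University).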

11 Confidence Matrix for Page Similarity Facet

      D0    D1    D2    D3    D4    D5    D6    D7    D8    D9
D0    1     0     0     0     0     0     0     0     0     0
D1          1     0     0     0.92  0.95  0     0     0.95  0.95
D2                1     0     0     0     0     0     0     0
D3                      1     0     0     0     0     0     0
D4                            1     0.95  0     0     0.92  0.95
D5                                  1     0     0     0.92  0.95
D6                                        1     0     0     0
D7                                              1     0     0
D8                                                    1     0.95
D9                                                          1

12 Final Matrix Combine the confidence matrices using the Stanford Certainty Measure. For example, for D1 and D5: Confidence value for the attribute facet is 0.49. Confidence value for the link facet is 0. Confidence value for the page similarity facet is 0.95. Combined confidence value between D1 and D5 is 0.49 + 0.95 - 0.49*0.95 = 0.97.
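The combination rule used in this example, a + b - a*b, can be folded over any number of facets. The sketch below assumes the facets are combined pairwise in that way; the function names are illustrative and do not come from the presentation.

```python
from functools import reduce

def combine(a, b):
    """Combine two confidence values with the rule shown on the slide: a + b - a*b."""
    return a + b - a * b

def combine_facets(values):
    """Fold the pairwise rule over all facet confidences (0 leaves a value unchanged)."""
    return reduce(combine, values, 0.0)

# D1 and D5: attributes 0.49, links 0, page similarity 0.95
print(round(combine_facets([0.49, 0.0, 0.95]), 2))  # 0.97
```

Because a zero confidence leaves the other value unchanged, a facet with no evidence for a pair does not lower the combined confidence.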

13 Final Matrix and Grouping Method

      D0    D1    D2    D3    D4    D5    D6    D7    D8    D9
D0    1     0.99  0     0     0           0     0     0     0
D1          1     0     0     0.92  0.97  0     0     0.95  0.97
D2                1     0     0     0     0     0     0     0
D3                      1     0     0     0     0     0     0
D4                            1     0.95  0     0     0.92  0.99
D5                                  1     0     0     0.92  0.95
D6                                        1     0     0     0
D7                                              1     0     0
D8                                                    1     0.95
D9                                                          1

Pairs: {D0,D1}, {D0,D5}, {D1,D4}, {D1,D5}, {D1,D8}, {D1,D9}, {D4,D5}, {D4,D8}, {D4,D9}, {D5,D8}, {D5,D9}, {D8,D9}

Groups: {D0,D1,D4,D5,D8,D9}, {D2}, {D3}, {D6}, {D7}
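The groups on this slide are what one obtains by treating each listed pair as an edge and taking connected components. The sketch below does this with a small union-find; that the presenter's grouping method is exactly a transitive closure of the pairs is an assumption, and `group_citations` is a hypothetical name.

```python
def group_citations(n, pairs):
    """Group citations 0..n-1 into connected components of the given pairs."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller index as the root so results are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# The twelve pairs listed on the slide, written as citation indices.
pairs = [(0, 1), (0, 5), (1, 4), (1, 5), (1, 8), (1, 9),
         (4, 5), (4, 8), (4, 9), (5, 8), (5, 9), (8, 9)]
print(group_citations(10, pairs))
```

On the slide's pairs this yields one group of six citations and four singletons, matching the grouping shown.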

14 Recall and Precision Assume we get: {0,1,3} {2,4} {5}. The correct grouping is: {0,1,2,3} {4,5}. Our grouping gives the pairs: (0,1) (0,3) (1,3) (2,4). The correct grouping gives the pairs: (0,1) (0,2) (0,3) (1,2) (1,3) (2,3) (4,5). R = 3/7, P = 3/(3+1).
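The pair-based recall and precision in this example can be reproduced with a few lines. This is a sketch with illustrative function names; it assumes every within-group pair counts once and that singleton groups contribute no pairs, as in the slide's example.

```python
from itertools import combinations

def pairs_in(groups):
    """All unordered within-group pairs implied by a grouping."""
    return {pair for g in groups for pair in combinations(sorted(g), 2)}

def pairwise_recall_precision(predicted, correct):
    """Return (recall, precision) over within-group pairs."""
    predicted_pairs = pairs_in(predicted)
    correct_pairs = pairs_in(correct)
    hits = len(predicted_pairs & correct_pairs)
    recall = hits / len(correct_pairs) if correct_pairs else 1.0
    precision = hits / len(predicted_pairs) if predicted_pairs else 1.0
    return recall, precision

predicted = [{0, 1, 3}, {2, 4}, {5}]
correct = [{0, 1, 2, 3}, {4, 5}]
recall, precision = pairwise_recall_precision(predicted, correct)
print(recall, precision)  # 3/7 and 3/4
```

Returning 1.0 when a grouping has no pairs at all is a convention chosen here to avoid division by zero; the slides do not address that case.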

15 Split and Merge Assume we get: {0,1,3} {2,7,4} {5} {6}. The correct grouping is: {0,1,3,5,6} {2,7} {4}. Merge: 1/8 + 1/8 = 2/8. Split: 1/8.
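The slide does not define the split and merge measures, so the sketch below uses one assumed definition that reproduces its numbers: a split is counted each time a returned group must be cut to separate members of different correct groups, a merge is counted each time the resulting pieces must be joined to rebuild a correct group, and both counts are divided by the number of citations. The function name and this definition are assumptions, and the "weighted" variant reported later may differ.

```python
def merge_split(predicted, correct):
    """Return (merge, split) scores under the assumed definition described above."""
    correct = [set(g) for g in correct]
    n = sum(len(g) for g in correct)          # total number of citations
    pieces_per_correct = [0] * len(correct)   # pieces each correct group is scattered into
    splits = 0
    for group in predicted:
        group = set(group)                    # groups are assumed to be non-empty
        touched = [i for i, c in enumerate(correct) if group & c]
        splits += len(touched) - 1            # cuts needed inside this returned group
        for i in touched:
            pieces_per_correct[i] += 1
    merges = sum(k - 1 for k in pieces_per_correct)  # joins needed per correct group
    return merges / n, splits / n

predicted = [{0, 1, 3}, {2, 7, 4}, {5}, {6}]
correct = [{0, 1, 3, 5, 6}, {2, 7}, {4}]
print(merge_split(predicted, correct))  # (0.25, 0.125), i.e. 2/8 and 1/8
```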

16 Measurements Precision and Recall R=89%, P=96.6% Weighted Merge and Split M=0.036, S=0.008

17 Contributions Grouped person-name queries by person Provided an additional tool for search engine queries

