Grouping Search-Engine Returned Citations for Person Name Queries Reema Al-Kamha Research Supported by NSF
The Problem: Search engines return too many citations. Example: “Christopher Young”. Google returns around 26,500 citations. Many people are named “Christopher Young”. It would help to group the citations by person. How do we group them?
“Christopher Young” Query to Google
“Christopher Young” Query Results for Our System
Our Solution: Three facets: attributes, links, and page similarity. A confidence matrix for each facet. A final confidence matrix.
Attributes: Address, Phone, City, State, Zip Code.
Confidence Matrix for Attributes Facet: [Figure: matrix of pairwise confidence values for citations D0 through D9.] D1 and D5 have the same State. D1 and D9 have the same State. D4 and D9 have the same City.
Links: Returned citations that have the same host. One returned citation links to another returned citation.
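The two link conditions on this slide can be sketched as simple checks. This is a minimal illustration, not the system's implementation: the function names `same_host` and `links_to`, the example URLs, and the idea of passing in a pre-extracted set of outgoing links are all assumptions. The slides do not say how these checks are turned into confidence values.

```python
from urllib.parse import urlparse


def same_host(url_a: str, url_b: str) -> bool:
    """Return True when both URLs have the same (non-empty) host name."""
    host_a = urlparse(url_a).hostname
    host_b = urlparse(url_b).hostname
    return host_a is not None and host_a == host_b


def links_to(outgoing_links: set, target_url: str) -> bool:
    """Return True when the target URL is among a page's outgoing links.

    `outgoing_links` is assumed to have been extracted from the page already.
    """
    return target_url in outgoing_links


# Hypothetical URLs, for illustration only.
page_a = "http://www.example.edu/~cyoung/"
page_b = "http://www.example.edu/~cyoung/pubs.html"
page_c = "http://www.example.org/people/young"

print(same_host(page_a, page_b))   # same host
print(same_host(page_a, page_c))   # different hosts
print(links_to({page_b}, page_b))  # page links to another returned citation
```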
Confidence Matrix for Links Facet: [Figure: matrix of pairwise confidence values for citations D0 through D9.] Pairs called out on the slide: D5 and D0; D1 and D0.
Page Similarity: The similarity between the two documents to which two returned citations link, measured by the number of shared pairs of adjacent capitalized words.
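Counting shared pairs of adjacent capitalized words might look like the sketch below. The tokenization rule, the function names, and the two sample sentences are assumptions made for illustration. In particular, this version ignores punctuation, so a pair can span a sentence boundary. The slides do not state how the count is converted into a confidence value, so that step is left out.

```python
import re


def capitalized_pairs(text: str) -> set:
    """Collect every pair of adjacent words that both start with a capital."""
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", text)
    return {
        (first, second)
        for first, second in zip(words, words[1:])
        if first[0].isupper() and second[0].isupper()
    }


def shared_pair_count(text_a: str, text_b: str) -> int:
    """Number of adjacent capitalized word pairs that occur in both texts."""
    return len(capitalized_pairs(text_a) & capitalized_pairs(text_b))


# Made-up page texts, for illustration only.
page_a = "Christopher Young is a film composer in Los Angeles."
page_b = "Film music by Christopher Young, recorded in Los Angeles."

print(shared_pair_count(page_a, page_b))  # 2: (Christopher, Young), (Los, Angeles)
```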
Confidence Matrix for Page Similarity Facet: [Figure: matrix of pairwise confidence values for citations D0 through D9.]
Final Matrix: Combine the confidence matrices using the Stanford Certainty Measure. For example, for D1 and D5: the confidence value for the attribute facet is 0.49, the confidence value for the link facet is 0, and the confidence value for the page similarity facet is 0.95. The combined confidence value between D1 and D5 is 0.49 + (1 - 0.49) * 0.95 = 0.97.
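The D1, D5 arithmetic can be checked with a short sketch. It assumes the usual Stanford certainty-factor rule for two non-negative values, CF = a + b * (1 - a), folded over the three facets. The function names are illustrative, not taken from the system.

```python
from functools import reduce


def combine_two(a: float, b: float) -> float:
    """Stanford certainty combination for two non-negative confidence values."""
    return a + b * (1.0 - a)


def combine(values) -> float:
    """Fold the pairwise rule over any number of facet confidences."""
    return reduce(combine_two, values, 0.0)


# Facet values for D1 and D5: attributes, links, page similarity.
combined = combine([0.49, 0.0, 0.95])
print(round(combined, 2))  # 0.97
```

A facet with confidence 0 leaves the running value unchanged, which is why the link facet drops out of the example.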
Final Matrix and Grouping Method: [Figure: final matrix of pairwise confidence values for citations D0 through D9.] Pairs: {D0,D1}, {D0,D5}, {D1,D4}, {D1,D5}, {D1,D8}, {D1,D9}, {D4,D5}, {D4,D8}, {D4,D9}, {D5,D8}, {D5,D9}, {D8,D9}. Groups: {D0,D1,D4,D5,D8,D9}, {D2}, {D3}, {D6}, {D7}.
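The groups on this slide are what you get by treating the listed pairs as edges and taking connected components. The sketch below does that with a small union-find. It is an assumed reading of the grouping method, since the slides do not say how the pairs are selected from the final matrix or name the algorithm used. `group_by_pairs` is an illustrative name.

```python
def group_by_pairs(items, pairs):
    """Group items into connected components defined by the given pairs."""
    parent = {item: item for item in items}

    def find(x):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())


docs = [f"D{i}" for i in range(10)]
pairs = [
    ("D0", "D1"), ("D0", "D5"), ("D1", "D4"), ("D1", "D5"),
    ("D1", "D8"), ("D1", "D9"), ("D4", "D5"), ("D4", "D8"),
    ("D4", "D9"), ("D5", "D8"), ("D5", "D9"), ("D8", "D9"),
]

for group in group_by_pairs(docs, pairs):
    print(group)
```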
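Recall and Precision: Assume we get {0,1,3} {2,4} {5}. The correct grouping is {0,1,2,3} {4,5}. Our grouping gives the pairs (0,1) (0,3) (1,3) (2,4). The correct grouping gives the pairs (0,1) (0,2) (0,3) (1,2) (1,3) (2,3) (4,5). Three of our four pairs are among the seven correct pairs, so R = 3/7 and P = 3/(3+1).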
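The pair counting in this example can be reproduced with a few lines. This is a minimal sketch of pairwise recall and precision as the slide computes them. The function names are illustrative, and returning 1 for an empty pair set is an assumed convention.

```python
from fractions import Fraction
from itertools import combinations


def same_group_pairs(grouping):
    """All unordered pairs of items that share a group."""
    return {
        pair
        for group in grouping
        for pair in combinations(sorted(group), 2)
    }


def pairwise_recall_precision(produced, correct):
    """Recall and precision over same-group pairs, as exact fractions."""
    got = same_group_pairs(produced)
    want = same_group_pairs(correct)
    shared = got & want
    recall = Fraction(len(shared), len(want)) if want else Fraction(1)
    precision = Fraction(len(shared), len(got)) if got else Fraction(1)
    return recall, precision


produced = [{0, 1, 3}, {2, 4}, {5}]
correct = [{0, 1, 2, 3}, {4, 5}]

recall, precision = pairwise_recall_precision(produced, correct)
print(recall, precision)  # 3/7 3/4
```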
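Split and Merge: Assume we get {0,1,3} {2,7,4} {5} {6}. The correct grouping is {0,1,3,5,6} {2,7} {4}. Merge: 1/8 + 1/8 = 2/8 (documents 5 and 6 must each be merged into {0,1,3}). Split: 1/8 (document 4 must be split off from {2,7,4}).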
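One way to reproduce these numbers is sketched below. The definition is an assumption inferred from this single example, not the weighted measure the presentation reports. For merge, each correct group keeps its largest produced fragment and every other document in it counts as needing a merge. For split, each produced group keeps its majority correct group and every other document in it counts as needing a split. Both counts are divided by the total number of documents.

```python
from collections import Counter
from fractions import Fraction


def merge_split(produced, correct):
    """Assumed merge/split measure: misplaced documents over total documents."""
    total = sum(len(group) for group in correct)
    correct_id = {doc: i for i, group in enumerate(correct) for doc in group}
    produced_id = {doc: j for j, group in enumerate(produced) for doc in group}

    # Merge: documents of a correct group that sit outside its largest fragment.
    to_merge = 0
    for group in correct:
        fragments = Counter(produced_id[doc] for doc in group)
        to_merge += len(group) - max(fragments.values())

    # Split: documents of a produced group that do not belong to its majority.
    to_split = 0
    for group in produced:
        owners = Counter(correct_id[doc] for doc in group)
        to_split += len(group) - max(owners.values())

    return Fraction(to_merge, total), Fraction(to_split, total)


produced = [{0, 1, 3}, {2, 7, 4}, {5}, {6}]
correct = [{0, 1, 3, 5, 6}, {2, 7}, {4}]

merge, split = merge_split(produced, correct)
print(merge, split)  # 1/4 (that is, 2/8) and 1/8
```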
Measurements: Precision and Recall: R = 89%, P = 96.6%. Weighted Merge and Split: M = 0.036, S = 0.008.
Contributions: Grouped the citations returned for person-name queries by person. Provided an additional tool for search engine queries.