Grouping Search-Engine Returned Citations for Person Name Queries Reema Al-Kamha Research Supported by NSF.


The Problem
Search engines return too many citations.
Example: "Christopher Young"
Google returns around 26,500 citations.
Many people are named "Christopher Young".
It would help to group the citations by person.
How do we group them?

“Christopher Young” Query to Google

“Christopher Young” Query Results for Our System

Our Solution
Three facets: Attributes, Links, Page Similarity
Confidence matrix for each facet
Final confidence matrix

Attributes
Address, Phone, City, State, Zip Code

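Confidence Matrix for Attributes Facet
[Upper-triangular matrix over the returned citations D0 to D9; the values for rows D0 to D5 are not recoverable from the transcript.]
      D6  D7  D8  D9
D6     1   0   0   0
D7         1   0   0
D8             1   0
D9                 1
D1 and D5 have the same State.
D1 and D9 have the same State.
D4 and D9 have the same City.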

Links
Two returned citations have the same host.
One returned citation links to another returned citation.
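The two link checks can be sketched roughly as follows. This is only an illustration: the function names `same_host` and `links_to` are hypothetical, the slide does not say how outgoing links are collected, and it does not say which confidence value each kind of match receives.

```python
from typing import Iterable
from urllib.parse import urlparse


def same_host(url_a: str, url_b: str) -> bool:
    """True when both citation URLs are served from the same host."""
    host_a = urlparse(url_a).hostname  # hostname is returned lowercased
    host_b = urlparse(url_b).hostname
    return host_a is not None and host_a == host_b


def links_to(outgoing_links: Iterable[str], target_url: str) -> bool:
    """True when a citation's page contains a hyperlink to the other citation.

    `outgoing_links` is assumed to be the list of absolute URLs already
    extracted from the first citation's page (extraction is not shown).
    """
    return target_url in set(outgoing_links)
```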

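Confidence Matrix for Links Facet
[Upper-triangular matrix over the returned citations D0 to D9; the values for rows D0 to D5 are not recoverable from the transcript.]
      D6  D7  D8  D9
D6     1   0   0   0
D7         1   0   0
D8             1   0
D9                 1
Pairs marked on the slide: D5 and D0; D1 and D0.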

Page Similarity
Similarity between the two documents to which the two returned citations link
Measured by the number of shared pairs of adjacent capitalized words
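Counting shared pairs of adjacent capitalized words might look like the sketch below. The tokenization rule (two capitalized words separated only by whitespace) and the function names are assumptions, and the slide does not say how the count is turned into a confidence value, so that step is left out.

```python
import re

# A capitalized word followed by whitespace and another capitalized word.
# The lookahead keeps the second word available as the start of the next pair.
_PAIR = re.compile(r"\b([A-Z][A-Za-z]*)\s+(?=([A-Z][A-Za-z]*)\b)")


def capitalized_pairs(text: str) -> set[tuple[str, str]]:
    """Return the set of adjacent capitalized word pairs found in `text`."""
    return {(m.group(1), m.group(2)) for m in _PAIR.finditer(text)}


def shared_pair_count(page_a: str, page_b: str) -> int:
    """Number of adjacent capitalized word pairs that occur in both pages."""
    return len(capitalized_pairs(page_a) & capitalized_pairs(page_b))
```

For example, "Brigham Young University" yields the pairs (Brigham, Young) and (Young, University).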

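Confidence Matrix for Page Similarity Facet
[Upper-triangular matrix over the returned citations D0 to D9; the values for rows D0 to D5 and row D8 are not recoverable from the transcript.]
      D6  D7  D8  D9
D6     1   0   0   0
D7         1   0   0
D9                 1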

Final Matrix
Combine the confidence matrices using the Stanford Certainty Measure.
For example, for D1 and D5:
Confidence value for the attribute facet is 0.49
Confidence value for the link facet is 0
Confidence value for the page similarity facet is 0.95
Confidence value between D1 and D5 is 0.49 + 0.95 - 0.49 * 0.95 = 0.97
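The combination step can be sketched as below, assuming the usual Stanford certainty-factor rule for two non-negative values, a + b - a * b, applied pairwise across the facets. That rule reproduces the slide's 0.97 for the three facet values 0.49, 0 and 0.95; the function names are illustrative only.

```python
from functools import reduce


def combine(cf_a: float, cf_b: float) -> float:
    """Combine two non-negative certainty factors (assumed rule: a + b - a*b)."""
    return cf_a + cf_b - cf_a * cf_b


def combine_facets(values: list[float]) -> float:
    """Fold the per-facet confidence values for one citation pair into one value."""
    return reduce(combine, values, 0.0)


# Facet values for the pair D1, D5 taken from the slide.
final = combine_facets([0.49, 0.0, 0.95])
print(round(final, 2))  # 0.97
```

A facet with confidence 0 leaves the running value unchanged, which is why only the 0.49 and 0.95 terms appear in the slide's arithmetic.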

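Final Matrix and Grouping Method
[Upper-triangular matrix over the returned citations D0 to D9; the values for rows D0 to D5 and row D8 are not recoverable from the transcript.]
      D6  D7  D8  D9
D6     1   0   0   0
D7         1   0   0
D9                 1
Pairs:
{D0,D1}, {D0,D5}, {D1,D4}, {D1,D5}, {D1,D8}, {D1,D9}, {D4,D5}, {D4,D8}, {D4,D9}, {D5,D8}, {D5,D9}, {D8,D9}
Resulting groups:
{D0,D1,D4,D5,D8,D9}, {D2}, {D3}, {D6}, {D7}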
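The slide does not spell out the grouping rule or the threshold used to pick pairs. The listed pairs and the resulting groups are, however, consistent with taking connected components over the pairs (D0 and D4 end up together although {D0,D4} is not listed). A minimal union-find sketch under that assumption:

```python
def group_citations(num_docs: int, linked_pairs: list[tuple[int, int]]) -> list[list[int]]:
    """Group documents 0..num_docs-1 into connected components of linked pairs."""
    parent = list(range(num_docs))

    def find(x: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in linked_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: dict[int, list[int]] = {}
    for doc in range(num_docs):
        groups.setdefault(find(doc), []).append(doc)
    return sorted(groups.values())


# Pairs from the slide, written as document indices.
pairs = [(0, 1), (0, 5), (1, 4), (1, 5), (1, 8), (1, 9),
         (4, 5), (4, 8), (4, 9), (5, 8), (5, 9), (8, 9)]
groups = group_citations(10, pairs)
print(groups)  # [[0, 1, 4, 5, 8, 9], [2], [3], [6], [7]]
```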

Recall and Precision
Assume we get: {0,1,3} {2,4} {5}
The correct grouping is: {0,1,2,3} {4,5}
Our grouping gives the pairs: (0,1) (0,3) (1,3) (2,4)
The correct grouping gives the pairs: (0,1) (0,2) (0,3) (1,2) (1,3) (2,3) (4,5)
R = 3/7, P = 3/(3+1)
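The pairwise computation in this example can be reproduced with a short sketch. The function names are illustrative, and returning 1.0 when a grouping yields no pairs at all is an assumption the slide does not address.

```python
from itertools import combinations


def same_group_pairs(grouping: list[set[int]]) -> set[tuple[int, int]]:
    """All unordered document pairs that share a group."""
    return {pair for group in grouping for pair in combinations(sorted(group), 2)}


def pairwise_recall_precision(produced: list[set[int]],
                              correct: list[set[int]]) -> tuple[float, float]:
    """Recall and precision over same-group pairs."""
    got = same_group_pairs(produced)
    want = same_group_pairs(correct)
    hits = len(got & want)
    recall = hits / len(want) if want else 1.0      # assumption for the empty case
    precision = hits / len(got) if got else 1.0     # assumption for the empty case
    return recall, precision


recall, precision = pairwise_recall_precision(
    [{0, 1, 3}, {2, 4}, {5}],
    [{0, 1, 2, 3}, {4, 5}],
)
print(recall, precision)  # recall is 3/7, precision is 3/4
```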

Split and Merge
Assume we get: {0,1,3} {2,7,4} {5} {6}
The correct grouping is: {0,1,3,5,6} {2,7} {4}
Merge: 1/8 + 1/8 = 2/8
Split: 1/8
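The slide gives only the worked numbers, not the definition of the weighted measures. One reading that reproduces them is sketched below, and it is an assumption: every fragment other than the largest one that must be merged into (or split out of) a group costs its size divided by the total number of documents.

```python
def weighted_merge_split(produced: list[set[int]],
                         correct: list[set[int]]) -> tuple[float, float]:
    """Weighted merge and split costs under the assumed definition above."""
    total = sum(len(group) for group in correct)

    def extra_fragment_weight(outer: list[set[int]], inner: list[set[int]]) -> float:
        weight = 0.0
        for group in outer:
            # Sizes of the pieces of `group` that fall into each group of `inner`.
            fragments = sorted(
                (len(group & other) for other in inner if group & other),
                reverse=True,
            )
            # The largest fragment stays put; each remaining one is one operation.
            weight += sum(fragments[1:]) / total
        return weight

    merge = extra_fragment_weight(correct, produced)  # pieces to merge together
    split = extra_fragment_weight(produced, correct)  # pieces to split off
    return merge, split


merge, split = weighted_merge_split(
    [{0, 1, 3}, {2, 7, 4}, {5}, {6}],
    [{0, 1, 3, 5, 6}, {2, 7}, {4}],
)
print(merge, split)  # 0.25 0.125
```

Here {5} and {6} each need to be merged into {0,1,3} (1/8 each), and {4} needs to be split from {2,7,4} (1/8).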

Measurements
Precision and Recall: R = 89%, P = 96.6%
Weighted Merge and Split: M = 0.036, S = 0.008

Contributions
Grouped the citations returned for person-name queries by person
Provided an additional tool for search engine queries