Published by Ariel Peters. Modified over 9 years ago.
1
Semantic Search via XML Fragments: A High-Precision Approach to IR
Jennifer Chu-Carroll, John Prager, David Ferrucci, and Pablo Duboue (IBM T.J. Watson Research Center)
Krzysztof Czuba (Google Inc.)
SIGIR 2006
2
Introduction
- high-precision search strategy: return a small set of documents that are highly focused and relevant to the user’s information need
- automatic named entity and relation recognizers pre-process text corpora to identify the semantic information
3
Introduction (Cont.)
- four frequently occurring query-time semantic needs: specifying target information type, disambiguating keywords, specifying search term context, and relating select query terms
- three XML Fragment operations that can be applied to a query to conceptualize, restrict, or relate terms in the query
- Section 4 demonstrates how these operations can be utilized to address the query-time semantic needs
4
XML Fragments Overview
- sufficiently expressive to address most information needs in multiple domains
- phrase operator: “White House”
- numeric comparison operators: a comparison against 1999 retrieves books published in or after 1999
5
Two Sample Book Documents
6
Example
- Bill Clinton inside a Book tag: allows Bill and Clinton to appear anywhere inside a Book tag; retrieves both documents
- Bill Clinton in a second query: retrieves only the first document
- Bill +Clinton: requires that all books retrieved have titles containing Clinton
- + applied to a tag: retrieves only those books with publication dates in their records
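The tag-scoped matching described on this slide can be sketched in Python. This is a minimal illustration, not the XML Fragments engine: the two book records, the tag names (Book, Title, Author, Year), and the helpers `matches` and `search` are all hypothetical, and every term is treated as required, whereas the actual query language ranks results and distinguishes optional terms from `+` terms.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the two sample book documents (the slide image is not available).
DOCS = {
    "doc1": "<Book><Title>A Memoir</Title><Author>Bill Clinton</Author><Year>2004</Year></Book>",
    "doc2": "<Book><Title>The Clinton Years</Title><Author>Bill Smith</Author></Book>",
}

def matches(xml_text, tag, terms=()):
    """True if some <tag> element contains every term (case-insensitive).

    With no terms, this only checks that the tag is present.
    """
    root = ET.fromstring(xml_text)
    wanted = {t.lower() for t in terms}
    for elem in root.iter(tag):  # iter() also yields the root when its tag matches
        words = set(" ".join(elem.itertext()).lower().split())
        if wanted <= words:
            return True
    return False

def search(tag, terms=()):
    """Return the sorted names of documents that satisfy the tag-scoped query."""
    return sorted(name for name, xml_text in DOCS.items() if matches(xml_text, tag, terms))

print(search("Book", ["Bill", "Clinton"]))    # terms anywhere inside a Book element
print(search("Author", ["Bill", "Clinton"]))  # terms restricted to one field
print(search("Title", ["Clinton"]))           # title must contain Clinton
print(search("Year"))                         # tag must exist, no keyword needed
```

Widening the tag (Book) loosens the match, narrowing it (Author, Title) tightens it, and a tag with no keywords acts as an existence test.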
7
Corpus Analysis for XML Retrieval
- the majority of electronic documents available today are significantly less structured
- XML search can be more broadly applicable if existing unannotated documents can be automatically processed to include useful semantic information, which can subsequently be used to improve search results
8
Example annotated with named entities and relations
9
Using XML Fragments for Semantic Search focus on enriching query expressiveness to address four query-time semantic needs to yield higher precision search results
10
Three XML Fragment Operations can be applied to a query to conceptualize, restrict, or relate terms in the query
11
Conceptualization Operation
- generalizes a lexical string to the appropriate concept in the type system represented by that string
- query: “animal” returns documents containing the word “animal”
- conceptual query: the Animal tag retrieves documents containing the annotation Animal, e.g., lion, owl, and salmon
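The difference between the keyword query and the conceptual query can be sketched as follows. The toy documents, their annotations, and the function names `keyword_search` and `concept_search` are assumptions made for illustration; in the work described here, the annotations would come from the named entity recognizers applied during corpus analysis.

```python
# Toy annotated corpus: each document carries (annotation type, covered text) pairs.
docs = {
    "d1": {"text": "the lion slept in the shade",
           "annotations": [("Animal", "lion")]},
    "d2": {"text": "an owl and a salmon appear on the crest",
           "annotations": [("Animal", "owl"), ("Animal", "salmon")]},
    "d3": {"text": "animal welfare laws were revised",
           "annotations": []},
}

def keyword_search(docs, word):
    """Documents whose text contains the literal word."""
    return sorted(d for d, doc in docs.items() if word in doc["text"].split())

def concept_search(docs, concept):
    """Documents that contain at least one annotation of the given type."""
    return sorted(d for d, doc in docs.items()
                  if any(kind == concept for kind, _ in doc["annotations"]))

print(keyword_search(docs, "animal"))   # matches only the literal string
print(concept_search(docs, "Animal"))   # matches instances of the concept
```

The keyword query finds only the document that uses the word itself, while the conceptual query finds the documents that mention instances of the concept.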
12
Restriction Operation
- constrains the XML tags in which keywords must appear to be considered relevant
- query: bass, restricted to a fish tag, returns documents in which the literal “bass” is used in its fish sense
- query: bass, restricted to a musical instrument tag, retrieves those where it represents a musical instrument
- the span of the specified annotation needs to contain the span of the keywords: a restricted query for bass will match an annotated “striped bass”
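The span-containment rule can be sketched in a few lines of Python. The `Annotation` record, the token offsets, and the type names Fish and Instrument are illustrative assumptions rather than the annotators' actual output format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    type: str   # annotation type, e.g. "Fish" (hypothetical name)
    start: int  # first covered token, inclusive
    end: int    # last covered token, exclusive

def find_spans(tokens, keywords):
    """Return (start, end) token spans where the keywords occur contiguously."""
    n = len(keywords)
    return [(i, i + n) for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == keywords]

def restricted_match(tokens, annotations, tag, keywords):
    """True if some annotation of type `tag` contains a span of the keywords."""
    spans = find_spans(tokens, keywords)
    return any(a.type == tag and a.start <= s and e <= a.end
               for a in annotations for (s, e) in spans)

tokens = "he caught a striped bass near the pier".split()
annotations = [Annotation("Fish", 3, 5)]  # covers "striped bass"

print(restricted_match(tokens, annotations, "Fish", ["bass"]))        # True
print(restricted_match(tokens, annotations, "Instrument", ["bass"]))  # False
```

The keyword only has to fall inside the annotation, so a query for bass matches an annotation covering “striped bass”, while keywords that extend beyond the annotated span do not match.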
13
Relation Operation
- annotation represents a relation that holds between terms covered by the annotation
- relation types, with example terms: syntactic (Unabomber, kill), semantic (Unabomber), pragmatic (Clinton, war on Iraq)
- allows for nesting of relation and entity annotations (example terms: John, Victoria)
14
Four Query-time Semantic Needs
1. to specify target information type
2. to disambiguate keywords
3. to specify search term context
4. to specify relations between select terms
15
Target Information Type Specification
- conceptualization operation: enables semantic search queries that explicitly specify the semantic type of the information the user is seeking
- example: find the zip code for the White House
- query “white house” retrieves a large number of documents, but few mention its zip code
- adding “zip code” to the query yields no improvement
- target passage: The White House, 1600 Pennsylvania Avenue NW, Washington, DC 20500
- a query requiring both “white house” and the Zipcode concept successfully retrieves it, assuming 20500 was annotated as a Zipcode
16
Utilizing in Two Applications
- SAW (Semantic Analysis Workbench): the user can include desired concepts in the query to focus search, either by directly typing the tag as part of the query or by selecting the tag from a dropdown list of available types
- open-domain question answering (QA) system: for “What is the telephone number for the University of Kentucky?”, the generated query combines required (+) tags with the phrase “University of Kentucky”
17
Search Term Disambiguation
- restriction operation, as in the bass example: disambiguates terms based on their word senses
- “Victoria” can be annotated as City, County, State, Person, Royalty, or Deity
- the intended sense is selected by generating a query that restricts Victoria to one annotation type
- Can inclusion of extra disambiguating keywords yield similar search results? Compare the restricted Victoria query with +Victoria +<>city state county
18
Utilizing in Two Applications
- SAW: useful in this end-user application; the user can refine the original query and formulate a disambiguating query using the restriction operator
- QA: “When was George Washington born?” generates + bear + +George +Washington, which eliminates matches with George Washington University and George Washington Bridge
19
Search Term Context Specification
- restriction operation, as in the bass example
- semantic tags specify the context in which keywords should occur
- XML tags surrounding a keyword specify meta-information: syntactic information (Subject/Object), semantic information (Agent/Patient), discourse information (Request/Suggest)
- example keywords: “war on Iraq”
20
Utilizing on AQUAINT 2005 Opinion Pilot
- observation: many opinions in documents are expressed as direct quotes from a person’s speech
- these quotes are annotated by the named entity recognizer with the semantic type Quotation
- question: “What was the Pentagon panel’s position with respect to the dispute over the US Navy training range in the island of Vieques?”
- query: +Pentagon +panel + +US +Navy training range island +Vieques
21
Relation Specification
- relation operation: expresses relations that must hold between query terms
- shifts the burden of synonym expansion from a query-time process performed by the user or search engine to the relation annotator
- +Iraq + +own expands into +Iraq + +<> own have possess
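One way to picture this shift is a toy index-time annotator that normalizes ownership verbs into a single relation, so the query needs no synonym list. Everything in the sketch is an assumption for illustration: the lemma set is taken from the expansion shown on this slide, while the relation name "Own", the subject-verb-object heuristic, and the function names are hypothetical and far simpler than a real relation recognizer.

```python
# Lemmas taken from the expansion on the slide: own, have, possess.
OWN_LEMMAS = {"own", "have", "possess"}

def annotate_own(lemmas):
    """Toy annotator: emit ("Own", subject, object) for each adjacent
    subject-verb-object lemma triple whose verb is an ownership lemma."""
    return [("Own", lemmas[i - 1], lemmas[i + 1])
            for i in range(1, len(lemmas) - 1)
            if lemmas[i] in OWN_LEMMAS]

def relation_query(index, relation, term):
    """Documents with a `relation` annotation that has `term` as an argument."""
    return sorted(doc_id for doc_id, rels in index.items()
                  if any(r[0] == relation and term in r[1:] for r in rels))

corpus = {
    "d1": ["iraq", "possess", "missile"],
    "d2": ["iraq", "have", "stockpile"],
    "d3": ["inspector", "visit", "iraq"],
}

# Synonym handling happens once, at indexing time.
index = {doc_id: annotate_own(lemmas) for doc_id, lemmas in corpus.items()}

print(relation_query(index, "Own", "iraq"))
```

The query names only the relation and its argument; the different verbs were already folded into one annotation when the corpus was processed.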
22
Utilizing in Two Applications
- SAW: the user can formulate XML Fragment queries containing relations
- QA: provides the capability of automatically generating XML Fragment queries containing relations from natural language questions
- “When did Nixon visit China?” generates + +Nixon +China +
23
Evaluation – Target Information Type Specification
- corpus: the 3GB AQUAINT corpus, which contains just over one million news articles from 1996 to 2000, pre-processed with a named entity recognizer that identifies about 100 entity types
- test set: 50 questions and relevance judgments from the TREC 2005 QA track document task
- the QA system processes questions to identify answer types and a set of salient keywords, then generates the query
24
Baseline & Results – Target Information Type Specification
- baseline run: queries constructed with keywords alone
- semantic search run: additionally requires (+) the presence of the concept that corresponds to the semantic type of the answer
- results: 2.9%, 4.1%, 37.5% improvement
25
Evaluation – Search Term Disambiguation
- same corpus, index, and test set as in the previous experiment
- QA system reconfigured to generate queries that include disambiguating tags for query terms
26
Baseline & Result – Search Term Disambiguation
- baseline run: the semantic search run in the previous section (?)
- out of the 50 questions, only 2 resulted in different queries
- improvement is not statistically significant due to the small sample size
- improvement: 4.3%, 1.1%
27
Evaluation – Search Term Context Specification
- same corpus and index
- test set: 46 out of the 50 questions in the AQUAINT 2005 opinion pilot
- general form: “What does OpinionHolder think about SubjectMatter?”
- query: +OpinionHolder + +SubjectMatter
28
Baseline & Result – Search Term Context Specification
- baseline run: merely added a required (+) tag as the target information type
- result: the semantic search run retrieved twice as many “vital” nuggets as the baseline run
29
Evaluation – Relation Specification
- corpus: national intelligence domain corpus from CNS; contains over 37,000 documents and is about 75MB in size
- resulting annotations were indexed along with the lemmatized terms, as in the AQUAINT corpus
- relation recognizer identifies 10 relations
- test set: constructed 2-3 semantic queries for each of 10 relations, resulting in a total of 25 queries
- example: +Russia + (biological weapons produced by Russia)
30
Baseline & Result – Relation Specification
- baseline run: replaces the relation in each query with one or more keywords, e.g., +Russia + +produce
- suitable for applications in which high recall is not initially an important factor
- improvement: 6.3%, -0.9%, 73%
31
Conclusions
- work on semantic search using the XML Fragments query language
- the use of three operators to express different query-time semantic needs places additional constraints on the query and leads to more focused and more precise results
- these uses of XML Fragment operators have been implemented and evaluated in multiple systems
- useful in the precision-centric class of applications