6 ~ GIR
Motivation
The former GIR system focused on capturing and handling geonames and their associated features, and ignored other terms with important geographic connotations: spatial relationships (in, near, on the shore of, etc.) and feature types (cities, mountains, airports, etc.). Geonames were disambiguated with a graph-ranking algorithm that analysed the captured features and assigned one single feature as the scope of each document, so other partial geographic contexts of the document were ignored. Incorrectly assigned scopes often led to poor results.
Problem Definition
Rebuild the query processing module so that all geographic information present in a query is captured, with special attention to feature types and spatial relationships as guides for geographic query expansion.
Use text mining methods to capture, extract and disambiguate geonames from text so that a geographic scope can be inferred for each document.
Objective
Generate geographic signatures for both queries (QSig) and documents (DSig):
DSig is generated for each document by a text mining module.
QSig is generated by a geographic query expansion module, focused on features, feature types and spatial relationships.
Improve geographic ranking.
New Architecture of Geographic IR
Topic titles are used as query strings.
Geographic Ontology
Uses GKB 2.0 (Geographic Knowledge Base). All modules rely on the geographic ontology, which supports:
relationships between features and feature types;
better property assignment for features and feature types;
better control of information sources;
enrichment of the physical domain with the new feature types airports, circuits and mountains, along with their instances.
Statistics of the Geographic Ontology
Query Processing (1)
Geographic query parsing module, supported by the geographic ontology and manually crafted context rules:
splits the query into <what, spatial relation, where>;
recognizes features and feature types.
Features (ISO 19109): unambiguous locations, each describable by one or more placenames. Example: Paris.
Feature types (ISO 19109): classes of features. Examples: island, mountain, lake (physical); city, continent, NUTS 3 (administrative). A feature has only one feature type.
Relations: links joining features or feature types, such as part-of, adjacent, capital-of. Examples: [Oslo] part-of [Norway], [city] part-of [country].
Example: Ship traffic in Portuguese island
Query Processing (2)
Performs:
Term expansion: expands the thematic part (what) using blind relevance feedback.
Geographic expansion: expands the geographic part (where) according to the query type, driven by the spatial relationship, feature and feature type.
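Blind (pseudo-)relevance feedback assumes that the top-ranked documents of an initial run are relevant and adds their most characteristic terms to the thematic part of the query. The sketch below is a minimal illustration that selects expansion terms by raw frequency over the top k documents; the function name `blind_relevance_feedback`, the frequency criterion, and the defaults for `k` and `n_terms` are assumptions, not the parameters of the original system.

```python
from collections import Counter

def blind_relevance_feedback(query_terms, ranked_docs, k=10, n_terms=5, stopwords=frozenset()):
    """Return query_terms extended with frequent terms from the top-k documents.

    ranked_docs: token lists from an initial retrieval run, best-ranked first.
    Blind feedback treats these top-k documents as relevant without user judgment.
    """
    original = set(query_terms)
    counts = Counter()
    for tokens in ranked_docs[:k]:
        # Count candidate terms, skipping stopwords and terms already in the query.
        counts.update(t for t in tokens if t not in stopwords and t not in original)
    # Counter.most_common orders equal counts by first occurrence.
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return list(query_terms) + expansion

docs = [["ship", "ferry", "port", "ferry"], ["traffic", "port", "harbour"], ["cargo"]]
print(blind_relevance_feedback(["ship", "traffic"], docs, k=2, n_terms=2))
# ['ship', 'traffic', 'ferry', 'port']
```

Production systems usually weight candidates (for example by tf-idf) rather than raw counts; raw frequency keeps the idea visible.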
Query Processing (3)
Example: CLEF topic #74, "Ship traffic in Portuguese island":
Ship traffic: thematic part (what)
in: spatial relationship
Portuguese: feature (grounded geoname)
island: feature type
Each part is mapped into the corresponding ontological concept.
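A toy sketch of the <what, spatial relation, where> split applied to topic #74. The lexicons `SPATIAL_RELATIONS`, `FEATURES` and `FEATURE_TYPES`, and the names `ParsedQuery` and `parse_query`, are hypothetical stand-ins for the ontology lookups and manually crafted context rules; only the shape of the output mirrors the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical toy lexicons; the real module consults the geographic ontology (GKB).
SPATIAL_RELATIONS = ("on the shore of", "near", "in")
FEATURES = {"portuguese": "Portugal", "portugal": "Portugal", "lisbon": "Lisbon"}
FEATURE_TYPES = {"island": "island", "islands": "island", "city": "city", "cities": "city"}

@dataclass
class ParsedQuery:
    what: str                                               # thematic part
    relation: Optional[str] = None                          # spatial relationship
    features: List[str] = field(default_factory=list)       # grounded geonames
    feature_types: List[str] = field(default_factory=list)  # e.g. island, city

def parse_query(query: str) -> ParsedQuery:
    """Split a query into <what, spatial relation, where> using the toy lexicons."""
    q = " ".join(query.lower().split())
    # Try longer relation phrases first so "on the shore of" wins over shorter ones.
    for rel in sorted(SPATIAL_RELATIONS, key=len, reverse=True):
        idx = q.find(f" {rel} ")
        if idx != -1:
            what, where = q[:idx], q[idx + len(rel) + 2:]
            break
    else:
        return ParsedQuery(what=q)  # no spatial relation: purely thematic query
    words = where.split()
    return ParsedQuery(
        what=what,
        relation=rel,
        features=[FEATURES[w] for w in words if w in FEATURES],
        feature_types=[FEATURE_TYPES[w] for w in words if w in FEATURE_TYPES],
    )

print(parse_query("Ship traffic in Portuguese island"))
# ParsedQuery(what='ship traffic', relation='in', features=['Portugal'], feature_types=['island'])
```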
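Geographic QE (1)
Three interpretations of the query: 1. Ship traffic in Portugal; 2. Ship traffic in island; 3. Ship traffic in Portuguese island.
Figure: ontology fragment (Europe; UK, Portugal; London, Lisbon; Isle of Wight, Isle of Man, São Miguel Isl., Madeira Isl.) with the three interpretations marked.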
Geographic QE (2)
Scope of interest: all geographic concepts of type island that are part of Portugal.
QSig: São Miguel, Madeira, Santa Maria, Formigas, Terceira, Graciosa, São Jorge, Pico, Faial, Flores, Corvo, Porto Santo, Desertas and Selvagens
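The expansion for "Portuguese island" amounts to walking the part-of hierarchy below Portugal and keeping every concept whose feature type is island. The sketch below shows that traversal on a hypothetical, deliberately incomplete ontology fragment (`PART_OF`); the function name `geographic_expansion` is also illustrative, and the real QSig comes from GKB 2.0.

```python
from collections import defaultdict, deque

# Hypothetical ontology fragment: (concept, feature type, parent in a part-of relation).
PART_OF = [
    ("Portugal", "country", "Europe"),
    ("UK", "country", "Europe"),
    ("Lisbon", "city", "Portugal"),
    ("London", "city", "UK"),
    ("Azores", "archipelago", "Portugal"),
    ("São Miguel", "island", "Azores"),
    ("Terceira", "island", "Azores"),
    ("Madeira", "island", "Portugal"),
    ("Isle of Wight", "island", "UK"),
]

def geographic_expansion(feature, feature_type, relations=PART_OF):
    """Collect every concept of feature_type that is (transitively) part of feature.

    Assumes the part-of relation is acyclic (a tree), so no visited set is kept.
    """
    children, types = defaultdict(list), {}
    for concept, ftype, parent in relations:
        children[parent].append(concept)
        types[concept] = ftype
    qsig, queue = [], deque([feature])
    while queue:  # breadth-first walk down the part-of hierarchy
        node = queue.popleft()
        for child in children[node]:
            if types[child] == feature_type:
                qsig.append(child)
            queue.append(child)
    return qsig

print(geographic_expansion("Portugal", "island"))  # ['Madeira', 'São Miguel', 'Terceira']
```

Descending through intermediate concepts such as the archipelago is what lets islands that are only indirectly part of Portugal enter the QSig.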
Term Expansion (1) ~ Blind Relevance Feedback ~ Before Relevance Feedback
Term Expansion (2) ~ Blind Relevance Feedback ~ After Relevance Feedback
Text Mining (1)
Relies on a gazetteer of text patterns generated from the geographic ontology, containing every concept represented by its feature name and feature type as the patterns [<feature name> <feature type>] and [<feature type> $ <feature name>], where $ stands for a connecting word.
Documents are parsed for geonames to generate the DSig.
Example: "Lisbon Airport", "Airport of Lisbon"
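A minimal sketch of generating the gazetteer patterns from ontology concepts and matching them against text. Pattern shapes follow the gazetteer examples ("Lisbon city", "city $ Lisbon", bare "Lisbon"). Two points are assumptions: `$` matches exactly one connecting word such as "of", and longer patterns are matched first and masked so the bare name does not match the same span again. `build_gazetteer` and `find_mentions` are hypothetical names.

```python
import re
from collections import defaultdict

def build_gazetteer(concepts):
    """concepts: iterable of (concept_id, feature_name, feature_type)."""
    gazetteer = defaultdict(set)
    for cid, name, ftype in concepts:
        gazetteer[f"{name} {ftype}".lower()].add(cid)    # e.g. "lisbon airport"
        gazetteer[f"{ftype} $ {name}".lower()].add(cid)  # e.g. "airport $ lisbon"
        gazetteer[name.lower()].add(cid)                 # bare name, possibly ambiguous
    return gazetteer

def find_mentions(text, gazetteer):
    """Return (matched text, sorted candidate ids) pairs, trying longer patterns first."""
    text = text.lower()
    mentions = []
    for pattern in sorted(gazetteer, key=len, reverse=True):
        # Assumption: "$" matches exactly one connecting word such as "of".
        parts = [r"\w+" if tok == "$" else re.escape(tok) for tok in pattern.split()]
        regex = re.compile(r"\b" + r"\s+".join(parts) + r"\b")
        mentions += [(m.group(0), sorted(gazetteer[pattern])) for m in regex.finditer(text)]
        # Mask matched spans so shorter patterns (e.g. the bare name) do not re-match them.
        text = regex.sub(lambda m: "#" * len(m.group(0)), text)
    return mentions

gaz = build_gazetteer([(1, "Lisbon", "city"), (2, "Lisbon", "district"), (3, "Lisbon", "street")])
print(find_mentions("Traffic on Lisbon Street and in the district of Lisbon; later, Lisbon again.", gaz))
# [('district of lisbon', [2]), ('lisbon street', [3]), ('lisbon', [1, 2, 3])]
```

A typed pattern pins down one concept, while a bare name keeps all of its candidates; that ambiguity is what the confidence measure has to reflect.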
Text Mining (2)
Gazetteer (left side: text pattern; right side: identifier of the geographic concept in the ontology):
city $ Lisbon: 1
Lisbon city: 1
district $ Lisbon: 2
Lisbon district: 2
Street $ Lisbon: 3
Lisbon Street: 3
(...)
Lisbon: 1,2,3,(...)
Document signatures (concept ID[confidence measure], normalized into [0,1]):
LA072694-0011: 5668[1.00]; 2230[0.33]; 4555[0.33]; 4556[0.33]; 4557[0.33]
LA072694-0012: 5388[1.00]; 5389[1.00]; 5390[1.00]; 12097[1.00]; 6653[0.67]
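The slide only states that the confidence measure is normalized into [0,1]. One plausible rule, assumed here and not taken from the source, is that each geoname occurrence splits a weight of 1 evenly among its candidate concepts, with weights summed over occurrences and capped at 1.0. `build_dsig` is a hypothetical name.

```python
from collections import defaultdict

def build_dsig(mentions):
    """Build a document signature {concept_id: confidence in [0, 1]}.

    mentions: one list of candidate concept ids per geoname occurrence.
    Assumed rule: each occurrence splits a weight of 1 evenly among its candidates;
    weights are summed over occurrences and capped at 1.0.
    """
    scores = defaultdict(float)
    for candidates in mentions:
        if not candidates:
            continue  # an occurrence with no candidate concept contributes nothing
        share = 1.0 / len(candidates)
        for cid in candidates:
            scores[cid] = min(1.0, scores[cid] + share)
    # Highest confidence first, rounded to two decimals as on the slide.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return {cid: round(conf, 2) for cid, conf in ranked}

# One unambiguous geoname plus two occurrences of a name with three candidates.
print(build_dsig([[5668], [1, 2, 3], [1, 2, 3]]))  # {5668: 1.0, 1: 0.67, 2: 0.67, 3: 0.67}
```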
Sidra
Sidra5: a text indexing and ranking module with geographic capabilities, based on MG4J.
Generates a geographic index (GeoIndex) and a term index.
Uses the expanded query terms and the query signature, together with the GeoIndex and the term index, to rank the result documents.
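To make the two indexes concrete, the sketch below builds a term index (term to documents) and a geographic index (concept ID to documents with DSig confidence), then looks up the expanded terms and the QSig. It is a plain-Python stand-in under stated assumptions, not MG4J's API (MG4J is a Java search engine); `build_indexes` and `lookup` are hypothetical names. Combining the two hit lists into a final ranking is left out because the slides do not give that formula.

```python
from collections import defaultdict

def build_indexes(docs):
    """docs: {doc_id: (tokens, dsig)}, where dsig maps concept id -> confidence."""
    term_index = defaultdict(set)   # term -> ids of documents containing it
    geo_index = defaultdict(dict)   # concept id -> {doc_id: confidence}
    for doc_id, (tokens, dsig) in docs.items():
        for tok in tokens:
            term_index[tok].add(doc_id)
        for cid, conf in dsig.items():
            geo_index[cid][doc_id] = conf
    return term_index, geo_index

def lookup(expanded_terms, qsig, term_index, geo_index):
    """Return documents matching any expanded term, and documents whose DSig
    shares a concept with the QSig (keeping the best confidence per document)."""
    text_hits = set().union(*(term_index.get(t, set()) for t in expanded_terms))
    geo_hits = {}
    for cid in qsig:
        for doc_id, conf in geo_index.get(cid, {}).items():
            geo_hits[doc_id] = max(conf, geo_hits.get(doc_id, 0.0))
    return text_hits, geo_hits

docs = {"d1": (["ship", "traffic"], {10: 1.0}), "d2": (["ferry"], {11: 0.33}), "d3": (["ship"], {})}
term_index, geo_index = build_indexes(docs)
text_hits, geo_hits = lookup(["ship", "ferry"], [10, 11], term_index, geo_index)
print(sorted(text_hits), geo_hits)  # ['d1', 'd2', 'd3'] {'d1': 1.0, 'd2': 0.33}
```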
Flow chart of searches in Sidra5
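GeoScore
Geographic Similarity, GeoSim(s1,s2), combines four components: Ontology Similarity OntSim(s1,s2) (50%), Spatial Distance Similarity DistSim(s1,s2) (20%), Spatial Adjacency Similarity AdjSim(s1,s2) (20%) and Population Similarity PopSim(s1,s2) (10%):
$\mathrm{GeoSim}(s_1,s_2) = 0.5\,\mathrm{OntSim}(s_1,s_2) + 0.2\,\mathrm{DistSim}(s_1,s_2) + 0.2\,\mathrm{AdjSim}(s_1,s_2) + 0.1\,\mathrm{PopSim}(s_1,s_2)$
GeoSim is then used to compute the Geographic Score, GeoScore(s1,s2).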
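The weighted combination on the GeoScore slide can be sketched as below. The weights (OntSim 50%, DistSim 20%, AdjSim 20%, PopSim 10%) are read from the slide's diagram, and their pairing with the components is a reconstruction of the extracted layout. How each component is computed, and how GeoSim is aggregated into GeoScore over whole signatures, is not recoverable from the slides, so the sketch takes the four component values as inputs. `WEIGHTS` and `geo_sim` are hypothetical names.

```python
import math

# Component weights as read from the GeoScore diagram (pairing reconstructed from the layout).
WEIGHTS = {"OntSim": 0.5, "DistSim": 0.2, "AdjSim": 0.2, "PopSim": 0.1}
assert math.isclose(sum(WEIGHTS.values()), 1.0)  # weights form a convex combination

def geo_sim(components):
    """Weighted sum of the four similarity components, each expected in [0, 1]."""
    if set(components) != set(WEIGHTS):
        raise ValueError(f"expected components {sorted(WEIGHTS)}")
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return sum(WEIGHTS[name] * value for name, value in components.items())

score = geo_sim({"OntSim": 0.8, "DistSim": 0.5, "AdjSim": 1.0, "PopSim": 0.2})
print(round(score, 2))  # 0.72
```

Because the weights sum to 1 and each component lies in [0, 1], GeoSim also stays in [0, 1].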
Geographic Score ~ GeoScore(s1,s2)
Example: Computing GeoScore
Document Scoring Textual Scoring
Experiment Type
Experiment Results
MAP results for the IR, GIR and IR/GIR runs
Conclusion
The best experimental setup is to generate an initial run with classic text retrieval and then use the full geographic ranking modules to generate the final run.
The GIR system is very dependent on the quality of the geographic ontology, and has some limitations in the text mining step.
Thank you