6 ~ GIR.

1 6 ~ GIR

2 Motivation
The former GIR prototype:
  focused on capturing and handling geonames and their associated features
  ignored other terms with important geographic connotation:
    spatial relationships (in, near, on the shore of, etc.)
    feature types (cities, mountains, airports, etc.)
  disambiguated geonames, then used a graph-ranking algorithm to analyse the captured features and assign one single feature as the scope of each document
  ignored the other partial geographic contexts of the document
Incorrectly assigned scopes often led to poor results.

3 Problem Definition
The query processing module was rebuilt so that all geographic information present in a query is captured, giving special attention to feature types and spatial relationships as guides for the geographic query expansion.
Text mining methods are used to capture and disambiguate geonames in text, so that a geographic scope can be inferred for each document.

4 Objective
Generation of geographic signatures for both queries (QSig) and documents (DSig):
  DSig is generated for each document by a text mining module
  QSig is generated by a geographic query expansion module
Geographic query expansion focused on features, feature types and spatial relationships.
Geographic ranking improvement.

5 New Architecture of Geographic IR
Topic titles as query string

6 Geographic Ontology
Uses GKB 2.0 (Geographic Knowledge Base). All modules rely on the geographic ontology. It provides:
  support for relationships between features and feature types
  better property assignment for features and feature types
  better control of information sources
  enrichment of the physical domain, with the addition of new feature types (airports, circuits and mountains) along with their instances

7 Statistics of the Geographic Ontology

8 Query Processing (1)
Geographic query parsing module
With the help of the geographic ontology and manually crafted context rules, the module:
  splits the query into <what, spatial relation, where>
  recognizes features and feature types
Features (ISO-19109): unambiguous locations. A feature can be described by one or more placenames. For example: Paris.
Feature types (ISO-19109): classes of features. For example: island, mountain, lake (physical); city, continent, NUTS-3 (administrative). A feature has only one feature type.
Relations: links joining features or feature types, such as part of, adjacent, capital of, etc. Examples: [Oslo] part-of [Norway], [city] part-of [country].
Example: Ship traffic in Portuguese island
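The split into <what, spatial relation, where> can be illustrated with a minimal sketch. The three lookup tables below are hypothetical, hand-made stand-ins for the geographic ontology and the context rules, and `parse_query` is an illustrative name, not the module's actual interface.

```python
# Hypothetical stand-ins for the geographic ontology and the context rules.
SPATIAL_RELATIONS = ["on the shore of", "near", "in"]
FEATURES = {"portuguese": "Portugal", "portugal": "Portugal", "lisbon": "Lisbon"}
FEATURE_TYPES = {"island": "island", "islands": "island", "city": "city", "cities": "city"}


def parse_query(query):
    """Split a query into what / spatial relation / where (features and feature types)."""
    padded = f" {query.lower()} "
    for relation in SPATIAL_RELATIONS:
        marker = f" {relation} "
        position = padded.find(marker)
        if position == -1:
            continue
        # Everything after the spatial relation is treated as the "where" part.
        where_tokens = padded[position + len(marker):].split()
        return {
            "what": padded[:position].strip(),
            "relation": relation,
            "features": [FEATURES[t] for t in where_tokens if t in FEATURES],
            "feature_types": [FEATURE_TYPES[t] for t in where_tokens if t in FEATURE_TYPES],
        }
    # No spatial relation found: the whole query is thematic.
    return {"what": query.lower().strip(), "relation": None, "features": [], "feature_types": []}


print(parse_query("Ship traffic in Portuguese island"))
```

A real parser would need multi-word placenames and richer context rules; the sketch only shows how the three parts relate to each other.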

9 Query Processing (2)
Performs:
  Term expansion: expands the thematic part (the what) through blind relevance feedback
  Geographic expansion: expands the geographic part (the where), based on the query type and driven by spatial relationships, features and feature types

10 Query Processing (3)
Example: CLEF topic #74, "Ship traffic in Portuguese island"
  Ship traffic: thematic part (what)
  in: spatial relationship
  Portuguese: feature (grounded geoname)
  island: feature type
Each part is mapped into the corresponding ontological concept.

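11 Geographic QE
  1. Ship traffic in Portugal
  2. Ship traffic in island
  3. Ship traffic in Portuguese island
[Figure: ontology fragment with the nodes Europe, UK, Portugal, London, Lisbon, Isle of Wight, Isle of Man, São Miguel Isl. and Madeira Isl.; the labels 1, 2 and 3 in the figure refer to the three queries above.]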

12 Geographic QE (2)
Scope of interest: all geographic concepts of type island that are part of Portugal.
QSig: São Miguel, Madeira, Santa Maria, Formigas, Terceira, Graciosa, São Jorge, Pico, Faial, Flores, Corvo, Porto Santo, Desertas and Selvagens
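The expansion rule ("type island" and "part of Portugal") can be sketched over a toy ontology. The `PART_OF` and `FEATURE_TYPE` tables are hypothetical and contain only a few of the concepts named in the slides; `expand` is an illustrative helper, not the actual query expansion module.

```python
# Toy ontology (hypothetical): child -> parent for the part-of relation.
PART_OF = {
    "UK": "Europe", "Portugal": "Europe",
    "London": "UK", "Isle of Wight": "UK",
    "Lisbon": "Portugal", "São Miguel": "Portugal", "Madeira": "Portugal",
}
# Each feature has exactly one feature type.
FEATURE_TYPE = {
    "Europe": "continent", "UK": "country", "Portugal": "country",
    "London": "city", "Lisbon": "city",
    "Isle of Wight": "island", "São Miguel": "island", "Madeira": "island",
}


def is_part_of(feature, ancestor):
    """Follow part-of links upwards until the ancestor is found or the root is passed."""
    parent = PART_OF.get(feature)
    while parent is not None:
        if parent == ancestor:
            return True
        parent = PART_OF.get(parent)
    return False


def expand(feature=None, feature_type=None):
    """Return the features that satisfy the given part-of and type constraints."""
    signature = []
    for name, ftype in FEATURE_TYPE.items():
        if feature is not None and not is_part_of(name, feature):
            continue
        if feature_type is not None and ftype != feature_type:
            continue
        signature.append(name)
    return sorted(signature)


print(expand("Portugal", "island"))
```

With both constraints the sketch returns only the Portuguese islands in the toy data, which mirrors how the QSig above lists islands and nothing else. Whether the scope of a query such as "Ship traffic in Portugal" should also include Portugal itself is left open here.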

13 Term Expansion (1) ~ Blind Relevance Feedback ~
Before Relevance Feedback

14 Term Expansion (2) ~ Blind Relevance Feedback ~
After Relevance Feedback

15 Text Mining (1)
Relies on a gazetteer of text patterns generated from the geographic ontology.
The gazetteer contains all concepts, represented by their feature name and respective feature type:
  [<feature type> <feature name>] and [<feature type> $ <feature name>]
The module parses each document for geonames, generating its DSig.
Example: Lisbon Airport, Airport of Lisbon

16 Text Mining (2)
Gazetteer (left side: text pattern; right side: identifier of the geographic concept in the ontology):
  city $ Lisbon: 1
  Lisbon city: 1
  district $ Lisbon: 2
  Lisbon district: 2
  Street $ Lisbon: 3
  Lisbon Street: 3
  (...)
  Lisbon: 1,2,3,(...)
Document signatures (each entry is ID[ConfMeas], with ConfMeas normalized into [0,1]):
  LA : 5668[1.00]; 2230[0.33]; 4555[0.33]; 4556[0.33]; 4557[0.33]
  LA : 5388[1.00]; 5389[1.00]; 5390[1.00]; 12097[1.00]; 6653[0.67]
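A rough sketch of how such a gazetteer could yield a document signature follows. Several points are assumptions rather than facts from the slides: `$` is read as a placeholder for one connecting word (as in "Airport of Lisbon"), a bare ambiguous name splits its weight evenly among its candidate concepts, and scores are normalized by dividing by the highest one. The concept table and the function name are hypothetical.

```python
import re
from collections import defaultdict

# Toy ontology (hypothetical): concept ID -> (feature name, feature type).
CONCEPTS = {1: ("Lisbon", "city"), 2: ("Lisbon", "district"), 3: ("Lisbon", "street")}


def document_signature(text, concepts):
    """Return {concept ID: confidence in [0, 1]} for the geonames found in text."""
    scores = defaultdict(float)
    remaining = text
    # Pass 1: mentions qualified by a feature type point at a single concept.
    for cid, (name, ftype) in concepts.items():
        n, t = re.escape(name), re.escape(ftype)
        for pattern in (rf"\b{t} \w+ {n}\b", rf"\b{n} {t}\b"):
            # Blank out each match so it is not counted again as a bare name.
            remaining, hits = re.subn(pattern, " ", remaining, flags=re.IGNORECASE)
            scores[cid] += hits
    # Pass 2: a bare name is shared by every concept that carries that name.
    by_name = defaultdict(list)
    for cid, (name, _) in concepts.items():
        by_name[name].append(cid)
    for name, cids in by_name.items():
        hits = len(re.findall(rf"\b{re.escape(name)}\b", remaining, flags=re.IGNORECASE))
        for cid in cids:
            scores[cid] += hits / len(cids)
    # Normalize into [0, 1] by the highest score (an assumed scheme).
    top = max(scores.values(), default=0.0)
    if top == 0:
        return {}
    return {cid: round(score / top, 2) for cid, score in scores.items() if score > 0}


print(document_signature("The city of Lisbon grew. Lisbon is busy.", CONCEPTS))
```

The qualified mention "city of Lisbon" counts fully for concept 1, while the bare "Lisbon" is shared by all three concepts, so concept 1 ends up with the highest confidence.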

17 Sidra
Sidra5: text indexing and ranking module with geographic capabilities, based on MG4J.
Generates a geo index and a term index.
Uses the expanded query terms and the query signature, together with the term index and the geo index, to rank the resulting documents.

18 Flow chart of searches in Sidra5

19 GeoScore
Geographic Similarity ~ GeoSim(s1,s2) ~ combines four measures:
  Ontology Similarity ~ OntSim(s1,s2) ~ 50%
  Spatial Distance Similarity ~ DistSim(s1,s2) ~ 20%
  Spatial Adjacency Similarity ~ AdjSim(s1,s2) ~ 20%
  Population Similarity ~ PopSim(s1,s2) ~ 10%
Geographic Score ~ GeoScore(s1,s2)
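Reading the percentages on this slide as weights, GeoSim can be sketched as a weighted sum. The pairing assumed here is OntSim 50%, DistSim 20%, AdjSim 20% and PopSim 10%. The four component measures are not defined in this transcript, so the sketch takes their values as precomputed inputs in [0, 1]; `geo_sim` is an illustrative name.

```python
# Assumed weights, read from the percentages on the slide.
WEIGHTS = {"ont": 0.5, "dist": 0.2, "adj": 0.2, "pop": 0.1}


def geo_sim(ont_sim, dist_sim, adj_sim, pop_sim):
    """Weighted combination of the four similarity measures for one pair (s1, s2)."""
    components = {"ont": ont_sim, "dist": dist_sim, "adj": adj_sim, "pop": pop_sim}
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} similarity must lie in [0, 1], got {value}")
    return sum(WEIGHTS[name] * value for name, value in components.items())


# Same ontology concept, moderately close, not adjacent, different population sizes.
print(geo_sim(ont_sim=1.0, dist_sim=0.5, adj_sim=0.0, pop_sim=0.2))
```

Because the weights sum to 1, the result stays in [0, 1] whenever the inputs do. How GeoScore is derived from GeoSim is not recoverable from this transcript and is not modelled here.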

20 Geographic Score ~ GeoScore(s1,s2)

21 Example Computing of GeoScore

22 Document Scoring Textual Scoring

23 Experiment Type

24 Experiment Result
[Table: MAP results for the IR, GIR and IR / GIR runs]

25 Conclusion
The best experiment setup is to generate an initial run with classic text retrieval, and then use the full geographic ranking modules to generate the final run.
The GIR system is very dependent on the quality of the geographic ontology, and has some limitations in the text mining step.

26 Thank you

