Dist(q,d) is the metric distance between footprints q and d dist MBR (q) is the diagonal length for the MBR of the query The DIGMAP GeoParser is a software.


dist(q,d) is the metric distance between footprints q and d.
dist MBR(q) is the diagonal length of the MBR (minimum bounding rectangle) of the query.

The DIGMAP GeoParser is a software service that takes textual resources as input (e.g. the documents indexed by the LGTE system), automatically identifies the names of places and historical periods, and finally assigns the resources to the corresponding geo-temporal scopes (i.e., it assigns each document to the encompassing geographic footprint discussed in its content).

User interface for a geo-temporal search service using DIGMAP components
Jorge Machado, Gilberto Pedrosa, Nuno Freire, Bruno Martins, Hugo Manguinhas

LGTE INDEX FRAMEWORK … DIGMAP Search Service Interface

This demo presents a user interface for a geo-temporal search service built as a follow-up to the DIGMAP project. DIGMAP was a project on old digitized maps, co-funded by the European Union, and deals with resources rich in geographic and temporal information. The search interface follows a mashup approach using existing DIGMAP components: a metadata repository, a text mining tool, a gazetteer, and a service to generate geographic contextual thumbnails. The Google Maps API is used to provide a friendly and interactive user interface. The demo presents the functionalities of the resulting geo-temporal search engine, whose interface uses Web 2.0 capabilities to provide contextualization in time and space and text clustering.

The user interface is powered by LGTE, the IR system behind the DIGMAP search service (screenshots at right). It is built around the Lucene library for full-text search, with extensions for dealing with geographical and temporal information. The package also includes useful utilities for IR evaluation experiments, such as methods for handling CLEF/TREC topics and document collections, many different text retrieval models (i.e. vector space model, Okapi BM25, language modeling and divergence from randomness models) and query expansion mechanisms.
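The two geographic quantities defined at the top of this slide, dist(q,d) and dist MBR(q), can be illustrated with a minimal Python sketch. The slide does not say how LGTE computes them, so everything here is an assumption made for illustration: the `MBR` type, the use of great-circle (haversine) distance, and the approximation of the distance between two footprints by the distance between their centroids.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


@dataclass(frozen=True)
class MBR:
    """Minimum bounding rectangle in decimal degrees (hypothetical type)."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def centroid(self):
        """Return the (lat, lon) midpoint of the rectangle."""
        return ((self.min_lat + self.max_lat) / 2,
                (self.min_lon + self.max_lon) / 2)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def dist_mbr(q):
    """Diagonal length of the query MBR, from one corner to the opposite one."""
    return haversine_km(q.min_lat, q.min_lon, q.max_lat, q.max_lon)


def dist(q, d):
    """Distance between footprints q and d, approximated by centroid distance."""
    return haversine_km(*q.centroid(), *d.centroid())
```

A caller would build one `MBR` for the query region and one per document footprint, then compare `dist(q, d)` against `dist_mbr(q)` to judge how far a document lies relative to the size of the query region. How LGTE turns these two values into a geographic score is not stated on the slide.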
At a glance, the main features of LGTE are:
Provides a simple and effective abstraction layer on top of Lucene;
Supports integrated retrieval and ranking based on thematic, temporal and geographical aspects;
Supports the Lucene standard retrieval model, as well as the more advanced probabilistic retrieval approaches;
Supports Rocchio query expansion;
Provides a framework for IR evaluation experiments (e.g. handling CLEF/TREC topics);
Includes a Java alternative to the trec_eval tool, capable of performing significance tests over pairs of runs;
Includes a simple test application for searching over the Braun Corpus, the Cranfield Corpus and several geo-referenced records from the DIGMAP repository.

Metadata Repository (e.g. REPOX@DIGMAP): a service that transparently manages large amounts of data structures, independently of their schemas or formats.

[Diagram labels: Fields Selectors; Time Selector; Geo Selector; Title: Lusiadas; Creator: Camões; ……; Geographic Information; Date Time Information; Other Data Fields; UnkownForm; PointForm; RectangleForm; CircleForm; TimeStamp; Fields Map; LGTE Document]

LGTE provides a tool to do this job. In just a few steps you can choose a source folder with XML files and define a set of handlers to extract documents, split them into fields and filter their content in order to build the indexable objects, such as timestamps, rectangle forms and text fields. The tool also provides a way to build a search configuration, defining a scoring model such as BM25, stemmers, stopwords and query expansion. You can easily set up the file paths for topics and relevance judgments (assessments) in order to obtain the results provided by IREval, which is already integrated with the tool.

[Diagram labels: LGTE Writer; Lucene Writer; Indexes]

"I’m not sure about the title but the writer was Camões."
"The only thing I remember is that BIG Monster they find near South Africa. What was his name?? Adamastor??"
"I’m not sure about that but the teacher said that it was written in the XVI century."

LGTE Query: Time: XVI Century (1500 < ? < 1600) Author: Camões Geo: South Africa

[Diagram labels: LGTE Searcher; LGTE Parser; GeoTemporal Fields; Time Filters; Time Distance Calculators; Build Filters; Integrated Scorer Model; Time Scorer; Text Scorer; GEO Scorer; Add Fields; Create Document; Geo Filters; Geo Distance Calculators; GeoParser]

A gazetteer is a list of geographic names, with their geographic locations and other descriptive information, such as information related to historical periods. The DIGMAP Gazetteer is an on-line service implementing an ADL-GP1 service interface which enables querying by place names, place types, footprints (the area covered by a geographic place), relations to other places, time frames and other metadata.

Nail Map is a support service that generates contextualization thumbnails. Some examples of services supported by Nail Map are: thumbnails from images given the URL, small screenshots of web sites, world maps with dots and forms, etc.

The Google Maps API provides an interactive map background with place marks and areas disambiguated with the gazetteer.

The interface is created in the browser using AJAX services. The search interface uses Google Maps and the Gazetteer services to help the user formulate the query. The results interface gets the document list and the faceted clusters from the search engine. The thumbnails come from Nail Map. For each retrieved document, a context is built in space and time using the query and the metadata loaded from the Metadata Repository.

[Diagram labels: Lucene Searcher; Lucene Query; Time Line; Space Map; dist MBR(q); dist(q,d)]

LGTE combines the relevance scores (i.e. thematic, geographical and temporal) into an overall retrieval measure. We propose to use a linear score combination because of its simplicity.
In the current experiments we will maximize the MAP (Mean Average Precision) measure by tuning the k values in the ranking formula:

Score(q,d) = k1 * scoreText(q,d) + k2 * scoreGeo(q,d) + k3 * scoreTime(q,d)

Searching for Poland in the Gazetteer returned a geographic area, illustrated by a bounding box, that is suggested to the user as the search region.

Searching the National Library of Portugal for roads (in Portuguese: “estradas”) related to the region of Angola between the years 1900 and 2000. The visible region in the map window defines the query region.

The retrieved map from 1884 is the nearest result for our query. It is about roads and railways in the Luanda region, Angola.

Searching for maps with the Baltic Sea near Poland.

Text mining found frequent fragments of phrases in the title and author fields. The user realized that the country he wanted was Sweden instead of Poland.

Filtering title with fragment: “Sweden Norway ”

The floating window contextualizes the first result in time and space. It follows the mouse, contextualizing the focused result.

Metadata record retrieved from the Metadata Repository.

The same query with a more recent time frame, 1950 to 1960, produces more recent maps.

The GeoParser recognizes and disambiguates the names of places and temporal expressions given in the text, also assigning documents to the encompassing geo-temporal scopes that they discuss as a whole.
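The ranking formula Score(q,d) = k1 * scoreText(q,d) + k2 * scoreGeo(q,d) + k3 * scoreTime(q,d) and the idea of tuning the k values to maximize MAP can be sketched in Python as follows. This is an illustration under stated assumptions, not the LGTE implementation: the function names, the `TOY_TOPICS` data, the grid search, and the constraint that the three weights sum to 1 are all hypothetical choices made for the example.

```python
from itertools import product


def combined_score(text, geo, time, k):
    """Linear combination k1*text + k2*geo + k3*time of the three scores."""
    k1, k2, k3 = k
    return k1 * text + k2 * geo + k3 * time


def average_precision(ranked_ids, relevant):
    """Average precision of one ranked list against a set of relevant ids."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(topics, k):
    """MAP over all topics when documents are ranked with weights k."""
    aps = []
    for topic in topics:
        docs = topic["docs"]
        ranked = sorted(docs, key=lambda d: combined_score(*docs[d], k),
                        reverse=True)
        aps.append(average_precision(ranked, topic["relevant"]))
    return sum(aps) / len(aps)


def tune_weights(topics, steps=4):
    """Grid search over (k1, k2, k3) with k1 + k2 + k3 = 1 (an assumption)."""
    best_k, best_map = None, -1.0
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue
        k = (i / steps, j / steps, (steps - i - j) / steps)
        score = mean_average_precision(topics, k)
        if score > best_map:
            best_k, best_map = k, score
    return best_k, best_map


# Hypothetical toy data: per document (scoreText, scoreGeo, scoreTime).
TOY_TOPICS = [
    {
        "docs": {"a": (0.9, 0.1, 0.1), "b": (0.2, 0.9, 0.8)},
        "relevant": {"b"},
    },
]
```

Calling `tune_weights(TOY_TOPICS)` returns the first weight triple on the grid that reaches the highest MAP, together with that MAP value. A real experiment would replace the toy data with per-topic scores and relevance judgments from a test collection.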

