
Presentation transcript:

dist(q,d) is the metric distance between footprints q and d. dist MBR(q) is the diagonal length of the MBR (minimum bounding rectangle) of the query.

The DIGMAP GeoParser is a software service that takes textual resources as input (e.g. the documents indexed by the LGTE system), automatically identifies the names of places or historical periods, and finally assigns the resources to the corresponding geo-temporal scopes (i.e., it assigns each document to the encompassing geographic footprint discussed in the content).

User interface for a geo-temporal search service using DIGMAP components
Jorge Machado, Gilberto Pedrosa, Nuno Freire, Bruno Martins, Hugo Manguinhas

LGTE INDEX FRAMEWORK…DIGMAP Search Service Interface

This demo presents a user interface for a geo-temporal search service built as a follow-up to the DIGMAP project. DIGMAP was a project co-funded by the European Union on old digitized maps; it deals with resources rich in geographic and temporal information. The search interface follows a mashup approach using existing DIGMAP components: a metadata repository, a text mining tool, a gazetteer, and a service that generates geographic contextual thumbnails. The Google Maps API is used to provide a friendly and interactive user interface. The demo presents the functionality of the resulting geo-temporal search engine, whose interface uses Web 2.0 capabilities to provide contextualization in time and space and text clustering.

The user interface is powered by LGTE, the IR system behind the DIGMAP search service (screenshots at right). LGTE is built around the Lucene library for full-text search, with extensions for dealing with geographical and temporal information. The package also includes useful utilities for IR evaluation experiments, such as methods for handling CLEF/TREC topics and document collections, many different text retrieval models (vector space model, Okapi BM25, language modeling and divergence from randomness models) and query expansion mechanisms.
At a glance, the main features of LGTE are:
- Provides a simple and effective abstraction layer on top of Lucene;
- Supports integrated retrieval and ranking based on thematic, temporal and geographical aspects;
- Supports the Lucene standard retrieval model, as well as the more advanced probabilistic retrieval approaches;
- Supports Rocchio query expansion;
- Provides a framework for IR evaluation experiments (e.g. handling CLEF/TREC topics);
- Includes a Java alternative to the trec_eval tool, capable of performing significance tests over pairs of runs;
- Includes a simple test application for searching over the Braun Corpus, the Cranfield Corpus and several georeferenced records from the DIGMAP repository.

Metadata Repository: a service that manages, transparently, large amounts of data structures, independently of their schemas or formats.

Diagram labels (indexing): Fields Selectors, Time Selector, Geo Selector; sample record with Title: Lusiadas, Creator: Camões, ……; Geographic Information, Date Time Information, Other Data Fields; UnkownForm, PointForm, RectangleForm, CircleForm; TimeStamp; Fields Map; LGTE Document; LGTE Writer; Lucene Writer; Indexes.

LGTE provides a tool to do this job. In just a few steps you can choose a source folder with XML files and define a set of handlers to extract documents, split them into fields and filter their content in order to build the indexable objects, such as timestamps, rectangle forms and text fields. The tool also provides a way to build a search configuration: defining a scoring model such as BM25, stemmers, stopwords and query expansion. You can easily set up the file paths for topics and relevance judgements (assessments) in order to obtain results from IREval, which is already integrated with the tool.

Example user request: "I'm not sure about the title but the writer was Camões. The only thing I remember is that BIG monster they find near South Africa. What was his name?? Adamastor?? I'm not sure about that, but the teacher said that it was written in the XVI century."

LGTE Query: Time: XVI Century (1500 < ? < 1600) Author: Camões Geo: South Africa

Diagram labels (searching): LGTE Searcher, LGTE Parser, GeoTemporal Fields, Time Filters, Time Distance Calculators, Build Filters, Integrated Scorer Model, Time Scorer, Text Scorer, GEO Scorer, Add Fields, Create Document, Geo Filters, Geo Distance Calculators, GeoParser.

A gazetteer is a list of geographic names, with their geographic locations and other descriptive information, such as information related to historical periods. The DIGMAP Gazetteer is an online service implementing an ADL-GP1 service interface, which enables querying by place names, place types, footprints (the area covered by a geographic place), relations to other places, time frames and other metadata.

Nail Map is a support service that generates contextualization thumbnails. Some examples of services supported by Nail Map are: thumbnails from images given the URL, small screenshots of web sites, world maps with dots and forms, etc.

The Google Maps API provides an interactive map background with placemarks and areas disambiguated with the Gazetteer.

The interface is created in the browser using AJAX services. The search interface uses the Google Maps and Gazetteer services to help the user formulate the query. The results interface gets the document list and the faceted clusters from the search engine. The thumbnails come from Nail Map. For each retrieved document, a context is built in space and time using the query and the metadata loaded from the Metadata Repository.

Diagram labels (results): Lucene Searcher, Lucene Query, Time Line, Space Map, dist MBR(q), dist(q,d).

LGTE combines the relevance scores (thematic, geographical and temporal) into an overall retrieval measure. We propose a linear score combination because of its simplicity.
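The two footprint quantities named in this transcript, dist(q,d) and dist MBR(q), can be made concrete with a small sketch. The transcript does not give the geographic scoring formula, so everything beyond the two definitions is an assumption made for illustration: footprints are modeled as bounding rectangles, dist(q,d) is approximated by the great-circle distance between rectangle centers, and `score_geo` is a hypothetical normalization (distance relative to the query diagonal), not LGTE's actual scorer. The coordinates are rough illustrative values, and rectangles crossing the antimeridian are ignored.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class MBR:
    """Minimum bounding rectangle in decimal degrees (hypothetical helper type)."""
    south: float
    west: float
    north: float
    east: float

    def center(self):
        return ((self.south + self.north) / 2.0, (self.west + self.east) / 2.0)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def dist_mbr(q):
    """Diagonal length of the query MBR (corner to opposite corner)."""
    return haversine_km(q.south, q.west, q.north, q.east)


def dist(q, d):
    """Distance between footprints, approximated here by center-to-center distance."""
    return haversine_km(*q.center(), *d.center())


def score_geo(q, d):
    """Assumed score: 1.0 for coinciding footprints, decaying with
    distance measured in units of the query diagonal."""
    diagonal = dist_mbr(q)
    if diagonal == 0.0:
        return 1.0 if dist(q, d) == 0.0 else 0.0
    return 1.0 / (1.0 + dist(q, d) / diagonal)


# Rough, illustrative bounding boxes (not taken from the DIGMAP Gazetteer).
angola = MBR(south=-18.0, west=11.7, north=-4.4, east=24.1)
luanda = MBR(south=-9.5, west=13.0, north=-8.5, east=14.0)
lisbon = MBR(south=38.6, west=-9.3, north=38.9, east=-9.0)

print(score_geo(angola, luanda), score_geo(angola, lisbon))
```

Dividing by the query diagonal makes the score depend on how far a document is relative to the size of the region the user asked about, so the same absolute distance counts for less when the query region is large.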
In the current experiments we maximize the MAP (Mean Average Precision) measure by tuning the k values in the ranking formula:

Score(q,d) = k1 * scoreText(q,d) + k2 * scoreGeo(q,d) + k3 * scoreTime(q,d)

Screenshot captions:

Searching for Poland in the Gazetteer returned a geographic area, illustrated by a bounding box, that is suggested to the user as the search region.

Searching in the National Library of Portugal for roads (in Portuguese: “estradas”) related to the region of Angola between year 1900 and

The visible region in the map window defines the query region.

The retrieved map from 1884 is the nearest result for our query. It is about roads and railways in the Luanda region, Angola.

Searching for maps with Baltic Sea near Poland.

Text mining found frequent fragments of phrases in the title and author fields.

The user realized that the country he wanted was Sweden instead of Poland.

Filtering title with fragment: “Sweden Norway ”

The floating window contextualizes the first result in time and space. It follows the mouse, contextualizing the focused result.

Metadata record retrieved from the Metadata Repository.

The same query with a more recent time frame, 1950 to 1960, produces more recent maps.

The GeoParser recognizes and disambiguates the names of places and temporal expressions given in the text, also assigning documents to the encompassing geo-temporal scopes that they discuss as a whole.
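The linear ranking formula Score(q,d) = k1 * scoreText(q,d) + k2 * scoreGeo(q,d) + k3 * scoreTime(q,d), and the idea of tuning the k values to maximize MAP, can be sketched as follows. This is a minimal illustration and not LGTE code. The function names, the exhaustive grid search, and the constraint that the weights sum to 1 are all assumptions, and the per-document scores in `toy_topics` are invented purely to exercise the code.

```python
from itertools import product


def combined_score(score_text, score_geo, score_time, k1, k2, k3):
    """Linear combination of the three relevance scores."""
    return k1 * score_text + k2 * score_geo + k3 * score_time


def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(topics, weights):
    """MAP over topics; each topic is (candidates, relevant_ids), where
    candidates maps doc id -> (score_text, score_geo, score_time)."""
    k1, k2, k3 = weights
    ap_values = []
    for candidates, relevant_ids in topics:
        ranked = sorted(
            candidates,
            key=lambda doc_id: combined_score(*candidates[doc_id], k1, k2, k3),
            reverse=True,
        )
        ap_values.append(average_precision(ranked, relevant_ids))
    return sum(ap_values) / len(ap_values)


def tune_weights(topics, steps=4):
    """Grid search over (k1, k2, k3) with k1 + k2 + k3 = 1 (an assumed
    constraint), returning the first weight triple with the highest MAP."""
    grid = [i / steps for i in range(steps + 1)]
    candidates = (
        w for w in product(grid, repeat=3) if abs(sum(w) - 1.0) < 1e-9
    )
    return max(candidates, key=lambda w: mean_average_precision(topics, w))


# Invented scores for one topic; only "map_1884" is judged relevant.
toy_topics = [
    (
        {
            "map_1884": (0.6, 0.9, 0.8),
            "map_1955": (0.7, 0.9, 0.1),
            "atlas": (0.8, 0.1, 0.2),
        },
        {"map_1884"},
    )
]

best = tune_weights(toy_topics)
print(best, mean_average_precision(toy_topics, best))
```

In this toy topic, ranking by text alone puts the relevant map last, while weights that give some credit to the geographic and temporal scores move it to the top. A real experiment would use CLEF/TREC-style topics and relevance judgements rather than hand-written scores.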