Retrieving Documents with Geographic References Using a Spatial Index Structure Based on Ontologies Database Laboratory University of A Coruña A Coruña,

Slides:



Advertisements
Similar presentations
Multimedia Database Systems
Advertisements

Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Stephan Gammeter, Lukas Bossard, Till Quack, Luc Van Gool.
Web- and Multimedia-based Information Systems. Assessment Presentation Programming Assignment.
Information Retrieval in Practice
T.Sharon - A.Frank 1 Internet Resources Discovery (IRD) Classic Information Retrieval (IR)
Xyleme A Dynamic Warehouse for XML Data of the Web.
Efficient Processing of Top-k Spatial Keyword Queries João B. Rocha-Junior, Orestis Gkorgkas, Simon Jonassen, and Kjetil Nørvåg 1 SSTD 2011.
Image Search Presented by: Samantha Mahindrakar Diti Gandhi.
Supervised by Prof. LYU, Rung Tsong Michael Department of Computer Science & Engineering The Chinese University of Hong Kong Prepared by: Chan Pik Wah,
Information Retrieval Concerned with the: Representation of Storage of Organization of, and Access to Information items.
Cláudio Baptista, UFCG A Model for Geographic Knowledge Extraction on Web Documents Cláudio E. C. Campelo and Cláudio de Souza.
Shared Ontology for Knowledge Management Atanas Kiryakov, Borislav Popov, Ilian Kitchukov, and Krasimir Angelov Meher Shaikh.
Learning to Advertise. Introduction Advertising on the Internet = $$$ –Especially search advertising and web page advertising Problem: –Selecting ads.
A New Point Access Method based on Wavelet Trees Nieves R. Brisaboa, Miguel R. Luaces, Diego Seco Database Laboratory University of A Coruña A Coruña,
LSDS-IR’08, October 30, Peer-to-Peer Similarity Search over Widely Distributed Document Collections Christos Doulkeridis 1, Kjetil Nørvåg 2, Michalis.
GTECH 361 Lecture 02 Introduction to ArcGIS. Today’s Objectives explore a map and get information about map features preview geographic data and metadata.
ICPCA 2008 Research of architecture for digital campus LBS in Pervasive Computing Environment 1.
Information Retrieval
Chapter 5: Information Retrieval and Web Search
Chapter 1 Database and Database Users Dr. Bernard Chen Ph.D. University of Central Arkansas.
Chapter 1 Database and Database Users Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2008.
Roger ZimmermannCOMPSAC 2004, September 30 Spatial Data Query Support in Peer-to-Peer Systems Roger Zimmermann, Wei-Shinn Ku, and Haojun Wang Computer.
Managing Large RDF Graphs (Infinite Graph) Vaibhav Khadilkar Department of Computer Science, The University of Texas at Dallas FEARLESS engineering.
Modern Information Retrieval Chap. 02: Modeling (Structured Text Models)
CONTI’2008, 5-6 June 2008, TIMISOARA 1 Towards a digital content management system Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal.
 Introduction Introduction  Purpose of Database SystemsPurpose of Database Systems  Levels of Abstraction Levels of Abstraction  Instances and Schemas.
Efficient Keyword Search over Virtual XML Views Feng Shao and Lin Guo and Chavdar Botev and Anand Bhaskar and Muthiah Chettiar and Fan Yang Cornell University.
Funded by: European Commission – 6th Framework Project Reference: IST WP 2: Learning Web-service Domain Ontologies Miha Grčar Jožef Stefan.
Spatiotemporal Tile Indexing Scheme Oscar Pérez Cruz Polytechnic University of Puerto Rico Mentor: Dr. Ranga Raju Vatsavai Computational Sciences and Engineering.
Master Thesis Defense Jan Fiedler 04/17/98
Chapter 2 Architecture of a Search Engine. Search Engine Architecture n A software architecture consists of software components, the interfaces provided.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Querying Structured Text in an XML Database By Xuemei Luo.
Chapter 1 : Introduction §Purpose of Database Systems §View of Data §Data Models §Data Definition Language §Data Manipulation Language §Transaction Management.
EASE: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-structured and Structured Data Cuoliang Li, Beng Chin Ooi, Jianhua Feng, Jianyong.
Greg Janée chit-chat with CS database folks 10/26/01 Gazetteer database 4.5 million items, each having: –1+ names fair to good discriminator –1 geospatial.
Chapter 6: Information Retrieval and Web Search
FEN Introduction to the database field:  Applications, concepts and terminology Seminar: Introduction to relational databases.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
 2001 Prentice Hall Business Publishing, Accounting Information Systems, 8/E, Bodnar/Hopwood A field may be a single character or number, or it.
Q2Semantic: A Lightweight Keyword Interface to Semantic Search Haofen Wang 1, Kang Zhang 1, Qiaoling Liu 1, Thanh Tran 2, and Yong Yu 1 1 Apex Lab, Shanghai.
Search Engine Architecture
The Anatomy of a Large-Scale Hyper textual Web Search Engine S. Brin, L. Page Presenter :- Abhishek Taneja.
GUIDED BY DR. A. J. AGRAWAL Search Engine By Chetan R. Rathod.
MySQL spatial indexing for GIS data in a web 2.0 internet application Brian Toone Samford University
Facilitating Document Annotation using Content and Querying Value.
XML and Database.
Introduction to Information Retrieval Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.
Modern Information Retrieval Presented by Miss Prattana Chanpolto Faculty of Information Technology.
M4 / September Integrating multimodal descriptions to index large video collections M4 meeting – Munich Nicolas Moënne-Loccoz, Bruno Janvier,
Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD.
Towards Unifying Vector and Raster Data Models for Hybrid Spatial Regions Philip Dougherty.
Integrated Departmental Information Service IDIS provides integration in three aspects Integrate relational querying and text retrieval Integrate search.
September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research The GRACE Project GRid enabled.
General Architecture of Retrieval Systems 1Adrienn Skrop.
Introduction to DBMS Purpose of Database Systems View of Data
Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance  Hello everyone,
6 ~ GIR.
Search Engine Architecture
CHAPTER 3 Architectures for Distributed Systems
Database & Record Structure
Data Mining Chapter 6 Search Engines
Introduction to DBMS Purpose of Database Systems View of Data
Magnet & /facet Zheng Liang
Introduction to Database Systems
Recuperação de Informação B
Information Retrieval and Web Design
Introduction to Search Engines
Database management systems
Presentation transcript:

Retrieving Documents with Geographic References Using a Spatial Index Structure Based on Ontologies Database Laboratory University of A Coruña A Coruña, Spain Miguel R. Luaces, Ángeles S. Places, Francisco J. Rodríguez, Diego Seco

20th October, 2008Barcelona - SeCoGIS 20082/29 Motivation The Database Laboratory at the University of A Coruña works in two very active research fields: — Geographic Information Systems (GIS) EIEL Project ( — Information Retrieval (IR) Galician Virtual Library ( GISIR GIR Retrieval of geographically and thematically relevant documents in response to a query of the form

20th October, 2008Barcelona - SeCoGIS 20083/29 Motivation Many of the documents stored in digital libraries and document databases include geographic references — Example: “…the hurricane touched land at Veracruz…” Few index structures or retrieval algorithms take into account these geographic references Some proposals have appeared recently. But… There are some specific particularities of geographic space that are not taken into account by these proposals: — Hierarchical nature of geographic space — Topological relationships between the geographic objects

20th October, 2008Barcelona - SeCoGIS 20084/29 Contents Motivation Related Work Architecture Index Structure Supported Query Types Experiments Conclusions and Future Work

20th October, 2008Barcelona - SeCoGIS 20085/29 Related Work Many index structures and techniques have been proposed in each area: — IR: Inverted Index Geographic references are normal words — GIS: R-Tree These structures do not take into consideration the hierarchy of space The combination of both types of indexes (SPIRIT project): — Text-First — Geo-First They do not take into account the relationships between the geographic objects that they are indexing An Ontology can describe the specific characteristics of geographic space: — Ontologies are used in query expansion, relevance rankings, and web resource annotation Nobody has ever tried to combine it with other types of indexes to have a hybrid structure

20th October, 2008Barcelona - SeCoGIS 20086/29 Contents Motivation Related Work Architecture Index Structure Supported Query Types Experiments Conclusions and Future Work

20th October, 2008Barcelona - SeCoGIS 20087/29 Architecture Administration User InterfaceQuery User Interface Geographic Information Retrieval Module Geometry Supplier Service Gazetteer Service WMS Query Evaluation Service Document Abstraction Index constructionIndex Structure Geographic Space Ontology Service Continent Country Administrative Region Populated Place  contains Ontology Europe Spain Galicia Coruña Ontology Instance

20th October, 2008Barcelona - SeCoGIS 20088/29 Architecture Document storage workflow: — Keyword extraction Classic IR techniques (removing stopwords, stemmers) — Build the index structure Natural Language Processing Techniques Gazetteer Service Geographic Space Ontology Service Processing services: — Query solving service — Web Map Service (WMS) — Geographic information retrieval module User Interface: — Administration (manage the document collection) — Query

20th October, 2008Barcelona - SeCoGIS 20089/29 Contents Motivation Related Work Architecture Index Structure Supported Query Types Experiments Conclusions and Future Work

20th October, 2008Barcelona - SeCoGIS /29 Index Structure …… hotel1,3,7,8,12,… sea3,5,6,9,10,… …… Inverted Index …… Germany Spain …… Place Name Hash Table DocIds2,3,… Europe Asia … Spain Germany MadridGalicia … … Coruña

20th October, 2008Barcelona - SeCoGIS /29 Index Structure Based on a spatial ontology — It models both the vocabulary and the spatial structure of places for purposes of information retrieval Tree composed by nodes that represent place names — Each node: Keyword (a place name) Bounding box Document identifiers Children nodes — These nodes are connected by means of inclusion relationships — If the list of children nodes exceeds a threshold, an R-Tree is used Auxiliary structures: — Place name hash table — Traditional Inverted Index

20th October, 2008Barcelona - SeCoGIS /29 Index Structure Advantages: — All textual queries and spatial queries can be efficiently processed — Queries combining textual and spatial aspects are supported — Updates and optimizations in each index are handled independently Drawbacks: — The tree that supports the structure is possibly unbalanced — The structure is static

20th October, 2008Barcelona - SeCoGIS /29 Contents Motivation Related Work Architecture Index Structure Supported Query Types Experiments Conclusions and Future Work

20th October, 2008Barcelona - SeCoGIS /29 Supported Query Types Pure textual queries — “Retrieve all documents where the words hotel and sea appear” — How do we solve it? Inverted Index Pure spatial queries — “Retrieve all documents that refer to the following geographic area” — How do we solve it? Descend the structure + refine the result The same algorithm that is used with spatial indexes

20th October, 2008Barcelona - SeCoGIS /29 Supported Query Types Textual queries over a geographic area — “Retrieve all documents with the word hotel that refer to the following area” — How do we solve it? Example — Geographic references can be given using place names

20th October, 2008Barcelona - SeCoGIS /29 Supported Query Types …… hotel1,3,7,8,12,… sea3,5,6,9,10,… …… Inverted Index Europe Asia … Spain Portugal MadridGalicia … … Index Structure Text Result Spatial Result Query Result …… hotel1,3,7,8,12,… sea3,5,6,9,10,… …… Text Result1,3,7,8,12,… Spatial Result Query Result Text Result1,3,7,8,12,… Spatial Result12,14,… Query Result Query Window Europe Portugal Spain Galicia Coruña … DocIds12,14,… Text Result1,3,7,8,12,… Spatial Result12,14,… Query Result12,… Text Result1,3,7,8,12,… Spatial Result12,14,… Query Result12,…

20th October, 2008Barcelona - SeCoGIS /29 Supported Query Types Textual queries with place names — “Retrieve all documents with the word hotel that refer to Spain” — How do we solve it? Example — We save some time by avoiding a tree traversal

20th October, 2008Barcelona - SeCoGIS /29 Supported Query Types …… hotel1,3,7,8,12,… sea3,5,6,9,10,… …… Inverted Index Europe Asia … Spain Germany MadridGalicia … … Index Structure …… Spain Germany …… Place Name Hash Table Text Result Spatial Result Query Result DocIds2,3,… DocIds5,7,… DocIds12,14,… …… hotel1,3,7,8,12,… sea3,5,6,9,10,… …… Text Result1,3,7,8,12,… Spatial Result2,3,5,7,12,14,… Query Result3,7,12,… Text Result1,3,7,8,12,… Spatial Result2,3,5,7,12,14,… Query Result Text Result1,3,7,8,12,… Spatial Result2,3,5,7,… Query Result Text Result1,3,7,8,12,… Spatial Result Query Result Text Result1,3,7,8,12,… Spatial Result2,3,… Query Result …… Spain Germany …… Spain GaliciaMadrid Text Result1,3,7,8,12,… Spatial Result2,3,5,7,12,14,… Query Result3,7,12,…

20th October, 2008Barcelona - SeCoGIS /29 Supported Query Types Another improvement: QUERY EXPANSION — “retrieve all documents that refer to Spain” — How do we solve it? The Query Evaluation Service discovers that Spain is a geographic reference The Place Name Hash Table locates quickly the internal node that represents the geographic object Spain All the documents associated to this node are part of the result All the documents associated to the subtree are part of the result — The result contains not only those documents that include the term Spain, but also all documents that contain the name of a geographic object included in Spain

20th October, 2008Barcelona - SeCoGIS /29 Demo

20th October, 2008Barcelona - SeCoGIS /29 Contents Motivation Related Work Architecture Index Structure Supported Query Types Experiments Conclusions and Future Work

20th October, 2008Barcelona - SeCoGIS /29 Experiments Document collections FT-91FT-94 # documents5,36871,489 # textual terms64,711251,057 # place names candidates23,550299,661 # documents with candidates4,65260,823 # documents with place names4,18254,899 # place names167,7742,274,146 # spatial nodes21,28264,843

20th October, 2008Barcelona - SeCoGIS /29 Experiments Randomly generated pure spatial queries Ontology-based index versus R-Tree [FT-91] Query area0.001%0.01%0.1%1% Ontology-based index R-Tree

20th October, 2008Barcelona - SeCoGIS /29 Experiments Ontology-based index versus R-Tree (zones of high document density) [FT-91] Query area0.001%0.01%0.1%1% Ontology-based index R-Tree Ontology-based index versus R-Tree (zones of low document density) [FT-91] Query area0.001%0.01%0.1%1% Ontology-based index R-Tree

20th October, 2008Barcelona - SeCoGIS /29 Experiments Improved R-Tree — List of documents for each location Query area0.001%0.01%0.1%1% Ontology-based index Improved R-Tree Ontology-based index versus Improved R-Tree [FT-91] Ontology-based index versus Improved R-Tree [FT-94] Query area0.001%0.01%0.1%1% Ontology-based index Improved R-Tree

20th October, 2008Barcelona - SeCoGIS /29 Contents Introduction Motivation Related Work Architecture Index Structure Supported Query Types Experiments Conclusions and Future Work

20th October, 2008Barcelona - SeCoGIS /29 Conclusions We have defined an architecture for Geographic Information Retrieval systems It takes into account both textual and geographic references in the documents This is achieved by a new index structure that combines an inverted index, a spatial index and an ontology Traditional queries and new types of queries can be solved

20th October, 2008Barcelona - SeCoGIS /29 Future Work Evaluate the performance of the index structure Define algorithms to rank the retrieved documents Improve the disambiguation process of place names Explore the use of different ontologies Include other types of spatial relationships (e.g., adjacency)

Retrieving Documents with Geographic References Using a Spatial Index Structure Based on Ontologies Database Laboratory University of A Coruña A Coruña, Spain Miguel R. Luaces, Ángeles S. Places, Francisco J. Rodríguez, Diego Seco  Contact: