Geographical Information Retrieval. Instituto Superior Técnico - INESC-ID Data Management and Information Retrieval Group (DMIR) - TagusPark. By Bruno Martins.

Motivation for Geographic IR
- Geo-information associates things and events with places.
- Geo-information is abundant on the Web and in Digital Libraries.
  - Collections of geo-referenced photographs.
  - Newsfeeds.
  - General databases of geo-referenced information.
  - Around 80% of Web pages contain references to places.
- Many information needs are related to a given geographical context.
  - Find me the nearest restaurants.
  - Find me news about Lisboa.
  - Find me photographs taken in Sintra.
  - ...
  - Around 20% of Web searches are “local” in nature.
- Geographic information is part of our everyday lives!

Existing Geographical IR Systems
- Web search engines with “local search”
  - Yahoo! Local, Google Local, ...
  - Integration with navigation mechanisms.
  - Mostly explore “Yellow-pages” information.
- Web-based GIS platforms (virtual globes)
  - Google Earth, ...
  - Explore databases of georeferenced info.
  - OGC standards for Web-GIS.
- Photo repositories with “local search”
  - Flickr geo-tagging interface, ...
  - Explore automatic “GPS” geo-referencing.
- Many more location-based services
  - Advertisement, discussion communities, ...
- Location is everywhere in information systems.

Challenges for Geographical IR
- Very few systems explore information on the Web directly.
  - They instead use databases of georeferenced information.
- Geographic context is embedded in natural language descriptions.
  - This presents problems for automated processing.
  - Place names are ambiguous and get confused with names of organizations, people, buildings and streets.
- Web queries depend on exact matching of text terms.
  - Handling structured queries (e.g. “concept, relation, location”).
  - Intelligent interpretation of spatial relationships (“near”, “west” etc).
  - Ranking results against some measure of geographic relevance.

Geographical Information Retrieval (GIR)
- Geographic information retrieval (GIR) is concerned with the retrieval of geographically referenced information objects.
  - Information objects can be maps, images, digital geographic data or even textual (web) documents.
- New multidisciplinary field
  - Combines techniques from database systems, information retrieval, digital libraries, user interfaces, geographical information systems, ...
[Figure labels: Geographic Information Systems, Information Retrieval, Knowledge Management, Geographic IR]

The difference between GIR and GIS
- GIS is concerned with exact spatial representations and complex analysis at the level of the individual spatial object or field.
  - Users are experts, information is structured and unambiguous!
- GIR is concerned with retrieving geo-referenced information resources that may be relevant to a geographic query region.
  - Unstructured and ambiguous information, everyday applications!
- Similar to the difference between search engines and relational database systems!

Geo-referencing and GIR
- Information objects can be geo-referenced by either place names or by geographic coordinates (i.e. longitude & latitude).
  - Geographic coordinates represent an exact physical location.
  - Place names are ambiguous (the main problem of GIR).
- Spatial relations may be either:
  - Geometric: distance and direction measured on a continuous scale.
  - Topological: spatially related but not directly measurable.
[Figure: X/Y coordinate axes]
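
The two kinds of spatial relation can be illustrated with a minimal sketch. The function names, the spherical-Earth approximation and the bounding-box representation are assumptions made for this example, not part of the talk.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Geometric relation: great-circle distance in km between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Geometric relation: initial bearing (0 = north, 90 = east) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    y = math.sin(dlam) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlam))
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def bbox_contains(outer, inner):
    """Topological relation: is box `inner` inside box `outer`?

    Boxes are (min_lat, min_lon, max_lat, max_lon) tuples.
    """
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])
```

Distance and bearing return values on a continuous scale, while containment only answers yes or no, which mirrors the geometric/topological split on the slide.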

The typical steps involved in GIR

Anatomy of a Geographical IR System
[Diagram: system architecture. Components shown: User Interface, Broker, Query Disambiguation, Ontology (a.k.a. Gazetteer), Geo-tagging, Text Indexing, Textual and Spatial Indexes, Search Engine, Relevance Ranking, Mapping. Data flows shown: Textual and Spatial Info. Resources, Document Footprints, Search Request + Query Footprint, Unranked Results, Ranked Results.]

Gazetteers / Geographic Ontology
- A database containing place names, the spatial relationships among them and the associated geographical footprints.
- Supports geo-referencing based on the place names found in text.
- There are many problems in using traditional gazetteers for GIR.
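
A gazetteer of this kind can be sketched as a small in-memory structure. The class and field names below are hypothetical, the footprint is simplified to a single point, and the only relationship modelled is "part of"; real gazetteers such as those mentioned in this talk have far richer schemas.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Place:
    place_id: str
    names: List[str]                   # preferred and variant names
    place_type: str                    # e.g. "country", "city"
    footprint: Tuple[float, float]     # (lat, lon) point; rough illustrative values
    parent_id: Optional[str] = None    # "part of" relationship


class Gazetteer:
    def __init__(self) -> None:
        self.places: Dict[str, Place] = {}
        self.by_name: Dict[str, List[str]] = {}

    def add(self, place: Place) -> None:
        self.places[place.place_id] = place
        for name in place.names:
            self.by_name.setdefault(name.lower(), []).append(place.place_id)

    def lookup(self, name: str) -> List[Place]:
        """All places carrying this name (case-insensitive); may be ambiguous."""
        return [self.places[i] for i in self.by_name.get(name.lower(), [])]

    def ancestors(self, place_id: str) -> List[str]:
        """Ids of the enclosing places, nearest first."""
        chain = []
        current = self.places[place_id].parent_id
        while current is not None:
            chain.append(current)
            current = self.places[current].parent_id
        return chain


gaz = Gazetteer()
gaz.add(Place("pt", ["Portugal"], "country", (39.5, -8.0)))
gaz.add(Place("lis", ["Lisboa", "Lisbon"], "city", (38.72, -9.14), parent_id="pt"))
```

Because `lookup` returns a list, the structure makes the ambiguity of place names explicit instead of hiding it.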

Roles of the Gazetteer in GIR
[Diagram: the gazetteer supports the User Interface, Query Disambiguation, Query Expansion (query footprint), Geo-Tagging and Metadata Extraction (document collection to document footprints), the Spatial Index (document footprints), the Search Component and Relevance Ranking.]

Challenges to using Gazetteers in GIR
- To be useful in GIR, the gazetteer should support:
  - Different locations and boundary changes, integrating data from multiple sources.
  - Synonymous and variant names with differing locations for the same entity.
  - Different relationships among concepts.
  - Names in multiple languages.
  - “Fuzzy” regions and intra-urban place names.
- More than gazetteers, we need an ontology!

Existing Gazetteer Systems/Services
- Alexandria Digital Library (ADL) Gazetteer.
  - ~6 million entries.
  - Has tried to standardize the format, description, and distribution of gazetteer data.
  - Has a published, detailed schema.
  - Basis for OGC standard.
- Geonames website.
  - Integrates information from multiple sources.
  - Publishes OWL ontology.
  - ~6 million entries.
- EuroGeoNames project.

GeoTagging = GeoParsing + GeoCoding
- Geo-parsing: recognizing geographic references, ignoring non-geographic uses of place terminology.
- Geo-coding: attaching a unique quantitative location (footprint) to the extracted geographic references.

GeoParsing Textual Documents
- The presence of place names can be recognised with the help of gazetteers/geo-ontologies (i.e. lists of names).
- Some types of place references given in text:
  - the name of the place: Coimbra
  - an address: INESC-ID, Rua Alves Redol, 9 Lisboa
  - an address fragment: “Manuel lived near Largo do Rato in Lisboa”
  - a postcode / zip code:
  - a phone number: most Lisbon phone numbers start with
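
Recognising place names against a list of gazetteer names can be sketched as a longest-match scan over the tokens of a text. The function name, the tokenisation and the three-word window are assumptions made for this example; it handles plain names only, not addresses, postcodes or phone numbers.

```python
import re


def geoparse(text, place_names, max_words=3):
    """Return (token_index, matched_text) pairs for gazetteer names in `text`.

    At each position the longest candidate (up to `max_words` tokens) wins,
    so "Largo do Rato" is preferred over "Rato" alone.
    """
    names = {n.lower() for n in place_names}
    tokens = re.findall(r"\w+", text)
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate.lower() in names:
                matches.append((i, candidate))
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no name starts at this token
    return matches


found = geoparse("Manuel lived near Largo do Rato in Lisboa",
                 ["Largo do Rato", "Rato", "Lisboa"])
```

A pure list lookup like this will also match non-geographic uses of the same words, which is the ambiguity problem the next slide addresses.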

Ambiguity in GeoParsing Documents
- Examples of false place references:
  - Personal names: Smedes York, Jack London
  - Business names: Dorchester Hotel, York Properties, ...
  - Street names: Oxford Street, London Road, ...
  - Common words: bath, battle, derby, over, well, ...
- Approach for handling ambiguity:
  - Look for patterns in the surrounding context.
  - One reference per discourse.
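
Looking for patterns in the surrounding context can be sketched with a few hand-written rules. The cue lists and the function name below are invented for illustration and cover only the kinds of examples on this slide; the "one reference per discourse" heuristic is not implemented here.

```python
# Hypothetical cue lists; a real system would use much richer patterns.
NON_PLACE_SUFFIXES = {"hotel", "properties", "street", "road"}
SPATIAL_CUES = {"in", "near", "at", "from", "to"}


def looks_like_place(tokens, index):
    """Heuristic guess at whether tokens[index] is used as a place name."""
    word = tokens[index]
    prev_tok = tokens[index - 1] if index > 0 else ""
    next_tok = tokens[index + 1] if index + 1 < len(tokens) else ""
    if not word[:1].isupper():
        return False  # common-word use such as "bath" or "well"
    if next_tok.lower() in NON_PLACE_SUFFIXES:
        return False  # business or street name: "Dorchester Hotel"
    if prev_tok.lower() in SPATIAL_CUES:
        return True   # spatial preposition: "in London"
    if prev_tok[:1].isupper():
        return False  # likely a personal name: "Jack London"
    return True
```

The rule order matters: a spatial cue is checked before the capitalised-predecessor rule, so a sentence starting with "In London" is still accepted. Rules this simple will misfire on sentence-initial words such as "Visit London", which is one reason context patterns are usually combined with other evidence.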

GeoCoding place references in text
- Many different places share the same name (referent ambiguity): Newport, Cambridge, Springfield, Lisboa, ...
  - Use context to decide: references to parent or nearby places.
  - Choose the most important one: by population or place type.
- Optional step taken by some GIR approaches: finding a document’s encompassing geographic scope.
  - Combine all place references given in the document.
  - Use heuristics to guide the process.
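
The two disambiguation heuristics can be combined in a short sketch: prefer candidates whose parent place is also mentioned in the document, then fall back to the most populous candidate. The record layout and the function name are assumptions, and the population figures are made-up placeholder values, not real statistics.

```python
def geocode(candidates, context_names):
    """Pick one referent for an ambiguous place name.

    candidates: list of dicts with keys "id", "parent" and "population".
    context_names: other place names found in the same document.
    """
    context = {name.lower() for name in context_names}
    # Heuristic 1: keep candidates whose parent place appears in the context.
    supported = [c for c in candidates if c["parent"].lower() in context]
    # Heuristic 2: among the remaining candidates, take the most populous.
    pool = supported or candidates
    return max(pool, key=lambda c: c["population"])


springfields = [
    {"id": "springfield-a", "parent": "Illinois", "population": 100},       # placeholder value
    {"id": "springfield-b", "parent": "Massachusetts", "population": 200},  # placeholder value
]
```

With "Illinois" in the context the first candidate is chosen even though its placeholder population is smaller; with no supporting context the choice falls back to the larger one.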

Document Indexing for Geographic IR
- Different indexing strategies are possible:
  - Index documents based on gazetteer ids.
  - Use document scopes to create document footprints (point, bounding rectangle, ...) and use the footprints to index documents.
- Strategy for handling queries:
  - Convert the query to a query footprint/gazetteer id.
  - Match the query footprint to document footprints/ids.
  - Rank documents according to “relevance”.
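
The footprint-based strategy can be sketched with bounding rectangles: each document's place references are combined into one rectangle, and a query rectangle is matched against them. The function names and the tuple layout are assumptions for this example, and rectangles crossing the antimeridian are ignored.

```python
def bounding_box(points):
    """Document footprint: smallest (min_lat, min_lon, max_lat, max_lon)
    rectangle covering all (lat, lon) place references of a document."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return (min(lats), min(lons), max(lats), max(lons))


def boxes_intersect(a, b):
    """True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def match_documents(query_box, doc_footprints):
    """Ids of documents whose footprint intersects the query footprint."""
    return sorted(doc_id for doc_id, box in doc_footprints.items()
                  if boxes_intersect(query_box, box))


# Rough illustrative coordinates only.
footprints = {
    "d1": bounding_box([(38.72, -9.14), (38.80, -9.38)]),
    "d2": bounding_box([(40.21, -8.43)]),
}
hits = match_documents((38.0, -10.0, 39.0, -9.0), footprints)
```

This linear scan only shows the matching step; ranking the matched documents is a separate concern.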

Handling queries in GIR systems

Data structures for indexing in GIR
- The typical strategy is to have separate indexes.
  - Inverted index for text.
  - R-tree for footprints.
- Access the spatial index with the query footprint/gazetteer id.
- Access the text index with the query terms.
- Merge results and find the intersection.

Example inverted index postings:
Term1: D1, D2, D23, ...
Term2: D9, D11, D100, ...
Term3: D27, D85, ...
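
The separate-index strategy can be sketched as follows. All class and function names are hypothetical, and a uniform one-degree grid stands in for the R-tree to keep the example short; it is not an R-tree implementation.

```python
import math
from collections import defaultdict


class TextIndex:
    """Inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        """Documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


class GridSpatialIndex:
    """Point footprints bucketed into one-degree cells (R-tree stand-in)."""

    def __init__(self):
        self.cells = defaultdict(set)
        self.points = {}

    def add(self, doc_id, lat, lon):
        self.points[doc_id] = (lat, lon)
        self.cells[(math.floor(lat), math.floor(lon))].add(doc_id)

    def search(self, box):
        """Documents whose point lies in box = (min_lat, min_lon, max_lat, max_lon)."""
        min_lat, min_lon, max_lat, max_lon = box
        hits = set()
        for cy in range(math.floor(min_lat), math.floor(max_lat) + 1):
            for cx in range(math.floor(min_lon), math.floor(max_lon) + 1):
                for doc_id in self.cells.get((cy, cx), ()):
                    lat, lon = self.points[doc_id]
                    if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
                        hits.add(doc_id)
        return hits


def spatio_textual_search(text_index, spatial_index, query_terms, query_box):
    """Query each index separately, then intersect the two result sets."""
    return text_index.search(query_terms) & spatial_index.search(query_box)
```

Keeping the indexes separate means each can be replaced independently, at the cost of computing two result sets before intersecting them.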

Ranking search results in GIR
- Spatial similarity can indicate relevance.
  - Documents whose spatial content is more similar to the spatial content of the query should appear first.
- But we need to consider all of:
  - Thematic relevance: BM25, TF-IDF, ...
  - Geographic relevance: proximity, containment, ...
    - Geometric (e.g. distance) and non-geometric (e.g. topology).
  - Other importance metrics: PageRank.
- The state of the art is a linear combination of these scores.
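
A linear combination of thematic and geographic relevance can be sketched as below. The distance-decay formula, the `alpha` weight and the `scale_km` parameter are assumptions chosen for illustration; the talk does not specify a particular formula, and the text score is assumed to be already normalised to [0, 1].

```python
def geo_score(distance_km, scale_km=50.0):
    """Map a distance to (0, 1]: 1.0 at zero distance, decaying with distance."""
    return 1.0 / (1.0 + distance_km / scale_km)


def combined_score(text_score, distance_km, alpha=0.5, scale_km=50.0):
    """Weighted sum: alpha weights thematic relevance, (1 - alpha) geographic."""
    return alpha * text_score + (1.0 - alpha) * geo_score(distance_km, scale_km)


def rank(candidates, alpha=0.5):
    """Sort (doc_id, text_score, distance_km) tuples by combined score, best first."""
    return sorted(candidates,
                  key=lambda c: combined_score(c[1], c[2], alpha),
                  reverse=True)


results = rank([("a", 0.9, 500.0), ("b", 0.6, 0.0)])
```

With equal weights the nearby document "b" outranks the textually stronger but distant "a"; setting `alpha=1.0` reduces the ranking to text relevance alone.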

Existing GIR systems: MetaCarta
- The MetaCarta system
  - Pioneer system addressing all aspects given in this talk.
  - Conducts geo-parsing and geo-coding of text documents, and sends back possible location references with relative strength scores.
  - Uses Natural Language Processing (NLP) to find possible location references.
  - Contains a gazetteer of ~14 million entries.

Other GIR Systems: Research projects
- Prototype system from the SPIRIT EU project
  - Spatially-aware information retrieval on the Internet.
  - Geo-tagging of Web documents based on a geo-ontology.
- Alexandria Digital Library
  - Digital library of geo-referenced materials.
  - Focus on development of a large gazetteer.
- GREASE, GIPSY, Web-a-Where, GeoXWalk, ...
  - Many more research projects addressing GIR aspects individually.
  - GeoCLEF evaluation contest, similar to TREC.
- Project DIGMAP, under development at IST
  - Digital library for old maps and historical cartography resources.
  - Indexing metadata records for geographic retrieval.

Current Challenges in Geographic IR
- Improve “conventional GIR” components and methods.
  - Geo-tagging, spatio-textual indexing and geo-relevance ranking.
  - Improved understanding of spatial natural language terminology.
- Principled approaches for integration and evaluation of GIR.
- Better user interfaces for exploration of GIR results.
- Integration of geographical with temporal aspects.
  - Everything we do happens in space and time!
- Creation of rich place ontologies with world-wide coverage.
  - Fuzzy regions and intra-urban place names present challenges.
- Open GeoInformation Web services and Geospatial Semantic Web.

Where To Find More Information
- Georeferencing: The Geographic Associations of Information
  - By Linda L. Hill (Author), MIT Press
- Proceedings of the Workshops on Geographical IR
  - Edited by Chris Jones and Ross Purves (4th edition in 2007, Lisbon)
- Talk to me using the address