Alexandria Digital Library Project: Introduction to digital gazetteers and their development issues

Alexandria Digital Library Project
Introduction to digital gazetteers and their development issues
Alexandria Digital Library Project Gazetteer Development Team, February 2002
Contributions by Jim Frew, Linda Hill, Greg Janée, and Dave Valentine

Alexandria Digital Library Project, ADL Gazetteer Team, February 2002
Place-based information challenge (ADEPT, Smith, October 1999)
o Where is …? What’s there? What happened there?
o Harvested Web pages, GIS datasets, oral histories, aerial photos, books, maps, data, papers
o Search engines; cataloging (metadata creation); metadata
o Georeferencing by placename and by spatial footprint
o Translation needed between placenames and locations: gazetteers

What's a gazetteer?
o Originally (in the simplest case)
  • setof (name, location)
    – the "index" in an atlas
    – a "geographical dictionary"
o ADL basics
  • setof (name, type, location)
o ADL extended
  • Time-stamped names, extents, and relationships
  • Descriptive information about names and places
  • Merging of information about a place from multiple sources
o Preferred definition
  • Spatial dictionary of named and typed places

Digital gazetteer essentials (controlled vocabulary)

Roles of gazetteers in digital libraries
o Collections
  • useful information in their own right
o References
  • canonical (official or preferred) names and locations
o "Finding aids"
  • where's this? location = gaz(name, type)
  • what's here? (name, type) = gaz(location)
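The two finding-aid lookups can be sketched over a tiny in-memory gazetteer. This is a minimal illustration, not ADL code: the `Entry` class and the `gaz_forward`/`gaz_reverse` functions are hypothetical names, footprints are simplified to points, the reverse lookup takes a bounding box as its location, and the coordinates are approximate values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    type: str
    lon: float  # point footprint: decimal degrees east of Greenwich
    lat: float  # decimal degrees north of the equator


# Approximate, illustrative coordinates; not ADL Gazetteer data.
GAZ = [
    Entry("Sacramento", "populated places", -121.49, 38.58),
    Entry("San Diego", "populated places", -117.16, 32.72),
    Entry("Lake Tahoe", "lakes", -120.04, 39.09),
]


def gaz_forward(name, type_):
    """where's this?  location = gaz(name, type); names are not unique, so return a list."""
    return [(e.lon, e.lat) for e in GAZ if e.name == name and e.type == type_]


def gaz_reverse(west, south, east, north):
    """what's here?  (name, type) = gaz(location), with the location given as a box."""
    return [(e.name, e.type) for e in GAZ
            if west <= e.lon <= east and south <= e.lat <= north]


print(gaz_forward("San Diego", "populated places"))  # [(-117.16, 32.72)]
print(gaz_reverse(-122.0, 38.0, -119.0, 40.0))
```

The forward lookup returns a list because placenames are not unique; even a name and type together can match several places.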

Gazetteers as georeferencing services
o Implicit: turn textual references into locations
  • location = gaz(geoparse(text))
  • goal of the Textual Geospatial Integration (TGI) project
o Indirect: use gazetteer locations as query constraints
  • query(..., gaz(name, type))
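The indirect pattern, query(..., gaz(name, type)), can be sketched by resolving a placename to a footprint and then using that footprint as a spatial constraint on a catalog. Everything here is hypothetical: the gazetteer and catalog contents, the item identifiers, and the function names. Footprints are (west, south, east, north) boxes, and the overlap test does not handle boxes that cross the 180° meridian.

```python
# Hypothetical gazetteer: (name, type) -> bounding box (west, south, east, north).
# The box is a rough, illustrative extent, not an ADL footprint.
GAZETTEER = {
    ("Santa Barbara", "populated places"): (-119.9, 34.38, -119.6, 34.47),
}

# Hypothetical catalog of library items, each with a bounding-box footprint.
CATALOG = {
    "aerial-photo-001": (-119.8, 34.40, -119.7, 34.45),
    "map-002": (-118.5, 33.9, -118.1, 34.2),
}


def overlaps(a, b):
    """True if two boxes share any area or boundary (no antimeridian handling)."""
    w1, s1, e1, n1 = a
    w2, s2, e2, n2 = b
    return w1 <= e2 and w2 <= e1 and s1 <= n2 and s2 <= n1


def gaz(name, type_):
    """Resolve a placename and type to a footprint."""
    return GAZETTEER[(name, type_)]


def query(catalog, footprint):
    """Return the ids of catalog items whose footprint overlaps the constraint."""
    return sorted(item for item, box in catalog.items() if overlaps(box, footprint))


# query(..., gaz(name, type)): the placename becomes a spatial constraint.
print(query(CATALOG, gaz("Santa Barbara", "populated places")))  # ['aerial-photo-001']
```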

Digital libraries and gazetteers
o Standards + services =
  • Communities >> domain-specific gazetteers
  • Protocols >> search and retrieval for distributed gazetteers
o Federations
  • "middleware" (broker) aggregates access to multiple gazetteers

Spatial representation of place
o Footprints (latitude/longitude values)
  • Nature and usefulness of spatial generalizations
    – Points: most common; useful for disambiguating one place from another
    – Bounding boxes: simplest footprint for spatial extent; easy to handle in information systems; faithfulness to shape is a problem
    – Generalized polygons: need to be defined for gazetteer information services (how many points; effect of generalization on retrieval)
    – Complex polygons: computationally intensive to handle
  • Inherent spatial relationships: contains, overlaps, is-contained-by, adjacent
  • Explicit statements of relationships
  • Documenting spatial accuracy
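For bounding-box footprints, the inherent relationships named on this slide reduce to a few coordinate comparisons. A sketch, assuming boxes in (west, south, east, north) order that do not cross the 180° meridian, and treating "adjacent" as boxes whose boundaries touch while their interiors do not intersect; the function names are illustrative.

```python
from typing import NamedTuple


class Box(NamedTuple):
    west: float
    south: float
    east: float
    north: float


def contains(a: Box, b: Box) -> bool:
    return a.west <= b.west and a.south <= b.south and a.east >= b.east and a.north >= b.north


def intersects(a: Box, b: Box) -> bool:
    """Closed boxes share at least one point (interior or boundary)."""
    return a.west <= b.east and b.west <= a.east and a.south <= b.north and b.south <= a.north


def interiors_overlap(a: Box, b: Box) -> bool:
    return a.west < b.east and b.west < a.east and a.south < b.north and b.south < a.north


def relationship(a: Box, b: Box) -> str:
    """Classify the relationship of box a to box b."""
    if contains(a, b):
        return "contains"
    if contains(b, a):
        return "is-contained-by"
    if interiors_overlap(a, b):
        return "overlaps"
    if intersects(a, b):
        return "adjacent"  # boundaries touch, interiors do not
    return "disjoint"


print(relationship(Box(0, 0, 10, 10), Box(10, 0, 20, 10)))  # adjacent
```

A point fits the same code as a zero-area box (west == east, south == north), so a point inside a region comes out as is-contained-by.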

Temporal aspects of gazetteer data
o Representation of
  • Historical placenames
  • Spatial extents linked to time
  • Historical administrative relationships
  • Historical data values: e.g., population
  • Historical types/roles: e.g., a church becomes a school
o Highly important for cultural history collections, specimen collection sites for previous expeditions, …
o Issues
  • Structural design issues for linking time-stamped description elements together
  • User interface design for time-based searching and display
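One simple way to time-stamp names is to give each name a validity interval and filter by year. This sketch assumes year granularity and open-ended intervals; `TimedName` and `names_in_year` are hypothetical names, and the example uses the 1976 renaming of Saigon to Ho Chi Minh City.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimedName:
    name: str
    start: Optional[int] = None  # first year the name applies; None means unknown
    end: Optional[int] = None    # last year the name applies; None means still in use


def names_in_year(names: List[TimedName], year: int) -> List[str]:
    """Return the names whose validity interval includes the given year."""
    return [n.name for n in names
            if (n.start is None or n.start <= year)
            and (n.end is None or year <= n.end)]


# Saigon was officially renamed Ho Chi Minh City in 1976.
hcmc = [TimedName("Saigon", end=1975), TimedName("Ho Chi Minh City", start=1976)]
print(names_in_year(hcmc, 1960))  # ['Saigon']
print(names_in_year(hcmc, 2000))  # ['Ho Chi Minh City']
```

The same interval pattern would extend to footprints, types, and relationships, which is where the structural design issue of linking time-stamped elements together arises.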

Names for geographic places
o Concept of “the” name versus variant names
  • Authorized naming bodies
  • Preferred name varies with location and use
  • Attribute set for names (see ADL Gazetteer Content Standard online)
o Language and character code set issues
o Name codes: standard codes for postal addresses and other purposes
o “Surnames” (generic terms in the name) as indicators of type of place
  • Useful: Perth Airport, Baldwin County, Admiralty Oil Seep, Jar Qudug Gas Field, Sussex Correctional Institution
  • Not useful: Kindley Field, The Rock, Toledo
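The idea of “surnames” as type indicators can be sketched as a longest-suffix match against a table of generic terms. The table and its type labels are hypothetical, not drawn from the ADL Feature Type Thesaurus. A bare trailing "Field" is deliberately left out because it is ambiguous (airfield, oil or gas field, sports field), and single-word names such as "Toledo" never match.

```python
# Hypothetical table of trailing generic terms ("surnames") and the feature
# types they suggest; the type labels are illustrative, not ADL FTT terms.
GENERIC_TERMS = {
    "airport": "airports",
    "county": "counties",
    "oil seep": "oil seeps",
    "gas field": "gas fields",
    "correctional institution": "correctional facilities",
}


def infer_type(name):
    """Guess a feature type from the longest known trailing phrase, else None."""
    words = name.casefold().split()
    # Keep at least one leading word as the specific part of the name.
    for n in range(len(words) - 1, 0, -1):
        suffix = " ".join(words[-n:])
        if suffix in GENERIC_TERMS:
            return GENERIC_TERMS[suffix]
    return None


for placename in ["Perth Airport", "Jar Qudug Gas Field", "Kindley Field", "Toledo"]:
    print(placename, "->", infer_type(placename))
```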

Typing
o Typing supports queries such as
  • “What schools exist in Miami, and where are they?”
  • “Show wetlands in southern Florida”
o Typing schemes
  • List
  • Hierarchical (2-level list)
  • Thesaurus (hierarchy, synonymous terms, associations)
o No shared typing schemes among gazetteers
o ADL Feature Type Thesaurus (online)
  • 1156 terms: 210 preferred terms and 946 non-preferred terms
  • Based on existing typing schemes and placenames themselves
o Goal: community adoption of typing schemes (controlled vocabulary)
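A thesaurus lets a query on a broad type also retrieve entries typed with narrower or non-preferred terms. The sketch assumes only two relations, USE (non-preferred to preferred) and narrower-term links; the terms are hypothetical and not taken from the ADL Feature Type Thesaurus.

```python
# Hypothetical miniature thesaurus (not the ADL Feature Type Thesaurus).
USE = {  # non-preferred term -> preferred term
    "towns": "populated places",
    "cities": "populated places",
    "swamps": "wetlands",
    "bogs": "wetlands",
}
NARROWER = {  # preferred term -> narrower preferred terms
    "hydrographic features": ["lakes", "streams", "wetlands"],
    "educational facilities": ["schools", "universities"],
}


def preferred(term):
    return USE.get(term, term)


def expand(term):
    """The preferred form of a term plus all of its narrower terms, recursively."""
    result, stack = set(), [preferred(term)]
    while stack:
        t = stack.pop()
        if t not in result:
            result.add(t)
            stack.extend(NARROWER.get(t, []))
    return result


def type_matches(entry_type, query_type):
    return preferred(entry_type) in expand(query_type)


print(type_matches("swamps", "hydrographic features"))  # True
print(type_matches("towns", "wetlands"))  # False
```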

Merging of data and attribution
o For a named geographic feature, merge information about it
o Allow multiple footprints, names, data, etc. from different sources and for different times
o Document the source of every piece of information
o Tucson example (ADL Gaz ID if Internet connection available)
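Merging with attribution can be sketched by keeping every attribute value as a (value, source) pair, so nothing is discarded and each piece of information stays traceable. The records, source labels (`SRC-A`, `SRC-B`), and coordinates are illustrative, not ADL Gazetteer data.

```python
from collections import defaultdict


def merge(records):
    """Merge records describing one place, pairing every value with its source."""
    merged = defaultdict(list)
    for record in records:
        source = record["source"]
        for attribute, value in record.items():
            if attribute == "source":
                continue
            pair = (value, source)
            if pair not in merged[attribute]:
                merged[attribute].append(pair)
    return dict(merged)


# Illustrative records from two hypothetical sources.
records = [
    {"source": "SRC-A", "name": "Tucson", "type": "populated places", "point": (-110.97, 32.22)},
    {"source": "SRC-B", "name": "Tucson", "type": "city", "point": (-110.93, 32.25)},
]
for attribute, values in merge(records).items():
    print(attribute, values)
```

Identical values from different sources are kept once per source, which preserves full attribution at the cost of some redundancy.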

Digital gazetteer information exchange
o Gazetteer data come from many sources
o Sharing these data would greatly increase their richness
o What’s needed for data exchange
  • A content standard: structure for documentation of information
  • An exchange format: XML version of the content standard
  • Shared typing schemes
o What’s needed for interoperability among gazetteers
  • Gazetteer service protocol
    – ADL draft in progress
    – OpenGIS protocol in progress
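An XML exchange format amounts to serializing each entry, with its attribution, to elements and attributes. A minimal sketch with Python's standard `xml.etree.ElementTree`; the element and attribute names (`gazetteerEntry`, `name`, `featureType`, `point`) are hypothetical and do not reproduce the ADL XML DTD, and the coordinates are approximate.

```python
import xml.etree.ElementTree as ET


def entry_to_xml(entry):
    """Serialize one gazetteer entry; element and attribute names are hypothetical."""
    root = ET.Element("gazetteerEntry", id=entry["id"])
    for name, source in entry["names"]:
        ET.SubElement(root, "name", source=source).text = name
    for feature_type, scheme in entry["types"]:
        ET.SubElement(root, "featureType", scheme=scheme).text = feature_type
    lon, lat = entry["point"]
    ET.SubElement(root, "point", lon=str(lon), lat=str(lat))
    return ET.tostring(root, encoding="unicode")


sample = {
    "id": "hypothetical-1",
    "names": [("Cambridge", "BGN-NIMA-1")],
    "types": [("populated places", "ADL FTT")],
    "point": (0.12, 52.21),  # approximate coordinates, for illustration only
}
print(entry_to_xml(sample))
```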

ADL implementation
o 4.4-million-entry global gazetteer: merging of the two federal gazetteers plus other entries
o Internet gazetteer service: worldwide usage
o Published components
  • Gazetteer Content Standard
  • Feature Type Thesaurus
  • XML DTD
o “Content Standard” approach instead of “thesaurus approach”
  • Geographic footprint required
  • Explicit statement of relationships among features optional

Contrasting structures
ISO TC211 Spatial Reference System (SRS) gazetteer
o Model elements: Gazetteer, Location Type, Location Instance (parent/child, 0..*)
o Characteristics
  1. Names are unique
  2. Gazetteers are typed
  3. Type scheme and gazetteer are packaged together
  4. Footprint optional
  5. Cryptic description
  6. Gazetteer structured as a thesaurus
ADL
o Model elements: Gazetteer, Location Instance, Location Type, Type Scheme (parent/child, 0..*)
o Characteristics
  1. Uniqueness by ID
  2. Gazetteer holds various types
  3. Type schemes independent
  4. Footprint required
  5. Expressive description

Contrasting structure examples
ISO TC211
o Gazetteer: identifier: Towns; scope: large population centres; territory of use: UK; custodian: Ordnance Survey; coord. ref. sys.: Nat Grid of Gr Brit; location type: town
o Location instance: geographic identifier: Cambridge; temporal extent; alternative geographic identifier: none; geographic extent: , , , ,; position; administrator: Cambridgeshire County Council; parent location instance: Cambridgeshire
ADL
o Gazetteer description: Title: ADL Gazetteer; Responsible Party: ADL Project, UCSB; Scope & Purpose: A gazetteer associates geographic names with geographic locations and other descriptive information. A gazetteer can …; Subject Coverage: Worldwide …
o Sample entry
  • Feature Name: Cambridge (BGN-NIMA-1)
  • Feature Type: populated places (ADL FTT)
  • Spatial Ref.: –2,37,51.73 (BGN-NIMA-1)
  • Related Entity: IsPartOf UTM grid WC43
  • Related Entity: IsPartOf United Kingdom
  • Source: BGN-NIMA-1: U.S. Board on Geographic Names, U.S. National Imagery and Mapping Agency, …

ADL gazetteer protocol: goals
o Create a published standard to support access to distributed gazetteer services
o Capture the essence of...
  • what a gazetteer is
  • what a gazetteer does
o Balance client needs vs. server burden
  • clients want functionality, uniformity, completeness
  • servers want minimal requirements and overhead
  • “non-preclusive simplicity” wins
o Accommodate differing implementations
  • semantics deliberately underspecified

Protocol: abstract gazetteer model
o Gazetteer = gazetteer entries + relationships
o Gazetteer entry
  • describes a single place
  • one entry per place
o Inter-entry relationships
  • Explicit: Sacramento is the “capital of” California
  • Implicit: geospatial relationships

Protocol: gazetteer entry
o Identifier
o Attributes
  • 1+ names
    – unqualified, e.g., “San Diego”
  • 1+ footprints
    – region defined in WGS84 coordinates
    – not necessarily contiguous
  • 0+ classes
    – term drawn from vocabulary or thesaurus
    – city, park, mountain, lake, etc.
o Attribute qualifiers
  • Primary (e.g., primary name or primary footprint)
  • Historical (e.g., historical name or historical footprint)
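The entry model maps naturally onto a small data structure. A sketch under stated assumptions: footprints are simplified to lists of (west, south, east, north) boxes in WGS84 degrees (a list allows non-contiguous regions), qualifiers are boolean flags, and the class and method names are hypothetical. The constructor enforces the 1+ names and 1+ footprints cardinalities.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

# A footprint is one or more (west, south, east, north) boxes in WGS84 degrees,
# so a region need not be contiguous. Boxes stand in for richer geometry here.
Footprint = List[Tuple[float, float, float, float]]


@dataclass
class Qualified:
    value: Any  # a name (str) or a Footprint
    primary: bool = False
    historical: bool = False


@dataclass
class GazetteerEntry:
    identifier: str
    names: List[Qualified]        # 1+ unqualified names, e.g. "San Diego"
    footprints: List[Qualified]   # 1+ footprints
    classes: List[str] = field(default_factory=list)  # 0+ type terms

    def __post_init__(self):
        if not self.names:
            raise ValueError("a gazetteer entry needs at least one name")
        if not self.footprints:
            raise ValueError("a gazetteer entry needs at least one footprint")

    def primary_name(self) -> str:
        """The name flagged primary, or the first name if none is flagged."""
        for name in self.names:
            if name.primary:
                return name.value
        return self.names[0].value


entry = GazetteerEntry(
    identifier="hypothetical-42",
    names=[Qualified("San Diego", primary=True)],
    footprints=[Qualified([(-117.3, 32.5, -116.9, 33.1)], primary=True)],  # rough box
    classes=["city"],
)
print(entry.primary_name())  # San Diego
```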

Protocol: services
Stateless, independent, synchronous functions
o get-capabilities() → capabilities description
  • which protocol features are supported
o query( query ) → reports
  • returns all entries that match a query
o download() → reports
  • downloads entire gazetteer
o add-entry( report ) → identifier
o relate-entries( relationship, identifier 1, identifier 2 )
o remove-entry( identifier )
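The six operations can be sketched as methods on a toy in-memory server. This is not the ADL protocol binding: the snake_case method names are hypothetical translations of the hyphenated operation names, reports are plain dicts, and a Python predicate stands in for the XML query language.

```python
import itertools


class InMemoryGazetteer:
    """Toy server for the six protocol operations (hypothetical Python names).

    Each call is self-contained: no session state carries over between calls,
    although the server itself holds the gazetteer contents.
    """

    def __init__(self):
        self._entries = {}         # identifier -> report (a dict)
        self._relations = set()    # (relationship, identifier 1, identifier 2)
        self._next_id = itertools.count(1)

    def get_capabilities(self):
        return {"operations": ["get-capabilities", "query", "download",
                               "add-entry", "relate-entries", "remove-entry"]}

    def query(self, predicate):
        """Return reports for all entries matching the query (a predicate here)."""
        return [report for report in self._entries.values() if predicate(report)]

    def download(self):
        return list(self._entries.values())

    def add_entry(self, report):
        identifier = str(next(self._next_id))
        self._entries[identifier] = dict(report, identifier=identifier)
        return identifier

    def relate_entries(self, relationship, identifier1, identifier2):
        if identifier1 not in self._entries or identifier2 not in self._entries:
            raise KeyError("unknown identifier")
        self._relations.add((relationship, identifier1, identifier2))

    def remove_entry(self, identifier):
        del self._entries[identifier]
        # Drop relationships that refer to the removed entry.
        self._relations = {r for r in self._relations if identifier not in r[1:]}
```

For example, adding Sacramento and California and then calling `relate_entries("capital of", sac_id, ca_id)` records the explicit “capital of” relationship from the abstract model; removing either entry also drops that relationship.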

Protocol: query language
o Five fundamental constraint types...
  • identifier: find gazetteer entry #
  • name: find “San Diego”
  • footprint: find places that overlap a given region
  • class: find places by type, e.g., cemeteries
  • relationship: find the capital of California
o …and Boolean combinations thereof
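The five constraint types and their Boolean combinations can be sketched as predicate combinators. A sketch only: entries are hypothetical dicts, footprints are single boxes with rough, illustrative extents, and names such as `identifier_is`, `related_to`, and `and_` are illustrative rather than protocol syntax.

```python
def overlaps(a, b):
    """Boxes are (west, south, east, north); no antimeridian handling."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# The five constraint types, each returning a predicate over an entry dict.
def identifier_is(ident):
    return lambda e: e["id"] == ident


def name_is(name):
    return lambda e: name in e["names"]


def footprint_overlaps(box):
    return lambda e: overlaps(e["box"], box)


def class_is(term):
    return lambda e: term in e["classes"]


def related_to(relationship, target_id):
    return lambda e: (relationship, target_id) in e["relations"]


# Boolean combinations.
def and_(*constraints):
    return lambda e: all(c(e) for c in constraints)


def or_(*constraints):
    return lambda e: any(c(e) for c in constraints)


def not_(constraint):
    return lambda e: not constraint(e)


def search(entries, constraint):
    return [e["id"] for e in entries if constraint(e)]


# Hypothetical entries; boxes are rough, illustrative extents.
ENTRIES = [
    {"id": "1", "names": ["Sacramento"], "classes": ["city"],
     "box": (-121.56, 38.44, -121.36, 38.69), "relations": [("capital of", "2")]},
    {"id": "2", "names": ["California"], "classes": ["state"],
     "box": (-124.48, 32.53, -114.13, 42.01), "relations": []},
    {"id": "3", "names": ["San Diego"], "classes": ["city"],
     "box": (-117.3, 32.5, -116.9, 33.1), "relations": []},
]

print(search(ENTRIES, related_to("capital of", "2")))  # ['1']
```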

Protocol technology
o In current version
  • XML: XML schemas, XML namespaces, XML linking
  • OpenGIS Geography Markup Language (GML)
  • HTTP
o Newest technologies for later implementation
  • SOAP (Simple Object Access Protocol)
  • WSDL (Web Services Description Language)

Protocol: future directions and outstanding issues
o Seeking broad deployment
  • At least to the “rule of three”, i.e., 3 implementations
o Qualification of names in queries
  • “Santa Barbara, CA”
o Relationships
  • codify specific relationships?
  • relationship types?
    – topological, role, ...
o Extensions
  • if and how to enrich the gazetteer protocol model
  • federation of gazetteers

Database implementation issues
o Issues
  • Database size
  • Loading issues
  • Indexing issues
  • Real query issues

Gazetteer database size issues
o 4.4 million records
  • 5.9 million names associated with records
o 2 databases
  • Main database for report production and data loading
    – 33 tables; generic types and indexing
  • ADL bucket approach for searching
    – 7 tables
    – Uses object-oriented and spatial data types
    – Uses clustered indexes, text indexes, and spatial indexes

Gazetteer loading issues
o Large data loads can fill logs
  • Back up, split the files being loaded, make logs larger
  • Turn off logging during loading
  • Turn off indexing during loading
o Know about database extents
  • Unload or copy to a new table with an extent defined large enough to hold the data
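The split-load and indexing-off advice can be sketched with SQLite from Python's standard library; SQLite stands in for the original database engine, which these slides do not name. The loader drops a hypothetical name index, inserts rows in bounded transactions so each batch's journal stays small, and rebuilds the index once at the end. The table and index names are assumptions.

```python
import sqlite3


def bulk_load(conn, rows, batch_size=10_000):
    """Load (entry_id, name, type) rows in batches with the name index dropped."""
    conn.execute("DROP INDEX IF EXISTS idx_names_name")  # indexing off during the load
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            with conn:  # one transaction per batch keeps each journal chunk bounded
                conn.executemany("INSERT INTO names VALUES (?, ?, ?)", batch)
            batch.clear()
    if batch:
        with conn:
            conn.executemany("INSERT INTO names VALUES (?, ?, ?)", batch)
    conn.execute("CREATE INDEX idx_names_name ON names(name)")  # rebuild once


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (entry_id INTEGER, name TEXT, type TEXT)")
conn.execute("CREATE INDEX idx_names_name ON names(name)")
bulk_load(conn, ((i, f"place {i}", "populated places") for i in range(25_000)))
print(conn.execute("SELECT COUNT(*) FROM names").fetchone()[0])  # 25000
```

Turning off logging has a rough SQLite analogue in `PRAGMA journal_mode = OFF`, which trades crash safety for speed; it is left out of the sketch.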

Gazetteer indexing issues
o Indexing is the most important issue for performance
  • Corrupt indexes were a big problem, which was solved by reloading the database
o Text indexing
  • Original “blade” required more than 1 gigabyte of RAM to index the gazbucket database
  • Multilingual: how do you handle it?
o Multiple types and custom datatypes complicate indexing
  • We cannot use parallel database features
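The slide leaves multilingual text indexing as an open question. One common approach, sketched here as an assumption rather than what ADL did, is to index case-folded names with diacritics stripped, so a query for "Sao Paulo" finds "São Paulo". It uses only the standard `unicodedata` module; it does not help across scripts, which would need transliteration.

```python
import unicodedata
from collections import defaultdict


def fold(text):
    """Case-fold and strip combining diacritics after NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(names):
    """Inverted index from folded name tokens to entry ids."""
    index = defaultdict(set)
    for entry_id, name in names:
        for token in fold(name).split():
            index[token].add(entry_id)
    return index


def lookup(index, query):
    """Entry ids whose names contain every folded query token."""
    tokens = fold(query).split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


index = build_index([(1, "São Paulo"), (2, "Zürich"), (3, "Montréal"), (4, "Paulo Afonso")])
print(lookup(index, "sao paulo"))  # {1}
print(lookup(index, "ZURICH"))     # {2}
```

Folding "ü" to "u" rather than to the German convention "ue" is a design choice; a production index might store both forms.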

Gazetteer query issues
o Real queries cause real problems
  • Hand-coded query optimizer being used
  • Generic query translator
    – In general, much faster than hand-coded queries
o “Query of Death” (generic query translator)
  • The query optimizer chooses the wrong path for queries that combine text, spatial, and type constraints
  • Solution: submit with optimizer directives
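Optimizer directives differ between database systems. As an analogue, SQLite's `INDEXED BY` clause forces the planner to use a named index for a combined text, spatial, and type query, and `EXPLAIN QUERY PLAN` shows the chosen path. The table, index names, and query point are hypothetical; the box is the existing Paris record's, read as longitudes -96 to -94 and latitudes 32 to 34.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (
        id INTEGER PRIMARY KEY, name TEXT, type TEXT,
        west REAL, south REAL, east REAL, north REAL
    );
    CREATE INDEX idx_type ON entries(type);
    CREATE INDEX idx_name ON entries(name);
    INSERT INTO entries VALUES (1, 'Paris', 'populated places', -96, 32, -94, 34);
""")

# Combined text + spatial + type query. INDEXED BY is SQLite's directive that
# forces the planner to use idx_type instead of choosing a path itself.
QUERY = """
    SELECT id FROM entries INDEXED BY idx_type
    WHERE type = ? AND name LIKE ?
      AND west <= ? AND east >= ? AND south <= ? AND north >= ?
"""
params = ("populated places", "Paris%", -95.0, -95.0, 33.66, 33.66)

for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, params):
    print(row[-1])  # the plan's detail column names the index used
print(conn.execute(QUERY, params).fetchall())  # [(1,)]
```

If the named index cannot be used, SQLite returns an error instead of silently choosing another plan, which makes a forced path easy to verify.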

Duplicate detection for gazetteers
o Premise: one entry for one place
o Problem:
  • Places have multiple names, types, and footprints
  • How, then, can duplicate entries for the same place be identified?
o Approach:
  • This is a “textual geospatial integration” problem
  • The “test record” is the query; the result set is a list of gazetteer entries ranked by their similarity to the test record
  • Tests include
    – Source comparison (Are the records from the same contributor?)
    – Name comparison (Same primary names and/or variant names?)
    – Type comparison (Same scheme? Same type?)
    – Spatial comparison (Spatial relationships according to footprint type)

Example of duplicate detection
o New record (incoming)
  • Name: Paris
  • IsPartOf: Texas
  • Type scheme: Local
  • Type: PPL
  • Coords: ,33.66
o Existing record
  • Name: Paris (county seat)
  • IsPartOf: Lamar County, Texas
  • Type scheme: ADL FTT
  • Type: populated places
  • Coords: -94,32 -96,34
o Example test results (hypothetical scores)
  • Source comparison: 0.0 (sources are not the same)
  • Name comparison: 0.8 (partial but close match of primary names)
  • Type comparison: 0.8 (different schemes; types are similar)
  • Spatial comparison: 1.0 (point is contained within the box)
  • Rank value: 2.6
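The rank value in this example is the unweighted sum of the four test scores (0.0 + 0.8 + 0.8 + 1.0 = 2.6). A sketch of that aggregation and of ranking candidates best-first; the optional `weights` argument is a hypothetical extension (for example, to reflect accuracy weighting), and the second candidate's scores are made up for illustration.

```python
def rank_value(scores, weights=None):
    """Weighted sum of test scores; with no weights this is a plain sum."""
    weights = weights or {}
    return sum(weights.get(test, 1.0) * score for test, score in scores.items())


def rank_candidates(candidate_scores, weights=None):
    """Sort candidate ids best-first by rank value."""
    ranked = [(cid, rank_value(s, weights)) for cid, s in candidate_scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# The hypothetical scores from the Paris example, plus a second, made-up candidate.
candidates = {
    "existing Paris (county seat)": {"source": 0.0, "name": 0.8, "type": 0.8, "spatial": 1.0},
    "hypothetical distant Paris": {"source": 0.0, "name": 1.0, "type": 0.8, "spatial": 0.0},
}
for cid, value in rank_candidates(candidates):
    print(f"{cid}: {value:.1f}")
```

The spatial test dominates here: a perfect name match far away ranks below a close name match whose point falls inside the box.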

Duplicate detection technologies
o Text
  • Syntactic normalization of placenames (e.g., removing parenthetical phrases)
  • Information retrieval techniques for text similarity
  • Thesaurus techniques for related types
o Spatial
  • Spatial match types
    – Polygon-to-polygon match (contains, overlaps)
    – Point-in-polygon match (contained within)
      · Edge buffers where a point is near the edge of a polygon
    – Point-to-point match (nearness)
  • Accuracy weighting (confidence in the coordinate values)
  • Visual checking (evaluating footprints displayed on a map)
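Three of the techniques listed on this slide can be sketched briefly: parenthetical removal for name normalization, a point-in-box test with an edge buffer, and great-circle distance for point-to-point nearness. Assumptions: boxes in (west, south, east, north) order, a buffer expressed in degrees (a degree of longitude shrinks toward the poles), and a spherical Earth of radius 6371 km; the function names are illustrative, and the example box reads the Paris record's coordinates as longitudes -96 to -94 and latitudes 32 to 34.

```python
import math
import re


def normalize_name(name):
    """Syntactic normalization: drop parenthetical phrases, collapse spaces, casefold."""
    without_parens = re.sub(r"\([^)]*\)", " ", name)
    return " ".join(without_parens.split()).casefold()


def point_in_box(lon, lat, box, buffer_deg=0.0):
    """Point-in-polygon test for a box, widened by an edge buffer in degrees."""
    west, south, east, north = box
    return (west - buffer_deg <= lon <= east + buffer_deg
            and south - buffer_deg <= lat <= north + buffer_deg)


def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance on a spherical Earth, for point-to-point nearness."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


print(normalize_name("Paris (county seat)"))  # paris
box = (-96.0, 32.0, -94.0, 34.0)
print(point_in_box(-93.95, 33.0, box))  # False: just outside the east edge
print(point_in_box(-93.95, 33.0, box, buffer_deg=0.1))  # True with a buffer
```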

ADL Gazetteer development
o Web page for all ADL Gazetteer developments is at
o Includes links to
  • ADL Gazetteer Server
  • ADL Gazetteer Middleware Server
  • Content Standard
  • Feature Type Thesaurus
  • Gazetteer Service Protocol
  • Information about online discussion list