Mining For Lost Treasure National Geospatial Data Clearinghouse Archibald Warnock U.S. Federal Geographic Data Committee A/WWW Enterprises
What is Clearinghouse? v A distributed service to locate geospatial data based on characteristics expressed in metadata v Clearinghouse allows a user to pose a query of all or a portion of the community in a single session v Like a spatial AltaVista
National Geospatial Data Clearinghouse v Distributed data producers and users. v Key components: –Data documentation (metadata) –Networking (Internet) –Serving, searching, and accessing software u Z39.50 Search and Retrieve Protocol u WWW - World Wide Web
Components of Clearinghouse v There are three functional areas that interact to create the Clearinghouse: –Metadata preparation and indexing –Metadata service –User Access via Gateway forms
Clearinghouse Method v Metadata preparation v Metadata validation/staging v Metadata publication v User access
Clearinghouse Design v The Clearinghouse in its distributed form includes a registry of servers, several WWW-to-Z39.50 gateways, and many Z39.50 servers v A primary goal of Clearinghouse is to provide the ability to find spatial data throughout the entire community, not one site at a time
Essential Configuration FGDC Gateways Web Client Web Client Node Clearinghouse Sites
User downloads query form
User sends query to web server
Gateway passes query to Clearinghouse servers
Gateway receives and collates “hits”
Client receives results summary as HTML
Client can request a specific metadata record for viewing
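The query sequence on the preceding slides can be sketched in code. This is a minimal illustration under stated assumptions, not the FGDC gateway implementation: the `Node` class, its `search` method, the node names, and the sample records are all hypothetical, and an in-memory substring match stands in for a real Z39.50 session.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    """Hypothetical stand-in for one Clearinghouse node (a Z39.50 server)."""
    name: str
    records: list = field(default_factory=list)  # metadata record titles

    def search(self, term):
        # A real node would answer a Z39.50 query; here we match substrings.
        return [r for r in self.records if term.lower() in r.lower()]


def gateway_search(nodes, term):
    """Send one query to every node in parallel and collate the hits per node."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        results = list(pool.map(lambda n: n.search(term), nodes))
    return {node.name: hits for node, hits in zip(nodes, results)}


def summary_html(term, collated):
    """Render the collated hits as the HTML summary returned to the web client."""
    rows = "".join(
        f"<li>{escape(name)}: {len(hits)} hit(s)</li>"
        for name, hits in collated.items()
    )
    return f"<h1>Results for {escape(term)}</h1><ul>{rows}</ul>"


# Illustrative data only; these are not real Clearinghouse nodes or records.
nodes = [
    Node("node-a", ["Wetlands inventory", "Soil survey"]),
    Node("node-b", ["Coastal wetlands imagery"]),
    Node("node-c", ["Road network"]),
]
collated = gateway_search(nodes, "wetlands")
print(summary_html("wetlands", collated))
```

The point of the sketch is the shape of the exchange: the user poses one query, the gateway fans it out, and the user gets back one collated summary instead of visiting each site in turn.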
Node in More Detail (diagram labels: Metadata; Index/DB; Z39.50 server; Internet; Data)
Data v The most expensive investment for an organization v Created by many different organizations v To solve many different problems v Using many different methods and technologies
But... v Data are hard to find v Data are difficult to access v Data are hard to integrate v Data are not current v Data are undocumented v Data are incomplete
The uses of metadata v Provides documentation of existing internal geospatial data resources within an organization (inventory) v Permits structured search and comparison of held spatial data by others (advertising) v Provides end-users with adequate information to take the data and use it in an appropriate context (liability)
Metadata Solutions v Numerous software solutions available v Commercial and free-ware v Standalone, DB-linked, GIS-linked v Permit collection and structuring of FGDC-compatible metadata v Present metadata as HTML, XML, or text
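As a rough illustration of the last point, one structured record can be presented in all three forms. This is a sketch only: the field names and the `metadata` root element are hypothetical placeholders, not FGDC element names, and the record is invented.

```python
import xml.etree.ElementTree as ET
from html import escape

# Invented record with placeholder field names (not FGDC tags).
record = {
    "title": "Example county road centerlines",
    "abstract": "Illustrative record, not a real data set.",
    "publisher": "Example GIS Office",
}


def to_text(rec):
    """Plain text: one 'field: value' pair per line."""
    return "\n".join(f"{key}: {value}" for key, value in rec.items())


def to_xml(rec):
    """XML: one child element per field under a placeholder root."""
    root = ET.Element("metadata")
    for key, value in rec.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_html(rec):
    """HTML: a definition list, with values escaped for safe display."""
    rows = "".join(
        f"<dt>{escape(key)}</dt><dd>{escape(value)}</dd>"
        for key, value in rec.items()
    )
    return f"<dl>{rows}</dl>"


print(to_text(record))
print(to_xml(record))
print(to_html(record))
```

Because the record is structured, each presentation is a mechanical transformation of the same content.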
GILS, Dublin Core and Others v Dublin Core is a minimal (15 fields) generic metadata scheme for virtually any kind of document v GILS represents a more detailed approach, including most of DC, providing greater interoperability v GILS is less bibliographically oriented than (Z39.50) BIB-1 v GILS is lightweight compared to GEO (FGDC) and EOS/CIP (which have specific functional requirements)
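To make the "minimal versus detailed" trade-off concrete, the sketch below reduces a richer record to Dublin Core. The 15 element names are the Dublin Core element set; everything else (the source field names, the crosswalk, and the sample record) is hypothetical and chosen only for illustration.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Hypothetical crosswalk from a more detailed record to Dublin Core.
CROSSWALK = {
    "title": "title",
    "originator": "creator",
    "abstract": "description",
    "publication_date": "date",
    "bounding_box": "coverage",
}


def to_dublin_core(record, crosswalk=CROSSWALK):
    """Keep only the fields that have a Dublin Core target element."""
    unknown = set(crosswalk.values()) - set(DUBLIN_CORE)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return {crosswalk[k]: v for k, v in record.items() if k in crosswalk}


# Invented record; field names are placeholders, not a real schema.
detailed = {
    "title": "Example soils survey",
    "originator": "Example Agency",
    "abstract": "Illustrative record only.",
    "publication_date": "1999-03-01",
    "bounding_box": "-105.0,40.0,-104.0,41.0",
    "horizontal_datum": "NAD83",
    "attribute_accuracy": "Not assessed",
}

dc = to_dublin_core(detailed)
dropped = sorted(set(detailed) - set(CROSSWALK))
print(dc)
print("No Dublin Core home for:", dropped)
```

The dropped fields show the cost of a lighter scheme: the record is easier to share across communities, but domain-specific detail has nowhere to go.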
What Structured Metadata Means - 1 v GILS - Fewer fields –More documents –More metadata records –Skinnier metadata records –Easier abstraction v FGDC - More fields –Fewer documents –Fewer metadata records –Fatter metadata records –Less abstraction v GILS is a good, general compromise
What Structured Metadata Means - 2 v A Z39.50 profile defines a language –At some level, Z39.50 is a detail –Protocols are about communication, profiles are about abstraction, and GILS is about content –Z39.50 guarantees that the user’s query can be unambiguously decoded, but makes no guarantees about content –We could implement the profile over any protocol: HTTP, CORBA, etc. v Do we have to use Z39.50? –No, but the abstraction is required –Z39.50 already includes the abstraction model
How much metadata is enough? v Internal documentation for local use (local inventory) v Basic documentation for discovery of information holdings (catalog/search) v Detailed documentation to provide end-users with adequate information for re-use (asset management)
Server Solutions v Z39.50 Protocol is used v “GEO” Geospatial Metadata Profile is published for Z39.50 implementors to understand FGDC metadata structures v Supports search across numeric, text, date, and spatial extent and full-text v Freeware and commercial solutions
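A combined search over text, date, and spatial extent can be sketched as follows. This is an assumption-laden illustration, not the GEO profile or any server's actual code: `BBox`, `Record`, and `search` are hypothetical names, the coordinates are rough illustrative values, and bounding boxes that cross the antimeridian are ignored.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BBox:
    """Bounding box in decimal degrees (hypothetical helper type)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap when their ranges overlap on both axes.
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)


@dataclass(frozen=True)
class Record:
    title: str
    extent: BBox
    published: date


def search(records, text: Optional[str] = None,
           area: Optional[BBox] = None, since: Optional[date] = None):
    """Return records that satisfy every criterion that was supplied."""
    hits = []
    for rec in records:
        if text is not None and text.lower() not in rec.title.lower():
            continue
        if area is not None and not rec.extent.intersects(area):
            continue
        if since is not None and rec.published < since:
            continue
        hits.append(rec)
    return hits


# Invented records with approximate extents, for illustration only.
records = [
    Record("Montana hydrography", BBox(-116.1, 44.3, -104.0, 49.0), date(1998, 6, 1)),
    Record("Florida wetlands", BBox(-87.6, 24.5, -80.0, 31.0), date(1997, 3, 15)),
    Record("Wyoming hydrography", BBox(-111.1, 41.0, -104.0, 45.0), date(1996, 1, 10)),
]

area_of_interest = BBox(-111.2, 44.1, -109.8, 45.1)
hits = search(records, text="hydrography", area=area_of_interest,
              since=date(1997, 1, 1))
print([r.title for r in hits])
```

Each criterion narrows the result set independently, which is why a spatial extent in the metadata is as useful for discovery as a title or a date.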
Gateway in More Detail (diagram labels: Web Gateway Case; Web client, Web client; Gateway: Web server interface, Z39.50 clients; Nodes)
User Interfaces v HTML-based forms hosted at Gateways are the primary access method v Java map-based interface from MEL allows more sophisticated search v Inclusion of search capabilities in GIS client software is possible
Who’s in Clearinghouse? v 109 Nodes (servers) online as of 3/1/99 –28 Federal, national scope –35 State/University state-wide scope –28 International scope or location –18 Local or Regional scope
US Federal Participation v NOAA (10) v USGS (6) v FEMA (sampler) v NRCS climate and soils v CIESIN/EPA v CIESIN/NASA v DOT NTAD v National Park Service v Army Corps of Engineers v Tri-Services Center v National Wetlands Inventory v Census (sampler) v Minerals Management Service
State Participation v New York (2) v North Carolina v Oklahoma v Kansas v Texas v Montana (3) v Vermont v Pennsylvania v West Virginia v Washington v Wisconsin v Wyoming (2) v Florida v Alabama v New Mexico v Arizona v Georgia v Illinois v Minnesota v Alaska v California v Delaware v Nebraska (2) v New Jersey
Regional/Local Participation v McKinley Co, NM v City of Santa Fe, NM v North Texas GIS v Research Planning v Sabine R Authority, TX v San Francisco Bay v S Florida Ecosystem v SW Natural Resources v Olympic Peninsula, WA v Greater Yellowstone v Helena NF v Ecological Reserves, KS v MIT/Mass Boston DOQs v Great Lakes EIS v Eastern Sierra
International Participation v NOAA/Japan GOIN v South Africa (2) v ESA AVHRR sampler v GELOS, Italy v PAIGH, Mexico v S57 Hydrography, Canada v NRL MEL v Africa DDS v Inter-American Geospatial Data Network v Hong Kong v CIESIN/USDA Global Environmental Change v Australia (10+) v Costa Rica v Caribbean CEPNET, Jamaica
Planned or Funded Nodes v Mt Desert Island, ME v SW Washington COG v NASA GCMD v CODEPLAN, Brazil v Iowa v Missouri v Kentucky v South Dakota v Oregon v Louisiana v Ohio v Connecticut MAGIC v Colorado v NW Ecosystems
Clearinghouse provides... v Discovery of spatial data v Distributed search worldwide v Uniform interface for spatial data searches v Advertising for your data holdings
For more information: Visit the FGDC website: Contact the Clearinghouse Coordinator, Doug Nebert or Archie Warnock