FGDC and ASF
Using Structured Metadata
Archie Warnock
A/WWW Enterprises

FGDC Project
- The FGDC Project is based at USGS
- Cooperative effort of USGS, A/WWW, and Blue Angel Technologies
- Uses structured metadata to locate geospatial data through Z39.50
- Based on the FGDC Metadata Standard
- Centralized search gateway, distributed sites
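
The "centralized search gateway, distributed sites" arrangement can be pictured as a fan-out: the gateway sends one query to every site and merges the hits. The following is a minimal Python sketch under stated assumptions, not the FGDC gateway itself. The `Node` class, its in-memory records, and `gateway_search` are hypothetical stand-ins for real Z39.50 servers and clients.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one distributed metadata site."""
    name: str
    records: list = field(default_factory=list)  # each record: dict of field -> text

    def search(self, field_name, term):
        # Case-insensitive substring match on a single metadata field.
        term = term.lower()
        return [r for r in self.records if term in r.get(field_name, "").lower()]


def gateway_search(nodes, field_name, term):
    """Send the same query to every node and merge the hits, tagged by source."""
    hits = []
    for node in nodes:
        for record in node.search(field_name, term):
            hits.append((node.name, record))
    return hits


# Made-up sample data for illustration only.
nodes = [
    Node("site-a", [{"title": "Wetlands inventory"}, {"title": "Road network"}]),
    Node("site-b", [{"title": "Coastal wetlands survey"}]),
]
results = gateway_search(nodes, "title", "wetlands")
```

Each site keeps its own records; the gateway only needs to know how to reach the sites and how to phrase the query in terms every site understands.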

GILS and the Advanced Search Facility
- ASF is a US Dept. of Commerce project, built by Pilot Research Associates, A/WWW Enterprises and collaborators
- Information Communities: networks of cooperative, low-impact, distributed nodes
- The basic interchange will be structured GILS metadata
- Search on full text and GILS metadata

FGDC Reference Implementation
- FGDC Node Software
  - Iindex, Isearch, Iutil: the search engine
  - zclient, izclient, zping, zbatch: the Z39.50 clients
  - zserver, zserverNT: the Z39.50 servers
- zcon & zgate: the WWW-to-Z39.50 gateway (not supported by FGDC)

ASF Reference Implementation
- Isearch: basic search engine
- Yaz: Z39.50 toolkit
- htDig: URL harvester
- ZAP: Web-to-Z39.50 gateway
- Custom APIs and Components
  - Search API
  - xyz, the Z39.50 server
  - ids, the internode data communications

GILS, Dublin Core and Others
- Dublin Core is a minimal (15 fields) generic metadata scheme for virtually any kind of document
- GILS represents a more detailed approach, including most of DC, providing greater interoperability
- GILS is less bibliographically oriented than (Z39.50) BIB-1
- GILS is lightweight compared to GEO (FGDC) and EOS/CIP (which have specific functional requirements)
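
To make "minimal (15 fields)" concrete, the sketch below lists the 15 Dublin Core elements and splits a record into the part Dublin Core can carry and the part that needs a richer scheme. It is an illustration only: `split_record`, the sample record, and the `bounding_box` field name are hypothetical and not taken from GILS or the FGDC standard.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def split_record(record):
    """Separate a record's fields into those Dublin Core can carry and the rest."""
    core = {k: v for k, v in record.items() if k.lower() in DUBLIN_CORE}
    extra = {k: v for k, v in record.items() if k.lower() not in DUBLIN_CORE}
    return core, extra


# Made-up record; "bounding_box" is a hypothetical field outside Dublin Core.
record = {
    "title": "Coastal wetlands survey",
    "creator": "Example Agency",
    "bounding_box": "-80 30 -70 40",
}
core, extra = split_record(record)
```

Fields that land in `extra` are the ones a more detailed scheme has to define explicitly.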

What Structured Metadata Means - 1
- GILS
  - Fewer fields
  - More documents
  - More metadata records
  - Skinnier metadata records
  - Easier abstraction
- FGDC
  - More fields
  - Fewer documents
  - Fewer metadata records
  - Fatter metadata records
  - Less abstraction
- GILS is a good, general compromise

What Structured Metadata Means - 2
- A Z39.50 profile defines a language
  - At some level, Z39.50 is a detail
  - Protocols are about communication, profiles are about abstraction and GILS is about content
  - Z39.50 guarantees that the user’s query can be unambiguously decoded, with no guarantees about content
  - We could implement the profile over any protocol: http, CORBA, etc.
  - Do we have to use Z39.50? No, but the abstraction is required
  - Z39.50 already includes the abstraction model
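
The claim that a profile could be implemented over any protocol can be sketched as one abstract query rendered for two transports. This is a minimal Python illustration, not a Z39.50 implementation. The `PROFILE` table, its attribute numbers, the function names, and the `field`/`term` URL parameters are all assumptions made for the example; a real profile document defines its own attribute table.

```python
from urllib.parse import urlencode

# Hypothetical profile: abstract field names mapped to numeric "use" attributes.
# The numbers are illustrative; a real profile defines its own table.
PROFILE = {"title": 4, "author": 1003}


def validate(query):
    """A query is (field, term); the profile decides which fields are legal."""
    field, term = query
    if field not in PROFILE:
        raise ValueError(f"field {field!r} is not defined by the profile")
    return field, term


def to_prefix_notation(query):
    """Render the query in a Z39.50-style prefix form (attribute number plus term)."""
    field, term = validate(query)
    return f'@attr 1={PROFILE[field]} "{term}"'


def to_http_query(query):
    """Render the same abstract query as URL parameters for an HTTP transport."""
    field, term = validate(query)
    return urlencode({"field": field, "term": term})


query = ("title", "land cover")
as_prefix = to_prefix_notation(query)
as_http = to_http_query(query)
```

Both renderings come from the same validated query, so the shared vocabulary lives in the profile and the wire format is interchangeable.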

Related Documents
- Tools
  - ftp://ftp.cnidr.org/pub/software/Isite
  - ftp://ftp.clark.net/pub/warnock/Software
- A/WWW Enterprises

Isearch Features
- Full text search
- Search on text fields
- Search on numeric fields with appropriate relations (>, <, =)
- Search on date fields with appropriate relations (before, during, after)
- Search on geospatial bounding box
- Boolean searches
- Phrase searching
- Right truncation
- Proximity searching (within N characters)
- Case insensitive searching, punctuation ignored
- Configurable stopword list
- Customizable results presentation
- Relevance ranked scores
- Term weighting
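
Three of the features listed above (right truncation, proximity within N characters, and bounding-box search) can be illustrated with a few lines of Python. This is a toy sketch of what each feature means, not how Isearch implements it. All function names are hypothetical, and the bounding-box test assumes (west, south, east, north) tuples in degrees with no wrapping across the date line.

```python
import re


def right_truncation(text, stem):
    """True if any word in text starts with stem (case-insensitive)."""
    return re.search(r"\b" + re.escape(stem), text, re.IGNORECASE) is not None


def within_n_chars(text, a, b, n):
    """True if some occurrence of a and some occurrence of b start within n characters."""
    lower = text.lower()
    starts_a = [m.start() for m in re.finditer(re.escape(a.lower()), lower)]
    starts_b = [m.start() for m in re.finditer(re.escape(b.lower()), lower)]
    return any(abs(i - j) <= n for i in starts_a for j in starts_b)


def boxes_overlap(box1, box2):
    """Boxes are (west, south, east, north); overlap means they share at least one point."""
    w1, s1, e1, n1 = box1
    w2, s2, e2, n2 = box2
    return w1 <= e2 and w2 <= e1 and s1 <= n2 and s2 <= n1


text = "Geospatial metadata for coastal wetlands"
starts_with_geo = right_truncation(text, "geo")                # word begins with "geo"
near = within_n_chars(text, "metadata", "coastal", 20)         # terms start 13 characters apart
overlap = boxes_overlap((-80, 30, -70, 40), (-75, 35, -60, 45))
```

Right truncation matches only at the start of a word, so the stem "spatial" would not match "Geospatial" here.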

Isearch Document Types
- ASCII text
- SGML tagged fields
  - HTML
  - GILS (XML) templates
  - FGDC templates
- Colon delimited fields
  - GCMD DIF templates
- USMARC records
- IAFA templates
- SOIF templates
- First line in file
- Filenames
- folders
- Usenet news archives
- whois++ templates
- Multi-file documents
- US patents
- BIBTeX
- Medline
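
As an illustration of the "colon delimited fields" document type, the sketch below turns `Name: value` lines into a field dictionary that a fielded search could use. It is a simplified assumption about the general shape of such files, not the Isearch parser. The function name, the continuation-line rule, and the sample field names are hypothetical.

```python
def parse_colon_fields(text):
    """Parse 'Name: value' lines into a dict; indented lines continue the previous field."""
    record = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and current is not None:
            # Continuation line: append to the field being read.
            record[current] += " " + line.strip()
        elif ":" in line:
            # Split on the first colon only, so values may contain colons.
            name, _, value = line.partition(":")
            current = name.strip()
            record[current] = value.strip()
    return record


# Made-up sample record for illustration only.
sample = """Title: Coastal wetlands survey
Abstract: Field survey of wetland extent
  along the coast.
Date: 1997-05-01
"""
record = parse_colon_fields(sample)
```

In this sketch a repeated field name overwrites the earlier value; a real format would need a rule for repeatable fields.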