Lola M. Olsen, Global Change Master Directory, NASA’s Goddard Space Flight Center. The Value of Controlled Vocabularies Beyond the DRM 2.0: The Importance of Normalization for the Data Search


1 The Value of Controlled Vocabularies Beyond the DRM 2.0: The Importance of Normalization for the Data Search
Lola M. Olsen
Global Change Master Directory, NASA’s Goddard Space Flight Center
SiCOP Conference, 6 February 2007

2 Presentation Guide
- The Global Change Master Directory (GCMD) and the DRM 2.0
- Vision and Mission
- Data Context, Data Descriptions, and Data Sharing
- Content and Usage
- Value (most appreciated aspects, as expressed by users)
- Design Evolution
- MD2 - MD9.7
- The Data Reference Model 3.0

3 The GCMD Vision and Mission
Strategic Vision:
- To serve as a trusted source for Earth (and space) science metadata and related services.
- To contribute to scientific discovery.
Mission:
- To provide for the creation of “unique”, high-quality data, services, and ancillary descriptions of data.
- To design enabling authoring tools (with ability to tag entries) and robust scientific search software. [The ability to “tag” data at the time of writing is key. When this ability is not integrated through tools, tagging is more difficult and is less likely to be accurate and normalized. Best to gather information when the “getting is good.”]
- To assist the global community in the discovery of the scientific resources within the directory.

4 Does the GCMD Follow the DRM 2.0?

5 Data Context
- Facilitates discovery of data through an approach to the categorization of data according to taxonomies.
- Enables the definition of authoritative data assets within the COI (using unique identifiers).
- Provides linkages to the data described, thereby managing the “info glut”, through:
  - Open API links, such as OPeNDAP (an open source framework that simplifies aspects of science data networking).
  - Related_URL “controlled keyword” links to data.
  - New “use” metadata associated with detailed variables within the data sets.
- ~20 petabytes of data represented through the GCMD.

6 Number of Science Keywords by Topic

7 Number of Services Keywords by Topic

8 Ancillary Keywords
Coming soon: Orbit Types, Spectral/Frequency Domain, Launch Sites, and a 4-level taxonomy for Models.

9 Data Description: “How do we understand what data are available?”
- Provides a means to uniformly describe data, thereby supporting its discovery, harmonization, categorization, sharing, and rapid coordination/communication.
- GCMD uses the DIF “standard”. There are many advantages.
- Descriptions must be identified UNIQUELY.

10 The Evolving DIF
1995 - Major steps in evolution through modification to a multilevel Earth science hierarchy: Category > Topic > Term > Variable > Detailed Variable. Two important trends were emerging that would affect evolution:
- FGDC and the concept of “metadata” for geospatial and other data initiated.
- Web taking shape.
1997 - DIF evolves from 23 to 34 fields.
- Compatible with mandated FGDC and Dublin Core.
- Era of metadata initiated. Other “standards” emerging: ANZLIC.
- Web expanding: search interfaces abound; GCMD ready for this revolution.
1999 - DIF evolves to 35 fields in MD7 [3 added; 2 deleted].
- DIF creation date and revision history added.
- New field for paleoclimate data: paleo-temporal coverage.
- Personnel subfields modified.
- FGDC mandated, but DIF compatible with all required fields, serving users with added benefits of unique ID. Conversion tools available: FGDC <=> DIF.
2002 - DIF acquires new sibling: the SERF, allowing cross linkages between services and data.
- Redesign of query language; XML syntax; separation of presentation from business/application logic, with unexpected gifts: SOA architecture; querying multiple data sources for spatial, temporal, RDF and RDBMS databases, full-text; Struts facilitated creation of customized portals.
- LDA experiment.
2004 - MD9: ISO 19115 compliance; DIF evolves to 36 fields.
- 3 new fields added: new address; 2 data resolution subfields.
**International Interoperability Forum functions at the international level through CEOS.
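The multilevel hierarchy named on this slide (Category > Topic > Term > Variable > Detailed Variable) can be illustrated with a small sketch. This is not GCMD code: the `ScienceKeyword` type, the `parse_keyword` helper, and the normalization rule (collapse whitespace, uppercase) are assumptions made for illustration, and the sample keyword is illustrative only.

```python
from typing import NamedTuple, Optional

# Level names follow the hierarchy listed on the slide.
LEVELS = ("category", "topic", "term", "variable", "detailed_variable")


class ScienceKeyword(NamedTuple):
    """One controlled keyword; deeper levels are optional (hypothetical type)."""
    category: str
    topic: Optional[str] = None
    term: Optional[str] = None
    variable: Optional[str] = None
    detailed_variable: Optional[str] = None


def parse_keyword(path: str, sep: str = ">") -> ScienceKeyword:
    """Split a 'Category > Topic > ...' string into normalized levels.

    Normalization here is an assumption: surrounding and repeated
    whitespace is collapsed and the text is uppercased, so that
    differently typed spellings compare equal.
    """
    parts = [" ".join(piece.split()).upper() for piece in path.split(sep)]
    if not all(parts):
        raise ValueError(f"empty level in keyword path: {path!r}")
    if len(parts) > len(LEVELS):
        raise ValueError(f"more than {len(LEVELS)} levels in: {path!r}")
    return ScienceKeyword(*parts)


keyword = parse_keyword("earth science >  Atmosphere > Clouds")
print(keyword.category, "|", keyword.topic, "|", keyword.term)
```

Keeping each level in a named slot means two authors who type the same keyword with different spacing or capitalization end up with identical values, which is the point of normalizing before search.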

11 Data Set (DIF) Population by Topic

12 Data Services (SERF) Population by Topic

13 Data Sharing
Supports the access to data, enabled by capabilities provided by both the Data Context and Data Description standardization areas, through:
1. Ad hoc requests (such as a query of a data asset). An Open API supports ad hoc requests. Example: OPeNDAP.
2. Exchange of data (such as those that consist of fixed, recurring transactions among parties). Examples:
   - GeoConnections (Canada)
   - OAI with NCAR and NOAA
   - Data centers that use docBUILDER tools to submit metadata descriptions

14 JCADM/AMD Collaborations: 18 NADCs

16 GCMD Hits Recorded, Jan 2001 - Dec 2006

17 Evolution: Project Development Drivers
- Maintenance reduction
- Improving the discovery of and access to data and services
- Ease of use, such as web site navigation
- Accuracy of results
- Content requirements
- Quality control
- Integration with metadata authoring tools that allow real-time updating by data set holders/producers
- Integrated keyword and free-text search, with both as “refinements”
- Bidirectional linkages between data sets and data set services
- Providing virtual subsets of the directory
- Standards: ISO 19115/19139; OpenGIS; XML; RDF
- NASA needs
- Science User Working Group recommendations; user and partner requests
- Evolving coding languages and databases [e.g., C to Perl to Java]

18 MD History
[Timeline figure, 1994-2004, charting the number of DIFs and SERFs. Count labels on the chart: 200; 500; 2,000; 5,000; 10,000.]
Releases: MD2 (10/94), MD4 (04/96), MD5 (04/97), MD6 (04/98), MD7 (10/99), MD8 OPS (06/01), MD8 (08/03).
Features:
- MD4: DIFmorph for translating between FGDC and DIF; PC-based DIF Writing Tool; transitioned space science DIFs to NSSDC; first web client distributed.
- MD5: Science keyword hierarchy; FGDC compatibility; Isite free-text search.
- MD6: Upgraded Isite free-text; Java applet for geospatial search and “Advanced” search interface; first web page to use Science keyword Topics to search; Parent/Child display; Related_URL field added; HCIL and Matrix interface.
- MD7: Switched code base from C to Perl; Conference Calendar; Personnel "Role" field; Paleo_Temporal Coverage.
- MD8 OPS: Switched code base from Perl to Java; XML syntax for metadata; OPS for managing metadata.
- Version label not recoverable: X-Windows client; JAM client; first use of Oracle; DIFmacs Authoring Tool.
Milestones marked on the timeline: DIFWEB tools; 1st request in FGDC; DIFbuilder tools; Services Prototype launched; first time the coordinators were able to load their own DIFs/SERFs; docBUILDER tools.
Other dates shown: 10/96, 05/00, 08/00, 12/00, 09/01, 11/01, 07/02, 5/04.

19 MD History (MD 8 and beyond)
[Timeline figure, 2003-2007, charting the number of DIFs and SERFs. Count labels on the chart: 500; 1,000; 1,200; 15,000; 17,000.]
Releases shown: MD8, MD9.1-9.2, MD9.3, MD9.4, MD9.5, MD9.6. Dates shown: 08/03, 06/04, 02/05, 03/05, 08/05, 02/06, 07/06.
Features:
- MD8: New home page; portals; DTD for DIF and SERF; Open API.
- MD9.1-9.2: Struts; compatible with ISO 19115 metadata standard; geographic coverage map added to record.
- MD9.3: Lucene search engine; search term highlighted in records; refinement search by keywords or full-text search; User Comment form.
- MD9.4: Location and data center hierarchy; increased number of characters for fields; spatial and temporal resolution range keywords; docBUILDER tool personalized templates.
- MD9.5: Spatial search with Google map; refinement option by data resolution for NASA portals; support for foreign characters in record display; subscription service for science keywords.
- MD9.6: Relative temporal coverage added to accommodate data pools; added two-level hierarchy for Related_URL (e.g., to support Get Data).
Milestone marked on the timeline: docBUILDER tools available for public.

20 Following the “Hype” or Listening Too Intently to the “Wilderness Request”
The Hype: Distributed Systems
> Check if application is appropriate for needs.
> Determine its ROI.
> Offered LDA, as “Local Database Agents” - not “Latent Dirichlet Allocation”.
> Be vigilant for change.
> Know when to cut losses.
> Scope the future.
The Out-In-The-Wilderness Request. Example: Offline Authoring Tool
> Check longevity to assure usefulness when development complete.
> Determine the ROI in advance.
> Know when to cut your losses.

21 “The Web doesn’t have a single, comprehensive clearinghouse where you can find all of the data and domains of knowledge covering all geographies …. Instead there are hundreds of …”
“Very few geospatial information scientists are working on the challenge beyond the GCMD (Global Change Master Directory), whose database holds more than 15,000 [actually this number is 17,300+] descriptions of data sets and services covering all aspects of earth and environmental sciences.”
Finding Ourselves. Liebhold (May 2005), O’Reilly Network.

22 Internal View of GCMD Value (2007)
- Controlled Keywords (& definitions) to reference and retrieve a record or sets of records.
- Authoring Tools with Update Capability. {Heavy use of controlled keywords.}
- Keyword & Full Text Search to Data and Services with ordered “Result Set”. (No need to build a client to query, although the option to do so is available through Open API.)
- Customized Portals: virtual subsets of the directory, created through use of controlled keywords.
- The “Get Data” feature, which takes the user directly to the data.
- Unique data set and services entries.
- Easy compliance with related standards through XML.
- Results available through Google.
- Well-designed home page, with access to full set of services provided.
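The idea of a customized portal as a virtual subset of the directory, selected through controlled keywords, can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Entry` type, the `portal_subset` function, and the `EXAMPLE_DIF_*` identifiers are hypothetical, and keyword paths are assumed to be stored already normalized to uppercase.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

KeywordPath = Tuple[str, ...]


@dataclass(frozen=True)
class Entry:
    """A directory record reduced to the two fields this sketch needs."""
    entry_id: str                      # hypothetical unique identifier
    keywords: Tuple[KeywordPath, ...]  # controlled keyword paths, uppercase


def portal_subset(entries: Iterable[Entry], prefix: Sequence[str]) -> List[Entry]:
    """Return the entries tagged with any keyword under the given prefix.

    No records are copied or moved; the "portal" is just the view
    produced by the controlled-keyword filter.
    """
    wanted = tuple(level.upper() for level in prefix)
    return [
        entry
        for entry in entries
        if any(path[: len(wanted)] == wanted for path in entry.keywords)
    ]


# Hypothetical sample records for illustration only.
directory = [
    Entry("EXAMPLE_DIF_001", (("EARTH SCIENCE", "ATMOSPHERE", "CLOUDS"),)),
    Entry("EXAMPLE_DIF_002", (("EARTH SCIENCE", "OCEANS"),)),
    Entry(
        "EXAMPLE_DIF_003",
        (("EARTH SCIENCE", "ATMOSPHERE"), ("EARTH SCIENCE", "OCEANS")),
    ),
]

atmosphere_portal = portal_subset(directory, ["Earth Science", "Atmosphere"])
print([entry.entry_id for entry in atmosphere_portal])
```

Because the subset is defined by a keyword prefix rather than by a list of records, newly authored entries that carry the same controlled keywords would appear in the portal without further curation.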

23 MD Software Version 9.7
- Support for 2 additional levels of Science Keyword hierarchy.
- Improved features for docBUILDER Authoring Tools.
- Support for writing Platform and Instrument descriptions & new keywords.
- Support for “GET DATA” tab.
- “Text Only” display for 508 compliance (in docBUILDER).
- Improved multimedia sample.
- Improved spatial coverage selection.
- Ability to change entry identifier.
- Reference Guide for use of international characters and symbols.
- RSS Feed, in addition to Keyword Subscription Service, to signal new directory entries.
- Upgrades to Java 1.5 and Tomcat 5.
- Location Keywords & “Chronostratigraphic Units” recreated as true taxonomies.

24 MD Software Version 9.7
- Keyword Functionality Upgrade.
  - Functionality “abstracted” to use a SKOS data model for navigating arbitrary taxonomies.
  - Integrated SKOS query into query language.
  - Backed by Berkeley DB XML for querying.
  - Example: [skos:Parameters='EARTH SCIENCE|ATMOSPHERE'] AND [skos:Instrument='AVHRR']
- New Platform/Instrument Display Reflects Taxonomic Changes.
  - Support for loading, extracting, querying.
  - Support for navigating through new taxonomies.
  - Support for full text search.
  - Support for creating these descriptions in docBUILDER.
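To make the example query on this slide concrete, here is a minimal sketch that parses it and tests a record against it. The slide gives only this one example, so the grammar is an assumption: clauses of the form `[skos:Scheme='A|B']` joined by `AND`, with `|` separating hierarchy levels and a clause matching any keyword at or below the named node. `parse_query` and `matches` are hypothetical helpers, not part of the GCMD query language or of Berkeley DB XML.

```python
import re
from typing import Dict, List, Sequence, Tuple

Clause = Tuple[str, Tuple[str, ...]]

# One clause: [skos:<Scheme>='<LEVEL|LEVEL|...>']; straight or curly quotes.
_CLAUSE = r"\[skos:(\w+)=['‘’]([^'‘’\]]*)['‘’]\]"
_CLAUSE_RE = re.compile(_CLAUSE)
# Whole query: one clause, optionally followed by "AND <clause>" repeats.
_QUERY_RE = re.compile(r"\s*" + _CLAUSE + r"(?:\s+AND\s+" + _CLAUSE + r")*\s*")


def parse_query(query: str) -> List[Clause]:
    """Parse the assumed AND-only syntax into (scheme, path) pairs."""
    if not _QUERY_RE.fullmatch(query):
        raise ValueError(f"unsupported query syntax: {query!r}")
    return [
        (match.group(1), tuple(match.group(2).split("|")))
        for match in _CLAUSE_RE.finditer(query)
    ]


def matches(record: Dict[str, Sequence[Tuple[str, ...]]],
            clauses: Sequence[Clause]) -> bool:
    """True if, for every clause, the record has a keyword under that path."""
    return all(
        any(path[: len(wanted)] == wanted for path in record.get(scheme, ()))
        for scheme, wanted in clauses
    )


query = "[skos:Parameters='EARTH SCIENCE|ATMOSPHERE'] AND [skos:Instrument='AVHRR']"
clauses = parse_query(query)

# Hypothetical record: keyword paths grouped by taxonomy (scheme) name.
record = {
    "Parameters": [("EARTH SCIENCE", "ATMOSPHERE", "CLOUDS")],
    "Instrument": [("AVHRR",)],
}
print(clauses)
print(matches(record, clauses))
```

Treating every taxonomy as a named scheme with path-shaped keywords is what lets one matching routine serve Parameters, Instrument, or any other vocabulary, which appears to be the motivation for abstracting the keyword functionality.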

25 SKOS Application

29 Page 1 of SERF

30 Page 2 of SERF

31 Page 1 of DIF

32 Page 2 of DIF

33 MD Software Version 9.8
- docBUILDER enhancements.
- Option for public vs private view.
- Automated reminders to metadata authors.
- Initial testbed for multilingual capabilities using SKOS.
- Variable Keyword extensions for “use” metadata.
- Client to ECHO for metadata sharing using web services.

34 The Data Reference Model 3.0, Web 3.0 & SOAs
[Figure 3-1: DRM standardization areas. Labels shown: Data Resource Awareness Agent; Data & Information & Knowledge Repository; static; dynamic; Language; Logic.]

