Published by Brenda Reed. Modified over 9 years ago.
1
CINERGI: Community Inventory of EarthCube Resources for Geoscience Interoperability
Data discovery is the most often cited issue in executive summaries on the EarthCube web site.
Ilya Zaslavsky, Steve Richard and the CINERGI team
http://workspace.earthcube.org/cinergi
2
Goals
Large inventory of high-quality information resources across disciplines, with traceable provenance, usable across EarthCube research scenarios: datasets, catalogs, vocabularies, information models, services, process models, repositories, etc.
Make it open to the community
Organize it to enable search and integration across domains and linking between information objects
Plus links between resources, people/organizations, publications, models, workflows, software, activities, etc.
3
Approach
Build on the high-level resource inventory started at http://connections.earthcube.org
Compile metadata for as many resources as we can (collect recommendations from geoscientists, harvest existing catalogs)
Expose through a simple search interface
Use off-the-shelf technology: Geoportal, ISO metadata, CSW
Make it accessible through EarthCube.org
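As an illustration of what searching such a catalog through CSW could look like, here is a minimal sketch that builds a CSW 2.0.2 GetRecords request in key-value-pair form. The endpoint URL, the function name and the search keyword are assumptions for illustration; the slides do not give the address of the CINERGI catalog service.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint: the slides do not state the catalog's CSW URL.
CSW_ENDPOINT = "http://example.org/geoportal/csw"


def build_getrecords_url(keyword, max_records=10, start=1):
    """Build a CSW 2.0.2 GetRecords request using key-value-pair encoding."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "brief",
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # Full-text match against any queryable text in the record.
        "constraint": f"AnyText LIKE '%{keyword}%'",
        "startPosition": start,
        "maxRecords": max_records,
    }
    return f"{CSW_ENDPOINT}?{urlencode(params)}"


url = build_getrecords_url("groundwater")
print(url)
```

The sketch only constructs the request URL; sending it and parsing the XML response would depend on the actual service.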
5
READINESS ASSESSMENT 1
Catalog Metadata
  M1 Has a data listing
  M2 Uses minimal metadata standard, such as Dublin Core
  M3 Uses metadata standard, such as FGDC or INSPIRE
Catalog Search
  S1 Search interface
  S2 Search API, not following a standard
  S3 Complies with OpenSearch API
  S4 Complies with OGC CSW API
Catalog Harvest
  H1 Has a harvest API
  H2 OAI API
  H3 OGC CSW API
Vocabulary: Control and Access
  V1 Uses controlled terminology
  V2 Community-managed terminology
  V3 SPARQL
Vocabulary: Representation
  T1 Listing of terminology, such as web pages
  T2 Uses ontology or SKOS
Data Access API
  A1 Bulk download
  A2 Static URL
  A3 Web service
Data Query API
  Q1 Simple query subset
  Q2 Complex query
  Q3 Processing subset
Information Model Conceptual
  C0 Unspecified
  C1 Domain/conceptual model using UML
  C2 Domain/conceptual model using UML based on OGC or ISO standards
Information Model as XML
  X1 XML format; schema may not be specified
  X2 XML Schema
Information Model as SQL
  S1 Provides an SQL schema
Also evaluated: processing services; visualization services; community consensus efforts; identifier persistence
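One way to record such an assessment for a single resource is sketched below. The level codes come from the slide; the example resource, the helper names, and the reading of the trailing digit as an ordinal rank are assumptions made for illustration.

```python
import re


def level_rank(code):
    """Return the numeric part of a level code such as 'M3' (gives 3)."""
    match = re.fullmatch(r"[A-Z](\d)", code)
    if match is None:
        raise ValueError(f"not a readiness code: {code!r}")
    return int(match.group(1))


# Hypothetical assessment of one resource. It is keyed by category name
# because the code S1 appears under both "Catalog Search" and
# "Information Model as SQL" on the slide.
assessment = {
    "Catalog Metadata": "M3",
    "Catalog Search": "S4",
    "Catalog Harvest": "H3",
    "Information Model Conceptual": "C0",
}


def harvestable_by_csw(a):
    """True when the catalog was assessed at H3 (OGC CSW API)."""
    return a.get("Catalog Harvest") == "H3"


ranks = {category: level_rank(code) for category, code in assessment.items()}
print(ranks, harvestable_by_csw(assessment))
```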
6
High-level inventory and readiness assessment: viewer http://connections.earthcube.org
8
Ye Most Excellent EarthCube Inventory System
Diagram labels: Resource descriptions; Harvest adapters; Staging Database; Document processing components; Public access components; Interfaces to the world
Harvest adapters: components that connect to information sources and import descriptions of EarthCube resources into the staging database.
Staging Database: document database that persists the originally harvested descriptions in their native state, as well as any additional information or updates resulting from subsequent processing/curation of the description.
Document processing components: components that pull documents from the staging database and perform various functions to upgrade content or transform presentation. The processed document may be pushed back to the staging database or out to the public access components.
Public access components: components that connect to document processors and implement external interfaces to present content for users.
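The four component roles described on this slide can be sketched in a few lines of Python. This is a toy in-memory illustration, not the CINERGI implementation: every class and function name here is hypothetical, and the "processing" step is a trivial title normalization chosen only to show that the original description is kept unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class StagedDocument:
    source: str
    original: str                                  # harvested description, never modified
    processed: dict = field(default_factory=dict)  # content added by processors
    history: list = field(default_factory=list)    # names of processors applied


class StagingDatabase:
    """In-memory stand-in for the staging document database."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, doc):
        self._docs[doc_id] = doc

    def get(self, doc_id):
        return self._docs[doc_id]

    def ids(self):
        return list(self._docs)


def harvest(db, source, records):
    """Harvest adapter: import descriptions in their native state."""
    for i, raw in enumerate(records):
        db.put(f"{source}:{i}", StagedDocument(source=source, original=raw))


def normalize_title(doc):
    """Document processor: add upgraded content, leave the original intact."""
    doc.processed["title"] = doc.original.strip().title()
    doc.history.append("normalize_title")
    return doc


def process_all(db, processor):
    """Pull each document, process it, and push it back to staging."""
    for doc_id in db.ids():
        db.put(doc_id, processor(db.get(doc_id)))


def public_view(db):
    """Public access component: present only the processed content."""
    return [db.get(doc_id).processed for doc_id in db.ids()]


db = StagingDatabase()
harvest(db, "demo", ["  stream gauge data ", "soil cores"])
process_all(db, normalize_title)
print(public_view(db))
```

Keeping `original` separate from `processed` mirrors the slide's point that the staging database holds both the native harvested description and the results of later curation.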
9
Then add features
Links to organizations, researchers, other systems
Validation services
Deep registration of datasets/databases (at feature level)
Data search capabilities
Quality/interop readiness assessment
Annotation system
10
CINERGI Outline (without deep registration so far)
Stages: Harvesting; Staging and curation; Publication
Harvesting
  Sources: Geoportal, etc.; CSW, OAI-PMH, WAF, CKAN, other; DISCO
  Metadata formats: ISO, DC, other
  Harvesting dashboard
Staging and curation
  Staging DB (MDB): MongoDB, CouchDB
  1. Metadata validation per record
  2. Triggering parsers depending on metadata and validation results: spatial parser, person/org parser, LOD parser, keyword parser, topic parser, time parser
  3. 4. Finding ambiguities for manual curation
  Need a parser API so parsers can be added
  Duplicate detection, tagging, grouping
  Curation UI: results of parsing, provenance, duplicate flags
  Record editor
  Reporting to sources
Publication
  Geoportal: CSW, ISO 19115, ATOM, GeoRSS, etc.
  WAF w/XML ISO
  Linked data: RDF, RDF store, e.g. Neo4j
  Validated triples
  Extra metadata, provenance, links, annotations
  Search UI; pivot for search results; community pivots; hot page; search in domain systems
  geoportal; pivotDB
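The outline asks for a parser API so that parsers can be added, with parsers triggered depending on the metadata and the validation results. A minimal sketch of one possible shape for such an API follows. The registry, the decorator, the record fields and the two example parsers are all hypothetical and are not taken from the CINERGI code.

```python
PARSERS = []  # registered parsers as (name, applies_to, function) tuples


def register_parser(name, applies_to):
    """Register a parser together with the predicate that triggers it."""
    def decorator(func):
        PARSERS.append((name, applies_to, func))
        return func
    return decorator


def validate(record):
    """Step 1: per-record validation; here only a required-field check."""
    return [f for f in ("title", "abstract") if not record.get(f)]


@register_parser("keyword", applies_to=lambda rec, errors: "keywords" in rec)
def parse_keywords(rec):
    """Split a comma-separated keyword string into unique lowercase terms."""
    terms = {k.strip().lower() for k in rec["keywords"].split(",") if k.strip()}
    return sorted(terms)


@register_parser("time", applies_to=lambda rec, errors: "dates" in rec)
def parse_time(rec):
    """Split a 'start/end' interval string into its two endpoints."""
    start, end = rec["dates"].split("/")
    return {"start": start, "end": end}


def run_parsers(record):
    """Step 2: trigger each parser whose predicate accepts the record."""
    errors = validate(record)
    results = {"validation_errors": errors}
    for name, applies_to, func in PARSERS:
        if applies_to(record, errors):
            results[name] = func(record)
    return results


record = {
    "title": "River chemistry",
    "keywords": "Rivers, chemistry , rivers",
    "dates": "2001-01-01/2005-12-31",
}
out = run_parsers(record)
print(out)
```

With this shape, adding a spatial or person/org parser means writing one function and one predicate; the dispatch loop does not change.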
11
Challenges
Scope
Different levels of granularity
Lack of formal information models
Implicit domain semantics
Multiple metadata registry platforms and standards
Lots of data outside managed repositories
Cross-domain governance vs domain systems
Different expectations across domains (survey)
12
Initial inventory
http://metadata.earthcube.org
Resources from domain workshops and surveys + initial harvesting
13
Domain inventories: you are invited to participate!
All sources of data mentioned at domain end-user workshops are included
Working with funded RCNs
Step 1: Prepare an initial collection in a spreadsheet.
Step 2: CINERGI will set up your community resource viewer and editing system, seeded with your collection.
Step 3: Community editing, updates and curation.
14
Short questionnaire
Columns: Function; Importance; Comments
Importance scale for every function: 1 2 3 4 5 6 7 (1 = Unimportant, 7 = Essential), NA, DK
Functions:
  Making metadata from your facility available for search using standard metadata, via standard APIs
  Tracking demand for and cross-domain usage of your resources
  Identifying issues related to data and metadata quality and completeness
  Tracking search hits that become searches for resources managed by your data facility
  Connecting owners of relevant datasets to your facility for potential longer-term data management
  Connecting data from your facility with people, publications, models, and projects
  Identifying communities using data, tools, and models from your facility
  Validating published metadata and service signatures from your facility
  Finding and reporting to you resources that appear as duplicates across multiple registries
Potential added value by a cross-domain system
Integration with cross-domain search
Key characteristics for CINERGI
See CINERGI Survey at http://workspace.earthcube.org/data-facilities
15
Development Team
San Diego Supercomputer Center/UCSD: Ilya Zaslavsky, David Valentine, Tom Whitenack, Amarnath Gupta, Jeff Grethe (NIF project)
Lamont/Columbia Univ./IEDA: Kerstin Lehnert, Leslie Hsu
Arizona Geological Survey: Stephen Richard
University of Chicago: Tanu Malik
Open Geospatial Consortium: Luis Bermudez
Community Partners
Anthony Aufdenkampe: Critical Zone Observatories
Shanan Peters: stratigraphy
Bernhard Peucker-Ehrenbrink: Global River Observatories
RCN projects that plan to organize community resources
Test Enterprise Governance
Building Blocks projects working on web services, brokering solutions
Agencies
International