Robert Dattore and Steven Worley


1 The Research Data Archive at NCAR: A Metadata System that Enables Discovery Across a Diverse Archive
Robert Dattore and Steven Worley
National Center for Atmospheric Research, Boulder, CO, USA
01/25/2011 AMS 2011

2 Outline
Introduction
RDA - Then
RDA - Now
Data Discovery

3 Introduction
Purpose: support climate and weather research at NCAR; services are extended worldwide as resources permit
Observations and derived products; focus on historical atmosphere/ocean data
Metrics: established in the 1960s; 600+ datasets, 4M files, 600 TB; 7000 users annually

4 Introduction: Changing data landscape
Then: small datasets, single country/experiment, specialized formats
Now: global coverage, high spatial/temporal resolutions, standard formats
Result and challenge: lots of diversity. How can we provide uniform discovery?

5 Then

6 Then
Increasing data diversity and evolving technology made it difficult to develop good, systematic discovery
Discovery relied on README files and directory names, but primarily on personal communications
Major limiting factor: insufficient metadata
- No metadata standard or dictionaries
- Collection not uniform across all datasets
- Rigidly structured flat ASCII files
- Archiving separate from metadata collection
Bottom line: Unscalable system! We needed to make a change

7 Now

8 Now
Developed a local standard for discovery based on DIF(1) and THREDDS(2); applied across all datasets
Adopted GCMD(3) controlled vocabularies, with local enhancements, e.g. data formats
Harvest two types of file metadata:
- File attribute: name, size, compression, ...
- File content: variables, levels, date range, ...
Storage using XML
(1) Directory Interchange Format, NASA/GCMD; (2) Thematic Realtime Environmental Distributed Data Services; (3) Global Change Master Directory
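As a rough illustration of harvesting file-attribute metadata (name, size, compression) and storing it as XML, the sketch below uses only the Python standard library. The element names (`fileMetadata`, `name`, `size`, `compression`) and the magic-byte compression check are assumptions made for illustration; they are not the RDA's actual schema or harvesting code.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET

# Leading bytes of a few common compression formats (illustrative subset).
MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"PK\x03\x04": "zip",
}

def detect_compression(path):
    """Return a compression label based on the file's leading bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    for magic, label in MAGIC.items():
        if head.startswith(magic):
            return label
    return "none"

def file_attribute_metadata(path):
    """Capture hypothetical file-attribute fields: name, size, compression."""
    return {
        "name": os.path.basename(path),
        "size": str(os.path.getsize(path)),
        "compression": detect_compression(path),
    }

def to_xml(attrs):
    """Serialize the attribute dict as a small XML document (hypothetical schema)."""
    root = ET.Element("fileMetadata")
    for key, value in attrs.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # Demo: write a small gzip file and print its harvested metadata as XML.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "obs_1960.txt.gz")
        with gzip.open(path, "wb") as f:
            f.write(b"station,date,temp\n")
        print(to_xml(file_attribute_metadata(path)))
```

File-content metadata (variables, levels, date range) would need format-specific readers, which is why this sketch stops at attributes the operating system and the first few bytes can provide.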

9 Metadata Collection

10 Metadata Collection
Tools that automatically capture file metadata
- Integrated with archiving activities
Web-based GUI: guided entry of dataset discovery metadata
- Required fields, constrained entries
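The "required fields, constrained entries" idea behind the guided web entry can be sketched as a server-side check. The field names and the tiny vocabularies below are hypothetical stand-ins; real controlled vocabularies such as the GCMD keywords are far larger and are not reproduced here.

```python
# Hypothetical controlled vocabularies (illustrative only, not GCMD lists).
CONTROLLED = {
    "data_format": {"GRIB1", "GRIB2", "netCDF", "ASCII"},
    "platform": {"Radiosondes", "Surface Stations", "Ships"},
}

# Hypothetical fields that must be filled in before an entry is accepted.
REQUIRED = ("title", "summary", "data_format", "platform")

def validate_discovery_metadata(entry):
    """Return a list of problems; an empty list means the entry is acceptable."""
    problems = []
    # Required fields: present and not just whitespace.
    for field in REQUIRED:
        if not str(entry.get(field, "")).strip():
            problems.append(f"missing required field: {field}")
    # Constrained entries: any supplied value must come from its vocabulary.
    for field, allowed in CONTROLLED.items():
        value = entry.get(field)
        if value and value not in allowed:
            problems.append(f"{field}={value!r} is not in the controlled vocabulary")
    return problems
```

In a web GUI the same vocabularies would typically drive drop-down menus, so invalid values rarely reach the server; the check above is the backstop.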

11 Relational Databases

12 Relational Databases
Fast access
Dataset discovery metadata: single database (~0.3M rows)
File attribute metadata: single database (~45M rows); maintains dataset/data file relationships
File content metadata: four databases structured to handle the diversity of data (~920M rows); maintains detailed parameter relationships
All together, these support accurate data discovery
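A toy version of the dataset / data file / file content relationships can be sketched with SQLite. The slides describe separate databases for each metadata type (and four for file content); this sketch collapses them into three tables in one in-memory database, and all table and column names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dataset (
    dsid  TEXT PRIMARY KEY,                          -- hypothetical dataset id
    title TEXT NOT NULL
);
CREATE TABLE datafile (
    file_id INTEGER PRIMARY KEY,
    dsid    TEXT NOT NULL REFERENCES dataset(dsid),  -- dataset/data file link
    name    TEXT NOT NULL,
    size    INTEGER
);
CREATE TABLE file_content (
    file_id    INTEGER NOT NULL REFERENCES datafile(file_id),
    variable   TEXT NOT NULL,
    level      TEXT,
    start_date TEXT,                                 -- ISO 8601 sorts as text
    end_date   TEXT
);
"""

def build_demo_db():
    """Create an in-memory database populated with made-up demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO dataset VALUES ('ds001', 'Demo upper air observations')")
    conn.executemany(
        "INSERT INTO datafile (file_id, dsid, name, size) VALUES (?, ?, ?, ?)",
        [(1, "ds001", "upa_1960.tar", 1000), (2, "ds001", "upa_1961.tar", 1200)],
    )
    conn.executemany(
        "INSERT INTO file_content VALUES (?, ?, ?, ?, ?)",
        [
            (1, "temperature", "500 hPa", "1960-01-01", "1960-12-31"),
            (2, "temperature", "500 hPa", "1961-01-01", "1961-12-31"),
            (2, "wind", "850 hPa", "1961-01-01", "1961-12-31"),
        ],
    )
    return conn

def files_with_variable(conn, dsid, variable):
    """Join file and content tables to list files in a dataset holding a variable."""
    rows = conn.execute(
        """
        SELECT DISTINCT f.name
        FROM datafile AS f
        JOIN file_content AS c ON c.file_id = f.file_id
        WHERE f.dsid = ? AND c.variable = ?
        ORDER BY f.name
        """,
        (dsid, variable),
    )
    return [name for (name,) in rows]
```

Keeping content metadata in its own table (one row per file and parameter) is what lets a query answer "which files hold this variable" without opening the files themselves.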

13 Data Discovery

14 Data Discovery
Dataset discovery:
- Google-like dataset search
- “Look For Data” interface: user-defined dataset catalogs
- Auto-generated dataset pages: always up to date
- Collections: all reanalyses, upper air obs, surface obs
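The slides do not say how the Google-like search ranks results. As one hypothetical approach, the sketch below scores datasets by counting query-term matches in the title and summary, weighting title hits more heavily; the record layout and the weight of 3 are assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(datasets, query, title_weight=3):
    """Rank datasets by query-term frequency; title hits count `title_weight` times.

    `datasets` maps a (hypothetical) dataset id to a dict with 'title' and
    'summary' strings. Returns ids with a nonzero score, best first.
    """
    terms = set(tokenize(query))
    scores = {}
    for dsid, meta in datasets.items():
        title_counts = Counter(tokenize(meta.get("title", "")))
        body_counts = Counter(tokenize(meta.get("summary", "")))
        score = sum(title_weight * title_counts[t] + body_counts[t] for t in terms)
        if score:
            scores[dsid] = score
    # Descending score, then dataset id, for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

A production search would add stemming, controlled-vocabulary keywords, and an index instead of scanning every record, but the ranking idea is the same.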

15 Data Discovery
Data file discovery:
- “Create Your Own List” for data file lists; shows specific files from terabyte-sized collections
Other:
- “Station Viewer”: Google Maps; see stations and their metadata
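A "Create Your Own List" style selection can be sketched as a filter over file-content metadata: keep files that contain the requested variable and whose date range overlaps the requested period. The record fields below are hypothetical, and the real interface likely supports more criteria (levels, formats, and so on).

```python
from datetime import date

def overlaps(start, end, req_start, req_end):
    """True if [start, end] and [req_start, req_end] share at least one day."""
    return start <= req_end and req_start <= end

def create_file_list(files, variable, req_start, req_end):
    """Return names of files containing `variable` within the requested period.

    `files` is a list of hypothetical content-metadata records of the form
    {"name": str, "variables": set of str, "start": date, "end": date}.
    """
    return [
        f["name"]
        for f in files
        if variable in f["variables"]
        and overlaps(f["start"], f["end"], req_start, req_end)
    ]
```

For example, a request for temperature from 1960-12-01 to 1961-01-15 would select both a file covering 1960 and one covering 1961, because each range overlaps the request by at least a day.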

16 Metadata Sharing
OAI-PMH:
- UCAR Community Data Portal (THREDDS)
- Global Change Master Directory (DIF)
- Also Dublin Core and native formats; easy to add others as necessary
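OAI-PMH requires repositories to be able to disseminate records in unqualified Dublin Core (`oai_dc`), so adding an output format is largely a matter of writing a crosswalk from the local metadata. The sketch below renders a local record as an `oai_dc` fragment; the local field names and the mapping are hypothetical, and a complete record would also carry the `xsi:schemaLocation` attribute, omitted here for brevity.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# Serialize with the conventional prefixes instead of ns0/ns1.
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# Hypothetical mapping from local discovery-metadata fields to DC elements.
CROSSWALK = {
    "title": "title",
    "summary": "description",
    "data_format": "format",
    "dsid": "identifier",
}

def to_oai_dc(record):
    """Render a local metadata dict as an oai_dc XML string, skipping empty fields."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for local_field, dc_element in CROSSWALK.items():
        value = record.get(local_field)
        if value:
            ET.SubElement(root, f"{{{DC}}}{dc_element}").text = value
    return ET.tostring(root, encoding="unicode")
```

Supporting another target (DIF, THREDDS, or a native format) would mean adding another mapping and renderer alongside this one, which fits the "easy to add others" point.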

17 Thank You!
Web:
Questions/comments?
