Large-Scale Data Collection Metadata Management at the National Computational Infrastructure (NCI) Jingbo Wang 1, Irina Bastrakova.


1 nci.org.au @NCInews
Large-Scale Data Collection Metadata Management at the National Computational Infrastructure (NCI)
Jingbo Wang (1), Irina Bastrakova (2), Daisy Duursma (3), Ben Evans (1), Kashif Gohar (1), Tim Mackey (2), Julia Martin (4), Matt Paget (5), Gerry Ryder (4), Guru Siddeswara (3), Lesley Wyborn (1)
(1) ANU, (2) Geoscience Australia, (3) TERN, (4) ANDS, (5) CSIRO

2 nci.org.au Overview
Diagram labels: 10 PB+ research data; data services (THREDDS); server-side analysis and visualization; VDI (cloud-scale user desktops on data); web-time analytics software.
Source: Evans et al. 2014 (in press, ISESS). © National Computational Infrastructure 2014

3 nci.org.au NCI Data Collections
Data sources: BOM, GA, CSIRO, ANU, international, other national.
Collection sizes: CMIP5 3 PB; Astronomy (Optical) 200 TB; Water Ocean 1.5 PB; Atmosphere 2.4 PB; Earth Observation 2 PB; Marine Videos 10 TB; Geophysics 300 TB; Weather 340 TB.

4 nci.org.au NCI Data Collections (cont.)
We have established a petascale national data resource that is co-located with high-performance computing.
NCI partners: Australian National University, Bureau of Meteorology, CSIRO and Geoscience Australia.
Support from the Australian Department of Education under Research Data Storage Infrastructure (RDSI).
NCI manages 38+ data collections (10+ PB) in 7 categories: 1) earth system sciences, 2) climate and weather model data assets and products, 3) earth and marine observations and products, 4) geosciences, 5) terrestrial ecosystem, 6) water management and hydrology, 7) others such as astronomy, social science and biosciences.

5 nci.org.au Ingest to availability
Disparate science data collections become curated data collections, with data and metadata ready for data access, in two steps.
Step 1: record a data management plan, including the conditions/licenses of source data; unify the metadata into a single metadata catalogue; record the (true) source of data; record the product description/algorithm; tag with controlled vocabularies.
Step 2: publish to the data services; record all URIs in the data catalogue; expose user-level metadata.
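The two steps above can be read as a checklist applied to each collection record. The following is a minimal sketch of that checklist, not NCI's actual tooling: the `CollectionRecord` class, its field names and both helper functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CollectionRecord:
    """Hypothetical catalogue entry for one data collection."""
    title: str
    license: str = ""               # conditions/licenses of the source data
    source: str = ""                # the (true) source of the data
    product_description: str = ""   # product description / algorithm
    vocabulary_tags: List[str] = field(default_factory=list)
    service_uris: List[str] = field(default_factory=list)  # recorded in step 2


def missing_step1_fields(rec: CollectionRecord) -> List[str]:
    """Return the names of step 1 items that are still empty."""
    missing = []
    if not rec.license.strip():
        missing.append("license")
    if not rec.source.strip():
        missing.append("source")
    if not rec.product_description.strip():
        missing.append("product_description")
    if not rec.vocabulary_tags:
        missing.append("vocabulary_tags")
    return missing


def ready_for_access(rec: CollectionRecord) -> bool:
    """Step 1 is complete and at least one service URI has been recorded."""
    return not missing_step1_fields(rec) and bool(rec.service_uris)
```

A record only counts as ready once every step 1 item is filled in and step 2 has recorded at least one service URI, which mirrors the order of the steps on the slide.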

6 nci.org.au 1. Fill the Data Management Plan
Data Management Plan (DMP) online form (attributes compliant with ISO 19115).
At NCI, collections are the operational form for data and metadata management.
The DMP tool filters out the heterogeneity of data from different sources in different formats.
An ISO 19115-compliant collection-level catalogue is automatically generated from DMPs.
Reference related datasets, and record services for accessing the data.

7 nci.org.au 2. DMPs are mapped to the NCI Catalogue
http://geonetwork.nci.org.au
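Mapping a DMP to an ISO 19115 catalogue record is a crosswalk from form fields to metadata elements. The sketch below shows the idea only. The DMP field names on the left are hypothetical; the element paths on the right are standard ISO 19115/19139 (gmd) locations, but which paths the NCI mapping actually uses is not stated in the slides.

```python
# Hypothetical DMP form fields mapped to ISO 19139 (gmd) element paths.
CROSSWALK = {
    "title": "gmd:identificationInfo/gmd:MD_DataIdentification/"
             "gmd:citation/gmd:CI_Citation/gmd:title",
    "abstract": "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:abstract",
    "license": "gmd:identificationInfo/gmd:MD_DataIdentification/"
               "gmd:resourceConstraints/gmd:MD_LegalConstraints/gmd:useLimitation",
    "lineage": "gmd:dataQualityInfo/gmd:DQ_DataQuality/"
               "gmd:lineage/gmd:LI_Lineage/gmd:statement",
}


def apply_crosswalk(dmp):
    """Split a DMP dict into mapped {element path: value} pairs and unmapped keys."""
    mapped = {}
    unmapped = []
    for key, value in dmp.items():
        path = CROSSWALK.get(key)
        if path is None:
            unmapped.append(key)   # no catalogue element known for this field
        else:
            mapped[path] = value
    return mapped, sorted(unmapped)
```

Reporting the unmapped keys, rather than dropping them silently, makes it visible when the form collects something the catalogue has no place for.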

8 nci.org.au Catalogue system infrastructure
Proposed multi-lens GeoNetwork architecture:
Top-level GeoNetwork: Collection and Series (Collection 1, Collection 2, Collection 3, …).
Lens 1: CSW harvesting and cross-walks (e.g. RIF-CS); full harvest of the metadata.
Lens 2: domain-specific or user deep query, through a full-search GeoNetwork (or domain GeoNetwork) over Dataset 1, Dataset 2, Dataset 3, ….
Lens 3: dataset-specific GeoNetworks (Dataset 1, Dataset 2, Dataset 3, … Dataset n).
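CSW harvesting, as named in Lens 1, usually means paging through a catalogue with GetRecords requests. The sketch below only builds the request URLs using the CSW 2.0.2 key-value parameters; it sends nothing over the network. The endpoint path is an assumption (GeoNetwork installations commonly expose CSW under `/geonetwork/srv/eng/csw`, but the slides give only the host name), and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint path; only the host name appears in the slides.
CSW_ENDPOINT = "http://geonetwork.nci.org.au/geonetwork/srv/eng/csw"


def build_getrecords_url(endpoint, start_position=1, max_records=10, cql=None):
    """Build a CSW 2.0.2 GetRecords URL asking for full ISO (gmd) records."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "full",
        "outputSchema": "http://www.isotc211.org/2005/gmd",
        "startPosition": start_position,   # CSW positions are 1-based
        "maxRecords": max_records,
    }
    if cql:
        # Optional CQL text filter, e.g. "AnyText like '%climate%'"
        params["constraintLanguage"] = "CQL_TEXT"
        params["constraint_language_version"] = "1.1.0"
        params["constraint"] = cql
    return endpoint + "?" + urlencode(params)


def page_starts(total_records, page_size):
    """Start positions needed to harvest total_records in pages of page_size."""
    return list(range(1, total_records + 1, page_size))
```

A harvester would call `build_getrecords_url` once per value from `page_starts`, taking the total from the match count reported in the first response.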

9 nci.org.au 3. Data services and publishing process and in-situ analysis
Diagram labels: data services (THREDDS); server-side analysis and visualization; VDI (cloud-scale user desktops on data).

10 nci.org.au 4. Data citation: DOI minting methodology
NCI data policy (publishing/citation): reflect and interoperate with stakeholder policies and catalogues.
Data collections/series/sets analysed. Questions asked: Who? How? What?
Aspects considered: data source; data ownership; versioning - dynamic data type; granularity; DataCite schema.
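Before a DOI is minted, the metadata has to carry the properties the DataCite schema treats as mandatory. The sketch below checks a record for identifier, creators, titles, publisher and publication year, then formats a citation string loosely following the "Creator (PublicationYear): Title. Publisher. Identifier" pattern DataCite has recommended. The dictionary keys, function names and the example DOI are illustrative assumptions, not NCI's minting code, and later schema versions add further mandatory properties such as resource type.

```python
# Core properties treated as mandatory by the DataCite metadata schema.
MANDATORY = ("identifier", "creators", "titles", "publisher", "publicationYear")


def missing_datacite_properties(record):
    """Return the mandatory property names that are absent or empty."""
    return [name for name in MANDATORY if not record.get(name)]


def format_citation(record):
    """Format 'Creator (PublicationYear): Title. Publisher. doi:Identifier'."""
    missing = missing_datacite_properties(record)
    if missing:
        raise ValueError("cannot cite, missing: " + ", ".join(missing))
    creators = "; ".join(record["creators"])
    return "{} ({}): {}. {}. doi:{}".format(
        creators,
        record["publicationYear"],
        record["titles"][0],     # first title is used as the main title
        record["publisher"],
        record["identifier"],
    )


# Placeholder values only; this is not a minted DOI.
example = {
    "identifier": "10.5072/example-collection",
    "creators": ["Example, A.", "Sample, B."],
    "titles": ["Example Collection"],
    "publisher": "Example Publisher",
    "publicationYear": 2014,
}
```

Granularity and versioning decisions (whether a DOI is assigned at collection, series or dataset level, and how dynamic data are handled) sit outside this check and would determine how many such records are produced.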

11 nci.org.au Data Citation Character Matrix
Characterisation Matrix for each collection: map the landscape.

12 nci.org.au 5. Overall data publishing procedures

13 nci.org.au
Tuesday
Talk: Collaboratively Architecting a Scalable and Adaptable Petascale Infrastructure at NCI, Lesley Wyborn, 5:05 PM - 5:18 PM, Moscone West 2020
Poster: Enabling Data Intensive Science through Virtual Laboratories and Science Gateways, David Lescinsky, 08:00 AM - 12:20 PM, Moscone South Poster Hall
Friday
Talk: Computational Environments and Analysis methods on the NCI HPC & HPD platform, Ben Evans, 1:40 PM - 2:10 PM, Moscone West 2020
Poster: The NCI HPC & HPD Platform for Analysis of Petascale Environmental Collections, Ben Evans, 1:40 PM - 06:00 PM, Moscone South Poster Hall

