VO Sandpit, November 2009 Environmental Data Archival: Practices and Benefits Graham Parton Royal Meteorological Society SIG Meeting,

1 VO Sandpit, November 2009
Environmental Data Archival: Practices and Benefits
Graham Parton
Royal Meteorological Society SIG Meeting, BAS, 5th October 2011: Transmission, presentation and archiving of meteorological data

2 Overview
- What is data archival?
- Why do it?
- How do we do it within CEDA?

3 What do we call “data archival”?
Placing data into a repository which is:
- Backed up
- Robust (able to identify data corruption)
- Catalogued
- A recognised repository
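One common way to make a repository "robust" in the sense above is to record a checksum for every file at ingest and re-verify it later. The following is a minimal sketch of that idea using only the Python standard library; the function names and the choice of SHA-256 are assumptions for illustration, not a description of how CEDA actually checks its archive.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (as a relative POSIX path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_checksum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_corrupted(root, manifest):
    """List manifest entries whose file is missing or no longer matches its checksum."""
    root = Path(root)
    problems = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or file_checksum(target) != expected:
            problems.append(relative_path)
    return problems
```

A manifest built at ingest time can be stored alongside the backup, and `find_corrupted` re-run periodically; any path it returns would then be restored from the backup copy.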

4 Why archive data?
- Making data public: openness of the results and repeatability are essential for scientific rigor
- A place to share data with project participants
- Re-purposing data
- Additional services (often for free!)
- May be required for legal reasons
- Secure
- Get credit
- And because if you don’t…

5 Why archive data?

6 Scale of CEDA operations
- >100,000,000 files holding ~1 PB of data
- ~38,000,000 files downloaded since October 2010
- 19,000+ registered users, of which ~3,600 are currently ‘active’ users
- 250+ datasets
- 26 staff
- Responsible for [...] + other services and projects (e.g. UKCIP, CMIP5 partner)
i.e. we are highly reliant on scripted systems and a well-structured archive

7 [Diagram] Labels: Arrivals, 3rd Party Data providers, Data Suppliers, Ingest, Archive, Backup, External discovery service, Catalogue metadata, External Users, Web service, download, view, discovery

8 Data Preparation
[Diagram] Labels: Arrivals, 3rd Party Data providers, Data Suppliers, Ingest, Archive

9 Data Preparation
- Data Management Plans, including delivery schedules
- Conditions of Use/Licensing
- Support suppliers in data preparation
- Capture supporting documentation (formats, calibration information, flight logs, etc.)
- File naming and archive structure
- Set up ingest routes

10 Data Preparation - File structure
Take the bad data challenge…
File “sw010203”
- What are these data? Guess surface winds, but on what day?
- What are the units?
- Any convention?
- How do we read the file?
- Is this spatial or temporal data?
1440 pairs of data in a file:
4.31 155.3 3.92 136.1 5.15 140.2 4.23 137.1 4.75 150.2 4.71 137.9
4.35 146.5 4.52 138.0 4.83 153.7 5.40 145.8 4.63 141.0 4.90 137.3
4.31 143.3 4.58 157.0 4.94 141.7 4.65 143.1 4.63 143.0 4.88 149.5
5.42 148.5 4.92 140.4 4.04 146.7 3.92 151.5 5.02 135.3 5.06 151.6
4.65 152.3 4.31 168.8 3.79 145.3 5.92 152.9 5.02 145.8 4.77 161.6
4.79 144.1 4.60 147.5 5.33 150.1 4.81 141.0 6.02 146.9 4.38 149.0
4.42 142.5 4.58 133.4 4.35 150.5 4.96 149.8 5.56 143.4 5.08 148.5
5.19 141.6 4.40 142.4 4.10 152.6 5.02 134.0 4.94 142.9 5.27 144.4
5.38 141.5 5.88 144.8 6.00 140.1 4.75 158.3 5.08 148.1 5.46 163.5
4.27 150.8 4.69 138.8 5.71 144.0 5.21 138.8 5.00 132.4 5.06 144.4

11 Supported Formats
- Highly structured metadata
- Standard Names
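To illustrate what structured metadata and standard names add to a bare file of number pairs such as “sw010203”, here is a minimal sketch that wraps the values in a self-describing record. Everything about the meaning of the columns is an assumption made for the example: the slide itself only guesses "surface winds", so treating the pairs as speed and direction, and the units chosen, are hypothetical. The names `wind_speed` and `wind_from_direction` are taken from the CF standard name table; JSON is used only to keep the sketch dependency-free, where a real archive would more likely use a supported format such as NetCDF.

```python
import json

# First three pairs from the example file; what they mean is NOT stated in the source.
RAW = "4.31 155.3 3.92 136.1 5.15 140.2"


def parse_pairs(text):
    """Split whitespace-separated numbers into (first, second) tuples."""
    values = [float(token) for token in text.split()]
    if len(values) % 2:
        raise ValueError("expected an even number of values")
    return list(zip(values[0::2], values[1::2]))


def describe(pairs):
    """Attach names and units so the data no longer has to be guessed at.

    ASSUMPTION: column 1 is wind speed in m s-1 and column 2 is the direction
    the wind blows from, in degrees. The original file gives no such information.
    """
    return {
        "global": {
            "source_file": "sw010203",
            "comment": "Column meanings and units are assumed for illustration.",
        },
        "variables": {
            "wind_speed": {
                "standard_name": "wind_speed",
                "units": "m s-1",
                "values": [speed for speed, _ in pairs],
            },
            "wind_from_direction": {
                "standard_name": "wind_from_direction",
                "units": "degree",
                "values": [direction for _, direction in pairs],
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(describe(parse_pairs(RAW)), indent=2))
```

With the names and units carried in the record itself, most of the questions in the bad data challenge can be answered by reading the file rather than by asking its creator; the date and sampling interval would need the same treatment.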

12 Data Discovery
[Diagram] Labels: Arrivals, 3rd Party Data providers, Data Suppliers, Ingest, Archive, External discovery service, Catalogue metadata, External Users, Web service, discovery

13 CEDA Catalogue

14 NERC Data Discovery Service

15 CEDA Document Repository

16 Citations for Data Creators: DOIs
- Citation (and DOI)
- Data Citation and DOI
… but only if in a recognised repository

17 Data Services
[Diagram] Labels: Arrivals, 3rd Party Data providers, Data Suppliers, Ingest, Archive, External discovery service, Catalogue metadata, External Users, Web service, download, view, discovery

18 Visualisation Services

19 Visualisation Services: ISIC Video Wall

20 Visualisation Services

21 Processing Services
CEDA WPS:
- Chain services together
- Download result
- Job either run straight away or sent to run on a backend service

22 Processing Services: Trajectory Service

23 OPeNDAP Service
- With security layer
- Navigable and scriptable interface to archive
- CEDA has applied a security shell using “OpenID” technology
- Gives a powerful sub-setting service for large datasets
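The sub-setting mentioned above works in OPeNDAP (DAP2) by appending a constraint expression to the dataset URL, with each dimension given as an inclusive `[start:stride:stop]` index range. The sketch below only builds such a URL; the server address and variable name are hypothetical, and it does not attempt the authentication step that a secured service such as CEDA's would require.

```python
from urllib.parse import quote


def hyperslab(start, stop, stride=1):
    """Format one dimension of a DAP2 array subset as [start:stride:stop] (inclusive)."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def subset_url(dataset_url, variable, slices, response="ascii"):
    """Build a DAP2 request URL for part of one variable.

    slices is a list of (start, stop) or (start, stop, stride) tuples, one per
    dimension. response is the DAP2 suffix, e.g. "ascii", "dods", "dds" or "das".
    """
    constraint = variable + "".join(hyperslab(*dims) for dims in slices)
    # Keep the bracket and colon characters readable; escape anything unusual.
    return f"{dataset_url}.{response}?{quote(constraint, safe='[]:,')}"


if __name__ == "__main__":
    # Hypothetical dataset and variable: the first 60 values of a 1-D wind_speed array.
    print(subset_url("https://opendap.example.org/data/winds.nc", "wind_speed", [(0, 59)]))
```

Because only the requested index range is transferred, a script can pull a small slice out of a very large file without downloading the whole thing, which is the benefit the slide points to.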

24 What’s on the horizon?
- Continue to develop visualisation and data processing services
- Increasing data volumes are becoming too large to move around
- Hosting services: provide virtual environments for people to work on the data without downloading
- From Petascale to Exascale
But all this NEEDS well-structured data that uses standards-driven metadata and formats

25 Take Home Messages
- Team Digital Preservation video
- Plan for data management
- Tap into standards when preparing data
- Get data catalogued for data discovery
- Data in supported repositories leads to recognition for efforts preparing data
- A suite of additional services adds value to existing data
