Published by Marcus Huntington. Modified over 10 years ago.
1
VO Sandpit, November 2009. Environmental Data Archival: Practices and Benefits. Graham Parton, graham.parton@stfc.ac.uk. Royal Meteorological Society SIG Meeting, BAS, 5th October 2011: Transmission, presentation and archiving of meteorological data
2
Overview: What is data archival? Why do it? How do we do it within CEDA?
3
What do we call “data archival”? Placing data into a repository which is: backed up; robust (identifies data corruptions); catalogued; a recognised repository.
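One common way to make a repository "robust" in the sense of identifying data corruptions is to record a checksum for each file at ingest and re-check it later. The slides do not say how CEDA does this; the sketch below is only an illustration, and the function names `sha256_of` and `is_intact` and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large archive files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, recorded_digest: str) -> bool:
    """True if the file still matches the digest recorded at ingest (hypothetical workflow)."""
    return sha256_of(path) == recorded_digest
```

A digest stored alongside the catalogue record at ingest time could then be compared against the file on disk, or against a backup copy, on a schedule.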
4
Why archive data: making data public (openness of the result and repeatability are essential for scientific rigor); a place to share data with project participants; re-purposing data; additional services (often for free!); may be required for legal reasons; secure; get credit. And because if you don’t….
5
Why archive data
6
Scale of CEDA operations: >100,000,000 files holding ~1 PB of data; ~38,000,000 files downloaded since October 2010; 19,000+ registered users, of which ~3,600 are currently ‘active’ users; 250+ datasets; 26 staff; responsible for [items not preserved in this transcript] + other services and projects (e.g. UKCIP, CMIP5 partner) … i.e. we are highly reliant on scripted systems and a well-structured archive.
7
[Diagram: CEDA data flow. Labels: Data Suppliers; 3rd Party Data providers; Arrivals; Ingest; Archive; Backup; Catalogue metadata; External discovery service; Web service; download; view; discovery; External Users]
8
[Diagram: CEDA data flow, Data Preparation. Labels: Data Suppliers; 3rd Party Data providers; Arrivals; Ingest; Archive]
9
Data Preparation: Data Management Plans, including delivery schedules; conditions of use/licensing; support suppliers in data preparation; capture supporting documentation (formats, calibration information, flight logs, etc.); file naming and archive structure; set up ingest routes.
10
Data Preparation - File structure: Take the bad data challenge…. File “sw010203”, 1440 pairs of data in a file. What are these data? Guess surface winds, but on what day? What are the units? Any convention? How do we read the file? Is this spatial or temporal data?...
Sample of file contents: 4.31 155.3 3.92 136.1 5.15 140.2 4.23 137.1 4.75 150.2 4.71 137.9 4.35 146.5 4.52 138.0 4.83 153.7 5.40 145.8 4.63 141.0 4.90 137.3 4.31 143.3 4.58 157.0 4.94 141.7 4.65 143.1 4.63 143.0 4.88 149.5 5.42 148.5 4.92 140.4 4.04 146.7 3.92 151.5 5.02 135.3 5.06 151.6 4.65 152.3 4.31 168.8 3.79 145.3 5.92 152.9 5.02 145.8 4.77 161.6 4.79 144.1 4.60 147.5 5.33 150.1 4.81 141.0 6.02 146.9 4.38 149.0 4.42 142.5 4.58 133.4 4.35 150.5 4.96 149.8 5.56 143.4 5.08 148.5 5.19 141.6 4.40 142.4 4.10 152.6 5.02 134.0 4.94 142.9 5.27 144.4 5.38 141.5 5.88 144.8 6.00 140.1 4.75 158.3 5.08 148.1 5.46 163.5 4.27 150.8 4.69 138.8 5.71 144.0 5.21 138.8 5.00 132.4 5.06 144.4
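The "on what day?" question can be made concrete. The sketch below tries a few plausible date layouts on the six digits in “sw010203” and reports every reading that parses as a valid date. The three layouts and the function name `possible_dates` are assumptions chosen for illustration; the slides do not say which layout, if any, the file's author intended.

```python
from datetime import date, datetime

# Hypothetical candidate layouts for the six digits in a name like "sw010203".
CANDIDATE_LAYOUTS = {
    "YYMMDD": "%y%m%d",
    "DDMMYY": "%d%m%y",
    "MMDDYY": "%m%d%y",
}


def possible_dates(filename: str) -> dict:
    """Map each candidate layout to the date it yields, skipping invalid readings."""
    digits = filename[-6:]
    readings = {}
    for label, layout in CANDIDATE_LAYOUTS.items():
        try:
            readings[label] = datetime.strptime(digits, layout).date()
        except ValueError:
            # This layout does not give a real calendar date for these digits.
            continue
    return readings


if __name__ == "__main__":
    for label, day in possible_dates("sw010203").items():
        print(label, day.isoformat())
```

For “sw010203” all three layouts parse, giving 2001-02-03, 2003-02-01 and 2003-01-02, so the name alone cannot tell a reader which day the data cover.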
11
Supported Formats: highly structured metadata; Standard Names.
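As a rough illustration of what structured metadata with standard names buys, the sketch below checks a metadata record for the fields a reader would need to interpret a file of number pairs. The record layout, the required keys and the function name `missing_metadata` are all hypothetical and are not a CEDA format. The example values assume the pairs are wind speed and direction sampled once a minute, which is only a guess; `wind_speed` and `wind_from_direction` are names from the CF standard name table.

```python
# Hypothetical minimum set of fields; real formats define their own requirements.
REQUIRED_FILE_KEYS = ("title", "start_time_utc", "sampling_interval_s")
REQUIRED_VARIABLE_KEYS = ("standard_name", "units")


def missing_metadata(record: dict) -> list:
    """Return the names of required metadata fields that the record lacks."""
    problems = [key for key in REQUIRED_FILE_KEYS if key not in record]
    for name, attrs in record.get("variables", {}).items():
        for key in REQUIRED_VARIABLE_KEYS:
            if key not in attrs:
                problems.append(f"variables.{name}.{key}")
    return problems


# Example record: every value here is an illustrative assumption.
described = {
    "title": "Surface wind observations (illustrative)",
    "start_time_utc": "2001-02-03T00:00:00Z",
    "sampling_interval_s": 60,
    "variables": {
        "column_1": {"standard_name": "wind_speed", "units": "m s-1"},
        "column_2": {"standard_name": "wind_from_direction", "units": "degree"},
    },
}

# A bare file of number pairs carries none of this information.
undescribed = {"variables": {"column_1": {}, "column_2": {}}}
```

A check of this kind could run at ingest so that files arriving without units or standard names are flagged before they reach the archive.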
12
[Diagram: CEDA data flow, Data Discovery. Labels: Data Suppliers; 3rd Party Data providers; Arrivals; Ingest; Archive; Catalogue metadata; External discovery service; Web service; discovery; External Users]
13
CEDA Catalogue
14
NERC Data Discovery Service: data-search.nerc.ac.uk
15
CEDA Document Repository: cedadocs.badc.rl.ac.uk
16
Citations for Data Creators: DOIs. Citation (and DOI); data citation and DOI… but only if in a recognised repository.
17
[Diagram: CEDA data flow, Data Services. Labels: Data Suppliers; 3rd Party Data providers; Arrivals; Ingest; Archive; Catalogue metadata; External discovery service; Web service; download; view; discovery; External Users]
18
Visualisation Services
19
Visualisation Services: ISIC Video Wall
20
Visualisation Services
21
Processing Services: CEDA WPS (ceda-wps2.badc.rl.ac.uk/ui/home). Chain services together; download result; job either run straight away or sent to run on a backend service.
22
Processing Services: Trajectory Service
23
OPeNDAP Service, with security layer: navigable and scriptable interface to the archive; CEDA has applied a security shell using “OpenID” technology; gives a powerful sub-setting service for large datasets.
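To show what "scriptable" sub-setting can look like, the sketch below builds an OPeNDAP (DAP2) request URL whose constraint expression selects an index range of one variable, using the `[start:stride:stop]` hyperslab notation. The host, dataset path and variable name in the usage note are hypothetical placeholders, not CEDA endpoints, and the OpenID security layer mentioned on the slide is not handled here.

```python
def hyperslab(start: int, stop: int, stride: int = 1) -> str:
    """Format one DAP2 index range as [start:stride:stop]; stop is inclusive."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def subset_url(dataset_url: str, variable: str, ranges, response: str = "ascii") -> str:
    """Build a DAP2 request URL for part of one variable.

    `ranges` holds one (start, stop) or (start, stop, stride) tuple per
    dimension of the variable, in the variable's own dimension order.
    """
    constraint = variable + "".join(hyperslab(*index_range) for index_range in ranges)
    return f"{dataset_url}.{response}?{constraint}"


if __name__ == "__main__":
    # Hypothetical dataset URL and variable name, for illustration only.
    print(subset_url("https://opendap.example.org/data/winds.nc", "wind_speed", [(0, 59)]))
```

Only the requested slice is sent by the server, which is the point of sub-setting for large datasets. Some HTTP clients require the square brackets to be percent-encoded before the request is made.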
24
What’s on the horizon? Continue to develop visualisation and data processing services; increasing data volumes are becoming too large to move around; hosting services: provide virtual environments for people to work on the data without downloading; from Petascale to Exascale. But all this NEEDS well-structured data that uses standards-driven metadata and formats.
25
Take Home Messages: Team Digital Preservation video; plan for data management; tap into standards when preparing data; get data catalogued for data discovery; data in supported repositories leads to recognition for efforts preparing data; a suite of additional services adds value to existing data.