Published by Alan Peters. Modified over 6 years ago.
1
Data Flows in ACTRIS: Considerations for Planning the Future
Markus Fiebig and the EBAS team
NILU - Norwegian Institute for Air Research
2
Requirements for a Research Infrastructure Workflow I
Operational infrastructure places different demands than scientific work:
- Maintenance: easy, fast, cost efficient
  - Minimise the number of points of failure.
  - Ensure fast access to components to be maintained.
- Homogeneous data processing
  - Ensure / prove that all data are comparable, i.e. processed in exactly the same way.
- Document what has been done
  - “Where does this point in the IPCC come from?”
  - Every data (pre-)product and processing tool must be identified, versioned, and archived.
Consequence: You don’t set up the same service several times unless you have a very good reason.
Conclusion may vary between NRT and manually QAed best quality data.
3
Requirements for a Research Infrastructure Workflow II
- Data flows need to be well-defined, i.e. they must not contain ambiguities.
- All steps in a data processing / curation workflow need to be defined, and their implementation quality assured.
- All steps in data processing need to be documented.
- The provenance (which processing steps have the data been through?) has to travel with the data.
- The roles in executing the workflow need to be defined (who is doing what, and when?).
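One way to let provenance travel with the data is to append a record to the dataset's metadata every time a processing step runs. The following is a minimal sketch of that idea, not the EBAS implementation; the `Dataset` class, the `apply_step` helper, and the field names `step`, `software_version`, and `executed_by` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Dataset:
    """Data values plus the provenance records that travel with them."""
    values: List[float]
    provenance: List[dict] = field(default_factory=list)


def apply_step(ds: Dataset, name: str, version: str, role: str,
               func: Callable[[List[float]], List[float]]) -> Dataset:
    """Run one processing step and append a provenance record for it."""
    # Hypothetical record layout: what was done, with which software
    # version, and which role in the workflow executed it.
    record = {"step": name, "software_version": version, "executed_by": role}
    # Return a new dataset so the input and its provenance stay unchanged.
    return Dataset(values=func(ds.values),
                   provenance=ds.provenance + [record])


raw = Dataset(values=[1.0, 2.0, 3.0])
scaled = apply_step(raw, "scale", "1.0.0", "data centre",
                    lambda v: [2 * x for x in v])
```

Because each step returns a new `Dataset`, earlier pre-products keep their own, shorter provenance list and can be archived unchanged.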
4
Current ACTRIS Surface In-Situ NRT Workflow
[Workflow diagram; elements as labelled on the slide]
Path via sub-network data centre:
- Station: collects raw data in custom format.
- Transfer.
- Sub-network data centre: auto-creates hourly data files (level 0); initiates auto-upload to NRT server.
- FTP transfer to data centre.
Direct path:
- Station: auto-creates hourly data files (level 0); initiates auto-upload to NRT server.
- FTP transfer to data centre.
Data Centre:
- Check for correct data format (level 0).
- Check whether data stay within specified boundaries (sanity check).
- Automatic feedback (shown twice in the diagram).
- Processing to level 1: hourly level 1 data file.
- Processing to level 1.5: hourly level 1.5 data file.
- EBAS database.
User access:
- Restricted, via web-interface: ebas.nilu.no
- Via machine-to-machine web-service.
Problems with this workflow diagram:
- Not well-defined.
- Products and actions aren’t properly separated.
- Doesn’t use correct symbols.
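The sanity check at the data centre (do the data stay within specified boundaries?) can be illustrated with a short sketch. The variable names and limits in `BOUNDS` are invented for illustration; the slides do not state the actual boundaries or how feedback messages are worded.

```python
from typing import Dict, List, Tuple

# Hypothetical boundaries per variable; real limits would be
# specified per instrument and are not given in the slides.
BOUNDS: Dict[str, Tuple[float, float]] = {
    "scattering_coefficient": (-10.0, 5000.0),
    "temperature": (200.0, 330.0),
}


def sanity_check(record: Dict[str, float]) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for name, (low, high) in BOUNDS.items():
        if name not in record:
            # Format check: every expected variable must be present.
            problems.append(f"missing variable: {name}")
        elif not (low <= record[name] <= high):
            # Boundary check; NaN also fails this comparison.
            problems.append(f"{name}={record[name]} outside [{low}, {high}]")
    return problems


ok = sanity_check({"scattering_coefficient": 12.0, "temperature": 280.0})
bad = sanity_check({"scattering_coefficient": 12.0, "temperature": 400.0})
```

The returned list could serve as the content of the automatic feedback to the station.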
5
Future ACTRIS Surface In-Situ NRT Workflow
Level 0 data:
- Raw as provided by instrument
- Instrument specific
- Metadata attached

Reg branch
Manual processing:
- Manual assignment of flags
- Manual calibration correction
Level 0a data:
- Raw as provided by instrument
- Instrument specific
- Metadata attached
- Manual corr. / flagging applied

NRT branch
Automatic processing:
- Automatic assignment of flags
- No calibration correction
Level 0b data:
- Raw as provided by instrument
- Instrument specific
- Metadata attached
- Automatic flagging applied
6
Future ACTRIS Surface In-Situ NRT Workflow Cont’d
Reg and NRT branches use an identical algorithm.
Raw processing (both branches):
- Calculation of targeted property
- Remove instrument parameters and instrument failure periods

Reg branch
Level 1a data:
- Final targeted property
- Only valid data
- Original time res.
- Manual corr. / flagging applied

NRT branch
Level 1b data:
- Final targeted property
- Only valid data
- Original time res.
- Automatic flagging applied
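The point of the "identical algorithm" label is that a single implementation serves both branches, so level 1a and level 1b differ only in how the input was flagged. A minimal sketch under assumed names follows: the record fields `raw_signal`, `cal_factor`, and `flag`, and the multiplication used as the "targeted property", are placeholders and not the actual instrument processing.

```python
from typing import Dict, List


def raw_processing(level0: List[Dict]) -> List[Dict]:
    """Turn level 0 records into level 1 records (same code for both branches)."""
    level1 = []
    for rec in level0:
        if rec["flag"] != "valid":
            # Remove instrument failure periods.
            continue
        level1.append({
            "time": rec["time"],
            # Placeholder calculation of the targeted property.
            "value": rec["raw_signal"] * rec["cal_factor"],
        })  # Instrument parameters are deliberately not copied over.
    return level1


# Level 0a (manually flagged) and level 0b (automatically flagged) input.
level0a = [
    {"time": 0, "raw_signal": 2.0, "cal_factor": 1.5, "flag": "valid"},
    {"time": 1, "raw_signal": 9.0, "cal_factor": 1.5, "flag": "failure"},
]
level0b = [
    {"time": 0, "raw_signal": 2.0, "cal_factor": 1.5, "flag": "valid"},
    {"time": 1, "raw_signal": 9.0, "cal_factor": 1.5, "flag": "valid"},
]

level1a = raw_processing(level0a)
level1b = raw_processing(level0b)
```

In this example the two outputs differ only because the manual flagging marked the second record as a failure period; the processing code itself is shared.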
7
Future ACTRIS Surface In-Situ NRT Workflow Cont’d
Reg and NRT branches use an identical algorithm.
Time averaging (both branches):
- Calculate hourly means.
- Disregard invalid data.
- Add coverage flags.
- Copy env. cond. flags occurring in ave. period.

Reg branch
Level 2 data:
- Final targeted property
- Hourly averaged
- Coverage & env. cond. flags
- Manual corr. / flagging applied

NRT branch
Level 1.5 data:
- Final targeted property
- Hourly averaged
- Coverage & env. cond. flags
- Automatic flagging applied
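The time-averaging step can be sketched as follows. This is an illustration under assumptions, not the ACTRIS algorithm: the sample layout (`time`, `value`, `valid`, `env_flags`), the flag name `low_coverage`, the 50 % threshold in `MIN_COVERAGE`, and the choice to copy environmental flags from every sample in the hour are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List

MIN_COVERAGE = 0.5  # assumed threshold for raising a coverage flag


def hourly_means(samples: List[Dict], expected_per_hour: int) -> List[Dict]:
    """Average valid samples per hour and attach coverage / env. flags."""
    buckets = defaultdict(list)
    for s in samples:
        # Truncate the timestamp to the start of its hour.
        hour = s["time"].replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(s)

    result = []
    for hour in sorted(buckets):
        group = buckets[hour]
        # Disregard invalid data when computing the mean.
        valid = [s["value"] for s in group if s["valid"]]
        coverage = len(valid) / expected_per_hour
        # Copy environmental condition flags occurring in the period.
        flags = sorted({f for s in group for f in s["env_flags"]})
        if coverage < MIN_COVERAGE:
            flags.append("low_coverage")
        result.append({
            "time": hour,
            "mean": sum(valid) / len(valid) if valid else None,
            "coverage": coverage,
            "flags": flags,
        })
    return result


samples = [
    {"time": datetime(2024, 1, 1, 10, 0), "value": 1.0, "valid": True, "env_flags": []},
    {"time": datetime(2024, 1, 1, 10, 20), "value": 3.0, "valid": True, "env_flags": ["dust"]},
    {"time": datetime(2024, 1, 1, 10, 40), "value": 99.0, "valid": False, "env_flags": []},
    {"time": datetime(2024, 1, 1, 11, 0), "value": 5.0, "valid": True, "env_flags": []},
]
hourly = hourly_means(samples, expected_per_hour=3)
```

Running the same function on level 1a and level 1b input would yield the level 2 and level 1.5 products respectively, which keeps the two branches comparable.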
8
How to Use Workflow to Meet Requirements
- Use Persistent Identifiers (PIDs) to tag data pre-products and processing software (each version).
- Use DOIs to identify final data products.
- Have versioned archives for both data (pre-)products and the software used for data processing.
- If the same processing is done in different locations, guarantee that the result is identical to ensure homogeneity.
- Include provenance information in metadata, i.e. metadata states the data pre-products and software version used for processing.
- This applies to ALL data and software used ANYWHERE in the infrastructure.
When setting up / updating workflows and their diagrams:
- Distinguish between data (sub)products, processing steps, and decisions.
- Use correct diagram symbols.
- Remove ambiguities from workflows.
Discussion: Distribute roles in the infrastructure so as to achieve these aims most efficiently.
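A short sketch of how identifiers and provenance metadata could fit together, and how identical results from two locations could be verified. A SHA-256 content hash stands in here for a PID; a real PID or DOI would be minted and resolved through a registry, which this sketch does not model. The `process` function, the `SOFTWARE` description, and the metadata keys are hypothetical.

```python
import hashlib
import json


def content_id(obj) -> str:
    """Deterministic identifier: SHA-256 over a canonical JSON encoding."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def process(values):
    """Stand-in processing step; rounding keeps the output reproducible."""
    return [round(v * 1.5, 6) for v in values]


# Hypothetical description of one software version.
SOFTWARE = {"name": "example-processor", "version": "1.2.0"}

level0 = [1.0, 2.0, 4.0]
result_site_a = process(level0)        # processing at one location
result_site_b = process(list(level0))  # same processing elsewhere

# Provenance metadata: which pre-product and software version were used.
product_metadata = {
    "input_id": content_id(level0),
    "software_id": content_id(SOFTWARE),
    "output_id": content_id(result_site_a),
}

# Homogeneity check: both locations must produce the same product.
identical = content_id(result_site_a) == content_id(result_site_b)
```

Comparing identifiers instead of full data files makes the homogeneity check cheap enough to run routinely, though it only detects differences and does not explain them.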