Data Flows in ACTRIS: Considerations for Planning the Future

Presentation transcript:

Data Flows in ACTRIS: Considerations for Planning the Future
Markus Fiebig and the EBAS team
NILU - Norwegian Institute for Air Research

Requirements for a Research Infrastructure Workflow I
An operational infrastructure sets different demands than scientific work:
- Maintenance: easy, fast, cost-efficient.
  - Minimise the number of points of failure.
  - Ensure fast access to components to be maintained.
- Homogeneous data processing.
  - Ensure / prove that all data are comparable, i.e. processed in exactly the same way.
- Document what has been done.
  - "Where does this data point in the IPCC report come from?"
  - Every data (pre-)product and processing tool must be identified, versioned, and archived (see the sketch after this slide).
Consequence: you don't set up the same service several times unless you have a very good reason. The conclusion may vary between NRT data and manually QA'ed best-quality data.
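To make the "identified, versioned, and archived" requirement concrete, here is a minimal sketch of a versioned pre-product archive in Python. Everything in it (the archive location, the function name, the manifest layout) is an illustrative assumption, not part of the actual EBAS/ACTRIS tooling.

```python
# Minimal sketch of versioned archiving of a data pre-product, assuming a
# simple file-based archive; ARCHIVE_ROOT and archive_preproduct are
# hypothetical names, not actual EBAS/ACTRIS components.
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

ARCHIVE_ROOT = Path("/archive/preproducts")  # hypothetical location

def archive_preproduct(src: Path, product_id: str, version: str) -> Path:
    """Store an immutable, versioned copy of a data pre-product."""
    dest_dir = ARCHIVE_ROOT / product_id / version
    dest_dir.mkdir(parents=True, exist_ok=False)  # an existing version is never overwritten
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    # Record a checksum and timestamp so the archived product can be
    # unambiguously identified later.
    manifest = {
        "product_id": product_id,
        "version": version,
        "sha256": hashlib.sha256(dest.read_bytes()).hexdigest(),
        "archived_at": datetime.now(timezone.utc).isoformat(),
    }
    (dest_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return dest
```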

Requirements for a Research Infrastructure Workflow II
- Data flows need to be well-defined, i.e. they must not contain ambiguities.
- All steps in a data processing / curation workflow need to be defined, and their implementation quality-assured.
- All steps in data processing need to be documented.
- The provenance (which processing steps has the data been through?) has to travel with the data (a minimal sketch of such a record follows below).
- The roles in executing the workflow need to be defined (who is doing what, and when?).
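As an illustration of provenance travelling with the data, here is a minimal sketch of a dataset object that records every processing step applied to it. The class and field names are hypothetical, chosen only to mirror the requirements above.

```python
# Minimal sketch of a provenance record that travels with a dataset,
# assuming a simple in-memory model; all names are illustrative.
from dataclasses import dataclass, field

@dataclass
class ProcessingStep:
    name: str          # e.g. "sanity check" or "hourly averaging"
    software: str      # identifier of the processing tool
    version: str       # version of the tool that was run
    operator: str      # role that executed the step (station, data centre, ...)

@dataclass
class Dataset:
    level: str                       # e.g. "0", "1", "1.5"
    values: list
    provenance: list = field(default_factory=list)

    def apply(self, step: ProcessingStep, func) -> "Dataset":
        """Run a processing step and record it in the provenance chain."""
        return Dataset(
            level=self.level,
            values=func(self.values),
            provenance=[*self.provenance, step],
        )
```

Each call to apply() returns a new dataset whose provenance chain documents who ran which tool, in which version, answering "which processing steps has the data been through?" directly from the metadata.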

Current ACTRIS Surface In-Situ NRT Workflow
- Station: collects raw data in a custom format. Depending on the sub-network, the station either auto-creates hourly data files (level 0) and initiates the auto-upload to the NRT server itself, or transfers the raw data to a sub-network data centre, which auto-creates the hourly level 0 files and initiates the auto-upload.
- FTP transfer to the data centre.
- Data centre: checks for correct data format (level 0); checks whether the data stay within specified boundaries (sanity check); returns automatic feedback to the submitter.
- Processing to level 1 (hourly level 1 data file) and to level 1.5 (hourly level 1.5 data file); both are stored in the EBAS database.
- User access (restricted) via the web interface ebas.nilu.no, and via a machine-to-machine web service.
Shortcomings of this workflow: it is not well-defined, products and actions are not properly separated, and the diagram does not use correct symbols.
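Below is a minimal sketch of the data-centre checks described above, i.e. the format check and the boundary (sanity) check on incoming level 0 files. The expected columns and bounds are illustrative assumptions; the real EBAS ingest rules are instrument-specific.

```python
# Minimal sketch of the level-0 ingest checks (format check and sanity
# check); column names and bounds are illustrative assumptions.
EXPECTED_COLUMNS = ["start_time", "end_time", "value"]  # hypothetical level-0 layout
SANITY_BOUNDS = (-10.0, 10000.0)                        # hypothetical physical limits

def check_level0(rows: list) -> list:
    """Return a list of problems found; an empty list means the file passes."""
    problems = []
    for i, row in enumerate(rows):
        missing = [c for c in EXPECTED_COLUMNS if c not in row]
        if missing:
            problems.append(f"row {i}: missing column(s) {missing}")
            continue
        lo, hi = SANITY_BOUNDS
        if not (lo <= row["value"] <= hi):
            problems.append(f"row {i}: value {row['value']} outside [{lo}, {hi}]")
    return problems

# The automatic feedback step in the workflow would then report `problems`
# back to the submitter, e.g. by e-mail.
```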

Future ACTRIS Surface In-Situ NRT Workflow
Level 0 data (raw as provided by the instrument, instrument-specific, metadata attached) enter two branches, regular (Reg) and NRT:
- Reg branch, manual processing: manual assignment of flags, manual calibration correction. Result: level 0a data (raw as provided by the instrument, instrument-specific, metadata attached, manual corrections / flagging applied).
- NRT branch, automatic processing: automatic assignment of flags, no calibration correction. Result: level 0b data (raw as provided by the instrument, instrument-specific, metadata attached, automatic flagging applied).
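A minimal sketch of the automatic flagging on the NRT branch follows. The flag codes and the simple spike test are illustrative assumptions, not the actual ACTRIS/EBAS flagging scheme.

```python
# Minimal sketch of automatic flag assignment (NRT branch, level 0 -> 0b);
# flag codes and the spike threshold are illustrative assumptions.
from typing import Optional

def auto_flag(values: list) -> list:
    """Assign a flag per value: 0 = valid, 999 = missing, 456 = suspected spike."""
    flags = []
    for i, v in enumerate(values):
        if v is None:
            flags.append(999)                 # hypothetical "missing" code
        elif i > 0 and values[i - 1] is not None and abs(v - values[i - 1]) > 100.0:
            flags.append(456)                 # hypothetical spike threshold and code
        else:
            flags.append(0)
    return flags
```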

Future ACTRIS Surface In-Situ NRT Workflow Cont'd
Both branches apply raw processing with an identical algorithm: calculation of the targeted property, and removal of instrument parameters and instrument-failure periods.
- Reg branch: level 0a data become level 1a data (final targeted property, only valid data, original time resolution, manual corrections / flagging applied).
- NRT branch: level 0b data become level 1b data (final targeted property, only valid data, original time resolution, automatic flagging applied).
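The "identical algorithm" requirement is easiest to meet by implementing the shared processing once and calling it from both branches, as in this sketch. The property calculation shown (a simple scale factor) is a placeholder for the real, instrument-specific computation.

```python
# Minimal sketch of keeping the two branches homogeneous: both call the same
# raw-processing function, so the algorithms cannot drift apart.
def raw_process(values, flags, scale=1.0):
    """Compute the targeted property and drop invalid (flagged) records."""
    return [v * scale for v, f in zip(values, flags) if f == 0]

# Example data standing in for the level 0a / 0b pre-products:
level_0a_values = [1.0, 2.0, 3.0]
manual_flags = [0, 456, 0]          # Reg branch: manually assigned flags
auto_flags_list = [0, 0, 999]       # NRT branch: automatically assigned flags

level_1a = raw_process(level_0a_values, manual_flags)     # Reg branch -> [1.0, 3.0]
level_1b = raw_process(level_0a_values, auto_flags_list)  # NRT branch -> [1.0, 2.0]
```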

Future ACTRIS Surface In-Situ NRT Workflow Cont'd
Both branches apply time averaging with an identical algorithm: calculate hourly means, disregard invalid data, add coverage flags, and copy environmental-condition flags occurring in the averaging period.
- Reg branch: level 1a data become level 2 data (final targeted property, hourly averaged, coverage & environmental-condition flags, manual corrections / flagging applied).
- NRT branch: level 1b data become level 1.5 data (final targeted property, hourly averaged, coverage & environmental-condition flags, automatic flagging applied).
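A minimal sketch of the hourly-averaging step with flag propagation follows. The flag codes, the coverage threshold, and the treatment of environmental-condition flags are illustrative assumptions.

```python
# Minimal sketch of hourly averaging with flag propagation; all codes and
# thresholds are illustrative assumptions. Environmental-condition flags are
# treated here as marking valid data measured under special conditions.
ENV_COND_FLAGS = {640, 652}   # hypothetical environmental-condition flag codes
COVERAGE_FLAG = 390           # hypothetical "low data coverage" flag code
MIN_COVERAGE = 0.5            # assumption: >= 50 % valid points for full coverage

def hourly_mean(values, flags):
    """Average one hour of data, disregarding invalid records and copying flags."""
    valid = [v for v, f in zip(values, flags) if f == 0 or f in ENV_COND_FLAGS]
    if not valid:
        return None, {999}                    # nothing valid: mean is missing
    # Copy environmental-condition flags occurring in the averaging period.
    out_flags = {f for f in flags if f in ENV_COND_FLAGS}
    # Add a coverage flag when too few valid points remain.
    if len(valid) / len(values) < MIN_COVERAGE:
        out_flags.add(COVERAGE_FLAG)
    return sum(valid) / len(valid), out_flags
```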

How to Use Workflow to Meet Requirements
- Use persistent identifiers (PIDs) to tag data pre-products and processing software (each version).
- Use DOIs to identify final data products.
- Maintain versioned archives for both data (pre-)products and data-processing software.
- If the same processing is done in different locations, guarantee that the result is identical, to ensure homogeneity.
- Include provenance information in the metadata, i.e. the metadata state which data pre-products and software versions were used for processing (a minimal sketch follows below). This applies to ALL data and software used ANYWHERE in the infrastructure.
- When setting up or updating workflows and their diagrams:
  - Distinguish between data (sub-)products, processing steps, and decisions.
  - Use correct diagram symbols.
  - Remove ambiguities from the workflows.
- Discussion: distribute roles within the infrastructure so as to achieve these aims most efficiently.
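A minimal sketch of what such provenance metadata could look like: each input pre-product and each processing tool is referenced by a PID and version, and the final product by a DOI. All identifiers shown are made up for illustration.

```python
# Minimal sketch of provenance metadata for a final data product;
# every identifier below is hypothetical, for illustration only.
provenance_metadata = {
    "product_doi": "10.0000/example-final-product",            # hypothetical DOI
    "inputs": [
        {"pid": "hdl:21.0000/preproduct-level1a", "version": "2"},
    ],
    "software": [
        {"pid": "hdl:21.0000/raw-processing-tool", "version": "1.4.2"},
        {"pid": "hdl:21.0000/hourly-averaging-tool", "version": "0.9.0"},
    ],
}
```

With a record like this attached to every data (pre-)product, the question "where does this data point come from?" can be answered by walking the PID references back through the archive.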