Flexible tools for integrating observations and models
Johan De Keyser, Emmanuel Gamby
Belgian Institute for Space Aeronomy


November 2007, ESWW

Objective
There exist several packages for processing and visualizing space-related data. Some are meant to be general (e.g. QSAS, MIM, NSSDC), and some are specific (e.g. Cluster Science Data System, Cluster Active Archive, Themis Data Analysis System).
The goal is to infer some general conclusions about the needs of such software infrastructure, and to offer useful recommendations for data and modeling services in the space science and space weather arena.

Space weather clients and servers
The commercial model: [diagram: user, service provider, data provider]
The scientist's model: [diagram: user, service provider, data provider]
–user, service provider, and data provider coincide

Business model
User: looking at various spacecraft, ground-based or model data provided by colleagues or external data sources
Service provider: offering his know-how in the form of models to colleagues or as model output to end users
Data provider: offering processed data to colleagues and end users
[Diagram: visualization and processing; external repository (instrument data, model data); local repository (instrument data, model data); science input; empirical models; physical models; interpretation]

Sharing scientific know-how
Scientists are end users: they derive knowledge by bringing together different kinds of information.
–What are the observations? Observational data
–What is the interpretation? Model data
–How do you bring it together? Algorithms
This is all brought together by the data processing and visualization tool.
Scientists turn into service providers
–by making observational data available,
–by offering their model data, or
–by publishing their algorithms.

Examples
Scientists develop algorithms to bring together data from different sources. The algorithm proposes a model, possibly parameterized, to compute model output.
Example 1: Gradients
–Gradients are computed from measurements on the 4 Cluster spacecraft.
–The model assumes locally constant gradients.
–Model input parameters: e.g. an estimate of the distance over which the gradients can safely be considered constant.
–Model output: the computed gradient vector with its error margins.
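
As an illustration of Example 1, here is a minimal sketch of a constant-gradient estimate from four simultaneous scalar measurements. It assumes the linear model f_i - f_0 = g · (r_i - r_0) and solves the resulting 3x3 system with Cramer's rule. The function names, the coplanarity tolerance, and the test values are assumptions made for this sketch, not taken from the presentation, and the error margins mentioned above are not computed.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def gradient_from_four_points(positions, values, tol=1e-12):
    """Estimate a locally constant gradient from 4 non-coplanar points.

    positions: four (x, y, z) tuples; values: four scalar measurements.
    tol is an illustrative, unit-dependent threshold (an assumption).
    """
    r0, f0 = positions[0], values[0]
    # Rows are the separation vectors r_i - r_0 for i = 1, 2, 3.
    a = [[positions[i][k] - r0[k] for k in range(3)] for i in (1, 2, 3)]
    b = [values[i] - f0 for i in (1, 2, 3)]
    d = det3(a)
    if abs(d) < tol:
        raise ValueError("points are (nearly) coplanar; gradient undetermined")
    grad = []
    for k in range(3):
        # Cramer's rule: replace column k by the right-hand side.
        ak = [row[:] for row in a]
        for i in range(3):
            ak[i][k] = b[i]
        grad.append(det3(ak) / d)
    return grad


# Synthetic check with f(x, y, z) = 2x + 3y - z + 5 (made-up test field).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
samples = [5.0, 7.0, 8.0, 4.0]
grad = gradient_from_four_points(points, samples)
```

With exactly four points the system is fully determined, so the fit itself gives no residual from which to judge whether the constant-gradient assumption holds.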


Example 2: Modeling of cometary comae
–Computing chemical composition in a cometary coma.
–The model assumes thermodynamic equilibrium and computes how the composition evolves due to chemistry.
–Model input: chemical reaction constants, neutral gas production rates, numerical parameters.
–Model output: particle abundances throughout the coma.


Sharing data
Access type
–Manual: Interactively look up and download data through a human-oriented graphic interface (e.g. web browser to CSDS, CAA, NSSDC)
–Automatic: Automated machine-based data access procedure. Definition of “channels”: generic specification of where and how to find spacecraft data for a given time (TDAS, MIM)
Physical access is always based on some protocol
–NFS access: for a local repository
–FTP access: NSSDC, Themis repository
–Web access: Cluster Active Archive
–Access restrictions require the use of login/password
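
The idea of a "channel" can be sketched as a small record that turns a time into a location. The class below is a hypothetical illustration only: the field names, the strftime-style path template, and the host `ftp.example.org` are assumptions, and this is not how TDAS or MIM actually define their channels.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Channel:
    """Hypothetical channel: where and how to find data for a given time."""
    name: str
    protocol: str       # "nfs", "ftp" or "http"
    host: str           # ignored for NFS, where the path is local
    path_template: str  # strftime-style template, e.g. "/data/%Y/f_%Y%m%d.cdf"

    def locate(self, when: datetime) -> str:
        """Return the file location that holds data for the time `when`."""
        path = when.strftime(self.path_template)
        if self.protocol == "nfs":
            return path
        return f"{self.protocol}://{self.host}{path}"


# Made-up example channel; host and directory layout are placeholders.
mag = Channel("mag", "ftp", "ftp.example.org", "/data/mag/%Y/mag_%Y%m%d.cdf")
location = mag.locate(datetime(2007, 11, 5))
```

Keeping the specification declarative lets the same access code serve a local repository and a remote archive; login credentials would be handled separately.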

Automated access downloads data to a local repository or cache.
–Cache management is based on the reserved cache size and the minimum guaranteed lifetime of files. File removal exploits the time of last usage.
Automated access can lead to significant wait times
–E.g. access to a 20 Mb data set over a 0.1 Mb/s connection takes several minutes; cache hits are therefore important.
–A high cache hit rate can be achieved as scientists often work for a prolonged time with a limited set of events (if the cache is big enough to hold that set).
–Caching is of little help when scanning the whole archive, e.g. for statistical studies.
–Access may be done as a background activity.
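
The cache policy described above (reserved size, minimum guaranteed lifetime, removal by time of last usage) can be sketched as follows. The class and method names are assumptions made for illustration and do not reproduce any particular tool. For scale, the quoted download takes 20 / 0.1 = 200 s, a little over three minutes.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    size: int         # file size, in the same unit as the cache capacity
    added: float      # time the file entered the cache
    last_used: float  # time of the most recent access


class FileCache:
    """Illustrative cache: fixed capacity, guaranteed minimum lifetime,
    least-recently-used eviction among files past that lifetime."""

    def __init__(self, capacity, min_lifetime):
        self.capacity = capacity
        self.min_lifetime = min_lifetime
        self.entries = {}

    def used(self):
        return sum(e.size for e in self.entries.values())

    def get(self, name, now):
        """Return True on a cache hit and record the access time."""
        entry = self.entries.get(name)
        if entry is None:
            return False
        entry.last_used = now
        return True

    def put(self, name, size, now):
        """Try to store a file; return False if room cannot be made."""
        self.entries.pop(name, None)
        # Only files older than the guaranteed lifetime may be removed.
        evictable = sorted(
            (n for n, e in self.entries.items()
             if now - e.added >= self.min_lifetime),
            key=lambda n: self.entries[n].last_used,
        )
        freeable = sum(self.entries[n].size for n in evictable)
        if self.used() - freeable + size > self.capacity:
            return False  # nothing is evicted if it would not help
        for victim in evictable:
            if self.used() + size <= self.capacity:
                break
            del self.entries[victim]
        self.entries[name] = Entry(size, now, now)
        return True


cache = FileCache(capacity=100, min_lifetime=10)
cache.put("a", 60, now=0)
cache.put("b", 30, now=1)
cache.get("a", now=5)                     # "a" becomes the most recently used
too_early = cache.put("c", 40, now=6)     # both files still protected
stored = cache.put("c", 40, now=20)       # "b" is evicted first
```

Times are passed in explicitly so the policy can be tested without waiting; a real cache would also have to delete the files on disk.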

There is a plethora of available data formats
–Archived data may be structured in ways that reflect their origin: time series of scalar or vector values, of particle distribution functions, or of wave spectra; multi-dimensional spatial fields; images … Data might be grouped in a particular way, e.g. particle distribution moments are usually provided together on a common timescale.
–Archived data may be stored in a common file exchange format, such as ASCII, CDF, or HDF files. NSSDC offers data in these formats; ASCII only for low time resolution data.
–Archived data might be compressed. NSSDC compresses ASCII data files.

Data fed into a visualization/processing tool need a specific format to load quickly.
–MIM expresses time in Julian Days, enforces SI units.
Therefore there is a need to convert various archive formats into the desired input format.
–MIM uses a generic data format description to steer a data translator. This process maintains/provides metadata.
–QSAS uses the QTRAN data format translator.
The formatted data volume is usually bigger than that of the archived data. It is the formatted data that are stored in the local cache, while the archive data from which they are derived have transient downloaded copies.
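
A translation step of this kind can be sketched as below: a small format description steers the conversion of each record to Julian Days and SI units. The description layout, the unit table, and the function names are assumptions for illustration; they do not reproduce the MIM or QTRAN formats. The Julian Date of the Unix epoch (2440587.5) is the only constant used, and naive timestamps are assumed to be UTC.

```python
from datetime import datetime, timezone

UNIX_EPOCH_JD = 2440587.5   # Julian Date of 1970-01-01T00:00:00 UTC
SECONDS_PER_DAY = 86400.0

# Conversion factors to SI for a few units (illustrative subset).
TO_SI = {"nT": 1e-9, "km": 1e3, "cm^-3": 1e6, "T": 1.0, "m": 1.0}


def to_julian_day(t: datetime) -> float:
    """Convert a datetime to a Julian Date; naive input is taken as UTC."""
    if t.tzinfo is None:
        t = t.replace(tzinfo=timezone.utc)
    return t.timestamp() / SECONDS_PER_DAY + UNIX_EPOCH_JD


def to_si(value: float, unit: str) -> float:
    return value * TO_SI[unit]


def translate_record(fields, description):
    """Translate one parsed archive record.

    description maps a column name to "iso_time" or to a key of TO_SI
    (a hypothetical format description, not an actual MIM one).
    """
    out = {}
    for name, unit in description.items():
        raw = fields[name]
        if unit == "iso_time":
            out[name] = to_julian_day(datetime.fromisoformat(raw))
        else:
            out[name] = to_si(float(raw), unit)
    return out


description = {"time": "iso_time", "bx": "nT", "x": "km"}
record = {"time": "2000-01-01T12:00:00", "bx": "5.0", "x": "2.0"}
translated = translate_record(record, description)
```

Because the description is data rather than code, supporting a new archive format means writing a new description instead of a new translator.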

Recommendations
Even if you offer a sophisticated web protocol with graphic data selection and preview possibilities, also make the data accessible via an FTP server: it is the easiest solution for automated access.
Make data available in ASCII table form, or a compressed version of it, or in CDF or HDF.
–Do not invent a new ad hoc format, such as CEF (Cluster Exchange Format).
Data should always be accompanied by error estimates, both in terms of systematic and random errors.
Offer adequate metadata.
Provide documentation.

Sharing model data
Sharing model output is similar to sharing observations (calibrated observations are the output from an instrument model anyway).
It is essential to specify the systematic and random errors on the model output.
–Example: Gradient computation from 4 non-coplanar data points, as often done with Cluster, cannot provide an estimate of the total error on the gradient. Specified error margins usually refer only to the effect of measurement errors; such limitations should be clearly stated when publishing model output.

Sharing model parameters may warrant even more attention since the meaning of the parameters might be less obvious.
–Example: Modeling the chemistry in cometary comae is complicated. Among the input parameters is a database containing a compilation of relevant reactions and temperature-dependent reaction rates, including their uncertainties.
Sharing model parameters is essential for comparing
–model output obtained with different sets of model parameters;
–model output obtained from different models, in order to be sure that the same input is used.

Recommendations
Try to parameterize your models as much as possible. Do not hardcode model parameters.
Offer the model parameter sets and the model results in a readable form; ASCII will often be preferred for the model parameters.
Provide clear documentation about the model input parameters.
Model output should be treated in the same way as observational data.
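
One way to follow these recommendations is to keep the parameters in a small, commented ASCII file that the model reads at start-up. The sketch below assumes a simple `name = value` layout with `#` comments; the layout, the parameter names, and the numerical values are hypothetical placeholders, not taken from any model in the presentation.

```python
def parse_parameters(text):
    """Parse 'name = value' lines into a dict of floats.

    Blank lines and anything after '#' are ignored. The layout is an
    assumption made for this sketch.
    """
    params = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {number}: expected 'name = value'")
        params[key.strip()] = float(value)
    return params


# Hypothetical parameter set; names and numbers are placeholders.
PARAMETER_FILE = """
# Model input parameters (documented next to their values)
gas_production_rate = 1.0e28   # molecules per second
time_step = 0.5                # numerical parameter, seconds
"""

params = parse_parameters(PARAMETER_FILE)
```

A plain-text file like this can be published alongside the model output, which makes it possible to check that two model runs really used the same input.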

Sharing algorithms
Sharing algorithms is still in its infancy.
–There is no standard interface; it depends on the software environment you want to incorporate the algorithm into;
–issue of programming language and portability;
–provide documentation.
Preference for high-level languages
–Matlab, IDL routines: offer features to assist in defining and documenting the interface, automatically ensuring portability over a range of platforms
–C++ library: also a portable format

Sharing algorithms can be avoided if the algorithm is run on demand as a web service.
Advantages
–No portability issues
–Version control is easy
–Secrecy to safeguard commercial interests
Disadvantages
–The data have to be imported and the results have to be exported over the web: slow
–The server must be powerful enough to run the service for all clients
–The algorithm is not open for critical review; no improvements/extensions from other parties.

Provide interactive on-line documentation for your algorithms, e.g. through a hypertext-based documentation system.

Recommendations: Algorithms
Publish your algorithms, have them reviewed by as many people as possible.
Describe algorithms in a high-level language, in terms of a number of simpler primitive operations, to facilitate implementation on different platforms.
Carefully compare different algorithms to establish correctness, efficiency, and error propagation properties.
Provide detailed documentation as well as test examples.

Conclusions
There is a need for general-purpose packages for processing and visualizing space-related data since data interpretation is a multi-instrument and multi-spacecraft activity, so mission-specific packages are too limited (though they can be useful for mission-specific archiving).
Portability across a variety of platforms is desirable.
Such packages should be well-documented, easily installed, and have an intuitive graphical user interface.
Computational efficiency is a must since data volumes keep growing.

Such a package should support
–manual and automated data access;
–conversion of various formats;
–simultaneous processing of data from various sources, always including error estimates;
–commanding from an interactive graphical user interface as well as running batch jobs, i.e. it must implement some scripting language;
–documentation of observational data and model output data sets, including access to metadata;
–interactive definition, manipulation, and documentation of model input parameter sets;
–implementation and documentation of new algorithms.