A Unified Data Model and Programming Interface for Working with Scientific Data Doug Lindholm Laboratory for Atmospheric and Space Physics University of.

Presentation transcript:

A Unified Data Model and Programming Interface for Working with Scientific Data Doug Lindholm Laboratory for Atmospheric and Space Physics University of Colorado Boulder UCAR SEA Conference – Feb 2012

Outline
Higher Level Abstractions
What is a Data Model
LaTiS Data Model
Higher Level Programming
–Functional Programming
–Scala Programming Language
Data Model Implementation
Examples
Broader Impacts
–Data Interoperability

Scientific Data Abstractions
bits
bytes: 00105e0 e6b0 343b 9c e7bc 0804 e7d
Number: 13.52, 1.02e-14 (int, long, float, double, scientific notation)
array

Functional Relationships
Independent Variable (domain), Dependent Variables (range)
Time Series: instead of pressure[time], time → pressure
Gridded Time Series: instead of flux[time][lon][lat], time → ((lon, lat) → flux)
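As a minimal sketch of the contrast above, the two relationships can be written as maps from domain values to range values instead of index-addressed arrays. This uses only standard Scala collections and is not the LaTiS API; the object name, the helper `fluxAt`, the integer time keys, and the sample values are all made up for illustration.

```scala
import scala.collection.immutable.SortedMap

object FunctionalRelationships {
  // time -> pressure: each time sample (hypothetical integer time step)
  // maps to one pressure value.
  val pressure: SortedMap[Int, Double] =
    SortedMap(0 -> 1013.2, 1 -> 1012.8, 2 -> 1012.1)

  // time -> ((lon, lat) -> flux): each time sample maps to a whole grid.
  type Grid = Map[(Double, Double), Double]
  val flux: SortedMap[Int, Grid] = SortedMap(
    0 -> Map((0.0, 0.0) -> 1.5, (10.0, 0.0) -> 1.7),
    1 -> Map((0.0, 0.0) -> 1.6, (10.0, 0.0) -> 1.8)
  )

  // Evaluate the nested function at domain values rather than array indices.
  // Returns None when the requested sample does not exist.
  def fluxAt(time: Int, lon: Double, lat: Double): Option[Double] =
    flux.get(time).flatMap(_.get((lon, lat)))
}
```

The caller asks for a value at a time and location and never needs to know how the samples are laid out in memory.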

What do I mean by Data Model
NOT a simulation or forecast (climate model)
NOT a metadata model (ISO 19115)
NOT a file format (NetCDF)
NOT how the data are stored (RDBMS)
NOT the representation in computer memory (data structure)
Logical model
–What the data represent, conceptually
–How the data are USED

LaTiS Data Model
Scalar time series: Time -> Pressure
Time series of grids: T -> ((Lon,Lat) -> (P, dP))
–T: Time
–Lon: Longitude
–Lat: Latitude
–P: Pressure
–dP: Uncertainty
Three core Variable types
–Scalar (single Variable)
–Tuple (group of Variables)
–Function (domain mapped to range)
Represents the functional relationship of the scientific data
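The three Variable types can be sketched as a small algebraic data type. This is an illustration under assumptions, not the LaTiS source: the field names, the use of `Vector`, the sample values, and the helper `depth` are hypothetical. It builds one sample of T -> ((Lon,Lat) -> (P, dP)) and counts how deeply Functions are nested.

```scala
object DataModelSketch {
  sealed trait Variable
  // Scalar: a single named value.
  final case class Scalar(name: String, value: Double) extends Variable
  // Tuple: a group of Variables.
  final case class Tuple(elements: Vector[Variable]) extends Variable
  // Function: ordered samples, each mapping a domain Variable to a range Variable.
  final case class Function(samples: Vector[(Variable, Variable)]) extends Variable

  // (Lon, Lat) -> (P, dP) for a single grid cell (made-up values).
  val grid: Function = Function(Vector(
    Tuple(Vector(Scalar("Lon", 0.0), Scalar("Lat", 45.0))) ->
      Tuple(Vector(Scalar("P", 1013.2), Scalar("dP", 0.4)))
  ))

  // T -> ((Lon, Lat) -> (P, dP)) for a single time sample.
  val dataset: Function = Function(Vector(Scalar("T", 0.0) -> grid))

  // Number of nested Function levels along the deepest path through the ranges.
  def depth(v: Variable): Int = v match {
    case Scalar(_, _) => 0
    case Tuple(es)    => es.map(depth).foldLeft(0)(_ max _)
    case Function(ss) => 1 + ss.map { case (_, r) => depth(r) }.foldLeft(0)(_ max _)
  }
}
```

Because a Function's range can itself be a Function or a Tuple, nesting alone is enough to express the gridded time series.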

Implementing the Data Model
The LaTiS Data Model is an abstract representation
Can be represented several ways
–UML
–VisAD grammar
–Java Interface (no implementation)
Need an implementation in code
Scientific data Domain Specific Language (DSL)
–Expose an API that fits the application domain
Scala programming language

Why Scala
Evolution of Java
–Use with existing Java code
–Runs on the Java Virtual Machine (JVM)
–Command line (REPL), script, or compiled
–Statically typed (safer than dynamic languages)
–Industrial strength (Twitter, LinkedIn, …)
Object-Oriented
–Encapsulation, polymorphism, …
–Traits: interfaces with implementation, multiple inheritance, mix-ins
Functional Programming
–Immutable data structures
–Functions with no side effects
–Provable, parallelizable
Syntactic sugar for Domain Specific Languages
Operator “overloading”, natural math language for Variables
Parallel collections
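A few of the language features listed above can be shown together in a short sketch: traits that carry implementation, mix-in composition, immutable values, and operators defined as ordinary methods. The names `HasUnits`, `Describable`, and `Quantity` are invented for this example and do not come from LaTiS.

```scala
object WhyScalaSketch {
  // A trait can declare abstract members ...
  trait HasUnits { def units: String }

  // ... and can also carry implementation, to be mixed in where needed.
  trait Describable { self: HasUnits =>
    def value: Double
    def describe: String = s"$value $units"
  }

  // An immutable value mixing in both traits. "+" and "*" are plain methods,
  // which is what makes the math-like syntax possible.
  final case class Quantity(value: Double, units: String)
      extends HasUnits with Describable {
    def +(that: Quantity): Quantity = {
      require(units == that.units, "units must match")
      copy(value = value + that.value) // returns a new Quantity, no mutation
    }
    def *(factor: Double): Quantity = copy(value = value * factor)
  }
}
```

With these definitions, `Quantity(1.0, "W") + Quantity(2.0, "W")` reads like arithmetic while remaining statically typed and free of side effects.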

The Scala Data Model Implementation
Three base Variable classes: Scalar, Tuple, Function
Extend Scala collections, inherit many operations (e.g. Function extends SortedMap)
Arbitrarily complex data structures by nesting
Encapsulate metadata: units, provenance, …
Mix-in math, resampling strategies
“Overload” math operators, natural math processing
Extend base Variables for specific application domain (e.g. TimeSeries extends Function), reuse basic math or mix-in domain specific operations without polluting the core API
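The design points above can be sketched as follows, with one deliberate simplification: this `TimeSeries` wraps a `SortedMap` rather than extending it, since extending the standard collections takes considerably more code. `Metadata`, `ScalarMath`, `rebuild`, and the integer time keys are hypothetical names and choices for illustration, not the LaTiS classes.

```scala
import scala.collection.immutable.SortedMap

object ImplementationSketch {
  // Metadata travels with the data (only name and units in this sketch).
  final case class Metadata(name: String, units: String)

  // A mix-in that supplies math without adding it to the core type.
  trait ScalarMath[A] {
    def samples: SortedMap[Int, Double]
    protected def rebuild(s: SortedMap[Int, Double]): A
    // "Overloaded" operator: scale every sample by a constant factor.
    def *(factor: Double): A =
      rebuild(samples.map { case (t, v) => (t, v * factor) })
  }

  // A domain-specific Variable: time (integer step) -> value.
  final case class TimeSeries(samples: SortedMap[Int, Double], metadata: Metadata)
      extends ScalarMath[TimeSeries] {
    protected def rebuild(s: SortedMap[Int, Double]): TimeSeries = copy(samples = s)
    // Keep only the samples whose value satisfies the predicate.
    def filter(p: Double => Boolean): TimeSeries =
      copy(samples = samples.filter { case (_, v) => p(v) })
  }
}
```

A series built this way supports expressions such as `ts.filter(!_.isNaN) * 1.04`, and the metadata is carried through each operation unchanged.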

Examples

Data Interoperability
Philosophy: Leave data in their native form, expose via a common interface
Reusable adapters (software modules) for common formats, extension points for custom formats
XML dataset descriptors, map native data model to the LaTiS data model
Catalog to map dataset names to the descriptors
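The catalog, descriptor, and adapter roles described above might fit together roughly as in this sketch. Everything here is assumed for illustration: the slides describe XML descriptors, whereas this uses a plain case class, and `Descriptor`, `Adapter`, `CsvAdapter`, `open`, the dataset name, and the `fetch` callback are hypothetical.

```scala
import scala.collection.immutable.SortedMap

object InteropSketch {
  // Stand-in for a dataset descriptor: where the data live and their format.
  final case class Descriptor(location: String, format: String)

  // Common interface: every adapter turns native records into the same model.
  trait Adapter {
    def read(raw: Seq[String]): SortedMap[Int, Double]
  }

  // Reusable adapter for one common format: "time, value" text lines.
  object CsvAdapter extends Adapter {
    def read(raw: Seq[String]): SortedMap[Int, Double] =
      SortedMap.empty[Int, Double] ++ raw.map { line =>
        val parts = line.split(",")
        parts(0).trim.toInt -> parts(1).trim.toDouble
      }
  }

  // Catalog: dataset name -> descriptor. Registry: format -> adapter.
  val catalog: Map[String, Descriptor] =
    Map("example_irradiance" -> Descriptor("memory:example", "csv"))
  val adapters: Map[String, Adapter] = Map("csv" -> CsvAdapter)

  // Look up a dataset by name and read it through the matching adapter.
  // `fetch` supplies the raw records for a location (caller-provided).
  def open(name: String, fetch: String => Seq[String]): Option[SortedMap[Int, Double]] =
    for {
      d <- catalog.get(name)
      a <- adapters.get(d.format)
    } yield a.read(fetch(d.location))
}
```

Supporting a custom format then means registering one more `Adapter`; callers of `open` are unaffected.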

Applications
LaTiS server framework
–Built around unified data model
–XML dataset descriptors + data access adapters
–Writer modules to support various output formats
–Filter plug-ins to do server side processing
–RESTful web service API, OPeNDAP interface, subsetting
LASP Interactive Solar Irradiance Data Center (LISIRD)
–Uses LaTiS to read, subset, reformat data
Time Series Data Server (TSDS)
–Common RESTful interface to NASA Heliophysics data

Extra slides

Data Fusion Problem
Solar irradiance at nm from multiple observations and proxies
Disparate sources and formats
Different units and time samples
netCDF, database

Processing the Data
// Read the spectral time series data for each mission.
// t -> (w -> (I, dI))
val sorce = reader.readData("sorce", time1, time2)
val timed = reader.readData("timed", time1, time2)
// Make a time series with the Lyman alpha (121.5 nm) measurements only.
var sorce_lya = new TimeSeries() // t -> (I, dI)
for ((time, spectrum) <- sorce) {
  sorce_lya = sorce_lya :+ (time, spectrum(121.5))
}
// Exclude time samples with bad values.
sorce_lya = sorce_lya.filter(! _.isMissing)
// Do the same for timed_lya....
// Combine the two time series, with scale factors.
// Let SORCE take precedence.
composite_lya = timed_lya * sorce_lya * 1.04
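The slide code depends on a `reader` and a `TimeSeries` class that are not shown, so it cannot be run on its own. The sketch below illustrates only the final combining step with standard collections, and it assumes one particular reading of "Let SORCE take precedence": where both series have a sample at the same time, the scaled SORCE value replaces the TIMED one. The names `Series`, `scale`, and `composite`, and the integer time keys, are hypothetical. The operator semantics of the actual LaTiS classes may differ.

```scala
import scala.collection.immutable.SortedMap

object CompositeSketch {
  // time (hypothetical integer day number) -> irradiance
  type Series = SortedMap[Int, Double]

  // Apply a constant scale factor to every sample.
  def scale(s: Series, factor: Double): Series =
    s.map { case (t, v) => (t, v * factor) }

  // Merge the two series. With "++" the right-hand operand wins on
  // duplicate keys, so scaled SORCE samples override TIMED samples.
  def composite(timed: Series, sorce: Series, sorceScale: Double): Series =
    timed ++ scale(sorce, sorceScale)
}
```

With the scale factor shown on the slide, the call would be `CompositeSketch.composite(timedLya, sorceLya, 1.04)`, where both arguments are series built as above.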