LaTiS https://github.com/dlindhol/LaTiS Doug Lindholm Laboratory for Atmospheric and Space Physics University of Colorado Boulder ESIP – July 8, 2014.

Presentation transcript:

LaTiS Doug Lindholm Laboratory for Atmospheric and Space Physics University of Colorado Boulder ESIP – July 8, 2014

Motivation - Get Data Into Analysis Code/Tools
Disparate Data
Unified Interface

LaTiS Server Architecture
Native Data: ASCII, Binary, JDBC, FITS, Web Service, Custom
Descriptors: TSML
Adapters
LaTiS Data Model
Filters: Subset, Constrain (sst > 20), Convert Units, Missing Values, Derived Products, Custom
Writers: JSON, DAP2, Image, code snippet, CSV
Client Applications: Web Browser, Excel, Analysis Tools, Programs, Web Service

LaTiS Client Options
Any OPeNDAP client. Available for most programming languages (Python, IDL, Matlab, ...).
Analysis/visualization tools with built-in OPeNDAP support.
Web browser: directly enter an HTTP URL query.
wget, curl: command-line tools for making an HTTP request.
Custom web applications (open source coming soon) that make AJAX requests to LaTiS to get JSON output and make interactive plots.
Custom programming APIs that wrap a LaTiS call.
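The browser, wget, and curl options all come down to composing an HTTP URL. The sketch below shows how a DAP2-style request might be assembled in Python, assuming the usual constraint-expression layout (a comma-separated projection followed by `&`-prefixed selections). The host `example.org`, the `/latis` path, the `json` suffix, and the variable name `ssn` are placeholders for illustration, not details taken from the slides.

```python
from urllib.parse import quote


def build_query_url(base, dataset, suffix, projection=(), selections=()):
    """Compose <base>/<dataset>.<suffix>?<projection>&<selection>&..."""
    # Projection: comma-separated names of the variables to return.
    query = ",".join(projection)
    # Selections: constraints such as "time>=2014-01-01"; comparison
    # operators are left unescaped so the URL stays readable.
    query += "".join("&" + quote(s, safe="<>=~!,:-") for s in selections)
    url = f"{base.rstrip('/')}/{dataset}.{suffix}"
    return f"{url}?{query}" if query else url


# Hypothetical server, dataset, and variable names.
url = build_query_url(
    "http://example.org/latis",
    "Sunspot_Number",
    "json",
    projection=("time", "ssn"),
    selections=("time>=2014-01-01", "ssn>20"),
)
print(url)
```

The resulting string could be pasted into a browser or handed to wget or curl; changing the suffix would be the natural way to ask for a different output format.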

Related Technology Comparisons
OPeNDAP
  – Both implement DAP2 protocol (standard service API)
  – OPeNDAP servers tend to be file centric
  – LaTiS presents “virtual” dataset via aggregation
  – LaTiS aims to be easier to install, configure, and extend
NetCDF Common Data Model (CDM)
  – Multidimensional array centric
  – Coupled to NetCDF file format
  – Climate and forecast model (simulation) emphasis
THREDDS Data Server
  – Built around NetCDF CDM
  – Provides OPeNDAP and other service interfaces
TSDS
  – First generation of LaTiS built on NetCDF CDM
VisAD
  – Essentially the same logical data model as LaTiS with a clunkier implementation based on old Java capabilities
  – LaTiS is implemented around modern paradigms like Functional Programming

What do I mean by Data Model
NOT a simulation or forecast (climate model)
NOT a metadata model (ISO 19115)
NOT a file format (NetCDF)
NOT how the data are stored (RDBMS)
NOT the representation in computer memory (data structure)
Logical model
  – What the data represent, conceptually
  – How the data are used

Data Abstractions
bits
bytes: 00105e0 e6b0 343b 9c e7bc 0804 e7d
int, long, float, double, scientific notation (Number): 13.52, 1.02e-14
array

Scientific Data Abstractions
Multi-dimensional Arrays
Key Features:
  - Single data type
  - Access by index

Relational Data
Relational Database
Table = Relation
Row = Tuple of Attributes, e.g. (0, 3.5, B)
Key Features:
  - Supports different data types
  - Well suited for access by value, e.g. time>2, class=A
But the relation is limited to a sequence of tuples:

  time  flux  class
  0     3.5   B
  1     4.6   A
  2     4.7   A
  3     4.1   A
  4     3.2   B
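To make "access by value" concrete, here is a minimal Python sketch that holds the time/flux/class relation from this slide as a sequence of tuples and selects rows by value. The `Sample` type and the field name `cls` are illustrative choices, not part of LaTiS.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    time: int
    flux: float
    cls: str  # "class" is a reserved word in Python


# The relation from the slide: a sequence of (time, flux, class) tuples.
relation = [
    Sample(0, 3.5, "B"),
    Sample(1, 4.6, "A"),
    Sample(2, 4.7, "A"),
    Sample(3, 4.1, "A"),
    Sample(4, 3.2, "B"),
]

# Access by value: keep the rows where time > 2 and class = A.
selected = [row for row in relation if row.time > 2 and row.cls == "A"]
print(selected)
```

Each row mixes an integer, a float, and a string, which a single-typed array cannot do. What the flat sequence cannot express is that flux and class depend on time.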

LaTiS Unified Data Model
Extends the Relational Model to add Functional relationships.
Represents multi-dimensional domain of data grids.
Access by value or index.
Example: time series of gridded surface winds
  Time -> ((Lon, Lat) -> (U,V))
  Time: Independent Variable
  (Lon, Lat): Independent Variable (domain)
  (U,V): Dependent Variables (range)

LaTiS Data Model
Only Three Variable Types:
  Scalar: single Variable
  Tuple: group of Variables
  Function: mapping from one Variable to another
Extend to capture higher level, domain specific abstractions
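The three variable types compose recursively, which a few lines of code can show. This Python sketch is only an illustration of the idea; the class and field names are assumptions and do not reproduce the LaTiS (Scala) API. It builds the gridded surface winds example, Time -> ((Lon, Lat) -> (U,V)), out of Scalars, Tuples, and Functions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scalar:
    name: str  # a single Variable


@dataclass(frozen=True)
class Tuple:
    variables: tuple  # a group of Variables


@dataclass(frozen=True)
class Function:
    domain: "Variable"  # independent Variable
    range: "Variable"   # dependent Variable


Variable = Union[Scalar, Tuple, Function]


def describe(v: Variable) -> str:
    """Render a Variable in the arrow notation used on the slides."""
    if isinstance(v, Scalar):
        return v.name
    if isinstance(v, Tuple):
        return "(" + ", ".join(describe(x) for x in v.variables) + ")"
    inner = describe(v.range)
    if isinstance(v.range, Function):
        inner = f"({inner})"  # parenthesize a nested Function
    return f"{describe(v.domain)} -> {inner}"


winds = Function(
    Scalar("Time"),
    Function(
        Tuple((Scalar("Lon"), Scalar("Lat"))),
        Tuple((Scalar("U"), Scalar("V"))),
    ),
)
print(describe(winds))
```

Because a Function's range may itself be a Function, a time series of grids needs no new type.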

Discipline Agnostic Data Access with LaTiS
Philosophy:
  – Leave data in their native form
  – Expose via a common interface
Software:
  – Reusable adapters (software modules) to read common formats, extension points for custom formats
  – XML dataset descriptors, map native data model to the LaTiS data model
  – Open Source, community
Web services:
  – Standard service interfaces, currently OPeNDAP
  – Server side processing and output format options

Implementing the Data Model
The LaTiS Data Model is an abstract representation
Can be represented several ways
  – UML
  – VisAD grammar
  – Java Interface (no implementation)
Need an implementation in code
Scientific data Domain Specific Language (DSL)
  – Expose an API that fits the application domain
Scala programming language

Why Scala
Evolution of Java
  – Use with existing Java code
  – Runs on the Java Virtual Machine (JVM)
  – Command line (REPL), script, or compiled
  – Statically typed (safer than dynamic languages)
  – Industrial strength (Twitter, LinkedIn, …)
Object-Oriented
  – Encapsulation, polymorphism, …
  – Traits: interfaces with implementation, multiple inheritance, mix-ins
Functional Programming
  – Immutable data structures
  – Functions with no side effects
  – Provable, parallelizable
Syntactic sugar for Domain Specific Languages
Operator “overloading”, natural math language for Variables
Parallel collections

Scala Implementation
Dataset as a Scala collection
Functional Programming Paradigms:
  – Function composition over object manipulation
  – Functions as first class citizens: a LaTiS Function can be used like a programming function
  – Immutable data structures
  – No side-effects: parallelizable, provable
  – Lazy evaluation: scalable
Math and resampling mixed in
  – e.g. dataset3 = (dataset1 + dataset2) / 2
Metadata encapsulated
  – enforce data consistency: unit conversions...
  – track provenance
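The expression dataset3 = (dataset1 + dataset2) / 2 relies on operator overloading, immutability, and lazy evaluation working together. The following Python sketch imitates that combination; it is not the LaTiS implementation (which is in Scala). The `Dataset` class, its `source` field, and the `from_list` helper are hypothetical names. Where the slide mentions resampling, this sketch simply raises an error if the time samples do not line up.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple

Sample = Tuple[float, float]  # (time, value)


@dataclass(frozen=True)
class Dataset:
    """Immutable time series whose samples are produced on demand."""
    source: Callable[[], Iterable[Sample]]

    def __iter__(self) -> Iterator[Sample]:
        return iter(self.source())

    def _combine(self, other, op):
        def samples():  # generator: runs only when the result is iterated
            if isinstance(other, Dataset):
                for (t1, a), (t2, b) in zip(self, other):
                    if t1 != t2:
                        raise ValueError("time samples differ; resampling needed")
                    yield (t1, op(a, b))
            else:
                for t, a in self:
                    yield (t, op(a, other))
        return Dataset(samples)  # a new Dataset; operands are untouched

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __truediv__(self, other):
        return self._combine(other, lambda a, b: a / b)


def from_list(samples):
    data = tuple(samples)
    return Dataset(lambda: data)


dataset1 = from_list([(0, 1.0), (1, 3.0)])
dataset2 = from_list([(0, 3.0), (1, 5.0)])
dataset3 = (dataset1 + dataset2) / 2  # nothing has been computed yet
print(list(dataset3))  # iteration triggers the arithmetic
```

Deferring the arithmetic until iteration is what lets the same expression work on arbitrarily long series without holding them in memory.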

LaTiS Server Implementation
RESTful web service API (OPeNDAP +)
Java Servlet, build and deploy war file
XML dataset descriptor (TSML) for each dataset
  – Specify Adapter to use
  – Map native data source to LaTiS data model
  – Define transformations as Processing Instructions
Catalog to map dataset names to TSML
Plugins: implement the Adapter, Filter or Writer interfaces or extend existing ones
Properties file to map filter and writer names to implementing classes
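The plugin idea (names in a request resolved to implementing classes through a lookup table) can be sketched in a few lines. The Python below is a loose analogy, assuming dictionaries in place of the properties file and invented class and function names (`Writer`, `CsvWriter`, `JsonWriter`, `stride`, `serve`); the real server is a Java Servlet with its own interfaces.

```python
import io
import json
from abc import ABC, abstractmethod


class Writer(ABC):
    """Interface a writer plugin implements."""

    @abstractmethod
    def write(self, samples, out): ...


class CsvWriter(Writer):
    def write(self, samples, out):
        for sample in samples:
            out.write(",".join(str(v) for v in sample) + "\n")


class JsonWriter(Writer):
    def write(self, samples, out):
        json.dump([list(s) for s in samples], out)


def stride(n):
    """Filter factory: keep every n-th sample, lazily."""
    def apply(samples):
        return (s for i, s in enumerate(samples) if i % n == 0)
    return apply


# Stand-ins for the properties file: names mapped to implementations.
WRITERS = {"csv": CsvWriter, "json": JsonWriter}
FILTERS = {"stride": stride}


def serve(samples, filter_calls, suffix):
    """Apply the named filters in order, then write with the named writer."""
    for name, arg in filter_calls:
        samples = FILTERS[name](arg)(samples)
    out = io.StringIO()
    WRITERS[suffix]().write(samples, out)
    return out.getvalue()


data = [(0, 3.5), (1, 4.6), (2, 4.7), (3, 4.1), (4, 3.2)]
print(serve(data, [("stride", 2)], "csv"))
```

Adding an output format then means writing one class and one table entry, with no change to `serve`.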

Example – Serving an ASCII File
Sunspot data for October
TSML Dataset descriptor
<dataset name="Sunspot_Number" history="Read by LaTiS">
  <adapter class="latis.reader.tsml.AsciiAdapter" url="file:/data/latis/ssn.txt" />

Example – Serving an ASCII File

Current Applications
LASP Interactive Solar Irradiance Data Center (LISIRD)
  – Uses LaTiS to read, subset, reformat data, metadata
Time Series Data Server (TSDS)
  – Common RESTful interface to NASA Heliophysics data
Other LASP projects: MMS, MAVEN, database statistics, log files
External users?

Capabilities – Data Reader Modules
Operational:
  – ASCII (file, web service, system call), binary, NetCDF, Relational database, data “generators”
  – Time Series of scalars, vectors, and spectra
  – Arbitrarily long time series
Prototyped:
  – HDF, CDF, FITS, GRIB, OPeNDAP (e.g. other LaTiS servers), NoSQL (MongoDB)
  – Nested 2D (gridded) data structures
Planned:
  – Arbitrarily complex data structures

Capabilities – Data Writer Modules
Operational:
  – OPeNDAP, ASCII (e.g. csv), binary, JSON, Image (PNG), IDL code, HTML dataset landing page
Prototyped:
  – NetCDF, HDF, IDL save file, interactive plot
Planned:
  – GeoTIFF, …

Capabilities – Data Filter Modules
Operational:
  – Subset, aggregate, stride, thin, replace, integrate, bin average
Prototyped:
  – FFT, min, max, unique, resampling, unit conversion
Planned:
  – Coordinate system transformations
  – Make it easier to plug in custom computations
  – Track provenance
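As an illustration of what one of these filters does, here is a minimal Python sketch of a bin average over a time-ordered series of (time, value) samples. The function name `bin_average`, its `width` parameter, and the choice to label each bin by its start time are assumptions made for this example, not the behavior of the LaTiS filter.

```python
from itertools import groupby


def bin_average(samples, width):
    """Yield (bin_start, mean_value) for time-ordered (time, value) samples."""
    # Consecutive samples that fall in the same bin are grouped together;
    # only one bin is held in memory at a time.
    for index, group in groupby(samples, key=lambda s: int(s[0] // width)):
        values = [value for _, value in group]
        yield (index * width, sum(values) / len(values))


data = [(0, 3.5), (1, 4.5), (2, 4.5), (3, 4.0), (4, 3.0)]
print(list(bin_average(data, 2)))
```

Writing the filter as a generator keeps it compatible with arbitrarily long time series, since it never needs the whole input at once.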

Capabilities – Service Interface
Operational:
  – OPeNDAP
  – Java Servlet, simply deploy war file (Tomcat, Glassfish)
Prototyped:
  – Authentication
  – Single executable (Jetty)
  – THREDDS Data Server (TDS) integration
Planned:
  – Open Geospatial Consortium (OGC) standards: Web Map Service, Web Coverage Service

Capabilities - Metadata
Operational:
  – THREDDS catalog, static XML, browse
Prototyped:
  – Semantic Web triple store (RDF, SPARQL)
  – Text search (Solr)
  – Modeling RDF triples (subject, predicate, object)
  – Track provenance, record Dataset modifications
Planned:
  – Serve metadata in various schema (e.g. ISO 19115, SPASE)
  – Unique IDs, Digital Object Identifiers (DOI) for publishing

Other Capabilities
Operational:
  – Time API with formatting
  – Time conversions with leap seconds
Prototyped:
  – Caching, improve performance
  – Parallel processing, multi-core
Planned:
  – Big Data, Hadoop, Map Reduce
  – Workflow integration

Source Code Management – Open Source
Time Series Server (a.k.a. TSS1)
  – Core of Time Series Data Server (TSDS, tsds.net)
  – Built around Unidata Common Data Model
  – SourceForge:
LaTiS (a.k.a. TSS2)
  – New LaTiS data model, Scala implementation
  – GitHub:
  – LASP internal development branch
  – Plug-ins as separate projects (e.g. data collections, math, custom readers/writers, …), keep core small

My Background (i.e. bias)
Astrophysicist by degree, software engineer by profession
Data user and provider
Scientific data applications developer:
  – astrophysics, atmospheric science, space science
Holy Grail: common data model
Favorite scientific data models:
  – VisAD
  – Unidata Common Data Model
  – OPeNDAP

Motivation – Stove Pipes

Single Data Access Interface