Download presentation
Presentation is loading. Please wait.
Published byJefferson Furlong Modified over 9 years ago
1
LaTiS https://github.com/dlindhol/LaTiS Doug Lindholm Laboratory for Atmospheric and Space Physics University of Colorado Boulder ESIP – July 8, 2014
2
Motivation - Get Data Into Analysis Code/Tools Disparate Data Unified Interface
3
LaTiS Server Architecture Native Data Descriptors Adapters Filters Writers Client Applications LaTiS Data Model TSML ASCII Binary JDBC FITS Web Service Custom Subset Constrain (sst > 20) JSON Convert Units DAP2 Image code snippet Missing Values Derived Products Custom CSV Web Browser Excel Analysis Tools Program s Web Service
4
LaTiS Client Options Any OPeNDAP client. Available for most programming languages (python, IDL, Matlab,...). Analysis/visualization tools with built in OPeNDAP support. Web browser: Directly enter http URL query. wget, curl: command line tools for making an HTTP request. Custom Web Applications (Open Source coming soon) that make AJAX requests to LaTiS to get JSON output and make interactive plots. Custom programming APIs that wrap a LaTiS call.
5
Related Technology Comparisons OPeNDAP –Both implement DAP2 protocol (standard service API) –OPeNDAP servers tend to be file centric –LaTiS presents “virtual” dataset via aggregation –LaTiS aims to be easier to install, configure, and extend NetCDF Common Data Model (CDM) –Multidimensional array centric –Coupled to NetCDF file format –Climate and forecast model (simulation) emphasis THREDDS Data Server –Built around NetCDF CDM –Provides OPeNDAP and other service interfaces TSDS –First generation of LaTiS built on NetCDF CDM VisAD –Essentially the same logical data model as LaTiS with a clunkier implementation based on old Java capabilities –LaTiS is implemented around modern paradigms like Functional Programming
6
What do I mean by Data Model NOT a simulation or forecast (climate model) NOT a metadata model (ISO 19115) NOT a file format (NetCDF) NOT how the data are stored (RDBMS) NOT the representation in computer memory (data structure) Logical model What the data represent, conceptually How the data are used
7
Data Abstractions 10110101000001001111001100110011111110bits bytes00105e0 e6b0 343b 9c74 0804 e7bc 0804 e7d5 0804 1, -506376193, 13.52, 0.177483826523, 1.02e-14 int, long, float, double, scientific notation (Number) 1.23.62.41.7 - 3.2 array
8
Scientific Data Abstractions Multi-dimensional Arrays Key Features: - Single data type - Access by index
9
Relational Data Relational Database Table = Relation Row = Tuple of Attributes e.g. (0, 3.5, B) Key Features: - Supports different data types - Well suited for access by value e.g. time>2, class=A But the relation is limited to a sequence of tuples: timefluxclas s 03.5B 14.6A 24.7A 34.1A 43.2B
10
LaTiS Unified Data Model Extends the Relational Model to add Functional relationships. Represents multi-dimensional domain of data grids. Access by value or index. Example: time series of gridded surface winds Time -> ((Lon, Lat) -> (U,V)) Independent Variable (domain) Dependent Variables (range) Independent Variable
11
LaTiS Data Model Only Three Variable Types: Scalar: single Variable Tuple: group of Variables Function: mapping from one Variable to another Extend to capture higher level, domain specific abstractions
12
Discipline Agnostic Data Access with LaTiS Philosophy: Leave data in their native form Expose via a common interface Software: Reusable adapters (software modules) to read common formats, extension points for custom formats XML dataset descriptors, map native data model to the LaTiS data model Open Source, community Web services: Standard service interfaces, currently OPeNDAP Server side processing and output format options
13
Implementing the Data Model The LaTiS Data Model is an abstract representation Can be represented several ways –UML –VisAD grammar –Java Interface (no implementation) Need an implementation in code Scientific data Domain Specific Language (DSL) –Expose an API that fits the application domain Scala programming language –http://www.scala-lang.org/
14
Why Scala Evolution of Java –Use with existing Java code –Runs on the Java Virtual Machine (JVM) –Command line (REPL), script, or compiled –Statically typed (safer than dynamic languages) –Industrial strength (Twitter, LinkedIn, …) Object-Oriented –Encapsulation, polymorphism, … –Traits: interfaces with implementation, multiple inheritance, mix-ins Functional Programming –Immutable data structures –Functions with no side effects –Provable, parallelizable Syntactic sugar for Domain Specific Languages Operator “overloading”, natural math language for Variables Parallel collections
15
Scala Implementation Dataset as a Scala collection Functional Programming Paradigms: –Function composition over object manipulation –Functions as first class citizens a LaTiS Function can be used like a programming function –Immutable data structures –No side-effects: parallelizable, provable –Lazy evaluation: scalable Math and resampling mixed in –e.g. dataset3 = (dataset1 + dataset2) / 2 Metadata encapsulated –enforce data consistency: unit conversions... –track provenance
16
LaTiS Server Implementation RESTful web service API (OPeNDAP +) Java Servlet, build and deploy war file XML dataset descriptor (TSML) for each dataset –Specify Adapter to use –Map native data source to LaTiS data model –Define transformations as Processing Instructions Catalog to map dataset names to TSML Plugins: implement the Adapter, Filter or Writer interfaces or extend existing ones Properties file to map filter and writer names to implementing classes
17
Example – Serving an ASCII File Sunspot data for October 2003 2003 10 01 75 2003 10 02 72 2003 10 03 59 2003 10 04 60 2003 10 05 53 2003 10 06 51 2003 10 07 50 2003 10 08 56 2003 10 09 58 2003 10 10 50 2003 10 11 44 2003 10 12 22 2003 10 13 12 2003 10 14 4 2003 10 15 17 2003 10 16 24 2003 10 17 37 2003 10 18 43 2003 10 19 43 2003 10 20 64 2003 10 21 66 2003 10 22 72 2003 10 23 68 2003 10 24 81 2003 10 25 89 2003 10 26 102 2003 10 27 141 2003 10 28 161 2003 10 29 167 2003 10 30 171 2003 10 31 156 TSML Dataset descriptor <dataset name="Sunspot_Number" history="Read by LaTiS"> <adapter class="latis.reader.tsml.AsciiAdapter" url="file:/data/latis/ssn.txt" />
18
Example – Serving an ASCII File
19
Current Applications LASP Interactive Solar Irradiance Data Center (LISIRD) –Uses LaTiS to read, subset, reformat data, metadata –http://lasp.colorado.edu/lisird/ Time Series Data Server (TSDS) –Common RESTful interface to NASA Heliophysics data –http://tsds.net/ Other LASP projects: MMS, MAVEN, database statistics, log files External users?
20
Capabilities – Data Reader Modules Operational: –ASCII (file, web service, system call), binary, NetCDF, Relational database, data “generators” –Time Series of scalars, vectors, and spectra –Arbitrarily long time series Prototyped: –HDF, CDF, FITS, GRIB, OPeNDAP (e.g. other LaTiS servers), NoSQL (MongoDB) –Nested 2D (gridded) data structures Planned: –Arbitrarily complex data structures
21
Capabilities – Data Writer Modules Operational: –OPeNDAP, ASCII (e.g. csv), binary, JSON, Image (PNG), IDL code, HTML dataset landing page Prototyped: –NetCDF, HDF, IDL save file, interactive plot Planned: –GeoTIFF, …
22
Capabilities – Data Filter Modules Operational: –Subset, aggregate, stride, thin, replace, integrate, bin average Prototyped: –FFT, min, max, unique, resampling, unit conversion Planned: –Coordinate system transformations –Make it easier to plug in custom computations –Track provenance
23
Capabilities – Service Interface Operational: –OPeNDAP –Java Servlet, simply deploy war file (Tomcat, Glassfish) Prototyped: –Authentication –Single executable (jetty) –THREDDS Data Server (TDS) integration Planned: –Open Geospatial Consortium (OGC) standards Web Map Server Web Coverage Server
24
Capabilities - Metadata Operational: –THREDDS catalog, static XML, browse Prototyped: –Semantic Web triple store (RDF, SPARQL) –Text search (Solr) –Modeling RDF triples (subject, predicate, object) –Track provenance, record Dataset modifications Planned: –Serve metadata in various schema (e.g. ISO 19115, SPASE) –Unique IDs, Digital Object Identifiers (DOI) for publishing
25
Other Capabilities Operational: –Time API with formatting –Time conversions with leap seconds Prototyped: –Caching, improve performance –Parallel processing, multi-core Planned: –Big Data, Hadoop, Map Reduce –Workflow integration
26
Source Code Management – Open Source Time Series Server (a.k.a. TSS1) –Core of Time Series Data Server (TSDS, tsds.net) –Built around Unidata Common Data Model –SourceForge: https://sourceforge.net/projects/tsds/ LaTiS (a.k.a. TSS2) –New LaTiS data model, scala implementation –GitHub: https://github.com/dlindhol/LaTiS –LASP internal development branch –Plug-ins as separate projects (e.g. data collections, math, custom readers/writers,…), keep core small
27
My Background (i.e. bias) Astrophysicist by degree, software engineer by profession Data user and provider Scientific data applications developer: –astrophysics, atmospheric science, space science Holy Grail: common data model Favorite scientific data models: –VisAD (http://www.ssec.wisc.edu/~billh/visad.html) –Unidata Common Data Model (http://www.unidata.ucar.edu/software/netcdf-java/CDM/) –OPeNDAP (http://www.opendap.org/)
28
Motivation – Stove Pipes
29
Single Data Access Interface
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.