QDataSet Data Model

Presentation transcript:

QDataSet Data Model
What is a data model?
– My definition…
– “model” in the CompSci sense
   A bank’s software has a model for customers
   Store what’s relevant to their business
– A representation of data that allows access to the numbers and metadata
– Bias towards visualization and analysis

QDataSet Motivation
Dataset abstraction layer allows data from different sources to be plotted in Autoplot
All data-handling systems have some sort of data model.
All have limitations in what they can represent.
Dataset abstraction provides nouns and verbs that develop a vocabulary.

Data Model Goals
The model should be an interface, not a file format.
Flexible to accurately represent many types of data
Simple so as not to burden
Range of abstraction
– From a set of times when data was collected: Time
– To high-dimension dataset that can be displayed and sliced, like Flux( scan mode, Time, energy, pitch )

Context for Development
das2 has had two data models; QDataSet will become the third (and final, hopefully).
Current model is overly abstract.
– Optimized for line plots and spectrograms.
– All data must be tagged with physical units
– Datasets must be Y(T) or Z(Mode,T,Y).
– But cannot represent common things like Flux(Time,Energy,PitchAngle)
– Or GsmPos(T)
API is big, “TableDataSet” has 28 methods

Context for Development: TableDataSet
Here are example methods to give context:
tds.getZUnits(), tds.getYUnits(), tds.getXUnits()
tds.tableCount()
tds.getYLength( itable )
tds.getXLength()
tds.getYTagDatum( itable, iy )
tds.getDatum( ix, iy )
tds.getDouble( ix, iy, units )
tds.getProperty( DataSet.PROPERTY_X_TAG_WIDTH )

Context for Development: das2 & PaPCo
Groping for the ideal model for two+ years
PaPCo data model is based on CDF model
– CDF file is a collection of datasets
– datasets are 1,2,3,4-D arrays
– datasets have attribute (name=value) pairs
– dependencies between datasets

Context for Development: Autoplot
Autoplot goal is to plot data from many sources, uses das2
“QDataSet” introduced when das2 data model limitations got in the way.
– Supports untagged data (bunch of numbers)
– Combinations of data types (timetags are doubles, data are floats) make implementing one giant interface impossible.
– (in OOP, has-a is always better than is-a)

QDataSet
Java API inspired by CDF and NetCDF
DataSet = Array + name=value properties
Property names like DEPEND_0, UNITS
– DEPEND_0 points to the dataset that tags dimension 0
“rank” is number of indexes
Abstraction comes from composition.
Density( Time=1440 )
– Density is rank 1 dataset with 1440 values.
– Time is rank 1 dataset with 1440 values
– Density.property( QDataSet.DEPEND_0 ) -> Time(1440)
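The "array + properties" idea can be sketched in a few lines of Python. This is only an illustration of the composition, not the Java API: the class name SimpleDataSet, the string property keys, and the unit labels are all assumptions made for the example.

```python
class SimpleDataSet:
    """Hypothetical stand-in for a rank 1 dataset: values plus named properties."""

    def __init__(self, values, **properties):
        self._values = list(values)
        self._properties = dict(properties)

    def rank(self):
        # This sketch only models rank 1 data.
        return 1

    def length(self):
        return len(self._values)

    def value(self, i):
        return float(self._values[i])

    def property(self, name):
        # Returns None when the property is not set.
        return self._properties.get(name)


# Time tags: one per minute of a day, 1440 in total (units are an assumption).
time = SimpleDataSet(range(0, 86400, 60), UNITS="seconds")
# Density is tagged by Time through its DEPEND_0 property.
density = SimpleDataSet([5.0] * 1440, UNITS="cm^-3", DEPEND_0=time)
```

Here density.property("DEPEND_0") returns the time dataset itself, so a plotting routine can find the tags without any special "time series" class.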

QDataSet: rank 3
Flux( Time=1440, Energy=55, Pitch=18 )
Flux.rank() -> 3
Energy.property( QDataSet.UNITS ) -> eV
Flux.property( QDataSet.DEPEND_1 ) -> Energy(55)
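A rough Python sketch of the same rank 3 example, using nested lists. The class NestedDataSet and its string property keys are hypothetical; it assumes non-empty data and derives the rank from the nesting depth.

```python
class NestedDataSet:
    """Hypothetical dataset backed by nested lists (assumed non-empty)."""

    def __init__(self, data, **properties):
        self._data = data
        self._properties = dict(properties)

    def rank(self):
        # Rank is the number of indexes needed to reach a number.
        depth, node = 0, self._data
        while isinstance(node, list):
            depth += 1
            node = node[0]
        return depth

    def length(self, *indices):
        # length() is the first dimension; length(i) is the length at index i.
        node = self._data
        for i in indices:
            node = node[i]
        return len(node)

    def value(self, *indices):
        node = self._data
        for i in indices:
            node = node[i]
        return float(node)

    def property(self, name):
        return self._properties.get(name)


n_time, n_energy, n_pitch = 1440, 55, 18
# Placeholder energy values; only the UNITS property matters here.
energy = NestedDataSet([float(k) for k in range(n_energy)], UNITS="eV")
flux = NestedDataSet(
    [[[0.0] * n_pitch for _ in range(n_energy)] for _ in range(n_time)],
    DEPEND_1=energy,
)
```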

Accessing Data
Density.value(i) -> double
Density.property( QDataSet.FILL ) -> -1e31
for ( int i=0; i<Density.length(); i++ ) {
    double d= Density.value(i);
}
Iterator hides rank
iter= DataSetIterator( Density )
while ( iter.hasNext() ) {
    iter.next();
    double d= iter.value( Density );
}
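The two access patterns can be mimicked in Python as follows. The helper names are made up for this sketch, and plain nested lists stand in for datasets; the fill value is the one quoted on the slide.

```python
FILL = -1e31  # fill value quoted on the slide


def valid_values(values, fill=FILL):
    """Rank 1 access: walk the values and skip fill."""
    return [v for v in values if v != fill]


def iter_values(data):
    """Rank-hiding access: yield every number in a nested list, whatever its depth."""
    if isinstance(data, list):
        for item in data:
            yield from iter_values(item)
    else:
        yield float(data)


density = [4.2, FILL, 5.1]         # rank 1
flux = [[1.0, 2.0], [3.0, FILL]]   # rank 2

good_density = valid_values(density)
flux_total = sum(v for v in iter_values(flux) if v != FILL)
```

The caller of iter_values never needs to know the rank, which is the point of the iterator on the slide.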

Timetags
Time.property( QDataSet.UNITS ) -> cdfEpoch
Time.value( 0 ) -> e13
das2 “Datum” is double + Unit
cdfEpoch.createDatum( Time.value(0) ) -> “ T01:23:45”
Canonical time unit in das2 is Units.us2000, microseconds since midnight, Jan 1, 2000.
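As a rough illustration of the us2000 convention, the conversion to and from calendar time can be written with Python's standard library. The function names are hypothetical, the epoch is assumed to be UTC, and leap seconds are ignored.

```python
from datetime import datetime, timedelta, timezone

# Assumed epoch for us2000: midnight, Jan 1, 2000 (taken as UTC here).
US2000_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def us2000_to_datetime(us):
    """Convert microseconds since 2000-01-01 to a datetime."""
    return US2000_EPOCH + timedelta(microseconds=us)


def datetime_to_us2000(when):
    """Convert a timezone-aware datetime to whole microseconds since 2000-01-01."""
    return (when - US2000_EPOCH) // timedelta(microseconds=1)


one_day_later = us2000_to_datetime(86_400_000_000)
```

Pairing the number with its unit, as the das2 Datum does, is what makes the bare double interpretable.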

QDataSet implementations
DDataSet is backed by double array, FDataSet is backed by floats.
TagGenDataSet computes value() with each call.
NetCDFDataSet adapts the NetCDF API to make it look like a QDataSet.
DoubleBufferDataSet is backed by java.nio.DoubleBuffer, and has practically no limit to size since it is not bounded by physical memory
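Because the model is an interface, an implementation does not have to store anything. A minimal Python sketch of a dataset that computes each value on demand, in the spirit of TagGenDataSet, might look like this. The class name and constructor arguments are assumptions, not the real signature.

```python
class GeneratedTags:
    """Hypothetical rank 1 dataset whose values are computed, not stored."""

    def __init__(self, length, start, step):
        self._length = length
        self._start = start
        self._step = step

    def rank(self):
        return 1

    def length(self):
        return self._length

    def value(self, i):
        # Nothing is stored: each call evaluates start + i * step.
        if not 0 <= i < self._length:
            raise IndexError(i)
        return self._start + i * self._step


# 1440 tags, one per minute, with no 1440-element array behind them.
minutes = GeneratedTags(1440, 0.0, 60.0)
```

Code that only calls rank(), length(), and value() cannot tell this apart from an array-backed dataset.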

QDataSet interface
rank()
length(), length(dim0), length(dim0,dim1)
value(dim0), value(dim0,dim1),…
property( name ), property( name, dim0 ),…
Note, there are extensions such as “WritableDataSet” with additional methods.

QDataSet layers
Java API is thin syntax layer
Abstraction comes from semantics
Thin syntax layer means easy to implement in different languages
– Java
– C++
– XML

Rank-reducing Operators
“slice” reduces rank by extracting a dataset from array of datasets.
– Remove context to see detail
– Flux( Time, Energy, Pitch ) -> Flux( Energy, Pitch )
“collapse” reduces rank by averaging elements along a dimension
– Remove details to see context
– Flux( Time, Energy, Pitch ) -> Flux( Time, Energy )
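Both operators are easy to show on nested lists. The sketch below is an illustration only: the function names are made up, slicing is done on the first index, and collapse averages over the last dimension to match the Pitch example.

```python
def slice0(data, i):
    """Slice on the first index: pick one dataset out of an array of datasets."""
    return data[i]


def collapse_last(data):
    """Collapse the last dimension by averaging, reducing the rank by one."""
    if isinstance(data[0], list):
        return [collapse_last(row) for row in data]
    return sum(data) / len(data)


# Toy Flux( Time=2, Energy=2, Pitch=2 )
flux = [
    [[1.0, 3.0], [2.0, 4.0]],
    [[5.0, 7.0], [6.0, 8.0]],
]

flux_at_t0 = slice0(flux, 0)       # Flux( Energy, Pitch ) at the first time
flux_avg = collapse_last(flux)     # Flux( Time, Energy ), averaged over pitch
```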

Qube DataSets
In general, QDataSets are arrays of arrays.
Length method is qualified by index
– ds.length() gives first dimension length
– ds.length(0) might not equal ds.length(1).
Slice operator only defined for 0th index.
Qube implies data is simple N-dimensional array and dimensions are independent.
– Slice or collapse any dimension
– Flux( Time=1440, Energy=32 ) implies Qube.
Flux.property( QDataSet.QUBE ) -> True
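The difference between a general array of arrays and a qube can be checked mechanically. This Python sketch uses nested lists and a hypothetical is_qube helper; it is not how the QUBE property is set in the Java code.

```python
def is_qube(data):
    """Return True if a nested list is a regular N-dimensional array."""

    def shape(node):
        # A bare number has an empty shape.
        if not isinstance(node, list):
            return ()
        if len(node) == 0:
            return (0,)
        child_shapes = {shape(child) for child in node}
        # Children must all agree on one shape, and none may be ragged itself.
        if len(child_shapes) != 1 or None in child_shapes:
            return None
        return (len(node),) + child_shapes.pop()

    return shape(data) is not None


regular = [[1.0, 2.0], [3.0, 4.0]]   # length(0) == length(1)
ragged = [[1.0, 2.0], [3.0]]         # length(0) != length(1)
```

For the ragged case only slicing on the first index is well defined, which is why many operators prefer qubes.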

Math Operators
Add, subtract, multiply, divide, pow, cos, etc.
He_density( Time=1440 )
H_density( Time=1440 )
Total_density= Ops.add( He_density, H_density )
FFT, magnitude, etc.
angle= new TagGenDataSet( 0, 100*PI, )
fft= Ops.FFT( cos( angle ) )
pow= Ops.pow( magnitude(fft), 2 )
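A small Python sketch of the two examples, on plain lists. The function names are made up, and a naive discrete Fourier transform stands in for Ops.FFT, so this shows the arithmetic only, not the library's implementation.

```python
import cmath
import math


def add(a, b):
    """Element-wise sum of two rank 1 datasets of equal length."""
    if len(a) != len(b):
        raise ValueError("datasets must have the same length")
    return [x + y for x, y in zip(a, b)]


def power_spectrum(values):
    """Squared magnitude of a naive O(n^2) DFT (a stand-in for FFT + magnitude + pow)."""
    n = len(values)
    spectrum = []
    for j in range(n):
        coeff = sum(values[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


he_density = [1.0, 2.0, 3.0]
h_density = [10.0, 20.0, 30.0]
total_density = add(he_density, h_density)

# A cosine with 4 cycles over 64 samples puts its power in bin 4 (and its mirror).
samples = [math.cos(2 * math.pi * 4 * k / 64) for k in range(64)]
spectrum = power_spectrum(samples)
```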

Other operators
join appends one dataset to another
– (add the dependencies too)
findex shows how two (tags) datasets interleave.
Boxcar average for rank 1 datasets.
etc…
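One plausible reading of "findex" is a fractional index: where each value of one tags dataset falls between the tags of another. The Python sketch below follows that reading, which is an assumption, and adds a simple boxcar average. Both functions are illustrations with made-up signatures; edge handling in the boxcar (a shortened window) is also a choice made here.

```python
from bisect import bisect_right


def findex(tags, targets):
    """Fractional index of each target within monotonically increasing tags."""
    result = []
    for t in targets:
        i = bisect_right(tags, t) - 1
        # Clamp so targets outside the range extrapolate from the end intervals.
        i = max(0, min(i, len(tags) - 2))
        frac = (t - tags[i]) / (tags[i + 1] - tags[i])
        result.append(i + frac)
    return result


def boxcar(values, width):
    """Running mean over an odd-width window, shortened at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


interleave = findex([0.0, 10.0, 20.0], [5.0, 10.0, 15.0])
smoothed = boxcar([1.0, 2.0, 3.0, 4.0, 5.0], 3)
```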

IDL, Matlab inspired
IDL’s findgen(20) -> 0,1,2,3,4,…
Matlab’s linspace( 0., 1., 21 ) -> 0.00, 0.05, 0.10, …
IDL’s where( Density > 20. )
– but with zero length result!
– no aliasing 2-D to 1-D! (result preserves dimensionality)
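These three routines are simple to sketch in Python. The versions below are illustrations with made-up signatures, not the Autoplot functions. Note that a spacing of 0.05 between 0 and 1 with both endpoints included takes 21 points. The where sketch returns one index tuple per match, so a rank 2 input is not flattened, and no match gives an empty list.

```python
def findgen(n):
    """0.0, 1.0, 2.0, ... with n elements."""
    return [float(i) for i in range(n)]


def linspace(start, stop, n):
    """n evenly spaced values from start to stop, endpoints included."""
    if n == 1:
        return [float(start)]
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def where(data, predicate, prefix=()):
    """Index tuples of the elements satisfying predicate, for any rank."""
    hits = []
    for i, item in enumerate(data):
        if isinstance(item, list):
            hits.extend(where(item, predicate, prefix + (i,)))
        elif predicate(item):
            hits.append(prefix + (i,))
    return hits


density = [[10.0, 30.0], [25.0, 5.0]]
high = where(density, lambda v: v > 20.0)     # two (row, column) pairs
none = where(density, lambda v: v > 100.0)    # empty list, not an error
```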

Limitations
Rank 1, 2, and 3 implemented in Java API.
– Rank 0 exists, but you can’t do anything with it
– Rank N exists, but you can only slice it.
Many operators assume QUBEs
Still groping for how to represent coordinate dimensions
And bundles of correlated data
Das2Stream currently cannot represent rank 3 datasets.

Jython Support
Jython is Python implemented in Java
allows operator overloading
QDataSet + Jython = expressive language similar to IDL or Matlab
Autoplot script panel
N1= getDataSet( "/home/jbf/density.dat?column=N1" )
N2= getDataSet( "/home/jbf/density.dat?column=N2" )
plot( N1 + N2 )
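What makes N1 + N2 work is Python's operator overloading. A minimal sketch of the mechanism, in plain Python with a hypothetical DataSet class (not the Autoplot wrapper):

```python
class DataSet:
    """Hypothetical rank 1 dataset that supports the + operator."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def __add__(self, other):
        # dataset + dataset: element-wise, lengths must match.
        if isinstance(other, DataSet):
            if len(other.values) != len(self.values):
                raise ValueError("datasets must have the same length")
            return DataSet([a + b for a, b in zip(self.values, other.values)])
        # dataset + scalar: add the scalar to every element.
        return DataSet([a + other for a in self.values])

    # scalar + dataset falls back to the same rule.
    __radd__ = __add__


n1 = DataSet([1.0, 2.0, 3.0])
n2 = DataSet([10.0, 20.0, 30.0])
total = n1 + n2
shifted = 1 + n1
```

In the script panel the same expression would presumably dispatch to an operator such as Ops.add, so the script reads like IDL or Matlab.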

Example
Saturn Density contours
200 lines of Jython code reads in 5 datasets, produces 4 datasets
Datasets are then displayed in Autoplot with contours feature added.
Ported from IDL script in about an hour

Thanks!