EXPRESS/Binary Report David Price ISO TC184 SC4 Toulouse June 2006.


Agenda
1. Status since last ISO STEP in Italy (added)
2. Walkthrough of current EXPRESS/HDF5 mapping
3. Presentation of prototypes and testing results
4. Issue discussion for next draft of mapping
5. Next actions and plans for testing

March 2006 Italy STEP Meeting Report Items
Workshop hosted by HDF Group
  – Workshop Dec 6-8, 2005
  – Champaign, Illinois, USA
STEP, ESA, commercial, EXPRESS/Binary and HDF 5 developer attendees
Agenda was
  – Introduced HDF Group to EXPRESS language and STEP information models
  – HDF developers provided overview of HDF 5 Concepts and Structures
  – Walkthrough of EXPRESS/HDF Mapping Draft 0.2
  – Presentation by domain experts: AP209 Analysis, STEP TAS, SINDA/G, Ship AP Analysis Needs
  – Issues/requirements around APIs, programming languages, etc.

Summary Reported at March 2006 Italy STEP Meeting
Many core issues on V0.2 spec addressed at the Dec 2005 workshop at HDF Group US facilities
  – The basic approach was flawed: V0.2 did not use enough of the HDF capability
V0.3 will be an improvement and should allow better control of efficiency by the application
  – Prototyping will follow V0.3

March 2006 Italy STEP Meeting Action Items
David Price – Publish EXPRESS/HDF Mapping V0.3, due March 24
Mats Lindeblad – Create New Work Item for June SC4 meeting
David Price – Contact Hans-Peter about linking a one-day workshop with the NASA/ESA PDE at the end of April (a day before Monday?)
Keith Hunten – Plan session at Eng Analysis sessions at PDES, Inc. Offsite end of March
David/Mats – Plan for technical work at June SC4 meeting

Progress Since March
V0.3 published
Short requirements session at PDES, Inc. Offsite where the EA team prioritized
  – Add SELECT
  – Add redefined attributes (does HDF support this?)
  – Add schema version attribute (may use URN)
  – What kind of metadata does NARA require? National archives project
  – Also, need EXPRESS-to-C software to lower the barrier to participating in prototyping

Progress Since March (2)
One-day workshop held with pyEXPRESS prototype team led by Alain Fagot and Hans-Peter
  – David Price Slides/Notes are available
Post-workshop plan to produce V0.4
  – EA requirements
  – Better examples
  – Incorporate feedback/issues from pyEXPRESS
Editor (i.e. David Price) could not provide sufficient time to the project to produce V0.4 or the EXPRESS-to-C software before June vacation
V0.31 was published June 9, adding a proposal for a subset of SELECT types (one of the EA team priorities)

Current Mapping Walkthrough

Prototypes and Testing results
pyEXPRESS testing (slides from PDE workshop)
  – Subset of EXPRESS (e.g. no complex instances)
  – Based on pyTables 1.3, HDF 1.6.5, Python 2.4
  – Using same EXPRESS-based API for P21 and HDF access
      HDF is just another backend to the pyEXPRESS API
      This is a different approach from what is assumed by the EXPRESS/Binary team, where direct HDF API access was assumed (is “programmer ease of use” a very high priority?)
  – Compression (using ZLIB) and chunking make file smaller and more efficient for read/write
      Even PC processors are powerful enough that decompression is faster than file access, as HDF lets you only read into memory what you need at any given time
  – Benchmarks show good results (e.g % file size and 75% access times), but also identify areas in the mapping that need improvement (e.g. small HDF files are bigger than P21 and sometimes slower)
  – STEP TAS will be a NWI in SC4 starting soon
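The compression and chunking point can be illustrated without HDF at all. The sketch below is not the pyEXPRESS or HDF5 implementation; it only uses Python's standard zlib module to show the idea that independently compressed chunks let a reader decompress just the chunk it needs. The chunk size, the sample P21-like rows and the function names are assumptions made for illustration.

```python
import zlib

CHUNK_ROWS = 1000  # hypothetical chunk size, chosen only for illustration


def write_chunks(rows):
    """Split rows into fixed-size chunks and compress each chunk on its own."""
    chunks = []
    for start in range(0, len(rows), CHUNK_ROWS):
        raw = "\n".join(rows[start:start + CHUNK_ROWS]).encode("utf-8")
        chunks.append(zlib.compress(raw))
    return chunks


def read_row(chunks, index):
    """Decompress only the chunk that holds the requested row."""
    chunk_no, offset = divmod(index, CHUNK_ROWS)
    raw = zlib.decompress(chunks[chunk_no]).decode("utf-8")
    return raw.split("\n")[offset]


# Made-up, repetitive rows that loosely resemble Part 21 instance lines.
rows = ["#%d=CARTESIAN_POINT('',(%d.0,0.0,0.0));" % (i, i) for i in range(5000)]
chunks = write_chunks(rows)
raw_size = sum(len(r) + 1 for r in rows)      # bytes before compression
packed_size = sum(len(c) for c in chunks)     # bytes after compression
```

Reading row 4321 touches one of the five chunks only, which is the behaviour the slide attributes to HDF chunked storage. Small files benefit less, since each chunk carries its own compression overhead.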

Issue discussion for next draft of mapping
  – David can edit source XML for V0.4 draft to include issue resolution we develop today
  – EA needs
      Check V0.31 SELECT support (DONE)
      Add redefined attributes (does HDF support this?) (DONE)
      Add schema version attribute (may use URN)
  – pyEXPRESS Cannes issues
      Object ID (i.e. pointers) handling code
        ID = Integer + string (string is pyTable name, generated from EXPRESS name) (DONE)
      Unset values for each datatype within the file (DONE)

Issue discussion for next draft of mapping (2)
  – Issues
      Complex/partial entity instances (ANDOR) (DONE)
      David Issue = (Multiple) Inheritance? Had something to do with select types. (DONE)
      Defined type of array “TYPE x = aggregate of whatever” (TODO)
      Complicated types for array values, e.g. SELECT (REAL, INTEGER, ENTITY INSTANCE) (DONE)
        – We will use the same generic object identifier approach to handle these as to handle complicated SELECT types.
      Variable length string
        – HPdK thinks that these cannot be put in an HDF Compound Datatype. Georg found where the UG seems to say this is allowed (7.1 Complex combinations of datatypes). Maybe it’s a limitation of pyTables?
        – The current mapping says use Variable length datatypes, but it’s not clear if that’s allowed in a Compound Datatype.
        – We may have to use the general purpose object id capability and have a dataset somewhere containing varying length strings (or find another solution). It does look like you may have to specify the maximum length of the varying length strings.
        – (DEFER TO WITH HDF)
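The fallback described for variable length strings (a separate dataset holding the strings, referenced from fixed-size records) can be sketched in plain Python. This is only an illustration of the layout idea, not HDF5 or pyTables code; the record layout, the StringHeap class and the person example are hypothetical.

```python
import struct

# Hypothetical fixed-size record: age (int64), string offset (uint64),
# string length (uint32). "<" means little-endian with no padding.
RECORD = struct.Struct("<qQI")


class StringHeap:
    """A single byte buffer standing in for a dataset of varying length strings."""

    def __init__(self):
        self.data = bytearray()

    def add(self, text):
        encoded = text.encode("utf-8")
        offset = len(self.data)
        self.data += encoded
        return offset, len(encoded)

    def get(self, offset, length):
        return bytes(self.data[offset:offset + length]).decode("utf-8")


def pack_person(heap, name, age):
    """Store the name in the heap and return a fixed-size record pointing at it."""
    offset, length = heap.add(name)
    return RECORD.pack(age, offset, length)


def unpack_person(heap, record):
    age, offset, length = RECORD.unpack(record)
    return heap.get(offset, length), age


heap = StringHeap()
rec_short = pack_person(heap, "Ada", 36)
rec_long = pack_person(heap, "a much longer name than the first one", 41)
```

Both records have the same size regardless of string length, so no maximum string length has to be declared for the record itself. The cost is one extra lookup per string attribute.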

Instance identifiers
Every hdf5 link and hdf5 dataset has an hdf5 object id that is an unsigned 32/64 bit integer
  – Issue: Is there a problem with using a 64 bit integer as part of entity instance ids on a 32 bit platform (i.e. does this place a limit on file size or interoperability?)
H-P thinks the object ids are managed inside a hash table in HDF
  – Also thinks the object id is not exposed in the hdf API everywhere that we need it
Proposal is to use a tuple of integers that can be used for both an entity instance id and a pointer into the aggregates
  – (hdf object id, row index)
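A minimal sketch of the proposed (hdf object id, row index) tuple follows. It does not call the HDF5 API and does not answer whether HDF exposes the object id where needed; it only shows that a pair of integers written with an explicit fixed-width, little-endian layout encodes the same way regardless of the platform word size. InstanceRef, the field widths and the in-memory datasets dictionary are assumptions for illustration.

```python
import struct
from typing import NamedTuple


class InstanceRef(NamedTuple):
    """Hypothetical reference: which dataset, and which row inside it."""
    object_id: int
    row: int


# Two unsigned 64-bit integers, little-endian, no padding (16 bytes total).
_REF = struct.Struct("<QQ")


def pack_ref(ref):
    return _REF.pack(ref.object_id, ref.row)


def unpack_ref(buf):
    return InstanceRef(*_REF.unpack(buf))


def resolve(datasets, ref):
    """Follow a reference into an in-memory stand-in for the file's datasets."""
    return datasets[ref.object_id][ref.row]


# An object id that does not fit in 32 bits, to exercise the 64-bit case.
big_id = 2**40 + 5
datasets = {big_id: ["point_0", "point_1", "point_2", "point_3"]}
ref = InstanceRef(big_id, 3)
encoded = pack_ref(ref)
```

The same tuple can serve as an entity instance id or as a pointer to a row in an aggregate dataset, which is the dual use the proposal describes.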

Complicated Select types
TYPE x = SELECT OF (REAL, INTEGER, LIST OF BOOLEAN, e2);
Proposal is to have each base type in a separate HDF dataset in a separate group
  – Group for REAL, Group for INTEGER, Group for LIST OF BOOL, etc.
It could be configurable
  – May have a single dataset for ALL integers in the file used in this way
  – May have a dataset for each attribute used in this way (similar to how the mapping for aggregate attribute values works now)
  – For cases where every entity instance that has TYPE x as its domain, you might use the simple type instead of the complicated mapping
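The per-base-type layout can be sketched as a dictionary of lists, one list per group, with each SELECT value stored as a (group, row) reference. This is an illustration of the proposal under stated assumptions, not the mapping specification: SelectStore, base_type and the E2 class are hypothetical stand-ins, and the lists stand in for HDF datasets.

```python
from dataclasses import dataclass


@dataclass
class E2:
    """Hypothetical stand-in for an instance of entity e2."""
    name: str


def base_type(value):
    """Pick the group a value belongs to, mirroring the SELECT list on the slide."""
    if isinstance(value, list) and all(isinstance(v, bool) for v in value):
        return "LIST OF BOOLEAN"
    if isinstance(value, bool):
        raise TypeError("a bare BOOLEAN is not in this SELECT list")
    if isinstance(value, float):
        return "REAL"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, E2):
        return "e2"
    raise TypeError("unsupported value: %r" % (value,))


class SelectStore:
    """One list per base type, standing in for one dataset per group."""

    def __init__(self):
        self.groups = {"REAL": [], "INTEGER": [], "LIST OF BOOLEAN": [], "e2": []}

    def put(self, value):
        group = base_type(value)
        self.groups[group].append(value)
        return group, len(self.groups[group]) - 1

    def get(self, ref):
        group, row = ref
        return self.groups[group][row]


store = SelectStore()
values = [1.5, 7, [True, False], E2("bolt"), 8]
refs = [store.put(v) for v in values]
```

Here all integers share one list, which corresponds to the "single dataset for ALL integers" option; a per-attribute variant would key the dictionary by attribute name instead.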

Redeclared attributes
Redeclaration things we can address
  – Specialize the attribute domain
      Write the encoding of the specialized value in the HDF compound type representing the subtype
  – Type is subtype of original
      We only use the object identifier everywhere so this is no problem
  – Rename of attribute
      Use new name in HDF compound data type for the subtype
  – Explicit to derived
      Do not put the attribute in the HDF5 compound data type and do not store a value
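The redeclaration rules above can be read as a small procedure for computing the fields of the subtype's compound type. The sketch below is one possible reading, not the mapping specification; the Attr class, the compound_fields function and the example attributes are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attr:
    name: str
    domain: str
    derived: bool = False
    redeclares: Optional[str] = None  # inherited attribute this one replaces


def compound_fields(inherited, declared):
    """Return (field name, domain) pairs for the subtype's compound type."""
    fields = [(a.name, a.domain) for a in inherited if not a.derived]
    for attr in declared:
        if attr.redeclares is not None:
            names = [name for name, _ in fields]
            index = names.index(attr.redeclares)
            if attr.derived:
                # Explicit to derived: no field and no stored value.
                del fields[index]
            else:
                # Rename and/or specialized domain: use the new name and domain.
                fields[index] = (attr.name, attr.domain)
        elif not attr.derived:
            fields.append((attr.name, attr.domain))
    return fields


inherited = [Attr("name", "STRING"), Attr("size", "NUMBER"), Attr("area", "REAL")]
declared = [
    Attr("label", "STRING", redeclares="name"),              # rename
    Attr("size", "INTEGER", redeclares="size"),              # specialized domain
    Attr("area", "REAL", derived=True, redeclares="area"),   # explicit to derived
    Attr("age", "INTEGER"),                                  # new attribute
]
subtype_fields = compound_fields(inherited, declared)
```

Attributes whose domain is an entity type would hold an object identifier in every case, which is why the "type is subtype of original" case needs no special handling here.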

ANDOR
SCHEMA test;
  ENTITY a;
    name : STRING;
  ENTITY b SUBTYPE OF a;
    age : INTEGER;
    x : REAL;
  ENTITY c SUBTYPE OF a;
    height : REAL;
    x : BOOLEAN;
Results in
  test/a
  test/a/name
  test/b
  test/b/name
  test/b/age
  test/c
  test/c/height
  test/b__c
  test/b__c/name
  test/b__c/age
  test/b__c/height
  test/b__c/b__x
  test/b__c/c__x
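The naming pattern for the complex instance group (test/b__c) can be reproduced with a short function. This is an inference from the single example on the slide, not a rule taken from the mapping document: attribute names that occur in more than one leaf entity are prefixed with the entity name, and everything else keeps its plain name. The function name and argument shapes are hypothetical, and clashes between a supertype attribute and a leaf attribute are not handled.

```python
def complex_instance_paths(schema, supertype_attrs, leaves):
    """Build group and attribute paths for a complex (ANDOR) entity instance.

    leaves maps each leaf entity name to its own attribute names, in order.
    """
    group = "%s/%s" % (schema, "__".join(leaves))

    # Count how many leaf entities declare each attribute name.
    counts = {}
    for attrs in leaves.values():
        for attr in attrs:
            counts[attr] = counts.get(attr, 0) + 1

    paths = [group]
    paths += ["%s/%s" % (group, attr) for attr in supertype_attrs]
    # Names declared by exactly one leaf entity are used as-is.
    for attrs in leaves.values():
        paths += ["%s/%s" % (group, a) for a in attrs if counts[a] == 1]
    # Clashing names are qualified with the declaring entity.
    for entity, attrs in leaves.items():
        paths += ["%s/%s__%s" % (group, entity, a) for a in attrs if counts[a] > 1]
    return paths


paths = complex_instance_paths(
    "test",
    ["name"],
    {"b": ["age", "x"], "c": ["height", "x"]},
)
```

For the schema on the slide this yields the six test/b__c paths in the order listed there.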

Next actions and plans for testing
pyEXPRESS testing is based on pyTables; there is a C Tables API … Should our other testing be based on that?
Can/should we set up another workshop with HDF Group to complete mapping?
  – DP Action to talk to Mike Folk about doing something prior to the ISO in October (we remember him saying there was a workshop in DC)
What do testers need to help get them started?
  – EXPRESS-to-C has been mentioned (if we use C Tables API that’s not useful)
  – Training?
  – Test data?
  – Schemas?
Closing plenary slides for Friday
NWI – Will be created and circulated via telecon before the next ISO STEP meeting.

Notes from Meeting
Are there other sources of MetaData?
  – Are there other archiving (e.g. NARA) or LTDR standards (e.g. LOTAR)?
  – If you treat HDF as a “database” what is needed?
  – What about internal company meta-data?
  – What about Web-based standards (e.g. Dublin Core)?
  – Should we just include a generic meta-data “name-value pair” capability?
  – What about non-STEP data in the same file that the STEP data references (e.g. jpegs)?
Where multiple mappings are still being tested, it is OK to include more than one in the specification.
  – The specification is currently a guide for prototype testers, not a draft standard.
What are the highest priority requirements? “Performance”, but performance and efficiency of exactly what?

Notes from Meeting (2)
We may need to add some HDF attributes to the Groups and Datasets when they are written to help readers (e.g. number of instances of an entity type that were written)
  – C Tables API uses this approach so we should look at that to see if we can learn anything for our use.
We need to have more discussion about whether to allow or require writing inverse attribute values into the file; nothing is done there now.
  – For “read-only files” inverses could be a nice optimization.
  – Would we need to allow this to be configured? If so, how?
  – What about the “unnamed inverse” that EXPRESS says exists?
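As a rough illustration of why stored inverses help read-only files, the sketch below builds an inverse index once, at write time, from the forward references. It is not part of the mapping; the face/shell example, the of_shell attribute and the metadata dictionary (standing in for HDF attributes on a dataset) are all hypothetical.

```python
from collections import defaultdict


def build_inverse(rows, attribute):
    """Map each referenced row index to the rows that point at it."""
    inverse = defaultdict(list)
    for row_index, row in enumerate(rows):
        target = row[attribute]
        if target is not None:  # None stands in for an unset reference
            inverse[target].append(row_index)
    return dict(inverse)


# Hypothetical data: each face refers to a shell by row index.
faces = [
    {"name": "f0", "of_shell": 0},
    {"name": "f1", "of_shell": 0},
    {"name": "f2", "of_shell": 1},
    {"name": "f3", "of_shell": None},
]

# Written once alongside the data; a reader then avoids scanning every face.
shell_to_faces = build_inverse(faces, "of_shell")

# Stand-in for HDF attributes attached to the dataset to help readers.
metadata = {"instance_count": len(faces)}
```

A writer that keeps updating the file would have to maintain the index on every change, which is one reason the slide limits the suggestion to read-only files.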

Action Items
HPdK – Find out how to implement the object id using the HDF 5 API
DP – Find thread on entity instance identifiers from a year ago; it might be useful for the new proposal
AF – Write text to describe the multi-dataset approach to Aggregate Instances, to DP who will add to spec V0.4
DP – Read “fixme” from meeting and fix them.
HPdK – Put example HDF5 files on the Web somewhere for others to view. Mapping document too.
ML – Look at what Vivace stuff can be published publicly.
ML – Look at what can be published to the Vivace Forum 2 (unfortunately, these are same dates as Hershey).