EXPRESS/HDF5 Mapping Specification Version 0.5 Walkthrough David Price October 2006.


Agenda
On EXPRESS-HDF5 spec
EXPRESS-driven data in HDF5
EXPRESS schema concepts in HDF5
Small demo

On EXPRESS-HDF5 spec

Mapping Version 0.5 Caveats
V0.5 is the first “complete” mapping
  –Substantial simplification between V0.4 and V0.5
  –Errors and omissions are still likely
Design changes are still possible
Two approaches are suggested in a few cases; expect to choose one after prototyping
Version 0.6 will be the Committee Draft ballot
  –New Work Item/Committee Draft ballot to start as soon as documents are ready, per Change Management Review Oct 24, 2006

Not Covered in V0.5
May be added in later editions/versions
  –Inverse attributes
  –Derived attributes
  –Cross-file links
Expected to remain out of scope
  –Constraints
    Rules, procedures and functions
    Domain rules
    Unique rules

Key Requirements
Focus is on efficient data encoding and access … however …
Use cases vary widely
  –Archiving
  –Random data access
  –Large-volume data exchange

Standardization Approach
Not proposing standard EXPRESS-driven API now
  –API may come later and may be AP-specific
Flexibility more important than uniformity
  –multiple schemas, multiple populations, multiple encodings, sharing data between populations
  –include anything users need in file (e.g. jpeg)
Let HDF5 optimizations handle efficiency requirements as much as possible
  –Not “dumbing down” HDF5 to make pre- and post-processor development simple

Keep the following in mind
We are encoding simple and array data values into a file
We are not translating EXPRESS into another schema language
  –It’s not possible to pre-process an EXPRESS schema into an “empty HDF5 file”
  –We need to take data in software application memory and make it persistent on disk in an efficient manner

Notes about HDF5
HDF5 Files are self-descriptive
  –HDF5 Datatypes are stored along with the data
The HDF5 datatype in the file need not match the datatype in application memory
  –Mapping between data in the HDF5 file and in application memory happens at runtime

Usage Architecture HDF5 Libraries Application Computer Memory HDF5 File on Disk Non-HDF5 File on Disk Guides encoding of application data into HDF5 File EXPRESS to HDF5 Application User

HDF5 concepts from feet

HDF5 Key Concepts
File
  –a contiguous string of bytes in a computer store (memory, disk, etc.). The bytes represent zero or more objects of the model.
Group
  –a collection of objects (including groups)
Dataset
  –a multidimensional array of Data Elements, with Attributes and other metadata
Datatype
  –a description of a specific class of data element, including its storage layout as a pattern of bits
Dataspace
  –describes the logical spatial layout of the raw data associated with a Dataset or Attribute
Attribute
  –a named data value associated with a group, dataset, or named datatype

HDF5 Key Concepts (2)
Compound Datatype
  –a datatype that combines one or more datatypes, called members, into a more complex datatype
Array
  –datatype for describing elements that are homogeneous multi-dimensional arrays of the same base datatype (base may be any HDF5 datatype)
Link
  –represents an edge in a file graph structure and has a (name, value) pair associated with it
  –Group membership is implemented via the link object
  –Names of HDF5 objects are based on links to the object
  –Soft links are secondary paths providing additional identification of an HDF5 object

Informal Diagram Notation HDF5 Group HDF5 Datatype HDF5 Dataset HDF5 Attribute HDF5 Dataspace HDF5 Link Link Group Datatype Dataset Attribute Dataspace / Root Group Soft Link

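Groups of: Groups and Datasets
[Diagram: a file graph under the root group “/” with nodes labelled De, Ab, R1, R2, R5, A1, A2, A3, Kl and the soft link SL. One object is uniquely identified by /De/R5; another has two identifiers, /De/R1 and /Ab/SL.]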

Datatypes
HDF5 supports numerous simple datatypes
  –Integer, Float, String, Bitfield, Opaque, Reference
  –Warning: Unicode support still in progress
Of multiple types/architectures
  –Standard, Native, IEEE, etc.
HDF5 supports complex datatypes
  –Array, Enum, Variable-length
HDF5 supports compound datatypes
  –Structured set of any datatype
User-defined datatypes are also supported

HDF5 Array
HDF5 Array Datatype has three key characteristics
  –Rank = number of dimensions
  –Dimensions = index/bounds of the array
  –Datatype of the elements
Example: TYPE x = ARRAY[1:3] OF ARRAY[1:5] OF INTEGER
  –Has Rank = 2
  –Has Dimensions [3, 5]
  –H5T_NATIVE_INT might be the datatype of the elements
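The rank and dimensions in the example follow mechanically from the nested EXPRESS declaration. The sketch below is a minimal illustration in plain Python, not part of the mapping specification; the function name `array_shape` is hypothetical, and it only handles fixed integer bounds.

```python
import re

# Matches one fixed-bound level such as "ARRAY[1:3] OF".
_LEVEL = re.compile(
    r"\s*ARRAY\s*\[\s*(-?\d+)\s*:\s*(-?\d+)\s*\]\s*OF\s*", re.IGNORECASE
)

def array_shape(express_type: str):
    """Return (rank, dimensions, base_type) for a nested fixed-bound ARRAY.

    Hypothetical helper: open bounds such as [1:?] are not handled.
    """
    dims = []
    rest = express_type.strip().rstrip(";")
    while True:
        match = _LEVEL.match(rest)
        if match is None:
            break
        low, high = int(match.group(1)), int(match.group(2))
        dims.append(high - low + 1)  # bounds are inclusive
        rest = rest[match.end():]
    return len(dims), tuple(dims), rest.strip()

rank, dims, base = array_shape("ARRAY[1:3] OF ARRAY[1:5] OF INTEGER")
print(rank, dims, base)  # 2 (3, 5) INTEGER
```

The base type name that remains (INTEGER here) is what would then be matched to an HDF5 element datatype.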

EXPRESS-driven data in HDF5

EXPRESS instance terms
Population
  –A set of entity instances based on an EXPRESS schema
Entity instance
  –An identifiable data instance based on one or more EXPRESS Entity Type(s)
Simple value
  –An instance of a simple data type (e.g. integer) that is part of an Entity instance or a member of an aggregate value
Aggregate value
  –An n-dimensional Array, List, Bag or Set that is part of an Entity instance (i.e. aggregate values have members)
Entity instance reference
  –An attribute value or aggregate member that refers to a single Entity instance (typically via Entity instance identifier)

EXPRESS Populations
An EXPRESS population is identified by an HDF5 Group
  –HDF5 Attribute signifies these HDF5 Groups
    iso_10303_26_data =
Multiple populations based on different schemas can appear in the same HDF5 File
Multiple populations based on the same schema but with different encodings can appear in the same HDF5 File
HDF5 Links connect Datasets to Groups
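Because any group carrying the iso_10303_26_data attribute is a population, a reader can discover populations by walking the group tree. The sketch below uses nested dictionaries as a stand-in for an HDF5 file (it does not call the HDF5 library); the dictionary layout and the function name `find_populations` are assumptions for illustration, while the group names and attribute values are taken from the population diagram in these slides.

```python
def find_populations(group, path="/"):
    """Return {group path: attribute value} for every group marked as a population.

    `group` is a plain dict with optional "attrs" and "groups" keys,
    a hypothetical in-memory stand-in for an HDF5 group.
    """
    found = {}
    attrs = group.get("attrs", {})
    if "iso_10303_26_data" in attrs:
        found[path] = attrs["iso_10303_26_data"]
    for name, child in group.get("groups", {}).items():
        child_path = path.rstrip("/") + "/" + name
        found.update(find_populations(child, child_path))
    return found

AP203 = "Configuration controlled 3D designs of mechanical parts and assemblies"
AP209 = "Composite and metallic structural analysis and related design"

root = {
    "groups": {
        "AP203Pop1": {"attrs": {"iso_10303_26_data": AP203}},
        "AP203Pop2": {"attrs": {"iso_10303_26_data": AP203}},
        "A": {"groups": {"X": {"groups": {
            "AP209Pop1": {"attrs": {"iso_10303_26_data": AP209}},
        }}}},
    }
}

populations = find_populations(root)
for path in sorted(populations):
    print(path, "->", populations[path])
```

Two populations sharing one schema and a third based on a different schema coexist in the same tree, which is the flexibility the slide describes.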

Population as HDF5 Group iso_10303_26_data = ‘Composite and metallic structural analysis and related design’ iso_10303_26_data = ‘Configuration controlled 3D designs of mechanical parts and assemblies’ iso_10303_26_data = ‘Configuration controlled 3D designs of mechanical parts and assemblies’ AP203Pop1 AP209Pop1 AP203Pop2 A X /A/X/AP209Pop1 Uniquely identified by /

On Metadata
Spec defines the following optional HDF5 Attributes (borrowed from )
  –iso_10303_26_description
  –iso_10303_26_timestamp
  –iso_10303_26_author
  –iso_10303_26_organization
  –iso_10303_26_originating_system
  –iso_10303_26_preprocessor_version
Better ideas or other requirements are welcome
Nothing preventing the use of other metadata too (e.g. Dublin Core)

Output from HDF5 debugger
HDF5 "exph5_groups.h5" {
GROUP "/" {
   GROUP "Pop1" {
      ATTRIBUTE "iso_10303_26_data" {
         DATATYPE H5T_STRING {
            STRSIZE 9;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE SCALAR
         DATA {
         (0): "geometry"
         }

EXPRESS Entity Instances
For a given population, all instances of the same Entity type appear in one HDF5 Dataset named
  –Entity type name + “-objects” (e.g. product-objects)
The assumption is that HDF5 will handle efficient access, subsetting, etc.
That HDF5 Dataset contains
  –HDF5 Named Data Type Pointer referring to an HDF5 Compound Data Type derived from the Entity type
  –HDF5 (Simple) Dataspace for the data defining the rank, dimensions, current and max number of members
  –The HDF5 Array of values itself
    Except for the EXPRESS attribute values that are aggregate values, which are explained elsewhere
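The one-dataset-per-entity-type rule amounts to grouping a population by entity type under a derived name. The following sketch shows that grouping in plain Python; the "-objects" suffix and the entity type names come from the slides, while the helper names and the attribute values in the sample population are hypothetical.

```python
from collections import defaultdict

def extent_name(entity_type: str) -> str:
    """Dataset name for the extent of one entity type."""
    return f"{entity_type}-objects"

def group_into_extents(instances):
    """Group (entity_type, values) pairs into one list per extent dataset."""
    extents = defaultdict(list)
    for entity_type, values in instances:
        extents[extent_name(entity_type)].append(values)
    return dict(extents)

# Hypothetical sample population; the attribute values are made up.
population = [
    ("product", {"id": "P1", "name": "bolt"}),
    ("product_definition_formation", {"id": "F1"}),
    ("product", {"id": "P2", "name": "nut"}),
]

extents = group_into_extents(population)
print(sorted(extents))
# ['product-objects', 'product_definition_formation-objects']
```

Each list would become the rows of one HDF5 Dataset whose compound datatype is derived from the entity type.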

Entity Instance Extents in HDF5 iso_10303_26_data = ‘Configuration controlled 3D designs of mechanical parts and assemblies’ AP203Pop1 / product_definition_formation-objects product-objects Based on product Instances

EXPRESS Aggregate Instances
EXPRESS Arrays map in two ways
  –Large array maps to its own HDF5 Dataset containing the HDF5 Array
  –Small array maps to HDF5 Array in the Entity Extent HDF5 Dataset
    So the HDF5 Array is defined in the HDF5 Compound Datatype representing the EXPRESS Entity Type
EXPRESS Set, List and Bag map the same way
  –When bounds are [n:?] the HDF5 Array dimensions cannot be set until the population is known
For a particular array, the HDF5 datatype of the array elements must be the same, but that datatype can be an HDF5 Compound Datatype as well as an HDF5 primitive datatype like integer
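For a List, Set or Bag with an open upper bound, the dimension only becomes known once the actual members are counted, and the large/small choice can be made at the same moment. The sketch below illustrates that decision; the slides do not define what counts as "small", so the `small_limit` cut-off of 16 and the function name `plan_aggregate` are purely assumptions.

```python
def plan_aggregate(member_count, lower=0, upper=None, small_limit=16):
    """Return (dimensions, placement) for one aggregate value.

    lower/upper are the EXPRESS bounds on the member count; upper=None
    stands for the open bound "?". small_limit is a hypothetical
    threshold, not something the mapping specifies.
    """
    if member_count < lower or (upper is not None and member_count > upper):
        raise ValueError("member count violates the declared bounds")
    placement = "embedded" if member_count <= small_limit else "own-dataset"
    return (member_count,), placement

# items : LIST [0:?] OF rep_item, with 3 members in this instance
print(plan_aggregate(3))      # ((3,), 'embedded')
# the same attribute with 1000 members in another instance
print(plan_aggregate(1000))   # ((1000,), 'own-dataset')
```

In practice a pre-processor would more likely fix the placement per attribute rather than per instance; the per-instance call here only keeps the sketch short.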

Aggregate Instance HDF5 Datasets iso_10303_26_data = ‘Configuration controlled 3D designs of mechanical parts and assemblies’ AP203Pop1 / Shape_representation-objects Based on product Instances Aggr-items-2 ENTITY shape_representation; name : STRING; items : LIST [0:?] OF rep_item; END_ENTITY; Aggr-items-1

EXPRESS schema concepts in HDF5

EXPRESS Schema
EXPRESS schema-related information is stored in an HDF5 Group of its own
  –HDF5 Attribute signifies these HDF5 Groups
    iso_10303_26_schema =
  –This HDF5 Group may exist anywhere in the HDF5 file
Multiple HDF5 Groups containing data may appear based on the same schema
Schema versioning is not addressed
Link names identify things in HDF5 so use slash (/), not dot (.), as separator
  –product/name, not product.name
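The separator rule can be illustrated with a small path helper that turns a dotted EXPRESS-style name into a link path below a schema group. This is a sketch only; the function name `express_to_link_path` and the `/Geometry` group path are hypothetical.

```python
import posixpath

def express_to_link_path(dotted_name: str, schema_group: str = "") -> str:
    """Convert e.g. 'product.name' into 'product/name', optionally
    prefixed with the path of the schema group that holds it."""
    parts = [part for part in dotted_name.split(".") if part]
    return posixpath.join(schema_group, *parts)

print(express_to_link_path("product.name"))               # product/name
print(express_to_link_path("product.name", "/Geometry"))  # /Geometry/product/name
```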

Schema Info HDF5 Group iso_10303_26_schema = ‘geometry’ SCHEMA geometry; ENTITY line;... ENTITY point;... END_SCHEMA; Geometry encoding1 line point

Simple EXPRESS Types
For basic TYPE, an HDF5 Named Datatype with an appropriate basis Datatype is defined
  –TYPE x = ENUMERATION OF
    Enum is a pre-defined type in HDF5 which maps to (name,integer) pairs
    Extensible enumerations are handled in the mapping
  –TYPE x = REAL;
    Example basis of H5T_NATIVE_DOUBLE
  –TYPE y = x;
    One HDF5 Named Datatype is the basis of the other HDF5 Named Datatype
These can be generated in the Schema HDF5 Group directly from the EXPRESS
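Since one named type can be based on another, finding the underlying HDF5 datatype means following the chain of TYPE declarations until a simple type is reached. The sketch below does this with plain dictionaries; the `BASIS` table and the function name `resolve_basis` are assumptions for illustration, and only the REAL and INTEGER choices echo examples given in these slides.

```python
# Assumed basis datatype per EXPRESS simple type (illustrative, not normative).
BASIS = {
    "REAL": "H5T_NATIVE_DOUBLE",
    "INTEGER": "H5T_NATIVE_INT",
}

def resolve_basis(type_name, declarations):
    """Follow TYPE a = b; chains until an EXPRESS simple type is found."""
    seen = set()
    while type_name in declarations:
        if type_name in seen:
            raise ValueError(f"circular TYPE declaration at {type_name!r}")
        seen.add(type_name)
        type_name = declarations[type_name]
    return BASIS[type_name]

# TYPE x = REAL;  TYPE y = x;
declarations = {"x": "REAL", "y": "x"}
print(resolve_basis("y", declarations))  # H5T_NATIVE_DOUBLE
```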

EXPRESS Select of simple type
For the following, three HDF5 Named Datatypes with the same underlying HDF5 Datatype are defined
  –TYPE x = REAL
  –TYPE y = REAL
  –TYPE z = SELECT(x,y)

EXPRESS Select of entity type
Proposed approach
  –Treat a SELECT of only entity types as if there were an EXPRESS “entity instance identifier simple type” and do not map the EXPRESS select type itself
  –Instead, for EXPRESS attributes that have SELECTs of entity types as their values, simply set the allowed HDF5 value to be the representation of entity instance values: Region Reference

EXPRESS Entity Declarations
The HDF5 Dataset for the EXPRESS Entity Type contains the following
  –an HDF5 Named Data Type
  –an HDF5 Compound Type within that Named Data Type

Handling ANDOR
Generate a synthetic HDF5 Datatype to represent the combination of EXPRESS Entity Types when ANDOR is their relationship
  –Only need one for actual data in HDF5 File, no need to generate every possible combination allowed by the EXPRESS
  –Example: Entity types “b” and “c” subtypes of “a” can result in “b__c” HDF5 Compound Datatype
An HDF5 Attribute "isComplex" signifies that the HDF5 Group represents the combination of Entity types
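Generating combined types only for combinations that occur in the data can be sketched as a scan over the instances. In the plain-Python illustration below, the double-underscore join follows the “b__c” example on the slide; sorting the names alphabetically to get a stable result is an assumption, as are the function names.

```python
def complex_type_name(entity_types):
    """Synthetic name for an instance that combines several entity types.

    Alphabetical ordering is assumed so that (b, c) and (c, b) agree.
    """
    return "__".join(sorted(set(entity_types)))

def needed_complex_types(instances):
    """Names of the combined types actually present in a population.

    `instances` is an iterable of collections of entity type names,
    one collection per entity instance.
    """
    return sorted(
        {complex_type_name(types) for types in instances if len(set(types)) > 1}
    )

population = [["b"], ["b", "c"], ["c", "b"], ["c"]]
print(needed_complex_types(population))  # ['b__c']
```

Only one synthetic type results even though the combination appears twice, and nothing is generated for combinations that the schema allows but the data never uses.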

EXPRESS Attribute Declarations
Within the HDF5 Compound Type for an EXPRESS Entity Type each EXPRESS attribute (including inherited attributes) is represented as follows
  –An HDF5 Field within the Compound Type for the name of the EXPRESS attribute
  –An HDF5 Datatype within that HDF5 Field for the HDF5 datatype of the EXPRESS attribute
    Some of the detailed encoding happens here (e.g. number of bits in representation)
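A compound type is essentially a fixed record layout with one field per attribute, which Python's `struct` module can imitate without the HDF5 library. The sketch below lays out the point and line entities used in the schema example of these slides; the choice of little-endian 64-bit floats and of a 64-bit row index as the point reference are assumptions (the slides leave the entity reference encoding as an open issue).

```python
import struct

# point: (x, y, z) as three little-endian 64-bit floats (assumed width).
POINT = struct.Struct("<ddd")
# line: (startpt, endpt) as row indexes into the point extent
# (a hypothetical stand-in for the reference type).
LINE = struct.Struct("<QQ")

points = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
point_extent = b"".join(POINT.pack(*p) for p in points)
line_record = LINE.pack(0, 1)

def read_point(extent: bytes, row: int):
    """Decode one point record from the packed extent."""
    return POINT.unpack_from(extent, row * POINT.size)

start, end = LINE.unpack(line_record)
print(POINT.size, LINE.size)          # 24 16
print(read_point(point_extent, end))  # (1.0, 2.0, 3.0)
```

The "number of bits in representation" decision mentioned on the slide corresponds to the format characters chosen here.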

Schema Info HDF5 Group / iso_10303_26_schema = ‘geometry’ SCHEMA geometry; ENTITY line;... ENTITY point;... END_SCHEMA; Geometry encoding1 (startpt=point_ref, endpt=point_ref) (x=float,y=float,z=float) line point

Comments on Usage
EXPRESS Entity instance extents are treated as HDF5 Datasets
Big EXPRESS Arrays are HDF5 Datasets
  –Small ones are embedded in Entity Instance
  –The identifier for the aggregate is an HDF5 Dataset Reference
The “EXPRESS entity instance identifier” is an HDF5 Region Reference

Conclusions
“Everything” in EXPRESS now covered
Two key open issues
  –Entity instance identifier and references to them
  –Select when complicated combinations are possible (e.g. an Integer, or a Real or an Entity Instance)