EXPRESS/HDF5 Mapping Specification Version 0.5 Walkthrough David Price October 2006.


1 EXPRESS/HDF5 Mapping Specification Version 0.5 Walkthrough David Price October 2006

2 Agenda On EXPRESS-HDF5 spec EXPRESS-driven data in HDF5 EXPRESS schema concepts in HDF5 Small demo

3 On EXPRESS-HDF5 spec

4 Mapping Version 0.5 Caveats V0.5 is the first “complete” mapping –Substantial simplification between V0.4 and V0.5 –Errors and omissions are still likely Design changes are still possible Two approaches are suggested in a few cases; expect to choose one after prototyping Version 0.6 will be the Committee Draft ballot –New Work Item/Committee Draft ballot to start as soon as documents are ready per Change Management Review Oct 24, 2006.

5 Not Covered in V0.5 May be added in later editions/versions –Inverse attributes –Derived attributes –Cross-file links Expected to remain out of scope –Constraints Rules, procedures and functions Domain rules Unique rules

6 Key Requirements Focus is on efficient data encoding and access … however … Use cases vary widely –Archiving –Random data access –Large-volume data exchange

7 Standardization Approach Not proposing standard EXPRESS-driven API now –API may come later and may be AP-specific Flexibility more important than uniformity –multiple schemas, multiple populations, multiple encodings, sharing data between populations –include anything users need in file (e.g. jpeg) Let HDF5 optimizations handle efficiency requirements as much as possible –Not “dumbing down” HDF5 to make pre- and post-processor development simple

8 Keep the following in mind We are encoding simple and array data values into a file We are not translating EXPRESS into another schema language –It’s not possible to pre-process an EXPRESS schema into an “empty HDF5 file” –We need to take data in software application memory and make it persistent on disk in an efficient manner

9 Notes about HDF5 HDF5 Files are self-descriptive –HDF5 Datatypes are stored along with the data The HDF5 datatype in the file need not match the datatype in application memory –Mapping between data in the HDF5 file and in application memory happens at runtime

10 Usage Architecture [Diagram labels: User; Application; Computer Memory; HDF5 Libraries; HDF5 File on Disk; Non-HDF5 File on Disk; EXPRESS to HDF5, which guides encoding of application data into the HDF5 File]

11 HDF5 concepts from 50000 feet

12 HDF5 Key Concepts File –a contiguous string of bytes in a computer store (memory, disk, etc.). The bytes represent zero or more objects of the model. Group –a collection of objects (including groups) Dataset –a multidimensional array of Data Elements, with Attributes and other metadata Datatype –a description of a specific class of data element, including its storage layout as a pattern of bits Dataspace –describes the logical spatial layout of the raw data associated with a Dataset or Attribute Attribute –a named data value associated with a group, dataset, or named datatype

13 HDF5 Key Concepts (2) Compound Datatype –a datatype that combines one or more datatypes, called members, into a more complex datatype Array –datatype for describing elements that are homogeneous multidimensional arrays of the same base datatype (base may be any HDF5 datatype) Link –represents an edge in a file graph structure and has a (name, value) pair associated with it –Group membership is implemented via the link object –Names of HDF5 objects are based on links to the object –Soft links are secondary paths providing additional identification of an HDF5 object

14 Informal Diagram Notation [Legend: HDF5 Group; HDF5 Datatype; HDF5 Dataset; HDF5 Attribute; HDF5 Dataspace; HDF5 Link; Soft Link; / Root Group]

15 Groups of: Groups and Datasets [Diagram: root group / with nodes De, Ab, A1, A2, A3, R1, R2, R5, Kl and soft link SL. One object is uniquely identified by /De/R5; another has two identifiers, /De/R1 and /Ab/SL]
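
The idea that one object can have two identifiers can be sketched without the HDF5 library at all. The following is a plain-Python model, not HDF5 API code; the `Group`, `Dataset`, `SoftLink` and `resolve` names are hypothetical, and the assumption that SL is a soft link pointing at /De/R1 is read off the slide labels.

```python
class Group(dict):
    """A group maps link names to objects (groups, datasets, or soft links)."""


class Dataset:
    def __init__(self, label):
        self.label = label


class SoftLink:
    """A soft link stores a path, not the object itself."""
    def __init__(self, target_path):
        self.target_path = target_path


def resolve(root, path):
    """Follow link names from the root group, dereferencing soft links."""
    node = root
    for name in (p for p in path.split("/") if p):
        node = node[name]
        if isinstance(node, SoftLink):
            node = resolve(root, node.target_path)
    return node


# Assumed layout: /De holds R1 and R5, /Ab holds the soft link SL -> /De/R1.
root = Group()
root["De"] = Group(R1=Dataset("R1"), R5=Dataset("R5"))
root["Ab"] = Group(SL=SoftLink("/De/R1"))

same_object = resolve(root, "/De/R1") is resolve(root, "/Ab/SL")
print(same_object)
```

Both paths end at the same in-memory object, which is the property the slide illustrates with its two identifiers.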

16 Datatypes HDF5 supports numerous simple datatypes –Integer, Float, String, Bitfield, Opaque, Reference –Warning: Unicode support still in progress Of multiple types/architectures –Standard, Native, IEEE, etc. HDF5 supports complex datatypes –Array, Enum, Variable-length HDF5 supports compound datatypes –Structured set of any datatype User-defined datatypes are also supported

17 HDF5 Array HDF5 Array Datatype has three key characteristics –Rank = number of dimensions –Dimensions = index/bounds of the array –Datatype of the elements Example: TYPE x = ARRAY[1:3] OF ARRAY[1:5] OF INTEGER –Has Rank = 2 –Has Dimensions [3, 5] –H5T_NATIVE_INT might be the datatype of the elements
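
As a rough illustration of how rank and dimensions follow from the EXPRESS declaration on this slide, here is a minimal sketch. The `array_shape` helper is hypothetical and only handles nested ARRAY types with fixed integer bounds; it is not part of the mapping specification.

```python
import re

# Matches one level of a fixed-bound EXPRESS array, e.g. "ARRAY[1:3] OF ".
_ARRAY_LEVEL = re.compile(
    r"ARRAY\s*\[\s*(-?\d+)\s*:\s*(-?\d+)\s*\]\s*OF\s*", re.IGNORECASE
)


def array_shape(express_type):
    """Return (rank, dimensions, element type) for a nested ARRAY type."""
    dims = []
    rest = express_type.strip().rstrip(";").strip()
    while True:
        match = _ARRAY_LEVEL.match(rest)
        if match is None:
            break
        low, high = int(match.group(1)), int(match.group(2))
        dims.append(high - low + 1)  # EXPRESS index bounds are inclusive
        rest = rest[match.end():]
    return len(dims), tuple(dims), rest


rank, dims, element_type = array_shape("ARRAY[1:3] OF ARRAY[1:5] OF INTEGER")
print(rank, dims, element_type)  # prints: 2 (3, 5) INTEGER
```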

18 EXPRESS-driven data in HDF5

19 EXPRESS instance terms Population –A set of entity instances based on an EXPRESS schema Entity instance –An identifiable data instance based on one or more EXPRESS Entity Type(s) Simple value –An instance of a simple data type (e.g. integer) that is part of an Entity instance or a member of an aggregate value Aggregate value –An n-dimensional Array, List, Bag or Set that is part of an Entity instance (i.e. aggregate values have members) Entity instance reference –An attribute value or aggregate member that refers to a single Entity instance (typically via Entity instance identifier)

20 EXPRESS Populations An EXPRESS population is identified by an HDF5 Group –An HDF5 Attribute named iso_10303_26_data signifies these HDF5 Groups Multiple populations based on different schemas can appear in the same HDF5 File Multiple populations based on the same schema but with different encodings can appear in the same HDF5 File HDF5 Links connect Datasets to Groups

21 Population as HDF5 Group [Diagram: root group / with groups A and X and population groups AP203Pop1, AP203Pop2 and AP209Pop1. Two populations carry iso_10303_26_data = ‘Configuration controlled 3D designs of mechanical parts and assemblies’; one carries iso_10303_26_data = ‘Composite and metallic structural analysis and related design’. AP209Pop1 is uniquely identified by /A/X/AP209Pop1]
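
To make the "attribute marks the population group" idea concrete, the sketch below walks a small in-memory tree and reports every group that carries iso_10303_26_data. It is a plain-Python model rather than HDF5 library code; `Node` and `find_populations` are hypothetical names, and placing the two AP203 populations directly under the root is an assumption, since the slide only gives the full path of AP209Pop1.

```python
AP203 = "Configuration controlled 3D designs of mechanical parts and assemblies"
AP209 = "Composite and metallic structural analysis and related design"


class Node:
    """Stand-in for an HDF5 group: named children plus attributes."""
    def __init__(self, attrs=None, children=None):
        self.attrs = attrs or {}
        self.children = children or {}


def find_populations(node, path=""):
    """Yield (path, value) for every group tagged with iso_10303_26_data."""
    if "iso_10303_26_data" in node.attrs:
        yield (path or "/", node.attrs["iso_10303_26_data"])
    for name, child in sorted(node.children.items()):
        yield from find_populations(child, f"{path}/{name}")


root = Node(children={
    "AP203Pop1": Node(attrs={"iso_10303_26_data": AP203}),  # assumed location
    "AP203Pop2": Node(attrs={"iso_10303_26_data": AP203}),  # assumed location
    "A": Node(children={
        "X": Node(children={
            "AP209Pop1": Node(attrs={"iso_10303_26_data": AP209}),
        }),
    }),
})

populations = dict(find_populations(root))
for path, schema in sorted(populations.items()):
    print(path, "->", schema)
```

Because the marker is an attribute rather than a fixed location, a reader has to search the whole file graph, which is the price of letting populations live anywhere.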

22 On Metadata Spec defines the following optional HDF5 Attributes (borrowed from 10303-21) –iso_10303_26_description –iso_10303_26_timestamp –iso_10303_26_author –iso_10303_26_organization –iso_10303_26_originating_system –iso_10303_26_preprocessor_version Better ideas or other requirements are welcome Nothing preventing the use of other metadata too (e.g. Dublin Core)

23 Output from HDF5 debugger
HDF5 "exph5_groups.h5" {
GROUP "/" {
   GROUP "Pop1" {
      ATTRIBUTE "iso_10303_26_data" {
         DATATYPE H5T_STRING {
            STRSIZE 9;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE SCALAR
         DATA {
            (0): "geometry"
         }

24 EXPRESS Entity Instances For a given population, all instances of the same Entity type appear in one HDF5 Dataset named –Entity type name + “-objects” The assumption is that HDF5 will handle efficient access, subsetting, etc. That HDF5 Dataset contains –HDF5 Named Data Type Pointer referring to an HDF5 Compound Data Type derived from the Entity type –HDF5 (Simple) Dataspace for the data defining the rank, dimensions, current and max number of members –The HDF5 Array of values itself Except for the EXPRESS attribute values that are aggregate values, which are explained elsewhere
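
The grouping rule on this slide (one dataset per Entity type within a population) can be sketched as follows. This is a plain-Python model; `build_extents` and the sample instance values are hypothetical, and only the “-objects” suffix comes from the slides.

```python
from collections import defaultdict


def extent_name(entity_type):
    """Dataset name for all instances of one Entity type in a population."""
    return f"{entity_type}-objects"


def build_extents(instances):
    """Group (entity type, attribute values) pairs into per-type extents."""
    extents = defaultdict(list)
    for entity_type, values in instances:
        extents[extent_name(entity_type)].append(values)
    return dict(extents)


# Made-up instances, for illustration only.
instances = [
    ("product", ("p1", "bolt")),
    ("product_definition_formation", ("f1", "first version")),
    ("product", ("p2", "nut")),
]

extents = build_extents(instances)
print(sorted(extents))
```

Each list stands in for the array of compound-typed records that the HDF5 Dataset would hold.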

25 Entity Instance Extents in HDF5 [Diagram: root group /; population group AP203Pop1 with iso_10303_26_data = ‘Configuration controlled 3D designs of mechanical parts and assemblies’; datasets product_definition_formation-objects and product-objects, the latter based on product instances]

26 EXPRESS Aggregate Instances EXPRESS Arrays map in two ways –Large array maps to its own HDF5 Dataset containing the HDF5 Array –Small array maps to HDF5 Array in the Entity Extent HDF5 Dataset So the HDF5 Array is defined in the HDF5 Compound Datatype representing the EXPRESS Entity Type EXPRESS Set, List and Bag map the same way –When bounds are [n:?] the HDF5 Array dimensions cannot be set until the population is known For a particular array, the HDF5 datatype of the array elements must be the same – but that datatype can be an HDF5 Compound Datatype as well as an HDF5 primitive datatype like integer
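
The two decisions described here, sizing an aggregate whose upper bound is “?” and choosing between embedding and a separate dataset, might look like the sketch below. The slides give no cut-off between “small” and “large”, so `SMALL_ARRAY_LIMIT` is a purely hypothetical value, and taking the largest member count in the population as the dimension is also an assumption.

```python
SMALL_ARRAY_LIMIT = 8  # hypothetical threshold; the slides do not define one


def resolve_dimension(max_members, member_counts):
    """Return the array dimension for one aggregate attribute.

    max_members is the declared upper bound, or None when it is '?'.
    member_counts lists how many members each instance actually has.
    """
    if max_members is not None:
        return max_members
    # Open-ended bound: the size is only known once the population is known.
    return max(member_counts, default=0)


def placement(dimension):
    """Pick where the HDF5 Array would live under the assumed threshold."""
    if dimension <= SMALL_ARRAY_LIMIT:
        return "embedded in entity extent"
    return "own dataset"


open_ended = resolve_dimension(None, [2, 5, 40])
fixed = resolve_dimension(3, [])
print(open_ended, placement(open_ended))
print(fixed, placement(fixed))
```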

27 Aggregate Instance HDF5 Datasets [Diagram: root group /; population group AP203Pop1 with iso_10303_26_data = ‘Configuration controlled 3D designs of mechanical parts and assemblies’; datasets Shape_representation-objects, Aggr-items-1 and Aggr-items-2] ENTITY shape_representation; name : STRING; items : LIST [0:?] OF rep_item; END_ENTITY;

28 EXPRESS schema concepts in HDF5

29 EXPRESS Schema EXPRESS schema-related information is stored in an HDF5 Group of its own –An HDF5 Attribute named iso_10303_26_schema signifies these HDF5 Groups –This HDF5 Group may exist anywhere in the HDF5 file Multiple HDF5 Groups containing data may appear based on the same schema Schema versioning is not addressed Link names identify things in HDF5, so use slash (/), not dot (.), as separator –product/name, not product.name
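
A small sketch of the separator rule: EXPRESS-style dotted names become slash-separated link names. The `to_link_name` and `link_path` helpers and the `/AP203Schema` group path are hypothetical, chosen only to show the conversion.

```python
def to_link_name(express_name):
    """Convert an EXPRESS dotted name such as 'product.name' to link form."""
    return "/".join(express_name.split("."))


def link_path(schema_group, express_name):
    """Join a schema group path and an EXPRESS name into one absolute path."""
    parts = [p for p in schema_group.split("/") if p]
    parts.extend(express_name.split("."))
    return "/" + "/".join(parts)


print(to_link_name("product.name"))
print(link_path("/AP203Schema", "product.name"))  # group name is made up
```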

30 Schema Info HDF5 Group [Diagram labels: iso_10303_26_schema = ‘geometry’; Geometry; encoding1; line; point] SCHEMA geometry; ENTITY line;... ENTITY point;... END_SCHEMA;

31 Simple EXPRESS Types For basic TYPE, an HDF5 Named Datatype with an appropriate basis Datatype is defined –TYPE x = ENUMERATION OF Enum is a pre-defined type in HDF5 which maps to (name, integer) pairs Extensible enumerations are handled in the mapping –TYPE x = REAL; Example basis of H5T_NATIVE_DOUBLE –TYPE y = x; One HDF5 Named Datatype is the basis of the other HDF5 Named Datatype These can be generated in the Schema HDF5 Group directly from the EXPRESS
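
The chain "TYPE y = x; TYPE x = REAL" and the enumeration case can be modelled roughly as below. This is a plain-Python sketch: the `BASIS` table simply pairs EXPRESS simple types with the example HDF5 native type names, the enumeration items are made up, and numbering enumeration members from 0 is an assumption.

```python
# Example basis choices; INTEGER -> H5T_NATIVE_INT is one possible choice.
BASIS = {"REAL": "H5T_NATIVE_DOUBLE", "INTEGER": "H5T_NATIVE_INT"}


def resolve_basis(type_name, declarations):
    """Follow TYPE y = x chains until an EXPRESS simple type is reached."""
    seen = set()
    while type_name in declarations:
        if type_name in seen:
            raise ValueError(f"circular type definition at {type_name!r}")
        seen.add(type_name)
        type_name = declarations[type_name]
    return BASIS[type_name]


def enum_members(items):
    """Pair each enumeration item with an integer (assumed to start at 0)."""
    return {name: value for value, name in enumerate(items)}


declarations = {"x": "REAL", "y": "x"}  # TYPE x = REAL; TYPE y = x;
print(resolve_basis("y", declarations))
print(enum_members(["ahead", "behind"]))  # items are made up
```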

32 EXPRESS Select of simple type For the following, three HDF5 Named Datatypes with the same underlying HDF5 Datatype are defined –TYPE x = REAL –TYPE y = REAL –TYPE z = SELECT(x,y)

33 EXPRESS Select of entity type Proposed approach –Treat a SELECT of only entity types as if there were an EXPRESS “entity instance identifier simple type” and do not map the EXPRESS select type itself –Instead, for EXPRESS attributes that have SELECTs of entity types as their values, simply set the allowed HDF5 value to be the representation of entity instance values – Region Reference

34 EXPRESS Entity Declarations The HDF5 Dataset for the EXPRESS Entity Type contains the following –an HDF5 Named Data Type –an HDF5 Compound Type within that Named Data Type

35 Handling ANDOR Generate a synthetic HDF5 Datatype to represent the combination of EXPRESS Entity Types when ANDOR is their relationship –Only need one for actual data in HDF5 File, no need to generate every possible combination allowed by the EXPRESS –Example: Entity types “b” and “c” subtypes of “a” can result in “b__c” HDF5 Compound Datatype An HDF5 Attribute "isComplex" signifies that the HDF5 Group represents the combination of Entity types
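
The point that only combinations actually present in the data need a synthetic datatype can be sketched like this. The `complex_type_names` helper is hypothetical; the double-underscore join follows the “b__c” example on the slide, while sorting the type names alphabetically is an assumption.

```python
def complex_type_names(instances):
    """Return synthetic names for the entity-type combinations in use.

    Each instance is given as the set of leaf Entity types it belongs to;
    single-type instances need no synthetic datatype.
    """
    names = set()
    for entity_types in instances:
        if len(entity_types) > 1:
            names.add("__".join(sorted(entity_types)))
    return sorted(names)


# "b" and "c" are ANDOR subtypes of "a"; only one combination occurs here.
instances = [{"b"}, {"b", "c"}, {"c", "b"}, {"c"}]
print(complex_type_names(instances))  # prints: ['b__c']
```

Scanning the population this way avoids enumerating every combination the schema permits.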

36 EXPRESS Attribute Declarations Within the HDF5 Compound Type for an EXPRESS Entity Type each EXPRESS attribute (including inherited attributes) is represented as follows –An HDF5 Field within the Compound Type for the name of the EXPRESS attribute –An HDF5 Datatype within that HDF5 Field for the HDF5 datatype of the EXPRESS attribute Some of the detailed encoding happens here (e.g. number of bits in representation)
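
Because inherited attributes are included, building the field list for a compound type means walking up the supertype chain. The sketch below is a plain-Python model with a made-up two-entity schema; single inheritance and supertype-first field ordering are both assumptions, and `compound_fields` is a hypothetical helper.

```python
# Made-up schema fragment: "b" is a subtype of "a".
ENTITIES = {
    "a": {"supertype": None, "attributes": [("id", "INTEGER")]},
    "b": {"supertype": "a", "attributes": [("name", "STRING")]},
}


def compound_fields(entity, entities):
    """List (field name, EXPRESS type) pairs, inherited attributes first."""
    chain = []
    while entity is not None:
        chain.append(entity)
        entity = entities[entity]["supertype"]
    fields = []
    for name in reversed(chain):  # root supertype first (assumed ordering)
        fields.extend(entities[name]["attributes"])
    return fields


print(compound_fields("b", ENTITIES))
```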

37 Schema Info HDF5 Group [Diagram labels: root group /; iso_10303_26_schema = ‘geometry’; Geometry; encoding1; line (startpt=point_ref, endpt=point_ref); point (x=float, y=float, z=float)] SCHEMA geometry; ENTITY line;... ENTITY point;... END_SCHEMA;
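
The compound layouts on this slide can be imitated with fixed-size binary records. The sketch uses Python's `struct` module rather than HDF5, stores coordinates as 64-bit floats, and represents a point reference as a row index into the point extent; all three choices are assumptions, since the walkthrough names an HDF5 Region Reference as the proposed identifier.

```python
import struct

POINT = struct.Struct("<3d")  # (x, y, z)
LINE = struct.Struct("<2q")   # (startpt, endpt) as row indexes, assumed

points = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
lines = [(0, 1)]

# Each extent is one contiguous run of equally sized records.
point_objects = b"".join(POINT.pack(*p) for p in points)
line_objects = b"".join(LINE.pack(*ln) for ln in lines)


def read_record(layout, extent, row):
    """Unpack the record at the given row of an extent."""
    return layout.unpack_from(extent, row * layout.size)


start, end = read_record(LINE, line_objects, 0)
end_point = read_record(POINT, point_objects, end)
print(end_point)  # prints: (1.0, 2.0, 3.0)
```

Fixed-size records are what make random access by row cheap, which is one reason the mapping keeps all instances of an Entity type together.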

38 Comments on Usage EXPRESS Entity instance extents are treated as HDF5 Datasets Big EXPRESS Arrays are HDF5 Datasets –Small ones are embedded in Entity Instance –The identifier for the aggregate is an HDF5 Dataset Reference The “EXPRESS entity instance identifier” is an HDF5 Region Reference

39 Conclusions “Everything” in EXPRESS now covered Two key open issues –Entity instance identifier and references to them –Select when complicated combinations are possible (e.g. an Integer, or a Real or an Entity Instance)

