Data Type Registry: Data set descriptions for automation
Stephen M Richard, IEDA, EarthCube
RDA Plenary 8, Denver, CO
Use cases
- Document the meaning of entities and attributes in data.
- Re-use data type and attribute definitions.
- Machine-assisted data integration: matching attribute content.
- Validate data instances against a type definition.
- Tools that spin up a UI for a particular data type.
- Link software to data sources that it can use.
- Support file introspection to assist with deep data registration.
Use case scenario: Data integration
- For demonstration, we are interested in hydraulic conductivity.
- Sources use different field names and table layouts.
- Workflow:
  1. Import to a document store using the source data structure.
  2. Inspect the field names and values to propose mappings to known data types.
  3. Curate to validate and refine the mappings.
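The "inspect the field names to propose mappings" step can be sketched as a simple string-similarity match. This is a minimal illustration, not the project's actual matcher: the `KNOWN_TYPES` registry entries, the `normalize` and `propose_mappings` helpers, and the 0.6 threshold are all assumptions made up for the example. A real pipeline would also inspect the values, as the workflow says, and leave the final decision to a curator.

```python
from difflib import SequenceMatcher

# Hypothetical registry content: data type name -> labels seen for that type.
KNOWN_TYPES = {
    "hydraulicConductivity": ["hydraulic conductivity", "hyd cond"],
    "depthTop": ["depth top", "top depth"],
    "lithology": ["lithology", "rock type"],
}


def normalize(name):
    """Lower-case a field name, split camelCase, and turn separators into spaces."""
    out = []
    for ch in name:
        if ch in "_-.":
            out.append(" ")
        elif ch.isupper() and out and out[-1].islower():
            out.append(" " + ch.lower())
        else:
            out.append(ch.lower())
    return " ".join("".join(out).split())


def propose_mappings(field_names, threshold=0.6):
    """Return {field: (type_name or None, best_score)} for curator review."""
    proposals = {}
    for field in field_names:
        best_type, best_score = None, 0.0
        for type_name, labels in KNOWN_TYPES.items():
            for label in labels:
                score = SequenceMatcher(None, normalize(field), label).ratio()
                if score > best_score:
                    best_type, best_score = type_name, score
        # Below the threshold, propose nothing and leave the field to the curator.
        proposals[field] = (best_type if best_score >= threshold else None, best_score)
    return proposals


if __name__ == "__main__":
    for field, (type_name, score) in propose_mappings(
        ["HydraulicConductivity", "rock_type", "xyz123"]
    ).items():
        print(f"{field!r} -> {type_name} (score {score:.2f})")
```

The scores are only proposals; the curation step in the workflow is where mappings are accepted, corrected, or rejected.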
Role of the DTR in the processing pipeline
DTR model update: some input sources
- RDA Data Type Registry prototype (WG output, March 2015)
- ISO 19110
- ISO 11179 Clause 11
- CSDGM Entity/Attribute
- OGC SWE Common Data Model
- NetCDF common data model
- Apache Avro
- XML Schema
The model
Conceptualization of the world: Thing has Property (0..many)
Representation of the world: DataObject has Attribute (0..many)
The diagram connects the two levels with links labeled "meaning" and "representation".
- Some properties on a Thing might be essential; others might be optional.
- A Property may have links back to the Things (ObjectClass in the UML) that can carry that property.
- A Property may have a conceptual domain defined that scopes the intention of the values for that property.
Representation
- DataObject has Attribute (0..many)
- Attribute has DataType
- DataType is one of: DataObject, Array, Dictionary, List, Primitive
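The representation side of the model can be sketched as a handful of Python dataclasses. This is one possible reading of the diagram, not the proposed model itself: the field names (`item_type`, `value_type`), the distinction drawn between Array and List, and the `WellLog` example object are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Primitive:
    name: str  # e.g. "string" or "decimal"; the naming scheme is an assumption


@dataclass
class ArrayType:
    item_type: "DataType"  # assumed: every element shares one type


@dataclass
class ListType:
    item_type: "DataType"  # assumed: ordered collection of one type


@dataclass
class DictionaryType:
    value_type: "DataType"  # assumed: keyed collection of one value type


@dataclass
class Attribute:
    name: str
    data_type: "DataType"  # Attribute has DataType


@dataclass
class DataObject:
    name: str
    attributes: List[Attribute] = field(default_factory=list)  # 0..many


# A DataType is a DataObject, a collection, or a Primitive.
DataType = Union[Primitive, ArrayType, ListType, DictionaryType, DataObject]


def leaf_paths(obj, prefix=""):
    """List dotted paths to every attribute that is not a nested DataObject."""
    paths = []
    for attr in obj.attributes:
        path = prefix + attr.name
        if isinstance(attr.data_type, DataObject):
            paths.extend(leaf_paths(attr.data_type, path + "."))
        else:
            paths.append(path)
    return paths


# Hypothetical example object; names are illustrative only.
interval = DataObject("Interval", [
    Attribute("depthTop_ft", Primitive("decimal")),
    Attribute("depthBottom_ft", Primitive("decimal")),
])
well_log = DataObject("WellLog", [
    Attribute("wellName", Primitive("string")),
    Attribute("interval", interval),
    Attribute("lithologies", ListType(Primitive("string"))),
])

if __name__ == "__main__":
    print(leaf_paths(well_log))
    # ['wellName', 'interval.depthTop_ft', 'interval.depthBottom_ft', 'lithologies']
```

Because a DataObject is itself a DataType, nesting falls out of the structure without a separate construct for it.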
Simple ‘Flat File’
DataObject = meanPorosity observation

| Well Name | location | depthTop_ft | depthBottom_ft | meanPorosity | lithology | formation | source |
|---|---|---|---|---|---|---|---|
| DCL & FA #1 | SE1/4NW1/4NW1/4, sec. 13, T. 12 S., R. 17 E. | 2,420 | 2,443 | 0.105 | vuggy dolostone | Jefferson City Dolomite. | Gogel, 1981; reported by usgs2321 Table 2 |
| Geis #1 | SW1/4SW1/4SW1/4, sec. 32, T. 13 S., R. 2 W. | 1,980 | 2,200 | 0.13 | oolitic limestone | Lansing and Kansas City Groups. | |
| | | 3,482 | 3,493 | 0.074 | dolostone | Roubidoux Formation. | |
| | SE1/4NW1/4NW1/4, sec. 13, R. 17 E. | 2,934 | 2,985 | 0.92 | calcareous sandstone and granite | Lamotte Sandstone and Precambrian rock. | |
| | | 2,616 | 2,804 | 0.181 | cherty limestone and dolostone | Warsaw, Keokuk, and Burlington Limestones. | |
| | | 2,944 | 3,046 | 0.125 | porous dolostone | Hunton Group. | |

Attributes: Well Name, location, depthTop_ft, depthBottom_ft, meanPorosity, lithology, formation, source
All values have datatype = primitive.
Model elements used: DataObject, Attribute, DataType, Primitive
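Validating a flat-file row against a type definition, one of the use cases listed earlier, might look like the sketch below. The attribute names come from the table; everything else is an assumption for illustration: the choice of which attributes are numeric, the `parse_number` helper that strips thousands separators, and the `validate_row` function itself.

```python
def parse_number(text):
    """Parse a number that may use commas as thousands separators."""
    return float(text.replace(",", ""))


# Assumed type definition: attribute name -> parser for its primitive type.
MEAN_POROSITY_OBSERVATION = {
    "Well Name": str,
    "location": str,
    "depthTop_ft": parse_number,
    "depthBottom_ft": parse_number,
    "meanPorosity": parse_number,
    "lithology": str,
    "formation": str,
    "source": str,
}


def validate_row(row, type_def):
    """Return a list of problems; an empty list means the row conforms."""
    problems = []
    for name, parse in type_def.items():
        if name not in row:
            problems.append(f"missing attribute: {name}")
            continue
        try:
            parse(row[name])
        except ValueError:
            problems.append(f"{name}: cannot parse {row[name]!r}")
    for name in row:
        if name not in type_def:
            problems.append(f"unexpected attribute: {name}")
    return problems


# First row of the table above.
good_row = {
    "Well Name": "DCL & FA #1",
    "location": "SE1/4NW1/4NW1/4, sec. 13, T. 12 S., R. 17 E.",
    "depthTop_ft": "2,420",
    "depthBottom_ft": "2,443",
    "meanPorosity": "0.105",
    "lithology": "vuggy dolostone",
    "formation": "Jefferson City Dolomite.",
    "source": "Gogel, 1981; reported by usgs2321 Table 2",
}

# Same row with an unparseable depth, to show a failure.
bad_row = dict(good_row, depthTop_ft="n/a")

if __name__ == "__main__":
    print(validate_row(good_row, MEAN_POROSITY_OBSERVATION))  # []
    print(validate_row(bad_row, MEAN_POROSITY_OBSERVATION))
```

This checks only that each value parses as its primitive type; range or vocabulary checks would need more information than the slide gives.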
More complex DataObject

    <Description>
      <scope>
        <scopeCode>service</scopeCode>
        <name>Internal service</name>
      </scope>
      <dateInfo>
        <date>2011-11-11T00:00:00</date>
        <dateType>creation</dateType>
      </dateInfo>
    </Description>

Diagram labels mark the parts of this example as "attribute" or "data object": scope and dateInfo are attributes whose values are themselves data objects, while scopeCode, name, date, and dateType are attributes with primitive values.
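File introspection of a nested document like this one can be sketched with Python's standard `xml.etree.ElementTree`. The rule used here is an assumption chosen for illustration, not a rule stated in the slides: an element with child elements is treated as a data object, and a leaf element as an attribute with a primitive value. The `classify` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """
<Description>
  <scope>
    <scopeCode>service</scopeCode>
    <name>Internal service</name>
  </scope>
  <dateInfo>
    <date>2011-11-11T00:00:00</date>
    <dateType>creation</dateType>
  </dateInfo>
</Description>
"""


def classify(element, path=""):
    """Return (path, kind) pairs for an element and all of its descendants."""
    here = f"{path}/{element.tag}"
    children = list(element)
    # Assumed rule: elements with children are data objects, leaves are primitives.
    kind = "data object" if children else "primitive attribute"
    rows = [(here, kind)]
    for child in children:
        rows.extend(classify(child, here))
    return rows


if __name__ == "__main__":
    for path, kind in classify(ET.fromstring(XML)):
        print(f"{path}: {kind}")
```

The output lists three data objects (Description, scope, dateInfo) and four primitive-valued attributes, which is the kind of structural summary a registry could compare against known type definitions.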
For proposed model details, see http://usgin.github.io/usginspecs/DataTypeModel-current.htm