AgGateway’s Progress on Data Type Registries
Working to preserve meaning in production agriculture data exchange
R. Andres Ferreyra (Ag Connections, LLC)
Research Data Alliance 9th Plenary, Barcelona, April 6, 2017
Interoperability Problems
The field operations segment of the agricultural industry has developed a serious interoperability problem: the ability of equipment to generate data has far outpaced users’ ability to derive useful information from it.
- Hardware and software solutions use a multitude of proprietary data formats.
- There is no consensus on controlled vocabularies (or on the need for them) for crops, operations, products, etc.
- “Meaning drift” occurs when exchanging observations and measurements (O&M) data.
AgGateway
AgGateway is a consortium of ~240 companies dedicated to implementing data exchange standards in agriculture. Following prior work on the supply chain, it has chartered several projects to enable interoperability in field operations: SPADE, PAIL, and ADAPT. A major part of this work has been the development of a system of data type registries to support the unambiguous communication of meaning.
AgGateway’s Semantic Assets
- Representations: “Universal” variables in field operations (e.g., crop yield in units of mass per area).
- ContextItems: Geopolitical-context-dependent data.
- Observation Codes: Express aspects of an ISO 19156 Observation as orthogonal components. This translates into controlled vocabularies that can be combined to create new vocabularies.
The Geopolitical Context Problem
Growers need to collect increasing amounts of field operations data. This usually includes a lot of critically important but frequently changing geopolitical-context-dependent information (e.g., EPA numbers, Bundessortenamt, tax data, etc.). Capturing all of this data in the object model of farm management information system (FMIS) software is infeasible given corporate IT realities (software cannot be upgraded often), unless the infrequently changing and frequently changing aspects of the FMIS data model can somehow be decoupled.
In terms of the requirements thus placed on a data model, an FMIS object model should simultaneously be:
- Generic, simple and compact enough to be easily understood and used, and to be accepted from an international perspective (which suggests staying free of region-specific clutter), yet still able to support the capture and communication of the region-specific (i.e., geopolitical-context-dependent) data that growers and their partners need as part of their business processes (simple/generic vs. comprehensive/specific).
- Able to express data with a controlled vocabulary (so everyone can understand what it means), while allowing that controlled vocabulary to be continually updated to match the evolving nature of data requirements (static vs. dynamic; controlled vocabulary vs. extensibility).
The ADAPT Solution: The ContextItem
ADAPT reconciled these contradictions by defining an object class, the ContextItem, that can be attached to various other objects in ADAPT’s common object model. A ContextItem is a key/value structure in which the “key” is a code referencing a ContextItemDefinition that defines what the ContextItem means. The “value” is either a string value together with the data needed to interpret it (such as a unit of measure) or a nested list of other ContextItems (e.g., PLSS cadastral information). The ContextItem definitions are served via an API; see https://api.contextitem.org/swagger. We are putting in place an ISO 19135-based governance process, and will allow anyone to request additions.
The ContextItem Object
- Code identifies what a given ContextItem contains. Think of it as a machine-readable string that identifies what Value means: is it a PLSS Township number? An FSA Tract ID? An EPA Number? A PLSS Prime Meridian string?
- ValueUoM specifies, where appropriate, a unit of measure for Value. We draw from a controlled vocabulary of unit of measure codes (UN Rec 20).
- TimeScopes provides the ContextItem with a temporal context.
- NestedItems enables a hierarchical organization of nested ContextItems, suitable for multi-attribute data (e.g., US PLSS cadastral data).
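The attributes listed above can be sketched as a small Python data structure. This is only an illustration, not ADAPT's actual class: the Python names, the simplification of TimeScopes to plain strings, the `flatten` helper, and the PLSS codes and values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ContextItem:
    """Illustrative stand-in for the ContextItem described above."""
    code: str                        # key: refers to a ContextItemDefinition
    value: Optional[str] = None      # string value; None if only nested items are used
    value_uom: Optional[str] = None  # unit-of-measure code, where appropriate
    time_scopes: List[str] = field(default_factory=list)  # simplified to strings here
    nested_items: List["ContextItem"] = field(default_factory=list)


def flatten(item: ContextItem, prefix: str = "") -> List[Tuple[str, str]]:
    """Return (path, value) pairs for every value found in a nested ContextItem."""
    path = f"{prefix}/{item.code}" if prefix else item.code
    pairs = []
    if item.value is not None:
        pairs.append((path, item.value))
    for child in item.nested_items:
        pairs.extend(flatten(child, path))
    return pairs


# Hypothetical codes and values that mirror the PLSS example in the text.
plss = ContextItem(
    code="PLSS",
    nested_items=[
        ContextItem(code="PLSS_Township", value="7N"),
        ContextItem(code="PLSS_Range", value="3W"),
        ContextItem(code="PLSS_Section", value="14"),
    ],
)

for path, value in flatten(plss):
    print(path, "=", value)
```

Nesting keeps a multi-attribute record such as a cadastral description together under one parent code, while each leaf remains an ordinary key/value pair.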
The ContextItemDefinition Object
Provides a rich definition of how the value of a specific ContextItem (as identified by its Code) should be entered and displayed.
- ValueType specifies the data type of ContextItem.Value.
- Lexicalizations allow multi-language support.
- Properties encapsulate values along with (enumerated) ContextItems.
The ContextItemDefinition Object (continued)
- NestedIDefIds specifies a hierarchical ContextItem.
- Presentations specify, via a regular expression, how to enter and display the ContextItem.Value.
- ModelScopeIds specify what classes in the ADAPT & ISO object models a given ContextItem can be attached to.
- GeoPoliticalContextIds specify what geopolitical context (e.g., EU, Lithuania, Wisconsin) a given ContextItem is defined for.
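As a rough illustration of how a definition might be used to check a value, the sketch below models a ContextItemDefinition in Python. It is not the registry's actual schema: Presentations is simplified to a list of regular-expression strings, the `accepts` and `attachable_to` helpers are hypothetical, and the township pattern and scope values are invented for the example.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContextItemDefinition:
    """Illustrative subset of the definition attributes named above."""
    code: str
    value_type: str  # e.g. "string" (hypothetical value)
    presentations: List[str] = field(default_factory=list)  # regular expressions
    model_scope_ids: List[str] = field(default_factory=list)
    geopolitical_context_ids: List[str] = field(default_factory=list)

    def accepts(self, value: str) -> bool:
        """True if value fully matches at least one presentation pattern.

        Assumption: a definition without patterns accepts any value.
        """
        if not self.presentations:
            return True
        return any(re.fullmatch(p, value) for p in self.presentations)

    def attachable_to(self, model_class: str, context: str) -> bool:
        """True if the item may be attached to model_class in the given context."""
        return (model_class in self.model_scope_ids
                and context in self.geopolitical_context_ids)


# Hypothetical definition: a township number such as "7N" or "112S".
township = ContextItemDefinition(
    code="PLSS_Township",
    value_type="string",
    presentations=[r"\d{1,3}[NS]"],
    model_scope_ids=["Field"],
    geopolitical_context_ids=["US"],
)

print(township.accepts("7N"))                # True
print(township.accepts("7X"))                # False
print(township.attachable_to("Field", "EU"))  # False
```

Because the patterns and scopes live in the definition rather than in the FMIS object model, new region-specific items can be introduced by publishing definitions instead of upgrading software.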
More recent work: Observation Codes
Several aspects of AgGateway’s field operations work involve observations and measurements, and we found value in implementing ISO 19156. This work, centered on the PAIL, SPADE and ADAPT projects, emphasizes the explicit capture of the semantics of the various aspects of an observation. The work, performed by a group of industry and academic AgGateway participants spanning four continents, includes three major parts.
Three parts
First, defining a componentized model of the properties of an observation, based on an extensible set of orthogonal vocabularies, which includes representing valid combinations of components.
Second, deploying infrastructure, in the form of a RESTful API, to make the componentized variable definitions freely available to industry and the research community; this includes putting into place an ISO 19135-based process for stakeholders to request the addition of vocabularies or entries therein.
Third, incorporating observations and measurements into AgGateway’s ADAPT common object model and format conversion plug-in architecture, thus enabling widespread interoperability.
ISO 19156 OM Model
What’s in an Observation?
Attributes of the Observation itself:
- Parameters (e.g., the depth of a soil water measurement)
- Phenomenon time: When did this happen?
- Result time: When did we get the result?
- Valid time: Is there a range of validity?
- Data quality: ISO 19157 data quality metadata
Things the Observation is connected to:
- Feature of interest / sampling feature & sampling strategy: Field / core, etc.
- Observed property: e.g., air temperature
- Observation context
- Procedure: Sensor, process used, etc.
- Result: The value returned, its type, etc.
- Metadata
Our intent is to represent observations as key-value pairs: the value corresponds to the ISO 19156 Result, and the key represents as much of everything else as is practical.
Our encoding model
Aggregation + Observable property + Sampling strategy + Additional information (N >= 0)
- Time window + Method
- Target + Quantity / phenomenon + Method
- And N >= 0 window components
- Sample type, Test type, Ingredient
Example 1: daily average greenhouse air temperature height 1.5m
Example 2: soil hot-water extractable nitrogen mass-fraction
We’re not finished yet, but there seems to be an emerging pattern of repeatable components.
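To make the idea of a componentized key concrete, here is a minimal Python sketch that assembles a key from optional components and pairs it with a result. It rests on several assumptions that do not come from the slides: the field names, the "|" separator, the component order, the assignment of the words in Examples 1 and 2 to particular components, and the numeric result value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ObservationCode:
    """Hypothetical componentized key for an observation."""
    target: str
    quantity: str
    property_method: Optional[str] = None     # method qualifying the observed property
    time_window: Optional[str] = None         # aggregation: time window
    aggregation_method: Optional[str] = None  # aggregation: method
    sampling_strategy: Optional[str] = None
    additional: Tuple[str, ...] = ()          # zero or more extra components

    def key(self) -> str:
        """Join the components that are present, in a fixed order."""
        parts = [
            self.time_window,
            self.aggregation_method,
            self.target,
            self.property_method,
            self.quantity,
            self.sampling_strategy,
            *self.additional,
        ]
        return "|".join(p for p in parts if p)


# One possible reading of Example 1 and Example 2.
example_1 = ObservationCode(
    target="greenhouse air",
    quantity="temperature",
    time_window="daily",
    aggregation_method="average",
    additional=("height 1.5m",),
)
example_2 = ObservationCode(
    target="soil",
    quantity="nitrogen mass-fraction",
    property_method="hot-water extractable",
)

# An observation as a key-value pair; the result value is made up.
observation = {example_1.key(): 21.4}

print(example_1.key())
print(example_2.key())
```

Keeping each component in its own vocabulary means a new key can be formed by combining existing entries, rather than by registering every combination as a separate term.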
Comments
- This work opens up the possibility of leveraging existing research-derived controlled vocabularies in industrial settings.
- We initially need to “keep it simple” to promote adoption.
- Anyone can request codes to be added.
- We want to provide straightforward interfaces linking these resources to ontologies.
DTR WG Influence
DTR WG Recommendation (Progress)
- Every type in a data type registry must be identified with a resolvable persistent identifier. (Working on it!)
- Types should reference related standards and recommendations in order to leverage existing efforts. (Yes)
- Primitive types should be established and used, when possible, in the construction of more complex types.
- A common API should be available across all type registries.
- Type registries should be federated such that a single service can search across all known registries or some defined subset. (Need to learn more)
- Type registries should include or enable referencing related services based on types.
- The establishment of a data type registry for any community should be subject only to the needs and requirements of that community, i.e., there should be no higher level governance beyond the maintenance of whatever standards and processes are needed for effective federation across type registries.
Thank you! For more information, contact: andres.ferreyra@agconnections.com