An Integrative, Standards-Compliant Framework for TDWG Schemas and Services. Phillip C. Dibner, Ecosystem Associates. TDWG Annual Meeting, Saint Louis, Missouri.

1 An Integrative, Standards-Compliant Framework for TDWG Schemas and Services
Phillip C. Dibner, Ecosystem Associates
TDWG Annual Meeting, Saint Louis, Missouri, October 16, 2006

2 ISO 19103: Basic Definitions
ISO 19103 defines a Conceptual Schema Language (CSL) for geographic information as a profile of UML.
A Conceptual Schema is a formal description of a conceptual model, i.e., a model that defines the concepts of a universe of discourse.
An Application Schema is a Conceptual Schema for data required by one or more applications.
Features are abstractions of real-world phenomena, so they figure prominently in applications that address real-world issues.
Features have characteristics, or attributes, each of which has a name, a data type, and a value domain.

3 General Feature Model (GFM)
- Strictly speaking, the GFM is defined by ISO 19109, which adds detail and context to the 19103 definition.
- Again: Features have properties, each with a name, a type, and a value domain.
- This is consistent with the general definition of objects, and maps cleanly to data object representations in a variety of programming environments.
- It is consistent with modeling languages other than UML (GML, RDF).
- It echoes the normalization imposed by entity-relationship (ER) models in DBMS design.
- It allows complex objects to be factored into simpler entities, all with the same underlying structure.
- Conversely, it permits properties of undetermined type to be “stubbed out” with interim data types pending further analysis, while the rest of a model is completed.
- Relationships among elements are clear.
- It facilitates integration with other compliant components.
- It has been validated by substantial experience.
In sum, it supports consistent, normalized conceptual models, and facilitates integration and analysis.
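The GFM pattern above (a feature type is a set of properties, each with a name, a data type, and a value domain, with stub types allowed for properties still under analysis) can be sketched in a few Java records. This is a minimal illustration, not the ISO 19109 model itself: the class names (PropertyType, FeatureType, Feature, Stub, SpecimenModel) and the basisOfRecord value list are hypothetical, an empty value domain is assumed to mean "any value of the data type", and records require Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Entry point declared first so the file can be run directly with `java GfmSketch.java`.
class GfmSketch {
    public static void main(String[] args) {
        FeatureType specimen = SpecimenModel.specimenType();
        Feature feature = new Feature(specimen, Map.of(
                "catalogNumber", "12345",
                "basisOfRecord", "PreservedSpecimen"));
        System.out.println(feature.isValid()); // prints true
    }
}

/** Hypothetical placeholder type for a property whose real type is still under analysis. */
record Stub(String note) {}

/** GFM-style property type: a name, a data type, and a value domain. */
record PropertyType(String name, Class<?> dataType, Set<?> valueDomain) {
    /** Assumption: an empty value domain means "any value of the data type". */
    boolean accepts(Object value) {
        return dataType.isInstance(value)
                && (valueDomain.isEmpty() || valueDomain.contains(value));
    }
}

/** A feature type is a named list of property types. */
record FeatureType(String name, List<PropertyType> propertyTypes) {
    PropertyType property(String propertyName) {
        for (PropertyType p : propertyTypes) {
            if (p.name().equals(propertyName)) return p;
        }
        return null; // property not declared by this feature type
    }
}

/** A feature instance: its type plus values for some of the declared properties. */
record Feature(FeatureType type, Map<String, ?> values) {
    boolean isValid() {
        for (Map.Entry<String, ?> e : values.entrySet()) {
            PropertyType p = type.property(e.getKey());
            if (p == null || !p.accepts(e.getValue())) return false;
        }
        return true;
    }
}

class SpecimenModel {
    static FeatureType specimenType() {
        return new FeatureType("Specimen", List.of(
                new PropertyType("catalogNumber", String.class, Set.of()),
                // Illustrative value domain only, not an authoritative vocabulary.
                new PropertyType("basisOfRecord", String.class,
                        Set.of("PreservedSpecimen", "LivingSpecimen", "HumanObservation")),
                // Type not yet analysed: stubbed out so the rest of the model can proceed.
                new PropertyType("identification", Stub.class, Set.of())));
    }
}
```

Because every property has the same name/type/domain structure, a single generic validator serves any feature type built this way; refining the stubbed property later only changes its PropertyType entry.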

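4 E.g., Java object definition

    // Each field is an attribute: the field name is the attribute name,
    // and String is the attribute's data type.
    public class DarwinCoreData {
        public String GlobalUniqueIdentifier;
        public String DateLastModified;
        public String BasisOfRecord;
        public String InstitutionCode;
        public String CollectionCode;
    }

Attributes: the field names are the attribute names, and String is each attribute's type. Value domains are defined in the context of the application: customary values known within the discipline.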
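Because the attribute names map one-to-one onto field names, generic code can populate such an object from name/value pairs without knowing the class in advance. The sketch below repeats the slide's class and adds a hypothetical helper, fromAttributes, that uses standard Java reflection (Class.getField and Field.set); it is an illustration of the mapping, not part of Darwin Core or any TDWG tool. Unknown attribute names are rejected rather than silently ignored.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Entry point declared first so the file can be run directly with `java DarwinCoreDemo.java`.
class DarwinCoreDemo {
    public static void main(String[] args) {
        DarwinCoreData d = DarwinCoreData.fromAttributes(Map.of(
                "InstitutionCode", "XYZ",
                "CollectionCode", "Herps"));
        System.out.println(d.InstitutionCode + "/" + d.CollectionCode); // prints XYZ/Herps
    }
}

class DarwinCoreData {
    public String GlobalUniqueIdentifier;
    public String DateLastModified;
    public String BasisOfRecord;
    public String InstitutionCode;
    public String CollectionCode;

    /** Hypothetical helper: copies each name/value pair into the public field of the same name. */
    static DarwinCoreData fromAttributes(Map<String, String> attributes) {
        DarwinCoreData data = new DarwinCoreData();
        for (Map.Entry<String, String> attribute : attributes.entrySet()) {
            try {
                Field field = DarwinCoreData.class.getField(attribute.getKey());
                field.set(data, attribute.getValue());
            } catch (NoSuchFieldException | IllegalAccessException e) {
                throw new IllegalArgumentException("Unknown attribute: " + attribute.getKey(), e);
            }
        }
        return data;
    }
}
```

Attributes that are not supplied simply stay null; enforcing value domains would be a separate, application-specific step.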

5 ER models for DBMS
[Figure: two entities, Entity 1 and Entity 2, each listing attribute names and types; Entity 1 holds a reference to Entity 2 (e.g., userInfo).]

6 Factoring, Design, Decomposition
[Figure: independent entities that can be added later, and exchanged or rearranged.]
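The ER idea in the two figures (one entity refers to another instead of copying its attributes, so entities stay independent and can be added, exchanged, or rearranged) can be sketched with two Java records. The entity names Agent and Specimen and their fields are hypothetical stand-ins for the diagram's Entity 2 and Entity 1; records need Java 16 or later.

```java
import java.util.List;

// Entry point declared first so the file can be run directly with `java ErSketch.java`.
class ErSketch {
    public static void main(String[] args) {
        Agent collector = new Agent("A. Collector", "Example Herbarium");
        List<Specimen> batch = List.of(
                new Specimen("S-1", collector),
                new Specimen("S-2", collector));
        // Both specimens reference the same Agent instance rather than duplicating it.
        System.out.println(batch.get(0).collector() == batch.get(1).collector()); // prints true
    }
}

/** Independent entity: defined once and referenced wherever it is needed. */
record Agent(String name, String affiliation) {}

/** Entity holding a reference to another entity (like the diagram's userInfo). */
record Specimen(String catalogNumber, Agent collector) {
    /** Exchanging the referenced entity leaves the rest of the specimen untouched. */
    Specimen withCollector(Agent other) {
        return new Specimen(catalogNumber, other);
    }
}
```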

7 Representation as XML Schema
Similar benefits accrue if schemas follow the same pattern: the root element is an object, and its children are properties. The values of these properties may be literals or other objects, which in turn have properties, and so on.
This provides congruence between object models and their schema representations, and a natural mapping between representations.
It is really just object-oriented analysis and design (OOA / OOD).
(Note the difference in terminology: ISO Feature attributes == schema object properties.)
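When an instance document alternates object and property elements in this way, one small generic reader can turn any such document into a nested object structure without schema-specific code. The sketch below uses the JDK's DOM parser (javax.xml.parsers) and represents each object as a map from property name to either a literal string or a nested object map; the "@type" key and the element names in the example are hypothetical. It assumes each property holds a single literal or a single nested object, and a repeated property name simply overwrites the earlier value. Text blocks and pattern matching require Java 16 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

class ObjectPropertyReader {
    public static void main(String[] args) {
        String xml = """
                <Specimen>
                  <catalogNumber>12345</catalogNumber>
                  <collector>
                    <Person><name>A. Collector</name></Person>
                  </collector>
                </Specimen>""";
        System.out.println(read(xml));
    }

    /** Parses an instance document whose root element is an object. */
    static Map<String, Object> read(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return readObject(root);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse instance document", e);
        }
    }

    /** An object element's child elements are its properties. */
    static Map<String, Object> readObject(Element object) {
        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("@type", object.getTagName());
        for (Element property : childElements(object)) {
            // A property holds either a literal (text only) or one nested object element.
            List<Element> nested = childElements(property);
            Object value = nested.isEmpty()
                    ? property.getTextContent().trim()
                    : readObject(nested.get(0));
            properties.put(property.getTagName(), value);
        }
        return properties;
    }

    /** Child elements only; whitespace text nodes between elements are skipped. */
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element e) result.add(e);
        }
        return result;
    }
}
```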

8 TDWG Schemas as Classes of Objects
Darwin Core: a description of a collected or observed real-world specimen. Darwin Core can therefore be considered to specify a class of Features, and DwC instance documents describe actual Feature instances.
ABCD: also describes entities that exist somewhere in the world. ABCD instance documents describe concrete Feature instances.
TCS: might not describe a class of Features, but does in concept describe a class of objects.
However, these schemas do not in general follow an object/property paradigm. (DwC 1.4 did: it was flat. The new version is still under discussion and development.)

9 Is this a disaster?
No. All of these schemas provide concepts, terminology, and data structures (a vocabulary) that embody the fundamentals of the domain. Moreover:
1. It's possible to insert properties between a container object and a nested object: define a nestedObjectType, and insert a property of that type between the parent and the child.
2. It is likely possible to transform the data automatically, in real time if need be, using an XSLT transform or other technology, if you want to serve them e.g. as GML Features (an encoding of the ISO Feature).
3. The design has arguably been done already, so the benefits of the object/property pattern for analysis, factoring, etc. might not be an issue.
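Points 1 and 2 can be combined: an XSLT stylesheet can insert the missing property element between a container and a directly nested object on the fly. The sketch below runs an identity transform through the JDK's javax.xml.transform API, overriding it for one case: a Person nested directly in a Specimen is wrapped in a collector property element. The element names Specimen, Person, and collector are hypothetical, and a real mapping would need one such template per nesting to be repaired. Text blocks require Java 15 or later.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class InsertPropertyTransform {
    // Identity transform, except that a Person directly inside a Specimen
    // is wrapped in a (hypothetical) collector property element.
    static final String XSLT = """
            <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
              <xsl:output method="xml" omit-xml-declaration="yes"/>
              <xsl:template match="@*|node()">
                <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
              </xsl:template>
              <xsl:template match="Specimen/Person">
                <collector>
                  <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
                </collector>
              </xsl:template>
            </xsl:stylesheet>""";

    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<Specimen><Person><name>A</name></Person></Specimen>"));
    }
}
```

The more specific Specimen/Person pattern takes priority over the generic identity template under XSLT's default priority rules, so every other element passes through unchanged.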

10 In that case, why is there an issue?
There is some burden to supporting transforms, and we still need to define compliant schemas anyway if we want to serve or otherwise use the data as ISO Features.
Transforms are likely to require custom software that either knows the structures explicitly or knows the details of how to parse and interpret them, instead of standard tools that instantiate objects directly from instance documents.
Further analysis and refactoring may be needed after all.
What to do? Keep the object/property model in mind for future work. Current versions can be viewed as “flattened views” of a more general conceptual model.

11 The Feature of Interest
Values for attributes of a real-world object or phenomenon that is the subject of study (the Feature of Interest) may be:
1. Asserted: the attribute is simply assigned a value such as a sample number, experimenter, institution, GUID, etc.
2. Observed / Measured: the value of the attribute is an estimate derived from some procedure. There is a well-defined conceptual model for such values, built upon a strong theoretical foundation: the Observations and Measurements (O&M) model.
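The asserted/observed distinction can be made explicit in the type of an attribute value, so that an observed value always carries the procedure that produced it. This is a minimal sketch with hypothetical names (AttributeValue, Asserted, Observed, describe); it models only the procedure, not the full O&M Observation. Records and pattern matching require Java 16 or later.

```java
// Entry point declared first so the file can be run directly with `java AttributeValueDemo.java`.
class AttributeValueDemo {
    public static void main(String[] args) {
        AttributeValue<String> institution = new Asserted<>("Example Museum");
        AttributeValue<Double> length = new Observed<>(12.5, "calliper measurement");
        System.out.println(describe(institution)); // prints Example Museum (asserted)
        System.out.println(describe(length));      // prints 12.5 (estimated by calliper measurement)
    }

    static String describe(AttributeValue<?> v) {
        if (v instanceof Observed<?> o) {
            return o.value() + " (estimated by " + o.procedure() + ")";
        }
        return v.value() + " (asserted)";
    }
}

/** Value of an attribute of the Feature of Interest. */
interface AttributeValue<T> {
    T value();
}

/** Asserted: the value is simply assigned. */
record Asserted<T>(T value) implements AttributeValue<T> {}

/** Observed / measured: the value is an estimate produced by some procedure. */
record Observed<T>(T value, String procedure) implements AttributeValue<T> {}
```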

12 The Observation Feature Type (O&M)
[Figure from Cox, 2005, OGC document 05-087r3, Observations and Measurements: the Observation linked to its Feature of Interest (FoI) and Phenomenon.]

13 If our concepts are modeled as objects, they can be incorporated into this observation model, either naively or through more ambitious analysis.
If the property we wish to “measure” is taxon, then:
- the Feature of Interest may be a specimen or collection of specimens, effectively a DataSet (as defined in TCS);
- the model for the Phenomenon we are “observing” is surely addressed by the vocabulary of TCS (perhaps TaxonConcept or TaxonName);
- the codespace for the values of the observed properties (the results) might be the set of scientific names of some designation, along with a reference to the author and publication: an AccordingTo (per TCS).
If the observed property is collection or observation location, then:
- the FoI is the specimen (or the field occurrence of the specimen);
- the Phenomenon is geolocated geometry;
- the result is an instance of such a geometry.
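Both cases above fit one generic observation shape: a Feature of Interest, an observed property (the Phenomenon), a procedure, and a result whose type depends on the phenomenon. The sketch below is a simplified, hypothetical rendering of that shape as a Java record, not the OGC O&M schema; SpecimenRef, TaxonName, and Point are illustrative stand-ins for a specimen identifier, a TCS-style name with its AccordingTo, and a geometry, and the phenomena are plain strings. Records require Java 16 or later.

```java
// Entry point declared first so the file can be run directly with `java ObservationDemo.java`.
class ObservationDemo {
    public static void main(String[] args) {
        SpecimenRef specimen = new SpecimenRef("XYZ:Herps:42");
        // Case 1: the observed property is taxon; the result is a name plus its AccordingTo.
        Observation<SpecimenRef, TaxonName> identification = new Observation<>(
                specimen, "taxon", "expert determination",
                new TaxonName("Rana temporaria", "Linnaeus 1758"));
        // Case 2: the observed property is location; the result is a geometry.
        Observation<SpecimenRef, Point> location = new Observation<>(
                specimen, "collection location", "GPS fix",
                new Point(51.5, -0.12));
        System.out.println(identification.result().scientificName());
        System.out.println(location.result());
    }
}

/** Simplified observation: F is the Feature of Interest type, R the result type. */
record Observation<F, R>(F featureOfInterest, String observedProperty, String procedure, R result) {}

record SpecimenRef(String id) {}

record TaxonName(String scientificName, String accordingTo) {}

record Point(double latitude, double longitude) {}
```

Keeping the result type as a type parameter lets the same observation structure carry taxon names, geometries, or measured quantities without changing the Feature of Interest.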

14 Remaining Issue
We have more than one model for the same kind of information. Will it be possible to combine data from different services that respectively provide, e.g., Darwin Core and ABCD? Can we develop a single conceptual model with which these and other TDWG data models, as well as external models, are consistent? This is a generic problem.

15 A General Approach to Domain Modeling
(After R. Atkinson and S. Cox, at the TDWG GIG Workshop in Edinburgh, June 2006)
1. Examine the domain and break it up into subdomains. If using UML, this is accomplished by grouping related objects into UML packages, which can be distributed for others to use.
2. Decide what doesn't belong in the domain of interest and belongs in someone else's. In UML, put in a placeholder or stub package, to be replaced later.
3. Identify the common elements that everyone agrees on, and that all implementations will include.
4. These form the basis of a conceptual model.

16 Domain Modeling (Atkinson and Cox)
5. Proceed to identifying points in question or of disagreement. Clarify implications and explore consequences for the model. The notion is that here, at least, we can keep the model coherent.
6. Develop, or bring into the discourse, representational views that are important to near-term or legacy implementations. These represent the varied and sometimes incompatible viewpoints that different implementors have of the domain. Exercising these helps to clarify the conceptual model.
Methodology and tools for mapping representational views to the conceptual model and to each other are still very much under development.
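A representational view can be tied to the conceptual model by a mapping function per view; data from different services then meet in the conceptual class, where they can be merged. The sketch below is hypothetical throughout: Occurrence stands in for a conceptual class, the Darwin Core view is a flat term/value map, and AbcdUnit is a simplified stand-in for an ABCD unit record, not the real ABCD structure. Records require Java 16 or later.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Entry point declared first so the file can be run directly with `java ViewMappingDemo.java`.
class ViewMappingDemo {
    public static void main(String[] args) {
        Set<Occurrence> merged = new LinkedHashSet<>();
        merged.add(Views.fromDarwinCore(Map.of(
                "InstitutionCode", "XYZ", "CollectionCode", "Herps", "CatalogNumber", "42")));
        merged.add(Views.fromAbcdUnit(new AbcdUnit("XYZ", "Herps", "42")));
        // Both views describe the same occurrence, so the merged set holds one element.
        System.out.println(merged.size()); // prints 1
    }
}

/** Hypothetical conceptual-model class that both representational views map onto. */
record Occurrence(String institution, String collection, String catalogNumber) {}

/** Simplified, hypothetical stand-in for an ABCD unit record. */
record AbcdUnit(String sourceInstitutionId, String sourceId, String unitId) {}

/** One mapping function per representational view. */
class Views {
    static Occurrence fromDarwinCore(Map<String, String> record) {
        return new Occurrence(record.get("InstitutionCode"),
                record.get("CollectionCode"), record.get("CatalogNumber"));
    }

    static Occurrence fromAbcdUnit(AbcdUnit unit) {
        return new Occurrence(unit.sourceInstitutionId(), unit.sourceId(), unit.unitId());
    }
}
```

Record equality is component-wise, so duplicates arriving through different views collapse once both are expressed in the conceptual model; real mappings would of course have to reconcile far more than three identifiers.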

17 GIG Conceptual Modeling Exercise Taxonomic Data

18 Representational View Exercise Darwin Core

19 Some Conclusions
Experimentation with the domain modeling approach revealed some unanticipated aspects of our work (in particular, the discovery of a new class, OrganismOccurrence, as the Feature of Interest for either a field observation or a collection). It's a valuable approach and we should explore it further.
It is clear that TDWG is addressing many of the same generic issues as other domain organizations. Many of these problems have already been solved by ISO TC 211, the source of the 191xx documents, so TDWG doesn't have to solve them again. We should use them.
This again should encourage us to think of our XML schema models as objects, and to use the object/property pattern.
The real point of this address is simply that we should adopt the lessons of object-oriented analysis and design, and continue to make use of the extensive body of work done by collaborations of experts in domains outside our own.
Cost: a bit of pain, but well worth it.

20 What About Services?
Services are the mechanism by which standards-compliant data are distributed across the internet. E.g., the OGC has defined several services for distributing ISO 19103-compliant feature data, and these are being implemented increasingly broadly. Current TDWG efforts are incorporating some of this work.

21 Acknowledgements James A. Brass, Biospheric Sciences Branch Chief

