The GeoScience Mark-up Language, GeoSciML, is a language to allow the exchange of geological data. I had intended to announce the release of version 2 today. However, as a result of the Testbed 3 testing, the modelling team made additional changes last week in Uppsala. So what I’ll be presenting here is a look at the geological content of release candidate 3. Testing of this is planned to be completed by October, with the release of version 2.0 in December 2008. GeoSciML 2.0: Significant changes and additions to the CGI-IUGS geoscience data model Bruce Simons
Co-Authors: Eric Boisvert - GSC; Boyan Brodaric - GSC; Dominique Janjou - BRGM; Christian Bellier - BRGM; Simon Cox - CSIRO; Yuichiro Fusejima - GSJ; Bruce R. Johnson - USGS; John L. Laxton - BGS; Oliver Raymond - GA; Steve Richard - AzGS. The development of the model is carried out under the IUGS Commission for the Management and Application of Geoscience Information – the CGI. The co-authors listed here are the members of the version 2 Design Team. However, the work has also benefited from many others working on related areas.
Interoperability in the Geosciences “the ability of software and hardware on different machines from different vendors to share data” Efficiencies for government Efficiencies for industry Benefits for the wider geoscience community The aim is to establish a geoscience model that allows Interoperability. Interoperability is about being able to share data, without reformatting, and potentially without human involvement in the exchange. <Click> This results in Efficiencies for both government and industry, as well as benefits for the wider community.
Traditional paper map We are all used to, and comfortable with, our traditional representation of data on geological maps. It is a very efficient mechanism for transmitting information, once you understand its structure and language.
Traditional Paper Maps Advantages: Presents lots of information; Readily understood by experts (~0.2%); Targeted to specific end-users. Disadvantages: Stand-alone product; Hard copy only; Allows only limited analysis; Doesn’t allow data exchange; Single legend; Requires further ‘explanation’. A paper map presents lots of information that can be readily understood. It takes first-year geology to understand a geological map, which limits the audience to about 0.2% of the population. It is specifically targeted to the end-user requirements. <Click> It does have disadvantages: it is a stand-alone product, which makes its end use less flexible; it is available only in hard copy or a scanned equivalent; it’s not easy to carry out any machine-based analysis; and it doesn’t readily allow data exchange. It provides only a single legend: the map feature descriptions, such as the geologic units, faults etc., apply only to that map. The same features on adjacent maps, or at different scales, may differ without the user being aware of these differences. It requires further explanation: even though the maps may be quite succinct at delivering complex information, associated ‘Explanatory Notes’ are usually required. I believe 339 pages for a 1:100 000 map is the biggest GSV has so far produced. And the authors assure me that there is not a word wasted in it.
Digital Maps During the 1980s, surveys started moving towards digital data capture and mapping and there have been significant advances in this area, to the point where hard copy maps have almost, but not quite, been replaced. <Click> This digital map, or GIS, approach captures most of the map information, and humans can, mostly, make sense of it.
Digital Maps Advantages: Captures most map information; Human readable; Some data exchange capacity; Allows queries and analysis. Disadvantages: Targeted end-user; Single legend; ‘Flat’ data structure; Vendor specific format; No relationships, cross-sections, face notes. Advantages: it allows data exchange, and it allows some querying and analysis functionality, particularly if in GIS format. <Click> Disadvantages: it has the same targeted end use and single legend capacity as the geological map format. It only handles simple, or relatively flat, data structures. This data is usually in vendor-specific formats. The main limitation, and one of the reasons that hard copy maps are still with us, apart from their field-friendly format, is that it is extremely difficult to deliver all the other information that maps traditionally contain, such as rock relationship diagrams, structured legends, cross-sections, and face notes. So are there alternatives?
Structured Digital Data Well, believe it or not, there are, and they look something like this. Although unfamiliar to most of us, there are standards developed, or being developed, that allow us to convert the data we see represented on a geological map into this format, known as XML, short for Extensible Mark-up Language. Although it’s considered human readable, clearly geologists should not be reading this. As such, although this is the language we use to deliver data, we won’t be seeing any more of it during this talk. Instead, what we will be seeing is a graphical representation of it, using UML – the Unified Modelling Language.
Structured Digital Data Advantages: Handles all the information; Is well-structured; Allows establishing data exchange standards; Caters for all end-users; Suitable for computer analysis; Machine readable. Disadvantages: Difficult for humans to read; Requires agreed standards. Advantages: we can structure it, which makes it possible to establish standards for this structure. There is no need to tailor it to suit a particular end-user. But the real benefit is that it is machine readable, and client applications can be written to allow analysis of this data. This means that the same data source could potentially be used by GIS applications to make maps, statistical packages to carry out analysis and stereonet packages to plot foliations. And these applications will work on any data provided in this format. <Click> It has some disadvantages: in its raw state it isn’t quite as friendly as our geological maps, but the real disadvantage is that it requires work in the geoscience community to define and agree on a standard structure for geological data. The agreed standard for geology data is the GeoScience Mark-up Language, referred to as GeoSciML.
GeoSciML Benefits [Diagram: data sources at GSC (Canada), USGS (USA), BGS (UK), and GA and GSV (Australia) each have a data-to-GeoSciML schema mapping and are delivered through OGC services (WMS, WFS) in GeoSciML format to a GML client.] So why are we so interested in establishing this standard? <Click> Each organisation can map their data to that standard without redeveloping the backend databases. Any software client that is OGC and GeoSciML compliant can access that data, no matter where the source is. The user of that client doesn’t need to map their client to each organisation’s different data source structure, which is what is currently required. This is what interoperability is all about.
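To make the mapping idea a little more concrete, here is a minimal sketch in Python. It is not part of any GeoSciML tooling: the backend field names (UNIT_NAME, LITH_1, lex_name and so on), the example rows and the shared dictionary layout are all assumptions invented for illustration, and real services would deliver GeoSciML through OGC services rather than Python dictionaries.

```python
from typing import Dict, List

# Hypothetical backend rows. The field names are invented for this sketch
# and are NOT the real GSV or BGS database columns.
gsv_rows = [{"UNIT_NAME": "Example Unit A", "LITH_1": "sandstone", "LITH_2": "siltstone"}]
bgs_rows = [{"lex_name": "Example Unit B", "rock": "limestone"}]


def map_gsv(row: Dict[str, str]) -> Dict[str, object]:
    """Map one hypothetical GSV row to the shared structure."""
    lithology = [row[key] for key in ("LITH_1", "LITH_2") if row.get(key)]
    return {"name": row["UNIT_NAME"], "lithology": lithology}


def map_bgs(row: Dict[str, str]) -> Dict[str, object]:
    """Map one hypothetical BGS row to the same shared structure."""
    return {"name": row["lex_name"], "lithology": [row["rock"]]}


def units_with_lithology(units: List[Dict[str, object]], term: str) -> List[str]:
    """A 'client' written once against the shared structure only."""
    return [unit["name"] for unit in units if term in unit["lithology"]]


# Each organisation maps its own data; the backends themselves are untouched.
shared_units = [map_gsv(r) for r in gsv_rows] + [map_bgs(r) for r in bgs_rows]
print(units_with_lithology(shared_units, "sandstone"))
```

The point of the sketch is that units_with_lithology never sees the organisation-specific field names; only the two small mapping functions do.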
Interoperability Requirements Levels of interoperability: Systems (Data Services); Syntax (Data Language); Schematic (Data Structure); Semantic (Data Content). Current ‘World’: Organisation specific; Few standards; Access, Excel, Proprietary GIS; Files, DVD, CD. GeoSciML ‘World’: GeoSciML, O&M; Controlled Vocabularies; GML, XML; WFS, WMS, WCS. To achieve interoperability requires agreed standards on many levels. The CGI Interoperability Working Group has been involved at all levels on the right-hand side, but <Click> I’m only talking about the Schematic, or Data Structure, aspects.
Schematic Agreement RockMaterial: consolidationDegree, compositionCategory, geneticDescription, lithology. Victoria, South Australia: lithology. So what is Schematic Agreement? If we look at two delivery structures from adjacent Australian states, <Click> we see the lithology value is stored in two places in Victoria and only one place in SA. For an agreed schema we need to specify whether this is one attribute or two, where it occurs in the data structure, and what format its values can take. In GeoSciML we have specified that a Rock class, represented by the green box, has at least 4 attributes, one of which is the lithology value.
Schematic Agreement: Lithology Cardinality. GeologicUnit: bodyMorphology [0..*], compositionCategory [0..1], exposureColor [0..*], outcropCharacter [0..*], rank [0..1]; +composition 0..* CompositionPart. CompositionPart: lithology: ControlledConcept [1..*], material: RockMaterial [0..1], proportion, role. RockMaterial: consolidationDegree: CGI_Term, compositionCategory: CGI_Term [0..1], geneticDescription: CGI_Term [0..1], lithology: ControlledConcept [1..*]. So obviously Rocks have lithology. <Click> But GeologicUnits may also have lithology. So we can describe the lithology of a rock, or the lithology of a GeologicUnit. But we could also describe the lithology of the GeologicUnit by describing the Rocks it contains. So we can describe the lithology of the Rocks that make up the GeologicUnit, or we can simply describe the lithology of the unit, without describing the rocks. Not only do we need to agree on what properties are appropriate, but we also need to agree on the cardinality of the various properties. For instance, a GeologicUnit may have 0 or many composition descriptions, and each of these may have 1 or many lithology terms. This representation is UML (Unified Modelling Language), which is a graphical way of representing the XML we saw previously. It has the advantage that we (now) have software to convert from this to XML.
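The cardinalities on this slide can be sketched in code. The following is a minimal illustration in Python, assuming plain strings in place of ControlledConcept and CGI_Term values; the class and attribute names follow the slide, but the Python layout, the helper names and the example terms are assumptions, not the GeoSciML schema itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def require_one_or_more(values: List[str], name: str) -> None:
    """Enforce a [1..*] cardinality: at least one value must be given."""
    if not values:
        raise ValueError(f"{name} has cardinality [1..*] and needs at least one value")


@dataclass
class RockMaterial:
    consolidation_degree: str                      # exactly one
    lithology: List[str]                           # [1..*]
    composition_category: Optional[str] = None     # [0..1]
    genetic_description: Optional[str] = None      # [0..1]

    def __post_init__(self) -> None:
        require_one_or_more(self.lithology, "RockMaterial.lithology")


@dataclass
class CompositionPart:
    lithology: List[str]                           # [1..*]
    material: Optional[RockMaterial] = None        # [0..1]

    def __post_init__(self) -> None:
        require_one_or_more(self.lithology, "CompositionPart.lithology")


@dataclass
class GeologicUnit:
    composition: List[CompositionPart] = field(default_factory=list)  # [0..*]
    rank: Optional[str] = None                                         # [0..1]


def unit_lithologies(unit: GeologicUnit) -> List[str]:
    """Collect lithology terms given on the unit's parts or on their rocks."""
    terms: List[str] = []
    for part in unit.composition:
        candidates = list(part.lithology)
        if part.material is not None:
            candidates += part.material.lithology
        for term in candidates:
            if term not in terms:
                terms.append(term)
    return terms


# A unit described directly, and one described through the rock it contains.
described_directly = GeologicUnit(composition=[CompositionPart(lithology=["sandstone"])])
described_by_rock = GeologicUnit(
    composition=[
        CompositionPart(
            lithology=["sandstone"],
            material=RockMaterial("consolidated", ["quartz arenite"]),
        )
    ]
)
print(unit_lithologies(described_by_rock))
```

A GeologicUnit with no composition at all is valid here, whereas a CompositionPart with an empty lithology list is rejected, which is the difference between [0..*] and [1..*].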
What is GeoSciML? machine readable GeoScience Markup Language; a Geological Data Model; based on real world concepts; represents the complexity of geology; tells users what geological information goes where; developed by the international scientific community; builds on established standards such as GML; uses the ISO ‘feature’ model. So recapping: GeoSciML is the machine-readable GeoScience Mark-up Language. From the geologist’s perspective the important points are that: 1. It’s based on real-world concepts, such as geologic units, faults, contacts and the like. This is based on work carried out in a number of jurisdictions, but the most influential has been the NADM work from 1999 to 2004. 2. It handles the complexity required to describe these concepts. 3. It specifies what information goes where and what the associations are between the various pieces of information. 4. It has been developed by the international geoscience community, and it makes use of the GML standard of the Open Geospatial Consortium and the ISO standards.
MappedFeature – geologic map elements: The map sheet; Map polygons and lines, described by GML geometries; Geologic description (map legend). An important point is that GeoSciML obtains its framework from other non-geoscience domains. This framework shows up in: 1. the use of standard UML stereotypes; 2. reference to standard external components, e.g. geometry GM_Object (from ISO 19107), metadata MD_Metadata (from ISO 19115), SamplingFeatures (from OGC O&M). <Click> Here the map sheet description comes from Observations & Measurements, <Click> the map geometries from GML, <Click> and the map legend or geological description from GeoSciML.
[Overview diagram of the GeoSciML model. Package labels: Vocabularies, Features, Sampling Features, Units, Structures, ‘Rocks’, Values. Classes shown: AnyDefinition, GeologicFeatureRelation, GeologicEvent, GM_Object, SurveyProcedure, SamplingFeatureRelation, VocabRelation, AnyDictionary, GeologicFeature, MappedFeature, AnyFeature, SamplingFeature, Observation, ControlledConcept, GeologicVocabulary, SpatiallyExtensiveSamplingFeature, Specimen, SamplingPoint, DiscreteCoverageObservation, AnyEntity, StratigraphicLexicon, SamplingCurve, CV_DiscreteCoverage, WeatheringDescription, BoreholeDetails, CompositionPart, GeologicUnit, GeologicUnitPart, GeologicStructure, Borehole, BoreholeCollar, PhysicalDescription, MetamorphicDescription, ShearDisplacementStructure, Contact, Lineation, NonDirectionalStructure, Foliation, DisplacementValue, Fault, DuctileShearStructure, FaultSystem, FoldSystem, Fold, Layering, MaterialRelation, SeparationValue, SlipComponents, NetSlipValue, ConstituentPart, EarthMaterial, CGI_Value, CGI_GeometricDescriptionValue, ParticleGeometryDescription, CompoundMaterial, Mineral, OrganicMaterial, CGI_Range, CGI_PrimitiveValue, CGI_NumericRange, CGI_PlanarOrientation, CGI_LinearOrientation, InorganicFluid, CGI_TermRange, CGI_TermValue, CGI_NumericValue, CGI_Vector, CGI_Term, CGI_Numeric.] Obviously any model covering a domain as complex as geology is also going to be complex and difficult to describe. But perhaps we can get a quick overview of some of the scope of the model in the time remaining. <Click> Broadly the model covers Geologic Features. These may be <Click> GeologicUnits, <Click> GeologicStructures or, <Click> from Observations & Measurements, Sampling Features. <Click> EarthMaterials, such as Rocks and Minerals, may be used to describe these features, and <Click> Vocabularies and Values are used to contain the geological terms used. Clearly a large proportion of geological concepts are covered; the model has extensive breadth. I will now look at some of these concepts in more detail to demonstrate the ‘depth’ of the model.
GeologicFeature Relation GeologicEvent eventAge eventEnvironment [0..*] eventProcess [1..*] geologicHistory 0..1 preferredAge 0..* GeologicFeature Relation GeologicRelation relationship sourceRole [0..1] targetRole [0..1] sourceLink 0..* target 1 targetLink source GM_Object boundary buffer(Distance) centroid closure convexHull coordinateDimension dimension distance envelope isCycle isSimple maximalComplex mbRegion representativePoint transform shape MD_Metadata metadata 0..1 GeologicUnit GeologicStructure GeologicFeature observationMethod [1..*] purpose MappedFeature observationMethod [1..*] positionalAccuracy specification 1 occurrence 0..* So what do we mean by GeologicFeatures? <Click> GeologicFeatures may be Units or Structures. These Units or Structures may be mapped, and we use all the GML-defined properties, which cover polygons, lines, points etc., to describe the spatial properties of the GeologicFeatures. Features may be related to other features; that is, units intrude other units, faults cut units etc. GeologicFeatures may have a geologic history, that is, a series of events that created the feature. These events are described by their age, environment and process properties. Features can also link to the metadata classes (MD_Metadata). Note the classes are colour coded: green was defined in GeoSciML 1, yellow is new to GeoSciML 2, blue is from Observations and Measurements and fawn from GML. SpatiallyExtensive SamplingFeature samplingFrame GeologicFeature
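As a rough illustration of how relations and geologic history hang off a feature, here is a minimal Python sketch. The class and attribute names echo the slide (GeologicEvent, GeologicRelation, eventAge, eventProcess and so on), but the name attribute, the plain-string values, the helper function and the example features are assumptions made for this sketch only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class GeologicEvent:
    event_age: str
    event_process: List[str]                                     # [1..*]
    event_environment: List[str] = field(default_factory=list)   # [0..*]


@dataclass
class GeologicFeature:
    name: str  # hypothetical label, used only to make the example readable
    geologic_history: List[GeologicEvent] = field(default_factory=list)


@dataclass
class GeologicRelation:
    relationship: str
    source: GeologicFeature
    target: GeologicFeature


def targets_of(feature: GeologicFeature,
               relations: List[GeologicRelation]) -> List[Tuple[str, str]]:
    """List (relationship, target name) pairs where `feature` is the source."""
    return [(r.relationship, r.target.name) for r in relations if r.source is feature]


# Invented example features: a unit intrudes another unit, a fault cuts a unit.
sandstone = GeologicFeature(
    "Example Sandstone",
    [GeologicEvent("Devonian", ["deposition"], ["fluvial"])],
)
granite = GeologicFeature(
    "Example Granite",
    [GeologicEvent("Carboniferous", ["intrusion"])],
)
fault = GeologicFeature("Example Fault")

relations = [
    GeologicRelation("intrudes", source=granite, target=sandstone),
    GeologicRelation("cuts", source=fault, target=sandstone),
]
print(targets_of(granite, relations))
```

Keeping the relation as its own object, rather than as an attribute of either feature, mirrors the source and target roles on the slide and lets one feature take part in any number of relations.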
Geologic Unit CompositionPart GeologicUnitType ControlledConcept Allostratigraphic Alteration ArtificialGround Biostratigraphic Chronostratigraphic Deformation Excavation Geomorphologic GeophysicalUnit Lithodemic Lithogenetic Lithologic Lithostratigraphic LithotectonicUnit MagnetostratigraphicUnit MassMovement Pedoderm Pedostratigraphic PolarityChronostratigraphicUnit CompositionPart lithology material proportion role composition ControlledConcept identifier name classifier GeologicUnit geologicUnitType bodyMorphology GeologicUnitPart proportion role part contained Unit PhysicalDescription density magneticSusceptibility permeability porosity physicalProperty exposureColor outcropCharacter rank compositionCategory unitThickness weathering Character WeatheringDescription weatheringDegree weatheringProduct weatheringProcess environment BeddingDescription beddingPattern beddingStyle beddingThickness +bedding MetamorphicDescription metamorphicFacies metamorphicGrade peakPressureValue peakTemperatureValue protolithLithology metamorphic Character All types of Geologic Units are catered for. These GeologicUnits may be described by various properties such as: <Click> weathering character, using the WeatheringDescription, and <Click> metamorphic character, using the MetamorphicDescription. <Click> Bedding can be described, <Click> as can the physical properties. <Click> We’ve already seen that GeologicUnits have composition descriptions. They can also be considered as parts of other GeologicUnits, such as members, formations and groups. The name of the Unit could be a ControlledConcept, that is, it is defined in some Stratigraphic Lexicon.
ParticleGeometry Description ConstituentPart proportion role material part target MaterialRelation relationship sourceRole targetRole source EarthMaterial color purpose particleGeometry ParticleGeometry Description size sorting particleType shape aspectRatio particleGeometry Organic Material InorganicFluid Mineral mineralName RockMaterial compositionCategory geneticCategory consolidationDegree lithology metamorphic Character MetamorphicDescription metamorphicFacies metamorphicGrade peakPressureValue peakTemperatureValue protolithLithology FabricDescription fabricType fabric PhysicalDescription density magneticSusceptibility permeability porosity physicalProperty EarthMaterials may be RockMaterials, covering consolidated and unconsolidated material, such as sand and gravel. <Click> Or they may be Minerals, OrganicMaterial or InorganicFluid. These last two are empty ‘placeholders’, i.e. we haven’t filled out their attributes. These are areas where we would like to see other specialists propose the properties required to describe these classes. EarthMaterials, such as Minerals or RockMaterials, can be used to make up other RockMaterials, which allows describing the constituent parts of any Rock, such as its clasts, matrix, phenocrysts and the like. We can specify the relationships between these various constituent parts. As with GeologicUnits, there are a number of classes we can use to describe the various properties of the Rocks.
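The idea that EarthMaterials can be made up of other EarthMaterials is a recursive structure, and a short Python sketch may help. The class names follow the slide (Mineral, RockMaterial, ConstituentPart with role, proportion and material); the single-string lithology, the parts attribute, the helper function and the example rock with its proportions are assumptions invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Mineral:
    mineral_name: str


@dataclass
class ConstituentPart:
    role: str                                   # e.g. clast, matrix, phenocryst
    proportion: float                           # fraction of the parent material
    material: Union["Mineral", "RockMaterial"]  # any EarthMaterial in this sketch


@dataclass
class RockMaterial:
    lithology: str
    parts: List[ConstituentPart] = field(default_factory=list)


def mineral_names(material: Union[Mineral, RockMaterial]) -> List[str]:
    """Walk the constituent parts recursively and list each mineral once."""
    if isinstance(material, Mineral):
        return [material.mineral_name]
    names: List[str] = []
    for part in material.parts:
        for name in mineral_names(part.material):
            if name not in names:
                names.append(name)
    return names


# Invented example: a conglomerate whose clasts are themselves a rock.
granite = RockMaterial(
    "granite",
    [
        ConstituentPart("phenocryst", 0.3, Mineral("feldspar")),
        ConstituentPart("groundmass", 0.7, Mineral("quartz")),
    ],
)
conglomerate = RockMaterial(
    "conglomerate",
    [
        ConstituentPart("clast", 0.6, granite),
        ConstituentPart("matrix", 0.4, Mineral("quartz")),
    ],
)
print(mineral_names(conglomerate))
```

Because a ConstituentPart can point at either a Mineral or another RockMaterial, clasts, matrix and phenocrysts can be described to whatever depth the data provider has information for.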
NonDirectionalStructure Lineation definingElement intensity lineationType mineralElement orientation Contact contactCharacter contactType orientation NonDirectionalStructure structureType BoundaryRelationship constraints {source must be GeologicUnit} {target must be GeologicUnit} boundary Occurrence boundedUnitLink DeformationUnit definedUnit defining Structure GeologicStructure Displacement Event incremental ShearDisplacement Structure planeOrientation Fold profileType axialSurfaceOrientation hingeLineOrientation geneticModel amplitude hingeLineCurvature hingeShape interLimbAngle limbShape span symmetry higherOrder FoldPart foldSystem Member Foliation continuity definingElement foliationType intensity mineralElement orientation spacing Layering Rock consolidationDegree lithology layer Composition FoldSystem periodic wavelength FaultSystem faultSystem Member DisplacementValue hangingWallDirection movementSense movementType total Fault segment DuctileShear Structure segment NetSlip Value Separation Slip Components slipComponent Geologic Structures are far too complex to discuss here, but suffice it to say that the model covers: <Click> brittle and ductile shears, <Click> folds, <Click> foliations, <Click> lineations, <Click> contacts and <Click> non-directional structures, such as mudcracks and miarolitic cavities.
SpatiallyExtensive SamplingFeature Specimen SamplingPoint Relation role relatedSamplingFeature 0..* source target SurveyProcedure surveyDetails 0..1 AnyFeature Intention sampled Feature 1..* SamplingFeature SpatiallyExtensive SamplingFeature Specimen SamplingPoint position Outcrop CV_DiscreteCoverage Observation DiscreteCoverage relatedObservation 0..* result SamplingCurve length [0..1] shape Borehole BoreholeCollar location collarLocation borehole 0..* The model also extends Observations and Measurements to include boreholes. BoreholeDetails dateOfDrilling driller drillingMethod inclinationType nominalDiameter operator startPoint indexData 0..1
Governance Tony Cragg, Subcommittee, 1991 IWG Establishing standards inevitably involves committees, sub-committees, task groups, working groups and the like. Establishing the GeoScience Mark-up Language, GeoSciML, is no exception. It was created by the Interoperability Working Group, established in late 2003 under the CGI, which is a Commission of the IUGS. This is the governance structure for GeoSciML. The GeoSciML standard also makes use of the standards established by ISO and OGC. So GeoSciML comes with a very credible pedigree and governance framework.
Interoperability Requirements Summary availability of appropriate technologies - OGC, ISO, W3C common data structure common data content commitment to these standards - GGIC, INSPIRE, 1G-Europe, NSF-GIN - CGI-IUGS For all this to work we need the availability of appropriate technologies. For this we are dependent on the international standards bodies. <Click> We need a common data structure – that’s GeoSciML. But we also need agreement on standard data content, using Controlled Vocabularies and the like. This is the responsibility of the CGI. Finally, and most importantly, we need a commitment to these standards. And that is where regional and national standard-setting projects and organisations must play a crucial role.
GeoSciML Documentation http://www.geosciml.org Web Services Workshop 9:00 – 14:00 Sunday 10 August Room D1 A reminder that the use of GeoSciML in Web services will be demonstrated at tomorrow’s workshop.