CRMsci: the Scientific Observation Model
Martin Doerr, Chryssoula Bekiari, Athina Kritsotaki, Gerald Hiebel, Maria Theodoridou
Center for Cultural Informatics, Institute of Computer Science, Foundation for Research and Technology - Hellas
CIDOC 2014, Dresden, September 9th, 2014

Current situation
- EU infrastructure projects aim to publish linked open data about scientific observations in geology, biology, archaeological excavations, digital productions and medicine.
- Existing standards for scientific observation:
  - INSPIRE: earth-science oriented, promoted by the EU
  - OBOE: life-science oriented, supports semantic annotation
  - SEEK: ecology-oriented framework
  - Darwin Core: a general-use metadata scheme for biodiversity
- Focus on: the semantic annotation process of data sets

Epistemological Considerations
- Theories are formalized sets of concepts that organize observations, predict and explain phenomena, and demand a solid empirical base of evidence.
- The raw data provided by data sets are, per se, of little use.
- Scientific observation forms the basis for understanding the phenomena being studied; it is a process by which we advance our understanding of the world.
- Common to all sciences is the workflow of forming a hypothesis in order to perform and explain observations, gathering data, and drawing conclusions that confirm or deny the original hypothesis.
- The sciences differ in what is considered data and in how data is gathered and processed.
- The cultural discourse includes information from all sorts of sciences and their products, such as digital productions, biological samples and specimens of physical objects (materials, fluids etc.).
- Scientific data and metadata can be considered as historical records.

Common Workflow
1. Form a hypothesis in order to perform an observation (select parameters, properties, signals and the way of converting these to data).
2. Perform the observations (these are only concerned with objects or events that are observable, either directly or indirectly).
3. Explain the observations made and the gathering of data.
4. Draw conclusions based upon this data (make a scientific hypothesis: a tentative explanation of the observations made).
5. Deduce the implications (test them through further observation, compare the results).
6. Confirm, deny or re-evaluate the original hypothesis.
7. Formulate valid theories (allow others to repeat the observations).

Limitations
Problems with the existing standards:
- They model observation in isolation from the actions that precede or follow an observation event.
- They leave out information that would allow the quality and precision of the results to be assessed later, or existing measured data to be re-evaluated in the light of new evidence. If suitable raw data were provided, such a re-evaluation would not require redoing the measurement itself.
- Even when data are published in repositories using the above standards, they typically lack the information required for effective long-term preservation and interpretation.

The CRMsci – overview (1)
The CRMsci
- has been developed bottom-up from specific metadata examples such as water sampling in aquifer systems, earthquake shock recordings, landslides, excavation processes, species occurrence and detection of new species, tissue sampling in cancer research, and 3D digitization;
- takes into account relevant standards, such as INSPIRE, OBOE, Darwin Core, national archaeological standards for excavation, digital provenance models and others;
- describes, together with the CIDOC CRM, a discipline-neutral level of genericity, which can be used as a general ontology of human activity, things and events happening in spacetime;
- uses the same encoding-neutral formalism of knowledge representation as the CIDOC CRM, and can be implemented in RDFS, OWL, on RDBMS and in other forms of encoding;
- reuses, wherever appropriate, parts of the CIDOC CRM: all constructs used from ISO 21127 are considered part of this model, together with their definitions following version 5.1.2 maintained by CIDOC.

The CRMsci – overview (2)
Metadata about:
- the human observer
- the object of observation (a “thing”, “something”, a process or a state?)
- the observation hypothesis (choice of parameters)
- the identity of the object, if any
- the environment, time and location
- the condition of the thing
- the instrumentation and method used
- the identity, authenticity and transmission of the produced records
- the inference making

Events and Activities
[Class hierarchy diagram] Classes shown:
- CIDOC CRM: E5 Event, E7 Activity, E11 Modification, E12 Production, E13 Attribute Assignment, E16/S21 Measurement, E63 Beginning of Existence, E80 Part Removal
- CRMsci: S1 Matter Removal, S2 Sample Taking, S3 Measurement by Sampling, S4 Observation, S5 Inference Making, S6 Data Evaluation, S7 Simulation-Prediction, S8 Categorical Hypothesis Building, S17 Physical Genesis, S18 Alteration, S19 Encounter Event

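The subclass links drawn in this diagram did not survive as text. As a rough sketch only, the fragment below encodes a handful of them as a Python dictionary and walks the superclass chain. The links among the E-classes follow the published CIDOC CRM; the CRMsci links are assumptions and should be checked against the CRMsci definition. The names `SUBCLASS_OF` and `superclasses` are illustrative and not part of either model.

```python
# Partial subclass links among classes named on the slide (illustrative only).
SUBCLASS_OF = {
    "E7 Activity": ["E5 Event"],
    "E63 Beginning of Existence": ["E5 Event"],
    "E11 Modification": ["E7 Activity"],
    "E12 Production": ["E11 Modification", "E63 Beginning of Existence"],
    "E80 Part Removal": ["E11 Modification"],
    "E13 Attribute Assignment": ["E7 Activity"],
    # The CRMsci links below are assumptions, not taken from the slide text.
    "S4 Observation": ["E13 Attribute Assignment"],
    "S1 Matter Removal": ["E7 Activity"],
    "S2 Sample Taking": ["S1 Matter Removal"],
}


def superclasses(cls):
    """Return all direct and indirect superclasses of cls, nearest first."""
    found = []
    queue = list(SUBCLASS_OF.get(cls, []))
    while queue:
        parent = queue.pop(0)
        if parent not in found:
            found.append(parent)
            queue.extend(SUBCLASS_OF.get(parent, []))
    return found


print(superclasses("S2 Sample Taking"))
```

Under these assumed links, anything recorded as a sample taking can also be queried as a matter removal, an activity or an event, which is the point of anchoring the extension in the CIDOC CRM hierarchy.
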
Observable Entity
S15 Observable Entity comprises items (E77) or phenomena (E2) that can be observed, such as physical things, their behavior, states and interactions or events, either directly by human sensory impression or enhanced with tools and measurement devices. Inspired by OBOE.
[Class hierarchy diagram] Classes shown:
- CIDOC CRM: E1 CRM Entity, E2 Temporal Entity, E3 Condition State, E5 Event, E18 Physical Thing, E25 Man-Made Feature, E27 Site, E53 Place, E55 Type, E70 Thing, E77 Persistent Item
- CRMsci: S9 Property Type, S10 Material Substantial, S11 Amount of Matter, S12 Amount of Fluid, S13 Sample, S14 Fluid Body, S15 Observable Entity, S16 State, S20 / E26 Physical Feature, S22 Segment of Matter

Matter Removing and Sampling
[Diagram] Classes shown:
- CIDOC CRM: E2 Temporal Entity, E3 Condition State, E7 Activity, E18 Physical Thing, E53 Place, E55 Type, E57 Material, E70 Thing, E77 Persistent Item
- CRMsci: S1 Matter Removal, S2 Sample Taking, S10 Material Substantial, S11 Amount of Matter, S13 Sample, S14 Fluid Body, S15 Observable Entity
Properties shown:
- CIDOC CRM: P44 has condition, P45 consists of, P46 is composed of, P156 occupies
- CRMsci: O1 diminished, O2 removed, O3 sampled from, O4 sampled at, O5 removed, O7 contains or confines, O15 occupied, O20 sampled from type of part

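To make the sampling properties concrete, here is a minimal sketch of how a single sample-taking event might be recorded and flattened into subject-property-object statements. It assumes that O3 sampled from, O4 sampled at and O5 removed all start at the S2 Sample Taking event and point to an S10 Material Substantial, an E53 Place and an S13 Sample respectively; the diagram's arrows are not recoverable from the text, so this reading should be checked against the CRMsci definition. The class name `SampleTaking`, the "is a" typing label and all `ex:` identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SampleTaking:
    """Hypothetical record of one S2 Sample Taking event."""
    identifier: str
    sampled_from: str  # assumed target of O3: an S10 Material Substantial
    sampled_at: str    # assumed target of O4: an E53 Place
    removed: str       # assumed target of O5: an S13 Sample

    def statements(self):
        """Flatten the record into (subject, property, object) tuples."""
        return [
            (self.identifier, "is a", "S2 Sample Taking"),
            (self.identifier, "O3 sampled from", self.sampled_from),
            (self.identifier, "O4 sampled at", self.sampled_at),
            (self.identifier, "O5 removed", self.removed),
        ]


# All identifiers below are invented placeholders.
event = SampleTaking(
    identifier="ex:sampling-1",
    sampled_from="ex:water-body-A",
    sampled_at="ex:sampling-spot-1",
    removed="ex:sample-1",
)

for statement in event.statements():
    print(statement)
```

Keeping the event as its own record, rather than attaching the place and source directly to the sample, is what lets later measurements or re-evaluations refer back to how the sample was obtained.
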
Monitoring observation activities
[Diagram] Classes shown:
- CIDOC CRM: E5 Event, E7 Activity, E13 Attribute Assignment, E16 Measurement, E18 Physical Thing, E54 Dimension, E55 Type, E70 Thing
- CRMsci: S4 Observation, S5 Inference Making, S6 Data Evaluation, S9 Property Type, S10 Material Substantial, S15 Observable Entity, S19 Encounter Event
Properties shown:
- CIDOC CRM: P2 has type, P39 measured, P40 observed dimension
- CRMsci: O10 observed, O11 observedProperty, O14 assigned dimension, O16 described, O17 has dimension, O32 has found object

S19 Encounter Event
Inspired by Darwin Core.
[Instance diagram] Nodes shown:
- S19 Encounter Event: urn:catalog:IOL:POLY:Sphaerosyllis-levantina-ALA-IL-7-Oct.2009
- E18 Physical Thing: Sphaero-levantina-003
- E21 Person: Sarah Faulwetter
- E53 Place: Israel
- E53 Place: Haifa Bay Ecosystem Station 1
- E52 Timespan: 7 October 2009
- E55 Type
- Ecosystem Type: sandy - muddy sediments
- Equipment Type: WA265/SS214
- Equipment Type: Van Veen Grab
Properties shown: O32 has found object, P14 carried out by, O7 contains or confines (is contained or confined), O21 has found at (witnessed), P2 has type, P4 has timespan, P125 used object of type, P127 has broader term

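The example above can be sketched as a small set of statements. Which property connects which pair of nodes is an assumed reading of the flattened diagram, not something the text states, and P2 has type is left out because its endpoints are unclear. The names `EVENT`, `STATEMENTS` and `objects` are illustrative.

```python
# Identifier of the encounter event, copied from the slide.
EVENT = "urn:catalog:IOL:POLY:Sphaerosyllis-levantina-ALA-IL-7-Oct.2009"

# Assumed pairing of properties and nodes (the diagram arrows are lost).
STATEMENTS = [
    (EVENT, "O32 has found object", "Sphaero-levantina-003"),
    (EVENT, "P14 carried out by", "Sarah Faulwetter"),
    (EVENT, "O21 has found at", "Haifa Bay Ecosystem Station 1"),
    (EVENT, "P4 has timespan", "7 October 2009"),
    (EVENT, "P125 used object of type", "WA265/SS214"),
    ("WA265/SS214", "P127 has broader term", "Van Veen Grab"),
    ("Israel", "O7 contains or confines", "Haifa Bay Ecosystem Station 1"),
]


def objects(subject, prop):
    """Return every object linked to subject through prop."""
    return [o for s, p, o in STATEMENTS if s == subject and p == prop]


print(objects(EVENT, "O32 has found object"))
print(objects("WA265/SS214", "P127 has broader term"))
```

Read this way, the event ties together who encountered the specimen, where, when and with what kind of equipment, which is the contextual information the earlier slides say is missing from isolated observation records.
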
S5 Inference Making
S5 Inference Making comprises the action of making propositions and statements about particular states of affairs in reality or in possible realities, or categorical descriptions of reality, by using inferences from other statements based on hypotheses and any form of formal or informal logic.
- S6 Data Evaluation: concluding propositions on a respective reality from observational data by making evaluations based on mathematical inference rules and calculations using established hypotheses.
- S8 Categorical Hypothesis Building: assumptions developed by “induction” from finite numbers of observations of particular things; based on inference rules and theory.
- S7 Simulation-Prediction: executing algorithms or software that use mathematical models to simulate reality or possible realities.
[Diagram] Classes shown: E1 CRM Entity, E7 Activity, E13 Attribute Assignment, E29 Design or Procedure, E54 Dimension, E70 Thing, S5 Inference Making, S6 Data Evaluation, S7 Simulation-Prediction, S8 Categorical Hypothesis Building, S15 Observable Entity
Properties shown: P15 was influenced by (influenced), P16 used specific object (was used for), P17 was motivated by (motivated), P33 used specific technique (was used by), O10 assigned dimension (dimension was assigned by), O11 described (was described by)

Applications
- Informed by the IAM model (argumentation)
- EU FP7 - PSP InGeoClouds
- European Space Agency: satellite data
- EU FP7-INFRASTRUCTURES-2012-1 ARIADNE: supermodel for CRMarchaeo
- EU - FP7 - CP & CSA iMarine: informs and complements MarineTLO
- Extended MarineTLO used in LifeWatch Greece, being promoted to LifeWatch

Conclusions
Our aim is:
- to open the discussion in CIDOC about the conceptual modelling of products of human activities;
- to suggest that CIDOC approve the modelling of scientific activities as a valid scope for CIDOC and as a possible working item for the CRM-SIG WG.
Needed (still to be done):
- specializations into analytical methods and reference data sets
Links: http://www.ics.forth.gr/isl/CRMext/CRMsci.rdfs

Thank you !!!