An agile process for the creation of conceptual models from content descriptions
Hans-Werner Sehring
Centre for Sustainable Content Logistics, TuTech Innovation GmbH / Hamburg University of Technology
Joint work with: Sebastian Boßung, Henner Carl, Joachim W. Schmidt
30 September 2007 - An agile modelling process - Hans-Werner Sehring

Outline
1. Conceptual Content Management
2. Asset expressions and schemata
3. The Asset Schema Inference Process
4. Straight-forward schema inference
5. Cluster-based schema inference
6. Process evaluation
7. Summary and outlook
Conceptual Content Management

Conceptual Content Management (CCM):
– an approach to domain modelling
– inspired by epistemology: entities are described by classes and instances, called Assets
– Assets are dual entity descriptions, consisting of content that visualises an entity and a conceptual model that describes it
– model-based system generation

Features:
– modelling is carried out by domain experts
– domain models are open to changes
– existing work is preserved, even when changes are applied
– communication between domain experts with individual models is maintained
CCM dynamics

CCM systems (CCMSs) are dynamically generated from domain models:
– immediately realizing model changes
– preserving existing Assets
– maintaining communication

Key contributions to this end:
– modelling language
– model compiler
– architecture for evolvable systems

Example model definition:

    model Historiography
    from Time import Timestamp
    from Topology import Place
    class Professor {
      content image
      concept characteristic n :String
              relationship publs :Work*
    }

[Figure: model definition and its intermediate model (parse tree) with nodes m:Model, a:AssetClass, b:AssetClass and the edge superClass]
[Figure: modules for the models Political_Iconography (PI), Regents, and Artists: client(Regents), client(Artists), client(PI), mediation(Regents, Artists), mediation(PI, (Regents, Artists)), distribution(PI, Regents), distribution(PI, Artists), DB(Regents), DB(Artists), DB(PI)]
Model-driven development

All software development starts with a conceptual model:
– model-driven development approaches in particular call for models with a sufficient degree of formality
– CCM resembles model-driven development in that software creation is highly automated
– in CCM, software generation is even dynamic

A CCM model is required as the starting point for a CCMS:
– usually, a modelling expert (analyst) is consulted
– because of the dynamics requirement, such a modelling expert cannot be employed in CCM
– domain experts are not modelling experts; they usually have problems with, e.g., sufficient formality
– but: experts can “tell their story” by providing examples
Asset expressions and schemata

In many domains, research starts by regarding instances (samples), not concepts.
Asset model from the example

Manually defined classes for the example:

    model Historiography
    from Time import Timestamp
    from Topology import Place
    class Professor {
      content image
      concept characteristic name :String
              relationship publications :Work*
    }
    class Work {
      content scan
      concept characteristic title :String
              relationship concerns :Professor*
              relationship issued :Issuing
              relationship reviewers :Professor*
    }
    class Issuing {
      concept relationship issued :Place
              relationship issuedBy :Professor
              relationship issuedWhen :Timestamp
    }

– Models consist of classes
– Classes have content handles and attributes (and constraints); attributes are characteristics or relationships
Asset model from the example (cont’d)

Example of personalisation: a domain expert introduces a distinction between kinds of documents:

    model MyHistoriography
    from Historiography import Work, Professor
    class Work {
      concept relationship reviewer unused
    }
    class Dissertation refines Work {
      concept relationship reviewer :Professor*
    }

Import and redefinition of classes for:
– schema evolution (user communities)
– personalisation (single users)
– …
Asset Schema Inference Process (ASIP)

Bootstrapping: CCM itself requires an initial model as a starting point for the open dynamic modelling process.

Required: systematic support for domain experts in finding suitable models.

Start with Asset Expressions:
– content abstractions and applications: assigned names and bound values
– semantic types (concepts): no inner structure

Concepts and classes are not distinguished in CCM models (intensional and extensional definitions).

Free-form entity descriptions are used as samples; later they become instances of classes.

[Figure: Asset expression example with the binding reviewer: Professor]
Agile CCMS development

Agility:
– based on the possibility to generate CCMSs dynamically
– domain experts review their models based on experiences with an operational CCMS
– if changes to the model are required, another iteration of the process is started
– entity descriptions created within the CCMS can be used as samples for the next iteration of the process

[Figure: cycle of the steps Create Asset expressions, Construct schema, Generate CCMS]
ASIP phases

The ASIP has four phases:
– Phase 1: Sample acquisition
– Phase 2: Schema inference
– Phase 3: Feedback questions
– Phase 4: Prototype generation and system generation

Feedback loops:
– answer questions
– unhappy with schema: modify samples (or modify schema)
Two schema inference experiments

Experiments with alternatives for phases 2 and 3:
– (traditional) schema inference plus user feedback: a straight-forward approach starting from singletons
– clustering, supervised by domain experts: a statistical approach, semi-supervised learning

Phase 3 (generation of questions to gather feedback) is determined by the alternative chosen.

The result of phases 1-3 is a CCM model:
– prototype generation and system generation (phase 4) are carried out by the CCM model compiler
– the domain expert can modify the inferred schema (openness and dynamics)
Straight-forward schema inference

Schema construction by traditional schema inference:
1. derive naive classes directly from the set of samples
2. apply simplifications
3. if changes were applied to the schema, repeat step 2

Step 1: for each sample create an Asset class with
– a content handle whose type is determined by the encoding format of the sample’s content
– attributes for all abstractions over the content: characteristics for certain known types, relationships for other types
– no further constraints
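Step 1 could be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the sample representation (a dict with `encoding` and `attributes`), the `KNOWN_TYPES` set, the `HANDLE_BY_ENCODING` mapping, and the generated class names are all assumptions made for this example.

```python
from dataclasses import dataclass, field

# Assumption: value types treated as "certain known types" (become characteristics).
KNOWN_TYPES = {"String", "Integer", "Boolean"}

# Assumption: hypothetical mapping from content encoding to a content handle name.
HANDLE_BY_ENCODING = {"image/jpeg": "image", "application/pdf": "scan"}


@dataclass
class AssetClass:
    name: str
    content: dict = field(default_factory=dict)          # handle name -> encoding
    characteristics: dict = field(default_factory=dict)  # attribute name -> type
    relationships: dict = field(default_factory=dict)    # attribute name -> type


def derive_naive_class(index, sample):
    """Create one naive class for one sample, without further constraints."""
    cls = AssetClass(name=f"Class{index}")
    encoding = sample.get("encoding")
    if encoding is not None:
        # The content handle is determined by the encoding of the sample's content.
        handle = HANDLE_BY_ENCODING.get(encoding, "content")
        cls.content[handle] = encoding
    for attr, type_name in sample["attributes"].items():
        if type_name in KNOWN_TYPES:
            cls.characteristics[attr] = type_name
        else:
            cls.relationships[attr] = type_name
    return cls


def derive_naive_classes(samples):
    """Derive one class per sample (singleton classes)."""
    return [derive_naive_class(i, s) for i, s in enumerate(samples, start=1)]


samples = [
    {"encoding": "image/jpeg",
     "attributes": {"name": "String", "publications": "Work"}},
    {"encoding": "application/pdf",
     "attributes": {"title": "String", "concerns": "Professor"}},
]
classes = derive_naive_classes(samples)
```

Each sample yields its own class here, so the result is deliberately redundant; reducing that redundancy is the job of the simplification step.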
Schema simplification

Step 2: simplifications, applied repeatedly in the specified order:
– identical class: unify classes whose attributes and content handles have identical names and types
– inheritance: introduce a subtype relationship between classes whose sets of attributes are in a subset relationship
– type match: if two classes have attributes and content handles of identical types, prompt the expert for unification
– inheritance orphan: ask the domain expert about removing classes with only a few instances

Note:
– often classes are considered equal if the attributes’ types match
– here the names are considered as well; otherwise feedback is collected
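The first three simplification rules might look like this in Python. It is a sketch under assumptions: a class is modelled as a plain dict from member name (attribute or content handle) to type, the class with the larger member set is taken to be the subtype, and the type-match rule only reports candidate pairs, since the decision belongs to the domain expert. The inheritance-orphan rule is omitted because it needs instance counts.

```python
def signature(members):
    """Names and types of all content handles and attributes of a class."""
    return frozenset(members.items())


def unify_identical(classes):
    """'identical class' rule: merge classes whose members agree in name and type."""
    merged = {}
    for name, members in classes.items():
        merged.setdefault(signature(members), name)  # keep the first class name seen
    return {name: dict(sig) for sig, name in merged.items()}


def infer_inheritance(classes):
    """'inheritance' rule: return (subtype, supertype) pairs.

    Assumption: the class whose members strictly include another
    class's members becomes the subtype.
    """
    pairs = []
    for sub, sub_members in classes.items():
        for sup, sup_members in classes.items():
            if sub != sup and signature(sup_members) < signature(sub_members):
                pairs.append((sub, sup))
    return sorted(pairs)


def type_match_candidates(classes):
    """'type match' rule: pairs with the same member types but different
    names; these are only candidates to present to the domain expert."""
    names = sorted(classes)
    candidates = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            same_types = sorted(classes[a].values()) == sorted(classes[b].values())
            if same_types and signature(classes[a]) != signature(classes[b]):
                candidates.append((a, b))
    return candidates


classes = {
    "Class1": {"image": "jpeg", "name": "String"},
    "Class2": {"image": "jpeg", "name": "String"},
    "Class3": {"image": "jpeg", "name": "String", "publications": "Work"},
    "Class4": {"image": "jpeg", "title": "String"},
}
unified = unify_identical(classes)           # Class2 is merged into Class1
hierarchy = infer_inheritance(unified)       # Class3 becomes a subtype of Class1
questions = type_match_candidates(unified)   # ask the expert about Class1 / Class4
```

In a full process these rules would be re-applied until the schema no longer changes, with the expert's answers to the type-match questions fed back in.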
Cluster-based schema inference

Schema construction by clustering:
– cluster samples, create classes from clusters
– experiment based on the k-means algorithm

Clustering steps:
– classification: assign samples to clusters based on a distance measure d:
  $d(s,c) = \alpha\, d_{\mathrm{sem}}(s,c) + (1-\alpha)\, d_{\mathrm{struct}}(s,c), \quad \alpha \in [0,1]$
– optimisation: recompute the cluster centres
– inheritance hierarchy creation: as in the simple approach
– feedback: visualise the clusters and allow the expert to partition clusters => semi-supervised learning

Less user interaction than in the traditional approach.
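The classification step with the weighted distance can be illustrated in Python. This is a sketch only: the two toy distance functions below are placeholders invented for the example (they are not the structural and semantic measures of the talk), and samples and cluster centres are modelled as hypothetical `(semantic type, attribute names)` pairs.

```python
def combined_distance(d_sem, d_struct, alpha):
    """d = alpha * d_sem + (1 - alpha) * d_struct, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * d_sem + (1.0 - alpha) * d_struct


def assign_to_clusters(samples, centres, d_sem, d_struct, alpha):
    """Classification step: map each sample to its nearest cluster centre."""
    assignment = {}
    for s in samples:
        assignment[s] = min(
            centres,
            key=lambda c: combined_distance(d_sem(s, c), d_struct(s, c), alpha),
        )
    return assignment


# Placeholder distances (assumptions for illustration only).
def toy_d_sem(s, c):
    return 0.0 if s[0] == c[0] else 1.0   # same semantic type label?


def toy_d_struct(s, c):
    return float(len(s[1] ^ c[1]))        # attributes not shared by both


s1 = ("Person", frozenset({"name", "publications"}))
s2 = ("Document", frozenset({"title", "concerns"}))
c1 = ("Person", frozenset({"name"}))
c2 = ("Document", frozenset({"title"}))

assignment = assign_to_clusters([s1, s2], [c1, c2], toy_d_sem, toy_d_struct, alpha=0.5)
```

With `alpha = 1` only the semantic distance counts and with `alpha = 0` only the structural one, so `alpha` is the knob that balances the two views of similarity.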
Structural distance measure

d_struct is based on the length of the shortest edit script (similar to string matching).

Costs like:

edit operation                             cost magnitude
add attribute                              low
remove attribute                           high
change attribute name                      low
broaden attribute type                     medium
narrow attribute type                      very low
increase cardinality of attribute value    medium
decrease cardinality of attribute value    very low
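To make the cost table concrete, the following Python sketch prices edit scripts and picks the cheapest one. The numeric values assigned to "very low", "low", "medium", and "high" are assumptions chosen for illustration; the talk gives only magnitudes. Enumerating the candidate edit scripts is left out, so `d_struct` here takes them as input.

```python
# Assumption: hypothetical numeric values for the cost magnitudes.
VERY_LOW, LOW, MEDIUM, HIGH = 0.125, 0.25, 0.5, 1.0

COST = {
    "add attribute": LOW,
    "remove attribute": HIGH,
    "change attribute name": LOW,
    "broaden attribute type": MEDIUM,
    "narrow attribute type": VERY_LOW,
    "increase cardinality of attribute value": MEDIUM,
    "decrease cardinality of attribute value": VERY_LOW,
}


def script_cost(script):
    """Total cost of an edit script given as a list of operation names."""
    return sum(COST[op] for op in script)


def d_struct(candidate_scripts):
    """Structural distance: cost of the cheapest candidate edit script."""
    return min(script_cost(script) for script in candidate_scripts)


# Two ways to turn an attribute 'n' into an attribute 'name':
rename = ["change attribute name"]
replace = ["remove attribute", "add attribute"]
distance = d_struct([rename, replace])
```

Because removal is expensive and renaming is cheap, the rename script wins here, which matches the intent of the table: losing information is penalised more heavily than adding or relabelling it.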
Semantic distance measure

d_sem is determined by the shortest paths in the class hierarchy:

$$
d_{\mathrm{sem}}(s,c) =
\begin{cases}
1/2\; h(T_1) & \text{if } T_1 \text{ is a direct supertype of } T_C \\
d_{\mathrm{sem}}(T_1,T_m) + d_{\mathrm{sem}}(T_m,T_C) & \text{if } T_1 \text{ is a direct supertype of } T_m \text{ and } T_m \text{ is a supertype of } T_C \\
d_{\mathrm{sem}}(T_S,T_1) + d_{\mathrm{sem}}(T_S,T_C) & \text{if } T_S \text{ is the most specific common supertype of } T_1 \text{ and } T_C
\end{cases}
$$
Process evaluation

Schema quality:
– generally difficult to judge
– for domain modelling: the goal is not the schema that best describes the samples, but the model that best represents the application domain

Criteria [Cherfi, Akoka, Comyn-Wattiau]:
– specification: graphical legibility, simplicity, expressiveness, syntactical correctness, semantic correctness
– usage: completeness, understandability
– implementation: implementability, maintainability
Process evaluation (cont’d)

Selected parameters:
– simplicity: in general depends on the given sample set and on the domain expert’s answers in the feedback phase
– syntactical correctness: granted by model generation
– semantic correctness: can be negatively impacted by structurally coinciding classes with different meanings
– understandability: generated class names can be an obstacle, but the generated system lowers the impact of the schema
– implementability: by generation
– maintainability: through dynamics
Summary and outlook

Summary:
– Conceptual Content Management allows domain experts to provide and individually change domain models
– domain experts are usually not modelling experts, and they prefer to start with samples describing observations
– a process helps domain experts define initial models to start the open dynamic CCM activity
– as one novel approach, a cluster-based schema inference process has been investigated

Outlook: future work will include …
– the inclusion of the cluster-based approach into the open modelling for extensional concept definitions
– the employment of reasoning techniques (induction, abduction) to guide the schema construction process