Native XML Database for Information Systems
Chris Wallace, SMRG Seminar, Feb 2006



2 Exploring the design space
“design as a conversation with the materials in the situation” (Schön)
Native XML database (NXD)
– Storing, querying and updating XML documents without mapping them into relations
– Schema-free
– Trees are to an NXD what tables are to an RDBMS
– Tables are trees
Information Systems
– Focus on semi-structured data (a mixture of simple data items, text and complex nested structures)
– Searching, derived data, visualisation
– Process support
– A large problem space, variously supported by spreadsheets, Word documents, ad-hoc databases and, increasingly, web-integrated data

3 eXist Native XML Database
Open source, Java; a European team of developers led by Wolfgang Meier
Documents (files) are organised in collections (folders) in a file store
– XML documents are stored in an efficient B+ tree structure with indexes
– Non-XML resources (XQuery, CSS, JPEG, etc.) can be stored as binary
Deployable in different ways
– Embedded in a Java application
– As part of a Cocoon pipeline
– As a web application in Apache/Tomcat
– With an embedded Jetty HTTP server (as on stocks)
Multiple interfaces
– REST (to a Java servlet)
– SOAP
– XML-RPC

4 NXD case studies
FOLD
– Modules, programmes, scheme operations, staff, organisational structures, events
Family photos and history
– Integration of metadata on family photos with family history (births, deaths and marriages)
ISD3 Assignment
– A web-based calculator, e.g. a currency converter

5 Research Work
Development of the FOLD (Faculty OnLine Data), a pilot project for UWE
Teaching students and staff the XML languages (XML Schema, XSLT, XQuery) and NXD design
Links with other eXist projects
SPA2006 workshop on NXD
XML Prague (eXist)

6 Research Areas
Design practice for NXD
– A ‘pattern language’ to help map from the conceptual model to multiple XML schemas
– Identifier design
– Structuring documents by responsibility and versions
NXD in organisational use
– Social effects of distributed responsibility
– Visualisation of complex relationships
– Handling integrity problems: accept inconsistency as a way of life
– Management of veracity

7 The FOLD: Faculty OnLine Data
Technologies
– eXist
– Java (not yet)
– XQuery
– XSLT
– CSS
– PHP (to be eliminated)

8 The FOLD (2)
Scope
– Module and Programme specifications
– Modular Scheme operations (runs)
– Staff
– Organisational structure
– Events
Functionality
– Highly linked
– (Integrating UWE sources)
– (Personalised interface)

9 The FOLD

10 FOLD Design Issues
– Conceptual modelling
– Conceptual, logical and physical mapping
– Identifiers
– Relationships and links
– Versioning
– Editing
– Views
– Responsibilities
– Processes

11 Mapping from the conceptual model to the logical and physical layers
What criteria should be used to break the whole model up into:
– Logical: Entity, a logical compound structure
– Physical: Documents, a physical aggregation of entity instances; Collections, a physical aggregation of documents
Examples
– Module Specification [moduleCode]: Module Spec is an Entity; each Module Spec is a Document
– Module Run [moduleCode/year/runNo]: Module Run is an Entity; the set of Module Runs for a Field is a Document
Issues
– Where should schemas be developed?
– No logical data in the physical structure: it exists purely for convenience
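As an illustration of the Module Run example, here is a minimal sketch of grouping Module Run entities into one physical document per Field. It is written in plain XQuery 3.1 using `group by`. Several details are hypothetical: the `field` attribute, the attribute-based layout and the module code `MODULE-A`. The sketch builds the documents in memory rather than storing them in an eXist collection.

```xquery
xquery version "3.1";

(: Hypothetical Module Run entities. The attribute layout, the 'field'
   attribute and the code MODULE-A are assumptions for this sketch. :)
declare variable $runs := (
  <ModuleRun field="IS" moduleCode="UFIEKG-20-3" year="2005" runNo="1"/>,
  <ModuleRun field="IS" moduleCode="UFIEKG-20-3" year="2005" runNo="2"/>,
  <ModuleRun field="XX" moduleCode="MODULE-A" year="2005" runNo="1"/>
);

(: The entity key moduleCode/year/runNo, built from the run's attributes. :)
declare function local:run-key($run as element(ModuleRun)) as xs:string {
  string-join(($run/@moduleCode, $run/@year, $run/@runNo), "/")
};

(: One physical document per Field. The grouping is a storage decision:
   the wrapper only records which Field the document belongs to. :)
for $run in $runs
let $field := string($run/@field)
group by $field
order by $field
return document {
  <ModuleRuns field="{$field}">{
    for $r in $run
    return <ModuleRun key="{local:run-key($r)}"/>
  }</ModuleRuns>
}
```

Because the key is computed from the entity itself, moving runs between documents (for example, regrouping by year) changes only the physical layout, not the identifiers.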

12 Conceptual Modelling
Conventional normalised data model
Generality issue, e.g. Module Run
– Roles as attributes: Stewart Green
– Roles as entities: Module Leader, Stewart Green
– Entities enable metadata, but defeat the use of tables for data entry: views are needed
Attributes vs elements
– A conceptual/logical mapping issue: … UFIEKG-20-3..
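The slide's own XML examples were lost from the transcript, so the element names below are hypothetical. The sketch contrasts the two encodings of a role. It also shows a view that flattens roles-as-entities back into the one-column-per-role shape that table-based data entry needs. The `from` attribute stands in for role metadata that only the entity form can carry.

```xquery
xquery version "3.1";

(: Role as an attribute: the role is implied by the element name. :)
declare variable $as-attribute :=
  <ModuleRun><ModuleLeader>Stewart Green</ModuleLeader></ModuleRun>;

(: Role as an entity: the role is itself data and can carry metadata
   (the 'from' attribute is a hypothetical example). :)
declare variable $as-entity :=
  <ModuleRun>
    <Role from="2005-09-01">
      <RoleName>Module Leader</RoleName>
      <Person>Stewart Green</Person>
    </Role>
  </ModuleRun>;

(: View: one child element per role, named after the role with spaces
   removed, e.g. "Module Leader" becomes ModuleLeader. :)
declare function local:flatten-roles($run as element(ModuleRun)) as element(ModuleRun) {
  <ModuleRun>{
    for $role in $run/Role
    return element { replace(string($role/RoleName), "\s+", "") } { string($role/Person) }
  }</ModuleRun>
};

local:flatten-roles($as-entity)
```

The view discards the role metadata, which is the trade-off the slide points to: entities are more general, but tables need the flat form.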

13 Conceptual Modelling Tools
The UML class model is the closest to a suitable conceptual model
– Allows multi-valued attributes
– Distinguishes relationship kinds
  – Composition
  – Bi-directional associations
  – Uni-directional associations (for multiplicity resolution)
– QSEE/Rose
  – No identifiers (primary keys)?
  – No indication of mapping to attributes or elements
  – No mapping into Entities
  – No mapping into Documents and Collections

14 Identifiers
Principle adopted: use naturally occurring identifiers wherever possible
– Persons: “Ian Beeson”
– Rooms: “3P14”
Plus
– Reduces the gap between the real-world (RW) domain and the system
– Names in minutes of meetings and on spreadsheets are readable
Minus
– Duplicates: duplicates are not tolerable in the RW either, and are resolved through RW negotiation within a RW namespace; e.g. Faculty mergers generate duplicates
– Aliases
– Not all entities have unique identifiers: for Programmes, the ISIS Primary Award and UCAS codes are candidates but don't work?
– All names need a namespace: “Ian Beeson” at CEMS at UWE
– Need to replace multiple naming conventions with a single naming scheme (e.g. initials)
– URNs and the semantic web

15 Alias handling
Problem handling aliases in staff data
– Currently a person can have multiple names; the first is the prime name
– Better is a separate alias table:
  – Look up the base table
  – If not found, try the alias table
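The two-step lookup on this slide can be sketched as one XQuery function. The sketch assumes a few hypothetical details: the staff and alias structures, the element and attribute names, and the alias "IB". In-memory variables stand in for documents that would be read with `collection()` in the database.

```xquery
xquery version "3.1";

(: Hypothetical base table: one entry per person, keyed by prime name. :)
declare variable $staff :=
  <staff>
    <person><name>Ian Beeson</name></person>
  </staff>;

(: Hypothetical alias table: each alias points at a prime name. :)
declare variable $aliases :=
  <aliases>
    <alias name="IB" prime="Ian Beeson"/>
  </aliases>;

(: Look up the base table first; if nothing matches, map the name through
   the alias table and look up the prime name. Returns the empty sequence
   when neither lookup succeeds. :)
declare function local:find-person($name as xs:string) as element(person)* {
  let $direct := $staff/person[name = $name]
  return
    if (exists($direct)) then $direct
    else
      let $prime := $aliases/alias[@name = $name]/@prime
      return $staff/person[name = $prime]
};

local:find-person("IB")/name/string()
```

Keeping aliases in their own table means the base table holds exactly one name per person, so the prime name stays unambiguous.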

16 Relationships and Links
Relationships need to be implemented
– One-to-many
  – RDBMS: the primary key on the One side becomes a foreign key on the Many side
  – NXD: choose which side on the basis of complexity and responsibility, e.g. a sequence (modules in a stage) or a complex structure (a pre-requisite expression)
– Many-to-many
  – RDBMS: an intersection table
  – NXD: as for one-to-many, or either side as appropriate, e.g. groups and subgroups
Issues
– Referential integrity
  – RDBMS: ‘eager’; data is not allowed in unless its links are OK, and links are maintained through updates; integrity failures are transient, and repair happens outside the database
  – NXD: ‘lazy’; store the data and provide on-demand or on-trigger validation; integrity failures can be persisted (XLinkit), and repair happens inside the database
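To illustrate the ‘lazy’ approach, here is a minimal sketch of an on-demand integrity check that reports broken links instead of rejecting the data. The stage structure, the `ref` attribute and the code `NOSUCH-MOD` are hypothetical. In the database, the two variables would be documents in a collection.

```xquery
xquery version "3.1";

(: Hypothetical module specifications, keyed by ModuleCode. :)
declare variable $modules :=
  <modules>
    <moduleSpecification><ModuleCode>UFIEKG-20-3</ModuleCode></moduleSpecification>
  </modules>;

(: A hypothetical programme stage referring to modules by code. It has
   been stored even though one reference points at no module. :)
declare variable $stage :=
  <stage>
    <module ref="UFIEKG-20-3"/>
    <module ref="NOSUCH-MOD"/>
  </stage>;

(: On-demand validation: return one brokenLink element per reference
   with no matching module specification. :)
declare function local:dangling-refs($stage as element(stage)) as element(brokenLink)* {
  for $ref in $stage/module/@ref
  where empty($modules/moduleSpecification[ModuleCode = $ref])
  return <brokenLink ref="{$ref}"/>
};

local:dangling-refs($stage)
```

Because the result is itself XML, it could be stored alongside the data as a persisted record of the integrity failure, in the spirit of the slide's last point.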

17 Versioning
Based on the yearly cycle
– Base year set in the user's session
– Default set in the system configuration
Two different approaches
– Module Run, Coursework Elements, etc.: explicit version identifier
  – ModuleCode/Year/RunNo
  – Selection is explicit: [Year = $year]
– Module Specification, Programme Structure: implicit version, defined by a sequence of versions
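Explicit versioning reduces selection to an equality predicate on the key. The sketch below shows this under a few assumptions: the record layout is hypothetical, and the session year and configured default are passed in as arguments rather than read from a real session.

```xquery
xquery version "3.1";

(: Hypothetical Module Runs, each carrying the explicit version key
   ModuleCode/Year/RunNo. :)
declare variable $runs :=
  <runs>
    <ModuleRun><ModuleCode>UFIEKG-20-3</ModuleCode><Year>2004</Year><RunNo>1</RunNo></ModuleRun>
    <ModuleRun><ModuleCode>UFIEKG-20-3</ModuleCode><Year>2005</Year><RunNo>1</RunNo></ModuleRun>
    <ModuleRun><ModuleCode>UFIEKG-20-3</ModuleCode><Year>2005</Year><RunNo>2</RunNo></ModuleRun>
  </runs>;

(: Base year: the session value if present, else the configured default. :)
declare function local:base-year($session-year as xs:string?, $default-year as xs:string) as xs:string {
  ($session-year, $default-year)[1]
};

(: The year is part of the key, so selection is a plain [Year = $year]. :)
declare function local:runs-for-year($moduleCode as xs:string, $year as xs:string) as element(ModuleRun)* {
  $runs/ModuleRun[ModuleCode = $moduleCode][Year = $year]
};

count(local:runs-for-year("UFIEKG-20-3", local:base-year((), "2005")))
```

This is the contrast with implicit versioning. There, the right version is not found by equality but by picking the latest version not after the requested year.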

18 Implicit Versioning
[Figure: versions exist for 2002, 2005 and 2007. For Year = 2006 the latest applicable version is 2005; for Year = 2004 it is 2002.]

19 Implicit Versioning
let $specPath := "/db/versionTest",
    $currentYear := "2005",
    $moduleCode := request:request-parameter("moduleCode", ""),
    $year := request:request-parameter("year", $currentYear),
    (: get the set of possible versions for this module :)
    $modspecs := collection($specPath)/moduleSpecification
                   [ModuleCode = $moduleCode]
                   [Version <= $year],
    (: select the version with the highest version number :)
    $modspec := $modspecs[Version = max($modspecs/Version)]
return $modspec

20 Editing
Table-structured document editing
– Allows maintenance using familiar spreadsheet tools (Excel 2003)
– The schema is induced by Excel
– Accommodations
  – Multi-valued fields as concatenated values, handled with the XPath string-join() and tokenize() functions; this suffers from the embedded-separator problem (a name with ‘,’ as a legitimate character) and defeats indexing
  – Optional elements increase table width
  – Formatting choices are not maintained (e.g. Freeze-Window)
Structured document editing
– Allows maintenance with Word without a schema, with difficulty (no schema awareness)
– Use InfoPath to create a desktop form based on the schema; the form must be redone if the schema changes
In-situ updates
– With XQuery-generated forms and update
– With XForms
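The concatenated-values accommodation and its embedded-separator problem can be shown with the standard `string-join()` and `tokenize()` functions. The names and the choice of a comma separator are assumptions for the sketch.

```xquery
xquery version "3.1";

(: A multi-valued field: hypothetical names, one of which contains the
   separator character as a legitimate part of the value. :)
declare variable $names := ("Ian Beeson", "Green, Stewart");

(: Flatten the values into one spreadsheet cell. :)
declare function local:to-cell($values as xs:string*) as xs:string {
  string-join($values, ",")
};

(: Split a cell back into values, trimming surrounding whitespace. :)
declare function local:from-cell($cell as xs:string) as xs:string* {
  for $v in tokenize($cell, ",") return normalize-space($v)
};

(: The round trip yields three values, not two: the comma inside
   "Green, Stewart" is indistinguishable from the separator. :)
count(local:from-cell(local:to-cell($names)))
```

Any single-character separator has this weakness. Choosing a rarer character only makes the failure less likely, and the packed cell still cannot be indexed value by value.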

21 Views
Views arise from the need for de-normalisation, e.g. a Coursework Element:
– As a simple element
  – Key: moduleCode/Year/runNo/elementNo
  – Data: due date
– As a derived complex element
  – SuggestedHours (computed from the Hours table)
  – Late date (computed from the UWE calendar)
  – Weightings (extracted from the relevant specification)
  – Module Leader (extracted from the Module Run)
Views as transient or materialised
– View definition
– View maintenance


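23
(: Derived view of a coursework element (see slide 21), assembled from the
   module specification, the module run and the element run.
   Assumes the fold prefix is bound to the FOLD function module's namespace.
   The element names in the return clause were lost from the transcript;
   the wrapper names below are reconstructions, not the original ones. :)
declare function fold:courseworkElement($moduleCode, $year, $runNo, $elementNo) {
  let $mod := fold:moduleSpecification($moduleCode, $year),
      $run := fold:moduleRun($moduleCode, $year, $runNo),
      $elementRun := fold:elementRun($moduleCode, $year, $runNo, 'B', $elementNo),
      $elementSpec := $mod/Assessment/FirstAttempt/Components/ComponentB/Element[position() = $elementNo],
      $dueDate := $elementRun/DueDate,
      (: target return date: 20 working days after the due date :)
      $returnDate := fold:workingDays($dueDate, 20),
      $componentWeight := $mod/Assessment/Weighting/ComponentWeightB,
      $weightInComponent := data($elementSpec/Weight),
      $weightInModule := round($weightInComponent * $componentWeight div 100),
      $load := fold:load($mod/Level),
      (: suggested hours, pro rata to the module's credit rating and the element's weight :)
      $hrs := round(data($mod/UWERating) div data($load/Credits)
                    * $weightInModule div 100 * data($load/Hours))
  return
    <CourseworkElement>
      <ModuleCode>{$moduleCode}</ModuleCode>
      {$mod/Title}
      <RunNo>{$runNo}</RunNo>
      {$run/ModuleLeader}
      {$run/InternalModerator}
      {$run/ExternalExaminer}
      <ElementId>CW{$elementNo}</ElementId>
      {$elementSpec/Description}
      <SuggestedHours>{$hrs}</SuggestedHours>
      <WeightInComponent>{$weightInComponent}</WeightInComponent>
      <WeightInModule>{$weightInModule}</WeightInModule>
      <DueDate>{data($dueDate)}</DueDate>
      <ReturnDate>{data($returnDate)}</ReturnDate>
    </CourseworkElement>
};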

24 Process support
Short term: process support
– Form generation
– Linkage to process documentation
Medium term: process monitoring
– Online capture of significant dates
  – Coursework hand-in date
  – Date exam sent to moderator
  – Date coursework returned to students
– Derived information
  – Workload prediction based on the coursework schedule and student numbers
  – Display of the latest coursework returned, and SMS messages to students
Long term: process management
– Workflow
– Process enactment software

25 Short-term
Session-based logins to personalise the interface and specify parameters (currentYear)
Form generation as passive documents
– Update through the form is an obvious extension
Extend operational data with date-based status
– Date returned to students: if set, the work has been returned
  – The date is used to generate a page of recently returned coursework
  – The date is used to monitor conformance to the target return date(!)
Link forms to textual/graphical process descriptions
– Coursework from setting to field board
– How to specialise a generic description? By level, by module, by field
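The two uses of the date-returned status could be sketched as two queries. The first lists recently returned coursework; the second lists work returned after its target date. Several details are hypothetical: the record layout, the ids, and the stored `TargetReturnDate`, which stands in for a computed target such as the 20-working-day return date used in slide 23. Only standard XQuery date arithmetic is used.

```xquery
xquery version "3.1";

(: Hypothetical coursework records. DateReturned is absent until the
   work has been returned. :)
declare variable $coursework :=
  <courseworks>
    <coursework id="CW1"><TargetReturnDate>2006-02-10</TargetReturnDate><DateReturned>2006-02-08</DateReturned></coursework>
    <coursework id="CW2"><TargetReturnDate>2006-02-10</TargetReturnDate><DateReturned>2006-02-14</DateReturned></coursework>
    <coursework id="CW3"><TargetReturnDate>2006-02-20</TargetReturnDate></coursework>
  </courseworks>;

(: Work returned within the $days days up to and including $today. :)
declare function local:recently-returned($today as xs:date, $days as xs:integer) as element(coursework)* {
  $coursework/coursework[DateReturned]
    [xs:date(DateReturned) ge $today - xs:dayTimeDuration(concat("P", $days, "D"))
     and xs:date(DateReturned) le $today]
};

(: Conformance check: work returned later than its target date. :)
declare function local:late-returns() as element(coursework)* {
  $coursework/coursework[DateReturned][xs:date(DateReturned) gt xs:date(TargetReturnDate)]
};

<report>
  <recent>{ string-join(local:recently-returned(xs:date("2006-02-15"), 7)/@id, " ") }</recent>
  <late>{ string-join(local:late-returns()/@id, " ") }</late>
</report>
```

Unreturned work (CW3) is simply absent from both lists. A separate query comparing the current date with `TargetReturnDate` would be needed to flag work that is overdue but not yet returned.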

26 Responsibilities
Responsibility allocation
– An admin/architect decision
– Physical-level design for responsibility
  – All Module Runs in a Field in one document
  – Modules and Programme Structures in Field collections (within Year)
– Group access rights, e.g. for the IS Field, ISAdmin:
  – Anne Moggridge
  – Peter Rawlings
  – Lilly Cooke
  – Tracey Davis
Need for check-in/check-out of documents
– WebDAV (Web Folders)

27 Conclusion
Slide from prototype to production
Pluses and minuses of user enthusiasm
Go for the ‘low-hanging fruit’
Pay attention to the learning process
– XQuery and XSLT are non-trivial languages because they are deeply unlike Java/PHP
Reflection forced by presentations and workshops

