Native XML Database for Information Systems Chris Wallace SMRG Seminar Feb 2006.

Exploring the design space
“design as a conversation with the materials in the situation” (Schön)
Native XML database (NXD)
  – Storing, querying and updating XML documents without mapping into relations
  – Schema-free
  – Trees are to NXD what tables are to RDBMS
  – Tables are trees
Information Systems
  – Focus on semi-structured data (mixture of simple data items, text and complex nested structures)
  – Searching, derived data, visualisation
  – Process support
  – Large problem space variously supported by spreadsheets, Word documents, ad-hoc databases, increasingly web-integrated data
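The point that "tables are trees" can be illustrated with a small sketch. It is written in Python so that it runs without an XML database; the element names (`staff`, `row`, `name`, `room`) and the pairing of a person with a room are assumptions made for illustration, not data from the FOLD.

```python
import xml.etree.ElementTree as ET

def table_to_tree(table_name, columns, rows):
    """Represent a relational table as a three-level tree: table -> row -> column."""
    root = ET.Element(table_name)
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            ET.SubElement(row_el, col).text = str(value)
    return root

# Illustrative data only: one row with two columns.
staff = table_to_tree("staff", ["name", "room"], [("Ian Beeson", "3P14")])
xml_text = ET.tostring(staff, encoding="unicode")
print(xml_text)
```

Every table fits this fixed-depth shape, whereas a tree can also nest deeper or mix text with structure, which is where the semi-structured data mentioned above comes in.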

eXist Native XML Database
Open source, written in Java
European team of developers led by Wolfgang Meier
Documents (files) are organised in collections (folders) in a file store
  – XML documents stored in an efficient B+ tree structure with indexes
  – Non-XML resources (XQuery, CSS, JPEG, etc.) can be stored as binary
Deployable in different ways
  – Embedded in a Java application
  – Part of a Cocoon pipeline
  – As a web application in Apache/Tomcat
  – With embedded Jetty HTTP server (as on stocks)
Multiple interfaces
  – REST, to a Java servlet
  – SOAP
  – XML-RPC

NXD case studies
FOLD
  – modules, programmes, scheme operations, staff, organisational structures, events
Family photos and history
  – Integration of meta-data on family photos with family history (births, deaths and marriages)
ISD3 Assignment
  – a web-based calculator
  – e.g. a currency converter

Research Work
Development of the FOLD (Faculty OnLine Data), a pilot project for UWE
Teaching students and staff in XML languages (XML Schema, XSLT, XQuery) and NXD database design
Links with other eXist projects
SPA2006 Workshop on NXD
XML Prague (eXist)

Research Areas
Design practice for NXD
  – ‘Pattern language’ to help map from conceptual model to multiple XML schemas
  – Identifier design
  – Structuring documents by responsibility and versions
NXD in organisational use
  – Social effects of distributed responsibility
  – Visualisation of complex relationships
  – Handling integrity problems: accept inconsistency as a way of life
  – Management of veracity

The FOLD: Faculty OnLine Data
Technologies
  – eXist
  – (Java): not yet
  – XQuery
  – XSLT
  – CSS
  – PHP: to be eliminated

The FOLD (2)
Scope
  – Module and Programme specifications
  – Modular Schema operations (runs)
  – Staff
  – Organisational structure
  – Events
Functionality
  – Highly linked
  – (Integrating UWE sources)
  – (Personalized Interface)

The FOLD

FOLD Design Issues
Conceptual Modelling
Conceptual – Logical – Physical mapping
Identifiers
Relationships and links
Versioning
Editing
Views
Responsibilities
Processes

Mapping from the Conceptual model to the Logical and Physical layers
What criteria to use in breaking up the whole model into
  – Logical
      Entity: a logical compound structure
  – Physical
      Documents: a physical aggregation of entity instances
      Collections: a physical aggregation of documents
Examples
  – Module Specification [moduleCode]
      Module Spec is an Entity
      Each Module Spec is a Document
  – Module Run [moduleCode/year/runNo]
      Module Run is an Entity
      Set of Module Runs for a Field is a Document
Issues
  – Where to develop Schemas?
  – No logical data in the physical: purely for convenience

Conceptual Modelling
Conventional normalised data model
Generality issue, e.g. Module run
  – Roles as Attributes
      Stewart Green
  – Roles as Entities
      Module Leader Stewart Green
  – Entities enable meta data, but defeat use of tables for data entry
      Need views
Attributes v elements
  – a Conceptual/logical mapping issue
  – …
  – UFIEKG

Conceptual Modelling Tools
UML class model closest to suitable conceptual model
  – Allows multi-valued attributes
  – Distinguished relationship kinds
      Composition
      Bi-directional associations
      Uni-directional associations (for multiplicity resolution)
  – QSEE/Rose
No identifiers (primary keys) ??
No indication of mapping to attributes or elements
No mapping into Entities
No mapping into Documents and Collections

Identifiers
Principle adopted: use naturally occurring identifiers wherever possible
  – Persons: “Ian Beeson”
  – Rooms: “3P14”
Plus
  – Reduces gap between the real-world (RW) domain and the system
  – Names in minutes of meetings and on spreadsheets are readable
Minus
  – Duplicates
      Duplicates not tolerable in the RW either, resolved through RW negotiation within a RW namespace, e.g. the Faculty
      Mergers generate duplicates
  – Aliases
  – Not all entities have unique identifiers
      Programmes: ISIS Primary Award and UCAS are candidates but don’t work ?
  – All names need a namespace: “Ian Beeson” at CEMS at UWE
  – Need to replace multiple naming conventions with a single naming scheme (e.g. initials)
  – URNs and the semantic web

Alias handling
  – Problem handling aliases in staff data
      Currently a person can have multiple names; the first is the prime
      Better is a separate alias table
        – Look up the base table
        – If not found, try the alias table
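The two-step lookup proposed for aliases can be sketched as follows. This is a minimal Python illustration, not the FOLD implementation: the `STAFF` and `ALIASES` dictionaries stand in for the base and alias tables, and the alias strings and room value are invented for the example.

```python
# Hypothetical stand-ins for the base table and the separate alias table.
STAFF = {"Ian Beeson": {"room": "3P14"}}
ALIASES = {"I. Beeson": "Ian Beeson", "IB": "Ian Beeson"}

def find_staff(name):
    """Look up the base table first; if the name is not found, try the alias table."""
    if name in STAFF:
        return STAFF[name]
    prime = ALIASES.get(name)  # map an alias to the prime name, if one is registered
    return STAFF.get(prime) if prime is not None else None

print(find_staff("Ian Beeson"))
print(find_staff("IB"))
print(find_staff("Nobody"))
```

Keeping aliases in their own table means the base record holds exactly one prime name, so the "first name is the prime" convention is no longer needed.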

Relationships and Links
Relationships need to be implemented
  – One-Many
      RDBMS: primary key on the One side becomes foreign key on the Many side
      NXD: choose which side on the basis of complexity and responsibility
        – Sequence (modules in a stage)
        – Complex (pre-requisite expression)
  – Many-Many
      RDBMS: intersection table
      NXD: as for one-many, or either side as appropriate
        – Groups and subgroups
Issues
  – Referential integrity
      RDBMS: ‘eager’; data not allowed in unless links OK, links maintained through updates
        – integrity failures transient, repair outside database
      NXD: ‘lazy’; store the data and provide on-demand or on-trigger validation
        – Integrity failures can be persisted (XLinkit) and repair is inside database
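The ‘lazy’ approach to referential integrity can be sketched as an on-demand check that reports dangling links instead of rejecting the data. The Python below is an illustration under assumptions: the document shapes, the element names and the second module code are invented, and a real NXD would run an equivalent query inside the database.

```python
import xml.etree.ElementTree as ET

# Illustrative documents; both are accepted into the store without link checks.
STAFF_XML = """<staff>
  <person><name>Ian Beeson</name></person>
  <person><name>Stewart Green</name></person>
</staff>"""

RUNS_XML = """<moduleRuns>
  <moduleRun><moduleCode>UFIEKG</moduleCode><ModuleLeader>Stewart Green</ModuleLeader></moduleRun>
  <moduleRun><moduleCode>UFXXXX</moduleCode><ModuleLeader>A. N. Other</ModuleLeader></moduleRun>
</moduleRuns>"""

def dangling_leaders(runs_xml, staff_xml):
    """Return (moduleCode, leader) pairs whose leader is not a known staff name."""
    known = {n.text for n in ET.fromstring(staff_xml).iter("name")}
    failures = []
    for run in ET.fromstring(runs_xml).iter("moduleRun"):
        leader = run.findtext("ModuleLeader")
        if leader not in known:
            failures.append((run.findtext("moduleCode"), leader))
    return failures

print(dangling_leaders(RUNS_XML, STAFF_XML))
```

The list of failures is itself data, so it could be stored and worked through later, which is the sense in which integrity failures are persisted and repaired inside the database.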

Versioning
Based on yearly cycle
  – Base Year set in user’s session
  – Default set in system config
Two different approaches
  – Module Run, Coursework Elements, ..
      Explicit version identifier
        – ModuleCode/Year/RunNo
        – Selection is explicit [Year = $year]
  – Module Specification, Programme Structure
      Implicit version defined by sequence of versions

Implicit Versioning
[Figure: sequence of versions. For Year=2006 the latest version is 2005; for Year=2004 the latest version is 2002.]

Implicit Versioning
    let $specPath := "/db/versionTest",
        $currentYear := "2005",
        $moduleCode := request:request-parameter("moduleCode", ""),
        $year := request:request-parameter("year", $currentYear),
        (: get the set of possible versions for this module :)
        $modspecs := collection($specPath)/moduleSpecification
                         [ModuleCode = $moduleCode]
                         [Version <= $year],
        (: select the version with the highest version number :)
        $modspec := $modspecs[Version = max($modspecs/Version)]
    return $modspec
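The same selection rule (take the latest version not later than the requested year) can be tried outside eXist. The sketch below is a Python restatement using the element names from the query; the sample document, its wrapper element `specs` and the version years are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# Illustrative store holding two versions of one module specification.
SPECS_XML = """<specs>
  <moduleSpecification><ModuleCode>UFIEKG</ModuleCode><Version>2002</Version></moduleSpecification>
  <moduleSpecification><ModuleCode>UFIEKG</ModuleCode><Version>2005</Version></moduleSpecification>
</specs>"""

def select_version(specs_xml, module_code, year):
    """Return the specification with the highest Version that is <= year, or None."""
    root = ET.fromstring(specs_xml)
    candidates = [
        spec for spec in root.iter("moduleSpecification")
        if spec.findtext("ModuleCode") == module_code
        and int(spec.findtext("Version")) <= year
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda spec: int(spec.findtext("Version")))

print(select_version(SPECS_XML, "UFIEKG", 2006).findtext("Version"))
print(select_version(SPECS_XML, "UFIEKG", 2004).findtext("Version"))
```

A year earlier than every stored version yields no specification, a case the XQuery handles by returning the empty sequence.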

Editing
Table-structured document editing
  – Allows maintenance using familiar spreadsheet tools (Excel 2003)
  – Schema is induced by Excel
  – Accommodations
      Multi-valued fields as concatenated values
        – XPath string-join and tokenize functions
        – Embedded separator problem (a name with ‘,’ as a legitimate character)
        – Defeats indexing
      Optional elements increase table width
      Formatting choices not maintained (e.g. Freeze-Window)
Structured document editing
  – Allows maintenance with Word without a schema
      With difficulty
        – no schema awareness
  – Use InfoPath to create desktop form based on schema
      Need to redo if schema changes
In-situ updates
  – With XQuery-generated forms and update
  – With XForms
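The accommodation of storing a multi-valued field as one concatenated cell, and the embedded separator problem it causes, can be shown in a few lines. This Python sketch uses `str.join` and `str.split` in place of the XPath functions; the names are illustrative values only.

```python
def join_values(values, sep=","):
    """Concatenate a multi-valued field into a single spreadsheet cell."""
    return sep.join(values)

def split_values(field, sep=","):
    """Recover the individual values from a concatenated cell."""
    return [v.strip() for v in field.split(sep)]

# Values without the separator survive the round trip.
leaders = ["Stewart Green", "Ian Beeson"]
round_trip = split_values(join_values(leaders))

# A value that legitimately contains ',' is split into two on the way back.
tricky = ["Green, Stewart", "Ian Beeson"]
broken = split_values(join_values(tricky))

print(round_trip)
print(broken)
```

Two values go in and three come out in the second case, so a separator that cannot occur in the data (or an escaping rule) has to be agreed before this accommodation is safe.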

Views
Views arise from the need for de-normalisation
  – Coursework Element
      As a simple element
        – Key: moduleCode/Year/runNo/elementNo
        – Data: due date
      As a derived complex element
        – SuggestedHours (computed from Hours table)
        – Late date (computed from UWE calendar)
        – Weightings (extracted from relevant specification)
        – Module Leader (extracted from Module Run)
Views as transient or materialised
View definition
View maintenance

    declare function fold:courseworkElement($moduleCode, $year, $runNo, $elementNo) {
        let $mod := fold:moduleSpecification($moduleCode, $year),
            $run := fold:moduleRun($moduleCode, $year, $runNo),
            $elementRun := fold:elementRun($moduleCode, $year, $runNo, 'B', $elementNo),
            $elementSpec := $mod/Assessment/FirstAttempt/Components/ComponentB/Element[position() = $elementNo],
            $dueDate := $elementRun/DueDate,
            $returnDate := fold:workingDays($dueDate, 20),
            $componentWeight := $mod/Assessment/Weighting/ComponentWeightB,
            $weightInComponent := data($elementSpec/Weight),
            $weightInModule := round($weightInComponent * $componentWeight div 100),
            $load := fold:load($mod/Level),
            $hrs := round(data($mod/UWERating) div data($load/Credits) * $weightInModule div 100 * data($load/Hours))
        return
            {$moduleCode} {$mod/Title} {$runNo}
            {$run/ModuleLeader} {$run/InternalModerator} {$run/ExternalExaminer}
            CW {$elementNo} {$elementSpec/Description}
            {$hrs} {$weightInComponent} {$weightInModule}
            {data($dueDate)} {data($returnDate)}
    };
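The weighting and suggested-hours arithmetic in fold:courseworkElement can be checked with a worked example. The Python below follows the same two formulas; all input numbers are hypothetical, and `xq_round` is a helper introduced here to mirror XQuery's round, which rounds halves towards positive infinity (Python's built-in `round` does not).

```python
import math

def xq_round(x):
    """Mirror XQuery fn:round: halves are rounded towards positive infinity."""
    return math.floor(x + 0.5)

def coursework_load(weight_in_component, component_weight,
                    uwe_rating, load_credits, load_hours):
    """Return (weight in module as a percentage, suggested hours) for one element."""
    weight_in_module = xq_round(weight_in_component * component_weight / 100)
    hours = xq_round(uwe_rating / load_credits * weight_in_module / 100 * load_hours)
    return weight_in_module, hours

# Hypothetical inputs: the element is 50% of component B, component B is 60% of
# the module, the module is rated at 20 credits, and the level's load is
# 120 credits for 1200 hours.
weight_in_module, hours = coursework_load(50, 60, 20, 120, 1200)
print(weight_in_module, hours)
```

With these inputs the element carries 30% of the module, and 30% of the module's 200-hour share of the year gives 60 suggested hours.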

Process support
Short term: process support
  – Form generation
  – Linkage to process documentation
Medium term: process monitoring
  – Online capture of significant dates
      Coursework hand-in date
      Date exam sent to moderator
      Date coursework returned to students
  – Derived information
      Workload prediction based on coursework schedule and student numbers
      Display of latest coursework returned and SMS message to students
Long term: process management
  – Workflow
  – Process enactment software

Short-term
Session-based logins to personalise the interface and specify parameters (currentYear)
Form generation as passive documents
  – Update through the form an obvious extension
Extend operational data with date-based status
  – Date-returned-to-students
      If set (work has been returned)
        – Date used to generate page of coursework recently returned
        – Date used to monitor conformance to target return date(!)
Link forms to textual/graphical process description
  – Coursework from setting to field board
  – How to specialise a generic description?
      By level
      By module
      By field
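Monitoring conformance to a target return date needs a working-day calculation and a comparison against the recorded return date. The Python sketch below assumes that working days are simply Monday to Friday and that the target is 20 working days after the due date; a real version would presumably consult the UWE calendar for closures, and the function names are invented for the example.

```python
from datetime import date, timedelta

def add_working_days(start, days):
    """Return the date that is `days` working days (Mon-Fri) after `start`."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current

def returned_on_time(due, returned, target_days=20):
    """None if the work is not yet returned, else whether the target was met."""
    if returned is None:
        return None
    return returned <= add_working_days(due, target_days)

due = date(2006, 2, 6)  # a Monday
print(add_working_days(due, 20))
print(returned_on_time(due, date(2006, 3, 7)))
```

An unset return date is reported as None, not as a failure, which matches the idea that the date doubles as a status flag.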

Responsibilities
Responsibility allocation
  – Admin / architect decision
  – Physical level design for responsibility
      All Module Runs in a Field in one document
      Modules and Programme Structures in Field Collections (within Year)
  – Group access rights
      For IS Field: ISAdmin
        – Anne Moggridge
        – Peter Rawlings
        – Lilly Cooke
        – Tracey Davis
Need for check-in/check-out of documents
  – WebDAV (Web Folders)

Conclusion
Slide from prototype to production
Pluses and minuses of user enthusiasm
Go for ‘low-hanging fruit’
Pay attention to the learning process
  – XQuery, XSLT are non-trivial languages because deeply unlike Java/PHP
Reflection forced by presentations and workshops