Native XML Databases for Information Systems Chris Wallace XQuery workshop April 2006.



Chris Wallace, UWE, Bristol

Exploring the design space
Native XML database (NXD)
  – Storing, querying and updating XML documents without mapping into relations
  – Schema-free
  – Trees are to NXD what tables are to RDBMS
  – Tables are trees
Information Systems
  – Focus on semi-structured data (mixture of simple data items, text and complex nested structures)
  – Searching, derived data, visualisation
  – Process support
  – Large problem space variously supported by spreadsheets, Word documents, ad-hoc databases, increasingly web-integrated data
“design as a conversation with the materials in the situation” (Schön)
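The point that "tables are trees" can be illustrated with a minimal Python sketch using only the standard library. The table name, column names and sample rows are hypothetical and not taken from FOLD; the sketch only shows that any flat table maps onto a two-level XML tree.

```python
import xml.etree.ElementTree as ET

def table_to_tree(table_name, columns, rows):
    """Build an XML tree with one <row> per table row and one child element per column."""
    root = ET.Element(table_name)
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            ET.SubElement(row_el, col).text = str(value)
    return root

# Hypothetical sample data, for illustration only.
columns = ["ModuleCode", "Title"]
rows = [("M001", "Databases"), ("M002", "Web Design")]

tree = table_to_tree("Modules", columns, rows)
print(ET.tostring(tree, encoding="unicode"))
```

The reverse mapping does not hold in general: a nested tree cannot be flattened into a single table without extra keys, which is the asymmetry the slide relies on.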

Solution: eXist Native XML Database
eXist
  – Open source Java
  – European team of developers led by Wolfgang Meier
  – Under development for several years, mature except for documentation
Supports
  – XQuery
  – XUpdate
  – XSLT
  – Free-text searching
  – XQuery extensions to allow complete applications to be developed
Documents (files) are organised in collections (folders) in a file store
  – XML documents stored in an efficient B+ tree structure with indexes
  – Non-XML resources (XQuery, CSS, JPEG, etc.) can be stored as binary
Deployable in different ways
  – Embedded in a Java application
  – Part of a Cocoon pipeline
  – As web application in Apache/Tomcat
  – With embedded Jetty HTTP server
Multiple interfaces
  – REST – to Java servlet
  – SOAP
  – XML-RPC
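As a rough illustration of the REST interface, the Python sketch below only builds a query URL; it does not contact a server. The base URL, the collection path /db/fold/modules and the element name Module are assumptions for illustration. The `_query` and `_howmany` parameters are those documented for eXist's REST interface, but the servlet path differs between versions and deployments, so check it against the installed release.

```python
from urllib.parse import urlencode

# Assumed base URL of a local eXist installation; adjust to the actual deployment.
BASE = "http://localhost:8080/exist/rest"

def rest_query_url(collection, xquery, how_many=10):
    """Return a GET URL that asks the REST interface to evaluate an XQuery/XPath expression."""
    params = urlencode({"_query": xquery, "_howmany": how_many})
    return f"{BASE}{collection}?{params}"

# Hypothetical collection and path expression.
url = rest_query_url("/db/fold/modules", "//Module[Level = 3]/Title")
print(url)
```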

Sample Implementations
Family photos and history
  – Integration of meta-data on family photos with family history (births, deaths and marriages) and Google Earth
FOLD
  – modules, programmes, scheme operations, staff, organisational structures, events
Other demos on the eXist demo site

FOLD – Faculty OnLine Data
Operations at student level (2000 in CEMS) supported by central systems (student records, finance)
FOLD Scope – teaching and assessment management and organisational knowledge
  – Modules [450] and their specification
  – Programmes (Courses) [100] and their structures
  – Operations – Runs, Coursework, exams
  – Staff (300+)
  – Organisational structure (100)
  – Events
Information currently distributed over Word documents, spreadsheets, Access databases, SQL database, flat text files, LDAP
Aims
  – To support distributed data ownership
  – To provide a web of data within and between systems
  – To support organisational processes
  – To improve data veracity

FOLD Entity Types

| Entity Type | Identifier | No of instances | Document Map | No of documents / year | Document Type |
|---|---|---|---|---|---|
| Module Specification | ModuleCode/Version | 450 | one each | 450 (40) | complex structure |
| Module Run | ModuleCode/Year/Runno | 460 | one per field/year | 6 | table |
| Module assessments | ModuleCode/Year/Runno/ElementNo | 800 | one per field/year | 6 | table |
| Examination | ModuleCode/Year/Exam | 420 | one per year | 1 | |
| Student numbers | Date/ModuleCode | 450 * 4 | one per date | 5 | table |
| Award types | PrimaryAward | 8 | one only | 1 | simple structure |
| Programmes | ProgrammeCode/Year | 100 | one per year | 1 | table |
| Programme Structure | ProgrammeCode/Pathway/Version | 110 | one each | 110 (20) | complex structure |
| Organisational structure | GroupName | 100 | several per major group | 60 | simple structure |
| Events | EventGroup/EventID | 300 | all events in a group | 50 | simple structure |
| Staff | Name | 400 | per responsibility | 5 | table with reps |
| Training | Name/Course | 200 | one only | 1 | table |
| Training Courses | Course | 40 | one only | 1 | table |
| UCAS keywords | UCASCode/Keyword | 4000 | one only | 1 | table |
| UWE calendar | Date | 365 | one only | 1 | table |
| SuggestedHours | Level | 5 | one only | 1 | simple structure |
| Entity Type metadata | DatasetName | 20 | one only | 1 | table |
| System Configuration | Faculty | 1 | one only | 1 | table |

FOLD current stats
Code
  – XQuery
  – XSLT
  – XSD (one schema)
  – CSS
  – PHP - 10 (vcal)
Pages
  – about 25 user
  – Only 1 admin as yet
Information System development
  – CW (4 months)
  – Placement Student (8 months)
  – Phase allocation:
      Project (20%)
      Code (20%)
      Data – gathering, conversion, cleaning (60%)

The FOLD

Areas for attention
Conceptual Modelling
  – Identifiers
  – Relationships and links
  – Versioning
Logical Modelling (in XML)
  – Element/attribute
  – Views
  – Validation
Physical layer (in NXD)
  – Structuring documents and collections
  – Mapping to editors
  – Responsibilities
Programming
  – Functional allocation between tiers
  – Views and constructed elements
  – Integrity
  – XQuery programming
User interface
  – Editing
  – Long transactions
Development Process
  – CASE tool requirements
Scope of application of NXD

Conceptual Modelling
Conventional normalised data model
  – EAR ++
      Entity (not XML entities like &amp;)
      Attribute (multi-valued)
      Relationships
        – Association
        – Composition
  – Object Orientation?
      Methods are mainly getters (of derived values)
      Inheritance only useful in the schema domain
      Instance inheritance more useful in IS
  – Expressivity Problems
      Identifiers
      Order of parts
      Verbosity ?
Conceptual Scope
  – Edit trails, versioning, activity tracking
Generality problem
  – Roles as Attributes: Stewart Green
  – Roles as Entities: Module Leader, Stewart Green

Identifiers
Principle adopted – use naturally occurring identifiers wherever possible
  – Persons : “Chris Wallace”
  – Rooms : “3P14”
Yes
  – Reduces gap between Real World (RW) domain and system
  – Names in minutes of meetings, on spreadsheets are readable
No
  – Duplicates
      Duplicates not tolerable in the RW either, resolved through RW negotiation within a RW namespace, e.g. the Faculty
      Mergers generate duplicates
  – Aliases
  – Not all entities have unique domain identifiers
      Gives rise to confusion in the problem domain and should be resolved there
Po
  – All names need namespace – “Chris Wallace” at CEMS at UWE
  – Need to replace multiple naming conventions with a single naming scheme (e.g. initials)
  – URNs and semantic web
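A small Python sketch of the two ideas on this slide: qualifying a natural name with its namespaces, and detecting duplicates that would have to be resolved in the real world. The function names and the duplicated entry in the sample list are hypothetical.

```python
from collections import Counter

def qualified(name, *namespaces):
    """Join a natural name with its enclosing namespaces, innermost first."""
    return " at ".join((name,) + namespaces)

def duplicates(names):
    """Return the names that occur more than once, in sorted order."""
    return sorted(n for n, count in Counter(names).items() if count > 1)

# Hypothetical staff list; the repeated entry stands for a clash created by a merger.
staff = ["Chris Wallace", "Stewart Green", "Chris Wallace"]

print(qualified("Chris Wallace", "CEMS", "UWE"))
print(duplicates(staff))
```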

Conceptual to Logical
Attributes v elements
Relationships
Integrity
Views

Attributes v elements
E.g.
  – …
  – UFIEKG
What criteria to use?
  – Attributes as ‘meta’ is vague
  – FOLD uses only elements
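The markup of the slide's own example did not survive extraction, so the Python sketch below uses assumed markup to show the choice being discussed: the same module code carried as an attribute or as a child element. The element and attribute names (Module, code, ModuleCode) are illustrative guesses, not the FOLD schema.

```python
import xml.etree.ElementTree as ET

# Assumed markup: the same value modelled as an attribute and as a child element.
as_attribute = ET.fromstring('<Module code="UFIEKG"/>')
as_element = ET.fromstring("<Module><ModuleCode>UFIEKG</ModuleCode></Module>")

code_from_attribute = as_attribute.get("code")
code_from_element = as_element.findtext("ModuleCode")

print(code_from_attribute, code_from_element)
```

An element can later grow structure or repeat, whereas an attribute cannot, which is one argument for an elements-only style.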

Relationships
Implementing Relationships
  – One-Many
      RDBMS – primary key on the One side becomes foreign key on the Many side
      NXD – choose which side on the basis of complexity and responsibility
        – Sequence (modules in a stage)
        – Complex (pre-requisite expression)
  – Many-Many
      RDBMS – intersection table
      NXD – as for one-many, or either side as appropriate – e.g. Groups and subgroups
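A minimal Python sketch of the one-many choice in an NXD, using hypothetical element names and module codes. The same stage-to-module relationship is stored once on the "one" side, where document order records the sequence, and once on the "many" side, as a relational design would.

```python
import xml.etree.ElementTree as ET

# Relationship held on the 'one' side: the stage lists its modules in order.
stage = ET.fromstring(
    "<Stage><Number>1</Number>"
    "<ModuleCode>M001</ModuleCode><ModuleCode>M002</ModuleCode>"
    "</Stage>"
)

# Relationship held on the 'many' side: each module names its stage.
modules = ET.fromstring(
    "<Modules>"
    "<Module><ModuleCode>M001</ModuleCode><Stage>1</Stage></Module>"
    "<Module><ModuleCode>M002</ModuleCode><Stage>1</Stage></Module>"
    "</Modules>"
)

from_one_side = [m.text for m in stage.findall("ModuleCode")]
from_many_side = [
    m.findtext("ModuleCode")
    for m in modules.findall("Module")
    if m.findtext("Stage") == "1"
]
print(from_one_side, from_many_side)
```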

Integrity
Structural integrity
  – Schema validation too weak and too restrictive
  – NXD stores any well-formed XML
Referential Integrity
  – RDBMS – ‘eager’
      Data not allowed in unless valid, updates maintain integrity
      Integrity failures transient, repair outside database
  – NXD – ‘lazy’
      Store the data and provide on-demand or on-trigger validation
      Integrity failures can be persisted (XLinkit) and repair is inside database
Identifier Uniqueness
  – XML ids only checked within a document
  – NXD stores all XML nodes with internal identifiers
For Information Systems, veracity of the model is what’s important
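The 'lazy' policy can be sketched in Python as an on-demand check run over data that has already been stored. The documents, element names and module codes below are hypothetical; in FOLD the equivalent check would be written in XQuery against the database.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored documents: one run refers to a module that has no specification.
specs = ET.fromstring(
    "<Specs><Module><ModuleCode>M001</ModuleCode></Module></Specs>"
)
runs = ET.fromstring(
    "<Runs>"
    "<Run><ModuleCode>M001</ModuleCode></Run>"
    "<Run><ModuleCode>M999</ModuleCode></Run>"
    "</Runs>"
)

def dangling_references(runs, specs):
    """Return module codes used by runs that match no module specification."""
    known = {m.findtext("ModuleCode") for m in specs.findall("Module")}
    return [
        r.findtext("ModuleCode")
        for r in runs.findall("Run")
        if r.findtext("ModuleCode") not in known
    ]

failures = dangling_references(runs, specs)
print(failures)
```

The failures are reported rather than rejected, so the invalid run stays in the store until someone repairs it.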

Logical to Physical layers
What criteria to use in allocation of logical units to the physical layer:
  – Documents – a physical aggregation of entity instances
  – Collections – a physical aggregation of documents
Examples
  – Module Specification [moduleCode]
      Module Spec is an Entity
      Each Module Spec is a Document
  – Module Run [moduleCode/year/runNo]
      Module Run is an Entity
      Set of Module Runs for a Field is a Document
Issues
  – Schemas needed per entity, not per document
  – Principle: No concepts modelled in the physical layer
  – Use Physical layer for responsibility, access rights ?

Programming issues
Tier design
Views and constructed elements
XQuery programming

Tier design
Allocation of functionality to tiers
  – Initially nearly all XQuery generating HTML
  – As work matured, code moved into function libraries and XSLT
  – XQuery for request input, sessions, selection of nodes, computation of views for
  – XSLT to generate interface for
  – CSS to style

Views
Views arise from the need for de-normalisation for presentation
  – Coursework Element
      As a simple element
        – Key : moduleCode/Year/runNo/elementNo
        – Data: due date
      As an extended de-normalised element
        – SuggestedHours (computed from Hours table)
        – Late date (computed from UWE calendar)
        – Weightings (extracted from relevant specification)
        – Module Leader (extracted from Module Run)
Views as intermediate structures
  – From low level functions
  – For output to XSL
  – Constructed elements in XQuery use copy (losing reference so can't update through a constructed element)
View caching for efficiency
  – Triggers can invoke cache renewal

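    declare function fold:courseworkElement($moduleCode, $year, $runNo, $elementNo) {
      (: gather the specification, the run and the element from their own documents :)
      let $mod := fold:moduleSpecification($moduleCode, $year),
          $run := fold:moduleRun($moduleCode, $year, $runNo),
          $elementRun := fold:elementRun($moduleCode, $year, $runNo, 'B', $elementNo),
          $elementSpec := $mod/Assessment/FirstAttempt/Components/ComponentB/Element[position() = $elementNo],
          $dueDate := $elementRun/DueDate,
          $returnDate := fold:workingDays($dueDate, 20),
          (: weight of this element within the whole module, as a percentage :)
          $componentWeight := $mod/Assessment/Weighting/ComponentWeightB,
          $weightInComponent := data($elementSpec/Weight),
          $weightInModule := round($weightInComponent * $componentWeight div 100),
          (: suggested hours, scaled by the module's credit rating and the load for its level :)
          $load := fold:load($mod/Level),
          $hrs := round(data($mod/UWERating) div data($load/Credits) * $weightInModule div 100 * data($load/Hours))
      (: The tags of the constructed elements were lost in extraction. The names
         CourseworkElement, ModuleCode, RunNo, Type, ElementNo, SuggestedHours,
         WeightInComponent, WeightInModule, DueDate and ReturnDate are assumed. :)
      return
        <CourseworkElement>
          <ModuleCode>{$moduleCode}</ModuleCode>
          {$mod/Title}
          <RunNo>{$runNo}</RunNo>
          {$run/ModuleLeader}
          {$run/InternalModerator}
          {$run/ExternalExaminer}
          <Type>CW</Type>
          <ElementNo>{$elementNo}</ElementNo>
          {$elementSpec/Description}
          <SuggestedHours>{$hrs}</SuggestedHours>
          <WeightInComponent>{$weightInComponent}</WeightInComponent>
          <WeightInModule>{$weightInModule}</WeightInModule>
          <DueDate>{data($dueDate)}</DueDate>
          <ReturnDate>{data($returnDate)}</ReturnDate>
        </CourseworkElement>
    };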
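The two derived values in the function above can be checked by hand with a short Python sketch of the same arithmetic. The input figures are invented for illustration and are not FOLD data. XQuery's round() rounds halves upwards, whereas Python's built-in round() rounds halves to even, so a small helper mimics the XQuery behaviour.

```python
import math

def xq_round(x):
    """Round half up, as XQuery's round() does."""
    return math.floor(x + 0.5)

def weight_in_module(weight_in_component, component_weight):
    """Percentage weight of an element within the whole module."""
    return xq_round(weight_in_component * component_weight / 100)

def suggested_hours(uwe_rating, load_credits, load_hours, w_in_module):
    """Suggested hours for an element; same operator order as the XQuery expression."""
    return xq_round(uwe_rating / load_credits * w_in_module / 100 * load_hours)

# Invented inputs: an element worth 50% of a component that is worth 60% of a
# 20-credit module, at a level where 120 credits correspond to 1200 hours.
w = weight_in_module(50, 60)
hrs = suggested_hours(20, 120, 1200, w)
print(w, hrs)
```

With these inputs the element carries 30% of the module, and 30% of the module's 200 hours gives 60 suggested hours.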

Integrity
Unlike RDBMS, integrity checks not inherent in Database
  – Structural (schema validation)
  – Referential integrity
  – Business rules
Policies
  – Restrictive – allow in only data which has satisfied integrity constraints
      Unitary view of data – model must be consistent at all times
  – Permissive – allow in un-validated data with on-demand validation reconciliation
      Pluralist view – model will probably never be consistent but have to work with this
On-demand validation
  – Structure via eXist validation
  – Referential (via explicit coding)
  – Extensive Business rules

XQuery programming
Functional style yields good clean code
But it's not OO!
Need to rethink some algorithms
Strict data typing needs explicit conversion
Schema not missed
XPath 2.0 in XQuery, XPath 1.0 in XSLT (Xalan) causes confusion
Fast and responsive

User Interface
Table-structured document editing
  – Allows maintenance using familiar spreadsheet tools (Excel Add-in)
  – Schema is induced by Excel
  – Accommodations
      Multi-valued fields as concatenated values
        – XPath join and tokenise functions
        – Embedded separator problem (a name with ‘,’ as a legitimate character)
        – Defeats conventional indexing but eXist supports full text indexing
      Optional elements increase table width
      Formatting choices not maintained (e.g. column widths, freeze-window location)
  – WebDAV to provide Web Folder access (still not functioning)
Structured document editing
  – Allows maintenance with Word without a schema
      With difficulty
        – no schema awareness
  – Use InfoPath to create desktop form based on schema
      Need to redo if schema changes
  – Document editors (Arbortext, XMetaL..) – expensive
In-situ updates
  – With XQuery-generated forms and update
  – With XForms using Orbeon (open-source XForms server)
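The multi-valued field accommodation and its embedded separator problem can be shown with a short Python sketch. It mirrors what XPath's string-join() and tokenize() would do on a spreadsheet cell; the sample values are hypothetical.

```python
SEP = ","

def join_values(values):
    """Concatenate several values into one spreadsheet cell."""
    return SEP.join(values)

def split_values(cell):
    """Split a cell back into values, trimming surrounding spaces."""
    return [v.strip() for v in cell.split(SEP)]

# Values without the separator survive the round trip.
leaders = ["Chris Wallace", "Stewart Green"]
round_trip = split_values(join_values(leaders))

# A single value that legitimately contains the separator is split in two.
awkward = ["Green, Stewart"]
broken = split_values(join_values(awkward))

print(round_trip)
print(broken)
```

Avoiding the problem needs either a separator that cannot occur in the data or an escaping rule, at the cost of making the cell harder to edit by hand.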

Development Tools
eXist Java Client provides basic tools
  – Syntax-aware editor
  – Query execution
  – User and database management
XMLSpy
Any text editor
Model-driven development
  – Conceptual Model -> Logical Model -> Physical Model
  – Rose, QSEE ?

Development Process
Co-development of Information System structure (code and schemas) and content (documents)
Support schema migration and refactoring (using XQuery/XSLT)
Slide from prototype to production
Pluses and minuses of user enthusiasm
Go for ‘low-hanging fruit’
Pay attention to the learning process
  – XQuery, XSLT are non-trivial languages because deeply unlike Java/PHP
Project management via steering group, discussion boards but needs forceful lead developer
Reflection forced by presentations and workshops
Is Agile IS development different to Agile Software development?

Characteristics of good fit ?
FOLD
  – Low update rate / medium access rate
  – High document complexity
  – Document-centric ownership
  – Navigational interface
  – Integration with central systems – (via XML interfaces?)