Published by Sydney White. Modified over 9 years ago.
1
XML, distributed databases, and OLAP/warehousing The semantic web and a lot more
2
What is XML? A framework for declarative languages A syntax and two major constructs: elements & attributes Elements: Have begin and end tags Can be embedded Can be put in lists (homogeneous or heterogeneous) Attributes: Are assigned to elements Are strings Are put in quotes
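As a minimal sketch of the two constructs, the Python snippet below parses a small document with the standard library's xml.etree.ElementTree. The document and its names (course, student, code, id) are invented for illustration and do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every element and attribute name here is made up.
doc = """
<course code="DB101">
  <title>Databases</title>
  <students>
    <student id="s1">Ana</student>
    <student id="s2">Ben</student>
  </students>
</course>
"""

root = ET.fromstring(doc)

# Elements have begin and end tags and can be nested.
print(root.tag)

# Attributes are assigned to elements and are always strings.
print(root.attrib["code"])

# A homogeneous list of embedded elements, read in document order.
names = [student.text for student in root.find("students")]
print(names)
```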
3
What is XML for? Initially, as a cornerstone of the semantic web Automatic searching of the web (versus interactive) Self-describing data Has been adapted to a wide variety of application domains As a means for specifying the structure of data As a catch-all for nontraditional data
4
XML documents An instance of XML is a language An instance of an XML language is a document Documents are hierarchical & list-oriented XML documents can be parsed in a single, linear pass There is no notion of a fixed schema Does not leverage metadata for set-oriented queries Order matters in a set of documents Order matters in a series of elements in a document
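The single-pass and ordering points can be illustrated with a streaming parse. This is a sketch using ElementTree's iterparse; the log and entry names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are made up for illustration.
doc = b"<log><entry>first</entry><entry>second</entry><entry>third</entry></log>"

seen = []
# iterparse reads the input once, front to back, and reports each element
# as its end tag is reached.
for event, elem in ET.iterparse(io.BytesIO(doc), events=("end",)):
    if elem.tag == "entry":
        seen.append(elem.text)
        elem.clear()  # release the element's content once it has been handled

# Elements arrive in document order, so the list preserves that order.
print(seen)
```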
5
Is it a generalized HTML? Sort of, but perhaps more of a meta alternative to HTML The real point is to allow HTML pages to be located and searched automatically This is done by allowing language developers to create their own names for documents, elements, & attributes
6
What else is part of the XML philosophy? Namespaces Associated with URLs Can be referenced in a nested fashion in an XML document Widely distributed sharing of data, XML languages, and namespaces
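A small sketch of namespaces tied to URLs, with one namespace declared inside an element that belongs to another. The example.org URLs and the lib and pub prefixes are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces; the pub namespace is declared on a nested element.
doc = """
<lib:catalog xmlns:lib="http://example.org/library">
  <lib:book xmlns:pub="http://example.org/publisher">
    <lib:title>XML Basics</lib:title>
    <pub:name>Example Press</pub:name>
  </lib:book>
</lib:catalog>
"""

root = ET.fromstring(doc)

# ElementTree expands each prefix into its URL, written as {url}localname.
print(root.tag)

# A prefix-to-URL mapping lets path expressions use short prefixes.
ns = {
    "lib": "http://example.org/library",
    "pub": "http://example.org/publisher",
}
title = root.find("lib:book/lib:title", ns).text
publisher = root.find("lib:book/pub:name", ns).text
print(title, publisher)
```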
7
What’s missing, from a database user’s and a programmer’s perspective? No innate notion of a query language No Objects Very limited data structuring capabilities Yet another impedance mismatch problem No way to store XML documents in a relational database, at least not natively No way to make a database out of a set of documents
8
So, in response to the database community’s desires… A hierarchical query language – XPath A specification format for schemas – DTDs But uses a different syntax Does not accommodate namespaces
9
So, in response to the database community’s desires, phase 2… XML schema More atomic or “basic” types Like DTDs, but with an XML syntax Supports namespaces Adds primary keys and foreign keys Adds more constructs for structuring data Simple types: primitive types, list and union, & restriction Attributes can be of simple types Complex types: compositors all (unordered) and sequence (ordered), and choice Extension and restriction Integrity constraints
10
Query language 1: XPath Follows hierarchy of XML documents Uses syntax borrowed from Unix file system / for root . for current node @ for value of an attribute [1], [2], etc., for siblings // for self or descendant of .//x for all descendants to find an element of a specific type x Augmented with URLs to create XPointer Relational database systems generally have an XML data type now
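A few of these path forms can be tried with Python's xml.etree.ElementTree, which implements only a limited subset of XPath. In that subset, @ is available inside predicates, while an attribute's value is read with get(). The library, shelf, and book names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only to exercise the path syntax.
doc = """
<library>
  <shelf name="A">
    <book lang="en"><title>One</title></book>
    <book lang="fr"><title>Deux</title></book>
  </shelf>
  <shelf name="B">
    <book lang="en"><title>Three</title></book>
  </shelf>
</library>
"""
root = ET.fromstring(doc)

# .//x finds every descendant element of type x below the current node.
all_titles = [t.text for t in root.findall(".//title")]

# [1] picks the first book among its siblings, here once per shelf.
first_titles = [t.text for t in root.findall("./shelf/book[1]/title")]

# @ tests an attribute inside a predicate.
french_titles = [t.text for t in root.findall(".//book[@lang='fr']/title")]

# ElementTree cannot return an attribute node, so the value is read with get().
langs = [b.get("lang") for b in root.findall(".//book")]

print(all_titles, first_titles, french_titles, langs)
```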
11
Distributed Databases & Distributed Transactions – homogeneous and heterogeneous See page 689: multiple DBs vs. a distributed DB Homogeneous distributed DBs Single unified schema Designed top down Distribution by row, column, table, by table selection Issues of distribution Redundancy: availability vs. keeping copies up to date Hidden joins with column distribution Hidden unions with table selection distribution
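The hidden union and hidden join can be sketched with Python's sqlite3 module. This is an illustration only: the tables live in one in-memory database and merely stand in for fragments stored on different nodes, and all table and column names are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Distribution by row: each "site" holds the rows for one region.
CREATE TABLE emp_east (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE emp_west (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
INSERT INTO emp_east VALUES (1, 'Ana', 'east');
INSERT INTO emp_west VALUES (2, 'Ben', 'west');

-- Distribution by column: each "site" holds some columns plus the key.
CREATE TABLE emp_names  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp_salary (id INTEGER PRIMARY KEY, salary INTEGER);
INSERT INTO emp_names  VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO emp_salary VALUES (1, 50000), (2, 60000);
""")

# Hidden union: the logical table is the union of the row fragments.
by_row = con.execute("""
    SELECT id, name, region FROM emp_east
    UNION ALL
    SELECT id, name, region FROM emp_west
    ORDER BY id
""").fetchall()

# Hidden join: the logical table is the join of the column fragments on the key.
by_column = con.execute("""
    SELECT n.id, n.name, s.salary
    FROM emp_names AS n JOIN emp_salary AS s ON s.id = n.id
    ORDER BY n.id
""").fetchall()

print(by_row)
print(by_column)
```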
12
Executing distributed transactions Each node has a master and a client module Masters are all identical and contain distributed data info Clients are like single site databases with a prepare to commit 3 basic strategies for query fragment execution Bring data to procedure Send procedure to data Meet in a 3rd place Estimating costs Data shipping Result shipping Wait times on nodes Integrity constraint enforcement
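The "prepare to commit" step suggests a two-phase commit style protocol. The slide does not name the protocol, so the sketch below is an assumption. The Participant class and run_transaction function are hypothetical names, and there is no real networking, logging, or failure recovery.

```python
class Participant:
    """Stands in for one node's local database (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def run_transaction(participants):
    """Coordinator: commit everywhere only if every node voted yes."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:
            p.commit()  # Phase 2: global commit
        return "committed"
    for p in participants:
        p.abort()  # Phase 2: global abort
    return "aborted"


nodes = [Participant("n1"), Participant("n2", can_commit=False)]
print(run_transaction(nodes), [p.state for p in nodes])
```

A single "no" vote aborts the transaction on every node, which is what keeps the copies consistent.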
13
Heterogeneous distributed databases Forms of heterogeneity Model Schema Database product Namespace Table structure (implications for object identities) Keys and Foreign keys Units SQL dialect Semantic issues relating to varying interpretations of data
14
Integrating heterogeneous databases After the fact Stability is never achieved Mappings are complex Data may have conflicts, redundancy, and gaps Closed world vs. open world
15
Engineering for nonstop change Mediators around databases Gateways connecting old apps and new databases Gateways connecting new apps and old databases A stability of instability
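One way to picture a mediator is as a thin layer that maps each source's schema and units onto a shared view. The sketch below is purely illustrative: the two sources, their field names, and the mediate function are invented, and real mediators also have to handle conflicts, redundancy, and gaps.

```python
# Two hypothetical sources that differ in schema (field names) and units.
source_a = [{"part_id": 1, "weight_kg": 2.0}]
source_b = [{"PartNo": 2, "WeightLb": 4.0}]

LB_TO_KG = 0.45359237  # kilograms per pound


def mediate():
    """Present both sources under one schema, with weights in kilograms."""
    unified = []
    for row in source_a:
        unified.append({"part": row["part_id"], "weight_kg": row["weight_kg"]})
    for row in source_b:
        unified.append({
            "part": row["PartNo"],
            "weight_kg": round(row["WeightLb"] * LB_TO_KG, 3),
        })
    return unified


print(mediate())
```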
16
OLAP Standard model N dimension tables 1 fact table (PK is union of keys of dimension tables) Hypercube visualization Multidimensional table result visualizations Star and constellation schemas Terminology Drilling down – stepping down nested attributes Rolling up – moving up nested attributes Pivot – group by
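A tiny star schema with two dimension tables and one fact table can be sketched with sqlite3. The table names, columns, and data are assumptions. The second query rolls the first one up from city to region, and reading the pair in the other direction is a drill-down.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT);

-- The fact table's key combines the keys of the dimension tables.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    amount     INTEGER,
    PRIMARY KEY (product_id, store_id)
);

INSERT INTO dim_product VALUES (1, 'pen', 'office'), (2, 'desk', 'furniture');
INSERT INTO dim_store VALUES (1, 'Denver', 'west'), (2, 'Boulder', 'west'),
                             (3, 'Boston', 'east');
INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 20), (2, 1, 100), (2, 3, 200);
""")

# Fine-grained view: totals per city within each region.
by_city = con.execute("""
    SELECT s.region, s.city, SUM(f.amount)
    FROM fact_sales AS f JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.region, s.city
    ORDER BY s.region, s.city
""").fetchall()

# Rolling up: move from city to the enclosing attribute, region.
by_region = con.execute("""
    SELECT s.region, SUM(f.amount)
    FROM fact_sales AS f JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.region
    ORDER BY s.region
""").fetchall()

print(by_city)
print(by_region)
```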
17
Specialized operators Cube operator and 4 equivalent queries Viewing results See page 722 Equivalent – see 723
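Assuming the "4 equivalent queries" refers to a cube over two dimensions, the result can be emulated as the union of four GROUP BY queries, one per subset of the dimensions. The sketch does this in SQLite, which the example treats as having no CUBE operator of its own. The sales table and its data are hypothetical, and NULL marks a dimension that has been aggregated away.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (product TEXT, region TEXT, amount INTEGER);
INSERT INTO sales VALUES ('pen', 'east', 10), ('pen', 'west', 20),
                         ('desk', 'east', 100), ('desk', 'west', 200);
""")

cube = con.execute("""
    SELECT product, region, SUM(amount) FROM sales GROUP BY product, region
    UNION ALL
    SELECT product, NULL, SUM(amount) FROM sales GROUP BY product
    UNION ALL
    SELECT NULL, region, SUM(amount) FROM sales GROUP BY region
    UNION ALL
    SELECT NULL, NULL, SUM(amount) FROM sales
""").fetchall()

for row in cube:
    print(row)
```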
18
Populating the warehouse Transformation Integration Cleaning
19
Data mining Effectively an open world application Association, classification, clustering – page 730 Association – confidence and support – page 731
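Support and confidence for an association rule can be computed directly from their usual definitions: support is the fraction of transactions containing an itemset, and confidence of lhs => rhs is support(lhs and rhs together) divided by support(lhs). The transactions and function names below are made up for illustration.

```python
# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    return support(lhs | rhs) / support(lhs)


print(support({"bread", "milk"}))
print(confidence({"bread"}, {"milk"}))
```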