XML, distributed databases, and OLAP/warehousing The semantic web and a lot more.


1 XML, distributed databases, and OLAP/warehousing: The semantic web and a lot more

2 What is XML?
• A framework for declarative languages
• A syntax and two major constructs: elements & attributes
• Elements:
  • Have begin and end tags
  • Can be embedded (nested) inside other elements
  • Can be put in lists (homogeneous or heterogeneous)
• Attributes:
  • Are assigned to elements
  • Are strings
  • Are put in quotes
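
The element and attribute rules above can be seen by parsing a small document with Python's standard xml.etree.ElementTree module. This is a minimal sketch; the document and its names (library, book, author, etc.) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: nested elements, sibling lists, quoted attributes.
DOC = """
<library>
  <book id="b1" year="1999">
    <title>Example Book</title>
    <author>Alice</author>
    <author>Bob</author>
  </book>
  <magazine id="m1">
    <title>Example Magazine</title>
  </magazine>
</library>
"""

root = ET.fromstring(DOC)

# The children of <library> form a heterogeneous list (book, magazine).
child_tags = [child.tag for child in root]

book = root.find("book")
# Attribute values come back as strings, even when they look numeric.
year = book.get("year")
# The repeated <author> elements form a homogeneous list inside <book>.
authors = [a.text for a in book.findall("author")]

print(child_tags)  # ['book', 'magazine']
print(repr(year))  # '1999'
print(authors)     # ['Alice', 'Bob']
```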

3 What is XML for?
• Initially, as a cornerstone of the semantic web
  • Automatic (rather than interactive) searching of the web
  • Self-describing data
• Has since been adapted to a wide variety of application domains
  • As a means of specifying the structure of data
  • As a catch-all for nontraditional data

4 XML documents
• An instance of XML is a language
• An instance of an XML language is a document
• Documents are hierarchical & list-oriented
• XML documents can be parsed in a single, linear pass
• There is no notion of a fixed schema
  • Metadata is not leveraged for set-oriented queries
• Order matters in a set of documents
• Order matters in a series of elements in a document
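
Single-pass parsing and the significance of element order can be sketched with ElementTree's iterparse, which reads the input once, front to back. A minimal sketch; the entry_order helper and the log/entry documents are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def entry_order(xml_bytes: bytes) -> list:
    """Return the n attributes of <entry> elements in document order."""
    order = []
    # iterparse consumes the input once, front to back, and emits events
    # in document order; no schema is consulted along the way.
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start",)):
        if elem.tag == "entry":
            order.append(elem.get("n"))
    return order

DOC = b"<log><entry n='1'/><entry n='2'/><entry n='3'/></log>"
REORDERED = b"<log><entry n='3'/><entry n='1'/><entry n='2'/></log>"

print(entry_order(DOC))        # ['1', '2', '3']
print(entry_order(REORDERED))  # ['3', '1', '2']
```

The two documents contain the same elements, but because order is part of the document, a reader sees two different sequences.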

5 Is it a generalized HTML?
• Sort of, but it is perhaps more of a meta-level alternative to HTML
• The real point is to allow HTML pages to be located and searched automatically
• This is done by letting language developers create their own names for documents, elements, & attributes

6 What else is part of the XML philosophy?
• Namespaces
  • Identified by URIs (usually URLs)
  • Can be declared and referenced in a nested fashion in an XML document
• Widely distributed sharing of data, XML languages, and namespaces
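
Namespaces let two vocabularies reuse the same element name without colliding. In this minimal sketch the namespace URIs are hypothetical (example.org), and the dc prefix is declared on a nested element to show nested scoping. ElementTree expands each name to {URI}local-name.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the dc prefix is declared on the nested <item>.
DOC = """
<catalog xmlns="http://example.org/catalog">
  <item xmlns:dc="http://example.org/dc">
    <title>Local title</title>
    <dc:title>Borrowed title</dc:title>
  </item>
</catalog>
"""

root = ET.fromstring(DOC)
NS = {"cat": "http://example.org/catalog", "dc": "http://example.org/dc"}

# ElementTree expands every name to {namespace-URI}local-name ...
print(root.tag)  # {http://example.org/catalog}catalog

# ... so two <title> elements from different vocabularies do not collide.
local = root.find("cat:item/cat:title", NS).text
borrowed = root.find("cat:item/dc:title", NS).text
print(local, "|", borrowed)  # Local title | Borrowed title
```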

7 What’s missing, from the database user’s and the programmer’s perspective?
• No innate notion of a query language
• No objects
• Very limited data-structuring capabilities
• Yet another impedance mismatch problem
• No way to store XML documents in a relational database, at least not natively
• No way to make a database out of a set of documents

8 So, in response to the database community’s desires…
• A hierarchical query language – XPath
• A specification format for schemas – DTDs
  • But DTDs use a different (non-XML) syntax
  • And do not accommodate namespaces

9 So, in response to the database community’s desires, phase 2…
• XML Schema
  • More atomic or “basic” types
  • Like DTDs, but with an XML syntax
  • Supports namespaces
  • Adds primary keys and foreign keys (xs:key and xs:keyref)
  • Adds more constructs for structuring data
• Simple types: primitive types, list and union, & restriction
  • Attributes can be of simple types
• Complex types: compositors
  • all (unordered), sequence (ordered), and choice
  • Extension and restriction
• Integrity constraints

10 Query language 1: XPath
• Follows the hierarchy of XML documents
• Uses syntax borrowed from the Unix file system
  • / for the root, . for the current node
  • @ to select an attribute (e.g., @id)
  • [1], [2], etc., to select among siblings by position
  • // for the current node or any of its descendants; .//x finds all descendant elements of a specific type x
• Augmented with URLs to create XPointer
• Relational database systems generally have an XML data type now
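
The path operators above can be tried with the limited XPath subset built into Python's xml.etree.ElementTree. This is not full XPath 1.0: for example, it does not accept an absolute path starting with / on an element, so the queries below are written relative to the root element (.). The dept/course document is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """
<dept name="CS">
  <course id="c1"><title>Databases</title></course>
  <course id="c2"><title>Compilers</title></course>
  <group>
    <course id="c3"><title>Data Mining</title></course>
  </group>
</dept>
"""
root = ET.fromstring(DOC)  # root plays the role of "." below

# ./course : <course> children of the current node
direct = [c.get("id") for c in root.findall("./course")]
# .//course : <course> elements at any depth below the current node
every = [c.get("id") for c in root.findall(".//course")]
# [2] : position among the <course> siblings (1-based)
second = root.find("./course[2]/title").text
# [@id='c3'] : filter on an attribute value
third = root.find(".//course[@id='c3']/title").text
# ElementTree's subset cannot return attribute nodes such as @name,
# so the attribute itself is read with .get().
dept_name = root.get("name")

print(direct, every, second, third, dept_name)
# ['c1', 'c2'] ['c1', 'c2', 'c3'] Compilers Data Mining CS
```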

11 Distributed databases & distributed transactions (TXs) – homogeneous and heterogeneous
• See page 689: multiple DBs vs. a distributed DB
• Homogeneous distributed DBs
  • Single unified schema
  • Designed top down
  • Distribution by row, by column, by table, or by table selection
• Issues of distribution
  • Redundancy: availability vs. keeping copies up to date
  • Hidden joins with column distribution
  • Hidden unions with table selection distribution
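
Why column distribution hides a join and selection-based row distribution hides a union can be shown with in-memory rows. A minimal sketch, assuming "table selection" means splitting rows by a selection predicate (here, on site). The EMP table and its values are hypothetical.

```python
# Hypothetical EMP table.
EMP = [
    {"id": 1, "name": "Ana", "site": "Boston", "salary": 70},
    {"id": 2, "name": "Raj", "site": "Denver", "salary": 80},
    {"id": 3, "name": "Li",  "site": "Boston", "salary": 75},
]

# Row distribution by a selection predicate: one fragment per site.
boston = [r for r in EMP if r["site"] == "Boston"]
denver = [r for r in EMP if r["site"] == "Denver"]
# Reassembling the table requires a hidden UNION of the fragments.
rows_back = sorted(boston + denver, key=lambda r: r["id"])

# Column distribution: each fragment keeps the key column "id".
names = [{"id": r["id"], "name": r["name"]} for r in EMP]
pay = [{"id": r["id"], "site": r["site"], "salary": r["salary"]} for r in EMP]
# Reassembling the table requires a hidden JOIN on the key.
pay_by_id = {p["id"]: p for p in pay}
cols_back = [{**n, **pay_by_id[n["id"]]} for n in names]

assert rows_back == EMP
assert cols_back == EMP
```

Keeping the key in every column fragment is what makes the join possible; a fragment without it could not be matched back to its row.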

12 Executing distributed transactions
• Each node has a master and a client module
  • Masters are all identical and contain information about how the data is distributed
  • Clients are like single-site databases with an added prepare-to-commit step
• 3 basic strategies for query fragment execution
  • Bring the data to the procedure
  • Send the procedure to the data
  • Meet in a 3rd place
• Estimating costs
  • Data shipping
  • Result shipping
  • Wait times on nodes
  • Integrity constraint enforcement
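
The prepare-to-commit step is the first phase of two-phase commit (2PC). A minimal in-memory sketch: Site and two_phase_commit are hypothetical names, and the coordinating function is assumed to stand in for the master's role. Logging, timeouts, and crash recovery, which a real protocol needs, are left out.

```python
from dataclasses import dataclass

@dataclass
class Site:
    """Hypothetical client module at one node."""
    name: str
    can_commit: bool = True   # whether the local work succeeded
    state: str = "active"

    def prepare(self) -> bool:
        # Phase 1: the site readies its local work and votes yes or no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit: bool) -> None:
        # Phase 2: the site applies the global decision.
        self.state = "committed" if commit else "aborted"

def two_phase_commit(sites: list) -> bool:
    # Ask every site to prepare (a list, so all sites vote; no short-circuit).
    votes = [s.prepare() for s in sites]
    decision = all(votes)  # commit only if every vote is yes
    for s in sites:
        s.finish(decision)
    return decision

ok_sites = [Site("A"), Site("B")]
bad_sites = [Site("A"), Site("B", can_commit=False)]
ok = two_phase_commit(ok_sites)
bad = two_phase_commit(bad_sites)
print(ok, [s.state for s in ok_sites])    # True ['committed', 'committed']
print(bad, [s.state for s in bad_sites])  # False ['aborted', 'aborted']
```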

13 Heterogeneous distributed databases
• Forms of heterogeneity
  • Model
  • Schema
  • Database product
  • Namespace
  • Table structure (implications for object identities)
  • Keys and foreign keys
  • Units
  • SQL dialect
  • Semantic issues relating to varying interpretations of data
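
Several of these forms of heterogeneity (key format, field names, units) show up even with two tiny sources. A minimal sketch: both sources, the "A-17" to 17 key mapping, and the functions from_a and from_b are hypothetical. Once mapped into a common form, redundant rows agree and real conflicts become visible.

```python
# Two hypothetical sources describing the same products differently:
# different key formats, field names, and units.
SOURCE_A = [{"sku": "A-17", "weight_kg": 2.0},
            {"sku": "A-18", "weight_kg": 0.5}]
SOURCE_B = [{"part_no": 17, "weight_lb": 4.4},
            {"part_no": 18, "weight_lb": 1.0}]

LB_PER_KG = 2.20462

def from_a(row):
    # Assumed key mapping: "A-17" in source A is part 17 in source B.
    return {"product_id": int(row["sku"].split("-")[1]),
            "weight_kg": row["weight_kg"]}

def from_b(row):
    # Convert pounds to kilograms, rounded to 2 decimals.
    return {"product_id": row["part_no"],
            "weight_kg": round(row["weight_lb"] / LB_PER_KG, 2)}

unified = [from_a(r) for r in SOURCE_A] + [from_b(r) for r in SOURCE_B]

# Collect every reported weight per product; more than one value is a conflict.
values = {}
for r in unified:
    values.setdefault(r["product_id"], set()).add(r["weight_kg"])
conflicts = {pid: sorted(v) for pid, v in values.items() if len(v) > 1}

print(conflicts)  # {18: [0.45, 0.5]}
```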

14 Integrating heterogeneous databases
• After the fact
• Stability is never achieved
• Mappings are complex
• Data may have conflicts, redundancy, and gaps
• Closed world vs. open world
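
The closed-world vs. open-world distinction comes down to what a missing fact means. A minimal sketch with hypothetical enrollment facts: under a closed-world reading, anything unrecorded is false. Under an open-world reading, it is unknown (None here) unless a negative fact has been recorded explicitly.

```python
from typing import Optional

# Hypothetical facts gathered so far from the integrated sources.
ENROLLED = {("ana", "db101"), ("raj", "os201")}
NOT_ENROLLED = {("li", "db101")}  # explicitly recorded negatives

def enrolled_closed(student: str, course: str) -> bool:
    # Closed world: whatever is not recorded is taken to be false.
    return (student, course) in ENROLLED

def enrolled_open(student: str, course: str) -> Optional[bool]:
    # Open world: absence is not evidence; without an explicit
    # negative fact the answer is unknown (None).
    if (student, course) in ENROLLED:
        return True
    if (student, course) in NOT_ENROLLED:
        return False
    return None

print(enrolled_closed("li", "os201"), enrolled_open("li", "os201"))
# False None
```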

15 Engineering for nonstop change
• Mediators around databases
• Gateways connecting old apps and new databases
• Gateways connecting new apps and old databases
• A stability of instability

16 OLAP
• Standard model
  • N dimension tables
  • 1 fact table (its PK combines the keys of the dimension tables)
  • Hypercube visualization
  • Multidimensional table result visualizations
  • Star and constellation schemas
• Terminology
  • Drilling down – stepping down through nested attributes
  • Rolling up – moving up through nested attributes
  • Pivot – group by
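
A single star schema and a roll-up/drill-down can be sketched in SQLite through Python's sqlite3 module. A minimal sketch: the store and period dimensions, the sales fact table, and all values are hypothetical. The fact table's primary key is built from the dimension keys. Grouping by region rolls up, and grouping by city drills down one level of the store hierarchy.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE store  (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE period (period_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);
-- Fact table: its primary key is made of the dimension keys.
CREATE TABLE sales (
    store_id  INTEGER REFERENCES store(store_id),
    period_id INTEGER REFERENCES period(period_id),
    amount    REAL,
    PRIMARY KEY (store_id, period_id)
);
INSERT INTO store  VALUES (1, 'Boston', 'East'), (2, 'Albany', 'East'), (3, 'Denver', 'West');
INSERT INTO period VALUES (1, 'Jan', 2024), (2, 'Feb', 2024);
INSERT INTO sales  VALUES (1, 1, 100), (1, 2, 150), (2, 1, 50), (3, 1, 80), (3, 2, 20);
""")

def total_by(level: str):
    # Only fixed column names are interpolated into the SQL text.
    assert level in ("city", "region")
    sql = (f"SELECT s.{level}, SUM(f.amount) FROM sales AS f "
           "JOIN store AS s ON s.store_id = f.store_id "
           f"GROUP BY s.{level} ORDER BY s.{level}")
    return con.execute(sql).fetchall()

print(total_by("region"))  # rolled up:    [('East', 300.0), ('West', 100.0)]
print(total_by("city"))    # drilled down: [('Albany', 50.0), ('Boston', 250.0), ('Denver', 100.0)]
```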

17 Specialized operators
• Cube operator and 4 equivalent queries
• Viewing results
  • See page 722
  • Equivalent – see 723
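
For a two-attribute cube, CUBE(region, year) yields every grouping of the two attributes: (region, year), (region), (year), and the grand total (). That is four GROUP BY queries, which may be what the "4 equivalent queries" refers to. SQLite has no CUBE operator, so this minimal sketch writes the four queries out and combines them with UNION ALL. NULL marks the dimension that has been aggregated away. The sales data are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (region TEXT, year INTEGER, amount INTEGER);
INSERT INTO sales VALUES ('East', 2023, 10), ('East', 2024, 20),
                         ('West', 2023, 5),  ('West', 2024, 15);
""")

# The four groupings that CUBE(region, year) would produce.
CUBE_SQL = """
SELECT region, year, SUM(amount) FROM sales GROUP BY region, year
UNION ALL
SELECT region, NULL, SUM(amount) FROM sales GROUP BY region
UNION ALL
SELECT NULL, year, SUM(amount) FROM sales GROUP BY year
UNION ALL
SELECT NULL, NULL, SUM(amount) FROM sales
"""
cube = con.execute(CUBE_SQL).fetchall()

# Row order is not guaranteed, so compare as a set.
for row in sorted(cube, key=repr):
    print(row)
```

With n grouping attributes the same construction needs 2^n queries, which is why a dedicated operator is convenient.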

18 Populating the warehouse
• Transformation
• Integration
• Cleaning

19 Data mining
• Effectively an open-world application
• Association, classification, clustering – page 730
• Association – confidence and support – page 731
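
Support and confidence for an association rule X → Y follow the usual definitions. support(X ∪ Y) is the fraction of transactions containing every item of X and Y. confidence(X → Y) is the fraction of transactions containing X that also contain Y. A minimal sketch over hypothetical market baskets, assuming X occurs at least once (otherwise confidence is undefined).

```python
# Hypothetical market-basket transactions.
BASKETS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def count(itemset, baskets):
    # Number of baskets that contain every item in itemset.
    return sum(1 for b in baskets if itemset <= b)

def support(itemset, baskets):
    # Fraction of all baskets containing the itemset.
    return count(itemset, baskets) / len(baskets)

def confidence(lhs, rhs, baskets):
    # Among baskets containing lhs, the fraction that also contain rhs.
    return count(lhs | rhs, baskets) / count(lhs, baskets)

rule_support = support({"diapers", "beer"}, BASKETS)
rule_confidence = confidence({"diapers"}, {"beer"}, BASKETS)
print(rule_support, rule_confidence)  # 0.6 0.75
```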

