XML, distributed databases, and OLAP/warehousing
The semantic web and a lot more

What is XML?
• A framework for declarative languages
• A syntax and two major constructs: elements and attributes
• Elements:
  • Have begin and end tags
  • Can be embedded (nested)
  • Can be put in lists (homogeneous or heterogeneous)
• Attributes:
  • Are assigned to elements
  • Are strings
  • Are put in quotes
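The two constructs can be seen in a small document. The sketch below is illustrative only: the element and attribute names (`library`, `book`, `id`, `lang`) are made up, and Python's standard `xml.etree.ElementTree` module is used simply because it is widely available.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; every tag and attribute name here is invented.
DOC = """
<library>
  <book id="b1" lang="en">
    <title>Database Systems</title>
    <author>A. Author</author>
    <author>B. Author</author>
  </book>
  <book id="b2" lang="en">
    <title>Web Data</title>
  </book>
</library>
"""

root = ET.fromstring(DOC)


def describe(element, depth=0):
    """Return one line per element, indented by nesting depth."""
    lines = ["  " * depth + f"{element.tag} {element.attrib}"]
    for child in element:  # children come back in document order
        lines.extend(describe(child, depth + 1))
    return lines


summary = describe(root)
print("\n".join(summary))

# Attributes are assigned to elements and are always strings.
book_ids = [book.get("id") for book in root]
# Elements of the same type can repeat, forming a list.
author_count = len(root[0].findall("author"))
```

Elements nest and repeat, while attributes stay flat string values attached to a single element.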

What is XML for?
• Initially, as a cornerstone of the semantic web
  • Automatic searching of the web (versus interactive)
  • Self-describing data
• Has been adapted to a wide variety of application domains
  • As a means for specifying the structure of data
  • As a catch-all for nontraditional data

XML documents
• An instance of XML is a language
• An instance of an XML language is a document
• Documents are hierarchical and list-oriented
• XML documents can be parsed in a single, linear pass
• There is no notion of a fixed schema
  • Does not leverage metadata for set-oriented queries
• Order matters in a set of documents
• Order matters in a series of elements in a document
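The single-pass property can be illustrated with a streaming parser. This is a minimal sketch, assuming a made-up `log`/`entry` document. It uses `xml.etree.ElementTree.iterparse`, which reports each element as its end tag is reached, so entries arrive in document order.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document; tag and attribute names are invented.
DOC = (
    b"<log>"
    b"<entry n='1'>start</entry>"
    b"<entry n='2'>work</entry>"
    b"<entry n='3'>stop</entry>"
    b"</log>"
)


def stream_entries(source):
    """Yield (n, text) for each entry as soon as its end tag is seen."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entry":
            yield elem.get("n"), elem.text
            elem.clear()  # release content that has already been processed


entries = list(stream_entries(io.BytesIO(DOC)))
print(entries)
```

Because order matters within a document, the entries come back in exactly the sequence in which they were written.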

Is it a generalized HTML?
• Sort of, but perhaps more of a meta alternative to HTML
• The real point is to allow HTML pages to be located and searched automatically
• This is done by allowing language developers to create their own names for documents, elements, and attributes

What else is part of the XML philosophy?
• Namespaces
  • Associated with URLs
  • Can be referenced in a nested fashion in an XML document
• Widely distributed sharing of data, XML languages, and namespaces
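Nested namespace references can be sketched as follows. The namespace URLs and element names are hypothetical (the `example.com` addresses are placeholders). The inner `customer` element switches the default namespace, so the two `name` elements belong to different vocabularies.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces; the URLs are placeholders and need not resolve.
DOC = """
<order xmlns="http://example.com/ns/orders">
  <customer xmlns="http://example.com/ns/customers">
    <name>Ada</name>
  </customer>
  <item>
    <name>Widget</name>
  </item>
</order>
"""

root = ET.fromstring(DOC)

# Prefixes used only for querying; they do not have to appear in the document.
NS = {
    "o": "http://example.com/ns/orders",
    "c": "http://example.com/ns/customers",
}

customer_name = root.find("c:customer/c:name", NS).text
item_name = root.find("o:item/o:name", NS).text
print(customer_name, item_name)
```

ElementTree stores each tag together with its namespace URL, which is why the same local name `name` can be told apart.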

What's missing, from the database user's and the programmer's perspective?
• No innate notion of a query language
• No objects
• Very limited data structuring capabilities
• Yet another impedance mismatch problem
• No way to store XML documents in a relational database, at least not natively
• No way to make a database out of a set of documents

So, in response to the database community's desires…
• A hierarchical query language: XPath
• A specification format for schemas: DTDs
  • But uses a different syntax
  • Does not accommodate namespaces

So, in response to the database community's desires, phase 2…
• XML Schema
  • More atomic or "basic" types
  • Like DTDs, but with an XML syntax
  • Supports namespaces
  • Adds primary keys and foreign keys
  • Adds more constructs for structuring data
• Simple types: primitive types, list and union, and restriction
• Attributes can be of simple types
• Complex types: compositors all (unordered), sequence (ordered), and choice
• Extension and restriction
• Integrity constraints

Query language 1: XPath
• Follows the hierarchy of XML documents
• Uses syntax borrowed from the Unix file system
  • / for root
  • . for current node
  • @ for value of an attribute
  • [1], [2], etc., for siblings
  • // for self or descendant
  • .//x to find all descendants that are elements of a specific type x
• Augmented with URLs to create XPointer
• Relational database systems generally have an XML data type now
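A few of these path forms can be tried with Python's standard library. This is a sketch under stated assumptions: the document and its names are invented, and `xml.etree.ElementTree` supports only a subset of XPath (relative paths, `//`, positional and attribute predicates), so attribute values are read with `get()` rather than selected with `@` as a final step.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; tags, attributes, and titles are made up.
DOC = """
<library>
  <shelf name="databases">
    <book id="b1"><title>Relational Basics</title></book>
    <book id="b2"><title>XML Storage</title></book>
  </shelf>
  <shelf name="web">
    <book id="b3"><title>Semantic Web Primer</title></book>
  </shelf>
</library>
"""

root = ET.fromstring(DOC)

# .//x : every descendant of the current node that is an element of type x
all_titles = [t.text for t in root.findall(".//title")]

# [1] : the first matching sibling under each parent (positions start at 1)
first_book_per_shelf = [b.get("id") for b in root.findall("./shelf/book[1]")]

# [@name='web'] : a predicate on the value of an attribute
web_titles = [t.text for t in root.findall("./shelf[@name='web']/book/title")]

print(all_titles)
print(first_book_per_shelf)
print(web_titles)
```

A full XPath engine would also accept absolute paths starting at `/` and expressions ending in `@id`. Those are left out here because ElementTree does not evaluate them on an element.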

Distributed databases and distributed transactions: homogeneous and heterogeneous
• See page 689: multiple DBs vs. a distributed DB
• Homogeneous distributed DBs
  • Single unified schema
  • Designed top down
  • Distribution by row, by column, by table, or by table selection
• Issues of distribution
  • Redundancy: availability vs. keeping copies up to date
  • Hidden joins with column distribution
  • Hidden unions with table selection distribution
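The hidden joins and hidden unions can be made concrete with a toy table. The sketch below is an illustration, not a description of any particular system. The `EMPLOYEES` table, its columns, and all function names are hypothetical. It shows that a query over the global table must union selection-based fragments and join column-based fragments on the key.

```python
# Hypothetical global table; names and values are invented for illustration.
EMPLOYEES = [
    {"id": 1, "name": "Ada", "city": "Boulder", "salary": 100},
    {"id": 2, "name": "Bob", "city": "Denver", "salary": 90},
    {"id": 3, "name": "Cy", "city": "Boulder", "salary": 80},
]


def split_by_selection(rows, predicate):
    """Selection-based distribution: one site gets matching rows, the other the rest."""
    return ([r for r in rows if predicate(r)],
            [r for r in rows if not predicate(r)])


def split_by_column(rows, key, columns_a, columns_b):
    """Column-based distribution: each site keeps the key plus some columns."""
    site_a = [{c: r[c] for c in (key, *columns_a)} for r in rows]
    site_b = [{c: r[c] for c in (key, *columns_b)} for r in rows]
    return site_a, site_b


def hidden_union(key, *fragments):
    """Rebuild the table from selection-based fragments."""
    return sorted((r for frag in fragments for r in frag), key=lambda r: r[key])


def hidden_join(site_a, site_b, key):
    """Rebuild the table from column-based fragments by joining on the key."""
    index = {r[key]: r for r in site_b}
    return [{**r, **index[r[key]]} for r in site_a]


boulder, elsewhere = split_by_selection(EMPLOYEES, lambda r: r["city"] == "Boulder")
public, private = split_by_column(EMPLOYEES, "id", ("name", "city"), ("salary",))

print(hidden_union("id", boulder, elsewhere) == EMPLOYEES)
print(hidden_join(public, private, "id") == EMPLOYEES)
```

The user sees one table in both cases. The cost of the union or join is what the distribution design hides.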

Executing distributed transactions
• Each node has a master and a client module
  • Masters are all identical and contain distributed data info
  • Clients are like single-site databases with a prepare-to-commit step
• 3 basic strategies for query fragment execution
  • Bring data to procedure
  • Send procedure to data
  • Meet in a third place
• Estimating costs
  • Data shipping
  • Result shipping
  • Wait times on nodes
  • Integrity constraint enforcement
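The prepare-to-commit step is characteristic of a two-phase commit. The slides do not name the protocol, so the sketch below assumes it. Class and function names are hypothetical, and failure handling (timeouts, logging, recovery) is deliberately left out.

```python
class Participant:
    """A single-site database that can vote on whether it is able to commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # stand-in for local checks succeeding
        self.state = "active"

    def prepare(self):
        """Phase 1: vote yes (and promise to commit) or vote no."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def run_transaction(participants):
    """Coordinator: commit everywhere only if every site voted yes."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:  # phase 2: global commit
            p.commit()
        return "committed"
    for p in participants:  # phase 2: global abort
        p.abort()
    return "aborted"


ok_sites = [Participant("site1"), Participant("site2")]
mixed_sites = [Participant("site1"), Participant("site2", can_commit=False)]

print(run_transaction(ok_sites))
print(run_transaction(mixed_sites))
```

A single "no" vote aborts the transaction at every site, which is how atomicity is kept across nodes.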

Heterogeneous distributed databases
• Forms of heterogeneity
  • Model
  • Schema
  • Database product
  • Namespace
  • Table structure (implications for object identities)
  • Keys and foreign keys
  • Units
  • SQL dialect
• Semantic issues relating to varying interpretations of data

Integrating heterogeneous databases
• After the fact
• Stability is never achieved
• Mappings are complex
• Data may have conflicts, redundancy, and gaps
• Closed world vs. open world

Engineering for nonstop change
• Mediators around databases
• Gateways connecting old apps and new databases
• Gateways connecting new apps and old databases
• A stability of instability
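A mediator can be pictured as a thin layer that maps each source into one shared shape. The sketch below is a toy illustration of that idea. Both sources, their column names, the key prefixes, and the target schema are hypothetical. It covers three of the heterogeneity forms listed earlier: table structure, keys, and units.

```python
# Two hypothetical sources describing the same kind of entity differently.
SOURCE_A = [{"emp_id": 1, "name": "Ada", "height_cm": 170.0}]
SOURCE_B = [{"id": "E-2", "full_name": "Bob", "height_in": 70.0}]

CM_PER_INCH = 2.54


def from_a(row):
    """Map a SOURCE_A row to the mediated schema (already in centimetres)."""
    return {"key": f"A:{row['emp_id']}",
            "name": row["name"],
            "height_cm": row["height_cm"]}


def from_b(row):
    """Map a SOURCE_B row: rename columns, prefix the key, convert units."""
    return {"key": f"B:{row['id']}",
            "name": row["full_name"],
            "height_cm": round(row["height_in"] * CM_PER_INCH, 1)}


def mediated_view():
    """Present both sources as one uniform list without changing either source."""
    return [from_a(r) for r in SOURCE_A] + [from_b(r) for r in SOURCE_B]


view = mediated_view()
for row in view:
    print(row)
```

Prefixing keys with the source name sidesteps clashes between independently assigned identifiers, at the price of not recognising when two sources describe the same real-world entity.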

OLAP
• Standard model
  • N dimension tables
  • 1 fact table (PK is the union of the keys of the dimension tables)
  • Hypercube visualization
  • Multidimensional table result visualizations
  • Star and constellation schemas
• Terminology
  • Drilling down: stepping down nested attributes
  • Rolling up: moving up nested attributes
  • Pivot: group by
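The standard model and the three terms can be tried on a tiny star schema. This is a minimal sketch using Python's built-in `sqlite3`. The tables (`product`, `store`, `sales`), their columns, and the data are all invented, and the nested attributes are assumed to be city within region.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two dimension tables (hypothetical).
CREATE TABLE product (product_id INTEGER PRIMARY KEY, category TEXT, name TEXT);
CREATE TABLE store   (store_id   INTEGER PRIMARY KEY, region TEXT, city TEXT);
-- One fact table; its key is the union of the dimension keys.
CREATE TABLE sales (
    product_id INTEGER REFERENCES product(product_id),
    store_id   INTEGER REFERENCES store(store_id),
    amount     INTEGER,
    PRIMARY KEY (product_id, store_id)
);
INSERT INTO product VALUES (1,'books','SQL Guide'), (2,'books','XML Guide'),
                           (3,'music','Jazz CD');
INSERT INTO store VALUES (1,'west','Denver'), (2,'west','Boulder'),
                         (3,'east','Boston');
INSERT INTO sales VALUES (1,1,10), (1,3,5), (2,2,7), (3,1,4), (3,3,6);
""")

# Rolled up: totals per region (the coarser attribute).
by_region = conn.execute("""
    SELECT s.region, SUM(f.amount)
    FROM sales f JOIN store s ON s.store_id = f.store_id
    GROUP BY s.region ORDER BY s.region
""").fetchall()

# Drilled down: the same totals split by the nested attribute city.
by_city = conn.execute("""
    SELECT s.region, s.city, SUM(f.amount)
    FROM sales f JOIN store s ON s.store_id = f.store_id
    GROUP BY s.region, s.city ORDER BY s.region, s.city
""").fetchall()

# Pivot: group by one attribute from each of two dimensions.
by_category_region = conn.execute("""
    SELECT p.category, s.region, SUM(f.amount)
    FROM sales f
    JOIN product p ON p.product_id = f.product_id
    JOIN store   s ON s.store_id   = f.store_id
    GROUP BY p.category, s.region ORDER BY p.category, s.region
""").fetchall()

print(by_region)
print(by_city)
print(by_category_region)
```

Drilling down and rolling up only change the `GROUP BY` list. The fact table and its joins stay the same.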

Specialized operators
• Cube operator and 4 equivalent queries
• Viewing results
  • See page 722
  • Equivalent: see page 723
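One common reading of "4 equivalent queries", assumed here, is that a cube over two dimensions yields the same rows as the union of the 2^2 = 4 group-by queries, one per subset of the dimensions. The sketch below computes it that way in plain Python. The fact rows, the dimension names, and the `ALL` marker are illustrative choices.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows with two dimensions and one measure.
FACTS = [
    {"category": "books", "region": "west", "amount": 10},
    {"category": "books", "region": "west", "amount": 7},
    {"category": "books", "region": "east", "amount": 5},
    {"category": "music", "region": "west", "amount": 4},
    {"category": "music", "region": "east", "amount": 6},
]


def group_by(rows, dims, measure):
    """Sum the measure for each distinct combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


def cube(rows, dims, measure):
    """Union of the group-bys over every subset of dims (2**len(dims) queries)."""
    result = {}
    for size in range(len(dims), -1, -1):
        for subset in combinations(dims, size):
            for key, total in group_by(rows, subset, measure).items():
                kept = dict(zip(subset, key))
                # "ALL" marks a dimension that has been aggregated away.
                result[tuple(kept.get(d, "ALL") for d in dims)] = total
    return result


sales_cube = cube(FACTS, ("category", "region"), "amount")
for cell, total in sales_cube.items():
    print(cell, total)
```

With two dimensions the subsets are (category, region), (category), (region), and the empty set, which gives the four queries and the grand total.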

Populating the warehouse
• Transformation
• Integration
• Cleaning

Data mining
• Effectively an open world application
• Association, classification, clustering: page 730
• Association, confidence and support: page 731
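Support and confidence follow their usual textbook definitions, assumed here. Support is the fraction of transactions containing an itemset, and confidence of a rule X → Y is support(X ∪ Y) divided by support(X). The baskets below are invented data.

```python
# Hypothetical market baskets.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(lhs, rhs, baskets):
    """Of the baskets containing lhs, the fraction that also contain rhs."""
    return support(set(lhs) | set(rhs), baskets) / support(lhs, baskets)


bread_milk_support = support({"bread", "milk"}, BASKETS)
bread_to_milk = confidence({"bread"}, {"milk"}, BASKETS)

print(bread_milk_support)
print(bread_to_milk)
```

Two of the four baskets hold both bread and milk, and two of the three bread baskets also hold milk, so the rule bread → milk has support 1/2 and confidence 2/3.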