M.P. Johnson, DBMS, Stern/NYU, Spring 2005. C20.0046: Database Management Systems, Lecture #25. M.P. Johnson, Stern School of Business, NYU, Spring 2005.



Agenda
- Querying XML
- Data warehousing
- Next week: data mining, web search, etc.

Goals after today
1. Be aware of some of the important XML standards
2. Know how to write some DW queries in Oracle

XML: Semi-structured data
- Not too random
  - Data organized into entities
  - Similar/related entities grouped to form other entities
- Not too structured
  - Some attributes may be missing
  - Size of attributes may vary
  - Support for lists/sets
- Juuust right
  - Data is self-describing

Example (XML tags lost in extraction; surviving values only): Lost in Translation, 2003; Hamlet, 1999; Bill Murray
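A minimal Python sketch of what "self-describing" and "some attributes may be missing" can look like in practice. The element names (movies, movie, title, year, star) and the nesting are assumptions made for illustration; the slide's own markup is not available.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the tag names and nesting are illustrative only.
DOC = """
<movies>
  <movie><title>Lost in Translation</title><year>2003</year>
         <star>Bill Murray</star></movie>
  <movie><title>Hamlet</title><year>1999</year></movie>
</movies>
"""

root = ET.fromstring(DOC)


def describe(movie):
    """Return whatever fields this movie element happens to carry."""
    return {
        "title": movie.findtext("title"),
        "year": movie.findtext("year"),
        # 'star' may be absent or repeated; semi-structured data allows both.
        "stars": [s.text for s in movie.findall("star")],
    }


movies = [describe(m) for m in root.findall("movie")]
```

The second movie simply has no star element; no NULL placeholder or schema change is needed, which is the contrast with a fixed relational row.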

New topic: Querying XML
- XPath
  - Simple protocol for accessing nodes
  - Will use in XQuery and in conversion from relations
- XQuery
  - SQL : relations :: XQuery : XML
- XSLT
  - Sophisticated transformations
  - Sometimes for presentation
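As a rough illustration of XPath-style node access, here is a sketch using Python's standard xml.etree.ElementTree, which supports only a limited subset of XPath. The sample document and its values are assumptions, not data from the lecture.

```python
import xml.etree.ElementTree as ET

# Hypothetical bibliography; titles and publishers are made up.
BIB = ET.fromstring("""
<bib>
  <book><title>abc</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>
  <book><title>def</title><publisher>Other Press</publisher><year>1994</year></book>
</bib>
""")

# Path step: every title under every book child of the root.
all_titles = [t.text for t in BIB.findall("./book/title")]

# Predicate: only books whose publisher child has exactly this text.
mk_titles = [
    t.text for t in BIB.findall("./book[publisher='Morgan Kaufmann']/title")
]
```

A full XPath engine offers more (axes, functions, numeric comparisons); the subset here is enough to show path steps and a simple predicate.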

XQuery
- Queries are FLWR expressions: FOR/LET ... WHERE ... RETURN ...
- Based on Quilt and XML-QL

    FOR $b IN document("bib.xml")//book
    WHERE $b/publisher = "Morgan Kaufmann"
      AND $b/year = "1998"
    RETURN $b/title

XQuery
Find all book titles published after 1995:

    FOR $x IN document("bib.xml")/bib/book
    WHERE $x/year > 1995
    RETURN { $x/title }

Result: abc def ghi
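One way to read a FLWR expression is as a comprehension: FOR binds a variable to each node, WHERE filters, RETURN builds the output. The Python sketch below mirrors the query above under that reading; the sample bib data is an assumption, and Python stands in for a real XQuery processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents for bib.xml.
BIB = ET.fromstring("""<bib>
  <book><title>abc</title><year>1996</year></book>
  <book><title>old</title><year>1990</year></book>
  <book><title>def</title><year>2001</year></book>
</bib>""")

titles_after_1995 = [
    x.findtext("title")                 # RETURN $x/title
    for x in BIB.findall("book")        # FOR $x IN .../bib/book
    if int(x.findtext("year")) > 1995   # WHERE $x/year > 1995
]
```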

SQL v. XQuery
Product(pid, name, maker)
Company(cid, name, city)
Find all products made in NYC.

SQL:

    SELECT x.name
    FROM Product x, Company y
    WHERE x.maker = y.cid
      and y.city = 'NYC'

XQuery:

    FOR $r in document("db.xml")/db,
        $x in $r/Product/row,
        $y in $r/Company/row
    WHERE $x/maker/text() = $y/cid/text()
      and $y/city/text() = "NYC"
    RETURN { $x/name }
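The two formulations can be compared side by side in a small sketch. It uses Python's sqlite3 module in place of Oracle and nested loops over an ElementTree in place of XQuery; the sample rows are invented, and the db/Table/row/column layout follows the paths used in the query above.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company(cid INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE Product(pid INTEGER PRIMARY KEY, name TEXT, maker INTEGER);
    INSERT INTO Company VALUES (1, 'Acme', 'NYC'), (2, 'Globex', 'Boston');
    INSERT INTO Product VALUES (10, 'widget', 1), (11, 'gadget', 2), (12, 'gizmo', 1);
""")

# Relational side: an ordinary join.
sql_names = [row[0] for row in conn.execute(
    "SELECT x.name FROM Product x, Company y "
    "WHERE x.maker = y.cid AND y.city = 'NYC' ORDER BY x.pid")]

# Export the same data as XML: <db><Table><row><column>value</column>...
db = ET.Element("db")
for table, cols in (("Company", ("cid", "name", "city")),
                    ("Product", ("pid", "name", "maker"))):
    t = ET.SubElement(db, table)
    for rec in conn.execute(f"SELECT * FROM {table} ORDER BY 1"):
        row = ET.SubElement(t, "row")
        for col, val in zip(cols, rec):
            ET.SubElement(row, col).text = str(val)

# XQuery side: two FOR variables and a WHERE that compares text values.
xml_names = [
    x.findtext("name")
    for x in db.findall("Product/row")
    for y in db.findall("Company/row")
    if x.findtext("maker") == y.findtext("cid") and y.findtext("city") == "NYC"
]
```

Both lists come out the same here, which is the point of the slide: the join condition moves from column equality to equality of text nodes.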

SQL v. XQuery
For each company with revenues < 1M, count the products over $100.

SQL:

    SELECT y.name, count(*)
    FROM Product x, Company y
    WHERE x.price > 100
      and x.maker = y.cid
      and y.revenue < [literal missing in source]
    GROUP BY y.cid, y.name

XQuery:

    FOR $r in document("db.xml")/db,
        $y in $r/Company/row[revenue/text() < [literal missing in source]]
    RETURN { $y/name/text() }
           { count($r/Product/row[maker/text()=$y/cid/text()][price/text()>100]) }
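The two queries above are not quite equivalent, and a small experiment makes the difference visible. This is a sketch using Python's sqlite3 with invented rows; the 1,000,000 threshold is an assumption taken from the "revenues < 1M" wording, since the literal itself is missing from the queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company(cid INTEGER PRIMARY KEY, name TEXT, revenue INTEGER);
    CREATE TABLE Product(pid INTEGER PRIMARY KEY, name TEXT, price REAL, maker INTEGER);
    INSERT INTO Company VALUES
        (1, 'Acme', 500000), (2, 'Globex', 5000000), (3, 'Initech', 900000);
    INSERT INTO Product VALUES
        (10, 'widget', 150, 1), (11, 'gadget', 250, 1), (12, 'gizmo', 50, 1),
        (13, 'doohickey', 300, 2);
""")

LIMIT = 1_000_000  # assumed from the prose "revenues < 1M"

# SQL style: join first, then group. Companies with no match vanish.
join_counts = conn.execute(
    "SELECT y.name, count(*) FROM Product x, Company y "
    "WHERE x.price > 100 AND x.maker = y.cid AND y.revenue < ? "
    "GROUP BY y.cid, y.name ORDER BY y.cid", (LIMIT,)).fetchall()

# XQuery style: iterate over companies, count matching products for each.
companies = conn.execute(
    "SELECT cid, name FROM Company WHERE revenue < ? ORDER BY cid",
    (LIMIT,)).fetchall()
per_company = [
    (name, conn.execute(
        "SELECT count(*) FROM Product WHERE maker = ? AND price > 100",
        (cid,)).fetchone()[0])
    for cid, name in companies
]
```

With this data the join-based query reports only Acme, while the per-company loop also reports Initech with a count of 0. Getting the zero rows in SQL would take an outer join.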

XSLT: XSL Transformations
- Converts XML docs to other XML docs
  - Or to HTML, PDF, etc.
- E.g.: have data in XML, want to display it to all users
  - Users view the web with IE, Firefox, Treo…
  - Have XSLT convert to HTML that looks good on each
  - XSLT processor takes an XML doc and an XSL template for the view

XSLT v. XQuery
- FLWR expressions: often much simpler than XSLT

XQuery:

    FOR $b IN document("bib.xml")//book
    WHERE $b/publisher = "Morgan Kaufmann"
      AND $b/year = "1998"
    RETURN $b/title

XSLT (only this fragment survives):

    <xsl:if test="publisher='Morgan Kaufmann' and year='1998'">

Displaying XML with XSL/XSLT
- XSL: style sheet language for XML
  - XSL : XML :: CSS : HTML
- Menu in XML, the XSL file for displaying it, and the XSL applied to the XML (links not preserved)
- More info on Java with XSLT and XPath (link not preserved)

From XML to relations (Oracle)
To move single values from XML to tables, can simply use extractvalue in UPDATE statements:

    SQL> UPDATE purchase_order
         SET order_nbr = 7101,
             customer_po_nbr = extractvalue(purchase_order_doc,
                 '/purchase_order/po_number'),
             customer_inception_date = to_date(extractvalue(purchase_order_doc,
                 '/purchase_order/po_date'), 'yyyy-mm-dd');
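For readers without an Oracle instance, the same idea can be sketched with Python's sqlite3. The extract_value function below is a hypothetical stand-in registered as a SQL function, not Oracle's extractvalue; the document contents (PO-42, 2005-04-20) are invented, and the date is left as text rather than converted.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase_order("
    "order_nbr INTEGER, customer_po_nbr TEXT, "
    "customer_inception_date TEXT, purchase_order_doc TEXT)")
conn.execute(
    "INSERT INTO purchase_order(purchase_order_doc) VALUES (?)",
    ("<purchase_order><po_number>PO-42</po_number>"
     "<po_date>2005-04-20</po_date></purchase_order>",))


def extract_value(doc, path):
    """Hypothetical helper: text of the node at an absolute path."""
    root = ET.fromstring(doc)
    # ElementTree paths are relative to the root element, so drop the
    # first step ('purchase_order') of the absolute path.
    steps = path.strip("/").split("/")[1:]
    return root.findtext("/".join(steps))


# Make the helper callable from SQL, with two arguments.
conn.create_function("extract_value", 2, extract_value)

conn.execute(
    "UPDATE purchase_order SET order_nbr = 7101, "
    "customer_po_nbr = extract_value(purchase_order_doc, "
    "'/purchase_order/po_number'), "
    "customer_inception_date = extract_value(purchase_order_doc, "
    "'/purchase_order/po_date')")

row = conn.execute(
    "SELECT order_nbr, customer_po_nbr, customer_inception_date "
    "FROM purchase_order").fetchone()
```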

From relations to XML (Oracle)
- Saw how to put XML in a table
- Conversely, can convert ordinary relational data to XML
  - XMLElement() generates an XML node

Can call the XMLElement function to wrap values in tags:

    SELECT XMLElement("supplier_id", s.supplier_id) ||
           XMLElement("name", s.name) xml_fragment
    FROM supplier s;

And can build it up recursively:

    SELECT XMLElement("supplier",
               XMLElement("supplier_id", s.supplier_id),
               XMLElement("name", s.name))
    FROM supplier s;
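The recursive construction can be imitated outside Oracle. In this sketch, xml_element is a hypothetical Python helper loosely modeled on the nesting behavior shown above, not Oracle's function; the supplier rows are invented and sqlite3 stands in for the database.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier(supplier_id INTEGER, name TEXT);
    INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Globex');
""")


def xml_element(tag, *children):
    """Build one element; scalar children become text, elements are nested."""
    node = ET.Element(tag)
    for child in children:
        if isinstance(child, ET.Element):
            node.append(child)
        else:
            node.text = (node.text or "") + str(child)
    return node


# One <supplier> document per row, built from nested calls.
suppliers = [
    ET.tostring(
        xml_element("supplier",
                    xml_element("supplier_id", sid),
                    xml_element("name", name)),
        encoding="unicode")
    for sid, name in conn.execute(
        "SELECT supplier_id, name FROM supplier ORDER BY supplier_id")
]
```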

Why XML matters
- Hugely popular
  - To the past few years what Java was to the mid-90s
  - Buzzword-compliant
- XML databases won't likely replace RDBMSs (remember OODBMSs?), but:
- Allows communication between DBMSs with disparate architectures, tools, languages, etc.
  - Basis for Web Services
- DBMS vendors are adding XML support
  - MS, Oracle, et al.

For more info
- APIs: SAX, JAXP
- Editors: XML Spy, MS XML Notepad
- Parsers: Saxon, Xalan, MS XML Parser
- Lectures drew on resources from a nine-week course on XML and the W3C XML Tutorial (links not preserved)

Recent XML news/etc.
- Group at Sun planning "binary XML"
  - XML is "simple and sloppy"
- RDF: Resource Description Framework
  - Metadata for the web
  - "Semantic web"
  - Content, authors, relations to other content
  - Web + XML = the "global mind"

New topic: Data Warehousing
- Physical warehouse: stores different kinds of items
  - Combined from different sources in the supply chain
  - Access items as a combined package
  - "Synergy"
- The DW is the system containing the data from many DBs
- OLAP is the system for easily querying the DW
  - Online analytical processing
  - Front-end to DW & stats

Integrating Data
- Ad hoc combination of DBs from different sources can be problematic
- Data may be spread across many systems
  - Geographically
  - By division
  - Different systems from before mergers…

Conversion/scrubbing/merging
Lots of issues…
- Different types of data: varchar(255) v. char(30)
- Different values for data: 'GREEN' / 'GR' / 2
- Semantic differences: Cars v. Automobiles
- Missing values: handle with nulls or XML

Federated DBs
- Situation: n different DBs must work together
- One idea: write programs for each DB to talk to every other one
  - How many programs required?
  - Like ambassadors for each country

Federated DBs
- Better idea: introduce another DB
  - Write programs for it to talk to each other DB
- Now how many programs?
  - English in business, French in diplomacy
- Warehousing
  - Refreshed nightly
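The slides leave the two "how many programs?" questions open. One common way to count, assuming each program translates in a single direction, gives n(n-1) programs for the pairwise design and 2n for the hub design; if one program handles both directions of a link, both figures halve. The sketch below tabulates this under that assumption.

```python
def pairwise_translators(n: int) -> int:
    """One one-way program per ordered pair of distinct DBs."""
    return n * (n - 1)


def hub_translators(n: int) -> int:
    """One program in each direction between the hub and each of n DBs."""
    return 2 * n


# (n, pairwise, hub) for a few sizes: quadratic versus linear growth.
table = [(n, pairwise_translators(n), hub_translators(n)) for n in (3, 5, 10, 50)]
```

For a warehouse that is only loaded from its sources, a single one-way program per source may be enough, which brings the hub count down to n.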

OLTP v. OLAP
- DWs usually not updated in real time
  - Data is usually not live
  - But we care about higher-level, longer-term patterns
  - For "knowledge workers"/decision-makers
- Live data is in the system used by OLTP
  - Online transaction processing
  - E.g., airline reservations
  - OLTP data loaded into the DW periodically, say nightly

Utilizing Data
- Situation: each time a manager has a hunch
  - -> requests custom reports
  - -> directs programmers to write/modify a SQL app to produce these results
  - On higher or lower levels, for different specifics
- Problem: too difficult/expensive/slow
  - Too great a time lag

EISs
- Could just write queries at the command prompt
- But decision makers aren't (all) SQL programmers
- Solution: create an executive information system
  - Provides a friendly front-end to common, important queries
  - Basically a simple DB front-end
  - Your project part 5
- GROUP BY queries are particularly applicable…

EISs v. OLAP
- Okay for a fixed set of queries
- But what if queries are open-ended?
- Q: What's driving sales in the Northeast?
  - What's the source cause?
  - Result from one query influences the next query tried
- OLAP systems are interactive:
  - Run query
  - Analyze results
  - Think of new query
  - Repeat

Star Schemas
- Popular schema for DW data
- One central DB surrounded by specific DBs
  - Center: fact table
  - Extremities: data tables
- Fields in the fact table are foreign keys to the data tables
- Normalization -> snowflake schema
  - May not be worthwhile…

Dates and star schemas
- OLAP behaves as though you had a Days table, with every possible row
  - Dates(day, week, month, year, DID)
  - (5, 27, 7, 2000)
- Can join on Days like any other table

Dates and star schemas
- E.g.: products x salesperson x region x date
  - Products sold by salespeople in regions on dates
- Regular dim tables:
  - Product(PID, name, color)
  - Emp(name, SSN, sal)
  - Region(name, RID)
- Fact table:
  - Sales(PID, DID, SSN, RID)
  - Interpret as a cube (cross product of all dimensions)
- Can have both data and stats
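A small runnable version of this star schema, sketched with Python's sqlite3 rather than Oracle. The table and column names follow the slide; the column types, the sample rows, and the choice to count fact rows (the slide's Sales table has no measure column) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables
    CREATE TABLE Product(PID INTEGER PRIMARY KEY, name TEXT, color TEXT);
    CREATE TABLE Emp(SSN TEXT PRIMARY KEY, name TEXT, sal REAL);
    CREATE TABLE Region(RID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Dates(DID INTEGER PRIMARY KEY,
                       day INTEGER, week INTEGER, month INTEGER, year INTEGER);
    -- Fact table: every column is a foreign key into a dimension table
    CREATE TABLE Sales(
        PID INTEGER REFERENCES Product(PID),
        DID INTEGER REFERENCES Dates(DID),
        SSN TEXT REFERENCES Emp(SSN),
        RID INTEGER REFERENCES Region(RID));

    INSERT INTO Product VALUES (1, 'pen', 'blue'), (2, 'cup', 'red');
    INSERT INTO Emp VALUES ('111', 'Ann', 50000), ('222', 'Bob', 45000);
    INSERT INTO Region VALUES (1, 'Northeast'), (2, 'West');
    INSERT INTO Dates VALUES (1, 5, 27, 7, 2000), (2, 6, 27, 7, 2000),
                             (3, 1, 31, 8, 2000);
    INSERT INTO Sales VALUES (1, 1, '111', 1), (2, 1, '111', 1),
                             (1, 2, '222', 2), (1, 3, '111', 1);
""")

# Join the fact table out to two dimensions and count sales per cell.
sales_by_region_month = conn.execute("""
    SELECT r.name, d.month, count(*)
    FROM Sales s
    JOIN Region r ON r.RID = s.RID
    JOIN Dates d ON d.DID = s.DID
    GROUP BY r.name, d.month
    ORDER BY r.name, d.month
""").fetchall()
```

Each result row is one cell of the region x month face of the cube; joining in Product or Emp as well would pick out other faces.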

Drill-down & roll-up
- Imagine: notice some region's sales way up
- Why? Good salesperson? Some popular product there?
- Maybe need to search by month, or by month and product, then abstract back up to just product…
- "Slicing & dicing"

OLAP and data warehousing
- Could write GROUP BY queries for each
- OLAP systems provide a simpler, non-SQL interface for this sort of thing
- Vendors: MicroStrategy, SAP, etc.
- On the other hand: DW-style operators have been added to SQL and some DBMSs…

DW extensions in SQL: ROLLUP (Oracle)
Examples derived from Mastering Oracle SQL, 2e (O'Reilly). (Link to the data not preserved.)

Suppose we have an orders table (from two years), with region and date info:

    SQL> column month format a10
    SQL> describe all_orders;

Can select total sales:

    SELECT sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id;

DW extensions in SQL: ROLLUP (Oracle)
Can write GROUP BY queries for year or region or both:

    SELECT r.name region, o.year, sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id
    GROUP BY (r.name, o.year);

DW extensions in SQL: ROLLUP (Oracle)
- ROLLUP operator
  - Extension of GROUP BY
  - Does GROUP BY on several levels, simultaneously
  - Order matters

Get sales totals for each region/year pair, each region, and the grand total:

    SELECT r.name region, o.year, sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id
    GROUP BY ROLLUP (r.name, o.year);
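To see which rows ROLLUP (r.name, o.year) adds, it can help to spell the levels out by hand. The sketch below uses Python's sqlite3, which has no ROLLUP operator, and emulates it with a UNION ALL of three ordinary aggregations. The table layouts are cut down to the columns the query needs and the figures are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region(region_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE all_orders(region_id INTEGER, year INTEGER, tot_sales INTEGER);
    INSERT INTO region VALUES (1, 'East'), (2, 'West');
    INSERT INTO all_orders VALUES
        (1, 2000, 100), (1, 2000, 25), (1, 2001, 150),
        (2, 2000, 200), (2, 2001, 50);
""")

BASE = "FROM all_orders o JOIN region r ON r.region_id = o.region_id"

# Level 1: (region, year) pairs. Level 2: per region. Level 3: grand total.
# Rolled-up columns are reported as NULL, as ROLLUP does.
rollup = conn.execute(f"""
    SELECT r.name, o.year, sum(o.tot_sales) {BASE} GROUP BY r.name, o.year
    UNION ALL
    SELECT r.name, NULL, sum(o.tot_sales) {BASE} GROUP BY r.name
    UNION ALL
    SELECT NULL, NULL, sum(o.tot_sales) {BASE}
""").fetchall()
```

There is no per-year level here: ROLLUP only drops columns from the right, which is why the order of the fields matters.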

DW extensions in SQL: ROLLUP (Oracle)
Change the order of the group fields to get a different sequence of groups.
To get totals for each year/region pair, each year, and the grand total, just reverse the group-by order:

    SELECT o.year, r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id
    GROUP BY ROLLUP (o.year, r.name);

DW extensions in SQL: ROLLUP (Oracle)
Adding more dimensions, like month, is easy (apart from formatting).
NB: summing happens on each level.

    SELECT o.year,
           to_char(to_date(o.month, 'MM'), 'Month') month,
           r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id
    GROUP BY ROLLUP (o.year, o.month, r.name);

DW extensions in SQL: ROLLUP (Oracle)
If desired, can combine fields for the sake of grouping:

    SELECT o.year,
           to_char(to_date(o.month, 'MM'), 'Month') month,
           r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id
    GROUP BY ROLLUP ((o.year, o.month), r.name);

DW extensions in SQL: CUBE (Oracle)
- Another GROUP BY extension: CUBE
  - Subtotals all possible combinations of group-by fields (powerset)
  - Syntax: "ROLLUP" -> "CUBE"
  - Order of fields doesn't matter (apart from ordering)

To get subtotals for each region/month pair, each region, each month, and the grand total:

    SELECT to_char(to_date(o.month, 'MM'), 'Month') month,
           r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id
    GROUP BY CUBE (o.month, r.name);
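The "powerset" description can be made concrete by running one GROUP BY per subset of the fields. This is a sketch in Python with sqlite3, which has no CUBE operator; the cube helper is hypothetical, the region name is stored directly in all_orders to keep the example short, and the numbers are invented.

```python
import sqlite3
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_orders(region TEXT, month INTEGER, tot_sales INTEGER);
    INSERT INTO all_orders VALUES
        ('East', 1, 100), ('East', 2, 150), ('West', 1, 200), ('West', 2, 50);
""")


def cube(conn, table, dims, measure):
    """Emulate GROUP BY CUBE(dims): one aggregation per subset of dims."""
    rows = []
    for k in range(len(dims), -1, -1):
        for subset in combinations(dims, k):
            # Columns outside the current subset are reported as NULL.
            select = ", ".join(
                d if d in subset else f"NULL AS {d}" for d in dims)
            group = " GROUP BY " + ", ".join(subset) if subset else ""
            rows += conn.execute(
                f"SELECT {select}, sum({measure}) FROM {table}{group}"
            ).fetchall()
    return rows


result = cube(conn, "all_orders", ("month", "region"), "tot_sales")
```

Two fields give four groupings: (month, region), (month), (region), and the empty grouping for the grand total. With this data that is 4 + 2 + 2 + 1 = 9 rows.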

DW extensions in SQL: CUBE (Oracle)
Again, can easily add more dimensions:

    SELECT o.year,
           to_char(to_date(o.month, 'MM'), 'Month') month,
           r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id
    GROUP BY CUBE (o.year, o.month, r.name);

DW SQL exts: GROUPING SETS (Oracle)
That's a lot of rows.
Instead of a cube of all combinations, maybe we just want the totals for each individual field:

    SELECT o.year,
           to_char(to_date(o.month, 'MM'), 'Month') month,
           r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id
    GROUP BY GROUPING SETS (o.year, o.month, r.name);
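One way to compare the three operators is to list the groupings each one performs for the same fields. The helper names in this Python sketch are made up for illustration; it enumerates groupings only and does no aggregation.

```python
from itertools import combinations


def rollup_sets(cols):
    """ROLLUP (a, b, c): every prefix of the column list, longest first."""
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]


def cube_sets(cols):
    """CUBE (a, b, c): every subset of the columns (the powerset)."""
    return [s for k in range(len(cols), -1, -1) for s in combinations(cols, k)]


def single_sets(cols):
    """GROUPING SETS (a, b, c): one grouping per individual column."""
    return [(c,) for c in cols]


cols = ("year", "month", "region")
counts = {f.__name__: len(f(cols)) for f in (rollup_sets, cube_sets, single_sets)}
```

For three fields this gives 4 groupings for ROLLUP, 8 for CUBE, and 3 for the GROUPING SETS query above, which is why the last produces far fewer rows.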

Next time
- Overview of data mining
- Some other odds & ends…