1
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
C20.0046: Database Management Systems
Lecture #25
M.P. Johnson
Stern School of Business, NYU
Spring 2005
2
Agenda
  Querying XML
  Data Warehousing
  Next week:
    Data Mining
    Websearch
    Etc.
3
Goals after today:
  1. Be aware of some of the important XML standards
  2. Know how to write some DW queries in Oracle
4
XML: Semi-structured data
  Not too random
    Data organized into entities
    Similar/related entities grouped to form other entities
  Not too structured
    Some attributes may be missing
    Size of attributes may vary
    Support of lists/sets
  Juuust Right
    Data is self-describing
5
Lost in Translation 2003
Hamlet 1999
Bill Murray
6
New topic: Querying XML
  XPath
    Simple protocol for accessing nodes
    Will use in XQuery and conversion from relations
  XQuery
    SQL : relations :: XQuery : XML
  XSLT
    Sophisticated transformations
    Sometimes for presentation
7
XQuery
  Queries are FLWR expressions
  Based on Quilt and XML-QL

    FOR $b IN document("bib.xml")//book
    WHERE $b/publisher = "Morgan Kaufmann"
      AND $b/year = "1998"
    RETURN $b/title

  FOR/LET... WHERE... RETURN...
8
XQuery
  Find all book titles published after 1995:

    FOR $x IN document("bib.xml")/bib/book
    WHERE $x/year > 1995
    RETURN { $x/title }

  Result: abc def ghi
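As a rough analogue of the FLWR query above, the same filter can be sketched in Python with the standard library's xml.etree.ElementTree. The inline document, its titles, and its years are invented for illustration; only the /bib/book structure and the year > 1995 condition come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml; titles and years are invented.
BIB_XML = """
<bib>
  <book><title>abc</title><year>1996</year></book>
  <book><title>old</title><year>1990</year></book>
  <book><title>def</title><year>1998</year></book>
</bib>
"""

root = ET.fromstring(BIB_XML)  # root is the <bib> element

# FOR $x IN /bib/book   WHERE $x/year > 1995   RETURN $x/title
titles = [
    book.findtext("title")
    for book in root.findall("book")
    if int(book.findtext("year")) > 1995
]
print(titles)
```

The list comprehension plays the role of FOR (iteration), WHERE (the `if` filter), and RETURN (the expression producing each result).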
9
SQL v. XQuery
  Product(pid, name, maker)
  Company(cid, name, city)
  Find all products made in NYC

  SQL:
    SELECT x.name
    FROM Product x, Company y
    WHERE x.maker = y.cid AND y.city = 'NYC'

  XQuery:
    FOR $r in document("db.xml")/db,
        $x in $r/Product/row,
        $y in $r/Company/row
    WHERE $x/maker/text() = $y/cid/text()
      and $y/city/text() = "NYC"
    RETURN { $x/name }
10
SQL v. XQuery
  For each company with revenues < 1M, count the products over $100

  SQL:
    SELECT y.name, count(*)
    FROM Product x, Company y
    WHERE x.price > 100 AND x.maker = y.cid AND y.revenue < 1000000
    GROUP BY y.cid, y.name

  XQuery:
    FOR $r in document("db.xml")/db,
        $y in $r/Company/row[revenue/text() < 1000000]
    RETURN { $y/name/text() }
           { count($r/Product/row[maker/text() = $y/cid/text()][price/text() > 100]) }
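The two queries above are not quite equivalent: the SQL join drops companies that have no product over $100, while the XQuery visits every qualifying company and would report a count of 0 for it. A minimal sketch of that difference, using Python's sqlite3 module in place of Oracle; the price and revenue columns are taken from the queries, and all sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company(cid INTEGER PRIMARY KEY, name TEXT, revenue INTEGER);
CREATE TABLE Product(pid INTEGER PRIMARY KEY, name TEXT, maker INTEGER, price REAL);
-- Invented sample data.
INSERT INTO Company VALUES (1, 'Acme', 500000), (2, 'BigCo', 5000000), (3, 'Tiny', 100000);
INSERT INTO Product VALUES (10, 'gizmo', 1, 150), (11, 'widget', 1, 250),
                           (12, 'gadget', 1, 50), (13, 'yacht', 2, 900000),
                           (14, 'pin', 3, 1);
""")

# The slide's SQL: a join followed by GROUP BY.
join_rows = conn.execute("""
SELECT y.name, count(*)
FROM Product x, Company y
WHERE x.price > 100 AND x.maker = y.cid AND y.revenue < 1000000
GROUP BY y.cid, y.name
""").fetchall()

# Closer to the XQuery shape: one row per qualifying company, counted by a subquery.
nested_rows = conn.execute("""
SELECT y.name,
       (SELECT count(*) FROM Product x WHERE x.maker = y.cid AND x.price > 100)
FROM Company y
WHERE y.revenue < 1000000
ORDER BY y.cid
""").fetchall()

print(join_rows)
print(nested_rows)
```

In this sample data, 'Tiny' has revenue under 1M but no product over $100, so it appears only in the second result.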
11
XSLT: XSL Transformations
  Converts XML docs to other XML docs
    Or to HTML, PDF, etc.
  E.g.: Have data in XML, want to display to all users
    Users view web with IE, Firefox, Treo…
    Have XSLT convert to HTML that looks good on each
  XSLT processor takes XML doc and XSL template for view
12
XSLT v. XQuery
  FLWR expressions: often much simpler than XSLT
  XSLT v. XQuery: http://www.xmlportfolio.com/xquery.html

  XQuery:
    FOR $b IN document("bib.xml")//book
    WHERE $b/publisher = "Morgan Kaufmann"
      AND $b/year = "1998"
    RETURN $b/title

  XSLT:
    <xsl:if test="publisher='Morgan Kaufmann' and year='1998'">
13
Displaying XML with XSL/XSLT
  XSL: style sheet language for XML
    XSL : XML :: CSS : HTML
  Menu in XML: http://www.w3schools.com/xml/simple.xml
  XSL file for displaying it: http://www.w3schools.com/xml/simple.xsl
  XSL applied to the XML: http://www.w3schools.com/xml/simplexsl.xml
  More info on Java with XSLT and XPath: http://java.sun.com/webservices/docs/ea2/tutorial/doc/JAXPXSLT2.html
14
From XML to relations (Oracle)
  To move single values from XML to tables, can simply use extractvalue in UPDATE statements:

    SQL> UPDATE purchase_order
         SET order_nbr = 7101,
             customer_po_nbr = extractvalue(purchase_order_doc,
                                 '/purchase_order/po_number'),
             customer_inception_date = to_date(extractvalue(purchase_order_doc,
                                 '/purchase_order/po_date'), 'yyyy-mm-dd');
15
From relations to XML (Oracle)
  Saw how to put XML in a table
  Conversely, can convert ordinary relational data to XML
    XMLElement() generates an XML node
  Now can call the XMLElement function to wrap values in tags:

    SELECT XMLElement("supplier_id", s.supplier_id) ||
           XMLElement("name", s.name) xml_fragment
    FROM supplier s;

  And can build it up recursively:

    SELECT XMLElement("supplier",
             XMLElement("supplier_id", s.supplier_id),
             XMLElement("name", s.name))
    FROM supplier s;
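The nested XMLElement call above builds a parent node around child nodes. The same idea can be sketched outside the database with Python's xml.etree.ElementTree; this is only an analogue of the Oracle function, and the supplier rows here are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical supplier rows: (supplier_id, name).
suppliers = [(1, "Acme Industries"), (2, "Tilton Enterprises")]

def supplier_to_xml(supplier_id, name):
    """Nest child elements under a <supplier> parent, like nested XMLElement calls."""
    parent = ET.Element("supplier")
    ET.SubElement(parent, "supplier_id").text = str(supplier_id)
    ET.SubElement(parent, "name").text = name
    return ET.tostring(parent, encoding="unicode")

fragments = [supplier_to_xml(sid, name) for sid, name in suppliers]
for fragment in fragments:
    print(fragment)
```

Each relational row becomes one XML fragment, with the column names serving as tag names.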
16
Why XML matters
  Hugely popular
    To the past few years what Java was to the mid-90s
    Buzzword-compliant
  XML databases won’t likely replace RDBMSs (remember OODBMSs?), but:
  Allows communication between DBMSs with disparate architectures, tools, languages, etc.
    Basis for Web Services
  DBMS vendors are adding XML support
    MS, Oracle, et al.
17
For more info
  APIs: SAX, JAXP
  Editors: XML Spy, MS XML Notepad: http://www.webattack.com/get/xmlnotepad.shtml
  Parsers: Saxon, Xalan, MS XML Parser
  Lectures drew on resources from:
    Nine-week course on XML: http://www.cs.rpi.edu/~puninj/XMLJ/classes.html
    W3Schools XML Tutorial: http://www.w3schools.com/xml/default.asp
    http://www.cs.cornell.edu/courses/cs433/2001fa/Slides/Xml,%20XPath,%20&%20Xslt.ppt
18
Recent XML news/etc.
  Group at Sun planning “binary XML”
    http://developers.slashdot.org/article.pl?sid=05/01/14/1650206&tid=156
  XML is “simple and sloppy”
    http://www.adambosworth.net/archives/000031.html
  RDF: Resource Description Framework
    Metadata for the web
    “Semantic web”
    Content, authors, relations to other content
    http://www.w3.org/DesignIssues/RDFnot.html
  Web + XML = the “global mind”
    http://novaspivack.typepad.com/nova_spivacks_weblog/2004/06/minding_the_pla.html
19
New topic: Data Warehousing
  Physical warehouse: stores different kinds of items
    combined from different sources in supply chain
    access items as a combined package
    “Synergy”
  DW is the system containing the data from many DBs
  OLAP is the system for easily querying the DW
    Online analytical processing
    front-end to DW & stats
20
Integrating Data
  Ad hoc combination of DBs from different sources can be problematic
  Data may be spread across many systems
    geographically
    by division
    different systems from before mergers…
21
Conversion/scrubbing/merging
  Lots of issues…
  Different types of data
    Varchar(255) v. char(30)
  Different values for data
    'GREEN' / 'GR' / 2
  Semantic differences
    Cars v. Automobiles
  Missing values
    Handle with nulls or XML
22
Federated DBs
  Situation: n different DBs must work together
  One idea: write programs for each DB to talk to every other one
    How many programs required?
    Like ambassadors for each country
23
Federated DBs
  Better idea: introduce another DB
    write programs for it to talk to each other DB
    Now how many programs?
    English in business, French in diplomacy
  Warehousing
    Refreshed nightly
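The two "how many programs?" questions can be answered with a quick count. The sketch below assumes each program translates in one direction only (from one DB to another); if a single program handled both directions, both counts would halve, but the quadratic versus linear gap would remain.

```python
def pairwise_programs(n: int) -> int:
    """Each of the n DBs needs its own program for each of the other n - 1."""
    return n * (n - 1)

def hub_programs(n: int) -> int:
    """With one central DB: one program into the hub and one out of it per DB."""
    return 2 * n

for n in (2, 5, 10, 50):
    print(n, pairwise_programs(n), hub_programs(n))
```

For 10 databases that is 90 pairwise programs against 20 with a central hub, which is the case for a warehouse.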
24
OLTP v. OLAP
  DWs usually not updated in real-time
    data is usually not live
    but care about higher-level, longer-term patterns
  For “knowledge workers”/decision-makers
  Live data is in system used by OLTP
    online transaction processing
    E.g., airline reservations
  OLTP data loaded into DW periodically, say nightly
25
Utilizing Data
  Situation: each time a manager has a hunch:
    requests custom reports
    directs programmers to write/modify a SQL app to produce these results
      on higher or lower levels, for different specifics
  Problem: too difficult/expensive/slow
    too great a time lag
26
EISs
  Could just write queries at command-prompt
  But decision makers aren’t (all) SQL programmers
  Soln: create an executive information system
    provides friendly front-end to common, important queries
    basically a simple DB front-end
    your project part 5
  GROUP BY queries are particularly applicable…
27
EISs v. OLAP
  Okay for fixed set of queries
  But what if queries are open-ended?
  Q: What’s driving sales in the Northeast?
    What’s the source cause?
    Result from one query influences next query tried
  OLAP systems are interactive:
    run query
    analyze results
    think of new query
    repeat
28
Star Schemas
  Popular schema for DW data
  One central table surrounded by specific tables
    Center: fact table
    Extremities: dimension (data) tables
  Fields in fact table are foreign keys to dimension tables
  Normalizing the dimension tables gives a snowflake schema
    May not be worthwhile…
29
Dates and star schemas
  OLAP behaves as though you had a Days table, with every possible row
    Dates(day, week, month, year, DID)
    (5, 27, 7, 2000)
  Can join on Days like any other table
30
Dates and star schemas
  E.g.: products x salesperson x region x date
    Products sold by salespeople in regions on dates
  Regular dim tables:
    Product(PID, name, color)
    Emp(name, SSN, sal)
    Region(name, RID)
  Fact table:
    Sales(PID, DID, SSN, RID)
  Interpret as a cube (cross product of all dimensions)
    Can have both data and stats
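A small star schema along the lines of the tables above can be tried out with Python's sqlite3 module. This is only a sketch: the Emp dimension is left out for brevity, the Days table keeps just day, month, and year, and every row is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product(PID INTEGER PRIMARY KEY, name TEXT, color TEXT);
CREATE TABLE Region(RID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Days(DID INTEGER PRIMARY KEY, day INTEGER, month INTEGER, year INTEGER);
-- Fact table: every column is a foreign key into a dimension table.
CREATE TABLE Sales(
    PID INTEGER REFERENCES Product(PID),
    DID INTEGER REFERENCES Days(DID),
    RID INTEGER REFERENCES Region(RID)
);
-- Invented sample data.
INSERT INTO Product VALUES (1, 'gizmo', 'red'), (2, 'widget', 'blue');
INSERT INTO Region VALUES (1, 'Northeast'), (2, 'West');
INSERT INTO Days VALUES (1, 5, 7, 2000), (2, 6, 7, 2000), (3, 1, 8, 2000);
INSERT INTO Sales VALUES (1, 1, 1), (1, 2, 1), (2, 2, 1), (1, 3, 2);
""")

# Join the fact table to two dimensions and count sales per region and month.
sales_by_region_month = conn.execute("""
SELECT r.name, d.month, count(*)
FROM Sales s
JOIN Region r ON r.RID = s.RID
JOIN Days d ON d.DID = s.DID
GROUP BY r.name, d.month
ORDER BY r.name, d.month
""").fetchall()
print(sales_by_region_month)
```

Each query against the cube is a join from the fact table out to whichever dimensions the question involves, followed by a GROUP BY on the chosen dimension attributes.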
31
Drill-down & roll-up
  Imagine: notice some region’s sales way up
  Why? Good salesperson? Some popular product there?
  Maybe need to search by month, or month and product, abstract back up to just product…
  “slicing & dicing”
32
OLAP and data warehousing
  Could write GROUP BY queries for each
  OLAP systems provide simpler, non-SQL interface for this sort of thing
    Vendors: MicroStrategy, SAP, etc.
  On the other hand: DW-style operators have been added to SQL and some DBMSs…
33
DW extensions in SQL: ROLLUP (Oracle)
  Suppose have orders table (from two years), with region and date info:

    SQL> column month format a10
    SQL> @mosql2_data
    SQL> describe all_orders;

  Can select total sales:

    SELECT sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id;

  Examples derived from Mastering Oracle SQL, 2e (O’Reilly)
  Get data here: http://examples.oreilly.com/mastorasql2/mosql2_data.sql
34
DW extensions in SQL: ROLLUP (Oracle)
  Can write GROUP BY queries for year or region or both:

    SELECT r.name region, o.year, sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id
    GROUP BY (r.name, o.year);
35
DW extensions in SQL: ROLLUP (Oracle)
  ROLLUP operator
    Extension of GROUP BY
    Does GROUP BY on several levels, simultaneously
    Order matters
  Get sales totals for each region/year pair, each region, and the grand total:

    SELECT r.name region, o.year, sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id
    GROUP BY ROLLUP (r.name, o.year);
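To make the three levels concrete, here is a minimal Python sketch of what ROLLUP (region, year) computes. It is an emulation, not Oracle's implementation; the rows are invented, and None stands in for the NULL that Oracle reports in a rolled-up column.

```python
from collections import defaultdict

# Hypothetical (region, year, tot_sales) rows standing in for all_orders joined with region.
orders = [
    ("East", 2000, 100), ("East", 2001, 150),
    ("West", 2000, 200), ("West", 2001, 50), ("West", 2001, 25),
]

def rollup(rows):
    """Sum sales at three levels: (region, year), (region, None), (None, None)."""
    totals = defaultdict(int)
    for region, year, sales in rows:
        totals[(region, year)] += sales  # finest level: region/year pair
        totals[(region, None)] += sales  # subtotal per region
        totals[(None, None)] += sales    # grand total
    return dict(totals)

result = rollup(orders)
for key, total in result.items():
    print(key, total)
```

There is no (None, year) level here: ROLLUP only drops columns from the right, which is why the order of the fields matters.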
36
DW extensions in SQL: ROLLUP (Oracle)
  Change the order of the group fields to get a different sequence of groups
  To get totals for each year/region pair, each year, and the grand total, just reverse the group-by order:

    SELECT o.year, r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id
    GROUP BY ROLLUP (o.year, r.name);
37
DW extensions in SQL: ROLLUP (Oracle)
  Adding more dimensions, like month, is easy (apart from formatting):

    SELECT o.year, to_char(to_date(o.month, 'MM'),'Month') month,
           r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id
    GROUP BY ROLLUP (o.year, o.month, r.name);

  NB: summing happens on each level
38
DW extensions in SQL: ROLLUP (Oracle)
  If desired, can combine fields for the sake of grouping:

    SELECT o.year, to_char(to_date(o.month, 'MM'),'Month') month,
           r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id
    GROUP BY ROLLUP ((o.year, o.month), r.name);
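The effect of combining fields is easiest to see by listing which groupings each ROLLUP produces. The helper below is a hypothetical illustration (its name and the tuple convention for a combined field are not Oracle's): ROLLUP yields every prefix of its argument list, and a combined field is kept or dropped as one unit.

```python
def rollup_grouping_sets(columns):
    """Return the groupings ROLLUP produces: every prefix of `columns`,
    longest first, ending with the empty grouping (the grand total).
    A tuple inside `columns` is a combined field, kept or dropped as a unit."""
    sets = []
    for k in range(len(columns), -1, -1):
        flat = []
        for col in columns[:k]:
            flat.extend(col if isinstance(col, tuple) else (col,))
        sets.append(tuple(flat))
    return sets

plain = rollup_grouping_sets(["year", "month", "region"])
composite = rollup_grouping_sets([("year", "month"), "region"])
print(plain)
print(composite)
```

With the combined (year, month) field, the per-year subtotal level disappears, leaving only year/month/region, year/month, and the grand total.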
39
DW extensions in SQL: CUBE (Oracle)
  Another GROUP BY extension: CUBE
    Subtotals all possible combinations of group-by fields (powerset)
    Syntax: replace “ROLLUP” with “CUBE”
    Order of fields doesn’t matter (apart from the order of the output rows)
  To get subtotals for each region/month pair, each region, each month, and the grand total:

    SELECT to_char(to_date(o.month, 'MM'),'Month') month,
           r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id
    GROUP BY CUBE (o.month, r.name);
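The "powerset" remark can be checked by enumerating the groupings directly. This is a small sketch with a hypothetical helper name; it lists the groupings CUBE subtotals, not the totals themselves.

```python
from itertools import combinations

def cube_grouping_sets(columns):
    """All subsets of `columns` (the powerset), largest first."""
    return [
        subset
        for k in range(len(columns), -1, -1)
        for subset in combinations(columns, k)
    ]

sets = cube_grouping_sets(["month", "region"])
print(sets)
print(len(cube_grouping_sets(["year", "month", "region"])))
```

With n group-by fields CUBE produces 2^n groupings, against n + 1 for ROLLUP, so the output grows quickly as dimensions are added.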
40
DW extensions in SQL: CUBE (Oracle)
  Again, can easily add more dimensions:

    SELECT o.year, to_char(to_date(o.month, 'MM'),'Month') month,
           r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id
    GROUP BY CUBE (o.year, o.month, r.name);
41
DW SQL exts: GROUPING SETS (Oracle)
  That’s a lot of rows
  Instead of a cube of all combinations, maybe we just want the totals for each individual field:

    SELECT o.year, to_char(to_date(o.month, 'MM'),'Month') month,
           r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
      ON r.region_id = o.region_id
    GROUP BY GROUPING SETS (o.year, o.month, r.name);
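A rough Python emulation of what this GROUPING SETS query totals: one sum per distinct year, one per distinct month, and one per distinct region, with no cross-combinations and no grand total. The rows and the function name are invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (year, month, region, tot_sales).
orders = [
    (2000, 1, "East", 100), (2000, 2, "West", 200),
    (2001, 1, "East", 150), (2001, 2, "West", 75),
]
FIELDS = ("year", "month", "region")

def grouping_sets_single(rows):
    """One total per distinct value of each individual field."""
    totals = defaultdict(int)
    for *dims, sales in rows:
        for field, value in zip(FIELDS, dims):
            totals[(field, value)] += sales
    return dict(totals)

totals = grouping_sets_single(orders)
for key, total in totals.items():
    print(key, total)
```

The number of result rows is the sum of the distinct values per field, rather than their product as with CUBE.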
42
Next time
  Overview of data mining
  Some other odds & ends…