Presentation is loading. Please wait.

Presentation is loading. Please wait.

M.P. Johnson, DBMS, Stern/NYU, Spring 20051 C20.0046: Database Management Systems Lecture #25 M.P. Johnson Stern School of Business, NYU Spring, 2005.

Similar presentations


Presentation on theme: "M.P. Johnson, DBMS, Stern/NYU, Spring 20051 C20.0046: Database Management Systems Lecture #25 M.P. Johnson Stern School of Business, NYU Spring, 2005."— Presentation transcript:

1 M.P. Johnson, DBMS, Stern/NYU, Spring 20051 C20.0046: Database Management Systems Lecture #25 M.P. Johnson Stern School of Business, NYU Spring, 2005

2 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 2 Agenda Querying XML Data Warehousing Next week:  Data Mining  Websearch  Etc.

3 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 3 Goals after today: 1. Be aware of some of the important XML standards 2. Know how to write some DW queries in Oracle

4 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 4 XML: Semi-structured data Not too random  Data organized into entities  Similar/related grouped to form other entities Not too structured  Some attributes may be missing  Size of attributes may vary Support of lists/sets Juuust Right  Data is self-describing

5 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 5 Lost in Translation 2003 Hamlet 1999 Bill Murray Lost in Translation 2003 Hamlet 1999 Bill Murray

6 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 6 New topic: Querying XML XPath  Simple protocol for accessing node  Will use in XQuery and conversion from relations XQuery  SQL : relations :: XQuery : XML XSLT  sophisticated transformations  Sometimes for presentation

7 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 7 XQuery Queries are FLWR expressions  Based on Quilt and XML-QL FOR $b IN document("bib.xml")//book WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998" RETURN $b/title FOR/LET... WHERE... RETURN... FOR/LET... WHERE... RETURN...

8 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 8 XQuery Find all book titles published after 1995: FOR $x IN document("bib.xml")/bib/book WHERE $x/year > 1995 RETURN { $x/title } FOR $x IN document("bib.xml")/bib/book WHERE $x/year > 1995 RETURN { $x/title } Result: abc def ghi

9 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 9 SQL v. XQuery Product(pid, name, maker) Company(cid, name, city) Find all products made in NYC SELECT x.name FROM Product x, Company y WHERE x.maker=y.cid and y.city="NYC" SELECT x.name FROM Product x, Company y WHERE x.maker=y.cid and y.city="NYC" FOR $r in document("db.xml")/db, $x in $r/Product/row, $y in $r/Company/row WHERE $x/maker/text()=$y/cid/text() and $y/city/text() = "NYC" RETURN { $x/name } FOR $r in document("db.xml")/db, $x in $r/Product/row, $y in $r/Company/row WHERE $x/maker/text()=$y/cid/text() and $y/city/text() = "NYC" RETURN { $x/name } SQL XQuery

10 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 10 SQL v. XQuery For each company with revenues < 1M count the products over $100 SELECT y.name, count(*) FROM Product x, Company y WHERE x.price > 100 and x.maker=y.cid and y.revenue < 1000000 GROUP BY y.cid, y.name SELECT y.name, count(*) FROM Product x, Company y WHERE x.price > 100 and x.maker=y.cid and y.revenue < 1000000 GROUP BY y.cid, y.name FOR $r in document("db.xml")/db, $y in $r/Company/row[revenue/text()<1000000] RETURN { $y/name/text() } { count( $r/Product/row[maker/text()=$y/cid/text()][price/text()>100]) } FOR $r in document("db.xml")/db, $y in $r/Company/row[revenue/text()<1000000] RETURN { $y/name/text() } { count( $r/Product/row[maker/text()=$y/cid/text()][price/text()>100]) }

11 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 11 XSLT: XSL Transformations Converts XML docs to other XML docs  Or to HTML, PDF, etc. E.g.: Have data in XML, want to display to all users  Users view web with IE, Firefox, Treo…  Have XSLT convert to HTML that looks good on each  XSLT processor takes XML doc and XSL template for view

12 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 12 XSLT v. XQuery FLWR expressions:  Often much simpler than XSLT XSLT v. XQuery:  http://www.xmlportfolio.com/xquery.html http://www.xmlportfolio.com/xquery.html FOR $b IN document("bib.xml")//book WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998“ RETURN $b/title FOR $b IN document("bib.xml")//book WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998“ RETURN $b/title <xsl:if test="publisher='Morgan Kaufmann' and year='1998'"> <xsl:if test="publisher='Morgan Kaufmann' and year='1998'">

13 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 13 Displaying XML with XSL/XSLT XSL: style sheet language for XML  XSL : XML :: CSS : HTML Menu in XML:  http://www.w3schools.com/xml/simple.xml http://www.w3schools.com/xml/simple.xml XSL file for displaying it:  http://www.w3schools.com/xml/simple.xsl http://www.w3schools.com/xml/simple.xsl XSL applied to the XML:  http://www.w3schools.com/xml/simplexsl.xml http://www.w3schools.com/xml/simplexsl.xml More info on Java with XSLT and XPath:  http://java.sun.com/webservices/docs/ea2/tutorial/doc/JAXPXSLT2.html http://java.sun.com/webservices/docs/ea2/tutorial/doc/JAXPXSLT2.html

14 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 14 From XML to relations (Oracle) To move single values from XML to tables, can simply use extractvalue in UPDATE statements: SQL> UPDATE purchase_order SET order_nbr = 7101, customer_po_nbr = extractvalue(purchase_order_doc, '/purchase_order/po_number'), customer_inception_date = to_date(extractvalue(purchase_order_doc, '/purchase_order/po_date'), 'yyyy-mm-dd'); SQL> UPDATE purchase_order SET order_nbr = 7101, customer_po_nbr = extractvalue(purchase_order_doc, '/purchase_order/po_number'), customer_inception_date = to_date(extractvalue(purchase_order_doc, '/purchase_order/po_date'), 'yyyy-mm-dd');

15 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 15 From relations to XML (Oracle) Saw how to put XML in a table Conversely, can convert ordinary rel data to XML  XMLElement() generates an XML node Now can call XMLElement ftn to wrap vals in tags: And can build it up recursively: SELECT XMLElement("supplier_id", s.supplier_id) || XMLElement("name", s.name) xml_fragment FROM supplier s; SELECT XMLElement("supplier_id", s.supplier_id) || XMLElement("name", s.name) xml_fragment FROM supplier s; SELECT XMLElement("supplier", XMLElement("supplier_id", s.supplier_id), XMLElement("name", s.name)) FROM supplier s; SELECT XMLElement("supplier", XMLElement("supplier_id", s.supplier_id), XMLElement("name", s.name)) FROM supplier s;

16 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 16 Why XML matters Hugely popular  To past few years what Java was to mid-90s  Buzzword-compliant XML databases won’t likely replace RDBMSs (remember OODBMSs?), but: Allows for comm. between DBMSs disparate architectures, tools, languages, etc.  Basis for Web Services DBMS vendors are adding XML support  MS, Oracle, et al.

17 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 17 For more info APIs: SAX, JAXP Editors: XML Spy, MS XML Notepad: http://www.webattack.com/get/xmlnotepad.shtml http://www.webattack.com/get/xmlnotepad.shtml Parsers: Saxon, Xalan, MS XML Parser Lectures drew on resources from: Nine-week course on XML:  http://www.cs.rpi.edu/~puninj/XMLJ/classes.html http://www.cs.rpi.edu/~puninj/XMLJ/classes.html W3C XML Tutorial:  http://www.w3schools.com/xml/default.asp http://www.w3schools.com/xml/default.asp http://www.cs.cornell.edu/courses/cs433/2001fa/Slides/Xml,%20XPath,%20&%20Xslt.ppt

18 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 18 Recent XML news/etc. Group at Sun planning “binary XML”  http://developers.slashdot.org/article.pl?sid=05/01/14/1650206&tid=156 http://developers.slashdot.org/article.pl?sid=05/01/14/1650206&tid=156 XML is “simple and sloppy”  http://www.adambosworth.net/archives/000031.html http://www.adambosworth.net/archives/000031.html RDF: Resource Definition Framework  Metadata for the web  “Semantic web”  Content, authors, relations to other content  http://www.w3.org/DesignIssues/RDFnot.html http://www.w3.org/DesignIssues/RDFnot.html Web + XML = the “global mind”  http://novaspivack.typepad.com/nova_spivacks_weblog/2004/06/minding_the_pla.html http://novaspivack.typepad.com/nova_spivacks_weblog/2004/06/minding_the_pla.html

19 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 19 New topic: Data Warehousing Physical warehouse: stores different kinds of items  combined from different sources in supply chain  access items as a combined package  “Synergy” DW is the sys containing the data from many DBs OLAP is the system for easily querying the DW  Online analytical processing  front-end to DW & stats

20 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 20 Integrating Data Ad hoc combination of DBs from different sources can be problematic Data may be spread across many systems  geographically  by division  different systems from before mergers…

21 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 21 Conversion/scrubbing/merging Lots of issues…  different types of data Varchar(255) v. char(30)  Different values for data ‘GREEN’/’GR/’2  Semantic differences Cars v. Automobiles  Missing values Handle with nulls or XML

22 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 22 Federated DBs Situ: n different DBs must work together One idea: write programs for each to talk to each other one  How many programs required?  Like ambassadors for each country

23 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 23 Federated DBs Better idea: introduce another DB  write programs for it to talk to each other DB Now how many programs?  English in business, French in diplomacy  Warehousing  Refreshed nightly

24 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 24 OLTP v. OLAP DWs usually not updated in real-time  data is usually not live  but care about higher-level, longer-term patterns  For “knowledge workers”/decision-makers Live data is in system used by OLTP  online transaction processing  E.g., airline reservations  OLTP data loaded into DW periodically, say nightly

25 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 25 Utilizing Data Situ: each time manager has hunch   requests custom reports   direct programmers to write/modify SQL app to produce these results  on higher or lower levels, for different specifics Problem: too difficult/expensive/slow  too great a time lag

26 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 26 EISs Could just write queries at command-prompt But decision makes aren’t (all) SQL programmers Soln: create an executive information system  provides friendly front-end to common, important queries  basically a simple DB front-end  your project part 5 GROUP BY queries are particularly applicable…

27 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 27 EISs v. OLAP Okay for fixed set of queries But what if queries are open-ended? Q: What’s driving sales in the Northeast?  What’s the source cause?  Result from one query influences next query tried OLAP systems are interactive:  run query  analyze results  think of new query  repeat

28 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 28 Star Schemas Popular schema for DW data One central DB surrounded by specific DBs Center: fact table Extremities: data tables Fields in fact table are foreign keys to data tables Normalization  Snowflake Schema  May not be worthwhile…

29 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 29 Dates and star schemas OLAP behaves as though you had a Days table, with every possible row  Dates(day, week, month, year, DID)  (5, 27, 7, 2000) Can join on Days like any other table

30 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 30 Dates and star schemas E.g.: products x salesperson x region x date  Products sold by salespeople in regions on dates Regular dim tables:  Product(PID, name, color)  Emp(name, SSN, sal)  Region(name, RID) Fact table:  Sales(PID, DID, SSN, RID)  Interpret as a cube (cross product of all dimensions) Can have both data and stats

31 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 31 Drill-down & roll-up Imagine: notice some region’s sales way up Why? Good salesperson? Some popular product there? Maybe need to search by month, or month and product, abstract back up to just product… “slicing & dicing”

32 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 32 OLAP and data warehousing Could write GROUP BY queries for each OLAP systems provide simpler, non-SQL interface for this sort of thing Vendors: MicroStrategy, SAP, etc. Otoh: DW-style operators have been added to SQL and some DBMSs…

33 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 33 DW extensions in SQL: ROLLUP (Oracle) Suppose have orders table (from two years), with region and date info: Can select total sales: Examples derived/from Mastering Oracle SQL, 2e (O’Reilly) Get data here: http://examples.oreilly.com/mastorasql2/mosql2_data.sql http://examples.oreilly.com/mastorasql2/mosql2_data.sql SELECT sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id; SELECT sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id; SQL> column month format a10 SQL> @mosql2_data SQL> describe all_orders; SQL> column month format a10 SQL> @mosql2_data SQL> describe all_orders;

34 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 34 Can write GROUP BY queries for year or region or both: SELECT r.name region, o.year, sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id GROUP BY (r.name, o.year); SELECT r.name region, o.year, sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id GROUP BY (r.name, o.year); DW extensions in SQL: ROLLUP (Oracle)

35 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 35 ROLLUP operator  Extension of GROUP BY  Does GROUP BY on several levels, simultaneously  Order matters Get sales totals for each region/year pair each region, and the grand total: SELECT r.name region, o.year, sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id GROUP BY ROLLUP (r.name, o.year); SELECT r.name region, o.year, sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id GROUP BY ROLLUP (r.name, o.year); DW extensions in SQL: ROLLUP (Oracle)

36 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 36 Change the order of the group fields to get a different sequence of groups To get totals for each year/region pair, each year, and the grand total, and just reverse group-by order: SELECT o.year, r.name region, sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id GROUP BY ROLLUP (o.year, r.name); SELECT o.year, r.name region, sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id GROUP BY ROLLUP (o.year, r.name); DW extensions in SQL: ROLLUP (Oracle)

37 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 37 Adding more dimensions, like month, is easy (apart from formatting): NB: summing happens on each level SELECT o.year, to_char(to_date(o.month, 'MM'),'Month') month, r.name region, sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id GROUP BY ROLLUP (o.year, o.month, r.name); SELECT o.year, to_char(to_date(o.month, 'MM'),'Month') month, r.name region, sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id GROUP BY ROLLUP (o.year, o.month, r.name); DW extensions in SQL: ROLLUP (Oracle)

38 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 38 If desired, can combine fields for the sake of grouping: DW extensions in SQL: ROLLUP (Oracle) SELECT o.year, to_char(to_date(o.month, 'MM'),'Month') month, r.name region, sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id GROUP BY ROLLUP ((o.year, o.month), r.name); SELECT o.year, to_char(to_date(o.month, 'MM'),'Month') month, r.name region, sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id GROUP BY ROLLUP ((o.year, o.month), r.name);

39 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 39 DW extensions in SQL: CUBE (Oracle) Another GROUP BY extension: CUBE  Subtotals all possible combins of group-by fields (powerset)  Syntax: “ROLLUP”  “CUBE”  Order of fields doesn’t matter (apart from ordering) To get subtotals for each region/month pair, each region, each month, and the grand total: SELECT to_char(to_date(o.month, 'MM'),'Month') month, r.name region, sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id GROUP BY CUBE (o.month, r.name); SELECT to_char(to_date(o.month, 'MM'),'Month') month, r.name region, sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id GROUP BY CUBE (o.month, r.name);

40 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 40 DW extensions in SQL: CUBE (Oracle) Again, can easily add more dimensions: SELECT o.year, to_char(to_date(o.month, 'MM'),'Month') month, r.name region, sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id GROUP BY CUBE (o.year, o.month, r.name); SELECT o.year, to_char(to_date(o.month, 'MM'),'Month') month, r.name region, sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id GROUP BY CUBE (o.year, o.month, r.name);

41 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 41 DW SQL exts: GROUPING SETS (Oracle) That’s a lot of rows Instead of a cube of all combinations, maybe we just want the totals for each individual field: SELECT o.year, to_char(to_date(o.month, 'MM'),'Month') month, r.name region, sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id GROUP BY GROUPING SETS (o.year, o.month, r.name); SELECT o.year, to_char(to_date(o.month, 'MM'),'Month') month, r.name region, sum(o.tot_sales) FROM all_orders o join region r ON r.region_id = o.region_id GROUP BY GROUPING SETS (o.year, o.month, r.name);

42 M.P. Johnson, DBMS, Stern/NYU, Spring 2005 42 Next time Overview of data mining Some other odds & ends…


Download ppt "M.P. Johnson, DBMS, Stern/NYU, Spring 20051 C20.0046: Database Management Systems Lecture #25 M.P. Johnson Stern School of Business, NYU Spring, 2005."

Similar presentations


Ads by Google