M.P. Johnson, DBMS, Stern/NYU, Sp2004. C20.0046: Database Management Systems, Lecture #24. Matthew P. Johnson, Stern School of Business, NYU, Spring 2004.

1 C20.0046: Database Management Systems, Lecture #24
Matthew P. Johnson, Stern School of Business, NYU, Spring 2004

2 Agenda
Previously: XML
Next:
  - Finish XML & related technologies
  - Hardware
  - Indices
Hw3 up soon
1-minute responses
Grading

3 XML Applications/dialects
Copy from: http://pages.stern.nyu.edu/~mjohnson/dbms/eg/xml.txt
MathML: Mathematical Markup Language
  - http://wwwasdoc.web.cern.ch/wwwasdoc/WWW/publications/ictp99/ictp99N8059.html
ChemML: Chemical Markup Language
X4ML: XML for Merrill Lynch
XHTML: HTML retrofitted as an XML application
  - Validation: http://pages.stern.nyu.edu/~mjohnson/dbms/

4 XML Applications/dialects
VoiceXML:
  - http://newmedia.purchase.edu/~Jeanine/interfaces/rps.xml
  - AT&T Directory Assistance
  - http://phone.yahoo.com/
Image from http://www.voicexml.org/tutorials/intro2.html

5 More XML Apps
FIXML
  - XML equivalent of FIX: Financial Information eXchange
swiftML
  - XML equivalent of SWIFT: Society for Worldwide Interbank Financial Telecommunication message format
Apache’s Ant
  - Scripting language for Java build management
  - http://ant.apache.org/manual/using.html
Many more:
  - http://www-106.ibm.com/developerworks/xml/library/x-stand4/

6 More XML Applications/Protocols
RSS: Rich Site Summary/Really Simple Syndication
  - http://slate.msn.com/rss/
  - http://slashdot.org/index.rss
  - Screenshot: http://paulboutin.weblogger.com/pictures/viewer$673
  - More info: http://slate.msn.com/id/2096660/
[Example feed; markup not preserved. Surviving text: my channel, story 1, …, // other items]

7 More XML Applications/Protocols
SOAP: Simple Object Access Protocol
  - XML-based messaging format
  - Used by Google API: http://www.google.com/apis/
  - Amazon API: http://amazon.com/gp/aws/landing.html
  - Amazon light: http://kokogiak.com/amazon/
  - Other examples: http://www.wired.com/wired/archive/12.03/google.html?pg=10&topic=&topic_set=
SOAP envelope with header and body
  - Request sales tax for total
[Example envelope; inner markup not preserved. Surviving text: <SOAP:Envelope xmlns:SOAP="urn:schemas-xmlsoap-org:soap.v1"> 100]

8 More XML Applications/Protocols
[Example request; markup not preserved. Surviving values: %(key)s 0 10 true false]

9 RDF
RDF: Resource Description Framework
  - Describe info on web
  - Metadata for the web
  - Content, authors, relations to other content
  - “Semantic web”
See http://www.w3.org/DesignIssues/RDFnot.html

10 New topic: Querying XML
XPath
  - Simple protocol for accessing nodes
  - Won’t discuss
XQuery: SQL of XML
XSLT: sophisticated transformations

11 XQuery
XQuery: FLWR expressions
  - Based on Quilt and XML-QL
FOR/LET ... WHERE ... RETURN ...
  FOR $b IN document("bib.xml")//book
  WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998"
  RETURN $b/title
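The FOR / WHERE / RETURN shape of the query above can be mimicked in ordinary code. The following is only a rough Python sketch using the standard library's ElementTree, not an XQuery engine. The contents of BIB_XML are a made-up stand-in for bib.xml (the real file is not shown in the slides), and the function name is hypothetical. Unlike XQuery, which returns title elements, this sketch returns the title text.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml; the actual file is not part of the slides.
BIB_XML = """
<bib>
  <book><title>Book A</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>
  <book><title>Book B</title><publisher>Morgan Kaufmann</publisher><year>2001</year></book>
  <book><title>Book C</title><publisher>Other Press</publisher><year>1998</year></book>
</bib>
"""

def titles_by_publisher_and_year(xml_text, publisher, year):
    """Return the title text of every book matching publisher and year."""
    root = ET.fromstring(xml_text)
    result = []
    for b in root.iter("book"):                      # FOR $b IN ...//book
        if (b.findtext("publisher") == publisher     # WHERE $b/publisher = ...
                and b.findtext("year") == year):     #   AND $b/year = ...
            result.append(b.findtext("title"))       # RETURN $b/title
    return result

titles = titles_by_publisher_and_year(BIB_XML, "Morgan Kaufmann", "1998")
print(titles)
```

Each clause of the query maps onto one line of the loop, which is the sense in which XQuery plays the role of "SQL for XML".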

12 XQuery
Find all book titles published after 1995:
  FOR $x IN document("bib.xml")/bib/book
  WHERE $x/year > 1995
  RETURN { $x/title }
Result: abc def ghi

13 SQL and XQuery Side-by-side
Product(pid, name, maker)
Company(cid, name, city)
Find all products made in Seattle
SQL:
  SELECT x.name
  FROM Product x, Company y
  WHERE x.maker = y.cid AND y.city = 'Seattle'
XQuery:
  FOR $r in document("db.xml")/db,
      $x in $r/Product/row,
      $y in $r/Company/row
  WHERE $x/maker/text() = $y/cid/text() and $y/city/text() = "Seattle"
  RETURN { $x/name }
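To try the SQL side of the comparison, one option is an in-memory SQLite database. This is a minimal sketch: the schema follows Product(pid, name, maker) and Company(cid, name, city) from the slide, but the sample rows are invented for illustration, and the ORDER BY is added only so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company(cid INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE Product(pid INTEGER PRIMARY KEY, name TEXT, maker INTEGER);
    -- Invented sample rows, not from the lecture.
    INSERT INTO Company VALUES (1, 'Acme', 'Seattle'), (2, 'Globex', 'Boston');
    INSERT INTO Product VALUES (10, 'Widget', 1), (11, 'Gadget', 2), (12, 'Gizmo', 1);
""")

# Same join as on the slide: products whose maker is a Seattle company.
rows = conn.execute("""
    SELECT x.name
    FROM Product x, Company y
    WHERE x.maker = y.cid AND y.city = 'Seattle'
    ORDER BY x.name
""").fetchall()

seattle_products = [name for (name,) in rows]
print(seattle_products)
conn.close()
```

The XQuery version expresses the same join by iterating over Product and Company rows and comparing maker with cid.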

14 SQL and XQuery Side-by-side
For each company with revenues < 1M, count the products over $100
SQL:
  SELECT y.name, count(*)
  FROM Product x, Company y
  WHERE x.price > 100 and x.maker = y.cid and y.revenue < 1000000
  GROUP BY y.cid, y.name
XQuery (result element tags not preserved):
  FOR $r in document("db.xml")/db,
      $y in $r/Company/row[revenue/text() < 1000000]
  RETURN { $y/name/text() }
         { count($r/Product/row[maker/text()=$y/cid/text()][price/text()>100]) }

15 XSLT: XSL Transformations
Converts XML docs to other XML docs
  - Or to HTML, PDF, etc.
E.g.: Have data in XML, want to display to all users
  - Users view web with IE, Netscape, Palm…
  - Have XSLT convert to HTML that looks good on each
  - XSLT processor takes XML doc and XSL template for view

16 Querying XML with XQuery
FLWR expressions:
  - Often much simpler than XSLT
XSLT v. XQuery:
  - http://www.xmlportfolio.com/xquery.html
  FOR $b IN document("bib.xml")//book
  WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998"
  RETURN $b/title

17 Displaying XML with XSL/XSLT
XSL: style sheet language for XML
  - As CSS is for HTML
Menu in XML:
  - http://www.w3schools.com/xml/simple.xml
XSL file for displaying it:
  - http://www.w3schools.com/xml/simple.xsl
XSL applied to the XML:
  - http://www.w3schools.com/xml/simplexsl.xml
More info on Java with XSLT and XPath:
  - http://java.sun.com/webservices/docs/ea2/tutorial/doc/JAXPXSLT2.html

18 Why XML matters
Hugely popular
  - To the millennium what Java was to the mid-90s
  - Buzzword compliant
XML databases won’t likely replace RDBMSs (remember OODBMSs?), but:
Allows for communication between DBMSs with disparate architectures, tools, languages, etc.
  - Basis for Web Services
DBMS vendors are adding XML support
  - MS, Oracle, et al.

19 For more info
APIs: SAX, JAXP
Editors: XML Spy, MS XML Notepad: http://www.webattack.com/get/xmlnotepad.shtml
Parsers: Saxon, Xalan, MS XML Parser
Lecture drew on resources from:
Nine-week course on XML:
  - http://www.cs.rpi.edu/~puninj/XMLJ/classes.html
W3C XML Tutorial:
  - http://www.w3schools.com/xml/default.asp
http://www.cs.cornell.edu/courses/cs433/2001fa/Slides/Xml,%20XPath,%20&%20Xslt.ppt

20 Next topic: Hardware
Types of memory
Disks
Mergesort/TPMMS

21 What should a DBMS do?
Store large amounts of data
Process queries efficiently
Allow multiple users to access the database concurrently and safely.
Provide durability of the data.
How will we do all this?

22 Let’s get physical
[Diagram: DBMS architecture]
Components, top to bottom: User/Application; Query compiler/optimizer; Execution engine; Index/record mgr.; Buffer manager; Storage manager; storage
Arrow labels, top to bottom: Query, update; Query execution plan; Record, index requests; Page commands; Read/write pages
Alongside: Transaction manager (concurrency control, logging/recovery), which receives transaction commands

23 Types of memory
Cache: access time 10 ns
Main memory: volatile; limited address spaces; expensive; average access time 10-100 ns
Disk: 5-10 MB/s transmission rates; 100s of GB storage; average time to access a block 10-15 ms. Need to consider seek, rotation, transfer times. Keep records “close” to each other.
Tape: 1.5 MB/s transfer rate; 280 GB typical capacity; only sequential access; not for operational data

24 Main Memory
Fastest, most expensive
Today: O(1 GB) is common on PCs
Some databases could fit in memory
  - New industry trend: Main Memory Database
But many cannot
RAM is volatile and small
  - Still need to store on disk

25 Secondary Storage
Disks
Slower, cheaper than main memory
Persistent!
Used with a main memory buffer

26 $200 worth of disk space

27 Buffer Management in a DBMS
Data must be in RAM for DBMS to operate on it!
A table of (frame, page) pairs is maintained.
LRU is not always good.
[Diagram: page requests from higher levels arrive at the BUFFER POOL in MAIN MEMORY, which holds disk pages and free frames; pages move between the buffer pool and the DB on DISK; the choice of frame is dictated by the replacement policy]
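One way to see why LRU is not always good is to simulate a tiny buffer pool. The sketch below is a toy model, not how any particular DBMS implements its buffer manager: the class and function names are hypothetical, there is no pinning or dirty-page handling, and "disk" is just a callable. It repeatedly scans a 4-page file with 3 frames and then with 4 frames.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool with LRU replacement (no pinning, no dirty pages)."""

    def __init__(self, num_frames, read_page):
        self.num_frames = num_frames
        self.read_page = read_page      # callable: page_id -> page contents
        self.frames = OrderedDict()     # page_id -> contents, least recent first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)    # hit: mark as most recently used
            return self.frames[page_id]
        if len(self.frames) >= self.num_frames:
            self.frames.popitem(last=False)     # miss with full pool: evict LRU page
        self.disk_reads += 1
        self.frames[page_id] = self.read_page(page_id)
        return self.frames[page_id]

def count_reads(num_frames, num_pages, scans):
    """Disk reads needed to scan pages 0..num_pages-1 sequentially, `scans` times."""
    pool = BufferPool(num_frames, read_page=lambda pid: f"page-{pid}")
    for _ in range(scans):
        for pid in range(num_pages):
            pool.get_page(pid)
    return pool.disk_reads

print(count_reads(num_frames=3, num_pages=4, scans=3))  # every access misses
print(count_reads(num_frames=4, num_pages=4, scans=3))  # only the first scan misses
```

With one frame too few, LRU always evicts exactly the page that will be needed next, so repeated sequential scans get no benefit from the buffer at all. A DBMS that knows the access pattern can pick a different policy, which is part of the argument for not leaving buffering to the OS.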

28 Buffer Manager
Why not just use the OS?
DBMS may be able to anticipate access patterns
Hence, may also be able to perform prefetching
DBMS needs the ability to force pages to disk.

29 Tertiary Storage
CDs, DVDs, jukeboxes
  - ROM
Tapes, tape silos
  - sequential access
Big but very slow
  - long term archiving only

30 The Mechanics of Disk
Mechanical characteristics:
  - Rotation speed (5400 RPM)
  - Number of platters (1-30)
  - Number of tracks (<= 10000)
  - Number of bytes/track (10^5)
[Diagram labels: platters, spindle, disk head, arm movement, arm assembly, tracks, sector, cylinder]

31 Disk Access Characteristics
Disk latency = time between when command is issued and when data is in memory
Disk latency = seek time + rotational latency
  - Seek time = time for the head to reach the cylinder: 10-40 ms
  - Rotational latency = time for the sector to rotate under the head
      Rotation time = 10 ms
      Average latency = 10 ms / 2
Transfer rate = typically 40 MB/s
Disks read/write one block at a time (typically 4 kB)
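Plugging the slide's figures into the latency formula gives a feel for the numbers. This is a back-of-the-envelope sketch: the function name is made up, the 10 ms seek is the low end of the slide's range, and adding the transfer time for one 4 kB block (taken here as 4,000 bytes at 40,000,000 bytes/s) is an assumption layered on top of the slide's seek-plus-rotation formula.

```python
def block_access_ms(seek_ms, rotation_ms, block_bytes, transfer_bytes_per_s):
    """Estimated time to read one block: seek + average rotational latency + transfer."""
    rotational_latency_ms = rotation_ms / 2                  # on average, half a rotation
    transfer_ms = block_bytes / transfer_bytes_per_s * 1000  # time to move the block itself
    return seek_ms + rotational_latency_ms + transfer_ms

# Slide figures: 10 ms seek, 10 ms per rotation, 4 kB block, 40 MB/s transfer rate.
access_ms = block_access_ms(10, 10, 4_000, 40_000_000)
print(f"{access_ms:.1f} ms per random block read")  # prints "15.1 ms per random block read"
```

The transfer itself contributes only about 0.1 ms. Almost all of the cost is mechanical positioning, which is why keeping related records close together on disk matters so much.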

32 A little CS…
In main memory: CPU time
  - Big O notation!
In databases, time is dominated by I/O cost
  - Big O too, but for I/Os
  - Often big O becomes a constant
  - The I/O Model of Computation
Consequence: need to redesign certain algorithms

33 Mergesort
Algorithm
Example
Complexity

34 Sorting
Problem: sort 1 GB of data with 1 MB of RAM.
Where we need this:
  - Data requested in sorted order (ORDER BY)
  - Needed for grouping operations
  - First step in sort-merge join algorithm
  - Duplicate removal
  - Bulk loading of B+-tree indexes.

35 Two-Way Merge-sort
Requires 3 buffers in RAM
Pass 1: Read a page, sort it, write it.
Pass 2, 3, …: merge pairs of runs, write the merged runs
[Diagram: runs of length L on disk are read through main memory buffers INPUT 1 and INPUT 2, merged into OUTPUT, and written back to disk as runs of length 2L]
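The pass structure of two-way merge sort can be sketched with in-memory lists standing in for disk pages and runs. This is an illustration only: real implementations stream pages through the three buffers rather than holding whole runs in memory, and the function names are hypothetical.

```python
def merge_two_runs(run_a, run_b):
    """Merge two sorted runs into one sorted run (the INPUT 1 / INPUT 2 / OUTPUT step)."""
    out = []
    i = j = 0
    while i < len(run_a) and j < len(run_b):
        if run_a[i] <= run_b[j]:
            out.append(run_a[i])
            i += 1
        else:
            out.append(run_b[j])
            j += 1
    out.extend(run_a[i:])   # at most one of these two tails is non-empty
    out.extend(run_b[j:])
    return out

def two_way_merge_sort(pages):
    """Sort a list of pages; returns (sorted records, number of passes over the data)."""
    runs = [sorted(page) for page in pages]   # pass 1: sort each page on its own
    passes = 1
    while len(runs) > 1:                      # passes 2, 3, ...: merge runs pairwise
        merged = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                merged.append(merge_two_runs(runs[k], runs[k + 1]))
            else:
                merged.append(runs[k])        # odd run out is carried to the next pass
        runs = merged
        passes += 1
    return (runs[0] if runs else []), passes

records, passes = two_way_merge_sort([[3, 1], [4, 2], [9, 7], [8, 5]])
print(records, passes)
```

Four pages need three passes: one to sort each page, then two rounds of pairwise merging, since each merge pass doubles the run length.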

36 Two-Way External Merge Sort
Assume block size is B = 4KB
Step 1 -> runs of length L = 4KB
Step 2 -> runs of length L = 8KB
Step 3 -> runs of length L = 16KB = 2^(3-1) * 4KB
…
Step 9 -> runs of length L = 1MB
…
Step 19 -> runs of length L = 1GB (why?)
Need 19 iterations over the disk data to sort 1GB
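The "why?" comes down to run length doubling on every step after the first. A short calculation, assuming binary units (4KB = 2^12 bytes, 1MB = 2^20, 1GB = 2^30) and hypothetical function names, reproduces the slide's step numbers:

```python
def run_length_bytes(step, block_bytes):
    """Run length after the given step: the block size doubled (step - 1) times."""
    return block_bytes * 2 ** (step - 1)

def two_way_passes(file_bytes, block_bytes):
    """Passes needed by two-way merge sort: 1 sorting pass + ceil(log2(#blocks)) merges."""
    num_blocks = -(-file_bytes // block_bytes)   # ceiling division
    merge_passes = (num_blocks - 1).bit_length() # integer ceil(log2(num_blocks))
    return 1 + merge_passes

KB, MB, GB = 2 ** 10, 2 ** 20, 2 ** 30

print(run_length_bytes(9, 4 * KB) == 1 * MB)    # True: step 9 gives 1MB runs
print(run_length_bytes(19, 4 * KB) == 1 * GB)   # True: step 19 gives 1GB runs
print(two_way_passes(1 * GB, 4 * KB))           # 19
```

A 1GB file holds 2^18 blocks of 4KB, so it takes 18 doublings after the initial sorting pass, 19 passes in total.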

37 Can we do better?

38 Large Two-Way External Merge Sort
We've got a meg!
Divide RAM into thirds
Read, write in blocks of 333KB
How much improvement?

39 Can we do better?

40 Cost Model for Our Analysis
B: Block size (= 4KB)
M: Size of main memory (= 1MB)
N: Number of records in the file
R: Size of one record

41 External Merge-Sort
Phase one: load M bytes in memory, sort
  - Result: SIZE/M lists of length M bytes (1MB)
[Diagram: disk -> M bytes of main memory (M/R records) -> disk]

42 Phase Two
Merge M/B - 1 lists into a new list
  - M/B - 1 = 1MB/4KB - 1 = 249, roughly 250
Result: lists of size M * (M/B - 1) bytes
  - 249 * 1MB ~= 250MB
[Diagram: disk -> input buffers (Input 1, Input 2, ..., Input M/B) and Output buffer in M bytes of main memory -> disk]

43 Phase Three
Merge M/B - 1 lists into a new list
Result: lists of size M * (M/B - 1)^2 bytes
  - 249 * 250MB ~= 62,500MB = 62.5GB
[Diagram: disk -> input buffers (Input 1, Input 2, ..., Input M/B) and Output buffer in M bytes of main memory -> disk]
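The three phases generalize to a loop: build memory-sized sorted runs, then repeatedly merge groups of runs until one remains. The sketch below is a simplified model, with Python lists standing in for disk-resident lists, memory measured in records rather than bytes, and a fan_in parameter playing the role of M/B - 1. All names are hypothetical, and heapq.merge from the standard library does the multiway merging.

```python
import heapq

def external_merge_sort(records, memory_records, fan_in):
    """Sort records using runs of at most memory_records and fan_in-way merges.

    Returns (sorted records, number of passes over the data).
    """
    if memory_records < 1 or fan_in < 2:
        raise ValueError("need memory_records >= 1 and fan_in >= 2")
    # Phase one: load a memory-full at a time, sort it, and write it out as a run.
    runs = [sorted(records[i:i + memory_records])
            for i in range(0, len(records), memory_records)]
    passes = 1
    # Later phases: merge fan_in runs at a time into longer runs.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        passes += 1
    return (runs[0] if runs else []), passes

data = list(range(20, 0, -1))   # 20 records in reverse order
result, num_passes = external_merge_sort(data, memory_records=2, fan_in=3)
print(result, num_passes)
```

Here 20 records form 10 initial runs, which shrink to 4, then 2, then 1 run, for 4 passes in total. With a fan-in near 250 instead of 3, the number of runs collapses far faster, which is the whole advantage over two-way merging.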

44 Cost of External Merge Sort
Number of passes: how much data can we sort with 1MB RAM?
  - 1 pass -> 1MB
  - 2 passes -> 250MB (M/B = 250)
  - 3 passes -> 62.5GB
Time:
  - assume read/write of a block ~ 10 ms = .01 s
  - each pass: read, write all data
  - each pass: 2 * 62.5GB/4KB * .01s = 2 * 156,250s ~= 2 * 2,604m ~= 2 * 43.4h ~= 2 * 1.8 days = 3.6 days
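The capacity and time figures follow from two small formulas, sketched here under the slides' simplifications: decimal units (1MB = 10^6 bytes, 4KB = 4,000 bytes), a merge fan-in rounded up from M/B - 1 to M/B, and a flat 10 ms per block read or write. The function names are made up for illustration.

```python
def max_sortable_bytes(memory_bytes, block_bytes, passes):
    """Largest file sortable in the given number of passes: M * (M/B)^(passes - 1)."""
    fan_in = memory_bytes // block_bytes   # slides round M/B - 1 up to M/B
    return memory_bytes * fan_in ** (passes - 1)

def seconds_per_pass(data_bytes, block_bytes, io_seconds=0.01):
    """Each pass reads and writes every block once, at io_seconds per block I/O."""
    num_blocks = data_bytes // block_bytes
    return 2 * num_blocks * io_seconds

KB, MB, GB = 10 ** 3, 10 ** 6, 10 ** 9

two_pass = max_sortable_bytes(1 * MB, 4 * KB, passes=2)
three_pass = max_sortable_bytes(1 * MB, 4 * KB, passes=3)
print(two_pass / MB, "MB")    # 250.0 MB
print(three_pass / GB, "GB")  # 62.5 GB

pass_seconds = seconds_per_pass(three_pass, 4 * KB)
print(round(pass_seconds / 86_400, 1), "days per pass")  # 3.6 days per pass
```

Each extra pass multiplies the sortable size by M/B, so adding memory helps twice over: the runs get longer and the fan-in gets larger. The same function covers larger memory budgets by changing memory_bytes.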

45 Cost of External Merge Sort
Number of passes: how much data can we sort with 10MB RAM (M/B = 2500)?
  - 1 pass -> 10MB
  - 2 passes -> 10MB * 2500 = 25,000MB = 25GB
  - 3 passes -> 2500 * 25GB = 62,500GB = 62.5TB

46 Cost of External Merge Sort
Number of passes: how much data can we sort with 100MB RAM (M/B = 25,000)?
  - 1 pass -> 100MB
  - 2 passes -> 100MB * 25,000 = 2,500,000MB = 2,500GB = 2.5TB
  - 3 passes -> 25,000 * 2.5TB = 62,500TB = 62.5PB

47 Next time
Next: Indices
For next time: reading from chapter 13 posted today
Hw3 up soon
Now: one-minute responses

