1
M.P. Johnson, DBMS, Stern/NYU, Sp2004
C20.0046: Database Management Systems
Lecture #24
Matthew P. Johnson
Stern School of Business, NYU
Spring 2004
2
Agenda
Previously: XML
Next:
  Finish XML & related technologies
  Hardware
  Indices
Hw3 up soon
1-minute responses
Grading
3
XML Applications/dialects
Copy from: http://pages.stern.nyu.edu/~mjohnson/dbms/eg/xml.txt
MathML: Mathematical Markup Language
  http://wwwasdoc.web.cern.ch/wwwasdoc/WWW/publications/ictp99/ictp99N8059.html
ChemML: Chemical Markup Language
X4ML: XML for Merrill Lynch
XHTML: HTML retrofitted as an XML application
  Validation: http://pages.stern.nyu.edu/~mjohnson/dbms/
4
XML Applications/dialects
VoiceXML:
  http://newmedia.purchase.edu/~Jeanine/interfaces/rps.xml
  AT&T Directory Assistance
  http://phone.yahoo.com/
Image from http://www.voicexml.org/tutorials/intro2.html
5
More XML Apps
FIXML
  XML equiv. of FIX: Financial Information eXchange
swiftML
  XML equiv. of the SWIFT message format (SWIFT: Society for Worldwide Interbank Financial Telecommunication)
Apache's Ant
  Scripting language for Java build management
  http://ant.apache.org/manual/using.html
Many more: http://www-106.ibm.com/developerworks/xml/library/x-stand4/
6
More XML Applications/Protocols
RSS: Rich Site Summary/Really Simple Syndication
  http://slate.msn.com/rss/
  http://slashdot.org/index.rss
  Screenshot: http://paulboutin.weblogger.com/pictures/viewer$673
  More info: http://slate.msn.com/id/2096660/
[Example RSS feed; element markup not recoverable. Surviving text: my channel, story 1, … // other items]
7
More XML Applications/Protocols
SOAP: Simple Object Access Protocol
  XML-based messaging format
  Used by:
    Google API: http://www.google.com/apis/
    Amazon API: http://amazon.com/gp/aws/landing.html
    Amazon light: http://kokogiak.com/amazon/
  Other examples: http://www.wired.com/wired/archive/12.03/google.html?pg=10&topic=&topic_set=
SOAP envelope with header and body
  Request sales tax for total:
  <SOAP:Envelope xmlns:SOAP="urn:schemas-xmlsoap-org:soap.v1"> 100
8
More XML Applications/Protocols
[Example request; element markup not recoverable. Surviving values: %(key)s 0 10 true false]
9
RDF
RDF: Resource Description Framework
Describe info on the web
  Metadata for the web
  Content, authors, relations to other content
"Semantic web"
See http://www.w3.org/DesignIssues/RDFnot.html
10
New topic: Querying XML
XPath
  Simple protocol for accessing nodes
  Won't discuss
XQuery: the SQL of XML
XSLT: sophisticated transformations
11
XQuery
XQuery: FLWR expressions
Based on Quilt and XML-QL
FOR/LET... WHERE... RETURN...

  FOR $b IN document("bib.xml")//book
  WHERE $b/publisher = "Morgan Kaufmann"
    AND $b/year = "1998"
  RETURN $b/title
12
XQuery
Find all book titles published after 1995:

  FOR $x IN document("bib.xml")/bib/book
  WHERE $x/year > 1995
  RETURN { $x/title }

Result: abc def ghi
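As a rough cross-check of what this query computes, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample document is an assumption: its bib/book/title/year layout is inferred from the path /bib/book and from the result shown on the slide, and it is not the course's actual bib.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml; the real file is not reproduced in the slides.
SAMPLE_BIB = """
<bib>
  <book><title>abc</title><year>1996</year></book>
  <book><title>def</title><year>1998</year></book>
  <book><title>old</title><year>1990</year></book>
  <book><title>ghi</title><year>2001</year></book>
</bib>
"""


def titles_after(xml_text, year):
    """Return titles of /bib/book elements whose <year> is greater than `year`."""
    root = ET.fromstring(xml_text)                 # root is the <bib> element
    result = []
    for book in root.findall("book"):              # FOR $x IN .../bib/book
        if int(book.findtext("year")) > year:      # WHERE $x/year > 1995
            result.append(book.findtext("title"))  # RETURN $x/title
    return result


print(titles_after(SAMPLE_BIB, 1995))
```

The loop, the condition, and the appended value line up one-to-one with the FOR, WHERE, and RETURN clauses.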
13
SQL and XQuery Side-by-side
Product(pid, name, maker)
Company(cid, name, city)
Find all products made in Seattle

SQL:
  SELECT x.name
  FROM Product x, Company y
  WHERE x.maker=y.cid
    and y.city="Seattle"

XQuery:
  FOR $r in document("db.xml")/db,
      $x in $r/Product/row,
      $y in $r/Company/row
  WHERE $x/maker/text()=$y/cid/text()
    and $y/city/text() = "Seattle"
  RETURN { $x/name }
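The SQL side of this comparison can be tried directly with Python's built-in sqlite3 module. This is only a sketch: the rows inserted below (company and product names) are made-up sample data, and an ORDER BY is added solely so the output order is deterministic.

```python
import sqlite3


def products_made_in(city):
    """Run the slide's join on a small, hypothetical in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Company(cid INTEGER PRIMARY KEY, name TEXT, city TEXT);
        CREATE TABLE Product(pid INTEGER PRIMARY KEY, name TEXT, maker INTEGER);
        -- Sample rows are assumptions for illustration only.
        INSERT INTO Company VALUES (1, 'Acme', 'Seattle');
        INSERT INTO Company VALUES (2, 'Globex', 'Boston');
        INSERT INTO Product VALUES (10, 'gizmo', 1);
        INSERT INTO Product VALUES (11, 'widget', 2);
        INSERT INTO Product VALUES (12, 'gadget', 1);
    """)
    rows = conn.execute(
        "SELECT x.name FROM Product x, Company y "
        "WHERE x.maker = y.cid AND y.city = ? "
        "ORDER BY x.name",
        (city,),
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


print(products_made_in("Seattle"))
```

The city is passed as a bound parameter rather than pasted into the SQL string, which is the usual way to avoid quoting problems.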
14
SQL and XQuery Side-by-side
For each company with revenues < 1M, count the products over $100

SQL:
  SELECT y.name, count(*)
  FROM Product x, Company y
  WHERE x.price > 100 and x.maker=y.cid and y.revenue < 1000000
  GROUP BY y.cid, y.name

XQuery (the element tags in the RETURN clause were lost in extraction and are shown as …):
  FOR $r in document("db.xml")/db,
      $y in $r/Company/row[revenue/text() < 1000000]
  RETURN …
    { $y/name/text() }
    …
    { count($r/Product/row[maker/text()=$y/cid/text()][price/text()>100]) }
    …
15
XSLT
XSLT: XSL Transformations
Converts XML docs to other XML docs
  Or to HTML, PDF, etc.
E.g.: Have data in XML, want to display to all users
  Users view web with IE, Netscape, Palm…
  Have XSLT convert to HTML that looks good on each
XSLT processor takes XML doc and XSL template for view
16
Querying XML with XQuery
FLWR expressions:

  FOR $b IN document("bib.xml")//book
  WHERE $b/publisher = "Morgan Kaufmann"
    AND $b/year = "1998"
  RETURN $b/title

Often much simpler than XSLT
XSLT v. XQuery: http://www.xmlportfolio.com/xquery.html
17
Displaying XML with XSL/XSLT
XSL: style sheet language for XML
  As CSS is for HTML
Menu in XML: http://www.w3schools.com/xml/simple.xml
XSL file for displaying it: http://www.w3schools.com/xml/simple.xsl
XSL applied to the XML: http://www.w3schools.com/xml/simplexsl.xml
More info on Java with XSLT and XPath: http://java.sun.com/webservices/docs/ea2/tutorial/doc/JAXPXSLT2.html
18
Why XML matters
Hugely popular
  To the millennium what Java was to the mid-90s
  Buzzword compliant
XML databases won't likely replace RDBMSs (remember OODBMSs?), but:
  Allows for communication between DBMSs with disparate architectures, tools, languages, etc.
  Basis for Web Services
DBMS vendors are adding XML support
  MS, Oracle, et al.
19
For more info
APIs: SAX, JAXP
Editors: XML Spy, MS XML Notepad: http://www.webattack.com/get/xmlnotepad.shtml
Parsers: Saxon, Xalan, MS XML Parser
Lecture drew on resources from:
  Nine-week course on XML: http://www.cs.rpi.edu/~puninj/XMLJ/classes.html
  W3C XML Tutorial: http://www.w3schools.com/xml/default.asp
  http://www.cs.cornell.edu/courses/cs433/2001fa/Slides/Xml,%20XPath,%20&%20Xslt.ppt
20
Next topic: Hardware
Types of memory
Disks
Mergesort/TPMMS
21
What should a DBMS do?
Store large amounts of data
Process queries efficiently
Allow multiple users to access the database concurrently and safely
Provide durability of the data
How will we do all this?
22
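Let's get physical
[Figure: DBMS architecture, top to bottom]
User/Application sends queries and updates to the query compiler/optimizer
Query compiler/optimizer passes a query execution plan to the execution engine
Execution engine sends record and index requests to the index/record mgr.
Index/record mgr. sends page commands to the buffer manager
Buffer manager reads/writes pages through the storage manager, which accesses storage
Transaction manager (concurrency control, logging/recovery) receives transaction commands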
23
Types of memory
Cache:
  Access time: 10 ns
Main Memory:
  Volatile
  Limited address spaces
  Expensive
  Average access time: 10-100 ns
Disk:
  5-10 MB/s transmission rates
  100s of GB of storage
  Average time to access a block: 10-15 ms
  Need to consider seek, rotation, transfer times
  Keep records "close" to each other
Tape:
  1.5 MB/s transfer rate
  280 GB typical capacity
  Only sequential access
  Not for operational data
24
Main Memory
Fastest, most expensive
Today: O(1 GB) is common on PCs
Some databases could fit in memory
  New industry trend: Main Memory Database
But many cannot
RAM is volatile and small
  Still need to store on disk
25
Secondary Storage
Disks
Slower, cheaper than main memory
Persistent!
Used with a main memory buffer
26
[Figure: $200 worth of disk space]
27
Buffer Management in a DBMS
[Figure: page requests from higher levels arrive at the buffer pool in main memory; disk pages are read from the DB on disk into free frames; the choice of frame is dictated by the replacement policy]
Data must be in RAM for DBMS to operate on it!
Table of <frame#, pageid> pairs is maintained.
LRU is not always good.
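To make the replacement-policy point concrete, here is a toy LRU buffer pool in Python. It is a sketch under simplifying assumptions: the class and function names are hypothetical, pages are never pinned or dirty, and a "disk read" is just a callback. It is not how any particular DBMS implements its buffer manager.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool: maps page ids to in-memory frames, evicting with LRU."""

    def __init__(self, num_frames, read_page):
        self.num_frames = num_frames
        self.read_page = read_page    # callable: page_id -> contents (a "disk read")
        self.frames = OrderedDict()   # page_id -> contents, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # hit: mark as most recently used
            return self.frames[page_id]
        if len(self.frames) >= self.num_frames:
            self.frames.popitem(last=False)    # miss with a full pool: evict LRU page
        self.disk_reads += 1
        self.frames[page_id] = self.read_page(page_id)
        return self.frames[page_id]


def count_reads(num_frames, requests):
    """Replay a sequence of page requests and return the number of disk reads."""
    pool = BufferPool(num_frames, read_page=lambda pid: f"page-{pid}")
    for pid in requests:
        pool.get_page(pid)
    return pool.disk_reads


# Scanning 4 pages three times in a row.
print(count_reads(3, [1, 2, 3, 4] * 3))
print(count_reads(4, [1, 2, 3, 4] * 3))
```

With 3 frames, the repeated scan of 4 pages misses on every request, because LRU always evicts exactly the page that will be needed next. With 4 frames, only the first scan reads from disk. This is one way to see why LRU is not always a good policy for database access patterns.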
28
Buffer Manager
Why not just use the OS?
  DBMS may be able to anticipate access patterns
  Hence, may also be able to perform prefetching
  DBMS needs the ability to force pages to disk
29
Tertiary Storage
CDs, DVDs, jukeboxes
  ROM
Tapes, tape silos
  Sequential access
Big but very slow
  Long-term archiving only
30
The Mechanics of Disk
Mechanical characteristics:
  Rotation speed (5400 RPM)
  Number of platters (1-30)
  Number of tracks (<= 10000)
  Number of bytes/track (10^5)
[Figure labels: platters, spindle, disk head, arm movement, arm assembly, tracks, sector, cylinder]
31
Disk Access Characteristics
Disk latency = time between when command is issued and when data is in memory
Disk latency = seek time + rotational latency
  Seek time = time for the head to reach the cylinder
    10 ms to 40 ms
  Rotational latency = time for the sector to rotate under the head
    Rotation time = 10 ms
    Average latency = 10 ms / 2 = 5 ms
Transfer rate = typically 40 MB/s
Disks read/write one block at a time (typically 4 KB)
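Plugging the slide's typical figures into a small calculation shows which term dominates. This is a back-of-the-envelope sketch: the function name is hypothetical, and it assumes decimal units (1 MB = 1000 KB) and that the expected rotational delay is half a rotation.

```python
def block_access_ms(seek_ms, rotation_ms, transfer_mb_per_s, block_kb):
    """Estimate the time (in ms) to fetch one block: seek + rotation + transfer."""
    rotational_latency_ms = rotation_ms / 2                      # half a turn on average
    transfer_ms = (block_kb / 1000) / transfer_mb_per_s * 1000   # KB -> MB, s -> ms
    return seek_ms + rotational_latency_ms + transfer_ms


# 10 ms seek, 10 ms per rotation, 40 MB/s, 4 KB block.
print(round(block_access_ms(10, 10, 40, 4), 2))   # 15.1
```

The transfer itself takes about 0.1 ms, so nearly all of the roughly 15 ms goes to moving the arm and waiting for the platter. That is why keeping related records physically close matters so much.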
32
A little CS…
In main memory: CPU time
  Big O notation!
In databases, time is dominated by I/O cost
  Big O too, but for I/Os
  Often big O becomes a constant
The I/O Model of Computation
Consequence: need to redesign certain algorithms
33
Mergesort
Algorithm
Example
Complexity
34
Sorting
Problem: sort 1 GB of data with 1 MB of RAM.
Where we need this:
  Data requested in sorted order (ORDER BY)
  Needed for grouping operations
  First step in sort-merge join algorithm
  Duplicate removal
  Bulk loading of B+-tree indexes
35
Two-Way Merge-sort
Requires 3 buffers in RAM
Pass 1: read a page, sort it, write it.
Pass 2, 3, …, etc.: merge two runs, write them.
[Figure: main memory buffers INPUT 1, INPUT 2, and OUTPUT sit between two disks; runs of length L are read in, runs of length 2L are written out]
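The pass structure can be simulated in memory with plain Python lists standing in for disk pages and runs. This is only a sketch of the control flow: the function names and the `page_size` parameter are illustrative, and nothing here performs real disk I/O or limits itself to three buffers.

```python
def merge_two(run_a, run_b):
    """Merge two sorted runs, looking only at the front element of each input."""
    out, i, j = [], 0, 0
    while i < len(run_a) and j < len(run_b):
        if run_a[i] <= run_b[j]:
            out.append(run_a[i])
            i += 1
        else:
            out.append(run_b[j])
            j += 1
    out.extend(run_a[i:])   # at most one of these two tails is non-empty
    out.extend(run_b[j:])
    return out


def two_way_external_sort(records, page_size):
    """Return (sorted records, number of passes over the data)."""
    # Pass 1: read a page, sort it, write it out as a one-page run.
    runs = [sorted(records[k:k + page_size])
            for k in range(0, len(records), page_size)]
    passes = 1
    # Passes 2, 3, ...: merge pairs of runs, doubling the run length each time.
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                merged.append(merge_two(runs[k], runs[k + 1]))
            else:
                merged.append(runs[k])   # an odd run is carried to the next pass
        runs = merged
        passes += 1
    return (runs[0] if runs else []), passes


print(two_way_external_sort([9, 3, 7, 1, 8, 2, 6, 4], page_size=1))
```

With 8 one-record pages, the sort takes 4 passes: one to create the runs and three to merge 8 runs down to 1.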
36
Two-Way External Merge Sort
Assume block size is B = 4 KB
Step 1: runs of length L = 4 KB
Step 2: runs of length L = 8 KB
Step 3: runs of length L = 16 KB = 2^(3-1) * 4 KB
…
Step 9: runs of length L = 1 MB
…
Step 19: runs of length L = 1 GB (why?)
Need 19 iterations over the disk data to sort 1 GB
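The "why?" comes down to counting doublings: 1 GB holds 2^18 blocks of 4 KB, so after the first pass it takes 18 merge passes to go from 2^18 one-block runs to a single run. A small Python check, assuming binary units (1 KB = 2^10 bytes) and a hypothetical function name:

```python
def two_way_passes(file_bytes, block_bytes):
    """Passes needed by two-way external merge sort: 1 + ceil(log2(#blocks))."""
    num_blocks = -(-file_bytes // block_bytes)   # ceiling division
    # Pass 1 creates one-block runs; each later pass halves the number of runs.
    return 1 + (num_blocks - 1).bit_length()     # bit_length gives ceil(log2(n))


KB, MB, GB = 2**10, 2**20, 2**30
print(two_way_passes(1 * GB, 4 * KB))   # 19
print(two_way_passes(1 * MB, 4 * KB))   # 9
```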
37
Can we do better?
38
Large Two-Way External Merge Sort
We've got a meg!
Divide RAM into thirds
Read, write in blocks of 333 KB
How much improvement?
39
Can we do better?
40
Cost Model for Our Analysis
B: Block size (= 4 KB)
M: Size of main memory (= 1 MB)
N: Number of records in the file
R: Size of one record
41
External Merge-Sort
Phase one: load M bytes in memory, sort
Result: SIZE/M lists of length M bytes (1 MB)
[Figure: M bytes of main memory, holding M/R records at a time, between an input disk and an output disk]
42
Phase Two
Merge M/B - 1 lists into a new list
  M/B - 1 = 1 MB / 4 KB - 1 = 249 ≈ 250
Result: lists of size M * (M/B - 1) bytes
  249 * 1 MB ≈ 250 MB
[Figure: M bytes of main memory hold input buffers Input 1, Input 2, …, Input M/B and one output buffer, between an input disk and an output disk]
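A multiway merge pass can be sketched with Python's heapq.merge, which lazily merges any number of sorted inputs while only examining the current head of each. The function name and the `fan_in` parameter (standing in for M/B - 1) are assumptions for illustration, and lists stand in for on-disk runs.

```python
import heapq


def multiway_pass(runs, fan_in):
    """One merge pass: combine each group of `fan_in` sorted runs into one run."""
    return [list(heapq.merge(*runs[k:k + fan_in]))
            for k in range(0, len(runs), fan_in)]


runs = [[1, 5], [2, 6], [3, 7], [4, 8]]
after_one = multiway_pass(runs, fan_in=3)
print(after_one)                             # [[1, 2, 3, 5, 6, 7], [4, 8]]
print(multiway_pass(after_one, fan_in=3))    # [[1, 2, 3, 4, 5, 6, 7, 8]]
```

Each pass shrinks the number of runs by a factor of `fan_in` instead of 2, which is where the large savings over two-way merging come from.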
43
Phase Three
Merge M/B - 1 lists into a new list
Result: lists of size M * (M/B - 1)^2 bytes
  249 * 250 MB ≈ 62,500 MB = 62.5 GB
[Figure: M bytes of main memory hold input buffers Input 1, Input 2, …, Input M/B and one output buffer, between an input disk and an output disk]
44
Cost of External Merge Sort
Number of passes: how much data can we sort with 1 MB of RAM?
  1 pass: 1 MB
  2 passes: 250 MB (M/B = 250)
  3 passes: 62.5 GB
Time:
  Assume read/write of a block takes ~10 ms = 0.01 s
  Each pass: read and write all data
  Each pass on 62.5 GB: 2 * 62.5 GB / 4 KB * 0.01 s = 2 * 156,250 s ≈ 2 * 2,604 min ≈ 2 * 43.4 hours ≈ 2 * 1.8 days ≈ 3.6 days
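The per-pass time is simple enough to recompute for any data size. The sketch below assumes decimal units (4 KB = 4,000 bytes), one 10 ms I/O per block, and no overlap between reads and writes; the function name is hypothetical.

```python
def pass_time_days(data_bytes, block_bytes=4_000, io_seconds=0.01):
    """Days for one pass that reads and then writes every block once."""
    blocks = data_bytes / block_bytes
    seconds = 2 * blocks * io_seconds   # factor 2: each block is read and written
    return seconds / 86_400             # seconds per day


print(round(pass_time_days(62.5e9), 1))   # 3.6
print(round(pass_time_days(625e9), 1))    # 36.2
```

For 62.5 GB, which is about 249 * 250 MB, one pass takes roughly 3.6 days under these assumptions; a file ten times larger, 625 GB, takes about 36 days per pass.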
45
Cost of External Merge Sort
Number of passes: how much data can we sort with 10 MB of RAM (M/B = 2500)?
  1 pass: 10 MB
  2 passes: 10 MB * 2500 = 25,000 MB = 25 GB
  3 passes: 2500 * 25 GB = 62,500 GB = 62.5 TB
46
Cost of External Merge Sort
Number of passes: how much data can we sort with 100 MB of RAM (M/B = 25,000)?
  1 pass: 100 MB
  2 passes: 100 MB * 25,000 = 2,500,000 MB = 2,500 GB = 2.5 TB
  3 passes: 25,000 * 2.5 TB = 62,500 TB = 62.5 PB
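The capacity figures for all three memory sizes follow one formula: after p passes, runs can be as long as M * (M/B - 1)^(p-1) bytes. A short Python sketch, assuming decimal units and a hypothetical function name; note that it uses the exact fan-in M/B - 1, so its results come out slightly below the slides' rounded values.

```python
def sortable_bytes(memory_bytes, block_bytes, passes):
    """Largest file sortable in `passes` passes: M * (M/B - 1)^(passes - 1)."""
    fan_in = memory_bytes // block_bytes - 1   # one buffer is reserved for output
    return memory_bytes * fan_in ** (passes - 1)


MB = 1_000_000
for memory in (1 * MB, 10 * MB, 100 * MB):
    sizes = [sortable_bytes(memory, 4_000, p) for p in (1, 2, 3)]
    print(memory // MB, "MB RAM:", [f"{s / 1e9:,.3f} GB" for s in sizes])
```

Multiplying RAM by 10 multiplies the three-pass capacity by about 1000, since memory size enters the formula three times.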
47
Next time
Next: Indices
For next time: reading from chapter 13, posted today
Hw3 up soon
Now: one-minute responses