End of XQuery DBMS Internals

Slides:



Advertisements
Similar presentations
Storing Data: Disk Organization and I/O
Advertisements

Storing Data: Disks and Files
Storing Data: Disks and Files: Chapter 9
1 Overview of Storage and Indexing Chapter 8 (part 1)
Querying XML (cont.). Comments on XPath? What’s good about it? What can’t it do that you want it to do? How does it compare, say, to SQL?
1 Overview of Storage and Indexing Yanlei Diao UMass Amherst Feb 13, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Query Languages - XQuery Slides partially from Dan Suciu.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #5.
1 Overview of Storage and Indexing Chapter 8 1. Basics about file management 2. Introduction to indexing 3. First glimpse at indices and workloads.
Xpath to XQuery February 23rd, Other Stuff HW 3 is out. Instructions for Phase 3 are out. Today: finish Xpath, start and finish Xquery. From Wednesday:
Querying XML February 12 th, Querying XML Data XPath = simple navigation through the tree XQuery = the SQL of XML XSLT = recursive traversal –will.
Introduction to Database Systems 1 The Storage Hierarchy and Magnetic Disks Storage Technology: Topic 1.
Layers of a DBMS Query optimization Execution engine Files and access methods Buffer management Disk space management Query Processor Query execution plan.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Storing Data: Disks and Files Chapter 9.
Lecture 11: DMBS Internals
Physical Storage Susan B. Davidson University of Pennsylvania CIS330 – Database Management Systems November 20, 2007.
Introduction to Database Systems 1 Storing Data: Disks and Files Chapter 3 “Yea, from the table of my memory I’ll wipe away all trivial fond records.”
End of XML February 19 th, FLWR (“Flower”) Expressions FOR... LET... WHERE... RETURN... FOR... LET... WHERE... RETURN...
Lecture #19 May 13 th, Agenda Trip Report End of constraints: triggers. Systems aspects of SQL – read chapter 8 in the book. Going under the lid!
1 Chapter 17 Disk Storage, Basic File Structures, and Hashing Chapter 18 Index Structures for Files.
ICS 321 Fall 2011 Overview of Storage & Indexing (i) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 11/9/20111Lipyeow.
Sorting.
Database Management Systems,Shri Prasad Sawant. 1 Storing Data: Disks and Files Unit 1 Mr.Prasad Sawant.
External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary.
1 Overview of Storage and Indexing Chapter 8 (part 1)
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How data are stored? –physical level –logical level.
Chapter Ten. Storage Categories Storage medium is required to store information/data Primary memory can be accessed by the CPU directly Fast, expensive.
1 External Sorting. 2 Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing gpa order.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
CPSC-608 Database Systems Fall 2015 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #5.
XML May 6th, Instructor AnHai Doan Brief bio –high school in Vietnam & undergrad in Hungary –M.S. at Wisconsin –Ph.D. at Washington under Alon &
Datalog Another formalism for expressing queries: - cleaner - closer to a “logic” notation - more convenient for analysis - equivalent in power to relational.
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
DMBS Architecture May 15 th, Generic Architecture Query compiler/optimizer Execution engine Index/record mgr. Buffer manager Storage manager storage.
Lecture 17: XPath and XQuery Wednesday, Nov. 7, 2001.
1 CSE544: Lecture 7 XQuery, Relational Algebra Monday, 4/22/02.
1 Lecture 15: Data Storage, Recovery Monday, February 13, 2006.
What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently and safely. Provide.
1 Lecture 16: Data Storage Wednesday, November 6, 2006.
The very Essentials of Disk and Buffer Management.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Disks and Files.
CS222: Principles of Data Management Lecture #4 Catalogs, Buffer Manager, File Organizations Instructor: Chen Li.
CS522 Advanced database Systems
Lecture 16: Data Storage Wednesday, November 6, 2006.
Database Management Systems (CS 564)
CPSC-608 Database Systems
Oracle SQL*Loader
Querying XML and Semistructured Data
Disks and Files DBMS stores information on (“hard”) disks.
Lecture 11: DMBS Internals
File Organizations Chapter 8 “How index-learning turns no student pale
Lecture 9: Data Storage and IO Models
Disk Storage, Basic File Structures, and Buffer Management
Database Management Systems (CS 564)
CPSC-310 Database Systems
Introduction to Database Systems
Midterm Review – Part I ( Disk, Buffer and Index )
Lecture 15: Midterm Review Data Storage
CS222/CS122C: Principles of Data Management Lecture #4 Catalogs, File Organizations Instructor: Chen Li.
Secondary Storage Management Brian Bershad
Basics Storing Data on Disks and Files
CS222p: Principles of Data Management Lecture #4 Catalogs, File Organizations Instructor: Chen Li.
General External Merge Sort
Secondary Storage Management Hank Levy
Files and access methods
Lecture 18: DMBS Overview and Data Storage
Lecture 12: XQuery in SQL Server
Lecture 15: Data Storage Tuesday, February 20, 2001.
Lecture 20: Representing Data Elements
Presentation transcript:

End of XQuery DBMS Internals February 25th, 2004

FLWR (“Flower”) Expressions FOR ... LET... WHERE... RETURN...

FOR v.s. LET FOR Binds node variables  iteration LET Binds collection variables  one value

FOR v.s. LET FOR $x IN document("bib.xml")/bib/book Returns: <result> <book>...</book></result> ... FOR $x IN document("bib.xml")/bib/book RETURN <result> { $x } </result> LET $x IN document("bib.xml")/bib/book RETURN <result> { $x } </result> Returns: <result> <book>...</book> <book>...</book> ... </result>

Collections in XQuery Ordered and unordered collections /bib/book/author = an ordered collection Distinct(/bib/book/author) = an unordered collection LET $b = /bib/book  $b is a collection $b/author  a collection (several authors...) Returns: <result> <author>...</author> <author>...</author> ... </result> RETURN <result> { $b/author } </result>

Collections in XQuery What about collections in expressions ? $b/price  list of n prices $b/price * 0.7  list of n numbers $b/price * $b/quantity  list of n x m numbers ?? $b/price * ($b/quant1 + $b/quant2)  $b/price * $b/quant1 + $b/price * $b/quant2 !!

Sorting in XQuery <publisher_list> FOR $p IN distinct(document("bib.xml")//publisher) RETURN <publisher> <name> { $p/text() } </name> , FOR $b IN document("bib.xml")//book[publisher = $p] RETURN <book> { $b/title , $b/price } </book> SORTBY(price DESCENDING) </publisher> SORTBY(name) </publisher_list>

If-Then-Else FOR $h IN //holding RETURN <holding> { $h/title, IF $h/@type = "Journal" THEN $h/editor ELSE $h/author } </holding> SORTBY (title)

Existential Quantifiers FOR $b IN //book WHERE SOME $p IN $b//para SATISFIES contains($p, "sailing") AND contains($p, "windsurfing") RETURN { $b/title }

Universal Quantifiers FOR $b IN //book WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing") RETURN { $b/title }

Other Stuff in XQuery BEFORE and AFTER FILTER Recursive functions for dealing with order in the input FILTER deletes some edges in the result tree Recursive functions Currently: arbitrary recursion Perhaps more restrictions in the future ?

Database Internals How does it all work?

What Should a DBMS Do? How will we do all this?? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently and safely. Provide durability of the data. How will we do all this??

Query compiler/optimizer Generic Architecture Query update User/ Application Query compiler/optimizer Query execution plan Transaction commands Record, index requests Execution engine Index/record mgr. Transaction manager: Concurrency control Logging/recovery Page commands Buffer manager Read/write pages Storage manager storage

The Memory Hierarchy Main Memory Disk Tape 5-10 MB/S transmission rates Gigs of storage average time to access a block: 10-15 msecs. Need to consider seek, rotation, transfer times. Keep records “close” to each other. 1.5 MB/S transfer rate 280 GB typical capacity Only sequential access Not for operational data Volatile limited address spaces expensive average access time: 10-100 nanoseconds Cache: access time 10 nano’s

Main Memory Fastest, most expensive Today: 512MB are common on PCs Many databases could fit in memory New industry trend: Main Memory Database E.g TimesTen Main issue is volatility

Secondary Storage Disks Slower, cheaper than main memory Persistent !!! Used with a main memory buffer

Buffer Management in a DBMS Page Requests from Higher Levels BUFFER POOL disk page free frame MAIN MEMORY DISK DB choice of frame dictated by replacement policy Data must be in RAM for DBMS to operate on it! Table of <frame#, pageid> pairs is maintained. LRU is not always good. 4

Buffer Manager Manages buffer pool: the pool provides space for a limited number of pages from disk. Needs to decide on page replacement policy. Enables the higher levels of the DBMS to assume that the needed data is in main memory. Why not use the Operating System for the task?? - DBMS may be able to anticipate access patterns - Hence, may also be able to perform prefetching - DBMS needs the ability to force pages to disk.

Tertiary Storage Tapes or optical disks Extremely slow: used for long term archiving only

The Mechanics of Disk Mechanical characteristics: Cylinder Mechanical characteristics: Rotation speed (5400RPM) Number of platters (1-30) Number of tracks (<=10000) Number of bytes/track(105) Spindle Tracks Disk head Sector Arm movement Platters Arm assembly

Disk Access Characteristics Disk latency = time between when command is issued and when data is in memory Disk latency = seek time + rotational latency Seek time = time for the head to reach cylinder 10ms – 40ms Rotational latency = time for the sector to rotate Rotation time = 10ms Average latency = 10ms/2 Transfer time = typically 10MB/s Disks read/write one block at a time (typically 4kB)

The I/O Model of Computation In main memory algorithms we care about CPU time In databases time is dominated by I/O cost Assumption: cost is given only by I/O Consequence: need to redesign certain algorithms Will illustrate here with sorting

Sorting Illustrates the difference in algorithm design when your data is not in main memory: Problem: sort 1Gb of data with 1Mb of RAM. Arises in many places in database systems: Data requested in sorted order (ORDER BY) Needed for grouping operations First step in sort-merge join algorithm Duplicate removal Bulk loading of B+-tree indexes. 4

2-Way Merge-sort: Requires 3 Buffers Pass 1: Read a page, sort it, write it. only one buffer page is used Pass 2, 3, …, etc.: three buffer pages used. INPUT 1 OUTPUT INPUT 2 Main memory buffers Disk Disk 5

Two-Way External Merge Sort 3,4 6,2 9,4 8,7 5,6 3,1 2 Input file Each pass we read + write each page in file. N pages in the file => the number of passes So total cost is: Improvement: start with larger runs Sort 1GB with 1MB memory in 10 passes PASS 0 3,4 2,6 4,9 7,8 5,6 1,3 2 1-page runs PASS 1 2,3 4,7 1,3 2-page runs 4,6 8,9 5,6 2 PASS 2 2,3 4,4 1,2 4-page runs 6,7 3,5 8,9 6 PASS 3 1,2 2,3 3,4 8-page runs 4,5 6,6 7,8 9 6

Can We Do Better ? We have more main memory Should use it to improve performance

Cost Model for Our Analysis B: Block size M: Size of main memory N: Number of records in the file R: Size of one record 3

External Merge-Sort Phase one: load M bytes in memory, sort Result: runs of length M/R records M/R records . . . . . . Disk Disk M bytes of main memory

Phase Two . . . . . . Merge M/B – 1 runs into a new run Result: runs have now M/R (M/B – 1) records Input 1 . . . Input 2 . . . Output . . . . Input M/B Disk Disk M bytes of main memory 7

Phase Three . . . . . . Merge M/B – 1 runs into a new run Result: runs have now M/R (M/B – 1)2 records Input 1 . . . Input 2 . . . Output . . . . Input M/B Disk Disk M bytes of main memory 7

Cost of External Merge Sort Number of passes: Think differently Given B = 4KB, M = 64MB, R = 0.1KB Pass 1: runs of length M/R = 640000 Have now sorted runs of 640000 records Pass 2: runs increase by a factor of M/B – 1 = 16000 Have now sorted runs of 10,240,000,000 = 1010 records Pass 3: runs increase by a factor of M/B – 1 = 16000 Have now sorted runs of 1014 records Nobody has so much data ! Can sort everything in 2 or 3 passes ! 8

Number of Passes of External Sort B: number of frames in the buffer pool; N: number of pages in relation. 9

Next on Agenda File organization (brief) Indexing Query execution Query optimization

Storage and Indexing How do we store efficiently large amounts of data? The appropriate storage depends on what kind of accesses we expect to have to the data. We consider: primary storage of the data additional indexes (very very important).

Cost Model for Our Analysis As a good approximation, we ignore CPU costs: B: The number of data pages R: Number of records per page D: (Average) time to read or write disk page Measuring number of page I/O’s ignores gains of pre-fetching blocks of pages; thus, even I/O cost is only approximated. Average-case analysis; based on several simplistic assumptions. 3

File Organizations and Assumptions Heap Files: Equality selection on key; exactly one match. Insert always at end of file. Sorted Files: Files compacted after deletions. Selections on sort field(s). Hashed Files: No overflow buckets, 80% page occupancy. Single record insert and delete. 4

Cost of Operations 5