XML and Distributed Databases on the Web. Nathaniel Ayewah, CSE 8330 Presentation, SMU
Introduction: Why XML? In databases? Structured content, data model.
Overview: XML Technologies; XML and Databases; Will XML supplant relational DBs?; Querying XML; Different Approaches
XML Origins: SGML, HTML, XML
XML Example (XML as a meta language)
(a) HTML: Southern Methodist University. PO Box 0133, Dallas, TX. SMU is a private university of more than 11,000 students near the vibrant heart of Dallas.
(b) XML: Southern Methodist University; PO Box 0133; Dallas; TX; SMU is a private university of more than 11,000 students near the vibrant heart of Dallas.
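The markup from this slide is not preserved in the text, so the following is only a minimal sketch of the point it makes: once each field sits in its own element, a program can address it by name. The element names (`university`, `name`, `address`, `pobox`, `city`, `state`, `description`) and the helper `field` are assumptions for illustration, not the markup used in the presentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML version of the SMU record; all element names are assumed.
XML_RECORD = """
<university>
  <name>Southern Methodist University</name>
  <address>
    <pobox>PO Box 0133</pobox>
    <city>Dallas</city>
    <state>TX</state>
  </address>
  <description>SMU is a private university of more than 11,000 students near the vibrant heart of Dallas.</description>
</university>
"""


def field(xml_text: str, path: str) -> str:
    """Return the text of the first element matching a simple path."""
    root = ET.fromstring(xml_text)
    node = root.find(path)
    if node is None:
        raise KeyError(path)
    return node.text or ""


if __name__ == "__main__":
    # Each field is addressable by name, which presentational HTML does not offer.
    print(field(XML_RECORD, "address/city"))
    print(field(XML_RECORD, "address/state"))
```

With presentational HTML the same lookup would need text scraping, because the city and state are only part of a formatted line.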
On Data and Documents: data-centric vs document-centric
Data-centric (regular structure): Computer Science, 23, Excellent; Electrical Engineering, 4353, O.K.
Document-centric (irregular structure): Rock the SEAS Vote! Vote for your favorite faculty member. Once a year, you get to choose your favorite faculty member. That time has come again. March 25 to April 1. Stop by the CSE office because you care.
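One rough way to tell the two kinds of XML apart in code is to look for mixed content, that is, text interleaved with child elements. The sketch below applies that heuristic to two hypothetical documents modeled on the slide. The element names and the function `has_mixed_content` are assumptions, and mixed content is only one indicator of document-centric data, not a definition.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-centric document: every record has the same fields.
DATA_CENTRIC = """
<departments>
  <department><name>Computer Science</name><facultysize>23</facultysize><rating>Excellent</rating></department>
  <department><name>Electrical Engineering</name><facultysize>4353</facultysize><rating>O.K.</rating></department>
</departments>
"""

# Hypothetical document-centric document: prose with elements embedded in it.
DOCUMENT_CENTRIC = """
<announcement>
  <title>Rock the SEAS Vote!</title>
  <body>Once a year, you get to choose your favorite faculty member.
  That time has come again. <dates>March 25 to April 1</dates>
  Stop by the <place>CSE office</place> because you care.</body>
</announcement>
"""


def has_mixed_content(root: ET.Element) -> bool:
    """Return True if any element mixes non-whitespace text with child elements."""
    for elem in root.iter():
        children = list(elem)
        if not children:
            continue
        # Text before the first child.
        if (elem.text or "").strip():
            return True
        # Text that follows any child element.
        if any((child.tail or "").strip() for child in children):
            return True
    return False


if __name__ == "__main__":
    print(has_mixed_content(ET.fromstring(DATA_CENTRIC)))
    print(has_mixed_content(ET.fromstring(DOCUMENT_CENTRIC)))
```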
XML Technologies: XSLT, XQuery, XML-QL, XPath, XQL, XSL-FO, XPointer, XML Schema, DTD, SAX, DOM, XML Encryption, XML Signature, SOAP, UDDI, WSDL
XML Technologies: Validation and Structure; Query Languages**; Parsing and Processing; Transformation and Presentation
Validation and Structure: DTD
Query Languages: Document Community [XPath/XQL]; Database Community [XML-QL]; W3C [XQuery]
“What is the size of the Computer Science Faculty?”
XQL:
Document("departments.xml")//department[… = "CSE"]/facultysize
Output: 23
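The same path-style lookup can be sketched with the limited XPath subset in Python's standard library. The sample document, the `code` element used in the predicate, and the function `faculty_size` are assumptions, since the slide's markup and predicate are not fully preserved.

```python
import xml.etree.ElementTree as ET

# Hypothetical departments.xml content; element names are assumed.
DEPARTMENTS = """
<departments>
  <department><code>CSE</code><name>Computer Science</name><facultysize>23</facultysize></department>
  <department><code>EE</code><name>Electrical Engineering</name><facultysize>43</facultysize></department>
</departments>
"""


def faculty_size(xml_text: str, code: str) -> list:
    """Select facultysize under every department whose <code> child equals code."""
    root = ET.fromstring(xml_text)
    # [code='CSE'] is a child-text predicate supported by ElementTree's XPath subset.
    path = f".//department[code='{code}']/facultysize"
    return [node.text for node in root.findall(path)]


if __name__ == "__main__":
    print(faculty_size(DEPARTMENTS, "CSE"))
```

The predicate filters departments and the trailing step selects the value, so the whole query is a single path expression.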
“What is the size of the Computer Science Faculty?”
XML-QL:
WHERE $n $f $r IN "departments.xml", $c = "CSE"
CONSTRUCT $f
Output: 23
“What is the size of the Computer Science Faculty?”
XQuery:
for $b in doc("departments.xml")//department
let $d := $b/facultysize
where … = "CSE"
return {$d}
Output: 23
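The for/let/where/return shape of the XQuery version maps fairly directly onto an ordinary loop. The sketch below mirrors each clause in a comment. The sample data, the `code` element tested in the where step, and the `result` wrapper element are assumptions, because the slide's element names were lost.

```python
import xml.etree.ElementTree as ET

# Hypothetical departments.xml content; element names are assumed.
DEPARTMENTS = """
<departments>
  <department><code>CSE</code><name>Computer Science</name><facultysize>23</facultysize></department>
  <department><code>EE</code><name>Electrical Engineering</name><facultysize>43</facultysize></department>
</departments>
"""


def faculty_size_results(xml_text: str, code: str) -> list:
    """Evaluate a FLWOR-style query by hand and return serialized result elements."""
    root = ET.fromstring(xml_text)
    results = []
    for b in root.iter("department"):        # for $b in ...//department
        d = b.findtext("facultysize")        # let $d := $b/facultysize
        if b.findtext("code") == code:       # where (assumed field) = code
            out = ET.Element("result")       # return: wrap $d in a new element
            out.text = d
            results.append(ET.tostring(out, encoding="unicode"))
    return results


if __name__ == "__main__":
    print(faculty_size_results(DEPARTMENTS, "CSE"))
```

Unlike the path expression, this form constructs new elements in the return step, which is what makes XQuery suitable for restructuring results.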
XQuery: Joins
{ for $i in fn:doc("catalog.xml")//item,
      $p in fn:doc("parts.xml")//part[partno = $i/partno],
      $s in fn:doc("suppliers.xml")//supplier[suppno = $i/suppno]
  order by $p/description, $s/suppname
  return { $p/description, $s/suppname, $i/price }
}
Source:
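The join above pairs each catalog item with its part and its supplier, then orders the rows. A minimal Python sketch of the same idea follows. The three sample documents and the function `descriptive_catalog` are hypothetical, and the dictionary lookups assume that `partno` and `suppno` are unique keys, which the XQuery version does not require.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for catalog.xml, parts.xml and suppliers.xml.
CATALOG = """<catalog>
  <item><partno>1</partno><suppno>10</suppno><price>4.50</price></item>
  <item><partno>2</partno><suppno>20</suppno><price>7.25</price></item>
  <item><partno>1</partno><suppno>20</suppno><price>4.10</price></item>
</catalog>"""
PARTS = """<parts>
  <part><partno>1</partno><description>bolt</description></part>
  <part><partno>2</partno><description>axle</description></part>
</parts>"""
SUPPLIERS = """<suppliers>
  <supplier><suppno>10</suppno><suppname>Acme</suppname></supplier>
  <supplier><suppno>20</suppno><suppname>Zenith</suppname></supplier>
</suppliers>"""


def descriptive_catalog(catalog_xml: str, parts_xml: str, suppliers_xml: str) -> list:
    """Join items with parts and suppliers; return (description, suppname, price) rows."""
    # Index the two lookup documents by their key (assumes keys are unique).
    parts = {p.findtext("partno"): p.findtext("description")
             for p in ET.fromstring(parts_xml).iter("part")}
    suppliers = {s.findtext("suppno"): s.findtext("suppname")
                 for s in ET.fromstring(suppliers_xml).iter("supplier")}
    rows = []
    for item in ET.fromstring(catalog_xml).iter("item"):
        description = parts.get(item.findtext("partno"))
        suppname = suppliers.get(item.findtext("suppno"))
        if description is None or suppname is None:
            continue  # inner join: drop items with no matching part or supplier
        rows.append((description, suppname, item.findtext("price")))
    # order by description, then supplier name
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


if __name__ == "__main__":
    for row in descriptive_catalog(CATALOG, PARTS, SUPPLIERS):
        print(row)
```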
XML and Databases
Why distribute? Data Integration; Data Distribution
Classification
Views: Data view (d), Virtual view (v), Query view (q)
Classification: T d,v,q where d, v, q ∈ {R, X, H, N}
R = Relational Data Model
X = XML Data Model
H = Hybrid Data Model
N = View does not exist
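The classification is a triple of data-model codes, one per view, so it can be sketched as a small value type. The names `Model` and `Classification` and the `parse` and `label` helpers are assumptions for illustration. Only the four codes and the three views come from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class Model(Enum):
    """Data model codes used in the classification."""
    R = "Relational Data Model"
    X = "XML Data Model"
    H = "Hybrid Data Model"
    N = "View does not exist"


@dataclass(frozen=True)
class Classification:
    """One T d,v,q triple: the model seen in the data, virtual and query views."""
    data: Model
    virtual: Model
    query: Model

    @classmethod
    def parse(cls, label: str) -> "Classification":
        """Build a classification from a label such as 'R,X,N'."""
        d, v, q = (Model[part.strip()] for part in label.split(","))
        return cls(d, v, q)

    def label(self) -> str:
        """Return the compact 'd,v,q' form."""
        return f"{self.data.name},{self.virtual.name},{self.query.name}"


if __name__ == "__main__":
    c = Classification.parse("R,X,N")
    print(c.label(), "->", c.data.value, "/", c.virtual.value, "/", c.query.value)
```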
Products: Native XML; XML Enabled; Middleware or XML Server; Wrappers; Standalone XML XQuery Engine; Content Management System
Classifications: T X,X,X; T R,X,X; T R,R,X ?; T R,R,R; T R,X,X; T R,X,N; T X,R,R; T X,N,X; T X,X,X
XML-Enabled
Relation: Departments
ID   Name                    Size
CSE  Computer Science        234
EE   Electrical Engineering  334
XML (Default View): CSE, Computer Science, 234; EE, Electrical Engineering, 334
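A default view of this kind can be sketched by mapping the table to a root element, each tuple to a child element, and each column to a field element. This mapping is a common convention and an assumption here, since the slide's XML markup is not preserved. The `row` element name and the helpers `demo_connection` and `default_xml_view` are hypothetical, and SQLite stands in for whatever relational system is being XML-enabled.

```python
import sqlite3
import xml.etree.ElementTree as ET


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory Departments relation matching the slide's rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Departments (ID TEXT, Name TEXT, Size INTEGER)")
    conn.executemany(
        "INSERT INTO Departments VALUES (?, ?, ?)",
        [("CSE", "Computer Science", 234), ("EE", "Electrical Engineering", 334)],
    )
    return conn


def default_xml_view(conn: sqlite3.Connection, table: str) -> str:
    """Serialize a table as XML: table -> root, tuple -> <row>, column -> child."""
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table)
    for values in cursor:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, values):
            ET.SubElement(row, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(default_xml_view(demo_connection(), "Departments"))
```

Queries against such a view are typically translated back into SQL by the XML-enabled system, so the data itself stays relational.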
Native XML Database (XML:DB Initiative)
Native: XML Document, Collection, Physical Storage
Relational: Tuple(s), Relation
Berkeley DB XML: XML data model over the physical Berkeley DB storage system; native XML storage; supports transactions, recovery, indexing, replication, multiple users and concurrency, query processing, and standards; C++/Java APIs
Berkeley DB XML (Source: Berkeley DB XML Documentation)
Current/Future Research Issues: Physical Storage; Query Optimization; Distributed Processing/Optimization; Static vs Dynamic Processing; First, Last, Partial Results; Updates
Conclusion and Future: Will XML replace existing DBs? Document-centric applications: XML sources, data warehousing (Xyleme). Data-centric applications: business transactions.