Internet Databases Chapter 22

Slides:



Advertisements
Similar presentations
Overview Environment for Internet database connectivity
Advertisements

XML: Extensible Markup Language
XML/EDI Overview West Chester Electronic Commerce Resource Center (ECRC)
3 November 2008CIS 340 # 1 Topics To define XML as a technology To place XML in the context of system architectures.
Information Retrieval in Practice
Summary. Chapter 9 – Triggers Integrity constraints Enforcing IC with different techniques –Keys –Foreign keys –Attribute-based constraints –Schema-based.
Guide To UNIX Using Linux Third Edition
Overview of Search Engines
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 XML Taken from Chapter 7.
16-1 The World Wide Web The Web An infrastructure of distributed information combined with software that uses networks as a vehicle to exchange that information.
XML: Overview MIS 181.9: Service Oriented Architecture 2 nd Semester,
XML Overview. Chapter 8 © 2011 Pearson Education 2 Extensible Markup Language (XML) A text-based markup language (like HTML) A text-based markup language.
XML과 Database 홍기형 성신여자대학교 성신여자대학교 홍기형.
Introduction to XML This presentation covers introductory features of XML. What XML is and what it is not? What does it do? Put different related technologies.
1 MSCS 237 Overview of web technologies (A specific type of distributed systems)
Copyright © 2002 ProsoftTraining. All rights reserved. JavaServer Pages.
Chapter 27 The World Wide Web and XML. Copyright © 2004 Pearson Addison-Wesley. All rights reserved.27-2 Topics in this Chapter The Web and the Internet.
An Introduction to XML Sandeep Bhattaram
1 IST 210 Organization of Data Database and the Web.
1 LM 6 Database Applications Dr. Lei Li. Learning Objectives Explain three components of a client-server system Describe differences between a 2-tiered.
Information Retrieval in Practice
CF 1334 Sistem Basis Data (3 SKS)
Diskusi-08 Jelaskan dan berikan contoh penggunaan theta join, equijoin, natural join, outer join, dan semijoin The slides for this text are organized into.
Search Engine Architecture
Diskusi-5 Sebutkan perangkat (tools) yang berpotensi mendukung kebutuhan tugas-tugas manajerial (management work) Jelaskan enam karakteristik informasi.
MODELS OF DATABASE AND DATABASE DESIGN
Tree-Structured Indexes
Storage and Indexes Chapter 8 & 9
Query-by-Example (QBE)
Diskusi-1 Bacalah materi chapter-01, lalu buatlah ringkasan yang berisi tentang : Definisi EUIS Siapa end user Dampak euis pada lingkungan kerja Perencanaan.
Latihan Answer the following questions using the relational schema from the Exercises at the end of Chapter 3: Create the Hotel table using the integrity.
Diskusi-16 Buatlah ringkasan tentang pertimbangan dalam desain yang ergonomis pada tiga perangkat utama komputer yaitu monitor, keyboard dan mouse (lihat.
Database Application Development
Permutations and Combinations
Hash-Based Indexes Chapter 11
Latihan Create a separate table with the same structure as the Booking table to hold archive records. Using the INSERT statement, copy the records from.
Tugas-05 a. Sebutkan primary key masing-masing tabel
Database Processing with XML
Introduction to Query Optimization
Relational Algebra Chapter 4, Part A
Modeling Your Data Chapter 2 cs542
File Organizations Chapter 8 “How index-learning turns no student pale
COSC 6340 Projects & Homeworks Spring 2002
Database Application Development
Internet Applications
External Sorting The slides for this text are organized into chapters. This lecture covers Chapter 11. Chapter 1: Introduction to Database Systems Chapter.
B+-Trees and Static Hashing
File Organizations and Indexing
File Organizations and Indexing
Tree-Structured Indexes
Team Project, Part II NOMO Auto, Part II IST 210 Section 4
Hash-Based Indexes Chapter 10
Selected Topics: External Sorting, Join Algorithms, …
Relational Algebra Chapter 4, Sections 4.1 – 4.2
B+Trees The slides for this text are organized into chapters. This lecture covers Chapter 9. Chapter 1: Introduction to Database Systems Chapter 2: The.
Data Model.
Database Management Systems CSE594
Introduction of Week 11 Return assignment 9-1 Collect assignment 10-1
Introduction to World Wide Web
Evaluation of Relational Operations: Other Techniques
Distributed Databases
CSE591: Data Mining by H. Liu
Introduction to Database Systems
Chapter 11 Instructor: Xin Zhang
Tree-Structured Indexes
Allyson Falkner Spokane County ISD
File Organizations and Indexing
COSC 3480 Projects & Homeworks Fall 2003
Database Application Development
Relational Calculus Chapter 4, Part B
Presentation transcript:

Internet Databases Chapter 22 The slides for this text are organized into chapters. This lecture covers the material from Chapter 22. Chapter 1: Introduction to Database Systems Chapter 2: The Entity-Relationship Model Chapter 3: The Relational Model Chapter 4 (Part A): Relational Algebra Chapter 4 (Part B): Relational Calculus Chapter 5: SQL: Queries, Programming, Triggers Chapter 6: Query-by-Example (QBE) Chapter 7: Storing Data: Disks and Files Chapter 8: File Organizations and Indexing Chapter 9: Tree-Structured Indexing Chapter 10: Hash-Based Indexing Chapter 11: External Sorting Chapter 12 (Part A): Evaluation of Relational Operators Chapter 12 (Part B): Evaluation of Relational Operators: Other Techniques Chapter 13: Introduction to Query Optimization Chapter 14: A Typical Relational Optimizer Chapter 15: Schema Refinement and Normal Forms Chapter 16 (Part A): Physical Database Design Chapter 16 (Part B): Database Tuning Chapter 17: Security Chapter 18: Transaction Management Overview Chapter 19: Concurrency Control Chapter 20: Crash Recovery Chapter 21 (Part A): Parallel Databases Chapter 21 (Part B): Distributed Databases Chapter 22: Internet Databases Chapter 23: Decision Support Chapter 24: Data Mining Chapter 25: Object-Database Systems Chapter 26: Spatial Data Management Chapter 27: Deductive Databases Chapter 28: Additional Topics 1

HTML Simple markup language Text is annotated with language commands called tags, usually consisting of a start tag and an end tag

HTML Example: Book Listing <HTML><BODY> Fiction: <UL><LI>Author: Milan Kundera</LI? <LI>Title: Identity</LI> <LI>Published: 1998</LI> </UL> Science: <UL><LI>Author: Richard Feynman</LI> <LI>Title: The Character of Physical Law</LI> <LI>Hardcover</LI> </UL></BODY></HTML>

Web Pages with Database Contents Web pages contain the results of database queries. How do we generate such pages? Web server creates a new process for a program interacts with the database. Web server communicates with this program via CGI (Common gateway interface) Program generates result page with content from the database Other protocols: ISAPI (Microsoft Internet Server API), NSAPI (Netscape Server API)

Application Servers In CGI, each page request results in the creation of a new process: very inefficient Application server: Piece of software between the web server and the applications Functionality: Hold a set of pre-forked threads or processes for performance Database connection pooling (reuse a set of existing connections) Integration of heterogeneous data sources Transaction management involving several data sources Session management

Other Server-Side Processing Java Servlets: Java programs that run on the server and interact with the server through a well-defined API. JavaBeans: Reusable software components written in Java. Java Server Pages and Active Server Pages: Code inside a web page that is interpreted by the web server

Beyond HTML: XML Extensible Markup Language (XML): “Extensible HTML” Confluence of SGML and HTML: The power of SGML with the simplicity of HTML Allows definition of new markup languages, called document type declarations (DTDs)

XML: Language Constructs Elements Main structural building blocks of XML Start and end tag Must be properly nested Element can have attributes that provide additional information about the element Entities: like macros, represent common text. Comments Document type declarations (DTDs)

Booklist Example in XML <?XML version=“1.0” standalone=“yes”?> <!DOCTYPE BOOKLIST SYSTEM “booklist.dtd”> <BOOKLIST> <BOOK genre=“Fiction”> <AUTHOR> <FIRST>Milan</FIRST><LAST>Kundera</LAST> </AUTHOR> <TITLE>Identity</TITLE> <PUBLISHED>1998</PUBLISHED> <BOOK genre=“Science” format=“Hardcover”> <FIRST>Richard</FIRST><LAST>Feynman</LAST> <TITLE>The Character of Physical Law</TITLE> </BOOK></BOOKLIST>

XML: DTDs A DTD is a set of rules that defines the elements, attributes, and entities that are allowed in the document. An XML document is well-formed if it does not have an associated DTD but it is properly nested. An XML document is valid if it has a DTD and the document follows the rules in the DTD.

An Example DTD <!DOCTYPE BOOKLIST [ <!ELEMENT BOOKLIST (BOOK)*> <!ELEMENT BOOK (AUTHOR, TITLE, PUBLISHED?)> <!ELEMENT AUTHOR (FIRST, LAST)> <!ELEMENT FIRST (#PCDATA)> <!ELEMENT LAST (#PCDATA)> <!ELEMENT TITLE (#PCDATA)> <!ELEMENT PUBLISHED (#PCDATA)> <!ATTLIST BOOK genre (Science|Fiction) #REQUIRED> <!ATTLIST BOOK format (Paperback|Hardcover) “Paperback”> ]>

Domain-Specific DTDs Development of standardized DTDs for specialized domains enables data exchange between heterogeneous sources Example: Mathematical Markup Language (MathML) Encodes mathematical material on the web In HTML: <IMG SRC=“xysq.gif” ALT=“(x+y)^2”> In MathML: <apply> <power/> <apply> <plus/> <ci>x</ci> <ci>y</ci> </apply> <cn>2</cn> </apply>

XML-QL: Querying XML Data Goal: High-level, declarative language that allows manipulation of XML documents No standard yet Example query in XML-QL: WHERE <BOOK> <NAME><LAST>$1</LAST></NAME> </BOOK> in “www.booklist.com/books.xml CONSTRUCT <RESULT> $1 </RESULT>

XML-QL (Contd.) A more complicated example: WHERE <BOOK> $b <BOOK> IN “www.booklist.com/books.xml”, <AUTHOR> $n </AUTHOR> <PUBLISHED> $p </PUBLISHED> in $e CONSTRUCT <RESULT> <PUBLISHED> $p </PUBLISHED> WHERE <LAST> $l </LAST> IN $n CONSTRUCT <LAST> $l </LAST> </RESULT>

Semi-structured Data Data with partial structure All data models for semi-structured data use some type of labeled graph We introduce the object exchange model (OEM): Object is triple (label, type, value) Complex objects are decomposed hierarchically into smaller objects

Example: Booklist Data in OEM AUTHOR TITLE PUBLISHED AUTHOR FORMAT TITLE The character of phy- sical law Hard- cover Identity 1998 Milan Kundera Richard Feynman

Indexing for Text Search Text database: Collection of text documents Important class of queries: Keyword searches Boolean queries: Query terms connected with AND, OR and NOT. Result is list of documents that satisfy the boolean expression. Ranked queries: Result is list of documents ranked by their “relevance”. IR: Precision (percentage of retrieved documents that are relevant) and recall (percentage of relevant objects that are retrieved)

Inverted Files Mobile agent 2 Agent James 1 Document RID For each possible query term, store an ordered list (the inverted list) of document identifiers that contain the term. Query evaluation: Intersection or Union of inverted lists. Example: Agent AND James <2> Mobile <1> James <1,2> Agent Inverted List Word

Signature Files Index structure (the signature file) with one data entry for each document Hash function hashes words to bit-vector. Data entry for a document (the signature of the document) is the OR of all hashed words. Signature S1 matches signature S2 if S2&S1=S2

Signature Files: Query Evaluation Boolean query consisting of conjunction of words: Generate query signature Sq Scan signatures of all documents. If signature S matches Sq, then retrieve document and check for false positives. Boolean query consisting of disjunction of k words: Generate k query signatures S1, …, Sk Scan signature file to find documents whose signature matches any of S1, …, Sk Check for false positives

Signature Files: Example 0001 Mobile 1100 James 1010 Agent Hash Word Mobile agent Agent James Document 1011 2 1110 1 Signature RID

Summary Publishing databases on the web requires server-side processing such as CGI-scripts, Servlets, ASP, or JSP XML is an emerging document description standard that allows the definition of new DTDs. Query languages for XML documents such as XQL are emerging. Text databases have gained importance with the proliferation of text data on the web. Boolean queries can be efficiently evaluated using an inverted index or a signature file. Evaluation of ranked queries is a more difficult problem.