Published by Horatio Wilcox. Modified over 8 years ago.
1
Enterprise Database Systems
XML: eXtensible Markup Language
Dr. Georgia Garani, garani@teilar.gr
Dr. Theodoros Mitakos, teo_ms@yahoo.com
Technological Educational Institution of Larissa in collaboration with Staffordshire University
Larissa 2006
2
Agenda
Structured, semistructured, and unstructured data
XML data model
XML documents, DTD, XML Schema
XML and databases
3
Introduction: Internet architectures (two-tier, three-tier)
Monolithic: presentation logic, business logic, and data processing in one system
Two-tier: client (presentation logic, business logic) and server (data processing)
Three-tier: thin client (presentation logic), application server (business logic), and data server (data processing)
4
Hyperlink Documents - Web languages - Tag languages
HTML (Hypertext Markup Language): formatting and structuring Web documents.
XML: structuring and exchanging data over the Web (structure and meaning). Formatting aspects are defined separately by XSL (Extensible Stylesheet Language).
5
Structured data
Data that have a strict format, e.g. data stored in a relational database table (the same format for all records in a table). We design the schema, and the DBMS checks that all data follow the structures and constraints specified in the schema.
6
Semistructured data
In some applications data is collected before it is known how it will be stored and managed. This data may have a structure, but not all the information collected will have an identical structure. For example, some attributes may be shared among the various entities, while other attributes may exist in only a few entities. Moreover, additional attributes can be introduced in some of the newer data items at any time, and there is no predefined schema. This type of data is known as semistructured data.
7
Difference between structured and semistructured data
In semistructured data, the schema information is mixed in with the data values, since each data object can have different attributes that are not known in advance. This type of data is called self-describing data.
8
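Semistructured data as a directed graph
[Figure: a root node with an edge labeled projects leading to project nodes. Each project has edges labeled name, number, location, and worker; each worker has edges labeled ssn, name, and hours. Leaf values shown: Product x, 1, bellaire; worker 123, john, 30.5; worker 567, mary, 25.]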
9
The schema information in the semistructured model is intermixed with the objects and their data values in the same data structure. In the semistructured model there is no requirement for a predefined schema to which the data objects must conform
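A minimal Python sketch of the self-describing idea, assuming the project data from the graph slide. The key names come from the edge labels of the graph; the second project and its `budget` attribute are hypothetical and were added only to show that objects in the same collection may carry different attributes.

```python
# Semistructured data as nested, self-describing objects: each object
# carries its own attribute names, and no schema is declared in advance.
projects = [
    {
        "name": "Product x",
        "number": 1,
        "location": "bellaire",
        "worker": [
            {"ssn": "123", "name": "john", "hours": 30.5},
            {"ssn": "567", "name": "mary", "hours": 25},
        ],
    },
    # Hypothetical second project: it has no location or workers but
    # introduces a new attribute, which is allowed without a schema.
    {"name": "Product y", "number": 2, "budget": 1000},
]


def attribute_names(objects):
    """Collect the attribute names actually used by each object."""
    return [sorted(obj.keys()) for obj in objects]


shapes = attribute_names(projects)
# Attributes shared by both objects; everything else is object-specific.
shared = set(shapes[0]) & set(shapes[1])
print(shapes)
print(sorted(shared))
```

Because the attribute names travel with the values, a program has to inspect each object to learn its shape, which is exactly what a predefined schema would otherwise guarantee.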
10
Unstructured data
In this category of data there is a very limited indication of the type of the data, e.g. a text document that contains information embedded within it.
11
<td width=197 valign=top style='width:147.6pt;border:solid windowtext 1.0pt; border-top:none;mso-border-top-alt:solid windowtext .5pt;mso-border-alt:solid windowtext .5pt; padding:0cm 5.4pt 0cm 5.4pt'> 3
<td width=197 valign=top style='width:147.6pt;border-top:none;border-left: none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; mso-border-top-alt:solid windowtext .5pt;mso-border-left-alt:solid windowtext .5pt; mso-border-alt:solid windowtext .5pt;padding:0cm 5.4pt 0cm 5.4pt'> Kate
</table> SEMESTER A </div></body></html>
12
HTML
Web pages written in HTML are considered unstructured data. Text that appears between angle brackets is an HTML tag. A tag with a slash (/) indicates an end tag, which ends the effect of the matching start tag. The tags mark up the document in order to instruct an HTML processor how to display the text between a start tag and a matching end tag. HTML has a very large number of tags, but HTML documents are very difficult for computer programs to interpret automatically because they do not include schema information about the type of data in the documents.
13
Example tags
… … <body>…</body>
Attributes describe additional properties of the tag. <tr>
14
Example
<projects><project> Product x 1 bellaire 5 123 john 30.5 567 mary 25 <project>...</project>...</projects>
15
XML tree data model
Elements and attributes: As in HTML, elements are identified in a document by their start tag and end tag. The tag names are enclosed between angle brackets, and end tags are identified by a slash (/). Complex elements are constructed from other elements hierarchically, whereas simple elements contain data values. A major difference between XML and HTML is that XML tag names are defined to describe the meaning of the data elements in the document rather than to describe how the text is to be displayed. An XML document can be represented as a tree structure. XML attributes are used to describe properties and characteristics of the elements within which they appear.
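A short Python sketch of the tree model, using the standard `xml.etree.ElementTree` module. The document below is an assumption: its tag names are borrowed from the edge labels of the directed-graph slide, and it carries only a subset of the values from the project example. The helper `describe` is hypothetical and exists only to print the tree.

```python
import xml.etree.ElementTree as ET

# Assumed document: tag names follow the graph labels (projects, project,
# name, number, location, worker, ssn, hours).
DOC = """
<projects>
  <project>
    <name>Product x</name>
    <number>1</number>
    <location>bellaire</location>
    <worker><ssn>123</ssn><name>john</name><hours>30.5</hours></worker>
    <worker><ssn>567</ssn><name>mary</name><hours>25</hours></worker>
  </project>
</projects>
"""


def describe(element, depth=0):
    """Return one indented line per node of the tree.

    Complex elements (those with children) show only their tag;
    simple elements show their tag and data value.
    """
    text = (element.text or "").strip()
    label = element.tag if len(element) else f"{element.tag} = {text}"
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
tree_lines = describe(root)
print("\n".join(tree_lines))
```

The indentation in the printed output mirrors the nesting of start and end tags, so complex elements appear as internal nodes and simple elements as leaves.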
16
Types of XML Documents
Data-centric XML documents: These documents have many small data items that follow a specific structure and hence may be extracted from a structured database. They are formatted as XML documents in order to exchange them or display them over the Web.
Document-centric XML documents: These are documents with large amounts of text, such as news articles or books. There are few or no structured data elements in these documents.
Hybrid XML documents: These may have parts that contain structured data and other parts that are predominantly textual or unstructured.
17
An XML DTD <!DOCTYPE projects [ ]>
18
DTD
If an XML document conforms to a predefined XML schema or DTD, then the document can be considered structured data. XML documents that do not conform to any schema are considered semistructured data. These are called schemaless XML documents.
19
Well formed XML documents
A well-formed XML document must be syntactically correct: it must follow the syntactic guidelines of the tree model. There must be a single root element, and every element must include a matching pair of start and end tags within the start and end tags of the parent element.
A standard set of API functions called DOM (Document Object Model) allows programs to manipulate the resulting tree representation corresponding to a well-formed XML document. The whole document must be parsed beforehand when using DOM. Another API called SAX (Simple API for XML) allows processing of XML documents on the fly by notifying the processing program whenever a start or end tag is encountered.
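The contrast between DOM and SAX can be sketched with Python's standard `xml.dom.minidom` and `xml.sax` modules. The sample document and the names `TagRecorder` and `is_well_formed` are assumptions made for illustration only.

```python
import xml.dom.minidom
import xml.sax
from xml.parsers.expat import ExpatError

# Assumed sample document with a single root element.
DOC = (
    "<projects><project><name>Product x</name>"
    "<worker><name>john</name></worker>"
    "<worker><name>mary</name></worker>"
    "</project></projects>"
)

# DOM: the whole document is parsed into a tree first, then navigated.
dom = xml.dom.minidom.parseString(DOC)
dom_workers = len(dom.getElementsByTagName("worker"))


# SAX: the parser notifies the handler at each start and end tag
# while it reads the input; no tree is built.
class TagRecorder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


handler = TagRecorder()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_workers = sum(
    1 for kind, name in handler.events if kind == "start" and name == "worker"
)


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        xml.dom.minidom.parseString(text)
        return True
    except ExpatError:
        return False


print(dom_workers, sax_workers)
print(is_well_formed("<a><b></a></b>"))  # improperly nested tags
```

DOM is convenient when a program needs to move around the tree, while SAX suits large documents that only need a single pass.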
20
Notation
A * following the element name means that the element can be repeated zero or more times in the document.
A + following the element name means that the element can be repeated one or more times in the document.
A ? following the element name means that the element can appear zero or one times.
An element appearing without any of the preceding three symbols must appear exactly once in the document.
The type of the element is specified via parentheses following the element. If the parentheses include names of other elements, these elements are the children of the element in the tree structure. If the parentheses include the keyword #PCDATA or one of the other data types available in XML DTD, the element is a leaf node.
A bar symbol (e1 | e2) specifies that either e1 or e2 can appear in the document.
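The occurrence symbols behave like regular-expression operators over the sequence of child elements. The Python sketch below makes that reading concrete by translating a content model into a regex; it is an illustration under stated assumptions, not a DTD validator. The functions `content_model_to_regex` and `children_conform` are hypothetical, and the content model used in the usage lines is an assumed one built from the project example's labels.

```python
import re


def content_model_to_regex(model):
    """Translate a DTD element-content model into a compiled regex.

    Handles element names, ',', '|', '*', '+', '?' and parentheses only;
    #PCDATA and attribute declarations are outside this sketch.
    """
    # Each element name becomes a group that matches "name;".
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + ";)",
        model,
    )
    # A sequence (',') is plain concatenation in a regex; drop whitespace too.
    pattern = re.sub(r"[\s,]", "", pattern)
    return re.compile(pattern)


def children_conform(model, child_tags):
    """Check a list of child tag names against a content model."""
    encoded = "".join(tag + ";" for tag in child_tags)
    return content_model_to_regex(model).fullmatch(encoded) is not None


# Assumed content model: one name, one number, optional location,
# one or more workers.
MODEL = "(name, number, location?, worker+)"
print(children_conform(MODEL, ["name", "number", "location", "worker"]))
print(children_conform(MODEL, ["name", "number", "location"]))
```

Because the regex is matched against the children in document order, the sketch also shows why a DTD content model fixes the order of child elements.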
21
DTD limitations The data types in DTD are not very general. DTD has its own syntax and thus requires specialized processors. All DTD elements are always forced to follow the specified ordering of the document so unordered elements are not permitted.
22
XML Schema
The XML Schema language is a standard for specifying the structure of XML documents. It uses the same syntax rules as regular XML documents, so that the same processors can be used for both.
XML instance document (or XML document): a regular XML document.
XML schema document: a document that specifies an XML schema.
23
Definitions
Schema descriptions and XML namespaces: It is necessary to identify the specific set of XML Schema language elements being used by specifying a file stored at a Web location, e.g. “http://www.w3.org/2001/XMLSchema”. This definition is called an XML namespace.
Annotations, documentation, and language used: The tags xsd:annotation and xsd:documentation are used for providing comments and other descriptions in the XML document. The xml:lang attribute specifies the language being used.
24
Storing XML documents
Using a DBMS to store the documents as text: A relational or object DBMS can be used to store whole XML documents as text fields within the DBMS records or objects. This approach can be used if the DBMS has a special module for document processing, and it works for storing schemaless and document-centric XML documents.
Using a DBMS to store the document contents as data elements: This approach works for storing a collection of documents that follow a specific XML DTD or XML schema. Because all the documents have the same structure, one can design a relational or object database to store the leaf-level data elements within the XML documents.
Designing a specialized system for storing native XML data: A new type of database system based on a tree model could be designed and implemented.
Creating or publishing customized XML documents from preexisting relational databases: Because there are enormous amounts of data already stored in relational databases, parts of this data may need to be formatted as documents for exchanging or displaying over the Web. A separate middleware software layer handles the conversions needed between the XML documents and the relational database.
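The first two storage approaches can be contrasted in a few lines of Python with the standard `sqlite3` and `xml.etree.ElementTree` modules. This is a sketch under assumptions: SQLite stands in for the DBMS, the table names `xml_document` and `project` are hypothetical, and the second project in the sample document is invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed sample document; every project has the same structure.
DOC = """<projects>
  <project><name>Product x</name><number>1</number><location>bellaire</location></project>
  <project><name>Product y</name><number>2</number><location>houston</location></project>
</projects>"""

conn = sqlite3.connect(":memory:")

# Approach 1: keep the whole document as one text field.
conn.execute("CREATE TABLE xml_document (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO xml_document (body) VALUES (?)", (DOC,))

# Approach 2: store the leaf-level data elements as relational columns.
conn.execute(
    "CREATE TABLE project (number INTEGER PRIMARY KEY, name TEXT, location TEXT)"
)
for p in ET.fromstring(DOC).findall("project"):
    conn.execute(
        "INSERT INTO project (number, name, location) VALUES (?, ?, ?)",
        (int(p.findtext("number")), p.findtext("name"), p.findtext("location")),
    )

stored_text = conn.execute("SELECT body FROM xml_document").fetchone()[0]
rows = conn.execute(
    "SELECT number, name, location FROM project ORDER BY number"
).fetchall()
print(rows)
```

The text column preserves the document exactly but is opaque to SQL, whereas the shredded table can be queried by column at the cost of requiring a common structure across documents.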
25
Extracting XML documents from databases
1. Create the appropriate XML hierarchy and the corresponding XML schema document.
2. Create the correct query in SQL to extract the desired information for the XML document.
3. Once the query is executed, its result must be restructured from the flat relational form to the XML tree structure.
4. The query can be customized to select either a single object or multiple objects into the document.
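Steps 2 to 4 above can be sketched in Python with `sqlite3` and `xml.etree.ElementTree`. The `employee` table with columns `emp_no` and `emp_lname` follows the query slides later in the deck; the row values, the element names of the output, and the function `employees_to_xml` are assumptions for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, emp_lname TEXT)")
# Hypothetical rows, used only to have something to extract.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)", [(28559, "Smith"), (10102, "Jones")]
)


def employees_to_xml(connection, emp_no=None):
    """Build an XML document from the employee table.

    With emp_no set, a single object is selected; otherwise all rows.
    """
    # Step 2: the SQL query that extracts the desired information.
    sql = "SELECT emp_no, emp_lname FROM employee"
    params = ()
    if emp_no is not None:  # Step 4: customize for a single object.
        sql += " WHERE emp_no = ?"
        params = (emp_no,)
    sql += " ORDER BY emp_no"

    # Step 3: restructure the flat rows into the XML tree structure.
    root = ET.Element("ROOT")
    for number, lname in connection.execute(sql, params):
        emp = ET.SubElement(root, "employee")
        ET.SubElement(emp, "emp_no").text = str(number)
        ET.SubElement(emp, "emp_lname").text = lname
    return ET.tostring(root, encoding="unicode")


one = employees_to_xml(conn, 28559)
all_xml = employees_to_xml(conn)
print(one)
```

Step 1, the choice of hierarchy, is reflected here only in the fixed nesting of `employee` under `ROOT`; a real design would derive it from the XML schema document.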
26
XML QUERYING - XPATH
An XPath expression returns a collection of element nodes that satisfy certain patterns specified in the expression. The names in the XPath expression are node names in the XML document tree that are either tag (element) names or attribute names, possibly with additional qualifier conditions to further restrict the nodes that satisfy the pattern.
Two main separators are used when specifying a path: single slash (/) and double slash (//). A single slash before a tag specifies that the tag must appear as a direct child of the previous (parent) tag. A double slash (//) specifies that the tag can appear as a descendant of the previous tag at any level.
27
Examples
/company
/company/department
//employee[employeeSalary gt 1000]/employeeName
/company/employee[employeeSalary gt 1000]/employeeName
/company/project/projectworker[hours ge 20.0]
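The difference between the single and double slash can be tried out with Python's `xml.etree.ElementTree`, which supports only a limited subset of XPath. In this sketch the sample document and the helper `names_with_salary_above` are assumptions, and the `gt` comparison is applied in Python because ElementTree's path language has no such operator.

```python
import xml.etree.ElementTree as ET

# Assumed sample document: one employee directly under company,
# two more nested inside a department.
DOC = """<company>
  <employee><employeeName>Ann</employeeName><employeeSalary>1500</employeeSalary></employee>
  <department>
    <employee><employeeName>Bob</employeeName><employeeSalary>900</employeeSalary></employee>
    <employee><employeeName>Eve</employeeName><employeeSalary>2000</employeeSalary></employee>
  </department>
</company>"""

company = ET.fromstring(DOC)

# Like /company/employee: employee must be a direct child of company.
direct = company.findall("employee")
# Like //employee: employee may appear at any level below the root.
anywhere = company.findall(".//employee")


def names_with_salary_above(employees, threshold):
    """Apply the qualifier [employeeSalary gt threshold] in Python."""
    return [
        e.findtext("employeeName")
        for e in employees
        if float(e.findtext("employeeSalary")) > threshold
    ]


direct_names = names_with_salary_above(direct, 1000)
any_names = names_with_salary_above(anywhere, 1000)
print(direct_names, any_names)
```

A full XPath processor would evaluate the qualifier inside the path expression itself; the split used here is only a workaround for the standard library's subset.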
28
XML QUERYING - XQuery
XQuery permits the specification of more general queries on one or more XML documents. The typical form of a query in XQuery is known as a FLWR expression, which stands for the four main clauses of XQuery and has the following form:
FOR <variable bindings to individual nodes (elements)>
LET
WHERE
RETURN
29
Examples
FOR $x IN Doc(www.company.com/info.xml)//employee[employeeSalary gt 1000]/employeeName
RETURN $x/firstName, $x/lastName

FOR $x IN Doc(www.company.com/info.xml)/company/employee
WHERE $x/employeeSalary gt 1000
RETURN $x/employeeName/firstName, $x/employeeName/lastName
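The FOR, WHERE, and RETURN clauses of the second query map closely onto a Python list comprehension. The sketch below is an analogy, not an XQuery engine: the in-memory document stands in for www.company.com/info.xml, and its contents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the document at www.company.com/info.xml.
DOC = """<company>
  <employee>
    <employeeName><firstName>John</firstName><lastName>Smith</lastName></employeeName>
    <employeeSalary>1200</employeeSalary>
  </employee>
  <employee>
    <employeeName><firstName>Mary</firstName><lastName>Jones</lastName></employeeName>
    <employeeSalary>800</employeeSalary>
  </employee>
</company>"""

company = ET.fromstring(DOC)

result = [
    # RETURN $x/employeeName/firstName, $x/employeeName/lastName
    (x.findtext("employeeName/firstName"), x.findtext("employeeName/lastName"))
    # FOR $x IN .../company/employee
    for x in company.findall("employee")
    # WHERE $x/employeeSalary gt 1000
    if float(x.findtext("employeeSalary")) > 1000
]
print(result)
```

The variable `x` plays the role of `$x`: it is bound to one employee node at a time, filtered by the condition, and projected by the return expression.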
30
Example - DTD
<!ATTLIST product product_id CDATA #REQUIRED Product_desc CDATA #REQUIRED>
<!ATTLIST item gender CDATA #REQUIRED>
31
EXAMPLE XSL - XML
XSL: <rule> </rule>
XML: <catalog> SO1111 2.99 20 </item></product></catalog>
32
Executing queries
http://iiserver/virtualroot{?sql=string|?template=XMLtemplate}[{&param=value}...]
http://ntb11901/sample?sql=SELECT+’<ROOT>’;SELECT+emp_no,+emp_lname+FROM+employee+FOR+XML+RAW;SELECT+’</ROOT>’
<ROOT> </ROOT>
33
http://localhost/sample/queries/simpleselect.xml
SELECT emp_no,emp_lname FROM employee WHERE emp_no = 28559 FOR XML AUTO
</sql:query></ROOT>