Published by Jason Stephens, modified over 9 years ago
1
Introduction to XML Data Management Issues
2
Types of data
- Structured
- Semi-structured
3
Structured Data
- data is organized in entities (or tables)
- similar entities are grouped together (tables)
- entities in the same group have the same descriptions (attributes)
4
Current Database World
- Structure: Relational Database Management System (RDBMS); everything is a table
- Software: MS Access, Oracle, …
5
Example of a table (patients)
6
Example of a group of tables
7
Example of relationships
8
World of Web Data
- Easy document exchange
- Unstructured (or poorly structured) data: everything is a document
- No standard query language
9
World of Web Data
- Example:
  - An organization A publishes financial data on its web pages (HTML), generated from its DBMS.
  - A second organization B wants to perform some financial analysis, but can access only the web data.
(Diagram: organization A's RDBMS, the HTML pages it generates, and organization B.)
10
Semi-structured Data
- data can be of any type
- does not necessarily follow any format
- does not follow any rules
- is not predictable
- examples include:
  - text
  - video
  - sound
  - images
11
Characteristics of Semi-Structured Data
- structure is irregular: attributes may be missing, or extra attributes may be present
- parts of the data lack structure, e.g., images
- some parts yield little structure, e.g., plain text
12
Semi-structured Data (Cont’d)
- Definition: data that is inherently self-describing and does not conform to an explicit, fixed schema is known as semi-structured data.
- Information about the structure (such as attribute names) is contained within the data itself.
13
Example of Semi-Structured Data
name: Peter Wood
email: ptw@dcs.bbk.ac.uk, p.wood@bbk.ac.uk
------------------------------------------------------------------
name:
    first name: Mark
    last name: Levene
email: mark@dcs.bbk.ac.uk
------------------------------------------------------------------
name: Alex Smith
affiliation: StFX
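To make the irregularity concrete, here is a minimal Python sketch that encodes the three records above as dictionaries (an assumed representation, not part of the original slides) and reports which attribute names each record lacks compared with the union of all attribute names.

```python
# Assumed encoding of the three records above: one dict per record,
# with keys that differ from record to record.
records = [
    {"name": "Peter Wood",
     "email": ["ptw@dcs.bbk.ac.uk", "p.wood@bbk.ac.uk"]},
    {"name": {"first name": "Mark", "last name": "Levene"},
     "email": ["mark@dcs.bbk.ac.uk"]},
    {"name": "Alex Smith",
     "affiliation": "StFX"},
]

def attribute_report(records):
    """Return the sorted union of attribute names and, per record, the names it lacks."""
    all_attrs = set()
    for rec in records:
        all_attrs.update(rec)  # iterating a dict yields its keys
    missing = [sorted(all_attrs - set(rec)) for rec in records]
    return sorted(all_attrs), missing

attrs, missing = attribute_report(records)
print(attrs)    # ['affiliation', 'email', 'name']
print(missing)  # [['affiliation'], ['affiliation'], ['email']]
```

Note also that `name` is a plain string in two records and a nested structure in the third; no single fixed schema describes all three.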
14
IMDb – A Motivating Example
- The Internet Movie Database is a classic example of a collection of semi-structured data.
- Although the information about different movies may be essentially similar, its structure may differ from movie to movie.
- Let us consider an example movie database.
15
An Example Movie Database
16
Irregularity In Structure
Example: one movie may record information about the actors, choreographer, director and producer, while another movie may additionally record the lyricist and the music director.
17
Irregularity In Structure
- The same kind of data may be represented with different types. For example, an actor’s name may be represented as a string or as a tuple (first_name, last_name).
- Since data is added to this database dynamically, the structure of the database as a whole also keeps changing dynamically.
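A program reading such data has to cope with both representations of a name. Below is a minimal Python sketch of that normalization step; it assumes that in the string form the last space-separated word is the last name, which is a simplification that does not hold for every real name.

```python
from typing import Tuple, Union

# An actor's name may be stored as one string or as a (first_name, last_name) tuple.
ActorName = Union[str, Tuple[str, str]]

def normalize_name(name: ActorName) -> Tuple[str, str]:
    """Return (first_name, last_name) regardless of how the name was stored."""
    if isinstance(name, tuple):
        first, last = name
        return first, last
    # Assumption: the last space-separated word of the string is the last name.
    first, _, last = name.rpartition(" ")
    return first, last

print(normalize_name("Ryan Gosling"))          # ('Ryan', 'Gosling')
print(normalize_name(("Rachel", "McAdams")))   # ('Rachel', 'McAdams')
```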
18
Traditional Data Management
(Diagram components: Universe of Discourse (UoD), Model of the UoD, Database, Query.)
19
Post-Internet Data Management
(Diagram components: Universe of Discourse, Data, Retrieval?, Query.)
20
XML – An Embodiment of Semistructured Data
- XML can be used to represent semistructured data.
21
What is XML?
- XML stands for eXtensible Markup Language
- XML is a markup language much like HTML
- XML was designed to describe data
- XML tags are not predefined; you must define your own tags
22
The main difference between XML and HTML
XML and HTML were designed with different goals:
- XML was designed to describe data and to focus on what data is.
- HTML was designed to display data and to focus on how data looks.
It is important to understand that XML is not a replacement for HTML.
23
XML does not DO anything
Maybe it is a little hard to understand, but XML does not DO anything. XML was created to structure, store and send information. For example, here is a note from Mary to John, stored as XML:
    <note>
      <to>John</to>
      <from>Mary</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
The note has a header and a message body. It also has sender and receiver information. But still, this XML document does not DO anything. It is just pure information wrapped in XML tags. Someone must write a piece of software to send, receive or display it.
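As an illustration of "someone must write a piece of software", here is a small Python sketch that uses the standard library's `xml.etree.ElementTree` to read the note and display it. It assumes the header and message body are tagged `<heading>` and `<body>`; only `<to>` and `<from>` are named elsewhere in these slides.

```python
import xml.etree.ElementTree as ET

# Assumed markup for the note: <heading> and <body> are illustrative tag names.
NOTE = """<note>
  <to>John</to>
  <from>Mary</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

def format_note(xml_text: str) -> str:
    """Parse a note and render it as a one-line message."""
    note = ET.fromstring(xml_text)
    return (f"To {note.findtext('to')} from {note.findtext('from')}: "
            f"[{note.findtext('heading')}] {note.findtext('body')}")

print(format_note(NOTE))
# To John from Mary: [Reminder] Don't forget me this weekend!
```

The XML itself stays passive; all behaviour (here, formatting a message) lives in the program that reads it.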
24
XML is free and extensible
XML tags are not predefined. You must "invent" your own tags.
The tags used to mark up HTML documents and the structure of HTML documents are predefined (like <p>, <h1>, etc.).
XML allows authors to define their own tags and their own document structure.
The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.
25
XML is used to Exchange Data
With XML, data can be exchanged between incompatible systems.
In the real world, computer systems and databases contain data in incompatible formats. One of the most time-consuming challenges for developers has been to exchange data between such systems over the Internet.
Since XML data is stored in plain text format, XML provides a software- and hardware-independent way of sharing data.
26
XML can be used to Create new Languages
XML is the basis of WML (the Wireless Markup Language), the markup language used with WAP (the Wireless Application Protocol).
WML was used to mark up Internet applications for handheld devices such as mobile phones.
27
XML Syntax
The syntax rules of XML are very simple and very strict. The rules are easy to learn and easy to use.
Because of this, creating software that can read and manipulate XML is relatively easy.
28
All XML elements must have a closing tag
With XML, it is illegal to omit the closing tag.
In HTML, some elements do not need a closing tag. The following code is legal in HTML:
    <p>This is a paragraph
In XML, all elements must have a closing tag, like this:
    <p>This is a paragraph</p>
29
XML tags are case sensitive
Unlike HTML, XML tags are case sensitive. With XML, the tag <Letter> is different from the tag <letter>.
Opening and closing tags must therefore be written with the same case:
    <Message>This is incorrect</message>
    <message>This is correct</message>
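A quick way to see case sensitivity in practice is to hand both versions to a real parser. This Python sketch uses `xml.etree.ElementTree`; the helper name `parses` and the `<letters>` wrapper are illustrative only.

```python
import xml.etree.ElementTree as ET

def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(parses("<Message>This is incorrect</message>"))  # False
print(parses("<message>This is correct</message>"))    # True

# Differently cased names are simply different elements.
doc = ET.fromstring("<letters><Letter>A</Letter><letter>b</letter></letters>")
print(doc.findtext("Letter"), doc.findtext("letter"))  # A b
```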
30
All XML elements must be properly nested
Improper nesting of tags makes no sense to XML.
In HTML, some elements can be improperly nested within each other, like this:
    <b><i>This text is bold and italic</b></i>
In XML, all elements must be properly nested within each other, like this:
    <b><i>This text is bold and italic</i></b>
31
All XML documents must have a root element (tag)
All XML documents must contain a single tag pair to define a root element. All other elements must be nested within this root element.
All elements can have sub-elements (child elements). Sub-elements must be correctly nested within their parent element:
    <root>
      <child>
        ..........
      </child>
    </root>
32
With XML, white space is preserved
With XML, the white space in your document is not truncated. This is unlike HTML. With HTML, a sentence like this:
    Hello        my name is John,
will be displayed like this:
    Hello my name is John,
because HTML collapses each run of white space into a single space.
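The difference can be checked with a parser. In this Python sketch (the `<greeting>` tag is an illustrative name), `xml.etree.ElementTree` returns the text with all eight spaces intact; collapsing them the way an HTML browser does is left to the application.

```python
import xml.etree.ElementTree as ET

# A sentence with a run of eight spaces, as in the example above.
TEXT = "Hello" + " " * 8 + "my name is John"
doc = ET.fromstring(f"<greeting>{TEXT}</greeting>")

# The parser hands the text to the application unchanged.
print(doc.text == TEXT)  # True

# Collapsing runs of white space, as HTML does, must be done explicitly.
collapsed = " ".join(doc.text.split())
print(collapsed)  # Hello my name is John
```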
33
Element Naming
XML elements must follow these naming rules:
- Names can contain letters, numbers, and other characters
- Names must not start with a number or punctuation character
- Names must not start with the letters xml (or XML, Xml, etc.)
- Names cannot contain spaces
34
Element Naming
Apart from the rules above, any name can be used (no words are reserved), but the idea is to make names descriptive.
XML documents often have a corresponding database, in which fields exist corresponding to elements in the XML document. A good practice is to use the naming rules of your database for the elements in the XML documents.
35
Comments in XML
The syntax for writing comments in XML is similar to that of HTML:
    <!-- This is a comment -->
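Comments are meant for human readers, and many parsers drop them. The Python sketch below shows that `xml.etree.ElementTree` discards comments by default, and keeps them only when its `TreeBuilder` is created with `insert_comments=True` (available from Python 3.8).

```python
import xml.etree.ElementTree as ET

DOC = "<note><!-- This is a comment --><to>John</to></note>"

# By default, ElementTree's parser skips comments entirely.
plain = ET.fromstring(DOC)
print([child.tag for child in plain])  # ['to']

# Asking the tree builder to keep comments turns them into Comment nodes.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
with_comments = ET.fromstring(DOC, parser=parser)
print([child.tag is ET.Comment for child in with_comments])  # [True, False]
print(repr(with_comments[0].text))  # ' This is a comment '
```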
36
Errors in XML will stop the XML program
The World Wide Web Consortium (W3C) XML specification states that a program should not continue to process an XML document if it finds a well-formedness error. The reason is that XML software should be easy to write, and that all XML documents should be compatible.
With HTML it was possible to create documents with lots of errors (like when you forget an end tag). One of the main reasons that HTML browsers are so big and incompatible is that they have their own ways to figure out what a document should look like when they encounter an HTML error.
With XML this should not be possible.
37
XML and Browsers
- Netscape 6 or higher supports XML
- Internet Explorer 5.0 or higher supports XML
- Firefox supports XML
38
Viewing XML Files
If you open an XML document in IE, it will display the document with color-coded root and child elements. A plus (+) or minus (-) sign to the left of the elements can be clicked to expand or collapse the element structure.
To view the raw XML source, select "View Source" from the browser menu.
If an erroneous XML file is opened, the browser will report the error.
39
Other Examples
Viewing some XML documents will help you get a feel for XML.
- An XML CD catalog: a CD collection, stored as XML data.
- An XML plant catalog: a plant catalog from a plant shop, stored as XML data.
- A Simple Food Menu: a breakfast food menu from a restaurant, stored as XML data.
40
Why does XML display like this?
XML documents do not carry information about how to display the data.
Since XML tags are "invented" by the author of the XML document, browsers do not know whether a tag like <table> describes an HTML table or a dining table.
Without any information about how to display the data, most browsers will just display the XML document as it is.
41
XML Attributes
XML elements can have attributes in the start tag, just like HTML. In HTML (and in XML), attributes provide additional information about elements.
42
XML Attributes
Attribute values must always be enclosed in quotes; either single or double quotes can be used, for example:
    <person sex="female">   or   <person sex='female'>
43
XML Attributes Cont.
    <note date=…>
      <to>John</to>
      <from>Mary</from>
    </note>
----------------------------------------------------------------------
    <note date="…">
      <to>John</to>
      <from>Mary</from>
    </note>
The error in the first document is that the date attribute in the note element is not quoted.
In a complete document, the first line is normally the XML declaration, which states the XML version (and optionally the character encoding) used.
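The same check can be run with a parser. In this Python sketch the date value is hypothetical; the only point is whether it is quoted. The unquoted version is rejected with a `ParseError`, while the quoted one parses and its value can be read with `Element.get`.

```python
import xml.etree.ElementTree as ET

# Hypothetical date value; only the quoting matters here.
UNQUOTED = "<note date=2008-05-01><to>John</to><from>Mary</from></note>"
QUOTED = '<note date="2008-05-01"><to>John</to><from>Mary</from></note>'

rejected = False
try:
    ET.fromstring(UNQUOTED)
except ET.ParseError as err:
    rejected = True
    print("rejected:", err)

note = ET.fromstring(QUOTED)
print(note.get("date"), note.findtext("to"), note.findtext("from"))
# 2008-05-01 John Mary
```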
44
Use of Elements vs. Attributes
Data can be stored in child elements or in attributes. Take a look at these examples:
    <person sex="female">
      <firstname>Anna</firstname>
      <lastname>Smith</lastname>
    </person>
--------------------------------------------------
    <person>
      <sex>female</sex>
      <firstname>Anna</firstname>
      <lastname>Smith</lastname>
    </person>
In the first example, sex is an attribute. In the second, sex is a child element. Both examples provide the same information.
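Because both forms carry the same information, one can be converted mechanically into the other. The Python sketch below moves every attribute of an element into a child element placed before the existing children; the function name `attributes_to_elements` is hypothetical, and real conversions may need rules for ordering or name clashes that this sketch ignores.

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = ('<person sex="female">'
                '<firstname>Anna</firstname><lastname>Smith</lastname></person>')
AS_ELEMENT = ('<person><sex>female</sex>'
              '<firstname>Anna</firstname><lastname>Smith</lastname></person>')

def attributes_to_elements(elem: ET.Element) -> ET.Element:
    """Move each attribute of elem into a new child element, ahead of existing children."""
    for position, (name, value) in enumerate(list(elem.attrib.items())):
        child = ET.Element(name)
        child.text = value
        elem.insert(position, child)
    elem.attrib.clear()
    return elem

converted = attributes_to_elements(ET.fromstring(AS_ATTRIBUTE))
print(ET.tostring(converted, encoding="unicode") == AS_ELEMENT)  # True
```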
45
The XML Rules (Summary)
1. Single, unique root element
2. Matching open/close tags
3. Consistent capitalisation
4. Correctly nested elements (no overlapping elements)
5. Attribute values enclosed in quotes
3Months.com Web Development, Wakefield St, Wellington, New Zealand
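A parser enforces all five rules at once. This Python sketch feeds one violating document per rule (plus a correct one) to `xml.etree.ElementTree` and reports where parsing stopped, using the `(line, column)` pair in `ParseError.position`. The sample documents are illustrative; exact error positions depend on the parser, so they are not listed here.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

def check_well_formed(xml_text: str) -> Optional[Tuple[int, int]]:
    """Return None if xml_text is well-formed, else the (line, column) of the error."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return err.position

# One document breaking each rule in the summary, plus a correct one.
samples = {
    "1. two root elements": "<a></a><b></b>",
    "2. missing close tag": "<p>This is a paragraph",
    "3. inconsistent case": "<Message>text</message>",
    "4. overlapping elements": "<b><i>text</b></i>",
    "5. unquoted attribute": "<note date=today></note>",
    "well-formed": '<note date="today"><to>John</to></note>',
}

for label, text in samples.items():
    print(f"{label:25} -> {check_well_formed(text)}")
```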
46
Authoring XML Documents
A basic XML document is a single XML element, which may or may not contain nested XML elements.
Example – A bookstore:
    Second Chance
    Matthew Dunn
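One possible markup for the bookstore example is sketched below in Python. The tag names (`bookstore`, `book`, `title`, `author`) are assumptions, since only the values "Second Chance" and "Matthew Dunn" survive on the slide. The last lines show that a root element with no nested elements is also a complete document.

```python
import xml.etree.ElementTree as ET

# Assumed tag names; only the values come from the slide.
BOOKSTORE = """<bookstore>
  <book>
    <title>Second Chance</title>
    <author>Matthew Dunn</author>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)
books = [(b.findtext("title"), b.findtext("author")) for b in root.findall("book")]
print(books)  # [('Second Chance', 'Matthew Dunn')]

# A lone root element with no nested elements is also a complete document.
empty = ET.fromstring("<bookstore/>")
print(len(empty))  # 0
```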
47
Converting Relational Database to XML
Example: export the following data into XML and group the books by store.
Relational database:
- Store (sid, name, phone)
- Book (bid, title, authors)
- StoreBook (sid, bid, price, stock)
(Diagram: Store and Book linked through the StoreBook relationship, each shown with the attributes listed above.)
48
Converting Relational Database to XML (Cont’d)
XML version of the Bookstore database:
    123 …
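The export described above amounts to joining Store, Book and StoreBook and nesting each store's books under the store. Here is a minimal Python sketch of that idea; all row values are hypothetical placeholders (only "Second Chance" and "Matthew Dunn" appear in the slides), and the element names are assumptions rather than the schema used in the original listing.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical rows for Store(sid, name, phone), Book(bid, title, authors)
# and StoreBook(sid, bid, price, stock); values are placeholders.
stores = [(1, "Store A", "555-0100"), (2, "Store B", "555-0199")]
books = {10: ("Second Chance", "Matthew Dunn"), 11: ("Example Title", "Example Author")}
store_books = [(1, 10, 9.99, 4), (1, 11, 12.50, 0), (2, 10, 10.49, 7)]

def export_by_store(stores, books, store_books) -> ET.Element:
    """Join the three tables and nest each store's books (with price and stock) under it."""
    by_store = defaultdict(list)
    for sid, bid, price, stock in store_books:
        by_store[sid].append((bid, price, stock))

    root = ET.Element("bookstores")
    for sid, name, phone in stores:
        store = ET.SubElement(root, "store", sid=str(sid))
        ET.SubElement(store, "name").text = name
        ET.SubElement(store, "phone").text = phone
        for bid, price, stock in by_store[sid]:
            title, authors = books[bid]
            book = ET.SubElement(store, "book", bid=str(bid))
            ET.SubElement(book, "title").text = title
            ET.SubElement(book, "authors").text = authors
            ET.SubElement(book, "price").text = f"{price:.2f}"
            ET.SubElement(book, "stock").text = str(stock)
    return root

doc = export_by_store(stores, books, store_books)
ET.indent(doc)  # pretty-print in place; requires Python 3.9+
print(ET.tostring(doc, encoding="unicode"))
```

Keys (sid, bid) are written as attributes and the remaining columns as child elements; this is a design choice, and either form carries the same information. Note that a book sold by several stores is repeated under each store, which is the price of grouping by store.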
49
Examples
- Example of database
- Example of database converted to XML
50
XML representation of a sample Movie Database
A <Movies> root element containing a <Movie> entry for The Notebook (actors Ryan Gosling and Rachel McAdams, director Nick Cassavetes), followed by entries for FRIENDS and Seinfeld.
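Querying such a document does not require a fixed schema. In the Python sketch below, the child tag names (`Title`, `Actor`, `Director`) are assumptions, and the FRIENDS and Seinfeld entries are deliberately given less detail than the film to mimic irregular structure. The code lists each entry's children and then collects all actors wherever they appear.

```python
import xml.etree.ElementTree as ET

# Assumed markup; entries deliberately carry different sets of children.
MOVIES = """<Movies>
  <Movie>
    <Title>The Notebook</Title>
    <Actor>Ryan Gosling</Actor>
    <Actor>Rachel McAdams</Actor>
    <Director>Nick Cassavetes</Director>
  </Movie>
  <Movie><Title>FRIENDS</Title></Movie>
  <Movie><Title>Seinfeld</Title></Movie>
</Movies>"""

root = ET.fromstring(MOVIES)

# Each entry may have a different set of children.
for movie in root.findall("Movie"):
    fields = sorted({child.tag for child in movie})
    print(movie.findtext("Title"), fields)

# ".//Actor" matches Actor elements at any depth below the root.
print([actor.text for actor in root.findall(".//Actor")])
# ['Ryan Gosling', 'Rachel McAdams']
```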