Jacob (Jack) Gryn - Presented November 28, 2002: Semi-Structured Data and XML.


1 Semi-Structured Data and XML

2 Agenda
• Semi-Structured Data
• XML

3 Semi-Structured Data: an Introduction
• What is structured data?
• What is non-structured data?
• What is semi-structured data?
• How is semi-structured data represented?
• What can we do with semi-structured data?

4 What is Structured Data?
• Strongly typed variables/attributes (e.g. int, float, string[20])
• Every attribute in a relation is defined for all records
• Data is represented in some organized fashion

5 An Example of Structured Data

Name: char(10)   Birthday: DATE   Salary: INT
Bob              1949-08-13       52000
Bill             1967-04-12       45000

A relational database can be considered structured data.
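As a rough illustration of the table above, the same two records can be loaded into a relational table with a fixed schema. This is only a sketch: the table name `employee` is an assumption, and SQLite enforces column types more loosely than most relational engines, so it merely stands in for a stricter system.

```python
import sqlite3

# In-memory table mirroring the slide's example.
# The table name "employee" is an assumption made for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "name CHAR(10) NOT NULL, birthday DATE NOT NULL, salary INT NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Bob", "1949-08-13", 52000), ("Bill", "1967-04-12", 45000)],
)

# Every record defines every attribute, so a query can rely on the schema.
rows = conn.execute(
    "SELECT name, salary FROM employee ORDER BY salary DESC"
).fetchall()
print(rows)
```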

6 What is Non-Structured Data?
• Data that has no type definitions
• Data is not organized according to any pattern
• No concept of variables or attributes

7 An Example of Non-Structured Data
“Bob was born sometime in August of 1949. He has a reasonable salary of 52000. Someone else was born on the 12th of a different month, his name is Bill. By the way, Bob was born on the 13th of August.”
As you can see, it would be almost impossible for a computer to parse such data automatically.

8 Then what is Semi-Structured Data?
Anything in between structured and non-structured data!

9 Then what is Semi-Structured Data?
• Everything in between structured and non-structured data
• Variables are loosely typed
  • x=1 is valid, so is x="hello"
• A record does not need to have all attributes defined
  • e.g. in a database of cars, if we don’t know the engine type, we can choose not to define the field for that particular record, whereas in a structured database the attribute would be defined but set to NULL.
• An attribute of a record could be another record
• It does not necessarily have to differentiate between an identifier and a value
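The properties listed on this slide can be sketched with plain Python dictionaries. The car records, field names and values below are invented purely for illustration; the slide only mentions a database of cars with an optional engine type.

```python
# Two car records; field names and values are hypothetical.
cars = [
    # An attribute of a record can itself be another record.
    {"model": "Sedan", "engine": {"type": "V6", "litres": 3.0}},
    # Engine unknown: the field is simply absent rather than set to NULL.
    {"model": "Coupe"},
]

x = 1        # loosely typed: the same name can hold a number...
x = "hello"  # ...and later a string


def engine_type(car):
    """Return the engine type, or None when the record has no engine field."""
    return car.get("engine", {}).get("type")


types = [engine_type(c) for c in cars]
print(types)
```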

10 So how is semi-structured data represented?
Semi-structured data can be represented as a tree.

11 So how is semi-structured data represented?
Semi-structured data can be represented in the form of indented text:

Bob
  Birthday
    1949
    August
    13
  Salary
    $52,000
Bill
  Birthday
    1967
    April
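The tree and indented-text views are two renderings of the same structure. A minimal sketch, assuming the nesting suggested by the slide (a person, then an attribute, then its values), stores the tree as nested `(label, children)` pairs and prints one node per line:

```python
# Nested (label, children) pairs; leaves have an empty child list.
# The exact nesting is an assumption based on the slide's example.
tree = [
    ("Bob", [
        ("Birthday", [("1949", []), ("August", []), ("13", [])]),
        ("Salary", [("$52,000", [])]),
    ]),
]


def render(nodes, depth=0):
    """Return one line per node, indented two spaces per tree level."""
    lines = []
    for label, children in nodes:
        lines.append("  " * depth + label)
        lines.extend(render(children, depth + 1))
    return lines


text = "\n".join(render(tree))
print(text)
```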

12 So how is semi-structured data represented?
Semi-structured data can be represented in a markup language (e.g. HTML, XML, LISP, AceDB, Tsimmis):

Bob 5513 Sales 45000
Ed 6766 312 Executive Confidential

13 Overview
• Semi-structured data is not necessarily created with the intention of being processed.
  • e.g. web pages are not necessarily intended to be queried by a language like SQL; a web designer who does not take this into consideration may not make it easy for a machine to process the data.

14 What can we do with Semi-Structured Data?
• Since there is some structure, it can be scanned and parsed
• Once the data is parsed, we can query it using specialized query languages such as UnQL, GEXT and Lorel
• We can “clean it up” to be placed into a structured relational database
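The "clean it up" step might look roughly like the following: parsed records with missing attributes are fitted to a fixed schema, with NULL standing in for anything undefined. The record values come from the employee example used later in the deck; the table name `staff` and the column names are assumptions.

```python
import sqlite3

# Already-parsed semi-structured records; Bob has no "office" attribute.
records = [
    {"name": "Bob", "extension": 5513, "department": "Sales"},
    {"name": "Ed", "extension": 6766, "office": 312, "department": "Executive"},
]
columns = ["name", "extension", "office", "department"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (name TEXT, extension INT, office INT, department TEXT)"
)
for rec in records:
    # dict.get returns None for a missing key, which SQLite stores as NULL,
    # so every row fits the fixed schema.
    conn.execute("INSERT INTO staff VALUES (?, ?, ?, ?)", [rec.get(c) for c in columns])

no_office = conn.execute("SELECT name FROM staff WHERE office IS NULL").fetchall()
print(no_office)
```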

15 XML: an Introduction to XML
• What is XML?
• What does it offer to creators of DBs?
• How can XML be used as a DB?
• Representations of XML
• Other features of XML
• Disadvantages to XML

16 Summary / Key Points of Semi-Structured Data
• In between structured and non-structured data
• Loosely typed attributes
• Not all attributes need to be defined for every record
• Can be parsed and queried

17 What is XML?
• XML stands for eXtensible Markup Language
• Based on tags similar to HTML
  • Actually, XHTML is a form of XML
• Used to define markup languages

18 What does XML offer to database designers?
• Readable by humans using Unicode or ASCII text
• Easy for computers to parse
• Can easily be used as a ‘back-end’ for web sites

19 How can XML be used as a database?
Consider the following data:

Name   Extension   Office   Department   Salary
Bob    5513                 Sales        45000
Ed     6766        312      Executive    Confidential

It can be written in XML as follows:

Bob 5513 Sales 45000
Ed 6766 312 Executive Confidential

Notice that this is semi-structured data, since not all the fields are filled in and because they are loosely typed.
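One plausible XML encoding of this data is sketched below and parsed with Python's standard `xml.etree.ElementTree` module. The element names (`Staff`, `Employee`, and one element per column heading) are assumptions chosen to match the table, not necessarily the markup used on the slide.

```python
import xml.etree.ElementTree as ET

# Element names are assumptions based on the table's column headings.
doc = """
<Staff>
  <Employee>
    <Name>Bob</Name>
    <Extension>5513</Extension>
    <Department>Sales</Department>
    <Salary>45000</Salary>
  </Employee>
  <Employee>
    <Name>Ed</Name>
    <Extension>6766</Extension>
    <Office>312</Office>
    <Department>Executive</Department>
    <Salary>Confidential</Salary>
  </Employee>
</Staff>
"""

root = ET.fromstring(doc)
employees = root.findall("Employee")
for emp in employees:
    # findtext returns None when the child element is absent (Bob has no Office).
    print(emp.findtext("Name"), emp.findtext("Office"), emp.findtext("Salary"))
```

Bob's record simply omits the office element, and Ed's salary holds text where Bob's holds a number, which is what makes the data semi-structured.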

20 In XML, there are few restrictions on how data can be laid out
• The tag names can represent either attribute names or the data itself
• Tag names can be defined as anything the creator wishes

21 But, there are still a few restrictions
• Every tag that is opened must be closed.
  • e.g. <Name>Bob</Name>
  • A close tag is not needed for empty data, e.g. <Name/>
• If one tag is opened inside the field of another tag, it must be closed before the outer tag is closed.
  • Valid, e.g. <Employee><Name>Bob</Name></Employee>
  • Invalid, e.g. <Employee><Name>Bob</Employee></Name>
• Tags are case sensitive
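These restrictions are what XML calls well-formedness, and a parser will reject any document that breaks them. A small sketch using Python's standard `xml.etree.ElementTree` checks one sample per rule; the tag names are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Tag names are hypothetical; each sample exercises one rule from the slide.
samples = [
    "<Name>Bob</Name>",                       # opened and closed
    "<Name>Bob",                              # never closed
    "<Office/>",                              # empty element, no close tag needed
    "<Employee><Name>Bob</Name></Employee>",  # inner tag closed first
    "<Employee><Name>Bob</Employee></Name>",  # tags overlap
    "<Name>Bob</name>",                       # case mismatch
]
results = {s: is_well_formed(s) for s in samples}
for sample, ok in results.items():
    print(ok, sample)
```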

22 How can XML be represented?
• As a tree structure
• As text/markup tags

23 How can XML be represented?
As a tree structure, taking our previous example:
• Leaf nodes generally, but not necessarily, store the data
• Recent web browsers will show this structure
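The tree view can be recovered from the markup by walking the parsed document. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened, assumed version of the employee example and lists each element with its depth; in this sample only the leaf elements carry text.

```python
import xml.etree.ElementTree as ET

# Shortened sample; element names are assumptions.
root = ET.fromstring(
    "<Staff><Employee><Name>Bob</Name><Extension>5513</Extension></Employee></Staff>"
)


def outline(elem, depth=0):
    """Return (depth, tag, text) for each element, parents before children."""
    text = (elem.text or "").strip()
    rows = [(depth, elem.tag, text)]
    for child in elem:
        rows.extend(outline(child, depth + 1))
    return rows


tree_rows = outline(root)
for depth, tag, text in tree_rows:
    print("  " * depth + tag + (": " + text if text else ""))
```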

24 How can XML be represented?
As a text/markup language, taking our previous example:

Bob 5513 Sales 45000
Ed 6766 312 Executive Confidential

25 Other features of XML
• It is easy to parse
• It can be queried like a database
• It can be used with XSL Templates to easily generate web pages from data
• It can be used with a DTD (Document Type Definition) to run as a fully structured database
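As a hedged illustration of "queried like a database", Python's standard `xml.etree.ElementTree` supports a small subset of XPath that can filter elements by the text of a child. The element names are assumptions, and the SQL in the comment is only an analogy.

```python
import xml.etree.ElementTree as ET

# Element names are assumptions; values come from the employee example.
root = ET.fromstring(
    "<Staff>"
    "<Employee><Name>Bob</Name><Department>Sales</Department></Employee>"
    "<Employee><Name>Ed</Name><Department>Executive</Department></Employee>"
    "</Staff>"
)

# Roughly: SELECT Name FROM Employee WHERE Department = 'Sales'
sales = [e.findtext("Name") for e in root.findall("./Employee[Department='Sales']")]
print(sales)
```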

26 Disadvantages to XML
• Difficult to create indexes on
• Difficult to optimize queries
• Requires additional disk space
  • Text format
  • Redundant data in tags
• No single standard for how data should be stored in XML
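The disk-space point can be made concrete by comparing one record written as a comma-separated row with the same record wrapped in tags. The tag names and the comma-separated layout are assumptions chosen for this comparison.

```python
# The same record in two text encodings; tag names are hypothetical.
csv_row = "Bob,5513,,Sales,45000"
xml_row = (
    "<Employee><Name>Bob</Name><Extension>5513</Extension>"
    "<Department>Sales</Department><Salary>45000</Salary></Employee>"
)

# Every tag name appears twice per value (open and close), in every record.
overhead = len(xml_row) - len(csv_row)
print(len(csv_row), len(xml_row), overhead)
```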

27 Summary / Key Points of XML
• Data stored using a text-based markup language
• Can also be represented in tree format
• Can store structured and semi-structured data
• Easy to parse and query, but inefficient

28 Where to Get More Information
• Search the web, you’ll find something!

