Published by Gerald Montgomery. Modified over 9 years ago.
1
Semi-Structured Data and XML
Jacob (Jack) Gryn - Presented November 28, 2002
2
Agenda
- Semi-Structured Data
- XML
3
Semi-Structured Data: an Introduction
- What is structured data?
- What is non-structured data?
- What is semi-structured data?
- How is semi-structured data represented?
- What can we do with semi-structured data?
4
What is Structured Data?
- Strongly typed variables/attributes (e.g. int, float, string[20])
- Every attribute in a relation is defined for all records
- Data is organized according to a fixed, predefined pattern (a schema)
5
An Example of Structured Data
Name: char(10) | Birthday: DATE | Salary: INT
Bob            | 1949-08-13     | 52000
Bill           | 1967-04-12     | 45000
A relational database can be considered structured data.
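To make "strongly typed" and "every attribute defined" concrete, here is a minimal Python sketch of the Bob/Bill table as a fixed schema. The class name `Employee` and the hand-written type checks are illustrative assumptions, not part of the original slides; a real relational database enforces these rules itself.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class Employee:
    """One row of a fixed schema: every attribute is required and typed."""
    name: str        # corresponds to Name: char(10)
    birthday: date   # corresponds to Birthday: DATE
    salary: int      # corresponds to Salary: INT

    def __post_init__(self):
        # Reject values whose type does not match the declared column type.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(f"{f.name} must be of type {f.type.__name__}")
        # Mimic the char(10) length limit on the name column.
        if len(self.name) > 10:
            raise ValueError("name must be at most 10 characters")


employees = [
    Employee("Bob", date(1949, 8, 13), 52000),
    Employee("Bill", date(1967, 4, 12), 45000),
]
```

Omitting an attribute (e.g. `Employee("Bob")`) or passing a string as the salary raises `TypeError`, which is exactly the rigidity that semi-structured data relaxes.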
6
What is Non-Structured Data?
- Data that has no type definitions
- Data is not organized according to any pattern
- No concept of variables or attributes
7
An Example of Non-Structured Data
"Bob was born sometime in August of 1949. He has a reasonable salary of 52000. Someone else was born on the 12th of a different month, his name is Bill. By the way, Bob was born on the 13th of August."
As you can see, it would be almost impossible for a computer to parse such data automatically.
8
Then what is Semi-Structured Data?
Anything in between structured and non-structured data!
9
Then what is Semi-Structured Data?
- Everything in between structured and non-structured data
- Variables are loosely typed: x=1 is valid, and so is x="hello"
- A record does not need to have all attributes defined
  - e.g. in a database of cars, if we don't know the engine type, we can simply leave that field out of that particular record, whereas in a structured database the attribute would still be defined but set to NULL
- An attribute of a record can be another record
- It does not necessarily differentiate between an identifier and a value
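The three properties above (loose typing, optional attributes, nested records) can be sketched with plain Python dictionaries. The car models and engine values below are hypothetical, chosen only to mirror the car-database example; this is an illustration, not a real data model.

```python
# Hypothetical semi-structured records: keys are optional, value types vary,
# and a value may itself be a nested record.
cars = [
    {"model": "Sedan", "engine": {"type": "V6", "litres": 3.0}},  # nested record
    {"model": "Coupe"},                                           # engine unknown: key omitted, not NULL
    {"model": "Hatchback", "engine": "electric"},                 # loosely typed: a plain string here
]


def engine_description(car):
    """Describe the engine, tolerating a missing or differently typed field."""
    engine = car.get("engine")
    if engine is None:                     # attribute simply not defined for this record
        return "unknown"
    if isinstance(engine, dict):           # attribute whose value is another record
        return f"{engine['type']} ({engine['litres']} L)"
    return str(engine)                     # any other loosely typed value


for car in cars:
    print(car["model"], "->", engine_description(car))
```

Note that the code consuming such data has to handle every shape a field might take; that burden is the price of the flexibility.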
10
So how is semi-structured data represented?
Semi-structured data can be represented as a tree.
11
So how is semi-structured data represented?
Semi-structured data can be represented in the form of indented text:
Bob
  Birthday
    1949
    August
    13
  Salary
    $52,000
Bill
  Birthday
    1967
    April
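Indented text like this maps directly onto a tree: each line is a node, and its indentation depth decides its parent. The sketch below assumes two spaces per level and the layout shown above; the function name `parse_indented` and the `(label, children)` node shape are hypothetical choices for illustration.

```python
def parse_indented(text, indent=2):
    """Build a tree of (label, children) pairs from indentation-based text.

    Assumes each level is indented by exactly `indent` more spaces than its parent.
    """
    root = ("root", [])
    stack = [(-1, root)]  # (depth, node) pairs along the path to the current line
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        depth = (len(line) - len(line.lstrip(" "))) // indent
        node = (line.strip(), [])
        # Climb back up until the top of the stack is this node's parent.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1][1].append(node)  # attach to the parent's children list
        stack.append((depth, node))
    return root


sample = """\
Bob
  Birthday
    1949
    August
    13
  Salary
    $52,000
Bill
  Birthday
    1967
    April
"""

tree = parse_indented(sample)
print([label for label, _ in tree[1]])
```

Because no schema is imposed, Bill's record simply has no Salary node; the parser does not need to know that in advance.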
12
So how is semi-structured data represented?
Semi-structured data can be represented in a markup language (e.g. HTML, XML, LISP, AceDB, Tsimmis):
Bob 5513 Sales 45000 Ed 6766 312 Executive Confidential
13
Overview
Semi-structured data is not necessarily created with the intention of being processed by a machine. For example, web pages are not necessarily intended to be queried by a language like SQL; a web designer who does not take this into consideration may not make it easy for a machine to process the data.
14
What can we do with Semi-Structured Data?
- Since there is some structure, it can be scanned and parsed
- Once the data is parsed, we can query it using specialized query languages such as UnQL, GEXT and Lorel
- We can "clean it up" so that it can be placed into a structured relational database
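The "clean it up" step can be sketched with Python's built-in `sqlite3` module: optional attributes become NULL, and loosely typed values are coerced to the column type where possible. The records reuse the Bob/Ed data from these slides, but the key names, table name and coercion rule are assumptions made for illustration.

```python
import sqlite3

# Semi-structured input (hypothetical keys): Bob has no office,
# and Ed's salary is text rather than a number.
records = [
    {"name": "Bob", "extension": "5513", "department": "Sales", "salary": "45000"},
    {"name": "Ed", "extension": "6766", "office": "312",
     "department": "Executive", "salary": "Confidential"},
]


def to_int_or_none(value):
    """Coerce a loosely typed value to int, or None (SQL NULL) if impossible."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (name TEXT NOT NULL, extension INTEGER, "
    "office INTEGER, department TEXT, salary INTEGER)"
)
for r in records:
    conn.execute(
        "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
        (r["name"], to_int_or_none(r.get("extension")), to_int_or_none(r.get("office")),
         r.get("department"), to_int_or_none(r.get("salary"))),
    )

rows = conn.execute("SELECT name, office, salary FROM employee ORDER BY name").fetchall()
print(rows)  # missing or non-numeric values became None (NULL)
```

The cleanup is lossy: "Confidential" carries meaning that a NULL does not, which is one reason semi-structured sources are not always worth forcing into a rigid schema.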
15
XML: an Introduction
- What is XML?
- What does it offer to creators of DBs?
- How can XML be used as a DB?
- Representations of XML
- Other features of XML
- Disadvantages of XML
16
Summary / Key Points of Semi-Structured Data
- In between structured and non-structured data
- Loosely typed attributes
- Not all attributes need to be defined for every record
- Can be parsed and queried
17
What is XML?
- XML stands for eXtensible Markup Language
- Based on tags, similar to HTML
- In fact, XHTML is HTML reformulated as XML
- Used to define markup languages
18
What does XML offer to database designers?
- Human-readable, since it is stored as Unicode or ASCII text
- Easy for computers to parse
- Can easily be used as a 'back-end' for web sites
19
How can XML be used as a database?
Consider the following data:
Name | Extension | Office | Department | Salary
Bob  | 5513      |        | Sales      | 45000
Ed   | 6766      | 312    | Executive  | Confidential
It can be written in XML as follows:
Bob 5513 Sales 45000 Ed 6766 312 Executive Confidential
Notice that this is semi-structured data, since not all the fields are filled in and the fields are loosely typed (Ed's salary is text, not a number).
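The original tag names for this example were not preserved, so the markup below is a hypothetical reconstruction that uses the column headings as tag names. The sketch parses it with Python's standard `xml.etree.ElementTree` module and reads each record, showing how a missing field (Bob's office) is handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the Bob/Ed data above; tag names are assumptions.
DOC = """\
<employees>
  <employee>
    <name>Bob</name>
    <extension>5513</extension>
    <department>Sales</department>
    <salary>45000</salary>
  </employee>
  <employee>
    <name>Ed</name>
    <extension>6766</extension>
    <office>312</office>
    <department>Executive</department>
    <salary>Confidential</salary>
  </employee>
</employees>
"""

root = ET.fromstring(DOC)
for emp in root.findall("employee"):
    # findtext returns the default when the child element is absent (Bob has no <office>).
    print(emp.findtext("name"), emp.findtext("office", default="-"), emp.findtext("salary"))
```

Bob's record simply lacks an `<office>` element instead of storing an empty value, which is the semi-structured property the slide points out.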
20
In XML, there are few restrictions on how data can be laid out:
- Tag names can represent either attribute names or the data itself
- Tag names can be anything the creator wishes (within XML's naming rules, e.g. no spaces and no leading digit)
21
But there are still a few restrictions:
- Every tag that is opened must be closed
  - A separate close tag is not needed for an empty element, which can be written in self-closing form (<tag/>)
- If one tag is opened inside another tag, it must be closed before the outer tag is closed
- Tags are case sensitive
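These rules are what XML calls well-formedness, and any conforming parser enforces them. The sketch below feeds small fragments to Python's `xml.etree.ElementTree`, which raises `ParseError` for markup that breaks a rule; the tag names in the fragments are hypothetical examples.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments, one per rule on the slide.
examples = {
    "<name>Bob</name>": True,        # opened and closed
    "<name>Bob": False,              # never closed
    "<office/>": True,               # empty element in self-closing form
    "<b><i>Bob</b></i>": False,      # inner tag closed after the outer one
    "<Name>Bob</name>": False,       # tag names are case sensitive
}

for text, expected in examples.items():
    assert is_well_formed(text) == expected, text
```

Well-formedness says nothing about which tags are allowed; that stricter kind of checking is what a DTD adds.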
22
How can XML be represented?
- As a tree structure
- As text/markup tags
23
How can XML be represented?
As a tree structure. Take our previous example:
- Leaf nodes usually, but not necessarily, store the data
- Recent web browsers will display this structure
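The tree view can be reproduced with a short recursive walk over a parsed document. This sketch uses `xml.etree.ElementTree` and a hypothetical fragment of the earlier example (tag names assumed); each element becomes one indented line, and text appears only on the leaves here.

```python
import xml.etree.ElementTree as ET


def tree_lines(elem, depth=0):
    """Return one indented line per element, showing text where present."""
    text = (elem.text or "").strip()
    label = f"{elem.tag}: {text}" if text else elem.tag
    lines = ["  " * depth + label]
    for child in elem:
        lines.extend(tree_lines(child, depth + 1))
    return lines


# Hypothetical markup; tag names are assumptions.
root = ET.fromstring(
    "<employees><employee><name>Bob</name><extension>5513</extension>"
    "<department>Sales</department><salary>45000</salary></employee></employees>"
)
print("\n".join(tree_lines(root)))
```

Interior nodes (`employees`, `employee`) carry only structure while the leaves carry the values, matching the slide's observation; XML also allows text on interior nodes (mixed content), which is why it is "not necessarily" so.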
24
How can XML be represented?
As a text/markup language. Take our previous example:
Bob 5513 Sales 45000 Ed 6766 312 Executive Confidential
25
Other features of XML
- It is easy to parse
- It can be queried like a database
- It can be used with XSL templates to easily generate web pages from data
- It can be used with a DTD (Document Type Definition) to enforce a fixed structure, so that it can act as a fully structured database
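As a small taste of "querying like a database", Python's `xml.etree.ElementTree` supports a limited subset of XPath, including predicates that test for a child element or a child's exact text. This is a sketch on a hypothetical encoding of the Bob/Ed data, not a full query language like the ones named earlier; the SQL in the comments is only a rough analogy.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; tag names are assumptions, not the slides' original tags.
root = ET.fromstring(
    "<employees>"
    "<employee><name>Bob</name><extension>5513</extension>"
    "<department>Sales</department><salary>45000</salary></employee>"
    "<employee><name>Ed</name><extension>6766</extension><office>312</office>"
    "<department>Executive</department><salary>Confidential</salary></employee>"
    "</employees>"
)

# Roughly: SELECT name FROM employee WHERE department = 'Sales'
sales = [e.findtext("name") for e in root.findall("./employee[department='Sales']")]

# Roughly: SELECT name FROM employee WHERE office IS NOT NULL
with_office = [e.findtext("name") for e in root.findall("./employee[office]")]

print(sales, with_office)
```

Every query here scans the whole tree, since no index exists; that foreshadows the performance drawbacks on the next slide.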
26
Disadvantages of XML
- Difficult to create indexes on
- Difficult to optimize queries
- Requires additional disk space
  - Text format
  - Redundant data in tags
- No single standard for how data should be stored in XML
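The disk-space point can be checked by serializing the same records twice: once as CSV, where each column name appears only in the header, and once as XML, where every value is wrapped in an opening and a closing tag. The key and tag names are assumptions, and the exact byte counts depend on formatting choices, so none are quoted here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The Bob/Ed data with hypothetical keys; Bob has no office.
rows = [
    {"name": "Bob", "extension": "5513", "department": "Sales", "salary": "45000"},
    {"name": "Ed", "extension": "6766", "office": "312",
     "department": "Executive", "salary": "Confidential"},
]
columns = ["name", "extension", "office", "department", "salary"]

# CSV: column names are written once, in the header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=columns)  # missing keys become empty fields
writer.writeheader()
writer.writerows(rows)
csv_size = len(buf.getvalue().encode("utf-8"))

# XML: each value repeats its tag name twice (open and close).
root = ET.Element("employees")
for row in rows:
    emp = ET.SubElement(root, "employee")
    for key in columns:
        if key in row:
            ET.SubElement(emp, key).text = row[key]
xml_size = len(ET.tostring(root, encoding="utf-8"))

print(f"CSV: {csv_size} bytes, XML: {xml_size} bytes")
```

The gap grows with the number of records, because the tag overhead is paid again for every value.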
27
Summary / Key Points of XML
- Data is stored using a text-based markup language
- Can also be represented in tree format
- Can store structured and semi-structured data
- Easy to parse and query, but inefficient
28
Where to Get More Information
Search the web, you'll find something!