CSE591: Data Mining by H. Liu


1 CSE591: Data Mining by H. Liu
XML and RDF
Reference 1: Webmaster in a Nutshell
Reference 2: A Query Language for XML
5/4/2019 CSE591: Data Mining by H. Liu

2 Extensible Markup Language
XML is a document processing standard proposed by the W3C.
It is a simplified subset of SGML (Standard Generalized Markup Language).
It is the markup language of choice for dynamically generated content.
It is a meta-language for creating and formatting one's own document markup.
With HTML, the set of tags is fixed: it can't be changed or extended.
With XML, you can create your own markup tags and configure each one to your liking.

3 XML terms
An element with a start tag and an end tag: <Class>This is CSE591</Class>
Elements can have attributes: <Class level="graduate">CSE591</Class>
Empty elements are often used to add nontextual content: <Picture src="ASU-Campus.gif"/>
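The three element forms above can be parsed with Python's standard `xml.etree.ElementTree`. This sketch shows what a parser reports for each form: the tag name, the attribute dictionary, and the text content. An empty element has no text, so its text is `None`. The helper name `describe` is hypothetical. Real XML requires straight quotes around attribute values.

```python
import xml.etree.ElementTree as ET

def describe(xml_text):
    """Parse a single element and return (tag, attributes, text)."""
    elem = ET.fromstring(xml_text)
    return elem.tag, dict(elem.attrib), elem.text

# The three element forms from the slide.
examples = [
    '<Class>This is CSE591</Class>',          # start tag + end tag around text
    '<Class level="graduate">CSE591</Class>',  # element carrying an attribute
    '<Picture src="ASU-Campus.gif"/>',         # empty element: no text content
]

for xml_text in examples:
    print(describe(xml_text))
```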

4 An XML document
An XML-compliant application processes three files to display XML content:
The XML file contains the document data, tagged with meaningful XML elements.
A stylesheet dictates how document elements are formatted when they are displayed, separating content from formatting.
A document type definition (DTD) specifies rules for how XML elements, attributes, and other data are defined and logically related.

5 Well-formed XML
A well-formed document follows XML's syntax rules. For example, tags must be properly nested and closed, and attribute values must be quoted. A DTD is not required.
The XML declaration's standalone attribute tells the parser whether external markup declarations, such as an external DTD, may be needed. standalone="no" means they may be, e.g., <?xml version="1.0" standalone="no"?>
In a well-formed document without a DTD, all attributes are treated as type CDATA by default.
A document is valid if it also adheres to the rules declared in its DTD.
DTDs use the occurrence operators ? (zero or one), + (one or more), and * (zero or more).
XLink and XPointer support inter-document linking and provide a way to refer to data from multiple sources in one XML document.
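Well-formedness is a purely syntactic property, so a non-validating parser can check it. Python's standard `xml.etree.ElementTree` raises `ParseError` on input that is not well-formed. It does not validate against a DTD, so this sketch checks well-formedness only, not validity. The function name `is_well_formed` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as XML (syntax only, no DTD validation)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

samples = {
    'declaration + one root': '<?xml version="1.0" standalone="no"?><Class level="graduate">CSE591</Class>',
    'improper nesting': '<Class><Name>CSE591</Class></Name>',
    'unquoted attribute': '<Class level=graduate>CSE591</Class>',
    'two root elements': '<Class/><Class/>',
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```

Only the first sample is well-formed. Each of the other three breaks a different syntax rule.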

6 A simple XML document
Basic components:
XML declaration: <?xml … ?>
Document type declaration, which names the root element and its DTD: <!DOCTYPE name SYSTEM "some.dtd">
Comment: <!-- you can put a comment here -->
Namespace (xmlns): an element tag can have two parts separated by a colon, as in A:B. "A" is the tag's namespace prefix and "B" is the tag's local name.
A simple example (Example 10-1, p128)
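In a namespace-qualified tag such as A:B, the prefix A is bound to a namespace URI by an `xmlns:A` attribute. Parsers compare tags by that URI rather than by the prefix. Python's `xml.etree.ElementTree` reports such tags in the form `{uri}local`. In the sketch below, the prefix `asu`, the URI `http://example.org/asu`, the document, and the helper `split_tag` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI bound to the hypothetical prefix "asu".
NS = {'asu': 'http://example.org/asu'}
DOC = '''<course xmlns:asu="http://example.org/asu">
  <asu:title>Data Mining</asu:title>
</course>'''

def split_tag(tag):
    """Split ElementTree's '{namespace-uri}local' tag form into (uri, local)."""
    if tag.startswith('{'):
        uri, local = tag[1:].split('}', 1)
        return uri, local
    return None, tag  # the tag is not in any namespace

root = ET.fromstring(DOC)
title = root.find('asu:title', NS)  # the query's prefix is resolved through NS
print(split_tag(title.tag), title.text)
```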

7 A simple DTD
Basic components:
Comment: <!-- DTD for some document -->
The <!ELEMENT …> construct declares each valid element for the document.
A simple example (Example 10-2, p130)
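An `<!ELEMENT …>` declaration gives a content model: the children an element may contain, in order, with the occurrence operators ?, + and *. Python's standard library does not validate against DTDs. This hypothetical sketch therefore maps a simple sequence content model onto a regular expression and checks an element's child tags against it. It handles only comma-separated names with an optional operator. Choices (|), nested groups and #PCDATA are out of scope. The `book` declaration is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def content_model_to_regex(model):
    """Translate a DTD sequence such as '(title, author+, price?)' into a regex.

    Only comma-separated names with an optional ?, + or * are handled;
    choices (|), nested groups and #PCDATA are left out of this sketch.
    """
    parts = []
    for item in model.strip('() ').split(','):
        item = item.strip()
        op = item[-1] if item[-1] in '?+*' else ''
        name = item[:-1] if op else item
        # Each child name (plus a trailing space separator) becomes a group;
        # the DTD occurrence operator maps directly onto the regex operator.
        parts.append('(?:%s )%s' % (re.escape(name), op))
    return re.compile(''.join(parts))

def children_match(elem, model):
    """Check the sequence of an element's child tags against the content model."""
    sequence = ''.join(child.tag + ' ' for child in elem)
    return content_model_to_regex(model).fullmatch(sequence) is not None

# Hypothetical declaration: <!ELEMENT book (title, author+, price?)>
MODEL = '(title, author+, price?)'
print(children_match(ET.fromstring('<book><title/><author/><author/></book>'), MODEL))
print(children_match(ET.fromstring('<book><author/><title/></book>'), MODEL))
```

A real validating parser also checks attributes, entities, and mixed content. This sketch only shows how the occurrence operators constrain child sequences.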

8 Data types
ANY - the element may contain any mix of other elements and character data
#PCDATA - parsed character data
CDATA - character data (the default attribute type when there is no DTD)
NDATA - notation data

9 Resource Description Framework
RDF provides a standard framework for describing resource metadata (information about information).
As such, it is important for the future development of search engines and other web navigation applications.
Related metadata efforts:
Netscape's Meta Content Framework - tracking information about Web sites
The Platform for Internet Content Selection (PICS) - filtering inappropriate material based on external descriptions of content
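RDF expresses metadata as statements. Each statement is a triple of a subject (the resource described), a predicate (a property) and an object (the value). As a rough illustration of how a search application might use such metadata, this sketch keeps triples in a plain Python list and answers pattern queries with wildcards. The Dublin Core element URIs under `http://purl.org/dc/elements/1.1/` are a real vocabulary. The `example.org` resources, their values and the `find_triples` helper are hypothetical. This is not an RDF library or serialization.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # the resource being described (a URI)
    predicate: str  # the property, itself named by a URI
    obj: str        # the value: a literal or another resource's URI

DC = 'http://purl.org/dc/elements/1.1/'
# Hypothetical metadata about hypothetical web pages.
graph = [
    Triple('http://example.org/cse591', DC + 'title', 'Data Mining'),
    Triple('http://example.org/cse591', DC + 'creator', 'H. Liu'),
    Triple('http://example.org/other', DC + 'creator', 'Someone Else'),
]

def find_triples(graph, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)]

# Which resources were created by H. Liu?
print([t.subject for t in find_triples(graph, predicate=DC + 'creator', obj='H. Liu')])
```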

10 Web Data and EDI
Electronic data interchange (EDI): one important application of XML is EDI between two or more data sources on the Web.
E.g., search bots could automatically integrate information from related sources that publish their data in XML.
EDI applications require tools that support:
extraction of data from large XML documents
conversion of data between relational or object-oriented databases and XML
transformation of data from one DTD to another
integration of multiple XML data sources
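Converting between relational data and XML is one of the EDI tasks listed above. The sketch below shows one simple mapping, under the assumption of one `<row>` element per tuple and one child element per column. It publishes rows as XML with Python's standard `xml.etree.ElementTree` and reads them back. The table name, the column names, the `CSE510` row, and the helpers `rows_to_xml` and `xml_to_rows` are hypothetical.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(table, columns, rows):
    """Publish relational rows as XML: one <row> per tuple, one child per column."""
    root = ET.Element(table)
    for row in rows:
        record = ET.SubElement(root, 'row')
        for col, value in zip(columns, row):
            ET.SubElement(record, col).text = str(value)
    return root

def xml_to_rows(root, columns):
    """Read <row> elements back into tuples; a missing column becomes None."""
    return [tuple(rec.findtext(col) for col in columns) for rec in root.findall('row')]

columns = ('course', 'title')
xml_root = rows_to_xml('courses', columns, [('CSE591', 'Data Mining'), ('CSE510', 'Databases')])
print(ET.tostring(xml_root, encoding='unicode'))
print(xml_to_rows(xml_root, columns))
```

Every value comes back as a string, so a real converter would also need type information, for example from a DTD or schema, to restore the original column types.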

11 XML-QL
Query languages offer solutions to data extraction, conversion, transformation, and integration.
Why not adapt SQL or OQL to querying XML?
XML is not rigidly structured.
Schema information is stored with the data in XML.
XML data can naturally model irregularities that cannot be modeled by relational or object-oriented data.
The flexibility to accommodate all these irregularities is crucial for EDI applications.

12 Examples in XML-QL
Matching data using element patterns
Constructing XML data
A prototype of XML-QL can be found at
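An XML-QL query matches element patterns to bind variables in its WHERE clause, then builds new XML from those bindings in its CONSTRUCT clause. The sketch below emulates these two steps in Python with `xml.etree.ElementTree`. It is not XML-QL itself or its prototype. The bibliography data and the helpers `match_books` and `construct` are hypothetical. The second book has no publisher, which shows how pattern matching simply skips irregular data. The `<results>` wrapper is added only because `ElementTree` serializes a single root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical bibliography; the second book has no publisher element.
BIB = '''<bib>
  <book><publisher><name>Addison-Wesley</name></publisher>
        <title>Book A</title><author>Author 1</author></book>
  <book><title>Book B</title><author>Author 2</author></book>
</bib>'''

def match_books(root, publisher):
    """Pattern matching: bind (title, author) for each book by the given publisher."""
    bindings = []
    for book in root.iter('book'):
        if book.findtext('publisher/name') != publisher:
            continue  # the element pattern does not match this book
        title = book.findtext('title')
        for author in book.findall('author'):  # one binding per author
            bindings.append((title, author.text))
    return bindings

def construct(bindings):
    """Construction: build a new <result> element from each binding."""
    out = ET.Element('results')
    for title, author in bindings:
        result = ET.SubElement(out, 'result')
        ET.SubElement(result, 'author').text = author
        ET.SubElement(result, 'title').text = title
    return out

root = ET.fromstring(BIB)
print(ET.tostring(construct(match_books(root, 'Addison-Wesley')), encoding='unicode'))
```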

