XML Documents Chao-Hsien Chu, Ph.D. School of Information Sciences and Technology The Pennsylvania State University Elements Attributes Comments PI Document Type
Components of XML Systems XML Parser (Processor) XML Application XML Document (Contents) XML DTD (Rule) Well-Formed (Syntax) Validate (Structure)
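How a Parser Interprets XML - The parser first checks whether the XML document is well formed; if it is not, the parser issues a warning or stops processing. If the document is well formed and has a Document Type Definition (DTD), the parser validates the document against it; if the document is not valid, the parser issues a warning or stops processing. Otherwise the document goes on to further processing (optional).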
XML Document Syntax: Processing Instructions (PI), Document Type Declarations (optional), Comments (optional), Element Start and End Tags, Attributes, Entity References, Character Data Sections (CDATA)
The Panoramic Perspective of XML XML Document Prolog Doc. Type Declaration Root Element Comments Processing Instructions Comments Processing Instructions Comments Processing Instructions Entity References CDATA Sections Elements PCDATA Attributes Entity References CDATA, Entities, ID,.. Doc Type Definitions Element Declaration Attribute Declaration Entity Declaration Notations Declaration : Optional
An Example of an XML Document Alley Gator 001 (010) Main Street Muddy Waters FL Processing Instruction Elements Root Element Document Type Declaration
Processing Instructions (PI) A PI passes information to the application that processes the document, such as the name and version of the processor. Syntax: <?target instruction?> Examples: DTD File
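A minimal sketch of how an application might read processing instructions, using Python's standard xml.dom.minidom. The xml-stylesheet instruction and the file name style.css are illustrative assumptions, not taken from the slides. Note that minidom does not report the XML declaration itself as a PI node.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Hypothetical document: the stylesheet PI and its file name are illustrative only.
SOURCE = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="style.css"?>'
    '<Address_Book/>'
)


def processing_instructions(xml_text):
    """Return (target, data) pairs for PIs at the top level of the document."""
    doc = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in doc.childNodes
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
    ]


pis = processing_instructions(SOURCE)
```

The target names the application the instruction is meant for; everything after it is passed through to that application uninterpreted.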
Document Type Declaration A statement embedded in an XML document that declares the existence and location of a document type definition (DTD). The DTD is optional. Syntax: Example:
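As a rough illustration, the sketch below reads a document type declaration with Python's xml.dom.minidom. The DTD file name address.dtd is a hypothetical placeholder; the parser only records the declaration and does not fetch the file or validate against it.

```python
from xml.dom import minidom

# Hypothetical declaration: "address.dtd" is an assumed file name.
SOURCE = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE Address_Book SYSTEM "address.dtd">'
    '<Address_Book/>'
)

doc = minidom.parseString(SOURCE)
doctype = doc.doctype

declared_root = doctype.name      # root element named by the declaration
dtd_location = doctype.systemId   # where the DTD is said to live
```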
Comments A comment is a place for reminders, simple documentation, or code that is temporarily commented out for debugging; end users do not see it. <!-- This is a comment area --> You can use any character inside the comment area except “--” itself. There is no limit on the length of the comment area. Comments may not come before the XML declaration. Comments may not be placed inside a tag.
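The comment rules above can be checked with a parser. This is a small sketch using Python's xml.etree.ElementTree; the element name note and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A comment between tags is fine.
ok = parses('<note><!-- This is a comment area -->text</note>')
# "--" is not allowed inside the comment area.
double_hyphen = parses('<note><!-- bad -- comment --></note>')
# A comment may not come before the XML declaration.
before_declaration = parses('<!-- too early --><?xml version="1.0"?><note/>')
# A comment may not be placed inside a tag.
inside_tag = parses('<note <!-- misplaced --> ></note>')
```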
Guideline for Elements Elements are the building blocks of XML documents. Every document must have one and only one root element. An element must start with a start tag and end with a corresponding end tag. Element names are case sensitive, so the start and end tags must use identical case. Spaces are not allowed between the forward slash and the element name.
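A short sketch of these element rules in action, using Python's xml.etree.ElementTree. The sample strings are illustrative; the helper returns whatever message the parser produces, so the exact wording depends on the parser.

```python
import xml.etree.ElementTree as ET


def parse_error(xml_text):
    """Return the parser's error message, or None if the text parses."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


cases = {
    "matching tags": "<Name>Alley Gator</Name>",
    "case mismatch": "<Name>Alley Gator</name>",
    "two root elements": "<Name/><Phone/>",
    "space after slash": "<Name>Alley Gator</ Name>",
}

# Map each case to None (accepted) or an error message (rejected).
results = {label: parse_error(text) for label, text in cases.items()}
```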
Example of Element Tag Name Attribute Name Attribute Value Attribute End Tag Start Tag Element Contents Texts Elements Element
Guideline for Elements Elements can both contain information and define structure. The structure of information is encoded by the nesting of tags. Empty elements, which have no content, are used as placeholders or to signal that something exists; an empty element can be written as a single tag ending in “/>”.
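To illustrate empty elements, the sketch below parses the same hypothetical contact twice, once with the short empty-element form and once with a start and end tag that enclose nothing. The element name Fax is an assumption chosen for the example.

```python
import xml.etree.ElementTree as ET

# "Fax" is a hypothetical placeholder element with no content.
short_form = ET.fromstring("<Contact><Name>Alley Gator</Name><Fax/></Contact>")
long_form = ET.fromstring("<Contact><Name>Alley Gator</Name><Fax></Fax></Contact>")


def is_empty(element):
    """An element is empty when it has no child elements and no text."""
    return len(element) == 0 and not element.text


fax_short = short_form.find("Fax")
fax_long = long_form.find("Fax")
name = short_form.find("Name")
```

Both spellings produce the same parsed result, so the choice between them is a matter of style.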
Tree Diagram of Address Document Address_Book (Root Element) contains Contact; Contact contains ID, Name, Phone, and Address; Address contains Street, City, State, and Zip.
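The tree can also be built and printed programmatically. This sketch uses Python's xml.etree.ElementTree and assumes the nesting suggested by the diagram labels (Contact under Address_Book, and Street, City, State, Zip under Address); the helper name outline is made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumed hierarchy, reconstructed from the diagram labels.
book = ET.Element("Address_Book")
contact = ET.SubElement(book, "Contact")
for tag in ("ID", "Name", "Phone"):
    ET.SubElement(contact, tag)
address = ET.SubElement(contact, "Address")
for tag in ("Street", "City", "State", "Zip"):
    ET.SubElement(address, tag)


def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(book)
```

Joining tree_lines with newlines gives an indented outline in which each level of indentation corresponds to one level of nesting.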
Element Name Element names must begin with a letter or an underscore (_). Subsequent characters may include letters, digits, underscores, hyphens, and periods. Element names cannot begin with a number. Element names cannot include spaces.
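The naming rules can be approximated with a regular expression. This is a simplified sketch that covers only the rules listed here and only ASCII letters; the XML specification permits a wider range of characters. The function name is_legal_name is made up for the example.

```python
import re

# First character: letter or underscore.
# Later characters: letters, digits, underscores, periods, hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_legal_name(name):
    """Check a candidate element name against the simplified rules."""
    return NAME_PATTERN.fullmatch(name) is not None


legal = [n for n in ("first_name", "_id", "street-2", "v1.0") if is_legal_name(n)]
illegal = [n for n in ("1st_name", "first name", "-dash", "") if not is_legal_name(n)]
```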
Instant Quiz Which of the following are “legal” or “illegal” element names? _______ _______
Attributes Attributes are small descriptive bits of information used for describing elements. Attributes appear within the start tag of an element, after the element name; each attribute name is followed by an “=” sign and then the value of the attribute. The attribute value must be enclosed in a pair of single or double quotes.
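A brief sketch of how attributes look to an application, using Python's xml.etree.ElementTree. The attribute names ID and type and their values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# One value in double quotes, one in single quotes; both are accepted.
contact = ET.fromstring("""<Contact ID="001" type='personal'>Alley Gator</Contact>""")
attributes = dict(contact.attrib)


def is_accepted(xml_text):
    """Return True if the parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# An attribute value without quotes is rejected.
unquoted_ok = is_accepted("<Contact ID=001/>")
```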
Instant Quiz Which of the following are legal attributes? 1. _____ 2. _____ 3. _____ 4. _____ 5. _____
CDATA Section CDATA sections are used when you want all text to be interpreted as pure character data rather than as markup. This is useful if you have a lot of <, >, &, or " characters. Example:
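The following sketch compares a CDATA section with the equivalent text written using entity references, parsed with Python's xml.etree.ElementTree. The element name script and the code fragment inside it are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, "<", ">" and "&" are taken literally.
with_cdata = ET.fromstring(
    '<script><![CDATA[if (a < b && b > c) print("ok");]]></script>'
)

# The same text without CDATA needs an entity reference for each special character.
with_entities = ET.fromstring(
    '<script>if (a &lt; b &amp;&amp; b &gt; c) print("ok");</script>'
)

cdata_text = with_cdata.text
entity_text = with_entities.text
```

Both forms give the application the same character data; the CDATA version is simply easier to write and read when special characters are frequent.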
Entity References Entity references are markup that is replaced with character data when the document is parsed. XML predefines five entity references: &amp; for &, &lt; for <, &gt; for >, &quot; for ", and &apos; for '. Entity references declared in a DTD can also point to an external text file or an external picture.
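A small sketch of the five predefined entity references, using Python's xml.etree.ElementTree for parsing and xml.sax.saxutils.escape for the reverse direction. The element name t and the sample text are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing replaces each predefined reference with its character.
parsed = ET.fromstring("<t>&amp; &lt; &gt; &quot; &apos;</t>").text

# escape() handles &, < and > by default; quotes are added through the extra mapping.
original = 'AT&T <tag> "quoted"'
escaped = escape(original, {'"': "&quot;", "'": "&apos;"})

# Parsing the escaped text gives back the original string.
round_trip = ET.fromstring("<t>" + escaped + "</t>").text
```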
Illustration of Entity Reference XML Document Entity Reference Entity Reference Text File Before
Illustration of Entity Reference XML Document After Parsing Text Contents
Well Formed Document Here are some general guidelines: The document contains one and only one root element. All elements must have both start and end tags. Tags are case sensitive. Tags must not overlap; elements must nest inside each other properly. Attribute values must be enclosed in quotes. An empty element written as a single tag must end with “/>”. In text, the characters (<) and (&) must always be represented by entity references, and (") must be when it appears inside a double-quoted attribute value. Well-formed XML documents are those documents that are syntactically correct.
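As a rough check of these guidelines, the sketch below asks Python's xml.etree.ElementTree whether several sample documents are well formed. The samples are invented for illustration, and this parser checks well-formedness only; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


samples = {
    "good": '<a><b id="1">x &lt; y</b><c/></a>',
    "overlapping tags": "<a><b><c></b></c></a>",
    "missing end tag": "<a><b>text</a>",
    "raw less-than": "<a>x < y</a>",
    "raw ampersand": "<a>x & y</a>",
}

report = {label: is_well_formed(text) for label, text in samples.items()}
```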
Thank You! Any Questions?