Published by Lee Kennedy. Modified over 9 years ago.
1
XML CORE CSC1310 Fall 2009
2
XML DOCUMENT
An XML document is a convenient way for parsers to archive data. In other words, it is a way to describe a piece of XML as a unit for parsing.
An XML document is a logical entity rather than a physical one. It can consist of many files, which may live on different systems. An XML parser must be able to assemble the pieces into a coherent whole.
3
XML DOCUMENT STRUCTURE
– Document prolog
– Document element
4
DOCUMENT PROLOG
Optional. The prolog is a special section containing metadata:
– character encoding
– document model
– information about other pieces of the document
The XML declaration (if used) must come at the very start of the document, before any other content.
The prolog can contain:
– XML declaration
– Document type declaration
– Entity declarations
5
XML DECLARATION
The XML declaration provides details that prepare an XML processor to work with the document.
Each parameter has the form name="value".
– version declares the version of XML used (1.0). It is required whenever the declaration is present, so it must appear if the other parameters are used.
– encoding defines the character encoding (for example UTF-8 or UTF-16).
– standalone informs the parser whether there are any declarations outside of the document ("no"). If "yes", no external declarations are required to parse the document (this does not mean that no other resources need to be loaded).
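The parameters above can be observed from a parser's point of view. The following is a minimal sketch, assuming Python's standard `xml.parsers.expat` module; the sample document and the helper name `read_declaration` are made up for illustration.

```python
from xml.parsers import expat

def read_declaration(data: bytes) -> dict:
    """Return the parameters of the XML declaration found in `data`."""
    found = {}

    def on_decl(version, encoding, standalone):
        # expat reports standalone as 1 ("yes"), 0 ("no") or -1 (omitted).
        found.update(version=version, encoding=encoding, standalone=standalone)

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_decl
    parser.Parse(data, True)
    return found

# Hypothetical sample document, used only for this sketch.
sample = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><course/>'
info = read_declaration(sample)
print(info)
```

A declaration that gives only `version` is also accepted; in that case the handler receives -1 for `standalone`.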
6
XML DECLARATION
Parameter names and values are case-sensitive. Names are always lowercase.
Order is important: version, encoding, standalone.
Each value is quoted with either single or double quotes.
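These rules can be checked with any conforming parser. A small sketch, assuming Python's standard `xml.etree.ElementTree`; the helper name `declaration_ok` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def declaration_ok(data: bytes) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

samples = {
    # Single and double quotes may be mixed freely.
    "correct": b"<?xml version='1.0' encoding=\"UTF-8\" standalone='yes'?><a/>",
    # encoding placed before version.
    "wrong order": b'<?xml encoding="UTF-8" version="1.0"?><a/>',
    # Parameter name written with a capital letter.
    "wrong case": b'<?xml Version="1.0"?><a/>',
    # standalone placed before encoding.
    "standalone too early": b'<?xml version="1.0" standalone="yes" encoding="UTF-8"?><a/>',
}
for label, data in samples.items():
    print(label, "->", "accepted" if declaration_ok(data) else "rejected")
```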
7
DOCUMENT TYPE DECLARATION
Reasons:
– Define legal tags, entities, and default attribute values
– Support validation by providing rules (declarations)
Syntax:
  <!DOCTYPE element DTD-identifier [
    declaration 1
    declaration 2
  ]>
element names the document (root) element that the declaration applies to.
The DTD identifier is optional: a path to a file or a URL.
The internal subset is the list of declarations in [ ].
8
DTD IDENTIFIER
System-specific: SYSTEM "system identifier" (path to file or URL)
Public: PUBLIC "public identifier" "backup identifier"
–Is never supposed to change
–There is no single official registry for them
–Backup with emergency system identifier
Example:
  <!DOCTYPE xml PUBLIC "//W3C//DTD XML//EN" "http://www.w3.org/TR/HTML/html.dtd">
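To see how a parser reports the two kinds of identifier, here is a hedged sketch using the `StartDoctypeDeclHandler` callback of Python's standard `xml.parsers.expat`. The helper name `read_doctype` and the file name `generic.dtd` are assumptions; the public identifier shown is the published XHTML 1.0 Strict one. expat does not fetch the external DTD here, it only reports the identifiers.

```python
from xml.parsers import expat

def read_doctype(data: bytes) -> dict:
    """Return the name and identifiers given in the document type declaration."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.update(name=name, system=system_id, public=public_id,
                     internal_subset=bool(has_internal_subset))

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)
    return found

public_doc = (b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
              b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html/>')
# Hypothetical system identifier plus an internal subset.
system_doc = b'<!DOCTYPE doc SYSTEM "generic.dtd" [<!ENTITY a "b">]><doc/>'

public_info = read_doctype(public_doc)
system_info = read_doctype(system_doc)
print(public_info)
print(system_info)
```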
9
ENTITY DECLARATIONS
Declarations are rules to assemble and validate the document.
The parser reads the internal subset first, then the external declarations, so an internal declaration takes precedence over an external one for the same name.
An entity declaration creates a named piece of XML that can be inserted anywhere in the document.
Syntax: <!ENTITY name identifier-or-value>
The identifier may be a system or public identifier, which associates the name with a piece of XML in a file outside of the document (an external entity).
A declared entity becomes a component of the document: the parser inserts its content wherever the entity is referenced.
10
ENTITY DECLARATIONS
You can insert the entity into the document using an entity reference: &name;
11
ENTITY DECLARATIONS
Alternatively, an entity declaration may specify an explicit value as a quoted string instead of a public or system identifier.
Example: <!ENTITY … "CSC1310">
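A short sketch of an entity with a quoted-string value being expanded by the parser, assuming Python's standard `xml.etree.ElementTree`. The entity name `course` and the element name `syllabus` are hypothetical; only the value CSC1310 comes from the slide.

```python
import xml.etree.ElementTree as ET

# The internal subset declares one entity with a quoted-string value.
doc = (
    '<!DOCTYPE syllabus [\n'
    '  <!ENTITY course "CSC1310">\n'
    ']>\n'
    '<syllabus>Welcome to &course;!</syllabus>'
)

root = ET.fromstring(doc)
# The parser has already replaced &course; with the declared value.
print(root.text)
```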
12
XML DOCUMENT STRUCTURE
– Document prolog
– Document element
13
ELEMENTS
Elements are the building blocks of XML. They divide a document into a hierarchy of regions, each serving a specific purpose:
– Containers
– Empty elements
14
SYNTAX
Container element: <name>content</name>
Content is elements, or characters, or both.
Empty element: <name/>
An attribute defines a property of the element: name="value"
XML imposes no limit on name length, but keeping names under 50 characters is a practical guideline.
Whitespace characters (tab, newline, space) are used to separate attributes.
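The syntax above maps directly onto what a parser returns. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`; all element and attribute names in the sample are invented for illustration.

```python
import xml.etree.ElementTree as ET

# One container element with two attributes, a child container and an empty element.
doc = (
    '<course code="CSC1310"\n'
    '        term="Fall 2009">'
    '<title>XML Core</title>'
    '<break/>'
    '</course>'
)

root = ET.fromstring(doc)
print(root.tag, root.attrib)          # container element and its attributes
print([child.tag for child in root])  # nested elements, in document order
print(root.find("title").text)        # character content of a container
print(len(root.find("break")))        # an empty element has no children
```

Note that the two attributes are separated by a newline and spaces, which the parser treats like any other whitespace.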
15
REMINDER
– XML elements must have a closing tag
– XML tags are case-sensitive
– XML elements must be properly nested
– XML attribute values must be quoted
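Each rule in the reminder corresponds to a well-formedness error that a parser reports. The sketch below assumes Python's standard `xml.etree.ElementTree`; the helper `parse_error` and the sample fragments are hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_error(text: str):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None

samples = {
    "missing closing tag": "<note><to>Ann</note>",
    "case mismatch": "<Note></note>",
    "improper nesting": "<b><i>text</b></i>",
    "unquoted attribute": "<note id=1/>",
    "well-formed": '<note id="1"><to>Ann</to></note>',
}
for label, text in samples.items():
    print(label, "->", parse_error(text) or "OK")
```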
16
ATTRIBUTES
In the element start tag you can add more information about the element in the form of attributes. An attribute is a name-value pair.
Attributes cannot:
– contain multiple values
– contain tree structures
– be easily expanded
Use attributes for metadata (data about data).
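One way to read these limits: a value that repeats or has structure belongs in child elements, while short metadata fits in attributes. A hedged sketch, assuming Python's standard `xml.etree.ElementTree`; the `message`, `to`, `id` and `lang` names are invented.

```python
import xml.etree.ElementTree as ET

# Metadata (id, lang) as attributes; the repeating recipient as child elements.
doc = '<message id="42" lang="en"><to>Ann</to><to>Bob</to><body>Hello</body></message>'
root = ET.fromstring(doc)

metadata = dict(root.attrib)
recipients = [to.text for to in root.findall("to")]
print(metadata, recipients)

def repeated_attribute_allowed() -> bool:
    """Check whether the parser accepts the same attribute name twice."""
    try:
        ET.fromstring('<message id="1" id="2"/>')
        return True
    except ET.ParseError:
        return False

print(repeated_attribute_allowed())
```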
17
WHITESPACE
Some programs may normalize whitespace:
– Strip whitespace in element-only content, and at the beginning and end of mixed content
– Collapse a sequence of whitespace characters into a single space
To prevent a program from removing whitespace, use xml:space="preserve".
Example content whose line breaks should be kept:
  A wind shakes a tree,
  An empty sound of sadness.
  The file is not here.
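The following is a rough sketch of a program that normalizes whitespace unless the element asks for it to be preserved. It assumes Python's standard `xml.etree.ElementTree`, which exposes `xml:space` under the XML namespace URI. The helper `text_of` and the `poems`/`haiku` element names are hypothetical, and for brevity the sketch checks only the element itself, not its ancestors.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:space with the XML namespace in braces.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def text_of(elem) -> str:
    """Return the element text, normalized unless xml:space="preserve" is set."""
    text = elem.text or ""
    if elem.get(XML_SPACE) == "preserve":
        return text
    # Strip the ends and collapse every whitespace run into one space.
    return " ".join(text.split())

doc = (
    '<poems>'
    '<haiku xml:space="preserve">A wind shakes a tree,\n'
    '  An empty sound of sadness.\n'
    '  The file is not here.</haiku>'
    '<haiku>  A wind   shakes\n a tree  </haiku>'
    '</poems>'
)
kept, collapsed = [text_of(h) for h in ET.fromstring(doc)]
print(repr(kept))
print(repr(collapsed))
```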
18
TREE STRUCTURE
19
ENTITY
Different types of entities have different uses:
– Character entities let you substitute characters that are difficult or impossible to type.
– External entities let you pull in content that is outside of your document.
– General entities let you avoid retyping the same information.
ENTITIES
  parameter
    internal
    external
  general
    character
      named
      predefined
      numbered
    unparsed
    mixed-content
      internal
      external
20
CHARACTER ENTITIES
Character entities contain a single character.
Predefined character entities: &amp; (&), &lt; (<), &gt; (>), &apos; ('), &quot; (")
Numeric references refer to a character by its number in the Unicode character set:
  &#decimal-number;
  &#xhexadecimal-number;
  Example: ç is &#231; or &#xE7; (lowercase c with cedilla)
Named character entities use names so you do not need to memorize all the codes. Unlike predefined and numeric character entities, named character entities must be declared.
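The three kinds of character entity can be compared side by side. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the entity name `ccedil` is declared here purely for illustration.

```python
import xml.etree.ElementTree as ET

# Predefined and numeric references need no declaration.
plain = ET.fromstring("<p>&lt;b&gt; &amp; &#231; &#xE7; &apos; &quot;</p>").text
print(plain)

# A named character entity must be declared before it is used.
declared = ET.fromstring(
    '<!DOCTYPE p [<!ENTITY ccedil "&#231;">]>'
    '<p>gar&ccedil;on</p>'
).text
print(declared)

def undeclared_is_rejected() -> bool:
    """Check that using a named entity without a declaration is an error."""
    try:
        ET.fromstring("<p>gar&ccedil;on</p>")
        return False
    except ET.ParseError:
        return True

print(undeclared_is_rejected())
```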
21
MIXED-CONTENT ENTITIES
Can include markup as well as text. Unlimited length.
For internal entities, the replacement text is defined in the entity declaration:
– Often-repeated phrases
– Names
– Boilerplate text
This improves accuracy and maintainability.
Entities can contain entity references. Do not include a reference to the entity being declared (circular pattern).
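A small sketch of an internal mixed-content entity that holds markup and refers to another entity, followed by a circular pair that the parser rejects. It assumes Python's standard `xml.etree.ElementTree`; the entity names (`school`, `sig`, `a`, `b`) and element names are hypothetical.

```python
import xml.etree.ElementTree as ET

letter = (
    '<!DOCTYPE letter [\n'
    '  <!ENTITY school "State University">\n'
    '  <!ENTITY sig "<closing>Regards, &school;</closing>">\n'
    ']>\n'
    '<letter>&sig;</letter>'
)
root = ET.fromstring(letter)
# The markup inside &sig; became a real child element.
closing = root.find("closing")
print(closing.text)

circular = (
    '<!DOCTYPE d [<!ENTITY a "&b;"><!ENTITY b "&a;">]>'
    '<d>&a;</d>'
)

def circular_is_rejected() -> bool:
    """Check that entities referring to each other cannot be expanded."""
    try:
        ET.fromstring(circular)
        return False
    except ET.ParseError:
        return True

print(circular_is_rejected())
```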
22
MIXED-CONTENT ENTITIES
For external entities, the replacement text is located in another file:
– Import content that is shared by many documents
– Import content that changes too often to be stored inside the document
– Split a huge document into smaller pieces that can be edited in tandem and need less space in network transfers
All parts, wherever they are located, must follow the well-formedness rules.
The physical division should be irrelevant to the meaning of the document.
External entities are a linking mechanism: the XML parser must insert the replacement text at the time of parsing.
They must always be declared.
23
EXAMPLE
  <!DOCTYPE doc SYSTEM "http://www.dtds-r-us.com/generic.dtd" [
    …
  ]>
  <doc>
    &part1; &part2; &part3;
  </doc>
Subdocuments contain XML, but they are not documents in their own right (no prolog is allowed).
Only the whole document can be validated.
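How a parser might assemble such a document can be sketched with the external entity hook of Python's standard `xml.parsers.expat`. Everything specific here is an assumption: the file names `part1.xml` and `part2.xml`, the `chapter` element, and the in-memory `PARTS` table that stands in for real files or URLs.

```python
from xml.parsers import expat

# Hypothetical subdocuments keyed by system identifier; no prolog in any of them.
PARTS = {
    "part1.xml": b"<chapter>One</chapter>",
    "part2.xml": b"<chapter>Two</chapter>",
}

MAIN = b"""<!DOCTYPE doc [
  <!ENTITY part1 SYSTEM "part1.xml">
  <!ENTITY part2 SYSTEM "part2.xml">
]>
<doc>&part1;&part2;</doc>"""

def assemble(main: bytes) -> list:
    """Parse `main`, pulling in external entities, and list the elements seen."""
    seen = []
    parser = expat.ParserCreate()

    def on_start(name, attrs):
        seen.append(name)

    def on_external(context, base, system_id, public_id):
        # Parse the referenced piece with a child parser tied to this context.
        child = parser.ExternalEntityParserCreate(context)
        child.StartElementHandler = on_start
        child.Parse(PARTS[system_id], True)
        return 1  # a non-zero value tells expat the entity was handled

    parser.StartElementHandler = on_start
    parser.ExternalEntityRefHandler = on_external
    parser.Parse(main, True)
    return seen

elements = assemble(MAIN)
print(elements)
```

The elements of the subdocuments appear in the same event stream as those of the main file, which is the sense in which the physical division does not affect the document's meaning.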
24
UNPARSED ENTITIES
Contain something other than text or XML and should not be parsed:
– Graphics
– Sound files
– Noncharacter data
Can be used only as an attribute value.
  <!DOCTYPE doc [
    …
  ]>
  Here's a picture
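As an illustration of the declarations an unparsed entity needs, the sketch below declares a notation, an unparsed entity, and an attribute of type ENTITY, then lists what the parser reports. It assumes Python's standard `xml.parsers.expat`; the names `gif`, `mypic`, `photo.gif`, `graphic` and `src` are hypothetical and are not taken from the slide.

```python
from xml.parsers import expat

DOC = b"""<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY mypic SYSTEM "photo.gif" NDATA gif>
  <!ATTLIST graphic src ENTITY #REQUIRED>
]>
<doc><graphic src="mypic"/>Here's a picture</doc>"""

def unparsed_entities(data: bytes) -> dict:
    """Map each unparsed entity name to its (system identifier, notation)."""
    table = {}

    def on_unparsed(name, base, system_id, public_id, notation):
        # The parser only records the declaration; it never reads photo.gif.
        table[name] = (system_id, notation)

    parser = expat.ParserCreate()
    parser.UnparsedEntityDeclHandler = on_unparsed
    parser.Parse(data, True)
    return table

entities = unparsed_entities(DOC)
print(entities)
```

The entity is named in the `src` attribute value rather than referenced as `&mypic;`, which matches the rule that unparsed entities can be used only as attribute values.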
25
COMMENTS
Notes in the document, written as <!-- text -->, that are not interpreted by the XML processor.
Comments cannot appear before the XML declaration or inside tags.
They can contain markup: this is useful when a section should be removed temporarily but kept in the file for later use.
Do not put comments inside of comments.
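These rules can be tried out with a short sketch, assuming Python's standard `xml.etree.ElementTree`, whose default parser drops comments. The element names and the helper `accepted` are invented for illustration.

```python
import xml.etree.ElementTree as ET

def accepted(text: str) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Markup inside a comment is ignored, so <old> is temporarily removed.
root = ET.fromstring("<doc><!-- <old>draft</old> --><new>final</new></doc>")
children = [child.tag for child in root]
print(children)

# A comment inside a comment is not allowed.
nested = "<doc><!-- outer <!-- inner --> --></doc>"
# A comment may not come before the XML declaration.
before_decl = '<!-- note --><?xml version="1.0"?><doc/>'
# A comment may not appear inside a tag.
inside_tag = '<doc <!-- note --> id="1"/>'
print(accepted(nested), accepted(before_decl), accepted(inside_tag))
```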
26
CDATA SECTIONS
A CDATA section is an alternative way to include forbidden characters.
CDATA means "character data" (not markup).
Example:
  In this case <![CDATA[if (&x < &y)]]> is true
is read as:
  In this case if (&x < &y) is true
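A brief sketch comparing a CDATA section with the equivalent escaped text, assuming Python's standard `xml.etree.ElementTree`; the `code` element name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, & and < are plain characters.
with_cdata = ET.fromstring("<code><![CDATA[if (&x < &y)]]></code>").text
# The same text written with predefined entities instead.
with_entities = ET.fromstring("<code>if (&amp;x &lt; &amp;y)</code>").text
print(with_cdata)
print(with_cdata == with_entities)

def raw_is_rejected() -> bool:
    """Check that the same characters without CDATA or escaping are an error."""
    try:
        ET.fromstring("<code>if (&x < &y)</code>")
        return False
    except ET.ParseError:
        return True

print(raw_is_rejected())
```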
27
PROCESSING INSTRUCTIONS (PI)
A container, written <?target data?>, for data that is targeted toward a specific XML processor (for example, presentation information).
The parser passes the instruction to a special handler. If the handler recognizes the target, it may choose to use the data; otherwise, the data is discarded.
The XML declaration can be thought of as a PI for the XML processor.
If there is no data, the target itself can function as the data.
Example text: Sometimes we can have a very very very long title which should be divided into several lines
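The handler mechanism can be sketched with the `ProcessingInstructionHandler` callback of Python's standard `xml.parsers.expat`. The `line-break` target is a made-up example of a target that acts as its own data, and the style sheet name `style.css` is likewise an assumption.

```python
from xml.parsers import expat

DOC = (b'<?xml version="1.0"?>'
       b'<?xml-stylesheet type="text/css" href="style.css"?>'
       b'<title>A very very long<?line-break?> title</title>')

def list_instructions(data: bytes) -> list:
    """Return (target, data) for every processing instruction in the document."""
    found = []

    def on_pi(target, pi_data):
        # A real handler would act only on targets it recognizes.
        found.append((target, pi_data))

    parser = expat.ParserCreate()
    parser.ProcessingInstructionHandler = on_pi
    parser.Parse(data, True)
    return found

instructions = list_instructions(DOC)
print(instructions)
```

Although the XML declaration looks like a processing instruction, expat does not report it through this handler.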