Introduction to XML February 07, 2002
From HTML to XML As mentioned in previous classes, if you know HTML, then you already know XML… really! In this class, we will look at the basic conventions of XML and you will see the ways in which they mirror those of HTML.
The Building Blocks of XML XML uses the same building blocks that HTML does: elements, attributes, and values. An XML element is the most basic unit of your document. It can contain practically anything else, including other elements and text. An element is delimited by an opening tag (<name>) and a closing tag (</name>); the opening tag may or may not contain attributes and values.
XML Elements In XML, an element is literally whatever you make it and name it. The name, which you invent yourself, should describe the element’s purpose and, in particular, its contents. For example, the text “The Beginning” could be enclosed in an element named for its role in the document. By ‘marking up’ the text like this, you are providing additional information about the tagged text.
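XML Elements: An Example In HTML, the text “The Beginning” is wrapped in one of HTML’s predefined tags; in XML, it is wrapped in an element that you have named yourself. The XML element you create provides metadata for the chosen text.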
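As a rough illustration of naming an element after its contents, the fragment below marks up the text from the slide. The element name chapter_title is an assumption made for this sketch; any descriptive name would serve.

```xml
<?xml version="1.0"?>
<!-- chapter_title is a hypothetical element name invented for this sketch -->
<chapter_title>The Beginning</chapter_title>
```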
XML Attributes XML attributes, which are contained within the element’s start tag, have quotation-mark-delimited values that further describe the purpose and content of the particular element, e.g., an attribute that records the language of a title such as “The World” or “Die Welt”. An element can have as many attributes as needed, as long as each has a unique name.
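A minimal sketch of attributes in a start tag follows. The element names titles and title, the attribute name language, and the value german are assumptions for illustration; only the two titles come from the slide.

```xml
<?xml version="1.0"?>
<!-- Hypothetical names: titles, title, and the language attribute -->
<titles>
  <title language="english">The World</title>
  <title language="german">Die Welt</title>
</titles>
```

Each attribute sits inside the start tag as a name, an equals sign, and a quoted value.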
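Writing XML Code Writing XML code is an almost identical process to writing HTML code. As in HTML, you can insert spaces and line breaks to make your code easier to view and edit. For example, a series of elements written end to end on a single line can instead be written with each element on its own indented line.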
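The sketch below shows one way the same structure might be laid out for readability. The element names book and chapter are assumed for the example, and the compact form is kept in a comment so that the file still has a single root element.

```xml
<?xml version="1.0"?>
<!-- Compact form: <book><chapter>...</chapter><chapter>...</chapter></book> -->
<!-- The same structure, indented so the nesting is visible: -->
<book>
  <chapter>...</chapter>
  <chapter>...</chapter>
</book>
```

Note that an XML parser passes the added spaces and line breaks on to the application, so the two forms are equivalent only for applications that ignore that whitespace.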
Rules for Writing XML A ‘root’ element is required for each XML document –Every XML document must contain one root element that contains all of the other elements in the document. The only pieces of XML allowed outside the root element are the XML declaration, the document type declaration, comments, and processing instructions. –You can think of it as a container for the content. Closing tags are required for all elements –Every element must have a closing tag. Empty elements should either use the all-in-one tag with a forward slash before the final “>” (e.g., <image/>) or use both opening and closing tags (<image></image>).
Rules for XML… cont’d Elements must be properly nested –If you start element 1, then start element 2, you must first close element 2, and then element 1, e.g., <one><two>…</two></one>. Case matters –XML is case sensitive. BOOK, Book, and book elements would all be considered different and unrelated!
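The nesting and case rules can be sketched as follows. All element names are hypothetical, and the badly nested version is kept inside a comment so that the example itself remains well-formed.

```xml
<?xml version="1.0"?>
<!-- Element names here are assumptions made for this sketch. -->
<book>
  <!-- Properly nested: title is closed before chapter is closed -->
  <chapter><title>The Beginning</title></chapter>
  <!-- Not well-formed, because the closing tags are crossed:
       <chapter><title>The Beginning</chapter></title> -->
  <!-- Case matters: Book is a different element from book -->
  <Book>A separate, unrelated element</Book>
</book>
```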
Rules for XML… cont’d Values must be enclosed in quotation marks –An attribute’s values (numbers, words, etc.) must always be enclosed in matching quotation marks, either double or single, e.g., language="english" or language='english', NOT language=english. Entity references must be declared –Unlike HTML, any entity reference used in XML, other than the five predefined ones, must be declared in a DTD (document type definition) before being used.
Rules for XML… cont’d You should declare the XML version used –Every time you create an XML document, you should declare the version of XML used to create it. In this case, since there has only been one version of XML, your declaration will look like this: <?xml version="1.0"?> –When present, this must be your first line of XML. –The <? and ?> tags enclose processing instructions. In addition to declaring the XML version, processing instructions can specify style sheets, among other things.
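A small sketch of a version declaration followed by a style sheet processing instruction is shown below. The file name book.css and the root element book are placeholders assumed for the example.

```xml
<?xml version="1.0"?>
<!-- book.css is a placeholder style sheet name for this sketch -->
<?xml-stylesheet type="text/css" href="book.css"?>
<book>...</book>
```

The declaration comes first, and the processing instruction sits outside the root element.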
Writing XML With your first line of XML being the version declaration, the first real XML that you will create is the root element. The root element acts like a container for all the other elements and content in your document. You can liken it to the <html> tag used in HTML. It is your first ‘structural’ statement. Only declarations, comments, and processing instructions should exist outside the root element.
Root elements: Examples For example, if you were creating an XML version of a book, you might create the <book> element. The root would then contain all other content, looking something like this: <book> All other XML and content here. </book>
Root Element: Examples Alternatively, if you were creating an XML version of a sheet of music, you would choose a root element name that describes the music, resulting in the same kind of structure: the root element’s opening tag, then all other XML and content, then its closing tag.
Writing XML… cont’d Think of the root element, then, as the largest unit of structure for your document. You can then plan other lesser units to fit within the root. Using <book> as our example, one can imagine lesser structural units nested within it, as well as presentation elements. Even more than with HTML, with XML it is important to plan ahead rather than trying to create elements on the fly.
Writing XML… cont’d Like SGML and HTML, XML also allows for comments within the XML code. Comments are useful annotations or instructions that you put in the code so that future users and designers, including yourself, can understand what you had originally intended. Comments are created by using the <!-- start and --> end tags, such as: <!-- your comment here -->
Writing XML… cont’d Writing Special Characters/Symbols –Unlike HTML, which allows for a whole range of special characters delineated by an ampersand (&) and a semicolon (;), such as “&amp;” for “&”, XML predefines only five. All other special characters and symbols must be declared in your DTD (document type definition). –The five special characters/symbols predefined in XML are: &lt; (<) &gt; (>) &quot; (") &apos; (') &amp; (&)
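As an illustration, the five predefined entity references might be used like this. The element name note and its title attribute are assumptions made for the sketch.

```xml
<?xml version="1.0"?>
<!-- note and its title attribute are hypothetical names -->
<note title="The &quot;Fall&quot; &amp; its critics">
  Use &lt; and &gt; for angle brackets, &amp; for an ampersand,
  &apos; for an apostrophe, and &quot; for a double quotation mark.
</note>
```

None of these five references needs a declaration in a DTD.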
Writing XML… cont’d Observing these few rules, you will be able to create your XML documents just as you would HTML documents. Remember that XML requires you to plan ahead, particularly with defining elements (tags) and entities (such as special characters or repeated text). Take a look at the examples that follow…
HTML vs. XML Gerhardt Rudner’s The Fall Gerhardt Rudner’s The Fall Criticism Introduction Gerhardt Rudner’s The Fall is considered by most to be one of the most influential books of 2001….
HTML vs. XML Gerhardt Rudner’s The Fall Criticism Introduction Gerhardt Rudner’s The Fall is considered by most to be one of the most influential books of 2001….
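One way the XML version of this text might be marked up is sketched below. Every element name is an assumption made for illustration; only the text itself comes from the slides.

```xml
<?xml version="1.0"?>
<!-- All element names below are hypothetical; only the text is from the slide -->
<criticism>
  <book_title>Gerhardt Rudner’s The Fall</book_title>
  <section>
    <heading>Introduction</heading>
    <paragraph>Gerhardt Rudner’s The Fall is considered by most to be one of the most influential books of 2001….</paragraph>
  </section>
</criticism>
```

Where HTML tags describe how the text is presented, the element names here describe what each piece of text is.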
Any Questions?
Valid and Well-Formed XML You may have heard two adjectives bandied about by XML authors and technical writers. These are: Valid and Well-Formed. Both terms refer to the process of validating your XML document and require that your document meet certain standards. For those of you who have taken the Database class, this process is similar to the ‘ordered form’ requirements of databases.
Valid and Well-Formed… cont’d Valid –A valid document must have a DTD, a set of rules that define what tags can appear in the document and how they must nest within each other. The DTD also must declare all entities apart from those five special ones we looked at previously. Entities are reusable bits of data that can be used many times, but need be transmitted only once (more on this later). –Thus, an XML document is valid when it conforms to the rules established in the DTD. That’s it!
Valid and Well-Formed…cont’d Well-Formed –A document that is well-formed is easy for a computer program to read and ready for network delivery. –Specifically, well-formed documents must have these characteristics: All the beginning and end tags match up. Empty tags use special XML syntax (e.g., <image/>). All the attribute values are quoted (e.g., id="dog"). All the entities (reusable text, special characters, etc.) are properly declared.
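A short document that meets the well-formedness characteristics listed above might look like the following sketch. The element and attribute names, the pet's name, and the file name are all invented for the example; only id="dog" comes from the slide.

```xml
<?xml version="1.0"?>
<!-- Hypothetical names throughout; only id="dog" appears in the slide -->
<pets>
  <!-- Start and end tags match, and the attribute value is quoted -->
  <animal id="dog">Rex</animal>
  <!-- An empty element written with the all-in-one syntax -->
  <photo file="rex.jpg"/>
  <!-- Only predefined entity references are used, so none need declaring -->
  <caption>Rex &amp; friends</caption>
</pets>
```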
Creating DTDs As mentioned, a document type definition, or DTD, specifies the valid syntax, structure, and format for defining your own markup language. If you do not follow the rules in your DTD, your XML parser or browser will complain bitterly. A parser/browser cannot properly display and process an XML document that does not conform to its DTD. Therefore, it is important that you gain a good understanding of DTDs and how they work!
Creating DTDs… cont’d In your DTD, you will define the elements, attributes, and values to be used in your XML markup. Internal or External DTDs –For individual XML documents, it is often simplest to create the DTD within the XML document itself. –However, if you want to use the DTD with a set of documents, to avoid duplication, you will want to create an external DTD.
Creating Internal DTDs To create an Internal DTD: –At the top of your document, after your XML declaration (i.e., <?xml version="1.0"?>), type: <!DOCTYPE root [ where root is the name of your root element –For example, <!DOCTYPE book [ –Leave some space so that you will have room to put in your definitions of elements, attributes, and values –Then type ]> to end your internal DTD
Creating Internal DTDs… cont’d Example: <?xml version="1.0"?> <!DOCTYPE book [ ]> <book> …. </book>
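To make the placement concrete, here is a minimal sketch of an internal DTD with two declarations filled in. The title element and its declaration are assumptions added for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- Definitions go between the square brackets; these two are hypothetical -->
  <!ELEMENT book (title)>
  <!ELEMENT title (#PCDATA)>
]>
<book>
  <title>The Beginning</title>
</book>
```

The name after <!DOCTYPE matches the root element, and the document proper begins after the closing ]>.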
Creating External DTDs To write an External DTD: –Create a new text file with any simple text editor –Define the rules attributed to your elements, attributes, and values –Save the file as text only with the “.dtd” extension –There are certain conventions that you need to follow when naming your DTD…
Creating External DTDs… cont’d There are two kinds of external DTDs that you need to be aware of: –Personal External DTDs - private DTDs created solely for your documents –Public External DTDs - public DTDs created for use by anyone
Declaring personal DTDs To declare a personal DTD, you must first add an attribute to your XML declaration and then specify where the parser/browser can find your DTD: –In the XML version declaration, add the attribute standalone="no", e.g., <?xml version="1.0" standalone="no"?> –Next, type <!DOCTYPE root where root is the name of your root element (e.g., <!DOCTYPE book) –Add a space and type SYSTEM, to indicate a personal DTD on your system, followed by the absolute URL of your DTD file in quotation marks and a closing >
XML personal DTD Declaration In the end, it should look like this: ….
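A sketch of a complete personal DTD declaration follows, assuming a root element named book. The URL is a placeholder on the reserved example.com domain, not the location of a real DTD.

```xml
<?xml version="1.0" standalone="no"?>
<!-- The URL below is a placeholder, not a real DTD location -->
<!DOCTYPE book SYSTEM "http://www.example.com/dtds/book.dtd">
<book>...</book>
```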
Creating External DTDs… cont’d Naming public External DTDs: –You should name your public DTD using a formal public identifier or FPI. To create an FPI: First, type “+” if your DTD is approved by the ISO, or “-” if it is not a recognized standard Next, type “// yourname // DTD”, where yourname is the name of the individual or organization who created the DTD Type a space and then a label (often the root element) where the label describes the DTD. Finally, type “//xx//” where xx is the two letter abbreviation for the language of the XML document (e.g., EN for English, FR for French)
Declaring Public DTDs To declare a public DTD, you must first add an attribute to your XML declaration and then specify where the parser/browser can find your DTD: –In the XML version declaration, add the attribute standalone="no", e.g., <?xml version="1.0" standalone="no"?> –Next, type <!DOCTYPE root where root is the name of your root element (e.g., <!DOCTYPE book) –Add a space and type PUBLIC, to indicate a public DTD, and then your FPI in double quotes –Then add another space and the absolute URL of your DTD file in quotation marks, followed by a closing >
XML Public DTD Declaration Your final public external DTD declaration should look something like: ….
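The following sketch assembles a public declaration from the steps given for naming and declaring public DTDs. The owner name yourname, the label book, and the example.com URL are all placeholders, and the FPI simply follows the recipe on the naming slide.

```xml
<?xml version="1.0" standalone="no"?>
<!-- yourname, the label book, and the URL are placeholders for this sketch -->
<!DOCTYPE book PUBLIC "-//yourname//DTD book//EN//" "http://www.example.com/dtds/book.dtd">
<book>...</book>
```

The leading “-” marks the DTD as not being a recognized standard. Many published identifiers, such as the W3C's XHTML one, end at the language code without the trailing slashes.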
Defining Elements and Attributes A DTD must define rules for each and every element and attribute that will appear in your XML document. Otherwise, it will not be valid. Whenever you change your XML, remember to make the corresponding changes to your DTD, particularly if you are adding elements or attributes to your document.
Defining Elements In order to define your XML markup, you must first define the content and structure of each element contained within your XML documents. To define an element, type <!ELEMENT yourtag where yourtag is the tag you are creating and wish to define. Next, type ( contents ) where contents describes the elements contained within the element you are defining, or type EMPTY if the element you are defining has no content. Finish with >. Keep in mind that you must not forget the parentheses and that EMPTY elements will often contain attributes.
Examples Defining our root element – Defining an image element – –Empty elements are often used to reference external files (such as images) and binary data Always remember that XML is case-sensitive!
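As a hedged sketch of these two kinds of definition, the internal DTD below declares a root element and an empty image element. The element names, the source attribute, and the file name cover.gif are assumptions made for the example.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- Root element: a title followed by an image (names assumed) -->
  <!ELEMENT book (title, image)>
  <!ELEMENT title (#PCDATA)>
  <!-- Empty element: the keyword EMPTY replaces the parenthesized contents -->
  <!ELEMENT image EMPTY>
  <!-- The empty element points to an external file through an attribute -->
  <!ATTLIST image source CDATA #REQUIRED>
]>
<book>
  <title>The Beginning</title>
  <image source="cover.gif"/>
</book>
```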
Structural Definitions Definitions like the root definition on the previous page describe structures within your XML document. That is, we define our root element as containing a number of other elements. Elements can be defined so as to contain only one other element or a sequence of elements (such as our root example). These definitions define how the structure of your XML breaks down, forming a hierarchy or tree pattern.
Text Definitions However, not all elements contain structural information. Many will contain only content or textual information. To define an element to contain text: –Type <!ELEMENT yourtag where yourtag is the tag you are creating and wish to define. –Next, type a space and (#PCDATA)> –This states that the element you define will only contain text –PCDATA stands for parsed character data and refers to everything except your XML code.
Defining Elements… cont’d So, for example, your DTD will contain structural elements, such as your root element, which describes what other elements are contained within it, as well as textual elements that contain only text: ….
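A DTD that mixes structural and textual definitions might be sketched as follows. All element names and the sample text are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- Structural elements: their contents are other elements -->
  <!ELEMENT book (title, author, chapter)>
  <!ELEMENT chapter (heading, paragraph)>
  <!-- Textual elements: their contents are text only -->
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT paragraph (#PCDATA)>
]>
<book>
  <title>The Fall</title>
  <author>Gerhardt Rudner</author>
  <chapter>
    <heading>The Beginning</heading>
    <paragraph>Text of the first paragraph.</paragraph>
  </chapter>
</book>
```

Read from the top, the structural definitions trace the hierarchy: book contains chapter, which contains heading and paragraph.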
Defining Elements… cont’d To place further constraints on the number of times that a given element can appear in your document (e.g., you don’t want 2 book titles), XML provides three special symbols: ? + * –Placing a “?” after an element indicates that this element can appear only once, if at all, in your document –Placing a “+” after an element indicates that the element must appear at least once, and can appear as many times as needed –Placing an “*” after an element indicates that the element can appear as many times as needed, or not at all. Furthermore, adding an asterisk to a parenthesized group of alternatives separated by vertical bars, such as (a | b)*, means that the elements can appear in any number and in any order; an asterisk after a comma-separated sequence, such as (a, b)*, repeats the whole sequence in the order given.
Number constraints example If we take our book element example, we can augment it with these symbols. Here we limit one element so that it can only appear once, indicate that a book must have at least one of another, and allow a book to contain as many of a third as necessary.
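One possible version of such an augmented book definition is sketched below. The child element names are assumptions, chosen only to show each symbol once.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- title: at most once (?); author: one or more (+); chapter: any number (*) -->
  <!ELEMENT book (title?, author+, chapter*)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT chapter (#PCDATA)>
]>
<book>
  <title>The Fall</title>
  <author>Gerhardt Rudner</author>
</book>
```

This instance is valid with no chapter elements at all, but it would not be valid with a second title or with no author.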
Creating attributes While you can break down an element into smaller and smaller units of information, it is sometimes more useful to add supplementary data to the element itself rather than to the element’s content. In other words, information contained in attributes tends to be about your XML document, rather than your content. Attributes are primarily metadata. They are very commonly used with empty elements to point or link to the content of the element.
Attributes… cont’d To define attributes: –Type <!ATTLIST yourtag where yourtag is the name of the element in which the attribute will appear. –Type the name of the attribute –Then, either type CDATA (not #PCDATA) for any combination of numbers or text (basically for anything), or type ( value1 | value2 | etc.) where either value1 or value2 (etc.) is the ONLY value acceptable. You could make huge strings of values by simply continuing to place a vertical bar between values.
Defining Attributes… cont’d Finally, you must type one of the following: –"value" where value will be the default value if none is explicitly set –#FIXED "value" where value is the default and ONLY value for that attribute (i.e., it is fixed) –#REQUIRED to specify that the attribute must contain some (not pre-specified) value –Or, #IMPLIED to specify that there is no default value, and the value may be omitted if desired. Finish with > to complete your definition
Attribute examples The attribute definition <!ATTLIST date year CDATA #IMPLIED> says that the date element may contain an optional (#IMPLIED) year attribute that contains any number of characters (CDATA). An attribute definition with a value list and #REQUIRED says that the attribute must be used on the date element (#REQUIRED) and that the value must be one of 1999, 2000, 2001, or …; those are the only choices (from the value list).
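The four kinds of attribute definition can be sketched together in one internal DTD. Apart from the date element and its year attribute, every name and value here is an assumption, and the enumerated list shows only the three values that are legible in the slide.

```xml
<?xml version="1.0"?>
<!DOCTYPE events [
  <!ELEMENT events (date, deadline)>
  <!ELEMENT date (#PCDATA)>
  <!ELEMENT deadline (#PCDATA)>
  <!-- Optional attribute holding any character data -->
  <!ATTLIST date year CDATA #IMPLIED>
  <!-- Enumerated attribute with a default value of "short" -->
  <!ATTLIST date format (short | long) "short">
  <!-- Fixed attribute: if present, it can only have this value -->
  <!ATTLIST date calendar CDATA #FIXED "gregorian">
  <!-- Required attribute limited to an enumerated list -->
  <!ATTLIST deadline year (1999 | 2000 | 2001) #REQUIRED>
]>
<events>
  <date>February 07</date>
  <deadline year="2001">March 01</deadline>
</events>
```

The date element is valid without a year attribute, whereas deadline would not be valid without one or with a year outside the list.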