CS 502 Computing Methods for Digital Libraries
  Cornell University – Computer Science
  Herbert Van de Sompel
  Lecture 10
  Markup languages – XML DTDs
Mid Term examination
  Rhodes 471
  Examination on the PC
  Open PC format (you have access to all the material)
  50 minutes
  Questions:
    ~ 6 multiple choice questions on the readings (not the Bush paper)
    ~ 4 questions for which you write answers (prepare in Notepad, copy into the form) about the core topics that we addressed (identifiers, KWF, XML, digitization, Unicode, HTTP), to test your understanding of the issues
Markup and style sheets
  [Diagram: a marked-up document (document content & structure) and a style sheet (rendering instructions) are input to the rendering software, which produces the formatted document]
Multiple renderings from the same marked-up document
  [Diagram: one marked-up document (document content & structure) is combined by rendering software with style sheet 1 or style sheet 2 to produce different renderings: PC display and print]
Example: Oxford English Dictionary
  The typography of the printed text represented semantic information.
  Keyboard the text, capturing all typographic information.
  Automatic parser to extract semantics (e.g., date, quotation, phonetics).
  Markup in SGML to tag semantic information.
  Separate style sheets for various editions: print, CD-ROM, online.
  Before the web, yet used with the web.
XML – basic terminology
  XML instance document: the document that contains the text in marked-up form
  style sheet: the document that contains the formatting instructions to be applied to an instance document
  Document Type Definition (DTD): the document that defines the grammar with which instance documents comply (elements, attributes, character set, required elements, optional elements, …)
  XML Schema: similar to a DTD, but more powerful
  An XML application will usually process 3 types of documents
XML – sample instance document (with DTD)
  [XML listing; the markup was lost, only the text content survives]
  Kevin Davies
  Cracking the Genome
  €
XML – XML declaration
  The XML declaration looks like a processing instruction and states:
    the XML version
    the character encoding used in the text
    standalone: is a DTD required to interpret this document?
  The order of these attributes is significant.
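The declaration described above can be inspected with Python's standard expat bindings. This is only a sketch: the sample declaration and the helper name read_xml_declaration are assumptions made for illustration, not the listing from the original slide.

```python
import xml.parsers.expat


def read_xml_declaration(document: bytes):
    """Return (version, encoding, standalone) as reported by expat, or None."""
    seen = {}

    def on_declaration(version, encoding, standalone):
        # expat reports standalone as 1 for "yes", 0 for "no",
        # and -1 when the standalone clause is omitted.
        seen["decl"] = (version, encoding, standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(document, True)
    return seen.get("decl")


# Hypothetical sample document, not taken from the lecture.
sample = b'<?xml version="1.0" encoding="UTF-8" standalone="no"?><Book/>'
print(read_xml_declaration(sample))
```

Swapping version and encoding in the sample makes expat reject the document, which illustrates that the attribute order is significant.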
XML – declaration of DTD in instance document
  The declaration states:
    Book is the root element (outermost tag)
    the DTD with which this document complies is books.dtd
    that DTD is available at the URI shown next to the SYSTEM keyword
    alternatively: PUBLIC “some name that is known”
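A minimal sketch of reading the document type declaration with expat follows. The URI http://example.org/books.dtd and the helper name read_doctype are placeholders chosen for illustration; the URI on the original slide is not preserved. Note that expat only reports the declaration here, it does not fetch the DTD.

```python
import xml.parsers.expat


def read_doctype(document: bytes) -> dict:
    """Collect root element name, system id and public id from the DOCTYPE."""
    info = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info.update(root=name, system_id=system_id, public_id=public_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)
    return info


# Placeholder URI: an assumption, not the one used in the lecture.
sample = b'<!DOCTYPE Book SYSTEM "http://example.org/books.dtd"><Book/>'
print(read_doctype(sample))
```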
XML DTD for which the instance document is valid
  <!ATTLIST author
      birthday CDATA #REQUIRED
      sex (male|female) #IMPLIED>
XML DTD: element definition
  Name of the element is Book
  Book consists of elements that must occur in the specified order:
    ISBN? – ISBN appears 0 or 1 time
    author+ – author appears 1 or more times
    title and price each appear exactly 1 time
  also: * for 0 or more times
  also: | instead of , as separator, for choice instead of sequence
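The occurrence indicators behave much like regular expression operators. The sketch below assumes the content model (ISBN?, author+, title, price) implied by the bullets above and checks the sequence of child element names against an equivalent regular expression. It is an illustration of the idea, not a DTD validator, and the names BOOK_MODEL and matches_book_model are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Assumed content model: (ISBN?, author+, title, price).
# Each child name is followed by one space so names cannot run together.
BOOK_MODEL = re.compile(r"(ISBN )?(author )+title price ")


def matches_book_model(xml_text: str) -> bool:
    """True if the children of Book follow the assumed content model."""
    root = ET.fromstring(xml_text)
    if root.tag != "Book":
        return False
    child_names = "".join(child.tag + " " for child in root)
    return BOOK_MODEL.fullmatch(child_names) is not None


sample = (
    "<Book><author>Kevin Davies</author>"
    "<title>Cracking the Genome</title><price/></Book>"
)
print(matches_book_model(sample))
```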
XML DTD: element definition
  The ISBN element contains parsed character data (general characters)
  5 categories of element content:
    EMPTY – the element contains no text, only attributes
    element – the element contains only elements, no text of its own
    mixed – the element can contain both text and other elements
    ANY – the element contains any well-formed XML (for instance also CDATA)
    PCDATA – the element contains parsed character data
XML DTD: attribute definition
  <!ATTLIST author birthday CDATA #REQUIRED sex (male|female) #IMPLIED>
  The author element can come with 2 attributes:
    birthday is required
    sex is not required, but if it appears it can only have one of the two shown values
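The two attribute rules can be written out as a small hand-coded check. This is a sketch of what a validating parser would enforce for the author element, not a general ATTLIST implementation; the function name author_attribute_errors and the attribute values in the sample are placeholders.

```python
import xml.etree.ElementTree as ET

ALLOWED_SEX = {"male", "female"}  # the enumerated values (male|female)


def author_attribute_errors(author: ET.Element) -> list:
    """Return a list of rule violations for one author element."""
    errors = []
    if "birthday" not in author.attrib:
        errors.append("birthday is #REQUIRED but missing")
    sex = author.get("sex")
    # #IMPLIED: the attribute may be absent, but if present it must be
    # one of the enumerated values.
    if sex is not None and sex not in ALLOWED_SEX:
        errors.append("sex must be male or female, got " + repr(sex))
    return errors


# Placeholder values, invented for the example.
sample = ET.fromstring('<author birthday="2000-01-01" sex="male">A. N. Author</author>')
print(author_attribute_errors(sample))
```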
XML DTD: general entity reference
  Allows writing a named entity reference for the euro sign in an XML instance document
  The XML processor will replace that reference with €
  Notes:
    Not only for single characters (for instance also complete copyright statements)
    Predefined entities in XML: &amp; for & -- &lt; for < -- &gt; for > -- &quot; for " -- &apos; for '
    Pre-Unicode, this used to be the way to define special characters
    Unicode: use character references!
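A short sketch of entity replacement using Python's xml.etree.ElementTree, which expands entities declared in the internal DTD subset. The entity names euro and copyright and the copyright text are assumptions made for this example; the declaration on the original slide is not preserved.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarations: one entity for a single character,
# one for a longer piece of text.
DOCUMENT = """<!DOCTYPE price [
  <!ENTITY euro "&#8364;">
  <!ENTITY copyright "Copyright held by the author.">
]>
<price>&euro; &#8364; &amp; &copyright;</price>"""


def expanded_text(xml_text: str) -> str:
    """Parse the document and return the root element's text after expansion."""
    return ET.fromstring(xml_text).text


print(expanded_text(DOCUMENT))
```

The named reference &euro; and the character reference &#8364; end up as the same character, and the predefined &amp; needs no declaration.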
XML DTD: external entity
  Allows writing &competingprice; in an XML instance document
  The XML processor will replace &competingprice; with the content of the XML document price.xml
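How a processor might pull in an external entity can be sketched with expat's external entity hook. Everything concrete here is an assumption: the declaration of competingprice, the content of price.xml, and the in-memory dictionary that stands in for fetching the file.

```python
import xml.parsers.expat

# Hypothetical instance document and hypothetical content of price.xml.
DOCUMENT = b"""<!DOCTYPE Book [
  <!ENTITY competingprice SYSTEM "price.xml">
]>
<Book>&competingprice;</Book>"""
RESOURCES = {"price.xml": b"<price>12.50</price>"}


def text_with_external_entities(document: bytes, resources: dict) -> str:
    """Return all character data, with external entities resolved from resources."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append

    def on_external_entity(context, base, system_id, public_id):
        # Parse the referenced document with a child parser that shares
        # the handlers of the parent parser.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(resources[system_id], True)
        return 1  # non-zero tells expat the entity was handled

    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(document, True)
    return "".join(chunks)


print(text_with_external_entities(DOCUMENT, RESOURCES))
```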
XML DTD: unparsed entity
  Allows referring to image1 from an XML instance document: image1 is not inside the XML document, it is referred to by the XML document. The XML processor can include it at processing time.
  Notes:
    For non-XML content
    Used in attributes defined with ENTITY type
    NDATA: unparsed
    GIF89A: a NOTATION declaration gives more info on what GIF89A means
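The pieces named in the notes (NOTATION, NDATA, an attribute of type ENTITY) can be seen together in the sketch below, which uses expat to report them. The file name cover.gif, the notation identifier image/gif and the attribute name cover are assumptions for illustration only.

```python
import xml.parsers.expat

# Hypothetical declarations; only image1 and GIF89A come from the slide.
DOCUMENT = b"""<!DOCTYPE Book [
  <!NOTATION GIF89A SYSTEM "image/gif">
  <!ENTITY image1 SYSTEM "cover.gif" NDATA GIF89A>
  <!ATTLIST Book cover ENTITY #IMPLIED>
]>
<Book cover="image1"/>"""


def collect_unparsed(document: bytes):
    """Return (notations, unparsed entities, attributes per element)."""
    notations, entities, attributes = {}, {}, {}

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        entities[name] = (system_id, notation_name)

    def on_start_element(tag, attrs):
        attributes[tag] = dict(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.StartElementHandler = on_start_element
    parser.Parse(document, True)
    return notations, entities, attributes


print(collect_unparsed(DOCUMENT))
```

The parser never reads cover.gif. It only hands the entity name, its location and its notation to the application, which decides what to do with the image.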
<!DOCTYPE Book [
  <!ATTLIST author
      birthday CDATA #IMPLIED
      sex (male|female) #IMPLIED>
]>
Kevin Davies Cracking the Genome €
XML - more
  Quick DTD tutorial
  XML Spy software - XML http://