Published by Jasmine Lindsey Scott. Modified over 8 years ago.
1
Spring 2013 Markup – Validate – Transform Introduction to Digital Text and XML http://mbingenheimer.net/webclassmb/teiWorkshopRice/ Rice University, April 5th 2013 Marcus Bingenheimer (Temple University)
2
Digital Humanities... Academic effort to digitize, research and preserve all aspects of human culture in a digital environment: ● Oral & written text ● Images & architecture ● Music ● Performance & ritual ● Geography/topography ● Networks ...
3
Martin Hilbert and Priscila López 2011 (Science 332, 60): ● 2000: 75% of stored information was in an analog format (e.g. video cassettes) ● 2007: 94% of it was digital.
4
“Digitization” ● → General level: The transformation of analog information into digital information ● → Modelling analog information in the digital (with bits and bytes) ● → Modelling is a social endeavour: it relies on (open) standards ● → How to model (natural-language) text?
5
Different ways to model a printed page digitally ● Digital Image (raster or vector) ● Text file (Characters encoded by a series of bytes, output as glyphs on screen or page controlled via a code-page) ● “Plain” text ● Word processor file ● PDF file ● Text with Markup, e.g. XML
7
T.S. Eliot: facsimile copy of the draft of “The Waste Land” with annotations by E. Pound and V. Eliot. Modeled in typeset (Pound = red, V. Eliot = italics)
8
How to create high-end digital editions of texts?
9
Enter Markup ● Editors add information, they do not merely reproduce text ● The way this is done in digital text is by using markup ● Markup means applying tags to encode information about the form and content of a text ● These days most markup standards are expressed in a grammar called XML (eXtensible Markup Language)
10
XML and HTML ● XML and HTML are "sister languages," both developed from a standard called SGML (Standard Generalized Markup Language) ● Thus, the appearance of these two languages and the rules governing them are similar, e.g. they both use angle-bracketed tags to encode information
11
XML vs. HTML ● HTML is an application-specific standard primarily used to encode the style and structure of web pages so that browsers can display them ● XML is a more flexible master format which can encode an unlimited variety of structural and semantic content ● XML is merely a set of grammar rules. Unlike HTML, it does not have a fixed tag-set or vocabulary
12
XML (X)HTML New Japanese-English Dictionary 新和英大辞典 Koh Masuda (Ed.) Tokyo: Kenkyusha, 2000 XHTML
13
New Japanese-English Dictionary 新和英大辞典 Koh Masuda Tokyo Kenkyusha 2000 XML
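The markup of the dictionary example did not survive in this transcript, so the following is only a minimal sketch of the idea. It assumes hypothetical element names (book, title, origTitle, editor, pubPlace, publisher, date) that are not taken from the slides, and uses Python's standard-library parser to show why semantic tags are useful: a program can ask for the publisher or the date by name instead of guessing from layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic markup for the bibliographic entry; the element
# names are illustrative assumptions, not the tags used on the slide.
record = """<book>
  <title>New Japanese-English Dictionary</title>
  <origTitle>新和英大辞典</origTitle>
  <editor>Koh Masuda</editor>
  <pubPlace>Tokyo</pubPlace>
  <publisher>Kenkyusha</publisher>
  <date>2000</date>
</book>"""

root = ET.fromstring(record)

# Each piece of information is addressable by the name of its element.
publisher = root.findtext("publisher")
year = int(root.findtext("date"))
field_names = [child.tag for child in root]

print(publisher, year)
print(field_names)
```

In a presentational encoding such as XHTML, the same entry would typically be wrapped in tags for paragraphs and italics, which tell a browser how to display it but not which part is the publisher.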
15
XML basics
16
Well-formed & Valid Every XML document... 1. must be well-formed 2. can in principle be validated against a “schema”
17
Well-formed means: that the document conforms to the XML rules. E.g. ● One Root Element - The XML document may only have one root element. ● All start-tags have end-tags ● Each element is properly nested within the root element ("nesting"). ● Names are always case sensitive
18
Broken XML Code Mr. Garcia Hello there! How are we today? Well-formed XML Code Mr. Garcia Hello there! How are we today?
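The tags of the two code samples on this slide were lost in extraction. As a hedged illustration, the sketch below wraps the same text in made-up elements (message, to, body are assumptions) and lets Python's standard-library parser decide whether each version is well-formed. The broken version closes body with a differently cased end-tag, which violates the case-sensitivity rule.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None

# Illustrative element names only; the slide's original markup is unknown.
broken = ("<message><to>Mr. Garcia</to>"
          "<body>Hello there! How are we today?</Body></message>")
fixed = ("<message><to>Mr. Garcia</to>"
         "<body>Hello there! How are we today?</body></message>")

for label, text in [("broken", broken), ("fixed", fixed)]:
    ok, problem = check_well_formed(text)
    print(label, "->", "well-formed" if ok else "not well-formed: " + problem)
```

A parser that meets a well-formedness error stops and reports it rather than guessing what was meant, which is why the check is a simple try/except around the parse call.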
19
Valid means... ● that the document conforms to the vocabulary and syntax of a markup standard (e.g. TEI, XHTML, Music ML, MathML) expressed in a document schema (written as a DTD, in W3C XML Schema, or in RELAX NG)
20
Parsing: XML text → Parser step 1: well-formed / not well-formed → Parser step 2 (against a DTD/Schema, e.g. TEI): valid / not valid
21
XML rules 1 (declaration) An XML document should begin with an XML declaration. The declaration has the form: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> (encoding and standalone are optional)
22
XML rules 2 (root element) An XML document has one, and only one, root element, which contains all other elements and the character data
23
XML rules 3 (end-tags) Every start-tag must have a matching end-tag. Exception: empty elements
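A small sketch of the empty-element exception, using Python's standard-library parser. The element names line and lb are illustrative assumptions. It suggests that the self-closing form and the explicit start-tag/end-tag pair describe the same empty element, while a start-tag with no end at all is rejected.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same empty element, plus one unclosed start-tag.
short_form = "<line>first<lb/>second</line>"
long_form = "<line>first<lb></lb>second</line>"
unclosed = "<line>first<lb>second</line>"

def describe_lb(xml_text):
    """Summarise the lb element: name, text content, child count, trailing text."""
    lb = ET.fromstring(xml_text).find("lb")
    return (lb.tag, lb.text or "", len(lb), lb.tail)

same = describe_lb(short_form) == describe_lb(long_form)
print("both spellings equivalent:", same)

try:
    ET.fromstring(unclosed)
    unclosed_ok = True
except ET.ParseError:
    unclosed_ok = False
print("unclosed start-tag accepted:", unclosed_ok)
```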
24
XML rules 4 (nesting) Elements must be properly nested like this: not like this:
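The two markup samples for this rule are missing from the transcript. As a stand-in, the following sketch uses invented element names (p, hi, name) to contrast proper nesting with overlapping tags. Properly nested elements form a tree, which the helper prints as an indented outline. The overlapping version has no tree and the parser refuses it.

```python
import xml.etree.ElementTree as ET

# Illustrative markup; the element names are assumptions, not from the slide.
nested = "<p><hi>Hello <name>Mr. Garcia</name></hi></p>"
overlapping = "<p><hi>Hello <name>Mr. Garcia</hi></name></p>"

def outline(element, depth=0):
    """Return the element tree as a list of indented tag names."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(nested))
print("\n".join(tree_lines))

try:
    ET.fromstring(overlapping)
    overlap_error = None
except ET.ParseError as err:
    overlap_error = str(err)
print("overlapping tags:", overlap_error)
```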
25
XML rules 5 (xml names) XML is case-sensitive. Element names must start with a letter (including CJK 漢字) or the underscore “_”. They may contain only alphanumeric characters (letters and digits) and “_”, “-”, “.”. The colon “:” is reserved for XML namespaces
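One way to try out the naming rules is to ask a parser directly. The helper below is a hypothetical convenience function, not part of any library: it builds a one-element document from a candidate name and reports whether Python's standard-library parser accepts it as a plain, unprefixed element name. It is a teaching sketch rather than a full implementation of the XML name grammar.

```python
import xml.etree.ElementTree as ET

def is_usable_element_name(name):
    """True if the parser accepts `name` as an unprefixed element name."""
    try:
        root = ET.fromstring("<" + name + "/>")
    except ET.ParseError:
        return False
    # Guard against inputs that parse but are not read as a single name.
    return root.tag == name

candidates = [
    "title",      # starts with a letter
    "_note",      # starts with an underscore
    "漢字",        # CJK letters are allowed
    "line-1.a",   # hyphen, digit and period after the first character
    "1line",      # starts with a digit
    "-line",      # starts with a hyphen
    "my name",    # contains a space
    "tei:title",  # colon: read as a namespace prefix, undeclared here
]

accepted = [n for n in candidates if is_usable_element_name(n)]
print(len(accepted), "of", len(candidates), "candidates accepted")
```

The prefixed name is rejected here only because no namespace is declared for the prefix, which matches the slide's point that the colon is reserved for namespaces.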
26
[Diagram] Document Model (DTD, Relax NG, XML Schema) validates the <XML Document> ● TRANSFORM: CSS, XQuery, XSLT, XPath, XSL-FO, JScript ● OUTPUT: HTML, PDF, ePub, any XML (docx, odt...)
27
Practice ● Write a “firstdocument.xml” ● Open it in Firefox ● Make some XML mistakes and refresh Firefox ● Add this stylesheet declaration: ● Refresh Firefox