Spring 2013 Markup – Validate – Transform Introduction to Digital Text and XML Rice University, April.


1 Spring 2013 Markup – Validate – Transform Introduction to Digital Text and XML http://mbingenheimer.net/webclassmb/teiWorkshopRice/ Rice University, April 5th, 2013 Marcus Bingenheimer (Temple University)

2 Digital Humanities... Academic effort to digitize, research and preserve all aspects of human culture in a digital environment: ● Oral & Written text ● Images & Architecture ● Music ● Performance & Ritual ● Geography/Topography ● Networks ● ....

3 Martin Hilbert and Priscila López 2011 (Science 332, 60): ● 2000: 75% of stored information was in an analog format (e.g. video cassettes) ● 2007: 94% of it was digital. [Chart with panels for 1986, 1993, 2000 and 2007]

4 “Digitization” ● → General level: The transformation of analog information into digital information ● → Modelling analog information in the digital (with bits and bytes) ● → Modelling is a social endeavour: it relies on (open) standards ● → How to model (natural-language) text?

5 Different ways to model a printed page digitally ● Digital Image (raster or vector) ● Text file (Characters encoded by a series of bytes, output as glyphs on screen or page controlled via a code-page) ● “Plain” text ● Word processor file ● PDF file ● Text with Markup, e.g. XML

6

7 T.S. Eliot: facsimile copy of the draft of “The Waste Land” with annotations by E. Pound and V. Eliot. Modeled in typeset: (Pound = red, V. Eliot = italics)

8 How to create high-end digital editions of texts?

9 Enter Markup ● Editors add information, they do not merely reproduce text ● The way this is done in digital text is by using markup ● Markup means applying tags to encode information about the form and content of a text ● These days most markup standards are expressed in a grammar called XML (eXtensible Markup Language)

10 XML and HTML ● XML and HTML are "sister languages," both developed from a standard called SGML (Standard Generalized Markup Language) ● Thus the appearance of these two languages, and the rules governing them, are similar, e.g. they both use bracketed tags to encode information

11 XML vs. HTML ● HTML is an application-specific standard primarily used to encode the style and structure of web pages so that browsers can display them ● XML is a more flexible master format which can encode an infinite variety of structure and semantic content ● XML is merely a set of grammar rules. It does not have a fixed tag-set or vocabulary like HTML

12 XML (X)HTML New Japanese-English Dictionary 新和英大辞典 Koh Masuda (Ed.) Tokyo: Kenkyusha, 2000 XHTML

13 New Japanese-English Dictionary 新和英大辞典 Koh Masuda Tokyo Kenkyusha 2000 XML
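To make the contrast between slides 12 and 13 concrete, here is a minimal Python sketch that encodes the same dictionary record twice, once with presentational XHTML-style tags and once with descriptive XML tags. All tag names (`book`, `title`, `editor`, `pubPlace`, `publisher`, `date`) and the choice of `i`, `b` and `br` are illustrative assumptions, not the markup shown on the slides.

```python
import xml.etree.ElementTree as ET

# XHTML-style: the tags describe appearance and layout (assumed markup).
xhtml_record = """<p><i>New Japanese-English Dictionary</i> <b>新和英大辞典</b><br/>
Koh Masuda (Ed.)<br/>Tokyo: Kenkyusha, 2000</p>"""

# XML-style: the tags describe what each piece of text is (assumed element names).
xml_record = """<book>
  <title xml:lang="en">New Japanese-English Dictionary</title>
  <title xml:lang="ja">新和英大辞典</title>
  <editor>Koh Masuda</editor>
  <pubPlace>Tokyo</pubPlace>
  <publisher>Kenkyusha</publisher>
  <date>2000</date>
</book>"""

# Both strings are well-formed, so the standard-library parser accepts them.
xhtml_tree = ET.fromstring(xhtml_record)
xml_tree = ET.fromstring(xml_record)

# Only the descriptive version lets a program ask for "the publisher" directly.
publisher = xml_tree.findtext("publisher")
year = xml_tree.findtext("date")
print(publisher, year)
```

In the XHTML-style version the publisher is just part of a text run after a line break, so a program would have to guess where it starts and ends.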

14

15 XML basics

16 Well-formed & Valid Every XML document... 1. must be well-formed 2. can in principle be validated against a “schema”

17 Well-formed means: that the document conforms to the XML rules. E.g. ● One Root Element - The XML document may only have one root element. ● All start-tags have end-tags ● Each element is properly nested within the root element ("nesting"). ● Names are always case sensitive

18 Broken XML Code Mr. Garcia Hello there! How are we today? Well-formed XML Code Mr. Garcia Hello there! How are we today?
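As a rough illustration of the difference, the following Python sketch feeds a broken and a well-formed version of the greeting to the standard-library parser. The element names (`message`, `to`, `body`) are assumptions made for this example; the broken version closes `<Body>` with `</body>`, which fails because XML names are case-sensitive.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Assumed markup: <Body> and </body> differ in case, so they do not match.
broken = "<message><to>Mr. Garcia</to><Body>Hello there! How are we today?</body></message>"

# Same content with matching start- and end-tags.
well_formed = "<message><to>Mr. Garcia</to><body>Hello there! How are we today?</body></message>"

print(is_well_formed(broken), is_well_formed(well_formed))
```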

19 Valid means... ● that the document conforms to the vocabulary and syntax of a markup standard (e.g. TEI, XHTML, Music ML, MathML) expressed in a document schema (written as a DTD, in W3C XML Schema, or in Relax NG)

20 Parsing: XML text → Parser step 1 → well-formed / not well-formed → Parser step 2 (DTD/Schema, e.g. TEI) → valid / not valid
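The two parser steps can be sketched in Python as follows. Step 1 uses the standard-library XML parser. Step 2 is only a toy stand-in for real validation: instead of a DTD, XML Schema or Relax NG schema, it uses a hand-written dictionary (`ALLOWED_CHILDREN`, an assumption for this example) listing which child elements each element may contain.

```python
import xml.etree.ElementTree as ET

# Toy "schema" (illustrative assumption): element name -> allowed child names.
ALLOWED_CHILDREN = {
    "message": {"to", "body"},
    "to": set(),
    "body": set(),
}

def check(xml_text: str) -> str:
    # Step 1: is the text well-formed XML at all?
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "not well-formed"
    # Step 2: does every element follow the toy vocabulary?
    for parent in root.iter():
        if parent.tag not in ALLOWED_CHILDREN:
            return "well-formed, not valid"
        for child in parent:
            if child.tag not in ALLOWED_CHILDREN[parent.tag]:
                return "well-formed, not valid"
    return "well-formed and valid"

print(check("<message><to>Mr. Garcia</to><body>Hello there!</body></message>"))
print(check("<message><from>Mr. Garcia</from></message>"))
print(check("<message><to>Mr. Garcia</message>"))
```

A document that fails step 1 never reaches step 2, which mirrors the order of the two parser steps.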

21 XML rules 1 (declaration) An XML document should begin with an XML declaration. The declaration has the form: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> (encoding and standalone are optional)

22 XML rules 2 (root element) It has one, and only one root element containing all other elements and the character data

23 XML rules 3 (end-tags) Every start-tag must have a matching end-tag. Exception: empty elements
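A small Python sketch of this rule, assuming a hypothetical empty element `<lb/>` (a line break): the self-closing spelling and an explicit start-tag plus end-tag produce the same tree, while a start-tag that is never closed is rejected.

```python
import xml.etree.ElementTree as ET

# <lb/> is a hypothetical empty "line break" element used for illustration.
short_form = ET.fromstring("<line>first half<lb/>second half</line>")
long_form = ET.fromstring("<line>first half<lb></lb>second half</line>")

# Both spellings produce the same tree, so they serialize identically.
same_tree = ET.tostring(short_form) == ET.tostring(long_form)

# A start-tag that is never closed is rejected by the parser.
try:
    ET.fromstring("<line>first half<lb>second half</line>")
    unclosed_ok = True
except ET.ParseError:
    unclosed_ok = False

print(same_tree, unclosed_ok)
```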

24 XML rules 4 (nesting) Elements must be properly nested like this: not like this:
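The nesting rule can be tried out with the short Python sketch below. The element names (`p`, `hi`, `name`) are assumptions chosen for illustration: in the first string `<name>` opens and closes inside `<hi>`, in the second the two elements overlap.

```python
import xml.etree.ElementTree as ET

def first_error(xml_text):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None

# Properly nested: <name> is opened and closed inside <hi>.
nested = "<p><hi>Hello <name>Mr. Garcia</name></hi></p>"

# Overlapping: <hi> is closed while <name> is still open.
overlapping = "<p><hi>Hello <name>Mr. Garcia</hi></name></p>"

print(first_error(nested))
print(first_error(overlapping))
```

The message returned for the overlapping version includes the line and column where the parser gave up, which helps when hunting for the offending tag.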

25 XML rules 5 (xml names) ● XML is case-sensitive ● Element names must start with a letter (including CJK 漢字) or “_” ● They may contain only alphanumeric characters (letters and digits) and “_”, “-”, “.” ● The colon “:” is reserved for XML namespaces
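The naming rules above can be approximated with a regular expression, as in this Python sketch. It follows the simplified rules on the slide rather than the full "Name" production of the XML specification, and the function name `is_simple_xml_name` is made up for this example.

```python
import re

# Approximation of the slide's simplified rules (not the full XML spec):
# first character is a letter or "_", then letters, digits, "_", "-" or ".".
# The colon is deliberately excluded because it is reserved for namespaces.
NAME_PATTERN = re.compile(r"[^\W\d][\w.\-]*")

def is_simple_xml_name(name: str) -> bool:
    """Return True if name satisfies the simplified naming rules."""
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["title", "_note", "漢字", "pub-place.1", "1title", "my title", "tei:title"]:
    print(candidate, is_simple_xml_name(candidate))
```

Because XML is case-sensitive, `Title` and `title` both pass this check but are still two different element names.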

26 [Diagram] Document model (DTD, Relax NG, XML Schema) validates the <XML Document>. TRANSFORM: CSS, XQuery, XSLT, XPath, XSL-FO, JScript. OUTPUT: HTML, PDF, ePub, any XML (docx, odt...)

27 Practice ● Write a “firstdocument.xml” ● Open it in Firefox ● Make some XML mistakes and refresh Firefox ● Add this stylesheet declaration: ● Refresh Firefox
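For those who prefer to generate the practice file from a script, the Python sketch below writes one possible “firstdocument.xml”. The document content, the element names, the assumption that the stylesheet is CSS, and the stylesheet file name `firststyle.css` are all hypothetical; substitute the declaration given in the workshop.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical first document; the stylesheet name "firststyle.css" is assumed.
FIRST_DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="firststyle.css"?>
<message>
  <to>Mr. Garcia</to>
  <body>Hello there! How are we today?</body>
</message>
"""

def write_first_document(folder):
    """Check that the text is well-formed, then save it as firstdocument.xml."""
    ET.fromstring(FIRST_DOCUMENT.encode("utf-8"))  # raises ParseError if broken
    path = Path(folder) / "firstdocument.xml"
    path.write_text(FIRST_DOCUMENT, encoding="utf-8")
    return path

if __name__ == "__main__":
    print(write_first_document("."))
```

Opening the resulting file in Firefox and then deliberately breaking a tag in a text editor reproduces the "make some XML mistakes" step of the exercise.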

